Open science, open data
Lidia Stępińska-Ustasiak
Open Science Platform, ICM, University of Warsaw
“Open science is the idea that scientific knowledge of all kinds should
be openly shared as early as is practical in the discovery process.”
Michael Nielsen
Outline
• Openness in science
• Open research data
• Definitions
• Formats
• Levels of openness
• Depositing data
• Open Access and open research data pilot in Horizon 2020
• CC licences in science
• Research Data Management
4th Paradigm
• Empirical - describing natural phenomena (last millennia)
• Theoretical - building models and generalisations (last centuries)
• Computational - simulating complex phenomena (last decades)
• Data exploration - “data-intensive” scientific discovery (last few years)
Scholarly communication is changing
• Traditional system has limitations
• Costs of subscriptions are growing
• New technologies develop very fast
• Production of knowledge is also very fast
• Open Access is the answer
Open science
• Open access
• Open data
• Open source
• Open educational resources
• Open peer review
• Open notebook science
• Citizen science
What does the EC understand by OA?
• Online access at no charge to the user
  • To peer-reviewed scientific publications
  • To scientific data
• Two main publishing business models
  • Self-archiving – manuscripts are deposited and immediate or delayed OA is provided by the author (green OA)
  • OA publishing – costs are covered and immediate OA is provided by the publisher (gold OA), e.g. the „author pays” model (APC)
Open access can be defined as the practice of providing on-line access
to scientific information that is free of charge to the end-user.
Objective
• The EC goal is to optimize the impact of research in Europe.
Expected benefits:
• Better and more efficient science (Science 2.0)
• Economic growth
• Broader, faster, more transparent and equal access for the benefit of
researchers, industry and citizens (Responsible Research and Innovation)
European Commission (2013):
„Open access can be defined as the practice of providing on-line access to
scientific information that is free of charge to the end-user and that is re-usable.
In the context of research and innovation, 'scientific information' can refer to
(i) peer-reviewed scientific research articles (published in scholarly journals)
or
(ii) research data (data underlying publications, curated data and/or raw data).”
Guidelines on Open Access to Scientific Publications and Research Data
in Horizon 2020. Version 16 December 2013.
Intellectual Property Rights in H2020
Scientific information
Articles and books
KTHBiblioteket, CC-BY-SA
https://www.flickr.com/photos/kthbiblioteket/4472640423/
Research data
„the recorded factual material commonly accepted in the
scientific community as necessary to validate research findings”
Other definitions of research data
„Research data is data that is collected, observed, or created,
for purposes of analysis to produce original research results.”
„Data is anything that has been produced or created during research.”
„…the recorded factual material commonly accepted in the
scientific community as necessary to validate research findings.”
„Anything & everything produced in the course of research”
Digital Curation Centre
Examples of research data:
• Numerical data
• Text documents, lab notes
• Questionnaires, responses, transcripts
• Audiotapes, videotapes
• Photographs, films
• Artefacts, specimens, samples
• Models, algorithms, scripts
• Simulation results
• Methodologies and workflows
The focus [in the context of open access] is on research data
that is available in digital form.
What is open data?
The Open Definition:
“Open data and content can be freely used,
modified, and shared by anyone for any purpose.”
Open Knowledge Foundation
What is open data?
★ make your stuff available on the Web (whatever format) under an open licence
★★ make it available as structured data (e.g. Excel instead of a scan of a table)
★★★ use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
Tim Berners-Lee, 5-star Open Data, 5stardata.info
This model is concerned with removing technical barriers to data re-use.
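The five levels are cumulative, so a dataset's rating is the highest level for which it and every lower level are satisfied. A minimal sketch of that rule, assuming a hypothetical `Dataset` record with one boolean per criterion (the field names are illustrative and not part of the 5-star scheme itself):

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical flags, one per level of the 5-star model.
    on_web_open_licence: bool    # 1 star
    structured: bool             # 2 stars
    non_proprietary: bool        # 3 stars
    uses_uris: bool              # 4 stars
    linked_to_other_data: bool   # 5 stars


def star_rating(ds: Dataset) -> int:
    """Count consecutive satisfied levels, starting from the first."""
    levels = [
        ds.on_web_open_licence,
        ds.structured,
        ds.non_proprietary,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for satisfied in levels:
        if not satisfied:
            break
        stars += 1
    return stars


# A CSV file published under an open licence, without URIs or links.
csv_release = Dataset(True, True, True, False, False)
print(star_rating(csv_release))
```

Stopping at the first unmet level reflects the cumulative reading of the model: linked data that is not openly licensed still earns no stars.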
Formats
Type of data      Recommended                                            Avoid for data sharing
Tabular           CSV, TSV, SPSS portable                                Excel
Text              Plain text, HTML, RTF; PDF/A only if layout matters    Word
Media             Container: MP4, Ogg; Codec: Theora, Dirac, FLAC        QuickTime
Images            TIFF, JPEG2000, PNG                                    GIF, JPG
Structured data   XML, RDF                                               RDBMS
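Saving tabular data in a recommended format is usually a short step. A minimal sketch using only Python's standard `csv` module; the rows, column names and the file name `measurements.csv` are made-up placeholders:

```python
import csv
import tempfile
from pathlib import Path

# Illustrative rows; in practice these would come from your own data.
rows = [
    {"sample_id": "S1", "temperature_c": 21.5, "ph": 7.1},
    {"sample_id": "S2", "temperature_c": 22.0, "ph": 6.9},
]


def write_csv(path: Path, records: list[dict]) -> None:
    """Write records as UTF-8 CSV with a header row."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)


def read_csv(path: Path) -> list[dict]:
    """Read the file back; every value is returned as a string."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Write to a temporary directory and read the file back as a check.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "measurements.csv"
    write_csv(out, rows)
    round_trip = read_csv(out)

print(round_trip[0])
```

CSV stores no type information, so numbers come back as text. Units and types therefore belong in the accompanying documentation.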
Major sources of open data
• Public data
• Research data
Specialized data repositories
Protein Data Bank – since 1971
Berman, Kleywegt, Nakamura, Markley (2012), http://dx.doi.org/10.1016/j.str.2012.01.010
Oxford Text Archive – since 1976
GenBank – since 1982
http://www.ncbi.nlm.nih.gov/genbank/statistics
What about data for which no specialized
repositories exist?
➞ Broad or general data repositories
➞ Data journals
ZENODO
• Zenodo is a free-to-use data archive, run by CERN
• It accepts any kind of data, from any academic discipline
• It is generally preferable to store data in a disciplinary data centre,
but not all scholarly subjects are equally well served with data
centres, so this may make for a useful fallback option
• See http://zenodo.org/ for more details
Should all data be open? No.
• Privacy protection (human subjects!)
• National security issues
• Protection of endangered species, of archaeological sites, etc.
• Interference with commercialization plans
But data existence should always be open:
• Allows discovery & negotiation on use
• Avoids pointless replication
Slide adapted from Kevin Ashley, DCC, CC-BY
https://www.youtube.com/watch?v=RGtPVIBmFBI&feature=youtu.be
Why is data sharing worth your attention?
• Digital technology is now used very widely in research and is enabling
new research and scientific paradigms
• Research funders and publishers know that digital research data can
be expensive to produce but inexpensive to share, making reuse more
feasible and desirable
• The challenge is to ensure digital research findings can be reproduced
and cited
The long tail of research data
[Figure: long-tail curve with axes "Size of the data" and "Number of datasets"; labelled regions: "Big Data" and "Long tail of data"]
Long tail of data: all the data produced by small research
groups and individual researchers
„To me, the really difficult challenge is (…) the variety. The heterogeneity, as you put it.
And we see this particularly in what they call the long tail of data (…)”
Mark Parsons, Research Data Alliance
Exercise
Objections to data sharing
How to answer the most commonly heard objections to data sharing?
1. My data is not of interest or
use to anyone else.
Replies (1)
• It is! Researchers want to access data from all kinds of studies,
methodologies and disciplines. It is very difficult to predict which
data may be important for future research. Your data may also be
essential for teaching purposes. Sharing is not just about archiving
your data but about sharing them amongst colleagues.
2. I want to publish my work
before anyone else sees my data.
Replies (2)
• Data sharing will not stand in the way of you first using your data
for your publications. Most research funders allow you some
period of sole use, but also want timely sharing. Also remember
that you have already been working with your data for some time
so you undoubtedly know the data better than anyone coming to
use them afresh. If you are still concerned you can embargo your
data for a specific period of time.
3. If I ask my respondents for consent
to share their data, then they will not
agree to participate in the study.
Replies (3)
• Don’t assume that participants will not participate because data
sharing is discussed. Talk to them; they may be less reluctant than you
might think, or less concerned about data sharing. Make it clear that it is
entirely their decision. Explain what data sharing means and why it
might be important.
• If you have not asked for permission during the research, you can return
to participants to gain retrospective permission.
4. I’m doing quantitative research
and the combination of my variables
discloses my participants’ identities.
Replies (4)
• Quantitative data can be anonymised through processes of
aggregation, top coding, removal of variables or controlled access
to certain variables.
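Three of the techniques named above (removal of variables, aggregation and top coding) can be sketched in a few lines. The record, the ten-year band width and the income ceiling are arbitrary values chosen for illustration, not recommendations:

```python
def top_code(value: float, ceiling: float) -> float:
    """Top coding: replace any value above the ceiling with the ceiling."""
    return min(value, ceiling)


def age_band(age: int, width: int = 10) -> str:
    """Aggregation: report an age band instead of the exact age."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def drop_variables(record: dict, identifying: set[str]) -> dict:
    """Removal of variables: omit directly identifying fields."""
    return {k: v for k, v in record.items() if k not in identifying}


# Made-up respondent record.
respondent = {"name": "A. Example", "age": 47, "income": 250_000}

safe = drop_variables(respondent, {"name"})
safe["age"] = age_band(safe["age"])
safe["income"] = top_code(safe["income"], 100_000)
print(safe)
```

Whether these steps are sufficient depends on the dataset: rare combinations of the remaining variables can still identify a participant, which is where controlled access comes in.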
5. I have collected audio-visual data
and I cannot anonymise them,
therefore I cannot share these data.
Replies (5)
• Audio-visual data can be anonymised through blurring faces or distorting
voices, but this can be time-consuming and can mean losing much of
the value of the data. It is better to ask participants for consent to
share the data in unanonymised form and/or to control access to the
data.
6. I’m doing highly sensitive research.
I cannot possibly make my data
available for others to see.
Replies (6)
• Ask respondents and see if you can get consent for sharing in the
first instance. Anonymisation procedures can help to protect
identifying information. If these two tactics are not appropriate, then
consider controlling access to the data or embargoing them for a period
of time.
7. It is impossible to anonymise
my transcripts as too much
information is lost.
Replies (7)
• Sometimes access control on the data may be a better solution
than anonymisation if too much useful information would be lost.
8. My data collection contains
data which I have purchased and
it cannot be made public.
Replies (8)
• It is important to know who holds the copyright to the data you are
using and to obtain relevant permissions. You need to be aware of
the licence conditions of the data you are using and what you can
and cannot do with the data.
9. Other researchers would not
understand my data at all or may
use them for a wrong purpose.
Replies (9)
• Producing good documentation and providing contextual
information for your research project should enable other
researchers to correctly use and understand your data.
10. There is IPR in the data.
Replies (10)
• This should not be a problem if you seek copyright permission from
the owner of the intellectual property rights. This is best done
early on in the research project but also may be done
retrospectively.
Role-playing exercise derived from the UKDA’s “Potential barriers to data sharing –
with suggested solutions” (CC-BY-NC-SA). The original is available from
http://data-archive.ac.uk/create-manage/training-resources
Open Access in Horizon 2020
Mandate on open access to publications:
„Under Horizon 2020, each beneficiary must ensure open access
to all peer-reviewed scientific publications relating to its results.”
Open Access in Horizon 2020
In order to comply with this requirement, beneficiaries must, at the very least,
ensure that their publications, if any, can be read online, downloaded and printed.
However, as any additional rights such as the right to copy, distribute, search, link,
crawl, and mine increase the utility of the accessible publication, beneficiaries
should make every effort to provide for as many of them as possible.
Open Access in Horizon 2020
Open research data pilot:
„The Open Research Data Pilot applies to two types of data:
1) the data (…) needed to validate the results presented in scientific
publications as soon as possible;
2) other data (…) as specified and within the deadlines laid down in the
data management plan.”
„Participating projects are required to deposit the research data
described above, preferably into a research data repository.”
• Only for projects from 7 selected areas.
• You can opt-in, and you can also opt-out.
Open Access in Horizon 2020
Participating projects are required to deposit the research data described
above, preferably into a research data repository.
As far as possible, projects must then take measures to enable for third
parties to access, mine, exploit, reproduce and disseminate (free of charge
for any user) this research data.
One straightforward and effective way of doing this is to attach a Creative
Commons Licence (CC-BY or CC0 tool) to the data deposited.
H2020 - areas participating in the data pilot
• Future and Emerging Technologies
• Research infrastructures – part e-Infrastructures
• Leadership in enabling and industrial technologies – Information and
Communication Technologies
• Societal Challenge: 'Secure, Clean and Efficient Energy' – part Smart cities
and communities
• Societal Challenge: 'Climate Action, Environment, Resource Efficiency and
Raw materials' – except raw materials
• Societal Challenge: 'Europe in a changing world – inclusive, innovative and
reflective Societies'
• Science with and for Society
Projects in other areas can participate on a voluntary basis.
Reasons for opting out
• If results are expected to be commercially or industrially exploited
• If participation is incompatible with the need for confidentiality in
connection with security issues
• If incompatible with existing rules on the protection of personal data
• If participation would jeopardise the achievement of the main aim of the action
• If the project will not generate / collect any research data
• If there are other legitimate reasons to not take part in the Pilot
Can opt out at proposal stage OR during lifetime of project.
Should describe issues in the project Data Management Plan.
Slide by Sarah Jones, adapted by Kevin Ashley, DCC, CC-BY
Legal aspects
CC licences
What are Creative Commons Licenses?
BY – Attribution
SA – Share Alike
NC – Non-commercial
ND – No derivatives
Public Domain
• Public Domain Mark
• Public Domain Dedication
Gratis open access: the right to read
Libre open access: the right to read and re-use
CC0 is easy to use
You don’t need to know what rights actually apply to your dataset
(what is protected?)
→ you should know this for CC-BY (and other CC licenses)
Why CC0 for research data?
BY: Datasets are particularly prone to attribution stacking, where a derivative
work must acknowledge all contributors to each work from which it is derived, no
matter how distantly.
SA: The problem with copyleft licences is they prevent the licensed data being
combined with data released under a different copyleft licence: the derived dataset
would not be able to satisfy both sets of licence terms simultaneously.
NC: Non-commercial licences may have wider implications than intended due
to the ambiguity of what constitutes a commercial use.
From:
Ball, A. (2014). ‘How to License Research Data’. DCC How-to Guides. Edinburgh: Digital Curation Centre.
Available online: http://www.dcc.ac.uk/resources/how-guides/license-research-data#x1-4000
Open Access in Horizon 2020
Open research data pilot:
„The use of a detailed data management plan covering individual
datasets is required for funded projects participating in the
Open Research Data Pilot.”
What is Research Data Management?
Research data management
…an active approach towards handling data
throughout all stages of the research data lifecycle.
Research data lifecycle
Active data management
• Data management planning
• Creating data
• Documenting data
• Accessing & using data
• Storage and backup
• Selecting what to keep
• Sharing data
• Data licensing and citation
• Preserving data
• …
Digital Curation Centre
Data Selection – guidelines
1. Legal requirements to retain the data beyond its immediate use.
2. Scientific or Historical Value: this involves inferring anticipated future use.
3. Uniqueness: does it duplicate existing datasets?
4. Non-Replicability: would it be feasible to replicate the data? (high costs, one-time events)
5. Potential for Redistribution: the reliability, integrity, and usability of the data files (do formats meet technical criteria? are IPRs addressed?)
6. Economic Case: costs for managing and preserving the data are justifiable when assessed against evidence of potential future benefits.
7. Full documentation: documentation is comprehensive and correct.
Based on:
Whyte, A. & Wilson, A. (2010). "How to Appraise and Select Research Data for Curation".
DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online:
http://www.dcc.ac.uk/resources/how-guides/appraise-select-data
File formats - tactic
If you want your data to be re-used and sustainable in the long-term,
you typically want to opt for open, non-proprietary formats.
• Do you have a choice or do the instruments you use only export in
certain formats?
• What is common in your field? Try to use something that is accepted
and widespread
• Does your data centre recommend formats? If so it’s best to use
these.
Data selection…
…depends on what researchers want to do with their data;
what they are allowed to do with the data;
and what the institution can afford to do with the data.
Slide adapted from Kevin Ashley, DCC, CC-BY
What is a DMP?
A brief plan that outlines
• what data will be created and how
• how it will be managed (storage, back-up, access…)
• plans for data sharing and preservation
Many research funders require a DMP.
Slide from Kevin Ashley, DCC, CC-BY
Why develop a DMP?
DMPs are useful whenever researchers are creating data to:
• Make informed decisions to anticipate and avoid problems
• Avoid duplication, data loss and security breaches
• Develop procedures early on for consistency
• Ensure data are accurate, complete, reliable and secure
• Save time and effort
Slide adapted from Kevin Ashley, DCC, CC-BY
Five common themes
1. Description of data to be collected / created
(i.e. how will it be collected, content, type, format, volume...)
2. Documentation & metadata
(standards and formats, structure of file naming, etc.)
3. Ethics and Intellectual Property
(highlight any restrictions on data sharing e.g. privacy, confidentiality)
4. Plans for data sharing and access
(i.e. how, when, to whom)
5. Strategy for long-term preservation
www.dcc.ac.uk/resources/data-management-plans/checklist
Slide adapted from Kevin Ashley, DCC, CC-BY
Advice on writing DMPs
• Keep it short and simple, but be specific
• Seek advice - consult and collaborate
• Base plans on available skills and support
• Make sure implementation is feasible
• Remember: plans change and should evolve
For better understanding of your data
• Think about what is needed in order to find, evaluate, understand,
and reuse the data.
• Have you documented what you did and how?
• Did you develop code to run analyses? If so, this should be kept and
shared too.
• Is it clear what each bit of your dataset means? Make sure the units
are labelled and abbreviations explained.
• Record metadata so others can find your work, e.g. title, date,
creator(s), subject, format, rights…
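The metadata fields listed above can be kept in a small machine-readable record next to the data. A minimal sketch using Python's standard `json` module; the field names follow the list above, all values are placeholders, and a real deposit should follow the metadata schema of the chosen repository:

```python
import json

# Fields taken from the list above; treating them all as required
# is an assumption made for this example.
REQUIRED_FIELDS = ("title", "date", "creators", "subject", "format", "rights")

# Placeholder values for illustration only.
record = {
    "title": "Example survey dataset",
    "date": "2015-01-01",
    "creators": ["Example, A."],
    "subject": "open science",
    "format": "text/csv",
    "rights": "CC0 1.0",
}


def missing_fields(metadata: dict) -> list[str]:
    """Return required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


print(missing_fields(record))
print(json.dumps(record, indent=2, ensure_ascii=False))
```

A check like `missing_fields` can be run before depositing, so that gaps in the description are caught while the project team still remembers the details.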
Which data need to be kept?
• Could this data be re-used?
• Must it be kept as evidence or for legal reasons?
• Should it be kept for its potential value?
• Consider costs – do benefits outweigh cost?
• Evaluate criteria to decide what to keep
• 5 steps to decide what data to keep: www.dcc.ac.uk/resources/how-guides/five-steps-decide-what-data-keep
Where to deposit?
• Does your publisher or funder suggest a repository?
• Are there data centres or community databases for your discipline?
• Does your university offer support for long-term preservation?
Exercise
Define and select your data
Choose one specific research project and for this project:
1. Define what data will be generated (all of it!)
2. What would you select for preservation?
3. How would you share your data?
There is no such thing as ideal data.
Thank you for your attention
Contact:
l.stepinska-ustasiak@icm.edu.pl

Open science, open data - FOSTER training, Potsdam

  • 1.
    Open science, opendata Lidia Stępińska-Ustasiak Open Science Platform, ICM, University of Warsaw
  • 2.
    “Open science isthe idea that scientific knowledge of all kinds should be openly shared as early as is practical in the discovery process.” Michael Nielsen
  • 3.
    Outline • Openess inscience • Open research data • Definitions • Formats • Levels of openess • Depositing data • Open Access and open research data pilot in Horizon2020 • CC licences in science • Research Data Managment
  • 4.
    4th Paradigm • Empirical- describing natural phenomena (last millenia) • Theoretical - building models and generalisations (last centuries) • Computational - simulating complex phenomena (last decades) • Data Exploration “data- intensive” scientific discovery (last years)
  • 5.
    Scholarly communication ischanging Traditional system has limitations Costs of subscriptions are growing New technologies develop very fast Production of knowledge is also very fast Open Access is the answer
  • 6.
    Open science Open access Open data Opensource Open educational resources Open peer review Open notebook science Citizen science
  • 7.
    What does theEC understand by the OA? • Online access at no charge to the user • To peer reviewed scientific publications • To scientific data • Two main publishing business models • Self archiving – deposit manuscripts & immediate/delayed OA provided by autho (green OA) • OA publishing – costs covered & immediate OA provided by publisher (gold model) e.g. „author pay” model (APC) Open access can be defined as the practice of providing on-line access to scientific information that is free of charge to the end-user.
  • 8.
    Objective • The ECgoal is to optimize the impact of research in Europe. Expected benefits: • Better and more efficient science (Science 2.0) • Economic growth • Broader, faster, more transparent and equal access for the benefit of researchers, industry and citizens. (Responsible Research and Innovations)
  • 9.
    European Commission (2013): „Openaccess can be defined as the practice of providing on-line access to scientificinformation that is free of charge to the end-user and that is re-usable. In the context of research and innovation, 'scientific information' can refer to (i) peer-reviewed scientific research articles (published in scholarly journals) or (ii) research data (data underlying publications, curated data and/or raw data).” Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020. Version 16 December 2013.
  • 10.
  • 11.
    Scientific information Articles andbooks KTHBiblioteket,CC-BY-SA https://www.flickr.com/photos/kthbiblioteket/4472640423/ Research data „the recorded factual material commonly accepted in the scientific community as necessary to validate research findings”
  • 12.
    „Research data isdata that is collected, observed, or created, for purposes of analysis to produce original research results.” „Data is anything that has been produced or created during research.” Other definitions of research data „…the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”
  • 13.
    „Anything & everything producedin the course of research” Digital Curation Center
  • 14.
    • Numerical data •Text documents, lab notes • Questionnaires, responses, transcripts • Audiotapes, videotapes • Photographs, films • Artefacts, specimens, samples • Models, algorithms, scripts • Simulation results • Methodologies and workflows Examples of research data:
  • 15.
    Numerical data Text documents,lab notes Questionnaires, responses, transcripts Audiotapes, videotapes Photographs, films Artifacts, specimens, samples Models, algorithms, scripts Simulation results Methodologies and workflows Examples of research data The focus [in the context of open access] is on research data that is available in digital form.
  • 16.
    “Open data andcontent can be freely used, modified, and shared by anyone for any purpose.” Open Knowledge Foundation The Open Definition: What is open data?
  • 17.
    What is opendata?  make your stuff available on the Web (whatever format) under an open licence  make it available as structured data (e.g. Excel instead of a scan of a table)  use non-proprietary formats (e.g. CSV instead of Excel)  use URIs to denote things, so that people can point at your stuff  link your data to other data to provide context Tim Berners-Lee, 5-star Open Data, 5stardata.info This model is concerned with removing technical barriers to data re-use.
  • 18.
    Formats Type of dataReccomended Avoid for data sharing Tabular CSV, TSV, SPSS portable Excel Text Plain text, HTML, RTF PDF/A only if layout matters Word Media Container: MP4, Ogg Codec: Theora, Dirac, FLAC Qiucktime Images TIFF, JPEG2000, PNG Gif, JPG Structured data XML, RDF RBDMS
  • 19.
    Major sources ofopen data Public data Research data
  • 20.
    Specialized data repositories Berman,Kleywegt,Nakamura,Markley(2012) http://dx.doi.org/10.1016/j.str.2012.01.010 ProteinData Bank – since 1971 Oxford Text Archive – since 1976 GenBank – since 1982 http://www.ncbi.nlm.nih.gov/ge nbank/statistics
  • 21.
    What about datafor which no specialized repositories exist? ➞ Broad or general data repositories ➞ Data journals
  • 24.
    ZENODO • Zenodo isa free-to-use data archive, run by CERN • It accepts any kind of data, from any academic discipline • It is generally preferable to store data in a disciplinary data centre, but not all scholarly subjects are equally well served with data centres, so this may make for a useful fallback option • See http://zenodo.org/ for more details
  • 25.
  • 26.
    Should all databe open? No. But data existence should always be open: • Allows discovery & negotiation on use • Avoids pointless replication Slide adapted from Kevin Ashley, DCC, CC-BY Privacy protection (human subjects!) National security issues Protection of endangered species, of archaeological sites, etc. Interference with commercialization plans
  • 27.
  • 28.
    Why data sharingis worth your attention? • Digital technology now used very widely in research, and is enabling new research and scientific paradigms • Research funders and publishers know that digital research data can be expensive to produce but inexpensive to share, making reuse more feasible and desirable • The challenge is to ensure digital research findings can be reproduced and cited
  • 29.
    The long tailof research data Size of the data Number of datasets Long-tail of data: all the data produced by small research groups and individual researchers Big Data
  • 30.
    „To me, thereally difficult challenge is (…) the variety. The heterogeneity, as you put it. And we see this particularly in what they call the long tail of data (…)” Mark Parsons, Research Data Alliance
  • 31.
  • 32.
    How to answerto the most commonly heard objections to data sharing?
  • 33.
    1. My datain not of interest or use to anyone else.
  • 34.
    Replies (1) • Itis! Researchers want to access data from all kinds of studies, methodologies and disciplines. It is very difficult to predict which data may be important for future research. Your data! May also be essential for teaching purposes. Sharing is not just about archiving your data but about sharing them amongst colleagues.
  • 35.
    2. I wantto publish my work before anyone else sees my data.
  • 36.
    Replies (2) • Datasharing will not stand in the way of you first using your data for your publications. Most research funders allow you some period of sole use, but also want timely sharing. Also remember that you have already been working with your data for some time so you undoubtedly know the data better than anyone coming to use them afresh. If you are still concerned you can embargo your data for a specific period of time.
  • 37.
    3. If Iask my respondents for consent to share their data, then they will not agree to participate in the study.
  • 38.
    Replies (3) • Don’tassume, that participants will not participate because data sharing is discussed. Talk to them, they may be less reluctant than you might think or less concerned over data sharing. Make it clear that is entirely their decision. Explain that data sharing means and why it might be important. • If you not have asked for permission during research you can return to gain retrospective permission from participants.
  • 39.
    4. I’m doingquantitative research and the combination of my variables discloses my participants’ identities.
  • 40.
    Replies (4) • Quantitativedata can by anonymised trough processes of aggregation, top coding, removal of variables or controlled access to certain variables.
  • 41.
    5. I havecollected audio-visual data and I cannot anonymise them, therefore I cannot share these data.
  • 42.
    Replies (5) • Visualdata can be anonymised trough blurring faces or distorting voices but it can be time consuming. It can mean losing much of the value of the data. It is better to ask for consent to share data from participants to share data in unanonymised form or / and control access to the data.
  • 43.
    6. I’m doinghighly sensitive research. I cannot possibly make my data available for others to see.
  • 44.
    Replies (6) • Askrespondents and see if you can get consent for sharing in the first instance. Anonymisation procedures can help to protect identifying information. If this two tactics are not apropriate. Than consider controlling access to tha data or embargoing for a period of time.
  • 45.
    7. It isimpossible to anonymise my transcripts as too much information is lost.
  • 46.
    Replies (7) • Sometimesaccess control on the data may be a better solution than anonymisation if too much useful information would be lost.
  • 47.
    8. My datacollection contains the data which I have purchesed and it cannot be made public.
  • 48.
    Replies (8) • Itis important to know who holds the copyright to the data you are using and to obtain relevant permissions. You need to be aware of the licence conditions of the data you are using and what you can and cannot do with the data.
  • 49.
    9. Other researcherswould not understand my data at all or may use them for a wrong purpose.
  • 50.
    Replies (9) • Producinggood documentation and providing contextual information for your research project should enable other researchers to corretly use and understand your data.
  • 51.
    10. There isIPR in the data.
  • 52.
    Replies (10) • Thisshould not be a problem if you seek copyright permission from the owner of the intellectual property rights. This is best done early on in the research project but also may be done retrospectively.
  • 53.
    Role playing exercisederived from the UKDA’s “Potential barriers to data sharing – with suggested solutions” (CC-BY-NC-SA) The original is available from http://data- archive.ac.uk/create-manage/training- resources
  • 55.
    Open Access inHorizon 2020 Mandate on open access to publications: „Under Horizon 2020, each beneficiary must ensure open access to all peer-reviewed scientific publications relating to its results.”
  • 56.
    Open Access inHorizon 2020 In order to comply with this requirement, beneficiaries must, at the very least, ensure that their publications, if any, can be read online, downloaded and printed. However, as any additional rights such as the right to copy, distribute, search, link, crawl, and mine increase the utility of the accessible publication, beneficiaries should make every effort to provide for as many of them as possible.
  • 57.
    Open Access inHorizon 2020 Open research data pilot: „The Open Research Data Pilot applies to two types of data: 1) the data (…) needed to validate the results presented in scientific publications as soon as possible; 2) other data (…) as specified and within the deadlines laid down in the data management plan.” „Participating projects are required to deposit the research data described above, preferably into a research data repository.”
  • 58.
    Open Access in Horizon 2020 Open research data pilot: • Only for projects from 7 selected areas. • You can opt in, and you can also opt out.
  • 59.
    Open Access in Horizon 2020 Participating projects are required to deposit the research data described above, preferably into a research data repository. As far as possible, projects must then take measures to enable third parties to access, mine, exploit, reproduce and disseminate (free of charge for any user) this research data. One straightforward and effective way of doing this is to attach a Creative Commons Licence (CC-BY or CC0 tool) to the data deposited.
  • 60.
    H2020 - areas participating in the data pilot • Future and Emerging Technologies • Research infrastructures – part e-Infrastructures • Leadership in enabling and industrial technologies – Information and Communication Technologies • Societal Challenge: 'Secure, Clean and Efficient Energy' – part Smart cities and communities • Societal Challenge: 'Climate Action, Environment, Resource Efficiency and Raw materials' – except raw materials • Societal Challenge: 'Europe in a changing world – inclusive, innovative and reflective Societies' • Science with and for Society. Projects in other areas can participate on a voluntary basis.
  • 61.
    Reasons for opting out • If results are expected to be commercially or industrially exploited • If participation is incompatible with the need for confidentiality in connection with security issues • If participation is incompatible with existing rules on the protection of personal data • If participation would jeopardise the achievement of the main aim of the action • If the project will not generate / collect any research data • If there are other legitimate reasons to not take part in the Pilot. Projects can opt out at proposal stage OR during the lifetime of the project, and should describe the issues in the project Data Management Plan. Slide by Sarah Jones, adapted by Kevin Ashley, DCC, CC-BY
  • 63.
    What are Creative Commons Licenses?
  • 64.
    What are Creative Commons Licenses? • BY – Attribution • SA – Share Alike • NC – Non-commercial • ND – No derivatives
  • 65.
    Public Domain • Public Domain Mark • Public Domain Dedication
  • 66.
    Gratis open access: the right to read • Libre open access: the right to read and re-use
  • 67.
    CC0 is easy to use • You don’t need to know what rights actually apply to your dataset (what is protected?) • You would need to know this for CC-BY (and other CC licenses)
  • 68.
    Why CC0 for research data? • BY: Datasets are particularly prone to attribution stacking, where a derivative work must acknowledge all contributors to each work from which it is derived, no matter how distantly. • SA: The problem with copyleft licences is they prevent the licensed data being combined with data released under a different copyleft licence: the derived dataset would not be able to satisfy both sets of licence terms simultaneously. • NC: Non-commercial licences may have wider implications than intended due to the ambiguity of what constitutes a commercial use. From: Ball, A. (2014). ‘How to License Research Data’. DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/how-guides/license-research-data#x1-4000
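The attribution stacking problem described for BY can be illustrated with a small Python sketch. It assumes a hypothetical derivation graph (`sources`) and a hypothetical creator list per dataset (`creators`); the dataset and creator names are invented for illustration and do not come from the slides. The point is only that the list of required acknowledgements grows with every dataset in the chain.

```python
def required_attributions(dataset, sources, creators):
    """Collect every creator a derivative would need to acknowledge.

    `sources` maps a dataset to the datasets it was derived from;
    `creators` maps a dataset to the people or bodies who made it.
    Both mappings are hypothetical inputs for this illustration.
    """
    seen = set()
    credits = []
    stack = [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a source reached by two routes is credited once
        seen.add(current)
        for name in creators.get(current, []):
            if name not in credits:
                credits.append(name)
        # Follow the derivation chain, however distant.
        stack.extend(sources.get(current, []))
    return credits


# Invented example: a merged dataset built from two surveys,
# one of which was itself derived from raw agency data.
sources = {"merged": ["survey_a", "survey_b"], "survey_b": ["raw_b"]}
creators = {
    "merged": ["Team M"],
    "survey_a": ["Lab A"],
    "survey_b": ["Lab B"],
    "raw_b": ["Agency R"],
}
credits = required_attributions("merged", sources, creators)
```

With CC0 there is no legal attribution requirement to propagate, although citing sources remains good scholarly practice.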
  • 69.
    Open Access in Horizon 2020 Open research data pilot: “The use of a detailed data management plan covering individual datasets is required for funded projects participating in the Open Research Data Pilot.”
  • 71.
    What is Research Data Management? …an active approach towards handling data throughout all stages of the research data lifecycle. (Figure: research data lifecycle)
  • 72.
    Active data management • Data management planning • Creating data • Documenting data • Accessing & using data • Storage and backup • Selecting what to keep • Sharing data • Data licensing and citation • Preserving data • … Digital Curation Centre
  • 73.
    Data Selection – guidelines 1. Legal requirements to retain the data beyond its immediate use. 2. Scientific or Historical Value: this involves inferring anticipated future use. 3. Uniqueness: does it duplicate existing datasets? 4. Non-Replicability: would it be feasible to replicate the data? (high costs, one-time events) 5. Potential for Redistribution: the reliability, integrity, and usability of the data files (do formats meet technical criteria? are IPRs addressed?) 6. Economic Case: costs for managing and preserving the data are justifiable when assessed against evidence of potential future benefits. 7. Full documentation: documentation is comprehensive and correct. Based on: Whyte, A. & Wilson, A. (2010). "How to Appraise and Select Research Data for Curation". DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/how-guides/appraise-select-data
  • 74.
    File formats - tactic If you want your data to be re-used and sustainable in the long term, you typically want to opt for open, non-proprietary formats. • Do you have a choice or do the instruments you use only export in certain formats? • What is common in your field? Try to use something that is accepted and widespread. • Does your data centre recommend formats? If so, it’s best to use these.
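As one illustration of the advice to prefer open, non-proprietary formats, the sketch below writes a small table as plain CSV using only the Python standard library. The column names and values are hypothetical; CSV is just one example of an open format, and the right choice depends on what is accepted in your field and what your data centre recommends.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialise tabular data as CSV text, an open, non-proprietary format."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)   # first row names each column
    writer.writerows(rows)    # one line per record
    return buffer.getvalue()


# Hypothetical measurements; the unit is spelled out in the column name.
header = ["sample_id", "temperature_celsius", "measured_on"]
rows = [
    ["S1", 21.5, "2015-03-01"],
    ["S2", 19.0, "2015-03-02"],
]
csv_text = rows_to_csv(header, rows)
```

Putting the unit in the column name and using ISO dates keeps the file understandable without the software that produced it.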
  • 75.
    Data selection… …depends on what researchers want to do with their data; what they are allowed to do with the data; and what the institution can afford to do with the data. Slide adapted from Kevin Ashley, DCC, CC-BY
  • 76.
    What is a DMP? A brief plan that outlines • what data will be created and how • how it will be managed (storage, back-up, access…) • plans for data sharing and preservation. Slide from Kevin Ashley, DCC, CC-BY
  • 77.
    Lots of research funders require a DMP
  • 78.
    Why develop a DMP? DMPs are useful whenever researchers are creating data to: • Make informed decisions to anticipate and avoid problems • Avoid duplication, data loss and security breaches • Develop procedures early on for consistency • Ensure data are accurate, complete, reliable and secure • Save time and effort. Slide adapted from Kevin Ashley, DCC, CC-BY
  • 79.
    Five common themes 1. Description of data to be collected / created (i.e. how will it be collected, content, type, format, volume...) 2. Documentation & metadata (standards and formats, structure of file naming, etc.) 3. Ethics and Intellectual Property (highlight any restrictions on data sharing e.g. privacy, confidentiality) 4. Plans for data sharing and access (i.e. how, when, to whom) 5. Strategy for long-term preservation. www.dcc.ac.uk/resources/data-management-plans/checklist Slide adapted from Kevin Ashley, DCC, CC-BY
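The five themes can double as a completeness check for a draft plan. The Python sketch below is an illustration only: the key names in `DMP_THEMES`, the helper `missing_themes` and the draft text are hypothetical, not part of the DCC checklist or of any funder's template.

```python
# Hypothetical keys, one per theme listed on the slide.
DMP_THEMES = [
    "data_description",
    "documentation_and_metadata",
    "ethics_and_ip",
    "sharing_and_access",
    "long_term_preservation",
]


def missing_themes(dmp):
    """Return the themes that are absent or left blank in a draft plan."""
    return [
        theme
        for theme in DMP_THEMES
        if not str(dmp.get(theme, "")).strip()
    ]


# Invented draft: three themes filled in, one blank, one not started.
draft = {
    "data_description": "Survey responses, CSV, about 2 MB",
    "documentation_and_metadata": "README plus codebook",
    "ethics_and_ip": "",
    "sharing_and_access": "Deposit in a repository after publication",
}
todo = missing_themes(draft)
```

Here `todo` lists the ethics and preservation themes, which is a reminder that the plan still needs those sections before submission.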
  • 80.
    Advice on writing DMPs • Keep it short and simple, but be specific • Seek advice - consult and collaborate • Base plans on available skills and support • Make sure implementation is feasible • Remember: plans change and should evolve
  • 81.
    For better understanding of your data • Think about what is needed in order to find, evaluate, understand, and reuse the data. • Have you documented what you did and how? • Did you develop code to run analyses? If so, this should be kept and shared too. • Is it clear what each bit of your dataset means? Make sure the units are labelled and abbreviations explained. • Record metadata so others can find your work e.g. title, date, creator(s), subject, format, rights…
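The metadata advice above can be sketched as a small machine-readable record kept next to the dataset. The field names follow the examples on the slide (title, date, creator(s), subject, format, rights); the helper `build_metadata` and all values are hypothetical, and a real deposit should follow whatever schema the chosen repository asks for.

```python
import json


def build_metadata(title, date, creators, subject, data_format, rights):
    """Return a minimal descriptive record using the slide's example fields."""
    record = {
        "title": title,
        "date": date,
        "creators": list(creators),
        "subject": subject,
        "format": data_format,
        "rights": rights,
    }
    # Refuse empty fields: a record with gaps does not help others find the data.
    missing = [key for key, value in record.items() if not value]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return record


# Invented values, for illustration only.
example = build_metadata(
    title="Example survey dataset",
    date="2015-01-01",
    creators=["A. Researcher"],
    subject="open science",
    data_format="text/csv",
    rights="CC0 1.0",
)
metadata_json = json.dumps(example, indent=2, sort_keys=True)
```

Saving `metadata_json` alongside the data files gives both people and indexing services something to search on.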
  • 82.
    Which data need to be kept • Could this data be re-used? • Must it be kept as evidence or for legal reasons? • Should it be kept for its potential value? • Consider costs – do benefits outweigh cost? • Evaluate criteria to decide what to keep • 5 steps to decide what data to keep www.dcc.ac.uk/resources/how-guides/five-steps-decide-what-data-keep
  • 83.
    Where to deposit? • Does your publisher or funder suggest a repository? • Are there data centres or community databases for your discipline? • Does your university offer support for long-term preservation?
  • 86.
    Choose one specific research project and for this project: 1. Define what data will be generated (all of it!) 2. Decide what you would select for preservation 3. Describe how you would share your data
  • 87.
    There is no such thing as ideal data.
  • 88.
    Thank you for your attention. Contact: l.stepinska-ustasiak@icm.edu.pl

Editor's Notes

  • #2 I’m happy to be with you and explain a bit how open access to publications and data works in H2020.
  • #6 Scholarly communication is also changing very fast, for a few reasons
  • #7 Open access to scientific publications is just one element of a phenomenon called open science. Open science consists of…
  • #9 The reason why the EC decided to implement an OA policy is the intention to optimize
  • #12 Open Access refers to articles and books and to research data as well.
  • #15 What it can be
  • #30 The long-tail is what is supposed to be captured by general data repositories, because Big Data projects usually have their own solutions. The majority of scientists in the world work in the long-tail of research.
  • #59 “if there are other legitimate reasons to not take part in the Pilot”
  • #65 Behind these icons are well-formulated legal contracts, in which a person who owns the IPR to the work being licensed (usually the creator) states what he/she wants to let the end-users of the work do with this work. Basically, the starting point in CC is: the user can do everything, but he must respect the clauses.
  • #67 From the legal perspective we can distinguish 2 models of OA…
  • #81 The beginnings might be just a bit complicated