From Open Access to Open Data

Brian Hole
DPC Workshop, York, 5 July 2013

brian.hole@ubiquitypress.com
www.ubiquitypress.com / @ubiquitypress
  
The Social Contract of Science

•  Validation
•  Dissemination
•  Further development

Scientific Malpractice

•  Publishers
•  Researchers
•  Libraries, repositories…
•  All outputs
  
Repositories

Modified from: XKCD
  
Metajournals as incentives
  
Why data journals?

Amsterdam manifesto:

4. A data citation in a publication should resemble a bibliographic citation and be located in the publication’s reference list.

•  Data can (and should) be cited using DataCite DOIs in articles, but this is not enough.
•  Researchers understand the value of papers
•  University departments and the REF understand papers
•  Researchers know where to put paper refs, no need for extra guidelines
•  Publishers routinely strip out anything else
•  Familiar impact metrics can be collected
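
The manifesto point above, that a data citation should resemble a bibliographic citation, can be illustrated with a minimal sketch. It assumes a creator / year / title / publisher / DOI layout of the kind commonly recommended for dataset citations; the exact punctuation, the `DatasetRecord` class, the `format_data_citation` function and all example values are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Minimal fields needed to cite a dataset (hypothetical structure)."""
    creators: list[str]
    year: int
    title: str
    publisher: str  # assumed to be the repository holding the data
    doi: str        # bare DOI without a resolver prefix


def format_data_citation(record: DatasetRecord) -> str:
    """Render the record so it reads like an ordinary reference-list entry."""
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.year}). {record.title}. "
        f"{record.publisher}. https://doi.org/{record.doi}"
    )


# Made-up example values, for illustration only.
example = DatasetRecord(
    creators=["Doe, J.", "Roe, R."],
    year=2013,
    title="Example survey dataset",
    publisher="Example Repository",
    doi="10.1234/example",
)
citation = format_data_citation(example)
print(citation)
```

Because the result is a plain reference string, it can sit in a reference list next to article citations, which is the point the slide makes about familiar conventions.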
  
What is a data paper?

A data paper is not…

•  … a research paper. A data paper only describes a dataset. But it will reference research papers that are based on the data.
•  … simply replication of the information in a data repository.

A data paper…

•  … describes the methodology with which a dataset was created.
•  … describes the dataset itself.
•  … details the reuse potential of the data.
•  … is often authored by a data scientist.
•  … is citable, enabling reuse to be tracked.
  
General structure

•  Title
•  Authors, affiliations
•  Abstract
•  Keywords
•  Context
   •  Spatial coverage, temporal coverage
•  Methods
   •  Steps, sampling strategy, quality control, constraints, ethical considerations
•  Dataset description
   •  Object names, data type, format names & versions, creators, creation dates, language, license, location (DOI), publication date
•  Reuse potential
•  Acknowledgements
•  References
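
As a rough illustration of the "Dataset description" fields listed above, the sketch below models them as a simple record and reports which ones are still empty. The class name, field names and the `missing_fields` helper are assumptions made for this example; they are not a schema defined in the talk.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetDescription:
    """Hypothetical record mirroring the dataset description fields."""
    object_names: list[str] = field(default_factory=list)
    data_type: str = ""
    formats: list[str] = field(default_factory=list)  # format names & versions
    creators: list[str] = field(default_factory=list)
    creation_dates: str = ""
    language: str = ""
    license: str = ""
    doi: str = ""  # location of the dataset
    publication_date: str = ""


def missing_fields(description: DatasetDescription) -> list[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(description) if not getattr(description, f.name)]


# Made-up example: a partially completed description.
draft = DatasetDescription(
    object_names=["survey.csv"],
    data_type="tabular",
    formats=["CSV"],
    creators=["Doe, J."],
    license="CC0",
    doi="10.1234/example",
)
missing = missing_fields(draft)
print(missing)
```

A check like this could be run in an online authoring form before submission, so that authors see which parts of the description are incomplete.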
  
Peer review

1. The paper contents
   a. The methods section of the paper must provide sufficient detail that a reader can understand how the resource was created.
   b. The resource must be correctly described.
   c. The reuse section must provide concrete and useful suggestions for reuse of the resource.

2. The deposited resource
   a. The repository must be suitable for the resource and have a sustainability model.
   b. Open license permits unrestricted access (e.g. CC0).
   c. A version in an open, non-proprietary format.
   d. Labeled in such a way that a 3rd party can make sense of it.
   e. Must be actionable.
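
Some of the criteria for the deposited resource lend themselves to a simple checklist. The sketch below is only an illustration of that idea: the `DepositedResource` fields, the `review_deposit` function, and the contents of `OPEN_LICENSES` and `OPEN_FORMATS` are assumptions, and real peer review would rely on a reviewer's judgement rather than on flags like these.

```python
from dataclasses import dataclass

# Assumptions for illustration only; a journal would maintain its own lists.
OPEN_LICENSES = {"CC0"}
OPEN_FORMATS = {"csv", "txt", "json", "xml"}


@dataclass
class DepositedResource:
    """Hypothetical summary of a deposit, as filled in by a reviewer."""
    repository_has_sustainability_model: bool
    license: str
    file_formats: list[str]
    has_documentation: bool  # labeled so a third party can make sense of it


def review_deposit(resource: DepositedResource) -> list[str]:
    """Return a list of unmet criteria; an empty list means all checks pass."""
    problems = []
    if not resource.repository_has_sustainability_model:
        problems.append("repository lacks a sustainability model")
    if resource.license not in OPEN_LICENSES:
        problems.append("license is not on the open-license list")
    if not any(fmt.lower() in OPEN_FORMATS for fmt in resource.file_formats):
        problems.append("no version in an open, non-proprietary format")
    if not resource.has_documentation:
        problems.append("data are not labeled for third-party use")
    return problems


# Made-up example: open license and format, but no documentation.
resource = DepositedResource(
    repository_has_sustainability_model=True,
    license="CC0",
    file_formats=["xlsx", "csv"],
    has_documentation=False,
)
problems = review_deposit(resource)
print(problems)
```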
  
Important principles

•  Data journals need to be built within the community, and to adapt to its requirements
•  Community ownership and trust are important
•  Full transparency in processes and finances
•  Sustainability
•  Low barriers essential
   •  Zero to low fees
   •  Quick online authoring
   •  Repository integration
  
PRIME: Use Case #1

•  A UCL researcher deposits data in an external subject repository.
•  The subject repository sends the metadata and DOI of the data to the UCL institutional repository so that it has a record of the output.
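
The hand-off in this use case can be pictured as a small metadata message passed between the two repositories. The sketch below shows one possible shape for such a message; the field names, the `build_deposit_notification` function and the JSON encoding are assumptions for illustration and do not describe how PRIME itself exchanges records.

```python
import json


def build_deposit_notification(doi, title, creators, source_repository):
    """Assemble the record a subject repository might send onward.

    All field names are hypothetical, chosen only for this sketch.
    """
    return {
        "doi": doi,
        "title": title,
        "creators": list(creators),
        "source_repository": source_repository,
    }


# Made-up example values.
notification = build_deposit_notification(
    doi="10.1234/example",
    title="Example survey dataset",
    creators=["Doe, J."],
    source_repository="Example Subject Repository",
)

# Serialise for transfer; the receiving repository would parse the message
# and keep it as its record of the output.
message = json.dumps(notification, sort_keys=True)
received = json.loads(message)
print(received["doi"])
```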
  	
  
Text and data mining

[the benefits of text mining include]: “increased researcher efficiency; unlocking hidden information and developing new knowledge; exploring new horizons; improved research and evidence base; and improving the search process and quality. Broader economic and societal benefits include cost savings and productivity gains, innovative new service development, new business models and new medical treatments.”
JISC

“The downstream value of high quality, high throughput chemical information extracted from the literature can be measured against conventional abstraction services… with a combined annual turnover of perhaps $500-1,000 million dollars. We believe our tools are capable of building the next and better generation of services.”
Peter Murray-Rust
  
“Licences for Europe”
Working Group 4: Text and Data Mining

•  Focus was to create new licenses to enable TDM
   •  I.e. researcher would need one license from each publisher. Much TDM work involves hundreds of publishers, can take weeks just for one.
•  Focus pre-determined from start: to come up with proposals on licenses only. Discussion of exceptions allowed but not to be part of recommendations.
•  Unbalanced setup: large corporate publishers, technology sector poorly represented.
•  UP walked out with civil society groups. Not prepared to endorse licenses as acceptable.
•  Tell your publisher or association that this is important to you.
•  Workshop at the BL to inform policy makers in late Sept 2013.
  
Links

http://www.ubiquitypress.com
http://www.metajnl.com
http://www.ucl.ac.uk/prime
http://www.ubiquitypress.com/Licence_for_Europe_Text_Data_Mining_Withdrawal
  
