Archiving Small Science Data Sets

Carly Strasser | carly.strasser@ucop.edu | www.carlystrasser.net
John Kunze
Patricia Cruse

Personal Digital Archiving | February 2012
  
UGLY TRUTH

Many Earth | Environmental | Ecological scientists…
•  are not taught data management
•  don’t know what metadata are
•  can’t name data centers or repositories
•  don’t share data publicly or store it in an archive
•  aren’t convinced they should share data

Image: 5shortessays.blogspot.com
  

                                                                           	
  
Where do data end up?

[Figure recreated from Klump et al. 2006. Labels: Data, Metadata, www. Images: from Flickr by diylibrarian and csessums; blog.order2disorder.com]
  
Where do data end up?

[Figure recreated from Klump et al. 2006. Labels: Data (www), Metadata (www). Images: from Flickr by diylibrarian and torkildr]
  
[Diagram labels: Data management & organization; Facilitate; Archiving, Sharing, Publishing; Data Reuse, Reproducibility]
  
Why are you promoting Excel?

Develop an open source & free Excel add-in

Add-in: Little pieces of software
•  Download to extend the capabilities of Excel
•  Appear as “ribbon”

Image: www.ablebits.com
  
Why are you promoting Excel?

•  Everyone uses it
•  Stopgap measure
  
	
  




	
  
DCXL Project Goals

Audience: Earth, atmospheric, environmental, ecological scientists
Contributors: UC community, DataONE, broader community via conferences
Method: Collect requirements via surveys, interviews, polls
  
	
  
	
  
~150 scientists
•  No data preservation
   –  Unaware of archives
   –  Resistant to sharing
•  Poor data documentation
•  90% use other programs along with Excel
  
Requirements

1.  Must work for Excel users without the add-in
2.  No additional software (other than add-in and Excel) necessary
3.  Can be used offline
4.  Perform CSV compatibility checks, reporting, and automated fixes
5.  Add metadata to data file
     a.  Can use existing metadata as a template
     b.  Add-in can automatically generate some of the metadata where the info is available from the file
6.  Generate a citation for the data file
7.  Deposit data and metadata in a repository
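
Requirement 4 (CSV compatibility checks, reporting, and automated fixes) could look something like the minimal Python sketch below. It is not the DCXL add-in's code. The slides do not say which checks the add-in performs, so the checks chosen here (blank or duplicate column names, rows of uneven length, embedded line breaks, stray whitespace) and the names check_rows, fix_rows, and to_csv are illustrative assumptions.

```python
import csv
import io


def check_rows(rows):
    """Report problems as (row_index, column_index, message) tuples.

    The set of checks is an assumption made for illustration only.
    """
    if not rows:
        return [(0, 0, "sheet is empty")]
    problems = []
    header = rows[0]
    seen = set()
    for c, name in enumerate(header):
        text = str(name).strip()
        if not text:
            problems.append((0, c, "blank column name"))
        elif text.lower() in seen:
            problems.append((0, c, "duplicate column name"))
        seen.add(text.lower())
    for r, row in enumerate(rows[1:], start=1):
        if len(row) != len(header):
            problems.append((r, len(row), "row length differs from header"))
        for c, cell in enumerate(row):
            if isinstance(cell, str):
                if "\n" in cell or "\r" in cell:
                    problems.append((r, c, "embedded line break"))
                if cell != cell.strip():
                    problems.append((r, c, "leading or trailing whitespace"))
    return problems


def fix_rows(rows):
    """Apply the fixes that are safe to automate; leave the rest to the user."""
    if not rows:
        return []
    width = len(rows[0])
    fixed = []
    for row in rows:
        # Collapse runs of whitespace (including line breaks) to one space.
        cells = [" ".join(c.split()) if isinstance(c, str) else c for c in row]
        # Pad short rows with empty cells; never drop data from long rows.
        cells += [""] * max(0, width - len(cells))
        fixed.append(cells)
    return fixed


def to_csv(rows):
    """Serialize rows as CSV text using the standard library writer."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical sheet contents, as a list of rows of cell values.
rows = [
    ["site", "Site", "temp_c"],
    ["A1 ", "north\nslope", 12.5],
    ["B2", "south"],
]
problems = check_rows(rows)
fixed = fix_rows(rows)
```

In this example the report lists four problems. After the automated fixes, only the duplicate column name is still reported, because renaming a column is a decision left to the user.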
  
dcxl.cdlib.org
@dcxlCDL
www.facebook.com/DCXLatCDL

www.carlystrasser.net
carlystrasser@gmail.com
@carlystrasser
  

DCXL Lightning Talk: Archiving Small Datasets
