Monday, 20 February 2012

Accelerating Scientific Progress through Public Availability of Research Data

This session featured three speakers: Atul Butte from Stanford, Vernon Asper of the University of Southern Mississippi, and Heather Piwowar of NESCent / University of British Columbia.

Atul Butte described the revolutionary effects of public microarray data repositories such as GEO. Another repository, dbGaP, collects a wide range of genetics data, including all data from the Framingham Heart Study (some data require permission to access). He also talked about Assay Depot, which allows scientists to order tests from labs worldwide. Often these labs offer services beyond what's available locally, and are cost-effective and easily accessible to dry bench researchers. Butte emphasized that "you can never outsource asking good questions," but researchers no longer need to gather all the data on their own.

During the 2010 Gulf of Mexico oil spill, Vernon Asper conducted research on board the RV Pelican. As part of a government process called Natural Resources Damage Assessment (NRDA), a database called ERMA was created showing where data on the spill had been collected. However, these data are not actually publicly accessible, and academic scientists were largely left out of the NRDA process. Academic scientists had difficulty coordinating efforts, resulting in duplication as well as missed opportunities. Three main issues hampered sharing of data: legal investigations limited what government agencies could release; some journals refused to publish articles if preliminary findings had been reported; and news media exaggerated scientists' findings in several cases. Asper recommended Nature's coverage of the spill, as their reporters were actually on scene.

Heather Piwowar situated the discussion of data in a broader context. Despite the growth in publicly available datasets, most data are not found in repositories and many authors are unwilling to share data when asked (particularly in cancer research). When 5-10% of papers contain errors that change the conclusions, it's crucial for other scientists to have access to data in order to replicate findings. Piwowar studies which datasets get reused; to do this, she follows citations in Google Scholar and Web of Science, then reads the citing papers to find out whether each citation means the data were actually reused. She's found that data housed in some repositories receive much more reuse than data in others. She also discussed the altmetrics movement as another way to attempt to measure data reuse, including the project she's involved with, total-impact. When researchers see that "the pain is worth the gain" - that their data do get reused - more of them will share their data, strengthening the scientific enterprise as a whole.
