Saturday 9 November 2019

Where to deposit my proteomics data: ProteomeXchange

The ProteomeXchange (PX) consortium (http://www.proteomexchange.org) aggregates the major proteomics resources and, since 2012, has standardized the submission and dissemination of public mass spectrometry (MS) proteomics data worldwide.



Some stats are always welcome. In terms of the distribution of datasets across individual resources, 12 335 datasets (87.1%) had been submitted to PRIDE, followed by MassIVE (1 126 datasets, 7.9%), jPOST (352 datasets, 2.5%), iProX (174 datasets, 1.2%), PASSEL (139 datasets, 1.0%) and Panorama Public (43 datasets, 0.3%).
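As a quick sanity check, the percentages above can be reproduced from the dataset counts. This is only a minimal Python sketch: the names `DATASETS` and `shares` are illustrative, and the total is assumed to be simply the sum of the six counts listed.

```python
# Dataset counts per resource, copied from the figures quoted above.
DATASETS = {
    "PRIDE": 12335,
    "MassIVE": 1126,
    "jPOST": 352,
    "iProX": 174,
    "PASSEL": 139,
    "Panorama Public": 43,
}


def shares(counts):
    """Return each resource's share of the total as a percentage, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    print(f"Total datasets: {sum(DATASETS.values())}")
    for name, pct in shares(DATASETS).items():
        print(f"{name}: {pct}%")
```

Under that assumption the six counts add up to 14 169 datasets, which agrees with the "more than 14 100" figure quoted for the end of June 2019.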


At present, all partners mainly store datasets from DDA (Data Dependent Acquisition) workflows, but they also support other workflows, such as DIA (Data Independent Acquisition) and top-down proteomics, mainly as ‘Partial’ datasets.

Important concept: submissions can be COMPLETE or PARTIAL. COMPLETE is the ultimate goal ✊. 

Getting Bigger

Thanks to the perceived reliability of PX resources and, in parallel, to the requirements of scientific journals and funding agencies, common practice in the proteomics field has changed rapidly: data sharing has become the norm, and the amount of public data keeps growing. By the end of June 2019, more than 14 100 datasets 💥 had been submitted to PX resources since 2012, more than 9 500 of them in just the last three years.

 


Two additional resources have joined the consortium in the last three years: iProX (National Center for Protein Sciences, Beijing, China) in 2017, and Panorama Public (13) (University of Washington, Seattle, WA, USA) in 2018. 


As a consequence of the unprecedented availability of proteomics data in the public domain, data re-use continues to grow, and we anticipate that the number of resources re-using public proteomics data will keep increasing in the near future.

However, a major challenge is the lack of experimental design metadata. The good news is that, to increase the re-usability of datasets, we are working towards better metadata annotation, both by the data submitters and, a posteriori, by data curators and other third parties.

In parallel, ProteomeXchange resources increasingly integrate proteomics data into other bioinformatics resources such as UniProt, Ensembl, Expression Atlas or neXtProt.

Clinical and Controlled Access Proteomics Data 

One increasingly relevant topic is the management of clinical, potentially sensitive, proteomics data and whether they should be considered patient identifiable or not. This topic has gained more relevance since the European Union introduced the General Data Protection Regulation (GDPR). The proteomics community needs to develop rules and best-practice guidelines for dealing with this type of dataset.

PX is implementing frameworks for controlled-access proteomics data, so alternative data submission mechanisms will have to be developed for those. At present, authors who have been advised to follow different data management practices for potentially sensitive proteomics datasets should contact resources such as EGA (European Genome-phenome Archive), dbGaP or JGA (Japanese Genotype-phenotype Archive).

Where do I get updates about new datasets?

For regular announcements of all the new publicly available datasets, users can follow our Twitter account (@proteomexchange) or subscribe to the following Rich Site Summary (RSS) feed (https://groups.google.com/forum/feed/proteomexchange/msgs/rss_v2_0.xml).
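For readers who prefer to follow the feed programmatically, here is a minimal Python sketch using only the standard library. It assumes the URL above still serves a standard RSS 2.0 document, which may no longer be true; the function names `parse_rss` and `fetch_announcements` and the sample document are made up for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Feed URL as given in the post; it may have moved or been retired since.
FEED_URL = "https://groups.google.com/forum/feed/proteomexchange/msgs/rss_v2_0.xml"

# Made-up RSS 2.0 document, used only to demonstrate the parser offline.
SAMPLE = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item>
    <title>Example dataset announcement</title>
    <link>https://example.org/dataset/1</link>
  </item>
</channel></rss>"""


def parse_rss(xml_text):
    """Extract (title, link) pairs from the <item> elements of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, link))
    return items


def fetch_announcements(url=FEED_URL, timeout=10):
    """Download the live feed and parse it (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_rss(response.read())


if __name__ == "__main__":
    # Parse the offline sample; call fetch_announcements() for the live feed.
    for title, link in parse_rss(SAMPLE):
        print(f"{title}\n  {link}")
```

Keeping the parsing separate from the download means the parser can be checked against a local sample before relying on the live feed.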


