
Friday, 11 September 2015

An API for all MS-based File formats

We recently released and published our first Java API (Application Programming Interface) for the most common file formats in proteomics, covering not only MS files but also identification files such as mzIdentML and mzTab.

ms-data-core-api (https://github.com/PRIDE-Utilities/ms-data-core-api)

The library allows end users and developers to use a common data structure for proteomics, independent of the file type, and ... But first, let's try to understand what an API is.

What is an API?

Imagine you are a builder or civil engineer building a bridge: different components, blocks, and teams need to be coordinated and fitted together for the final result. Miscommunication between team members, mismatched block sizes, or inconsistent building plans can only produce strange results.

In the simplest terms, APIs are sets of requirements, data structures, and objects that govern how applications and software components talk to each other. An API is a set of routines and protocols that provides building blocks for programmers and web developers to build software applications. In the past, APIs were largely associated with operating systems and desktop applications. In recent years, though, we have seen the emergence of Web APIs (web services).
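To make the idea of a contract concrete, here is a minimal Java sketch. All names in it (SpectrumSource, InMemorySource, summarize) are hypothetical and chosen only for illustration; they are not taken from any real library. The interface plays the role of the API, and the calling code depends only on that interface.

```java
import java.util.Arrays;
import java.util.List;

public class ApiContractSketch {

    /** The contract: what a caller may ask for, with no mention of how it is done. */
    interface SpectrumSource {
        int getSpectrumCount();
        List<String> getSpectrumIds();
    }

    /** One possible implementation of the contract, backed by a fixed list (hypothetical). */
    static class InMemorySource implements SpectrumSource {
        private final List<String> ids;

        InMemorySource(String... ids) {
            this.ids = Arrays.asList(ids);
        }

        @Override
        public int getSpectrumCount() {
            return ids.size();
        }

        @Override
        public List<String> getSpectrumIds() {
            return ids;
        }
    }

    /** Client code is written against the interface, never against a concrete class. */
    static String summarize(SpectrumSource source) {
        return source.getSpectrumCount() + " spectra: "
                + String.join(", ", source.getSpectrumIds());
    }

    public static void main(String[] args) {
        // Prints "2 spectra: scan=1, scan=2"
        System.out.println(summarize(new InMemorySource("scan=1", "scan=2")));
    }
}
```

Any other class that implements SpectrumSource, for example one that reads from a file or a web service, could be passed to summarize without changing it. That is the same coordination the bridge teams need: everyone builds to the agreed plan.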


What is ms-data-core-api?

The ms-data-core-api is a free, open-source library for developing computational proteomics tools and pipelines. The Application Programming Interface, written in Java, enables rapid tool creation by providing a robust, pluggable programming interface and common data model. The data model is based on controlled vocabularies/ontologies and captures the whole range of data types included in common proteomics experimental workflows, going from spectra to peptide/protein identifications to quantitative results. 

The library contains readers for three of the most widely used Proteomics Standards Initiative standard file formats: mzML, mzIdentML, and mzTab. In addition to mzML, it also supports other common mass spectra data formats: dta, ms2, mgf, pkl, apl (text-based), mzXML and mzData (XML-based). It can also read PRIDE XML, the original format used by the PRIDE database, one of the world's leading proteomics resources. Finally, we present a set of algorithms and tools whose implementation illustrates the simplicity of developing applications using the library.
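The idea of one data model behind several file formats can be sketched in a few lines of Java. This is not the ms-data-core-api itself: Spectrum, SpectrumReader, MgfReader, DtaReader, and readerFor are hypothetical names invented for this illustration, and the parsers are deliberately simplified (only a single positive charge such as "2+" is handled in MGF, and a dta file is assumed to hold one spectrum whose first line carries the precursor mass and charge).

```java
import java.util.ArrayList;
import java.util.List;

public class CommonModelSketch {

    /** Format-independent spectrum: precursor charge plus (m/z, intensity) pairs. */
    static final class Spectrum {
        final int precursorCharge;
        final List<double[]> peaks;

        Spectrum(int precursorCharge, List<double[]> peaks) {
            this.precursorCharge = precursorCharge;
            this.peaks = peaks;
        }
    }

    /** Hypothetical contract shared by every format-specific reader. */
    interface SpectrumReader {
        List<Spectrum> read(String content);
    }

    /** Parses one "m/z intensity" line into a two-element array. */
    static double[] parsePeak(String line) {
        String[] parts = line.trim().split("\\s+");
        return new double[] {Double.parseDouble(parts[0]), Double.parseDouble(parts[1])};
    }

    /** Simplified MGF reader: BEGIN IONS / END IONS blocks with KEY=VALUE headers. */
    static final class MgfReader implements SpectrumReader {
        @Override
        public List<Spectrum> read(String content) {
            List<Spectrum> spectra = new ArrayList<>();
            List<double[]> peaks = null; // non-null only while inside a block
            int charge = 0;
            for (String raw : content.split("\\R")) {
                String line = raw.trim();
                if (line.equals("BEGIN IONS")) {
                    peaks = new ArrayList<>();
                    charge = 0;
                } else if (line.equals("END IONS")) {
                    if (peaks != null) spectra.add(new Spectrum(charge, peaks));
                    peaks = null;
                } else if (peaks == null || line.isEmpty()) {
                    continue; // outside a block, or blank line
                } else if (line.startsWith("CHARGE=")) {
                    // Assumption: a single positive charge such as "2+"
                    charge = Integer.parseInt(line.substring(7).replace("+", ""));
                } else if (line.contains("=")) {
                    continue; // other headers (TITLE, PEPMASS, ...) are ignored here
                } else {
                    peaks.add(parsePeak(line));
                }
            }
            return spectra;
        }
    }

    /** Simplified dta reader: header line "mass charge", then one peak per line. */
    static final class DtaReader implements SpectrumReader {
        @Override
        public List<Spectrum> read(String content) {
            List<Spectrum> spectra = new ArrayList<>();
            List<double[]> peaks = new ArrayList<>();
            int charge = 0;
            boolean headerSeen = false;
            for (String raw : content.split("\\R")) {
                String line = raw.trim();
                if (line.isEmpty()) continue;
                if (!headerSeen) {
                    charge = Integer.parseInt(line.split("\\s+")[1]);
                    headerSeen = true;
                } else {
                    peaks.add(parsePeak(line));
                }
            }
            if (headerSeen) spectra.add(new Spectrum(charge, peaks));
            return spectra;
        }
    }

    /** Picks a reader from the file extension, so callers never name a format class. */
    static SpectrumReader readerFor(String fileName) {
        String lower = fileName.toLowerCase();
        if (lower.endsWith(".mgf")) return new MgfReader();
        if (lower.endsWith(".dta")) return new DtaReader();
        throw new IllegalArgumentException("Unsupported format: " + fileName);
    }

    public static void main(String[] args) {
        String mgf = "BEGIN IONS\nTITLE=example\nPEPMASS=500.25\nCHARGE=2+\n"
                + "100.0 10.0\n200.0 20.0\nEND IONS\n";
        String dta = "999.49 2\n100.0 10.0\n200.0 20.0\n";
        for (Spectrum s : readerFor("run.mgf").read(mgf)) {
            System.out.println("mgf: charge=" + s.precursorCharge + ", peaks=" + s.peaks.size());
        }
        for (Spectrum s : readerFor("run.dta").read(dta)) {
            System.out.println("dta: charge=" + s.precursorCharge + ", peaks=" + s.peaks.size());
        }
    }
}
```

Code that consumes Spectrum objects never needs to know which file they came from, which is the design choice the library description above emphasizes. For real work, the readers shipped with ms-data-core-api should be used instead of hand-written parsers like these.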