BigBio Notes: perl proteomics

Friday, 2 January 2015

Brazil: A place for Science and Friendship

Búzios

It's really difficult to break stereotypes, especially for developing countries like Brazil. Mention its name anywhere in the world and it is immediately associated with sports, music, beaches, rum and the "País do Carnaval". If you ask someone in the streets of Germany or China about personalities from Brazil, they will mention Pelé. Breaking stereotypes takes years or even centuries, but we are going in the right direction.

Hotel Ferradura/ Ferradura Resort

Last December I attended the 2nd Proteomics Meeting of the Brazilian Proteomics Society, held jointly with the 2nd Pan American HUPO Meeting at the Hotel Ferradura / Ferradura Resort in Búzios, Rio de Janeiro State, Brazil. The venue was gorgeous: mountains close to a small bay that offers calm, clear waters and the open sea. We arrived after a 2-hour drive from Rio international airport. My plan was to give a talk about PRIDE and ProteomeXchange, but more than that, my talk was about whether we really need to share our proteomics data.

ProteoStats: Computing false discovery rates in proteomics

By Amit K. Yadav (@theoneamit) & Yasset Perez-Riverol (@ypriverol):

Perl is a legacy language that many modern programmers consider abstruse. I’m passionate about not letting a programming language such as Perl die. Even though it is used less in computational proteomics, it is still widely used in bioinformatics. I enjoy writing about new open-source Perl libraries that are easy to use. Two years ago, I wrote a post about InSilicoSpectro and how it can be used to study protein databases, as I did in “In silico analysis of accurate proteomics, complemented by selective isolation of peptides”.

Today’s post is about ProteoStats [1], a Perl library for False Discovery Rate (FDR) related calculations in proteomics studies. Some background for non-experts:

One of the central and most widely used approaches in shotgun proteomics is to use database search tools to assign spectra to peptides; each assignment is called a Peptide Spectrum Match (PSM). To evaluate the quality of the assignments, these programs need to estimate and correct for population-wise error rates so that the number of false positives stays under control. The most widely used strategy for this is the target-decoy approach. Originally proposed by Elias & Gygi in 2007, the so-called classical FDR strategy searches a concatenated target-decoy (TD) database and estimates the FDR from the number of decoy hits that pass the score threshold. This calculation is done either by the search engine itself or by scripts that are often in-house, unpublished, not benchmarked, and implemented differently from lab to lab.
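To make the classical calculation concrete, here is a minimal sketch in Python (chosen for brevity; it is not ProteoStats code and does not reflect its API). It assumes each PSM is a hypothetical `(score, is_decoy)` pair, that higher scores are better, and that the concatenated estimate is taken as twice the decoy count divided by all PSMs passing the threshold.

```python
from typing import List, Tuple

def concatenated_td_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR for a concatenated target-decoy search.

    psms: (score, is_decoy) pairs; higher score means a better match (assumption).
    Estimate used here: 2 * decoys / (targets + decoys) above the threshold.
    """
    accepted = [is_decoy for score, is_decoy in psms if score >= threshold]
    if not accepted:
        return 0.0  # nothing accepted, so no error to estimate
    decoys = sum(accepted)  # True counts as 1
    return min(1.0, 2.0 * decoys / len(accepted))

# Toy data, invented for illustration only.
toy_psms = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (6.8, False), (6.1, True), (5.4, False),
]

fdr_strict = concatenated_td_fdr(toy_psms, 8.0)   # 3 targets, 0 decoys pass
fdr_relaxed = concatenated_td_fdr(toy_psms, 7.0)  # 4 targets, 1 decoy pass
print(fdr_strict, fdr_relaxed)
```

The factor of two reflects the assumption that every decoy hit above the threshold is mirrored by one undetected false hit among the targets.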

So far, the only library developed to compute FDR at the spectrum, peptide and protein levels is MAYU [2]. But while MAYU only uses the classical FDR approach, ProteoStats offers 5 different strategies for calculating the FDR. The only prerequisite is that the search must be run against separate target and decoy databases, as proposed by Käll et al. (2008) [3]. ProteoStats also provides a programming interface that can read the native output of the most widely used search tools and report FDR-related statistics. For tools that are not supported, pepXML, which has become a de facto standard output format, can be read directly, along with tabular text formats such as TSV and CSV (or any other well-defined separator).
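For a separate target and decoy search, a common estimator is simply the number of decoy PSMs divided by the number of target PSMs above a score threshold. The sketch below (Python, not ProteoStats code) assumes higher scores are better, leaves out any correction for the proportion of incorrect target matches, and uses hypothetical function names. It also converts the per-threshold estimates into q-values by taking a running minimum from the worst score upward.

```python
from typing import List

def separate_td_fdr(targets: List[float], decoys: List[float], threshold: float) -> float:
    """Simple ratio estimate: decoys passing / targets passing."""
    t = sum(1 for s in targets if s >= threshold)
    d = sum(1 for s in decoys if s >= threshold)
    if t == 0:
        return 0.0
    return min(1.0, d / t)

def q_values(targets: List[float], decoys: List[float]) -> List[float]:
    """q-value per target PSM, returned in descending score order.

    The q-value of a PSM is the smallest FDR at which it would be accepted,
    so we take the running minimum starting from the lowest-scoring PSM.
    """
    ordered = sorted(targets, reverse=True)
    fdrs = [separate_td_fdr(targets, decoys, s) for s in ordered]
    running_min = 1.0
    result = [0.0] * len(fdrs)
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        result[i] = running_min
    return result

# Toy scores, invented for illustration only.
toy_targets = [9.0, 8.0, 7.0, 6.0, 5.0]
toy_decoys = [7.5, 5.5, 4.0, 3.0, 2.0]

toy_q = q_values(toy_targets, toy_decoys)
print(toy_q)
```

Note how the PSM scoring 7.0 has a raw estimate of 1/3 but a q-value of 0.25, because accepting one more target at 6.0 lowers the estimated FDR.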

Perl Proteomics & InSilicoSpectro

In contrast with genomics, bioinformaticians in proteomics don’t have a "big" and "complete" Perl library for proteomics data analysis. This may be related to the "heterogeneity" of proteomics: many different instruments, protocols and properties. Genomics also has a huge community of bioinformaticians and standardized tools (instruments and software). In 2006 Colinge and Masselot published an open-source Perl library named InSilicoSpectro. The aim was to provide a set of recurrent functions that are necessary for proteomics data analysis.

Some illustrative functions are: m/z list file format conversions, protein sequence digestion, theoretical peptide and fragment mass computations, graphical display, matching with experimental data, isoelectric point estimation (with different methods), and peptide retention time prediction.
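Two of these functions, protein sequence digestion and theoretical peptide mass computation, are easy to illustrate. The sketch below is written in Python for brevity and does not use InSilicoSpectro or mirror its API; the function names are hypothetical. It assumes the usual trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, no modifications, and standard monoisotopic residue masses.

```python
from typing import List

# Monoisotopic residue masses in Da (standard values, unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O is added per peptide for the free termini

def tryptic_digest(sequence: str) -> List[str]:
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides

def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

# Toy sequence, invented for illustration only.
toy_peptides = tryptic_digest("AGKPSTRGVK")
toy_mass = monoisotopic_mass("GVK")
print(toy_peptides, round(toy_mass, 4))
```

The K followed by P is deliberately included in the toy sequence to show the proline exception: the digest yields AGKPSTR and GVK instead of three peptides.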

Wednesday, 20 August 2014

ProteoStats: Computing false discovery rates in proteomics

Wednesday, 25 April 2012

Perl Proteomics & InSilicoSpectro