BigBio Notes: Mascot


Saturday, 4 October 2014

Analysis of histone modifications with PEAKS 7: A response from the PEAKS Team to the search engines comparison

Recently we posted a comparison of different search engines for PTM studies (Evaluation of Proteomic Search Engines for PTMs Identification). After some discussion of the results in our post, the PEAKS Team published a blog post reanalyzing the dataset. Here are the results:

Originally posted on the PEAKS blog:

The complex nature of histone modification patterns has posed a challenge for bioinformatics analysis over the years. Yuan et al. [1] conducted a study using two datasets from human HeLa histone samples to benchmark the performance of current proteomic search engines. The article was published in J Proteome Res. 2014 Aug 28 (PubMed), and the two datasets, HCD_Histone and CID_Histone (PXD001118), were made publicly available through ProteomeXchange. Using these data, the authors compared eight proteomic search engines to evaluate the performance and capability of each: pFind, Mascot, SEQUEST, ProteinPilot, PEAKS 6, OMSSA, TPP and MaxQuant.

In this study, PEAKS 6 was the version included in the comparison. However, PEAKS 7, released in November 2013, is the latest version of the PEAKS Studio software. PEAKS 7 not only performs better than PEAKS 6 but also includes many additional and improved features. Our team reanalyzed the two datasets, HCD_Histone and CID_Histone, with PEAKS 7 to update the identification (ID) results presented by Yuan et al. The updated results show that it is instead PEAKS, pFind and Mascot that produce the most confident identifications.

Evaluation of Proteomic Search Engines for PTMs Identification

The peptide-centric MS strategy is called bottom-up: proteins are extracted from cells, digested into peptides with proteases, and analyzed by liquid chromatography tandem mass spectrometry (LC−MS/MS). More specifically, peptides are resolved by chromatography, ionized in the mass spectrometer, and scanned to obtain full MS spectra. Next, some high-abundance peptides (precursor ions) are selected and fragmented to obtain MS/MS spectra by higher-energy C-trap dissociation (HCD) or collision-induced dissociation (CID).
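The digestion and precursor steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming unmodified peptides, monoisotopic masses, and the simplified trypsin rule (cleave after K or R, but not before P); the function names are hypothetical, and real search engines use more elaborate enzyme rules and modification handling.

```python
import re

# Monoisotopic residue masses in daltons (unmodified residues only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added to the residue sum for a free peptide
PROTON = 1.007276   # proton mass, added once per charge


def trypsin_digest(protein, missed_cleavages=0, min_length=1):
    """Cut after K or R unless followed by P (simplified trypsin rule)."""
    # Each regex match ends just after a cleavable K/R.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + sites
    if bounds[-1] != len(protein):
        bounds.append(len(protein))
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to `missed_cleavages` extra consecutive fragments.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptide = protein[bounds[i]:bounds[j]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def precursor_mz(peptide, charge):
    """m/z of the [M + zH]z+ precursor ion."""
    return (peptide_mass(peptide) + charge * PROTON) / charge
```

For example, `trypsin_digest("PEPTIDEKAAKPLLRGG")` returns `["PEPTIDEK", "AAKPLLR", "GG"]`: the K in `AAKP` is skipped because it precedes a proline.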

Then, peptides are commonly identified by searching the MS/MS spectra against a database and finally assembled into identified proteins. Database searching plays an important role in proteomics analysis because it can be used to translate thousands of MS/MS spectra into protein identifications (IDs).
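To make the database-search step concrete, here is a deliberately naive sketch that scores candidate peptides against an MS/MS peak list by counting matched singly charged b and y ions. It is a hypothetical shared-peak-count score, not the scoring function of Mascot, SEQUEST or any other engine discussed here, and it assumes unmodified peptides and a fixed tolerance in daltons.

```python
# Monoisotopic residue masses in daltons (unmodified residues only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(peptide):
    """Singly charged b and y ion m/z values, each listed from ion 1 to n-1."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions, y_ions


def shared_peak_count(peptide, observed_mz, tol=0.02):
    """Number of theoretical b/y ions with an observed peak within tol (Da)."""
    b_ions, y_ions = by_ions(peptide)
    return sum(
        any(abs(theo - obs) <= tol for obs in observed_mz)
        for theo in b_ions + y_ions
    )


def best_candidate(candidates, observed_mz, tol=0.02):
    """(peptide, score) for the candidate with the most matched fragment ions."""
    scored = [(pep, shared_peak_count(pep, observed_mz, tol)) for pep in candidates]
    return max(scored, key=lambda item: item[1])
```

In practice the candidate list would first be restricted to database peptides whose precursor mass matches the measured one, and real engines replace the raw count with probabilistic or correlation-based scores.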

Many database search engines have been developed to analyze large volumes of proteomics data quickly and accurately. Some of the better-known search engines are Mascot, SEQUEST, PEAKS DB, ProteinPilot, Andromeda, and X!Tandem. Here is a list of commonly used search engines in proteomics and mass spectrometry.

ProteoStats: Computing false discovery rates in proteomics

By Amit K. Yadav (@theoneamit) & Yasset Perez-Riverol (@ypriverol):

Perl is a legacy language that many modern programmers find abstruse. I’m passionate about not letting a programming language like Perl die. Even though the language is used less in computational proteomics, it is still widely used in bioinformatics. I’m enthusiastic about writing on new open-source Perl libraries that are easy to use. Two years ago, I wrote a post about InSilicoSpectro and how it can be used to study protein databases, as I did in “In silico analysis of accurate proteomics, complemented by selective isolation of peptides”.

Today’s post is about ProteoStats [1], a Perl library for false discovery rate (FDR) calculations in proteomics studies. Some background for non-experts:

One of the central and most widely used approaches in shotgun proteomics is the use of database search tools to assign spectra to peptides (these assignments are called peptide-spectrum matches, or PSMs). To evaluate the quality of the assignments, these programs need to estimate and correct for population-wide error rates so that the number of false positives stays under control. The standard strategy for controlling false positives is the target-decoy approach. The so-called classical FDR strategy, proposed by Elias & Gygi in 2007, estimates the FDR from a search against a concatenated target-decoy (TD) database. This calculation is done either by the search engine itself or by scripts (in-house, unpublished, not benchmarked, with differing implementations).
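The classical concatenated-database estimate can be sketched as follows. In the Elias & Gygi formulation, each decoy hit above a score threshold is taken to stand for one false target hit, so among the passing PSMs the FDR is estimated as 2 × decoys / (targets + decoys). The function names and the `(score, is_decoy)` input format are assumptions for illustration, with one best PSM per spectrum.

```python
def concatenated_fdr(psms, threshold):
    """Estimated FDR among PSMs scoring >= threshold.

    psms: iterable of (score, is_decoy) pairs, one best match per spectrum,
    from a search against a concatenated target+decoy database.
    """
    passing = [is_decoy for score, is_decoy in psms if score >= threshold]
    if not passing:
        return 0.0
    decoys = sum(passing)
    # Each passing decoy implies roughly one false target among the passing PSMs.
    return min(1.0, 2.0 * decoys / len(passing))


def score_cutoff(psms, max_fdr=0.01):
    """Lowest observed score whose estimated FDR is <= max_fdr, or None."""
    for score in sorted({s for s, _ in psms}):
        if concatenated_fdr(psms, score) <= max_fdr:
            return score
    return None
```

Because the estimate is not monotonic in the score, `score_cutoff` scans thresholds from low to high and returns the most permissive one that still meets the target FDR.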

So far, the only library developed to compute FDRs at the spectrum, peptide and protein levels is MAYU [2]. But while MAYU uses only the classical FDR approach, ProteoStats provides options for five different FDR strategies. The only prerequisite is that you search against separate target and decoy databases, as proposed by Käll et al. (2008) [3]. ProteoStats also provides a programming interface that reads the native output of the most widely used search tools and reports FDR-related statistics. For tools that are not supported, pepXML, which has become a de facto standard output format, can be read directly, as can tabular text formats such as TSV and CSV (or any other well-defined separator).
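For the separate-search setting, a common estimate is FDR(t) = π0 · D(t) / T(t), where D(t) and T(t) count decoy and target PSMs scoring at least t, and the q-value of a PSM is the smallest FDR at which it would still be accepted. The sketch below is a hypothetical Python illustration of that calculation, not ProteoStats' own (Perl) interface; `pi0` defaults to 1, the conservative choice.

```python
def separate_search_qvalues(target_scores, decoy_scores, pi0=1.0):
    """q-values for target PSMs from separate target and decoy searches.

    Returns (score, q_value) pairs sorted from best to worst target score.
    """
    targets = sorted(target_scores, reverse=True)
    decoys = sorted(decoy_scores, reverse=True)
    fdrs = []
    d = 0  # number of decoys scoring >= the current target score
    for t_count, score in enumerate(targets, start=1):
        while d < len(decoys) and decoys[d] >= score:
            d += 1
        fdrs.append(min(1.0, pi0 * d / t_count))
    # q-value: minimum FDR over this and all lower thresholds,
    # computed with a running minimum from the worst score upward.
    qvalues = [0.0] * len(fdrs)
    running_min = 1.0
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return list(zip(targets, qvalues))
```

Accepting PSMs at 1% FDR then amounts to keeping those with a q-value of at most 0.01; the running minimum also gives tied target scores the same q-value.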

Some Reasons to Rename my Blog as BioCode's Notes

Hi, dear readers:

I’ve decided that it would be prudent, exposure-wise, to change the name of my professional blog to BioCode's Notes, for a number of reasons:

1. People interested in bioinformatics make up a significant part of my (alas, still small) readership. They always seem hungry for code tips, language comparisons, and other things that do not fit neatly under the umbrella of “computational proteomics”.

2. My own work is straying more and more from computational proteomics per se into other problems linking biology (Proteomics, Genomics, Life Sciences) with programming (R, Java, Perl, C++). Biocoding is now my bread-and-butter…

3. I need a shorter, catchier name that is easy to use in coffee talks, presentations, or when sharing links with friends.

4. I also decided to add a mascot for the blog, our T-rex:
   Truth => Science is about Truth.
   Tea => UK Science.
   STaTisTics => OK, this one’s got as many ‘S’ as ‘T’, but the latter is more frequent in English.
   T-rex => The future belongs to Big Data, which we’ll use (and are already using) to trace back the march of evolution to our preferred species, including the dinosaurs. And last, but not least, this is Abel’s (my son’s) favorite animal.

Hope you enjoy this idea!
Yasset

BigBio Notes

Saturday, 4 October 2014

Analysis of histone modifications with PEAKS 7: A response from the PEAKS Team to the search engines comparison

Monday, 8 September 2014

Evaluation of Proteomic Search Engines for PTMs Identification

Wednesday, 20 August 2014

ProteoStats: Computing false discovery rates in proteomics

Tuesday, 22 October 2013

Some Reasons to Rename my Blog as BioCode's Notes