Showing posts with label peptide/protein identification. Show all posts
Showing posts with label peptide/protein identification. Show all posts

Saturday, 28 November 2015

Protein identification with Comet, PeptideProphet and ProteinProphet using BioDocker containers


Proteomics data analysis is dominated by database-based search engines strategies. Perhaps the most common protocol today is to retrieve raw data from a mass spectrometry, convert the raw data from binary format to a text-based format and then process it using a database search algorithm. The resulting data need to be statistically filtered in order to converge to a final list of identified peptides and proteins.

Amount Search Engines, Comet (the youngest son of SEQUEST) is one of the most popular nowadays. Today we are going to show how to run a simple analysis protocol using the Comet database search engine followed by statistical analysis using PeptideProphet and ProteinProphet, two of the most known and robust processing algorithms for proteomics data.

This pipeline is available in TPP, however several users prefer to use the individual components rather than Trans-proteomics Pipeline.  The big differential here is how we are going to do it. Instead of going through the step-by-step in how to install and configure Comet and TPP, we are going to run the pipeline using Docker containers from the BioDocker project (you can get more information on the project here).

Saturday, 4 October 2014

Analysis of histone modifications with PEAKS 7: A respond to Search Engines comparison from PEAKs Team

Recently we posted a comparison of different search engines for PTMs studies (Evaluation of Proteomic Search Engines for PTMs Identification). After some discussion of the mentioned results in our post the  PEAKS Team just published a blog post with the reanalysis of the dataset. Here the results:

Originally Posted in Peaks Blog:
The complex nature of histone modification patterns has posed as a challenge for bioinformatics analysis over the years. Yuan et al. [1] conducted a study using two datasets from human HeLa histone samples, to benchmark the performance of current proteomic search engines. This article was published in J Proteome Res. 2014 Aug 28 (PubMed), and the data from the two datasets, HCD_Histone and CID_Histone (PXD001118), was made publically available through ProteomeXchange. With this data, the article uses eight different proteomic search engines to compare and evaluate the performance and capability of each. The evaluated search engines in this study are: pFind, Mascot, SEQUEST, ProteinPilot, PEAKS 6, OMSSA, TPP and MaxQuant. 
In this study, PEAKS 6 was used to compare the performance capabilities between search engines. However, PEAKS 7, which was released November 2013, is the latest version available of the PEAKS Studio software. PEAKS 7 not only includes better performance than PEAKS 6, but a lot of additional and improved features. Our team has reanalyzed the two datasets HCD_Histone and CID_Histone with PEAKS 7 to update the ID results presented in the publication by Yuan et al.  These updated results showed that instead, it is PEAKS, pFind and Mascot that identify the most confident results.

Thursday, 24 October 2013

Creating an Open Source Revolution in Computational Proteomics

First of all, I don’t want to discuss in this post about Open-Source, its strengths & strengths. This post is about the most useful Open-Source packages, frameworks or libraries in the field of computational proteomics (a short version of our manuscript “Open source libraries and frameworks for Mass Spectrometry based Proteomics: A developer’s perspective”). 


Schema of the possible computational processing steps of a proteomics data set.

In proteomics like other Omics, the bioinformatics efforts can be divided in three major fields: data processing, storage and visualization. From MS/MS preprocessing to post-processing of the identifications results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of results from various proteomics search engines output files, and the visualization of MS-related information (including mass spectra and chromatograms).

Monday, 14 October 2013

What is your tool for peptide/protein identification?

In 2012 We published a Poll about most used softwares from peptide/protein identification in proteomics in Computational Proteomics Linkedin Group. I decided to reproduce the Poll here because linkedin remove this option and also here i have the opportunity to add more softwares to the list.