BigBio Notes: 2012

Monday 3 December 2012

#HavanaBioinfo2012 Workshop

As part of the Heberprot-P Havana 2012 International Congress, the CIGB is organizing a pre-congress Workshop on Bioinformatics and Biotechnology Applications. These tools are widely used in studies of the EGF-EGFr system and will be important for the future development of new therapeutics. The course will cover topics such as computational proteomics and genomics, data integration, expression data analysis and regulatory networks, protein-protein interaction networks, mathematical models of biochemical pathways, drug design, virtual screening, docking and QSAR. Bioinformatics and OMICS, Havana 2012 will be held from December 8th to 11th, 2012, at the Occidental Miramar Hotel. (Workshop Site)

Why R for Mass Spectrometrists and Computational Proteomics

Why R:

It is now common practice to integrate statistical analysis of experimental results and in silico predictions into manuscripts and daily research. Mass spectrometrists, biologists and bioinformaticians commonly use programs like Excel, Calc or other office tools to generate their charts and statistical analyses. In recent years, many computational biologists, especially those from the genomics field, have come to regard R and Bioconductor as fundamental tools for their research.

R is a modern, functional programming language that allows for rapid development of ideas; it is a language and environment for statistical computing and graphics. The rich set of built-in functions makes it ideal for high-volume analysis or statistical studies.

Computational Methods of AP/MS Protein Interaction Data

I want to share with you an excellent presentation from Professor Alexey Nesvizhskii (Dept. of Pathology, University of Michigan). Even though this presentation is from 2009, concepts such as protein inference and label-free quantification are still generating a significant number of new algorithms and tools. It is also an excellent starting point for biologists and developers who want to learn about computational proteomics algorithms and methods.

Monday 21 May 2012

An "in-house" Tool

One of the small hidden details in publications, even in high-impact ones, is the use of "in-house programs". What is an "in-house" program or tool? Normally it is a piece of software that researchers use to analyze, process or visualize the experimental data but, most importantly, the software itself is not published.

The term by itself is inoffensive, but the concept could be extremely dangerous. We could cite hundreds of manuscripts whose data analysis relied on "in-house" tools, but none that mention "in-house instruments". Authors are always required to cite the manufacturer, the reagents, even the year and the company. I know, we have a section to describe data processing, but there we mostly cite some parameters and well-known software such as search engines (Mascot, X!Tandem, Sequest, etc.). Yet somewhere in this section you will often find the term "in-house" tool. It could refer to an Excel formula or to a complete and complex Java program with many tasks, such as parsing a search engine output, computing the FDR, removing false-positive identifications, computing peptide-spectrum-match redundancy, etc. There is no real, objective measure to distinguish a small, simple tool from a complex one.
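To make the point concrete, here is a minimal Python sketch of the kind of logic that can hide behind the words "in-house tool": a target-decoy FDR estimate over a list of peptide-spectrum matches. The post does not say which FDR method any particular tool uses; the simple decoys/targets ratio, the `(score, is_decoy)` tuple layout and the function name `fdr_at_threshold` are all assumptions made for illustration.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate the FDR among PSMs scoring at or above `threshold`.

    `psms` is an iterable of (score, is_decoy) pairs (hypothetical layout).
    The estimate used here is decoys / targets, one common convention;
    other tools may use a different formula.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score < threshold:
            continue  # PSM rejected by the score cut-off
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return decoys / targets


# Toy data, invented for the example: (search engine score, decoy hit?)
example_psms = [(50, False), (45, False), (40, True), (30, False), (20, True)]
print(fdr_at_threshold(example_psms, 35))
```

Even a few lines like these embed choices (the formula, how ties and empty sets are handled) that change the reported identifications, which is why an unpublished tool is hard for a reader to evaluate.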

Perl Proteomics & InSilicoSpectro

In contrast with genomics, bioinformaticians in proteomics don't have a "big" and "complete" Perl library for proteomics data analysis. This could be related to the "heterogeneity" of proteomics: many different instruments, protocols and properties. Genomics also has a huge community of bioinformaticians and standardized tools (instruments and software). In 2006 Colinge and Masselot published an open-source Perl library named InSilicoSpectro. The aim was to provide a set of recurrent functions that are necessary for proteomics data analysis.

Some illustrative functions are: m/z list file format conversion, protein sequence digestion, theoretical peptide and fragment mass computation, graphical display, matching with experimental data, isoelectric point estimation (with different methods), and peptide retention time prediction.
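Two of the functions listed above, protein sequence digestion and theoretical peptide mass computation, can be sketched in a few lines. The example below is written in Python rather than Perl and does not use or reproduce the InSilicoSpectro API; the function names, the trypsin rule (cleave after K or R unless the next residue is P) and the toy sequence are assumptions chosen for illustration.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the residue sum
PROTON = 1.00728   # mass of a proton, used for m/z


def digest_trypsin(sequence, missed_cleavages=0):
    """Return tryptic peptides, allowing up to `missed_cleavages` skipped sites."""
    # Collect cleavage positions: after K or R, unless followed by P.
    sites = [0]
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(sites):
                peptides.append(sequence[sites[start]:sites[end]])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide (residues plus one water)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def peptide_mz(peptide, charge=1):
    """m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


# Toy sequence, invented for the example.
for pep in digest_trypsin("AAKRPGGRCC"):
    print(pep, round(peptide_mass(pep), 4), round(peptide_mz(pep, 2), 4))
```

A real library also has to handle modifications, other enzymes and average masses, which is part of why a shared, published implementation is more useful than many private ones.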
