Showing posts with label Software libraries. Show all posts
Showing posts with label Software libraries. Show all posts

Wednesday, 17 September 2014

Who is a senior developer anyway?

Who is a senior developer anyway?

What makes you a “senior developer”? Everyone and their dog calls themselves a senior developer these days. From fresh graduates to the CTO, everyone is a senior developer. But what the hell does it even mean?

Technologists

Some developers are avid technologists. They got into programming really because they like tinkering. If it hadn’t been 7 languages in 7 weeks, it would have been a box of meccano or they’d be in their shed busy inventing the battery operated self-tieing tie. These people are great to have on your team, they’ll be the ones continually bombarding you with the latest and greatest shiny. If you ever want to know if there’s an off the shelf solution to your problem, they’ll know the options, have tried two of them, and currently have a modified version of a third running on their raspberry pi.

The trouble with technologists is more technology is always the answer. Why have a HTTP listener when you can have a full stack application server? Why use plain old TCP when you can introduce an asynchronous messaging backbone? Why bother trying to deliver software when there’s all these toys to play with!

Wednesday, 20 August 2014

ProteoStats: Computing false discovery rates in proteomics

By Amit K. Yadav (@theoneamit) & Yasset Perez-Riverol (@ypriverol):

Perl is a legacy language thought to be abstruse by many modern programmers. I’m passionate with the idea of not letting die a programming language such as Perl. Even when the language is used less in Computational Proteomics, it is still widely used in Bioinformatics. I’m enthusiastic writing about new open-source libraries in Perl that can be easily used. Two years ago, I wrote a post about InSilicoSpectro and how it can be used to study protein databases like I did in “In silico analysis of accurate proteomics, complemented by selective isolation of peptides”. 

Today’s post is about ProteoStats [1], a Perl library for False Discovery Rate (FDR) related calculations in proteomics studies. Some background for non-experts:

One of the central and most widely used approach for shotgun proteomics is the use of database search tools to assign spectra to peptides (called as Peptide Spectrum Matches or PSMs). To evaluate the quality of the assignments, these programs need to calculate/correct for population wise error rates to keep the number of false positives under control. In that sense, the best strategy to control the false positives is the target-decoy approach. Originally proposed by Elias & Gygi in 2007, the so-called classical FDR strategy or formula proposed involved a concatenated target-decoy (TD) database search for FDR estimation. This calculation is either done by the search engine or using scripts (in-house, non-published, not benchmarked, different implementations). 

So far, the only library developed to compute FDR at spectra level, peptide level and protein level FDRs is MAYU [2]. But, while MAYU only uses the classical FDR approach, ProteoStats provides options for 5 different strategies for calculating the FDR. The only prerequisite being that you need to search using a separate TD database as proposed by Kall et al (2008) [3]. Also, ProteoStats provides a programming interface that can read the native output from most widely used search tools and provide FDR related statistics. In case of tools not supported, pepXML, which has become a de facto standard output format, can be directly read along with tabular text based formats like TSV and CSV (or any other well-defined separator). 

Wednesday, 8 January 2014

Are you a Computational Biologist or Bioinformaticist or Bioinformatician?

A recent discussion was provoked by on twitter January 8 regarding what is the choice term for referring to those researchers working on Bioinformatics and Computational Biology fields.
This debate is older than people may think and it looks like an insignificant topic, but when you are writing your CV or your internet profile, or you’re looking for a new job, you will need a professional title, and that is really important. If you also look the volume of discussion and opinions about this topic on internet you will realize that the community have different points of view. I've use some of my spare time to read in detail different opinions about it, and also collect some those opinions and articles. Let’s see what common terms are used nowadays for these researchers:

Bioinformaticist, Bioinformatician, Computational Biologist, Digital biologist, bioinformatics analyst


Thursday, 24 October 2013

Creating an Open Source Revolution in Computational Proteomics

First of all, I don’t want to discuss in this post about Open-Source, its strengths & strengths. This post is about the most useful Open-Source packages, frameworks or libraries in the field of computational proteomics (a short version of our manuscript “Open source libraries and frameworks for Mass Spectrometry based Proteomics: A developer’s perspective”). 


Schema of the possible computational processing steps of a proteomics data set.

In proteomics like other Omics, the bioinformatics efforts can be divided in three major fields: data processing, storage and visualization. From MS/MS preprocessing to post-processing of the identifications results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of results from various proteomics search engines output files, and the visualization of MS-related information (including mass spectra and chromatograms).