Monday, 28 October 2013

One step ahead in Bioinformatics using Package Repositories

About a year ago I published a post about in-house tools in research and how this kind of software can end up undermining the quality of a manuscript and the reproducibility of its results. While I can certainly relate to someone reluctant to release nasty code (not commented, not well tested, not documented), I still think we must provide, as supporting information, all the "in-house" tools used to reach a result we intend to publish. This applies especially to manuscripts describing software packages and tools. I am willing to cut some slack to journals such as Analytical Chemistry or Molecular & Cellular Proteomics, whose editorial staff are, rightly so, more concerned with quality issues involving raw data and experimental reproducibility. But journals like Bioinformatics, BMC Bioinformatics, several members of the Nature family and others at the forefront of bioinformatics should be held to a higher standard. These journals would benefit greatly from a review process that also assesses the submitted software itself, moving bioinformatics, and science in general, one step forward in terms of reproducibility and software reuse. What do you think would happen if the following were checked during peer review?
Labels: bioconductor, Bioinformatics, computational proteomics, in-house programs, java, open source, perl, PHP, R, science & research
Monday, 21 May 2012
An "in-house" Tool
One of the small hidden details in publications, even high-impact ones, is the use of "in-house programs". What is an "in-house" program or tool? Normally it is a piece of software that researchers use to analyse, process or visualise experimental data; most importantly, the software itself is never published.

The term by itself is inoffensive, but the concept can be extremely dangerous. We can cite hundreds of manuscripts whose data analysis relied on "in-house" tools, yet you will never find the term "in-house instruments": authors always cite the manufacturer, the reagents, even the year and the company. Yes, we have a section to describe data processing, but it mostly lists a few parameters and the well-known software such as search engines (Mascot, X!Tandem, Sequest, etc.). Then, somewhere in that same section, the term "in-house" tool appears. It could refer to an Excel formula or to a complete and complex Java program that parses the search engine output, computes the FDR, removes false-positive identifications, collapses peptide-spectrum-match redundancy, and so on. There is no real, objective measure to distinguish a small, simple tool from a complex one.
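To make that concrete, here is a minimal, hypothetical sketch (not any specific published tool) of what such an "in-house" program often does: take a list of peptide-spectrum matches (PSMs) from a target-decoy search, estimate the FDR at a score threshold, and keep only the identifications that pass. The class name, record fields and example scores are all invented for illustration, and the decoys/targets ratio is just one common FDR estimator among several.

```java
import java.util.*;

// Hypothetical sketch of an "in-house" tool: filter target-decoy PSMs at a given FDR.
// Requires a recent JDK (records and Stream.toList()).
public class FdrFilter {

    // Minimal PSM record: spectrum id, peptide sequence, search score, decoy flag.
    record Psm(String spectrumId, String peptide, double score, boolean isDecoy) {}

    // FDR estimated as decoys / targets among PSMs at or above a score threshold
    // (one common target-decoy formula; other estimators exist).
    static double fdrAtThreshold(List<Psm> psms, double threshold) {
        long decoys  = psms.stream().filter(p -> p.score() >= threshold && p.isDecoy()).count();
        long targets = psms.stream().filter(p -> p.score() >= threshold && !p.isDecoy()).count();
        return targets == 0 ? 1.0 : (double) decoys / targets;
    }

    // Find the lowest score threshold whose estimated FDR is below maxFdr,
    // then return only the target PSMs passing it.
    static List<Psm> filterAtFdr(List<Psm> psms, double maxFdr) {
        List<Double> scores = psms.stream().map(Psm::score).sorted().toList();
        for (double t : scores) {
            if (fdrAtThreshold(psms, t) <= maxFdr) {
                final double threshold = t;
                return psms.stream()
                           .filter(p -> !p.isDecoy() && p.score() >= threshold)
                           .toList();
            }
        }
        return List.of(); // no threshold reaches the requested FDR
    }

    public static void main(String[] args) {
        // Invented example data, purely for illustration.
        List<Psm> psms = List.of(
                new Psm("scan_0001", "PEPTIDEK", 55.2, false),
                new Psm("scan_0002", "DECOYSEQ", 18.4, true),
                new Psm("scan_0003", "ANOTHERK", 42.9, false),
                new Psm("scan_0004", "REVDECOY", 40.1, true),
                new Psm("scan_0005", "SAMPLEPK", 61.0, false));
        filterAtFdr(psms, 0.01).forEach(p ->
                System.out.println(p.spectrumId() + "\t" + p.peptide() + "\t" + p.score()));
    }
}
```

Even a toy like this makes the point: the choice of FDR formula and score threshold changes the numbers that end up in the manuscript, which is exactly why the actual code should travel with it.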
Labels: computational proteomics, editors, framework, in-house programs, journals, mass spectrometry, proteomics, reviewers