About a year ago I published a post about in-house tools in research and how relying on this type of software can end up undermining the quality of a manuscript and the reproducibility of its results. While I can certainly relate to someone reluctant to release nasty code (i.e. uncommented, poorly tested, undocumented), I still think we must provide, as supporting information, all “in-house” tools that have been used to reach a result we intend to publish. This applies especially to manuscripts dealing with software packages, tools, etc.
I am willing to cut some slack to journals such as Analytical Chemistry or Molecular & Cellular Proteomics, whose editorial staffs are, and rightly so, more concerned about quality issues involving raw data and experimental reproducibility. But in cases like Bioinformatics, BMC Bioinformatics, several members of the Nature family and others at the forefront of bioinformatics, methinks we should hold them to a higher standard. Some of these journals would greatly benefit from adding a review step from the point of view of software production, moving bioinformatics and science in general one step forward in terms of reproducibility and software reusability. What do you think would happen if the following were checked during peer review?
- Quality of the documentation, in terms of examples, use cases and in-code comments (functions, classes)
- Availability of a complete set of unit tests (most programming languages have packages that provide a complete environment for testing all the components, such as classes and functions, of the tools and libraries developed with them)
- Reusability
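To make the first two points concrete, here is a minimal sketch in Python of what a reviewer could reasonably expect to find: a documented function with a usage example, plus a small set of unit tests written with the standard `unittest` module. The function `gc_content` and the test class are hypothetical names made up for illustration; they do not come from any particular tool.

```python
import unittest


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence.

    Hypothetical example function, used only to illustrate documentation
    and testing; it is not taken from any published tool.

    The input is case-insensitive and may contain A, C, G, T or N.

    Example:
        >>> gc_content("ATGC")
        0.5

    Raises:
        ValueError: if the sequence is empty or contains other characters.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    seq = sequence.upper()
    invalid = set(seq) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return (seq.count("G") + seq.count("C")) / len(seq)


class GcContentTest(unittest.TestCase):
    """Unit tests covering the normal case and the documented failure modes."""

    def test_mixed_sequence(self):
        self.assertAlmostEqual(gc_content("ATGC"), 0.5)

    def test_case_insensitive(self):
        self.assertAlmostEqual(gc_content("ggcc"), 1.0)

    def test_empty_sequence_rejected(self):
        with self.assertRaises(ValueError):
            gc_content("")

    def test_invalid_characters_rejected(self):
        with self.assertRaises(ValueError):
            gc_content("ATXG")
```

Saved in a file, the tests can be run with `python -m unittest <file>`, so a reviewer can check in seconds whether the documented behaviour actually holds.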
Manuscripts should be sent for review not only to biologists or bioinformaticians with a background in biological sciences or chemistry, but also to researchers with strong, solid skills in the field of software production, who would then be able to perform a detailed analysis of the code, documentation quality and unit tests.
Another suggestion: tools and software should be available through package repositories (Maven, CPAN, CRAN, PyPI); a good example is the engagement between the Journal of Statistical Software and CRAN. Such repositories make it easier to find, install and test packages and their dependencies, furthering the advance of science and research.
Figure 1. Number of packages per repository in the last three years.
The good news is that there already are software package repositories for most of the existing programming languages, and they keep growing (Figure 1):
- C++: Boost
- Haskell: Hackage
- Java: Maven and SpringSource
- JavaScript: Scripteka, JSAN and npm
- PHP: PEAR and PECL
- Perl: CPAN
- Python: PyPI
- R: CRAN, Bioconductor
- Ruby: RubyGems
Of course I am not holding my breath hoping that these suggestions reach the ears of a friendly science journal editor. But there’s one thing we (bioinformaticians and developers reading this post) can all do now, which is to up the ante in our tool development practices. What do you think?