BigBio Notes: September 2015

Wednesday 30 September 2015

First Scrum Board

Here, my first Scrum board to guide the release of OmicsDI project.

Team members update the task board continuously each sprint; if someone thinks of a new task (“test a new machine learning algorithm”), she writes a new card and puts it on the wall. Either during or before the daily scrum, estimates are changed (up or down), and cards are moved around the board.

Each row on the Scrum board is a user story, which is the unit of work we encourage teams to use for their product backlog.

During the sprint planning meeting, the team selects the product backlog items they can complete during the next Spring. Each product backlog item is turned into multiple sprint backlog items. Each of these is represented by one task card that is placed on the Scrumboard.

Story (User Story): The story description (“As a user we want to…”) shown on that row.
Ongoing: Any card being worked on goes here. The programmer who chooses to work on it moves it over when she's ready to start the task. Often, this happens during the daily scrum when someone says, “I'm going to work on the boojum today.”
Testing: A lot of tasks have corresponding test task cards. So, if there's a “Code the boojum class” card, there is likely one or more task cards related to testing: “Test the boojum”, “Write FitNesse tests for the boojum,” “Write FitNesse fixture for the boojum,”
Done: Cards pile up over here when they're done. They're removed at the end of the sprint. Sometimes we remove some or all during a sprint if there are a lot of cards.

Optionally, depending on the team, the culture, the project and other considerations:

Notes: Just a place to jot a note or two.
Tests Specified: We like to do “Story Test-Driven Development,” or “Acceptance Test-Driven Development,” which means the tests are written before the story is coded. Many teams find that it helps to have acceptance tests identified before coding begins on a particular story. This column just contains a checkmark to indicate the tests are specified.

Friday 11 September 2015

An API for all MS-based File formats

We recently released and published our first Java API (Application Programming Interface) for the most common file formats in proteomics, not only ms files but also identification files such as mzIdentML and mztab.

ms-data-core-api (https://github.com/PRIDE-Utilities/ms-data-core-api)

The library allow the end-users and the developers to use a common data structure for proteomics independently of the file types, and .. But first lets try to understand what is a API.

What is an API?

Imagine you are a builder or civil engineering and your are building your bridge, different components, blocks and different teams needs to be coordinated and plugged for the final results. Wrong communications between the members of the teams, different block sizes or building plans only produced strange results.

In the simplest terms, APIs are sets of requirements, data structures, objects that govern how applications and software components can talk each other. An API, is a set of routines and protocols that provide building blocks for computer programmers and web developers to build software applications. In the past, APIs were largely associated with computer operating systems and desktop applications. In recent years though, we have seen the emergence of Web APIs (Web Services).

What is ms-data-core-api?

The ms-data-core-api is a free, open-source library for developing computational proteomics tools and pipelines. The Application Programming Interface, written in Java, enables rapid tool creation by providing a robust, pluggable programming interface and common data model. The data model is based on controlled vocabularies/ontologies and captures the whole range of data types included in common proteomics experimental workflows, going from spectra to peptide/protein identifications to quantitative results.

The library contains readers for three of the most used Proteomics Standards Initiative standard file formats: mzML, mzIdentML, and mzTab. In addition to mzML, it also supports other common mass spectra data formats: dta, ms2, mgf, pkl, apl (text-based), mzXML and mzData (XML-based). Also, it can be used to read PRIDE XML, the original format used by the PRIDE database, one of the world-leading proteomics resources. Finally, we present a set of algorithms and tools whose implementation illustrates the simplicity of developing applications using the library.