BigBio Notes

Saturday, 17 March 2018

Big Data: Is not only a fancy/catchy name

Biomedical research has a new trend: using fancy terms in the titles of papers and grants to attract the attention of reviewers, journals and grant agencies. Among others are: large-scale, complete map, draft, landscape, deep, full, and Big Data. Figure 1 shows the exponential growth in the use of these words in PubMed articles.

Figure 1: Number of mentions of specific terms in PubMed by year.

I will stop here to discuss the term Big Data.

Data Visualization: Plots You Should be Using More

Inspired by this blog post

1- Parallel Coordinates: A parallel coordinates graph arrays multiple variables alongside one another, each scaled from its highest to its lowest value (highest at the top, lowest at the bottom), with lines connecting each entity’s position on each variable horizontally across the graph. Because a large number of cases is usually represented, it is often presented as an interactive view where individual lines can be selected and highlighted.
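
As a rough illustration of the description above, here is a minimal base R sketch of a static parallel coordinates plot. The function names (rescale_columns, plot_parallel) and the choice of the built-in mtcars dataset are assumptions made for this example only; an interactive version would need a dedicated plotting library.

```r
# Rescale every numeric column to [0, 1] so that all axes share one vertical range
# (0 = lowest value of that variable, 1 = highest).
rescale_columns <- function(df) {
  as.data.frame(lapply(df, function(x) {
    rng <- range(x, na.rm = TRUE)
    if (rng[1] == rng[2]) return(rep(0.5, length(x)))  # constant column: centre it
    (x - rng[1]) / (rng[2] - rng[1])
  }))
}

# Draw one polyline per row across equally spaced vertical axes.
# `highlight` is an optional vector of row indices to emphasise.
plot_parallel <- function(df, highlight = NULL) {
  scaled <- rescale_columns(df)
  axes <- seq_len(ncol(scaled))
  # matplot draws one line per column of its input, so transpose: rows become lines.
  matplot(axes, t(as.matrix(scaled)), type = "l", lty = 1, col = "grey70",
          xaxt = "n", xlab = "", ylab = "scaled value (0 = lowest, 1 = highest)")
  axis(1, at = axes, labels = names(scaled))
  abline(v = axes, col = "grey40")
  if (!is.null(highlight)) {
    # Redraw the selected rows last so they sit on top of the grey lines.
    selected <- scaled[highlight, , drop = FALSE]
    matlines(axes, t(as.matrix(selected)), lty = 1, lwd = 2, col = "red")
  }
  invisible(scaled)
}

scaled <- plot_parallel(mtcars[, c("mpg", "hp", "wt", "qsec")], highlight = 1)
```

Highlighting a row here stands in for the selection step that an interactive view would offer.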

Monit: Monitoring your Services

Bioinformatics applications are moving in the direction of "Microservices" architectures, where services should be fine-grained and the protocols lightweight. A microservices architecture decomposes the application into smaller services, which improves modularity and makes the application easier to develop, deploy and maintain. It also parallelizes development by enabling small autonomous teams to develop, deploy and scale their respective services independently.

With more services (Databases, APIs, Web Applications, Pipelines), more components must be traced and monitored to know the health of your application. Different services may play different roles, run on different physical or logical machines, and even be geographically isolated from each other. As a whole, these services/servers might be providing a combined service to the end application. An issue on any one of the servers should not affect the behavior of the final application, and must be found and fixed before an outage happens.

Several tools allow developers, devops and sysadmins to monitor all the services in a microservices application; among the most popular are Nagios and Monit.

How to estimate and compute the isoelectric point of peptides and proteins?

By Yasset Perez-Riverol and Enrique Audain:

The isoelectric point (pI) can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge is equal to zero. Many modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Peptide/protein fractionation according to pI is widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. The experimental pI records generated by pI-based fractionation procedures are valuable information for validating the confidence of identifications and removing false positives, and could be used to re-compute peptide/protein posterior error probabilities in MS-based proteomics experiments.

These approaches require an accurate theoretical prediction of pI. Even though there are several tools/methods to predict the isoelectric point, it remains hard to know beforehand which methods will perform well on a specific dataset.

We believe that the best way to compute the isoelectric point (pI) is to have a complete package with most of the state-of-the-art algorithms and methods that can do the job for you [2]. We recently developed an R package (pIR) to compute the isoelectric point using long-standing and novel pI methods that can be grouped into three main categories: a) iterative, b) Bjellqvist-based methods and c) machine learning methods. In addition, pIR also offers a statistical and graphical framework to evaluate the performance of each method and its capability to “detect” outliers (those peptides/proteins whose theoretical pI deviates from the experimental value) in a high-throughput environment.

First, let's install the package. This requires devtools:

# Install devtools from CRAN (only needed once), then load it
install.packages("devtools")
library(devtools)

Then we just call:

# Install pIR from its GitHub repository, then load it
install_github("ypriverol/pIR")
library(pIR)
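
To make the "iterative" category more concrete, here is a minimal base R sketch that finds the pH at which the net charge of a peptide is zero. It does not use pIR and is not its implementation or API: the function names (net_charge, estimate_pi) are hypothetical, the pKa values are illustrative assumptions (published pKa scales differ from each other), and bisection is just one simple way to do the iteration.

```r
# Illustrative pKa values (an assumption; not necessarily those used by pIR).
pka_positive <- c(Nterm = 8.6, K = 10.8, R = 12.5, H = 6.5)
pka_negative <- c(Cterm = 3.6, D = 3.9, E = 4.1, C = 8.5, Y = 10.1)

# Net charge of a peptide at a given pH, from the Henderson-Hasselbalch equation.
net_charge <- function(sequence, ph) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  # Count each ionizable group; each terminus occurs exactly once.
  n_pos <- c(Nterm = 1, sapply(c("K", "R", "H"), function(a) sum(residues == a)))
  n_neg <- c(Cterm = 1, sapply(c("D", "E", "C", "Y"), function(a) sum(residues == a)))
  positive <- sum(n_pos / (1 + 10^(ph - pka_positive[names(n_pos)])))
  negative <- sum(n_neg / (1 + 10^(pka_negative[names(n_neg)] - ph)))
  positive - negative
}

# Bisection on pH: the net charge decreases monotonically as pH rises,
# so the sign of the charge tells us which half of the interval holds the pI.
estimate_pi <- function(sequence, tol = 1e-4) {
  low <- 0
  high <- 14
  while (high - low > tol) {
    mid <- (low + high) / 2
    if (net_charge(sequence, mid) > 0) low <- mid else high <- mid
  }
  (low + high) / 2
}

example_pi <- estimate_pi("ACDEFGHIK")
```

Swapping in a different pKa table changes the estimate, which is one reason predictions from different methods can disagree on the same dataset.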

Useful Links to prepare your presentation or talk

Creating effective slides: Design, Construction, and Use in Science:

GitHub in Numbers for Bioinformaticians

Impact of GitHub in Numbers:

Saturday, 17 March 2018

Big Data: Is not only a fancy/catchy name

Sunday, 11 March 2018

Data Visualization: Plots You Should be Using More

Wednesday, 24 January 2018

Monit: Monitoring your Services

Thursday, 9 June 2016

How to estimate and compute the isoelectric point of peptides and proteins?

Saturday, 14 May 2016

Useful Links to prepare your presentation or talk

Sunday, 3 April 2016

GitHub in Numbers for Bioinformaticians