Showing posts with label biological databases.

Wednesday, 24 January 2018

Monit: Monitoring your Services

Bioinformatics applications are moving toward "microservices" architectures, where services are fine-grained and protocols lightweight. A microservices architecture decomposes an application into smaller services, improving modularity and making the application easier to develop, deploy and maintain. It also parallelizes development by enabling small autonomous teams to develop, deploy and scale their respective services independently.



With more services (databases, APIs, web applications, pipelines), more components must be traced and monitored to know the health of your application. Different roles may be played by different services, on different physical or logical machines that can even be geographically isolated from each other. As a whole, these services and servers provide a combined service to the end application. A problem on any one server should not affect the final application's behavior, and it must be found and fixed before an outage happens.

Several tools allow developers, DevOps engineers and sysadmins to monitor all the services in a microservices application; the most popular are Nagios and Monit.
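As an illustration, a minimal Monit control-file sketch might look like the fragment below (the service name, pidfile path and port are assumptions for the example, not taken from a real deployment):

```
set daemon 60                            # poll services every 60 seconds

check process my-api with pidfile /var/run/my-api.pid
    start program = "/etc/init.d/my-api start"
    stop program  = "/etc/init.d/my-api stop"
    # restart the service if its HTTP port stops answering
    if failed port 8080 protocol http then restart
    # stop retrying (and alert) after 5 restarts within 5 monitoring cycles
    if 5 restarts within 5 cycles then timeout
```

With a fragment like this per service, Monit restarts failing components automatically and alerts you before a single broken service turns into an outage.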

Monday, 24 August 2015

Moving Bioinformatics to the Cloud

We constantly see new technologies being developed and released to the public; for researchers who work with molecular biology, the ones that get our attention normally come from new laboratory methodologies or instruments. In this post we are going to talk about something different that is catching the attention of researchers in molecular biology, and more specifically in bioinformatics. I'm writing about a technological innovation that comes from the computational field and can have a great impact on how we do biological analysis with bioinformatics software.


A few years ago a cloud startup called dotCloud developed a new piece of software, Docker, intended only for internal use. The software was so successful that just two years after its public release, the newly formed Docker company had an estimated worth of $1bn.


What is Docker and Why Does it Matter?


Docker can be employed in several ways in different environments. What it does, basically, is provide the user with isolated, containerized software that can be executed apart from the host operating system. It is very similar to what a virtual machine does; the difference is that there is no guest operating system. Containers use some of the host's system libraries and apply abstraction layers to the execution of the software inside; in the end you have an isolated environment with custom software inside that can be shared.


What Does this Have to Do with Bioinformatics?


Imagine that you are a senior researcher, or even a recently accepted student, trying to learn how to do some analysis. You are a lab specialist, but computers are not your thing. Now imagine that the software you are trying to run needs a Linux operating system with gcc version 4.9.3 and some libraries like GD. Sounds bad, right? That's where Docker comes in. Docker allows developers to ship software inside a container, that is, a custom environment with all the necessary tools and configuration to run a specific program; all you have to do is download the container and execute the program inside. Running a Docker container is just as simple as running a program on the command line.
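As a sketch of what that looks like in practice (the image and tool names below are made up for illustration, not a real published container):

```shell
# Download a container image that already bundles the tool, the right
# compiler version and every library it needs
docker pull example/aligner

# Run the tool inside the container, sharing the current directory as /data
docker run --rm -v "$PWD:/data" example/aligner \
  aligner --input /data/reads.fastq --output /data/result.sam
```

Two commands, and no compilers or libraries were installed on the host at all.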


Benefits for Bioinformatics


For a bioinformatician this brings several other benefits. Something that is getting attention today is how to deal with reproducible research in the bioinformatics field. Different computers with different configurations, libraries and software versions can produce different results for the same analysis. If we had the chance to turn the environment from a variable into a constant, that problem would be greatly reduced.


The BioDocker Project


In 2014, a new project called BioDocker was founded. Recently, the project adopted a community-driven policy; the main idea is to get feedback from the community and to draw on the expertise of each member. The goal is to provide containerized bioinformatics tools to the general public. For bioinformatics developers, the project also provides specifications, settings and guidelines on how to produce your own BioDocker containers. By defining guidelines like these, we hope that the use of Docker becomes more common, helping people deal more easily with different software and reducing the problem of non-reproducible research.
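Using one of these containers follows the same pattern as any other Docker image; as an illustrative sketch (the image name here is an assumption for the example — check the project's pages for the actual published containers):

```shell
# Pull a containerized BLAST built following the BioDocker guidelines
docker pull biodckr/blast

# Run blastp from inside the container against files in the current directory
docker run --rm -v "$PWD:/data" biodckr/blast \
  blastp -query /data/proteins.fasta -db /data/db -out /data/hits.txt
```

Because everyone pulls the identical image, everyone runs the analysis in an identical environment.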


Wrapping up


Docker is a new technology that is gaining a lot of traction nowadays, and slowly it is gaining ground in the bioinformatics field as well. It is definitely worth taking some time to learn how to work with it.

Thursday, 13 August 2015

The future of Proteomics: The Consensus


After the big Nature papers about the human proteome [1][2], the proteomics community has been divided by the same well-known topics that divided genomics before: same reasons, same discussions [3-7]. No one disputes the technical issues, the instrument settings, the sample processing, or even the analytical method (most of both projects are "common" bottom-up experiments). The main issues are data-analysis problems and ongoing computational proteomics challenges.

Thursday, 23 October 2014

Which journals release more public proteomics data?

I'm a big fan of data and the -omics family. I also like the idea of making more and more of our data publicly available for others, not only for reuse, but also to guarantee the reproducibility and quality assessment of the results (Making proteomics data accessible and reusable: Current state of proteomics databases and repositories). I wondered which of these journals (list from http://scholar.google.co.uk/) encourage their submitters and authors to make their data publicly available:



Journal                                                        h5-index   h5-median
Molecular & Cellular Proteomics                                      74         101
Journal of Proteome Research                                         70          91
Proteomics                                                           60          76
Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics        52          78
Journal of Proteomics                                                49          60
Proteomics - Clinical Applications                                   35          43
Proteome Science                                                     23          32

Here is a simple statistic, based on PRIDE data:


[Figure: Number of PRIDE projects by journal]

Thursday, 4 September 2014

Quick Guide to the New Uniprot Web

UniProt is probably one of the most used and well-established services in bioinformatics worldwide. With more than 12 years of history, it is one of the major resources of biological information and the reference catalogue of protein sequences in the world. The aim of UniProt is to provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. It started in 2002, when the Swiss-Prot, TrEMBL and PIR protein database activities united to form the Universal Protein Knowledgebase (UniProt) consortium.
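One handy feature of the web site is that every entry is addressable by a simple URL, in several formats; as a sketch (accession P05067 chosen purely as an example):

```shell
# Fetch a UniProt entry in FASTA format directly from the website
curl -s "http://www.uniprot.org/uniprot/P05067.fasta"

# The same entry as a Swiss-Prot flat file, just by changing the extension
curl -s "http://www.uniprot.org/uniprot/P05067.txt"
```

This makes it easy to script downloads instead of clicking through the interface.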

Wednesday, 8 January 2014

Are you a Computational Biologist or Bioinformaticist or Bioinformatician?

A recent discussion was provoked on Twitter on January 8 regarding the preferred term for researchers working in the bioinformatics and computational biology fields.
This debate is older than people may think, and it looks like an insignificant topic, but when you are writing your CV or your internet profile, or you're looking for a new job, you will need a professional title, and that is really important. If you also look at the volume of discussion and opinions about this topic on the internet, you will realize that the community has different points of view. I've used some of my spare time to read different opinions about it in detail, and also to collect some of those opinions and articles. Let's see what terms are commonly used nowadays for these researchers:

Bioinformaticist, Bioinformatician, Computational Biologist, Digital Biologist, Bioinformatics Analyst


Thursday, 7 November 2013

News: JBioWH WebServices

We decided to develop a JBioWH web service that provides the JBioWH data over the internet. The source code is still under development, but you can test the server at:
 
Only the DataSet module is available; you can retrieve the dataset info using the server URL. The web service can send data in both XML and JSON.

We are open to developing any web service requested by users, so let me know your specific needs.

Regards 


Friday, 1 November 2013

Integrating the Biological Universe

Integrating biological data is perhaps one of the most daunting tasks any bioinformatician has to face. From a cursory look, it is easy to see two major obstacles standing in the way: (i) the sheer amount of existing data, and (ii) the staggering variety of resources and data types used by the different groups working in the field (reviewed at [1]). In fact, the topic of data integration has a long-standing history in computational biology and bioinformatics. A comprehensive picture of this problem can be found in recent papers [2], but this short comment will serve to illustrate some of the hurdles of data integration and as a not-so-shameless plug for our contribution towards a solution.

"Reflecting the data-driven nature of modern biology, databases have grown considerably both in size and number during the last decade. The exact number of databases is difficult to ascertain. While not exhaustive, the 2011 Nucleic Acids Research (NAR) online database collection lists 1330 published biodatabases (1), and estimates derived from the ELIXIR database provider survey suggest an approximate annual growth rate of ∼12% (2). Globally, the numbers are likely to be significantly higher than those mentioned in the online collection, not least because many are unpublished, or not published in the NAR database issue." [1]

Some basic concepts

Traditionally, biological database integration efforts come in three main flavors:
  • Federated: Sometimes termed portal, navigational or link integration, this approach is based on the use of hyperlinks to join data from disparate sources; early examples include SRS and Entrez. With the federated approach it is relatively easy to provide current, up-to-date information, but maintaining the hyperlinks requires considerable effort.
  • Mediated or View Integration: Provides a unified query interface and collects the results from various data sources (e.g. BioMart).
  • Warehouse: In this approach the different data sources are stored in one place; examples include BioWarehouse and JBioWH. While it provides faster querying over joined datasets, it also requires extra care to keep the underlying databases up to date.