BigBio Notes: November 2013

Tuesday 12 November 2013

My List of Most Active Twitter Users in Proteomics

Recently, I published a list of the authors I consider most influential in computational proteomics. That list was built from my PhD references and other resources such as LinkedIn, Twitter and Google Scholar. I will try to do the same here for the most active Twitter accounts that I follow. Twitter can be incredibly powerful for both consuming and contributing to the dialogue in your field, and it can be an excellent real-time source of new publications, fresh developments, and current opinion. If you like and use Twitter, these are some of the accounts I follow in proteomics (in no particular order):

Thursday 7 November 2013

News: JBioWH WebServices

We have decided to develop a JBioWH web service that provides the JBioWH data over the internet. The source code is still under development, but you can test the server at:

http://hydrax.icgeb.trieste.it:8080/jbiowh-webservices/

Only the DataSet module is available so far, and you can retrieve the dataset info
using the server URL. The web service can send data in XML and JSON.
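
As a rough sketch of how a client might ask for one format or the other, the Python below builds a request and decodes the reply. Several things here are assumptions and not part of the published service: the `dataset` path is hypothetical, the format is assumed to be selected through the HTTP `Accept` header, and `build_request` and `parse_body` are illustrative helper names.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://hydrax.icgeb.trieste.it:8080/jbiowh-webservices/"


def build_request(path, fmt="json"):
    """Build a GET request that asks for JSON or XML via the Accept header.

    Assumption: the server negotiates the format from the Accept header.
    """
    accept = {"json": "application/json", "xml": "application/xml"}[fmt]
    return urllib.request.Request(BASE_URL + path, headers={"Accept": accept})


def parse_body(body, content_type):
    """Decode a response body according to its Content-Type."""
    if "json" in content_type:
        return json.loads(body)
    if "xml" in content_type:
        return ET.fromstring(body)
    raise ValueError(f"unexpected content type: {content_type}")


# "dataset" is a hypothetical path, used only for illustration.
request = build_request("dataset", fmt="json")
```

To actually contact the server, the request would be passed to `urllib.request.urlopen` and the body handed to `parse_body` together with the response's `Content-Type` header.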

We are open to developing any web services requested by users, so let me know
your specific needs.

Regards

Friday 1 November 2013

Integrating the Biological Universe

Integrating biological data is perhaps one of the most daunting tasks any bioinformatician has to face. Even a cursory look reveals two major obstacles standing in the way: (i) the sheer amount of existing data, and (ii) the staggering variety of resources and data types used by the different groups working in the field (reviewed in [1]). In fact, data integration has a long-standing history in computational biology and bioinformatics. A comprehensive picture of the problem can be found in recent papers [2], but this short comment will serve to illustrate some of the hurdles of data integration and as a not-so-shameless plug for our contribution towards a solution.

"Reflecting the data-driven nature of modern biology, databases have grown considerably both in size and number during the last decade. The exact number of databases is difficult to ascertain. While not exhaustive, the 2011 Nucleic Acids Research (NAR) online database collection lists 1330 published biodatabases (1), and estimates derived from the ELIXIR database provider survey suggest an approximate annual growth rate of ∼12% (2). Globally, the numbers are likely to be significantly higher than those mentioned in the online collection, not least because many are unpublished, or not published in the NAR database issue." [1]

Some basic concepts

Traditionally, biological database integration efforts come in three main flavors:

Federated: Sometimes termed portal, navigational or link integration, it is based on the use of hyperlinks to join data from disparate sources; early examples include SRS and Entrez. Using the federated approach, it is relatively easy to provide current, up-to-date information, but maintaining the hyperlinks requires considerable effort.
Mediated or View Integration: Provides a unified query interface and collects the results from the various data sources at query time; BioMart is an example.
Warehouse: In this approach the different data sources are stored in one place; examples include BioWarehouse and JBioWH. While it provides faster querying over joined datasets, it also requires extra care to keep the copies of the underlying databases fully up to date.
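
To make the difference between the mediated and warehouse flavors concrete, here is a minimal Python sketch. It is a toy under stated assumptions and not how BioMart or JBioWH are implemented: the two "sources", their records, and every function name are invented for illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Two toy "sources", invented for illustration only.
GENES = [("P1", "geneA"), ("P2", "geneB")]
PATHWAYS = [("P1", "glycolysis"), ("P2", "apoptosis")]


def mediated_query(protein_id):
    """Mediated style: ask each source at query time and merge the answers."""
    gene = next((g for p, g in GENES if p == protein_id), None)
    pathway = next((w for p, w in PATHWAYS if p == protein_id), None)
    return {"protein": protein_id, "gene": gene, "pathway": pathway}


def build_warehouse():
    """Warehouse style: copy every source into one local database up front."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE gene (protein TEXT, name TEXT)")
    db.execute("CREATE TABLE pathway (protein TEXT, name TEXT)")
    db.executemany("INSERT INTO gene VALUES (?, ?)", GENES)
    db.executemany("INSERT INTO pathway VALUES (?, ?)", PATHWAYS)
    return db


def warehouse_query(db, protein_id):
    """Answer the same question with a single join over the local copies."""
    row = db.execute(
        "SELECT g.protein, g.name, p.name FROM gene g "
        "JOIN pathway p ON p.protein = g.protein WHERE g.protein = ?",
        (protein_id,),
    ).fetchone()
    return {"protein": row[0], "gene": row[1], "pathway": row[2]}


warehouse = build_warehouse()
print(mediated_query("P1"))
print(warehouse_query(warehouse, "P1"))
```

Both calls return the same record. The trade-off shows up in where the work happens: the mediated version touches every source on each query, while the warehouse version pays the loading cost once and then has to be reloaded whenever a source changes.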