BigBio Notes: EBI


Tuesday, 25 November 2014

HUPO-PSI Meeting 2014: Rookie’s Notes

Standardisation: the most difficult flower to grow.

The PSI (Proteomics Standards Initiative) 2014 Meeting was held this year in Frankfurt (13-17 April), and I can say I’m now part of this history. First, I will try to describe in a couple of sentences (for sure I will fail) the incredible venue, the Schloss Reinhartshausen Kempinski. When I saw the hotel for the first time, the first thing that came to my mind was those films from the 50s. Everything was elegant, classic, sophisticated, from the decoration to the smallest latch. The food was incredible and the service was first class from the moment you set foot on the front step and throughout the whole stay.

Standardization is the process of developing and implementing technical standards. Standardization can help to maximize compatibility, interoperability, safety, repeatability, or quality. It can also facilitate commoditization of formerly custom processes. In bioinformatics, the standardization of file formats, vocabularies, and resources is a job that all of us appreciate but that, for several reasons, nobody wants to do. First of all, standardization in bioinformatics means that you need to organize and merge different experimental and in-silico pipelines to have a common way to represent the information. In proteomics, for example, you can use different sample preparation methods, combined with different fractionation techniques and different mass spectrometers, and finally different search engines and post-processing tools. This diversity of possible combinations is needed because it allows us to explore different solutions for complex problems. (Standarization in Proteomics: From raw data to metadata files).

What is BioHackathon 2014?

BioHackathon 2014 will start in a week (http://www.biohackathon.org/). It will be my first time at this kind of "meeting". I will give a talk about PRIDE and ProteomeXchange and the future developments of both platforms (the complete list of talks is below).

But first, a quick introduction to BioHackathon. The National Bioscience Database Center (NBDC) and the Database Center for Life Science (DBCLS) have been organizing an annual BioHackathon since 2008, mainly focusing on standardization (ontologies, controlled vocabularies, metadata) and interoperability of bioinformatics data and web services for improving integration (semantic web, web services, data integration), preservation and utilization of databases in life sciences. This year, we will focus on the standardization and utilization of human genome information with Semantic Web technologies in addition to our previous efforts on semantic interoperability and standardization of bioinformatics data and Web services.

Quick Guide to the New Uniprot Web

UniProt is probably one of the most used and well-established services in bioinformatics worldwide. With more than 12 years of history, it is one of the major resources of biological information and the reference catalog of protein sequences in the world. The aim of UniProt is to provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. It started in 2002, when the Swiss‐Prot, TrEMBL and PIR protein database activities united to form the Universal Protein Knowledgebase (UniProt) consortium.

Integrating the Biological Universe

Integrating biological data is perhaps one of the most daunting tasks any bioinformatician has to face. From a cursory look, it is easy to see two major obstacles standing in the way: (i) the sheer amount of existing data, and (ii) the staggering variety of resources and data types used by the different groups working in the field (reviewed in [1]). In fact, the topic of data integration has a long history in computational biology and bioinformatics. A comprehensive picture of this problem can be found in recent papers [2], but this short comment will serve to illustrate some of the hurdles of data integration and as a not-so-shameless plug for our contribution towards a solution.

"Reflecting the data-driven nature of modern biology, databases have grown considerably both in size and number during the last decade. The exact number of databases is difficult to ascertain. While not exhaustive, the 2011 Nucleic Acids Research (NAR) online database collection lists 1330 published biodatabases (1), and estimates derived from the ELIXIR database provider survey suggest an approximate annual growth rate of ∼12% (2). Globally, the numbers are likely to be significantly higher than those mentioned in the online collection, not least because many are unpublished, or not published in the NAR database issue." [1]

Some basic concepts

Traditionally, biological database integration efforts come in three main flavors:

Federated: Sometimes termed portal, navigational or link integration, it is based on the use of hyperlinks to join data from disparate sources; early examples include SRS and Entrez. Using the federated approach, it is relatively easy to provide up-to-date information, but maintaining the hyperlinks requires considerable effort.
Mediated or View Integration: Provides a unified query interface and collects the results from the various data sources at query time (for example, BioMart).
Warehouse: In this approach the different data sources are stored in one place; examples include BioWarehouse and JBioWH. While it provides faster querying over joined datasets, it also requires extra care to keep the local copies of the underlying databases fully up to date.
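
To make the difference between the three flavors a bit more concrete, here is a minimal Python sketch. It is only a toy under stated assumptions: the two in-memory dictionaries stand in for independent databases, and the accessions, example.org URLs and function names are hypothetical. None of it is the API of SRS, Entrez, BioMart, BioWarehouse or JBioWH.

```python
# Two toy "databases" keyed by a hypothetical protein accession.
# Real sources would be remote services or relational databases.
SEQUENCE_DB = {"P00001": "MKTAYIAK", "P00002": "MSDNELQ"}
FUNCTION_DB = {"P00001": "kinase", "P00003": "transporter"}


def federated_links(accession):
    """Federated (link) integration: hand back pointers to each source, not the data."""
    return {
        "sequence": f"https://sequences.example.org/entry/{accession}",
        "function": f"https://functions.example.org/entry/{accession}",
    }


def mediated_query(accession):
    """Mediated (view) integration: ask every source at query time and merge the answers."""
    return {
        "accession": accession,
        "sequence": SEQUENCE_DB.get(accession),
        "function": FUNCTION_DB.get(accession),
    }


def build_warehouse():
    """Warehouse integration: copy all sources into a single local store up front."""
    warehouse = {}
    for accession in sorted(set(SEQUENCE_DB) | set(FUNCTION_DB)):
        warehouse[accession] = {
            "sequence": SEQUENCE_DB.get(accession),
            "function": FUNCTION_DB.get(accession),
        }
    return warehouse


WAREHOUSE = build_warehouse()

# A source is updated after the warehouse was built.
FUNCTION_DB["P00002"] = "chaperone"

live = mediated_query("P00002")["function"]   # read from the source at query time
cached = WAREHOUSE["P00002"]["function"]      # read from the local copy
print(live, cached)  # prints: chaperone None
```

The last lines show the trade-off described above: the mediated query sees the new annotation immediately, while the warehouse keeps answering from its stale copy until it is rebuilt.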

BigBio Notes

Tuesday, 25 November 2014

HUPO-PSI Meeting 2014: Rookie’s Notes

Wednesday, 29 October 2014

What is BioHackathon 2014?

Thursday, 4 September 2014

Quick Guide to the New Uniprot Web

Friday, 1 November 2013

Integrating the Biological Universe