Tuesday 25 November 2014

HUPO-PSI Meeting 2014: Rookie’s Notes

Standardisation: the most difficult flower to grow.
The PSI (Proteomics Standards Initiative) 2014 meeting was held this year in Frankfurt (13-17 April) and I can say I'm now part of this history. First, I will try to describe in a couple of sentences (I will surely fail) the incredible venue, the Schloss Reinhartshausen Kempinski. When I saw the hotel for the first time, the first thing that came to my mind was those films from the 50s. Everything was elegant, classic, sophisticated - from the decoration down to the smallest latch. The food was incredible and the service was first class from the moment you set foot on the front step and throughout the whole stay.
  
Standardization is the process of developing and implementing technical standards. Standardization can help to maximize compatibility, interoperability, safety, repeatability, or quality. It can also facilitate the commoditization of formerly custom processes. In bioinformatics, the standardization of file formats, vocabularies, and resources is a job that all of us appreciate but that, for several reasons, nobody wants to do. First of all, standardization in bioinformatics means that you need to organize and merge different experimental and in-silico pipelines into a common way of representing the information. In proteomics, for example, you can use different sample preparation methods, combined with different fractionation techniques and different mass spectrometers, and finally different search engines and post-processing tools. This diversity of possible combinations is needed because it allows exploring different solutions to complex problems. (Standardization in Proteomics: From raw data to metadata files).

PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

Slides Presentation:

 

Youtube Presentation:

 

Monday 24 November 2014

QC metrics into Progenesis QI for proteomics

Originally posted on NonLinear
Proteomics as a field is rapidly maturing; there is a real sense that established techniques are being improved, and excitement at emerging techniques for quantitation. Central to this revolution is the application of effective quality control (QC) – understanding what adversely affects proteomics data, monitoring for problems, and being able to pin down and address them when they arise.
We’ve been at the forefront of QC implementation over the years, from our early involvement in the Fixing Proteomics campaign to our staff (in a previous guise!) publishing on proteomics QC[1], and it’s an area that’s very important to us – we want you to have confidence in your data and your results, as well as our software.

Wednesday 29 October 2014

What is BioHackathon 2014?

In a week BioHackathon 2014 will start (http://www.biohackathon.org/). It will be my first time at this kind of "meeting". I will give a talk about PRIDE and ProteomeXchange and future developments of both platforms (the complete list of talks is below).

But first, a quick introduction to BioHackathon. The National Bioscience Database Center (NBDC) and the Database Center for Life Science (DBCLS) have been organizing the annual BioHackathon since 2008, mainly focusing on standardization (ontologies, controlled vocabularies, metadata) and interoperability of bioinformatics data and web services, in order to improve the integration (semantic web, web services, data integration), preservation and utilization of databases in the life sciences. This year, the focus will be on the standardization and utilization of human genome information with Semantic Web technologies, in addition to previous efforts on semantic interoperability and standardization of bioinformatics data and Web services.


Sunday 26 October 2014

Ontologies versus controlled vocabularies.

While minimum data standards describe the types of data elements to be captured, the use of standard vocabularies as values to populate the information about these data elements is also important to support interoperability. In many cases, groups develop term lists (controlled vocabularies) that describe which words and word phrases should be used as the values for a given data element. In the ideal case, each term is accompanied by a textual definition that describes what the term means, in order to support consistency in term use. However, many bioinformaticians have begun to develop and adopt ontologies that can serve in place of controlled vocabularies as the source of these allowed terms. Like a controlled vocabulary, an ontology is a domain-specific dictionary of terms and definitions. But an ontology also captures the semantic relationships between the terms, thus allowing logical inferencing about the entities represented by the ontology and by the data annotated using the ontology's terms.

The semantic relationships incorporated into the ontology represent universal relations between the classes represented by its terms, based on previously established knowledge about the entities described by those terms. An ontology is a representation of universals; it describes what is general in reality, not what is particular. Thus, ontologies describe classes of entities, whereas databases tend to describe instances of entities.

The Open Biomedical Ontologies (OBO) library was established in 2001 as a repository of ontologies developed for use by the biomedical research community (http://sourceforge.net/projects/obo). In some cases, the ontology is composed of a highly focused set of terms to support the data annotation needs of a specific model organism community (e.g. the Plasmodium Life Cycle Ontology). In other cases, the ontology covers a broader set of terms intended to provide comprehensive coverage of an entire life science domain (e.g. the Cell Type Ontology). The European Bioinformatics Institute has also developed the Ontology Lookup Service (OLS), which provides a web service interface to query multiple OBO ontologies from a single location with a unified output format (http://www.ebi.ac.uk/ontology-lookup/). Both the NCBO BioPortal and the OLS permit users to browse individual ontologies and search for terms across ontologies by term name and certain associated attributes.
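To make the difference concrete, here is a minimal sketch in Java (the term names and is_a links are made up for illustration and are not taken from any particular OBO ontology): a controlled vocabulary is essentially a flat list of allowed terms, while an ontology also stores relationships between terms (here only is_a), which is enough to support simple inference such as collecting all ancestors of a term.

import java.util.*;

public class OntologyVsVocabulary {

    public static void main(String[] args) {
        // A controlled vocabulary: just the list of allowed terms (ideally with definitions).
        Set<String> vocabulary = new HashSet<String>(Arrays.asList(
                "lymphocyte", "T cell", "CD4+ T cell"));
        System.out.println(vocabulary.contains("CD4+ T cell")); // a plain vocabulary can only say whether a term is allowed

        // An ontology additionally captures relationships between terms.
        // Only is_a (child -> parent) is modelled here; real ontologies have many relation types.
        Map<String, Set<String>> isA = new HashMap<String, Set<String>>();
        isA.put("CD4+ T cell", Collections.singleton("T cell"));
        isA.put("T cell", Collections.singleton("lymphocyte"));

        // Simple inference: a CD4+ T cell is also a lymphocyte, even though
        // only the direct is_a links were asserted.
        System.out.println(ancestors("CD4+ T cell", isA)); // [T cell, lymphocyte]
    }

    // Transitive closure over the is_a relation.
    static Set<String> ancestors(String term, Map<String, Set<String>> isA) {
        Set<String> result = new LinkedHashSet<String>();
        Deque<String> toVisit = new ArrayDeque<String>();
        if (isA.containsKey(term)) {
            toVisit.addAll(isA.get(term));
        }
        while (!toVisit.isEmpty()) {
            String parent = toVisit.poll();
            if (result.add(parent) && isA.containsKey(parent)) {
                toVisit.addAll(isA.get(parent));
            }
        }
        return result;
    }
}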

Thursday 23 October 2014

Which journals release more public proteomics data!!!

I'm a big fan of data and the -omics family. Also, I like the idea of making more and more of our data publicly available to others, not only for reuse, but also to guarantee the reproducibility and quality assessment of the results (Making proteomics data accessible and reusable: Current state of proteomics databases and repositories). I'm wondering which of these journals (list from http://scholar.google.co.uk/) encourage their submitters and authors to make their data publicly available:



Journal | h5-index | h5-median
Molecular & Cellular Proteomics | 74 | 101
Journal of Proteome Research | 70 | 91
Proteomics | 60 | 76
Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics | 52 | 78
Journal of Proteomics | 49 | 60
Proteomics - Clinical Applications | 35 | 43
Proteome Science | 23 | 32

After a simple count, based on PRIDE data:


Number of PRIDE projects by Journal

Saturday 4 October 2014

Analysis of histone modifications with PEAKS 7: A response to the search engine comparison from the PEAKS Team

Recently we posted a comparison of different search engines for PTM studies (Evaluation of Proteomic Search Engines for PTMs Identification). After some discussion of the results mentioned in our post, the PEAKS Team published a blog post with a reanalysis of the dataset. Here are the results:

Originally posted on the PEAKS blog:
The complex nature of histone modification patterns has posed a challenge for bioinformatics analysis over the years. Yuan et al. [1] conducted a study using two datasets from human HeLa histone samples to benchmark the performance of current proteomic search engines. The article was published in J Proteome Res. 2014 Aug 28 (PubMed), and the data from the two datasets, HCD_Histone and CID_Histone (PXD001118), was made publicly available through ProteomeXchange. With this data, the article uses eight different proteomic search engines to compare and evaluate the performance and capability of each. The evaluated search engines in this study are: pFind, Mascot, SEQUEST, ProteinPilot, PEAKS 6, OMSSA, TPP and MaxQuant.
In this study, PEAKS 6 was used to compare the performance capabilities between search engines. However, PEAKS 7, which was released in November 2013, is the latest available version of the PEAKS Studio software. PEAKS 7 not only performs better than PEAKS 6, but also includes many additional and improved features. Our team has reanalyzed the two datasets, HCD_Histone and CID_Histone, with PEAKS 7 to update the ID results presented in the publication by Yuan et al. These updated results showed that it is instead PEAKS, pFind and Mascot that identify the most confident results.

Wednesday 17 September 2014

Who is a senior developer anyway?

What makes you a “senior developer”? Everyone and their dog calls themselves a senior developer these days. From fresh graduates to the CTO, everyone is a senior developer. But what the hell does it even mean?

Technologists

Some developers are avid technologists. They got into programming really because they like tinkering. If it hadn't been 7 languages in 7 weeks, it would have been a box of Meccano, or they'd be in their shed busy inventing the battery-operated self-tying tie. These people are great to have on your team; they'll be the ones continually bombarding you with the latest and greatest shiny. If you ever want to know if there's an off-the-shelf solution to your problem, they'll know the options, have tried two of them, and currently have a modified version of a third running on their Raspberry Pi.

The trouble with technologists is that more technology is always the answer. Why have an HTTP listener when you can have a full-stack application server? Why use plain old TCP when you can introduce an asynchronous messaging backbone? Why bother trying to deliver software when there are all these toys to play with!

Tuesday 16 September 2014

Installing standalone SpectraST in linux

Some tips for installing SpectraST standalone on Linux:

1. -  Download the latest TPP version.
2. -  Go to the SpectraST folder and run make:
     
    cd TPP-x.x.x/trans_proteomic_pipeline/src/Search/SpectraST
    make -f Makefile_STANDALONE_LINUX

Sunday 14 September 2014

ProteoWizard: The chosen one in RAW file conversion

I'm the chosen one.
After five years in proteomics and a quick walk through different computational proteomics topics, such as database analysis, proteomics repositories and databases, or identification algorithms, I'm sure that the most painful and thankless job is working with file formats: writing, reading, and dealing with end-users.

File formats (the way we represent, store, and exchange our data) are a fundamental piece of bioinformatics; more than that, they are one of the milestones of the Information Era. In some fields the topic is more settled than in others, but it is still on the table for most of us. For a quick idea, look at the evolution of general-purpose standards in recent years, such as XML, JSON and, more recently, YAML.

Wednesday 10 September 2014

New Release of Spectronaut™ 6.0 from Biognosys

Biognosys releases Spectronaut™ 6.0 

Every researcher with access to a high-resolution mass spectrometer can now benefit from the Spectronaut™ software.

September 10, 2014 – Zurich-Schlieren (CH) – Biognosys AG, a Swiss proteomics company, announced today the next release of its Spectronaut™ software for the analysis of Hyper Reaction Monitoring (HRM) data, which will now also be available to industry partners upon request. HRM-MS™ is a targeted proteomics technology developed by Biognosys that enables reproducible and accurate quantification of thousands of proteins in a single instrument run. HRM is based on data-independent acquisition (DIA or SWATH), which can be performed on most state-of-the-art high-resolution mass spectrometric systems. Founded in 2008 as a spin-off from the lab of proteomics pioneer Ruedi Aebersold at ETH Zurich, Biognosys is dedicated to transforming life science with superior technology and software.

Contact: info@biognosys.ch, www.biognosys.ch

Proteomics & personalized medicine Issue in Proteomics

A new issue of Proteomics, edited by René P. Zahedi et al., was recently published on proteomics and personalized medicine. This Focus Issue comprises a total of eight valuable contributions from various experts in the field of proteomics research, ranging from method development and optimisation to applications dealing with complex samples in biomedical research. Urbani et al. report direct analytical assessment of sample quality for biomarker investigation. They pinpoint the impact of pre-analytical variables that cause major errors in clinical testing. Marko-Varga et al. describe the usage of MALDI imaging as a novel tool for personalised diagnostics, as they follow drug action upon treatment of malignant melanoma. Selheim et al. established a novel super-SILAC mix for acute myeloid leukemia (AML) and demonstrate its usage as an internal standard for personalized proteomics of AML patients. Jiang et al. demonstrate how SILAC can be utilized to investigate the secretome of activated hepatic stellate cells, the main fibroblast cell type in liver fibrosis. This is an important step towards a better understanding of cellular mechanisms during the recovery of liver fibrosis. Borchers et al. introduce novel software for fast analysis of large datasets derived from crosslinking experiments in order to study protein-protein interactions from large-scale experiments. Gevaert et al. present a technology that allows studying the specificity of methionine sulfoxide reductases and apply it to human samples. The oxidation of free and protein-bound methionine into methionine sulfoxide is a frequently occurring modification caused by reactive oxygen species. This modification may interfere with the identification of posttranslational modifications such as protein phosphorylation, as well as with peptide identification itself. Mechtler et al. push technology development forward to ultra-low-flow nanoHPLC separations. This technology allows obtaining comprehensive proteomic data from less than 100 ng of protein starting material. Finally, Shen et al. demonstrate a rapid and reproducible, one-dimensional, fast and quantitative LC-MS/MS technology that avoids time- and sample-consuming prefractionation strategies.







Monday 8 September 2014

Evaluation of Proteomic Search Engines for PTMs Identification

The peptide-centric MS strategy is called bottom-up, in which proteins are extracted from cells, digested into peptides with proteases, and analyzed by liquid chromatography tandem mass spectrometry (LC−MS/MS). More specifically, peptides are resolved by chromatography, ionized in mass spectrometers, and scanned to obtain full MS spectra. Next, some high-abundance peptides (precursor ions) are selected and fragmented to obtain MS/MS spectra by higher-energy C-trap dissociation (HCD) or collision-induced dissociation (CID).

Then, peptides are commonly identified by searching the MS/MS spectra against a database and finally assembled into identified proteins. Database searching plays an important role in proteomics analysis because it can be used to translate thousands of MS/MS spectra into protein identifications (IDs). 

Many database search engines have been developed to quickly and accurately analyze large volumes of proteomics data. Some of the more well-known search engines are Mascot, SEQUEST, PEAKS DB, ProteinPilot, Andromeda, and X!Tandem. Here is a list of commonly used search engines in proteomics and mass spectrometry.

Sunday 7 September 2014

Start a startup or Work for someone else?

Originally posted on P4P:

When you look online for advice about entrepreneurship, you will see a lot of "just do it": 
The best way to get experience... is to start a startup. So, paradoxically, if you're too inexperienced to start a startup, what you should do is start one. That's a way more efficient cure for inexperience than a normal job. - Paul Graham, Why to Not Not Start a Startup
There is very little you will learn in your current job as a {consultant, lawyer, business person, economist, programmer} that will make you better at starting your own startup. Even if you work at someone else’s startup right now, the rate at which you are learning useful things is way lower than if you were just starting your own. -  David Albert, When should you start a startup?
This advice almost never comes with citations to research or quantitative data, from which I have concluded:
The sort of person who jumps in and gives advice to the masses without doing a lot of research first generally believes that you should jump in and do things without doing a lot of research first. 

Friday 5 September 2014

NEW NIST 2014 mass spectral library

Originally posted in NIST 2014.

Identify your mass spectra with the new NIST 14 Mass Spectral Library and Search Software.

NIST 14 - the successor to NIST 11 (2011) - is a collection of:


  • Electron ionization (EI) mass spectra
  • Tandem MS/MS spectra (ion trap and collision cell)
  • GC method and retention data
  • Chemical structures and names
  • Software for searching and identifying your mass spectra

NIST 14 is integrated with most mass spectral data systems, including Agilent ChemStation/MassHunter, Thermo Xcalibur, and others. The NIST Library is known for its high quality, broad coverage, and accessibility. It is a product of a three-decade, comprehensive evaluation and expansion of the world's most widely used and trusted mass spectral reference library, compiled by a team of experienced mass spectrometrists in which each spectrum was examined for correctness.


Improvements from 2011 version:


  • Increased coverage in all libraries: 32,355 more EI spectra; 138,875 more MS/MS spectra; 37,706 more GC data sets
  • Retention index usable in spectral match scoring
  • Improved derivative naming, user library features, links to InChIKey, and other metadata.
  • Upgrade discount for any previous version
  • Lowest Agilent format price available

MS/MS and GC libraries may now be optionally purchased separately at very low cost
Learn what's new: http://www.sisweb.com/software/ms/nist.htm#whatsnew


Thursday 4 September 2014

Quick Guide to the New Uniprot Web

UniProt is probably one of the most used and well-established services in bioinformatics worldwide. With more than 12 years of history, it is one of the major resources of biological information and the reference catalogue of protein sequences in the world. The aim of UniProt is to provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. It started in 2002, when the Swiss-Prot, TrEMBL and PIR protein database activities were united to form the Universal Protein Knowledgebase (UniProt) consortium.

Tuesday 26 August 2014

Adding CITATION to your R package

Original post from Robin's Blog:

Software is very important in science – but good software takes time and effort that could be used to do other work instead. I believe that it is important to do this work – but to make it worthwhile, people need to get credit for their work, and in academia that means citations. However, it is often very difficult to find out how to cite a piece of software – sometimes it is hidden away somewhere in the manual or on the web-page, but often it requires sending an email to the author asking them how they want it cited. The effort that this requires means that many people don’t bother to cite the software they use, and thus the authors don’t get the credit that they need. We need to change this, so that software – which underlies a huge amount of important scientific work – gets the recognition it deserves.

Making Your Code Citable

Original post from GitHub Guides:

Digital Object Identifiers (DOI) are the backbone of the academic reference and metrics system. If you’re a researcher writing software, this guide will show you how to make the work you share on GitHub citable by archiving one of your GitHub repositories and assigning a DOI with the data archiving tool Zenodo.
ProTip: This tutorial is aimed at researchers who want to cite GitHub repositories in academic literature. Provided you've already set up a GitHub repository, this tutorial can be completed without installing any special software. If you haven't yet created a project on GitHub, start first by uploading your work to a repository.

Wednesday 20 August 2014

ProteoStats: Computing false discovery rates in proteomics

By Amit K. Yadav (@theoneamit) & Yasset Perez-Riverol (@ypriverol):

Perl is a legacy language thought to be abstruse by many modern programmers. I'm passionate about the idea of not letting a programming language such as Perl die. Even though the language is used less in computational proteomics, it is still widely used in bioinformatics. I'm enthusiastic about writing about new open-source Perl libraries that can be easily used. Two years ago, I wrote a post about InSilicoSpectro and how it can be used to study protein databases, like I did in "In silico analysis of accurate proteomics, complemented by selective isolation of peptides".

Today’s post is about ProteoStats [1], a Perl library for False Discovery Rate (FDR) related calculations in proteomics studies. Some background for non-experts:

One of the central and most widely used approaches in shotgun proteomics is the use of database search tools to assign spectra to peptides (called Peptide Spectrum Matches, or PSMs). To evaluate the quality of the assignments, these programs need to calculate and correct for population-wise error rates to keep the number of false positives under control. In that sense, the best strategy to control false positives is the target-decoy approach. Originally proposed by Elias & Gygi in 2007, the so-called classical FDR strategy involved a concatenated target-decoy (TD) database search for FDR estimation. This calculation is either done by the search engine or using scripts (in-house, non-published, not benchmarked, with different implementations).

So far, the only library developed to compute FDR at the spectrum, peptide and protein levels is MAYU [2]. But while MAYU only uses the classical FDR approach, ProteoStats provides options for five different strategies for calculating the FDR. The only prerequisite is that you need to search using a separate TD database, as proposed by Käll et al. (2008) [3]. Also, ProteoStats provides a programming interface that can read the native output from the most widely used search tools and provide FDR-related statistics. For tools that are not supported, pepXML, which has become a de facto standard output format, can be read directly, along with tabular text-based formats like TSV and CSV (or any other well-defined separator).
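For readers who are not familiar with Perl, here is a minimal sketch of the two estimators mentioned above (this is not the ProteoStats API, just the textbook formulas): for a concatenated target-decoy search the classical Elias & Gygi estimate is FDR = 2D / (T + D), and for a separate decoy search (Käll et al., 2008) it is FDR = D / T, where T and D are the numbers of target and decoy PSMs accepted above a given score threshold.

public class TargetDecoyFdr {

    // Classical Elias & Gygi (2007) estimate for a concatenated target-decoy search:
    // each decoy hit is assumed to be mirrored by one incorrect hit among the target PSMs.
    static double concatenatedFdr(int targetPsms, int decoyPsms) {
        return 2.0 * decoyPsms / (targetPsms + decoyPsms);
    }

    // Estimate for a separate target and decoy search (Kall et al., 2008):
    // the number of decoy hits directly estimates the number of incorrect target hits.
    static double separateFdr(int targetPsms, int decoyPsms) {
        return (double) decoyPsms / targetPsms;
    }

    public static void main(String[] args) {
        // Hypothetical counts of PSMs above some score threshold.
        int targets = 9500;
        int decoys = 250;
        System.out.printf("Concatenated TD FDR: %.4f%n", concatenatedFdr(targets, decoys));
        System.out.printf("Separate TD FDR:     %.4f%n", separateFdr(targets, decoys));
    }
}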

Sunday 8 June 2014

Thesis: Development of computational methods for analysing proteomic data for genome annotation

Thesis by Markus Brosch (2009) on computational methods for analysing proteomic data for genome annotation.

Notes from Abstract

Proteomic mass spectrometry is a method that enables sequencing of gene product fragments, allowing the validation and refinement of existing gene annotation as well as the detection of novel protein-coding regions. However, the application of proteomics data to genome annotation is hindered by the lack of suitable tools and methods to achieve automatic data processing and genome mapping at high accuracy and throughput.

In the first part of this project I evaluate the scoring schemes of “Mascot”, which is a peptide identification software that is routinely used, for low and high mass accuracy data and show these to be not sufficiently accurate. I develop an alternative scoring method that provides more sensitive peptide identification specifically for high accuracy data, while allowing the user to fix the false discovery rate. Building upon this, I utilise the machine learning algorithm “Percolator” to further extend my Mascot scoring scheme with a large set of orthogonal scoring features that assess the quality of a peptide-spectrum match. 

To close the gap between high throughput peptide identification and large scale genome annotation analysis I introduce a proteogenomics pipeline. A comprehensive database is the central element of this pipeline, enabling the efficient mapping of known and predicted peptides to their genomic loci, each of which is associated with supplemental annotation information such as gene and transcript identifiers.

In the last part of my project the pipeline is applied to a large mouse MS dataset. I show the value and the level of coverage that can be achieved for validating genes and gene structures, while also highlighting the limitations of this technique. Moreover, I show where peptide identifications facilitated the correction of existing annotation, such as re-defining the translated regions or splice boundaries. 

Moreover, I propose a set of novel genes that are identified by the MS analysis pipeline with high confidence, but largely lack transcriptional or conservational evidence.



Java Optimization Tips (Memory, CPU Time and Code)


There are several common optimization techniques that apply regardless of the language being used. Some of these techniques, such as global register allocation, are sophisticated strategies to allocate machine resources (for example, CPU registers) and don't apply to Java bytecodes. We'll focus on the techniques that basically involve restructuring code and substituting equivalent operations within a method.  

EntrySet vs KeySet
-----------------------------------------

More efficient:

for (Map.Entry entry : map.entrySet()) {
    Object key = entry.getKey();
    Object value = entry.getValue();
}

than iterating over the key set, which performs an extra map lookup (map.get) for every key:

for (Object key : map.keySet()) {
    Object value = map.get(key);
}


Avoid creating threads without run methods
------------------------------------

Usage Example: 

public class Test
{
 public void method() throws Exception
 {
  new Thread().start();  //VIOLATION
 }
}
Should be written as:

public class Test
{
 public void method(Runnable r) throws Exception
 {
  new Thread(r).start();  //FIXED
 }
}

Initialise the ArrayList if you know the size in advance
--------------------------------------------
 
For example, use this code if you expect your ArrayList to store around 1000 objects:

List<String> list = new ArrayList<String>(1000);


Use ternary operators
----------------------------------------

class Use_ternary_operator_correction
{
 public boolean test(String value)
 {
  if(value.equals("AppPerfect"))  // VIOLATION
  {
   return true;
  }
  else
  {
   return false;
  }
 }
}

Should be written as:


class Use_ternary_operator_correction
{
 public boolean test(String value)
 {
  return value.equals("AppPerfect"); // CORRECTION
 }
}


Always declare constant fields static

public class Always_declare_constant_field_static_violation
{
 final int MAX = 1000; // VIOLATION
 final String NAME = "Noname"; // VIOLATION
}

Should be written as:

public class Always_declare_constant_field_static_correction
{
 static final int MAX = 1000; // CORRECTION
 static final String NAME = "Noname"; // VIOLATION
}

Sunday 6 April 2014

SWATH-MS and next-generation targeted proteomics

For proteomics, two main LC-MS/MS strategies have been used thus far. They have in common that the sample proteins are converted by proteolysis into peptides, which are then separated by (capillary) liquid chromatography. They differ in the mass spectrometric method used.

The first and most widely used strategy is known as shotgun proteomics or discovery proteomics. For this method, the MS instrument is operated in data-dependent acquisition (DDA) mode, where fragment ion (MS2) spectra for selected precursor ions detectable in a survey (MS1) scan are generated (Figure 1 - Discovery workflow). The resulting fragment ion spectra are then assigned to their corresponding peptide sequences by sequence database searching (See Open source libraries and frameworks for mass spectrometry based proteomics: A developer's perspective).

The second main strategy is referred to as targeted proteomics. There, the MS instrument is operated in selected reaction monitoring (SRM) mode (also called multiple reaction monitoring) (Figure 1 - Targeted Workflow). With this method, a sample is queried for the presence and quantity of a limited set of peptides that have to be specified prior to data acquisition. SRM does not require the explicit detection of the targeted precursors but proceeds by the acquisition, sequentially across the LC retention time domain, of predefined pairs of precursor and product ion masses, called transitions, several of which constitute a definitive assay for the detection of a peptide in a complex sample (see Targeted Proteomics).

Figure 1 - Discovery and Targeted proteomics workflows
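As a rough illustration of the transitions mentioned above (the m/z and retention-time values below are invented, not taken from any real assay), each transition is simply a predefined precursor/product ion m/z pair, and a handful of them, monitored across the expected retention-time window, make up the SRM assay for one peptide:

import java.util.Arrays;
import java.util.List;

public class SrmAssaySketch {

    // One transition: a predefined pair of precursor and product ion m/z values.
    static class Transition {
        final double precursorMz;
        final double productMz;

        Transition(double precursorMz, double productMz) {
            this.precursorMz = precursorMz;
            this.productMz = productMz;
        }
    }

    public static void main(String[] args) {
        // Hypothetical assay for one target peptide: several transitions sharing the same
        // precursor, monitored over the expected retention-time window (in minutes).
        double rtStart = 42.0;
        double rtEnd = 44.5;
        List<Transition> assay = Arrays.asList(
                new Transition(523.77, 684.35),
                new Transition(523.77, 797.43),
                new Transition(523.77, 894.48));

        System.out.println(assay.size() + " transitions for precursor m/z "
                + assay.get(0).precursorMz + ", monitored between "
                + rtStart + " and " + rtEnd + " min");
    }
}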

Monday 3 March 2014

Most read from the Journal of Proteome Research for 2013.

1- Protein Digestion: An Overview of the Available Techniques and Recent
    Developments

    Linda Switzar, Martin Giera, Wilfried M. A. Niessen

    DOI: 10.1021/pr301201x

2-  Andromeda: A Peptide Search Engine Integrated into the MaxQuant
     Environment

     Jürgen Cox, Nadin Neuhauser, Annette Michalski, Richard A. Scheltema, Jesper
     V. Olsen, Matthias Mann

     DOI: 10.1021/pr101065j

3-  Evaluation and Optimization of Mass Spectrometric Settings during
     Data-dependent Acquisition Mode: Focus on LTQ-Orbitrap Mass Analyzers
 
     Anastasia Kalli, Geoffrey T. Smith, Michael J. Sweredoski, Sonja Hess

     DOI: 10.1021/pr3011588

4-  An Automated Pipeline for High-Throughput Label-Free Quantitative
     Proteomics

     Hendrik Weisser, Sven Nahnsen, Jonas Grossmann, Lars Nilse, Andreas Quandt,
     Hendrik Brauer, Marc Sturm, Erhan Kenar, Oliver Kohlbacher, Ruedi Aebersold,
     Lars Malmström

     DOI: 10.1021/pr300992u

5-  Proteome Wide Purification and Identification of O-GlcNAc-Modified Proteins
     Using Click Chemistry and Mass Spectrometry

     Hannes Hahne, Nadine Sobotzki, Tamara Nyberg, Dominic Helm, Vladimir S.
     Borodkin, Daan M. F. van Aalten, Brian Agnew, Bernhard Kuster

     DOI: 10.1021/pr300967y

6-  A Proteomics Search Algorithm Specifically Designed for High-Resolution
     Tandem Mass Spectra

     Craig D. Wenger, Joshua J. Coon
   
     DOI: 10.1021/pr301024c

7- Analyzing Protein–Protein Interaction Networks

    Gavin C. K. W. Koh, Pablo Porras, Bruno Aranda, Henning Hermjakob, Sandra E.
    Orchard

    DOI: 10.1021/pr201211w

8-  Combination of FASP and StageTip-Based Fractionation Allows In-Depth
     Analysis of the Hippocampal Membrane Proteome

     Jacek R. Wisniewski, Alexandre Zougman, Matthias Mann

     DOI: 10.1021/pr900748n

9-  The Biology/Disease-driven Human Proteome Project (B/D-HPP): Enabling
     Protein Research for the Life Sciences Community

     Ruedi Aebersold, Gary D. Bader, Aled M. Edwards, Jennifer E. van Eyk, Martin
     Kussmann, Jun Qin, Gilbert S. Omenn

     DOI: 10.1021/pr301151m

10-  Comparative Study of Targeted and Label-free Mass Spectrometry Methods
      for Protein Quantification

       Linda IJsselstijn, Marcel P. Stoop, Christoph Stingl, Peter A. E. Sillevis Smitt,
       Theo M. Luider, Lennard J. M. Dekker

       DOI: 10.1021/pr301221f

Wednesday 19 February 2014

In the era of science communication, why do you need Twitter, a professional blog and ImpactStory?

Where is the information? Where are the scientifically relevant results? Where are the good ideas? Are these things (only) in journals? I usually prefer to write about bioinformatics and how we should include, annotate and cite our bioinformatics tools inside research papers (The importance of Package Repositories for Science and Research, The problem of in-house tools); but this post represents my take on the future of scientific publications and their dissemination based on the manuscript “Beyond the paper” (1).

In the not too distant future, today's science journals will be replaced by a set of decentralized, interoperable services that are built on a core infrastructure of open data and evolving standards, like the Internet itself. What the journal did in the past for a single article, social media and internet resources are now doing for the entire scholarly output. We are now immersed in a transition to another science communication system, one that will tap Web technology to significantly improve dissemination. I prefer to represent the future of science communication by a block diagram where the four main components, (i) Data, (ii) Publications, (iii) Dissemination and (iv) Certification/Reward, are completely interconnected:

Friday 7 February 2014

Solving Invalid signature in JNLP

I get this error each time I run my JNLP:

invalid SHA1 signature file digest for

I found some discussions about possible solutions:

http://stackoverflow.com/questions/8176166/invalid-sha1-signature-file-digest

http://stackoverflow.com/questions/11673707/java-web-start-jar-signing-issue

But the problem was still there. I solved it using the plugin option <unsignAlreadySignedJars>true</unsignAlreadySignedJars> and removing previous signatures to avoid possible signature duplication:



  <plugin>
     <groupId>org.codehaus.mojo.webstart</groupId>
       <artifactId>webstart-maven-plugin</artifactId>
         <executions>
           <execution>
             <id>jnlp-building</id>
             <phase>package</phase>
               <goals>
                 <goal>jnlp</goal>
               </goals>
            </execution>
         </executions>
         <configuration>
           <!-- Include all the dependencies -->
           <excludeTransitive>false</excludeTransitive>
           <unsignAlreadySignedJars>true</unsignAlreadySignedJars>
           <verbose>true</verbose>
           <verifyjar>true</verifyjar>
           <!-- The path where the libraries are stored -->
           <libPath>lib</libPath>
           <jnlp>
             <inputTemplate>webstart/jnlp-template.vm</inputTemplate>
             <outputFile>ProteoLimsViewer.jnlp</outputFile>
             <mainClass>cu.edu.cigb.biocomp.proteolims.gui.ProteoLimsViewer</mainClass>
           </jnlp>
           <sign>
             <keystore>keystorefile</keystore>
             <alias>proteolimsviewer</alias>
             <storepass>password</storepass>
             <keypass>password</keypass>
             <keystoreConfig>
               <delete>false</delete>
               <gen>false</gen>
             </keystoreConfig>
           </sign>
              <!-- building process -->
              <pack200>false</pack200>
              <verbose>true</verbose>
         </configuration>
     </plugin>

Wednesday 22 January 2014

What is a bioinformatician

By Anthony Fejes, originally posted on blog.fejes.ca

I've been participating in an interesting conversation on LinkedIn, which has re-opened the age-old question of what is a bioinformatician, which was inspired by a conversation on Twitter that was later blogged. Hopefully I've gotten that chain down correctly.

In any case, it appears that there are two competing schools of thought.  One is that bioinformatician is a distinct entity, and the other is that it’s a vague term that embraces anyone and anything that has to do with either biology or computer science.  Frankly, I feel the second definition is a waste of a perfectly good word, despite being a commonly accepted method.


Monday 20 January 2014

Some of the most cited manuscripts in Proteomics and Computational Proteomics (2013)

Some of the most cited manuscripts in 2013 in the field of Proteomics and Computational Proteomics (no order):







     The PRoteomics IDEntifications (PRIDE, http://www.ebi.ac.uk/pride) database 
     at the European Bioinformatics Institute is one of the most prominent data 
     repositories of mass spectrometry (MS)-based proteomics data. Here, we 
     summarize recent developments in the PRIDE database and related tools. 
     First, we provide up-to-date statistics in data content, splitting the figures by 
     groups of organisms and species, including peptide and protein 
     identifications, and post-translational modifications. We then describe the 
     tools that are part of the PRIDE submission pipeline, especially the recently 
     developed PRIDE Converter 2 (new submission tool) and PRIDE Inspector 
     (visualization and analysis tool). We also give an update about the integration 
     of PRIDE with other MS proteomics resources in the context of the 
     ProteomeXchange consortium. Finally, we briefly review the quality control 
     efforts that are ongoing at present and outline our future plans.

Wednesday 8 January 2014

Are you a Computational Biologist or Bioinformaticist or Bioinformatician?

A recent discussion was provoked on Twitter on January 8 regarding the preferred term for referring to researchers working in the bioinformatics and computational biology fields.
This debate is older than people may think and it looks like an insignificant topic, but when you are writing your CV or your internet profile, or you're looking for a new job, you will need a professional title, and that is really important. If you also look at the volume of discussion and opinions about this topic on the internet, you will realize that the community has different points of view. I've used some of my spare time to read different opinions about it in detail, and also to collect some of those opinions and articles. Let's see what common terms are used nowadays for these researchers:

Bioinformaticist, Bioinformatician, Computational Biologist, Digital biologist, bioinformatics analyst


My Formula as a Bioinformatician

Every day, I enjoy reading about bioinformatics on blogs, LinkedIn, and Twitter, apart from my daily reading of journal manuscripts. I strongly think that the future of publications/science will move closer and closer to the open-access style and this emergent way of publishing your ideas quickly and briefly in your own space. Some of my old co-workers don't understand this way of getting in touch with science through informal environments rather than arbitrary/supervised spaces; I just tell them: we make the future, not the past. Reading the popular post "A guide for the lonely bioinformatician", I was thinking about the last three years and how I have built my own formula to survive as a lonely bioinformatician in a small country, with a lousy internet connection and without a bioinformatics environment.

All the bioinformaticians that I have met during these three years can be categorized into three major groups according to their original background:

1)    MDs, Biologists, Biochemists, Chemists
2)    Physicists, Mathematicians, Computer Scientists, Software Engineers, Software
       Developers
3)    Philosophers, *

As an embryonic and growing field, the diversity is huge, so it is quite complex to express all the behaviour in one model or formula. Here I will summarize some of the variables of my formula, closely aligned with the original post's suggestions:

News: The first version of PRIDE Inspector 2.0 is now available

PRIDE Inspector 2.0 is an integrated desktop application for MS proteomics data analysis and visualization.
- The current version supports PRIDE XML and mzIdentML, as well as direct access to the PRIDE public database.
- The new version also supports mass spectra formats such as mzXML, MGF, PKL, MS2 and DTA.
- New features include: fragmentation visualization, protein and peptide group visualization, visualization of peptide and protein properties (scores, pI, etc.), new chart options, and others.



Links: