Proteomics data analysis is in the middle of a major transition. We are moving from small experiments (e.g. a couple of RAW files or samples) to large-scale experiments. While the average number of RAW files per dataset in PRIDE hasn't grown in the last 6 years (Figure 1), multiple experiments now contain more than 1000 RAW files (Figure 1, right).
Figure 1: Box plot of the number of files per dataset in PRIDE (left: outliers removed; right: outliers included)
On the other hand, file size shows a trend towards larger RAW files (Figure 2).
Figure 2: Box plot of file size by dataset in PRIDE (outliers removed)
How, then, can proteomics data analysis be moved towards large-scale, elastic compute architectures such as cloud infrastructures or high-performance computing (HPC) clusters?