Big data: The power of petabytes

Researchers are struggling to analyse the steadily swelling troves of '-omic' data in the quest for patient-centred health care. Sequencing whole genomes, or even just exomes, at population scale produces a vast amount of data, perhaps up to 40 petabytes (40 million gigabytes) each year. Yet raw storage is not the primary computational concern.

A greater concern is the amount of variant data being analysed from each individual. As one researcher explains: “The computation scales linearly with respect to the number of people ... but as you add more variables, it becomes exponential as you start to look at different combinations.” This becomes particularly problematic when additional data on clinical symptoms or gene expression are layered in. Processing data of this magnitude from thousands of people can paralyse statistical-analysis tools that work adequately in a small laboratory study.
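The scaling contrast described above can be made concrete with a back-of-the-envelope calculation. The sketch below (with hypothetical cohort and variant counts, not figures from the article) counts per-variant tests, which grow linearly with cohort size, against the number of k-way variant combinations, which grows combinatorially:

```python
from math import comb

n_people = 10_000   # hypothetical cohort size
n_variants = 100    # hypothetical number of candidate variants

# Testing each variant independently scales linearly with the cohort:
linear_tests = n_people * n_variants
print(f"per-variant tests: {linear_tests:,}")

# Testing interactions among k variants grows combinatorially:
for k in (2, 3, 4):
    print(f"{k}-way combinations of {n_variants} variants: {comb(n_variants, k):,}")
```

Even with only 100 variants, four-way combinations already number in the millions, which is why adding variables, rather than adding people, is what overwhelms standard analysis tools.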
