February 10, 2012

DNA Sequencing, the Firehose of Huge Data

As genome sequencing costs fall below $1,000 per genome, expect the rate of data generation from DNA sequencing to explode.

For example:
  • BGI is sequencing the equivalent of 2,000 human genomes per day.
  • 30,000 human genomes will have been sequenced by the end of 2012.
    For perspective, the world population is currently about 6.9 billion, and there are countless living organisms of other species.

The Human Microbiome Project, which is sequencing the microbial populations in the human digestive tract, has generated about a million times as much sequence data as a single human genome.
Moreover, DNA is only part of the story. To truly understand biology, researchers are also gathering data on the RNA, proteins and other chemicals in cells. That data can be even more voluminous than genomic data, and all of these different types of data have to be integrated.

