The 1,000 Genomes Project is to benefit from a powerful new computational tool which can analyse half a million DNA sequences within ten minutes.
The tool, which uses an innovative statistical technique to analyse genetics data faster and more accurately than previous methods, should allow scientists to detect more subtle genetic variations at a lower cost.
Over the last five years, the experimental technology used to obtain genetic sequences has massively improved. Whereas it took 13 years to obtain the first fully sequenced human genome, scientists now plan to sequence 1,000 more human genomes within the next three years, to find the subtle genetic variations between different human beings.
One of these new technologies is pyrosequencing, which produces longer reads (250 base pairs, compared with 35 for other methods). However, these new techniques generate enormous amounts of data, so scientists are continually looking for ways to analyse it faster and more accurately than ever before.
"We're on the edge of a real technological revolution that I think will help us understand the genetic causes of diseases in humans and how genetic materials determine traits in animals," said Gabor Marth, a member of the 1,000 Genomes Project from Boston College in the USA. "It is going to lead to less expensive technologies that will allow researchers to decode any individual."
Marth and his team at Boston College believe their new PyroBayes software considerably improves on previous techniques. The software has already been applied to pyrosequencing data from the Roche/454 Life Sciences machine, with promising results. In a Nature Methods article by Marth, his post-doctoral researcher Chip Stewart and graduate students Aaron Quinlan and Mike Strömberg, the team showed that the technique is faster and more accurate than other methods.
"PyroBayes is able to base-call pyrosequencing reads from the 454 Life Sciences sequencing machine so efficiently because of the empirical model that we developed, as well as an efficient C++ software implementation," said Aaron Quinlan. "It differs from the base-calling software that comes with the 454 machines in the statistical models that it uses to decide how many nucleotides to call in each cycle of the sequencing machine and what the accuracy of each base is."
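To illustrate the kind of decision Quinlan describes, here is a minimal sketch of Bayesian flow calling in Python. In pyrosequencing, each cycle (or flow) offers a single nucleotide, and the light signal is roughly proportional to how many identical bases are incorporated. The sketch assumes a hypothetical Gaussian noise model (the parameters `sd0` and `sd1` and the function names are illustrative) and is not PyroBayes's actual empirical model: it combines a prior over homopolymer lengths with the likelihood of the observed signal, calls the most probable length, and converts the leftover posterior probability into a Phred-scaled quality.

```python
import math

# Hypothetical noise model, NOT PyroBayes's empirical model:
# a flow incorporating n identical bases yields a signal ~ Normal(n, sd0 + sd1 * n).
def log_likelihood(signal, n, sd0=0.1, sd1=0.05):
    sd = sd0 + sd1 * n
    z = (signal - n) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)

def call_flow(signal, prior):
    """Call the homopolymer length for one flow.

    prior maps each candidate length n to P(n).
    Returns (called_length, posterior, phred_quality).
    """
    # Log of prior * likelihood for each candidate length (zero-prior lengths skipped).
    log_post = {n: math.log(p) + log_likelihood(signal, n)
                for n, p in prior.items() if p > 0}
    # Normalise with the log-sum-exp trick to avoid numerical underflow.
    top = max(log_post.values())
    weights = {n: math.exp(v - top) for n, v in log_post.items()}
    total = sum(weights.values())
    posterior = {n: w / total for n, w in weights.items()}
    best = max(posterior, key=posterior.get)
    # Probability that the call is wrong, expressed as a Phred-scaled quality
    # (clamped so a near-certain call never takes log10 of zero).
    p_error = max(1.0 - posterior[best], 1e-10)
    quality = -10.0 * math.log10(p_error)
    return best, posterior, quality
```

With a flat prior over lengths 0 to 4, `call_flow(2.03, {n: 0.2 for n in range(5)})` calls a length of 2 with a quality above Q30 under these made-up parameters. A real base caller would fit the noise model to data from the instrument, which is the role the article attributes to PyroBayes's empirical model.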
The main advantage of PyroBayes, Quinlan added, is that its accuracy estimates (often referred to as base quality values) are higher and more accurate than those produced by the native software. Higher base quality values, he explained, allow you to detect variations between different genomes with greater sensitivity.
"In other words, one is able to detect more of the genetic variation that exists in the DNA you've sequenced for the same cost," he said.
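Base quality values are conventionally reported on the Phred scale, Q = -10 log10(P(error)), so Q20 corresponds to a 1% chance that a base is wrong and Q30 to a 0.1% chance. The minimal Python sketch below shows that conversion and, to illustrate Quinlan's point about sensitivity, a hypothetical variant-caller filter that ignores bases below a quality threshold: if a base caller underestimates its own accuracy, genuine evidence for a variant is discarded. The threshold of 20, the function names and the quality lists are illustrative assumptions, not figures from the PyroBayes study.

```python
import math

def phred_from_error(p_error: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(P(error))."""
    return -10.0 * math.log10(p_error)

def error_from_phred(q: float) -> float:
    """Inverse transform: P(error) = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)

def usable_support(qualities, min_q=20):
    """Count bases whose quality clears a (hypothetical) caller threshold."""
    return sum(1 for q in qualities if q >= min_q)

# The same three variant-supporting bases, scored by two hypothetical base callers.
underestimated = [18, 19, 17]   # every base falls below Q20 and is ignored
better_calibrated = [24, 26, 23]
print(usable_support(underestimated), usable_support(better_calibrated))
```

Running this prints `0 3`: identical sequence data yields no usable evidence for the variant in one case and three supporting bases in the other, which is the sense in which better quality values buy sensitivity "for the same cost".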
The technique has also been used to resequence the genome of a worm using the Illumina/Solexa machine, as covered in another Nature Methods article. Resequencing is the process of directly comparing DNA from a sample to DNA contained in a reference sequence of the same species to find genetic variation, and it will be used heavily in the 1,000 Genomes Project.
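At its simplest, resequencing means lining reads up against the reference and looking for positions where they disagree. The following Python sketch assumes the hard part (alignment) has already been done: each read comes with a hypothetical start offset on the reference and contains no insertions or deletions. It tallies non-reference bases at each position and reports those seen in at least `min_reads` reads. This is a toy stand-in for a real variant caller, not the pipeline used in the 1,000 Genomes Project.

```python
from collections import Counter

def call_variants(reference, aligned_reads, min_reads=2):
    """Report (position, ref_base, alt_base, read_count) for well-supported mismatches.

    aligned_reads: iterable of (offset, read) pairs, where offset is the
    0-based reference position of the read's first base (assumed gap-free).
    """
    support = Counter()
    for offset, read in aligned_reads:
        for i, base in enumerate(read):
            pos = offset + i
            if pos >= len(reference):
                break  # the read overhangs the end of the reference
            if base != reference[pos]:
                support[(pos, reference[pos], base)] += 1
    # Keep only mismatches seen in enough independent reads.
    return sorted((pos, ref, alt, n)
                  for (pos, ref, alt), n in support.items() if n >= min_reads)

reference = "ACGTACGTAC"
reads = [(2, "GTTCG"), (3, "TTCGT"), (0, "ACGAA")]
print(call_variants(reference, reads))
```

Two reads agree on a T where the reference has an A at position 4, so that site is reported as `[(4, 'A', 'T', 2)]`. The single A at position 3 is treated as a likely sequencing error because only one read supports it. Requiring several supporting reads is one reason accurate per-base quality values matter: a caller can weigh each read's evidence instead of simply counting it.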