An international consortium has begun an ambitious project to sequence the complete genomes of at least 1,000 people in order to learn more about the effect of genetics on disease.
The 1000 Genomes Project will be supported by the Wellcome Trust Sanger Institute, UK, the Beijing Genomics Institute (BGI Shenzhen), China, and the National Human Genome Research Institute (NHGRI), part of the US National Institutes of Health (NIH).
The goal of the project is to provide the most detailed ever map of human genetic variation to support the study of disease. Almost daily, a new gene or genetic mutation is linked in some way to a disease. However, we have yet to compile enough genetic data to be able to fully appreciate how common these connections are and therefore how big a role the genetic mutation plays in a given disease.
Genetic variations can also help explain a person's reaction to a drug or environmental factors.
The Project will draw on the expertise of multidisciplinary research teams with the aim of helping overcome these limitations and opening the door towards truly personalised medicine. The data will be made available on freely accessible databases.
"The 1000 Genomes Project will examine the human genome at a level of detail that no one has done before," said Dr Richard Durbin, of the Wellcome Trust Sanger Institute, who is co-chair of the consortium.
"Such a project would have been unthinkable only two years ago. Today, thanks to amazing strides in sequencing technology, bioinformatics and population genomics, it is now within our grasp. So we are moving forward to build a tool that will greatly expand and further accelerate efforts to find more of the genetic factors involved in human health and disease."
Variation in the human genome is organised into local neighbourhoods called haplotypes, which are stretches of DNA usually inherited as intact blocks of information. Using recently developed catalogues of human genetic variation, for example the HapMap, researchers have already discovered more than 100 regions of the genome containing variants that are associated with risk of common human diseases such as diabetes, coronary artery disease, prostate and breast cancer, rheumatoid arthritis, inflammatory bowel disease and age-related macular degeneration.
However, the existing maps are not very detailed, so researchers must often follow those studies with costly and time-consuming DNA sequencing to help pinpoint the precise causative variants. The new map would enable researchers to zero in more quickly on disease-related genetic variants, speeding efforts to use genetic information to develop new strategies for diagnosing, treating and preventing common diseases.
"This new project will increase the sensitivity of disease discovery efforts across the genome five-fold and within gene regions at least 10-fold," said NHGRI director Dr Francis Collins.
The Project will identify variants present in 1 per cent or more of the human population across most of the genome, and down to 0.5 per cent or lower within the genes themselves.
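To see why a sample of about 1,000 people is enough to reach variants at these frequencies, a rough back-of-the-envelope calculation helps. The sketch below is illustrative only and is not the consortium's method: it assumes each person carries two copies of the genome, that people are sampled independently at random, and that any variant present in the sample is detected perfectly. The function names are made up for this example.

```python
def prob_variant_seen(freq: float, n_people: int, ploidy: int = 2) -> float:
    """Chance that a variant with population frequency `freq` appears
    at least once among the sampled chromosomes.

    Assumes independent random sampling and perfect detection,
    which real sequencing data will not fully satisfy.
    """
    n_chromosomes = ploidy * n_people
    return 1.0 - (1.0 - freq) ** n_chromosomes


def expected_copies(freq: float, n_people: int, ploidy: int = 2) -> float:
    """Average number of copies of the variant expected in the sample."""
    return freq * (ploidy * n_people)


if __name__ == "__main__":
    # Frequencies quoted in the article: 1 per cent and 0.5 per cent.
    for freq in (0.01, 0.005):
        seen = prob_variant_seen(freq, 1000)
        copies = expected_copies(freq, 1000)
        print(f"frequency {freq:.3f}: P(seen) = {seen:.6f}, "
              f"expected copies = {copies:.0f}")
```

Under these simplified assumptions, a variant carried on 1 per cent of chromosomes would be expected to turn up about 20 times among 1,000 people, so missing it entirely would be very unlikely. In practice the limiting factor is more likely to be how reliably each copy can be detected in the sequence data.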
The 1,000 subjects of the research will remain anonymous. The teams involved hope that by using that many people, the data will be representative of 1 per cent of the population. Current information is thought to represent around 10 per cent of the population.
The researchers will not only map out single nucleotide polymorphisms (SNPs), but will also produce information on larger differences in genome structure called structural variants. Structural variants are rearrangements, deletions or duplications of segments of the human genome and are thought to play a role in susceptibility to certain conditions, such as mental retardation and autism.
Who's doing what with what equipment?
The sequencing work itself will be carried out at the Sanger Institute, BGI Shenzhen and NHGRI's Large-Scale Sequencing Network, which includes the Broad Institute of MIT and Harvard; the Washington University Genome Sequencing Center at the Washington University School of Medicine in St. Louis; and the Human Genome Sequencing Center at the Baylor College of Medicine in Houston. The consortium may add other participants over time.
In the first phase of the Project, three pilots will be conducted over about a year in order to decide how to complete the research most efficiently and cheaply.
The first pilot will involve sequencing the genomes of two nuclear families (both parents and an adult child) at deep coverage that averages 20 passes of each genome. This will provide a comprehensive dataset from six people that will help the project figure out how to identify variants using the new sequencing platforms, and serve as a basis for comparison for other parts of the effort.
The second pilot will involve sequencing the genomes of 180 people at low coverage that averages two passes of each genome. This will test the ability to use low-coverage data from new sequencing platforms to identify sequence variants and to put them in their genomic context.
The third pilot will involve sequencing the coding regions, called exons, of about 1,000 genes in about 1,000 people. This is aimed at exploring how best to obtain an even more detailed catalogue in the approximately 2 per cent of the genome that is made up of protein-coding genes.
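The difference between the deep-coverage and low-coverage pilots can be made concrete with some simple arithmetic. The sketch below is a rough illustration, not the project's own analysis. It assumes a human genome of about 3 billion base pairs (a commonly cited approximation, not a figure from the project), treats "passes" as average depth of coverage, and uses an idealised Poisson model in which reads land uniformly at random. The function names are hypothetical.

```python
import math

# Assumption: approximate size of one human genome in base pairs.
GENOME_SIZE_BP = 3_000_000_000


def total_bases(n_people: int, coverage: int,
                genome_size: int = GENOME_SIZE_BP) -> int:
    """Raw bases needed to sequence `n_people` genomes at the given depth."""
    return n_people * coverage * genome_size


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of sites hit by no read at all.

    Idealised Poisson model: P(zero reads at a site) = exp(-coverage).
    Real platforms cover the genome less evenly than this.
    """
    return math.exp(-coverage)


if __name__ == "__main__":
    # (label, people, average passes) as described for the first two pilots.
    pilots = [("two families, deep coverage", 6, 20),
              ("180 people, low coverage", 180, 2)]
    for label, people, depth in pilots:
        bases = total_bases(people, depth)
        missed = fraction_uncovered(depth)
        print(f"{label}: {bases:.2e} bases in total, "
              f"about {missed:.2e} of each genome unread")
```

On these assumptions, the low-coverage pilot requires roughly three times as much raw sequence as the deep-coverage pilot, yet leaves more than a tenth of each individual genome unread. That gap is why the second pilot is framed as a test of whether low-coverage data can be combined across many people to identify variants.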
Among the populations whose DNA will be sequenced during the Project are: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States.
"This project will examine the human genome in a detail that has never been attempted - the scale is immense," said Dr Gil McVean from the University of Oxford, UK.
"When up and running at full speed, this project will generate more sequence in two days than was added to public databases for all of the past year."