1000 Genomes Project

1000 Genomes Project: Mapping humanity's genetic variation by next generation sequencing

All humans are more than 99 percent the same at the genetic level. However, it is particularly important to understand the roughly one percent of genetic material that varies among people, because it can help explain individual differences in susceptibility to disease, response to drugs, or reaction to environmental factors.
Drawing on the expertise of multidisciplinary research teams, the international 1000 Genomes Project (www.1000genomes.org), planned in September 2007 in Hinxton, UK, and launched in January 2008, will develop a new map of the differences in the human genome.
In August 2008, the Max Planck Institute for Molecular Genetics in Berlin joined the international consortium as a partner, with Dr. Ralf Sudbrak as leader of the German part of the project. Prof. Dr. Hans Lehrach is director of the division in which the sequencing operations are performed. He is a member of the 1000 Genomes Project steering committee as well as coordinator of the NGFN consortium Mutanom.
In June 2010, the researchers of the 1000 Genomes Project announced the successful completion of the three pilot studies. With that, the main phase of the project has begun. The goal is to create an open-access database with genomic information on 2,500 individuals from 27 different population groups worldwide.

The 1000 Genomes Project aims to produce a catalogue of variants that are present at a frequency of 1 percent or greater in the human population, across the whole genome. In an ambitious effort, an international research consortium is sequencing the genomes of at least a thousand people from randomly selected populations throughout the world. Comparing the DNA variation between healthy and ill subjects will allow researchers to pinpoint genes, structural variants in chromosomes, and other individual genomic variations that are associated with diseases ranging from cardiovascular diseases to cancer.

The amount of data produced by the 1000 Genomes Project is unprecedented in biomedical research. Currently, the total size of the datasets is over 50 terabytes, or 50,000 gigabytes. That corresponds to almost eight trillion DNA base pairs, or eight terabases, of sequence data. Researchers have free access to the pilot data of the 1000 Genomes Project on the project website and may download the data via the NCBI (ftp://ftp-trace.ncbi.nih.gov/1000genomes/) or EBI (ftp://ftp.1000genomes.ebi.ac.uk/).
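
The unit conversions behind these figures can be checked with a few lines of Python. This is only a rough sketch: it assumes decimal (SI) units, takes "over 50 terabytes" and "almost eight trillion base pairs" as round numbers, and uses helper names (to_gigabytes, to_terabases, bytes_per_base) that are made up for illustration.

```python
# Rough arithmetic on the dataset figures quoted above.
# Assumption: decimal (SI) units, i.e. 1 terabyte = 10**12 bytes.
TERABYTE = 10**12
GIGABYTE = 10**9
TERABASE = 10**12  # one trillion base pairs


def to_gigabytes(terabytes):
    """Convert a size in terabytes to gigabytes."""
    return terabytes * TERABYTE / GIGABYTE


def to_terabases(base_pairs):
    """Convert a count of base pairs to terabases."""
    return base_pairs / TERABASE


def bytes_per_base(total_bytes, base_pairs):
    """Average number of stored bytes per sequenced base."""
    return total_bytes / base_pairs


dataset_bytes = 50 * TERABYTE   # "over 50 terabytes", rounded down
sequenced_bases = 8 * 10**12    # "almost eight trillion", rounded up

print(to_gigabytes(50))                                # size in gigabytes
print(to_terabases(sequenced_bases))                   # sequence in terabases
print(bytes_per_base(dataset_bytes, sequenced_bases))  # bytes per base
```

With these rounded inputs, the datasets work out to a little over six bytes of storage per sequenced base. That is plausible if the files hold more than the bare sequence letters, but the exact file contents are not stated here.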

Pilot projects – testing essential aspects of project feasibility

The first pilot project involved sequencing the genomes of six people (two nuclear families, each with two parents and a daughter) at high coverage. Each sample was sequenced an average of 20 to 60 times, using a variety of sequencing technologies. Previous “personal genomes” were each based on only a single sequencing method, and thus were limited to what that method could detect. By using multiple methods, the Project has not only uncovered a more complete picture of DNA variation in these individuals, but has also learned about the strengths and limitations of each of the current technologies. These data also served as a comparison group for the genome sequences analyzed in the other pilot projects. The six genomes were sequenced by academic centers in China, Germany, the U.K., and the U.S., as well as by three companies, using platforms from 454 Life Sciences, a Roche company; Applied Biosystems, an Applera Corp. business; and Illumina Inc. All of the platforms were able to sequence 85 to 90 percent of a genome and produce high-quality data.

The second pilot project sequenced the genomes of 179 people at low coverage, an average of three passes of the genome. Although sequencing costs are dropping, it is still very expensive to sequence the genomes of hundreds of people deeply enough to find all of the genetic variants in each genome accurately. An alternative approach is to sequence many genomes at light coverage, and then combine the data from many people to discover genetic variants that they share. The results of the pilot project confirmed that this strategy is effective and will allow the project to meet its goal of discovering sequence variants that are shared with other people.
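
Why combining light coverage across many people works can be illustrated with a simple probability model. The sketch below is not the Project's actual analysis method. It assumes that the number of reads covering a site follows a Poisson distribution, that each carrier is heterozygous (so about half of the reads show the variant), and that a variant counts as "seen" once at least two supporting reads are found. The function names and the two-read threshold are hypothetical, and sequencing errors are ignored.

```python
import math


def prob_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below


def detection_probability(carriers, mean_coverage=3.0, min_reads=2):
    """Chance of seeing at least min_reads variant reads at one site.

    Assumptions (illustrative only): Poisson read depth, heterozygous
    carriers, reads from all carriers pooled, no sequencing errors.
    """
    # Each carrier contributes on average half of its reads from the
    # variant allele; pooling carriers adds up their expected read counts.
    expected_variant_reads = carriers * mean_coverage / 2.0
    return prob_at_least(min_reads, expected_variant_reads)


for carriers in (1, 5, 10):
    print(carriers, detection_probability(carriers))
```

Under these assumptions, a variant carried by a single person sequenced at three-fold coverage is seen with a probability of only about 0.44, while a variant shared by ten such people is almost certain to be seen in the pooled data. This also shows the limit of the strategy: it favors variants that several people share over those found in only one individual.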

The third pilot project involved sequencing the coding regions, called exons, of 1,000 genes in about 700 people to explore how best to obtain a detailed catalog of variation in the approximately 2 percent of the genome that is composed of protein-coding genes. This pilot provided an unprecedented sample size from which to learn about the patterns of rare variation in the human population.
