Genetic Epidemiology

Coordinator: Prof. Dr. Michael Krawczak
Institution: Institut für Medizinische Informatik und Statistik

Genomic data are generated by massively parallel genotyping of hundreds of thousands of markers, or by sequencing whole genomes or parts of them. The features and scale of such data pose major challenges for their storage, processing and analysis. The subproject has made important contributions to each of these three aspects, with a focus on the mathematical and statistical modeling of such data. Newly developed database solutions enable consistent data storage and rapid data access. Semi-automated quality control procedures ensured high-quality data and thus reliable analysis results. Modeling approaches for quality control included the evaluation of genome coverage by specific marker sets and estimates of the chances of detecting genuine disease associations. In particular, we showed that a number of sequencing platforms have different error profiles, and we quantified their extent. One implication of this finding is that analysis results are likely to be biased when data from different platforms are combined.
Numerous additional theoretical and practical studies led to improved analyses of genomic data. For example, we demonstrated a gradient in the occurrence of long homozygous stretches across European populations and evaluated the performance of genotype prediction (imputation) algorithms. We assisted cooperating teams with statistical analyses, e.g. of how genes and environmental factors act together. We developed new approaches for identifying genetic variants that influence the etiology of two or more diseases, and applied these approaches successfully. Furthermore, we proposed a new statistical approach for detecting so-called allelic imbalance, where only one of the two copies of a homologous chromosome pair is transcribed into protein precursors (transcription). This approach allows for errors in the sequence data and achieves a higher detection rate than classical methods, which makes it particularly important for the analysis of tumor cells. In summary, the subproject, with its mathematical-statistical focus, has made essential contributions to elucidating the genetic causes of diseases, and has also described for the first time some general patterns in the genetic variation of European populations.
