Sanger Centre and Wellcome Trust
Munich, 30 November 2000. The Sanger Centre is a research centre funded primarily by the Wellcome Trust. It is located at Hinxton Hall, near Cambridge, on the "Genome Campus". Richard Durbin from Sanger gave a short introduction to the centre, its international activities and its current projects.
The Sanger Centre was founded in 1993 and moved into its new, purpose-built main building in 1996. It is part of the Wellcome Trust Genome Campus, which includes the European Bioinformatics Institute (EBI), an outstation of the European Molecular Biology Laboratory (EMBL), and the UK Human Genome Mapping Project Resource Centre (HGMP-RC), which is funded by the UK Medical Research Council. The Human Cancer Genome Project is also located at Sanger.
Sanger's aim is to further the knowledge of the biology of organisms, particularly through large-scale sequencing and analysis of their genomes.
The Sanger Centre is a major contributor to the Human Genome Project (HGP), the international collaboration to decode the human genome. It is responsible for about one-third of the sequence data for the HGP, working on chromosomes 1, 6, 9, 10, 13, 20, 22 (finished) and X. Sanger also sequences the genomes of pathogens (disease-causing organisms) and model organisms, and provides in vitro and in silico data on gene expression.
Currently about 570 staff work on projects ranging from Streptococcus equi to the human genome. Two-thirds of them work principally on human genome sequencing projects.
A genome is the complete set of inheritable (i.e. genetic) instructions required to make an organism. The instructions are the genes, each of which specifies a function. They are contained in chromosomes, which are long chains of deoxyribonucleic acid (DNA).
Richard Durbin discussed the molecular genetic paradigm. The biology and medicine of human beings are determined by the interaction of the products of our genes with each other, with the environment and with pathogens. Therefore, if we know all the genes, we can approach biology and medicine from two sides: genes and molecules on the one hand, and observed outcomes on the other.
The Human Genome Project
The initiative started in concept in 1985, in principle in 1990, and in earnest in 1995. Sequencing 3 billion bases was an unprecedented technical and logistical challenge for biology. The international public and charitable funding agencies are committed to placing the sequence in the public domain, free for research and commercial use. The data are accessible from all over the world, and researchers now have to find the meaningful parts. Today 90% of the human genome sequence is publicly available, usable and widely used. 30% of the sequence is in "complete" form, of high-accuracy archival reference quality. The centre aims to complete essentially the entire genome to this standard by 2003. The team is proud of the quality of the data: there is at most one error in every 10,000 to 100,000 bases.
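To put these percentages in perspective, the arithmetic can be sketched as follows. This is a rough illustration only: it assumes a genome of exactly 3 billion bases and reads the quality figure as one error per 10,000 to 100,000 bases. The function and variable names are hypothetical, and the resulting numbers are estimates, not figures given in the talk.

```python
GENOME_BASES = 3_000_000_000  # approximate genome size quoted in the article


def bases_covered(fraction: float, genome_bases: int = GENOME_BASES) -> int:
    """Number of bases represented by a given fraction of the genome."""
    return round(fraction * genome_bases)


def expected_errors(bases: int, bases_per_error: int) -> float:
    """Expected number of wrong bases at one error per `bases_per_error` bases."""
    return bases / bases_per_error


available = bases_covered(0.90)  # publicly available sequence
finished = bases_covered(0.30)   # sequence in "complete" form

print("available bases:", available)
print("finished bases: ", finished)
# Error estimates for the finished portion at both ends of the quoted range
print("errors at 1 in 10,000: ", expected_errors(finished, 10_000))
print("errors at 1 in 100,000:", expected_errors(finished, 100_000))
```

Under these assumptions the finished portion corresponds to roughly 900 million bases, with an expected error count in the thousands to tens of thousands.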
Richard Durbin listed some other Sanger Centre research programmes:
- pathogen sequencing programme
- human genetics programme: studying genetic variation (SNPs) and finding disease genes
- cancer genome project
- informatics, including supporting data collection, analysing and presenting results, and developing methodology, algorithms and data resources
In a typical month the centre generates about 30,000,000 bases of raw human sequence per day. This is the basic data, which includes overlaps and repeated sequences. The capacity is about 100,000 reads per day across all machines. That means an additional 40 Gbyte of storage per day.
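The throughput figures can be related to each other with some simple division. The sketch below assumes that the centre runs close to its quoted read capacity, that all reads contribute to the quoted base count, and that a Gbyte is 10^9 bytes. None of these assumptions is stated in the talk, so the per-read values are only indicative.

```python
BASES_PER_DAY = 30_000_000          # raw human sequence per day, as quoted
READS_PER_DAY = 100_000             # quoted capacity across all machines
STORAGE_PER_DAY_BYTES = 40 * 10**9  # 40 Gbyte, assuming decimal gigabytes


def per_read(total_per_day: float, reads_per_day: int = READS_PER_DAY) -> float:
    """Average amount per read, given a daily total."""
    return total_per_day / reads_per_day


bases_per_read = per_read(BASES_PER_DAY)
bytes_per_read = per_read(STORAGE_PER_DAY_BYTES)

print("average bases per read:", bases_per_read)
print("average bytes stored per read:", bytes_per_read)
```

On these assumptions each read yields a few hundred bases but occupies several hundred kilobytes, which suggests that considerably more than the called bases is stored for every read.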
The genomic information can be used to identify genetic factors involved in common diseases, to assess individual risks, to develop new drugs, to improve diagnosis (e.g. tumour classification) and to increase the understanding of basic biological science. In human genetics, researchers find genetic variations and use them to identify genetic factors in human disease. In cancer, the aim is to find the genetic changes that take place as a cancer develops, to aid diagnosis and cure.
Common diseases with a strong genetic component
Richard Durbin mentioned the strong relationship between disease and genetics. Asthma, for example, has a heritability of 60%; for bone mineral density the figure is 60-80% and for insulin-dependent diabetes mellitus it is 50-90%. These figures are heritability estimates for common complex traits and diseases, derived from twin studies. Over 100 genes involved in disease have been identified so far.
Sanger Human Genetics Programme
The Sanger Centre is a member of The SNP Consortium (SNP stands for single nucleotide polymorphism: single nucleotide = single base; polymorphism = more than one form). The consortium consists of 13 companies (for example Beecham, Novartis, IBM and Motorola), the Wellcome Trust and 5 research centres. The aim is to find points in the human genome where there is variation in the population (SNPs). The primary targets were met 6 months ahead of schedule.
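The idea of a SNP as a position where individuals differ by a single base can be shown with a toy example. The sketch below is not the consortium's method: it assumes short, already aligned sequences of equal length, and the function name `find_snps` and the sample data are invented for illustration. Real SNP discovery works on large numbers of sequencing reads and has to take base quality into account.

```python
from collections import Counter


def find_snps(aligned: list[str]) -> list[tuple[int, dict[str, int]]]:
    """Return (position, base counts) for every column with more than one base."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    # zip(*aligned) walks through the alignment one column (position) at a time
    for position, column in enumerate(zip(*aligned)):
        counts = Counter(column)
        if len(counts) > 1:  # more than one form observed at this position
            snps.append((position, dict(counts)))
    return snps


# Hypothetical sequences from four individuals (0-based positions)
samples = [
    "ACGTTAGC",
    "ACGTTAGC",
    "ACATTAGC",
    "ACGTTCGC",
]

for position, counts in find_snps(samples):
    print(position, counts)
```

In this invented data set, positions 2 and 5 vary between individuals, while all other positions are identical.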
Sanger Pathogen Programme
Here the centre sequences pathogen genomes and starts functional studies based on the sequences. Microbial pathogens have genomes that can be, or have been, sequenced. Newly identified genes are targets for new drugs, e.g. if the protein is specific to the pathogen. Genes involved in the interaction between host and pathogen are candidates for vaccines. Sanger has so far completed the sequences of M. tuberculosis (TB), S. typhi, C. jejuni (food poisoning), M. leprae, N. meningitidis and Y. pestis, and is still sequencing 18 other bacteria and parts of 7 protozoa, e.g. those causing malaria and sleeping sickness.
Sanger after the Human Genome Project
Now Sanger is studying the genetics of model organisms, for example the mouse as a model mammal, the zebrafish as a model vertebrate and the nematode worm as a very simple animal.
Informatics
Information technology supports data collection, analyses and presents the results, and has to develop methods, algorithms and data resources. Informatics therefore has to build databases and tools to help manage this information. Ensembl is a new computational project to identify the genes in the human genome sequence and keep them connected to other information resources as knowledge develops. The other problem is the sheer amount of data: reference human genome sequence 3 Gbyte, analysis results 200 Gbyte, mouse genome 3.2 Tbyte and human genome skims 10 Tbyte (?). Genomics now requires High-Performance Computing (HPC), as the rate of acquisition of genomic data increases 4-fold per year, from 2 Mbyte in 1995 to 3 Tbyte in 2000. The future needs more compute power, more sophisticated data management and better algorithms: real engineering.
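The growth figures can be checked with a short compound-growth calculation. The sketch below assumes decimal units (1 Mbyte = 10^6 bytes, 1 Tbyte = 10^12 bytes) and exactly five years between the two quoted volumes; the function names are illustrative. Taken at face value, the quoted endpoints imply a steeper annual factor than four, so the numbers are probably best read as rough orders of magnitude.

```python
MBYTE = 10**6   # decimal units assumed
TBYTE = 10**12


def projected_volume(start: float, factor_per_year: float, years: int) -> float:
    """Volume after `years` of compound growth at `factor_per_year`."""
    return start * factor_per_year ** years


def implied_annual_factor(start: float, end: float, years: int) -> float:
    """Constant yearly growth factor that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years)


# Total growth over five years at the quoted 4-fold per year
print("5-year growth at 4x per year:", projected_volume(1, 4, 5))

# Yearly factor implied by the quoted 1995 and 2000 volumes
factor = implied_annual_factor(2 * MBYTE, 3 * TBYTE, 5)
print("implied yearly factor:", round(factor, 1))
```

A 4-fold yearly increase compounds to roughly a thousandfold over five years, which illustrates why storage and compute capacity have to be planned well ahead of the data.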
Uwe Harms