Computer Science and Electrical Engineering
University of Maryland Baltimore County
CS Graduate Seminar

Whole Genome Assembly: Tactical and Strategic Implications

Granger Sutton
Celera Genomics
 

2:00pm Friday, November 16, 2001, Lecture Hall V, Engineering and Computer Science Building

ABSTRACT

Assembling a whole genome shotgun data set of end-sequence reads was considered controversial when Celera proposed applying the method to the human genome in mid-1998. Critics claimed that the computation would require an impossible amount of computer time, that the size and repetitiveness of the genome would confound all attempts at assembly even if sufficient computational efficiency were achieved, and that any assembly produced would be partial and of extremely poor quality.

In 1999 the informatics research team at Celera produced an assembly of the Drosophila genome from a whole genome shotgun data set consisting of 3.2 million reads, 72% of which were paired-end reads from 2Kbp and 10Kbp inserts in a 1 to 1.32 mix. The assembly consisted of completely ordered and oriented contigs covering an estimated 97.2% of the genome, with only 1,630 gaps of average size 1,415bp. High-quality results have since been obtained for the human and mouse genomes, and our reconstruction of the human genome has been reported in Science. We discuss our algorithmic approach, the strategic pros and cons of the method, and the implications for annotation and discovery efforts. We also discuss how one verifies an assembly, contrast our assembly with the human genome produced by the HGP, and discuss possible finishing strategies.
 

BRIEF BIO

Dr. Sutton received his Ph.D. in Computer Science from the University of Maryland in 1992. He has an M.S. in Computer Engineering from Stanford University and a B.S. in Electrical Engineering from the University of Maryland. Dr. Sutton's research has focused on solving computational problems in sequencing and analyzing DNA. He has authored several key tools for bioinformatics, including a protein multiple sequence aligner and the TIGR shotgun fragment assembler. He is currently a Director in Celera's Informatics Research group, where his team continues to develop the Celera shotgun fragment assembler.