Whole Genome Assembly: Tactical and Strategic Implications
Granger Sutton
Celera Genomics
2:00pm Friday, November 16, 2001, Lecture Hall V, Engineering and Computer Science Building
ABSTRACT
The assembly of a shotgun data set of end-sequence reads was considered controversial when Celera proposed applying the method to the human genome in mid-1998. Critics claimed that the computation would require an impossible amount of computer time, that the size and repetitiveness of the genome would confound all attempts at assembly even if sufficient computational efficiency were achieved, and that any assembly produced would be incomplete and of extremely poor quality.
In 1999 the informatics research team at Celera produced
an assembly of the Drosophila genome from a whole genome shotgun data set
consisting of 3.2 million reads, 72% of which were paired-end reads from
2Kbp and 10Kbp inserts in a 1 to 1.32 mix. The assembly consisted of completely
ordered and oriented contigs covering an estimated 97.2% of the genome
with only 1,630 gaps of average size 1,415bp. High-quality results have
now been obtained for the human and mouse genomes, and our reconstruction
of the human genome has been reported in Science. We discuss
our algorithmic approach, the strategic pros and cons of the method, and
the implications for annotation and discovery efforts. We also discuss
how one verifies an assembly, contrast our assembly with the human genome
produced by the HGP, and consider possible finishing strategies.
BRIEF BIO
Dr. Sutton received his Ph.D. in Computer Science from the
University of Maryland in 1992. He has an M.S. in Computer Engineering from
Stanford University and a B.S. in Electrical Engineering from the University
of Maryland. Dr. Sutton's research has focused on solving computational
problems in sequencing and analyzing DNA. He is an author of
several key bioinformatics tools, including a protein multiple sequence
aligner and the TIGR shotgun fragment assembler. He is currently
a Director in Celera's Informatics Research group. His team continues to
develop the Celera shotgun fragment assembler.