Special Reviews

The Sequence of the Human Genome

J. Craig Venter, Mark D. Adams, Eugene W. Myers, Peter W. Li, Richard J. Mural, Granger G. Sutton, Hamilton O. Smith, Mark Yandell, Cheryl A. Evans, Robert A. Holt, Jeannine D. Gocayne, Peter Amanatides, Richard M. Ballew, Daniel H. Huson, Jennifer Russo Wortman, Qing Zhang, Chinnappa D. Kodira, Xiangqun H. Zheng, Lin Chen, Marian Skupski, Gangadharan Subramanian, Paul D. Thomas, Jinghui Zhang, George L. Gabor Miklos, Catherine Nelson, Samuel Broder, Andrew G. Clark, Joe Nadeau, Victor A. McKusick, Norton Zinder, Arnold J. Levine, Richard J. Roberts, Mel Simon, Carolyn Slayman, Michael Hunkapiller, Randall Bolanos, Arthur Delcher, Ian Dew, Daniel Fasulo, Michael Flanigan, Liliana Florea, Aaron Halpern, Sridhar Hannenhalli, Saul Kravitz, Samuel Levy, Clark Mobarry, Knut Reinert, Karin Remington, Jane Abu-Threideh, Ellen Beasley, Kendra Biddick, Vivien Bonazzi, Rhonda Brandon, Michele Cargill, Ishwar Chandramouliswaran, Rosane Charlab, Kabir Chaturvedi, Zuoming Deng, Valentina Di Francesco, Patrick Dunn, Karen Eilbeck, Carlos Evangelista, Andrei E. Gabrielian, Weiniu Gan, Wangmao Ge, Fangcheng Gong, Zhiping Gu, Ping Guan, Thomas J. Heiman, Maureen E. Higgins, Rui-Ru Ji, Zhaoxi Ke, Karen A. Ketchum, Zhongwu Lai, Yiding Lei, Zhenya Li, Jiayin Li, Yong Liang, Xiaoying Lin, Fu Lu, Gennady V. Merkulov, Natalia Milshina, Helen M. Moore, Ashwinikumar K Naik, Vaibhav A. Narayan, Beena Neelam, Deborah Nusskern, Douglas B. Rusch, Steven Salzberg, Wei Shao, Bixiong Shue, Jingtao Sun, Zhen Yuan Wang, Aihui Wang, Xin Wang, Jian Wang, Ming-Hui Wei, Ron Wides, Chunlin Xiao, Chunhua Yan, Alison Yao, Jane Ye, Ming Zhan, Weiqing Zhang, Hongyu Zhang, Qi Zhao, Liansheng Zheng, Fei Zhong, Wenyan Zhong, Shiaoping C. 
Zhu, Shaying Zhao, Dennis Gilbert, Suzanna Baumhueter, Gene Spier, Christine Carter, Anibal Cravchik, Trevor Woodage, Feroze Ali, Huijin An, Aderonke Awe, Danita Baldwin, Holly Baden, Mary Barnstead, Ian Barrow, Karen Beeson, Dana Busam, Amy Carver, Angela Center, Ming Lai Cheng, Liz Curry, Steve Danaher, Lionel Davenport, Raymond Desilets, Susanne Dietz, Kristina Dodson, Lisa Doup, Steven Ferriera, Neha Garg, Andres Gluecksmann, Brit Hart, Jason Haynes, Charles Haynes, Cheryl Heiner, Suzanne Hladun, Damon Hostin, Jarrett Houck, Timothy Howland, Chinyere Ibegwam, Jeffery Johnson, Francis Kalush, Lesley Kline, Shashi Koduru, Amy Love, Felecia Mann, David May, Steven McCawley, Tina McIntosh, Ivy McMullen, Mee Moy, Linda Moy, Brian Murphy, Keith Nelson, Cynthia Pfannkoch, Eric Pratts, Vinita Puri, Hina Qureshi, Matthew Reardon, Robert Rodriguez, Yu-Hui Rogers, Deanna Romblad, Bob Ruhfel, Richard Scott, Cynthia Sitter, Michelle Smallwood, Erin Stewart, Renee Strong, Ellen Suh, Reginald Thomas, Ni Ni Tint, Sukyee Tse, Claire Vech, Gary Wang, Jeremy Wetter, Sherita Williams, Monica Williams, Sandra Windsor, Emily Winn-Deen, Keriellen Wolfe, Jayshree Zaveri, Karena Zaveri, Josep F. Abril, Roderic Guigó, Michael J. Campbell, Kimmen V. 
Sjolander, Brian Karlak, Anish Kejariwal, Huaiyu Mi, Betty Lazareva, Thomas Hatton, Apurva Narechania, Karen Diemer, Anushya Muruganujan, Nan Guo, Shinji Sato, Vineet Bafna, Sorin Istrail, Ross Lippert, Russell Schwartz, Brian Walenz, Shibu Yooseph, David Allen, Anand Basu, James Baxendale, Louis Blick, Marcelo Caminha, John Carnes-Stine, Parris Caulk, Yen-Hui Chiang, My Coyne, Carl Dahlke, Anne Deslattes Mays, Maria Dombroski, Michael Donnelly, Dale Ely, Shiva Esparham, Carl Fosler, Harold Gire, Stephen Glanowski, Kenneth Glasser, Anna Glodek, Mark Gorokhov, Ken Graham, Barry Gropman, Michael Harris, Jeremy Heil, Scott Henderson, Jeffrey Hoover, Donald Jennings, Catherine Jordan, James Jordan, John Kasha, Leonid Kagan, Cheryl Kraft, Alexander Levitsky, Mark Lewis, Xiangjun Liu, John Lopez, Daniel Ma, William Majoros, Joe McDaniel, Sean Murphy, Matthew Newman, Trung Nguyen, Ngoc Nguyen, Marc Nodell, Sue Pan, Jim Peck, Marshall Peterson, William Rowe, Robert Sanders, John Scott, Michael Simpson, Thomas Smith, Arlan Sprague, Timothy Stockwell, Russell Turner, Eli Venter, Mei Wang, Meiyuan Wen, David Wu, Mitchell Wu, Ashley Xia, Ali Zandieh, and Xiaohong Zhu
Science
16 Feb 2001
Vol 291, Issue 5507
pp. 1304-1351

Abstract

A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional ∼12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.



REFERENCES AND NOTES

1
Sinsheimer R. L., Genomics 5, 954 (1989);
U.S. Department of Energy, Office of Health and Environmental Research, Sequencing the Human Genome: Summary Report of the Santa Fe Workshop, Santa Fe, NM, 3 to 4 March 1986 (Los Alamos National Laboratory, Los Alamos, NM, 1986).
2
R. Cook-Deegan, The Gene Wars: Science, Politics, and the Human Genome (Norton, New York, 1996).
3
Sanger F., et al., Nature 265, 687 (1977).
4
Seeburg P. H., et al., Trans. Assoc. Am. Physicians 90, 109 (1977).
5
Strauss E. C., Kobori J. A., Siu G., Hood L. E., Anal. Biochem. 154, 353 (1986).
6
Gocayne J., et al., Proc. Natl. Acad. Sci. U.S.A. 84, 8296 (1987).
7
Martin-Gallardo A., et al., DNA Sequence 3, 237 (1992);
McCombie W. R., et al., Nature Genet. 1, 348 (1992);
Jensen M. A., et al., DNA Sequence 1, 233 (1991).
8
Adams M. D., et al., Science 252, 1651 (1991).
9
Adams M. D., et al., Nature 355, 632 (1992);
Adams M. D., Kerlavage A. R., Fields C., Venter J. C., Nature Genet. 4, 256 (1993);
Adams M. D., Soares M. B., Kerlavage A. R., Fields C., Venter J. C., Nature Genet. 4, 373 (1993);
Polymeropoulos M. H., et al., Nature Genet. 4, 381 (1993);
Marra M., et al., Nature Genet. 21, 191 (1999).
10
Adams M. D., et al., Nature 377, 3 (1995);
White O., et al., Nucleic Acids Res. 21, 3829 (1993).
11
Sanger F., Coulson A. R., Hong G. F., Hill D. F., Petersen G. B., J. Mol. Biol. 162, 729 (1982).
12
Mahy B. W. J., Esposito J. J., Venter J. C., Am. Soc. Microbiol. News 57, 577 (1991).
13
Fleischmann R. D., et al., Science 269, 496 (1995).
14
Fraser C. M., et al., Science 270, 397 (1995).
15
Bult C. J., et al., Science 273, 1058 (1996);
Tomb J. F., et al., Nature 388, 539 (1997);
Klenk H. P., et al., Nature 390, 364 (1997).
16
Venter J. C., Smith H. O., Hood L., Nature 381, 364 (1996).
17
Schmitt H., et al., Genomics 33, 9 (1996).
18
Zhao S., et al., Genomics 63, 321 (2000).
19
Lin X., et al., Nature 402, 761 (1999).
20
Weber J. L., Myers E. W., Genome Res. 7, 401 (1997).
21
Green P., Genome Res. 7, 410 (1997).
22
Pennisi E., Science 280, 1185 (1998).
23
Venter J. C., et al., Science 280, 1540 (1998).
24
Adams M. D., et al., Nature 368, 474 (1994).
25
Marshall E., Pennisi E., Science 280, 994 (1998).
26
Adams M. D., et al., Science 287, 2185 (2000).
27
Rubin G. M., et al., Science 287, 2204 (2000).
28
Myers E. W., et al., Science 287, 2196 (2000).
29
Collins F. S., et al., Science 282, 682 (1998).
30
International Human Genome Sequencing Consortium, Nature 409, 860 (2001).
31
Institutional review board: P. Calabresi (chairman), H. P. Freeman, C. McCarthy, A. L. Caplan, G. D. Rogell, J. Karp, M. K. Evans, B. Margus, C. L. Carter, R. A. Millman, S. Broder.
32
Eligibility criteria for participation in the study were as follows: prospective donors had to be 21 years of age or older, not pregnant, and capable of giving an informed consent. Donors were asked to self-define their ethnic backgrounds. Standard blood bank screens (screening for HIV, hepatitis viruses, and so forth) were performed on all samples at the clinical laboratory prior to DNA extraction in the Celera laboratory. All samples that tested positive for transmissible viruses were ineligible and were discarded. Karyotype analysis was performed on peripheral blood lymphocytes from all samples selected for sequencing; all were normal. A two-staged consent process for prospective donors was employed. The first stage of the consent process provided information about the genome project, procedures, and risks and benefits of participating. The second stage of the consent process involved answering follow-up questions and signing consent forms, and was conducted about 48 hours after the first.
33
DNA was isolated from blood (173) or sperm. For sperm, a washed pellet (100 μl) was lysed in a suspension (1 ml) containing 0.1 M NaCl, 10 mM tris-Cl–20 mM EDTA (pH 8), 1% SDS, 1 mg proteinase K, and 10 mM dithiothreitol for 1 hour at 37°C. The lysate was extracted with aqueous phenol and with phenol/chloroform. The DNA was ethanol precipitated and dissolved in 1 ml TE buffer. To make genomic libraries, DNA was randomly sheared, end-polished with consecutive BAL31 nuclease and T4 DNA polymerase treatments, and size-selected by electrophoresis on 1% low-melting-point agarose. After ligation to Bst XI adapters (Invitrogen, catalog no. N408-18), DNA was purified by three rounds of gel electrophoresis to remove excess adapters, and the fragments, now with 3′-CACA overhangs, were inserted into Bst XI-linearized plasmid vector with 3′-TGTG overhangs. Libraries with three different average sizes of inserts were constructed: 2, 10, and 50 kbp. The 2-kbp fragments were cloned in a high-copy pUC18 derivative. The 10- and 50-kbp fragments were cloned in a medium-copy pBR322 derivative. The 2- and 10-kbp libraries yielded uniform-sized large colonies on plating. However, the 50-kbp libraries produced many small colonies and inserts were unstable. To remedy this, the 50-kbp libraries were digested with Bgl II, which does not cleave the vector, but generally cleaved several times within the 50-kbp insert. A 1264-bp Bam HI kanamycin resistance cassette (purified from pUCK4; Amersham Pharmacia, catalog no. 27-4958-01) was added and ligation was carried out at 37°C in the continual presence of Bgl II. As Bgl II–Bgl II ligations occurred, they were continually cleaved, whereas Bam HI–Bgl II ligations were not cleaved. A high yield of internally deleted circular library molecules was obtained in which the residual insert ends were separated by the kanamycin cassette DNA. 
The internally deleted libraries, when plated on agar containing ampicillin (50 μg/ml), carbenicillin (50 μg/ml), and kanamycin (15 μg/ml), produced relatively uniform large colonies. The resulting clones could be prepared for sequencing using the same procedures as clones from the 10-kbp libraries.
34
Transformed cells were plated on agar diffusion plates prepared with a fresh top layer containing no antibiotic poured on top of a previously set bottom layer containing excess antibiotic, to achieve the correct final concentration. This method of plating permitted the cells to develop antibiotic resistance before being exposed to antibiotic without the potential clone bias that can be introduced through liquid outgrowth protocols. After colonies had grown, QBot (Genetix, UK) automated colony-picking robots were used to pick colonies meeting stringent size and shape criteria and to inoculate 384-well microtiter plates containing liquid growth medium. Liquid cultures were incubated overnight, with shaking, and were scored for growth before passing to template preparation. Template DNA was extracted from liquid bacterial culture using a procedure based upon the alkaline lysis miniprep method (173) adapted for high throughput processing in 384-well microtiter plates. Bacterial cells were lysed; cell debris was removed by centrifugation; and plasmid DNA was recovered by isopropanol precipitation and resuspended in 10 mM tris-HCl buffer. Reagent dispensing operations were accomplished using Titertek MAP 8 liquid dispensing systems. Plate-to-plate liquid transfers were performed using Tomtec Quadra 384 Model 320 pipetting robots. All plates were tracked throughout processing by unique plate barcodes. Mated sequencing reads from opposite ends of each clone insert were obtained by preparing two 384-well cycle sequencing reaction plates from each plate of plasmid template DNA using ABI-PRISM BigDye Terminator chemistry (Applied Biosystems) and standard M13 forward and reverse primers. Sequencing reactions were prepared using the Tomtec Quadra 384-320 pipetting robot. 
Parent-child plate relationships and, by extension, forward-reverse sequence mate pairs were established by automated plate barcode reading by the onboard barcode reader and were recorded by direct LIMS communication. Sequencing reaction products were purified by alcohol precipitation and were dried, sealed, and stored at 4°C in the dark until needed for sequencing, at which time the reaction products were resuspended in deionized formamide and sealed immediately to prevent degradation. All sequence data were generated using a single sequencing platform, the ABI PRISM 3700 DNA Analyzer. Sample sheets were created at load time using a Java-based application that facilitates barcode scanning of the sequencing plate barcode, retrieves sample information from the central LIMS, and reserves unique trace identifiers. The application permitted a single sample sheet file in the linking directory and deleted previously created sample sheet files immediately upon scanning of a sample plate barcode, thus enhancing sample sheet-to-plate associations.
35
Sanger F., Nicklen S., Coulson A. R., Proc. Natl. Acad. Sci. U.S.A. 74, 5463 (1977);
Prober J. M., et al., Science 238, 336 (1987).
36
Celera's computing environment is based on Compaq Computer Corporation's Alpha system technology running the Tru64 Unix operating system. Celera uses these Alphas as Data Servers and as nodes in a Virtual Compute Farm (VCF), all of which are connected to a fully switched network operating at Fast Ethernet speed (for the VCF) and gigabit Ethernet speed (for data servers). Load balancing and scheduling software manages the submission and execution of jobs, based on central processing unit (CPU) speed, memory requirements, and priority. The Virtual Compute Farm is composed of 440 Alpha CPUs, including EV6 models running at a clock speed of 400 MHz and EV67 models running at 667 MHz. Available memory on these systems ranges from 2 GB to 8 GB. The VCF is used to manage trace file processing and annotation. Genome assembly was performed on a GS 160 running 16 EV67s (667 MHz) and 64 GB of memory, and 10 ES40s running 4 EV6s (500 MHz) and 32 GB of memory. A total of 100 terabytes of physical disk storage was included in a Storage Area Network that was available to systems across the environment. To ensure high availability, file and database servers were configured as 4-node Alpha TruClusters, so that services would fail over in the event of hardware or software failure. Data availability was further enhanced by using hardware- and software-based disk striping (RAID-0), disk mirroring (RAID-1), and disk striping with parity (RAID-5).
37
Trace processing generates quality values for base calls by means of Paracel's TraceTuner, trims sequence reads according to quality values, trims vector and adapter sequence from high-quality reads, and screens sequences for contaminants. Similar in design and algorithm to the phred program (174), TraceTuner reports quality values that reflect the log-odds score of each base being correct. Read quality was evaluated in 50-bp windows, each read being trimmed to include only those consecutive 50-bp segments with a minimum mean accuracy of 97%. End windows (both ends of the trace) of 1, 5, 10, 25, and 50 bases were trimmed to a minimum mean accuracy of 98%. Every read was further checked for vector and contaminant matches of 50 bp or more, and if found, the read was removed from consideration. Finally, any match to the 5′ vector splice junction in the initial part of a read was removed.
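The windowed trimming rule described in this note can be illustrated with a minimal sketch. It is not TraceTuner's actual algorithm: it assumes phred-style quality values (Q = −10 log10 of the error probability), non-overlapping 50-bp windows, and that the longest run of consecutive passing windows is kept. The function names are hypothetical, and the separate end-window trimming to 98% accuracy is not shown.

```python
from typing import List, Tuple


def accuracy(q: int) -> float:
    """Convert a phred-style quality value to the probability that the base is correct."""
    return 1.0 - 10 ** (-q / 10.0)


def trim_read(quals: List[int], window: int = 50, min_acc: float = 0.97) -> Tuple[int, int]:
    """Return (start, end) of the longest run of consecutive full windows
    whose mean accuracy is >= min_acc, or (0, 0) if no window passes."""
    best = (0, 0)
    run_start = None
    n_windows = len(quals) // window
    # One extra iteration acts as a sentinel that closes a run reaching the read end.
    for w in range(n_windows + 1):
        ok = False
        if w < n_windows:
            chunk = quals[w * window:(w + 1) * window]
            ok = sum(accuracy(q) for q in chunk) / window >= min_acc
        if ok and run_start is None:
            run_start = w * window
        elif not ok and run_start is not None:
            end = w * window
            if end - run_start > best[1] - best[0]:
                best = (run_start, end)
            run_start = None
    return best
```

For a read with a low-quality 50-bp head and tail around 150 bp of Q30 bases, the sketch keeps the middle three windows.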
38
National Center for Biotechnology Information (NCBI); available at www.ncbi.nlm.nih.gov/.
40
All bactigs over 3 kbp were examined for coverage by Celera mate pairs. An interval of a bactig was deemed an assembly error where there were no mate pairs spanning the interval and at least two reads that should have their mate on the other side of the interval but did not. In other words, there was no mate pair evidence supporting a join in the breakpoint interval and at least two mate pairs contradicting the join. By this criterion, we detected and broke apart bactigs at 13,037 locations, or equivalently, we found 2.13% of the bactigs to be misassembled.
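The breakpoint criterion in this note (no spanning mate pair, and at least two reads whose mates are missing from the far side) can be sketched as a simple predicate. This is an illustration under assumed data structures, not the authors' code: coordinates are positions along one bactig, `satisfied_pairs` holds mate pairs with both ends placed, and `unsatisfied_reads` holds (read position, expected mate position) for reads whose mate was not found where expected. All names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # two coordinates along the bactig


def is_misassembled(interval: Span,
                    satisfied_pairs: List[Span],
                    unsatisfied_reads: List[Span]) -> bool:
    """Apply the rule: no mate pair spans the interval, and at least two
    reads expect a mate on the other side of it but do not have one there."""
    a, b = interval
    # Evidence supporting the join: a placed mate pair bracketing the interval.
    spanning = any(min(l, r) <= a and max(l, r) >= b for l, r in satisfied_pairs)
    # Evidence against the join: reads whose expected mate position lies across the interval.
    contradicting = sum(
        1 for pos, expected in unsatisfied_reads
        if min(pos, expected) <= a and max(pos, expected) >= b
    )
    return (not spanning) and contradicting >= 2
```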
41
We considered a BAC entry to be chimeric if, by the Lander-Waterman statistic (175), the odds were 0.99 or more that the assembly we produced was inconsistent with the sequence coming from a single source. By this criterion, 714 or 2.2% of BAC entries were deemed chimeric.
42
Myers G., Selznick S., Zhang Z., Miller W., J. Comput. Biol. 3, 563 (1996).
43
E. W. Myers, J. L. Weber, in Computational Methods in Genome Research, S. Suhai, Ed. (Plenum, New York, 1996), pp. 73–89.
44
Deloukas P., et al., Science 282, 744 (1998).
45
Marra M. A., et al., Genome Res. 7, 1072 (1997).
46
J. Zhang et al., data not shown.
47
Shredded bactigs were located on long CSA scaffolds (>500 kbp) and the distribution of these fragments on the scaffolds was analyzed. If the spread of these fragments was greater than four times the reported BAC length, the BAC was considered to be chimeric. In addition, if >20% of the bactigs of a given BAC were found on different scaffolds that were not adjacent in map position, then the BAC was also considered to be chimeric. The total number of chimeric BACs divided by the number of BACs used for CSA gave the minimal estimate of the chimerism rate.
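The two chimerism tests in this note can be written as a short check. The sketch below rests on assumptions the note does not spell out: the spread is measured on the scaffold that holds most of the BAC's bactigs (called `primary` here), and map adjacency is supplied by the caller as a set of scaffold identifiers. The names and data layout are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple


def is_chimeric(placements: List[Tuple[str, int]],
                bac_length: int,
                adjacent: Set[str]) -> bool:
    """placements: (scaffold id, position) for each shredded bactig of one BAC.
    adjacent: scaffolds adjacent in map position to the primary scaffold."""
    counts = Counter(scaf for scaf, _ in placements)
    primary = counts.most_common(1)[0][0]
    # Test 1: fragments spread over more than four times the reported BAC length.
    positions = [pos for scaf, pos in placements if scaf == primary]
    if max(positions) - min(positions) > 4 * bac_length:
        return True
    # Test 2: more than 20% of bactigs on other, non-adjacent scaffolds.
    stray = sum(1 for scaf, _ in placements
                if scaf != primary and scaf not in adjacent)
    return stray / len(placements) > 0.20
```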
48
Hattori M., et al., Nature 405, 311 (2000).
49
Dunham I., et al., Nature 402, 489 (1999).
50
Carvalho A. B., Lazzaro B. P., Clark A. G., Proc. Natl. Acad. Sci. U.S.A. 97, 13239 (2000).
51
The International RH Mapping Consortium, available at www.ncbi.nlm.nih.gov/genemap99/.
53
Schuler G. D., Trends Biotechnol. 16, 456 (1998).
54
Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D. J., J. Mol. Biol. 215, 403 (1990).
55a
Olivier M., et al., Science 291, 1298 (2001).
56
Chaudhari N., Hahn W. E., Science 220, 924 (1983);
Milner R. J., Sutcliffe J. G., Nucleic Acids Res. 11, 5497 (1983).
57
Dickson D., Nature 401, 311 (1999).
58
Ewing B., Green P., Nature Genet. 25, 232 (2000).
59
Roest Crollius H., et al., Nature Genet. 25, 235 (2000).
60
M. Yandell, in preparation.
61
Pruitt K. D., Katz K. S., Sicotte H., Maglott D. R., Trends Genet. 16, 44 (2000).
62
Scaffolds containing greater than 10 kbp of sequence were analyzed for features of biological importance through a series of computational steps, and the results were stored in a relational database. For scaffolds greater than one megabase, the sequence was cut into single megabase pieces before computational analysis. All sequence was masked for complex repeats using RepeatMasker (52) before gene finding or homology-based analysis. The computational pipeline required ∼7 hours of CPU time per megabase, including repeat masking, or a total compute time of about 20,000 CPU hours. Protein searches were performed against the nonredundant protein database available at the NCBI. Nucleotide searches were performed against human, mouse, and rat Celera Gene Indices (assemblies of cDNA and EST sequences), mouse genomic DNA reads generated at Celera (3×), the Ensembl gene database available at the European Bioinformatics Institute (EBI), human and rodent (mouse and rat) EST data sets parsed from the dbEST database (NCBI), and a curated subset of the RefSeq experimental mRNA database (NCBI). Initial searches were performed on repeat-masked sequence with BLAST 2.0 (54) optimized for the Compaq Alpha compute-server and an effective database size of $3 \times 10^{9}$ for BLASTN searches and $1 \times 10^{9}$ for BLASTX searches. Additional processing of each query-subject pair was performed to improve the alignments. All protein BLAST results having an expectation score of $<1 \times 10^{-4}$, human nucleotide BLAST results having an expectation score of $<1 \times 10^{-8}$ with >94% identity, and rodent nucleotide BLAST results having an expectation score of $<1 \times 10^{-8}$ with >80% identity were then examined on the basis of their high-scoring pair (HSP) coordinates on the scaffold to remove redundant hits, retaining hits that supported possible alternative splicing. For BLASTX searches, analysis was performed separately for selected model organisms (yeast, mouse, human, C. elegans, and D. melanogaster) so as not to exclude HSPs from these organisms that support the same gene structure. Sequences producing BLAST hits judged to be informative, nonredundant, and sufficiently similar to the scaffold sequence were then realigned to the genomic sequence with Sim4 for ESTs, and with Lap for proteins. Because both of these algorithms take splicing into account, the resulting alignments usually give a better representation of intron-exon boundaries than standard BLAST analyses and thus facilitate further annotation (both machine and human). In addition to the homology-based analysis described above, three ab initio gene prediction programs were used (63).
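The expectation and identity cutoffs quoted in this note amount to a per-category filter on BLAST hits. The sketch below only restates those thresholds; the `Hit` record and the category labels are hypothetical, and the rodent expectation cutoff is read as 1 × 10^-8 (a negative exponent, matching the human nucleotide cutoff). Redundancy removal by HSP coordinates is not shown.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    kind: str        # "protein", "human_nt", or "rodent_nt" (hypothetical labels)
    evalue: float    # BLAST expectation score
    identity: float  # percent identity, 0 to 100


def passes(hit: Hit) -> bool:
    """Return True if the hit meets the cutoff stated for its category."""
    if hit.kind == "protein":
        return hit.evalue < 1e-4
    if hit.kind == "human_nt":
        return hit.evalue < 1e-8 and hit.identity > 94.0
    if hit.kind == "rodent_nt":
        return hit.evalue < 1e-8 and hit.identity > 80.0
    raise ValueError(f"unknown hit category: {hit.kind}")
```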
63
Uberbacher E. C., Xu Y., Mural R. J., Methods Enzymol. 266, 259 (1996);
Burge C., Karlin S., J. Mol. Biol. 268, 78 (1997);
Mural R. J., Methods Enzymol. 303, 77 (1999);
Salamov A. A., Solovyev V. V., Genome Res. 10, 516 (2000);
Florea L., et al., Genome Res. 8, 967 (1998).
64
Miklos G. L., John B., Am. J. Hum. Genet. 31, 264 (1979);
Francke U., Cytogenet. Cell Genet. 65, 206 (1994).
65
P. E. Warburton, H. F. Willard, in Human Genome Evolution, M. S. Jackson, T. Strachan, G. Dover, Eds. (BIOS Scientific, Oxford, 1996), pp. 121–145.
66
Horvath J. E., Schwartz S., Eichler E. E., Genome Res. 10, 839 (2000).
67
Bickmore W. A., Sumner A. T., Trends Genet. 5, 144 (1989).
68
Holmquist G. P., Am. J. Hum. Genet. 51, 17 (1992).
69
Bernardi G., Gene 241, 3 (2000).
70
Zoubak S., Clay O., Bernardi G., Gene 174, 95 (1996).
71
Ohno S., Trends Genet. 1, 160 (1985).
72
Broman K. W., Murray J. C., Sheffield V. C., White R. L., Weber J. L., Am. J. Hum. Genet. 63, 861 (1998).
73
McEachern M. J., Krauskopf A., Blackburn E. H., Annu. Rev. Genet. 34, 331 (2000).
74
Bird A., Trends Genet. 3, 342 (1987).
75
Gardiner-Garden M., Frommer M., J. Mol. Biol. 196, 261 (1987).
76
Larsen F., Gundersen G., Lopez R., Prydz H., Genomics 13, 1095 (1992).
77
Cross S. H., Bird A., Curr. Opin. Genet. Dev. 5, 309 (1995).
78
Peters J., Genome Biol. 1, reviews1028.1 (2000).
79
Grunau C., Hindermann W., Rosenthal A., Hum. Mol. Genet. 9, 2651 (2000).
80
Antequera F., Bird A., Proc. Natl. Acad. Sci. U.S.A. 90, 11995 (1993).
81
Cross S. H., et al., Mamm. Genome 11, 373 (2000).
82
Slavov D., et al., Gene 247, 215 (2000).
83
Smit A. F., Riggs A. D., Nucleic Acids Res. 23, 98 (1995).
84
Elliott D. J., et al., Hum. Mol. Genet. 9, 2117 (2000).
85
Makeyev A. V., Chkheidze A. N., Liebhaber S. A., J. Biol. Chem. 274, 24849 (1999).
86
Pan Y., Decker W. K., Huq A. H. H. M., Craigen W. J., Genomics 59, 282 (1999).
87
Nouvel P., Genetica 93, 191 (1994).
88
Goncalves I., Duret L., Mouchiroud D., Genome Res. 10, 672 (2000).
89
Lek first compares all proteins in the proteome to one another. Next, the resulting BLAST reports are parsed, and a graph is created wherein each protein constitutes a node; any hit between two proteins with an expectation beneath a user-specified threshold constitutes an edge. Lek then uses this graph to compute a similarity between each protein pair ij in the context of the graph as a whole by simply dividing the number of BLAST hits shared in common between the two proteins by the total number of proteins hit by i and j. This simple metric has several interesting properties. First, because the similarity metric takes into account both the similarity and the differences between the two sequences at the level of BLAST hits, the metric respects the multidomain nature of protein space. Two multidomain proteins, for instance, each containing domains A and B, will have a greater pairwise similarity to each other than either one will have to a protein containing only A or B domains, so long as A-B–containing multidomain proteins are less frequent in the proteome than are single-domain proteins containing A or B domains. A second interesting property of this similarity metric is that it can be used to produce a similarity matrix for the proteome as a whole without having to first produce a multiple alignment for each protein family, an error-prone and very time-consuming process. Finally, the metric does not require that either sequence have significant homology to the other in order to have a defined similarity to each other, only that they share at least one significant BLAST hit in common. This is an especially interesting property of the metric, because it allows the rapid recovery of protein families from the proteome for which no multiple alignment is possible, thus providing a computational basis for the extension of protein homology searches beyond those of current HMM- and profile-based search methods. 
Once the whole-proteome similarity matrix has been calculated, Lek first partitions the proteome into single-linkage clusters (27) on the basis of one or more shared BLAST hits between two sequences. Next, these single-linkage clusters are further partitioned into subclusters, each member of which shares a user-specified pairwise similarity with the other members of the cluster, as described above. For the purposes of this publication, we have focused on the analysis of single-linkage clusters and what we have termed “complete clusters,” i.e., those subclusters for which every member has a similarity metric of 1 to every other member of the subcluster. We believe that the single-linkage and complete clusters are of special interest, in part, because they allow us to estimate and to compare sizes of core protein sets in a rigorous manner. The rationale for this is as follows: if one imagines for a moment a perfect clustering algorithm capable of perfectly partitioning one or more perfectly annotated protein sets into protein families, it is reasonable to assume that the number of clusters will always be greater than, or equal to, the number of single-linkage clusters, because single-linkage clustering is a maximally agglomerative clustering method. Thus, if there exists a single protein in the predicted protein set containing domains A and B, then it will be clustered by single linkage together with all single-domain proteins containing domains A or B. Likewise, for a predicted protein set containing a single multidomain protein, the number of real clusters must always be less than or equal to the number of complete clusters, because it is impossible to place a unique multidomain protein into a complete cluster. Thus, the single-linkage and complete clusters plus singletons should comprise a lower and upper bound of sizes of core protein sets, respectively, allowing us to compare the relative size and complexity of different organisms' predicted protein sets.
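The similarity metric and the single-linkage step described in this note can be sketched in a few lines. This is an illustration of the stated definitions, not the Lek implementation: it assumes each protein's BLAST hits (below the chosen expectation threshold) are already collected into a set, that a protein's hit set includes itself, and that "the total number of proteins hit by i and j" means the size of the union of the two sets. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def lek_similarity(hits: Dict[str, Set[str]], i: str, j: str) -> float:
    """Shared BLAST hits divided by all proteins hit by i or j."""
    total = hits[i] | hits[j]
    return len(hits[i] & hits[j]) / len(total) if total else 0.0


def single_linkage(hits: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group proteins that share one or more BLAST hits, transitively."""
    parent = {p: p for p in hits}

    def find(p: str) -> str:
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, j in combinations(hits, 2):
        if hits[i] & hits[j]:
            parent[find(i)] = find(j)

    clusters: Dict[str, Set[str]] = {}
    for p in hits:
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())
```

With a two-domain protein AB that hits single-domain proteins A1 and B1, single linkage places all three in one cluster even though A1 and B1 do not hit each other directly, which is the agglomerative behavior the note relies on for its lower bound.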
90
Smith T. F., Waterman M. S., J. Mol. Biol. 147, 195 (1981).
91
Delcher A. L., et al., Nucleic Acids Res. 27, 2369 (1999).
92
Arabidopsis Genome Initiative, Nature 408, 796 (2000).
93
The probability that a contiguous set of proteins is the result of a segmental duplication can be estimated approximately as follows. Given that proteins A and B occur on one chromosome, and that A′ and B′ (paralogs of A and B) also exist in the genome, the probability that B′ occurs immediately after A′ is $1/N$, where $N$ is the number of proteins in the set (for this analysis, $N$ = 26,588). Allowing for B′ to occur as any of the next $J - 1$ proteins (leaving a gap between A′ and B′) increases the probability to $(J - 1)/N$; allowing B′A′ or A′B′ gives a probability of $2(J - 1)/N$. Considering three genes ABC, the probability of observing A′B′C′ elsewhere in the genome, given that the paralogs exist, is $1/N^{2}$. Three proteins can occur across a spread of five positions in six ways; more generally, we compute the number of ways that $K$ proteins can be spread across $J$ positions by counting all possible arrangements of $K - 2$ proteins in the $J - 2$ positions between the first and last protein. Allowing for a spread to vary from $K$ positions (no gaps) to $J$ gives $L = \sum_{X=K-2}^{J-2} \binom{X}{K-2}$ arrangements. Thus, the probability of chance occurrence is $L/N^{K-1}$. Allowing for both sets of genes (e.g., ABC and A′B′C′) to be spread across $J$ positions increases this to $L^{2}/N^{K-1}$. The duplicated segment might be rearranged by the operations of reversal or translocation; allowing for $M$ such rearrangements gives us a probability $P = L^{2}M/N^{K-1}$. For example, the probability of observing a duplicated set of three genes in two different locations, where the three genes occur across a spread of five positions in both locations, is $36/N^{2}$; the expected number of such matched sets in the predicted protein set is approximately $N \cdot 36/N^{2} = 36/N$, a value $\ll 1$. Therefore, any such duplications of three genes are unlikely to result from random rearrangements of the genome. If any of the genes occur in more than two copies, the probability that the apparent duplication has occurred by chance increases.
The algorithm for selecting candidate duplications only generates matched protein sets with $P \ll 1$.
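The worked example in note 93 (three genes, a spread of five positions, N = 26,588) can be checked numerically. The sketch below counts the arrangements of K proteins across spreads of K to J positions and evaluates P as the squared count times M, divided by N to the power K − 1. The function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def arrangements(k, j):
    """Ways to place k proteins across a spread of k..j positions.

    For each number x of interior positions (k-2 .. j-2), choose which
    k-2 of them hold the proteins between the first and the last.
    """
    return sum(comb(x, k - 2) for x in range(k - 2, j - 1))


def chance_probability(k, j, n, m=1):
    """P = L^2 * M / N^(K-1): both gene sets may be spread over j positions."""
    l = arrangements(k, j)
    return l ** 2 * m / n ** (k - 1)


N = 26588  # number of proteins in the set, as given in the note

ways = arrangements(3, 5)          # three proteins, spread of up to five
p = chance_probability(3, 5, N)    # equals 36 / N**2 when M = 1
expected_sets = N * p              # equals 36 / N

print(ways, p, expected_sets)
```

For K = 2 the same count reduces to J − 1, which agrees with the two-gene case at the start of the note.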
94
Trask B. J., et al., Hum. Mol. Genet. 7, 13 (1998);
Sharon D., et al., Genomics 61, 24 (1999).
95
Barbazuk W. B., et al., Genome Res. 10, 1351 (2000);
McLysaght A., Enright A. J., Skrabanek L., Wolfe K. H., Yeast 17, 22 (2000);
Burt D. W., et al., Nature 402, 411 (1999).
96
Reviewed in
Skrabanek L., Wolfe K. H., Curr. Opin. Genet. Dev. 8, 694 (1998).
97
Taillon-Miller P., Gu Z., Li Q., Hillier L., Kwok P. Y., Genome Res. 8, 748 (1998);
Taillon-Miller P., Piernot E. E., Kwok P. Y., Genome Res. 9, 499 (1999).
98
Altshuler D., et al., Nature 407, 513 (2000).
99
Marth G. T., et al., Nature Genet. 23, 452 (1999).
100
W.-H. Li, Molecular Evolution (Sinauer, Sunderland, MA, 1997).
101
Cargill M., et al., Nature Genet. 22, 231 (1999).
102
Halushka M. K., et al., Nature Genet. 22, 239 (1999).
103
Zhang J., Madden T. L., Genome Res. 7, 649 (1997).
104
M. Nei, Molecular Evolutionary Genetics (Columbia Univ. Press, New York, 1987).
105
From the observed coverage of the sequences at each site for each individual, we calculated the probability that a SNP would be detected at the site if it were present. For each level of coverage, there is a binomial sampling of the two homologs for each individual, and a heterozygous site could only be ascertained if both homologs are present, or if two alleles from different individuals are present. With coverage $x$ from a given individual, both homologs are present in the assembly with probability $1 - (1/2)^{x-1}$. Even if both homologs are present, the probability that a SNP is detected is <1 because a fraction of sites failed the quality criteria. Integrating over coverage levels, the binomial sampling, and the quality distribution, we derived an expected number of sites in the genome that were ascertained for polymorphism for each individual. The nucleotide diversity was then the observed number of variable sites divided by the expected number of sites ascertained.
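The coverage term in this note is easy to tabulate. The sketch below evaluates only the probability that both homologs of one individual are sampled at coverage x, that is, 1 minus one half raised to the power x − 1; it does not model the quality criteria or the integration over coverage levels. The function name is hypothetical.

```python
def prob_both_homologs(x):
    """Probability that x reads from one individual sample both homologs.

    Each read comes from either homolog with probability 1/2, so all x
    reads come from the same homolog with probability 2 * (1/2)**x.
    """
    if x < 1:
        return 0.0
    return 1.0 - 0.5 ** (x - 1)


for coverage in range(1, 6):
    print(coverage, prob_both_homologs(coverage))
```

A single read can never reveal a heterozygous site within one individual, and the probability approaches 1 quickly as coverage grows.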
106
Nachman M. W., Bauer V. L., Crowell S. L., Aquadro C. F., Genetics 150, 1133 (1998).
107
D. A. Nickerson et al., Nature Genet. 19, 233 (1998);
Nickerson D. A., et al., Genome Res. 10, 1532 (2000);
Jorde L., et al., Am. J. Hum. Genet. 66, 979 (2000);
Wang D. G., et al., Science 280, 1077 (1998).
108
Przeworski M., Hudson R. R., Di Rienzo A., Trends Genet. 16, 296 (2000).
109
Tavare S., Theor. Popul. Biol. 26, 119 (1984).
110
R. R. Hudson, in Oxford Surveys in Evolutionary Biology, D. J. Futuyma, J. D. Antonovics, Eds. (Oxford Univ. Press, Oxford, 1990), vol. 7, pp. 1–44.
111
Clark A. G., et al., Am. J. Hum. Genet. 63, 595 (1998).
112
M. Kimura, The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, Cambridge, 1983).
113
Kaessmann H., Heissig F., von Haeseler A., Paabo S., Nature Genet. 22, 78 (1999).
114
Sonnhammer E. L., Eddy S. R., Durbin R., Proteins 28, 405 (1997).
115
Bateman A., et al., Nucleic Acids Res. 28, 263 (2000).
116
Brief description of the methods used to build the Panther classification. First, the June 2000 release of the GenBank NR protein database (excluding sequences annotated as fragments or mutants) was partitioned into clusters using BLASTP. For the clustering, a seed sequence was randomly chosen, and the cluster was defined as all sequences matching the seed to statistical significance (E-value < $10^{-5}$) and “globally” alignable (the length of the match region must be >70% and <130% of the length of the seed). If the cluster had more than five members, and at least one from a multicellular eukaryote, the cluster was extended. For the extension step, a hidden Markov model (HMM) was trained for the cluster, using the SAM software package, version 2. The HMM was then scored against GenBank NR (excluding mutants but including fragments for this step), and all sequences scoring better than a specific (NLL-NULL) score were added to the cluster. The HMM was then retrained (with fixed model length) and all sequences in the cluster were aligned to the HMM to produce a multiple sequence alignment. This alignment was assessed by a number of quality measures. If the alignment failed the quality check, the initial cluster was rebuilt around the seed using a more restrictive E-value, followed by extension, alignment, and reassessment. This process was repeated until the alignment quality was good. The multiple alignment and “general” (i.e., describing the entire cluster, or “family”) HMM (176) were then used as input into the BETE program (177). BETE calculates a phylogenetic tree for the sequences in the alignment. Functional information about the sequences in each cluster was parsed from SwissProt (178) and GenBank records. “Tree-attribute viewer” software was used by biologist curators to correlate the phylogenetic tree with protein function. Subfamilies were manually defined on the basis of shared function across subtrees, and were named accordingly.
HMMs were then built for each subfamily, using information from both the subfamily and family (K. Sjölander, in preparation). Families were also manually named according to the functions contained within them. Finally, all of the families and subfamilies were classified into categories and subcategories based on their molecular functions. The categorization was done by manual review of the family and subfamily names, by examining SwissProt and GenBank records, and by review of the literature as well as resources on the World Wide Web. The current version (2.0) of the Panther molecular function schema has four levels: category, subcategory, family, and subfamily. Protein sequences for whole eukaryotic genomes (for the predicted human proteins and annotated proteins for fly, worm, yeast, and Arabidopsis) were scored against the Panther library of family and subfamily HMMs. If the score was significant (the NLL-NULL score cutoff depends on the protein family), the protein was assigned to the family or subfamily function with the most significant score.
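The seed-clustering rules at the start of note 116 can be expressed as two simple predicates. This is only a sketch of the stated thresholds (E-value below 1e-5, match length strictly between 70% and 130% of the seed length, more than five members with at least one from a multicellular eukaryote); it does not reproduce the Panther pipeline, and the function names, field names, and example values are hypothetical.

```python
def joins_seed_cluster(e_value, match_length, seed_length, max_e=1e-5):
    """True if a BLASTP match is significant and 'globally' alignable."""
    ratio = match_length / seed_length
    return e_value < max_e and 0.70 < ratio < 1.30


def should_extend(cluster):
    """True if the cluster qualifies for the HMM extension step.

    `cluster` is a list of dicts with a boolean 'multicellular_eukaryote'
    field (a hypothetical record layout used only for this example).
    """
    has_multicellular = any(m["multicellular_eukaryote"] for m in cluster)
    return len(cluster) > 5 and has_multicellular


# Hypothetical matches against a seed of length 400.
print(joins_seed_cluster(1e-20, 380, 400))  # significant, similar length
print(joins_seed_cluster(1e-20, 200, 400))  # significant, but only 50% of seed
print(joins_seed_cluster(1e-3, 390, 400))   # right length, not significant

cluster = [{"multicellular_eukaryote": i == 0} for i in range(6)]
print(should_extend(cluster))
```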
117
Ponting C. P., Schultz J., Milpetz F., Bork P., Nucleic Acids Res. 27, 229 (1999).
118
A. Goffeau et al., Science 274, 546, 563 (1996).
119
C. elegans Sequencing Consortium, Science 282, 2012 (1998).
120
S. A. Chervitz et al., Science 282, 2022 (1998).
121
E. R. Kandel, J. H. Schwartz, T. Jessell, Principles of Neural Science (McGraw-Hill, New York, ed. 4, 2000).
122
Goodenough D. A., Goliger J. A., Paul D. L., Annu. Rev. Biochem. 65, 475 (1996).
123
Wilkinson D. G., Int. Rev. Cytol. 196, 177 (2000).
124
Nakamura F., Kalb R. G., Strittmatter S. M., J. Neurobiol. 44, 219 (2000).
125
Horner P. J., Gage F. H., Nature 407, 963 (2000);
Casaccia-Bonnefil P., Gu C., Chao M. V., Adv. Exp. Med. Biol. 468, 275 (1999).
126
Wang S., Barres B. A., Neuron 27, 197 (2000).
127
Geppert M., Sudhof T. C., Annu. Rev. Neurosci. 21, 75 (1998);
Littleton J. T., Bellen H. J., Trends Neurosci. 18, 177 (1995).
128
Maximov A., Sudhof T. C., Bezprozvanny I., J. Biol. Chem. 274, 24453 (1999).
129
B. Sampo et al., Proc. Natl. Acad. Sci. U.S.A. 97, 3666 (2000).
130
Lemke G., Glia 7, 263 (1993).
131
M. Bernfield et al., Annu. Rev. Biochem. 68, 729 (1999).
132
Perrimon N., Bernfield M., Nature 404, 725 (2000).
133
Lindahl U., Kusche-Gullberg M., Kjellen L., J. Biol. Chem. 273, 24979 (1998).
134
J. L. Riechmann et al., Science 290, 2105 (2000).
135
Hurskainen T. L., Hirohata S., Seldin M. F., Apte S. S., J. Biol. Chem. 274, 25555 (1999).
136
Black R. A., White J. M., Curr. Opin. Cell Biol. 10, 654 (1998).
137
Aravind L., Dixit V. M., Koonin E. V., Trends Biochem. Sci. 24, 47 (1999).
138
A. G. Uren et al., Mol. Cell 6, 961 (2000).
139
Garcia-Meunier P., Etienne-Julan M., Fort P., Piechaczyk M., Bonhomme F., Mamm. Genome 4, 695 (1993).
140
K. Meyer-Siegler et al., Proc. Natl. Acad. Sci. U.S.A. 88, 8460 (1991).
141
Mansur N. R., Meyer-Siegler K., Wurzer J. C., Sirover M. A., Nucleic Acids Res. 21, 993 (1993).
142
Tatton N. A., Exp. Neurol. 166, 29 (2000).
143
Kenmochi N., et al., Genome Res. 8, 509 (1998).
144
Chen F. W., Ioannou Y. A., Int. Rev. Immunol. 18, 429 (1999).
145
Madsen H. O., Poulsen K., Dahl O., Clark B. F., Hjorth J. P., Nucleic Acids Res. 18, 1513 (1990).
146
Chambers D. M., Peters J., Abbott C. M., Proc. Natl. Acad. Sci. U.S.A. 95, 4463 (1998);
Khalyfa A., Carlson B. M., Carlson J. A., Wang E., Dev. Dyn. 216, 267 (1999).
147
Aeschlimann D., Thomazy V., Connect. Tissue Res. 41, 1 (2000).
148
Munroe P., et al., Nature Genet. 21, 142 (1999);
Wu S. M., Cheung W. F., Frazier D., Stafford D. W., Science 254, 1634 (1991);
Furie B., et al., Blood 93, 1798 (1999).
149
Kehoe J. W., Bertozzi C. R., Chem. Biol. 7, R57 (2000).
150
Pawson T., Nash P., Genes Dev. 14, 1027 (2000).
151
van der Velden A. W., Thomas A. A., Int. J. Biochem. Cell Biol. 31, 87 (1999).
152
Fraser C. M., et al., Science 281, 375 (1998);
Tettelin H., et al., Science 287, 1809 (2000).
153
Brett D., et al., FEBS Lett. 474, 83 (2000).
154
Muller H. J., Kern H., Z. Naturforsch. B 22, 1330 (1967).
155
H. J. Muller, in Heritage from Mendel, R. A. Brink, Ed. (Univ. of Wisconsin Press, Madison, WI, 1967), p. 419.
156
J. F. Crow, M. Kimura, Introduction to Population Genetics Theory (Harper & Row, New York, 1970).
157
K. Kobayashi et al., Nature 394, 388 (1998).
158
Feinberg A. P., Curr. Top. Microbiol. Immunol. 249, 87 (2000).
159
Collins C. A., Guthrie C., Nature Struct. Biol. 7, 850 (2000).
160
Eddy S. R., Curr. Opin. Genet. Dev. 9, 695 (1999).
161
Wang Q., Khillan J., Gadue P., Nishikura K., Science 290, 1765 (2000).
162
Holcik M., Sonenberg N., Korneluk R. G., Trends Genet. 16, 469 (2000).
163
McKinsey T. A., Zhang C. L., Lu J., Olson E. N., Nature 408, 106 (2000).
164
Capanna E., Romanini M. G. M., Caryologia 24, 471 (1971).
165
Maynard Smith J., J. Theor. Biol. 128, 247 (1987).
166
Charlesworth D., Charlesworth B., Morgan M. T., Genetics 141, 1619 (1995).
167
Bailey J. E., Nature Biotechnol. 17, 616 (1999).
168
Maleszka R., de Couet H. G., Miklos G. L., Proc. Natl. Acad. Sci. U.S.A. 95, 3731 (1998).
169
Miklos G. L., J. Neurobiol. 24, 842 (1993).
170
Crutchfield J. P., Young K., Phys. Rev. Lett. 63, 105 (1989);
Gell-Mann M., Lloyd S., Complexity 2, 44 (1996).
171
Barabasi A. L., Albert R., Science 286, 509 (1999).
172
Colucci-Guyon E., et al., Cell 79, 679 (1994).
173
J. Sambrook, E. F. Fritsch, T. Maniatis, Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, ed. 2, 1989).
174
Ewing B., Green P., Genome Res. 8, 186 (1998);
Ewing B., Hillier L., Wendl M. C., Green P., Genome Res. 8, 175 (1998).
175
Lander E. S., Waterman M. S., Genomics 2, 231 (1988).
176
Krogh A., Sjölander K., J. Mol. Biol. 235, 1501 (1994).
177
Sjölander K., Proc. Int. Soc. Mol. Biol. 6, 165 (1998).
178
Bairoch A., Apweiler R., Nucleic Acids Res. 28, 45 (2000).
180
Tatusov R. L., Galperin M. Y., Natale D. A., Koonin E. V., Nucleic Acids Res. 28, 33 (2000).
181
We thank E. Eichler and J. L. Goldstein for many helpful discussions and critical reading of the manuscript, and A. Caplan for advice and encouragement. We also thank T. Hein, D. Lucas, G. Edwards, and the Celera IT staff for outstanding computational support. The cost of this project was underwritten by the Celera Genomics Group of the Applera Corporation. We thank the Board of Directors of Applera Corporation: J. F. Abely Jr. (retired), R. H. Ayers, J.-L. Bélingard, R. H. Hayes, A. J. Levine, T. E. Martin, C. W. Slayman, O. R. Smith, G. C. St. Laurent Jr., and J. R. Tobin for their vision, enthusiasm, and unwavering support and T. L. White for leadership and advice. Data availability: The genome sequence and additional supporting information are available to academic scientists at the Web site (www.celera.com). Instructions for obtaining a DVD of the genome sequence can be obtained through the Web site. For commercial scientists wishing to verify the results presented here, the genome data are available upon signing a Material Transfer Agreement, which can also be found on the Web site.

Published In

Science
Volume 291 | Issue 5507
16 February 2001

Submission history

Received: 5 December 2000
Accepted: 19 January 2001
Published in print: 16 February 2001

Authors

Affiliations

J. Craig Venter*
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Mark D. Adams
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Eugene W. Myers
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Peter W. Li
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Richard J. Mural
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Granger G. Sutton
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Hamilton O. Smith
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Mark Yandell
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Cheryl A. Evans
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Robert A. Holt
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jeannine D. Gocayne
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Peter Amanatides
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Richard M. Ballew
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Daniel H. Huson
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jennifer Russo Wortman
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Qing Zhang
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Chinnappa D. Kodira
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Xiangqun H. Zheng
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Lin Chen
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Marian Skupski
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Gangadharan Subramanian
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Paul D. Thomas
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jinghui Zhang
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
George L. Gabor Miklos
GenetixXpress, 78 Pacific Road, Palm Beach, Sydney 2108, Australia.
Catherine Nelson
Berkeley Drosophila Genome Project, University of California, Berkeley, CA 94720, USA.
Samuel Broder
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Andrew G. Clark
Department of Biology, Penn State University, 208 Mueller Lab, University Park, PA 16802, USA.
Joe Nadeau
Department of Genetics, Case Western Reserve University School of Medicine, BRB-630, 10900 Euclid Avenue, Cleveland, OH 44106, USA.
Victor A. McKusick
Johns Hopkins University School of Medicine, Johns Hopkins Hospital, 600 North Wolfe Street, Blalock 1007, Baltimore, MD 21287–4922, USA.
Norton Zinder
Rockefeller University, 1230 York Avenue, New York, NY 10021–6399, USA.
Arnold J. Levine
Rockefeller University, 1230 York Avenue, New York, NY 10021–6399, USA.
Richard J. Roberts
New England BioLabs, 32 Tozer Road, Beverly, MA 01915, USA.
Mel Simon
Division of Biology, 147-75, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125, USA.
Carolyn Slayman
Yale University School of Medicine, 333 Cedar Street, P.O. Box 208000, New Haven, CT 06520–8000, USA.
Michael Hunkapiller
Applied Biosystems, 850 Lincoln Centre Drive, Foster City, CA 94404, USA.
Randall Bolanos
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Arthur Delcher
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ian Dew
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Daniel Fasulo
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Michael Flanigan
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Liliana Florea
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Aaron Halpern
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Sridhar Hannenhalli
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Saul Kravitz
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Samuel Levy
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Clark Mobarry
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Knut Reinert
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Karin Remington
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jane Abu-Threideh
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ellen Beasley
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Kendra Biddick
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Vivien Bonazzi
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Rhonda Brandon
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Michele Cargill
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ishwar Chandramouliswaran
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Rosane Charlab
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Kabir Chaturvedi
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Zuoming Deng
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Valentina Di Francesco
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Patrick Dunn
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Karen Eilbeck
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Carlos Evangelista
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Andrei E. Gabrielian
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Weiniu Gan
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Wangmao Ge
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Fangcheng Gong
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Zhiping Gu
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ping Guan
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Thomas J. Heiman
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Maureen E. Higgins
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Rui-Ru Ji
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Zhaoxi Ke
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Karen A. Ketchum
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Zhongwu Lai
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Yiding Lei
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Zhenya Li
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jiayin Li
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Yong Liang
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Xiaoying Lin
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Fu Lu
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Gennady V. Merkulov
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Natalia Milshina
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Helen M. Moore
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ashwinikumar K Naik
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Vaibhav A. Narayan
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Beena Neelam
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Deborah Nusskern
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Douglas B. Rusch
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Steven Salzberg
The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
Wei Shao
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Bixiong Shue
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jingtao Sun
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Zhen Yuan Wang
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Aihui Wang
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Xin Wang
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jian Wang
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ming-Hui Wei
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ron Wides
Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, 52900 Israel.
Chunlin Xiao
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Chunhua Yan
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Alison Yao
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jane Ye
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ming Zhan
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Weiqing Zhang
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Hongyu Zhang
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Qi Zhao
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Liansheng Zheng
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Fei Zhong
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Wenyan Zhong
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Shiaoping C. Zhu
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Shaying Zhao
The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
Dennis Gilbert
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Suzanna Baumhueter
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Gene Spier
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Christine Carter
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Anibal Cravchik
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Trevor Woodage
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Feroze Ali
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Huijin An
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Aderonke Awe
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Danita Baldwin
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Holly Baden
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Mary Barnstead
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ian Barrow
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Karen Beeson
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Dana Busam
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Amy Carver
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Angela Center
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ming Lai Cheng
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Liz Curry
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Steve Danaher
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Lionel Davenport
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Raymond Desilets
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Susanne Dietz
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Kristina Dodson
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Lisa Doup
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Steven Ferriera
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Neha Garg
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Andres Gluecksmann
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Brit Hart
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jason Haynes
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Charles Haynes
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Cheryl Heiner
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Suzanne Hladun
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Damon Hostin
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jarrett Houck
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Timothy Howland
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Chinyere Ibegwam
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jeffery Johnson
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Francis Kalush
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Lesley Kline
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Shashi Koduru
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Amy Love
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Felecia Mann
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
David May
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Steven McCawley
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Tina McIntosh
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ivy McMullen
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Mee Moy
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Linda Moy
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Brian Murphy
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Keith Nelson
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Cynthia Pfannkoch
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Eric Pratts
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Vinita Puri
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Hina Qureshi
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Matthew Reardon
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Robert Rodriguez
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Yu-Hui Rogers
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Deanna Romblad
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Bob Ruhfel
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Richard Scott
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Cynthia Sitter
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Michelle Smallwood
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Erin Stewart
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Renee Strong
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ellen Suh
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Reginald Thomas
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ni Ni Tint
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Sukyee Tse
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Claire Vech
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Gary Wang
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jeremy Wetter
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Sherita Williams
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Monica Williams
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Sandra Windsor
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Emily Winn-Deen
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Keriellen Wolfe
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jayshree Zaveri
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Karena Zaveri
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Josep F. Abril
Grup de Recerca en Informàtica Mèdica, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, 08003-Barcelona, Catalonia, Spain.
Roderic Guigó
Grup de Recerca en Informàtica Mèdica, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, 08003-Barcelona, Catalonia, Spain.
Michael J. Campbell
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Kimmen V. Sjolander
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Brian Karlak
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Anish Kejariwal
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Huaiyu Mi
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Betty Lazareva
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Thomas Hatton
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Apurva Narechania
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Karen Diemer
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Anushya Muruganujan
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Nan Guo
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Shinji Sato
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Vineet Bafna
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Sorin Istrail
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ross Lippert
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Russell Schwartz
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Brian Walenz
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Shibu Yooseph
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
David Allen
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Anand Basu
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
James Baxendale
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Louis Blick
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Marcelo Caminha
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
John Carnes-Stine
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Parris Caulk
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Yen-Hui Chiang
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
My Coyne
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Carl Dahlke
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Anne Deslattes Mays
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Maria Dombroski
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Michael Donnelly
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Dale Ely
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Shiva Esparham
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Carl Fosler
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Harold Gire
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Stephen Glanowski
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Kenneth Glasser
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Anna Glodek
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Mark Gorokhov
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ken Graham
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Barry Gropman
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Michael Harris
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jeremy Heil
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Scott Henderson
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jeffrey Hoover
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Donald Jennings
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Catherine Jordan
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
James Jordan
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
John Kasha
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Leonid Kagan
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Cheryl Kraft
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Alexander Levitsky
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Mark Lewis
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Xiangjun Liu
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
John Lopez
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Daniel Ma
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
William Majoros
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Joe McDaniel
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Sean Murphy
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Matthew Newman
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Trung Nguyen
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ngoc Nguyen
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Marc Nodell
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Sue Pan
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Jim Peck
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Marshall Peterson
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
William Rowe
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Robert Sanders
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
John Scott
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Michael Simpson
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Thomas Smith
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Arlan Sprague
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Timothy Stockwell
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Russell Turner
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Eli Venter
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Mei Wang
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Meiyuan Wen
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
David Wu
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Mitchell Wu
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ashley Xia
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Ali Zandieh
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
Xiaohong Zhu
Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.

Notes

* To whom correspondence should be addressed. E-mail: [email protected]
