The sequence of the human genome has been determined

The two teams of scientists announced a draft human genome sequence in June 2000 to great fanfare, and published their data simultaneously in February 2001. By the start of 2003, the final sequence was completed, two years ahead of the schedule set over a decade previously and well under budget.

The sequencing of the human genome revealed several interesting characteristics:

► Of the 3.2 billion base pairs, less than 2 percent are coding regions, containing a total of 21,000 genes. Before sequencing began, estimates of the number of human genes ranged from 80,000 to 100,000. The actual number is much lower, not many more than in the fruit fly. This means that the observed diversity of proteins, which led to the earlier estimates, must be produced posttranscriptionally. An average human gene, then, must code for several different proteins.

► The average gene has 27,000 base pairs. There is great variation in gene sizes, from 1,000 to 2.4 million base pairs. This variation is to be expected, because human proteins (and RNAs) vary in size, with polypeptide chains ranging from 100 to about 5,000 amino acids. Virtually all human genes have many introns (Figure 17.22).

► Over 50 percent of the genome is made up of highly repetitive sequences. Repetitive sequences near genes are GC-rich, while those farther away from genes are AT-rich.

► Almost all (99.9 percent) of the genome is the same in all people. Even with this apparent homogeneity, there are many individual differences, because 0.1 percent of 3.2 billion base pairs is still about 3 million bases. Scientists have mapped over 2 million single-nucleotide polymorphisms (SNPs), which are bases that differ in at least 1 percent of people.

► Genes are not evenly distributed over the genome. The small chromosome 19 is packed densely with genes, while chromosome 8 has long stretches of "gene desert," with no coding regions. The Y chromosome has the fewest genes (231), while chromosome 1 has the most (2,968).

► The functions of many genes are not known. There are 740 genes coding for RNAs that are not translated into proteins. Of these RNAs, several dozen are tRNAs, and a few are rRNAs and splicing RNAs. The roles of the rest are not clear. Nor are the roles of the hundreds of genes encoding protein kinases, although it is a good bet that they are involved in cell signaling.
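
The figures in the list above can be checked against one another with some simple arithmetic. The short Python sketch below is only an illustration: the constant and function names are hypothetical, the inputs are the rounded numbers quoted in the list, and the gene-span estimate assumes that genes do not overlap and that the 27,000-base-pair average applies to all 21,000 genes.

```python
# Rounded figures taken from the list above; all names here are illustrative.
GENOME_BP = 3_200_000_000       # total base pairs in the genome
GENE_COUNT = 21_000             # number of genes
AVG_GENE_BP = 27_000            # average gene length, introns included
CODING_PERCENT_MAX = 2          # coding regions are "less than 2 percent"
SHARED_PER_THOUSAND = 999       # 99.9 percent of the genome is shared


def summarize():
    """Return rough estimates derived from the rounded figures above."""
    # Upper bound on the number of base pairs in coding regions.
    coding_bp_upper = GENOME_BP * CODING_PERCENT_MAX // 100
    # Total length of all genes (exons plus introns), assuming no overlap.
    gene_span_bp = GENE_COUNT * AVG_GENE_BP
    # Bases that are not shared: 0.1 percent of the genome.
    differing_bp = GENOME_BP * (1000 - SHARED_PER_THOUSAND) // 1000
    # Polypeptides per gene if 21,000 genes yield 80,000 to 100,000 products.
    proteins_per_gene = (80_000 / GENE_COUNT, 100_000 / GENE_COUNT)
    return {
        "coding_bp_upper": coding_bp_upper,
        "gene_span_bp": gene_span_bp,
        "gene_span_fraction": gene_span_bp / GENOME_BP,
        "differing_bp": differing_bp,
        "proteins_per_gene": proteins_per_gene,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(name, value)
```

On these assumptions, genes including their introns span roughly 18 percent of the genome, even though coding regions account for less than 2 percent, and each gene would need to yield about four to five polypeptides on average to account for 80,000 to 100,000 different products.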

    How do genes code for 80,000 to 100,000 different polypeptides?
