The Evolution of Genome Size

We can now consider what determines the sizes of the genomes of different organisms. As organisms evolved to become more complex, how did the number of functional genes increase so that the organisms could carry out the greater variety of metabolic activities associated with that complexity?

Complex organisms have more DNA than do simpler ones

As we saw in Chapters 13 and 14, genome size varies tremendously among organisms. The first pattern to be detected was that genome sizes are generally correlated with organisms' complexity. The genome of Mycoplasma genitalium, the simplest known prokaryote, has only 470 genes. Rickettsia prowazekii, the prokaryote that causes typhus, has 634 genes. Homo sapiens, on the other hand, has about 21,000 protein-coding genes. Figure 26.7 shows the relative sizes of several prokaryotic and eukaryotic genomes.

It is not surprising that more complex genetic instructions are needed for building and maintaining a large, complex organism than a small, simple one. What is surprising is that some organisms, such as lungfishes, some salamanders, and lilies, have about 40 times as much DNA as humans do. Clearly, a lungfish or a lily is not 40 times as complex as a human. Why does genome size vary so much?

26.7 Complex Organisms Have More Genes than Simpler Organisms Genome sizes have been measured or estimated in a variety of organisms ranging from single-celled prokaryotes to vertebrates. [Bar chart: number of genes (× 1,000), axis from 0 to 100, for H. influenzae, E. coli, yeast, Arabidopsis, Drosophila, C. elegans (nematode), sea squirt, pufferfish, mouse, and human. Dark bars show results based on complete genome sequencing; lighter bars are estimates based on sequencing samples.]

Some of the apparent variation in genome size disappears when we compare the portion of DNA that actually encodes RNAs or proteins. The size of the coding genome of organisms varies in a way that makes sense. Eukaryotes have more coding DNA than prokaryotes; plants have more coding DNA than single-celled organisms; invertebrates with wings, legs, and eyes have more coding DNA than nematodes; and vertebrates have more coding DNA than invertebrates. The organisms with the largest amount of nuclear DNA (some ferns and flowering plants) have 80,000 times as much as the simplest organisms, but no species has more than 20 times as many protein-coding genes as a bacterium. Therefore, most of the variation in genome size lies not in the number of functional genes, but in the amount of noncoding DNA (Figure 26.8).
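The arithmetic behind that conclusion can be sketched in a few lines of Python. The two fold-ranges are the ones quoted in the text; treating them as if they shared the same baseline organism is a simplifying assumption made only for illustration, and the function name is hypothetical.

```python
# Fold-ranges quoted in the text.
MAX_DNA_FOLD = 80_000   # largest nuclear genomes relative to the simplest organisms
MAX_GENE_FOLD = 20      # upper bound on protein-coding genes relative to a bacterium


def dna_per_gene_fold(dna_fold: float, gene_fold: float) -> float:
    """Fold-increase in total DNA per protein-coding gene.

    Assumes both fold-changes are measured against the same baseline,
    which the text does not state explicitly.
    """
    return dna_fold / gene_fold


excess = dna_per_gene_fold(MAX_DNA_FOLD, MAX_GENE_FOLD)
print(f"DNA per gene grows roughly {excess:,.0f}-fold")
```

Under that assumption, the amount of DNA per gene differs by a factor of roughly 4,000 across this range, which is why gene number alone cannot account for the spread in genome size.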

What maintains such large quantities of noncoding DNA in the cells of most organisms? Does this noncoding DNA have a function, or is it "junk"? Most of this DNA appears to be nonfunctional. Much of it may consist of pseudogenes that are simply carried in the genome because the cost of doing so is very small. Some of it consists of parasitic transposable elements that spread through populations because they reproduce faster than the host genome. Investigators can use one type of transposable element, retrotransposons, to estimate the rates at which species lose DNA.

Retrotransposons copy themselves with the aid of RNA, as we saw in Chapter 14. The most common type of retrotransposon carries duplicated sequences at each end, called long terminal repeats (LTRs). Occasionally, LTRs join together in the host genome, at which time the DNA between them is excised. When this happens, one of the LTRs is left behind. The number of such "orphaned" LTRs in a genome is a measure of how many retrotransposons have been lost. By comparing the number of LTRs in the genomes of Hawaiian crickets of the genus Laupala and those of fruit flies (Drosophila), investigators found that Laupala loses DNA more than 40 times more slowly than Drosophila. As a result, the genome of Laupala is 11 times larger than that of Drosophila. Why species differ so greatly in the rate at which they lose DNA is not understood.
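The excision event described above can be pictured with a toy model. The sketch below is not the investigators' method; it represents a genome as a list of labeled segments (all labels, including `solo_LTR`, are hypothetical) and shows how joining the two LTRs of one element removes the DNA between them and leaves a single orphaned LTR to be counted.

```python
from typing import List


def excise_between_ltrs(genome: List[str], start: int) -> List[str]:
    """Join the two LTRs of the retrotransposon whose first LTR is at `start`.

    An intact element is laid out as "LTR", internal segments, "LTR".
    The internal DNA and one LTR are removed; the remaining LTR is
    relabeled "solo_LTR" to mark it as orphaned.
    """
    if genome[start] != "LTR":
        raise ValueError("start must point at the first LTR of an element")
    end = genome.index("LTR", start + 1)  # partner LTR closing the element
    return genome[:start] + ["solo_LTR"] + genome[end + 1:]


def count_orphaned_ltrs(genome: List[str]) -> int:
    """Number of orphaned LTRs, a proxy for how many elements were lost."""
    return genome.count("solo_LTR")


# Toy genome with two intact retrotransposons between three genes.
genome = ["geneA", "LTR", "gag", "pol", "LTR", "geneB",
          "LTR", "gag", "pol", "LTR", "geneC"]
after = excise_between_ltrs(genome, 1)
print(after)
print(count_orphaned_ltrs(after), "orphaned LTR;",
      len(genome) - len(after), "segments lost")
```

In this toy run the first element shrinks to one orphaned LTR and three segments disappear, while the second element stays intact. Counting such orphans across a real genome is what lets investigators compare how quickly different species shed DNA.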

Gene duplication can increase genome size and complexity

The identical copies of a duplicated gene can have any one of three different fates:

► Both copies of the gene may retain their original function, with the result that the organism produces larger quantities of the gene's RNA or protein product.

► One copy of the gene may be incapacitated by the accumulation of deleterious mutations and become a functionless pseudogene.

► One copy of the gene may retain its original function while the second copy accumulates enough mutations that it can perform a different function.

It is the third of the above fates that is most significant for evolution.

26.8 A Large Proportion of DNA Is Noncoding Most of the DNA of bacteria and yeasts encodes RNAs or proteins, but most of the DNA of more complex organisms is noncoding. Most noncoding DNA is probably nonfunctional. [Plot: horizontal axis is genome size (× 10⁹ base pairs) on a logarithmic scale from 0.001 to 1000; labeled points include E. coli, human, lungfish, and lily.]

How often do gene duplications arise, and which of the three outcomes described above is most likely? These questions can be addressed by counting the number of synonymous nucleotide base changes in the genome of an organism. This number is then compared with the number of base changes that caused protein alterations, to see which number changed faster. Investigators have found that rates of gene duplication are fast enough for a yeast or Drosophila population to acquire several hundred duplicate genes over the course of a million years. They also found that most of the duplicated genes in these organisms are very young. Extra genes typically are lost from a genome within 10 million years (which is rapid on an evolutionary time scale).
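The comparison of synonymous and protein-altering changes can be illustrated with a short Python sketch. It is a deliberate simplification: it classifies whole differing codons between two aligned copies using the standard genetic code, whereas real analyses count individual sites and correct for repeated substitutions. The two sequences and the function name are made up for the example.

```python
from itertools import product
from typing import Tuple

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codon_changes(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both sequences must be aligned, in frame, and of equal length.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # same amino acid: silent change
        else:
            nonsynonymous += 1   # different amino acid: protein altered
    return synonymous, nonsynonymous


# Hypothetical duplicate gene copies (Met-Ala-Glu-Lys vs. Met-Ala-Asp-Lys).
copy_a = "ATGGCTGAAAAA"
copy_b = "ATGGCCGACAAA"
print(count_codon_changes(copy_a, copy_b))
```

Here one codon change is silent (GCT and GCC both encode alanine) and one alters the protein (GAA encodes glutamate, GAC aspartate). Because silent changes are largely invisible to selection, their number gives a rough indication of how long ago two copies were duplicated.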

Although extra genes usually disappear rapidly, some duplication events lead to the evolution of genes with new functions. Several successive rounds of duplication and mutation may result in a gene family, a group of homologous genes with related functions, often arrayed in tandem along a chromosome. An example of this process is provided by the globin gene family (see Figure 14.7). The globins were among the first proteins to be sequenced and compared. Comparisons of their amino acid sequences strongly suggest that the different globins arose via gene duplications. How long the globins have been evolving separately can also be inferred by comparing their amino acid sequences. The greater the number of amino acid differences between two globins, the further back in time was their most recent common ancestor.

Hemoglobin, a tetramer consisting of two α-globin chains and two β-globin chains, carries oxygen in blood. Myoglobin, a monomer, is the primary oxygen storage protein in muscle. Myoglobin's affinity for O₂ is much higher than that of hemoglobin. In contrast, hemoglobin evolved to be more diversified in its role. Hemoglobin binds O₂ in the lungs or gills, where the O₂ concentration is relatively high, transports it to deep body tissues, where the O₂ concentration is low, and releases it in those areas. With its more complex tetrameric structure (see Figure 3.8), hemoglobin is able to carry four molecules of O₂, as well as hydrogen ions and carbon dioxide, in the blood.

26.9 A Globin Family Gene Tree This gene tree suggests that the α-globin and β-globin gene clusters diverged about 450 mya, at about the time of the origin of the vertebrates. [Gene tree: time axis in millions of years ago (mya), marked with geological periods from the Precambrian through the Tertiary; labeled branches include the hemoglobin α family and β family.]

To estimate the time of the globin gene duplication that gave rise to the α- and β-globin gene clusters, we can create a gene tree based on the estimated number of base substitutions necessary to account for the observed amino acid differences between the globins. Based on this gene tree, and assuming that the rate of amino acid substitution has been relatively constant since then (about 100 substitutions per 500 million years), the two globin gene clusters are estimated to have split about 450 mya (Figure 26.9).
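The constant-rate assumption turns this estimate into a simple proportion, sketched below in Python. Only the rate (100 substitutions per 500 million years) and the 450 mya date come from the text. The figure of 90 substitutions is back-calculated from those two numbers rather than reported by the source, and the text does not say whether the rate refers to one lineage or to both combined, so the sketch treats it as a single conversion factor.

```python
# Rate quoted in the text: about 100 substitutions per 500 million years,
# i.e. 5 million years per substitution under a constant-rate assumption.
MY_PER_SUBSTITUTION = 500 / 100


def divergence_time_mya(substitutions: float) -> float:
    """Estimated time since divergence, in millions of years."""
    return substitutions * MY_PER_SUBSTITUTION


def expected_substitutions(time_my: float) -> float:
    """Substitutions expected to accumulate over `time_my` million years."""
    return time_my / MY_PER_SUBSTITUTION


# A split 450 mya implies about this many substitutions at the quoted rate.
implied = expected_substitutions(450)
print(implied, "substitutions ->", divergence_time_mya(implied), "mya")
```

If the substitution rate has varied over time or between lineages, the date shifts accordingly, which is why the estimate is stated as "about" 450 mya.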
