Figure 3.7 Isochore maps of some eukaryotic chromosomes, calculated from the genomic databases with an improved 'compositional segmentation' algorithm. For human chromosomes XXI and XXII the largest available contigs were taken. Note that there is much more structure in the human and fruit fly genomes, indicating variable gene density along the chromosome, than in the C. elegans, Arabidopsis, and yeast genomes, indicating a more even spread of genes. After Oliver et al. (2001), with permission from Elsevier.

Table 3.3 GC contents (%) of coding regions of genomes of some selected species from the three domains of life

Species                                     Overall   First codon   Second codon   Third codon
                                                      position      position       position

Archaea
Methanococcus maripaludis                   34.08     44.70         32.29          25.26
Halobacterium salinarum                     64.85     68.07         43.04          83.45
Thermoplasma acidophilum                    47.37     49.55         37.55          55.02
Sulfolobus solfataricus                     36.49     43.22         33.65          32.59
Pyrococcus furiosus                         31.09     49.46         34.56          39.24

Bacteria
Sinorhizobium meliloti                      63.14     64.88         45.23          79.30
Escherichia coli                            51.39     58.27         40.84          55.06
Leptospira interrogans                      36.42     44.70         34.62          29.94
Actinomyces naeslundi                       67.51
Thermotoga maritima                         46.40

Eukarya
Saccharomyces cerevisiae (budding yeast)    39.76     44.60         36.61          38.09
Caenorhabditis elegans (roundworm)          42.93     49.95         38.93          39.90
Arabidopsis thaliana (thale cress)          44.60     50.90         40.52          42.37
Drosophila melanogaster (fruit fly)         53.97     55.87         41.51          64.52
Fundulus heteroclitus (mummichog fish)      53.93     55.10         40.89          65.81
Danio rerio (zebrafish)                     50.94     54.50         40.82          57.52
Xenopus laevis (clawed frog)                46.97
Vipera aspis (aspis viper)                  52.20
Anas platyrhynchos (mallard duck)           51.27
Sus scrofa (wild pig)                       54.44

Source: Data from www.kazusa.or.jp/codon (Nakamura et al. 2000).

One of the forces acting upon the GC content is asymmetry in substitution rates. If the rate of substitution from G/C to A/T is denoted as u and the rate of substitution from A/T to G/C as v, then the equilibrium content of GC is expected to take the value PGC, where (Graur and Li 2000)

$$P_{GC} = \frac{v}{u + v}$$

If the two rates are equal, PGC is expected to be 1/2, or 50%. A high GC content is evidence that v is larger than u. This asymmetry in substitution rates is called GC mutational pressure. However, GC content is shaped not only by mutation but, to some extent, also by selection. One selective constraint arises from codon usage. Several amino acids are encoded in the DNA by more than one triplet, but this does not imply that the alternative triplets are used in equal proportions. For most amino acids there is a bias towards certain codons because the tRNAs matching these codons are more abundant (codon usage bias, see Section 2.2). Depending on the number of Gs and Cs in the preferred codons, selection can thus constrain the GC content. Another source of selection is the greater stability of the G-C base pair, which is held together by three hydrogen bonds, whereas the A-T pair has two. Several researchers have argued that high ambient temperatures would favour high GC contents and that endothermic ('warm-blooded') animals (birds and mammals) would have higher GC contents than ectothermic ('cold-blooded') animals (most reptiles, amphibians, fish, and invertebrates).
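The equilibrium argument at the start of this passage can be checked numerically. The following is a minimal sketch, assuming a deterministic two-state model in which each G/C site changes to A/T with probability u per time step and each A/T site changes to G/C with probability v; under that assumption the equilibrium is v/(u + v). The function names and the rate values are illustrative and are not taken from Graur and Li (2000).

```python
def equilibrium_gc(u: float, v: float) -> float:
    """Equilibrium GC fraction for rates u (G/C -> A/T) and v (A/T -> G/C)."""
    if u < 0 or v < 0 or u + v == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return v / (u + v)


def simulate_gc(u: float, v: float, p0: float, steps: int) -> float:
    """Iterate the two-state recursion from a starting GC fraction p0.

    Assumed model (illustrative): per step, a fraction u of G/C sites
    becomes A/T and a fraction v of A/T sites becomes G/C.
    """
    p = p0
    for _ in range(steps):
        p = p * (1.0 - u) + (1.0 - p) * v
    return p


if __name__ == "__main__":
    # Hypothetical rates: A/T -> G/C three times as frequent as the reverse.
    u, v = 0.001, 0.003
    print(equilibrium_gc(u, v))
    print(simulate_gc(u, v, p0=0.5, steps=10_000))
```

With equal rates the function returns 1/2, and whenever v exceeds u the simulated GC fraction drifts above 50% regardless of the starting value, which is the sense in which a high GC content points to GC mutational pressure.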

If selection is a major factor influencing the GC content and if GC mutational pressure drives the GC content upwards, one would expect PGC to be higher in the third codon position of a protein-encoding gene, where most substitutions are synonymous and therefore weakly constrained, than in the first and second positions. There is some evidence that this is indeed the case for both prokaryotes and eukaryotes (Table 3.3); however, whether this is indicative of selection is questionable, and the bias introduced by mobile elements with a high GC content is usually ignored (Duret and Hurst 2001). The possible role of thermal stability in the evolution of GC content was critically examined by Hughes (1999), who concluded that the hypothesis is not strongly supported by the data. Among thermophilic prokaryotes there are species with low and with high GC contents, and within the vertebrates the picture is also inconsistent (Table 3.3). The strongest effect on GC content seems to stem from phylogenetic constraints: taxonomically related species tend to have similar GC contents. Although purifying selection may play a modest role, the dominant factors acting upon GC content seem to be neutral processes, namely GC mutational pressure and random drift.
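Position-specific GC contents of the kind listed in Table 3.3 can be computed directly from a coding sequence. The sketch below is one simple way to do this, assuming the sequence starts in frame at the first codon position; the function name and the example sequence are hypothetical, and ambiguous bases such as N are counted as non-GC.

```python
from typing import Dict


def gc_by_codon_position(cds: str) -> Dict[str, float]:
    """Return GC percentages overall and for codon positions 1, 2 and 3.

    Assumes the coding sequence begins in frame; trailing bases that do
    not complete a codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop any incomplete final codon
    if usable == 0:
        raise ValueError("sequence must contain at least one full codon")

    result: Dict[str, float] = {}
    total_gc = 0
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this position
        gc = sum(1 for base in bases if base in ("G", "C"))
        total_gc += gc
        result[f"GC{pos + 1}"] = 100.0 * gc / len(bases)
    result["overall"] = 100.0 * total_gc / usable
    return result


if __name__ == "__main__":
    # Hypothetical three-codon sequence: ATG GCC GGC
    for name, value in gc_by_codon_position("ATGGCCGGC").items():
        print(f"{name}: {value:.2f}")
```

Because the three positions contain equal numbers of bases, the overall value is the mean of the three position-specific values.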

3.1.4 Gene order

In genetics, two loci are called syntenic if they are located on the same chromosome (Russell 2002). In genomics, however, the term synteny is used to indicate a situation where a series of genes is arranged in the same order on different genomes (Gibson and Muse 2002). Passarge et al. (1999) have rightly pointed out that this new usage is incorrect and etymologically awkward. Another term that may be more appropriate is colinearity; however, in this book we will comply with the most common usage and use synteny in its genomics sense. Whether synteny is found between two genomes depends somewhat on the scale of analysis. At the level of chromosomes, synteny may be demonstrated by techniques such as chromosome painting (using inter-species fluorescent in situ hybridization); however, this does not exclude the presence of extensive rearrangements at the level of individual genes. The term microsynteny is used to indicate detailed sequence comparisons between individual genes within a chromosomal segment.

In any genome, genes are found to be organized in clusters and these clusters sometimes maintain the same order across species, even across groups as far apart as mammals and fish. Well-known examples of synteny are histone genes, Hox gene clusters, and the genes of the major histocompatibility complex (MHC). There are also large synteny blocks, covering hundreds of kilobase pairs, between the genomes of rice and Arabidopsis (see Section 3.2).

The Hox genes are a famous example of long-range synteny. Indeed, the same order of genes can be found in the Hox clusters of nematodes, insects, and mammals. Hox genes encode transcription factors that regulate developmental patterns across the anterior-posterior axis of bilaterian animals (Carroll et al. 2005). Macro-evolutionary relationships in the animal kingdom can be partly understood as duplications followed by neo- and nonfunctionalization of essentially the same pattern of Hox genes (Amores et al. 1998; Carroll 2000).

Synteny analysis is an important tool in comparative genomics. The relative order of genes in one species can provide clues about the presence or even the function of genes in another species. Similarly, by looking at the order of genes in a cluster one can discover, by homology to another species, genes that were missed by automatic gene-finding algorithms. Synteny analysis can also be a tool to reveal duplications across species, for example by searching for two regions in one genome that have the same gene order as one region in another genome (doubly conserved synteny; Kellis et al. 2004).
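The idea of detecting conserved gene order can be illustrated with a small sketch. It assumes that each genome is given as an ordered list of gene identifiers, that orthologous genes share the same identifier, and that each identifier occurs at most once per genome. It is a deliberately simplified illustration and not the method used in any of the studies cited here; the function name, the `min_genes` parameter, and the gene names are hypothetical.

```python
from typing import List, Sequence


def syntenic_blocks(genome_a: Sequence[str],
                    genome_b: Sequence[str],
                    min_genes: int = 2) -> List[List[str]]:
    """Find runs of shared genes that are adjacent in both genomes.

    Genes without an ortholog in the other genome are skipped. A run may
    be in the same orientation in both genomes or inverted in genome_b.
    """
    shared = set(genome_a) & set(genome_b)
    order_a = [g for g in genome_a if g in shared]
    order_b = [g for g in genome_b if g in shared]
    index_b = {gene: i for i, gene in enumerate(order_b)}

    blocks: List[List[str]] = []
    current: List[str] = []
    last_index = 0
    direction = 0  # +1 same orientation, -1 inverted, 0 not yet known

    for gene in order_a:
        idx = index_b[gene]
        step = idx - last_index
        if current and step in (1, -1) and direction in (0, step):
            current.append(gene)  # extends the current collinear run
            direction = step
        else:
            if len(current) >= min_genes:
                blocks.append(current)
            current = [gene]  # start a new candidate run
            direction = 0
        last_index = idx

    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    # Hypothetical gene orders; x1 and y1 have no ortholog in the other genome.
    a = ["g1", "g2", "g3", "x1", "g4", "g5", "g6"]
    b = ["g6", "g5", "g4", "y1", "g1", "g2", "g3"]
    print(syntenic_blocks(a, b))
```

In this toy example the first block keeps its orientation while the second is inverted in the second genome. Real analyses must additionally cope with paralogues, gene loss, and uncertain orthology, which this sketch ignores.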

How could such blocks of gene order be maintained while other regions of the genome are reshuffled extensively by recombination? How can it be that some genes are free to move through the genome while others are tied, for millions of years, to the same neighbours? One of the reasons could be selective pressure acting upon the cluster as an integrated whole. This is certainly the case if genes are organized in operons, as in all prokaryotic and some eukaryotic genomes (see Sections 3.2 and 3.3).

Another functional constraint is illustrated by the Hox genes. In most animals, the order of these genes along the chromosome is the same as the order of their expression domains along the anterior-posterior body axis. In addition, the genes at the front end of the complex are expressed earlier in development than the ones at the back end. These observations suggest that it is the requirement for coherent temporal expression that is maintaining colinearity of the Hox cluster (Patel 2004). The genetic developmental system that governs the basic positional information of tissues in all animals was called the zootype (Slack et al. 1993).

Another reason for conservation of gene order, proposed more recently, is interdigitization of regulatory elements. We know that the expression of genes is controlled by regulatory elements, usually in the 5' region of the gene. It turns out that some regulatory elements may be physically linked to genes close by, or may even be located in the introns of other genes. The fixation of regulatory elements inside the territory of neighbouring genes thus forges a physical bond, resulting in close linkage between the genes. Regulatory elements may also link genes together when the expression of a group of genes is controlled by a single locus-control region. The principle of the locus-control region was first discovered in the β-globin cluster of the human genome, but similar regions, presumably participating in dynamic chromatin alteration, have now been found in other gene clusters. In all these cases genes cannot move independently of each other without incurring a severe selective disadvantage. The shuffling and reorganization of the genome during evolution, as highlighted in Chapter 1 and indicated by the term turbulence, is inevitably limited to some extent by such processes.

An example of synteny analysis is provided by a comparison between two distantly related species of nematode, sharing a common ancestor 300-500 million years ago (Guiliano et al. 2003). A nematode parasite of vertebrates, Brugia malayi (order Ascarida), was compared with the well-known model species C. elegans (order Rhabditida). Whereas the genome of the latter species is completely known and many genes are annotated, the genome of Brugia is only known incompletely; the comparison was undertaken partly to reveal more of the function of genes in Brugia from knowledge of C. elegans. Figure 3.8

[Figure 3.8: panels labelled Brugia malayi and Caenorhabditis elegans]
