Figure 1.1. The Central Dogma assumes that biological information is transferred from DNA to RNA to proteins. Recent discoveries of viruses that transcribe information from RNA to DNA have required modification of the Dogma. Three processes are involved in the Central Dogma: DNA replication, transcription of the genetic information into RNA, and translation of the messenger RNA into a polypeptide (protein).

fatal neurodegenerative diseases. The term prion refers to proteinaceous infectious particles (Prusiner and Scott 1997). Prion diseases include bovine spongiform encephalopathy (BSE or "mad cow disease") in cattle, scrapie in sheep, and Creutzfeldt-Jakob disease and kuru in humans. These "proteinaceous infective particles" do not contain DNA, but are able to transmit the disease to other individuals who eat the altered proteins (Prusiner and Scott 1997). Current data suggest the altered protein acts as a template upon which the normal protein is refolded into a deformed molecule through a process facilitated by another protein (Prusiner and Scott 1997, Tuite 2000). Such abnormal proteins are transmitted to daughter cells, thus propagating the mutant phenotype in the absence of mutated nucleic acid.

Despite these exceptions, the Central Dogma remains a major tenet of modern biology. In insects, the genes (DNA) are found in complex structures called chromosomes that consist of proteins, RNA, and DNA. This chapter reviews the structure of DNA and RNA, the basis of the genetic code, the processes involved in DNA replication, and changes in DNA that result in mutations.

1.3. The "RNA World" Came First?

It is now widely accepted that there was an era on Earth during which RNA played the role of both genetic material and main agent of catalytic activity, e.g., had ribozyme activity (DiGiulio 1997, Jeffares et al. 1998, Poole et al. 1998, Cooper 2000, Eddy 2001). This implies that proteins in the modern world replaced RNA as the main catalysts (enzymes). The "RNA organism" is thought to have had a multiple-copy, double-stranded RNA genome capable of recombination and splicing. The RNA genome was probably fragmented into "chromosomes" (Jeffares et al. 1998). RNA could have been the first genetic material because we now know it can serve as a template for self-replication and can catalyze a number of chemical reactions, including the polymerization of nucleotides (Johnston et al. 2001). It is thought that interactions between RNA and amino acids then evolved into the present-day world in which DNA is the primary stable repository of genetic information.

1.4. The Molecular Structure of DNA

Deoxyribonucleic acid (DNA) is a long polymeric molecule consisting of numerous individual monomers that are linked in a series and organized in a helix. Each monomer is called a nucleotide. Each nucleotide is itself a complex molecule made up of three components: (1) a sugar, (2) a nitrogenous base, and (3) a phosphoric acid.

In DNA, the sugar component is a pentose (with five carbon atoms) in a ring form that is called 2'-deoxyribose (Figure 1.2).

The nitrogenous bases are single- or double-ring structures that are attached to the 1'-carbon of the sugar. The bases are purines (adenine and guanine) or pyrimidines (thymine and cytosine) (Figure 1.3). When a sugar is joined to a base, it is called a nucleoside.

A nucleoside is converted to a nucleotide by the attachment of a phosphoric acid group to the 5'-carbon of the sugar ring (Figure 1.4). The four different nucleotides that polymerize to form DNA are 2'-deoxyadenosine 5'-triphosphate, 2'-deoxyguanosine 5'-triphosphate, 2'-deoxycytidine 5'-triphosphate, and 2'-deoxythymidine 5'-triphosphate (Figure 1.5). These names are usually abbreviated as dATP, dGTP, dCTP, and dTTP, or shortened further to A, G, C, and T.

Individual nucleotides are linked together to form a polynucleotide by phosphodiester bonds (Figure 1.4). Polynucleotides have chemically distinct ends. In Figure 1.5, the top of the polynucleotide ends with a nucleotide in which the triphosphate group attached to the 5'-carbon has not participated in a phosphodiester bond. This is called the 5' or 5'-P terminus. At the other end of the molecule the unreacted group is not the phosphate, but the 3'-hydroxyl. This is called the 3' or 3'-OH terminus. This distinction between the two ends (5' and 3') means that polynucleotides have an orientation that is very important in many molecular genetics applications.

Polynucleotides can be of any length and have any sequence of bases. The DNA molecules in chromosomes are probably several million nucleotides long. Because there are no restrictions on the nucleotide sequence, a polynucleotide just 10 nucleotides long could have any one of 4^10 (or 1,048,576) different sequences. This ability to vary the sequence is what allows DNA to contain complex genetic information.
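The count of possible sequences follows from having four choices of base at each position. A minimal Python sketch of that arithmetic is given here; the function name count_sequences is illustrative and does not come from the text.

```python
def count_sequences(length, alphabet_size=4):
    """Number of distinct polynucleotides of the given length.

    Each position can hold any one of `alphabet_size` bases
    (4 for DNA: A, G, C, T), independently of the others.
    """
    return alphabet_size ** length


print(count_sequences(10))  # 1048576
```

Because the count grows exponentially with length, even short stretches of DNA can take a very large number of distinct sequences.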

1.5. The Molecular Structure of RNA

RNA is also a polynucleotide, but with two important differences from the structure of DNA. First, the sugar in RNA is ribose (Figure 1.2). Second, RNA contains the nitrogenous base uracil (U) instead of thymine (Figure 1.3). The four nucleotides that polymerize to form RNA are adenosine 5'-triphosphate, guanosine 5'-triphosphate, cytidine 5'-triphosphate, and uridine 5'-triphosphate, which are abbreviated as ATP, GTP, CTP, and UTP or A, G, C, and U. The individual nucleotides are linked together with 3' to 5' phosphodiester bonds. RNA is typically single-stranded, although it can form complex structures (such as hairpins) or become double-stranded under some circumstances.

Figure 1.2. Structure of sugars found in nucleic acids; 2'-deoxyribose is found in DNA and ribose is found in RNA.

Figure 1.3. Bases in DNA are purines (adenine and guanine) or pyrimidines (thymine and cytosine). Uracil is substituted for thymine in RNA.

1.6. The Double Helix

The discovery, by Watson and Crick (1953), that DNA is a double helix of antiparallel polynucleotides ranks as one of the most important discoveries in biology. Nitrogenous bases are located inside the double helix, with the sugar and phosphate groups forming the backbone of the molecule on the outside (Figure 1.6). The nitrogenous bases of the two polynucleotides interact by hydrogen bonding, with an adenine (A) pairing to a thymine (T) and a guanine (G) to a cytosine (C).

Figure 1.4. A nucleoside consists of a sugar joined to a base. It becomes a nucleotide when a phosphoric acid group is attached to the 5'-carbon of the sugar. Nucleotides link together by phosphodiester bonds to form polynucleotides.

Hydrogen bonds are weak bonds in which two electronegative atoms share a hydrogen atom between them. Two hydrogen bonds form between A and T, and three between G and C. Bonding between G and C is thus stronger, and more energy is required to break it. The hydrogen bonds, and other molecular interactions called stacking interactions, hold the double helix together.
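The difference between A-T and G-C pairing can be made concrete by counting hydrogen bonds along a duplex. The sketch below is only an illustration: the names HYDROGEN_BONDS, hydrogen_bond_count, and gc_fraction are hypothetical, and the count ignores stacking interactions, which also contribute to helix stability.

```python
# Hydrogen bonds contributed by the base pair that each base forms:
# A pairs with T (2 bonds), G pairs with C (3 bonds).
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bond_count(strand):
    """Total hydrogen bonds between a strand and its complement."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


def gc_fraction(strand):
    """Fraction of bases in the strand that are G or C."""
    strand = strand.upper()
    return sum(base in "GC" for base in strand) / len(strand)


print(hydrogen_bond_count("ATAT"))  # 8
print(hydrogen_bond_count("GCGC"))  # 12
```

Two duplexes of the same length can therefore differ in the number of hydrogen bonds holding them together, with the G-C rich duplex having more.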

The DNA helix turns approximately every 10 base pairs (abbreviated as 10 bp), with spacing between adjacent bp of 3.4 angstroms (Å) so that a complete turn requires 34 Å (Figure 1.6). The helix is 20 Å in diameter and right handed. This means that each chain follows a clockwise path. The strands run antiparallel to each other, with one running in the 5' to 3' direction and the other in the 3' to 5' direction. The DNA helix has two grooves, a major and a minor groove (Figure 1.6). Proteins involved in DNA replication and transcription often interact with the DNA and each other within these grooves.
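The helix dimensions given above allow a rough estimate of the length of a DNA segment and the number of turns it contains. The following Python sketch assumes the approximate values from the text (3.4 angstroms per base pair, 10 bp per turn); the constant and function names are illustrative only.

```python
RISE_PER_BP_ANGSTROM = 3.4  # spacing between adjacent base pairs (from the text)
BP_PER_TURN = 10            # approximate base pairs per helical turn (from the text)


def helix_length_angstrom(n_bp):
    """Approximate length along the helix axis of a segment of n_bp base pairs."""
    return n_bp * RISE_PER_BP_ANGSTROM


def helix_turns(n_bp):
    """Approximate number of complete helical turns in n_bp base pairs."""
    return n_bp / BP_PER_TURN


# A 1 kb segment: roughly 3400 angstroms long, about 100 turns.
print(helix_length_angstrom(1000), helix_turns(1000))
```

These are estimates for the common form of the helix described here; other helical forms have different dimensions.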

1.7. Complementary Base Pairing Is Fundamental

The principle of complementary base pairing is a fundamental element of DNA and of great practical significance in many techniques used in genetic engineering. A pairs with T and G pairs with C. Normally, no other base pairing pattern will fit in the helix or allow hydrogen bonding to occur (Figure 1.7).
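Because A pairs only with T and G only with C, the sequence of one strand fully determines the other. The sketch below illustrates this in Python; the function names are hypothetical, and the reverse complement is included because the two strands run antiparallel, so the partner strand read 5' to 3' is the complement in reverse order.

```python
# Translation table implementing the pairing rules A-T and G-C.
_PAIRING = str.maketrans("ATGC", "TACG")


def complement(strand):
    """Base-by-base complement, in the same left-to-right order."""
    strand = strand.upper()
    invalid = set(strand) - set("ATGC")
    if invalid:
        raise ValueError(f"unexpected bases: {sorted(invalid)}")
    return strand.translate(_PAIRING)


def reverse_complement(strand):
    """Sequence of the partner strand, written 5' to 3'."""
    return complement(strand)[::-1]


print(complement("ATGCCG"))          # TACGGC
print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying reverse_complement twice returns the original sequence, which mirrors the way each strand can serve as the template for the other.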

Complementary base pairing provides the mechanism by which the sequence of a DNA molecule is retained during replication, which is crucial if the information contained in the gene is not to be altered or lost during cell division. Complementary base pairing is also important in the transcription and expression of genetic information in the living insect.

Figure 1.5. The four nucleotides from which DNA is synthesized are 2'-deoxyadenosine 5'-triphosphate (dATP), 2'-deoxyguanosine 5'-triphosphate (dGTP), 2'-deoxycytidine 5'-triphosphate (dCTP), and 2'-deoxythymidine 5'-triphosphate (dTTP).

Figure 1.6. Two representations of the double helix structure of DNA. The model on the left shows the hydrogen bonding between nitrogenous bases that holds the two antiparallel strands together. The model on the right shows the relative sizes of the atoms in the molecule.

1.8. DNA Exists in Several Forms

DNA actually is a dynamic molecule in living organisms and has several different variations in form. In some regions of the chromosome, the strands of the DNA molecule may separate and later come back together. DNA typically is right-handed, and it can form more than 20 slightly different variations of right-handed helices. In some regions of the molecule, it can even form left-handed helices. If segments of nucleotides in the same strand are complementary, the DNA may even fold back upon itself in a hairpin structure.

DNA exists in different crystalline forms, depending upon the amount of water present in the DNA solution. The B form is the structure in which DNA commonly occurs under most cellular conditions. A-DNA is more compact than B-DNA, with 11 bp per turn of the helix and a diameter of 12 Å. In addition, C-, D-, E-, and Z-DNA have been found. The Z-DNA form has a left-handed helix rather than a right-handed helix. A triple helical form (H) also occurs. A, H, and Z forms are thought to occur in cells, and C, D, and E forms of DNA may be produced only under laboratory conditions.

Figure 1.7. A) Complementary base-pairing of polynucleotides by hydrogen bonds holds the two strands of the DNA molecule together. B) Thymine (T) pairs with adenine (A) with two hydrogen bonds, and guanine (G) pairs with cytosine (C) with three.

1.9. Genes

The concept of a "gene" has evolved as genetics has changed (Muller 1947, Maienschein 1992). Until 1944, when Avery et al. (1944) demonstrated that the genetic information resided in nucleic acids, it was considered possible that the genetic information was encoded in proteins. "Genes" can be a specific location on a chromosome, a particular type of biochemical material, and a physiological unit that directs development. Genes are segments of a DNA molecule, which may vary in size from as few as 75 nucleotides (nt) to more than 200 kilobases (kb) of DNA. (A kilobase is 1000 nucleotides.) Genes contain biological information by coding for the synthesis of an RNA molecule. The RNA may subsequently direct the synthesis of an enzyme or other protein molecule. RNA also may be used directly as the gene product itself, e.g., as transfer RNA, ribosomal RNA, small nucleolar RNA, and small nuclear RNA (Eddy 2001). Proteins may regulate other genes, form part of the structure of cells, or function as enzymes. Expression of the information contained in protein-coding genes involves a two-step process of transcription and translation (Figure 1.1).

Figure 1.8. Genetic information is contained in genes carried on one of the two strands (coding strand). The complementary strand in that region is the noncoding strand. Genes can occur on different strands at different points of the DNA molecule. Noncoding DNA between genes is called intergenic or spacer DNA.

We now know that the actual genetic information is determined by just one of the two polynucleotide strands of the double-helix DNA molecule. This is called the coding strand, and the other strand is the noncoding complement to it. Sometimes the coding strand is known as the sense strand and the noncoding as the antisense strand. A few examples are known in which both strands in a specific region code for different genes. Often one strand of the double helix may be the sense strand over part of its length but be the antisense strand over other segments (Figure 1.8). As you can see, the definition of a "gene" is complex and has changed through time (Eddy 2001, Nelkin 2001). A protein-coding gene typically includes a variety of regulatory structures and signals, as will be described in Chapter 2.

1.10. The Genetic Code Is a Triplet and Is Degenerate

The genetic code for a protein-coding gene is based on the sequence of three nucleotides in the DNA molecule. The triplet sequence (or codon) determines which amino acids are assembled in a particular sequence into proteins. It is possible to order four different bases (A, T, C, G) in combinations of three into 64 triplets or codons. However, there are only approximately 20 different amino acids, so the question immediately arises: what do the other 44 codons do?

The answer is that the code is degenerate, with all amino acids except methionine and tryptophan determined by more than one codon (Table 1.1). The codons in Table 1.1 are represented by A, U, C, and G because the genetic information in DNA is transcribed by messenger RNA, which uses U instead of T.
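The degeneracy of the code can be checked by grouping the 64 codons by the amino acid they specify. The Python sketch below assumes the standard genetic code, written as a 64-letter string of one-letter amino acid symbols ordered by codon with the bases taken in the order U, C, A, G; the variable names are illustrative, and "*" marks a termination codon.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 4 x 4 x 4 = 64 codons to its amino acid (or "*" for stop).
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: amino acid -> list of codons that specify it.
codons_for = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    codons_for[amino_acid].append(codon)

stop_codons = sorted(codons_for.pop("*"))
single_codon = sorted(aa for aa, codons in codons_for.items() if len(codons) == 1)

print(len(CODON_TABLE))  # 64
print(len(codons_for))   # 20
print(single_codon)      # ['M', 'W']
print(stop_codons)       # ['UAA', 'UAG', 'UGA']
```

Here M is methionine and W is tryptophan, the two amino acids with a single codon; every other amino acid is specified by two or more of the remaining sense codons.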

The genetic code contains punctuation codons. Three different codons (UAA, UGA, and UAG) function as "stop" messages or termination codons; they occur at the end of a protein-coding gene to indicate where translation should stop. AUG serves as an initiation or start codon when it occurs at the front end of a gene. Because AUG is the sole codon for the amino acid methionine, AUGs also are found in the middle of genes.
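The role of the start and stop codons as punctuation can be sketched as a simple scan of a messenger RNA sequence. This is a simplified illustration, not a description of how the ribosome selects a start site: it assumes translation begins at the first AUG in the sequence, and the function name and example sequence are hypothetical.

```python
START_CODON = "AUG"
STOP_CODONS = {"UAA", "UGA", "UAG"}


def open_reading_frame(mrna):
    """Return the codons from the first AUG up to, but not including,
    the first in-frame stop codon.

    Returns an empty list if no AUG is present. If no stop codon is
    found, the codons up to the end of the sequence are returned.
    """
    mrna = mrna.upper()
    start = mrna.find(START_CODON)
    if start == -1:
        return []
    codons = []
    # Step through the sequence three bases at a time from the start codon.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


print(open_reading_frame("GGAUGGCUAUGUAAGG"))  # ['AUG', 'GCU', 'AUG']
```

In the example, the first AUG marks the start of the reading frame, while the second AUG lies inside the frame and simply specifies methionine; reading ends at the stop codon UAA.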

Table 1.1. The 20 Amino Acids That Occur in Proteins and Their Codons

Amino acid    Abbreviations    Codons
