REVIEWS

Selection, history and chemistry: the three faces of the genetic code

Robin D. Knight, Stephen J. Freeland and Laura F. Landweber

The genetic code might be a historical accident that was fixed in the last common ancestor of modern organisms. 'Adaptive', 'historical' and 'chemical' arguments, however, challenge such a 'frozen accident' model. These arguments propose that the current code is somehow optimal, reflects the expansion of a more primitive code to include more amino acids, or is a consequence of direct chemical interactions between RNA and amino acids, respectively. Such models are not mutually exclusive, however. They can be reconciled by an evolutionary model whereby stereochemical interactions shaped the initial code, which subsequently expanded through biosynthetic modification of encoded amino acids and, finally, was optimized through codon reassignment. Alternatively, all three forces might have acted in concert to assign the 20 'natural' amino acids to their present positions in the genetic code.

R. D. Knight, S. J. Freeland and L. F. Landweber are at the Dept of Ecology and Evolutionary Biology, Guyot Hall, Princeton University, Princeton, NJ 08544-1003, USA. Email: lfl@princeton.edu

TIBS 24, JUNE 1999, p. 241. 0968-0004/99/$ - See front matter © 1999, Elsevier Science. All rights reserved. PII: S0968-0004(99)01392-4

THE GENETIC CODE remains an enigma, even though the full codon catalog was deciphered over 30 years ago. Although we know which base triplets encode which amino acids, and even how these assignments vary among taxa, we do not know why the specific codon assignments take their actual form1. Why, for instance, does the AUU triplet encode isoleucine rather than some other amino acid? Why do some amino acids have more codons than others? And why do amino acids that have similar chemical properties tend to have similar codons (Fig. 1)?

The simplest answer is that codon assignments were historical accidents that became fixed in the last common ancestor of all modern organisms and that, therefore, their pattern requires no further explanation2. This 'frozen accident' hypothesis is a useful null model against which other models can be tested, but does not predict the observed order in the genetic code. The model has also been criticized because we now know that the code is not universal, and thus variant codes might have existed before the last common ancestor, as well as at present.

There are three main challenges to the frozen-accident model, which are based on 'adaptive', 'historical' and 'chemical' arguments. All three deal only with the genetic code present in the last universal ancestor and might not apply to more-recent changes. The 'adaptive' challenge suggests that the pattern of codon assignments is an adaptation that optimizes some function, such as minimization of errors caused by mutation or mistranslation. The 'historical' challenge suggests that the genetic code accumulated amino acids over a long period of time and that codon assignments reflect this pattern of incremental expansion. The 'chemical' challenge suggests that certain codon assignments were directly influenced by favorable chemical interactions between particular amino acids and short nucleic acid sequences, whereas lack of such interactions excluded other amino acids from proteins entirely.

Here, we evaluate the evidence for these three views and suggest how they might be combined into a coherent synthesis of code evolution.

Adaptation: the best of all possible codes?

The earliest explanations for the observed order in the genetic code, such as Crick's ingenious commaless code3, assumed that natural selection somehow optimized the codon catalog. Given that more changes to a protein are deleterious than beneficial, the genetic code should reduce the impact of errors: the pattern of degeneracy, which groups together codons for the same amino acid, certainly has this effect (Fig. 1). The 'lethal mutation' model4 proposed that the genetic code reduces the effects of point mutation, whereas the 'translation error' model5 proposed that the code structure instead reduces the effects of errors during translation. The principal evidence that supported these early models came from inspection of the genetic code itself: (1) codons for the same amino acid typically vary only at the third position; (2) amino acids that have U at the second position of their codon are hydrophobic, whereas those that have A at the second position are hydrophilic; and (3) the genetic code initially appeared to be universal5.

This evidence is neither compelling nor unequivocal. Crick's wobble hypothesis6 explained much of the degeneracy of the code in terms of simple chemical considerations: a single tRNA anticodon can recognize multiple codons by nonstandard base pairing. The association between second-position base and amino acid hydrophobicity holds only for two of the four bases (Fig. 1). Finally, if code optimization had actually occurred, then the present genetic code must have been selected from a large pool of alternative genetic codes (a problem when the code was thought to be absolutely invariant).
These shortcomings, given the choice of the frozen-accident theory as an alternative, probably account for the decline of adaptive explanations towards the end of the 1960s.

A variety of criteria have been used to assess whether the genetic code is in some sense optimal. These analyses fall into two main classes: 'statistical' and 'engineering'. The statistical approaches7–11 compare the natural code with many randomly generated alternative codes and typically have concluded that the genetic code conserves amino acid properties far better than would a random code. In contrast, the engineering approaches12–16 compare the natural code with only the best possible alternative (i.e. the code that formally minimizes the change in amino acid properties following an average single point mutation), and conclude that the genetic code is still far from optimal.

The statistical approach provides a more realistic representation of the variability available to selection than does the engineering approach. Because the engineering approach measures optimality on a linear scale as a fraction of the distance between the mean and optimal codes, it ignores the distribution of possible codes. This distribution is roughly Gaussian: increasingly optimal codes are increasingly rare, and the difference between successively more optimal codes decreases as optimality increases. Consequently, the globally optimal code might be unattainable, whereas the most optimal code accessible by point mutations is still closer to optimal than almost all alternatives. In fact, our unpublished results indicate that the canonical genetic code is closer to optimal than practically all alternatives, and this conclusion holds for differences in both measurement of optimality and distribution of possible codes. However, the evolutionary plasticity of the code might have been limited by unknown chemical or historical constraints.
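The 'statistical' comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the cost function (mean squared change in polar requirement over all single-base substitutions between sense codons), the randomization scheme (shuffling the 20 amino acids among the existing synonymous codon blocks while keeping stop codons fixed) and all function names are choices made here for illustration. The polar requirement values are those commonly tabulated from ref. 1 and should be checked against the original before any quantitative use.

```python
import random
from itertools import product

BASES = "UCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
# Standard code in UCAG order (one-letter amino acids, '*' = termination).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD_CODE = dict(zip(CODONS, AMINO_ACIDS))

# Polar requirement as commonly tabulated (assumption: verify against ref. 1).
POLAR_REQUIREMENT = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8,
    "Q": 8.6, "E": 12.5, "G": 7.9, "H": 8.4, "I": 4.9,
    "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0, "P": 6.6,
    "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}


def error_cost(code):
    """Mean squared change in polar requirement over all single-base
    substitutions that convert one sense codon into another."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if mutant_aa == "*":
                    continue  # substitutions to stop codons are ignored here
                diff = POLAR_REQUIREMENT[aa] - POLAR_REQUIREMENT[mutant_aa]
                total += diff ** 2
                count += 1
    return total / count


def shuffled_code(rng):
    """Random alternative code: same codon blocks, amino acids permuted."""
    amino_acids = sorted(POLAR_REQUIREMENT)
    permuted = amino_acids[:]
    rng.shuffle(permuted)
    relabel = dict(zip(amino_acids, permuted))
    relabel["*"] = "*"  # stop codons stay where they are
    return {codon: relabel[aa] for codon, aa in STANDARD_CODE.items()}


def fraction_of_random_codes_better(n_codes=1000, seed=0):
    """Fraction of shuffled codes with a lower cost than the standard code."""
    rng = random.Random(seed)
    natural = error_cost(STANDARD_CODE)
    better = sum(error_cost(shuffled_code(rng)) < natural
                 for _ in range(n_codes))
    return better / n_codes


if __name__ == "__main__":
    print("cost of standard code:", round(error_cost(STANDARD_CODE), 2))
    print("fraction of random codes that are better:",
          fraction_of_random_codes_better())
```

An 'engineering' analysis would instead search for the single code with the lowest `error_cost` and report how far the standard code lies from it, which is why the two approaches can reach different conclusions from the same cost function.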
The principal objection to optimization theories has been that a change in the genetic code causes mutations in every protein, most of which are likely to be deleterious. Consequently, once cells relied on a particular genetic code to any appreciable extent, the further changes required by the optimization process would have become increasingly unlikely2. The ability of the genetic code to change is a prerequisite for theories that involve optimization through a stepwise evolutionary process. The discovery that the genetic code is not invariant17 removed this objection: if the genetic code has recently changed in apparently nonadaptive ways, then similar changes might have facilitated adaptation in the past. Actual changes in the nuclear genomes of eukaryotes (Fig. 2a) indicate that, even in metabolically complex organisms, the code is far from frozen.

Two mechanisms account for the codon swapping evident in a variety of species, and in both nuclear and mitochondrial genomes (Fig. 2). In the Osawa–Jukes mechanism18, particular codons vanish from the genome because of mutational pressure on the genome for changes in A·T or G·C composition, and the corresponding tRNAs are lost. When the mutational pressure later reverses, codons that lack cognate tRNAs inhibit translation. Consequently, any mutation that allows translation of these codons is advantageous. Such a mutation can occur through duplication of an existing tRNA gene and subsequent mutation of the anticodon to recognize a different codon. If the mutated tRNA still retains its original aminoacyl-tRNA synthetase specificity, the codon will encode an amino acid that differs from that used by the canonical code.
The Schultz–Yarus mechanism19 is similar but does not require the complete disappearance of a codon from the genome before the transfer takes place. Instead, a mutation in a duplicated tRNA that generates either a new anticodon or a new aminoacyl-charging specificity leads to ambiguous translation of one or more codons. If this new specificity confers an advantage, selection will fix the new codon set. The fact that certain Candida species have ambiguous translation (depending on the circumstances, CUG will encode either serine or leucine) supports the model20.

Figure 1 The 'universal' genetic code. Rows give the first base, columns the second base.

        U          C          A          G
U    UUU Phe    UCU Ser    UAU Tyr    UGU Cys
     UUC Phe    UCC Ser    UAC Tyr    UGC Cys
     UUA Leu    UCA Ser    UAA TER    UGA TER
     UUG Leu    UCG Ser    UAG TER    UGG Trp
C    CUU Leu    CCU Pro    CAU His    CGU Arg
     CUC Leu    CCC Pro    CAC His    CGC Arg
     CUA Leu    CCA Pro    CAA Gln    CGA Arg
     CUG Leu    CCG Pro    CAG Gln    CGG Arg
A    AUU Ile    ACU Thr    AAU Asn    AGU Ser
     AUC Ile    ACC Thr    AAC Asn    AGC Ser
     AUA Ile    ACA Thr    AAA Lys    AGA Arg
     AUG Met    ACG Thr    AAG Lys    AGG Arg
G    GUU Val    GCU Ala    GAU Asp    GGU Gly
     GUC Val    GCC Ala    GAC Asp    GGC Gly
     GUA Val    GCA Ala    GAA Glu    GGA Gly
     GUG Val    GCG Ala    GAG Glu    GGG Gly

Key to amino acid classes: alkyl, acidic, amide, aromatic, basic, sulfur containing, hydroxyl containing, STOP. Shading indicates polar requirement (PR)1: lighter shades (black text), PR < 6 (hydrophobic); medium shades (yellow text), PR 6–8 (medium); darker shades (white text), PR > 8 (hydrophilic). Amino acids whose codons have U at the second position tend to be unusually hydrophobic; those whose codons have A at the second position tend to be hydrophilic. Amino acids that share structural similarity tend to share codon sets connected by single point mutations: for instance, the basic amino acids arginine, lysine and histidine are connected. TER, termination codon.
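The consequence of a codon reassignment, as in the two mechanisms described above, can be illustrated with a short Python sketch. It translates the same message under the standard code and under a variant in which CUG is read as serine, the Candida reassignment mentioned in the text. The function names and the example mRNA are hypothetical and chosen only for illustration; real ambiguous decoding in Candida is probabilistic, which this deterministic sketch does not model.

```python
from itertools import product

BASES = "UCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
# Standard code in UCAG order (one-letter amino acids, '*' = termination).
STANDARD_CODE = dict(zip(CODONS, "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                                 "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"))


def reassign(code, codon, amino_acid):
    """Return a copy of `code` in which `codon` encodes `amino_acid`."""
    variant = dict(code)
    variant[codon] = amino_acid
    return variant


def translate(mrna, code):
    """Translate an mRNA from its first base until a termination codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def changed_positions(mrna, code_a, code_b):
    """Residue positions at which the two codes give different amino acids."""
    a, b = translate(mrna, code_a), translate(mrna, code_b)
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    cug_as_serine = reassign(STANDARD_CODE, "CUG", "S")
    message = "AUGCUGGCUCUGUUUUAA"  # hypothetical example sequence
    print(translate(message, STANDARD_CODE))
    print(translate(message, cug_as_serine))
    print(changed_positions(message, STANDARD_CODE, cug_as_serine))
```

Every protein that uses the reassigned codon is altered at once, which is the core of the objection to code change; a codon that has first vanished from the genome, as in the Osawa–Jukes mechanism, would leave `changed_positions` empty for every message.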
History: searching for footprints of the code's ancestors

Historical theories propose that the present code evolved from a simpler ancestral form: proteins produced by the initial, limited, set of amino acids synthesized new amino acids that could in turn be incorporated into the code. Recently introduced amino acids presumably would take over codons from their metabolic precursors; this could happen only if the resulting changes in protein structure were not widely deleterious2. Consequently, historical theories often predict that similar amino acids would be assigned to similar codons even without explicit selection for error minimization.

The principal evidence for coevolution of amino acids and the code through stepwise expansion comes from cases in which dissimilar amino acids from related biosynthetic pathways also share similar codons (Fig. 3). Several authors argue that a disproportionate number of biosynthetically related amino acids have codons connected by single point mutations14,16,21,22; however, because many amino acids are interconvertible, even randomized codes show similar associations between biosynthetically related amino acids and single base changes in codons23. One intriguing suggestion is that the first- and second-position bases have different functions: the second-position bases connect amino acids that have similar properties; and the first-position bases connect amino acids from the same biosynthetic pathway24. Codons of the form GNN correspond to amino acids thought to be most primitive for several reasons24; this might suggest that UNN, CNN and ANN codons were transferred to novel amino acids as their synthesis became possible. This hypothesis constrains the set of possible codes considerably, but does not explain the near optimality of the code11.

Another approach looks at the phylogenies of tRNAs and of aminoacyl-tRNA synthetases (the enzymes that specifically link amino acids to their cognate tRNAs).
If amino acids were added sequentially to the code, then tRNA and aminoacyl-tRNA synthetase phylogenies should be congruent; this would reflect duplication and divergence of a tRNA and its cognate synthetase as each amino acid was added. Unfortunately, most studies that examined tRNA phylogenies25–27 have assumed that trees derived from the set of tRNAs in different species are congruent, which is not the case28. Because tRNAs can change either their anticodons or their amino acid specificity remarkably easily29, modern tRNA phylogenies are unlikely to reveal anything about the phylogeny of tRNAs in the last common ancestor. Furthermore, tRNA phylogenies are likely to become increasingly unstable as more sequences are added: this apparent tRNA flexibility is consistent with the requirement of the adaptive theories that the code be able to change.

Phylogenies of aminoacyl-tRNA synthetases prove slightly more revealing. Aminoacyl-tRNA synthetases fall into two main classes. Some of those for related amino acids cluster together30, and phylogenies are similar among widely separated taxa31.
Interestingly, although most organisms have a class II lysyl-tRNA synthetase, some archaea and spirochetes have a class I lysyl-tRNA synthetase32. Given that the class I lysyl-tRNA synthetases are monophyletic and cluster within the other type I synthetases31, the last common ancestor of all organisms probably contained both types of synthetase, and all lineages probably lost one or the other at a later stage33. However, because the complete set of tRNA synthetases and tRNAs was present in the last common ancestor, phylogenetic analysis alone cannot discriminate between stepwise introduction of amino acids into translation and stepwise takeover of aminoacylation by protein aminoacyl-tRNA synthetases from more-primitive catalysts.

Although congruence between tRNA and synthetase phylogenies would have provided striking evidence for sequential amino acid incorporation, the lack of such congruence provides evidence against expansion of the code during synthetase evolution. The present synthetases might have usurped the roles of earlier ribozymes that had the same functions, erasing the information in the original synthetases about the order in which amino acids were added to the code.

Figure 2 Naturally occurring variants of the canonical genetic code. Each panel is a copy of the codon table of Fig. 1 annotated with the lineages in which codons have been reassigned. (a) Nuclear variants (including changes effective within bacterial genomes)34,48,49. (b) Mitochondrial variants48,50,51 (yeast variants are from http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c). Missense changes are shown in yellow; nonsense changes are shown in gray; changes in termination codons are shown in red. A '-' prefix indicates a reversal of a change in a particular lineage.

Stereochemistry: does it fit the evidence?

Stereochemical theories propose that amino acids are assigned to particular codons because of direct chemical interactions between RNA and amino acids. If these interactions follow consistent patterns, similar amino acids should bind to similar short RNA motifs and should therefore have similar codons.
Although the resulting pattern of codon assignments might be adaptive, relative to randomized codes (because a point mutation would tend to substitute a relatively similar amino acid), it need not have been explicitly selected for this effect. Thus, the rules that constrain the set of chemically plausible codes might also lead to apparent error minimization.

The fact that the genetic code initially appeared to be universal provided the strongest support for stereochemical theories, because it suggested that the actual code is the only possible code. However, the known variations in the code do not disprove the stereochemical theories. All deviations from the canonical code appeared recently in comparison with the last common ancestor: the first surviving change probably appeared in the lineage leading to diplomonads34, and most are much more recent. Furthermore, no known code differs by more than a few amino acids from the standard code. Because translation pairs codons with amino acids through a tRNA adaptor, the mechanisms that allowed recent changes in the genetic code might be entirely different from those that generated the code initially. All stereochemical theories have dealt only with the canonical code found in the last common ancestor, because later changes probably were unaffected by stereochemical constraints.

Figure 3 Biosynthetic pathways and code assignments. (a) Primitive sulfur-metabolizing bacteria (hypothetical)47. (b) Generalized prokaryotes21. (c) Escherichia coli24. Shading indicates polar requirement (PR)1: lighter shades (black text), PR < 6 (hydrophobic); medium shades (yellow text), PR 6–8 (medium); darker shades (white text), PR > 8 (hydrophilic). Bounded areas highlight codons that share the same first base identity. KG, α-ketoglutarate; OAA, oxaloacetic acid; PEP, phosphoenolpyruvate; PG, phosphoglycerate; Ru(5)P, ribulose 5-phosphate.

The first stereochemical theories about the origin of the code relied on chemical models. These provided weak support for a variety of possible pairing mechanisms: amino acids might bind to their cognate codons35, anticodons36, reversed codons37, codon–anticodon double helices38 or a complex of four nucleotides containing the anticodon at the end of the acceptor stem39. Unfortunately, the diversity of results reduces their significance: the apparent freedom inherent in the building and interpretation of these models has undermined the significance of any particular model, especially in the absence of empirical predictions.

Figure 4 Three models of early code evolution, drawn along a time axis from the origin of the code to the last common ancestor. The 'universal' genetic code found in the last common ancestor (pink circle) might or might not be similar to the first genetic code that evolved (blue circle). (a) The primordial genetic code is maintained by lineage merging in a reticulate network: there is little competition between lineages, and lineages that share the majority genetic code have the advantage of using novel proteins from other lineages when protocells merge. (b) Strong selection for increased code efficiency among lineages drives the code in the last common ancestor far from the primordial code. Most lineages with variant codes become extinct, but a few successfully reach new local optima. (c) Despite competition among lineages, the chemical factors leading to the establishment of the original genetic code are much the same as the factors that influence the error in a given amino acid substitution; therefore the final code remains similar to the initial code. Aptamer experiments can distinguish (b) from (a) and (c) by providing evidence for a primordial code that might or might not be similar to the code in the last common ancestor.

Figure 5 Three facets of code evolution. Timeline labels: origin of the earth; origin of life; evolution of a complex RNA world?; origin of code; last common ancestor; extant life. Processes shown: code origin (stereochemistry), code expansion (coevolution) and code adaptation (error minimization). Panels: (a) antagonistic evolutionary forces; (b) complementary evolutionary forces. The genetic code probably originated through stereochemical interactions and, then, underwent a period of expansion in which new amino acids were incorporated. The evolution of the tRNA system, which separated codons from direct interaction with amino acids, then allowed reassignment of codons and, therefore, adaptive evolution. Traditionally, these forces have been assumed to be antagonistic (a), but they might actually have been complementary (b); for example, current codon assignments might assign biosynthetically similar amino acids to similar codons, which would meet both stereochemical and adaptive criteria.

Another approach has been to examine interactions between amino acids and individual bases or nucleotides. Early studies showed that 'polar requirement', a partitioning coefficient of a water–pyridine system that reflects hydrophobicity, varies among second-position bases1.
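The relationship between polar requirement and second-position base can be checked directly against the standard codon table. The Python sketch below groups the amino acids by the second base of their codons and averages polar requirement within each group. The function name is hypothetical, the averaging over distinct amino acids (rather than over codons) is a choice made here, and the polar requirement values are those commonly tabulated from ref. 1, which should be verified against the original.

```python
from itertools import product
from statistics import mean

BASES = "UCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
# Standard code in UCAG order (one-letter amino acids, '*' = termination).
STANDARD_CODE = dict(zip(CODONS, "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                                 "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"))

# Polar requirement as commonly tabulated (assumption: verify against ref. 1).
POLAR_REQUIREMENT = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8,
    "Q": 8.6, "E": 12.5, "G": 7.9, "H": 8.4, "I": 4.9,
    "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0, "P": 6.6,
    "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}


def mean_pr_by_second_base(code=STANDARD_CODE):
    """Mean polar requirement of the distinct amino acids whose codons
    carry each base at the second position."""
    groups = {base: set() for base in BASES}
    for codon, amino_acid in code.items():
        if amino_acid != "*":
            groups[codon[1]].add(amino_acid)
    return {base: mean(POLAR_REQUIREMENT[aa] for aa in sorted(aas))
            for base, aas in groups.items()}


if __name__ == "__main__":
    for base, value in mean_pr_by_second_base().items():
        print(base, round(value, 2))
```

With these values the U group has the lowest mean and the A group the highest, while the C and G groups are intermediate and close to each other, in line with the statement that the association holds for only two of the four bases.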
Other approaches included tests for the following: (1) correlations between the hydrophobicity of an amino acid and particular nucleotides or dinucleotides; (2) correlations between the partitioning coefficients of amino acids and nucleotides on various surfaces; and (3) differential effects of particular amino acids on nucleotide solubility. These studies tend to show weak associations between anticodons and amino acids40.

The most direct test of RNA–amino-acid interactions is to determine the precise RNA sequences that bind most strongly to each amino acid. In vitro selection, which isolates nucleic acid molecules that bind to a particular target by selective amplification over several generations41, has generated aptamers (RNA ligands) for several amino acids. Interactions between arginine and RNA have been studied in most detail: several laboratories have selected and characterized the binding of arginine to arginine aptamers. The set of codons assigned to arginine occurs far more often at arginine-binding sites than would be expected by chance: arginine anticodons, and the codon sets assigned to other amino acids, do not show this association42. We propose that this is also the case for at least some other amino acids and their codons, and that arginine interacts with its codons in other contexts, such as in RNA-binding proteins. Such intrinsic affinities between codons and amino acids might have influenced early codon assignments. Information about RNA molecules that bind to other amino acids will test the generality of this hypothesis. The first isoleucine aptamers seem to have critical isoleucine codons at their binding sites, although the first valine aptamers do not43.

The RNA world: the milieu of code evolution?
Translation presents a 'chicken or egg' problem: given that many crucial components of the translation apparatus (including aminoacyl-tRNA synthetases, release factors and much of the ribosome) are made of protein, how could translation ever have evolved? The RNA-world hypothesis44 avoids this problem by suggesting that RNA preceded DNA and protein and acted as both genetic material and catalyst. The structure of the genetic code might contain information about the chemical environment in which the code evolved.

Two plausible pathways explain how a genetic code arose in an RNA world. First, RNA catalysts might have built specific peptides residue by residue, much in the way that short peptides are now constructed by specific enzymes. Once a general translation system evolved, it would have supplanted these early peptide-synthesis pathways. Second, some ribozymes might have used amino acids, and later peptides, as cofactors45. As peptide synthesis became more feasible, the peptide parts of the hybrid catalysts would increasingly have replaced the RNA components; the final result was a protein world in which a few essential nucleotide cofactors remained as molecular fossils. In either case, specific interactions between RNA and amino acids would have been necessary to establish the initial coding system.

Compelling evidence (see above) supports the idea that arginine, and perhaps isoleucine, interacts with its codons in RNA aptamers and that the genetic code is highly optimal with respect to error minimization. When sequences for aptamers for more amino acids are available, we will be able to test whether chemical factors influenced the choice of amino acids and their codon assignments in the canonical genetic code. Assuming that each amino acid was originally assigned those codons for which it has greatest chemical affinity, it would be possible to reconstruct this primordial genetic code.
The divergence between this primordial code and the code found in the last common ancestor of all life could test models of early code evolution (Fig. 4).

We envisage a series of definite, although perhaps overlapping, stages in the evolution of the code (Fig. 5). At first, in the RNA world, stereochemical interactions would have largely determined the correspondence between certain RNA-sequence tags and amino acids. Such early peptides, generated by direct templating43 or similar mechanisms, need not have had catalytic function: for instance, short positively charged arginine repeats might have neutralized the phosphate backbones of RNA molecules, potentially allowing uptake of the latter through membranes46 and/or their refolding into active structures. As amino acid and peptide cofactors, and eventually catalysts, became more prevalent at the onset of the RNA–protein world, coevolution of the code and the amino acid set might have led to expansion of the code on the basis of metabolic relatedness47. This expansion would also have preserved the rules initially established by stereochemical interactions in order to continue making the original templated protein or proteins. Finally, after the evolution of the mRNA–tRNA–aminoacyl-tRNA-synthetase system removed direct interaction between amino acids and codons, codon swapping in different lineages would have permitted some degree of code optimization by codon reassignment. Code optimization, however, need not be limited to this late stage: error minimization might have acted in concert both with stereochemical considerations and with biosynthetically driven code expansion to produce the canonical code (Fig. 5b).

Recent evidence that suggests that the code has a highly optimized structure7–11 highlights the crucial gap in our understanding of its evolution: the pattern of chemical interactions between the 64 codons and 20 amino acids remains largely unknown.
Only when these interactions are known will we be able to understand the relative importance of selection, history and chemistry in code evolution. References 1 Woese, C. R., Dugre, D. H., Saxinger, W. C. and Dugre, S. A. (1966) Proc. Natl. Acad. Sci. U. S. A. 55, 966­974 2 Crick, F. H. C. (1968) J. Mol. Biol. 38, 367­379 3 Crick, F. H. C. (1957) Biochem. Soc. Symp. 14, 25­26 4 Sonneborn, T. M. (1965) in Evolving Genes and Proteins (Bryson, V. and Vogel, H. J., eds), pp. 377­297, Academic Press 5 Woese, C. R. (1967) The Genetic Code: The Molecular Basis for Genetic Expression, Harper and Row 6 Crick, F. H. (1966) J. Mol. Biol. 19, 548­555 7 Alff-Steinberger, C. (1969) Proc. Natl. Acad. Sci. U. S. A. 64, 584­591 8 Haig, D. and Hurst, L. D. (1991) J. Mol. Evol. 33, 412­417 9 Ardell, D. H. (1998) J. Mol. Evol. 47, 1­13 10 Freeland, S. J. and Hurst, L. D. (1998) J. Mol. Evol. 47, 238­248 11 Freeland, S. J. and Hurst, L. D. (1998) Proc. R. Soc. London Ser. B 265, 2111­2119 12 Wong, J. T. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 1083­1086 13 Di Giulio, M. (1989) J. Mol. Evol. 29, 288­293 14 Di Giulio, M. (1991) Z. Naturforsch. 46c, 305­312 15 Di Giulio, M., Capobianco, M. R. and Medugno, M. (1994) J. Theor. Biol. 168, 43­51 16 Di Giulio, M. (1998) J. Mol. Evol. 46, 615­621 17 Barrell, B. G., Bankier, A. T. and Drouin, J. (1979) Nature 282, 189­194 TIBS 24 ­ JUNE 1999 2470968­0004/99/$ ­ See front matter 1999, Elsevier Science. All rights reserved. PII: S0968-0004(99)01396-1 18 Osawa, S. and Jukes, T. H. (1988) Trends Genet. 4, 191­198 19 Schultz, D. W. and Yarus, M. (1994) J. Mol. Biol. 235, 1377­1380 20 Yarus, M. and Schultz, D. W. (1997) J. Mol. Evol. 45, 1­8 21 Wong, J. T-F. (1975) Proc. Natl. Acad. Sci. U. S. A. 72, 1909­1912 22 Miseta, A. (1989) Physiol. Chem. Phys. Med. NMR 21, 237­242 23 Amirnovin, R. (1997) J. Mol. Evol. 44, 473­476 24 Taylor, F. J. R. and Coates, D. (1989) Biosystems 22, 177­187 25 Eigen, M. and Winkler-Oswatitsch, R. 
(1981) Naturwissenschaften 68, 282­292 26 Fitch, W. M. and Upper, K. (1987) Cold Spring Harbor Symp. Quant. Biol. 52, 759­767 27 Eigen, M. et al. (1989) Science 244, 673­679 28 Saks, M. E. and Sampson, J. R. (1995) J. Mol. Evol. 40, 509­518 29 Saks, M. E., Sampson, J. R. and Abelson, J. (1998) Science 279, 1665­1670 30 Nagel, G. M. and Doolittle, R. F. (1995) J. Mol. Evol. 40, 487­498 31 Ribas de Pouplana, L., Turner, R. J., Steer, B. A. and Schimmel, P. (1998) Proc. Natl. Acad. Sci. U. S. A. 95, 11295­11300 32 Ibba, M., Bono, J. L., Rosa, P. A. and Soll, D. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 14383­14388 33 Landweber, L. F. and Katz, L. A. (1998) Trends Ecol. Evol. 13, 93­94 34 Keeling, P. J. and Doolittle, W. F. (1997) Mol. Biol. Evol. 14, 895­901 35 Pelc, S. R. and Welton, M. G. E. (1966) Nature 209, 868­872 36 Dunnill, P. (1966) Nature 210, 1267­1268 37 Root-Bernstein, R. S. (1982) J. Theor. Biol. 94, 895­904 38 Hendry, L. B. and Whitham, F. H. (1979) Perspect. Biol. Med. 22, 333­345 39 Shimizu, M. (1982) J. Mol. Evol. 18, 297­303 40 Lacey, J. C., Jr (1992) Orig. Life Evol. Biosph. 22, 243­275 41 Landweber, L. F., Simon, P. J. and Wagner, T. A. (1998) BioScience 48, 94­103 42 Knight, R. D. and Landweber, L. F. (1998) Chem. Biol. 5, R215­R220 43 Yarus, M. (1998) J. Mol. Evol. 47, 109­117 44 Gilbert, W. (1986) Nature 319, 618 45 Szathmáry, E. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 9916­9920 46 Jay, D. G. and Gilbert, W. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 1978­1980 47 Dillon, L. S. (1973) Bot. Rev. 39, 301­345 48 Osawa, S. (1995) Evolution of the Genetic Code, Oxford University Press 49 Tourancheau, A. B. et al. (1995) EMBO J. 14, 3262­3267 50 Hayashi-Ishimaru, Y. et al. (1996) Curr. Genet. 30, 29­33 51 Hayashi-Ishimaru, Y., Ehara, M., Inagaki, Y. and Ohama, T. (1997) Curr. Genet. 
32, 296–299

REFLECTIONS

After graduating from medical school in 1961, I went to work in Seymour Benzer's laboratory at Purdue University, where I was privileged to participate in a series of exciting experiments on the then-emergent genetic code. One study that received some notoriety was a critical test of the 'adaptor hypothesis' proposed by Francis Crick in 1958. Crick had postulated that a small oligonucleotide, possibly soluble RNA (sRNA, as it was then known; tRNA as it is known today), functions as an adaptor for the incorporation of amino acids into protein (Ref. 1). Thus, it followed that once an amino acid is attached to sRNA, the specificity with which it is incorporated into protein resides solely in the sRNA adaptor to which it is attached.

The Raney-nickel experiment (Ref. 2; Fig. 1), as it came to be called, is often cited as the critical experiment that proved the adaptor hypothesis. It allowed us to demonstrate that, in an in vitro protein-synthesis system, alanine from alanyl-tRNACys is incorporated into protein at positions normally occupied by cysteine rather than at those occupied by alanine. The Raney-nickel experiment, however, was only one of a series of experiments that confirmed the adaptor hypothesis. As interesting as the results of the experiments themselves was the way in which these experiments came to be done and what followed.

Genetic studies of allele-specific suppression that led to the Raney-nickel experiment

This trail of research did not start out as an attempt to test the adaptor hypothesis, but developed from allele-specific (i.e. mutant-specific) genetic-suppressor studies of the phage-T4 rII system by Benzer and Champe, which began around 1959 (Ref. 3). While these studies were in progress, a series of papers from François Gros' laboratory (Refs 4–6) revealed that Escherichia coli grown in the presence of 5-fluorouracil (5-FU) made abnormal proteins.
For example, alkaline phosphatase and β-galactosidase were shown to have altered amino acid compositions and altered thermostability, but conserved antigenicity. We now believe that 5-FU exerts its suppressor activity because, although it is incorporated into mRNA as uracil, it base pairs with guanine in aminoacyl-tRNA anticodons (i.e. it exhibits the incorporation specificity of cytosine). The observation that the amounts of proline and tyrosine incorporated into total protein, as well as into tRNA-containing fractions, were markedly increased suggested that the effect was informationally specific.

In parallel with the 5-FU studies, yet a third system, E. coli tryptophan synthetase, provided insight into allele-specific suppression. Yanofsky and St Lawrence, in a review entitled 'Gene Action' (Ref. 7), suggested that some forms of allele-specific suppression that they had seen in their studies might be caused by alterations in the specificity with which amino acids are incorporated into protein.

Members of the Benzer lab decided to attempt to explain allele-specific suppression of rII mutants of phage T4. The fluorouracil effect suggested a biochemical variable that they could include in their studies. The most striking aspect of the fluorouracil effect was its high degree of specificity: it restored enzyme activity very effectively for some rII mutants but not at all for others. This suggested that there was a relationship between the fluorouracil effect and the apparently altered specificity of amino acid incorporation reported by the Gros and Yanofsky labs (Refs 4–7).

By 1960, five years of intensive genetic mapping by the Benzer lab had saturated the rII region with mutations to a degree unprecedented in any other genetic system. We therefore hoped that the patterns of suppression by 5-FU at specific sites might be correlated with the wealth of detailed information about those sites.
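The proposed mechanism of 5-FU suppression described above (5-FU enters mRNA in place of uracil but can pair as cytosine would) can be illustrated with a minimal Python sketch. It assumes the standard genetic code and treats every U in a codon as a possible 5-FU residue that may be read as C. The function name `readings_with_5fu` and the example codons are illustrative assumptions, not details of the original experiments.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G ('*' = stop).
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def readings_with_5fu(codon):
    """Map each possible reading of `codon` to its amino acid.

    Assumption for illustration: any U in the codon may be a 5-FU residue,
    which is read either as U or (by pairing with G) as C.
    """
    options = [("U", "C") if base == "U" else (base,) for base in codon]
    return {
        "".join(reading): CODON_TABLE["".join(reading)]
        for reading in product(*options)
    }


# A stop codon containing U can occasionally be read as a sense codon.
print(readings_with_5fu("UAG"))  # {'UAG': '*', 'CAG': 'Q'}
# A codon without U is unaffected by 5-FU in this model.
print(readings_with_5fu("AAA"))  # {'AAA': 'K'}
```

Under these assumptions, only codons that contain U acquire alternative readings, which is one way to picture why 5-FU would suppress some mutant sites and leave others untouched.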
The problem of allele-specific suppression became even more interesting when it was noted that certain strains of E. coli K carry genetic suppressors (which eventually turned out to be mutant tRNA genes, as predicted) whose action mimics the phenotypic-suppressor activity of 5-FU.

Back to Camelot: defining the specific role of tRNA in protein synthesis