Mobile Elements: Drivers of Genome Evolution

Haig H. Kazazian Jr.*

Mobile elements within genomes have driven genome evolution in diverse ways. Particularly in plants and mammals, retrotransposons have accumulated to constitute a large fraction of the genome and have shaped both genes and the entire genome. Although the host can often control their numbers, massive expansions of retrotransposons have been tolerated during evolution. Now mobile elements are becoming useful tools for learning more about genome evolution and gene function.

Mobile, or transposable, elements are prevalent in the genomes of all plants and animals. Indeed, in mammals they and their recognizable remnants account for nearly half of the genome (1, 2), and in some plants they constitute up to 90% of the genome (3). If, as many believe, the origins of life are in an "RNA world" followed by reverse transcription into DNA, then mobile elements could have been very early participants in genome formation (4). Indeed, mobile elements and genes appear to have forged a mutually beneficial relationship. How did this relationship come about? It is clear how mobile elements benefit from genes, because without genes they cannot survive from one generation to the next. But how have genes benefited from the genome shaping of mobile elements? Important insights into genome evolution have emerged from the mining of multiple genome sequences. Here, I concentrate on how mobile elements have affected the evolution of genes and their function, particularly of humans and other mammals.

Mobile elements are DNA sequences that have the ability to integrate into the genome at a new site within their cell of origin (5). These elements include (i) DNA transposons, (ii) autonomous retrotransposons, and (iii) nonautonomous retrotransposons (Fig. 1). The mechanism by which many of these elements move is well known, but for others, such as mammalian retrotransposons, there is still much to learn.

DNA Transposons

DNA transposons are prevalent in bacteria (where they are called IS, or insertion sequences), but are also found in the genomes of many metazoa, including insects, worms, and humans. These elements are generally excised from one genomic site and integrated into another by a "cut and paste" mechanism. Because sequence specificity of integration is limited to a small number of nucleotides (e.g., TA dinucleotides for Tc1 of Caenorhabditis elegans), insertions can occur at a large number of genomic sites. However, daughter insertions for most, but not all, DNA transposons occur in proximity to the parental insertion. This is called "local hopping."

Active transposons encode a transposase enzyme between inverted-repeat termini. The transposase binds at or near the inverted repeats and to the target DNA. It then performs a DNA breakage reaction to remove the transposon from its "old" site and a joining reaction to insert the transposon into its "new" site. These reactions proceed with the hydrolysis of phosphodiester bonds between the transposon and flanking DNA to liberate 3′-OH residues that carry out the attack at the "new" site (6). Because the two strands of the "new" DNA are attacked at staggered sites, the inserted transposon is flanked by small gaps which, when filled in by host enzymes, lead to short duplications of sequence at the target sites. These are called target site duplications (TSDs), and their length is often characteristic for a particular transposon (7).

The reactions needed to move a piece of DNA use recombinase enzymes, of which there are two main classes. The first class is called conservative because the enzymes do not require high-energy cofactors, the total number of phosphodiester bonds remains unchanged, and no DNA degradation or resynthesis occurs. Examples of this recombinase type are the integrase protein of bacteriophage λ, Cre recombinase, and Flp recombinase.
The second class comprises the transposases, which catalyze the whole set of reactions necessary for DNA transposition. Examples are the transposases of Mu, P elements, and the Tc1/mariner family, and the integrases of long terminal repeat (LTR) retrotransposons and retroviruses. All of these enzymes share certain structural motifs such as a D,D35E sequence (aspartate, aspartate, 35 amino acid residues, then a glutamate) and a handlike three-dimensional structure (6, 8). Although these elements generally transpose to genomic sites less than 100 kb from their original site (e.g., the Drosophila P element), some are able to make distant "hops" (e.g., the fish Tc1/mariner element; see below).

LTR Retrotransposons

Retrotransposons are transcribed into RNA, and then reverse transcribed and reintegrated into the genome, thereby duplicating the element. The major classes of retrotransposons either contain long terminal repeats at both ends (LTR retrotransposons) or lack LTRs and possess a polyadenylate sequence at their 3′ termini (non-LTR retrotransposons).

LTR retrotransposons and retroviruses are quite similar in structure (Fig. 1). They both contain gag and pol genes that encode a viral particle coat (GAG) and a reverse transcriptase (RT), ribonuclease H (RH), and integrase (IN) to provide enzymatic activities for making cDNA from RNA and inserting it into the genome. They differ in that retroviruses encode an envelope protein that facilitates their movement from one cell to another, whereas LTR retrotransposons either lack an env gene or contain only a remnant of one, and can only reinsert into the genome from which they came. Reverse transcription of retroviral RNA or LTR-retrotransposon RNA occurs within the viral or viral-like particle in the cytoplasm (9), and is a complicated, multistep process (Fig. 2). In contrast, reverse transcription of non-LTR retrotransposons occurs by a very different mechanism (see below).
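
The target site duplication described in the DNA transposon section can be illustrated with a toy string model. The sketch below is only an illustration under stated assumptions, not a model of any real transposase: the function names, the toy sequences, and the convention that the top strand is cut at `cut` and the bottom strand `stagger` bases further along are all introduced here. It uses the TA target of Tc1 mentioned in the text and assumes a 2-bp stagger, so that the TA ends up on both sides of the inserted element.

```python
def ta_sites(target: str) -> list[int]:
    """Return the start positions of every TA dinucleotide in `target`."""
    return [i for i in range(len(target) - 1) if target[i:i + 2] == "TA"]


def insert_with_tsd(target: str, cut: int, stagger: int, element: str) -> str:
    """Insert `element` at a staggered cut and return the repaired top strand.

    Assumption of this sketch: the bases target[cut:cut + stagger] lie between
    the two staggered nicks, so after insertion they are single-stranded on
    both sides of the element. Gap filling by host enzymes copies them, which
    leaves one copy on each flank: the target site duplication (TSD).
    """
    if cut < 0 or stagger < 0 or cut + stagger > len(target):
        raise ValueError("cut site plus stagger must lie inside the target")
    tsd = target[cut:cut + stagger]
    left_flank = target[:cut] + tsd            # upstream DNA plus first TSD copy
    right_flank = tsd + target[cut + stagger:]  # second TSD copy plus downstream DNA
    return left_flank + element + right_flank


# Hypothetical toy data: uppercase = target DNA, lowercase = transposon.
# The element ends (cagt ... actg) are inverted repeats of each other.
target = "GGCATACCGT"
element = "cagtnnnnactg"

site = ta_sites(target)[0]
product = insert_with_tsd(target, site, 2, element)
print(product)
```

In the product, the element is flanked by the same two bases on each side, and the new sequence is longer than target plus element by exactly the stagger length. Changing `stagger` shows why TSD length is characteristic of a given element: it simply reflects the spacing of the two nicks.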

Many LTR retrotransposons target their insertions to relatively specific genomic sites. For example, Ty3 elements of Saccharomyces cerevisiae insert within a few nucleotides of RNA polymerase III (Pol III) transcription initiation sites (10). Moreover, Pol III transcription factors, TFIIIB and TFIIIC, are essential for Ty3 integration. Ty1 finds a haven within 750 base pairs (bp) upstream of Pol III-transcribed genes (11), and Ty5 targets the heterochromatin of telomeres and the silent mating loci (12). Ty5 requires a specific protein partner, Sir4, for tethering its cDNA to telomeric DNA, and the interaction sites of Ty5 (six amino acids in the integrase domain) with Sir4 (a region near the C terminus) have been characterized (12). In contrast to the Ty elements of S. cerevisiae, Tf elements of Schizosaccharomyces pombe cluster 100 to 400 nucleotides upstream of Pol II-transcribed genes (13).

The retroviruses HIV (human immunodeficiency virus) and MLV (murine leukemia virus) share many structural features with LTR retrotransposons. In general, HIV inserts into many sites throughout actively transcribed genes (14), whereas MLV integrates preferentially into the promoters of active genes (15). The preference of retroviruses for insertion sites in and around genes may explain the occurrence of leukemia-producing insertions into the promoter of the LMO-2 gene in 2 of 10 patients undergoing retroviral gene therapy for severe combined immunodeficiency (16).

*Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA. E-mail: kazazian@mail.med.upenn.edu
REVIEW. SCIENCE, VOL 303, 12 MARCH 2004, p. 1626, www.sciencemag.org

Non-LTR Retrotransposons

Non-LTR retrotransposons are typified by LINE-1 (long interspersed nucleotide elements-1, or L1) elements of mammals.
Full-length non-LTR retrotransposons are 4 to 6 kb in length and usually have two open reading frames (ORFs), one encoding a nucleic acid binding protein, and the other encoding an endonuclease and an RT (Fig. 1). Because these elements encode activities necessary for their retrotransposition, they are called autonomous even though they probably also require host proteins to complete retrotransposition.

Some non-LTR retrotransposons integrate at specific genomic sites. R1 and R2 of Drosophila melanogaster and Bombyx mori integrate at specific ribosomal RNA gene locations (17), whereas heT-A and TART elements help maintain the telomeres of Drosophila melanogaster chromosomes (18) and TRAS1 and SART1 integrate into telomeric repeats of B. mori (19). In contrast, mammalian L1 elements apparently integrate at a very large number of sites in the genome because their endonuclease prefers to cleave DNA at a short consensus sequence (5′-TTTT/A-3′, where / designates the cleavage site) (20, 21).

Our knowledge of most of the steps leading to retrotransposition of non-LTR retrotransposons is sketchy except for the reverse transcription process. In contrast to reverse transcription of LTR retrotransposons and retroviruses, this process takes place on nuclear genomic DNA through target primed reverse transcription, or TPRT (Fig. 2) (22, 23). The great majority of mammalian L1 insertions are 5′ truncated and much shorter than the full length of 6 kb. However, the mechanism of 5′ truncation is still unclear. In about 30% of mammalian L1 insertions, but not in Drosophila R1 or R2 insertions, the 5′ end of the insertion sequence is inverted. A likely explanation for this phenomenon is a variation on TPRT, called "twin priming" (Fig. 2 legend) (24).

Retroelements Distinct from Both LTR and Non-LTR Retrotransposons

Two infrequently observed families of retroelements distinct from both LTR retrotransposons and non-LTR retrotransposons have been described.
One is the DIRS1-like family, which lacks many characteristics of both LTR and non-LTR retrotransposons. Discovered in Dictyostelium discoideum, these elements have RT domains with homology to LTR retrotransposons, but they lack the aspartate protease and D,D35E integrase of LTR retrotransposons (25). They also lack typical LTRs, polyadenylate [poly(A)] tails, and target-site duplications. Their mechanism of integration is mysterious, but they may generate closed-circle DNA by reverse transcription, followed by integration using DNA recombination.

The second family is an unusual class of elements, exemplified by Penelope of Drosophila virilis and Athena of bdelloid rotifers, which contain characteristics of both non-LTR and LTR retrotransposons (26). Like non-LTR retrotransposons, they are frequently 5′ truncated and have variable-length TSDs. However, some have LTRs, either in a direct or inverted orientation. Importantly, their RT is disrupted by a short, classic intron that contains in-frame stop codons and frameshifts, and intronless elements have not been found. Moreover, their RT sequence is closely related to that of telomerase. The presence of the RT strongly suggests that these elements are mobilized through an RNA intermediate, but the RT-disrupting intron means that they must have used an RT derived in trans from another genomic source.

Fig. 1. Classes of mobile elements. DNA transposons, e.g., Tc1/mariner, have inverted terminal repeats (ITRs) and a single open reading frame (ORF) that encodes a transposase. They are flanked by short direct repeats (DRs). Retrotransposons are divided into autonomous and nonautonomous classes depending on whether they have ORFs that encode proteins required for retrotransposition. Common autonomous retrotransposons are (i) LTR or (ii) non-LTR retrotransposons (see text for a discussion of other retrotransposons that do not fall into either class). Examples of LTR retrotransposons are human endogenous retroviruses (HERV) (shown) and various Ty elements of S. cerevisiae (not shown). These elements have terminal LTRs and slightly overlapping ORFs for their group-specific antigen (gag), protease (prt), polymerase (pol), and envelope (env) genes. They produce target site duplications (TSDs) upon insertion. Also shown are the reverse transcriptase (RT) and endonuclease (EN) domains. Other LTR retrotransposons that are responsible for most mobile-element insertions in mice are the intracisternal A-particles (IAPs), early transposons (Etns), and mammalian LTR-retrotransposons (MaLRs). These elements are not present in humans, and essentially all are defective, so the source of their RT in trans remains unknown. L1 is an example of a non-LTR retrotransposon. L1s consist of a 5′-untranslated region (5′ UTR) containing an internal promoter, two ORFs, a 3′ UTR, and a poly(A) signal followed by a poly(A) tail (An). L1s are usually flanked by 7- to 20-bp target site duplications (TSDs). The RT, EN, and a conserved cysteine-rich domain (C) are shown. An Alu element is an example of a nonautonomous retrotransposon. Alus contain two similar monomers, the left (L) and the right (R), and end in a poly(A) tail. Approximate full-length element sizes are given in parentheses. [Modified from (31)]

Retrotransposons: Drivers of Genome Evolution

Genome evolution in eukaryotes has been driven by a number of processes, including the breakage and rejoining of different chromosomes (translocations), gene and segmental duplication, the shuffling of functional domains in exons, and gene conversion. Non-LTR retrotransposons have a very long history, spanning some 500 to 600 million years. They contain an RT that is similar to the RT of the mobile group II introns that occur in mitochondrial and chloroplast genomes of fungi and plants, and certain bacterial genomes (27, 28).
They also inhabit some yeast genomes, including that of Candida albicans (29). Their early evolutionary role is murky, but during recent times within mammals, they have been another important force in genome change.

Mammalian L1 elements affect the genome in many unusual ways, both destructive and constructive (Fig. 3). The destructive processes include insertion and rearrangement due to homologous recombination. The average human diploid genome has 80 to 100 active L1s (30), and L1 insertions account for about 1 in 1200 human mutations, some of which cause disease (31). Moreover, at least 1 in every 50 humans has a new genomic L1 insertion that occurred in parental germ cells or in early embryonic development (32-34). In contrast, laboratory strains of mice have an estimated 3000 active L1 elements in their genomes (34), and L1 insertions are a much greater fraction of disease-producing mutations in the mouse than they are in humans (31). A canine L1 insertion disrupting the factor IX gene produces hemophilia B (35). Because active L1s have also been isolated from gorilla DNA (36), it seems likely that all mammals have active L1 elements that can be copied into new genomic locations and can occasionally produce disease.

In contrast to many other mobile elements, L1s have a marked cis preference, whereby their proteins greatly prefer to act on the RNA that encodes them (37). Nevertheless, they are still able on occasion to mobilize nonautonomous sequences in trans. Because the short interspersed nucleotide elements (SINEs) and LINEs of many species share homologous sequences at their 3′ end upstream of the poly(A), it is postulated that the RT encoded by these "stringent" LINEs interacts with the shared 3′-end sequence to mobilize the SINE in trans. Trans mobilization of an eel SINE by an eel LINE has been demonstrated in cultured human HeLa cells (38).

Human Alu elements are another SINE family that is probably mobilized by LINEs.
These 300-bp elements, derived from 7SL RNA, do not encode proteins, yet have expanded to 1.1 million copies, or 11%, of the human genome. Their B1 homologs make up almost 3% of the mouse genome. Alu insertions have accounted for over 20 cases of human genetic disease, and Alu retrotransposition events occur in at least 1 in every 30 individuals (31). Recently, trans mobilization of a transfected, marked Alu by an active human L1 was demonstrated in cultured HeLa cells (39). In addition, retrotransposition of a transfected Alu mediated by an endogenous L1 was demonstrated in cultured cells treated with an inhibitor of topoisomerase II (40). Moreover, a single mouse B1 insertion has recently been found, suggesting that present-day mouse L1s can also act occasionally in trans (41).

Fig. 2. Reverse transcription mechanisms. (A) Reverse transcription of LTR retrotransposons and retroviruses begins with the copying into DNA of the region near the 5′ end of the RNA using a tRNA primer (a and b), followed by degradation of the 5′ region of the RNA (c), a jump of the newly synthesized DNA to the 3′ end of the RNA (d), and completion of synthesis of the first strand (e). Next, the element-encoded RNase H degrades most of the RNA (f). Then, the short remaining RNA primes the synthesis of the right end of the second DNA strand using the first DNA strand as template (g). Another jump of second-strand DNA to the left end of the DNA (h) is followed by completion of second-strand synthesis (i). During the process, LTRs are formed. [Modified from (9)] (B) Reverse transcription of non-LTR retrotransposons begins with nicking of the bottom strand of DNA by the endonuclease, leaving a 5′-PO4 and a 3′-OH. The 3′-OH then serves as a primer with the element RNA (R1, R2, L1, etc.) as template for the RT. Because reverse transcription occurs on the target DNA after cleavage, the process is called target primed reverse transcription, or TPRT (22, 23). [Modified from (31)] In a variation of TPRT, called "twin priming," inversions are formed (not shown). Here, it is proposed that the second strand of DNA is cleaved during reverse transcription of the first strand, and the 3′-OH of the second strand becomes a second primer for reverse transcription internally on L1 RNA. Resolution of this second cDNA produces the inversion (24).

Two complementary reasons for the large number of Alus in the human genome in the face of present-day L1 cis preference have been suggested. They are the simultaneous occurrence, at a particular evolutionary time, of a highly trans-active L1 subfamily and the transcription of Alu sequences susceptible to mobilization. Genome analysis suggests that a large burst of Alu insertions (and processed pseudogenes) occurred 40 million years ago when three presently inactive L1 subfamilies were prevalent and perhaps contained a large number of active members (42). At that time, Alu elements were special in their ability to gain access to the L1 retrotransposition machinery (39, 43). Alu RNA binds the SRP9/14 subunit of the signal recognition particle, bringing it into proximity with ribosomes and nascent L1 proteins on L1 RNA. But Alu sequence evolution has resulted in a decline in SRP9/14 binding to Alu RNA during primate evolution (44), suggesting that 40 million years ago Alu RNA had an enhanced ability to gain access to L1 proteins.

Processed pseudogenes and SVA elements are two other nonautonomous retrotransposons that are probably mobilized by human L1s because, like Alus, they end in poly(A), have L1-type TSDs, and insert at L1 endonuclease cleavage sites. A processed pseudogene arises by reverse transcription of a cellular mRNA followed by integration of the resulting cDNA into the genome. Roughly 5000 processed pseudogenes exist in the human genome, accounting for 0.5% of its mass.
Processed pseudogenes are not usually transcribed because they lack an external promoter. Human L1s probably drive low-level retrotransposition of processed pseudogenes in cultured cells (36, 45). To date, no disease-causing insertions of processed pseudogenes have been found.

SVA elements are nonautonomous, composite sequences containing a SINE derived from a human endogenous retrovirus (SINE-R), a variable number of tandem repeats (VNTR) segment, and a partial Alu sequence. Although there are only a few thousand of these elements in the human genome, SVA insertions have been found in three cases of human disease, and thus may be currently mobilized at a high frequency. One insertion has further hallmarks of an L1-mediated event: namely, sequence derived from the 3′ flank of an element, called a 3′-transduced sequence (see below); and a 5′ inversion (46). Indeed, this event fulfills a prediction that, because of 3′ transduction followed by severe 5′ truncation, some L1-driven insertions could completely lack retrotransposon sequence (47). The insertion into an α-spectrin gene contains only 3′-transduced sequence that is partially inverted and completely lacks its full-length SVA parent.

Fig. 3. Non-LTR retrotransposons are drivers of genome evolution. (A) Generally destructive mechanisms are (1) insertion of L1 elements, usually 5′ truncated or 5′ inverted; (2) trans-driven insertion of processed pseudogenes, Alus, and SVAs; (3) deletions and duplications due to unequal homologous recombination between Alus or L1s; (4) occasional deletions or inversions occurring upon insertion of L1s; and (5) segmental duplications leading to deletions and duplications. (Here a double crossing over facilitated by pairing at Alus moves a segment of DNA from one chromosome to another. Subsequent segregation places the two homologous segments in the same diploid genome.) (B) Generally constructive mechanisms are (1) repair of double-strand breaks by L1 insertion; (2) 3′ or 5′ transduction; (3) formation of chimeric retrogenes; (4) use of L1 or Alu sequence in coding regions of genes; (5) expression of genes 5′ to full-length L1s via an antisense promoter in L1; and (6) premature cleavage of gene transcripts at strong poly(A) signals in L1. Not shown are potential roles in the origin of eukaryotic telomerase and X-chromosome inactivation.

L1s can also produce large DNA rearrangements upon insertion. Analyses of numerous L1 insertions in cultured cells have shown that about 10% are associated with large deletions of genomic DNA (48, 49). One naturally occurring L1 insertion associated with a large deletion has been found in the mouse (50).

L1s and Alus provide material for DNA mispairing and unequal crossing over (homologous recombination), leading to deletion or duplication of sequences between the repeats. A number of these events have involved Alus, whereas only a few involving larger L1 elements have been described (31). The small number of mispairing and unequal crossing-over events between L1s is somewhat surprising, but may relate to the relatively low representation of L1s in regions of high gene density, in contrast to the much higher density of Alus presently in these regions. [Because Alu insertion is dependent on L1 machinery, Alus and L1s have similar insertion sites (51). Thus, the present distribution of these elements may reflect evolutionary selection against L1s in gene-rich regions.] Similarly, homologous recombination between Alus may have been involved in the genesis of segmental duplications, duplicated sequence blocks of 200 to 400 kb that account for up to 5% of the human genome.
When these homologous sequence blocks are within 5 Mb of each other, they have an important role in human disease, producing large deletions, duplications, and inversions secondary to mispairing and unequal crossing over (52). A high proportion of Alu elements (29%) at the ends of segmental duplications suggests that many were generated by Alu mispairing followed by homologous recombination (53).

Offsetting these potentially destructive processes for the genome, L1s are constructive in numerous ways. First, they occasionally repair double-strand breaks in DNA by inserting into the genome via an endonuclease-independent pathway. Rare instances of this "bandage" phenomenon have been observed in vivo, but endonuclease-independent L1 insertions are common in cultured cells that are defective in DNA-repair proteins, e.g., XRCC4 (54).

Second, L1 retrotransposition can often move sequences 3′ to a parental L1 to a new genomic location. Because L1s contain a weak RNA cleavage and polyadenylation signal, their transcript is frequently not cleaved at the 3′ end of the L1 but instead is cleaved after a downstream poly(A) signal. By this mechanism, 10 to 20% of recent L1 retrotranspositions contain sequences derived from the 3′ flank of the parental L1, called 3′ transductions. These events have the potential to shuffle exons and regulatory sequences to new genomic sites (47). Occasionally, 5′ transduction due to initiation of transcription from a promoter upstream of a full-length L1 also occurs.

Third, L1 retrotransposition can produce new chimeric retrogenes that are often expressed. These genes are probably generated through template switching of L1 RT from L1 RNA or Alu RNA to other small nuclear RNAs. In the human genome sequence, there are some 80 chimeric retrogenes whose 5′ regions originate from small nuclear RNAs, such as U6, U3, U5, and 5S RNA, and whose 3′ regions are the 3′ ends of L1 or Alu elements (55).

Fourth, retrotransposons have shaped mammalian genomes by providing their sequences for a number of protein-coding exons of genes. In the human genome, L1 or Alu sequences are present in nearly 200 confirmed and 2400 predicted protein-coding sequences (56). However, amino acids translated from these sequences still need to be demonstrated in the protein products of these genes.

Fifth, L1 retrotransposons can also affect gene expression. They contain an antisense promoter in the region from position 400 to 600 of their 5′ UTR, and a number of expressed genes located 5′ to full-length L1s have alternate transcription start sites in this L1 region (57). Moreover, because there are a number of strong poly(A) signals embedded in L1 sequence, L1 transcripts can also be cleaved prematurely (58). This means that an L1 positioned in the transcriptional sense orientation in an intron of a gene may cause a reduction in the gene's transcript level.

In addition, ancient mobile elements probably provided sequences for key host proteins and may have a role in other important biological processes. (i) A DNA transposon is the likely source of RAG1 and RAG2, the recombinase-activating proteins that carry out V(D)J recombination of immunoglobulin genes (59). (ii) An ancient retrotransposon may have provided an important enzymatic activity, telomerase, for the eukaryotic cell. DNA ends of chromosomes, telomeres, are maintained by telomerase, an RT that acts via TPRT and is closely related structurally to the RT of non-LTR retrotransposons (60, 61). As we learn more about the vast array of non-LTR retrotransposons, it appears likely that eukaryotic telomerase had its origin from a retrotransposon RT (26). (iii) Although the evidence is only circumstantial, L1 elements may serve as "booster stations" for the spread of gene inactivation transmitted by Xist RNA in X-chromosome inactivation (62).

Genome Size and Mobile Element Clades

S.
cerevisiae contains only a handful of retrotransposon types, or clades, and each clade contains less than 100 elements. Retrotransposons make up a small fraction of the yeast genome, probably because their rate of retrotransposition is rather low, about 10^-5 to 10^-7 per generation, and their rate of removal by recombination between LTRs is high (63) (Table 1). On the other hand, although the genomes of other organisms, such as Drosophila and various fish, contain a large number of different clades of both LTR and non-LTR retrotransposons, relatively little genome space is devoted to retrotransposons (4% of the Drosophila genome). Although Drosophila elements, such as P and I, insert at frequencies of one per meiosis in hybrid dysgenesis crosses (64), and other elements such as copia, Doc, and mariner move at relatively high frequencies of 10^-1 to 10^-3 per generation (65, 66), both selection and deletion of elements after insertion probably account for the small number of each element type in the fly genome (67). Similarly, pufferfish have six clades of non-LTR retrotransposons and eight clades of LTR retrotransposons, but a total of only about 5000 retrotransposons (68).

In contrast, humans and mice have a very small number of non-LTR retrotransposon clades (about six), but a very large number of total non-LTR retrotransposons (about 1,500,000) (1, 2). Although the combined rate of retrotransposition for the autonomous (L1s) and nonautonomous (Alus, processed pseudogenes, SVAs) retrotransposons is probably 10^-1 per generation, the clearance rate due to deletion must be very much lower than that in the Drosophila and pufferfish genomes (67). Primarily because of these two factors, the human genome is 20 times as large as the Drosophila genome and 8 times as large as the pufferfish genome.

Table 1. Mobile element dynamics in model organisms. Tns, DNA transposons; Rtns, retrotransposons. Organisms are budding yeast, S. cerevisiae; mustard weed, A. thaliana; roundworm, C. elegans; fruit fly, D. melanogaster; mouse, M. musculus; human, H. sapiens.

               Mobile element type (% of genome)
Organism       Tns   LTR Rtns  Non-LTR Rtns  Active element(s)             Est. insertion freq./generation  Est. removal freq.
Budding yeast  0     3         0             LTR Rtn                       10^-5 to 10^-7*                  High (LTR recombination)
Mustard weed   5     5         0.5           Tn, LTR Rtn                   ?                                ?
Roundworm      12    0         0.4           Tn                            Very low                         ? (Low)
Fruit fly      0.3   2.7       0.9           Tn, LTR Rtn, non-LTR Rtn      10^-1 to 10^-2†                  High (deletion and selection)‡
Mouse          0.9   10        27            LTR Rtn, non-LTR Rtn          10^-1                            Low
Human          3     8.5       35            Non-LTR Rtn                   10^-1§                           Low

*See (63). †Mobile element insertion rates for P and I element hybrid dysgenesis crosses are 10^0. In natural crosses, transposition and retrotransposition rates are 10^-1 to 10^-2 [for copia and Doc, see (65); for mariner, see (66)]. ‡See (67). §See (31).

Controlling Mobile Elements

Although transposable elements are continuously entering new genomic sites, phenotype-altering mutations caused by their insertions are much less frequent than are point mutations in most organisms, with the exception of fruit flies, corn, and wheat. Indeed, transposable elements that alter phenotype were discovered only after many years of genetic analysis. Although many genomes contain a large number of active elements, they remain reasonably stable, perhaps because less than 10% of the genome in organisms with highly active mobile elements, such as mice and humans, consists of protein-coding and regulatory sequences (1, 2). Similarly, only a small fraction of the maize genome consists of genes and regulatory sequences (3).
Thus, with notable exceptions like Drosophila, transposable-element mobility is low in small genomes, where genes constitute a large fraction. In large genomes, with more active elements, only a small fraction of the genome is susceptible to deleterious insertions. Yet, in both of these scenarios, the host places further controls on mobility. At least two control mechanisms are known: (i) cosuppression usually mediated by small interfering RNA (siRNA) and (ii) methylation.

During cosuppression, both the expression of an introduced transgene and its endogenous homologs are suppressed. Both transcriptional and posttranscriptional cosuppression of Ty1 retrotransposition in S. cerevisiae have been demonstrated, although the mechanisms remain unknown (69, 70). Cosuppression of Drosophila I factor, a non-LTR retrotransposon (probably by an siRNA mechanism), has also been observed (71). Perhaps the best-characterized regulation of a mobile element is that of siRNA action on the Tc1 transposon of C. elegans (72). Tc1 transposition occurs only in somatic cells and is completely suppressed in germ cells. The mechanism underlying normal suppression begins with readthrough transcription of the transposon from an upstream C. elegans gene. Double-strand RNA (dsRNA) of the terminal inverted repeats (TIRs) forms as a result of "snap back" of one TIR onto the other. The 54-nucleotide (nt) TIR dsRNA is then cleaved to 20 to 27 nt by the RNase III-like enzyme DCR-1 (dicer) to produce the siRNA, leading to destruction of Tc1 RNA by the standard RNA interference mechanism. Mutants defective in suppression lack Tc1 siRNA and allow germline transposition to occur.

Methylation of mobile elements is another control device used in nature (73). Mouse intracisternal A particles (IAPs) are LTR-containing, retroviral-like retrotransposons that frequently cause disease by insertion into genes (31). A direct correlation exists between demethylation of mouse IAPs and an increase in their expression (74).
In addition, other mammalian retrotransposons are hypomethylated in germ cells and in very early development, when they are able to retrotranspose, and hypermethylated in somatic cells, where their expression is not detectable and they cannot be mobilized. However, the role of methylation in controlling retrotransposition is still unclear. In an invertebrate, repetitive DNA, including multiple copies of an LTR retrotransposon, is largely unmethylated, whereas genes are mostly methylated (75). Therefore, study of the rate of retrotransposition of a marked retrotransposon introduced into the genomes of both normal and methylation-defective mice would be useful.

Present and Future Uses of Mobile Elements

For many years, P transposable elements of Drosophila have been a powerful tool for insertional mutagenesis, providing a method to link phenotype with genomic sequence (76). Recently, bacterial transposons have also been successfully used as insertional mutagens to study the function of 50% of the annotated genes in S. cerevisiae (77). To aid DNA sequencing, bacterial transposons have been inserted randomly into DNA from various sources, including fragmented bacterial artificial chromosomes and cDNAs. The mutagenized fragments are then separated, and sequencing reactions are performed using primers complementary to transposon end sequences (78).

Young L1s and Alus are polymorphic as to presence in the human genome, meaning that an L1 at a particular locus may be present at that site in fewer than 100% of human genomes. These polymorphic elements can be used to track the migration of human populations, or, if the elements are present in some species and not others, to determine the evolutionary history of those species (79). Moreover, because L1 alleles at a locus can also vary in their capability to retrotranspose (80), the potential for individual variation in retrotransposition capability is great.
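The use of polymorphic L1 and Alu insertions as population markers can be illustrated with a small calculation. The Python sketch below is not taken from the cited studies. It assumes a single diploid locus at which each individual carries the insertion on 0, 1, or 2 chromosomes, and it estimates the frequency of the insertion allele in each population. The function name, the population labels, and the genotype values are all hypothetical and serve only to show the arithmetic.

```python
from collections import defaultdict


def insertion_allele_frequencies(genotypes):
    """Estimate the insertion-allele frequency in each population.

    genotypes: iterable of (population, copies) pairs, where copies is the
    number of chromosomes (0, 1, or 2) carrying the element at one locus.
    Returns a dict mapping population -> frequency between 0.0 and 1.0.
    """
    carried = defaultdict(int)  # insertion-bearing chromosomes per population
    sampled = defaultdict(int)  # total chromosomes sampled per population
    for population, copies in genotypes:
        if copies not in (0, 1, 2):
            raise ValueError(f"copies must be 0, 1, or 2, got {copies!r}")
        carried[population] += copies
        sampled[population] += 2  # each diploid individual contributes two
    return {pop: carried[pop] / sampled[pop] for pop in sampled}


# Hypothetical toy genotypes (made up for illustration, not real data).
toy_genotypes = [
    ("pop_A", 2), ("pop_A", 1), ("pop_A", 1), ("pop_A", 0),
    ("pop_B", 0), ("pop_B", 0), ("pop_B", 1), ("pop_B", 0),
]

frequencies = insertion_allele_frequencies(toy_genotypes)
for population, frequency in sorted(frequencies.items()):
    print(f"{population}: insertion allele frequency = {frequency:.3f}")
```

A frequency below 1.0 in a population means the insertion is polymorphic there, and differences in frequency between populations are the kind of signal that such marker studies compare.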
Mobile elements will soon be useful in determining the function of many mammalian genes after gene knockout by insertional mutagenesis. A consensus sequence of the fish Tc1/mariner-type DNA transposon, called Sleeping Beauty (SB), has been constructed. The transposase of this rejuvenated element is 20 to 40 times as active as natural transposases of the Tc1/mariner family. When the transposon is inserted into the genome of mice that already contain the SB transposase, it is mobilized in the subsequent generation from its genomic location to another genomic site at a rate of one to two insertions per offspring (81, 82). However, as expected, insertions are heavily concentrated close to the original transposon site; about 50% are within 3 Mb and 80% are on the same chromosome as the original transposon (82).

On the other hand, L1 elements offer the potential for generating retrotranspositions at random sites throughout the genome. Retrotransposition from human L1 transgenes has been obtained in mice (32), and present insertion frequencies are 1 in every 15 to 20 offspring. With further improvements, this system may have substantial practical value for making random gene knockouts to determine gene function.

The SB transposon has also proven useful as a gene-delivery vector to liver cells in animal systems. In long-term studies in mice, factor IX deficiency and tyrosinase deficiency have been corrected with SB transposon vectors (83, 84).

Summary

Over millions of years of evolution, mobile elements have achieved a balance between detrimental effects on the individual and long-term beneficial effects on a species through genome modification. Indeed, we may soon learn that the shaping of the genome by mobile elements has played an important role in events leading to speciation. Whether these repeated sequences are now "junk DNA" is a complex issue. Some may have had an important function long ago, but have lost that role today.
Others may never have had a function, yet the cluttering of our genomes with nonfunctional DNA was a small price to pay for the genome malleability they provided.

References and Notes

1. International Human Genome Sequencing Consortium, Nature 409, 860 (2001).
2. Mouse Genome Sequencing Consortium, Nature 420, 520 (2002).
3. P. SanMiguel et al., Science 274, 765 (1996).
4. J. Brosius, H. Tiedge, Virus Genes 11, 163 (1995).
5. N. L. Craig, R. Craigie, M. Gellert, A. M. Lambowitz, Eds., Mobile DNA II (American Society for Microbiology, Washington, DC, 2002).
6. K. Mizuuchi, T. Baker, in Mobile DNA II, N. L. Craig, R. Craigie, M. Gellert, A. M. Lambowitz, Eds. (American Society for Microbiology, Washington, DC, 2002), pp. 12-23.
7. N. L. Craig, in Mobile DNA II, N. L. Craig, R. Craigie, M. Gellert, A. M. Lambowitz, Eds. (American Society for Microbiology, Washington, DC, 2002), pp. 3-11.
8. M. J. Curcio, K. M. Derbyshire, Nature Rev. Mol. Cell Biol. 4, 865 (2003).
9. D. F. Voytas, J. D. Boeke, in Mobile DNA II, N. L. Craig, R. Craigie, M. Gellert, A. M. Lambowitz, Eds. (American Society for Microbiology, Washington, DC, 2002), pp. 631-662.
10. D. L. Chalker, S. B. Sandmeyer, Genes Dev. 6, 117 (1992).
11. S. E. Devine, J. D. Boeke, Genes Dev. 10, 620 (1996).
12. Y. Zhu, J. Dai, P. G. Fuerst, D. F. Voytas, Proc. Natl. Acad. Sci. U.S.A. 100, 5891 (2003).
13. N. J. Bowen, I. K. Jordan, J. A. Epstein, V. Wood, H. L. Levin, Genome Res. 13, 1984 (2003).
14. A. R. Schroder et al., Cell 110, 521 (2002).
15. X. Wu, Y. Li, B. Crise, S. M. Burgess, Science 300, 1749 (2003).
16. S. Hacein-Bey-Abina et al., Science 302, 415 (2003).
17. J. L. Jakubczak, Y. Xiong, T. H. Eickbush, J. Mol. Biol. 212, 37 (1990).
18. M. L. Pardue, P. G. Debaryshe, Annu. Rev. Genet. 37, 485 (2003).
19. H. Takahaski, S. Okazaki, H. Fujiwara, Nucleic Acids Res. 25, 1578 (1997).
20. Q. Feng, J. V. Moran, H. H. Kazazian Jr., J. D. Boeke, Cell 87, 905 (1996).
21. J. Jurka, Proc. Natl. Acad. Sci. U.S.A. 94, 1872 (1997).
22. D. D. Luan, M. H. Korman, J. L. Jakubczak, T. H. Eickbush, Cell 72, 595 (1993).
23. G. J. Cost, Q. Feng, A. Jacquier, J. D. Boeke, EMBO J. 21, 5899 (2002).
24. E. M. Ostertag, H. H. Kazazian Jr., Genome Res. 11, 2059 (2001).
25. J. Cappello, K. Handelsman, H. F. Lodish, Cell 43, 105 (1985).
26. I. R. Arkhipova, K. I. Pyatkov, M. Meselson, M. B. Evgen'ev, Nature Genet. 33, 123 (2003).
27. H. S. Malik, W. D. Burke, T. H. Eickbush, Mol. Biol. Evol. 16, 793 (1999).
28. M. Belfort, V. Derbyshire, M. M. Parker, B. Cousineau, A. M. Lambowitz, in Mobile DNA II, N. L. Craig, R. Craigie, M. Gellert, A. M. Lambowitz, Eds. (American Society for Microbiology, Washington, DC, 2002), p. 761.
29. T. J. D. Goodwin, J. E. Ormandy, R. T. M. Poulter, Curr. Genet. 39, 83 (2001).
30. B. Brouha et al., Proc. Natl. Acad. Sci. U.S.A. 100, 5280 (2003).
31. E. M. Ostertag, H. H. Kazazian Jr., Annu. Rev. Genet. 35, 501 (2001).
32. E. M. Ostertag et al., Nature Genet. 32, 655 (2002).
33. E. T. Luning Prak, A. W. Dodson, E. A. Farkash, H. H. Kazazian Jr., Proc. Natl. Acad. Sci. U.S.A. 100, 1832 (2003).
34. J. L. Goodier, E. M. Ostertag, K. Du, H. H. Kazazian Jr., Genome Res. 11, 1677 (2001).
35. M. B. Brooks, W. K. Gu, J. L. Barnas, J. Ray, K. Ray, Mamm. Genome 14, 788 (2003).
36. G. D. Swergold, personal communication.
37. W. Wei et al., Mol. Cell. Biol. 21, (2001).
38. M. Kajikawa, N. Okada, Cell 111, 433 (2002).
39. M. Dewannieux, C. Esnault, T. Heidmann, Nature Genet. 35, 15 (2003).
40. C. R. Hagan, R. F. Scheffield, C. M. Rudin, Nature Genet. 35, 219 (2003).
41. J. M. Bomar et al., Nature Genet. 15, 270 (2003).
42. K. Ohshima et al., Genome Biol. 4, R74 (2003).
43. J. D. Boeke, Nature Genet. 16, 6 (1997).
44. H. Fan, J. L. Goodier, J. R. Chamberlain, D. R. Engelke, R. J. Maraia, Mol. Cell. Biol. 18, 3201 (1998).
45. C. Esnault, J. Maestre, T. Heidmann, Nature Genet. 24, 363 (2000).
46. E. M. Ostertag, J. L. Goodier, Y. Zhang, H. H. Kazazian Jr., Am. J. Hum. Genet. 73, 1444 (2003).
47. J. V. Moran, R. J. DeBerardinis, H. H. Kazazian Jr., Science 283, 1530 (1999).
48. N. Gilbert, S. Lutz-Prigge, J. V. Moran, Cell 110, 315 (2002).
49. D. E. Symer et al., Cell 110, 327 (2002).
50. S. M. Garvey, C. Rajan, A. P. Lerner, W. N. Frankel, G. A. Cox, Genomics 79, 146 (2002).
51. I. Ovchinnikov, A. B. Troxel, G. D. Swergold, Genome Res. 11, 2050 (2001).
52. B. S. Emanuel, T. H. Shaikh, Nature Rev. Genet. 2, 791 (2001).
53. J. A. Bailey, G. Liu, E. E. Eichler, Am. J. Hum. Genet. 73, 823 (2003).
54. T. A. Morrish et al., Nature Genet. 31, 159 (2002).
55. A. Buzdin et al., Nucleic Acids Res. 31, 4385 (2003).
56. W.-H. Li, Z. Gu, H. Wang, A. Nekrutenko, Nature 409, 847 (2001).
57. P. Nigumann, K. Redik, K. Matlik, M. Speek, Genomics 79, 628 (2002).
58. V. Perepelitsa-Belancio, P. Deininger, Nature Genet. 35, 363 (2003).
59. A. Agrawal, Q. M. Eastman, D. G. Schatz, Nature 394, 744 (1998).
60. J. Lingner et al., Science 276, 561 (1997).
61. M. Meyerson et al., Cell 90, 785 (1997).
62. M. F. Lyon, Cytogenet. Cell Genet. 80, 133 (1998).
63. M. J. Curcio, D. J. Garfinkel, Proc. Natl. Acad. Sci. U.S.A. 88, 936 (1991).
64. A. Bucheton, I. Busseau, D. Teninges, in Mobile DNA II, N. L. Craig, R. Craigie, M. Gellert, A. M. Lambowitz, Eds. (American Society for Microbiology, Washington, DC, 2002), p. 796.
65. E. G. Pasyukova, S. V. Nuzhdin, D. A. Filatov, Genet. Res. 72, 1 (1998).
66. D. Garza, M. Medhora, A. Koga, D. L. Hartl, Genetics 128, 303 (1991).
67. T. H. Eickbush, A. V. Furano, Curr. Opin. Genet. Dev. 12, 669 (2002).
68. J.-N. Volff, L. Bouneau, C. Ozouf-Costas, C. Fischer, Trends Genet. 19, 674 (2003).
69. Y. W. Jiang, Genes Dev. 16, 467 (2002).
70. D. J. Garfinkel, K. Nyswaner, J. Wang, J.-Y. Cho, Genetics 165, 83 (2003).
71. S. Jensen, M.-P. Gassama, T. Heidmann, Nature Genet. 21, 209 (1999).
72. T. Sijen, R. H. A. Plasterk, Nature 426, 310 (2003).
73. T. H. Bestor, Trends Genet. 19, 185 (2003).
74. C. P. Walsh, J. R. Chaillet, T. H. Bestor, Nature Genet. 20, 116 (1998).
75. M. W. Simmen et al., Science 283, 1164 (1999).
76. A. C. Spradling et al., Genetics 153, 135 (1999).
77. Y. Schevchenko et al., Nucleic Acids Res. 30, 2469 (2002).
78. A. Kumar, M. Snyder, Nature Rev. Genet. 2, 302 (2001).
79. M. A. Batzer, P. L. Deininger, Nature Rev. Genet. 3, 370 (2002).
80. S. M. Lutz, B. J. Vincent, H. H. Kazazian Jr., M. A. Batzer, J. V. Moran, Am. J. Hum. Genet. 73, 1431 (2003).
81. C. M. Carlson et al., Genetics 165, 243 (2003).
82. K. Horie et al., Mol. Cell. Biol. 23, 9189 (2003).
83. S. R. Yant et al., Nature Genet. 25, 35 (2000).
84. E. Montini et al., Mol. Ther. 6, 759 (2002).
85. I acknowledge J. Goodier, E. Luning Prak, E. Ostertag, D. Babushok, and J. Moran for helpful comments on the manuscript, and the NIH for grant support.