CG920 Genomics
Lesson 2: Gene Identification

Jan Hejátko
Functional Genomics and Proteomics of Plants, Mendel Centre for Plant Genomics and Proteomics, Central European Institute of Technology (CEITEC), Masaryk University, Brno
hejatko@sci.muni.cz, www.ceitec.muni.cz

Investment in the development of education. This presentation is co-financed by the European Social Fund and the state budget of the Czech Republic.

Literature
■ Literature sources for Chapter 02:
• Plant Functional Genomics, ed. Erich Grotewold, 2003, Humana Press, Totowa, New Jersey.
• Majoros, W.H., Pertea, M., Antonescu, C. and Salzberg, S.L. (2003) GlimmerM, Exonomy, and Unveil: three ab initio eukaryotic genefinders. Nucleic Acids Research, 31(13).
• Singh, G. and Lykke-Andersen, J. (2003) New insights into the formation of active nonsense-mediated decay complexes. Trends in Biochemical Sciences, 28, 464.
• Wang, L. and Wessler, S.R. (1998) Inefficient reinitiation is responsible for upstream open reading frame-mediated translational repression of the maize R gene. Plant Cell, 10, 1733.
• de Souza et al. (1998) Toward a resolution of the introns early/late debate: only phase zero introns are correlated with the structure of ancient proteins. PNAS, 95, 5094.
• Feuillet and Keller (2002) Comparative genomics in the grass family: molecular characterization of grass genome structure and evolution. Ann Bot, 89, 3-10.
• Fröbius, A.C., Matus, D.Q. and Seaver, E.C. (2008) Genomic organization and expression demonstrate spatial and temporal Hox gene colinearity in the lophotrochozoan Capitella sp. I. PLoS One, 3, e4004.

Outline
■ Forward and Reverse Genetics Approaches
• Differences between the approaches used for identification of genes and their function
■ Identification of Genes Ab Initio
• Structure of genes and searching for them
• Genomic colinearity and genomic homology
■ Experimental Gene Identification
• Constructing gene-enriched libraries using methylation filtration technology
• EST libraries

Forward and Reverse Genetics
■ Differences between the approaches used for identification of genes and their function

Forward vs. Reverse Genetics
■ Revolution in understanding the term "gene"
• "classical" genetics approaches
• "reverse genetics" approaches
[Figure: DNA sequence 5'-TTATATATATATATTAAAAAATAAAATAAAAGAACAAAAAAGAAAATAAAATA...-3'; insertional mutagenesis; gene model labelled with exon, intron, transmembrane region, duplication and En-1 element; positions 1122 and 1123; sequence ...aat tca agt cgt gga gac tac act... translated as N S S R G D Y T]

Identification of the role of the ARR21 gene
• Hypothetical signal transducer in the two-component system of Arabidopsis
Identification of the role of the ARR21 gene: recent model of cytokinin (CK) signaling via the multistep phosphorelay (MSP) pathway
[Figure: plasma membrane (PM) with AHK sensor histidine kinases (AHK2, AHK3, CRE1/AHK4/WOL); HPt proteins (AHP1-6); nucleus with response regulators (ARR1-24); outputs: regulation of transcription, interaction with effector proteins]

Identification of the role of the ARR21 gene
• Mutant identified by searching databases of insertional mutants (SINS, sequenced insertion sites) using BLAST

Identification of the role of the ARR21 gene: isolation of the insertional mutant
Searching in databases of insertional mutants (SINS)

  Insert_SINS: 01_09_64
  Query:    80  tcctagcgttcatgagcgtaccatacttgacaanagagaacgtagccagccatttacagg    139
  Sbjct: 58319  tcctagcgttcatgagcgtaccatacttgacaagagagaacgtagccagccatttacagg  58378
  ARR21:  1830

  Query:   140  tttgatatctcttgtcaaaaatgtttttggattttactgt    179
  Sbjct: 58379  tttgatatctcttgtcaaaaatgtttttggattttactgt  58418
  ARR21:  1890

Localization of the dSpm insertion in the genomic sequence of ARR21 by sequencing of PCR products
[Figure: ARR21 gene model starting at the ATG, with positions 1727 bp and 1728 bp marked]

Identification of the role of the ARR21 gene
• Expression of ARR21 in the wild type and inhibition of ARR21 expression in the insertional mutant confirmed at the RNA level

Identification of the role of the ARR21 gene: analysis of expression
[Figure: RT-PCR gels. Wild-type expression, gene / cycles: ACTIN2 / 20, ACTIN2 / 25, ARR21 / 30, ARR21 / 35, ARR21 / 40; tissue lanes (including roots) and controls (water, DNA).]
[Figure: insertional mutant vs wild type, with controls (water, DNA):]

  gene / cycles    primers
  ACTIN2 / 25      aktU1 - aktL1
  ARR21 / 40       2UI - 2LII
  ARR21 / 40       1UII - 1LI
  ARR21 / 40       2UI - dsLb

Identification of the role of the ARR21 gene
• Phenotype analysis of the insertional mutant

Identification of the role of the ARR21 gene: phenotype analysis of the mutant
■ Analysis of sensitivity to plant growth regulators
• 2,4-D and kinetin
• ethylene
■ Light of various wavelengths
■ No alterations, neither in flowering nor in the number of seeds
[Figure: dose-response plot; x-axis: kinetin, 3 to 1000 µg l-1; y-axis ticks 10, 30, 100]

Identification of the role of the ARR21 gene: possible reasons for the absence of a phenotype
• Functional redundancy within the gene family?
[Figure: homology of ARR genes]
• Phenotype only under specific conditions

Identification of the role of the ARR21 gene: summary
■ The ARR21 gene was identified by comparative analysis of the Arabidopsis genome
■ Its function was predicted on the basis of sequence analysis
■ Site-specific expression of the ARR21 gene was proved at the RNA level
■ Identification of the function of ARR21 in the development of Arabidopsis by insertional mutagenesis was not successful, probably because of functional redundancy within the gene family

Outline
■ Identification of Genes Ab Initio
• Structure of genes and searching for them
Gene Structure
[Figure: gene model with promoter, 5' UTR, ATG start codon, 3' UTR and polyadenylation signal]

RNA Splicing
[Figure]

Identification of Genes Ab Initio
■ Omitting the 5' and 3' UTRs
■ Identification of the translation start (ATG) and the stop codon (TAG, TAA, TGA)
■ Finding donor (typically GT) and acceptor (AG) splice sites
■ Many ORFs are not real coding sequences: in Arabidopsis, there are on average approximately 350 million ORFs in every 900 bp of sequence (!)
■ Various statistical models (e.g. the hidden Markov model, HMM; see recommended literature, Majoros et al., 2003) are used to evaluate and score the weight of the identified donor and acceptor sites

Splicing Site Prediction
■ Programs for splice site prediction (specificity approximately 35 %)
□ GeneSplicer (http://www.tigr.org/tdb/GeneSplicer/gene_spl.html)
□ SplicePredictor (http://deepc2.psi.iastate.edu/cgi-bin/sp.cgi)
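The ab initio steps listed above (start codon, stop codons, GT donor and AG acceptor dinucleotides) can be sketched in a few lines of Python. This is only a naive illustration under stated assumptions, not how GeneSplicer or SplicePredictor work: the function names and the toy gene are hypothetical, only the forward strand is scanned, and no statistical model scores the hits, which is why most of them would be false positives on real genomic DNA.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return sorted(orfs)


def candidate_splice_sites(seq):
    """Return 0-based positions of every GT (donor) and AG (acceptor)."""
    seq = seq.upper()
    donors = [m.start() for m in re.finditer(r"(?=GT)", seq)]
    acceptors = [m.start() for m in re.finditer(r"(?=AG)", seq)]
    return donors, acceptors


# Hypothetical toy gene: exon 1 + intron (GT...AG) + exon 2.
EXON1, INTRON, EXON2 = "ATGGCCAAG", "GTGAGTTAATTTCAG", "GGATCCGAATAA"
genomic = EXON1 + INTRON + EXON2
spliced = EXON1 + EXON2

print(find_orfs(genomic))               # ORF runs into a stop codon inside the intron
print(find_orfs(spliced))               # ORF of the correctly spliced transcript
print(candidate_splice_sites(genomic))  # the true sites plus spurious GT/AG hits
```

In the toy gene the real donor is the GT at position 9 and the real acceptor the AG at position 22; the remaining hits are the kind of background that the statistical models mentioned above are meant to down-weight.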
SplicePredictor
SplicePredictor: a method to identify potential splice sites in (plant) pre-mRNA by sequence inspection using Bayesian statistical models (an older method using logitlinear models is also available).
Sequences should be in the one-letter code ({a,b,c,g,h,k,m,n,r,s,t,u,w,y}), upper or lower case; all other characters are ignored during input. Multiple sequence input is accepted in FASTA format (sequences separated by identifier lines of the form ">SQ;name_of_sequence comments") or in GenBank format.
The genomic DNA sequence can be pasted, uploaded as a file, or given as a GenBank accession number. Beginning of the pasted sequence:

  GAGGAGGCACAAAATGACGAATATACAAAATGATCTTAAACAGCTAAACTATATTGGACATTTTTTCGATCTCAGATATA
  AAAGATTTCATTCAATATAATACTTGGATAAATACTCTTATTATTTTTCTTTAGTTTATTAAAAAAAACCTCTAATAAAT
  ACGAGTTTAAGTCCACAAAATCGCTTAGACTAAAATACACCATATAATTTCAAACGATAAAGTTTACAAAAGTAATATCC
  AAGTATCTCATAGTCAACATATATATAGTAATAATTAGTTGACGTATAAGAAAATAAAAATAAATAAATTAGTATCTTAT
  TTTGGGTGGTGCTGACTGGTGACTGGTGACTGCAGAATGCTCGGCAAATGGAACCATATCCCAAGACATGGGTTTTAGAT

SplicePredictor output
SplicePredictor. Version of February 13, 2005. Date run: Wed Nov 9 11:30:14 2005
Species: Homo sapiens
Model: 2-class Bayesian
Prediction cutoff (2 ln [BF]): 3.00
Local pruning: on
Non-canonical sites: not scored
Sequence 1: your-sequence, from 1 to 9490.
Potential splice sites (t: A = acceptor, D = donor; rows with illegible cells in the source are omitted):

  t   loc   sequence            P      c      rho    gamma  *   P*R*G
  A     75  ttttttcgatctcAGat   0.973   7.16  0.000  0.000   7  (5 1 1)
  A    134  attatttttctttAGtt   0.999  14.86  0.000  0.000   7  (5 1 1)
  A    500  gattttgttgtttAGtc   0.977   7.48  0.000  0.000   7  (5 1 1)
  A    948  tattttttgaaatAGat   0.968   6.80  0.000  0.000   7  (5 1 1)
  A   1213  ttatttattttttAGtt   0.998  12.14  0.000  0.000   7  (5 1 1)
  A   1373  tttcctctctcacAGga   0.999  13.17  0.000  0.000   7  (5 1 1)
  A   2479  catctaaaattttAGat   0.942   5.59  0.000  0.000   7  (5 1 1)
  A   3171  cgtcgtcatttatAGta   0.988   8.74  0.000  0.000   7  (5 1 1)
  A   3284  cttttgttatcaaAGgg   0.993  10.03  0.000  0.006   8  (5 1 2)
  D   3372  aatGTaagg           0.933   5.28  0.855  0.849  15  (5 5 5)
  A   3581  cgatcgccgttctAGgt   0.850   3.47  0.000  0.000   7  (5 1 1)
  D   3649  cacGTatta           0.933   5.25  0.000  0.848  11  (5 1 5)
  A   4254  attattgttcttcAGat   0.998  12.82  0.000  0.000   7  (5 1 1)
  D   5356  caaGTgaat           0.821   3.04  0.387  0.000  11  (5 5 1)
  A   5441  ctttctctctaacAGaa   0.995  10.43  0.387  0.000  11  (5 5 1)
  A   5806  catcatatcctaaAGgt   0.948   5.83  0.458  0.000  11  (5 5 1)

[Figure: annotated genomic sequence of the 5' region with restriction sites (e.g. BpuEI, BglII, XbaI, HindIII), the short upstream ORF (uORF-1, M K R A F) and the translation of exon 4 (...AFLVVVFECIWISNWRTTTENLVKEVASFTEDLRTSLVSEIE...)]

Splicing Site Prediction
■ Programs for splice site prediction also include:
□ NetGene2 (http://www.cbs.dtu.dk/services/NetGene2/)

NetGene2
NetGene2 Server: a service producing neural network predictions of splice sites in human, C. elegans and A. thaliana DNA.
A single sequence is submitted either as a local file in FASTA format or by pasting it together with a sequence name; the organism is chosen from Human, C. elegans and A. thaliana (A. thaliana selected here). The sequence pasted is the same genomic sequence as above (gaggaggcacaaaatgacgaatatacaaaatgatcttaaacagc...).
NOTE: The submitted sequences are kept confidential and will be erased immediately after processing.

NetGene2 output
NetGene2 v. 2.4
The sequence has the following composition:
Length: 9490 nucleotides.
31.8% A, 17.0% C, 19.6% G, 31.7% T, 0.0% X, 36.5% G+C

Donor splice sites, direct strand (^ marks the exon/intron boundary; ? marks a cell illegible in the source)

  pos 5'->3'  phase  strand  confidence  5' exon^intron 3'
      1704      0      +       0.87      TTCCAAACAC^GTTAATATTT
      1906      0      +       0.99      CGGTGAACGG^GTCAGAACAT
      3582      1      +       1.00      GCCGTTCTAG^GTAATCTTGC
      3765      1      +       1.00      TTGCGTCCTG^GTAATTCTGC
      4134      0      +       0.74      TCAAACACAG^GTTGTTAAAA
      4619      ?      +       0.74      AGCAAGAAAG^GTCTTGTTTC
      4915      0      +       0.94      CGTTCCTCTG^GTAAATACTG
      5356      0      +       0.87      TCTCAACCAA^GTGAATGTTT
      5384      1      +       1.00      ?
      5809      ?      +       1.00      TATCCTAAAG^GTGTGTCCAA
      6057      0      +       ?         GCAGTCTTTG^GTAAGCTACT
      ?         1      +       0.74      CTCTTCACAA^GTAAATCTAG
      ?         0      +       ?         GGACTGCCAA^GTAAGTTTAA
      7886      0      +       0.74      GAACAAAATG^GTTAGATGAA
      9325      0      +       0.74      GAAGATTAGG^GTTTTTCTCT

Donor splice sites, complement strand
  (no entries shown)

Acceptor splice sites, direct strand (H = highly confident site)

  pos 5'->3'  phase  strand  confidence  5' intron^exon 3'
      1213      0      +       ?         TATTTTTTAG^TTATGGAGAC
      1221      2      +       0.87      AGTTATGGAG^ACAAGAATCG
      1373      0      +       0.71      TCTCTCACAG^GACACAGAAT
      1487      1      +       0.81      ATATTGATAG^TGGGACATTA
      4254      0      +       1.00      TGTTCTTCAG^ATCGCACCAT  H
      4832      2      +       0.54      AAAATTGCAG^TTCCAGTGGC
      5004      0      +       ?         TTTTTGCCAG^AGATACACAC
      5472      1      +       0.96      AAAATTACAG^CTCTGCTCAA
      6135      0      +       1.00      ATTATTATAG^GTAAGATTAA  H
      6490      ?      +       ?         AAAGTTACAG^TGGTGGAGAA
      6744      0      +       0.59      TGTCAAACAG^TTTCGTAGAG
      7447      0      +       0.96      TTCTGCACAG^ATGCCAGAAA
      7780      2      +       0.76      TCCATTTCAG^ATACAGAACA
      7786      2      +       0.92      TCAGATACAG^AACACATGCA

RNA Splicing and Adaptation
■ Flexibility in splice site recognition in plants in practice: an example of the developmental plasticity of (not only) plants
• Identification of a mutant with a point mutation (transition G->A) exactly at the splice site at the 5' end of the 4th exon
[Figure: genomic sequence around exons 3 and 4 of PDR with restriction sites (AlwNI, BpmI, PflMI, AseI, PsiI, SpeI, BclI, PstI, PvuII, BspMI, HpaI, StuI) and translations:
  PDR exon 3 ORF: ELVKLTGAKTHEAKINIINDVNGIIKPGR
  read-through without splicing after exon 3: RLVVVS. LVLIKVLYLQVC
  read-through without splicing before exon 4: LFFLLLQ
  PDR exon 4 ORF: LTLLLGPPSCGKTTLLKALSGNLENNLK
  pis1 exon 4 ORF: CGKTTLLKALSGNLENNLK]
• Analysis by RT-PCR proved the presence of a fragment shorter than the cDNA should be after the typical splicing event
[Figure: RT-PCR gels for primer pairs PDR UIa/PDR LI and PDR UIb/PDR LIb, with a 100-500 bp size marker]
• Sequencing of this fragment then suggested alternative splicing using the closest possible splice site in exon 4
• The existence of similar defense mechanisms was proven in other organisms as well (e.g. instability of mutant mRNA in which an early stop codon is formed more than 50-55 bp before the typical stop codon in eukaryotes; see recommended literature, Singh and Lykke-Andersen, 2003)
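The cryptic-site observation above (use of the closest possible splice site in exon 4 once the native acceptor is lost) can be illustrated with a small sketch. It assumes a deliberately naive model in which the next AG dinucleotide downstream is used as the new acceptor, ignoring branch point and polypyrimidine tract. The function name is hypothetical, and the start of exon 4 was transcribed from the figure on the slide and checked against the peptide LTLLLGPPSCGK shown there, so it should be treated as illustrative.

```python
# Start of PDR exon 4 (first base completes the last codon of exon 3).
# Assumption: transcribed from the slide figure, not from a database record.
EXON4_START = "GTTAACACTGTTGCTTGGTCCCCCTAGCTGCGGAAAAACAACT"


def cryptic_acceptors(exon_seq):
    """List AG dinucleotides inside an exon that could act as a new acceptor.

    Returns (position, deleted_nt, in_frame) tuples, where position is the
    0-based index of the A, deleted_nt is the number of exon bases lost if
    the exon starts right after that AG, and in_frame says whether the
    deletion is a multiple of three.
    """
    exon_seq = exon_seq.upper()
    sites = []
    pos = exon_seq.find("AG")
    while pos != -1:
        deleted = pos + 2
        sites.append((pos, deleted, deleted % 3 == 0))
        pos = exon_seq.find("AG", pos + 1)
    return sites


for position, deleted, in_frame in cryptic_acceptors(EXON4_START):
    print(position, deleted, deleted // 3, in_frame)
```

Under these assumptions the first AG inside exon 4 removes 27 nucleotides, i.e. nine codons, and keeps the reading frame. That agrees with the shorter RT-PCR fragment and with the pis1 exon 4 ORF in the figure, which lacks the nine residues LTLLLGPPS of the wild-type ORF.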
Identification of Genes Ab Initio
■ Programs for exon prediction
□ 4 types of exons (according to location in the gene): initial, internal, terminal, single
□ Programs predict splice sites and also take into account the structure of the given type of exon
• initial:
□ GENSCAN (http://genes.mit.edu/GENSCAN.html)
□ GeneMark.hmm (http://opal.biology.gatech.edu/GeneMark/)
• internal:
□ MZEF (http://rulai.cshl.org/tools/genefinder/)

GENSCAN
The New GENSCAN Web Server at MIT: identification of complete gene structures in genomic DNA.
This server provides access to the program Genscan for predicting the locations and exon-intron structures of genes in genomic sequences from a variety of organisms. It can accept sequences up to 1 million base pairs (1 Mbp) in length; for larger jobs, a local copy of the program or the GENSCAN email server can be used.
Input fields: Organism; Suboptimal exon cutoff (optional); Sequence name (optional); Print options; DNA sequence uploaded as a file or pasted (one-letter code, upper or lower case, spaces/numbers ignored); email address for mailed results (optional).

GENSCAN output
GENSCANW output for sequence CKI1
GENSCAN 1.0   Date run: 10-Nov-105   Time: 02:24:26
Sequence CKI1 : 9490 bp : 36.53% C+G : Isochore 1 (0-43 C+G%)
Parameter matrix: Arabidopsis.smat

Predicted genes/exons:

  Gn.Ex  Type  S  .Begin  ...End  .Len  Fr  Ph  I/Ac  Do/T  CodRg  P....  Tscr..
   1.00  Prom  +    1497    1536    40                                      3.85
   1.01  Init  +    3708    3764    57   2   0    63    51     37  0.499    4.03
   1.02  Intr  +    3894    4133   240   2   0    -3     7    327  0.713   17.32
   1.03  Intr  +    4255    4914   660   0   0    86    59    296  0.771   22.57
   1.04  Intr  +    5005    5383   379   0   1    70    91    343  0.772   31.41
   1.05  Intr  +    5473    6056   584   2   2    38    99    582  0.722   50.76
   1.06  Intr  +    6136    7368  1233   0   0    58   108    655  0.977   56.86
   1.07  Term  +    7448    7660   213   1   0    43    35    212  0.999   12.65
   1.08  PlyA  +    7910    7915     6                                     -0.45

   2.03  PlyA  -    7976    7971     6                                     -4.83
   2.02  Term  -    8793    8050   744   0   0   107    37    542  0.997   48.46
   2.01  Init  -    9253    8936   318   1   0   105    73    336  0.999   41.18

Suboptimal exons with probability > 0.100

  Exnum  Type  S  .Begin  ...End  .Len  Fr  Ph  B/Ac  Do/T  CodRg  P....  Tscr..
  S.001  Init  +    1867    1905    39   0   0    64    40     57  0.298    3.74
  S.002  Init  +    2374    2442    69   0   0    55    95    -11  0.132    2.40
  S.003  Intr  +    3894    4110   217   2   1    -3   -34    307  0.177   11.55
  S.004  Intr  +    4352    4914   563   0   2    75    59    338  0.187   26.20
  S.005  Intr  +    5005    5379   375   0   0    70     8    335  0.212   22.99
  S.006  Intr  +    5442    6056   615   2   0    95    99    589  0.208   57.32

[Figure: GENSCAN predicted genes in sequence, drawn on a scale of 0 to 8.5 kb. Key: initial exon, internal exon, terminal exon, single-exon gene, optimal exon, suboptimal exon]

Regulation of Translation
• Splicing in untranslated regions: an important regulatory part of genes
■ Translational repression by short ORFs in the 5' UTR
• Identified e.g. in maize (Wang and Wessler, 1998; see recommended literature for additional information)
• In the case of CKI1 there was an attempt to prove this mechanism of regulation using transgenic lines carrying uidA under the control of two versions of the promoter (unconfirmed so far)
[Figure: the two promoter versions; short upstream ORF ATGaaaagagcttttTAG (m k r a f .) followed by the main ORF ATGatggtgaaagttaca... (m m v k v t ...)]

Gene Modelling
■ Programs for gene modelling
□ Those that take other parameters into account as well, e.g. continuity of ORFs
□ GENSCAN (http://genes.mit.edu/GENSCAN.html): very good for prediction of exons in coding regions (tested on the gene PDR9, where GENSCAN identified all 23 (!) exons)
□ GeneMark.hmm (http://opal.biology.gatech.edu/GeneMark/)
□ GlimmerHMM (http://ccb.jhu.edu/software/glimmerhmm/)

GeneMark
GeneMark: a family of gene prediction programs provided by Mark Borodovsky's Bioinformatics Group at the Georgia Institute of Technology, Atlanta, Georgia. Supported by NIH.
What's New (November 2005): predicted gene database for prokaryotes; prokaryotic models for GeneMark and GeneMark.hmm.
• Gene prediction in Bacteria and Archaea: the GeneMark and GeneMark.hmm programs can be used in parallel. If the DNA sequence of interest belongs to a species that is not in the list of available models, use either the Heuristic models option or, if the sequence is longer than 1 Mb, generate models with the self-training program GeneMarkS. Both options generate models that can then be used with GeneMark.hmm and GeneMark in parallel.
• Gene prediction in eukaryotes: the GeneMark and GeneMark.hmm programs can be used in parallel.
• Gene prediction in ESTs and cDNAs
• Gene prediction in viruses, including access to the virus database VIOLIN
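Returning to the translational repression by short ORFs in the 5' UTR described under Regulation of Translation, the search for such upstream ORFs (uORFs) can be sketched as follows. The two sequences are the ones shown on the slide for CKI1; the spacer between them, the function names and the simple definition of a uORF (an ATG upstream of the main start codon with an in-frame stop) are assumptions made for illustration only.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate(cds):
    """Translate a DNA coding sequence codon by codon (no stop handling)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def find_uorfs(mrna, main_start):
    """Return (start, end, peptide) for every ATG upstream of main_start
    that is followed by an in-frame stop codon (0-based, end-exclusive)."""
    mrna = mrna.upper()
    uorfs = []
    for start in range(max(main_start - 2, 0)):
        if mrna[start:start + 3] != "ATG":
            continue
        for i in range(start + 3, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                uorfs.append((start, i + 3, translate(mrna[start:i])))
                break
    return uorfs


UORF = "ATGaaaagagcttttTAG"     # from the slide
MAIN = "ATGatggtgaaagttaca"     # from the slide (start of the main ORF)
SPACER = "GGGTTTATC"            # hypothetical spacer, for illustration only

mrna = UORF + SPACER + MAIN
main_start = len(UORF) + len(SPACER)

print(find_uorfs(mrna, main_start))
print(translate(MAIN))
```

With these inputs the sketch reports a single uORF encoding the peptide MKRAF, matching the m k r a f annotation on the slide, while the main ORF begins MMVKVT.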
Eukaryotic GeneMark.hmm
■ References
□ Borodovsky M. and Lukashin A. (unpublished)
□ Lomsadze A., Ter-Hovhannisyan V., Chernoff Y. and Borodovsky M., "Gene identification in novel eukaryotic genomes by self-training algorithm", Nucleic Acids Research, 2005, Vol. 33, No. 20, 6494-6506
■ Update, October 2005: added pre-built models of eukaryotic GeneMark.hmm ES-3.0 (E - eukaryotic; S - self-training; 3.0 - the version)
■ Input
□ Sequence title (optional)
□ Sequence pasted into the form, or sequence file upload
□ Species: A. thaliana ES-3.0
■ Output options
□ Email address (required for graphical output or sequences longer than 400000 bp)
□ Generate PDF graphics (screen)
□ Generate PostScript graphics (email)
□ Print GeneMark 2.4 predictions in addition to GeneMark.hmm predictions
□ Translate predicted genes into protein
[Screenshot: genomic DNA sequence pasted into the input field of the submission form]

GeneMark
■ Result of last submission: PDF graphical output, GeneMark.hmm listing, GeneMark.hmm protein translations
Eukaryotic GeneMark.hmm version bp 3.9 April 25, 2008
Sequence name: CKI1
Sequence length: 5043 bp
G+C content: 35.75%
Matrices file: /home/genmark/euk_ghm.matrices/athaliana_hmm3.0mod
Thu Oct 1 11:09:24 2009
[Table: predicted genes/exons for CKI1. Columns: Gene #, Exon #, Strand, Exon Type, Exon Range, Exon Length, Start/End Frame. One gene on the + strand with seven exons (one initial, five internal, one terminal) spanning positions 959 to 4921]
[Figure: GeneMark graphical output (Order 5, Window 96, Step 12), coding potential plotted against nucleotide position]
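When reading a GeneMark.hmm exon listing, it helps to remember that exon ranges are given as inclusive, 1-based coordinates, so the listed length equals end - start + 1. The sketch below checks this for the three rows whose range and length are both legible in the extracted listing; the `Exon` type and `check_listing` helper are illustrative names, not part of any GeneMark tool.

```python
from typing import NamedTuple


class Exon(NamedTuple):
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        # Inclusive coordinates: both end points belong to the exon.
        return self.end - self.start + 1


# (exon range, listed length) for the three rows legible in the CKI1 listing.
LISTED = [
    (Exon(1155, 1394), 240),
    (Exon(2734, 3317), 584),
    (Exon(4709, 4921), 213),
]


def check_listing(rows):
    """Return the rows whose listed length disagrees with end - start + 1."""
    return [(exon, listed) for exon, listed in rows if exon.length != listed]


for exon, listed in LISTED:
    print(exon.start, exon.end, exon.length, listed)

print("inconsistent rows:", check_listing(LISTED))
```

The same check is a quick way to spot transcription errors when exon coordinates are copied by hand from a prediction report.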
Genomic Homologies
■ Searching for genes according to homologies with known sequences
□ Comparison with EST databases
   BLASTN (http://www.ncbi.nlm.nih.gov/BLAST/, http://workbench.sdsc.edu/)
□ Comparison with protein databases
   BLASTX (http://www.ncbi.nlm.nih.gov/BLAST/, http://workbench.sdsc.edu/)
   Genewise (http://www.ebi.ac.uk/Wise2/)
   These tools compare protein sequences with genomic DNA (via conceptual translation of the DNA), so the amino-acid sequence must be known
□ Comparison with homologous genome sequences from related species
   VISTA/AVID (http://www.lbl.gov/Tech-Transfer/techs/lbnl1690.html)

Outline
■ Identification of Genes Ab Initio
□ Structure of genes and searching for them
□ Genomic colinearity and genomic homology

Genomic Colinearity
■ Genomes of related species, despite large differences, share similarities in sequence organization, so this information can be used to identify genes in related species when searching databases
■ General scheme of work when applying genomic colinearity (also called "comparative genomics") to the experimental identification of genes in related species:
□ Mapping small genomes using low-copy DNA markers (e.g. RFLP)
□ Using these markers to identify orthologous genes (genes derived from a common ancestor, usually with the same or similar function) in related species
□ A small genome (e.g. rice, 466 Mbp) can be used as a guide: low-copy molecular markers (e.g. RFLP) linked to the gene of interest are identified, and these regions are then used as probes for screening BAC libraries to identify the orthologous regions of large genomes (e.g. barley, 5 Gbp, or wheat, 16 Gbp)

Genomic Colinearity
[Figure: colinear regions compared between maize (2500 Mbp) and rice (400 Mbp), and between hexaploid wheat (16 000 Mbp), barley (5000 Mbp) and rice (400 Mbp); gene-rich regions with high gene density are marked. Feuillet and Keller, 2002]

Genomic Colinearity
■ Mostly applicable to grass species (e.g. using related genes of barley, wheat, rice and maize)
■ Small genome rearrangements (deletions, duplications, inversions, translocations smaller than a few cM) are then detected by detailed comparative sequence analysis
■ During evolution, related species have diverged, mostly in non-coding regions (invasion of retrotransposons etc.)

Genomic Colinearity
■ Genomic colinearity of HOX genes in animals
□ Transcription factors controlling the organisation of the body along the anterior-posterior axis
□ The position of the genes in the genome corresponds with their spatial expression during development
□ Interspecies conservation
[Figure: Hox gene clusters compared across species, with paralogy groups ordered from anterior (PG1-2, PG3) through central (PG4 to PG8) to posterior (PG9-14), together with Gsx, Xlox and Cdx]
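One simple way to make the idea of colinearity concrete is to take the order of shared markers along a small guide genome and along a larger related genome, and to extract the largest set of markers that appears in the same order in both. The sketch below does this with a longest-increasing-subsequence search. It is an illustration rather than the method used in the cited studies: the function name, the marker names `m1` to `m5` and `x9`, and the assumption that every marker occurs only once per map are all hypothetical.

```python
from bisect import bisect_left


def colinear_markers(order_a, order_b):
    """Largest set of shared markers occurring in the same order in both maps.

    Assumes each marker name occurs at most once in each map.
    """
    pos_b = {marker: i for i, marker in enumerate(order_b)}
    shared = [m for m in order_a if m in pos_b]
    ranks = [pos_b[m] for m in shared]

    # Longest increasing subsequence of `ranks` (patience sorting).
    tail_vals = []            # smallest possible tail value for each run length
    tail_idx = []             # index into `ranks` of that tail
    prev = [-1] * len(ranks)  # back-pointers for reconstructing the run
    for i, r in enumerate(ranks):
        k = bisect_left(tail_vals, r)
        if k == len(tail_vals):
            tail_vals.append(r)
            tail_idx.append(i)
        else:
            tail_vals[k] = r
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1

    # Walk the back-pointers from the end of the longest run.
    result = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        result.append(shared[i])
        i = prev[i]
    return result[::-1]


# Hypothetical marker orders: m3 has moved in the larger genome,
# and x9 has no counterpart in the guide genome.
guide_map = ["m1", "m2", "m3", "m4", "m5"]
large_map = ["m3", "m1", "x9", "m2", "m4", "m5"]
print(colinear_markers(guide_map, large_map))
```

Markers left out of the result (here `m3`) are candidates for the small rearrangements mentioned on the slides, which would then be confirmed by detailed sequence comparison.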
Outline
■ Experimental Genes Identification
□ Constructing gene-enriched libraries using methylation filtration technology

Methylation Filtration
■ Preparation of gene-enriched libraries by methylation filtration technology
□ Genes are (mostly!) hypomethylated, whereas non-coding regions are methylated
□ The method uses the bacterial restriction-modification system, which recognizes methylated DNA with the restriction enzymes McrA and McrBC
□ McrBC recognizes a methylated cytosine (in DNA) that follows a purine (G or A)
□ Cleavage requires two such sites 40-2000 bp apart

Methylation Filtration
■ Scheme of work for preparing BAC genomic libraries using methylation filtration:
□ Preparation of genomic DNA free of organellar DNA (chloroplasts and mitochondria)
□ Fragmentation of DNA (1-4 kbp) and ligation of adaptors
□ Preparation of BAC libraries in an mcrBC+ strain of E. coli
□ Selection of positive clones
■ Limited usage: enrichment of coding DNA is only approx. 5-10%

Outline
■ Experimental Genes Identification
□ Constructing gene-enriched libraries using methylation filtration technology
□ EST libraries

EST Libraries
■ Preparation of EST libraries
□ Isolation of mRNA
□ Reverse transcription
□ Ligation of linkers and synthesis of the second cDNA strand
□ Cloning into a suitable bacterial vector
□ Transformation into bacteria and isolation of DNA (amplification of DNA)
□ Sequencing using primers specific for the plasmid used
□ Depositing the sequencing results in a public database

Fundamentals of Genomics II, Gene Identification

Discussion
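Returning to the McrBC rule from the Methylation Filtration slides (a methylated cytosine preceded by a purine, with two such sites 40-2000 bp apart), the following sketch shows how one could predict whether a cloned fragment would be cut in an mcrBC+ host. It is a simplified illustration: only one strand is scanned, the methylated positions are supplied by hand, the distance is measured between the two cytosines, and the function names are assumptions.

```python
def half_sites(seq, methylated):
    """Positions (0-based) of methylated cytosines that follow a purine (A or G)."""
    seq = seq.upper()
    return [
        i for i in sorted(methylated)
        if 0 < i < len(seq) and seq[i] == "C" and seq[i - 1] in "AG"
    ]


def is_cleavable(seq, methylated, min_gap=40, max_gap=2000):
    """True if two recognised half-sites lie min_gap..max_gap bp apart."""
    sites = half_sites(seq, methylated)
    return any(
        min_gap <= b - a <= max_gap
        for n, a in enumerate(sites)
        for b in sites[n + 1:]
    )


# Toy fragment: AC at position 0-1, GC at 50-51, TC at 52-53.
fragment = "AC" + "T" * 48 + "GC" + "TC" + "T" * 8

print(is_cleavable(fragment, {1, 51}))  # two purine-mC sites, 50 bp apart
print(is_cleavable(fragment, {1, 53}))  # the C at 53 follows a pyrimidine
print(is_cleavable(fragment, set()))    # unmethylated fragment
```

In this picture, unmethylated (mostly genic) fragments are not recognised and survive cloning, while methylated fragments are cut and lost, which is the basis of the enrichment described above.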