GENSCAN 1.0    Date run: 31-Aug-99    Time: 07:40:46

Sequence addX : 100000 bp : 43.74% C+G : Isochore 2 (43.00 - 51.00 C+G%)

Parameter matrix: HumanIso.smat

Predicted genes/exons:

Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------

 1.01 Intr +   1470   1515   46  1  1   87   97    27 0.584   0.97
 1.02 Intr +   4316   4370   55  2  1  103   89    47 0.802   5.28
 1.03 Intr +   6717   7634  918  2  0   68   76   530 0.821  41.25
 1.04 Intr +  10839  10927   89  2  2   68   51   116 0.974   4.67
 1.05 Intr +  11553  11793  241  0  1   38   98   160 0.952   9.55
 1.06 Intr +  13015  13045   31  0  1  113   76    17 0.599   0.70
 1.07 Intr +  14764  14878  115  2  1   84   94    -5 0.586  -0.79
 1.08 Intr +  15797  15860   64  2  1   70   76    59 0.778   1.62
 1.09 Intr +  15975  16111  137  2  2   84  108    72 0.980   8.17
 1.10 Intr +  16365  16436   72  0  0   50   97    74 0.890   3.02
 1.11 Intr +  17554  17647   94  1  1   54  115    73 0.967   6.57
 1.12 Intr +  20434  20617  184  0  1   61  100   176 0.936  15.46
 1.13 Term +  22771  22898  128  2  2   72   43    97 0.940   2.04
 1.14 PlyA +  27468  27473    6                               1.05

 2.03 PlyA -  28182  28177    6                               1.05
 2.02 Term -  29547  29386  162  0  0  117   43    48 0.752   1.14
 2.01 Init -  30262  30206   57  1  0   74  110    94 0.926   9.55
 2.00 Prom -  35263  35224   40                              -2.86

 3.03 PlyA -  36022  36017    6                               1.05
 3.02 Term -  36469  36375   95  2  2   44   53    28 0.473  -7.11
 3.01 Init -  37075  36922  154  1  1  101   77   207 0.498  21.04
 3.00 Prom -  38598  38559   40                              -8.16

 4.00 Prom +  38914  38953   40                              -7.66
 4.01 Init +  39688  39820  133  0  1   51   91   125 0.489   9.40
 4.02 Term +  41401  42368  968  2  2   68   42   391 0.553  24.72
 4.03 PlyA +  42752  42757    6                               1.05

 5.00 Prom +  43014  43053   40                              -4.96
 5.01 Init +  72592  72643   52  0  1   73   59    58 0.434   2.73
 5.02 Intr +  73713  74096  384  1  0   42   45   182 0.284   4.12
 5.03 Intr +  76043  76064   22  0  1  115  105    -3 0.698   0.70
 5.04 Intr +  78143  78820  678  2  0  -17   81   534 0.574  32.93
 5.05 Intr +  79538  80107  570  2  0   88   96   183 0.907  10.98
 5.06 Intr +  80193  80334  142  0  1   84   87    30 0.993   2.86
 5.07 Intr +  80435  80707  273  1  0   51   98   294 0.943  24.43
 5.08 Intr +  80829  81088  260  2  2   59  -73   566 0.461  33.86
 5.09 Intr +  82315  82369   55  1  1   99  100    -9 0.430   0.38
 5.10 Term +  83509  83745  237  0  0   75   38   149 0.413   4.77
 5.11 PlyA +  86683  86688    6                               1.05

 6.02 PlyA -  86829  86824    6                               1.05
 6.01 Sngl -  90335  88686 1650  2  0  119   49  1935 0.914 185.97

Predicted peptide sequence(s):

>addX|GENSCAN_predicted_peptide_1|724_aa
XRYNPSKTSNGHQSKSMLKDDLKLSSSEDSDGEQPEPPPTNKWQLDNWLNKVNPHKVSPA
SSVDSNIPSSQGYKKEGREQGTGNSYTDTSGPKETSSATPGRDSKTIQKGSESGRGRQKS
PAQSDSTTQRRTVGKKQPKKAEKAAAEEPRGGLKIESETPVDLASSMPSSRHKAATKGSR
KPNIKKESKSSPRPTAEKKKYKSTSKSSQKSREIIETDTSSSDSDESESLPPSSQTPKYP
ESNRTPVKPSSVEEEDSFFRQRMFSPMEEKELLSPLSEPDDRYPLIVKIDLNLLTRIPGK
PYKETEPPKGEKKNVPEKHTREAQKQASEKVSNKGKRKHKNEDDNRASESKKPKTEDKNS
AGHKPSSNRESSKQSAAKEKDLLPSPAGPVPSKDPKTEHGSRKRTISQSSSLKSSSNSNK
ETSGSSKNSSSTSKQKKTEGKTSSSSKEVKVKITVYASITLIYFFGQEKAPSSSSNCPPS
APTLDSSKPRRTKLVFDDRNYSADHYLQEAKKLKHNADALSDRFEKAVYYLDAVVSFIEC
GNALEKNAQESKSPFPMYSETVDLIKYTMKLKNYLAPDATAADKRLTVLCLRCESLLYLR
LFKLKKENALKYSKTLTEHLKLSPGNSGNYSSGASSASASGSSVTIPQKIHQMAASYVQV
TSNFLYATEIWDQAEQLSKEQKEFFAELDKVMGPLIFNASIMTDLVRYTRQGLHWLRQDA
KLIS

>addX|GENSCAN_predicted_peptide_2|72_aa
MWHLKLCAVLMIFLLLLGQKKTLFLKCGPGMMYIPGKERTAVTSDNAPFYGILINCILAG
DTQVKQSCIFNI

>addX|GENSCAN_predicted_peptide_3|82_aa
MGREFGNLTRMRHVISYSLSPFEQRAYPHVFTKGIPNVLRRIRESFFRVVPQFVVFYLIY
TWGTEEFERSKRKNPAAYENDK

>addX|GENSCAN_predicted_peptide_4|366_aa
MKKLYKTYATKEGIPKSNRSHLYNTVRLFTPCTRHKQAPGDQVTGILPSVELLFNLDRIT
TVEHLLKSVLLYNINNSVSFSSAVKCVCNLMIKEPKSSSRTLGRAPYSFTFNSQFEFGKK
HKWIQIDVTSLLQPLVASNKRSIHMSINFTCMKDQLEHPSAQNGLFNMTLVSPSLILYLN
DTSAQAYHSWYSLHYKRRPSQGPDQERSLSAYPVGEEAAEDGRSSHHRHRRGQETVSSEL
KKPLGPASFNLSEYFRQFLLPQNECELHDFRLSFSQLKWDNWIVAPHRYNPRYCKGDCPR
AVGHRYGSPVHTMVQNIIYEKLDSSVPRPSCVPAKYSPLSVLTIEPDGSIAYKEYEDMIA
TKCTCR

>addX|GENSCAN_predicted_peptide_5|890_aa
MEILEGEGLERQCPALVASGEAGSFPPGLALSPARDPSPTVNPTGKAQNCAGLRPSAVEP
LRVSLPPPAAAGACASLPPVPRLPEIEPQVTGSVHARGRLSPGVASFLHFLCMARSSGLQ
RLCIWGAATDEPVNQSHTRSVDAAWGPEGIDFWQATPLLYALAAEAEAAAQAAEPPSPPA
SRAAYRQRLQGAQRRVLRETSFQRKELRMSLPARLRPTVPARPPATHPRSASLSHPGGEG
EPARSRAPAPGTAGRGPLANQQRKWCFSEPGKLDRVGRGGGPARECLGEACSSSGLPGPE
PLEFQHPALAKFEDHEVGWLPETQPQGSMNLDSGSLKLGDAFRPASRSRSASGEVLGSWG
GSGGTIPIVQVWKSGDAGCVHASDQPYGTGLGQRTGQVTVPTEYPLHECPGTAGADDCWQ
GVNGSVGISRPTSHTPTGTANDNIPTIDPTGLTTNPPTAAESDLLKPVPADALGLSGNDT
PGPSHNTALARGTGQPGSRPTWPSQCLEELVQELARLDPSLCDPLASQPSPEPPLGLLDG
LIPLAEVRAAMRPACGEAGEEAASTFEPGSYQFSFTQLLPAPREETRLENPATHPVLDQP
CGQGLPAPNNSIQGKKVELAARLQKMLQDLHTEQERLQGEAQAWARRQAALEAAVRQACA
PQELERFSRFMADLERVLGLLLLLGSRLARVRRALARAASDSDPDEQASLLQRLRLLQRQ
EEDAKELKEHVARRERAVREVLVRALPVEELRVYCALLAGKAAVLAQQRNLDERIRLLQD
QLDAIRDDLGHHAPDTTGSARPEAGSRGWELKFSPFHIVISPEGKNELTTFVPLSAPQTS
RLCGDQKPGELTLQDTASRGRCPNGESWYLRFAIRSFSSLPGPLGQPQGL

>addX|GENSCAN_predicted_peptide_6|549_aa
MALAAAAAAAAAGVSQAAVLGFLQEHGGKVRNSELLSRFKPLLDXXXXXXXXXXRDRFKQ
FVNNVAVVKELDGVKFVVLRKKPRPPEPEPAPFGPPGAAAQPSKPTSTVLPRSASAPGAP
PLVRVPRPVEPPGDLGLPTEPQDTPGGPASEPAQPPGERSADPPLPALELAQATERPSAD
AAPPPRAPSEAASPCSDPPDAEPGPGAAKGPPQQKPCMLPVRCVPAPATLRLRAEEPGLR
RQLSEEPSPRSSPLLLRRLSVEESGLGLGLGPGRSPHLRRLSRAGPRLLSPDAEELPAAP
PPSAVPLEPSEHEWLVRTAGGRWTHQLHGLLLRDRGLAAKRDFMSGFTALHWAAKSGDGE
MALQLVEVARRSGAPVDVNARSHGGYTPLHLAALHGHEDAAVLLVVRLGAQVHVRDHSGR
RAYQYLRPGSSYALRRLLGDPGLRGTTEPDATGGGSGSLAARRPVQVAATILSSTTSAFL
GVLADDLMLQDLARGLKKSSSFSKFLSASPMAPRKKTKIRGGLPAFSEISRRPTPGPLAG
LVPSFPPTT

Column  Description
------  -------------------------------------------------------------
Gn.Ex   gene number, exon number (for reference)
Type    Init = Initial exon
        Intr = Internal exon
        Term = Terminal exon
        Sngl = Single-exon gene
        Prom = Promoter
        PlyA = poly-A signal
S       DNA strand (+ = input strand; - = opposite strand)
Begin   beginning of exon or signal (numbered on input strand)
End     end point of exon or signal (numbered on input strand)
Len     length of exon or signal (bp)
Fr      "absolute reading frame" relative to start of sequence.
        For example, if nucleotides 1,2,3 of the sequence are read
        as a codon, that's called reading frame 0. If 2,3,4 are read
        as a codon, that's reading frame 1. If 3,4,5 are read as a
        codon, that's reading frame 2, and so on. This information,
        together with the starting and ending positions of the exon,
        is sufficient to give the amino acid sequence encoded by the
        exon. Another use of the reading frame is that if you see
        two adjacent predicted exons separated by a relatively short
        intron which share the same reading frame, it may be worth
        looking at the possibility that the intervening intron is
        not correct, i.e. that the two exons plus the intervening
        intron might form one long exon (assuming there are no
        inframe stops in the intron, of course).
Ph      "net phase" of exon (exon length modulo 3)
        For example, an exon of length 15 bp has net phase 0 since
        15 is divisible by 3, an exon of length 16 bp has net phase
        1 because 16 divided by 3 leaves a remainder of 1, an exon
        of length 17 bp has net phase 2, and an exon of length 18 bp
        has net phase 0 again. The point of this is that exons whose
        net phase is 0 can be omitted from the gene without
        disrupting the reading frame: such exons are candidates for
        being either 1) incorrect, or 2) alternatively spliced.
I/Ac    initiation signal or acceptor splice site score (x 10)
        (If below zero, probably not a real acceptor site.)
Do/T    donor splice site or termination signal score (x 10)
        (If below zero, probably not a real donor site.)
CodRg   coding region score (x 10)
        Low coding region scores may indicate potentially incorrect
        predictions or genes with unusual amino acid and/or codon
        usage patterns.
P       probability of exon (sum over all parses containing exon)
        This quantity is close to the actual probability that the
        predicted exon is correct.
Tscr    exon score (depends on length, I/Ac, Do/T and CodRg scores)
        An overall measure of exon quality based on local sequence
        properties
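
The Len, Ph, I/Ac and Do/T columns described above lend themselves to a quick automated check. The following is a minimal sketch, not part of GENSCAN itself: the names `Exon`, `parse_exon_row` and `review_flags` are hypothetical, and the code assumes each exon row is whitespace-separated with exactly the 13 columns listed in the table header. Promoter and poly-A rows, which carry only a Tscr value, are skipped.

```python
from typing import List, NamedTuple, Optional

EXON_TYPES = {"Init", "Intr", "Term", "Sngl"}


class Exon(NamedTuple):
    """One exon row of the 'Predicted genes/exons' table (hypothetical container)."""
    gn_ex: str
    kind: str      # the Type column
    strand: str    # the S column
    begin: int
    end: int
    length: int
    frame: int     # Fr
    phase: int     # Ph
    i_ac: int      # I/Ac
    do_t: int      # Do/T
    cod_rg: int    # CodRg
    prob: float    # P
    tscr: float    # Tscr


def parse_exon_row(line: str) -> Optional[Exon]:
    """Return an Exon for a 13-column exon row, or None for any other line."""
    f = line.split()
    if len(f) != 13 or f[1] not in EXON_TYPES:
        return None  # header, separator, Prom/PlyA row, or blank line
    return Exon(f[0], f[1], f[2], int(f[3]), int(f[4]), int(f[5]),
                int(f[6]), int(f[7]), int(f[8]), int(f[9]), int(f[10]),
                float(f[11]), float(f[12]))


def review_flags(exon: Exon) -> List[str]:
    """Collect the review hints suggested by the column descriptions."""
    flags = []
    # Begin/End are numbered on the input strand, so Begin > End on '-'.
    if abs(exon.end - exon.begin) + 1 != exon.length:
        flags.append("Len does not match Begin/End")
    # Net phase is defined as exon length modulo 3.
    if exon.length % 3 != exon.phase:
        flags.append("Ph does not equal Len modulo 3")
    # A phase-0 internal exon can be dropped without shifting the frame.
    if exon.kind == "Intr" and exon.phase == 0:
        flags.append("net phase 0: possibly incorrect or alternatively spliced")
    if exon.i_ac < 0:
        flags.append("I/Ac score below zero")
    if exon.do_t < 0:
        flags.append("Do/T score below zero")
    return flags


if __name__ == "__main__":
    row = " 5.04 Intr +  78143  78820  678  2  0  -17   81   534 0.574  32.93"
    exon = parse_exon_row(row)
    if exon is not None:
        for flag in review_flags(exon):
            print(exon.gn_ex, flag)
```

The flags are only prompts for manual inspection; the legend itself describes phase-0 exons and negative signal scores as candidates for review, not as confirmed errors.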