Protein information resources

Bioinformatics - lectures
- Introduction
- Information networks
- Protein information resources
- Genome information resources
- DNA sequence analysis
- Pairwise sequence alignment
- Multiple sequence alignment
- Secondary database searching
- Analysis packages
- Protein structure modelling

Protein information resources
- biological databases - introduction
- primary protein sequence databases
- composite protein sequence databases
- secondary databases
- composite secondary databases
- protein structure databases
- protein structure classification databases

Biological databases - introduction
- Vast amounts of data are produced; databases must be established to store them.
- Databases must be maintained and disseminated together with the analysis tools.

Classification of databases
- by data model: flat files, relational, object-oriented
- by content: primary, secondary, composite

Flat file example (GenBank entry)

LOCUS       DRODPPC      4001 bp    mRNA            INV       15-MAR-1990
DEFINITION  D.melanogaster [...]
            complex (DPP-C), complete cds
ACCESSION   M30116
NID         g157291
KEYWORDS
SOURCE      D.melanogaster, cDNA to mRNA.
  ORGANISM  Drosophila melanogaster
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Arthropoda;
            Tracheata; Insecta; Pterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 4001)
  AUTHORS   Padgett,R.W., St Johnston,R.D. and Gelbart,W.M.
  TITLE     A transcript from a Drosophila pattern gene predicts a protein
            homologous to the transforming growth factor-beta family
  JOURNAL   Nature 325, 81-84 (1987)
  MEDLINE   87090408
COMMENT     The initiation codon could be at either 1188-1190 or 1587-1589.
FEATURES             Location/Qualifiers
     source          1..4001
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
     mRNA            <1..3918
                     /gene="dpp"
                     /note="decapentaplegic protein mRNA"
                     /db_xref="FlyBase:FBgn0000490"
     gene            1..4001
                     /note="decapentaplegic"
                     /gene="dpp"
                     /allele=""
                     /db_xref="FlyBase:FBgn0000490"
     CDS             1188..2954
                     /gene="dpp"
                     /note="decapentaplegic protein (1188 could be 1587)"
                     /codon_start=1
                     /db_xref="FlyBase:FBgn0000490"
                     /db_xref="PID:g157292"
                     /translation="MRAWLLLLAVLATFQTIVRVASTEDISQRFIAAIAPVAAHIPLA
                     SASGSGSGRSGSRSVGASTSTALAKAFNPFSEPASFSDSDKSHRSKTNKKPSKSDANR
                     [...]
                     LGYDAYYCHGKCPFPLADHFNSTNHAVVQTLVNNMNPGKVPKACCVPTQLDSVAMLYL
                     NDQSTVVLKNYQEMTVVGCGCR"
BASE COUNT     1170 a   1078 c    956 g    797 t
ORIGIN
        1 gtcgttcaac agcgctgatc gagtttaaat ctataccgaa atgagcggcg gaaagtgagc
       61 cacttggcgt gaacccaaag ctttcgagga aaattctcgg acccccatat acaaatatcg
      121 gaaaaagtat cgaacagttt cgcgacgcga agcgttaaga tcgccaaaag atctccgtgc
      181 ggaaacaaag aaattgaggc actattaaga gattgttgtt gtgcgcgagt gtgtgtcttc
      241 agctgggtgt gtggaatgtc aactgacggg ttgtaaaggg aaaccctgaa atccgaacgg
      301 ccagccaaag caaataaagc tgtgaatacg aattaagtac aacaaacagt tactgaaaca
      361 gatacagatt cggattcgaa tagagaaaca gatactggag atgcccccag aaacaattca
      421 attgcaaata tagtgcgttg cgcgagtgcc agtggaaaaa tatgtggatt acctgcgaac
      481 cgtccgccca aggagccgcc gggtgacagg tgtatccccc aggataccaa cccgagccca
      541 gaccgagatc cacatccaga tcccgaccgc agggtgccag tgtgtcatgt gccgcggcat
      601 accgaccgca gccacatcta ccgaccaggt gcgcctcgaa tgcggcaaca caattttcaa
      [...]
     3841 aactgtataa acaaaacgta tgccctataa atatatgaat aactatctac
     3901 gttctaagct aagctcgaat aaatccgtac acgttaatta atctagaatc
     3961 acgcgtaagc tcagcatgtt ggataaatta atagaaacga g
//

Relational database example
[Figure: tables of papers (Paper 1 to Paper 4); the SELECT operation.]
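The three relational operations named in the relational database figure (SELECT, PROJECT, JOIN) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table layout, column names and publication years below are hypothetical, and only the "Paper n" and "Author n-m" labels come from the slide.

```python
import sqlite3

# In-memory database with two toy tables (schema and years are assumptions).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE paper  (paper_id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE author (author_id INTEGER PRIMARY KEY, paper_id INTEGER, name TEXT);
INSERT INTO paper VALUES
    (1, 'Paper 1', 1987), (2, 'Paper 2', 1990),
    (3, 'Paper 3', 1990), (4, 'Paper 4', 1995);
INSERT INTO author (paper_id, name) VALUES
    (1, 'Author 1-1'), (1, 'Author 1-2'),
    (2, 'Author 2-1'), (2, 'Author 2-2'), (2, 'Author 2-3'),
    (3, 'Author 3-1');
""")

# SELECT (restriction): keep only the rows that satisfy a condition.
selected = con.execute(
    "SELECT * FROM paper WHERE year = 1990 ORDER BY paper_id"
).fetchall()

# PROJECT: keep only some of the columns.
projected = con.execute(
    "SELECT title FROM paper ORDER BY paper_id"
).fetchall()

# JOIN: combine rows of two tables that share a key value.
joined = con.execute(
    "SELECT paper.title, author.name "
    "FROM paper JOIN author ON author.paper_id = paper.paper_id "
    "ORDER BY author.author_id"
).fetchall()

print(selected)
print(projected)
print(joined)
```

Paper 4 has no author rows in this toy data, so it is absent from the joined result; an inner join only returns rows with a partner in both tables.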
[Figure: relational database example, continued. Tables of authors (Author 1-1, Author 1-2, Author 2-1, Author 2-2, Author 2-3, Author 3-1); the PROJECT and JOIN operations.]

Object-oriented database example
[Figure: the message Similarity(X) is sent to an object X. The classes sequence, structure, expression and pathway each carry their own Similarity method. Example nucleotide and protein sequences are shown as objects.]

Levels of protein structure and corresponding databases

Level       Abstraction      Example                          Database
Primary     sequence         AVILDRYFH                        primary database
Secondary   motif            [AS]-[TL]2-X[DE]-R-[FYW]2-H      secondary database
Tertiary    domain, module   (structure drawing)              structure database

Primary protein sequence databases
- PIR
- MIPS
- SWISS-PROT
- TrEMBL
- NRL-3D
Store biomolecular sequences and annotations.

PIR - Protein Sequence Database
- developed in the 1960s by Margaret Dayhoff
- maintained by an international consortium
- four sections, PIR1-PIR4:
  PIR1 - fully classified and annotated entries
  PIR2 - preliminary entries
  PIR3 - unverified entries
  PIR4 - conceptual translations of artefactual sequences (non-transcribed, non-translated)

MIPS - Martinsried Institute for Protein Sequences
- collects and processes sequence data for PIR

SWISS-PROT
- institutions: University of Geneva, EBI, Swiss Institute of Bioinformatics
- high-level annotations including description of the function, structure and domains, post-translational modifications, variants, etc.
- annotated manually (high quality)
- automatically annotated supplement: TrEMBL
- minimally redundant
- interlinked with many other sources
- efficient searching of selected fields only
- the most widely used protein sequence database

TrEMBL - Translated EMBL
- computer-annotated supplement of SWISS-PROT
- contains translations of all coding sequences in EMBL
- two sections: SP-TrEMBL (SWISS-PROT TrEMBL) and REM-TrEMBL

NRL-3D
- produced by PIR from sequences extracted from the Brookhaven Protein Databank (PDB)
- annotations in PIR format, including structural information extracted from PDB: secondary structure elements, active sites, experimental method, resolution
- makes the sequence information in PDB searchable by keywords and by similarity

Flat file example (SWISS-PROT entry)

ID   DECA_DROME     STANDARD;      PRT;   588 AA.
AC   P07713;
DT   01-APR-1988 (REL. 07, CREATED)
DT   01-APR-1988 (REL. 07, LAST SEQUENCE UPDATE)
DT   01-FEB-1995 (REL. 31, LAST ANNOTATION UPDATE)
DE   DECAPENTAPLEGIC PROTEIN PRECURSOR (DPP-C PROTEIN).
GN   DPP.
OS   DROSOPHILA MELANOGASTER (FRUIT FLY).
OC   EUKARYOTA; METAZOA; ARTHROPODA; INSECTA; DIPTERA.
RN   [1]
RP   SEQUENCE FROM N.A.
RM   87090408
RA   PADGETT R.W., ST JOHNSTON R.D., GELBART W.M.;
RL   NATURE 325:81-84(1987).
RN   [2]
RP   CHARACTERIZATION, AND SEQUENCE OF 457-476.
RM   90258853
RA   PANGANIBAN G.E.F., RASHKA K.E., NEITZEL M.D., HOFFMANN F.M.;
RL   MOL. CELL. BIOL. 10:2669-2677(1990).
CC   -!- FUNCTION: DPP IS REQUIRED FOR THE PROPER DEVELOPMENT OF THE
CC       EMBRYONIC DORSAL HYPODERM, FOR VIABILITY OF LARVAE AND FOR CELL
CC       VIABILITY OF THE EPITHELIAL CELLS IN THE IMAGINAL DISKS.
CC   -!- SUBUNIT: HOMODIMER, DISULFIDE-LINKED.
CC   -!- SIMILARITY: TO OTHER GROWTH FACTORS OF THE TGF-BETA FAMILY.
DR   EMBL; M30116; DMDPPC.
DR   PIR; A26158; A26158.
DR   HSSP; P08112; 1TFG.
DR   FLYBASE; FBGN0000490; DPP.
DR   PROSITE; PS00250; TGF_BETA.
KW   GROWTH FACTOR; DIFFERENTIATION; SIGNAL.
FT   SIGNAL        1      ?       POTENTIAL.
FT   PROPEP        ?    456
FT   CHAIN       457    588       DECAPENTAPLEGIC PROTEIN.
FT   DISULFID    487    553       BY SIMILARITY.
FT   DISULFID    516    585       BY SIMILARITY.
FT   DISULFID    520    587       BY SIMILARITY.
FT   DISULFID    552    552       INTERCHAIN (BY SIMILARITY).
FT   CARBOHYD    120    120       POTENTIAL.
FT   CARBOHYD    342    342       POTENTIAL.
FT   CARBOHYD    377    377       POTENTIAL.
FT   CARBOHYD    529    529       POTENTIAL.
SQ   SEQUENCE   588 AA;  65850 MW;  1768420 CN;
     MRAWLLLLAV LATFQTIVRV ASTEDISQRF IAAIAPVAAH IPLASASGSG SGRSGSRSVG
     ASTSTALAKA FNPFSEPASF SDSDKSHRSK TNKKPSKSDA NRQFNEVHKP RTDQLENSKN
     KSKQLVNKPN HNKMAVKEQR SHHKKSHHHR SHQPKQASAS TESHQSSSIE SIFVEEPTLV
     LDREVASINV PANAKAIIAE QGPSTYSKEA LIKDKLKPDP STLVEIEKSL LSLFNMKRPP
     KIDRSKIIIP EPMKKLYAEI MGHELDSVNI PKPGLLTKSA NTVRSFTHKD SKIDDRFPHH
     HRFRLHFDVK SIPADEKLKA AELQLTRDAL SQQVVASRSS ANRTRYQVLV YDITRVGVRG
     QREPSYLLLD TKTVRLNSTD TVSLDVQPAV DRWLASPQRN YGLLVEVRTV RSLKPAPHHH
     VRLRRSADEA HERWQHKQPL LFTYTDDGRH KARSIRDVSG GEGGGKGGRN KRHARRPTRR
     KNHDDTCRRH SLYVDFSDVG WDDWIVAPLG YDAYYCHGKC PFPLADHFNS TNHAVVQTLV
     NNMNPGKVPK ACCVPTQLDS VAMLYLNDQS TVVLKNYQEM TVVGCGCR

Composite protein sequence databases
- NRDB
- OWL
- MIPSX
- SWISS-PROT+TrEMBL
Amalgamate a number of primary sources, using a set of clearly defined criteria.
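A minimal sketch of how a composite database might be assembled, assuming one simple criterion: sources are processed in priority order, and an entry is dropped when an identical sequence has already been taken from a higher-priority source. The function name, the identifiers and the short sequences are hypothetical, and real composite databases apply more elaborate rules.

```python
from typing import Iterable, List, Tuple

Entry = Tuple[str, str]  # (identifier, sequence)

def build_composite(
    sources: List[Tuple[str, Iterable[Entry]]]
) -> List[Tuple[str, str, str]]:
    """Merge sources listed in priority order, keeping the first copy
    of every distinct sequence (exact identity, case-insensitive)."""
    seen = set()
    composite = []
    for source_name, entries in sources:
        for identifier, sequence in entries:
            key = sequence.upper()
            if key in seen:
                continue  # identical copy already taken from a higher-priority source
            seen.add(key)
            composite.append((source_name, identifier, sequence))
    return composite

# Toy input: the highest-priority source comes first (all values hypothetical).
sources = [
    ("SWISS-PROT", [("sp1", "MKTAYIAK"), ("sp2", "MRAWLLLL")]),
    ("PIR",        [("pir1", "MRAWLLLL"), ("pir2", "MKVLAAGI")]),
    ("GenPept",    [("gp1", "mktayiak"), ("gp2", "MKTAYIAR")]),
]

composite = build_composite(sources)
for source_name, identifier, sequence in composite:
    print(source_name, identifier, sequence)
```

In this toy run pir1 and gp1 are removed as identical copies, while gp2 is kept although it differs from sp1 by a single residue. A filter based on exact identity therefore leaves near-duplicates in the result.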
Composite protein sequence databases

NRDB - Non-Redundant DataBase
- developed and maintained by NCBI
- composite of: GenPept (CDS translations of GenBank), GenPeptupdate, PDB sequences, SWISS-PROT, SWISS-PROTupdate, PIR
- advantages: comprehensive and up-to-date
- disadvantages: not fully non-redundant (only identical copies are removed), occurrence of multiple entries due to polymorphism, incorrect sequences that were amended in SWISS-PROT are re-introduced by translation of GenBank
- default database of NCBI BLAST (ENTREZ/NCBI)

OWL
- developed and maintained by the University of Leeds
- composite of: SWISS-PROT, PIR1-4, GenPept, NRL-3D
- SWISS-PROT has the highest priority for annotation
- advantages: less redundant, fully indexed (fast)
- disadvantages: not up-to-date (released every 6-8 weeks), incorrect sequences
- available from SEQNET of UK EMBnet

MIPSX
- developed by the Max-Planck Institute in Martinsried
- composite of: PIR1-4, MIPS, NRL-3D, SWISS-PROT, TrEMBL, GenPept, Kabat, PSeqIP
- identical entries and subsequences removed

SWISS-PROT+TrEMBL
- developed and maintained by EBI
- composite of: SWISS-PROT, TrEMBL
- advantages: comprehensive, minimally redundant, fewer errors
- disadvantages: not as up-to-date as NRDB
- available from SRS of EBI

Primary sources of the composite databases

NRDB               OWL           MIPSX         SP+TrEMBL
PDB                SWISS-PROT    PIR1-4        SWISS-PROT
SWISS-PROT         PIR           MIPSOwn       TrEMBL
PIR                GenBank       MIPSTrn
GenPept            NRL-3D        MIPSH
SWISS-PROTupdate                 PIRMOD
GenPeptupdate                    NRL-3D
                                 SWISS-PROT
                                 EMTrans
                                 GBTrans
                                 Kabat
                                 PseqIP

Secondary databases
Contain information derived from primary sequence data, typically in the form of abstractions: regular expressions, fingerprints, blocks, profiles or Hidden Markov Models. These abstractions represent distillations of the most conserved features of multiple alignments. The abstractions are useful for discriminating family membership of newly determined sequences.
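A minimal sketch of one such abstraction: deriving a PROSITE-style regular expression from an ungapped aligned motif, then using it to test a new sequence. The five 8-residue sequences are read from the motif figure in this lecture. The function names, the limit of two alternative residues per column and the two query sequences are assumptions made for illustration. Only a subset of the PROSITE pattern syntax is handled.

```python
import re
from typing import List

def derive_pattern(aligned: List[str], max_alternatives: int = 2) -> str:
    """Build a PROSITE-style pattern column by column from an ungapped
    aligned motif: conserved column -> residue, few alternatives -> [..],
    anything more variable -> x."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    elements = []
    for column in zip(*(s.upper() for s in aligned)):
        residues = sorted(set(column))
        if len(residues) == 1:
            elements.append(residues[0])
        elif len(residues) <= max_alternatives:
            elements.append("[" + "".join(residues) + "]")
        else:
            elements.append("x")
    # Collapse runs of x into x(n); the trailing None flushes the last run.
    collapsed, run = [], 0
    for element in elements + [None]:
        if element == "x":
            run += 1
            continue
        if run:
            collapsed.append("x" if run == 1 else f"x({run})")
            run = 0
        if element is not None:
            collapsed.append(element)
    return "-".join(collapsed)

def pattern_to_regex(pattern: str) -> str:
    """Translate residues, [ABC], {ABC}, x, x(n) and x(n,m) to Python regex."""
    parts = []
    for element in pattern.split("-"):
        wildcard = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", element)
        if wildcard:
            low, high = wildcard.groups()
            if low is None:
                parts.append(".")
            elif high is None:
                parts.append(".{%s}" % low)
            else:
                parts.append(".{%s,%s}" % (low, high))
        elif element.startswith("{") and element.endswith("}"):
            parts.append("[^" + element[1:-1] + "]")  # excluded residues
        else:
            parts.append(element)  # single residue or [ABC] group
    return "".join(parts)

motif = ["cydeggis", "cyedggis", "cyeeggit", "cyhgdggs", "cyrgdgnt"]
pattern = derive_pattern(motif)
regex = pattern_to_regex(pattern)
print(pattern)  # C-Y-x(2)-[DG]-G-x-[ST]
print(regex)    # CY.{2}[DG]G.[ST]

# Hypothetical query sequences: the first contains the motif, the second does not.
print(bool(re.search(regex, "MKACYHRDGLTAA")))  # True
print(bool(re.search(regex, "MKACYHRAGLTAA")))  # False
```

An exact expression of this kind accepts or rejects a sequence outright: the second query differs from the first at a single motif position and is missed entirely.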
Terms used in sequence analysis methods
[Figure: a multiple alignment annotated with the terms motif, fingerprint, insertions, frequency matrix, weight matrix (block) and regular expression.]

Aligned motif from the figure:
    cydeggis
    cyedggis
    cyeeggit
    cyhgdggs
    cyrgdgnt
Regular expression derived from it: C-Y-x(2)-[DG]-G-x-[ST]

Three principal methods for building secondary databases
- Single motif methods
  fuzzy regular expressions (IDENTIFY)
  exact regular expressions (PROSITE)
- Full domain alignment methods
  profiles (PROFILE LIBRARY)
  Hidden Markov Models (PFAM)
- Multiple motif methods
  identity matrices (PRINTS)
  weight matrices (BLOCKS)

Examples of sequence motifs (the structure drawings of the original table are not reproduced)

Name:      Helix-loop-helix (Myc type)
Sequence:  [DENSTAP]-K-[LIVMWAGN]-{FYWCPHKR}-[LIVT]-[LIV]-x(2)-[STAV]-[LIVMSTAC]-x-[VMFYH]-[LIVMTA]-{P}-{P}-[LIVMSR]
Function:  DNA binding
Example:   3CRO

Name:      Cys-His zinc finger
Sequence:  C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
Function:  DNA binding
Example:   2DRP

Name:      Leucine zipper
Sequence:  L-x(6)-L-x(6)-L-x(6)-L
Function:  DNA binding
Example:   1YSA

Secondary databases
- PROSITE
- PRINTS
- BLOCKS
- PROFILES
- PFAM
- IDENTIFY

PROSITE
- historically the first secondary database
- maintained by the Swiss Institute of Bioinformatics
- motivation: identification of protein families
- abstraction: regular expressions (patterns)
- construction: automatic multiple alignment and manual extraction of conserved regions
- ideally, patterns should identify only true positives (no false positives)
- entries are deposited as two distinct files: a pattern file and a documentation file
- primary source: SWISS-PROT

PROSITE pattern file entry

ID   OPSIN; PATTERN.
AC   PS00238;
DT   APR-1990 (CREATED); NOV-1997 (DATA UPDATE); NOV-1997 (INFO UPDATE).
DE   Visual pigments (opsins) retinal binding site.
PA   [LIVMW]-[PGC]-x(3)-[SAC]-K-[STALIM]-[GSACNV]-[STACP]-x(2)-[DENF]-[AP]-
PA   x(2)-[IY].
NR   /RELEASE=32,49340;
NR   /TOTAL=53(53); /POSITIVE=53(53); /UNKNOWN=0(0); /FALSE_POS=0(0);
NR   /FALSE_NEG=0; /PARTIAL=1;
CC   /TAXO-RANGE=??E??; /MAX-REPEAT=1;
CC   /SITE=5,retinal;
DR   P06002, OPS1_DROME, T; P28678, OPS1_DROPS, T; P22269, OPS1_CALVI, T;
DR   P08099, OPS2_DROME, T; P28679, OPS2_DROPS, T; P04950, OPS3_DROME, T;
DR   P28680, OPS3_DROPS, T; P08255, OPS4_DROME, T; P29404, OPS4_DROPS, T;
DR   P17646, OPS4_DROVI, T; P35362, OPSD_SPHSP, T; P41591, OPSD_ANOCA, T;
DR   P41590, OPSD_ASTFA, T; P02699, OPSD_BOVIN, T; P32308, OPSD_CANFA, T;
DR   P32309, OPSD_CARAU, T; P22328, OPSD_CHICK, T; P28681, OPSD_CRIGR, T;
DR   P08100, OPSD_HUMAN, T; P15409, OPSD_MOUSE, T; P35403, OPSD_POMMI, T;
DR   P02700, OPSD_SHEEP, T; P29403, OPSD_XENLA, T; P22671, OPSD_LAMJA, T;
DR   P31355, OPSD_RANPI, T; P24603, OPSD_LOLFO, T; P09241, OPSD_OCTDO, T;
DR   P35356, OPSD_PROCL, T; P31356, OPSD_TODPA, T; P35360, OPS1_LIMPO, T;
DR   P35361, OPS2_LIMPO, T; P32310, OPSB_CARAU, T; P28682, OPSB_CHICK, T;
DR   P35357, OPSB_GECGE, T; P03999, OPSB_HUMAN, T; P28684, OPSV_CHICK, T;
DR   P22330, OPSG_ASTFA, T; P22331, OPSH_ASTFA, T; P32311, OPSG_CARAU, T;
DR   P32312, OPSH_CARAU, T; P23683, OPSG_CHICK, T; P35358, OPSG_GECGE, T;
DR   P04001, OPSG_HUMAN, T; P41592, OPSR_ANOCA, T; P22332, OPSR_ASTFA, T;
DR   P32313, OPSR_CARAU, T; P22329, OPSR_CHICK, T; P04000, OPSR_HUMAN, T;
DR   P34989, OPSL_CALJA, T; P35359, OPSU_BRARE, T; P23820, REIS_TODPA, T;
DR   P47803, RGR_BOVIN , T; P47804, RGR_HUMAN , T; P17645, OPS3_DROVI, P;
DO   PDOC00211;

PROSITE documentation file entry

{PDOC00211}
{PS00238; OPSIN}
{BEGIN}
* Visual pigments (opsins) retinal binding site *

Visual pigments [1,2] are the light-absorbing molecules that mediate vision. They consist of an apoprotein, opsin, covalently linked to the chromophore cis-retinal. Vision is effected through the absorption of a photon by cis-retinal which is isomerized to trans-retinal. This isomerization leads to a change of conformation of the protein. Opsins are integral membrane proteins with seven transmembrane regions that belong to family 1 of G-protein coupled receptors (see ).

In vertebrates four different pigments are generally found. Rod cells, which mediate vision in dim light, contain the pigment rhodopsin. Cone cells, which function in bright light, are responsible for color vision and contain three or more color pigments (for example, in mammals: red, blue and green).

In Drosophila, the eye is composed of 800 facets or ommatidia. Each ommatidium contains eight photoreceptor cells (R1-R8): the R1 to R6 cells are outer cells, R7 and R8 inner cells. Each of the three types of cells (R1-R6, R7 and R8) expresses a specific opsin.

Proteins evolutionary related to opsins include squid retinochrome, also known as retinal photoisomerase, which converts various isomers of retinal into 11-cis retinal and mammalian retinal pigment epithelium (RPE) RGR [3], a protein that may also act in retinal isomerization.

The attachment site for retinal in the above proteins is a conserved lysine residue in the middle of the seventh transmembrane helix. The pattern we developed includes this residue.

-Consensus pattern: [LIVMW]-[PGC]-x(3)-[SAC]-K-[STALIM]-[GSACNV]-[STACP]-x(2)-[DENF]-[AP]-x(2)-[IY]
                    [K is the retinal binding site]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in SWISS-PROT: NONE.
-Last update: November 1997 / Pattern and text revised.

[ 1] Applebury M.L., Hargrave P.A.
     Vision Res. 26:1881-1895(1986).
[ 2] Fryxell K.J., Meyerowitz E.M.
     J. Mol. Evol. 33:367-378(1991).
[ 3] Shen D., Jiang M., Hao W., Tao L., Salazar M., Fong H.K.W.
     Biochemistry 33:13117-13125(1994).
{END}

PRINTS
- developed at University College London
- motivation: identification of protein families by more than one pattern
- abstraction: fingerprints (aligned motifs); fingerprints store the original sequence information
- construction: the sequence information in seed motifs is augmented through iterative database scanning
- construction of fingerprints is done manually
- primary source (original): OWL
- primary source (new): SWISS-PROT and SP-TrEMBL

PRINTS entry

OPSIN
OPSIN SIGNATURE
Type of fingerprint: COMPOUND with 3 elements
Links:
    PRINTS; PR00237 GPCRRHODOPSN; PR00247 GPCRCAMP; PR00248 GPCRMGR
    PRINTS; PR00249 GPCRSECRETIN; PR00250 GPCRSTE2; PR00251 BACTRLOPSIN
    PROSITE; PS00238 OPSIN; PS00237 G_PROTEIN_RECEPTOR
    BLOCKS; BL00238
    SBASE; OPSD_HUMAN
    GCRDB; GCR_0085
Creation date 20-DEC-1993; UPDATE 2-JUL-1996

1. APPLEBURY, M.L. and HARGRAVE, P.A.
   Molecular biology of the visual pigments.
   VISION RES. 26(12) 1881-1895 (1986).

SUMMARY INFORMATION
    73 codes involving 3 elements
     1 codes involving 2 elements

COMPOSITE FINGERPRINT INDEX
    3|  73  73  73
    2|   0   1   1
    -+------------
     |   1   2   3

INITIAL MOTIF SETS

OPSIN1  Length of motif = 13  Motif number = 1
Opsin motif I - 1
                 PCODE        ST    INT
YVTVQHKKLRTPL    OPSD_BOVIN   60    60
YVTVQHKKLRTPL    OPSD_HUMAN   60    60
YVTVQHKKLRTPL    OPSD_SHEEP   60    60
AATMKFKKLRHPL    OPSG_HUMAN   76    76
AATMKFKKLRHPL    OPSR_HUMAN   76    76
YIFATTKSLRTPA    OPS1_DROME   73    73
VATLRYKKLRQPL    OPSB_HUMAN   57    57
YIFGGTKSLRTPA    OPS2_DROME   80    80
WVFSAAKSLRTPS    OPS3_DROME   81    81
WIFSTSKSLRTPS    OPS4_DROME   77    77
YLFSKTKSLQTPA    OPSD_OCTDO   58    58
YLFTKTKSLQTPA    OPSD_LOLFO   57    57

OPSIN2  Length of motif = 13  Motif number = 2
Opsin motif II - 1
                 PCODE        ST    INT
GWSRYIPEGMQCS    OPSD_BOVIN   174   101
GWSRYIPEGLQCS    OPSD_HUMAN   174   101
GWSRYIPQGMQCS    OPSD_SHEEP   174   101
GWSRYWPHGLKTS    OPSG_HUMAN   190   101
GWSRYWPHGLKTS    OPSR_HUMAN   190   101
GWSRYVPEGNLTS    OPS1_DROME   187   101
GWSRFIPEGLQCS    OPSB_HUMAN   171   101
GWSAYVPEGNLTA    OPS2_DROME   194   101
TWGRFVPEGYLTS    OPS3_DROME   194   100
FWDRFVPEGYLTS    OPS4_DROME   190   100
NWGAYVPEGILTS    OPSD_OCTDO   174   103
GWGAYTLEGVLCN    OPSD_LOLFO   173   103

Secondary databases
- BLOCKS (abstraction: blocks)
- PROFILES (abstraction: profiles)
- PFAM (abstraction: Hidden Markov Models)

IDENTIFY
- developed at Stanford University
- abstraction: motifs encoded by a fuzzy approach (alternative residues are tolerated in motifs)
- construction: automatically derived using the program eMOTIF
- primary sources: PRINTS and BLOCKS

Properties of amino acids used in eMOTIF

Residue property      Residue groups
Small                 Ala, Gly
Small hydroxyl        Ser, Thr
Basic                 Lys, Arg
Aromatic              Phe, Tyr, Trp
Basic                 His, Lys, Arg
Small hydrophobic     Val, Leu, Ile
Medium hydrophobic    Val, Leu, Ile, Met
Acidic/amide          Asp, Glu, Asn, Gln
Small/polar           Ala, Gly, Ser, Thr, Pro

Summary of secondary databases

Secondary database    Primary source    Stored information
PROSITE               SWISS-PROT        Regular expressions (patterns)
Profiles              SWISS-PROT        Weighted matrices (profiles)
PRINTS                OWL*              Aligned motifs (fingerprints)
Pfam                  SWISS-PROT        Hidden Markov Models (HMMs)
BLOCKS                PROSITE/PRINTS    Aligned motifs (blocks)
IDENTIFY              BLOCKS/PRINTS     Fuzzy regular expressions (patterns)

Composite secondary databases

INTERPRO - Integrated resource of Protein Families, Domains and Sites
- developed by EBI, SIB, University of Manchester, Sanger Centre, GENE-IT, CNRS/INRA, LION Bioscience AG and University of Bergen (European Research Project)
- provides an integrated view of the commonly used secondary databases: PROSITE, PRINTS, SMART, Pfam and ProDom
- accessible by ftp, www and via the member databases

[Figure: InterPro data flow. Labels: ProDom (ftp, INRA); PRINTS (ftp, U Man); Pfam (ftp, Sanger); PROSITE (ftp, SIB); sptr; imp; exp; flattener; flatfile dump (ftp); SRS index; www search; backup; user; author; admin; develop.]

Protein structure databases
- PDB
- PDBsum

Protein structure classification databases
- SCOP
- CATH