Protein structure modelling

Bioinformatics - lectures
- Introduction
- Information networks
- Protein information resources
- Genome information resources
- DNA sequence analysis
- Pairwise sequence alignment
- Multiple sequence alignment
- Secondary database searching
- Analysis packages
- Protein structure modelling

Protein structure modelling
- protein structure
- protein structure databases
- prediction of secondary structure
- prediction of protein fold
- prediction of tertiary structure
- modelling of protein-ligand complexes

Protein structure
Proteins are built up from amino acids that are linked by peptide bonds. Twenty different amino acids occur naturally in proteins. Protein structure can be determined experimentally by X-ray crystallography, nuclear magnetic resonance (NMR) or electron crystallography.
Levels of protein structure:
- primary structure
- secondary structure
- supersecondary structure
- tertiary structure
- quaternary structure

[Figure: amino acid showing the side chain, amino group and carboxyl group; peptide bond between residues n and n+1]

Amino acid       3-letter  1-letter
Glycine          Gly       G
Alanine          Ala       A
Valine           Val       V
Leucine          Leu       L
Isoleucine       Ile       I
Serine           Ser       S
Threonine        Thr       T
Cysteine         Cys       C
Methionine       Met       M
Proline          Pro       P
Aspartic acid    Asp       D
Asparagine       Asn       N
Glutamic acid    Glu       E
Glutamine        Gln       Q
Lysine           Lys       K
Arginine         Arg       R
Histidine        His       H
Phenylalanine    Phe       F
Tyrosine         Tyr       Y
Tryptophan       Trp       W

Primary structure: the linear sequence of amino acids in a protein molecule
Secondary structure: regions of local regularity within a protein fold (e.g., α-helices, β-turns, β-strands)
Super-secondary structure: the arrangement of α-helices and/or β-strands into discrete folding units (e.g., β-barrels, βαβ-units, Greek keys, etc.)
Tertiary structure: the overall fold of a protein sequence, formed by the packing of its secondary and/or super-secondary structure elements
Quaternary structure: the arrangement of separate protein chains in a protein molecule with more than one subunit
Quinternary structure: the arrangement of separate molecules, such as in protein-protein or protein-nucleic acid interactions

[Figure: primary, secondary, tertiary and quaternary structure]

Synchrotron radiation facility
[Figure: European Synchrotron Radiation Facility at Grenoble, France]

Protein structure databases
- PDB
- PDBsum
Protein structure classification databases
- SCOP
- CATH

Protein structure databases
PDB - Protein Data Bank
- developed at Brookhaven National Laboratory
- currently maintained by the Research Collaboratory for Structural Bioinformatics (RCSB)
- world repository of three-dimensional protein structures
- entries from crystallographic analysis (80%), nuclear magnetic resonance (16%) and modelling (2%)
- entries stored as flat files composed of a section of information records and a section of co-ordinates
- entries identified by a unique PDB-ID code (e.g., 1EDE)
- searchable by keywords
- interactive visualization of structures

Protein Data Bank: Structure Explorer - 1CV2, summary information
Title: Hydrolytic Haloalkane Dehalogenase LinB from Sphingomonas paucimobilis UT26 at 1.6 Å Resolution
Compound: Mol_Id: 1; Molecule: Haloalkane Dehalogenase; Chain: A; Synonym: LinB, 1,3,4,6-Tetrachloro-1,4-Cyclohexadiene Hydrolase; EC: 3.8.1.5; Engineered: Yes
Authors: J. Marek, J. Vévodova, J. Damborsky, I. Smatanova, L. A. Svensson, J. Newman, Y. Nagata, M. Takagi
Exp. Method: X-ray Diffraction
Classification: Hydrolase
EC Number: 3.8.1.5
Source: Sphingomonas paucimobilis
Primary Citation: Marek, J., Vévodova, J., Smatanova, I., Nagata, Y., Svensson, L.A., Newman, J., Takagi, M., Damborsky, J.: Crystal Structure of the Haloalkane Dehalogenase from Sphingomonas paucimobilis UT26. Biochemistry 39, pp. 14082 (2000)
Deposition Date: 22-Aug-1999
Release Date: 11-Sep-2000
Resolution [Å]: 1.58
Space Group: P 21 21 2
Unit Cell: dim [Å]: a 50.26, b 71.67, c 72.70; angles [°]: alpha 90.00, beta 90.00, gamma 90.00
R-Value: 0.149
Polymer Chains: A
Atoms: 2750
HET groups: HOH
Residues: 296

Entry from the PDB database (header)
    HEADER    HYDROLASE                               22-AUG-99   1CV2
    TITLE     HYDROLYTIC HALOALKANE DEHALOGENASE LINB FROM SPHINGOMONAS
    TITLE    2 PAUCIMOBILIS UT26 AT 1.6 A RESOLUTION
    COMPND    MOL_ID: 1;
    COMPND   2 MOLECULE: HALOALKANE DEHALOGENASE;
    COMPND   3 CHAIN: A;
    COMPND   4 SYNONYM: LINB, 1,3,4,6-TETRACHLORO-1,4-CYCLOHEXADIENE
    COMPND   5 HYDROLASE;
    COMPND   6 EC: 3.8.1.5;
    COMPND   7 ENGINEERED: YES;
    COMPND   8 BIOLOGICAL_UNIT: MONOMER
    SOURCE    MOL_ID: 1;
    SOURCE   2 ORGANISM_SCIENTIFIC: SPHINGOMONAS PAUCIMOBILIS;
    SOURCE   3 STRAIN: UT26;
    SOURCE   4 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
    SOURCE   5 EXPRESSION_SYSTEM_STRAIN: HB101;
    SOURCE   6 EXPRESSION_SYSTEM_VECTOR_TYPE: PLASMID;
    SOURCE   7 EXPRESSION_SYSTEM_PLASMID: PMYLB1
    KEYWDS    DEHALOGENASE, LINDANE, BIODEGRADATION, ALPHA/BETA-HYDROLASE
    EXPDTA    X-RAY DIFFRACTION
    AUTHOR    J.MAREK,J.VÉVODOVA,J.DAMBORSKY,I.SMATANOVA,L.A.SVENSSON,
    AUTHOR   2 J.NEWMAN,Y.NAGATA,M.TAKAGI
    REMARK   1 REFERENCE 1
    REMARK   1  AUTH   I.SMATANOVA,Y.NAGATA,L.A.SVENSSON,M.TAKAGI,J.MAREK
    REMARK   1  TITL   CRYSTALLIZATION AND PRELIMINARY X-RAY DIFFRACTION
    REMARK   1  TITL 2 ANALYSIS OF HALOALKANE DEHALOGENASE LINB FROM
    REMARK   1  TITL 3 SPHINGOMONAS PAUCIMOBILIS UT26
    REMARK   1  REF    ACTA CRYST. D   V. D53  1231 1999
    REMARK   1  REFN   DK ISSN 0907-4449
    REMARK   2 RESOLUTION. 1.58 ANGSTROMS.
    REMARK   3 REFINEMENT.
    REMARK   3   PROGRAM : SHELXL-97
    REMARK   3   AUTHORS : G.M.SHELDRICK

Entry from the PDB database (crystallographic info)
    REMARK   3  DATA USED IN REFINEMENT.
    REMARK   3   RESOLUTION RANGE HIGH (ANGSTROMS) : 1.58
    REMARK   3   RESOLUTION RANGE LOW  (ANGSTROMS) : 20.0
    REMARK   3   DATA CUTOFF            (SIGMA(F)) : 0.000
    REMARK   3   COMPLETENESS FOR RANGE        (%) : 94.2
    REMARK   3   CROSS-VALIDATION METHOD           : THROUGHOUT
    REMARK   3   FREE R VALUE TEST SET SELECTION   : RANDOM
    REMARK 290 CRYSTALLOGRAPHIC SYMMETRY
    REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: P 21 21 2
    REMARK 290      SYMOP   SYMMETRY
    REMARK 290     NNNMMM   OPERATOR
    REMARK 290       1555   X,Y,Z
    REMARK 290       2555   -X,-Y,Z
    REMARK 290       3555   1/2-X,1/2+Y,-Z
    REMARK 290       4555   1/2+X,1/2-Y,-Z
    REMARK 290     WHERE NNN -> OPERATOR NUMBER
    REMARK 290           MMM -> TRANSLATION VECTOR
    REMARK 290 CRYSTALLOGRAPHIC SYMMETRY TRANSFORMATIONS
    REMARK 290 THE FOLLOWING TRANSFORMATIONS OPERATE ON THE ATOM/HETATM
    REMARK 290 RECORDS IN THIS ENTRY TO PRODUCE CRYSTALLOGRAPHICALLY
    REMARK 290 RELATED MOLECULES.
    REMARK 290   SMTRY1   1  1.000000  0.000000  0.000000        0.00000
    REMARK 290   SMTRY2   1  0.000000  1.000000  0.000000        0.00000
    REMARK 290   SMTRY3   1  0.000000  0.000000  1.000000        0.00000
    REMARK 290   SMTRY1   2 -1.000000  0.000000  0.000000        0.00000
    REMARK 290   SMTRY2   2  0.000000 -1.000000  0.000000        0.00000
    REMARK 290   SMTRY3   2  0.000000  0.000000  1.000000        0.00000

Entry from the PDB database (sequence, sec. elements)
    DBREF  1CV2 A    1   296  DBJ    BAA03443 BAA03443         1    296
    SEQRES   1 A  296  MET SER LEU GLY ALA LYS PRO PHE GLY GLU LYS LYS PHE
    SEQRES   2 A  296  ILE GLU ILE LYS GLY ARG ARG MET ALA TYR ILE ASP GLU
    SEQRES   3 A  296  GLY THR GLY ASP PRO ILE LEU PHE GLN HIS GLY ASN PRO
    SEQRES   4 A  296  THR SER SER TYR LEU TRP ARG ASN ILE MET PRO HIS CYS
    SEQRES   5 A  296  ALA GLY LEU GLY ARG LEU ILE ALA CYS ASP LEU ILE GLY
    SEQRES   6 A  296  MET GLY ASP SER ASP LYS LEU ASP PRO SER GLY PRO GLU
    SEQRES   ...
    HELIX    1   1 SER A   42  ALA A   53
    HELIX    2   2 TYR A   82  LEU A   96
    HELIX    3   3 TRP A  109  ARG A  120
    HELIX    4   4 GLU A  145  ARG A  155
    HELIX    5   5 GLY A  159  LEU A  164
    HELIX    6   6 VAL A  168  LEU A  177
    HELIX    7   7 GLU A  184  GLU A  192
    HELIX    8   8 ARG A  202  ILE A  211
    HELIX    9   9 ALA A  218  SER A  234
    HELIX   10  10 THR A  250  ARG A  258
    HELIX   11  11 ILE A  274  ASP A  277
    HELIX   12  12 SER A  278  LEU A  293
    SHEET    1  S1 8 LYS A  12  ILE A  14  0
    SHEET    2  S1 8 MET A  21  GLU A  26 -1  N  MET A  21   O  ILE A  14
    SHEET    3  S1 8 ARG A  57  ASP A  62 -1  N  ALA A  60   O  ILE A  24
    SHEET    4  S1 8 ASP A  30  HIS A  36  1  N  ILE A  32   O  ARG A  57
    SHEET    5  S1 8 VAL A 102  HIS A 107  1  N  VAL A 103   O  PRO A  31
    SHEET    6  S1 8 VAL A 125  MET A 131  1  N  ALA A 129   O  LEU A 104
    SHEET    7  S1 8 LYS A 238  PRO A 245  1  N  ILE A 241   O  TYR A 130
    SHEET    8  S1 8 GLN A 263  GLY A 270  1  N  THR A 264   O  LYS A 238
    CISPEP   1 ASN A   38    PRO A   39          0        -2.5?
    CISPEP   2 ASP A   73    PRO A   74          0        -2.4?
    CISPEP   3 THR A  216    PRO A  217          0        -3.04
    CISPEP   4 GLU A  244    PRO A  245          0         3.01
    CISPEP   5 PRO A  295    ALA A  296          0        20.14

Protein structure databases
PDBsum
- developed at University College London
- summaries and analyses of protein structures (secondary database derived from PDB)
- summary of PDB entries: resolution, R-factor, number of protein chains, topology, ligands, metal ions, etc.
- analysis of PDB entries: protein-metal and protein-ligand interactions, protein validation
- provides links to many related databases

PDBsum entry for 1cv2
PDB id: 1cv2
Class: Hydrolase
Title: Hydrolytic haloalkane dehalogenase LinB from Sphingomonas paucimobilis UT26 at 1.6 Å resolution
Structure: Haloalkane dehalogenase. Chain: A. Synonym: LinB, 1,3,4,6-tetrachloro-1,4-cyclohexadiene hydrolase. Engineered: yes
Source: Sphingomonas paucimobilis. Strain: UT26. Expressed in: Escherichia coli.
Resolution: 1.58 Å
R-factor: 0.152. R-free: 0.211.
Authors: J. Marek, J. Vévodova, J. Damborsky, I. Smatanova, L. A. Svensson, J. Newman, Y. Nagata, M. Takagi
Date: 22-Aug-...
Enzyme class from PDB file: E.C.3.8.1.5
Chain A: 293 residues
Structural classification (1 domain): CATH no. 3.40.50.950; Class: Alpha Beta; Architecture: 3-Layer(aba) Sandwich
[Figure: sequence of chain A annotated with its secondary structure elements (helices H1-H18 and β-strands)]
PROMOTIF summary: 1 sheet, 8 strands, 18 helices, 19 beta turns, 3 gamma turns, 1 beta bulge, 1 beta hairpin, 4 beta alpha beta units, 1 psi-loop.

Protein structure classification databases
Classification attempts to capture the structural similarities among proteins.
- The structural similarities relate to evolution.
- The structural similarities may imply function.
- The classification scheme depends on the underlying philosophy.

Protein structure classification databases
SCOP - Structural Classification of Proteins
- developed at MRC Laboratory of Molecular Biology
- construction: combination of manual and automatic methods (complicated by multidomain proteins)
- fold = same secondary structure elements in the same arrangement, independent of common evolutionary origin
- superfamily = low sequence identity, but common evolutionary origin implied from common structure and function
- family = sequence identity >30%

SCOP entry
Protein: Haloalkane dehalogenase from Sphingomonas paucimobilis, UT26, LinB
Lineage:
1. Root: scop
2. Class: Alpha and beta proteins (a/b). Mainly parallel beta sheets (beta-alpha-beta units)
3. Fold: alpha/beta-Hydrolases. Core: 3 layers, a/b/a; mixed beta-sheet of 8 strands, order 12435678, strand 2 is antiparallel to the rest
4. Superfamily: alpha/beta-Hydrolases. Many members have a left-handed crossover connection between strand 8 and additional strand 9
5. Family: Haloalkane dehalogenase
6. Protein: Haloalkane dehalogenase
7. Species: Sphingomonas paucimobilis, UT26, LinB
PDB Entry Domains:
1. 1CV2: chain a
2. [PDB code illegible], complexed with br, gol: chain a

Protein structure classification databases
CATH - Class, Architecture, Topology, Homology
- developed at University College London
- construction: mostly automatic
- unique numbering scheme analogous to the Enzyme Classification (E.C.) scheme
- class = gross secondary structure content
- architecture = gross arrangement of secondary structures
- topology = shape and connectivity of secondary structures (60% of the larger protein matches the smaller one)
- homology = sequence identity >35%, common ancestry
- sequence = clustering based on sequence identity

CATH entry
CATH Home > Top > C [3] > A [40] > T [50] > H [950] > S [15] > N [2]
Domain 1cv2A0
There are either no other non-identical relatives within this fold group or the structural comparisons for this domain have not yet been calculated.

Prediction of secondary structure
- Algorithms assign a probability for the occurrence of an α-helix, β-strand, turn or random coil at a particular position in the sequence.
- Methods: statistical, stereochemical, and homology/neural-network based.
- All methods rely on information derived from known 3D structures.
- Most recent methods use the information from multiple alignments.
- Reliability of the best current methods is >70%.
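The reliability quoted above is usually reported as a per-residue three-state accuracy (Q3): the percentage of positions where the predicted state matches the observed one. The following is a minimal sketch of that bookkeeping, not the procedure of any particular predictor. It assumes a predictor has already produced per-position probabilities; the function names, the merging of turns into coil, and the example numbers are all hypothetical.

```python
# Minimal sketch: turn per-position state probabilities into a prediction
# and score it against an observed assignment with the Q3 measure.
# All names and numbers here are illustrative assumptions.

STATES = ("H", "E", "T", "C")  # helix, strand, turn, random coil


def assign_states(probabilities):
    """Pick the most probable state at each sequence position."""
    return "".join(
        max(STATES, key=lambda state: p.get(state, 0.0)) for p in probabilities
    )


def to_three_state(states):
    """Merge turns into coil so that only H, E and C remain."""
    return states.replace("T", "C")


def q3_accuracy(predicted, observed):
    """Percentage of positions where predicted and observed states agree."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical probabilities for a four-residue fragment.
example = [
    {"H": 0.7, "E": 0.1, "T": 0.1, "C": 0.1},
    {"H": 0.6, "E": 0.2, "T": 0.1, "C": 0.1},
    {"H": 0.2, "E": 0.5, "T": 0.2, "C": 0.1},
    {"H": 0.1, "E": 0.1, "T": 0.6, "C": 0.2},
]
predicted = to_three_state(assign_states(example))  # "HHEC"
print(q3_accuracy(predicted, "HHCC"))  # 75.0
```

Ties between states are broken by the order in STATES, which is an arbitrary choice made for this sketch.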
Prediction of secondary structure
Chou-Fasman and GOR
- statistical: amino acids show a preference for particular secondary structure elements
PHD and NNPredict
- neural networks: the rules for prediction are not defined in advance, they are created by training
NNSSP and PREDATOR
- nearest neighbour approach
JPRED
- consensus approach: utilises multiple alignments and state-of-the-art methods, and makes a consensus

[Figure: secondary structure predictions for flavodoxin by nnpredict, PredictProtein, SSPRED, GOR, Levin, DPM, SOPMA and the CNRS consensus, compared with the experimental structure 1OFV (strands Beta 1 to Beta 5, helices Alpha 1 to Alpha 6)]

Prediction of protein fold
Threading
- threading = protein fold assignment or fold recognition
- the target sequence is searched against a database of folds (3D profiles) and threaded models are constructed
- 3D profile: each residue in the 3D structure is assigned environmental variables (buried area, fraction of side chain covered by polar atoms, secondary structure, etc.). Assumption: the environment of a residue should be more conserved than the residue itself.
- a residue can also be described by its interactions
- the match of the target sequence with a 3D profile (quality of threaded models) is quantified by a Z-score or energy
- limitation: cannot handle multi-domain proteins

[Figure: the target sequence (G-A-L-T-E-S-Q-V-...) is threaded onto Fold 1, Fold 2, Fold 3, Fold 4, ..., Fold n; a score or energy is calculated for each fold and gives the predicted fold]

Prediction of protein fold
Bioinbgu
- consensus method utilising predictions from five different algorithms
3D-PSSM
- scoring functions: 1D-PSSMs (sequence profiles built from relatively close homologues), 3D-PSSMs (more general profiles containing more remote homologues), matching of secondary structure elements, and propensities of the residues for solvent accessibility
GenThreader
- hybrid method: profile-based alignment, evaluation of alignments by threading, evaluation of threaded models by a neural network

Prediction of tertiary structure
Ab initio
- the 3D structure of a protein is predicted from first principles (search for the global minimum structure)
- current algorithms are not very reliable
Homology modelling
- 1. alignment of the modelled sequence against sequences of structurally similar proteins (templates)
- 2. "extraction" of the backbone from the template structure and positioning of side chains
- 3. modelling of loops
- 4. structure refinement and validation

Validation of protein models
Quality of protein models is assessed by stereochemical accuracy, packing quality and folding reliability.
Stereochemical accuracy
- Torsion angles -> main-chain torsion angle distribution (Ramachandran plots); side-chain torsion angle distribution (χ1-χ2 plots)
- Planarity of peptide bonds -> ω angle distribution
- Chirality of Cα atoms -> ζ angle distribution
- Bond lengths
- Bond angles
- Planarity -> aromatic ring systems and sp2-hybridized end groups
Packing quality
- Interatomic distances -> "bump check"; "atomic contact quality"
- Secondary structural elements -> location and geometry of secondary structural elements
- Hydrophobicity -> distribution of polar and nonpolar amino acids
- Solvent accessible surface of amino acids
- Unsatisfied buried H-bond donors/acceptors
Folding reliability
- 3D comparison of model and template structure -> RMS deviations between backbone atoms
- 3D-1D profiles -> comparison of environment strings with amino acid sequences
- Knowledge-based potentials -> energy-based comparison

Prediction of tertiary structure
SWISS-MODEL
- fully automated modelling server
- input = protein sequence; output = PDB file
- 1. search of ExNRL-3D using BLASTP for potential templates; 2. select all templates with sequence identities above 25%; 3. generate structures of 3D models; 4. energy minimise models using GROMOS 96
- "first approach" mode and "optimise" mode (Swiss-PdbViewer)
MODELLER
- most widely used academic program for homology modelling (satisfaction of spatial restraints)

Modelling of protein-ligand complexes
Docking
- positioning of small organic molecules (ligands) inside the protein active site
- different orientations and conformations of the ligand are evaluated using geometric or energetic scoring functions
- protein-ligand interaction energy = van der Waals term + electrostatic term + H-bond term + entropic term
- flexible docking considers different conformations of the ligand and different rotamers of protein side chains
Software: DOCK, AutoDock, FlexX
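The interaction energy expression above can be sketched as a sum over protein-ligand atom pairs. This is a minimal illustration and not the scoring function of DOCK, AutoDock or any other docking program. It assumes a Lennard-Jones 12-6 form for the van der Waals term and Coulomb's law with a constant dielectric for the electrostatic term. The Atom class, the combining rules, the dielectric value and all parameter values are hypothetical, and the H-bond and entropic terms are simply passed in as precomputed numbers.

```python
import math
from dataclasses import dataclass

# Converts q1*q2/r (charges in e, r in Angstroms) to kcal/mol.
COULOMB_KCAL = 332.06


@dataclass
class Atom:
    xyz: tuple      # (x, y, z) position in Angstroms
    charge: float   # partial charge in units of e
    sigma: float    # hypothetical Lennard-Jones diameter, Angstroms
    epsilon: float  # hypothetical Lennard-Jones well depth, kcal/mol


def lennard_jones(a, b, r):
    """12-6 van der Waals energy for one atom pair at distance r."""
    sigma = 0.5 * (a.sigma + b.sigma)           # arithmetic-mean combining rule
    epsilon = math.sqrt(a.epsilon * b.epsilon)  # geometric-mean combining rule
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(a, b, r, dielectric=4.0):
    """Electrostatic energy for one atom pair with a constant dielectric."""
    return COULOMB_KCAL * a.charge * b.charge / (dielectric * r)


def interaction_energy(protein, ligand, hbond=0.0, entropy=0.0):
    """van der Waals + electrostatic + H-bond + entropic term, in kcal/mol."""
    vdw = elec = 0.0
    for p in protein:
        for l in ligand:
            r = math.dist(p.xyz, l.xyz)
            vdw += lennard_jones(p, l, r)
            elec += coulomb(p, l, r)
    return vdw + elec + hbond + entropy


def best_pose(protein, poses):
    """Return the ligand pose (list of atoms) with the lowest energy."""
    return min(poses, key=lambda pose: interaction_energy(protein, pose))


# Hypothetical one-atom "protein" and two one-atom ligand poses.
site = [Atom((0.0, 0.0, 0.0), 0.0, 3.0, 0.2)]
near = [Atom((3.0 * 2 ** (1 / 6), 0.0, 0.0), 0.0, 3.0, 0.2)]  # at the LJ minimum
far = [Atom((10.0, 0.0, 0.0), 0.0, 3.0, 0.2)]
print(round(interaction_energy(site, near), 3))  # approximately -0.2
print(best_pose(site, [far, near]) is near)      # True
```

Evaluating many orientations and conformations, as the slide describes, amounts to generating the candidate poses and ranking them with such a function; how poses are generated is outside this sketch.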