The wonderful world of structure archiving
What’s happening and what’s next?
Gerard Kleywegt, PDBe, EMBL-EBI
Winter School on Structural Biology, CEITEC, Brno, 13 February 2015

Summary…
More! Bigger! Better! Cooler! Blobbier!

Outline
• PDB in 2014/2015
  • More! (100,000+ entries)
  • Bigger! (Large structures & “Formageddon”)
  • Better! (D&A & validation)
• PDBe in 2015
  • Cooler! (Nifty new things coming this year)
• The future
  • Blobbier! (Cellular imaging and hybrid methods)
(images unrelated)

Who’s who again?
PDBe, wwPDB, EMDataBank

PDBe at a glance
• Mission: Bringing Structure to Biology
• Founding partner of Worldwide Protein Data Bank (wwPDB)
• Birthplace of Electron Microscopy Data Bank (EMDB)
• Founding partner of EMDataBank
• Major activities:
  • Deposition and annotation site for structural data on biomacromolecules & complexes (X-ray, NMR, EM)
  • Integrated resource to serve structural data and information
  • Liaise with structural biology community
• Guided by advisory bodies made up of community experts
  • PDBe, wwPDB and EMDataBank advisory committees
  • (Validation) Task Forces (method-specific) & ad-hoc working groups
pdbe.org

wwPDB
wwpdb.org

wwPDB partnership
• Collaborate on “data in”
  • Policy issues
  • Weekly releases
  • Validation standards
  • Format specifications
  • Chemical Component Dictionary
  • Deposition and annotation procedures
  • Archive quality and remediation
  • Journal interactions
  • Community interactions
• Friendly competition on “data out”
  • Serving PDB data with added value
  • PDB-based services
  • Other services, resources and activities
wwpdb.org

Electron Microscopy Data Bank (EMDB)
• Founded at EMBL-EBI in 2002
• Since 2007 - operated jointly by PDBe and RCSB
• EMDataBank resource (NCMI, PDBe, RCSB) funded by NIH since 2007
• Additional funding from EMBL-EBI, BBSRC, EU and MRC
• Own Advisory Committee and Validation Task Force
emdatabank.org
pdbe.org/emdb

Roles of PDBe, wwPDB and EMDataBank
Gutmanas et al., Acta Cryst. D69, 710 (2013)
Roles | PDBe | wwPDB | EMDataBank
Community interactions | CCP4, CCPN, CCP-EM | (V)TFs, IUCr, journals, … | EM-VTF, workshops, portal
Challenges | CAPRI, CASD-NMR | (CASP) | EM modelling
Formats | (3D cellular imaging data?) | PDB, PDBx, working groups | Maps, FSC, segmentations
Data models & ontologies | Crystallisation ontology, CCPN | PDBx | EMDB data model, EMX
New methods | (SXT? 3DSEM? CLEM?) | SAS? Hybrids? | (?)
Deposition, annotation, validation, archiving, distribution | (3D cellular imaging archive?) | PDB, BMRB | EMDB
Integration | SIFTS and more | PDB annotation | EMDB annotation
Advanced services exposing structural information | Many! | - |

PDB in 2014/2015
More! Bigger! Better!

PDB in 2014/2015
• More!
  • Over 100,000 entries in the active archive
  • More than 10,000 of these are NMR entries
• Bigger!
  • Large structures now released intact (not SPLIT)
  • Required move to mmCIF/PDBx (“Formageddon”)
• Better!
  • New common Deposition & Annotation system
  • Validation reports
  • Archive remediation

14 May 2014: 100,000+ PDB entries!

Over 10,000 NMR entries in the PDB
• Including >100 membrane proteins

Large structures now released intact
• Limitations of PDB format necessitated SPLIT entries
• mmCIF/PDBx format does not have these limitations
• Workshop at EMBL-EBI in 2011: decision to support PDBx in major refinement packages and to switch to PDBx as the distribution format for the PDB archive
mmcif.wwpdb.org

Large structures to be released intact
• 2013
  • Large structures can be deposited intact (ADIT, AutoDep)
  • Distribution as PDBx and PDBML in separate ftp area, and also as SPLIT entries in regular archive
• 2014
  • New wwPDB Deposition & Annotation system designed to handle large structures
  • July: previously SPLIT entries reunited and distributed in parallel in separate ftp area
  • 10 December: “Formageddon”!
    • Switched to PDBx as archive distribution format
    • SPLIT entries replaced by reunited entries

mmCIF/PDBx workshop for programmers
EMBL-EBI, November 2013

New Deposition & Annotation system
• Designed to handle large structures from any combination of techniques (X-ray, NMR, EM)
• Jointly developed by wwPDB partners
• Replaces AutoDep, ADIT, EMDep
• Validation integral part of deposition and annotation
• X-ray module in production since January 2014
• X-ray/NMR/EM/neutron coming in 2015
http://deposit.wwpdb.org/deposition

Validation of structural data and models is crucial

Archive remediation
• Never-ending process to improve the content, description and consistency of data in the PDB archive
• Incidental: affecting one or a few entries, often acting on information from users
• Archive-wide
  • In the past: literature citations, sequence references, taxonomy assignments, peptide ligands (inhibitors, antibiotics)
  • Future: carbohydrates, protein modifications, metal-containing ligands

PDBe in 2015
Cooler!

PDBe in 2015 – lots of cool stuff coming soon!
• Cooler!
  • Redesigned PDB and EMDB entry pages, based on extensive user-testing
  • Several unique features to enhance the content of the archives
  • New search/browse system that makes finding things easier and allows for further analysis
  • Redesigned “corporate” web site (pdbe.org)
  • For developers: all our data will be available through an API
  • A bit later: validation portal to make analysing and validating X-ray, NMR and EM structures easier
  • Lots of 3DEM-related activity, too!

PDBe website needed a make-over
• Many improvements:
  • Quality of underlying data (literature, assemblies, …)
  • Unique data (e.g., through mining full-text articles)
  • Data access through API (PDB, EMDB, SIFTS, PISA, validation)
  • Data discovery and analysis (search/browse)
  • User-driven redesign of entry pages
  • User interfaces (e.g., linked 1ry/2ry/3ry structure viewer)
  • JavaScript-based 3D viewers (instead of Java applets)
  • Improved “corporate” website

PDBe “tubemap” relationships
(Figure node labels: Expression data; Genomic context; Bio. Process; Bio. Function; Pathway; Disease; Taxonomy; Cellular Location; Protein; Enzymes; Active site; Small mol. binding; Small mol.; Representative lig. conformer; Related str.; Publication; Macromolecular Structure; Sample Info.; Exp. info; Representative Binding site; Quality of Structure; Representative structure; Seq. Similar; Struct. Similar; Flexible/disorder; Sec. structure; Seq. motif; Biol. Complex; Interfaces; Struct. Domain)
Legend: Close relation (>7 users); Relation (>4 users); Relation within a block; Relation between blocks
UniProt card-sorting names: Function; Subcellular location; Names and taxonomy; Family and domains

Citation information
• Primary citation: figures with captions if full-text publication
• Reviews/articles that reference primary citation or mention PDB entry in full text (datamining by Europe PMC)

Entry images
• Display “preferred” assembly: smallest assembly that contains all entities (i.e., distinct molecules)
• If no such assembly, use the one with the most entities
• … and find out if annotation is correct or not
(Images: PDB entry 3MIN, coloured by chain and coloured by entity)

Entry/assembly images – virus structures
(Images: 1NQX, 3ZFF)

PDBe – new search/browse system
• Provides suggestions based on reference data (auto-complete)
  • E.g., enzyme names, GO terms, protein names, CATH classification
• Supports categorisation of result set (“faceting” – like Amazon)
  • By method of structure determination, release date, resolution, ligands, author, CATH domain, …
• Supports multiple views of the result set
  • Entry view, molecule view (more to come)

Auto-complete interface

Search results – entry view

Search results – macromolecule view

Search results – compound view

Search results – protein-family view

PDBe – new “corporate” web site

The PDBe API
• Application Programming Interface
• PDB, EMDB, CCD, SIFTS, PISA, SSM, validation, topology, search system
• Used to populate new PDBe entry pages
• Available to external developers
Info: sameer@ebi.ac.uk

Delivering, analysing and validating experimental data at PDBe

EMDB volume viewer
pdbe.org/emd-1788

EMDB visual map analysis
pdbe.org/emd-1788

Slice viewer for tomograms
Collaboration with OME (Dundee)
pdbe.org/emd-1053

OLDERADO
• Helps you analyse NMR ensembles
  • How many clusters?
  • Representative models?
  • Which rigid domains?
• pdbe.org/olderado

Vivaldi
pdbe.org/vivaldi
(Screenshot labels: Charts; Interactive 3D viewer; Model selection; Summary; More graphs)

Vivaldi++ (design study)

Prototype/Mock-up - “EDS+”
(Mock-up labels: Full-entry view; Ramachandran plot; Residue-level quality; Residue-centric view with electron […]; […] selector; Residue in focus; Overall quality)

Ligand-validation prototype

PDBe - validation portal
• Unified delivery of experimental data and validation information for X-ray, NMR and EM
  • Visualisation (1D, 2D, 3D, linked)
  • Selections
  • Terminology
  • Look-and-feel
(Mock-up!)

PDBe - validation portal
• If you can use one, you can use them all!
• Lowers barrier for non-experts
• Interactive pages will include
  • Interactive 3D viewer
  • Graphs, plots and tables, tightly coupled to 3D viewer
• Static entry pages will provide intermediate level of detail
(Bob Hanson of Jmol/JSmol fame)

The future
Blobbier!

The world is changing
• Biology
• Structural biology
• Bioinformatics
• ICT
• Funding landscape
• Emerging nations
• How about structural biology archives?

Challenges facing archives
• Increasing size and complexity of structures and data
• More heterogeneous information at a range of scales
• Need to coordinate across disciplines
• Need to integrate structural data on scales from atoms to cells
• Need to integrate structure with other biological and chemical data
• Need to deliver appropriate structure data to non-experts (in context of their work)
• (Funding)

Structural biology archives today
Technique | Models | Data
X-ray | PDB | PDB
NMR | PDB | BMRB + PDB
3DEM | PDB | EMDB
• Simple world
  • 3 techniques
  • 3 archives
  • atomistic models
• Trends
  • Many techniques, ranging in scale from atoms to cells
  • Many types of model: atomistic, “residue-istic”, map segmentations (with contrast), envelopes (no contrast), geometric shapes
  • Hybrid methods (integrative modelling): mixed models (atomistic/blobby; experimental/theoretical) and heterogeneous data with variable information content

Structural biology archives tomorrow?
Technique | Models | Data
X-ray | PDB | PDB
NMR | PDB | BMRB + PDB
3DEM | PDB + “ModelDB” + “BlobDB” | EMDB
SAXS/SANS | PDB + “SASDB” | “SASDB” + PDB
Theoretical models | “ModelDB” | n/a
Hybrid methods | PDB + “ModelDB” + “BlobDB” | “BlobDB”? PDB?
SXT, 3DSEM, … | “BlobDB” | “3DCellDB”? EMDB?
CLEM | PDB + “ModelDB” + “BlobDB” | “3DCellDB”? EMDB?
• “ModelDB” = theoretical atomistic models?
• “BlobDB” = non-atomistic models and hybrid methods data?
• “SASDB” = SAXS/SANS data and models not archived elsewhere?
• “3DCellDB” = 3D cellular imaging data and segmentations?

The key will be “integration”
Image: Zeev-Ben-Mordehai et al., 2014

Integrating imaging and 3D structural data
Instruct & Euro-BioImaging
• Enriching biology with structural information
(Figure labels. Scales: Atoms, Molecules, Machines, Samples, Cells. Methods: NMR, X-ray, EM, CLEM, SAXS, ET, 3DSEM, SXT)

Integrating imaging and 3D structural data
Elixir
• Annotating and linking structure through biological information
(Figure labels. Resources: ChEBI, ENA/UniProt, ChEMBL/IntAct, Reactome, Ensembl. Information: Chemistry, Sequences, Interactions, Pathways, Variation)

Integrating imaging and 3D structural data
• Archive 3D cellular imaging data (EM, ET; later: SXT, 3DSEM, CLEM)
• Annotate using bioinformatics resources, classifications and ontologies (UniProt, GO)
• Link to 3D molecular structure data (X-ray, NMR, EM) in PDB and EMDB
• Provide tools to make the information easily accessible and to facilitate discovery
(Figure labels: Sample; Atomic details)

Archiving cellular imaging data
• EMDB: mainly single-particle, tomogram-averaging and tomographic data
• EMPIAR:
  • Archive raw data associated with EMDB entries (for validation, methods development, teaching, …)
• Develop tools for segmentation annotation of 3D imaging data
• Develop integrated viewer for structural data on scales from molecules to cells (or atoms to small samples)
• Future (if funded): develop archive for SXT and 3DSEM data and accommodate CLEM data in EMDB/EMPIAR

EMPIAR
pdbe.org/empiar

Prototype browser (EMD-2179)

PDBe team

Funding & acknowledgments
• Many thanks to many colleagues at PDBe, EMBL-EBI, wwPDB, EMDataBank and elsewhere!

Questions & discussion…
http://pdbe.org/
http://www.facebook.com/proteindatabank
http://twitter.com/PDBeurope
http://youtube.com/user/ProteinDataBank
http://wwwdev.ebi.ac.uk/pdbe/
http://wwwdev.ebi.ac.uk/pdbe/entry/search/index
http://wwwdev.ebi.ac.uk/pdbe/api/doc/api-index.html
http://wwwdev.ebi.ac.uk/pdbe/entry/pdb/4i91

If you see this slide, I’ve gone too far
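
Sketch: choosing the “preferred” assembly
The rule stated on the “Entry images” slide (smallest assembly that contains all entities, otherwise the assembly with the most entities) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not PDBe's implementation: the `Assembly` record, its fields and the function name are hypothetical, and “smallest” is assumed here to mean fewest chains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assembly:
    """Hypothetical record describing one assembly of a PDB entry."""
    assembly_id: str
    entity_ids: frozenset  # distinct molecules (entities) present in the assembly
    n_chains: int          # assumed measure of assembly size


def preferred_assembly(assemblies, all_entity_ids):
    """Pick the assembly to depict, following the rule on the slide."""
    all_entities = frozenset(all_entity_ids)
    # Assemblies that contain every entity of the entry.
    complete = [a for a in assemblies if a.entity_ids >= all_entities]
    if complete:
        # Smallest complete assembly; "smallest" = fewest chains (assumption).
        return min(complete, key=lambda a: a.n_chains)
    # No complete assembly: fall back to the one with the most entities.
    return max(assemblies, key=lambda a: len(a.entity_ids))
```

For an entry with entities 1 and 2, a two-chain assembly holding both entities would be chosen over a four-chain assembly holding both, and over any assembly holding only entity 1. Ties are resolved by list order, which the slide does not specify.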