D486–D492 Nucleic Acids Research, 2018, Vol. 46, Database issue
Published online 6 November 2017
doi: 10.1093/nar/gkx1070

PDBe: towards reusable data delivery infrastructure at Protein Data Bank in Europe

Saqib Mir1, Younes Alhroub1, Stephen Anyango1, David R. Armstrong1, John M. Berrisford1, Alice R. Clark1, Matthew J. Conroy1, Jose M. Dana1, Mandar Deshpande1, Deepti Gupta1, Aleksandras Gutmanas1, Pauline Haslam1, Lora Mak1, Abhik Mukhopadhyay1, Nurul Nadzirin1, Typhaine Paysan-Lafosse1,2, David Sehnal3, Sanchayita Sen1, Oliver S. Smart1, Mihaly Varadi1, Gerard J. Kleywegt1 and Sameer Velankar1,*

1 Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
2 InterPro, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
3 CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno-Bohunice, Czech Republic

Received September 22, 2017; Revised October 14, 2017; Editorial Decision October 16, 2017; Accepted October 26, 2017

ABSTRACT

The Protein Data Bank in Europe (PDBe, pdbe.org) is actively engaged in the deposition, annotation, remediation, enrichment and dissemination of macromolecular structure data. This paper describes new developments and improvements at PDBe addressing three challenging areas: data enrichment, data dissemination and functional reusability. New features of the PDBe Web site are discussed, including a context-dependent menu providing links to raw experimental data and improved presentation of structures solved by hybrid methods. The paper also summarizes the features of the LiteMol suite, which is a set of services enabling fast and interactive 3D visualization of structures, with associated experimental maps, annotations and quality assessment information.
We introduce a library of Web components which can be easily reused to port data and functionality available at PDBe to other services. We also introduce updates to the SIFTS resource which maps PDB data to other bioinformatics resources, and the PDBe REST API.

INTRODUCTION

The Protein Data Bank (PDB, (1)) is the single global archive of experimentally determined three-dimensional (3D) structures of biological macromolecules and their complexes. The Protein Data Bank in Europe (PDBe; pdbe.org; (2)) is a founding member of the Worldwide Protein Data Bank (wwPDB; http://wwpdb.org (3)), the international consortium that manages the PDB archive. The other members of the consortium are the Research Collaboratory for Structural Bioinformatics (RCSB PDB; (4)), the Biological Magnetic Resonance Data Bank (BMRB; (5)), and Protein Data Bank Japan (PDBj; (6)). The wwPDB partners collaborate on the annotation of macromolecular structure depositions and release new data into the PDB archive each week. Since 2015, the wwPDB partners have used a unified system for deposition, curation and validation of the deposited structure data, called OneDep, with PDBe being responsible for the annotation of all depositions from European and African institutions (7).

Each wwPDB partner site has developed unique services for delivering PDB data to the scientific community. One of the principal focuses of PDBe activities is enrichment and dissemination of PDB data to the wider user community. This community not only includes domain experts such as structural biologists, but also encompasses users with varying expertise in, and knowledge of, structural data, such as bio- and chemo-informaticians, modellers, clinicians, life scientists and students. Structural biology archives are thus faced with the dual challenge of providing appropriate and consistent access to their data, as well as developing discovery and visualisation mechanisms for the benefit of all users.
PDBe is addressing the following three main areas in order to meet this challenge and improve the accessibility of PDB data: enrichment of PDB data, ensuring its efficient delivery, and development of reusable Web components.

*To whom correspondence should be addressed. Tel: +44 1223 494646; Fax: +44 1223 494468; Email: sameer@ebi.ac.uk

© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded from https://academic.oup.com/nar/article-abstract/46/D1/D486/4595861 by Masaryk University user on 04 April 2018

Data enrichment

Most biological investigations require the use of multiple data resources that are often disparate and independent. Integration of PDB information with other biological resources provides biological context to the information encoded in atomic coordinates and better understanding of function and mechanism of action of biological systems. Cross-linking PDB data with other data resources also facilitates discovery of structure data. Since 2002, in collaboration with the UniProt team (8), PDBe has developed and maintained SIFTS (Structure Integration with Function, Taxonomy and Sequence, (9)), a resource providing residue-level mapping between UniProt KnowledgeBase (UniProtKB) and PDB entries, as well as annotations from IntEnz (10), GO (11), Pfam (12), InterPro (13), SCOP (14), CATH (15) and the NCBI taxonomy resources (16).

Recently, PDBe has carried out a number of improvements to the way SIFTS mappings are derived and presented. Many genes can encode more than one protein product, e.g. through alternative splicing of mRNA.
It is estimated that approximately 70% of human genes undergo alternative splicing (17), with the resulting individual protein products represented in UniProtKB as isoforms. One of the isoforms (usually the most prevalent) is termed canonical and, until the recent update of the SIFTS resource, it was only this sequence to which a polypeptide chain in a PDB entry was mapped. To address this shortcoming, multiple (and potentially overlapping) mappings were enabled and, as a result, SIFTS now supports mappings to the isoform which best represents the sequence of a polypeptide chain in the PDB entry. For example, the sequence of isoform 3 of the N-terminal splice region of a cyclic AMP-specific phosphodiesterase from Rattus norvegicus (PDB entry 1LOI, pdbe.org/1loi) has no sequence identity to the N-terminus of isoform 1 (termed the ‘canonical’ sequence).

The ability to provide multiple mappings between PDB and UniProtKB also allows mappings to homologous proteins. The UniProt Reference Clusters (UniRef, (18)) provide clustered sets of sequences from UniProtKB. UniRef90 is built by clustering UniProtKB sequences with 11 or more residues such that each cluster is composed of sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (called the seed sequence) of the cluster. If a protein chain in a PDB entry covers at least 70% of the canonical sequence in the primary mapping, alignments against all the cluster members belonging to the same set as the canonical sequence are made available from the SIFTS resource. For example, while the PDB contains structures for approximately 2800 unique human proteins, our analysis shows that a further 3300 sequences for human proteins can be mapped to structures from other organisms at >90% sequence identity.
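The two numeric rules above (the UniRef90 membership criterion and the 70% coverage condition for homologue mappings) can be sketched as simple predicates. This is only an illustration of the thresholds quoted in the text, not the SIFTS implementation; the `Alignment` type, the function names and the assumption that a mapping is a single contiguous 1-based segment are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of a pairwise alignment against a cluster seed."""
    identity: float  # fraction of identical aligned residues, 0.0 to 1.0
    overlap: float   # fraction of the seed sequence covered, 0.0 to 1.0


def joins_uniref90_cluster(member_length: int, aln: Alignment) -> bool:
    """Thresholds as quoted in the text: 11 or more residues,
    at least 90% identity to, and 80% overlap with, the seed sequence."""
    return member_length >= 11 and aln.identity >= 0.90 and aln.overlap >= 0.80


def covers_canonical(chain_start: int, chain_end: int,
                     canonical_length: int, percent: int = 70) -> bool:
    """True if a mapped segment (1-based, inclusive; assumed contiguous)
    covers at least `percent` of the canonical sequence.
    Integer arithmetic avoids floating-point edge cases at the threshold."""
    covered = chain_end - chain_start + 1
    return covered * 100 >= percent * canonical_length


def homologue_mappings_available(chain_start: int, chain_end: int,
                                 canonical_length: int) -> bool:
    """Condition under which cluster-member alignments are exposed,
    per the 70% rule described above."""
    return covers_canonical(chain_start, chain_end, canonical_length)
```

For instance, a chain mapped to residues 1 to 70 of a 100-residue canonical sequence meets the coverage condition, whereas one mapped to residues 1 to 69 does not.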
The new version of SIFTS has also updated the rules for mapping between structures in the PDB and Pfam such that structures are now mapped to Pfam only if the entire Pfam domain is present in the sample sequence. SIFTS data continues to be accessible via the PDBe REST API (pdbe.org/api) and can be downloaded from the FTP site (ftp://ftp.ebi.ac.uk/pub/databases/pdb/). We continue updating the PDB entry pages with the relevant information provided by SIFTS as shown in Figure 1.

Popular drug names are typically not provided for most ligands deposited in the PDB archive. PDBe has mapped InChIKeys (19) of all ligands in the PDB archive to those of drugs available in DrugBank (20), thus enabling searches for ligands and corresponding structures using drug names, additional synonyms, registered brand names and even E numbers used to define food additives within the European Union.

Data delivery

Catering to a wide variety of users requires development of flexible and robust modes of access, along with presentation and visualization of macromolecular structure data. This includes logically grouping information to provide relevant context on PDBe Web pages, and developing programmatic access to the underlying data for bioinformatics use cases. In addition to accessing the structure data itself, users with different levels of expertise seek information from the PDB archive by searching for concepts they are most familiar with, such as sequences, genes, small molecules and biological functions. A search system must, therefore, support querying over a rich and extensive set of metadata and associated biological information. Lastly, the increasing size and complexity of macromolecular structure data makes real-time 3D visualization in the Web browser a challenging task.

RESTful application programming interface (API)

Advanced data-intensive approaches are needed to connect 3D structures to the wider context of biological data and scientific literature.
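As a rough sketch of what programmatic access to the SIFTS mappings mentioned above might look like, the following Python fragment builds a request URL and flattens a JSON reply into UniProt segments. The paper only gives the entry point pdbe.org/api; the base URL, the `mappings/uniprot/<id>` path and the JSON layout used here are assumptions that should be checked against the API documentation before use, and all function names are hypothetical.

```python
import json
from typing import Dict, List, Tuple
from urllib.request import urlopen

# ASSUMPTION: base URL and endpoint path are illustrative, not taken from the paper.
API_BASE = "https://www.ebi.ac.uk/pdbe/api"


def mappings_url(pdb_id: str) -> str:
    """Build the (assumed) SIFTS UniProt-mapping URL for a PDB entry."""
    return f"{API_BASE}/mappings/uniprot/{pdb_id.lower()}"


def uniprot_segments(payload: Dict, pdb_id: str) -> List[Tuple[str, str, int, int]]:
    """Flatten a payload into (accession, chain, unp_start, unp_end) tuples.

    ASSUMPTION: the payload is keyed by lower-case PDB id, then "UniProt",
    then accession, each holding a "mappings" list with "chain_id",
    "unp_start" and "unp_end" fields.
    """
    entry = payload.get(pdb_id.lower(), {})
    segments = []
    for accession, record in entry.get("UniProt", {}).items():
        for mapping in record.get("mappings", []):
            segments.append((accession, mapping["chain_id"],
                             mapping["unp_start"], mapping["unp_end"]))
    return segments


def fetch_uniprot_segments(pdb_id: str, timeout: float = 10.0):
    """Download and flatten the mappings for one entry (requires network access)."""
    with urlopen(mappings_url(pdb_id), timeout=timeout) as response:
        payload = json.load(response)
    return uniprot_segments(payload, pdb_id)
```

Keeping the parsing step separate from the network call means the flattening logic can be exercised on a saved response without contacting the server.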
Many research groups develop bespoke protocols to retrieve and collate 3D structure data with other information, such as cross-links to other biological resources described above. This information often resides in multiple files, each in a different format. As described previously (2), in order to facilitate programmatic data retrieval, PDBe has developed a public RESTful (Representational State Transfer) API (pdbe.org/api), which underpins the production workflows at PDBe and provides a simple, reliable, and lightweight mechanism to query macromolecular structure data, select entries or molecules of interest, and access targeted information about them. The PDBe REST API is organised into modules pertaining to the core archive, chemistry, SIFTS, validation information and assembly information from PDBePISA (21). In the past two years, 18 additional REST call end-points were added, providing a more complete coverage of underlying macromolecular structure data and metadata, while maintaining backwards compatibility. The PDBe REST API has been integrated into various external tools and services, such as the Volume Slicer (22) for Electron Microscopy Data Bank (EMDB; (23)) entries, LiteMol (Sehnal et al. in press), Jmol/JSmol (24) and Coot (25) to display SIFTS mappings and/or validation information, and Jalview (26) to directly query PDB data from the viewer.

Figure 1. Screenshot from PDB Web pages for PDB entry 4IGK (structure of human BRCA1 BRCT in complex with ATRIP peptide, pdbe.org/4igk), showing SIFTS annotations from GO, Pfam, InterPro and CATH.

CoordinateServer

Continuing advances in structure determination techniques lead to increased size and complexity of structures available from the PDB, making efficient data delivery a challenging task.
Many use-cases, however, may only require a portion of the data for a PDB entry, such as a ligand and its immediate environment. To address this challenge, PDBe, in collaboration with the Central European Institute of Technology (CEITEC, https://www.ceitec.eu/), has developed the CoordinateServer, capable of dynamically extracting and transmitting subsets of atomic coordinates for a given structure, thereby significantly reducing the network transfer size. The server can also perform common tasks, such as assembly generation, finding backbone, sidechain or ligand atoms and finding atomic coordinates of residues within a given radius from the ligand, including residues from symmetry-related molecules. The server is accessible online at www.ebi.ac.uk/pdbe/coordinates/.

DensityServer

Data delivery challenges are even more pronounced for experimental data. Electric potential maps for models derived by electron cryo-microscopy, for instance, may be several gigabytes in size. The DensityServer, also developed in partnership with CEITEC, can dynamically extract and transmit portions of experimental maps. Moreover, experimental maps can be requested as a full resolution subset (e.g., around a binding site) or as a complete map at a dynamically down-sampled resolution, enabling near instantaneous data transfer and rendering in both cases. The DensityServer is accessible online at www.ebi.ac.uk/pdbe/densities/.

The CoordinateServer can return data encoded in the PDBx/mmCIF (27) format, thus making it compatible with the data standards developed to represent structural biology information. In addition, both servers also support data compression using the new BinaryCIF format (Sehnal et al. in press), which uses standard PDBx/mmCIF dictionary definitions and can store macromolecular models, experimental maps, added annotations and other data.
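The paper does not describe how the compression works, so the following is background only. Column-oriented binary formats often shrink regular integer columns (for example, consecutive residue or atom numbers) by combining delta encoding with run-length encoding. The sketch below shows these two generic techniques; it is not the BinaryCIF specification, and every function name is hypothetical.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original values by accumulating the differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def run_length_encode(values: List[int]) -> List[int]:
    """Collapse runs into a flat [value, count, value, count, ...] list."""
    encoded: List[int] = []
    for v in values:
        if encoded and encoded[-2] == v:
            encoded[-1] += 1          # extend the current run
        else:
            encoded.extend([v, 1])    # start a new run
    return encoded


def run_length_decode(encoded: List[int]) -> List[int]:
    """Expand [value, count, ...] pairs back into the full list."""
    out: List[int] = []
    for value, count in zip(encoded[0::2], encoded[1::2]):
        out.extend([value] * count)
    return out


def compress(values: List[int]) -> List[int]:
    """Delta encoding followed by run-length encoding."""
    return run_length_encode(delta_encode(values))


def decompress(encoded: List[int]) -> List[int]:
    """Inverse of compress()."""
    return delta_decode(run_length_decode(encoded))
```

With these definitions, `compress(list(range(1, 1001)))` reduces a thousand consecutive numbers to the two-element list `[1, 1000]`, and `decompress` restores the original column exactly.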
The BinaryCIF format thus not only provides a uniform data storage mechanism but also further reduces the size of transmitted data (Sehnal et al. in press), enabling even structures of large viral particles to be transferred rapidly to the user.

Reusability: Web components

Many biological data resources, including PDBe, are engaged in similar tasks in order to provide broadly similar features, such as displaying a carousel of images, formatting a search result, visualising sequences and annotations and viewing molecules in 3D (28). By utilizing Web standards such as Web components, the corresponding data and functionality can be made instantly and easily portable, thus encouraging reuse. These Web components can be shared, either to fetch third party data into a service, or to reuse the functionality to manipulate data from another resource, thereby saving development time. Furthermore, Web components ensure consistent visualization of the same data across different services, resulting in a better user experience.

Web components are reusable and customizable JavaScript-based widgets that conform to Web standards and can be freely used on Web pages in all modern Web browsers with minimal programming effort. They remove the need for technical know-how to develop interactive visualizations of data, and data can be shared across components and services without the need for replication. PDBe has developed a library of Web components encapsulating various features. Most of them have been included in the PDBe Web site and are freely re-usable by any other Web resource by including simple custom HTML elements. The library currently contains 10 components, with more actively being developed.
These include:

• LiteMol 3D Viewer: Integral to understanding macromolecular structure data and elucidating function is the ability to view these data in 3D. With the advent of WebGL and HTML5, a number of online tools were developed for 3D visualization of biological molecules (29,30). The LiteMol 3D Viewer (https://litemol.org, (Sehnal et al. in press)), developed in collaboration with CEITEC, is a new WebGL-based viewer with a low memory footprint that works in all major browsers without any additional plugins and is therefore also usable on tablets and mobile devices. Based on the requested visualization, LiteMol automatically queries the Coordinate and Density servers to fetch relevant atomic coordinates or portions of electron density or electric potential maps, respectively. It accepts as input PDBx/mmCIF as well as the BinaryCIF format described above. The viewer can generate interactive visualisations of 3D coordinate data with standard representations, as well as overlaid experimental data and annotations such as sequence or structure annotations and quality assessment information from wwPDB validation reports, which it dynamically queries from the PDBe REST API.
• PDB residue interactions: Contributed by Melis Kayikci at the MRC Laboratory of Molecular Biology (MRC-LMB, http://www2.mrc-lmb.cam.ac.uk/), Cambridge, UK, this component displays, in interactive graphical form, the atomic contacts between each of the secondary structure elements in a protein. It links directly to more detailed views on the Rajini website (http://www.mrc-lmb.cam.ac.uk/rajini). The width of the connection between each of the secondary structure elements is proportional to the number of interatomic contacts in the interface. Hovering over these connections will display the exact number of atomic contacts in that particular interaction.
• Links to wwPDB partners: A simple component which shows links to the three wwPDB sites providing PDB data: PDBe, PDBj and RCSB PDB.
• PDB REDO: The PDB REDO component shows the change in geometric quality and in fit to experimental data between the original PDB entry and the automatically re-refined model available from the PDB REDO resource (31). The geometric quality score combines evaluations of the Ramachandran statistics, side-chain rotamericity and atomic packing.
• PDB UniProt Viewer (UniPDB): Provides a summary of PDB entries containing a sequence mapped to a particular UniProt code and displays which portion of the whole UniProt sequence is present in the PDB entry (32). The display also highlights any differences between these sequences due to, for example, engineered mutations or expression tags.
• Experimental data: Dynamically searches for and displays information about unprocessed experimental data related to an entry if it is found in collaborating archives (see below).
• PDB Prints: PDB Prints is a collection of logos in a specific order providing essential information about an entry at a glance, such as citation, taxonomy, sample production technique, experimental method and protein, nucleic acid and heterogen content (33). These logos link to detailed information about each category on PDBe Web pages.
• PDB 3D complex: This component gives a brief summary of the symmetry of the quaternary structure. The component also displays a confidence measure that estimates the probability of a particular quaternary structure being a biologically relevant assembly (34) and links to the PiQsi pages for further details.

We have previously described the Sequence Feature View and the Topology Viewer (2), providing interactive linear representation of protein sequences and 2D representation of the secondary structure elements for protein chains in a PDB entry, respectively, together with sequence, structure and validation annotations.
Both of these viewers have now been converted into Web components. On PDBe molecule view pages, the sequence viewer, the topology viewer and LiteMol 3D viewer components work in a synchronized and interactive fashion. Selecting a residue or a secondary structure element in one of them highlights or focuses the view on the same element in the other two viewers (Figure 2).

All of the above Web components invoke the PDBe REST API, thus ensuring consistent data across the entire suite. The Web components are created using AngularJS (https://angularjs.org/), Polymer (https://www.polymer-project.org/) and D3.js (https://d3js.org/), and simple instructions on how to download and reuse them are provided at www.ebi.ac.uk/pdbe/pdb-component-library. We have also incorporated most of our Web components into BioJS (35), a standard JavaScript library of over 100 open source life science related Web components.

PDB entry pages

The PDB entry pages serve as the main mechanism for disseminating information about PDB entries, including core PDB data, value added information and cross-references to other resources. As described previously (2), available information is arranged into six main topics, each represented by a separate Web page within the entry: summary, citations, structure analysis, function and biology, ligands and environments, and experiments and validation. A number of changes have taken place since 2016, including incorporation of the LiteMol 3D viewer capable of displaying electron density for structures determined using X-ray crystallography and electric potential maps for structures determined using electron cryo-microscopy, as shown in Figure 3.

Figure 2.
Screenshot of interactive sequence (1D), topology (2D) and 3D structure components for the catalytic subunit of cAMP-dependent protein kinase (PDB entry 1ATP, pdbe.org/1atp).

Figure 3. 3D interactive visualisations of two instances of muramic acid in PDB structures of comparable resolution using LiteMol 3D viewer. (A) Data for entry 1LOD (pdbe.org/1lod) at 2.05 Å resolution. (B) Data for entry 5M1A (pdbe.org/5m1a) at 2.0 Å resolution. In both panels, electron density shown in blue mesh is where the experimental data and model agree (so-called 2mFo-DFc map, plotted at contour level of 1.5σ, where σ is the standard deviation of the map), while electron density expected from the model and not present in the experimental data is shown in red mesh (negative values in mFo-DFc map, plotted at –3σ contour level), and electron density unexplained by the model is shown in green mesh (positive values in mFo-DFc map, plotted at +3σ contour).

For structures solved by multiple experimental techniques, representation of the experimental information has been improved with validation information and experimental setup described for each of the employed methods in a dedicated tab. In particular, for structures where associated small-angle scattering (SAS) data is available at the Small-Angle Scattering Biological Data Bank (SASBDB, sasbdb.org, (36)), information is retrieved directly from SASBDB via its API. The PDBe page for such entries shows key parameters derived from the scattering profile, such as the weight and oligomerisation state of the studied molecular system, and provides basic information on the sample and sample conditions, detector and radiation source.
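The contour levels quoted in the Figure 3 caption are given in units of σ, the standard deviation of the map. A minimal sketch of how such levels translate into absolute map values, and how a difference-map grid point would be coloured under the ±3σ convention of the caption, is given below. It assumes contour levels are measured from the map mean, and the function names are hypothetical; it does not reproduce how LiteMol computes its surfaces.

```python
from statistics import fmean, pstdev
from typing import Sequence


def sigma_to_absolute(grid_values: Sequence[float], n_sigma: float) -> float:
    """Convert a contour level in sigma units to an absolute map value.

    ASSUMPTION: the level is measured from the map mean, so the result is
    mean + n_sigma * (population standard deviation of the grid values).
    """
    mean = fmean(grid_values)
    return mean + n_sigma * pstdev(grid_values, mean)


def classify_difference_point(value: float, sigma: float,
                              cutoff: float = 3.0) -> str:
    """Colour a grid point of an mFo-DFc difference map as in the caption:
    green at or above +cutoff*sigma (density unexplained by the model),
    red at or below -cutoff*sigma (model density absent from the data)."""
    if value >= cutoff * sigma:
        return "green"
    if value <= -cutoff * sigma:
        return "red"
    return "hidden"
```

Calling `sigma_to_absolute(values, 1.5)` on the grid values of a 2mFo-DFc map gives the absolute threshold corresponding to the 1.5σ level used for the blue mesh.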
PDBe entry pages have a panel that includes a navigation menu with quick links between the sub-pages of the entry and download links to all available files for an entry. It also includes context-dependent information ‘scent trail’ features in the form of Web components. For example, on the experiments and validation page, if unprocessed experimental data is available in collaborating archives, the Experimental Data Web component provides a corresponding summary and links. Currently, the widget searches three external resources for raw experimental data for an entry: EMPIAR (37), SBGrid Data Bank (38) and Integrated Resource for Reproducibility in Macromolecular Crystallography (IRRMC) (39). Examples of this are shown in Figure 4. On the structure analysis page, links to perform structure or sequence similarity searches are presented. While viewing a particular ligand in the ligands and environments page, the ‘scent trail’ offers direct links to search for either similar ligands or sub-structures in ChEMBL (40) or binding site details for the ligand environment in PDBeMotif (41).

Figure 4. Examples showing summary and links to experimental data automatically fetched for (A) PDB entry 3J7L (pdbe.org/3j7l) from EMPIAR (B) PDB entry 5TOK (pdbe.org/5tok) from SBGrid Data Bank and (C) PDB entry 4WEQ (pdbe.org/4weq) from IRRMC.

FUTURE DEVELOPMENTS

We are currently working on an advanced search feature, which will allow users to query the available data by specific criteria, such as the presence of protein sequence or structural motifs, and will allow combining these criteria with logical operators.
We are incorporating small molecule and binding site data into our Web pages and query mechanism, allowing users to search for structures based on binding site characteristics, as well as providing details of interactions between a ligand and its binding site on PDBe Web pages. We are also in the process of exposing ORCID (https://orcid.org/) persistent digital identifiers for entry authors, and allowing users to sign in with their ORCID iDs to claim previously released entries that have no ORCID information. For the LiteMol suite, we are developing a mechanism by which users can add, save and share annotations directly in the 3D viewer. We are also continuing to develop other features of the PDBe Web site as reusable Web components, and adding more methods to the API to provide access to more data in the PDB archive files, such as richer electron cryo-microscopy metadata available in version 5 of PDBx/mmCIF.

ACKNOWLEDGEMENTS

We would like to thank all collaborators and partners at the EMBL-EBI, EMBL, wwPDB, EMDB, CCP4, SASBDB, CCPN, CCDC, CEITEC, MRC-LMB and other collaborative efforts, as well as the structural biology and bioinformatics community.

FUNDING

Wellcome Trust [104948]; UK Biotechnology and Biological Sciences Research Council [BB/M011674/1, BB/N019172/1, BB/M020347/1]; European Union [284209]; European Molecular Biology Laboratory (EMBL). Funding for open access charge: EMBL.

Conflict of interest statement. None declared.

REFERENCES

1. Bernstein,F.C., Koetzle,T.F., Williams,G.J., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) The Protein Data Bank. Eur. J. Biochem., 80, 319–324.
2. Velankar,S., Van Ginkel,G., Alhroub,Y., Battle,G.M., Berrisford,J.M., Conroy,M.J., Dana,J.M., Gore,S.P., Gutmanas,A., Haslam,P. et al. (2016) PDBe: Improved accessibility of macromolecular structure data from PDB and EMDB. Nucleic Acids Res., 44, D385–D395.
3. Berman,H., Henrick,K. and Nakamura,H.
(2003) Announcing the worldwide Protein Data Bank. Nat. Struct. Biol., 10, 980–980.
4. Berman,H.M. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
5. Ulrich,E.L., Akutsu,H., Doreleijers,J.F., Harano,Y., Ioannidis,Y.E., Lin,J., Livny,M., Mading,S., Maziuk,D., Miller,Z. et al. (2008) BioMagResBank. Nucleic Acids Res., 36, D402–D408.
6. Kinjo,A.R., Suzuki,H., Yamashita,R., Ikegawa,Y., Kudou,T., Igarashi,R., Kengaku,Y., Cho,H., Standley,D.M., Nakagawa,A. et al. (2012) Protein Data Bank Japan (PDBj): Maintaining a structural data archive and resource description framework format. Nucleic Acids Res., 40, D453–D460.
7. Young,J.Y., Westbrook,J.D., Feng,Z., Sala,R., Peisach,E., Oldfield,T.J., Sen,S., Gutmanas,A., Armstrong,D.R., Berrisford,J.M. et al. (2017) OneDep: unified wwPDB system for deposition, biocuration, and validation of macromolecular structures in the PDB archive. Structure, 25, 536–545.
8. Bateman,A., Martin,M.J., O’Donovan,C., Magrane,M., Alpi,E., Antunes,R., Bely,B., Bingley,M., Bonilla,C., Britto,R. et al. (2017) UniProt: The universal protein knowledgebase. Nucleic Acids Res., 45, D158–D169.
9. Velankar,S., Dana,J.M., Jacobsen,J., van Ginkel,G., Gane,P.J., Luo,J., Oldfield,T.J., O’Donovan,C., Martin,M.-J. and Kleywegt,G.J. (2013) SIFTS: Structure Integration with Function, Taxonomy and Sequences resource. Nucleic Acids Res., 41, D483–D489.
10. Fleischmann,A., Darsow,M., Degtyarenko,K., Fleischmann,W., Boyce,S., Axelsen,K.B., Bairoch,A., Schomburg,D., Tipton,K.F. and Apweiler,R. (2004) IntEnz, the integrated relational enzyme database. Nucleic Acids Res., 32, D434–D437.
11. Blake,J.A., Christie,K.R., Dolan,M.E., Drabkin,H.J., Hill,D.P., Ni,L., Sitnikov,D., Burgess,S., Buza,T., Gresham,C. et al. (2015) Gene ontology consortium: Going forward.
Nucleic Acids Res., 43, D1049–D1056.
12. Finn,R.D., Coggill,P., Eberhardt,R.Y., Eddy,S.R., Mistry,J., Mitchell,A.L., Potter,S.C., Punta,M., Qureshi,M., Sangrador-Vegas,A. et al. (2016) The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res., 44, D279–D285.
13. Finn,R.D., Attwood,T.K., Babbitt,P.C., Bateman,A., Bork,P., Bridge,A.J., Chang,H.Y., Dosztanyi,Z., El-Gebali,S., Fraser,M. et al. (2017) InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res., 45, D190–D199.
14. Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536–540.
15. Sillitoe,I., Lewis,T.E., Cuff,A., Das,S., Ashford,P., Dawson,N.L., Furnham,N., Laskowski,R.A., Lee,D., Lees,J.G. et al. (2015) CATH: Comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res., 43, D376–D381.
16. Federhen,S. (2012) The NCBI Taxonomy database. Nucleic Acids Res., 40, D136–D143.
17. Tress,M.L., Abascal,F. and Valencia,A. (2017) Alternative splicing may not be the key to proteome complexity. Trends Biochem. Sci., 42, 98–110.
18. Suzek,B.E., Wang,Y., Huang,H., McGarvey,P.B., Wu,C.H. and UniProt Consortium (2015) UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics, 31, 926–932.
19. Heller,S., McNaught,A., Stein,S., Tchekhovskoi,D. and Pletnev,I. (2013) InChI: the worldwide chemical structure identifier standard. J. Cheminform., 5, 7.
20. Law,V., Knox,C., Djoumbou,Y., Jewison,T., Guo,A.C., Liu,Y., Maciejewski,A., Arndt,D., Wilson,M., Neveu,V. et al. (2014) DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res., 42, D1091–D1097.
21. Krissinel,E. and Henrick,K. (2007) Inference of macromolecular assemblies from crystalline state. J. Mol. Biol., 372, 774–797.
22.
Salavert-Torres,J., Iudin,A., Lagerstedt,I., Sanz-García,E., Kleywegt,G.J. and Patwardhan,A. (2016) Web-based volume slicer for 3D electron-microscopy data from EMDB. J. Struct. Biol., 194, 164–170.
23. Tagari,M., Newman,R., Chagoyen,M., Carazo,J.M. and Henrick,K. (2002) New electron microscopy database and deposition system. Trends Biochem. Sci., 27, 589.
24. Hanson,R.M. (2010) Jmol-a paradigm shift in crystallographic visualization. J. Appl. Crystallogr., 43, 1250–1260.
25. Emsley,P., Lohkamp,B., Scott,W.G. and Cowtan,K. (2010) Features and development of Coot. Acta Crystallogr. Sect. D Biol. Crystallogr., 66, 486–501.
26. Waterhouse,A.M., Procter,J.B., Martin,D.M.A., Clamp,M. and Barton,G.J. (2009) Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics, 25, 1189–1191.
27. Bourne,P.E., Berman,H.M., McMahon,B., Watenpaugh,K.D., Westbrook,J.D. and Fitzgerald,P.M.D. (1997) Macromolecular crystallographic information file. Methods Enzymol., 277, 571–590.
28. Abriata,L.A. (2017) Structural database resources for biological macromolecules. Brief. Bioinform., 18, 659–669.
29. Yuan,S., Chan,H.C.S. and Hu,Z. (2017) Implementing WebGL and HTML5 in macromolecular visualization and modern computer-aided drug design. Trends Biotechnol., 35, 559–571.
30. Abriata,L.A. (2017) Web apps come of age for molecular sciences. Informatics, 4, 28.
31. Joosten,R.P., Salzemann,J., Bloch,V., Stockinger,H., Berglund,A.C., Blanchet,C., Bongcam-Rudloff,E., Combet,C., Da Costa,A.L., Deleage,G. et al. (2009) PDB REDO: automated re-refinement of X-ray structure models in the PDB. J. Appl. Crystallogr., 42, 376–384.
32. Velankar,S., Alhroub,Y., Best,C., Caboche,S., Conroy,M.J., Dana,J.M., Fernandez Montecelo,M.A., van Ginkel,G., Golovin,A., Gore,S.P. et al. (2012) PDBe: Protein Data Bank in Europe. Nucleic Acids Res., 40, D445–D452.
33. Velankar,S. and Kleywegt,G.J. (2011) The Protein Data Bank in Europe (PDBe): bringing structure to biology.
Acta Crystallogr. D. Biol. Crystallogr., 67, 324–330.
34. Levy,E.D. (2007) PiQSi: Protein Quaternary Structure Investigation. Structure, 15, 1364–1367.
35. Gómez,J., García,L.J., Salazar,G.A., Villaveces,J., Gore,S., García,A., Martín,M.J., Launay,G., Alcántara,R., Del-Toro,N. et al. (2013) BioJS: an open source JavaScript framework for biological data visualization. Bioinformatics, 29, 1103–1104.
36. Valentini,E., Kikhney,A.G., Previtali,G., Jeffries,C.M. and Svergun,D.I. (2015) SASBDB, a repository for biological small-angle scattering data. Nucleic Acids Res., 43, D357–D363.
37. Iudin,A., Korir,P.K., Salavert-Torres,J., Kleywegt,G.J. and Patwardhan,A. (2016) EMPIAR: a public archive for raw electron microscopy image data. Nat. Methods, 13, 387–388.
38. Morin,A., Eisenbraun,B., Key,J., Sanschagrin,P.C., Timony,M.A., Ottaviano,M. and Sliz,P. (2013) Collaboration gets the most out of software. Elife, 2, e01456.
39. Grabowski,M., Langner,K.M., Cymborowski,M., Porebski,P.J., Sroka,P., Zheng,H., Cooper,D.R., Zimmerman,M.D., Elsliger,M.A., Burley,S.K. et al. (2016) A public database of macromolecular diffraction experiments. Acta Crystallogr. Sect. D Struct. Biol., 72, 1181–1193.
40. Bento,A.P., Gaulton,A., Hersey,A., Bellis,L.J., Chambers,J., Davies,M., Krüger,F.A., Light,Y., Mak,L., McGlinchey,S. et al. (2014) The ChEMBL bioactivity database: an update. Nucleic Acids Res., 42, D1083–D1090.
41. Golovin,A. and Henrick,K. (2008) MSDmotif: exploring protein sites and motifs. BMC Bioinformatics, 9, 312.