Biophysical Reviews (2018) 10:391-410
https://doi.org/10.1007/s12551-017-0376-1

Hierarchical design of artificial proteins and complexes toward synthetic structural biology

Ryoichi Arai 1,2,3,4

Received: 25 October 2017 / Accepted: 23 November 2017 / Published online: 14 December 2017
© International Union for Pure and Applied Biophysics (IUPAB) and Springer-Verlag GmbH Germany, part of Springer Nature 2017

Abstract
In multiscale structural biology, synthetic approaches are important to demonstrate the biophysical principles and mechanisms underlying the structure, function, and action of bio-nanomachines. A central goal of "synthetic structural biology" is the design and construction of artificial proteins and protein complexes as desired. In this paper, I review recent remarkable progress in an array of approaches for the hierarchical design of artificial proteins and complexes that signpost the path toward synthetic structural biology as an emerging interdisciplinary field. Topics covered include combinatorial and protein-engineering approaches for the directed evolution of artificial binding proteins and membrane proteins; the binary code strategy for structural and functional de novo proteins; the protein nanobuilding block strategy for constructing nano-architectures; protein-metal-organic frameworks for 3D protein complex crystals; and rational and computational approaches for the design and creation of artificial proteins and complexes, novel protein folds, ideal/optimized protein structures, novel binding proteins for targeted therapeutics, and self-assembling nanomaterials. Protein designers and engineers look toward a bright future in synthetic structural biology for the next generation of biophysics and biotechnology.
Keywords Artificial protein and complex · Combinatorial library · Computational design · Directed evolution · Hierarchical design · Protein engineering

This article is part of a Special Issue on 'Biomolecules to Bio-nanomachines - Fumio Arisaka 70th Birthday' edited by Damien Hall, Junichi Takagi and Haruki Nakamura.

Ryoichi Arai
rarai@shinshu-u.ac.jp

1 Department of Applied Biology, Faculty of Textile Science and Technology, Shinshu University, Ueda, Nagano 386-8567, Japan
2 Department of Supramolecular Complexes, Research Center for Fungal and Microbial Dynamism, Shinshu University, Minamiminowa, Nagano 399-4598, Japan
3 Institute for Biomedical Sciences, Interdisciplinary Cluster for Cutting Edge Research, Shinshu University, Matsumoto, Nagano 390-8621, Japan
4 Division of Structural and Synthetic Biology, RIKEN Center for Life Science Technologies, Tsurumi, Yokohama, Kanagawa 230-0045, Japan

Introduction

Living organisms are maintained by self-assembling biomolecules, such as proteins, nucleic acids, sugars, and lipids. Chemical reconstitution of living matter is an ultimate goal of synthetic and systems biology, and the design of structural and functional artificial proteins and complexes is a key challenge in "synthetic structural biology." Synthetic biology is concerned with the design and construction of new biological parts, devices, and systems, and with the redesign of existing natural biological systems for useful purposes. Synthetic structural biology is a new interdisciplinary field that combines synthetic biology and structural biology. In multiscale structural biology, synthetic approaches are important to demonstrate the biophysical principles and mechanisms underlying the structure, function, and action of bio-nanomachines. In recent years, DNA origami has been developed as a synthetic approach to designing and constructing various supramolecular nanostructures.
DNA base complementarity can be exploited in the rational design of artificial nanostructures with versatile two-dimensional (2D) and three-dimensional (3D) shapes, such as polyhedra (Ke 2014). However, nucleic acids are generally built from only four bases (A, T, G, and C in DNA), and the resulting limits on the number of possible combinations and on chemical diversity may restrict the potential to produce molecules with advanced functions. In contrast, proteins are built from 20 types of amino acids, allowing a greater diversity of chemical properties, and the far larger number of possible sequence combinations expands the possibilities for creating diverse and advanced functions. Natural proteins are the most versatile biomacromolecules and perform complex functional tasks in all organisms because they form intricate, refined tertiary and quaternary structures with versatile chemical properties and functionalities. Protein function is essentially determined by 3D structure, which is organized into four hierarchical levels. The primary structure is the amino acid sequence, and secondary structures are local, regular, hydrogen-bonded conformations: α-helices and β-strands. In globular proteins, α-helices and/or β-sheets and loops fold into tertiary structures, and the self-assembly of multiple folded polypeptide chains produces quaternary structures. These complex and refined 3D structures give proteins their versatile functionalities. A central goal of protein engineering and synthetic structural biology is to design and create novel structural and functional proteins and protein complexes as desired. The design of de novo proteins, which are not derived from natural protein sequences, has in essence been an exploration of uncharted regions of amino acid sequence space.
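To put the contrast between the four-letter nucleic acid alphabet and the 20-letter protein alphabet in numbers: a chain of length n has 4^n possible nucleotide sequences but 20^n possible amino acid sequences. The short Python sketch below computes these counts on a log10 scale, together with the fraction of a 100-residue sequence space that a library could sample. The function names and the library size of 10^13 members are illustrative assumptions, not values taken from this review.

```python
import math


def sequence_space_log10(alphabet_size: int, length: int) -> float:
    """log10 of the number of distinct chains of `length` over an alphabet."""
    return length * math.log10(alphabet_size)


def fraction_sampled_log10(library_size: float, alphabet_size: int, length: int) -> float:
    """log10 of the fraction of sequence space covered by a library,
    assuming every library member is a distinct sequence."""
    return math.log10(library_size) - sequence_space_log10(alphabet_size, length)


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(
            f"length {n:3d}: 4^n = 10^{sequence_space_log10(4, n):.1f}, "
            f"20^n = 10^{sequence_space_log10(20, n):.1f}"
        )
    # Hypothetical library of 1e13 members (an assumed order of magnitude).
    print(f"fraction of 100-mer space sampled: 10^{fraction_sampled_log10(1e13, 20, 100):.1f}")
```

For a 100-residue chain, 20^100 is roughly 10^130, so even a very large library samples a vanishingly small fraction of protein sequence space. The count is purely combinatorial and says nothing about which sequences fold.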
This exploration can be challenging, both because sequence space is vast and because many cooperative and long-range interactions create a large gap between primary structures and the resulting tertiary and quaternary structures. Research into de novo protein design has progressed toward the construction of novel proteins, largely through combinatorial approaches (Keefe and Szostak 2001; Urvoas et al. 2012), rational and computational design approaches (Dahiyat and Mayo 1997; Kuhlman et al. 2003; Koga et al. 2012; Huang et al. 2016), and semirational approaches that combine elements of both (Kamtekar et al. 1993; Hecht et al. 2004; Urvoas et al. 2012). Recent advances in science and technology, such as the substantial growth in the number of 3D protein structures deposited in the Protein Data Bank (PDB), rapid advances in computer hardware and software, and the falling cost of DNA synthesis for artificial genes, have driven further development in the design and construction of artificial proteins and complexes. In this review, I describe recent progress in various approaches for designing and constructing artificial proteins and protein complexes. From the viewpoint of building blocks, I focus on the hierarchical design of artificial proteins and protein complexes, from primary to quaternary structures, using a range of combinatorial, protein-engineering, rational, and computational approaches for generating protein structures and functions. As shown in Fig. 1, the design of artificial proteins and complexes can be organized along two axes: the horizontal axis is the hierarchy of protein structures, from primary to quaternary and supra-quaternary structures, and the vertical axis is the design approach, either combinatorial and protein-engineering (Fig. 1a-g) or rational and computational (Fig. 1h-n).
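One of the semirational approaches cited above, the binary code strategy (Kamtekar et al. 1993; Hecht et al. 2004), specifies only the pattern of polar and nonpolar residues in each secondary structure unit and leaves the exact residue at each position open. The Python sketch below illustrates that idea under stated assumptions: patterns are written as 'p' (polar) and 'n' (nonpolar) strings, the polar (EDKNQH) and nonpolar (FLIMV) residue sets are hypothetical choices, and a simplified hydrophobic-moment score treats each residue as rotated about 100° from the previous one on an α-helix (3.6 residues per turn) or 180° on a β-strand. The specific patterns are illustrative, not taken from a particular published library.

```python
import math
import random

# Hypothetical residue sets for this sketch; real binary-code libraries
# define their own polar and nonpolar choices.
POLAR = "EDKNQH"
NONPOLAR = "FLIMV"

# Illustrative binary patterns: 'p' = polar, 'n' = nonpolar.
HELIX_PATTERN = "pnppnnppnppnnp"   # nonpolar positions spaced 3-4 residues apart
STRAND_PATTERN = "pnpnpnpnpnpnpn"  # strictly alternating


def sample_sequence(pattern: str, rng: random.Random) -> str:
    """Fill each pattern position with a random residue of the required class."""
    return "".join(rng.choice(POLAR if c == "p" else NONPOLAR) for c in pattern)


def library_size(pattern: str) -> int:
    """Count the distinct sequences that are consistent with a binary pattern."""
    return math.prod(len(POLAR) if c == "p" else len(NONPOLAR) for c in pattern)


def hydrophobic_moment(pattern: str, angle_deg: float) -> float:
    """Per-residue magnitude of the summed hydrophobicity vectors.

    Residue i points at i * angle_deg around the chain axis; nonpolar
    positions contribute +1 and polar positions -1. Values near 1 mean
    the nonpolar positions sit on one face; values near 0 mean they are
    spread around the axis.
    """
    x = y = 0.0
    for i, c in enumerate(pattern):
        h = 1.0 if c == "n" else -1.0
        theta = math.radians(i * angle_deg)
        x += h * math.cos(theta)
        y += h * math.sin(theta)
    return math.hypot(x, y) / len(pattern)


if __name__ == "__main__":
    rng = random.Random(0)
    print("example helix sequence:", sample_sequence(HELIX_PATTERN, rng))
    print("helix library size:", library_size(HELIX_PATTERN))
    for name, pattern in (("helix", HELIX_PATTERN), ("strand", STRAND_PATTERN)):
        print(
            f"{name}: moment at 100 deg = {hydrophobic_moment(pattern, 100):.2f}, "
            f"at 180 deg = {hydrophobic_moment(pattern, 180):.2f}"
        )
```

With these assumptions, the helix pattern scores about 0.6 at 100° and close to 0 at 180°, while the alternating pattern scores 1.0 at 180° and close to 0 at 100°: each pattern places its nonpolar residues on a single face only in its intended secondary structure. The library is still large (6^8 × 5^6 sequences for the 14-residue helix pattern here), but it is confined to a region of sequence space biased toward amphipathic secondary structure.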
Combinatorial and protein-engineering approaches for artificial protein and complex design

Hierarchical design of tertiary structures of artificial proteins

In general, protein structures are characterized by four hierarchical levels, from primary to quaternary structures, and hierarchical approaches are therefore essential for protein design (Bryson et al. 1995). In this section, I briefly review several combinatorial and protein-engineering approaches for designing and constructing artificial proteins from the viewpoint of the hierarchical design of building blocks. Further details and specific topics are described in the following sections.

When amino acid residues are used as the building blocks of primary structures, huge combinatorial libraries of totally random polypeptide sequences are the primitive starting point of protein evolution. Because soluble, folded proteins occur at low frequency among random sequences, fully randomized libraries have to be searched with powerful selection methods (Urvoas et al. 2012). One remarkable result was the isolation of a de novo ATP-binding protein from fully randomized sequences by mRNA display (Fig. 1a) (Keefe and Szostak 2001). Nevertheless, the frequency of folded and functional sequences among completely random sequences is very low compared with current experimental screening power. Several strategies have therefore been proposed to focus the search on limited regions of sequence space expected to be rich in folded structures, using designed patterns of amino acid sequences as building blocks (Urvoas et al. 2012). One of the most fruitful is the binary code strategy, which produces primary-structure libraries for the tertiary structures of de novo proteins from secondary structure units with binary patterns of polar and nonpolar residues (Fig. 1d); various structural and functional de novo proteins with α-helices and/or β-sheets have been successfully created in this way (Kamtekar et al. 1993; Hecht et al. 2004; Smith and Hecht 2011). Architectures of protein domains have evolved through the combinatorial assembly and exchange of pre-existing polypeptide modular segments, secondary structure elements, and supersecondary structure motifs, derived from exon shuffling, nonhomologous recombination, or alternative splicing (Fig. …).

[Fig. 1: design of artificial proteins and complexes along two axes, structural hierarchy and design approach; panels not recoverable from the extracted text]

[Figure caption, beginning lost in extraction] … that are systematically explored during docking. c The docking procedure, which is independent of the amino acid sequence of the building blocks, identifies large interfaces with high densities of contacting residues formed by well-anchored regions of the protein structure. The details of such an interface, boxed, are shown in (d). e Amino acid sequences are designed at the new interface to stabilize the modeled configuration and to drive co-assembly of the two components. Reprinted with permission from King et al. (2014); copyright © 2014, NPG. f, g Designed self-assembling nanocages. f A one-component hyperstable icosahedron with a de novo helical bundle (red helices) fused in the center of the face (Hsia et al. 2016). g Two-component megadalton-scale icosahedra (Bale et al. 2016). The two components of each are colored in blue and yellow. Reprinted with permission from Huang et al. (2016); copyright © 2016, NPG. h-k Enveloped protein nanocages (EPNs) comprise cell-derived membrane envelopes containing multiple protein nanocages (Votteler et al. 2016). h Representative cryo-EM images showing extracellular vesicles/EPNs in culture supernatants from 293T cells that expressed EPN-01. i Central slice from a cryo-EM tomographic reconstruction of a released EPN. Two internal protein nanocages are marked with arrowheads. j Isosurface model of the 3D cryo-EM reconstruction from (i).
The EPN membrane is green and individual protein nanocages are gold. k Single-particle cryo-EM reconstruction of the nanocages released from EPNs following detergent treatment. Charge density from the 5.7 Å resolution electron microscopy reconstruction is shown in gray (contoured at 4.5