1 Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
2 Basic Research Program, Science Applications International Corporation (SAIC)-Frederick, Inc., Laboratory of Experimental and Computational Biology, Frederick, Maryland 21702, USA
3 School of Computer Science, Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
Reprint requests to: Ruth Nussinov, NCI-Frederick Building 469, Room 151, Frederick, MD 21702, USA; email: ruthn{at}ncifcrf.gov; fax: (301) 846-5598.
(RECEIVED September 20, 2002; FINAL REVISION December 23, 2002; ACCEPTED February 23, 2003)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.0232903.
| Abstract |
|---|
Keywords: Protein folding; building blocks; protein structure prediction; hierarchical folding; folding complexity
| Introduction |
|---|
Several models have been proposed for protein folding, including (1) the framework model, (2) the nucleation and growth mechanism, (3) the diffusion-collision model, (4) the hydrophobic collapse model, and (5) the hierarchical model (Haspel et al. 2003). In recent years, a folding scheme (Rose 1979; Lesk and Rose 1981) proposed over two decades ago has been increasingly accepted: Protein folding is a hierarchical process (Baldwin and Rose 1999a, b) and the driving force is the hydrophobic effect (Dill 1985, Dill 1990; Dill and Chan 1997). This idea was further expanded by Srinivasan and Rose (1995, 1999). The main considerations are steric effects and conformational entropy, and the driving forces are hydrophobic interactions and hydrogen bond formation. Srinivasan and Rose start with folding small fragments of the protein, sampling random local conformations and scoring them according to how energetically favorable they are. The process repeats itself until the entire protein chain is folded. Limited proteolysis has illustrated that one can cut the protein structure into fragments and reassemble them through fragment complementation to yield the native fold (Taniuchi et al. 1986; Fisher and Taniuchi 1992; Fontana et al. 1997, 1999; Kuhlman et al. 1997; Yang et al. 1998; Polverino de Laureto et al. 1999, 2001). And, conversely, covalently ligating two protein molecules bound in a complex still leads to the same native complex (for review, see Tsai et al. 2001).
Computationally, there are several approaches to predicting protein structures. If the target sequence is similar to sequences whose corresponding structures are available, homology modeling is the best approach to follow. If the similarity is lower, threading is often carried out. If no homologous sequences with corresponding structures are available in the Protein Data Bank (PDB; Bernstein et al. 1977), ab initio simulations are attempted, provided the sequences are short. Otherwise, the sequence may be parsed into short, overlapping segments (say, 8–10 residues in length), and look-up tables of the potential structures of these segments are consulted. These tables are generated through analysis of most available structures in the PDB. Combinations of these structures are evaluated and ranked (Bystroff and Baker 1998; Bystroff et al. 2000).
Here we adopt a strategy that is based on hierarchical protein-folding concepts (Crippen 1978; Rose 1979; Wodak and Janin 1981; Zehfus and Rose 1986; Zehfus 1993; Wu et al. 1995; Panchenko et al. 1996, 1997; Peng and Wu 2000). As previously, the sequence is cut into segments. However, the major difference between the strategy followed here and previous ones is that here the target sequence is cut into longer fragments, each of which is in principle able to fold independently and is assumed, based on a stability scoring function, to be a local minimum. That is, the fragments are not chosen arbitrarily, but are chosen on the basis of their stability properties. Although there are no direct experimental data indicating that these fragments are able to fold independently, there is a fairly good correspondence between the computationally generated fragments and those obtained from limited proteolysis, for proteins for which such experimental data exist (Tsai et al. 2002). The minimal fragment size is 15 residues. The maximum can be any value. The stabilities (and population times) of these fragments differ. Figure 1 describes our scheme. The first step involves cutting the target sequence into building blocks and assigning their conformations. In the second step, the building blocks are assembled combinatorially. In the third step, the structure is refined to finally yield the predicted conformation. A major advantage of such an approach is that, by cutting the target sequence into fragments and assembling them, we may be able to achieve a substantial reduction in computational times. The rationale behind such a scheme is that it follows hierarchical protein-folding pathways: Initially, local fragments fold on themselves, with subsequent stepwise assembly. To be able to adopt such a strategy, we need several elements:
The building block folding model (Tsai et al. 2000, 2001) enables visualizing the protein dynamical folding pathways. The model postulates that protein folding is a hierarchical process and that the basic unit from which a fold is constructed, the hydrophobic folding unit (HFU), is the outcome of a combinatorial assembly of a set of building blocks. The HFUs, in turn, associate to form intramolecular domains. The building block is defined as a highly populated, contiguous fragment in a given protein structure. According to this model, if one cuts out a building block from the protein chain, the most highly populated conformation of the resulting peptide in solution would very likely be similar to that of the building block when it is embedded in the native protein structure, even though it may happen that an alternative conformation is selected in the combinatorial assembly process. The algorithm creates an "anatomy tree" that depicts the organization of the 1-D polypeptide chain in 3-D space and describes the most likely folding pathway(s). Each node in the tree is a one-segment building block. Here, the level of a building block node in the tree will be referred to as the cutting level of that building block. This model can be used as a basis for protein modeling and prediction of local and global protein structures.
| Results |
|---|
… α-helix with a short loop attached to it) to an almost independent protein domain. The most frequent cases were of supersecondary structure elements, like two helices connected by a loop, or two strands and a loop.
The distribution of the building block stability score in the cluster varies: For clusters consisting of building blocks derived from the same protein family, the distribution is narrow. This is expected, as they are very similar structurally and sequentially, and are surrounded by a very similar environment. However, clusters derived from dissimilar proteins have broad distributions of stability scores. The stability scores (described in Materials and Methods) are used in the weighting scheme of the assignment algorithm.
The building block assignment to a target sequence
Table 2 presents some results of the building block assignment to target sequences. Figure 5A–F illustrates several examples taken from Table 2 of the assigned building blocks superimposed on the target protein. The target proteins either do not appear in the original database, or were removed from it before the execution of the program, so that they will not bias the results. Further, the database used for the assignment contains mainly building blocks from more distant proteins. Improvements in both the algorithm and the weighting function still need to be carried out; nevertheless, in many cases the algorithm finds good building block assignments even when the sequence homology is rather weak. The percent sequence identity and similarity between the assigned building block and the target sequence are given in the table and in the figures. As Table 2 and Figure 5A–F show, the target protein classes and sizes vary. Furthermore, the assigned building blocks do not necessarily derive from the same protein class as the target protein. This is consistent with the building block being a stand-alone unit.
… Cα-atom pairs above the allowed matching threshold (here in the range of 3.5–4 Å) are not matched. As seen in Table 2 …
In the current implementation of the assignment algorithm, a shortest path is found from each of the starting vertices (those connected to the "source" vertex) to the "target" vertex. This leads to more than one possible solution, that is, more than one possible building block assignment. Here we provide the "best" assignment result, that is, the result that visually looks best. We note, however, that in all cases, the weighting function ranked these results among the top three. We further note that our assignment algorithm is fully automated. The FlexProt algorithm is used only for testing and displaying the results graphically.
| Discussion |
|---|
To be able to apply such a scheme, we need two components. The first is a sequentially nonredundant database of building blocks, each with its associated structure. The second is an efficient assignment algorithm with an appropriate weighting function. The sequentially nonredundant database of building blocks that we have created is based on our building block cutting algorithm.
Most structures in the PDB (Bernstein et al. 1977) have been cut, and the resulting building blocks have been clustered. Our sequence length-independent energy function does not account for electrostatics, and is based on the native structures; nevertheless, it is in fairly good agreement with experiment (Tsai et al. 2002). The comparisons have been carried out for all proteins for which there are available limited proteolysis data, using pepsin, subtilisin, trypsin, thermolysin, and proteinase K.
A detailed description of the clustering procedure and of the clusters is presented in Materials and Methods and also in Haspel et al. (2003). Inspection of the clusters illustrates that, although some clusters contain only building blocks from the same family, others consist of building blocks from dissimilar proteins. In such cases, although the structures of the building blocks are similar, they differ in their sequences and in their spatial interactions. This observation is consistent with the assumption that the building blocks are likely to be stand-alone units. This is particularly the case for larger fragments with a substantial hydrophobic core. Smaller building blocks also have a preferred conformation in solution, which is likely to resemble that observed in the native protein. Nevertheless, neighboring building blocks may lead to a selection of an alternate more favorable conformer.
The second component is a graph theoretic assignment algorithm that, given a protein sequence, finds the "optimal" building block assignment of that sequence. Here we have presented several examples for different proteins. The results achieved so far look promising, although considerable work is still needed to optimize and improve the performance of the algorithm. The next step in such a folding scheme is the combinatorial assembly of the assigned building blocks. Computationally this is the heaviest step. Work on this problem is already in progress in our laboratory (Inbar et al. 2003). Finally, because the target sequence is not completely covered by building blocks, the unassigned parts need to undergo modeling, with subsequent overall minimization.
The advantages inherent in "folding the protein in parts, and part assembly" have been recognized for a number of years (for review, see Hardin et al. 2002). There are two major differences between previous work (Oliva et al. 1997; Bystroff and Baker 1998; Bystroff et al. 2000) and the current work. First, previously the sequence has been cut into a large number of fragments by shifting a window across the sequence and testing a large number of sequence parts. Second, only very short sequence pieces were folded separately. However, fragments having a few (<10) residues usually do not have a prevailing 3-D conformation. Although short building block sequences usually do not have a conformation with a very high population time, nevertheless, these sequences were not chosen arbitrarily. They were selected owing to their being relatively stable with higher population times than other, arbitrarily chosen fragments. Our ability to cut the structures into building block fragments with high population times, consistent with experiment (Tsai et al. 2002) in principle enables us to cut the target sequence into building blocks. Furthermore, we have developed a procedure to identify building blocks that are critical for correct protein folding (Ma et al. 2000; Kumar et al. 2001). These building block fragments are largely buried in the protein core, inserted between sequentially adjacent building blocks. Assigning a critical building block to a given fragment in the target protein may indicate a likely spatial location, further reducing the computational folding complexity.
Conclusions
Binding and folding are similar processes, with similar underlying mechanisms. Experimentally, it has been shown that, in general, cutting the proteins and reassembling them yield similar conformations as when they are chain-connected. Similarly, linking a two-molecule complex yields a structure similar to a two-molecule association. These observations are consistent with a hierarchical protein-folding scheme. If we accept that protein folding is hierarchical, and that the driving force is the hydrophobic effect, we can devise approaches that make use of such a hierarchical concept. In such a scheme, a rational first step is to initially cut the target sequence into fragments that are likely to fold on themselves and subsequently to combinatorially assemble them.
Toward such an approach to reduce the computational complexity of protein folding, here we present two essential components: a library of nonredundant sequences of building blocks, clustered by their structures, and an algorithm assigning them to a target sequence. We further present some of the results we have obtained. These include proteins from different classes, with the building blocks that are not necessarily assigned from the same protein class. Our results are encouraging, indicating that folding by parts and part assembly may contribute to further progress in the protein-folding problem; nevertheless, it clearly needs further optimization, both on the algorithmic side, and in the scoring function. Furthermore, such schemes can combine with experiment, such as limited proteolysis (Fontana et al. 1997; Polverino de Laureto et al. 1999; Tsai et al. 2002) and spectroscopy to validate the target sequence cutting.
| Materials and methods |
|---|
[Equation: building block stability scoring function, expressed in terms of Z, H, and I]
where Z stands for compactness, H for hydrophobicity, and I for isolatedness. Each quantity is calculated as the deviation from the averaged value of known protein structures. The average and standard deviation of these quantities were calculated from a nonredundant dataset of 930 representative single chain proteins from the PDB. Terms with superscript 1 were determined with respect to fragment size and those with superscript 2 were determined as a function of the fraction of the fragment size to the whole protein.
The three components are as follows:
"Compactness" is defined as the solvent accessible surface area (ASA; Lee and Richards 1971; Shrake and Rupley 1973) of the fragment, divided by its minimum possible surface area, which is the area of a sphere with volume equal to that of the fragment.
"Degree of isolation" is the ratio of the fragment's nonpolar ASA that was originally buried in the interior of the protein but is exposed to the solvent after cutting, to the ASA of the isolated fragment. This component is a measurement of the extent to which the stand-alone fragment obeys the hydrophobicity rules when it is cut out of its context.
"Hydrophobicity" is the fraction of the buried nonpolar area out of the total nonpolar area of the fragment.
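The three verbal definitions above can be written as simple ratios. The following is a minimal sketch, not the authors' implementation: the function and parameter names are hypothetical, the areas and volume are assumed to have been computed beforehand by some accessible-surface program, and the normalization of each quantity against database averages (the step that produces the final score) is not shown.

```python
import math


def compactness(fragment_asa, fragment_volume):
    """ASA of the fragment divided by its minimum possible surface area.

    The minimum is the area of a sphere with the same volume V,
    which equals (36 * pi)^(1/3) * V^(2/3).
    """
    min_area = (36.0 * math.pi) ** (1.0 / 3.0) * fragment_volume ** (2.0 / 3.0)
    return fragment_asa / min_area


def degree_of_isolation(newly_exposed_nonpolar_asa, isolated_fragment_asa):
    """Nonpolar ASA that was buried in the protein but is exposed after
    cutting, relative to the ASA of the isolated fragment."""
    return newly_exposed_nonpolar_asa / isolated_fragment_asa


def hydrophobicity(buried_nonpolar_area, total_nonpolar_area):
    """Fraction of the fragment's nonpolar area that is buried."""
    return buried_nonpolar_area / total_nonpolar_area
```

By construction, a perfect sphere has a compactness of 1, and less compact fragments give larger values.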
The scoring function is independent of the building block length, and reflects the population time of the building block in solution: the larger the stability, the higher the population time.
The cutting algorithm is iterative. The level of cutting is determined by counting the number of steps that are needed to trace back to the initial, uncut structure. This structure is considered the root node. The cutting stops when no new fragments can be generated.
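The notion of a cutting level can be illustrated with a small tree structure. This is only a sketch under stated assumptions: the `Node` class, its fields, and the residue ranges are hypothetical, and the stability-driven choice of where to cut is not modeled here; the caller supplies the fragment boundaries.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    start: int  # first residue of the fragment (inclusive)
    end: int    # last residue of the fragment (inclusive)
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def cut(self, spans: List[Tuple[int, int]]) -> List["Node"]:
        """Attach the fragments produced by one cutting step as children."""
        new_nodes = [Node(s, e, parent=self) for s, e in spans]
        self.children.extend(new_nodes)
        return new_nodes

    def cutting_level(self) -> int:
        """Number of steps needed to trace back to the uncut root."""
        level, node = 0, self
        while node.parent is not None:
            level += 1
            node = node.parent
        return level


# Hypothetical example: a 120-residue chain cut twice.
root = Node(1, 120)
first, second = root.cut([(1, 60), (55, 120)])
(inner,) = first.cut([(1, 30)])
```

Here `root` is at level 0, `first` and `second` are at level 1, and `inner` is at level 2.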
The building block clusters
The building block database was created using the data collected by Tsai et al. (2000; available at http://protein3d.ncifcrf.gov/tsai/).
The clustering algorithm is as follows:
Clustering has two stages:
We have created 24 different databases from four protein classes, all-α, α + β, α/β, and all-β proteins, for six cutting levels. The coordinates were taken from the PDB. Clustering was based on structural similarity. Each of the databases (from each class and level) was clustered separately. A more detailed description of the clustering procedure and analysis of the cluster statistics is given elsewhere (Haspel et al. 2003).
Figure 2 shows examples of multiply-aligned building blocks from two clusters. The multiple alignment has been carried out using MUSTA (Leibowitz et al. 2001a,b).
The nonredundant sequence database
At the end of the clustering procedure, we created a nonredundant sequence database that represents each clustered database. This is done as follows: For each cluster, the sequences of the cluster members are extracted to a FASTA format file (http://fasta.bioch.virginia.edu/fasta/). Within each cluster, the sequences are clustered using BLASTCLUST (http://www.ncbi.nlm.nih.gov/BLAST/). The sequence identity that determines whether two sequences belong to the same cluster is based on a statistical function developed by Abagyan and Batalov (1997). The function estimates the sequence identity and sequence similarity required to guarantee as much as possible structural similarity, depending on the lengths of the sequences. The function is described by a normal distribution with the following parameters:
[Equation: parameters of the normal distribution]
… level for two sequences, where the length of the shortest sequence is 25 amino acids, the threshold will be:
[Equation: resulting threshold]
We used this equation to cluster the sequences at 4σ levels, with over 99% confidence of having the same fold.
Therefore, each structural cluster can be associated with a nonredundant group of sequences with a local structural pattern. All nonredundant sequence groups of all structural clusters are gathered and reclustered using BLASTCLUST. Our goal at this stage is to eliminate all redundancies among clusters caused by similar sequences that fall into different structural clusters. The result of this procedure is a sequentially nonredundant database that represents the structural database by means of sequences. Each item in that database is associated with a specific structural cluster, such that a structural cluster can be represented by more than one sequence.
The building block assignment algorithm
The clusters of building blocks, grouped by their structures and by their sequences, constitute the input to a graph theoretic sequence assignment algorithm. The stages of the algorithm are the following:
Following are the details of the assignment algorithm:
Sequence alignment
For a given target sequence whose building block composition we wish to assign, we carry out a sequence comparison with the database containing the representatives of all building block clusters, using BLAST (Altschul et al. 1990) with default parameters, allowing gaps (note that this is a one-against-all pairwise alignment, and not a multiple alignment).
Construction of the graph
If a sequence similarity above a given threshold is found (that is, a building block in the database is found to match an area of the target protein sequence), this building block is represented as a weighted graph vertex.
The weighting scheme.
The current weighting scheme contains two components: the BLAST match score and the building block stability score. A candidate building block is considered only if the match spans the entire building block (allowing a 10-residue gap at each side) and if the match is at least 15 residues long. The building block's score is -(BLAST score + 3 × stability score). The factor of three is used because the stability score is usually smaller than the BLAST score by a factor of at least 3; it gives the two measurements roughly equal weight in the scoring function. The minus sign leads to negative weights, so that the best path will be the "shortest," that is, the one with the smallest weight. The BLAST score is always positive, and a larger stability score reflects a more stable building block. Using this scheme, the more negative the building block total score, the more "suitable" it is. The weighting function may also include secondary structure prediction. However, its effectiveness may depend on the implementation scheme. Substantial work is still needed in this direction, along with additional potential parameters and their relative weights.
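As a minimal sketch of the weighting rule, the vertex weight and the candidate filter might look as follows. The function names and argument layout are hypothetical. The reading of "a 10-residue gap at each side" as up to 10 unaligned residues at either end of the building block is an assumption, as is measuring the match length on the building block.

```python
def vertex_weight(blast_score, stability_score, stability_factor=3.0):
    """Weight of a candidate building block: -(BLAST + 3 * stability).

    More negative means a more suitable candidate.
    """
    return -(blast_score + stability_factor * stability_score)


def is_candidate(block_length, aligned_from, aligned_to,
                 max_end_gap=10, min_match_length=15):
    """Decide whether a BLAST hit qualifies as a candidate building block.

    aligned_from / aligned_to are 1-based, inclusive positions within the
    building block that are covered by the match (assumed convention).
    """
    left_gap = aligned_from - 1
    right_gap = block_length - aligned_to
    match_length = aligned_to - aligned_from + 1
    return (left_gap <= max_end_gap
            and right_gap <= max_end_gap
            and match_length >= min_match_length)
```

For instance, a BLAST score of 60 and a stability score of 10 contribute equally under the factor of three and give a weight of -90.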
The edges.
In the graph, a directed edge connects two vertices if they are sequentially adjacent, and if they adhere to the rules followed in the generation of the building blocks from the native structures (no more than a 7-residue overlap, and not over 15 residues apart). The edge connecting the vertices is assigned the average weight of the two vertices.
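The edge rule can be sketched as a small predicate that returns the edge weight when two candidates may follow one another. The `Vertex` type and its fields are hypothetical; treating "apart" as the number of uncovered residues between the two blocks is an assumption. The limits are kept as parameters because the source notes elsewhere that the current version restricts overlaps further.

```python
from typing import NamedTuple, Optional


class Vertex(NamedTuple):
    start: int     # first target residue covered (inclusive)
    end: int       # last target residue covered (inclusive)
    weight: float  # vertex weight, more negative is better


def edge_weight(a: Vertex, b: Vertex,
                max_overlap: int = 7, max_gap: int = 15) -> Optional[float]:
    """Return the weight of the directed edge a -> b, or None if no edge."""
    # b must lie further along the sequence than a.
    if b.start <= a.start or b.end <= a.end:
        return None
    separation = b.start - a.end - 1  # > 0: gap, < 0: overlap
    if separation > max_gap or -separation > max_overlap:
        return None
    # The edge carries the average weight of its two vertices.
    return (a.weight + b.weight) / 2.0
```

Because an edge always points toward a block that starts and ends later in the sequence, a graph built this way cannot contain cycles.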
Finding the matches
The next step involves finding the shortest paths of consecutive edges. We add two fictive vertices to the graph: the "source" (starting) vertex and the "target" (ending) vertex. A zero-weight edge connects the source to each vertex that lies within 15 residues after it, and connects each vertex that lies within 15 residues before the target to the target. It is easy to see that the resulting graph is directed and acyclic. For such a graph, the shortest path between any pair of given vertices can be found quickly, in time that grows linearly with the size of the graph. The shortest path algorithm (Cormen et al. 1990) is used for this purpose. A path is a consecutive set of vertices that leads from the starting source vertex to the ending target vertex. Because each of these vertices is a building block, the path represents a possible assignment of building blocks to the sequence. Among the obtained paths, the highest scoring ones are retained. These are assumed to have a higher probability of being a true possible building block assignment for the sequence.
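A shortest path in a directed acyclic graph can be found by relaxing edges in topological order, which also works with the negative weights used here. The sketch below is a generic version of that textbook technique, not the authors' code: the function name, the edge-list input format, and the vertex labels in the example are all hypothetical.

```python
from collections import defaultdict


def dag_shortest_path(edges, source, target):
    """Shortest path in a DAG given as (u, v, weight) triples.

    Returns (total_weight, path) or (None, []) if target is unreachable.
    """
    graph = defaultdict(list)
    indegree = defaultdict(int)
    nodes = {source, target}
    for u, v, w in edges:
        graph[u].append((v, w))
        indegree[v] += 1
        nodes.update((u, v))

    # Topological sort (Kahn's algorithm).
    order = []
    ready = [n for n in nodes if indegree[n] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v, _ in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    # Relax every edge once, following the topological order.
    inf = float("inf")
    dist = {n: inf for n in nodes}
    prev = {}
    dist[source] = 0.0
    for u in order:
        if dist[u] == inf:
            continue
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                prev[v] = u

    if dist[target] == inf:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical graph: S and T are the fictive source and target vertices.
example_edges = [
    ("S", "A", 0.0), ("S", "B", 0.0),
    ("A", "C", -80.0), ("B", "C", -60.0), ("A", "D", -50.0),
    ("C", "T", 0.0), ("D", "T", 0.0),
]
best_weight, best_path = dag_shortest_path(example_edges, "S", "T")
```

Each vertex and each edge is visited a constant number of times, so the running time is proportional to the number of vertices plus the number of edges.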
Figure 3 presents a flowchart of the building block assignment algorithm and Figure 4 presents an illustration of the algorithm.
The advantage of using a graph algorithm instead of a simple assignment scheme is that the algorithm performs the assignment more efficiently. If we denote the number of vertices (candidate building blocks) by V and the number of edges connecting them by E, the algorithm finds the shortest path in time proportional to (V + E). E can be of the order of V², but because this graph is usually sparse, the number of edges is smaller than that. The algorithm performs the sorting on graph vertices (enabled by the fact that the graph does not contain any cycles), and then finds the shortest paths on the sorted graph. Thus, not all possible candidate paths are tested, only those compatible with the sorting. A simple, straightforward assignment process would force us to go over all the possible assignments and therefore would take much longer. Actually, the number of possible valid graph paths is the number of possible combinations of consecutive vertices. The number of possible paths in a graph can be exponential in the number of vertices, but in these cases it can be estimated as V^k, where k is the number of building blocks that match each protein segment. k can be estimated as 3 or 4 in most cases. Thus, the Single Source Shortest Paths algorithm (Cormen et al. 1990) greatly reduces the computational time. Following the sequence assignment, the original conformation of the building block in the known native structure is assigned to the candidate building block. Currently, we focus on all paths with the starting vertex within the first 15 residues, consistent with the building block cutting algorithm (Tsai et al. 2000). There were a few cases, though, in which no building block cover was found for a part of the protein (see, for example, pyridoxine 5'-phosphate synthase in Table 2). In such a case only part of the protein was assigned, provided that it was large enough and contiguous (in this case it was the C-terminal domain, starting from residue 43). This implies allowing short gaps (up to 15 residues) and short overlaps. In the current version we allow an overlap of no more than two residues because the algorithm has a tendency toward creating large overlaps.
| Acknowledgments |
|---|
The content of this publication does not necessarily reflect the view or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organization imply endorsement by the U.S. Government. The publisher or recipient acknowledges right of the U.S. Government to retain a nonexclusive, royalty-free license in and to any copyright covering the article.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403–410.
Baldwin, R.L. and Rose, G.D. 1999a. Is protein folding hierarchic? I. Local structure and peptide folding. Trends Biochem. Sci. 24: 26–33.
Baldwin, R.L. and Rose, G.D. 1999b. Is protein folding hierarchic? II. Folding intermediates and transition states. Trends Biochem. Sci. 24: 77–84.
Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer, E.F., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., and Tasumi, M. 1977. The Protein Data Bank: A computer-based archival file for macromolecular structures. J. Mol. Biol. 112: 535–542.
Bystroff, C. and Baker, D. 1998. Prediction of local structure in proteins using a library of sequence-structure motifs. J. Mol. Biol. 281: 565–577.
Bystroff, C., Thorsson, V., and Baker, D. 2000. HMMSTR: A hidden Markov model for local sequence-structure correlations in proteins. J. Mol. Biol. 301: 173–190.
Cormen, T., Leiserson, C., and Rivest, R. 1990. Introduction to algorithms. MIT Press, Cambridge, MA.
Crippen, G.M. 1978. The tree structural organization of proteins. J. Mol. Biol. 126: 315–332.
Dill, K.A. 1985. Theory for the folding and stability of globular proteins. Biochemistry 24: 1501–1509.
Dill, K.A. 1990. Dominant forces in protein folding. Biochemistry 31: 7134–7155.
Dill, K.A. and Chan, H.S. 1997. From Levinthal to pathways to funnels. Nat. Struct. Biol. 4: 10–19.
Fischer, D., Nussinov, R., and Wolfson, H.J. 1992. 3-D substructure matching in protein molecules. In Proceedings of the Third Symposium on Combinatorial Pattern Matching, Tucson, Arizona. Lecture Notes on Computer Science, vol. 644, pp. 133–147. Springer Verlag.
Fisher, A. and Taniuchi, H. 1992. A study of core domains, and the core domain–domain interaction of cytochrome c fragment complex. Arch. Biochem. Biophys. 296: 1–16.
Fontana, A., Polverino de Laureto, P., De Filippis, V., Scaramella, E., and Zambonin, M. 1997. Probing the partly folded states of proteins by limited proteolysis. Fold. Des. 2: R17–R26.
Fontana, A., Polverino de Laureto, P., De Filippis, V., Scaramella, E., and Zambonin, M. 1999. Limited proteolysis in the study of protein conformation. In Proteolytic enzymes: Tools and targets (eds. E.E. Sterchi and W. Stocker), pp. 257–284. Springer Verlag, Heidelberg, Germany.
Hardin, C., Pogorelov, T.V., and Luthey-Schulten, Z. 2002. Ab initio protein structure prediction. Curr. Opin. Struct. Biol. 12: 176–181.
Haspel, N., Tsai, C.-J., Wolfson, H., and Nussinov, R. 2003. Hierarchical protein folding pathways: A computational study of protein fragments. Proteins 51: 203–215.
Inbar, Y., Benyamini, H., Nussinov, R., and Wolfson, H. 2003. Protein structure prediction via combinatorial assembly of substructural motifs. Intelligent Systems in Molecular Biology (ISMB). Bioinformatics 1: 1–10.
Kuhlman, B., Boice, J.A., Wu, W.J., Fairman, R., and Raleigh, D.P. 1997. Calcium binding peptides from α-lactalbumin: Implications for protein folding and stability. Biochemistry 36: 4607–4615.
Kumar, S., Sham, Y.Y., Tsai, C.J., and Nussinov, R. 2001. Protein folding and function: The N-terminal fragment in adenylate kinase. Biophys. J. 80: 2439–2454.
Lee, B. and Richards, F. 1971. The interpretation of protein structures: Estimation of static accessibility. J. Mol. Biol. 55: 379–400.
Leibowitz, N., Fligelman, Z., Nussinov, R., and Wolfson, H. 2001a. Automated multiple structure alignment and detection of a common structural motif. Proteins 43: 235–245.
Leibowitz, N., Nussinov, R., and Wolfson, H. 2001b. MUSTA - A general, efficient, automated method for multiple structure alignment and detection of a common motif: Application to proteins. J. Comput. Biol. 8: 93–121.
Lesk, A.M. and Rose, G.D. 1981. Folding unit in globular proteins. Proc. Natl. Acad. Sci. 78: 4304–4308.
Ma, B., Tsai, C.J., and Nussinov, R. 2000. Binding and folding: In search of intramolecular chaperone-like building block fragments. Protein Eng. 13: 617–627.
Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536–540.
Oliva, B., Bates, P., Quero, E., Aviles, F., and Sternberg, M.J. 1997. An automated classification of the structure of protein loops. J. Mol. Biol. 266: 814–830.
Panchenko, A.R., Luthey-Schulten, Z., and Wolynes, P.G. 1996. Foldons, protein structural modules, and exons. Proc. Natl. Acad. Sci. 93: 2008–2013.
Panchenko, A.R., Luthey-Schulten, Z., Cole, R., and Wolynes, P.G. 1997. The foldon universe: A survey of structural similarity and self-recognition of independently folding units. J. Mol. Biol. 272: 95–105.
Peng, Z.-Y. and Wu, L.C. 2000. Autonomous protein folding units. Adv. Protein Chem. 53: 1–47.
Polverino de Laureto, P., Scaramella, E., Frigo, M., Wonderich, F.G., De Filippis, V., Zambonin, M., and Fontana, A. 1999. Limited proteolysis of bovine α-lactalbumin: Isolation and characterization of protein domains. Protein Sci. 8: 2290–2303.
Polverino de Laureto, P., Vinante, D., Scaramella, E., Frare, E., and Fontana, A. 2001. Stepwise proteolytic removal of the β subdomain in α-lactalbumin. The protein remains folded and can form the molten globule in acid solution. Eur. J. Biochem. 268: 4324–4333.
Rose, G.D. 1979. Hierarchic organization of domains in proteins. J. Mol. Biol. 134: 447–470.
Shatsky, M., Wolfson, H.J., and Nussinov, R. 2002. Flexible protein alignment and hinge detection. Proteins 48: 242–256.
Shrake, A. and Rupley, J. 1973. Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J. Mol. Biol. 79: 351–371.
Srinivasan, R. and Rose, G.D. 1995. LINUS - A simple algorithm to predict the fold of a protein. Proteins 22: 81–99.
Srinivasan, R. and Rose, G.D. 1999. A physical basis for protein secondary structure. Proc. Natl. Acad. Sci. 96: 14258–14263.
Taniuchi, H., Parr, G.R., and Juillerat, M.A. 1986. Complementation in folding and fragment exchange. Methods Enzymol. 131: 185–217.
Tsai, C., Maizel, J., and Nussinov, R. 2000. Anatomy of protein structure: Visualizing how a 1d protein chain folds into a 3d shape. Proc. Natl. Acad. Sci. 97: 12038–12043.
Tsai, C., Ma, B., Sham, Y., Kumar, S., Wolfson, H., and Nussinov, R. 2001. Hierarchical, building block based computational method for protein structure prediction. IBM Journal of Research and Development 45: 513–523.
Tsai, C.J., Polverino de Laureto, P., Fontana, A., and Nussinov, R. 2002. Comparison of protein fragments identified by limited proteolysis and computational cutting of proteins. Protein Sci. 11: 1753–1770.
Wodak, S.J. and Janin, J. 1981. Location of structural domains in proteins. Biochemistry 20: 6544–6552.
Wu, L.C., Peng, Z.Y., and Kim, P.S. 1995. Bipartite structure of the α-lactalbumin molten globule. Nat. Struct. Biol. 2: 281–286.
Yang, X.M., Yu, W.F., Li, J.H., Fuchs, J., Rizo, J., and Tasayco, M.L. 1998. NMR evidence for the reassembly of an α/β domain after cleavage of an α-helix: Implications for protein design. J. Am. Chem. Soc. 120: 7985–7986.
Zehfus, M.H. 1993. Improved calculations of compactness and a reevaluation of continuous compact units. Proteins 16: 293–300.
Zehfus, M.H. and Rose, G.D. 1986. Compact units in proteins. Biochemistry 25: 5759–5765.