Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
Reprint requests to: M. Cynthia Goh, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, ON M5S 3H6, Canada; e-mail: cgoh{at}alchemy.chem.utoronto.ca; fax: (416) 978-4526.
(RECEIVED June 7, 2002; FINAL REVISION July 24, 2002; ACCEPTED August 12, 2002)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.0218502.
| Abstract |
|---|
Keywords: Collagen; helical wheel; triple-helix; bioinformatics; protein structure prediction
| Introduction |
|---|
22% occurrence of each in type I collagen) proline and 4-hydroxyproline, respectively. Since the initial high-resolution single crystal structure of a short model triple-helical peptide by Bella et al. in 1994, a total of 10 such structures have been solved (Table 1).
1050 amino acids contained in each of the three polypeptide subunits of the typical forming collagen. We believe that an understanding of the three-dimensional orientations of charged residues and regions of high steric bulk along a collagen molecule is absolutely essential in investigating interactions of collagen with other molecules, whether collagenous or noncollagenous. This is highlighted by the distinct morphological differences observed on self-assembly of collagen with potentially quite subtle changes in assembly conditions (see Paige and Goh 2001; Paige et al. 2001). Therefore, given a triple-helical primary sequence, we would like to be able to predict both the orientations and the spatial proximity of chemically active and relevant moieties.
α-helical coiled coil, which has enjoyed a great deal of success with structural prediction (Skolnick et al. 1999; Kajava 2001), including parameterization (Harbury et al. 1995) and statistical analysis of common features in crystal structure (Yang et al. 1999). The collagen triple-helix should be more straightforward to parameterize and predict than the α-helical coiled-coil, in that the packing of multiple helices side-by-side and the prediction of the relative orientations of helices are not issues. To provide an experimental basis for the prediction of the three-dimensional layout of residues in a collagen-scale triple-helix, we have performed a statistical analysis of all existing high-resolution model peptide structures. Although models of the triple-helix have been developed that successfully describe some of the features of X-ray diffraction experiments and single crystal structures (Brodsky and Shah 1995; Mayo 1996; Beck and Brodsky 1998), or that arose during the course of molecular dynamics studies (Klein and Huang 1999), we have decided to avoid any preconceptions of the structural motif and instead simply perform a statistical analysis using all presently available structures.
| Results and Discussion |
|---|
α-helix: the number of residues per 360° turn and the translation along the length of the helix per residue. This is insufficient for a triple-helix, however, because the residues do not all fall the same distance from the middle of the helix. Hence, a third parameter must also be included: the distance from the center-line of the helix. Unlike an α-helix, in which the rise between residues is constant, each residue in a Gly-X-Y triplet will be translated differently along the length of the helix. As a final wrinkle in comparison to the α-helical wheel, the tripeptide nature of the triple-helix structure means that the relative locations of the three polypeptide chains must also be part of our parameter set.
Taking the Gly-X-Y triplets as independent subunits within each model peptide structure and considering each triple-helical peptide structure in a cylindrical frame of reference, we have statistically determined values of translation along the long-axis of the helix (Δz), the angular stagger (Δθ), and the radius from the helix center (rh). The following five sets of statistics are sufficient to locate the backbone and Cβ atoms in a triple-helix:
1. […], C and O) and for Cβ;
2. Δz and Δθ for triplets, determined from Gly(n)→Gly(n+1), X(n)→X(n+1), and Y(n)→Y(n+1) using all backbone atoms;
3. Δz and Δθ between Cα atoms for Gly→X and X→Y, and Δz for Y→Gly;
4. Δz and Δθ of N, C, O, and Cβ relative to Cα at each triplet position; and
5. Δz and Δθ.
As is apparent from Table 1, the currently available peptide structures are almost exclusively Gly-imino-imino, with the exception of Protein Data Bank (PDB) entry 1BKV. (Analysis was performed separately for the imino-deficient region in the middle of 1BKV.) Rather than excluding specific triplets from the data set on the basis of conformational differences or anomalies, we have chosen to apply a statistical filter to the data set, as described in Materials and Methods.
This analysis allows generation of the Cα and Cβ trace (looking down the helical long-axis) shown in Figure 1 in what we propose to be a useful helical-wheel format for the triple-helix, along the lines of the α-helical wheel of Schiffer and Edmundson (1967). The well-known left-handed triplet-to-triplet helicity (in which each triplet itself winds in a right-handed manner) and right-handed chain-to-chain helicity are readily apparent in Figure 1. Along with this projection down the helical long-axis, we produce the corresponding position of each atom along the long-axis. All parameters required for production of a Cα trace of a triple-helix backbone are detailed in Table 2. Although the Cα trace, perhaps of the X and Y residues only, may be the most desirable manner to represent a triple-helix with upwards of 3000 residues, applications such as side-chain prediction may require the entire backbone. Given the highly characteristic dihedral angles of the triple-helix (Brodsky and Shah 1995; Mayo 1996; Beck and Brodsky 1998), prediction of the Cα locations should also allow the locations of the other backbone atoms to be predicted. Statistical analysis of the set of model peptide crystal structures shows this to indeed be true. Parameters allowing incorporation of the remainder of the backbone and of Cβ atoms are provided in Table 3. PDB format model structure files, using each parameter set, for the simple sequence [(G-A-A)₁₀]₃ are available in the Supplementary Material. Triple-helix structures generated using the statistically derived parameters in Tables 2 and 3 show very close agreement with expected bond lengths, as based on the work of Engh and Huber (1991). The backbone dihedral angles are also in excellent agreement with those of the data sets used. These validations are shown in detail in Table 4 for both the imino-rich and imino-deficient parameter sets. Further validation of the parameter set is provided through analysis of the hydrogen-bonding patterns, using the DSSP program of Kabsch and Sander (1983). Both of the model PDB structures display the expected Gly-to-X hydrogen-bond patterns, with calculated strengths of -1.9 to -2.0 kcal/mole for the imino-rich parameter set and -2.8 to -2.9 kcal/mole for the parameters based on the imino-deficient middle portion of 1BKV. As more high-resolution structures are solved, the statistically derived parameters are expected to improve.
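The hydrogen-bond strengths quoted above are DSSP energies, which score each candidate N-H···O=C pair with a simple electrostatic model. As a minimal sketch of that published energy function (Kabsch and Sander 1983), and not of the workflow used in this study, the Python below evaluates it for a single pair. The function name and the example distances are illustrative assumptions, not values taken from the model structures.

```python
# Electrostatic hydrogen-bond energy in the style of DSSP (Kabsch and Sander 1983).
# Partial charges of 0.42e (C=O) and 0.20e (N-H) with the factor 332 give kcal/mol
# when all distances are in angstroms.
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # DSSP counts a hydrogen bond when the energy is below this


def hbond_energy(r_on, r_ch, r_oh, r_cn):
    """Return the DSSP-style energy (kcal/mol) for one N-H...O=C pair."""
    return Q1 * Q2 * F * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)


# Hypothetical distances (angstroms) for a near-linear hydrogen bond.
energy = hbond_energy(r_on=2.9, r_ch=3.1, r_oh=1.9, r_cn=4.1)
print(f"E = {energy:.2f} kcal/mol; counted as H-bond: {energy < HBOND_CUTOFF}")
```

DSSP itself first places the amide hydrogen from the backbone geometry before measuring these distances; that step is omitted here for brevity.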
Δz and the triplet-to-triplet Δθ. The triplet-to-triplet Δθ of 53° and chain-to-chain Δθ of -102° for the imino-rich region generate ~6.8 triplets and ~3.5 residues per 360° revolution, corresponding quite closely to the 7/2 (or 7₅) helix proposed initially by Okuyama et al. (1977). Those of 40° and -107° in the imino-deficient region give nine triplets and 3.36 residues per 360°, which is somewhat compressed compared with the Rich-Crick 10/3 (or 10₇) helix (Rich and Crick 1961). For the imino-deficient region, the statistics are entirely based on a single region of one model peptide structure (1BKV); these parameters are expected to improve as further (G-X-Y)n structures are solved. We have chosen to keep the statistically derived values in both cases, rather than values based on the ideal Okuyama or Rich-Crick models.
In comparison to existing homology modeling methods, our parameterization has the following differences. First, the peptide backbone used for prediction of the triple-helix is a statistical compilation of all existing high-resolution structures. Second, the parameter set allows for easy prediction of a triple-helix of any length; extrapolation of an arbitrarily long structure from a short peptide using homology modeling tools is by no means trivial. Finally, consideration of the primary sequence of a triple-helical molecule would allow the prediction of regions of high versus low triple-helical propensity, as extensively studied by Persikov et al. (2000, 2002). The parameters given herein readily allow the production of a triple-helix with various regions predicted to be of differing helical stability.
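The turn counts quoted above for the two regions can be checked with one line of arithmetic. The sketch below assumes that each figure is simply 360° divided by the magnitude of the corresponding angular step (triplet-to-triplet for triplets per turn, chain-to-chain for residues per turn); that reading is an inference from the numbers in the text, and the function and dictionary names are illustrative.

```python
def units_per_turn(angular_step_deg: float) -> float:
    """Number of repeating units in one 360 degree revolution for a given step."""
    if angular_step_deg == 0:
        raise ValueError("angular step must be non-zero")
    return 360.0 / abs(angular_step_deg)


# Angular steps quoted in the text, in degrees.
regions = {
    "imino-rich": {"triplet": 53.0, "chain": -102.0},
    "imino-deficient": {"triplet": 40.0, "chain": -107.0},
}

for label, steps in regions.items():
    triplets = units_per_turn(steps["triplet"])
    residues = units_per_turn(steps["chain"])
    print(f"{label}: {triplets:.2f} triplets and {residues:.2f} residues per turn")
```

For comparison, the ideal 7/2 helix corresponds to exactly 7 triplets and 3.5 residues per turn, and the ideal 10/3 helix to 10 triplets and about 3.33 residues per turn.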
A program such as SCWRL (Bower et al. 1997) takes one existing peptide backbone from a PDB file and adds side-chain atoms onto it. This is a highly valuable methodology for homology modeling; however, unlike this approach, the attempt here is to produce a statistically representative generalization of the triple-helix. In a strict homology model, one would need either to arbitrarily choose only a single triple-helical structure to use as the homologous scaffold with SCWRL, or to generate a parameter set equivalent to that given herein. Our statistically produced triple-helix provides a structure with an overall average root mean square difference (RMSD) of 3.70 Å in comparison to model peptides generated by SCWRL using each of the imino-rich sequences used herein (Table 1). Notably, the statistical parameters produce a triple-helix with only a 0.263 Å RMSD from the SCWRL-modeled chains A–C of PDB 1K6F and 0.269 Å from chains D–F (for atoms N, Cα, C, O, and Cβ), showing an uncanny closeness to the extremely high-resolution structure 1K6F solved by Berisio et al. (2002). This could be interpreted to imply overrepresentation of 1K6F in the overall parameter set, because it comprises just under 30% of the values included. The equivalent parameter set calculated without the inclusion of 1K6F, however, provides a predicted structure with an RMSD of only 0.053 Å compared to that with the parameter set given herein. Therefore, 1K6F improves the statistical values rather than skewing the overall parameter set. As a result, homology models constructed with 1K6F would likely be highly similar to structures generated with the imino-rich parameter set given in Tables 2 and 3. However, to generate a triple-helix of arbitrary length, statistical parameters such as those herein would still need to be produced from 1K6F and would be based on only this single structure, which is subtly different from the statistically derived data set. Software such as Gencollagen has been used extremely successfully for molecular dynamics predictions of short model peptides (Klein and Huang 1999) but is based on idealized bond parameters. Our parameter set does not rely in any way on an idealized triple-helix; instead, we present a statistically derived framework, independent of any single crystal structure, from which triple-helical collagen structures may be accurately predicted.
Given the agreement in bond lengths and backbone dihedral angles (as in Table 4), as well as in the interchain H-bonding patterns, we believe that the statistical parameters given herein are very reasonable for predicting a triple-helix, especially in imino-rich regions. Source code is freely available on request for the generation of triple-helical structures of any primary sequence and length, and a Web-based interface will be available at http://www.chem.utoronto.ca/staff/MCG/. We also plan to maintain an updated parameter set at this web site as further structures become available.
| Materials and methods |
|---|
Before statistical analysis, each triple-helical structure was converted to cylindrical polar coordinates. Briefly, this process involved the following steps. Assuming net directionality running N-terminal to C-terminal for a triple-helix, a vector composed of the sum of atom-to-atom vectors for a given model peptide structure will be primarily aligned with the helical long-axis. Depending on the structure in question, the optimal composition of this aggregate vector sum varied. For example, Σ[Cα(n−1)→Cα(n) + N(n−1)→N(n) + C(n−1)→C(n)] seemed to provide the best alignment for 1QSU. Other structures were better aligned by an aggregate vector summing each bond along the backbone. See the Supplementary Material for the details of the compositions of aggregate vectors used.
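To make the aggregate-vector idea concrete, the following is a minimal Python sketch, assuming coordinates are supplied as one ordered list of (x, y, z) tuples per atom type and per chain. The function name, the dictionary layout, and the toy coordinates are illustrative assumptions; the compositions actually used for each structure are those given in the Supplementary Material.

```python
from math import sqrt


def aggregate_axis(tracks):
    """Sum the n-1 -> n vectors of every track and return the unit vector.

    `tracks` maps a label (for example "CA", "N", "C") to a list of (x, y, z)
    positions ordered from the N-terminus to the C-terminus.
    """
    total = [0.0, 0.0, 0.0]
    for coords in tracks.values():
        for prev, curr in zip(coords, coords[1:]):
            for i in range(3):
                total[i] += curr[i] - prev[i]
    norm = sqrt(sum(c * c for c in total))
    if norm == 0.0:
        raise ValueError("aggregate vector has zero length")
    return tuple(c / norm for c in total)


# Toy coordinates (not from any PDB entry): two tracks rising along z.
toy = {
    "CA": [(1.0, 0.0, 0.0), (0.0, 1.0, 3.0), (-1.0, 0.0, 6.0)],
    "N": [(0.0, 0.0, 1.0), (0.0, 0.0, 4.0), (0.0, 0.0, 7.0)],
}
axis = aggregate_axis(toy)
print(axis)
```

Because consecutive differences telescope, each track contributes its last position minus its first, so the choice of which atom types to include acts as a weighting of end-to-end vectors.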
The atomic coordinates for the entire structure were then rotated such that this aggregate vector was aligned along an arbitrary axis, which was chosen as the Z-axis in an (rh, θ, z) style cylindrical polar-coordinate system, where rh is the distance from the center-line of the cylinder, θ is the counterclockwise angle from an arbitrary line in the plane perpendicular to z (i.e., Cα of G1 lies at θ = 0 in Fig. 1), and z is the translational distance along the cylinder. The means and standard deviations given in Tables 2 and 3 could then be readily calculated.
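The rotation and coordinate change described above can be sketched as follows. This is one possible implementation under stated assumptions, not the authors' code: the rotation uses Rodrigues' formula, the center-line is assumed to pass through the centroid of the rotated x and y values, and the names `rotate_to_z`, `to_cylindrical`, and `theta_ref_deg` are hypothetical.

```python
from math import atan2, degrees, hypot, sqrt


def rotate_to_z(points, axis):
    """Rotate points so that `axis` maps onto +Z (Rodrigues' rotation formula)."""
    ax, ay, az = axis
    norm = sqrt(ax * ax + ay * ay + az * az)
    ax, ay, az = ax / norm, ay / norm, az / norm
    s = hypot(ax, ay)   # sine of the angle between axis and +Z
    c = az              # cosine of that angle
    if s < 1e-12:       # axis already parallel or antiparallel to Z
        return [(x, y, z) if c > 0 else (x, -y, -z) for x, y, z in points]
    kx, ky = ay / s, -ax / s   # unit rotation axis k = axis x Z (its z part is 0)
    rotated = []
    for x, y, z in points:
        k_dot_v = kx * x + ky * y
        cross = (ky * z, -kx * z, kx * y - ky * x)   # k x v
        rotated.append((
            x * c + cross[0] * s + kx * k_dot_v * (1 - c),
            y * c + cross[1] * s + ky * k_dot_v * (1 - c),
            z * c + cross[2] * s,
        ))
    return rotated


def to_cylindrical(points, theta_ref_deg=0.0):
    """Return (rh, angle in degrees, z) measured about the x, y centroid."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    result = []
    for x, y, z in points:
        theta = (degrees(atan2(y - cy, x - cx)) - theta_ref_deg) % 360.0
        result.append((hypot(x - cx, y - cy), theta, z))
    return result
```

Passing the angle of a chosen reference atom as `theta_ref_deg` reproduces the convention of placing that atom at an angle of zero.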
For structure 1BKV, residues 10–21 of each chain were analyzed separately owing to the more extended conformation observed in this amino acid-rich X and Y region, compared with the other nine structures, which are imino-rich. (Note that the last triplet in this region is the imino triplet Gly-Pro-Hyp; it was included because of its better agreement with the amino portion.) Rather than picking specific triplets to exclude from the overall data set (such as terminal triplets or those containing nonstandard residues), any values lying outside of ±1.645 SD from the mean of the original data set were excluded from the values included in Figure 1 and in Tables 2 and 3. Because there is no single accepted method for rejection of outliers (for extensive discussion, see Barnett and Lewis 1994), this is necessarily an arbitrary trimming of each mean. Were the initial data sets normally distributed, 10% of the data points would be excluded by this filter; in almost all cases, <10% of the data was excluded, and all resulting normal distributions had a better qualitative fit to the central desired portion of the data, implying that the trimmed means and standard deviations are representative. Also, in all cases but four that are indicated in the table, trimmed means calculated in this manner were <1% different from those calculated by the often used method with four iterations of trimming outliers at µ ± 2σ, where a new µ and σ are calculated after each trim. Normally distributed values were not assumed; comparison to normal distributions was simply used as a qualitative aid during analysis. Note that no such filtering was applied in the separate data set for the middle section of 1BKV for any statistics reported with an N of <85. The proportions excluded by this statistical filtering are available in the Supplementary Material. In general, such filtering will become less and less necessary as more model peptide structures become incorporated into the parameter set.
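The two trimming schemes compared above can be sketched in a few lines of Python. This is an illustration under assumptions: the sample standard deviation is used (the text does not say whether sample or population SD was taken), and the function names and toy data are hypothetical.

```python
from statistics import mean, stdev


def single_pass_trim(values, k=1.645):
    """Drop values more than k SD from the mean of the original data, once."""
    mu, sd = mean(values), stdev(values)
    kept = [v for v in values if abs(v - mu) <= k * sd]
    return mean(kept), stdev(kept), len(values) - len(kept)


def iterative_trim(values, k=2.0, rounds=4):
    """Trim at mu +/- k*sigma repeatedly, recomputing mu and sigma each round."""
    kept = list(values)
    for _ in range(rounds):
        mu, sd = mean(kept), stdev(kept)
        if sd == 0.0:
            break
        kept = [v for v in kept if abs(v - mu) <= k * sd]
    return mean(kept), stdev(kept), len(values) - len(kept)


# Toy measurements with one obvious outlier (not data from this study).
data = [9.8, 10.0, 10.1, 9.9, 10.2, 10.0, 9.9, 10.1, 10.0, 25.0]
mu_a, sd_a, dropped_a = single_pass_trim(data)
mu_b, sd_b, dropped_b = iterative_trim(data)
percent_difference = 100.0 * abs(mu_a - mu_b) / abs(mu_b)
print(mu_a, dropped_a, mu_b, dropped_b, percent_difference)
```

The final line mirrors the <1% comparison between the two trimmed means described in the text.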
SCWRL 2.95 (Bower et al. 1997; freely available from F.E. Cohen, University of California at San Francisco, USA, and R.L. Dunbrack, Fox Chase Cancer Center, Philadelphia, PA, USA, at http://www.fccc.edu/research/labs/dunbrack/scwrl/) was used for comparison with homology models. Each PDB file listed in Table 1 was used as a homology model backbone template for comparison to PDB files produced using the parameters in Tables 2 and 3. Note that for the imino-rich regions, SCWRL was used to produce a homology model with pure (GPP) triplet repeats to ensure prediction of a similar Cβ orientation to the statistical average. RMSD calculations were performed with ProFit 2.2, freely available from A.C.R. Martin, University of Reading, UK (http://www.bioinf.org.uk/software/profit/), which uses the McLachlan algorithm (McLachlan 1982).
Finally, it should be noted that the parameters given herein are amenable to the production of triple-helical structures lying along the long axis of a cylinder described in cylindrical polar coordinates of the form (rh, θ, z). The sign convention used is as follows: a positive Δθ is a counterclockwise rotation, and a positive Δz corresponds to a translation along the helical long-axis from N-terminal to C-terminal. Conversion of such a structure to Cartesian coordinates of the form (x, y, z) is straightforward: x = rh cos θ, y = rh sin θ, and z remains unchanged.
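The conversion just described, together with the sign convention, can be illustrated with a short Python sketch. The helper `place_repeats` and its arguments are hypothetical names, and the radius and rise used in the example are placeholders rather than values from Tables 2 and 3; only the 53° triplet-to-triplet step is quoted in the text.

```python
from math import cos, radians, sin


def cylindrical_to_cartesian(rh, angle_deg, z):
    """Convert (rh, angle, z) to (x, y, z); the angle is counterclockwise, in degrees."""
    t = radians(angle_deg)
    return (rh * cos(t), rh * sin(t), z)


def place_repeats(rh, angular_step_deg, rise, count, angle0_deg=0.0, z0=0.0):
    """Place `count` equivalent atoms, each advanced by one angular and axial step.

    A positive angular step rotates counterclockwise and a positive rise moves
    from the N-terminus toward the C-terminus, as in the stated sign convention.
    """
    return [
        cylindrical_to_cartesian(rh, angle0_deg + i * angular_step_deg, z0 + i * rise)
        for i in range(count)
    ]


# Placeholder radius (1.5) and rise (8.6), in angstroms, chosen only for illustration.
trace = place_repeats(rh=1.5, angular_step_deg=53.0, rise=8.6, count=4)
for x, y, z in trace:
    print(f"{x:8.3f} {y:8.3f} {z:8.3f}")
```

A full builder would apply the per-atom offsets relative to Cα and the chain-to-chain stagger in the same way before converting to Cartesian coordinates.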
| Electronic supplemental material |
|---|
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Beck, K. and Brodsky, B. 1998. Supercoiled protein motifs: The collagen triple-helix and the α-helical coiled coil. J. Struct. Biol. 122: 17–29.
Bella, J., Eaton, M., Brodsky, B., and Berman, H.M. 1994. Crystal and molecular structure of a collagen-like peptide at 1.9 Å resolution. Science 266: 75–81.
Bella, J., Brodsky, B., and Berman, H.M. 1995. Hydration structure of a collagen peptide. Structure 3: 893–906.
Berisio, R., Vitagliano, L., Mazzarella, L., and Zagari, A. 2002. Crystal structure of the collagen triple-helix model [(Pro-Pro-Gly)₁₀]₃. Protein Sci. 11: 262–270.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235–242.
Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer, Jr., E.E., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., and Tasumi, M. 1977. The Protein Data Bank: A computer-based archival file for macromolecular structures. J. Mol. Biol. 112: 535–542.
Bower, M.J., Cohen, F.E., and Dunbrack, Jr., R.L. 1997. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: A new homology modeling tool. J. Mol. Biol. 267: 1268–1282.
Brodsky, B. and Shah, N.K. 1995. Protein motifs, 8: The triple-helix motif in proteins. FASEB J. 9: 1537–1546.
Engh, R.A. and Huber, R. 1991. Accurate bond and angle parameters for X-ray protein-structure refinement. Acta Crystallogr. A 47: 392–400.
Harbury, P.B., Tidor, B., and Kim, P.S. 1995. Repacking protein cores with backbone freedom: Structure prediction for coiled coils. Proc. Natl. Acad. Sci. 92: 8408–8412.
Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577–2637.
Kajava, A.V. 2001. Proteins with repeated sequence: Structural prediction and modeling. J. Struct. Biol. 134: 132–144.
Klein, T.E. and Huang, C.C. 1999. Computational investigations of structural changes resulting from point mutations in a collagen-like peptide. Biopolymers 49: 167–183.
Kramer, R.Z., Vitagliano, L., Bella, J., Berisio, R., Mazzarella, L., Brodsky, B., Zagari, A., and Berman, H.M. 1998. X-ray crystallographic determination of a collagen-like peptide with the repeating sequence (Pro-Pro-Gly). J. Mol. Biol. 280: 623–638.
Kramer, R.Z., Bella, J., Mayville, P., Brodsky, B., and Berman, H.M. 1999. Sequence-dependent conformational variations of collagen triple-helical structure. Nat. Struct. Biol. 6: 454–457.
Kramer, R.Z., Venugopal, M.G., Bella, J., Mayville, P., Brodsky, B., and Berman, H.M. 2000. Staggered molecular packing in crystals of a collagen-like peptide with a single charged pair. J. Mol. Biol. 301: 1191–1205.
Mayo, K.H. 1996. NMR and X-ray studies of collagen model peptides. Biopolymers 40: 359–370.
McLachlan, A.D. 1982. Rapid comparison of protein structures. Acta Crystallogr. A 38: 871–873.
Nagarajan, V., Kamitori, S., and Okuyama, K. 1998. Crystal structure analysis of collagen model peptide (Pro-Pro-Gly)₁₀. J. Biochem. (Tokyo) 124: 1117–1123.
Nagarajan, V., Kamitori, S., and Okuyama, K. 1999. Structure analysis of a collagen-model peptide with a (Pro-Hyp-Gly) sequence repeat. J. Biochem. (Tokyo) 125: 310–318.
Okuyama, K., Takayanagi, M., Ashida, T., and Kakudo, M. 1977. New structural model for collagen. Polym. J. 9: 341–343.
Paige, M.F. and Goh, M.C. 2001. Ultrastructure and assembly of segmental long spacing (SLS) collagen studied by atomic force microscopy. Micron 32: 355–361.
Paige, M.F., Rainey, J.K., and Goh, M.C. 2001. A study of fibrous long spacing collagen ultrastructure and assembly by atomic force microscopy. Micron 32: 341–353.
Persikov, A.V., Ramshaw, J.A., Kirkpatrick, A., and Brodsky, B. 2000. Amino acid propensities for the collagen triple-helix. Biochemistry 39: 14960–14967.
Persikov, A.V., Ramshaw, J.A., Kirkpatrick, A., and Brodsky, B. 2002. Peptide investigations of pairwise interactions in the collagen triple-helix. J. Mol. Biol. 316: 385–394.
Rich, A. and Crick, F.H.C. 1961. Molecular structure of collagen. J. Mol. Biol. 3: 483–506.
Schiffer, M. and Edmundson, A.B. 1967. Use of helical wheels to represent the structures of proteins and to identify segments with helical potential. Biophys. J. 7: 121–135.
Skolnick, J., Kolinski, A., and Mohanty, D. 1999. De novo predictions of the quaternary structure of leucine zippers and other coiled coils. Int. J. Quantum Chem. 75: 165–176.
Vitagliano, L., Berisio, R., Mazzarella, L., and Zagari, A. 2001. Structural bases of collagen stabilization induced by proline hydroxylation. Biopolymers 58: 459–464.
Yang, P.K., Tzou, W.S., and Hwang, M.J. 1999. Restraint-driven formation of α-helical coiled coils in molecular dynamics simulations. Biopolymers 50: 667–677.
| Web references |
|---|
http://www.rcsb.org/pdb; Protein Data Bank.
http://www.fccc.edu/research/labs/dunbrack/scwrl; SCWRL 2.95.
http://www.chem.utoronto.ca/staff/MCG/; Web-based interface for the generation of triple-helical structures of any primary sequence and length.