1 W.M. Keck Foundation Center for Molecular Structure, Department of Chemistry and Biochemistry, California State University (CSU) Fullerton, Fullerton, California 92834-6866, USA
2 Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas 77843-2128, USA
3 Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory (LLNL), Livermore, California 94551, USA
Reprint requests to: Katherine A. Kantardjieff, W.M. Keck Foundation Center for Molecular Structure, Department of Chemistry and Biochemistry, California State University Fullerton, 800 N. State College Blvd., Fullerton, CA 92834-6866, USA; e-mail: kkantardjieff{at}fullerton.edu; fax: (734) 939-4225.
(RECEIVED March 3, 2003; FINAL REVISION June 9, 2003; ACCEPTED June 9, 2003)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.0350503.
| Abstract |
|---|
Keywords: Solvent content; protein crystals; Matthews coefficient; Matthews probabilities
| Introduction |
|---|
26% solvent content). Matthews recognized that the distribution of VM would be useful in preliminary studies of protein crystals to estimate the number of molecules per asymmetric unit, particularly in the molecular weight region below 70 kD, although he suggested that examples would likely be found with VM lying outside the range. Although it was also noted that higher molecular weight proteins had a tendency to form crystals with a higher fractional volume of solvent (Matthews 1976), there were not enough data to statistically determine the range of VM for such proteins. More than 30 years have passed since Matthews' first analysis of protein crystal solvent content, yet the original distribution of VM is still widely used as a guide in determining the contents of the crystallographic asymmetric unit. Given the plethora of crystal forms now available in the Protein Data Bank (PDB; Berman et al. 2000), we decided to revisit the distribution of VM for protein crystals and determine whether the range of values and frequencies has substantially changed. We have treated complexes of proteins and nucleic acids as a separate group, and we have also examined VM for nucleic acid crystals.
| Results |
|---|
47%. Here, we have used an average partial specific volume (psv) of 0.74 cm³/g for proteins, which, unless there is reason to believe that a protein has a significantly different psv, is still appropriate for most proteins (Matthews 1968; Arakawa and Timasheff 1985; Prakash and Timasheff 1985; Perkins 1986; Durchschlag and Zipper 1994; Quillin and Matthews 2000).
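The relation between VM, psv, and solvent content can be illustrated with a minimal sketch. It assumes the standard definition of the Matthews coefficient (cell volume divided by the total molecular weight in the cell) and converts psv from cm³/g to Å³/Dalton using Avogadro's number. The function and argument names are illustrative and are not taken from the authors' software.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
A3_PER_CM3 = 1.0e24        # cubic angstroms per cubic centimetre


def matthews_coefficient(cell_volume, mol_weight, n_sym_ops, n_copies):
    """VM in A^3/Dalton.

    cell_volume : unit cell volume in A^3
    mol_weight  : molecular weight of one molecule in Dalton
    n_sym_ops   : number of asymmetric units in the unit cell
    n_copies    : assumed number of molecules per asymmetric unit
    """
    return cell_volume / (mol_weight * n_sym_ops * n_copies)


def solvent_fraction(vm, psv=0.74):
    """Fraction of the crystal volume occupied by solvent.

    vm  : Matthews coefficient in A^3/Dalton
    psv : partial specific volume in cm^3/g
          (0.74 is the protein average quoted in the text)
    """
    if vm <= 0:
        raise ValueError("vm must be positive")
    # Volume occupied by one Dalton of macromolecule, in A^3.
    volume_per_dalton = psv * A3_PER_CM3 / AVOGADRO
    return 1.0 - volume_per_dalton / vm
```

With psv = 0.74 cm³/g the numerator evaluates to roughly 1.23 Å³/Dalton, so the solvent fraction reduces to the familiar approximation 1 − 1.23/VM.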
Nucleic acids
The frequency distribution of VM for nucleic acid crystals is shown in Figure 4. The range of VM for nucleic acids is also large, with a mean of 2.59 Å³/Dalton, median of 2.34 Å³/Dalton, and most frequent value of 2.35 Å³/Dalton, corresponding to a solvent content of 64%. Here, we have used an average psv of 0.50 cm³/g for nucleic acids in the calculation of the solvent content for these crystal forms, although psv will depend on buffer type, pH, and ionic strength (Cohen and Eisenberg 1968; Woodward and Lebowitz 1980). When the nucleic acid data are clustered into subsets by molecular weight and resolution (not shown), similar features to those in the protein data are observed. Although lower molecular weight nucleic acids tend to contribute to the low end of the VM distribution, they are more widely distributed throughout the range than is the case for proteins. Molecular weight is not a significant discriminator of VM for nucleic acids. The molecular weight frequency distribution itself for nucleic acids is rather narrow, whereas the range and frequency distribution of VM versus resolution for nucleic acid crystal structures are much broader. As might be expected, crystals diffracting to higher resolution again appear to cluster near the lower end of the VM distribution, further evidence that more tightly packed crystals generally diffract to higher resolution. Because the bulk of the nucleic acid data pertains to DNA crystals (281), use of the VM distribution for predictive purposes, as described in the discussion, will be restricted to DNA crystals. In view of the small data set, discrimination by resolution in the VM probability calculator is not reliable and has not been implemented.
60% under the assumption of an average protein/nucleic acid ratio of 75%:25%. The solvent content for each complex has to be calculated based on the actual protein/nucleic acid ratio as described in the Experimental section. The molecular weight frequency distribution for these complexes is rather narrow, although the range of the molecular weights is quite broad, and molecular weight is not a significant discriminator of VM for crystals of these complexes. Despite the small sample size (410), when the data are clustered into subsets by molecular weight and resolution (data not shown), similar features to those in the protein data emerge. Again, resolution appears to be a significant discriminator of VM, but sample size is insufficient to allow reliable discrimination by resolution in the VM probability calculator.
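As a rough illustration of how the protein/nucleic acid ratio enters the solvent content of a complex, the sketch below uses a mass-weighted average of the two psv values quoted in the text. The mass weighting is an assumption made here for illustration; the procedure referred to in the Experimental section is not reproduced in this excerpt, and the function names are hypothetical.

```python
PSV_PROTEIN = 0.74        # cm^3/g, protein average quoted in the text
PSV_NUCLEIC_ACID = 0.50   # cm^3/g, nucleic acid average quoted in the text
# Converts a psv in cm^3/g to a volume per unit mass in A^3/Dalton.
CM3_PER_G_TO_A3_PER_DALTON = 1.0e24 / 6.02214076e23


def complex_psv(protein_mass, nucleic_acid_mass):
    """Mass-weighted psv of a complex (ASSUMPTION: simple linear mixing)."""
    total = protein_mass + nucleic_acid_mass
    if total <= 0:
        raise ValueError("total mass must be positive")
    weighted = protein_mass * PSV_PROTEIN + nucleic_acid_mass * PSV_NUCLEIC_ACID
    return weighted / total


def complex_solvent_fraction(vm, protein_mass, nucleic_acid_mass):
    """Solvent fraction for a complex with Matthews coefficient vm (A^3/Dalton)."""
    if vm <= 0:
        raise ValueError("vm must be positive")
    psv = complex_psv(protein_mass, nucleic_acid_mass)
    return 1.0 - CM3_PER_G_TO_A3_PER_DALTON * psv / vm
```

Under this assumption, the average 75%:25% ratio mentioned above corresponds to an effective psv of 0.68 cm³/g; any other composition shifts the solvent content obtained for the same VM.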
| Discussion |
|---|
The results are represented in tabular form at the top of the output, followed by two graphs showing the normalized probability distributions (resolution corrected and all data) against VM and solvent content, respectively (Fig. 6). It must be understood that the results are always relative probabilities based on our current state of knowledge, and that exceptions are possible, despite very low statistical probabilities.
| Materials and methods |
|---|
Given the large total number of independent entries, multiple observations of the same molecule in the same crystal form did not appear to create significant oversampling (as indicated by smooth VM distributions), with two exceptions: a high frequency of occurrence of T4 lysozyme mutant structures in the protein data set, belonging to the space group P3₂21, and a high frequency of occurrence of DNA polymerase β in the protein/nucleic acid complex data set, belonging to the space group P2₁2₁2. Nevertheless, to reduce the possibility of statistical bias and create "nonredundant" data sets of "unique" crystal forms, 3536 records having the same space group, cell volume within 1%, and MW within 1% were removed, leaving only the highest resolution record of each set of "duplicates" in the data set. The 1% filter, in the absence of detailed analysis of intermolecular contacts, is a reasonable approach to eliminate most trivial repetitions of closely related structures, such as isomorphous mutants of the same protein and inhibitor complexes of a given protein. Descriptive statistics (limits, mean, median, and mode) were calculated for the frequency distributions of VM, resolution, and molecular weight, and the VM frequency distributions for proteins, nucleic acids, and protein/nucleic acid complexes were analyzed as a function of both molecular weight and resolution. Cluster (Tryon 1939; Tryon and Bailey 1973; Hartigan 1975) and discriminant function (Klecka 1980; Kachigan 1986; Huberty 1994) analyses were performed in an attempt to reveal any statistically significant relationships that could be used to calculate probabilities and to determine which parameters may best discriminate between clusters of data.
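The redundancy filter described above can be sketched as a greedy pass over the records in order of resolution. This is one possible reading of the 1% criterion, not the authors' actual procedure: the record layout, the symmetric definition of "within 1%", and the greedy comparison against already-kept records are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrystalRecord:
    # Hypothetical record layout; field names are illustrative only.
    entry_id: str
    space_group: str
    cell_volume: float   # A^3
    mol_weight: float    # Dalton
    resolution: float    # A; a smaller value means higher resolution


def within_tolerance(a, b, tol=0.01):
    """True if a and b differ by at most tol relative to the larger value."""
    return abs(a - b) <= tol * max(a, b)


def remove_redundant(records, tol=0.01):
    """Keep only the highest resolution record of each set of near-duplicates."""
    kept = []
    # Visit the best-resolved records first so they are the ones retained.
    for rec in sorted(records, key=lambda r: r.resolution):
        is_duplicate = any(
            k.space_group == rec.space_group
            and within_tolerance(k.cell_volume, rec.cell_volume, tol)
            and within_tolerance(k.mol_weight, rec.mol_weight, tol)
            for k in kept
        )
        if not is_duplicate:
            kept.append(rec)
    return kept
```

Because each candidate is compared only with records already kept, chains of entries that drift by slightly less than 1% from one to the next are not merged transitively; a clustering-based variant would behave differently in that case.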
Implementation of the Matthews probability calculator
The frequency distribution of VM has been approximated by an empirical five-parameter double exponential (modified "extreme function") suitable for the description of highly skewed peaks.
For proteins, the function was parameterized for 12 resolution ranges containing all VM data from highest resolution to each respective lower resolution boundary. The corresponding parameter files and function subroutine may be downloaded from the Web page, and will be updated periodically.
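A minimal sketch of how such a function could be turned into relative probabilities is given below. The functional form is an assumption: it is the common "extreme" peak function with an added exponent as the fifth parameter, chosen because it matches the verbal description; the fitted equation itself is not reproduced in this text. The parameter values are placeholders for illustration only and are not the published fit values, which would be read from the parameter file for the appropriate resolution range.

```python
import math


def extreme_peak(vm, y0, amplitude, center, width, skew):
    """Skewed double-exponential peak (ASSUMED form, five parameters)."""
    z = (vm - center) / width
    if -z > 700.0:
        # exp(-z) would overflow; the peak term is effectively zero here.
        return y0
    return y0 + amplitude * math.exp(-math.exp(-z) - skew * z + 1.0)


def relative_probabilities(cell_volume, n_sym_ops, mol_weight, params, max_copies=10):
    """Map each candidate copy number n to (VM, normalized relative probability)."""
    raw = {}
    for n in range(1, max_copies + 1):
        vm = cell_volume / (n_sym_ops * n * mol_weight)
        raw[n] = (vm, extreme_peak(vm, **params))
    total = sum(weight for _, weight in raw.values())
    if total <= 0:
        raise ValueError("distribution vanishes for all candidate copy numbers")
    return {n: (vm, weight / total) for n, (vm, weight) in raw.items()}


# Placeholder parameters for illustration only; NOT the published values.
PLACEHOLDER_PARAMS = dict(y0=0.0, amplitude=1.0, center=2.3, width=0.3, skew=1.0)

if __name__ == "__main__":
    table = relative_probabilities(184000.0, 4, 20000.0, PLACEHOLDER_PARAMS, max_copies=4)
    for n, (vm, prob) in table.items():
        print(f"n={n}  VM={vm:.2f}  relative probability={prob:.3g}")
```

The normalization runs only over the copy numbers considered, which is why the output should be read as relative probabilities rather than absolute ones.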
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235–242.
Cohen, G. and Eisenberg, H. 1968. Deoxyribonucleate solutions: Sedimentation in a density gradient, partial specific volumes, density and refractive index increments, and preferential interactions. Biopolymers 6: 1077–1100.
Durchschlag, H. and Zipper, P. 1994. Calculation of the partial volume of organic compounds and polymers. Prog. Colloid Polym. Sci. 94: 20–39.
Hartigan, J. 1975. Clustering algorithms. Wiley, New York.
Huberty, C.J. 1994. Applied discriminant analysis. Wiley and Sons, New York.
Kachigan, S.K. 1986. Statistical analysis. Radius Press, New York.
Klecka, W.R. 1980. Discriminant analysis. Sage, Beverly Hills, CA.
Matthews, B.W. 1968. Solvent content of protein crystals. J. Mol. Biol. 33: 491–497.
Matthews, B.W. 1976. X-ray crystallographic studies of proteins. Annu. Rev. Phys. Chem. 27: 493–523.
Perkins, S.J. 1986. Protein volumes and hydration effects: The calculation of partial specific volumes, neutron scattering matchpoints and 280-nm absorption coefficients for proteins and glycoproteins from amino acid sequences. Eur. J. Biochem. 157: 169–180.
Podjarny, A., Howard, E., Mitschler, A., and Chevrier, B. 2002. X-ray crystallography at subatomic resolution. Europhys. News 33: 111.
Prakash, V. and Timasheff, S.N. 1985. Calculation of partial specific volumes of proteins in 8 M urea solutions. Methods Enzymol. 117: 53–60.
Quillin, M.L. and Matthews, B.W. 2000. Accurate calculation of the density of proteins. Acta Crystallogr. D56: 791–794.
Tryon, R.C. 1939. Cluster analysis. Edwards Brothers, Ann Arbor, MI.
Tryon, R.C. and Bailey, D.E. 1973. Cluster analysis. McGraw-Hill, New York.
Woodward, R.S. and Lebowitz, J.J. 1980. A revised equation relating DNA buoyant density to guanine plus cytosine content. Biochem. Biophys. Methods 2: 307–309.
Wukovitz, S.W. and Yeates, T.O. 1995. Why protein crystals favour some space-groups over others. Nat. Struct. Biol. 2: 1062–1067.