Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523, USA
Reprint requests to: Robert W. Woody, Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA; e-mail: rww{at}lamar.colostate.edu; fax: (970) 491-0494.
(RECEIVED June 17, 2003; FINAL REVISION September 30, 2003; ACCEPTED September 30, 2003)
Supplemental material: See www.proteinscience.org
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.03258404.
1 Wallace et al. (2003) did not state that absolute values were used in their definition of R. However, the absence of negative R and Rav values in their article indicates that they used absolute differences in calculating R values, as is appropriate. Also, the Rav values they reported included only the differences between the CD estimates and crystal structure values of the α, β, and turn fractions.
2 The bacterial source of the protein is listed as it appears in the corresponding PDB file. The source listed in Park et al. (1992) may differ because of changes in microbial taxonomy.
3 Bovine cytochrome oxidase and porin (gene OmpF product) were each represented twice in the Park et al. (1992) set of membrane proteins. The cytochrome c oxidase whose source was identified as human erythrocytes was actually bovine cytochrome oxidase; the two CD spectra of bovine cytochrome oxidase were almost identical.
4 Of the two porin (gene OmpF product) spectra, we selected the one from Dr. A. Tucker and Dr. J.H. Lakey (batch 2). Dr. Tucker also provided the spectra of two other porins.
| Abstract |
|---|
Keywords: protein secondary structure; reference protein set; membrane proteins; protein CD; CDPro
Abbreviations: CD, circular dichroism; CCA, the convex constraint method for protein CD analysis; CDSSTR, Johnson's minimal basis-random selection method for protein CD analysis; CONTIN/LL, the ridge-regression method for protein CD analysis combined with the locally linearized method for variable selection; DSSP, a computer program for defining secondary structure of proteins; PDB, Protein Data Bank; SELCON3, the self-consistent method for protein CD analysis, version 3; MP13, reference set of 13 membrane proteins; MP30, reference set of 30 membrane proteins; SP29, reference set of 29 soluble proteins; SP37, reference set of 37 soluble proteins; SP42, reference set of 42 soluble proteins; SP43, reference set of 43 soluble proteins; SP48, reference set of 48 soluble proteins; SMP50, reference set of 50 soluble + membrane proteins; SMP56, reference set of 56 soluble + membrane proteins; RMS, root mean square; NRMSD, normalized RMS deviation; δ, RMS deviation; r, correlation coefficient; αR, regular α-helix; αD, distorted α-helix; α, total α-helix; βR, regular β-strand; βD, distorted β-strand; β, total β-sheet; T, turns; U, unordered; fX, fractional content of secondary structure X, X = α, β, T, and U; δX, RMS deviation between the CD-estimated and the X-ray values of the secondary structure X for a set of proteins, X = α, β, T, and U; rX, correlation between the CD-estimated and the X-ray values of the secondary structure X for a set of proteins, X = α, β, T, and U; δf, RMS deviation between the CD estimates and the crystal structure values of secondary structure fractions for a given protein.
| Introduction |
|---|
The assumption that the CD spectrum of a protein, C(λ), can be expressed as a linear combination of secondary structure component spectra, Bk(λ), given as

$$C(\lambda) = \sum_k f_k B_k(\lambda)$$

where fk is the fraction of secondary structure k, forms the basis for such an analysis. Earlier methods used Bk obtained from polypeptides in specific conformations (Greenfield and Fasman 1969; Brahms and Brahms 1980). Most current methods (Hennessey Jr. and Johnson Jr. 1981; Provencher and Glöckner 1981; Pancoska and Keiderling 1991; Böhm et al. 1992; Andrade et al. 1993; Sreerama and Woody 1993) use Bk derived from a set of CD spectra of proteins with known secondary structure, that is, a reference protein set. Reference protein sets that include a large number of proteins, belonging to different tertiary structure classes and with varying secondary structure contents, have been constructed (Hennessey Jr. and Johnson Jr. 1981; Yang et al. 1986; Pancoska et al. 1995; Sreerama and Woody 2000b; Sreerama et al. 2000, 2001) and are expected to provide a good representation of the spectral and structural variability in proteins. However, the reference protein sets currently available for protein CD analysis include only soluble proteins because of the paucity of membrane protein structures.
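To make the linear model concrete, here is a minimal NumPy sketch (our illustration, not code from CDPro). It assumes the reference spectra are stored column-wise in a matrix (wavelengths × proteins) and the known fractions in a matrix (structures × proteins); it estimates the Bk by ordinary least squares and then recovers the fk of a new spectrum with a second least-squares fit. The function names, the 51-point wavelength grid, and the synthetic data are hypothetical, and the programs in CDPro add variable selection, constraints, or regularization that this sketch omits.

```python
import numpy as np

def basis_spectra(ref_spectra: np.ndarray, ref_fractions: np.ndarray) -> np.ndarray:
    """Least-squares basis spectra B (n_wavelengths x n_structures) such that
    ref_spectra ~= B @ ref_fractions, i.e. C(lambda) = sum_k f_k * B_k(lambda)."""
    # Solve ref_fractions.T @ B.T ~= ref_spectra.T; lstsq handles all wavelengths at once.
    b_t, *_ = np.linalg.lstsq(ref_fractions.T, ref_spectra.T, rcond=None)
    return b_t.T

def estimate_fractions(spectrum: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Unconstrained least-squares estimate of the fractions f_k for one spectrum."""
    f, *_ = np.linalg.lstsq(basis, spectrum, rcond=None)
    return f

# Synthetic, noise-free example (hypothetical numbers, not real CD data).
rng = np.random.default_rng(0)
true_basis = rng.normal(size=(51, 4))                  # 51 wavelengths x 4 structure classes
ref_fractions = rng.dirichlet(np.ones(4), size=20).T   # 4 x 20 proteins; each column sums to 1
ref_spectra = true_basis @ ref_fractions               # 51 x 20 reference spectra

B = basis_spectra(ref_spectra, ref_fractions)
unknown = np.array([0.45, 0.20, 0.15, 0.20])           # fractions of a "new" protein
f_est = estimate_fractions(true_basis @ unknown, B)
print(np.round(f_est, 3))                              # recovers the values in `unknown`
```

With noise-free synthetic data the fractions are recovered exactly; real spectra are noisy and reference sets are small, which is why practical methods go beyond this plain least-squares fit.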
Wallace et al. (2003) have recently examined the performance of soluble protein reference sets in analyzing membrane protein CD spectra, using CDPro software (Sreerama and Woody 2000b). Their analysis was performed for eight membrane proteins, and the results for two representative proteins were presented. They concluded that the soluble protein reference sets give inaccurate results for membrane protein CD analysis, which they attributed to differences in spectral characteristics of membrane and soluble proteins, thus necessitating the development of a membrane protein reference set.
A reference set of membrane protein CD spectra, but without any secondary structure information, was developed by Park et al. (1992). This set of CD spectra was used to estimate the transmembrane and peripheral helical content in the corresponding membrane proteins with the convex constraint analysis (CCA; Perczel et al. 1991). Such an analysis of membrane protein CD spectra without the knowledge of secondary structures was possible because CCA extracts the so-called pure component spectra in a data set without requiring any structural information (Perczel et al. 1991). In the CCA method, secondary structure content is estimated by assigning the extracted pure component spectra to specific structures and determining the fractions of each component spectrum in a given protein CD spectrum. Park et al. (1992) were partially successful in the analysis of CD spectra of three membrane proteins for which structures were available.
The methods for protein CD analysis and the availability of membrane protein structures have improved since the publication of Park et al. (1992). The improved CD analysis methods, however, require both the spectra and the secondary structures of the reference proteins (Sreerama and Woody 2000b). By using a subset of the Park et al. (1992) membrane protein data set for which crystal structures are available (13 membrane proteins), we have examined the performance of three popular methods for protein CD analysis and the soluble protein reference sets available in CDPro software (Sreerama and Woody 2000b). Our conclusions differ from those of Park et al. (1992) and Wallace et al. (2003). Both Park et al. (1992) and Wallace et al. (2003) concluded that the soluble protein reference sets are inadequate for the analysis of membrane proteins because of the bias effect of the reference proteins, optical artifacts, different spectral characteristics, etc. We did not find any systematic differences in spectral characteristics between soluble and membrane proteins. We also found that the CD analysis results obtained for membrane proteins with soluble protein reference sets are only slightly inferior to those obtained for soluble proteins. We constructed a membrane protein reference set with this limited set of 13 membrane proteins and examined its performance, both separately and in combination with soluble protein reference sets, by using the CD analysis programs available in CDPro. The performance of the membrane protein reference set was poor, probably because of the limited number of reference proteins. However, the inclusion of membrane proteins in the soluble protein reference sets resulted in improvements for both membrane and soluble proteins.
| Results |
|---|
The secondary structures used in CDPro are from the DSSP assignments (Kabsch and Sander 1983) of crystal structures as adapted by Sreerama et al. (1999). The six secondary structures estimated in CDPro are regular α-helix (αR), distorted α-helix (αD), regular β-sheet (βR), distorted β-sheet (βD), turns (T), and unordered (U). For simplicity and comparison with literature data, we have summarized the results from the CD analysis for four secondary structures: α-helix (α), β-sheet (β), turns, and unordered. (Results for individual membrane proteins are provided in Supplemental Material.) The fractions of α and β were obtained by adding the corresponding regular and distorted fractions, for example, fα = fαR + fαD. The performance of the analysis is measured by performance indices: root mean square (RMS) deviations (δ) and correlation coefficients (r) between the crystal structure and the CD-predicted values. The performance indices are given for each secondary structure separately (e.g., δα and rα for α-helix) and for all four secondary structures collectively (δ and r, representing overall performance).
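The bookkeeping just described (summing regular and distorted fractions, then computing δ and r over a set of proteins) can be sketched as follows. The six-column layout, the function names, and the choice to pool all fractions when computing the overall δ and r are our assumptions for illustration; CDPro's own output format is not reproduced here.

```python
import numpy as np

# Assumed column order for the six CDPro classes (hypothetical layout):
# alpha_R, alpha_D, beta_R, beta_D, turns (T), unordered (U)
def collapse_to_four(f6) -> np.ndarray:
    """Return (alpha, beta, T, U) per protein, with alpha = aR + aD and beta = bR + bD."""
    f6 = np.asarray(f6, dtype=float)
    return np.column_stack([f6[:, 0] + f6[:, 1], f6[:, 2] + f6[:, 3], f6[:, 4], f6[:, 5]])

def performance_indices(f_cd: np.ndarray, f_xray: np.ndarray):
    """delta_X (RMS deviation) and r_X (Pearson correlation) per structure over a
    set of proteins (rows), plus overall delta and r from all fractions pooled."""
    diff = f_cd - f_xray
    delta_x = np.sqrt(np.mean(diff ** 2, axis=0))
    r_x = np.array([np.corrcoef(f_cd[:, k], f_xray[:, k])[0, 1]
                    for k in range(f_cd.shape[1])])
    delta = np.sqrt(np.mean(diff ** 2))
    r = np.corrcoef(f_cd.ravel(), f_xray.ravel())[0, 1]
    return delta_x, r_x, delta, r

# Hypothetical fractions for three proteins (rows), not data from this study.
f_xray = collapse_to_four([[0.50, 0.10, 0.05, 0.05, 0.10, 0.20],
                           [0.10, 0.05, 0.25, 0.10, 0.20, 0.30],
                           [0.30, 0.10, 0.15, 0.05, 0.15, 0.25]])
f_cd = f_xray.copy()
f_cd[0] += [-0.10, 0.05, 0.05, 0.00]   # CD error on the first protein only
delta_x, r_x, delta, r = performance_indices(f_cd, f_xray)
print(np.round(delta_x, 3), round(float(delta), 3))
```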
Analysis of membrane proteins with soluble protein reference sets
The CD spectra of 13 membrane proteins included in this study are shown in Figure 1. These spectra were taken (with permission from The Protein Society) from a larger set of 30 CD spectra of membrane proteins measured in the laboratory of Dr. G.D. Fasman (Brandeis University, Waltham, MA), using samples provided by the leading laboratories working on these membrane proteins (Park et al. 1992). We selected these 13 CD spectra because of the availability of the corresponding membrane protein crystal structures in the Protein Data Bank (PDB; Berman et al. 2000). The spectra are identified in the figure by the PDB code of the crystal structure of the membrane protein. The secondary structure fractions for the 13 membrane proteins, assigned by DSSP (Kabsch and Sander 1983), are given in Table 1. Of the 13 membrane proteins, nine have moderate to high α-helical content (αR + αD), and the other four have high β-sheet content (βR + βD).
α and β fractions showed marked improvements with the increase in NREF from 29 to 43, although results from SP29 were not obtained with the full wavelength range available in the reference set because of the smaller wavelength range of the membrane protein CD data. When we considered reference sets with NREF from 37 to 48, the performance indices for T and U fractions were comparable to those of soluble proteins and showed, in general, smaller variations with the choice of reference set. The performance indices for α and β fractions were poorer and showed slightly larger variations. Overall performance indices obtained from SP43 and SP48 were similar. These performance indices obtained for membrane proteins compare favorably with those obtained for soluble proteins.
Analysis of membrane proteins with reference sets, including membrane proteins
With the relative success of soluble protein reference sets in analyzing membrane proteins and the availability of both CD spectra and crystal structures for a reasonable number of membrane proteins, we took the next logical step of including membrane proteins in CD analysis. We constructed a membrane protein reference set that includes the 13 membrane protein spectra and the corresponding secondary structures given in Figure 1 and Table 1, respectively. This reference set is referred to as MP13. We also combined the membrane protein data with those of soluble proteins to construct soluble + membrane protein reference sets. The wavelength range of the membrane protein CD spectra allowed us to choose SP37 and SP43 for combining with membrane proteins, and the combined soluble + membrane protein reference sets are referred to as SMP followed by the number of proteins in the reference set. The expansion of reference sets SP37 and SP43 by including five CD spectra of denatured proteins had mixed effects on the performance of membrane protein CD analysis by different methods (performance worsened with CDSSTR, improved with SELCON3, and remained the same with CONTIN/LL), so combining SP42 and SP48 with MP13 was not pursued. The two soluble + membrane protein reference sets constructed represent the effects of the expanded wavelength range (185 to 240 nm instead of 190 to 240 nm) and of the increased number of proteins (56 instead of 50) on the analysis.
The results from the analysis of membrane proteins with three reference protein sets that include membrane proteins, and three programs from CDPro, are summarized in Table 3. The three reference protein sets are identified as MP13, SMP50, and SMP56. The results are obtained from cross-validation analysis, in which the membrane protein analyzed was removed from the reference set and was analyzed with the remaining reference proteins. Our results are compared with those from the CCA method obtained with a 30-membrane-protein reference set (MP30; Park et al. 1992); for this comparison, we extracted the results for these 13 membrane proteins and calculated the performance indices.
Although δ and r are both important in determining the performance of a given method, with low δ values and high r values indicating good performance, the value of r can be skewed by a consistent over- or underprediction of a structure. This is clearly the case here, where high correlation coefficients (>0.85) coupled with large values of δ (~0.13) are observed as a result of consistent underprediction of the predominant secondary structure (Supplemental Material). In such situations, δ gives a better measure of performance.
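A small numerical illustration (hypothetical numbers, not results from this study) shows how a consistent underprediction leaves r near 1 while δ remains large:

```python
import numpy as np

# Hypothetical alpha-helix fractions for five proteins.
f_xray = np.array([0.30, 0.45, 0.60, 0.70, 0.80])
f_cd = 0.8 * f_xray          # every CD estimate is 20% too low

delta = np.sqrt(np.mean((f_cd - f_xray) ** 2))   # RMS deviation
r = np.corrcoef(f_cd, f_xray)[0, 1]              # Pearson correlation
print(f"r = {r:.2f}, delta = {delta:.3f}")       # r = 1.00, delta = 0.119
```

Because the Pearson r is unchanged by a proportional scaling of the estimates, it cannot detect this kind of systematic bias, whereas δ does.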
Among the three programs of CDPro, the performance of the MP13 reference set decreased in the order SELCON3 (δ = 0.06), CONTIN/LL (δ = 0.09), and CDSSTR (δ = 0.06, with results for only nine membrane proteins). A careful comparison of results from the individual methods (provided in Supplemental Material) indicated that the differences in the performance of the three methods originate in both the number of reference proteins and the algorithms followed in these methods. CONTIN/LL (Provencher and Glöckner 1981) uses variable weighting of reference spectra and constrains the sum of secondary structures to unity in fitting the analyzed CD spectrum. In contrast, SELCON3 (Sreerama and Woody 1993) and CDSSTR (Johnson Jr. 1999) do not use any constraints but differ in the implementation of variable selection (Manavalan and Johnson Jr. 1987). SELCON3 uses a locally linearized version (van Stokkum et al. 1990) of variable selection, whereas CDSSTR uses a randomly selected minimal basis (Dalmas and Bannister 1995). The small number of proteins in the membrane protein reference set, 13, gave only 1287 combinations of eight reference proteins in the CDSSTR method (Johnson Jr. 1999), which was not enough to obtain any solution for four spectra (1jb0, 1nkz, 2por, and 1af6). The low information content of the membrane protein reference set was also responsible for poor solutions for three membrane proteins (1jb0, 1qhj, and 2por) from CONTIN/LL (Supplemental Material); a solution was considered poor if the RMS deviation between the CD-predicted and crystal structure values of secondary structures for a given membrane protein (δf) was >0.10. The fact that the performance indices obtained with soluble reference proteins were better than those obtained with membrane proteins alone indicates the lack of sufficient information for CD analysis in MP13.
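The combinatorial limit mentioned above is easy to check: 1287 is the number of ways to choose 8 proteins from 13. The snippet below only counts subsets with Python's standard library; the comparison with a 43-protein set is our illustration, and the way CDSSTR samples and screens these subsets is not reproduced here.

```python
from itertools import combinations
from math import comb

SUBSET_SIZE = 8

# Number of distinct 8-protein subsets that can be drawn from a reference set.
n_mp13 = comb(13, SUBSET_SIZE)   # 13 membrane proteins (MP13)
n_sp43 = comb(43, SUBSET_SIZE)   # for comparison, a 43-protein set such as SP43
print(n_mp13, n_sp43)            # 1287 145008513

# Explicit enumeration of the MP13 subsets (indices 0..12 stand for the proteins).
subsets = list(combinations(range(13), SUBSET_SIZE))
assert len(subsets) == n_mp13
```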
The increased information content provided by combining soluble and membrane proteins leads to improvements in membrane protein CD analysis. In general, the soluble + membrane protein reference sets performed better than either the soluble or the membrane protein reference sets alone. The overall RMS error, ~7% to 10% with the soluble or membrane protein reference sets alone, was reduced to ~7% with the combined reference sets. In general, the performance indices for α and β fractions improved with combined reference protein sets. The only exception was the performance for the β fraction from SELCON3, which showed an increase in δβ from 0.07 (MP13) to 0.08 (SMP50 and SMP56).
With the soluble + membrane protein reference sets, increasing the wavelength range of the analyzed CD spectra improved the performance of the analysis. In general, performance indices for all secondary structures from SMP50 (wavelength range, 185 to 240 nm) were either better than or comparable to those from SMP56 (wavelength range, 190 to 240 nm). The β-structure was an exception, for which the larger reference set improved the performance slightly. This is in contrast to the results obtained from soluble protein reference sets, for which increasing the number of reference proteins resulted in better performance. The spectral information content of SMP50 is increased by the inclusion of the MP13 membrane proteins in SP37. Further addition of soluble proteins with a reduction in the wavelength range of the reference set (SMP56) leads to poorer analysis, which indicates a decrease in information content. For the analysis of membrane proteins, the benefits of the additional spectral information from 185 to 190 nm in SMP50 outweigh the benefits of the additional proteins in SMP56. The slightly poorer performance for β-structure with SMP50, in comparison with that from SMP56, indicates an under-representation of β-structures among the membrane proteins in the reference set.
Examination of results for specific membrane proteins of the MP13 reference set (provided in Supplemental Material) indicates that a majority of them are analyzed well, with similar solutions from the three methods. Three proteins, photosystem I, phosphoporin, and porin (R. capsulatus), posed some problems. Photosystem I was analyzed well by the SELCON3 method, but not by the other two. The CD spectrum of photosystem I (Fig. 1) has a strong inflection at ~225 nm, which affected its analysis with both CONTIN/LL and CDSSTR, as both methods use the similarity between the back-calculated and experimental spectra: CONTIN/LL uses it as a constraint and CDSSTR uses it as a selection rule, whereas in SELCON3 this selection rule is relaxed. Porin (R. capsulatus) was analyzed poorly with all methods, and good analysis of phosphoporin required the larger wavelength range of 185 to 240 nm (SMP50). As Park et al. (1992) observed, the β-strands in membrane proteins are generally longer than those in soluble proteins, and the problems in the analyses of porins are probably due to under-representation of β-rich membrane proteins in the soluble + membrane protein reference sets.
The three methods generally gave similar solutions for a given membrane protein (Supplemental Material), which indicates a reliable analysis. Judging by the RMS difference with the crystal structure, no single reference protein set gave the best solution for all proteins; the best solutions for the 13 membrane proteins were spread among different methods and different reference sets. We averaged solutions from SP37, SP43, SMP50, and SMP56, and the performance indices for the averaged solution are also given in Table 3. In the absence of structural information, averaging solutions from different methods and different reference sets provides a reliable estimate.
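The averaging step can be sketched as below. The fractions and the method/reference-set labels in the comments are hypothetical placeholders, and we use the sample standard deviation as the spread between solutions; the text does not specify which estimator was used.

```python
from statistics import mean, stdev

def average_solution(solutions):
    """Average fractions over several (method, reference set) solutions.
    Returns {structure: (mean, sample standard deviation)}."""
    structures = solutions[0].keys()
    return {s: (mean([sol[s] for sol in solutions]), stdev([sol[s] for sol in solutions]))
            for s in structures}

# Hypothetical solutions for one protein from four method/reference-set combinations.
solutions = [
    {"alpha": 0.58, "beta": 0.10, "turns": 0.12, "unordered": 0.20},  # e.g. SELCON3, SP37
    {"alpha": 0.62, "beta": 0.06, "turns": 0.12, "unordered": 0.20},  # e.g. SELCON3, SP43
    {"alpha": 0.55, "beta": 0.11, "turns": 0.13, "unordered": 0.21},  # e.g. CDSSTR, SMP50
    {"alpha": 0.61, "beta": 0.05, "turns": 0.13, "unordered": 0.21},  # e.g. CONTIN/LL, SMP56
]
avg = average_solution(solutions)
for s, (m, sd) in avg.items():
    print(f"{s}: {m:.2f} +/- {sd:.2f}")
```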
Park et al. (1992) also provided the CD spectrum of F0F1 ATPase, which is shown in Figure 2. We have analyzed this CD spectrum by using both soluble and soluble + membrane protein reference sets, and the results are presented in Table 4. The results from three programs and four reference sets were averaged to obtain a CD prediction of 59% α-helix and 8% β-sheet for F0F1 ATPase. The relative uncertainty in the β-sheet fraction, given by the standard deviation, was larger than that for the α-helix fraction. F0F1 ATPase is a large multimeric protein of Mr ~540 kD (Boyer 1997) and has both soluble (F1) and membrane-bound (F0) components. The soluble component, F1 ATPase (Mr ~379 kD), consists of three α-chains, three β-chains, and one chain each of γ, δ, and ε. The membrane-bound component, F0 ATPase (Mr ~160 kD), consists of one a-chain, two b-chains, and 10 to 12 c-chains. The crystal structure of F1 ATPase (PDB code, 1bmf; subunits α3, β3, and γ; Mr ~346 kD; Abrahams et al. 1994) is available, and it has 42% α-helix and 17% β-sheet. By combining the CD estimate of the α-helix fraction for F0F1 ATPase (fαCD = 0.59) with the crystal structure value for F1 ATPase (fαEXP = 0.42), we estimate the α-helix content of the remaining subunits of F0F1 ATPase to be ~0.90. This estimate compares quite well with the NMR structures of the c subunit (Girvin et al. 1998) and the ac12 complex of F0 ATPase (Rastogi and Girvin 1999), which indicate a very high α-helix content (fαEXP = 0.85 to 0.90). A similar exercise with the β-sheet fraction, however, failed to give meaningful results because the average β-sheet content of F0F1 ATPase estimated by CD analysis (fβCD = 0.08; Table 4) is quite low in comparison with that of the α3β3γ portion of F1 ATPase in the crystal structure (fβEXP = 0.17). This is probably due to the underestimation of fβ by CD and the larger uncertainty in the value of fβCD (fβCD = 0.08 ± 0.02; range, 0.05 to 0.11). By using the value of 0.11 for fβCD of F0F1 ATPase and 0.17 for fβEXP of F1 ATPase, we estimate the β-sheet fraction in F0 ATPase to be essentially zero.
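The subunit arithmetic above can be reproduced with a mass-weighted balance. We assume (the text does not state this explicitly) that each secondary structure fraction scales with molecular mass, that is, that all subunits have the same mean residue mass, so that f_whole·M_whole = f_part·M_part + f_rest·(M_whole − M_part). The function name is hypothetical.

```python
def remaining_fraction(f_whole: float, m_whole: float, f_part: float, m_part: float) -> float:
    """Fraction of a structure in the part of a complex not covered by f_part,
    assuming fractions are weighted by mass (equal mean residue mass)."""
    return (f_whole * m_whole - f_part * m_part) / (m_whole - m_part)

M_F0F1 = 540.0     # kD, whole F0F1 ATPase (value quoted in the text)
M_F1_XTAL = 346.0  # kD, alpha3 beta3 gamma portion in the 1bmf crystal structure

f_alpha_rest = remaining_fraction(0.59, M_F0F1, 0.42, M_F1_XTAL)
f_beta_rest = remaining_fraction(0.11, M_F0F1, 0.17, M_F1_XTAL)
print(round(f_alpha_rest, 2), round(f_beta_rest, 2))  # 0.89 0.0
```

The α-helix result (≈0.89) matches the ~0.90 quoted above, and the β-sheet result (≈0.003) corresponds to the essentially zero estimate.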
In general, the inclusion of membrane proteins led to slightly improved analysis of soluble proteins. The performance indices for α, T, and U fractions showed smaller improvements, and those for β showed slightly larger improvements, with a few exceptions. The extent of the improvements, however, depended on the method of analysis. Overall, SMP56 performed the best, with δα, 0.07 to 0.09; δβ, ~0.10; and δ, 0.08 to 0.09. It showed improvements over the corresponding soluble protein reference set, SP43, for all three methods of analysis. SMP50 also showed improvements over the corresponding soluble protein reference set, SP37, for CONTIN/LL and CDSSTR. For SELCON3, the results from SMP50 were slightly worse than those from SP37.
The performance indices for soluble proteins from the two soluble + membrane protein reference sets, SMP50 and SMP56, were comparable. The larger set showed a slightly larger improvement for the β fraction than for the α fraction, which was offset by a slight worsening for the T fraction. The overall performance was similar for CONTIN/LL and CDSSTR, whereas SELCON3 showed a slight improvement with SMP56, which may compensate for its poorer performance with SMP50. Improvements in soluble protein analysis obtained by adding MP13 to both SP37 and SP43 indicate that the membrane proteins increase the information content of both SMP50 and SMP56.
| Discussion |
|---|
By using the available spectral and structural data for 13 membrane proteins, we have examined the performance of existing soluble protein reference sets, a newly constructed membrane protein reference set, and combined soluble + membrane protein reference sets for analyzing membrane protein CD spectra. We have also examined the performance of combined soluble + membrane protein reference sets for the analysis of soluble protein CD spectra. Although the existing soluble protein reference sets performed reasonably well in analyzing membrane proteins, the membrane protein reference set performed poorly. The poor performance of the membrane protein reference set was probably due to its low information content, because the number of reference proteins was small. The inclusion of membrane proteins in the soluble protein reference sets increased the spectral information content and improved the performance for both membrane and soluble proteins.
Our results for the analysis of membrane protein CD spectra with both membrane and soluble protein reference sets are better than those of Park et al. (1992). Park et al. used different CD analysis methods for soluble and membrane protein reference sets because of the lack of structural information for membrane proteins. They used the CCA method (Perczel et al. 1991) with the membrane protein reference set, which lacked secondary structure information, and the method of Chang et al. (1978) and the variable selection method (Manavalan and Johnson Jr. 1987) with soluble protein reference sets, for which secondary structure information was available. We obtain improvements in the performance of the membrane protein reference set because we use both secondary structure fractions and variable selection of reference proteins in our analysis, neither of which is included in the CCA method (Perczel et al. 1991). The improvements in the analysis of membrane proteins with soluble protein reference sets over those of Park et al. (1992) are due to advances in protein CD analysis: we use the latest CD analysis methods, which employ better algorithms and benefit from the increased information content made possible by the inclusion of a large number of soluble proteins.
Our conclusions are different from those of Wallace et al. (2003), who also used CDPro software in their analysis of eight membrane proteins and reported results for two representative CD spectra. Wallace et al. concluded that the existing soluble protein reference sets give inaccurate results for membrane protein CD analysis. They attributed the poor performance to spectral differences, such as wavelength shifts and intensity differences, between soluble and membrane proteins. They used two parameters in reaching their conclusions. The first is the normalized RMS deviation (NRMSD; Brahms and Brahms 1980) between the back-calculated (θCalc) and experimental (θEXP) CD spectra, calculated as

$$\mathrm{NRMSD} = \left[\frac{\sum_{\lambda}\left(\theta_{\mathrm{EXP}}(\lambda) - \theta_{\mathrm{Calc}}(\lambda)\right)^2}{\sum_{\lambda}\theta_{\mathrm{EXP}}(\lambda)^2}\right]^{1/2}$$

The second is the absolute difference between the secondary structure fractions estimated by CD (fCD) and from crystal structures (fEXP), given as R. Two measures of R were used by Wallace et al. (2003): Rav = Σ|fEXP - fCD| (footnote 1), which gives the total error in the CD-predicted values, and RP = |fEXP - fCD|, which gives the error in the prediction of the predominant secondary structure (either α or β). However, their reliance on NRMSD as a measure of accuracy and the manner in which they determined the secondary structure fractions from crystal structures for comparison with CD estimates may lead to errors.
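The error measures defined above can be written out as follows. This is our sketch: the θ arrays would be ellipticities on a common wavelength grid, the dictionary keys and function names are hypothetical, and we read Rav as a sum over the α, β, and turn fractions (footnote 1), consistent with its description as a total error.

```python
import numpy as np

def nrmsd(theta_exp, theta_calc) -> float:
    """Normalized RMS deviation between experimental and back-calculated spectra."""
    theta_exp = np.asarray(theta_exp, dtype=float)
    theta_calc = np.asarray(theta_calc, dtype=float)
    return float(np.sqrt(np.sum((theta_exp - theta_calc) ** 2) / np.sum(theta_exp ** 2)))

def r_av(f_exp: dict, f_cd: dict, classes=("alpha", "beta", "turns")) -> float:
    """Total absolute error summed over the listed structure classes."""
    return sum(abs(f_exp[k] - f_cd[k]) for k in classes)

def r_p(f_exp: dict, f_cd: dict, predominant: str) -> float:
    """Absolute error for the predominant structure ("alpha" or "beta")."""
    return abs(f_exp[predominant] - f_cd[predominant])

# Hypothetical values for one alpha-rich protein.
f_exp = {"alpha": 0.52, "beta": 0.10, "turns": 0.12}
f_cd = {"alpha": 0.60, "beta": 0.05, "turns": 0.15}
print(round(r_av(f_exp, f_cd), 2), round(r_p(f_exp, f_cd, "alpha"), 2))
print(nrmsd([3.0, 4.0], [3.0, 4.0]), nrmsd([3.0, 4.0], [0.0, 0.0]))
```

NRMSD measures only how well a spectrum is fitted; it contains no direct information about the accuracy of the estimated fractions.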
CDPro provides seven reference protein sets that differ either in the number of reference proteins or in the wavelength range of their CD spectra (Sreerama and Woody 2000b), all of which were used by Wallace et al. (2003). CDPro also uses three methods for assigning secondary structure fractions to crystal structures (Kabsch and Sander 1983; Sreerama and Woody 1994a; King and Johnson Jr. 1999), of which the first is used in five reference sets and does not determine the poly(Pro)II-type structure fraction. Wallace et al. (2003) used an average from five different assignments of secondary structures as fEXP in calculating Rav and RP. The CD estimates of secondary structure fractions correspond to the particular definition followed in constructing the reference set used in a given analysis, so they should be compared with secondary structure fractions obtained with the same assignment method used to construct that reference protein set. Fortunately, the average α and β fractions given by Wallace et al. (2003) are similar to the DSSP values, which are used in six of the seven reference protein sets in CDPro. This allows comparison of the CD estimate for the predominant secondary structure. Wallace et al. (2003), however, give detailed results for only two proteins. When we consider the results from the reference sets SP37 and SP43 (db = 3 or 4; Table 2b of Wallace et al. [2003]) for the predominantly β-sheet membrane protein ferric enterobactin receptor, the β fraction is determined very accurately (RP = 0.0; SELCON3, SP37). For the predominantly α-helical membrane protein mechanosensitive channel from M. tuberculosis, CD analyses predict a higher α-helical content (RP = 0.28 to 0.38), which is consistent with the intensities of the CD bands (Fig. 1 of Wallace et al. [2003]); the CD spectrum is comparable to that of bacteriorhodopsin (1qhj, Fig. 1), which has ~75% α-helical content. This apparent discrepancy between the α-helical contents from CD and the crystal structure may be due to differences between the solution and solid-state structures.
We have previously used the RMS difference between the CD estimates and the crystal structure values of secondary structure fractions, δf (Sreerama and Woody 1994b), as a measure of error in the results for specific proteins:

$$\delta f = \left[\frac{1}{n}\sum_{k=1}^{n}\left(f_k^{\mathrm{EXP}} - f_k^{\mathrm{CD}}\right)^2\right]^{1/2}$$

where n is the number of secondary structure classes. Both δf and Rav give a measure of collective error for specific proteins, but Rav seems to accentuate the error. We do not see any advantage of using Rav over δf. RP, on the other hand, could be useful in testing the performance of a given method, although it is of questionable value for many soluble proteins that have no dominant secondary structure. Further, one needs to be careful in drawing conclusions based on RP alone. The RP values reported by Wallace et al. (2003) for the α-rich membrane protein appear to be too large for some reference protein sets. For example, Wallace et al. (2003) report an RP value of 0.48 with SELCON3 (Table 2b, db = 2), and the average value obtained from the crystal structure is 0.52 (Table 1b of Wallace et al. [2003]). This implies a CD estimate (fαCD = fαEXP ± RP) of either 1.00 or 0.04, both of which are improbable. Wallace et al. obtained poor results for db = 2 with the other two programs as well, which indicates a failed analysis. We have provided the δf values for the MP13 set in Electronic Supplemental Material.
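The contrast between δf and a summed absolute error, and the back-calculation of the CD estimate implied by an RP value, can be checked numerically. The four-class fractions below are hypothetical; the 0.52 and 0.48 values are those quoted above from Wallace et al. (2003), and the function names are our own.

```python
import math

def delta_f(f_exp, f_cd) -> float:
    """RMS difference over the secondary structure classes of one protein."""
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(f_exp, f_cd)) / len(f_exp))

# Hypothetical (alpha, beta, turns, unordered) fractions for one protein.
f_exp = [0.60, 0.10, 0.10, 0.20]
f_cd = [0.50, 0.10, 0.15, 0.25]
df = delta_f(f_exp, f_cd)
summed = sum(abs(e - c) for e, c in zip(f_exp[:3], f_cd[:3]))  # alpha, beta, turns only
print(f"delta_f = {df:.3f}, summed |error| = {summed:.2f}, poor = {df > 0.10}")

def implied_cd_estimates(f_exp_dominant: float, r_p: float):
    """Both CD estimates consistent with R_P = |f_EXP - f_CD|."""
    return round(f_exp_dominant + r_p, 2), round(f_exp_dominant - r_p, 2)

print(implied_cd_estimates(0.52, 0.48))   # (1.0, 0.04)
```

Here δf ≈ 0.061, whereas the summed absolute error over three classes is 0.15, which illustrates why a summed measure appears to accentuate the error.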
The back-calculated spectra from the three CD analysis methods provided in CDPro differ qualitatively because the methods follow different algorithms. CONTIN/LL always gives the best agreement with the experimental spectrum because its algorithm minimizes the error between the fitted and experimental spectra (Provencher and Glöckner 1981), and it is therefore expected to give the lowest NRMSD. Both SELCON3 and CDSSTR use the singular value decomposition algorithm (Forsythe et al. 1977) and ignore singular values that correspond to the noise in the CD data set (Hennessey Jr. and Johnson Jr. 1981). The number of singular values included is varied in SELCON3 (Sreerama and Woody 1993), whereas in CDSSTR it is always five (Johnson Jr. 1999), which affects how much noise is excluded from the analysis. The minimal basis (Dalmas and Bannister 1995) and the locally linearized (van Stokkum et al. 1990) versions of variable selection are implemented in CDSSTR and SELCON3, respectively. Because of these algorithmic differences, the errors between the experimental and back-calculated spectra from CDSSTR are generally smaller than those from SELCON3.
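The role of the number of retained singular values can be illustrated with a simplified SVD-based estimator in the spirit of Hennessey Jr. and Johnson Jr. (1981). This is a sketch, not the SELCON3 or CDSSTR code: it omits variable selection and self-consistency, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def svd_fractions(ref_spectra, ref_fractions, spectrum, n_sv):
    """Estimate fractions as F @ pinv_k(C) @ spectrum, where pinv_k(C) is the pseudo-inverse
    of the reference spectra C built from the n_sv largest singular values."""
    u, s, vt = np.linalg.svd(ref_spectra, full_matrices=False)  # s is sorted descending
    # Truncated pseudo-inverse: singular values beyond n_sv are treated as noise and dropped.
    c_pinv = vt[:n_sv].T @ np.diag(1.0 / s[:n_sv]) @ u[:, :n_sv].T
    return ref_fractions @ c_pinv @ spectrum

# Synthetic, noise-free reference set (hypothetical data) with 4 structure classes.
rng = np.random.default_rng(1)
basis = rng.normal(size=(51, 4))
ref_fractions = rng.dirichlet(np.ones(4), size=20).T   # 4 x 20
ref_spectra = basis @ ref_fractions                    # 51 x 20, rank 4
unknown = np.array([0.30, 0.35, 0.15, 0.20])
spectrum = basis @ unknown

est4 = svd_fractions(ref_spectra, ref_fractions, spectrum, n_sv=4)  # recovers `unknown`
est3 = svd_fractions(ref_spectra, ref_fractions, spectrum, n_sv=3)  # too few: signal discarded
print(np.round(est4, 3), np.round(est3, 3))
```

With noisy data, keeping more singular values fits more of the noise, while keeping fewer discards signal; SELCON3 varies this number, whereas CDSSTR fixes it at five.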
For the 13 membrane protein CD spectra analyzed with SMP50, we obtained NRMSD values in the ranges (averages in parentheses) 0.08 to 0.51 (0.22), 0.04 to 0.12 (0.07), and 0.02 to 0.08 (0.04) from the SELCON3, CDSSTR, and CONTIN/LL programs, respectively; the corresponding values for soluble proteins from SMP50 were 0.13 to 1.0 (0.24), 0.01 to 0.24 (0.08), and 0.01 to 0.14 (0.03). Moreover, the NRMSD values were uncorrelated with the error in the secondary structure prediction. Wallace et al. (2003) obtained NRMSD values of 0.002 to 0.050 and 0.019 to 0.193 for the predominantly α-helical and predominantly β-sheet membrane proteins, respectively, with the SP37 and SP43 reference protein sets (Table 2b, Wallace et al. 2003). Given the differences among the three methods in the nature of the back-calculated spectra, it is difficult to draw conclusions about the accuracy of the analysis from NRMSD values.
We did not find evidence to support the wavelength shifts between soluble and membrane protein CD spectra suggested by Wallace et al. (2003). The variation in spectral peak positions observed for the 30 membrane proteins of Park et al. (1992) was similar to that observed in the soluble protein reference set SP43. The position of the positive ππ* band varied between 192 and 196 nm in the CD spectra of predominantly α-rich membrane proteins, whereas the corresponding range for α-rich soluble proteins was 192 to 195 nm. The β-rich soluble proteins showed the largest variation in the position of the positive ππ* band (185 to 197 nm), and the variation in proteins containing both α-helix and β-sheet was intermediate (188 to 195 nm). The number of β-rich membrane proteins was too small for a comparison with the soluble proteins.
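Comparing band positions as above amounts to locating the wavelength of maximum positive CD within a short-wavelength window. The following is a minimal sketch, assuming each spectrum is given as parallel lists of wavelengths (nm) and CD values; the function name and the default 185 to 200 nm window are illustrative choices and not the procedure used in this study.

```python
def positive_band_position(wavelengths, cd, window=(185.0, 200.0)):
    """Return the wavelength (nm) of the most positive CD value in `window`.

    Returns None if no positive value falls inside the window. The default
    window is a hypothetical choice spanning the peak positions in the text.
    Ties keep the first (shortest-wavelength) maximum encountered.
    """
    lo, hi = window
    best_wl, best_val = None, 0.0
    for wl, val in zip(wavelengths, cd):
        if lo <= wl <= hi and val > best_val:
            best_wl, best_val = wl, val
    return best_wl


# Hypothetical spectrum of an alpha-rich protein
wl = [185, 188, 190, 192, 194, 196, 198, 200]
cd = [4.0, 12.0, 18.0, 21.0, 19.0, 11.0, 5.0, 1.0]
print(positive_band_position(wl, cd))  # 192
```

With this approach the reported band position can be no finer than the sampling interval of the spectrum, which is worth keeping in mind when comparing ranges that differ by only 1 nm.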
It is important for the reference set to represent the spectral and structural variation of proteins well. The success of membrane protein CD analysis with soluble protein reference sets indicates that these sets contain good spectral and structural variation, which is lacking in the small membrane protein reference set. The improvements obtained in the analysis of both membrane and soluble proteins on adding a small number of membrane proteins indicate that the soluble + membrane protein reference sets have a greater information content. Although the existing soluble protein reference sets perform quite well, the inclusion of membrane proteins should further improve protein CD analysis. The MP13 reference set is dominated by α-rich membrane proteins and is, by any measure, not optimal. There is scope for expanding the membrane protein reference set as new and higher resolution structures and CD spectra of membrane proteins become available.
| Materials and methods |
|---|
α-rich membrane proteins, and the last four (2omf through 1pho) are β-rich membrane proteins. The CD spectra of these proteins (Fig. 1
Secondary structure
The secondary structure fractions of the membrane proteins were obtained from the program DSSP (Kabsch and Sander 1983), which uses hydrogen bonding patterns to identify secondary structure elements. The α-helix and β-strand structures were split into regular and distorted classes, with four residues per α-helix and two residues per β-strand considered distorted (Sreerama et al. 1999). For proteins with more than one polypeptide chain in the structure, all chains were considered in the secondary structure assignment. Our grouping of DSSP assignments gave six secondary structure classes: regular α-helix, αR; distorted α-helix, αD; regular β-strand, βR; distorted β-strand, βD; turns, T; and unordered, U. The secondary structure fractions are given in Table 1. The secondary structure fractions used in the reference protein sets were determined in an identical manner (Sreerama and Woody 2000b).
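The grouping of DSSP assignments into six classes can be sketched as follows. This is a minimal illustration under stated assumptions, not the CDPro or DSSP code: it takes one DSSP summary string per chain, treats only `H` as α-helix, `E` as β-strand, and `T` as turn, and counts every other code as unordered. It marks two residues at each end of a helix and one at each end of a strand as distorted, which yields the four and two distorted residues per segment stated above. The handling of the other DSSP codes (such as `G`, `I`, `B`, `S`) and the function name are hypothetical choices.

```python
from itertools import groupby

# Hypothetical code mapping; all other DSSP codes count as unordered (U).
HELIX_CODES = {"H"}
STRAND_CODES = {"E"}
TURN_CODES = {"T"}


def six_class_fractions(chains, helix_end=2, strand_end=1):
    """Six-class fractions (aR, aD, bR, bD, T, U) from DSSP strings, one per chain.

    helix_end / strand_end: residues at EACH end of a segment counted as
    distorted (assumed 2 and 1, giving 4 per helix and 2 per strand).
    """
    counts = {"aR": 0, "aD": 0, "bR": 0, "bD": 0, "T": 0, "U": 0}
    total = 0
    for chain in chains:  # all chains contribute to the fractions
        for code, run in groupby(chain):  # runs of identical DSSP codes
            length = sum(1 for _ in run)
            total += length
            if code in HELIX_CODES:
                distorted = min(length, 2 * helix_end)
                counts["aD"] += distorted
                counts["aR"] += length - distorted
            elif code in STRAND_CODES:
                distorted = min(length, 2 * strand_end)
                counts["bD"] += distorted
                counts["bR"] += length - distorted
            elif code in TURN_CODES:
                counts["T"] += length
            else:
                counts["U"] += length
    if total == 0:
        raise ValueError("no residues supplied")
    return {cls: n / total for cls, n in counts.items()}


print(six_class_fractions(["HHHHHH", "EEEETT--"]))
```

Splitting by chain before grouping keeps a helix at the end of one chain from being merged with a helix at the start of the next.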
CD analysis
The CD spectra were analyzed with the CDPro software (Sreerama and Woody 2000b), which includes three methods for analyzing protein CD spectra, implemented in the programs CDSSTR (Johnson Jr. 1999), SELCON3 (Sreerama and Woody 1993; Sreerama et al. 1999), and CONTIN/LL (Provencher and Glöckner 1981; Sreerama and Woody 2000b). These methods differ in the mathematical procedure, in the implementation of variable selection (Manavalan and Johnson Jr. 1987), or in both, and they have been described elsewhere (Sreerama and Woody 2000b). Similar results from all three methods provide a measure of the reliability of the analysis. CDPro also provides several reference protein sets in which the number of proteins is inversely related to the wavelength range covered; these sets were used in our analysis.
The performance of the analysis was characterized by the RMS deviation ($\delta$) and the correlation coefficient ($r$) between the X-ray and CD estimates of the fraction of each secondary structure. These are denoted by $\delta_k$ and $r_k$, where k is one of the secondary structure types considered. The CD results for six secondary structures were converted to four secondary structures (α, β, T, and U) by combining the regular and distorted fractions of α and β. The overall performance of the analysis for a given set of secondary structure fractions was determined by considering all secondary structure fractions collectively; these overall measures are denoted by $\delta$ and $r$.
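The conversion from six to four classes is a simple sum of the regular and distorted fractions. The following sketch assumes the six fractions are held in a dictionary with the hypothetical keys `aR`, `aD`, `bR`, `bD`, `T`, and `U`; the function name is likewise illustrative.

```python
def to_four_classes(six):
    """Combine six-class fractions into four classes (alpha, beta, T, U).

    `six` uses hypothetical keys: aR/aD = regular/distorted alpha-helix,
    bR/bD = regular/distorted beta-strand, T = turns, U = unordered.
    """
    return {
        "alpha": six["aR"] + six["aD"],  # regular + distorted helix
        "beta": six["bR"] + six["bD"],   # regular + distorted strand
        "T": six["T"],
        "U": six["U"],
    }


# Hypothetical six-class fractions for one protein
four = to_four_classes({"aR": 0.25, "aD": 0.125, "bR": 0.25,
                        "bD": 0.125, "T": 0.125, "U": 0.125})
print(four)  # {'alpha': 0.375, 'beta': 0.375, 'T': 0.125, 'U': 0.125}
```

The total of the fractions is unchanged by the conversion, so four-class fractions still sum to 1 whenever the six-class fractions do.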
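The RMS deviations and correlation coefficients were calculated by using the following equations:

$$\delta_k = \left[\frac{1}{N}\sum_{i=1}^{N}\left(f_{ik}^{\mathrm{CD}} - f_{ik}^{\mathrm{X}}\right)^{2}\right]^{1/2}$$

and

$$r_k = \frac{N\sum_{i=1}^{N} f_{ik}^{\mathrm{CD}} f_{ik}^{\mathrm{X}} - \sum_{i=1}^{N} f_{ik}^{\mathrm{CD}} \sum_{i=1}^{N} f_{ik}^{\mathrm{X}}}{\left\{\left[N\sum_{i=1}^{N}\left(f_{ik}^{\mathrm{CD}}\right)^{2} - \left(\sum_{i=1}^{N} f_{ik}^{\mathrm{CD}}\right)^{2}\right]\left[N\sum_{i=1}^{N}\left(f_{ik}^{\mathrm{X}}\right)^{2} - \left(\sum_{i=1}^{N} f_{ik}^{\mathrm{X}}\right)^{2}\right]\right\}^{1/2}}$$

where $f_{ik}^{\mathrm{CD}}$ and $f_{ik}^{\mathrm{X}}$ are the CD and X-ray estimates, respectively, of the fraction of secondary structure type k in protein i of the N reference samples.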
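The two performance measures can be computed directly from paired lists of CD and X-ray fractions. The following is a minimal standard-library sketch of the equations for $\delta_k$ and $r_k$; the function names are hypothetical. For the overall $\delta$ and $r$, the sketch pools the fractions of all secondary structure types into single lists before applying the same formulas, which is one reading of "considering all secondary structure fractions collectively".

```python
import math


def rms_deviation(cd, xray):
    """RMS deviation (delta_k) between CD and X-ray fractions for N proteins."""
    if len(cd) != len(xray) or not cd:
        raise ValueError("need equal-length, non-empty lists")
    n = len(cd)
    return math.sqrt(sum((c - x) ** 2 for c, x in zip(cd, xray)) / n)


def correlation(cd, xray):
    """Correlation coefficient r_k, written in the same sum form as the equation."""
    if len(cd) != len(xray) or len(cd) < 2:
        raise ValueError("need equal-length lists with at least two entries")
    n = len(cd)
    s_c, s_x = sum(cd), sum(xray)
    s_cc = sum(c * c for c in cd)
    s_xx = sum(x * x for x in xray)
    s_cx = sum(c * x for c, x in zip(cd, xray))
    denom = math.sqrt((n * s_cc - s_c ** 2) * (n * s_xx - s_x ** 2))
    if denom == 0:
        raise ValueError("r is undefined when either list has zero variance")
    return (n * s_cx - s_c * s_x) / denom


def overall(cd_by_type, xray_by_type):
    """Pool all secondary structure types, then compute (delta, r)."""
    cd_all, xray_all = [], []
    for k in cd_by_type:
        if len(cd_by_type[k]) != len(xray_by_type[k]):
            raise ValueError(f"mismatched lengths for type {k}")
        cd_all.extend(cd_by_type[k])
        xray_all.extend(xray_by_type[k])
    return rms_deviation(cd_all, xray_all), correlation(cd_all, xray_all)


# Hypothetical fractions for three proteins
cd = {"alpha": [0.40, 0.10, 0.25], "beta": [0.10, 0.35, 0.20]}
xray = {"alpha": [0.45, 0.05, 0.30], "beta": [0.05, 0.40, 0.20]}
print(rms_deviation(cd["alpha"], xray["alpha"]))
print(overall(cd, xray))
```

Note that $\delta_k$ and $r_k$ capture different things: a method that is off by a constant offset for every protein has a nonzero $\delta_k$ but can still have $r_k = 1$.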
| Electronic supplemental material |
|---|
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Andrade, M.A., Chacón, P., Merelo, J.J., and Morán, F. 1993. Evaluation of secondary structure of protein from UV circular dichroism spectra using unsupervised learning neural network. Protein Eng. 6: 383–390.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235–242.
Böhm, G., Muhr, R., and Jaenicke, R. 1992. Quantitative analysis of protein far UV circular dichroism spectra by neural networks. Protein Eng. 5: 191–195.
Boyer, P.D. 1997. The ATP synthase: A splendid molecular machine. Annu. Rev. Biochem. 66: 717–749.
Brahms, S. and Brahms, J. 1980. Determination of protein secondary structure in solution by vacuum ultraviolet circular dichroism. J. Mol. Biol. 138: 149–178.
Chang, C.T., Wu, C.-S.C., and Yang, J.T. 1978. Circular dichroism analysis of protein conformation: Inclusion of β-turns. Anal. Biochem. 91: 13–31.
Dalmas, B. and Bannister, W.H. 1995. Prediction of protein secondary structure from circular dichroism spectra: An attempt to solve the problem of the best-fitting reference protein subsets. Anal. Biochem. 225: 39–48.
Forsythe, G.E., Malcolm, M.A., and Moler, C.B. 1977. Computer methods for mathematical computations. Prentice-Hall, Englewood Cliffs, NJ.
Girvin, M.E., Rastogi, V.K., Abildgaard, F., Markley, J.L., and Fillingame, R.H. 1998. Solution structure of the transmembrane H+-transporting subunit c of the F1F0 ATP synthase. Biochemistry 37: 8817–8824.
Greenfield, N. and Fasman, G.D. 1969. Computed circular dichroism spectra for the evaluation of protein conformation. Biochemistry 8: 4108–4116.
Greenfield, N.J. 1996. Methods to estimate the conformation of proteins and polypeptides from circular dichroism data. Anal. Biochem. 235: 1–10.
Hennessey Jr., J.P. and Johnson Jr., W.C. 1981. Information content in the circular dichroism of proteins. Biochemistry 20: 1085–1094.
Johnson Jr., W.C. 1988. Secondary structure of proteins through circular dichroism spectroscopy. Annu. Rev. Biophys. Biophys. Chem. 17: 145–166.
Johnson Jr., W.C. 1999. Analyzing protein circular dichroism spectra for accurate secondary structures. Proteins 35: 307–312.
Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: Pattern recognition of hydrogen bonded and geometric features. Biopolymers 22: 2577–2637.
King, S.M. and Johnson Jr., W.C. 1999. Assigning secondary structure from protein coordinate data. Proteins 35: 313–320.
Manavalan, P. and Johnson Jr., W.C. 1987. Variable selection method improves the prediction of protein secondary structure from circular dichroism. Anal. Biochem. 167: 76–85.
Pancoska, P. and Keiderling, T.A. 1991. Systematic comparison of statistical analysis of electronic and vibrational circular dichroism for secondary structure prediction of selected proteins. Biochemistry 30: 6885–6895.
Pancoska, P., Bitto, E., Janota, V., Urbanova, M., Gupta, V.P., and Keiderling, T.A. 1995. Comparison of and limits of accuracy for statistical analyses of vibrational and electronic circular dichroism spectra in terms of correlations to and predictions of protein secondary structure. Protein Sci. 4: 1384–1401.
Park, K., Perczel, A., and Fasman, G.D. 1992. Differentiation between transmembrane and peripheral helices by the deconvolution of circular dichroism spectra of membrane proteins. Protein Sci. 1: 1032–1049.
Perczel, A., Hollosi, M., Tusnady, G., and Fasman, G.D. 1991. Convex constraint analysis: A natural deconvolution of circular dichroism curves of proteins. Protein Eng. 4: 669–679.
Provencher, S.W. and Glöckner, J. 1981. Estimation of protein secondary structure from circular dichroism. Biochemistry 20: 33–37.
Rastogi, V.K. and Girvin, M.E. 1999. Structural changes linked to proton translocation by subunit c of the ATP synthase. Nature 402: 263–268.
Sreerama, N. and Woody, R.W. 1993. A self-consistent method for the analysis of protein secondary structure from circular dichroism. Anal. Biochem. 209: 32–44.
Sreerama, N. and Woody, R.W. 1994a. Poly(Pro)II helices in globular proteins: Identification and circular dichroic analysis. Biochemistry 33: 10022–10025.
Sreerama, N. and Woody, R.W. 1994b. Protein secondary structure from circular dichroism spectroscopy: Combining variable selection principle and cluster analysis with neural network, ridge regression and self-consistent methods. J. Mol. Biol. 242: 497–507.
Sreerama, N. and Woody, R.W. 2000a. Circular dichroism of peptides and proteins. In Circular dichroism: Principles and applications, 2nd ed. (eds. N. Berova et al.), pp. 601–620. Wiley, New York.
Sreerama, N. and Woody, R.W. 2000b. Estimation of protein secondary structure from CD spectra: Comparison of CONTIN, SELCON and CDSSTR methods with an expanded reference set. Anal. Biochem. 287: 2522