1 Department of Chemical Engineering and Materials Science and 2 Digital Technology Center, University of Minnesota, Minneapolis, Minnesota 55455, USA
Reprint requests to: Yiannis N. Kaznessis, Department of Chemical Engineering and Materials Science, University of Minnesota, 421 Washington Avenue SE, Minneapolis, MN 55455, USA; e-mail: yiannis{at}cems.umn.edu; fax: (612) 626-7246.
(RECEIVED June 18, 2004; FINAL REVISION September 20, 2004; ACCEPTED September 30, 2004)
| Abstract |
|---|
Keywords: protein-protein interaction; docking; conservation index; binding free energy; molecular recognition; computer simulations
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.04941505.
| Introduction |
|---|
In principle, calculating the free energy change upon binding of two proteins should allow determination of the native structure. Although the enthalpic part of the free energy can be calculated with some accuracy, the entropic contributions are difficult to calculate without resorting to semiempirical, less accurate methods. Furthermore, the computational load can become too large, especially for unbound docking (starting with individual protein crystal structures), which can involve large protein conformational changes. Heuristic criteria, such as shape complementarity and coarse-grained residue potentials, have been used with relative success (Camacho et al. 2000a). Still, the main bottleneck is selecting the near-native structures from large sets of generated complexes with a standard global ranking procedure that brings the near-native structures to the top of the generated data set.
Additional information has been used to better select near-native structures: HADDOCK (Dominguez et al. 2003) and TreeDock (Fahmy and Wagner 2002) use information based on chemical shift perturbation data resulting from NMR titration experiments or mutagenesis, whereas ConsDock (Paul and Rognan 2002) uses consensus analysis for protein-ligand interactions. ProMate (Gottschalk et al. 2004) is based on a statistical analysis of several properties found to distinguish binding regions from nonbinding ones.
Recent studies of protein complexes have tested the importance of factors, such as interface propensity of residues, accessible surface area, planarity, protrusion, packing energies, and binding areas (Jones and Thornton 1996; Tsai et al. 1997; Larsen et al. 1998; Lo Conte et al. 1999). A test using averages of these factors as an indicator of protein-binding sites showed an ~66% success rate for 59 predictions (Jones and Thornton 1997).
There have also been several reports investigating the role of conservation of interfacial residues in naturally occurring protein complexes, using evolutionary tracing of conserved residues in homologous sequences and structures (Lichtarge and Sowa 2002; Ben-Zeev and Eisenstein 2003; Glaser et al. 2003; Lichtarge et al. 2003; Mihalek et al. 2004; Yan et al. 2004). Our recent analysis of well-resolved protein complexes indicated that the density of highly conserved residues is higher in protein-protein interface positions than in the other positions of the protein surfaces (B.V.B. Reddy and Y.N. Kaznessis, in prep.). We find that highly conserved positions in surface regions of proteins involved in non-antibody-antigen complexes tend to be in interacting patches. On the other hand, for antibody-antigen complexes, a very low number of conserved positions is observed in the interface regions. This information can potentially assist in the selection of near-native structures. However, to our knowledge, no attempts have been made to use residue conservation information to filter and rank the docking solutions of protein complexes.
In this paper we describe our docking analysis and ranking of docked complex structures for 59 benchmark complexes (Chen et al. 2003b). In the first stage, we use FTDock (Gabb et al. 1997; Moont et al. 1999) to generate 10,000 docked models for each of the complexes. Our study focuses on the second stage, in which we refine and rerank the docked structures. We use conserved residue position information as a filter to reduce the number of docked structures. Besides filtering, we use conservation information to rank the remaining docked structures. We evaluate these approaches and report on the results.
In this paper we also report on our efforts to develop a global ranking scheme. For each docked model, we relax the conformation of the side chains by minimizing the energy with CHARMM and then calculate the binding free energy using a generalized Born method and the solvent-accessible surface area. We finally develop a global ranking procedure so that the near-native structures rank at the top, using all available information from docking, free energy calculations, and residue conservation information.
| Results and Discussion |
|---|
Analysis of FTDock performance
Using FTDock (http://www.bmm.icnet.uk/docking) (Gabb et al. 1997; Moont et al. 1999), we obtained 10,000 docked models and their ranks according to the correlation function (equation 3) of shape complementarity and pair potential (see "Docking calculations" below). For these 10,000 models, we calculated the root mean square deviation (RMSD) of the Cα atoms of each model structure from the native complex structure. We then defined "hits" as the models having RMSD <4.5 Å from the native structure; the number of hits is shown in Table 1. Also shown in Table 1 are the lowest RMSD (LRMSD) complex obtained with FTDock and its corresponding shape-complementarity rank and pair-potential rank. It can be seen in Table 1 that there are 26 complexes with LRMSD <2.5 Å, 15 complexes with LRMSD >2.5 Å but <3.5 Å, and eight complexes with LRMSD >3.5 Å. We are thus confident that FTDock can generate model complexes close to native structures. Nonetheless, for five complexes (1AVW, 1BQL, 1EFU, 1FIN, 1GOT), FTDock failed to generate near-native structures, as the LRMSDs for these complexes are >4.5 Å.
The rank based on shape complementarity predicts near-native structures very poorly: the average rank of the LRMSD complexes is 4123, with only three of the 60 complexes registering ranks better than 100. It is thus clear that shape complementarity is not by itself an adequate means for choosing near-native structures.
The pair-potential rank improved the ranks for 47 of the 60 complexes. From Table 1, it can be observed that there are only 12 complexes with pair-potential ranking worse than shape complementarity. Nonetheless, ranks based on pair potential do not have impressive predictive ability. For example, only five complexes (1BRC, 1BRS, 1PPE, 2MTA, 2SIC) have ranks <20 for the LRMSD model, and another three complexes (1CGI, 1CHO, 2BTF) have ranks of LRMSD complexes <100. The rest have very high rank values.
Filters performance
First, we try to reduce the number of possible docked models from the generated 10,000 without filtering out the lower RMSD models. As described in "Filters" below, we developed two filters based on residue conservation information. For functionally interacting natural proteins, such as enzyme-inhibitor complexes, we gave higher ranks to the models with a higher number of conserved positions in the interface region. In the case of antigen-antibody interactions, the interacting regions are highly variable, and we gave higher ranks to the models with low numbers of conserved positions. After applying the first filter, we used filter II (see below) to reduce the number of complexes to ~2000-4000 models. These results are also shown in Table 1. It can be seen that combining the conservation filter with filter II reduces the number of complexes by 56% to 86%.
In Table 1, there are 11 complexes (1A0O, 1AHW, 1BRS, 1DFJ, 1FQ1, 1IGC, 1UDI, 1UGH, 1WQ1, 2MTA, 4HTC) for which sufficient homolog sequences were not available from nonredundant databases to calculate the conserved residue position information. Therefore, only filter II is applied for these complexes (in this case, filter II only includes three normalized ranks without the conserved residue position information).
When we applied the filters to the model sets, some near-native structures were also filtered out (false negatives), in addition to nonnative structures. Here we define the improvement factor (I_fact) as:
$$ I_{\mathrm{fact}} = \frac{(\mathrm{hits}/\mathrm{models})_f}{(\mathrm{hits}/\mathrm{models})_i} $$
where hits/models is the ratio of the number of structures with RMSD <4.5 Å from the native structure to the number of complex models, before, $(\mathrm{hits}/\mathrm{models})_i$, and after, $(\mathrm{hits}/\mathrm{models})_f$, applying the filters.
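As a minimal sketch of this definition, the following Python function computes the improvement factor from hit and model counts. The function name and the example counts are illustrative assumptions and are not values from Table 1.

```python
def improvement_factor(hits_before, models_before, hits_after, models_after):
    """Ratio of the hit fraction after filtering to the hit fraction before."""
    if hits_before == 0 or models_before == 0 or models_after == 0:
        raise ValueError("I_fact is undefined without initial hits or models")
    fraction_before = hits_before / models_before
    fraction_after = hits_after / models_after
    return fraction_after / fraction_before


# Hypothetical counts: 20 hits among 10,000 models before filtering,
# 12 hits among 2,000 models after filtering.
example_i_fact = improvement_factor(20, 10_000, 12, 2_000)
```

A value above 1.0 means the filtered set is enriched in near-native models, and a value of 0 means every hit was filtered out.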
The results are shown in Table 1 and Figure 1. It is observed that there are 48 out of 60 complexes with I_fact >1.0. Most of them (44) are >2.0, which means the improvement is >100%. For a few complexes, applying the filter resulted in >400% improvement.
Our filters failed for seven complexes: there are three complexes (1FSS, 1IGC, 1MAH) for which I_fact is <1.0 (Fig. 1; Table 1). For these structures proportionately more near-native model structures are filtered out than unrelated ones. In Figure 1, it can also be observed that four complexes (1EO8, 1L0Y, 1NCA, 1QFU) have I_fact = 0. This means that we filtered out all of the near-native structures (two, one, seven, and five hits for the four complexes, respectively). When we examined the number of conserved residue positions at the interface for these four complexes, we found that there is a high number of conserved residue positions for antibody-antigen systems 1EO8 and 1QFU, and a low number of conserved residues for non-antibody 1L0Y and 1NCA, contrary to most of the complexes investigated.
The global rank (see next section) for these four failed complexes (1EO8, 1L0Y, 1NCA, 1QFU) and for two of the complexes without improvement (1FSS, 1MAH) is also given in Table 1 without using filter I. Except for 1L0Y, the I_fact values of the other five complexes are >1.0, and the lower RMSD models are still in the subset. 1L0Y has only one hit (see Table 1), which is filtered out by filter II, but other lower RMSD models are still in the subset. Conserved residue position information cannot be calculated for 1IGC, since there are not enough homologous sequences in the database. The result for 1IGC listed in Table 1 is obtained by using filter II alone. Its improvement (I_fact) is still <1.0 since lower RMSD models are filtered out.
By comparing the results before and after filtering (Table 1), it becomes clear that only in a few cases (1AHW, 1CHO, 1FIN, 1FQ1, 1IGC, 1KKL, 1WQ1) was the LRMSD model structure filtered out, and even in these cases the second lowest RMSD complex is retained in the remaining subset. For all other complexes the structure closest to the native structure is always in the remaining subset. This demonstrates that our conserved residue information filters work well for the benchmark set.
To check the redundancy of filter I and filter II, we tested them separately on those complexes that have enough conserved residue position information. The I_fact values obtained by applying these two filters separately are also listed in Table 1 (columns I1 and I2). Both filters improve the efficiency, with most of the I_fact values (I1, I2) being >1.0. After combining them, we observed further significant improvement (I_fact in Table 1). The combined I_fact values are greater than the individual I_fact values (I1, I2). We thus conclude that it is necessary to include both filters when conserved residue information is available, in order to substantially decrease the number of model structures and improve the prediction.
The efficiency of global ranking
The free energy of binding would in principle suffice to determine the native structure from a large set of complexes. Unfortunately, the free energy we calculated does not rank near-native structures at the top of the list. This could be the result of inaccuracies in the potential force fields used for calculating enthalpic terms or in the empirical entropic terms. Conformational changes upon binding, whether local or global, can also result in significant changes in the free energy of binding (Camacho et al. 2000a). As a result we have to resort to empirical descriptors, and since none can individually predict near-native structures with great accuracy, we decided to combine multiple descriptors in a global ranking scheme.
Empirical rankings based on more than one descriptor have been attempted before: in ZDOCK (Chen et al. 2003a), shape complementarity, electrostatics, and desolvation energies were combined into a final target function, and AutoDock (Morris et al. 1998) incorporated more energy terms into the scoring function. A major bottleneck for composite, global scoring functions is that the weights for different quantities are difficult to determine.
As described in "Global normalized ranking" below, we derived a global ranking function by renormalizing the rank of each descriptor used (equation 11), and used weights 1, 1, 2, 4, and 5 for shape complementarity, binding free energy, conservation index, desolvation energy, and pair-potential energy, respectively, in a new global ranking function (equation 12). Using this function we obtained a new global rank for each model complex. Some examples (18 out of 60 complexes) of the global rank versus the RMSD are shown in Figure 2.
Since the methods for generating the decoy complexes, for evaluating and ranking them are dissimilar in all these studies, the information obtained and reported herein can be considered as complementary to other methods.
In Table 1, we also give the number of hits (E_hits) within the first 100 ranks. For 22 complexes, application of the global rank resulted in no hits in the top 100 ranked structures. We should note that for five of them there were no hits to begin with, because FTDock did not generate any. For the remaining 38 complexes, application of the global ranking substantially improves the predictive ability. Specifically, we calculate the improvement over random (IOR) for these 38 complexes:
![]() |
where NRC is the number of complexes after filtering, and we find significant IOR values (see Table 1). The average calculated IOR for these 38 complexes is 11.18. Even when the 17 complexes with IOR = 0 are included in the average calculation, the average IOR for the 55 complexes for which FTDock generated hits is 7.72.
Figure 3 shows model structures of the best predictions superimposed on the native structures for some of the selected targets with rank <10. The complexes 4HTC, 2MTA, 1SPB, 1STF, 1KXQ, and bound-bound 1FIN (1FIN_BB) gave excellent predictions, with a rank of 1 or 2 for the lowest RMSD structure.
Concluding remarks
In this work we have demonstrated the usefulness of conserved residue position information in identifying possible near-native complex model structures from docking solutions. We have used this information to develop two filters, reducing the number of docked model structures by 56% to 86% depending on the complex, while keeping near-native complexes in the remaining subset. We applied our method to a benchmark set of 59 complexes. There are 11 complexes for which we did not find enough homolog sequence information, so we could not apply our conservation filter to them at present. Only for four of the remaining complexes did our filter fail to retain the near-native structures, and for another three out of 60 complexes (the 59 benchmark complexes and the 1FIN bound-bound calculation), our filter did poorly compared to the FTDock results.
After filtering, we minimized the side-chain structure of the remaining model structures, and we calculated the binding free energy and desolvation energy. We developed a ranking scheme by renormalizing and weighting a combination of the ranks based on conservation position information, shape complementarity, desolvation energy, pair potential, and binding free energy. Excluding the five complexes for which FTDock did not generate any hits (with RMSD < 4.5 Å), the average improvement over random for the top 100 ranked structures is 7.72. For 17 complexes IOR = 0, but for the majority (38 complexes) we observed significant improvements in predictive ability, in terms of predicting near-native structures in the highest-ranked 100 structures. Generally, our approach can be easily adapted to any other docking algorithms to refine their ranking results.
| Materials and methods |
|---|
Docking calculations
The calculation of shape complementarity between any two proteins A and B initially projects the two molecules onto a 3D grid of $N^3$ points, represented by discrete functions:
![]() | (1) |
Then the surface and the interior of each molecule are distinguished by two parameters:
![]() | (2) |
The correlation function (score) is calculated as:
![]() | (3) |
where
![]() |
with (a, b, r) the shift vector of molecule B around molecule A. We used the values 1 and 15 for the two empirically chosen parameters to calculate the correlation function C(a, b, r). Using a discrete fast Fourier transform (FFT), the computation is on the order of $N^3 \ln(N^3)$ instead of the order of $N^6$ for the direct calculation using equation 3. Using this scoring function, we ranked all of the generated complex structures (in our case we initially keep 10,000).
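The speed-up from the FFT can be illustrated with a small NumPy sketch. It assumes periodic (wrap-around) shifts and the convention score[s] = sum over x of a[x] * b[x + s]; the function names, the random test grids, and this sign convention are illustrative assumptions and are not taken from FTDock itself.

```python
import numpy as np


def fft_correlation(grid_a, grid_b):
    """Correlation score for every cyclic shift of grid_b relative to grid_a.

    score[s] = sum_x grid_a[x] * grid_b[x + s], computed via the
    correlation theorem in O(N^3 log N^3) for real-valued grids.
    """
    fa = np.fft.fftn(grid_a)
    fb = np.fft.fftn(grid_b)
    return np.real(np.fft.ifftn(np.conj(fa) * fb))


def direct_correlation(grid_a, grid_b):
    """Same score by explicit summation over all shifts, O(N^6)."""
    score = np.zeros(grid_a.shape, dtype=float)
    for shift in np.ndindex(grid_a.shape):
        # np.roll with a negative shift gives shifted_b[x] = grid_b[x + shift].
        shifted_b = np.roll(grid_b, shift=[-s for s in shift], axis=(0, 1, 2))
        score[shift] = np.sum(grid_a * shifted_b)
    return score


# Toy 4 x 4 x 4 grids with random values, standing in for the two molecules.
rng = np.random.default_rng(0)
toy_a = rng.random((4, 4, 4))
toy_b = rng.random((4, 4, 4))
best_shift = np.unravel_index(np.argmax(fft_correlation(toy_a, toy_b)), toy_a.shape)
```

Both functions return the same array of scores; only the FFT version remains practical for grids of realistic size.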
Moont et al. (1999) generated empirical residue-residue pair potentials to further screen possible protein-protein docking complexes from FTDock. We also used their 20 × 20 matrix of pairwise interaction potentials. For each docked complex, we calculate the distance between residues of the two proteins. If this distance is <4.5 Å, we obtain the interaction value from the matrix, then sum all the values to get the final interaction energy for each complex. Using this interaction information, a new rank (pair-potential rank) is generated.
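The pair-potential sum described above can be sketched as follows. This is an illustration under stated assumptions: each residue is represented by a single coordinate (how the residue-residue distance is measured is not specified here), and the two matrix entries are made-up numbers, not values from the Moont et al. (1999) matrix.

```python
import math


def pair_potential_score(residues_a, residues_b, potential, cutoff=4.5):
    """Sum matrix entries over residue pairs closer than `cutoff` angstroms.

    residues_a, residues_b: lists of (residue_name, (x, y, z)) tuples,
        one list per protein.
    potential: dict mapping (name_a, name_b) to an interaction value.
    """
    total = 0.0
    for name_a, xyz_a in residues_a:
        for name_b, xyz_b in residues_b:
            if math.dist(xyz_a, xyz_b) < cutoff:
                total += potential[(name_a, name_b)]
    return total


# Hypothetical two-entry excerpt of a pair-potential matrix.
toy_potential = {("ASP", "LYS"): -1.0, ("ASP", "LEU"): 0.5}
protein_a = [("ASP", (0.0, 0.0, 0.0))]
protein_b = [("LYS", (3.0, 0.0, 0.0)), ("LEU", (10.0, 0.0, 0.0))]
toy_score = pair_potential_score(protein_a, protein_b, toy_potential)
```

Only the ASP-LYS pair lies within the 4.5 Å cutoff in this toy input, so it alone contributes to the score.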
Conservation of residue positions
To evaluate the extent of conservation of interacting positions on the surface of proteins, we calculate conservation indices as follows:
Homologous sequences
The two protein sequences of each investigated complex were used to obtain their homologous sequences from SWALL, an annotated nonredundant protein sequence database (nonredundant SWISS-PROT + TrEMBL + TrEMBLnew), using the FASTA3 (http://www.ebi.ac.uk/fasta33/) sequence similarity search tool at the European Bioinformatics Institute. Homologous sequences with <30% gaps in the sequence and >35% sequence identity to the parent sequence were used for analysis. If the evolutionary distance (described below) between any two sequences was <5%, we randomly removed one of the two sequences from the homolog set. The remaining sequences were used for calculating the residue conservation index (described below).
Evolutionary distance
Evolutionary distance among the sequences is calculated using the structure-based amino acid substitution matrix $M(a, b)$ (Gonnet et al. 1992). A similarity score $S_{ii}$ for sequence $i$ is calculated by summing the identical substitutions [diagonal values from $M(a, b)$]. Similarly, score $S_{jj}$ is calculated for sequence $j$. A similarity score $S_{ij}$ between the sequences $i$ and $j$ is calculated using substitution matrix values of corresponding aligned residues between the two sequences. An evolutionary distance ($ED_{ij}$) between the two sequences is calculated using
![]() | (4) |
Conservation index of residue position
Evolutionary distances between the reference sequence and its homologs were used to calculate the residue conservation index ($CI_l$) for each position $l$ using the amino acid substitution matrix, similar to the amino acid variability or conservation used by Valdar and Thornton (2001). The conservation index ($CI_l$) is a weighted sum of all pairwise similarities between all residues present at the position. The $CI_l$ value is calculated using equation 5 in a given alignment and takes a value in the range [0, 1].
![]() | (5) |
where $N$ is the number of homologous sequences in the alignment; $s_i(l)$ and $s_j(l)$ are the amino acids at the alignment position $l$ of sequences $s_i$ and $s_j$, respectively; $ED(s_i)$ and $ED(s_j)$ are the average evolutionary distances of $s_i$ and $s_j$ from the remaining homologs. $Mut(a, b)$ measures the similarity between the amino acids $a$ and $b$ as derived from the amino acid substitution matrix $M(a, b)$, defined as:
![]() | (6) |
where $a$, $b$ are the pairs of amino acids at a given alignment position $l$. $M(a, b)_{low}$ is the lowest value in the substitution matrix (−5 in the Gonnet matrix; Gonnet et al. 1992) and $M(a, b)_{max}$ is the maximum value among all the possible substitution pairs in that position. Thus $Mut(a, b)$ takes a value in the range [0, 1].
Using PSA (Richmond and Richards 1978; Sali and Blundell 1990), the solvent-accessible surface area (SASA) of amino acids is calculated and used to identify surface residues and buried residues. We then identified the top 8% and 17% of highly conserved residues that have solvent accessibility >25% of their total surface area. As an example, Table 2 lists the highly conserved surface residues of the E and I chains of complex 1TAB.
Group 1 positions have CI values <0.4, group 2 positions have CI values between 0.4 and 0.6, group 3 positions have values between 0.60 and 0.85, and group 4 positions have values >0.85. It can be seen from Table 3 that for non-antigen-antibody complexes the ratio increases progressively from 0.85 to 1.53 at higher CI intervals. This is a clear indication that the number of highly conserved positions in the interfacial region is significantly greater than in noninterfacial regions.
This finding is not in agreement with the study of Caffrey et al. (2004), who reported only a slight increase in conservation of interfacial regions. A calculation of average conservation indices for the interacting patches of the benchmark protein-protein complexes explains the discrepancy and verifies the results of our previous study (B.V.B. Reddy and Y.N. Kaznessis, in prep.). This calculation shows that the average conservation indices for all the residues in the interaction sites are indeed only slightly higher, as shown by other researchers (Caffrey et al. 2004; B.V.B. Reddy and Y.N. Kaznessis, in prep.). Nonetheless, although the average CI of interacting patches is not a useful measure for the prediction of interacting sites on protein surfaces, the actual number of highly conserved residues in the interfacial region can help in accurately identifying putative interaction sites on given protein structures. Therefore, we have used the number of highly conserved positions per interaction site as our filter to identify the interaction sites. For non-antigen-antibody complexes, we assigned high ranks to complexes that had a large number of conserved positions at the interacting interface.
From Table 3 it can also be seen that for antigen-antibody complexes the ratio decreases progressively from 2.98 to 0.36 at higher CI intervals (unlike the non-antigen-antibody complexes). This is a clear indication that the number density of highly conserved positions in the interfacial region is significantly smaller than in noninterfacial regions. From Table 3 it can be seen that the ratio decreases at higher CI intervals for both antigen and antibody regions. Therefore, we gave higher ranks to the models with low numbers of conserved positions.
That the antigen interface is not conserved in the manner of non-antibody-antigen complexes is perhaps an unexpected finding. At present we do not have any clear explanation for this finding, and to our knowledge there is no study on conservation signals for antigens. Nonetheless, based on the strength of the signal, we have used a reverse conservation filter for both antibodies and antigens.
In principle, it is more difficult to predict the binding site of an antigen, since for antibodies this region is known. Our computations provide a means for identifying the antigen-binding site.
Filters
Conservation position filter
Using homologous sequences we calculated conservation indices for each docked model using equation 5. We have identified the top 8% (defined as group 1) and top 17% (defined as group 2) of highly conserved and well-exposed surface residues, in each polypeptide chain of the interacting complex.
We counted the total number of group 1 and group 2 positions in each modeled complex interface region. Using the group 1 and group 2 conservation positions as a filter, the total number of docked models is reduced. We selected only the models that have at least four group 1 positions or six group 2 positions in the interface region of the enzyme-inhibitor model complexes. In the case of antigen-antibody complexes (e.g., 1JHL, 1KXQ), we reversed the selection, keeping only models with two or fewer group 1 positions and four or fewer group 2 positions. We chose these cutoffs because they maximized the number of docking solutions filtered out of the 10,000 generated structures while removing the minimum number of near-native structures, as discussed in "Results" above.
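The selection rule above can be written as a short predicate. This is a minimal sketch: the function and argument names are hypothetical, and counting the group 1 and group 2 positions in each interface is assumed to have been done beforehand.

```python
def passes_conservation_filter(n_group1, n_group2, antibody_antigen=False):
    """Apply the conservation position cutoffs to one docked model.

    n_group1, n_group2: counts of group 1 (top 8%) and group 2 (top 17%)
        conserved surface positions found in the model's interface.
    antibody_antigen: if True, use the reversed selection.
    """
    if antibody_antigen:
        # Reversed rule: keep interfaces with few conserved positions.
        return n_group1 <= 2 and n_group2 <= 4
    # Default rule: keep interfaces rich in conserved positions.
    return n_group1 >= 4 or n_group2 >= 6


# Hypothetical interface counts (group 1, group 2) for three docked models.
toy_counts = [(5, 7), (1, 3), (3, 6)]
kept_enzyme_inhibitor = [c for c in toy_counts if passes_conservation_filter(*c)]
kept_antibody_antigen = [
    c for c in toy_counts if passes_conservation_filter(*c, antibody_antigen=True)
]
```

Note that the default rule joins its two conditions with "or", whereas the reversed rule requires both limits to hold, following the wording of the text.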
Filter II
A second filter was developed to further lower the number of model structures, using the average conservation rank along with three other ranks (shape complementarity, pair potential, and desolvation energy; described in the next section). If the rank of a complex is worse than 1200 in any of the four rankings, the corresponding model is filtered out of the set of putative near-native structures. Filter II is performed with only three ranks if conservation information is not available, as described in "Results" above.
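Filter II amounts to a rank cutoff applied across all available rankings, which can be sketched as below. The dictionary layout, key names, and rank values are hypothetical; a model without conservation information simply carries three ranks instead of four.

```python
def apply_filter_two(models, worst_allowed_rank=1200):
    """Keep models whose rank is no worse than the cutoff in every ranking."""
    return [
        model
        for model in models
        if all(rank <= worst_allowed_rank for rank in model["ranks"].values())
    ]


# Hypothetical docked models with per-descriptor ranks (1 = best).
toy_models = [
    {"id": 1, "ranks": {"shape": 10, "pair": 1200, "desolvation": 5, "conservation": 300}},
    {"id": 2, "ranks": {"shape": 10, "pair": 1201, "desolvation": 5, "conservation": 300}},
    # No conservation information available: only three ranks are checked.
    {"id": 3, "ranks": {"shape": 900, "pair": 40, "desolvation": 700}},
]
kept_ids = [model["id"] for model in apply_filter_two(toy_models)]
```

Model 2 is removed because a single ranking (pair potential) is worse than 1200, even though its other ranks are good.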
Side-chain relaxation and binding free energy calculation
Since the generated docked complexes have very strong side-chain overlaps (atoms are very close to each other), the binding energy cannot be calculated correctly from them directly. Therefore, for each possible complex we perform energy minimization to reduce the side-chain overlap effects. We used the CHARMM (Brooks et al. 1983) molecular mechanics simulation package for energy minimization. With CHARMM, we built in the missing atoms and all hydrogen atoms, fixed all backbone atoms, and let the side-chain atoms relax to the minimum internal energy. Minimization was stopped when the energy no longer changed by more than 0.1% of the total energy of the complex. We should note here that this step is particularly computationally intensive. We thus worked only on the structures that remained after filtering with the calculated conservation indices.
Using the relaxed structures, we calculated the binding free energy. With some approximation, the free energy change can be divided into several terms (Camacho et al. 2000b; Dennis et al. 2002):
![]() | (7) |
These terms can be calculated separately:
$\Delta G_{coulomb}$ and $\Delta G_{pol}$ can be calculated with the Generalized Born model with the Debye-Hückel approximation (Jayaram et al. 1998, 1999):
![]() | (8) |
![]() | (9) |
where $f_{GB} = (r_{ij}^2 + \alpha_{ij}^2 e^{-D})^{1/2}$, $\alpha_{ij} = (\alpha_i \alpha_j)^{1/2}$, $D = r_{ij}^2/(2\alpha_{ij})^2$, and $\alpha_i$ is the effective Born radius of atom $i$, which can be obtained by a pairwise dielectric descreening procedure (Hawkins et al. 1996). The desolvation energy term, a weighted sum over residues, can be calculated using the solvent-accessible surface area of each residue ($SASA_k$). The weight for each residue is taken from the work of Wang et al. (1995). For the binding interaction, we use a van der Waals interaction of the form:
![]() | (10) |
The summation is over all the atom pairs from the two proteins. The potential parameters $A_{ij}$ and $B_{ij}$ for each atom pair are taken from the CHARMM force field (Brooks et al. 1983) and AutoDock (Morris et al. 1998). From the value of the free energy $\Delta G$, we calculated a new rank for all filtered complexes.
We also generated a rank based on only the desolvation term of the free energy, which is the only part of the free energy that can be calculated without relaxing the docked structures with minimization.
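The pairwise function $f_{GB}$ used in the Generalized Born terms above can be evaluated directly from its definition. The sketch below covers only this function, not the full energy expressions of equations 8 and 9; the function name and the example radii are illustrative assumptions.

```python
import math


def f_gb(r_ij, alpha_i, alpha_j):
    """Effective GB distance: sqrt(r_ij^2 + alpha_ij^2 * exp(-D)).

    r_ij: distance between atoms i and j (angstroms).
    alpha_i, alpha_j: effective Born radii of the two atoms (angstroms).
    """
    alpha_ij = math.sqrt(alpha_i * alpha_j)
    d = r_ij ** 2 / (2.0 * alpha_ij) ** 2
    return math.sqrt(r_ij ** 2 + alpha_ij ** 2 * math.exp(-d))


# Limiting behavior with hypothetical Born radii:
# at zero separation f_GB reduces to alpha_ij,
# at large separation it approaches the plain distance r_ij.
f_at_contact = f_gb(0.0, 2.0, 2.0)
f_far_apart = f_gb(100.0, 1.5, 2.0)
```

These two limits are why the function interpolates smoothly between a Born self-energy term and a Coulomb-like pair term.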
Global normalized ranking
Our goal is to determine an optimal ranking procedure for identifying near-native structures. We could use a weighted sum of all the calculated descriptors (shape complementarity, pair potential, CHARMM energy, binding free energy, desolvation energy, conservation indices) to produce a global rank for the filtered subset of docked models, but the values of these properties are not in the same units, and the weights are not universal and are hard to optimize. In our algorithm, instead of using the real value of each descriptor, we used the rank of each property, since ranks have the same meaning and can be summed together.
For each individual descriptor, a normalized ranking method is applied. The rank was obtained by finding the maximum (Vmax) and minimum (Vmin) of their values and using the following equation:
![]() | (11) |
where $V_i$ is the property value of complex $i$, and $N$ is the total number of complexes after filtering. There may be some gaps if the difference between complexes is large, and several complexes can have the same rank number if their values are very close to one another. Nonetheless, this normalized method clearly reveals the differences among the complexes. Specifically for the binding free energy descriptor, we set $V_{max}$ equal to zero. If the binding free energy of a complex is greater than zero, we assign the highest rank (in our case, 10,000) to that complex.
The global score is simply obtained by a weighted average of all normalized ranks:
![]() | (12) |
where M is the number of ranking methods (descriptors) and each descriptor i has its own weight. The factor 100 is a scale factor that reduces the maximum of global_score to 100.
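To make the two ranking steps concrete, the sketch below shows one plausible reading of them. The exact forms of equations 11 and 12 are not reproduced here, so both functions are assumptions: the normalized rank is taken to be a linear mapping of the interval [Vmin, Vmax] onto ranks 1 to N, and the global score is taken to be a weighted average of ranks divided by N and scaled by 100. The weights are the ones quoted in the Results section; all names are hypothetical.

```python
def normalized_rank(value, v_min, v_max, n, lower_is_better=True):
    """Assumed linear form: map [v_min, v_max] onto ranks 1..n (1 = best)."""
    if v_max == v_min:
        return 1
    fraction = (value - v_min) / (v_max - v_min)
    if not lower_is_better:
        fraction = 1.0 - fraction
    # Values outside the interval are clamped to the best or worst rank.
    fraction = min(max(fraction, 0.0), 1.0)
    return 1 + round(fraction * (n - 1))


def global_score(ranks, weights, n):
    """Assumed form: weighted average of ranks, scaled to a maximum of 100."""
    weighted_sum = sum(weights[name] * ranks[name] for name in weights)
    return 100.0 * weighted_sum / (n * sum(weights.values()))


# Weights quoted in the Results section for the five descriptors.
paper_weights = {
    "shape_complementarity": 1,
    "binding_free_energy": 1,
    "conservation_index": 2,
    "desolvation_energy": 4,
    "pair_potential": 5,
}
n_models = 2000  # hypothetical size of a filtered model set
worst_case = global_score({name: n_models for name in paper_weights}, paper_weights, n_models)
```

Under these assumptions a model ranked last by every descriptor scores 100, and lower scores indicate better models. Because the rank depends on the descriptor value rather than on sort order, widely spaced values leave gaps and closely spaced values share a rank, as described in the text.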
To determine which properties should be included in our global ranking and what their weights should be in equation 12, we calculated Pearson's correlation coefficients (Devore and Peck 2001) between each of the descriptor ranks and the RMSD of the models from the native structure. We found that the CHARMM energy has particularly low correlation coefficients (<0.10). Therefore we excluded the CHARMM energy from our ranking procedure and use only M = 5 descriptors (shape complementarity, pair potential, conserved residue, binding free energy, desolvation energy) in our final global rank.
Ideally, for the best possible prediction, the correlation coefficient would be equal to 1 (best ranked having lowest RMSD, second best ranked having second lowest RMSD, etc.). These coefficients provide a measure of the predictive ability of a single descriptor. They also provide a means of comparing the different descriptors.
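The correlation check described above can be sketched with a plain implementation of Pearson's coefficient. The rank and RMSD values below are made-up illustrations, not data from the benchmark.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical descriptor ranks (1 = best) and RMSD values (angstroms)
# for five docked models of one complex.
toy_ranks = [1, 2, 3, 4, 5]
toy_rmsds = [1.2, 3.5, 2.8, 9.0, 12.4]
toy_correlation = pearson(toy_ranks, toy_rmsds)
```

A coefficient near 1 means that better-ranked models tend to have lower RMSD, which is the behavior a useful descriptor should show.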
There is no descriptor that does well, in terms of correlation coefficient values, for all 59 complexes. Specifically, we found that the pair-potential descriptor has a significant correlation coefficient value (>0.10) for 22 complexes, desolvation energy has significant positive correlation in 13 complexes, conserved residue descriptor has significant correlation in 10 complexes, shape-complementarity values correlate well with RMSD in three complexes, and that the binding free energy has significant correlation coefficient values in three complexes. For some complexes more than one descriptor gives significant correlation coefficient values.
We determined the weights for equation 12 using the relative number of complexes for which each descriptor does well, in terms of predictive ability and correlation coefficient values. Taking also into account the fact that for some complexes more than one descriptor does well, we used weights of 1, 1, 2, 4, and 5 for shape complementarity, binding free energy, conservation index, desolvation energy, and pair-potential energy, respectively. Hence, we use these relative coefficients as the weights for each descriptor in equation 12.
| Acknowledgments |
|---|
| References |
|---|
Brooks, B.R., Bruccoleri, R.E., Olfson, B.D., States, D.J., Swaminathan, S., and Karplus, K. 1983. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comp. Chem. 4: 187217.
Caffrey, D.R., Somaroo, S., Hughes, J.D., Mintseris, J., and Huang, E.S. 2004. Are protein-protein interfaces more conserved in sequence than the rest of the protein surface? Protein Sci. 13: 190-202.
Camacho, C.J. and Vajda, S. 2002. Protein-protein association kinetics and protein docking. Curr. Opin. Struct. Biol. 12: 36-40.
Camacho, C.J., Gatchell, D.W., Kimura, S.R., and Vajda, S. 2000a. Scoring docked conformations generated by rigid-body protein-protein docking. Proteins 40: 525-537.
Camacho, C.J., Kimura, S.R., DeLisi, C., and Vajda, S. 2000b. Kinetics of desolvation-mediated protein-protein binding. Biophys. J. 78: 1094-1105.
Chen, R., Li, L., and Weng, Z. 2003a. ZDOCK: An initial-stage protein-docking algorithm. Proteins 52: 80-87.
Chen, R., Mintseris, J., Janin, J., and Weng, Z. 2003b. A protein-protein docking benchmark. Proteins 52: 88-91.
Cherfils, J. and Janin, J. 1993. Protein docking algorithms: Simulating molecular recognition. Curr. Opin. Struct. Biol. 3: 265-269.
Dennis, S., Kortvelyesi, T., and Vajda, S. 2002. Computational mapping identifies the binding sites of organic solvents on proteins. Proc. Natl. Acad. Sci. 99: 4290-4295.
Devore, J. and Peck, R. 2001. Statistics: The exploration and analysis of data, 4th ed., p. 136. Duxbury Press, Pacific Grove, CA.
Dominguez, C., Boelens, R., and Bonvin, A.M. 2003. HADDOCK: A protein-protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 125: 1731-1737.
Ewing, T.J., Makino, S., Skillman, A.G., and Kuntz, I.D. 2001. DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases. J. Comput. Aided Mol. Des. 15: 411-428.
Fahmy, A. and Wagner, G. 2002. TreeDock: A tool for protein docking based on minimizing van der Waals energies. J. Am. Chem. Soc. 124: 1241-1250.
Gabb, H.A., Jackson, R.M., and Sternberg, M.J. 1997. Modelling protein docking using shape complementarity, electrostatics and biochemical information. J. Mol. Biol. 272: 106-120.
Gardiner, E.J., Willett, P., and Artymiuk, P.J. 2003. GAPDOCK: A Genetic Algorithm Approach to Protein Docking in CAPRI round 1. Proteins 52: 10-14.
Glaser, F., Pupko, T., Paz, I., Bell, R.E., Bechor-Shental, D., Martz, E., and Ben-Tal, N. 2003. ConSurf: Identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 19: 163-164.
Gonnet, G.H., Cohen, M.A., and Benner, S.A. 1992. Exhaustive matching of the entire protein sequence database. Science 256: 1443-1445.
Gottschalk, K.E., Neuvirth, H., and Schreiber, G. 2004. A novel method for scoring of docked protein complexes using predicted protein-protein binding sites. Protein Eng. Des. Sel. 17: 183-189.
Gray, J.J., Moughon, S., Wang, C., Schueler-Furman, O., Kuhlman, B., Rohl, C.A., and Baker, D. 2003. Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. J. Mol. Biol. 331: 281-299.
Halperin, I., Ma, B., Wolfson, H., and Nussinov, R. 2002. Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins 47: 409-443.
Hawkins, G.D., Cramer, C.J., and Truhlar, D.G. 1996. Parametrized models of aqueous free energies of solvation based on pairwise descreening of solute atomic charges from a dielectric medium. J. Phys. Chem. 100: 19824-19839.
Helmer-Citterich, M. and Tramontano, A. 1994. PUZZLE: A new method for automated protein docking based on surface shape complementarity. J. Mol. Biol. 235: 1021-1031.
Janin, J. 1995. Protein-protein recognition. Prog. Biophys. Mol. Biol. 64: 145-166.
Jayaram, B., Sprous, D., and Beveridge, D.L. 1998. Solvation free energy of biomacromolecules: Parameters for a modified generalized Born model consistent with the AMBER force field. J. Phys. Chem. B 102: 9571-9576.
Jayaram, B., McConnell, K.J., Dixit, S.B., and Beveridge, D.L. 1999. Free energy analysis of protein-DNA binding: The EcoRI endonuclease-DNA complex. J. Comput. Phys. 151: 333-357.
Jones, S. and Thornton, J.M. 1996. Principles of protein-protein interactions. Proc. Natl. Acad. Sci. 93: 13-20.
Jones, S. and Thornton, J.M. 1997. Prediction of protein-protein interaction sites using patch analysis. J. Mol. Biol. 272: 133-143.
Katchalski-Katzir, E., Shariv, I., Eisenstein, M., Friesem, A.A., Aflalo, C., and Vakser, I.A. 1992. Molecular surface recognition: Determination of geometric fit between proteins and their ligands by correlation techniques. Proc. Natl. Acad. Sci. 89: 2195-2199.
Larsen, T.A., Olson, A.J., and Goodsell, D.S. 1998. Morphology of protein-protein interfaces. Structure 6: 421-427.
Li, L., Chen, R., and Weng, Z. 2003. RDOCK: Refinement of rigid-body protein docking predictions. Proteins 53: 693-707.
Lichtarge, O. and Sowa, M.E. 2002. Evolutionary predictions of binding surfaces and interactions. Curr. Opin. Struct. Biol. 12: 21-27.
Lichtarge, O., Yao, H., Kristensen, D.M., Madabushi, S., and Mihalek, I. 2003. Accurate and scalable identification of functional sites by evolutionary tracing. J. Struct. Funct. Genomics 4: 159-166.
Lo Conte, L., Chothia, C., and Janin, J. 1999. The atomic structure of protein-protein recognition sites. J. Mol. Biol. 285: 2177-2198.
Mandell, J.G., Roberts, V.A., Pique, M.E., Kotlovyi, V., Mitchell, J.C., Nelson, E., Tsigelny, I., and Ten Eyck, L.F. 2001. Protein docking using continuum electrostatics and geometric fit. Protein Eng. 14: 105-113.
Mihalek, I., Res, I., and Lichtarge, O. 2004. A family of evolution-entropy hybrid methods for ranking protein residues by importance. J. Mol. Biol. 336: 1265-1282.
Moont, G., Gabb, H.A., and Sternberg, M.J. 1999. Use of pair potentials across protein interfaces in screening predicted docked complexes. Proteins 35: 364-373.
Morris, G.M., Goodsell, D.S., Halliday, R.S., Huey, R., Hart, W.E., Belew, R.K., and Olson, A.J. 1998. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J. Comput. Chem. 19: 1639-1662.
Palma, P.N., Krippahl, L., Wampler, J.E., and Moura, J.J. 2000. BiGGER: A new (soft) docking algorithm for predicting protein interactions. Proteins 39: 372-384.
Paul, N. and Rognan, D. 2002. ConsDock: A new program for the consensus analysis of protein-ligand interactions. Proteins 47: 521-533.
Richmond, T.J. and Richards, F.M. 1978. Packing of α-helices: Geometrical constraints and contact areas. J. Mol. Biol. 119: 537-555.
Ritchie, D.W. and Kemp, G.J. 2000. Protein docking using spherical polar Fourier correlations. Proteins 39: 178-194.
Sali, A. and Blundell, T.L. 1990. Definition of general topological equivalence in protein structures. A procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. J. Mol. Biol. 212: 403-428.
Shoichet, B.K. and Kuntz, I.D. 1996. Predicting the structure of protein complexes: A step in the right direction. Chem. Biol. 3: 151-156.
Smith, G.R. and Sternberg, M.J. 2002. Prediction of protein-protein interactions by docking methods. Curr. Opin. Struct. Biol. 12: 28-35.
Sternberg, M.J., Gabb, H.A., and Jackson, R.M. 1998. Predictive docking of protein-protein and protein-DNA complexes. Curr. Opin. Struct. Biol. 8: 250-256.
Taylor, J.S. and Burnett, R.M. 2000. DARWIN: A program for docking flexible molecules. Proteins 41: 173-191.
Tsai, C.J., Lin, S.L., Wolfson, H.J., and Nussinov, R. 1997. Studies of protein-protein interfaces: A statistical analysis of the hydrophobic effect. Protein Sci. 6: 53-64.
Valdar, W.S. and Thornton, J.M. 2001. Protein-protein interfaces: Analysis of amino acid conservation in homodimers. Proteins 42: 108-124.
Wang, Y.H., Zhang, H., and Scott, R.A. 1995. A new computational model for protein-folding based on atomic solvation. Protein Sci. 4: 1402-1411.
Yan, C., Dobbs, D., and Honavar, V. 2004. A two-stage classifier for identification of protein-protein interface residues. Bioinformatics 20: i371-i378.
Yang, J.M. and Chen, C.C. 2004. GEMDOCK: A generic evolutionary method for molecular docking. Proteins 55: 288-304.