|
|
||||||||
1 Department of Biological Chemistry, The Weizmann Institute of Science, Rehovot 76100, Israel
2 Department of Chemical Services, The Weizmann Institute of Science, Rehovot 76100, Israel
Reprint requests to: Miriam Eisenstein, Weizmann Institute Of Science, Chemical Services Unit, Rehovot 76100, Israel; e-mail: miriam.eisenstein{at}weizmann.ac.il; fax: 972-8-9344136.
(RECEIVED June 27, 2001; FINAL REVISION November 9, 2001; ACCEPTED November 21, 2001)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.26002.
| Abstract |
|---|
|
|
|---|
Keywords: Molecular docking; molecular recognition; electrostatic complementarity; surface matching; electrostatic patches; grid representation by complex numbers
| Introduction |
|---|
|
|
|---|
Analyses of experimentally determined structures indicate that intermolecular recognition is facilitated by a myriad of weak, noncovalent interactions that together promote specificity at different levels. The prediction of the structures of complexes is therefore a difficult, multidimensional problem that attempts to solve simultaneously the relative position of the molecules in the complex and their conformation. The docking problem can, however, be solved in a series of steps: the first of which is the determination of the relative position of the molecules in the complex. This is a six-dimensional problem in which the docked molecules are treated as rigid bodies. Comparisons of the structures of bound and unbound molecules indicate that in many cases, these structures are very similar (Conte et al. 1999), supporting the rigid body approximation. Once the relative positions of the molecules in a complex are determined, attempts can be made to modify and refine their structures in a manner that improves the interaction (Robert and Janin 1998; Oliva and Moult 1999).
Interfaces are characterized by complementarity of the shape and the chemical character of the interacting surfaces (Jones and Thornton 1996; Tsai et al. 1996; Conte et al. 1999). Recent analyses of molecular interfaces indicate that in nonpermanent complexes composed of molecules that can exist individually in solution, electrostatic contacts are abundant (Xu et al. 1997; Conte et al. 1999). In permanent complexes, that is, oligomers, hydrophobic contacts are dominant (Jones and Thornton 1996; Tsai et al. 1996). Nevertheless, in most oligomers the hydrophobic patches at the interface are mixed with hydrophilic ones (Larsen et al. 1998). It appears that despite the small and often unfavorable contribution of electrostatics to the stabilization of molecular complexes (Sheinerman et al. 2000), there is electrostatic complementarity at the interface. Evidently, electrostatic contacts should be included in docking algorithms, in particular when applied to nonpermanent complexes. Yet, attempts in this direction, in which the electrostatic energy is calculated, show only marginal improvement in the docking results (Gabb et al. 1997; M. Eisenstein, unpubl.). This is most likely caused by the acute sensitivity of the electrostatic interaction energy to the details of the structure of the complex. In docking, this sensitivity is problematic in two ways: First, the conformation of the unbound protein differs from that of the same protein in the complex. Second, in many docking algorithms, the relative positions of the two molecules are determined only approximately, for example, by a stepwise sampling of the translation/rotation space, in which the exact relative position is usually not included. The structural errors caused by the conformation change and the mispositioning reduce the usefulness of electrostatic energy computations in docking (Robert and Janin 1998). Mandell et al. (2001) use a continuum model to calculate the potential around one of the molecules and place charges on the other one. They report improved ranking of the nearly correct solution in geometric-electrostatic docking for two of the three unbound systems that they tested.
Shape complementarity is the core of most docking algorithms, and some of them rely only on geometric recognition (for example, Jiang and Kim 1991; Katchalski-Katzir et al. 1992; Walls and Sternberg 1992; Norel et al. 1994). Other algorithms also include electrostatic interactions (Ausiello et al. 1997; Gabb et al. 1997; Mandell et al. 2001), hydrogen bonding (Meyer et al. 1996; Ausiello et al. 1997), and hydrophobic contacts (Vakser and Aflalo 1994; Ackermann et al. 1998). Previously, our group has presented a docking algorithm based only on shape complementarity, which successfully reassembled binary complexes (Katchalski-Katzir et al. 1992) and helical aggregates (Eisenstein et al. 1997) and predicted the structures of new complexes (Strynadka et al. 1996a; Dixon 1997; Eisenstein and Katchalski-Katzir 1998).
Here we present a geometric-electrostatic docking algorithm in which the effects of electrostatics are combined with geometric surface complementarity. Instead of calculating electrostatic interaction energies for the different putative complexes formed by a given pair of molecules, we correlated their tendencies to form good electrostatic contacts. Recently, it has been shown that the electrostatic potentials at the interfaces of interacting molecules are anticorrelated (McCoy et al. 1997). This means that at the interface, there is a good chance to find a patch of positive electrostatic potential on the surface of one molecule positioned next to a negative patch on the surface of the adjacent molecule and vise versa. We represent the electrostatic potential of each molecule as positive, neutral, or negative patches. These patches are combined with the geometric representation of the molecule in a three-dimensional (3D) matrix of complex numbers. The correlation of such matrices provides a measure of the geometric and electrostatic complementarity of the molecular surfaces. We also present an algorithm for rotating of the electrostatic potential by translating it into potential spheres. These spheres are treated as atoms; they are rotated to new orientations and then translated back into potential patches.
The new method is applied to a selection of known binary complexes, starting from the structures of the unbound molecules. The inclusion of electrostatic complementarity significantly improves the docking results for most systems by ranking the nearly correct solutions as highly probable. This is important because once a nearly correct solution is ranked as highly probable, additional screening methods and refinement algorithms, which can only be applied to a limited set of solutions, can be used to indicate the correct structure (Robert and Janin 1998). Our statistical analyses of the docking results for each system and for all the selection of systems further emphasize the contribution of electrostatic complementarity to the successful prediction of the structures of complexes. Finally, we sum up our results and analyses in several "good electrostatic docking rules," which depict systems in which geometric-electrostatic docking is likely to be more successful than geometric docking and vise versa.
| Algorithms |
|---|
|
|
|---|
The electrostatic potential
The electrostatic potential around each of the docked molecules is calculated by solving the linearized Poisson-Boltzman equation, using the finite-differences method as implemented in the program Delphi (Klapper et al. 1986; Honig and Nicholls 1995). The calculations are performed on a fine grid, 0.5 å, producing accurate estimates of the potential. For each system, we ascertain that the Delphi grid extent is large enough to encompass all the potential points, with absolute values exceeding a given minimum, Pmin (see below).
The calculation of the potential is separate from our docking procedure (implemented in a computer program named MolFit), which reads the potential files for the two molecules, together with the necessary data regarding the grid interval and the origin of the potential grid. Thus, in principle, potentials calculated with other programs or with different forcefield parameters can be read in and used for describing the electrostatic character of the docked molecules.
Calculation of the geometric-electrostatic correlation function
In MolFit, the common 3D atomic representation of the molecules to be docked is replaced by a 3D grid representation. Each molecule is projected onto a 3D grid, such that grid points outside the molecule are given the value 0, points on the surface of the molecule are given the value 1, and those in the interior of the molecule are given either the negative value
or the positive value
for molecules a and b, respectively. The grids are then correlated using discrete Fourier transformations (Katchalski-Katzir et al. 1992; Eisenstein et al. 1997). In these transformations, complex numbers are involved, which can be exploited for describing the electrostatic character of each molecule as follows:
Each grid point in the representation of molecule a is given the complex value
![]() | ((1)) |
; Eal,m,n is the electrostatic descriptor of molecule a (its values are discussed below), i is the square root of -1 and, w is an adjustable scale factor. Similarly, each grid point in the representation of molecule b is given by the complex value
![]() | ((2)) |
, and Ebl,m,n is its electrostatic descriptor (see below). The correlation function C
,ß,
can be calculated by a triple summation as follows:
![]() | ((3)) |
,ß,
in equation (3
The real part of C
,ß,
is the complementarity score for the given relative orientation and translation of the two molecules. It consists of two terms. The first term,
![]() | ((4)) |
,ß,
. The values of
and
are -15 and 1, respectively (Katchalski-Katzir et al., 1992). The second term,
![]() | ((5)) |
Instead of the lengthy summation in equation 3
, the correlation function is calculated via fast Fourier transformations (Bringham 1988), and then the real part of the correlation matrix is extracted. Our procedure requires a single N x N x N matrix to describe both the geometric and the electrostatic character of each molecule. Only one series of forward Fourier transformation, multiplication, and inverse Fourier transformation is performed for each orientation, as in the geometric docking (Katchalski-Katzir et al. 1992). A geometric-electrostatic rotation/translation scan, which uses a grid of 128 x 128 x 128 points and a rotation interval of 12°, requires approximately 9 h on a SGI Octane with a single R10000 processor.
Grid representation of the potential via potential spheres
The values of Eal,m,n and Ebl,m,n are derived from the potentials Pai,j,k and Pbi,j,k for molecules a and b, respectively. The program Delphi uses a grid to calculate the potential, and the indices i2, j, and k identify the potential grid points. Notably, these do not necessarily correspond to the indices l, m, and n, which identify the MolFit grid points. This is a result of the different requirements of Delphi and MolFit regarding the grid interval. As mentioned above, potentials are calculated using a fine grid (usually 0.5 å) to attain better accuracy. Such a grid is, however, too fine for the geometric representation of the molecules in MolFit, in which an interval of 1.0 to 1.2 å was found adequate (Katchalski-Katzir et al. 1992). The different grid intervals make it impossible to map the potential grid directly onto the MolFit grid. Therefore, we translate the potential grid into potential spheres with a radius
![]() | ((6)) |
|
Pmin. This is the continuous representation of the potential. In the second approach, only the sign of the average potential is kept for grid points with |El,m,n|
Pmin. Thus, grid points are assigned potential values 1, -1, or 0. This is the single-value step representation of the potential. In both approaches, the potential values of grid points in the interior of molecule a, where Gal,m,n is negative, are 0. Notably, the potential is not limited to the geometric surface of the molecule, and nonzero potential values may occur in grid points in which Gl,m,n is 0.
All the potential values are scaled by the adjustable factor
w, and the combined score is a linear combination of the geometric score and the electrostatic score. The value of w is determined empirically by optimizing the combined geometric-electrostatic score for several known complexes (see below).
Rotation of molecule b and its potential
The computation of the correlation function must be repeated for many relative orientations of the two molecules. When molecule b is rotated with respect to molecule a, both its geometric and electrostatic representations must be recalculated. However, the calculation of the electrostatic potential is time consuming, and in principle, there is no need to recalculate the potential because it is invariant to rotations. The sphere representation of the potential is very helpful here. Thus, we apply the rotation matrix to the potential spheres in the same manner as to the atomic coordinates. The rotated atomic coordinates and potential spheres are then projected onto the MolFit grid, producing the new (rotated) geometric and electrostatic representation of molecule b.
Rotation/translation scans and determination of the rank of nearly correct solutions
Geometric and geometric-electrostatic rotation/translation scans are performed using a grid interval between 1.0 and 1.2 å and a rotation interval of 12°. Only one solution is saved for each orientation, resulting in 8760 putative binary complexes sorted by their complementarity scores (Eisenstein et al. 1997). All the MolFit solutions are compared with the experimental structure of the complex by calculating the root mean square differences (RMSDs) between the positions of the common C
atoms. The rank of the nearly correct solution is its position in the sorted list of solutions, and it is determined by searching for the highest scoring solution with a RMSD <3 å. The only exception to this rule is the system 1bth, in which the RMSD value between the contact residues of the enzyme in the bound and unbound structure is exceedingly large (see Table 1
). In many cases, the score of the nearly correct solution is identical to that of other solutions. In such cases, the rank is given as a range of numbers representing the ranks of all these solutions.
|
. These values are used to calculate a uniqueness value (Zi) for each docking solution, i, the score of which is Si, as follows:
![]() | ((7)) |
Prediction of the ranks of the nearly correct solutions for different values of w (virtual scans)
The parameter w(equations 1, 2![]()
), which determines the relative contributions of the geometric and electrostatic terms to the combined complementarity scores, was optimized for several systems as described in Materials and Methods. To further test the adequacy of w for all the systems and in particular to examine the sensitivity of the geometric-electrostatic docking results to the value of w, we designed the following analysis: The combined geometric-electrostatic score for each solution, j, is a linear combination of the geometric and the electrostatic contributions, in which aj x w is the slope. The value of aj differs for each solution because it depends on the character of the interface in solution j. The aj can be estimated from the results of at least two real scans with different w values and used to predict the score of each solution for another value of w. The prediction is valid only when the rotations and translations for the given solution in the two real scans are very close. This requirement is satisfied when the two w values are close to one another, and therefore, an additional rotation/translation scan is needed. The results of the geometric scan (w = 0.) and the two geometric-electrostatic scans are used to estimate the slopes, aj, for all j solutions. Next, the aj values are used to predict the geometric-electrostatic complementarity scores for other w values (virtual scans). The solutions in each virtual scan are sorted according to the predicted scores, and finally the predicted rank of the nearly correct solution is determined.
| Results |
|---|
|
|
|---|
The structures and their PDB codes are listed in Table 1
. The table also lists the MolFit translation grid intervals used in the rotation/translation scans and the RMSDs between bound and unbound structures, calculated for all the common C
atoms and for the residues at the interface. Four of the unbound structures are incomplete, and the coordinates of several exposed side-chains are not listed in the PDB file. This reflects high thermal motion or disorder and is characteristic of long and exposed side-chains. However, such side-chains are often charged side-chains (arginine, lysine, glutamate), and we suspect that their omission (by using the PDB coordinates as they are) may affect the docking results, particularly the geometric-electrostatic docking. Therefore, for four structures (1bni, 1avu, 4htc, and 1ace), we modeled the missing side-chains. This was performed automatically, using the MSI molecular graphics package (MSI Inc.). The added side-chains usually have an extended conformation (as long as it does not clash with the rest of the molecule), which is not related to the conformation of this side-chain in the bound system.
Eleven disassembled enzyme/inhibitor systems were tested. For eight of these systems, we also docked the unbound molecules; in two cases, we docked the unbound enzyme to the matching inhibitor from a different complex, imitating an unbound situation; and in only one case, we docked the bound structure of the inhibitor (leech-derived tryptase inhibitor) to the unbound enzyme. We also tested six disassembled antibody/antigen systems. The unbound structures of four of the antibodies were not known: Hyhel-10 (3hfm), Hyhel-5 (3hfl), Jel42 (2jel), and antibody E5.2 (1dvf). We therefore used the bound structure when necessary. There were two entries for the unbound structure of lysozyme in the PDB: 1lza and 1hel. Both were docked to the appropriate antibodies in the unbound docking tests.
Optimization of the adjustable parameters
Implementation of the geometric-electrostatic docking algorithm described above requires the optimization of several adjustable parameters: Pmin, which is the minimum absolute value of the potential considered by the program (in kT/e); f, which determines the radius of the potential spheres in equation 6; and w, which determines the relative contributions of the geometric and electrostatic terms in equation 3
. In addition, we tested the suitability of potentials calculated with either the Formal or the PARSE set of electrostatic charges, to our docking procedure. All the parameters were optimized for the two representations of the potential, the step representation and the continuous representation.
The details of the optimization procedures are described in Materials and Methods. We find that electrostatic potentials calculated with the PARSE set of charges are more adequate for our electrostatic representations than those obtained using Formal charges. The optimal value of the parameter f is 1.0 for both representations of the electrostatic potential. In contrast, the optimal values of Pmin and w depend on the electrostatic representation. Thus, Pmin is 3.0 kT/e for the step representation and 2.0 for the continuous representation, and w is 0.25 or 0.35 for the step representation and 0.0015 for the continuous representation.
Geometric-electrostatic docking with the step representation of the potential
In this series of computations positive, neutral and negative single-value patches represent the electrostatic potential. The results of geometric and geometric-electrostatic scans for disassembled systems are summarized in Table 2
. The geometric-electrostatic scores for the nearly correct solutions are usually higher than the geometric scores, and the ranking of these solutions is also higher. The general increase in the complementarity scores observed on the introduction of electrostatic complementarity is in agreement with the results of McCoy et al. (1997), who showed that there is significant anticorrelation between the electrostatic potentials at the interface of interacting molecules.
|
Table 3
presents the docking results for unbound structures. Geometric docking of unbound structures ranks a nearly correct solution among the top 10 in four systems out of the 17 that we tested. The introduction of electrostatic complementarity significantly improves the ranking of a nearly correct solution for 10 out of the 17 systems, and in some cases, the improvement is dramatic. For example, the rank of a nearly correct solution for the acetylcholinesterase/fasciculin-II system is elevated from 689705 to 3, for the trypsin/BPTI system, it is elevated from 271278 to 6364, and in the 1bth system, the rank of the nearly correct solution is elevated from 508531 to 99102 on introduction of electrostatic complementarity. The geometric-electrostatic results are worse than the geometric results for the TEM1/BLIP system and for some of the antibody/antigen systems. These results are discussed below.
|
Statistical significance
Tables 2 and 3![]()
also list Z values for the nearly correct solution and for the top-ranking solution (Z1) in each geometric and geometric-electrostatic scan. The Z1 values for the geometric scans of disassembled systems range from 6.1 to 17.7. The corresponding range for unbound systems is 5.8 to 9.8. The distributions of scores in the geometric-electrostatic scans are consistently wider than those in the corresponding geometric scans (larger
values; data not shown). Hence, the ranges of Z1 values in such scans are shifted toward smaller values: 3.3 to 11.2 for unbound systems versus 5.4 to 16.8 for disassembled systems.
| Discussion |
|---|
|
|
|---|
|
Another novel feature of our docking algorithm is the translation of the electrostatic-potential to potential spheres, which are rotated together with the atoms in the molecule and projected onto the MolFit grid. Hence, only a single computation of the electrostatic potential is necessary for each molecule involved.
The geometric-electrostatic algorithm was applied to 17 disassembled systems and to the corresponding unbound systems, providing very significant improvement in the ranking of the nearly correct solution in a considerable number of cases. We compare the results to the results of geometric docking and analyze them in terms of the rank of the nearly correct solution (the position of this solution in the sorted list of solutions produced by MolFit). The ranks reflect the success of docking tests more effectively than the complementarity scores, which cannot be compared between different systems.
Comparison of the geometric and geometric-electrostatic docking results
Electrostatics consistently improves the docking results for disassembled systems (see Table 2
): The complementarity scores increase (except for the TEM1/BLIP system, which is discussed below), and the ranks of the nearly correct solutions are higher, when possible. More importantly, electrostatics elevates the scores of the nearly correct solutions for 18 of the 20 unbound systems and improves their ranking for 13 of these systems (see Table 3
). At first glance, the enzyme/inhibitor and antibody/antigen groups of systems present a distinctly different behavior with respect to the electrostatic contribution to the complementarity score: In the enzyme/inhibitor group, the scores increase and improve the ranking of nearly correct solutions, whereas in the antibody/antigen group, the increase in score is accompanied by an increase in the number of false-positive solutions, resulting in inferior ranking of the nearly correct solutions. The different behavior of enzyme/inhibitor and antibody/antigen systems has been noticed before (Sternberg et al. 1998). It is discussed in more detail and challenged below.
The statistical analysis of the distribution of scores (Table 4
) provides an additional view of the results. For each scan, we compare the Z value of the nearly correct solution to Z1, which is the Z value of the top-ranking solution. All the 17 disassembled systems have a nearly correct solution within the top 4
range in the geometric scans (ranks 92 and less) and the top 3
range in the geometric-electrostatic scans (ranks 33 and less). One expects that the conformation differences between disassembled and unbound structures will obscure the uniqueness of the correct structure. Indeed, the Z values for the nearly correct solutions obtained in docking of unbound structures are generally lower than those obtained for disassembled systems. We find a nearly correct solution in the top 5
range for 12 of the 20 systems in geometric docking and for 15 systems in the geometric-electrostatic scans. Thus, our geometric-electrostatic docking appears to be more successful than the geometric docking in most cases.
|
The geometric complementarity scores are roughly related to the size of the interface, and therefore, the size and possibly the shape of the interface may contribute to the success of the prediction by geometric docking. However, the buried surface areas, listed in Table 2
, appear to be unrelated to the success of the geometric docking, in terms of the ranking of the nearly correct solutions, of either disassembled or unbound structures (see Tables 2, 3![]()
).
It appears that the results of the geometric docking of unbound structures do not correlate with obvious features such as the amount of structural change on complex formation and the area of the interface. Possibly the success or failure of the geometric docking is related to the omission of water molecules in the docking procedure. In most systems, there are several buried water molecules, which fill gaps between the interacting molecular surfaces. Our algorithm ignores these water molecules, and therefore, in some cases conformation changes are accommodated more easily than in other cases. The complex between barnase and barstar provides a good example for a beneficial effect because of the omission of water. The shape complementarity in this system is not perfect because several water molecules cluster on one side of the interface (Buckle et al. 1994). Our analyzes of the top-ranking solutions in the geometric and geometric-electrostatic scans for disassembled structures and for unbound structures identify clusters of nearly correct solutions. Such clustering is most likely caused by the omission of water molecules, which allows limited rotational freedom at the interface.
Parameters affecting the geometric-electrostatic docking results
In most cases, the geometric-electrostatic docking is superior to geometric docking, as described in the results section and summarized in Table 4
. Our `imperfect success' raises two related questions: What are the factors that determine the success or failure of the geometric-electrostatic docking compared with the geometric docking, and can we predict which algorithm should be applied to a given system? In attempt to answer these questions, we inspected the potential patches on the surfaces of the molecules in all the systems under consideration and compared their size and distribution at the interface to other portions of the surface (the noninteracting surface). We also calculated the molecular dipole moments and the angles between these vectors for each pair of interacting molecules.
The 10 systems for which the introduction of electrostatic complementarity significantly improves the ranking of the nearly correct solution (2ptc, 1brs, 1cho, 1avw, 1smf, 1bth, 1ldt, 1fss, 1vfb, and 2jel) are all characterized by one or two very large potential patches on the interacting surface of the enzyme or antibody and a very large or several large potential patches of the opposite sign on the interacting surface of the ligand. Moreover, the potentials at the interface often protrude into the solvent (Fig. 3
). This increases the number of MolFit grid points with an imaginary part that is nonzero (the real part may be zero) and adds to the electrostatic complementarity score. In contrast to the above, the potentials on the noninteracting portions of the surfaces are characterized, in most cases, by a mixture of positive and negative potential patches of different sizes.
|
Three systems1tec, 2sec, and 2jelare only slightly affected by the introduction of electrostatic complementarity. They are characterized by relatively small potential patches on the interacting surfaces of the enzyme or antibody and the ligand. The noninteracting surfaces of these molecules display a mixture of positive and negative patches.
The TEM1/BLIP system, in which the electrostatic contribution to the complementarity score is negative, stands out in the group of 11 enzyme/inhibitor systems that we tested. Inspection of the structures and the potentials around TEM1 and BLIP reveals that there is only limited electrostatic complementarity at the interface (see Fig. 4
). Thus, a large negative electrostatic patch is found on the binding surface of TEM1 next to an assembly of small positive patches. The inhibitor has a small positive patch at the interface surrounded by negative patches, which in the complex are in contact with the negative patch on the surface of the enzyme, leading to a negative contribution to the electrostatic complementarity score (see Tables 2, 3![]()
).
|
Introduction of electrostatic complementarity has a nega tive effect on the docking results for several antibody/antigen systems. The inferior ranking of the nearly correct solutions in the geometric-electrostatic docking for these systems cannot be related to a lack of electrostatic anticorrelation at the interaction sites because in most cases there is an increase in the complementarity score when either disassembled or unbound structures are docked. We note, however, that three of the six antibody/antigen complexes involve lysozyme (3hfm, 3hfl, and 1vfb). Two of the three anti-lysozyme antibodies (Hyhel-10 in 3hfm and Hyhel-5 in 3hfl) are characterized by negative potential patches at the interface and on the noninteracting surface. The antigen, lysozyme, is characterized by positive potential patches all around, which extend away from the molecular surface in several places. It is most likely that the electrostatic homogeneity of the molecules that form complexes 3hfm and 3hfl allows formation of many false-positive docking solutions and reduces the ability of the geometric-electrostatic docking algorithm to identify the correct solution. In contrast, inclusion of electrostatic complementarity improves the docking results for the third antibody/lysozyme system, 1dvf. The antilysozyme antibody in this complex, D1.3, displays a large negative potential patch at the interface, whereas on its noninteracting surface positive and negative potential patches are mixed.
Conclusions: The good electrostatic docking rules
In view of the analyses above, one may conclude that geometric-electrostatic docking should be preferred over geometric docking when the molecules involved display large positive or negative potential patches on a part of the surface, and the potential next to some of these patches protrudes into the solvent. On the other hand, if the electrostatic potential on the surface of one or both molecules is homogenous, geometric docking is likely to produce better results. These rules originate from the nature of our docking algorithm and of essentially all other protein-protein docking algorithms. Large potential patches are less sensitive to structural differences between bound and unbound structures and to mispositioning because of the stepwise sampling of the rotation/translation space. Therefore, when the electrostatic potentials are characterized by large features geometric-electrostatic docking is successful. In contrast, when the electrostatic potential appears homogenous, that is, with the same sign all around the molecule, the nearly correct solution is less distinguishable, many false-positive solutions are produced, and the rank deteriorates.
To test these rules, we searched for an antibody/antigen system with an adequate potential. In the Igg2a/peptide system (1cft), the Fv fragment of the antibody displays a mixture of large positive and negative patches, and the potential next to one of the negative patches extends away from the molecular surface. The surface of the antigen in this system is characterized by two large positive and two large negative potential patches. Geometric and geometric-electrostatic docking results for 1cft are shown in Tables 2 and 3![]()
. The geometric-electrostatic docking results are considerably better than the geometric docking results, significantly improving the ranking of the nearly correct solution. This is observed in the docking of the disassembled structures as well as for the unbound structures and supports our good electrostatic docking rules. Moreover, it appears that the same rules apply to antibody/antigen systems and enzyme/inhibitor systems, and the different behavior with regard to electrostatic complementarity noted before is a result of the limited choice of antibody/protein systems in the PDB, many of which involve lysozyme as a ligand.
We can further examine our rules against the results of Mandell et al. (2001). They studied three unbound systems and noticed a significant electrostatic contribution in two of them: mouse acetylcholinesterase/fasciculin 2 and cytochrome c peroxidase/cytochrome c. The electrostatic character of the first system is discussed above. Cytochrome c peroxidase displays a large negative patch at the interface, whereas on the noninteracting surface, positive and negative patches are mixed. Cytochrome c has a cluster of positive patches at the interface and mixed positive and negative patches on the noninteracting surface. Hence, this system too complies with our good electrostatic docking rules. In contrast, the third unbound system tested by Mandell et al., uracil-DNA glycosylase (UDG) and UDG inhibitor, is less appropriate for electrostatic docking. Thus, for UDG we note a large positive patch at the interface and a large negative patch on the noninteracting surface. However, the inhibitor is mostly neutral, that is, with a weak homogeneous potential. Indeed, the rank of the nearly correct solution, obtained by Mandell et al. for this system, deteriorates when electrostatics is included.
Clustering of nearly correct solutions
Clustering of similar solutions is mentioned above for the barnase/barstar system. In this system, we analyze the 10 top-ranking solutions obtained in the docking of either disassemble or unbound structures. In the first case, three solutions in the geometric scan and five in the geometric-electrostatic scan are variations of the nearly correct solution. Similarly, in the docking results for unbound structures, there are two nearly correct solutions among the top 10 in the geometric scan and four in the geometric-electrostatic scan. Interestingly, the clustering of nearly correct solutions is more effective in the geometric-electrostatic scans than in geometric docking.
We performed a limited cluster analysis for all the other systems, comparing only the translations of molecule b with respect to molecule a, in the 300 top-ranking docking solutions for unbound structures. Such an analysis indicates positions on the surface of molecule a, which are preferred by molecule b (at any orientation). We found that for 15 of the 20 systems studied here, one to three distinct clusters were formed. In 14 of the cases, the nearly correct solution was included in one of the clusters. Similarly, the geometric-electrostatic docking solutions form clusters in 17 systems out of the 20, and often the number of clusters is smaller than in the corresponding geometric scans. Again, in 16 cases the nearly correct solution is included in one of the clusters. It appears that clustering analysis can help in identifying the correct docking solution. A more detailed clustering analysis is underway. It is noteworthy that similar results, that is, more effective clustering of solutions in geometric-electrostatic docking, were previously reported by Mandell et al. (2001).
Docking of "residents" and "strangers"
In view of the constantly increasing number of new sequences and structures, one may wish to answer the following question: Do molecules a and b form a complex? The underlying assumption of the statistical analysis described above is that the interacting surfaces of molecules posses some unique structural and electrostatic features, which are rare among all other possible contacts. Therefore, we want to compare the distributions of scores obtained in the docking of unbound structures ("residents") with the distributions obtained in docking of unrelated system ("strangers"), expecting that the latter are wider. Hence, their Z1 values will be lower, and there will be no clustering of solutions. To this end, we performed geometric and geometric-electrostatic rotation/translation scans for four pairs of unrelated molecules: trypsin/barstar, acetylcholinesterase/eglin-C, antibody D1.3(Fv)/ovomucoid inhibitor, and subtilisin/ß-lactamase. The Z1 values for the geometric scans for these systems are 7.0, 6.8, 5.9, and 8.6, respectively (7.1 on the average), and for the geometric-electrostatic scans, they are 5.0, 7.3, 6.1, and 8.3, respectively (6.7 on the average). As expected, the average Z1 values obtained for the docking of strangers are lower than the corresponding values obtained for docking of disassembled and unbound structures: 9.3 and 7.8, respectively, for geometric docking and 8.4 and 7.1, respectively, for geometric-electrostatic docking. However, the ranges of the Z1 values obtained for the docking of strangers overlap the corresponding ranges for the docking of unbound structures. Interestingly, the docking results for strangers form clusters in only one case, the trypsin/barstar case, and the number of clusters increases when electrostatic complementarity is included. This is unlike the results obtained for the docking of unbound structures. Hence, the combination of statistical analysis and cluster analysis of the docking results may help to distinguish between biologically related and biologically unrelated molecules.
| Summary |
|---|
|
|
|---|
| Materials and methods |
|---|
|
|
|---|
/ß-hemoglobin (large and flat). In each system, we placed opposite charges on atoms at the interface, calculated the electrostatic potentials emanating from these charges, and then computed the combined geometric-electrostatic correlation scores at the experimental orientation for different values of Pmin. Notably, in the
/ß-hemoglobin system, there are no ion pairs at the interface. Therefore, only for these computations, we placed a negative charge on glutamine 127 in the ß-chain.
The electrostatic complementarity score is lower when Pmin increases, that is, when more potential spheres are omitted (data not shown). However, the dependence appears to be linear, indicating that docking calculations can be performed with a Pmin value of
3.0 kT/e (for the step representation of the potential), and application of an appropriate scale factor will restore the electrostatic score. Moreover, the slopes of the least-squares lines for four charge pairs from the three systems mentioned above are similar, indicating that this scale factor is independent of the absolute values of the charges, the distance between them and the shape of the interacting surfaces. Similar analyses using the continuous representation of the potential produce Pmin = 2.0.
Choosing the set of electrostatic charges and atomic radii in the Delphi computation of the electrostatic potential
We compared two sets of charges: the Formal set (charges placed on the formally charged side-chains lysine, arginine, histidine, glutamate, and aspartate and on the carboxy and amino termini) and the PARSE set of charges (Sitkoff 1994), in which partial charges are assigned to all the atoms. Starting from the coordinates of the disassembled trypsin/BPTI complex, a geometric rotation/translation scan was performed. Two nearly correct solutions were obtained, ranking 1 and 4 (12° deviation). Then the electrostatic potentials for each of the eight top-ranking solutions were calculated with either the Formal or the PARSE set of charges, and the combined geometric-electrostatic scores were calculated for several values of w.
It appears that when the potentials are calculated with the Formal charges, the electrostatic score is small and positive for the nearly correct solutions and negative for the false-positive solutions (data not shown). When PARSE charges are used, the electrostatic score is large and positive for the nearly correct solutions and small positive or negative for the false-positive solutions. Both sets of charges point out the nearly correct solutions. However, electrostatic potentials calculated with the PARSE charges allow more freedom in the choice of the value of w than potentials calculated with Formal charges. In addition, the number of potential spheres is considerably smaller when the PARSE potentials are used: 263,125 and 138,425 for trypsin and BPTI, respectively, compared with 419,561 and 204,967 for the Formal charges potentials. These results indicate that for our docking procedure the PARSE charges produce more adequate potentials than the Formal charges.
Optimization of f, the radius of the potential spheres
The series of computations described above was also used to verify our potential rotation procedure and to determine the value of f in equation 6
. As mentioned above, the electrostatic potential for molecule b was calculated for each of the eight top-ranking solutions. It was translated into potential spheres and projected onto the MolFit grid using three values of f: 0.9, 1.0, and 1.1. Alternatively, the electrostatic potential was calculated only once and translated into potential spheres, which were rotated as necessary and projected with the three different values of f. The linear correlation coefficients between the scores obtained with the exact and the rotated potentials are close to 1.0 (0.99, 0.99, and 0.97 for f = 0.9, 1.0, and 1.1, respectively), and the RMSD values are small (8.5, 7.0, and 11.6 score units, respectively), indicating that the rotation procedure is adequate. In addition, both measures indicate that the most appropriate value for f is 1.0. The same optimal value was obtained when the potentials were calculated with either the step representation or the continuous representation of the potential.
Optimization and verification of w
To obtain an estimate of w, which weighs the geometric and electrostatic contributions, we compared the top-ranking solutions from the full rotation/translation geometric scan and two geometric-electrostatic scans with different values of w. The scores of the nearly correct solutions increase as w increases. At the same time, the scores of several top-ranking false-positive solutions from the geometric scan decrease, or increase less steeply. However, often the scores of several low-ranking solutions from the geometric scan increase very steeply when electrostatics are included. These are new false-positive solutions that can be eliminated by limiting the value of w. Tests for the trypsin/BPTI,
ß-hemoglobin, and barnase/barstar indicate that when the continuous representation of the potential is used, w = 0.0015 is adequate for all three systems.
Similar analyses for the step representation of the electrostatic potential indicate an optimal value of 0.25 for w. This value is used in the rotation/translation scans for disassembled structures. However, analyses of the complementarity scores versus w for all the disassembled systems indicated that a larger value of w (0.35) might be more adequate. The larger value is used in all the rotation/translation scans for unbound systems.
The value of w for the step representation of the potential was reexamined for five unbound systems (1thm+2sec, 5cha+1ovo, 1bni+1bta, 2ptn+4pti, and 2ace+1fsc). For each system, we predicted the results of rotation/translation scans with different values of w, as described in the Algorithms section. The dependence of the predicted rank of the nearly correct solution on w for the five systems is shown in Figure 5
. The value w = 0.35 appears to be adequate. Moreover, the graphs have very wide minima, indicating that the geometric-electrostatic docking results are stable, and variations in w will not significantly improve the ranking of the nearly correct solutions. To ascertain the prediction procedure, we compare the predicted scores and ranks to corresponding values from real scans. For two systems, 1thm+2sec and 5cha+1ovo, we performed additional scans with w values predicted to be best for these systems (0.25 and 0.60, respectively, according to Fig. 5
). For the other three systems, 1bni+1bta, 2ptn+4pti, and 2ace+1fsc, in which w = 0.35 is optimal, additional scans were performed with nonoptimal w values (0.1, 0.45, and 0.50, respectively). The comparison in Table 5
shows that both the scores and the ranks of the nearly correct solutions were predicted correctly, supporting our verification procedure and the choice of the value of w.
|
|
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
|
|
|---|