|
|
||||||||
1 Northwestern University, Department of Molecular Pharmacology and Biological Chemistry, Chicago, Illinois 60611, USA
2 Loyola University Chicago, Department of Physics, Chicago, Illinois 60626, USA
Reprint requests to: Brian K. Shoichet, Northwestern University, Department of Molecular Pharmacology and Biological Chemistry, 303 E. Chicago Ave., Chicago, Illinois 60611, USA; e-mail: b-shoichet{at}northwestern.edu; fax: (312) 503-5349.
(RECEIVED July 18, 2001; FINAL REVISION March 4, 2002; ACCEPTED March 5, 2002)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.2830102.
| Abstract |
|---|
|
|
|---|
Keywords: Proteinprotein docking; mutant; combinatorial; flexibility
| Introduction |
|---|
|
|
|---|
Since the early 1990s, docking programs have been able to regenerate near-native structures of proteinprotein complexes using the complexed (bound) conformations of the two proteins (Cherfils et al. 1991; Shoichet and Kuntz 1991; Hart and Read 1992; Vakser 1995). The protein docking problem only becomes acute when docking the uncomplexed (unbound) conformations of the two proteins; these are the relevant states for true prediction (Totrov and Abagyan 1994; Vakser 1995; Jackson et al. 1998; Norel et al. 1999; Vakser et al. 1999; Camacho et al. 2000; Kimura et al. 2001). Although unbound proteins often adopt main-chain conformations similar to their bound counterparts, their solvent-exposed side chains commonly adopt conformations that are not complementary to their binding partner (Conte et al. 1999). Thus, when docking algorithms generate near-native complexes from the unbound conformations of the partners, atoms in the interface clash. Such near-native fits score poorly in classical, atomistic energy potentials because of these clashing atoms. Even a single noncomplementary atom can lead to very unfavorable energies because of the steepness of the steric repulsion term of the van der Waals energy (Weiner et al. 1984).
The problem of docking unbound proteins can be addressed either by explicitly sampling many conformations or by modifying the scoring function to accommodate clashes. Several methods have been published that use modified scoring functions. Soft docking (Jiang and Kim 1991; Palma et al. 2000), Fourier correlation (Gabb et al. 1997; Ritchie and Kemp 2000), and shape fitting methods (Shoichet and Kuntz 1991; Norel et al. 1994, 1995, 1999), smooth the details of proteinprotein interactions, thereby allowing clashes to occur, at least before minimization. Empirically derived, or trained, scoring functions (Weng et al. 1996; Moont et al. 1999; Palma et al. 2000) have also been used to address the protein docking problem. Although these methods allow near-native structures to be identified, they cannot reliably distinguish the near-native complexes from non-native complexes when docking unbound proteins. Vajda and coworkers have explored applying extensive minimization to a series of predocked complexes (Camacho et al. 2000). Although this has shown some promise in discriminating near-native complexes from non-native complexes, the discrimination is not always as convincing as one might like, and the high computational expense of the procedure makes it prohibitive for on the fly docking.
An alternative to avoiding the van der Waals violation problem with a trained or attenuated scoring function is to use a full atomistic scoring function and to consider multiple conformations of the docking proteins. In principle, a very large number of coordinated motions would have to be considered. In fact, the backbones of most proteins remain largely unchanged upon complex formation (Conte et al. 1999), and so one might be able to limit flexibility to side chains. Even assuming a rigid backbone, explicitly sampling the possible ligand side-chain conformations in protein docking might be difficult. For surface-exposed residues where the effect of excluded volume is small, the number of possible conformations increases as the power of the number of rotatable bonds. However, most side chains on the convex surface of a protein, especially a protein ligand, may be considered as independent, uncoupled rotors. This would reduce an exponential problem to one that is additive in the number of flexible side chains.
Here we consider the questions: "Can we discretely sample enough states to approximate the native complex?" and then "Are these near-native docked complexes distinguishable from non-native complexes in an atomistic energy potential?" We make two simplifying assumptions. First, only ligand side chains are flexible; we do not consider conformational changes in the backbone. Second, there is complete additivity among residues; the conformations for each residue are calculated assuming that they are independent of all other flexible residues. These assumptions, along with a hierarchical organization of side-chain conformations, allow us to implicitly consider at least 1040 ligand conformations while only explicitly representing hundreds. A simple extension of these ideas allows us to consider residue substitutions (mutant proteins) as well as residue conformations.
| Overview of the method |
|---|
|
|
|---|
|
As is typical for algorithms built off of the DOCK suite of programs (Kuntz et al. 1982; Ewing et al. 2001), the protein ligands are first oriented and then scored in the binding site. For each orientation of the ligand in the binding site the internally rigid portion of the ligand is docked, each conformation of each flexible residue of the ligand is fit into the site according to the rotation matrix found for the rigid fragment, and the best conformers of each residue are recombined to create a best-scoring ligand conformation for that orientation. Ligands are evaluated in an atomistic potential function composed of a Poisson-Boltzmann electrostatic term precalculated for the receptor using DelPhi (Gilson and Honig 1987) and a van der Waals term based on the AMBER potential (Weiner et al. 1984; Meng et al. 1992). In all of the docking calculations described here, only the ligand is made flexible; the receptor is held rigid. Although the receptor could also be made flexible, our current scoring method, based on a precalculated potential grid, makes this impracticable. Other approaches to receptor flexibility and scoring have been described (Schnecke et al. 1998; Claussen et al. 2001).
Throughout the paper, the inhibitors, ligands, and their associated mutants will be collectively referred to as the "ligand" and the pregenerated, hierarchically organized ensemble of ligand conformations will be referred to as the "flexible ligand." Additionally, all ligands are in their unbound conformations; no complexed ligands were used in docking.
| Results |
|---|
|
|
|---|
|
|
|
|
|
of the flexibly docked unbound form of the ligand and the bound form of the ligand was 2.9 Å, corresponding to a 70% conservation of native contacts observed in the complex.
TEM-1/BLIP
Docking the unbound conformation of BLIP as a rigid body, led to non-native complexes of TEM-1/BLIP having better energy scores than near-native complexes (Figs. 3A, 4![]()
). This was because two key interface residues on the unbound ligand, Asp49 and Phe142, are in the "wrong" conformations for optimal fit to TEM-1 (Fig. 5
). Consequently, the near-native complexes cannot be distinguished from the non-native docked complexes when docking to either the unbound or bound conformations of the receptor.
|
of the flexibly docked unbound form of the ligand and the bound form of the ligand was 2.4Å, corresponding to a 59% conservation of native contacts observed in the complex.
Subtilisin/chymotrypsin inhibitor 2
Subtilisin Novo (SN) from the complex with chymotrypsin inhibitor 2 (CI-2) was used as the bound receptor and subtilisin from Bacillus amyloliquefaciens (BAS) was used as the unbound receptor. The RMSD value for active site C
atoms between SN and BAS is 0.30 Å. The residues critical for binding are identical in both proteins.
Docking the rigid conformation of unbound CI-2 to both the unbound and bound receptor structures favored non-native binding modes (Figs. 3B, 4![]()
). This is because the P2 and P1 residues of CI-2, Thr58, and Met59, are in the "wrong" conformation; upon superposition of the unbound ligand onto the bound ligand, they clash with residues His64 and Ala152 of subtilisin (Fig. 6
). Multiple conformations were generated for the 12 residues of unbound CI-2 that had 60% or more of their surface area exposed. The P1' residue, Glu60 (48% exposed), was also made flexible. These 13 flexible residues resulted in 642 independent side chain conformations that were recombined during docking to produce up to 1017 conformations in each orientation. Docking multiple conformations of the unbound ligand favored near-native binding modes in both the unbound and bound receptor structures (Figs. 3B, 4![]()
). The RMSD between the C
of the flexibly docked unbound form of the ligand and the bound form of the ligand was 2.7 Å, corresponding to a 60% conservation of native contacts observed in the complex.
|
Cß. In the bound complex, this residue makes a hydrogen bond to His102 of barnase, a residue that contributes to the electrostatically driven binding of the two proteins (Buckle et al. 1994).
To investigate the role of ligand flexibility, conformations were calculated for the 19 residues on barstar that were more than 60% exposed, generating a total of 389 side chain conformations. These conformations were recombined during docking to produce up to 1021 conformations of the ligand in each orientation. Docking these multiple barstar conformations to the unbound barnase improved the scores of the near-native complexes relative to the non-native complexes (Fig. 3C
), but the non-native complexes still scored better than the near-native complexes (Fig. 4
). It was only when the unbound, multiconformer barstar was docked to the bound conformation of barnase that the near-native dockings could be distinguished from the non-native dockings (Fig. 3C
).
The conformational differences between the unbound and bound receptors were of greater importance in barnase than for other receptors. Different conformations of two key residues in the unbound and bound barnase cause a non-native complex to be favored. His102 of barnase hydrogen bonds with Asp39 and Gly31 of barstar, and a second residue, Arg59, packs tightly against residues Glu76 and Trp38 of barstar. The conformations of His102 and Arg59 in the unbound barnase, although not precluding interactions with barstar, do not allow these favorable interactions to occur. As a result,
-helix 2 of barstar still binds to the binding site of barnase but is flipped 180° (Fig. 7
). Intriguingly, a number of common interactions are observed between the crystallographically determined complex and the flipped ligand orientation generated from docking. Glu46 from the docked unbound barstar mimics interactions observed crystallographically by Asp36 from barstar (Fig. 7
); Trp38 from the docked structure occupies the same space that Trp45 from the complex does, and, in general, hydrogen bond donor positions in the complexed ligand structure are mimicked by donors in the flipped structure. The conformational problems in barnase are not present in the bound enzyme, allowing the multiconformer barstar to dock in high scoring, near-native configurations. The RMSD between the C
of the flexibly docked unbound form of the ligand and the bound form of the ligand was 18.3 Å, corresponding to an 8% conservation of native contacts observed in the complex.
|
|
of the flexibly docked unbound form of the ligand and the bound form of the ligand was 23.4 Å, corresponding to a 63% conservation of native contacts observed in the complex.
-Chymotrypsin/ovomucoid
The preferred binding mode of the rigid unbound ovomucoid to both the unbound and bound
-chymotrypsin receptor is a near-native pose. Surprisingly, even though a near-native complex is favored over non-native complexes when docking to the bound receptor, it is only favored by about one kcal/molea smaller discrimination than observed from docking to the unbound receptor (Figs. 3E, 4![]()
). This small preference for near-native complexes may partially owe to the bound structure used for docking. The bound structure was the complex between
-chymotrypsin and an ovomucoid that had a leucine as the P1 residue (Fujinaga et al. 1987) instead of the methionine found the in the unbound structure.
To investigate the role of ligand side chain flexibility, 147 conformations were generated from 11 residues (60% exposed) on ovomucoid. These were recombined during the docking calculation to produce up to 1011 conformations of the ligand in each orientation. The introduction of flexibility increased the energy separation of the native-like and non-native-like docked complexes from one to six kcal/mole when docking to the bound receptor (Figs. 3E, 4![]()
). When docking to the unbound form of the receptor, ligand flexibility decreased our ability to distinguish native from non-native structures from 25 to 17 kcal/mole. The addition of flexibility improved the docking score of non-native poses more than it did the docking score of near-native posesperhaps reflecting the unusually favorable scores of the near-native, rigid ligand dockings. Despite this improvement, near-native complexes are still clearly distinguished from the non-native complexes. The RMSD between the C
of the flexibly docked unbound form of the ligand and the bound form of the ligand was 1.6 Å, corresponding to an 83% conservation of native contacts observed in the complex.
Protease B/ovomucoid
All docking calculations with protease B were conducted using the bound form of the receptor because no unbound structure was found in the PDB. This structure of protease B was crystallized with a mutant ovomucoid that had an alanine instead of a methionine at the P1 position (Read et al. 1983). The unbound form of ovomucoid used for docking was the same as used in docking to
-chymotrypsin (Met18 at P1). As with
-chymotrypsin, the rigid ovomucoid was preferentially docked in a near-native pose. Introducing flexibility into this system, with 11 residues in up to 1011 conformations per orientation, increased our ability to distinguish native from non-native poses (Figs. 3F, 4![]()
).
Docking mutant ligands
A natural extension to combinatorially evaluating ligand residue conformations is to combinatorially evaluate ligand residue substitutions. We created point mutations in three ligands (Table 3
) and docked them to four different receptors. Docking all 20 amino acid substitutions at the P1 residue, residue 15 for BPTI (Fig. 9A, B
), and residue 18 for ovomucoid (Fig. 9EG
), generally took 2.5 to six times as long per orientation as docking a single molecule. Evaluating 15 selected substitutions at specific residues in BLIP (Fig. 9C, D
) took about the same amount of time per orientation as did docking a single molecule. Additionally, assuming complete additivity of the side chain conformations and substitutions, 20 substitutions of 10 loop residues (1013 mutant ligands) were docked for ovomucoid. Each of these 1013 ligands had about 1011 conformations (11 flexible residues distributed over the surface of the ligand each with about 10 conformations) were docked for ovomucoid. This calculation took about 15 times longer than docking a single protein (i.e, no substitutions) in a single conformation.
|
|
In all but one case, there was a significant positive correlation between the experimental result and the docking predictions. The slopes of the lines for both the unbound and bound
-chymotrypsin with ovomucoid (Fig. 9E, F
) and unbound tryspin with BPTI (Fig. 9A
) were close to 1. Slopes for the TEM-1 with BLIP (Fig. 9C, D
) and bound trypsin with BPTI (Fig. 9B
) were significantly greater than one (Fig. 9G
). There was no correlation (R2 = 0.03) between the binding affinities reported by the docking program and the experimental values for ovomucoid mutants binding to protease B. For all other systems, docking to both the unbound and bound receptors, R2 values ranged from 0.50 to 0.91.
With either Arg or Lys as the P1 residue, BPTI has been shown to bind to trypsin with an affinity that is five orders of magnitude greater than other amino acids at this position (Krowarsch et al. 1999). Consistent with this finding, our results from simultaneously docking all 20 amino acid variants of BPTI to both the bound and unbound structures of the receptor indicate a clear preference for the basic amino acids (Fig. 9A, B
). A trend for the remaining amino acids is present with Thr, Trp, His, and Gln being the farthest outliers when docking to the unbound receptor.
Two different research groups (Huang et al. 2000; Selzer et al. 2000) experimentally determined binding affinities for a total of 15 mutants of BLIP binding to TEM-1. For both sets of mutants, the relative dock energy scores correlate well with experimental results, indicating a binding preference for polar or charged mutants. The 15 mutations were partitioned into two sets of data because of differences in experimental baseline binding affinities determined by the two groups. Only the 11 data points corresponding to the results of Schreiber and coworkers have been plotted for ease of viewing (Fig. 9C, D
).
Binding affinity trends from docking the 20 P1 variants of ovomucoid to the unbound and bound structures of
-chymotrypsin (Fig. 9E, F
) indicate a distinct correlation with experiment (Lu et al. 1997), with the larger, nonpolar amino acids being favored at this position in both the docking predictions and the experimental results. On the other hand, no correlation was observed between the experimental binding affinities and the predicted bindings for ovomucoid with protease B (Fig. 9G
). In the docked structures, the different P1 residues fold back upon the surface of the inhibitor, rather than "down" into an S1 pocket, possibly diminishing specificity interactions with the enzyme.
| Discussion |
|---|
|
|
|---|
For all seven systems, the addition of ligand residue flexibility was sufficient to produce low energy, near-native complexes when docking the unbound ligands to their cognate bound receptors (Fig. 4
). When the unbound ligand structures were docked as rigid bodies, without flexible side chains, this was not the case; the non-native docked complexes scored better than the near-native complexes in all cases except the ovomucoid dockings. In every system, adding flexibility to the ligand improved the scores of the near-native complexes relative to the non-native complexes with the exception of ovomucoid docked to the unbound structure of
chymotrypsin. For this system, although the near-native complexes were clearly distinguished from the non-native, the introduction of flexibility slightly decreased our ability to discriminate between the two. This was probably due to our failure to account for different intramolecular energies in the different ligand conformations. These results suggest that adding ligand flexibility is important, and in some systems sufficient, to distinguish near-native from non-native complexes in proteinprotein docking. We find that many conformations need to be sampled; a hierarchical representation of conformational possibilities provides one method to do so.
The challenge of explicitly sampling the enormous number of conformations accessible to protein ligands has led to the introduction of modified scoring schemes. In particular, the van der Waals component of the interaction energy between two proteins is exquisitely sensitive to conformation. A single atom positioned a fraction of an Angstrom too close to the receptor can lead to very large repulsive terms. Given the coarseness with which we sample conformations and the fact that we do not allow backbone flexibility, it was entirely possible that our conformations might not complement the receptor. Our results indicate that, at least for conformational changes that are largely restricted to ligand side chains, coarse conformational sampling, combined with a large amount of orientation sampling, is sufficient to reproduce near-native complexes. Additionally, these complexes are similar enough to the native complex that they score well and are distinguishable from non-native complexes when using a Lennard-Jones 612 potential term combined with Poisson-Boltzmann electrostatics for our scoring function. The Lennard-Jones term, though unforgiving, appears to help discriminate the near-native complexes from others.
Although adding ligand conformational flexibility seems to be important, it is clearly not sufficient in all systems. Of the six unbound receptors docked (no unbound structure for protease B was found), two favored non-native docked complexes. Because our docking to the bound conformations of these receptors produced "correct" answers, we attribute the preference for a non-native binding mode to a conformational change between the bound and unbound receptors.
An interesting question in proteinprotein interactions is how individual side chains contribute to overall binding affinity. To evaluate a large number of possibilities, investigators have turned to combinatorial methods of exploring side-chain diversity, such as phage display. It occurred to us that the same hierarchical method that allowed for efficient recombination of side-chain conformations would also allow us to recombine side-chain substitutions, in effect recapitulating the combinatorial experiments in the docking calculations (Table 3
).
For three of the four inhibitor/enzyme systems, docking to both unbound and bound receptors, there was a significant correlation between the predicted and the experimental binding affinities. In docking mutants of BLIP to the unbound TEM-1, the R2 value was 0.91; for docking ovomucoid to unbound
-chymotrypsin, the correlation was 0.75. A high correlation was also observed between predicted and experimental energies for docking BPTI mutants into unbound trypsin. Considering the failure to fully treat desolvation and lack of receptor accommodations, these correlations are surprisingly good, and should not be expected to generally hold. Indeed, for docking ovomucoid to protease B, there is essentially no correlation. This may owe partly to the bound structure of protease B used for docking. The protease B/ovomucoid complex with an alanine mutant at P1 (rather than the native leucine or methionine) may have resulted in a more constrained S1 pocket, preventing other residues from binding. The relatively shallow slope (0.33) supports this. A more complete treatment of mutant desolvation (Lamb et al. 2001) and the possible addition of receptor flexibility may make this method more useful for considering residue substitutions to complement a receptor of known structure.
Several caveats deserve mention. The method assumes additivity in recombining conformations from different residues that are independently generated and evaluated in the receptor site. Without this assumption, the method would suffer a combinatorial explosion and the problem would be intractable, at least as we have represented it. For highly exposed residues on convex surfaces, additivity is a reasonable assumption. Although it will never be strictly true, additivity has been observed in several ligand protein interfaces (Wells 1990; Weiss et al. 2000; Lu et al. 2001). Violations to additivity will occur when the residues being sampled are close enough to one another that they can significantly influence each other's internal energies. When this happens, this method will lead to unreasonable results. Indeed, even when residues may be assumed to be independent of each other, we found that it was still important to consider the internal energies of the different conformations being calculated. For instance, mobilizing residues that were more than 50% buried often led to spurious results. A better integration of internal energies with interaction energies would make this approach more robust. Similarly, it is clear that the absolute energies of the docked complexes were often over estimated in magnitude. As in small-molecule docking (Shoichet and Kuntz 1996) predicting absolute binding energies in proteinprotein docking remains problematic. Key components of the interaction energy, such as the cost of desolvating the receptor, have been ignored in our calculations. In favorable circumstances, we might hope for a monotonic relationship between docking energy and experimental binding affinity. It is not clear to us whether the sometimes high correlations that were observed between docking and experimental energies for the mutant ligands will be extensible to other systems.
The method of generating and recombining conformations is well suited to side-chain rotamers, and one might imagine, extending it to small loop movements as well. Other methods, including the use of rotamer libraries, might work as well or better than using SYBYL to calculate low-energy side-chain conformations. More generally, it seems clear that the current method will not address global changes in the conformation of the ligand or the receptor, where the additivity assumption will break down. Finally, it should be clear that this method is best suited for the independent movement of side chains, and may break down when substantial backbone movements are involved in the docking event.
These limitations notwithstanding, these studies suggest that it is feasible to sample a large number of protein ligand side-chain conformations, and that doing so significantly improves our ability to distinguish near-native from non-native configurations in proteinprotein docking. Although there is clearly room for investigation of alternative scoring schemes, the use of a standard Lennard-Jones term may have improved the signal to noise in these calculations by discarding many nonphysical configurations early in the docking calculation. Considering receptor side chain flexibility may further extend these results. For proteinprotein complexes that form without significant main-chain accommodation, sampling side-chain conformations and scoring with a classical energy function may be sufficient to address the protein docking problem.
| Materials and methods |
|---|
|
|
|---|
Selection of docking systems
All structures were taken from the PDB except for the BLIP and TEM-1 structures that were obtained as part of an earlier docking prediction test (Strynadka et al. 1996). Except for protease B/ovomucoid, we chose systems that had crystal structures for the complex as well as unbound structures for both partners (Table 1
). Where multiple records were available for the same structure, we looked for the most recent, highest resolution, and most complete structure. The exception to this rule was that of the lysozyme structure, which was selected for its backbone similarity to the complexed lysozyme structure we chose. Compared to some of the unbound ligand structures, the lysozyme in the FAB complex undergoes significant main-chain accommodations. We excluded these ligand structures because our method has not been used to sample main-chain conformational changes. This is a limitation of the method as currently implemented. Selection of residues for mutation was based on the availability of experimental data (Table 3
).
Preparation of the receptors and energy grids
The systems were prepared for docking using a script to ensure uniform treatment. All water molecules were removed from the structure. Where alternate conformations for side chains existed, the first conformation was selected. Only the known interface region of the receptor proteins was targeted for docking. The His57 residues of the serine proteases were protonated on the N
atoms, consistent with their role in the catalytic triad. Each receptor was then protonated using the Biopolymers package of SYBYL 6.6 (Tripos Software), and the atoms were renamed using the AMBER naming convention. No other modifications were made to either the unbound or bound receptors. The program CHEMGRID (Meng et al. 1992) was used to generate a van der Waals potential grid based on a standard Lennard-Jones 612 potential around the area of the binding site. For the receptor, a molecular surface was generated for all residues within 15 Å of a key active-site residue, and this surface was used by SPHGEN (Kuntz et al. 1982) to generate spheres representing potential ligand atom locations. The locations of these spheres were used to define a region of low dielectric in the active site of the receptorconsistent with the ultimate occupancy of this site by the ligand (Shoichet and Kuntz 1993; Lorber and Shoichet 1998). DelPhi (Gilson and Honig 1987) produced an electrostatic potential grid based on the receptor structure and all of the spheres generated by SPHGEN. DISTMAP (Shoichet et al. 1992) was run on each receptor to generate a shape complementary grid. A subset of the SPHGEN spheres was analytically selected (Shoichet et al. 1992) for use in orienting the ligand. A second, larger subset of spheres was selected for orientation refinement using focusing (Shoichet et al. 1992).
Preparation of the ligands
All ligand surfaces were docked into the active sites of the receptors, unless otherwise noted. Ligand backbone atoms with B-factors greater than 80 and their corresponding side chains (residues 6065 of barstar and 101103 and 128129 of lysozyme) were removed. The program ACCESS (Lee and Richards 1971) was used to determine the solvent accessible area of ligand residues. Residues over 60% exposed were treated as flexible; all others were unchanged from the experimental structure unless otherwise noted. The program SPHGEN was used to generate spheres filling the volume of the ligand. Subsets of these spheres were used for matching at the ligand orientation stage of docking.
Multiple conformations were calculated for each exposed side chain in SYBYL using a systematic search. The key to this conformer generation is that all conformations are generated in the same reference frame. This extends an earlier method for docking multiple ligand conformations (Lorber and Shoichet 1998), significant differences being that previously neither recombination nor a full atomistic potential function was used. By requiring all conformations to be in the same reference frame, a single rotation matrix can be used to bring the entire ensemble of conformations into the binding site. When generating conformations of a given side chain, all other side chains were present in their unbound crystallographic conformation. Rotatable bonds for aromatic amino acids were rotated in 30° increments, all other C
Cß bonds were rotated in 60° increments, and all remaining rotatable bonds were rotated in 120° increments. Internal energies were calculated for each conformation, taking into account electrostatic and van der Waals terms; the sum of these terms was constrained to within 10 kcal/mole of the minimum energy conformation. All conformations meeting the energy requirements were used in the docking calculations. Atoms with only one conformation (main-chain and Cß atoms, and buried residues) were defined as the rigid fragment of the molecule. The remaining flexible atoms were converted to hierarchy format. This typically resulted in seven to 40 residues with multiple conformations.
Treatment of the mutants
The biopolymers package in SYBYL was used to substitute selected residues, and a systematic search generated multiple conformations. Fifteen mutants were explicitly created for BLIP (Huang et al. 2000; Selzer et al. 2000), and 25 were explicitly created for ovomucoid and BPTI. The 25 mutants included charged and neutral representations for Asp, Glu, Arg, Lys, and His. Desolvation values for each residue were calculated using HYDREN (Rashin and Namboodiri 1987). The value for glycine was then subtracted from the desolvation of each residue to determine a relative desolvation value for the side chain. If, upon docking, the residue was in contact with the receptor, the full desolvation energy was subtracted from the substituted residue's energy score.
Generating sphere sets and focusing
Ligand orientations were determined using the sphere matching method (Shoichet et al. 1992). An initial sphere set and at least one larger sphere set were used for each receptor. Because each ligand was relatively large, the number of resulting spheres used to produce orientations was also large. To limit the exponential growth of the number of orientations, the ligand surface and spheres were spatially divided in up to four smaller subclusters to represent different parts of the ligand. These subclusters were each docked sequentially, and their results summed. Alternate surfaces of the ligand barstar were not evaluated because the other convex regions of the ligand had B-factors greater than 80 on main-chain atoms, and were hence removed from the calculation. Similarly, alternate surfaces were not docked for lysozyme. The alternate convex regions were localized near residues 128129 and 101103. The 132L structure has both backbone and side-chain atoms in these regions with B-factors of 100. These residues were omitted from the docking calculations, and these interfaces were not explicitly docked to FAB. Like the receptor, another larger set of spheres was generated in the region of each subcluster. These larger sets were used in orientation refinement through focusing (Shoichet et al. 1992) to allow additional orientations, in the neighborhood of an early, potentially favorable orientation, to be sampled. The number of orientations and times reported for docking reflects the sum of the multiple independent docking runs using different surfaces of the ligands. When docking the mutants, the orientation search space was limited to the sphere cluster nearest the mutation.
Docking scheme
For each orientation of the ligand in the binding site, DOCK places and scores the rigid fragment and then attempts to build each side chain. At this stage each side chain is explored only until one conformation meets the docking requirements. After the algorithm has confirmed that at least one side chain can be built at each position, the remaining side chain conformations are explored. All conformations of each side chain are explored, pruning only when clashes occur with the binding site. The best scoring conformation of each side chain is saved. After all conformations of all side chains have been evaluated, the best side chain conformations are recombined to produce the best ligand conformation for each orientation. The recombination process is repeated for each residue substitution. The entire build-up process is repeated for each orientation of the ligand.
Once the docking calculations were completed, the best scoring near-native and non-native docked complexes were displayed graphically and evaluated to ensure there were no clashing groups due to violations of the additivity assumptions or to molecules extending outside of the energy grids, and hence, not being fully scored. In one case, docking CI-2 to the unbound conformation of subtilisin, two flexible side chains, Lys72 and Leu73, were found in conformations that clashed with each other; such conformations were removed manually from the docking results.
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
|
|
|---|
Camacho, C.J., Gatchell, D.W., Kimura, S.R., and Vajda, S. 2000. Scoring docked conformations generated by rigid-body proteinprotein docking. Proteins 40: 525537.[CrossRef][Medline]
Cherfils, J., Duquerroy, S., and Janin, J. 1991. Proteinprotein recognition analyzed by docking simulation. Proteins 11: 271280.[CrossRef][Medline]
Claussen, H., Buning, C., Rarey, M., and Lengauer, T. 2001. FlexE: Efficient molecular docking considering protein structure variations. J. Mol. Biol. 308: 377395.[CrossRef][Medline]
Connolly, M.L. 1986. Shape complementarity at the hemoglobin alpha 1 beta 1 subunit interface. Biopolymers 25: 12291247.[CrossRef][Medline]
Conte, L.L., Chothia, C., and Janin, J. 1999. The atomic structure of proteinprotein recognition sites. J. Mol. Biol. 285: 21772198.[CrossRef][Medline]
Ewing, T.J., Makino, S., Skillman, A.G., and Kuntz, I.D. 2001. DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases. J. Comput. Aided Mol. Des. 15: 411428.[CrossRef][Medline]
Fujinaga, M., Sielecki, A.R., Read, R.J., Ardelt, W., Laskowski, M., Jr., and James, M.N. 1987. Crystal and molecular structures of the complex of alpha-chymotrypsin with its inhibitor turkey ovomucoid third domain at 1.8 A resolution. J. Mol. Biol. 195: 397418.[CrossRef][Medline]
Gabb, H.A., Jackson, R.M., and Sternberg, M.J. 1997. Modelling protein docking using shape complementarity, electrostatics and biochemical information. J. Mol. Biol. 272: 106120.[CrossRef][Medline]
Gilson, M.K. and Honig, B.H. 1987. Calculation of electrostatic potentials in an enzyme active site. Nature 330: 8486.[CrossRef][Medline]
Hart, T.N. and Read, R.J. 1992. A multiple-start Monte Carlo docking method. Proteins 13: 206222.[CrossRef][Medline]
Huang, W., Zhang, Z., and Palzkill, T. 2000. Design of potent beta-lactamase inhibitors by phage display of beta-lactamase inhibitory protein. J. Biol. Chem. 275: 1496414968.
Jackson, R.M., Gabb, H.A., and Sternberg, M.J. 1998. Rapid refinement of protein interfaces incorporating solvation: Application to the docking problem. J. Mol. Biol. 276: 265285.[CrossRef][Medline]
Jiang, F. and Kim, S.H. 1991. "Soft docking": matching of molecular surface cubes. J. Mol. Biol. 219: 79102.[CrossRef][Medline]
Kimura, S.R., Brower, R.C., Vajda, S., and Camacho, C.J. 2001. Dynamical view of the positions of key side chains in proteinprotein recognition. Biophys. J. 80: 635642.
Krowarsch, D., Dadlez, M., Buczek, O., Krokoszynska, I., Smalas, A.O., and Otlewski, J. 1999. Interscaffolding additivity: Binding of P1 variants of bovine pancreatic trypsin inhibitor to four serine proteases. J. Mol. Biol. 289: 175186.[CrossRef][Medline]
Kuntz, I.D., Blaney, J.M., Oatley, S.J., Langridge, R., and Ferrin, T.E. 1982. A geometric approach to macromoleculeligand interactions. J. Mol. Biol. 161: 269288.[CrossRef][Medline]
Lamb, M.L., Burdick, K.W., Toba, S., Young, M.M., Skillman, A.G., Zou, X., Arnold, J.R., and Kuntz, I.D. 2001. Design, docking, and evaluation of multiple libraries against multiple targets. Proteins 42: 296318.[CrossRef][Medline]
Lee, B. and Richards, F.M. 1971. The interpretation of protein structures: Estimation of static accessibility. J. Mol. Biol. 55: 379400.[CrossRef][Medline]
Lorber, D.M. and Shoichet, B.K. 1998. Flexible ligand docking using conformational ensembles. Protein Sci. 7: 938950.[Abstract]
Lu, S.M., Lu, W., Qasim, M.A., Anderson, S., Apostol, I., Ardelt, W., Bigler, T., Chiang, Y.W., Cook, J., James, M.N., Kato, I., Kelly, C., Kohr, W., Komiyama, T., Lin, T.-Y., Ogawa, M., Otlewski, J., Park, S.-J., Qasim, S., Ranjbar, M., Tashiro, M., Warne, N., Whatley, H., Wieczovek, M., Wilusz, T., Wynn, R., Zhang, W., and Laskowski, Jr., M. 2001. Predicting the reactivity of proteins from their sequence alone: Kazal family of protein inhibitors of serine proteinases. Proc. Natl. Acad. Sci. 98: 14101415.
Lu, W., Apostol, I., Qasim, M.A., Warne, N., Wynn, R., Zhang, W.L., Anderson, S., Chiang, Y.W., Ogin, E., Rothberg, I., Ryan, K., and Laskowski, Jr., M. 1997. Binding of amino acid side-chains to S1 cavities of serine proteinases. J. Mol. Biol. 266: 441461.