|
|
||||||||
1 Department of Computer Science, Dartmouth College, Hanover, New Hampshire 03755, USA
2 Department of Computer Science, Duke University, Durham, North Carolina 27708, USA
3 Department of Biochemistry, School of Medicine, Duke University Medical Center, Durham, North Carolina 27708, USA
(RECEIVED July 5, 2006; FINAL REVISION September 25, 2006; ACCEPTED October 5, 2006)
| Abstract |
|---|
|
|
|---|
Keywords: nuclear Overhauser effect (NOE) assignment; nuclear magnetic resonance (NMR) spectroscopy; protein complex structure determination; homo-oligomer; symmetry; complete search; configuration space; ambiguity; hierarchical subdivision
| Introduction |
|---|
|
|
|---|
There can be several sources of ambiguity in assigning an intersubunit NOE to a pair of atoms: intra- versus intersubunit ambiguity, whether the restrained atoms are within the same subunit in the complex or in different ones; subunit ambiguity, the subunits to which the restrained atoms of an intersubunit NOE belong; atom ambiguity, the identities of the restrained atoms among those with chemical shifts similar to the NOE peak; and finally ambiguity caused by spurious NOEs (NOEs for which all assignments are wrong). Intra- versus intersubunit ambiguity can be resolved experimentally, by a combined analysis of 3D 15N-edited, double 13C-filtered NOESY of a mixed sample and standard 3D 13C-edited NOESY of a homogeneous 15N, 13C-labeled sample (Walters et al. 1997; Zwahlen et al. 1997). Atom ambiguity can (in principle) be resolved experimentally by using 4D or higher dimensional NOESY spectra (Kay et al. 1990; Fiorito et al. 2006). However, in general, subunit ambiguity, and atom ambiguity for 3D NOESY spectra, require computational solutions. This paper presents the first complete algorithm for resolving subunit and atom ambiguity to assign intersubunit NOEs and determine the structure of a Cn symmetric homo-oligomer, given the subunit structure. By complete, we mean that our algorithm identifies structures representing to, within a user-defined similarity level, any structure consistent with available data (ambiguous or not). Here, by subunit structure we mean the "bound" structure of a monomer within the complex. The disambiguation of intra- versus intersubunit ambiguity makes it possible to determine the bound subunit structure prior to computing the oligomeric assembly (Oxenoid and Chou 2005).
Subunit ambiguity is a fundamental problem in dealing with homo-oligomers. While in some special cases it might be possible to manually resolve the ambiguity by identifying a self-consistent direction (e.g., in coiled coils; Wiltscheck et al. 1997), in general an NOE could involve a subunit i and any one of subunits i + 1, i + 2, ... We call a determination of the relative positions of the subunits the "subunit assignment" of the NOE. For an n-mer with r NOEs, there are (n 1) r possible combinations of subunit assignments, ruling out a naïve generate-and-test approach. Most previous attempts at resolving subunit ambiguity have done so simultaneously with resolving atom ambiguity, and have employed heuristic techniques (see below). As we discuss below, these approaches risk being trapped in local minima and missing correct assignments. A docking-based approach that sampled conformations could miss the native conformation and would require explicit enumeration of all combinations of assignments. One previous approach that rigorously deals with subunit ambiguity is AMBIPACK (Wang et al. 1998), which uses a branch-and-bound algorithm to compute rigid body transformations satisfying potentially ambiguous intersubunit distance restraints. AMBIPACK explicitly enumerates combinations of subunit assignments, but only for a limited number of the least ambiguous NOEs. However, unlike our approach, AMBIPACK does not take advantage of the "closed-ring" kinematics of symmetric homo-oligomers. Consequently, AMBIPACK uses randomized numerical techniques, and hence fails to be complete in identifying consistent structures, whereas our approach is complete in characterizing the entire space of consistent structures. Furthermore, AMBIPACK handles only a special case of subunit ambiguity, namely (counter)clockwise subunit ambiguity. By (counter)clockwise subunit ambiguity we mean that an NOE could either go clockwise or counterclockwise in a ring of monomers, resulting in two possible assignments. In the general case, arbitrary subunit ambiguity means that an NOE could go to any of the other n 1 monomers, resulting in n 1 possible assignments. Also, the presented AMBIPACK algorithm and results do not consider atom-ambiguous or spurious NOEs. On the other hand, our approach is able to resolve both arbitrary subunit ambiguity and atom ambiguity. We also show that our approach is able to handle spurious NOEs.
Chemical shift degeneracy in NMR spectra results in ambiguity as to which atoms are interacting in an NOE. We call a determination of the atoms the "atom assignment" of the NOE. A number of techniques, such as NOAH/DIAMOND (Xu et al. 1999), AUTOSTRUCTURE (Huang et al. 2003), and PASD (Juszewski et al. 2004), have been developed to address atom ambiguity in the context of monomer structure determination. These techniques follow an iterative assignment strategy: an initial set of unambiguous NOEs is used to generate some number of structures, which are then used to evaluate the consistency of atom assignments of the remaining ambiguous NOEs. The NOEs that are inconsistent with the ensemble are pruned. The remaining NOEs are incorporated in subsequent rounds of structure ensemble determination, and the process is repeated until no further improvements in atom assignments or structures can be obtained. An alternative approach to assign NOE restraints in monomer structure determination using a rotamer ensemble-based algorithm and residual dipolar couplings was developed by Wang and Donald (2005).
Previous approaches for dealing with both subunit and atom ambiguity in symmetric homo-oligomers perform a tight loop of NOE assignment and structure determination as described above for monomers. (Henceforth, when we refer to NOE assignment we mean intersubunit NOE assignment.) Generation of structures consistent with a subset of NOEs is done by heuristic techniques such as simulated annealing, with additional constraints to ensure symmetry, e.g., by penalizing differences between the subunits (Revington et al. 2005) or by fixing multiple copies of the backbone and allowing only the side chains to move to fit the restraints (Oxenoid and Chou 2005). DYANA (Güntert et al. 1997) and CYANA (Herrmann et al. 2002; Güntert 2004) support this approach for homo-dimer structure determination, while the symmetry-ADR method in ARIA (Nilges 1993; O'Donoghue et al. 1993, 2000; Nilges et al. 1997) implemented in CNS/XPLOR (Brünger 1993; Brünger et al. 1998) is applicable to higher order oligomers.
The iterative techniques discussed above all follow a best-first strategy without backtracking, and thus must contend with the problem of multiple local minima. The large rearrangements potentially required to escape local minima are difficult to identify by local search techniques. The order in which assignments are chosen may affect the outcome, because the determined ensemble attempts to satisfy the chosen assignments, and the ensemble is then used to evaluate subsequent assignments. For example, CYANA uses the heuristic of choosing the most consistent NOEs at each iteration, but of course, this does not guarantee correctness. To ensure settling into the proper minimum, these techniques typically require a significant amount of over-restraint (e.g., ARIA requires >5 NOEs per residue). However, with higher order oligomers, available intersubunit data can be quite sparse, due to the size of the proteins and the decrease in sensitivity of peaks, along with the decrease in peak intensity due to double filtering and editing strategies. Finally, as the number of ambiguous NOEs increases, the ruggedness of the potential landscape increases, making it much more difficult and time-consuming to find valid solutions. Consequently, the level of atom ambiguity that current techniques can handle is also limited (e.g., CYANA requires a 1H chemical shift match tolerance
0.03 ppm). Furthermore, the presence of subunit ambiguity in higher oligomers makes all NOEs inherently ambiguous and exacerbates the landscape ruggedness.
The ambiguity resolution algorithm developed here is an extension of our complete, data-driven search algorithm for Cn symmetric homo-oligomeric structure determination (Potluri et al. 2006). The key insight of that paper is that, given the structure of a subunit, the position and orientation of the symmetry axis determines the structure of a homo-oligomer with Cn symmetry: rotate the subunit n times by the angle of symmetry (360°/n) around the symmetry axis. Each possible conformation of the symmetric homo-oligomer is represented by a point in the four-dimensional symmetry configuration space (SCS), S 2 x R 2 (orientation cross-position). The earlier paper then presents a branch-and-bound structure determination algorithm that identifies a set of 4D cells (hypercuboid regions in SCS), referred to as "consistent regions," ultimately identifying all conformations consistent with the data. However, our previous algorithm fails to handle atom ambiguity and arbitrary subunit ambiguity (subunit ambiguity involving nonadjacent subunits). As a result, it cannot use the additional constraint provided by all the ambiguous NOEs. The present paper significantly extends our earlier work with an algorithm that resolves atom and arbitrary subunit ambiguities. By exploiting the information in ambiguous NOEs, our new algorithm reduces the average variance in structures (over our previous approach) by a factor of 1.5 in our first test case (MinE) and a factor of 3.6 in our second test case (CCMP). These improvements make our approach applicable to the structure determination of a broad range of homo-oligomeric complexes.
In contrast with existing techniques, our approach is complete. (Recall that by complete we mean that it identifies structures representing, to within a user-defined similarity level,
0 Å, any structure consistent with the available data). We guarantee that the native conformations are always within
0 Å to at least one of the representative structures. Furthermore, the results of our algorithm are deterministic and not affected by the order in which NOEs are considered. In addition, while most approaches try to find one assignment for each ambiguous NOE restraint, our algorithm recognizes when additional ambiguity resolution will not improve structural precision and identifies the diminishing information content in NOEs. It might provably be the case that some NOEs do not need to be assigned. On the other hand, our approach can also recognize when different assignments lead to different structural classes, a possibility ignored by all other approaches. We also show that our approach can handle a reasonable number of spurious NOEs. In short, our ambiguity resolution algorithm extracts the information available from ambiguous NOEs, without bias in the possible assignments. Yet, it is still efficient, requiring time only linear in the number of ambiguous NOEs, the number of possible assignments for each, and the set of 4D cells in the SCS. Note that our approach focuses on intersubunit NOE assignment and currently cannot be applied to intrasubunit NOE assignment for monomer structure determination.
| Results |
|---|
|
|
|---|
Problem formulation and approach
The problem that we must solve is the following. We are given a set of intersubunit NOEs (each possibly atom- and subunit-ambiguous), the subunit structure and the oligomeric number. Our goal is to determine a mutually consistent set of NOE assignments and SCS regions, such that the regions represent all conformations consistent with the assignments, and the assignments are all those consistent with the regions. Recall from the introduction that the SCS (symmetry configuration space) is the space of all symmetry axis parameters, each defining a homo-oligomeric complex structure. Thus, we seek to simultaneously assign NOEs and determine structure (by identifying regions in SCS), in a manner that guarantees that we find all consistent assignments and structures. In order to make such a guarantee, we must carefully formulate what it means for an NOE, or one of its possible assignments, to be inconsistent with respect to a given set of NOEs and the structures that they define. In the process, we are also able to formulate what it means to be redundant. In addition to helping us prove (see Materials and Methods) that our approach is correct, these formulations allow us to characterize (in the rest of the Results) the information content provided by a set of NOEs regarding a homo-oligomeric structure.
Let R be a set of (possibly ambiguous) NOEs. Let rkl represent the l th possible assignment of k th NOE of R, and let skl
S 2 x R 2 be the region of the SCS in which rkl is satisfied. (Recall that an assignment specifies the subunit and the atom involved in the NOE.) A determined structure must satisfy all ambiguous NOEs; it satisfies each by satisfying one or more of its possible assignments. In the SCS, this translates into finding the region p(R) defined as:
|
|
where tk is the number of possible assignments for the k th NOE.
Using Equation 1, we can give the definition for redundant and inconsistent NOEs and assignments. Let Q
R be a set of NOEs. Let rk be an (arbitrary) ambiguous NOE (not in Q) with tk assignments. As above, let rkl be its l th possible assignment and skl be the region of the SCS in which the distance restraint rkl is satisfied. We say rkl is redundant with respect to Q if and only if p(Q)
skl . We say rkl is inconsistent with respect to Q if and only if p(Q)
skl = Ø. We can extend these notions from assignments to NOEs as follows. Let Sk
l1tk skl be the region of the SCS which satisfies at least one assignment of rk (the k th NOE). We then declare rk to be redundant with respect to Q if and only if p(Q)
Sk , and declare rk to be inconsistent with respect to Q if and only if p(Q)
Sk = Ø. Our algorithm eliminates inconsistent NOEs and assignments, and is able to detect redundant ones.
Our approach to identifying the mutually consistent, complete set of NOE assignments and SCS regions is summarized in Figure 1 and described in detail in the Materials and Methods. The following is a brief description of our approach. Our algorithm has two phases. In the first phase, branch-and-bound, we perform a search of the SCS using a subset of the NOEs (we show below that those with no atom ambiguity, if available, are the best choice). This phase identifies the consistent regions (a subset of the SCS), which contain symmetry axes defining all complexes consistent with the chosen subset of the NOEs. In the second phase, ambiguity resolution, we sequentially consider ambiguous NOEs and their possible assignments, using an ordering criterion that chooses informative NOEs. This phase identifies the ambiguity-consistent regions (ACR), which contain symmetry axes defining all complexes consistent with some assignment for each NOE. As we formally defined above, this process allows us to label assignments as inconsistent (violated everywhere in the ambiguity-consistent regions) or redundant (does not provide any additional constraint for the regions).
|
0 Å RMSD (the user-defined similarity level) of at least one representative structure. We refer to those representative structures that have good van der Waals packing as "well-packed satisfying" (WPS) structures. The average variance in atomic positions in these conformations allows us to evaluate the structural precision attained by the two phases of the algorithm.
Ambiguity resolution and structure determination of homo-dimeric MinE
The homo-dimeric topological specificity domain of Escherichia coli MinE (King et al. 2000) has a novel dimeric 
-sandwich fold that has previously been determined (PDB ID 1EV0) by using both DYANA (Güntert et al. 1997) and the NCS-symmetry potential of XPLOR (with the ARIA method) (Brünger et al. 1998). Each subunit has 50 residues, and a total of 183 intersubunit NOE restraints were deposited. We use as the subunit the first chain of the reference structure (stated by the authors to be the best representative conformer). Subunit ambiguity is absent in homo-dimers. Atom ambiguity was simulated for each of the intersubunit NOE restraints, according to the deposited chemical shifts, using a chemical shift match tolerance for 1H of
= 0.04 ppm. This resulted in atom ambiguity for 168 of the 183 NOEs, as illustrated in Figure 2A. Note that there is ambiguity as high as 12 (i.e., 12 possible assignments) for two of the NOEs.
|
90% of the reduction in volume occurred after three NOEs were chosen, an indication that our ordering criterion does focus on more informative NOEs. Our ambiguity resolution algorithm eliminates the inconsistent assignments and Figure 2B shows the number of possible assignments remaining for each NOE after all the inconsistent assignments have been eliminated. On comparison with Figure 2A, we see that the ambiguity is considerably reduced, from an average of 5.0 assignments per NOE to an average of 1.9.
|
|
|
= 0.04 ppm). Figure 6A illustrates the number of assignments that arise from subunit and atom ambiguity for each of the 49 NOEs. Two of the NOEs have as many as 36 possible assignments, when considering the combination of subunit and atom ambiguity.
|
|
|
|
Figure 10 shows that this is indeed the case for MinE, measuring the amount of work as the number of consistency tests performed. The figure also shows that the extra work in the branch-and-bound search with increased maximum ambiguity level is not worth itthe number of tests summed across both phases increases with the maximum ambiguity level. When all NOEs are used as input to the branch-and-bound search, the total number of consistency tests required is 7.8 times more than the number required when only atom-unambiguous NOEs are used. It is more efficient to start with only the atom-unambiguous NOEs.
|
Effect of spurious NOEs
So far, we have assumed that each NOE peak is true; that is, that every NOE has at least one correct assignment. However, our approach is readily extended to handle a small number of spurious NOEs, NOEs for which all assignments are wrong. In this case, we eliminate a cell when at least a specified number (which is one in the presented algorithm) of the NOEs are violated. An NOE is identified as spurious if it is inconsistent with all the remaining cells of the ACR. Note that we may not be able to identify those spurious NOEs that are consistent with the remaining cells of the ACR.
We tested this approach with MinE, simulating spurious NOEs from the reference structure by randomly choosing pairs of protons that were >10 Å apart. Atom ambiguity was simulated for each of the spurious NOE as before (with 1H chemical shift match tolerance
= 0.04 ppm). We performed 100 random runs for each number of spurious NOEs, from one to five.
Figure 11A illustrates the detection rate for the spurious NOEs. In the case when one spurious NOE is added, the detection rate is 93%. This means that in 93% of the cases, the spurious NOE has no effect on the remaining cells in the ACR or the remaining assignments. The sets of mutually consistent cells and assignments are the same as that with no spurious NOEs. The failure of detection in the remaining seven cases is due to the presence of an atom assignment for the spurious NOE that is consistent with the remaining conformations. The detection rate decreases to 62% as the number of spurious NOEs is increased from one to five.
|
| Conclusion |
|---|
|
|
|---|
90%) early, and hence, results in greatly reduced time complexity. In the two cases studied here, the branch-and-bound plus ambiguity resolution required only hours to days on a single processor in order to guarantee complete consideration of all possible solutions. Complete computation of all solutions avoids bias in the search, as well as any potential for becoming trapped in local minima, both of which are problems inherent in existing heuristic approaches. The run time can potentially be further reduced by distributing the computation over a commodity cluster with a straightforward extension of our implementation. Our sequential strategy also helps elucidate the information content in ambiguous NOEs; many NOEs may be redundant with other (previously handled) NOEs. We identify that 138 of the 183 NOEs in MinE and 12 of the 49 NOEs in CCMP are redundant with respect to the chosen NOEs and cause no further improvement in structural precision. We have also shown that our approach can work with NOEs at different maximum ambiguity levels provided as input to the branch-and-bound search, and can handle a reasonable number of spurious NOEs. NOE assignment can be difficult to fully automate. At the same time, the structure determination of symmetric proteins by NMR can be challenging. It is interesting that by combining these two difficult problems, an algorithm can be obtained that solves both simultaneously, and that enjoys guarantees on its completeness and complexity. Our approach could be extended to other kinds of symmetry, such as a dimer of dimers, a trimer of dimers, or improper symmetry, by defining appropriate configuration spaces and searching them in an analogous manner. For instance, in the case of dimer of dimers (D 2 symmetry), the space of symmetry axes would be represented by seven dimensions (four dimensions for one symmetry axis and three dimensions for the symmetry axis perpendicular to the first), and bounds analogous to the bounds we have previously obtained for the four-dimensional case (Potluri et al. 2006) would have to be developed. The developed bounds would then be used in the hierarchical subdivision of the configuration space and ambiguity resolution of the NOEs. We believe our algorithm could contribute to the structural characterization of symmetric membrane proteins and other large homo-oligomers by NMR.
Our software can be freely obtained for academic use by request from the corresponding authors.
| Materials and methods |
|---|
|
|
|---|
Subunit assignment
An NOE could capture an interaction between any pair of protons in any pair of subunits. Under Cn symmetry, each interaction is "mirrored," i.e., holds between each pair of subunits that are indistinguishable under Cn rotation. Thus, a subunit assignment for an NOE specifies the difference (1 to n 1) in the positions of the involved subunits within the symmetric order.
Atom assignment
In typical approaches to NMR structure determination, the chemical shift assignments of the various atoms are obtained in an earlier analysis, and are then used to determine which atoms interactions might have generated the NOE peaks. The possible identities of the interacting protons are determined by comparing the chemical shift coordinates of an NOE peak with the assigned chemical shifts (the same for each subunit, by Cn symmetry). In the 3D spectra used here, there are two coordinates (15N/13C and 1H) by which to determine one side of the interaction and one coordinate (1H) for the other. Since two coordinates generally result in a unique assignment, we here consider only one-sided atom ambiguity, although our approach can readily be generalized. Thus, an atom assignment for an NOE specifies the identity of the ambiguous atom, from the entire set A of atoms (here we consider only protons) in a monomer. Typically, a user-specified match tolerance
(e.g., 0.04 ppm) limits the possible protons to those with similar chemical shift to the peak coordinate.
In summary, the assignment for an NOE specifies a pair (i,j)
Zn 1 x A, where i assigns the relative positions of the subunits in the complex and j assigns the proton in the subunit. Here, Zn 1 represents S n, the symmetric group of n elements. Each element of Sn is represented by an integer in {1,..., n 1}.
Ambiguity resolution algorithm
Although we would like to compute exactly the region p(R) (Equation 1) defining all consistent structures, it is a semi-algebraic set characterized by high-degree polynomials that are expensive to solve exactly. Our ambiguity resolution algorithm computes a superset of p(R) by using the conservative analytical bounds developed in Potluri et al. (2006) to test satisfaction of possible NOE assignments. We say that our method is complete because every element of p(R) is provably guaranteed to be in the superset returned by our algorithm. More specifically, our method takes as input an initial region of the SCS (represented as a set of nonoverlapping cells, each of which is a hypercuboid region in S 2 x R 2, in a hierarchical decomposition), and outputs the intersection between that region and a superset of p(R). In this way, it is possible to sequentially compute a superset of p(R) by sequentially "intersecting" more and more (possibly ambiguous) NOEs.
We explain our algorithm by first describing the relationship between our previous work and the present algorithm (refer again to Fig. 1). Our previously developed branch-and-bound structure determination algorithm (Potluri et al. 2006) takes as input available intersubunit NOEs, the oligomeric number n, and the subunit structure (previously determined from the intrasubunit restraints, as is possible under the labeling strategies discussed in the introduction) and identifies a set of 4D cells, which we call the "consistent regions." The consistent regions contain all conformations consistent with the data. The algorithm assumes that the backbone displays exact Cn symmetry, but we allow for asymmetry and flexibility in the side chains when we subject the structures to energy-minimization (discussed below). The algorithm employs a conservative bound that eliminates a cell only if all conformations it represents are guaranteed to violate an NOE. The search algorithm recursively subdivides the cells, terminating when all the structures within a cell are within a user-defined similarity level,
0 Å of each other.
In this paper, we use that algorithm to determine the consistent regions for a subset (typically atom-unambiguous) NOEs. (We note that subunit ambiguity always exists for oligomers with more than two subunits.) We then apply our novel ambiguity resolution algorithm (Fig. 12) that considers each of the remaining ambiguous NOEs sequentially, that is, one after the other. At each step, we use our ordering criterion (next section) to choose the NOE considered most informative. Each SCS cell of the consistent regions is checked, using our conservative bound, to ensure that some assignment of the NOE is consistent with the structures represented by the cell. A cell is eliminated if every assignment of any of the NOEs is violated by all the conformations represented by that cell. We then continue with the next NOE chosen using our ordering criterion and the remaining cells. This process of choosing, updating, and resolving is continued until all the NOEs have been considered. The remaining cells form the ambiguity-consistent regions (ACR). Finally, we resolve the ambiguity in the NOEs by eliminating assignments inconsistent with the ACR. Our ambiguity resolution algorithm hence identifies every mutually consistent set of assignments and ambiguity-consistent regions.
|
0 Å to at least one representative structure. We evaluate the representatives for quality of NOE restraint satisfaction, and then energy-minimize them and evaluate them for quality of van der Waals packing, ultimately identifying the set of well-packed satisfying (WPS) structures.
NOE ordering criterion
We use a particular ordering criterion to choose the next NOE in our sequential strategy. Ordering of the NOEs only affects the efficiency of our approach, and not the result.
Claim 1
Reordering the NOEs in R does not affect the completeness of our sequential algorithm.
Proof sketch
This is a direct consequence of the fact that set intersection is commutative, and the fact that our method returns a superset of p(R) (Equation 1).
However, the order does impact the efficiency, and we want to identify the NOE that causes the largest decrease in the SCS, in order that fewer consistency tests are performed. Identifying such an NOE will decrease the number of satisfaction checks required for the remaining assignments, and hence decreases the overall time complexity. To find the next NOE to assign, we first generate a set of representative structures, one from each cell. For each assignment of an NOE, we compute its scores as the number of representative structures that violate the NOE under the assignment. This is indicative of the portion of the SCS that would be eliminated under the assignment. Each NOE's score is then the average of the scores of all of its possible assignments (the line marked (*) in Fig. 12). We choose the next NOE to assign as the one with the maximum score. Although our ordering criterion does not guarantee optimal efficiency, our results show that it does very well in practice, and allows for the identification of redundant NOEs. We also tried evaluating an NOE by the minimum of its assignment scores, but found this approach to be inferior (data not shown).
Computational complexity
When there are r NOEs with a assignments each, we have ar possible combinations of assignments. An advantage of our algorithm is that it avoids the potentially exponential enumeration of all ar possible combinations of NOE assignments.
We first run the branch-and-bound search with a subset of the available NOEs (typically atom-unambiguous NOEs) as input. The output from the search is then input to our ambiguity resolution algorithm to ultimately return a mutually consistent set of ACR and NOE assignments.
Claim 2
Given c cells as consistent regions from the branch-and-bound search with a subset of the NOEs, the time complexity of our algorithm is O(rac), where r is the number of ambiguous NOEs, and a is the number of assignments for each NOE.
Proof sketch
Let t be the time complexity for checking satisfaction of an NOE in one cell. Assigning a particular NOE requires a satisfaction test in each of the cells for each of the assignments (i.e., whether some conformation in the cell is consistent with the assigned NOE) and thus a total time of O(atc) in the worst case where no assignments and no cells have been eliminated. The time complexity for a satisfaction test is constant, t = O(1) (see Appendix for proof). Therefore, assigning one NOE has a time complexity of O(ac). Our algorithm considers a total of r NOEs, and in the worst case each is independent of the others (i.e., no pruning occurs), for a total time of O(rac).
The number of cells, c, depends on the complexity of the well-packed satisfying regions in the SCS. These regions are bounded by algebraic hypersurfaces. In principle, a worst-case bound could be obtained based on the combinatorial complexity of this four-dimensional arrangement (Edelsbrunner 1987). In practice, we have characterized c empirically for the test cases in this paper (c = 70 for the homo-dimeric MinE and c = 4006 for the homo-trimeric CCMP with atom-unambiguous NOEs as input) and report that our algorithm is efficient in practice. In general, for ambiguous NOE assignment problems, the bottleneck in combinatorial complexity lies in the potentially exponential number ar of possible assignments, and for this reason our methodology has focused on developing an algorithm that can provably make correct and consistent assignments while only considering a linear number of possible assignments, ar.
| Appendix |
|---|
|
|
|---|
Proof
Consider checking satisfaction of a restraint of the form ||p q'||
d (p and q' are atoms on different subunits and d is the distance of the restraint) in a cell of the 4D SCS, G = Gs x Gr
S 2 x R 2. This requires determining if there is a nonempty intersection between a ball (centered at p and of radius d) and G q, the region of possible positions of q' when the symmetry axis is in G. Since G q is hard to compute exactly, we approximate it by computing a bound on Gs q at the four corners of Gr and then taking the convex hull of the four bounds (Section 2.1.1 of Potluri et al. 2006). The resulting convex hull is a superset of G q, which we use in our conservative tests for restraint satisfaction. The bound on Gs q is obtained as an axis-aligned box (AABB). The time complexity, t, for checking satisfaction of a restraint in a cell, would then be the sum of (a) the time complexity, tb , for computing the bound on Gs q at each of the four corners of Gr , (b) the time complexity, tc , for computing the convex hull of the bounds at the corners, and (c) the time complexity, ti , for computing the intersection between the convex hull and a ball.
2 * 32 1, obtaining ti = O(1). Hence, the time complexity for checking satisfaction of a possible NOE assignment in a cell of the 4D SCS, t = tb + tc + ti = O(1).
| Footnotes |
|---|
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.062427307.
Abbreviations: NOE, nuclear Overhauser effect; NMR, nuclear magnetic resonance; RMSD, root-mean-square deviation; SCS, symmetry configuration space; ACR, ambiguity-consistent regions; WPS, well-packed satisfying; S2, space of symmetry axis orientations represented on a 2-sphere.
| Acknowledgments |
|---|
|
|
|---|
| References |
|---|
|
|
|---|
Brünger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.S., Kuszewski, J., Nilges, M., and Pannu, N.S., et al. 1998. Crystallography and NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr. D Biol. Crystallogr. 54: 905921.[CrossRef][Medline]
de Berg, M., van Krevald, M., Overmars, M., and Schwarzkopf, O. 2000. Computational geometry. Springer-Verlag, Berlin.
Edelsbrunner, H. 1987. Algorithms in combinatorial geometry. Springer-Verlag, Berlin.
Fiorito, F., Hiller, S., Wider, G., and Wüthrich, K. 2006. Automated resonance assignment of proteins: 6D APSY-NMR. J. Biomol. NMR 35: 2737.[CrossRef][Medline]
Güntert, P. 2004. Automated NMR protein structure calculation with CYANA. Methods Mol. Biol. 278: 353378.[Medline]
Güntert, P., Mumenthaler, C., and Wüthrich, K. 1997. Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol. 273: 283298.[CrossRef][Medline]
Herrmann, T., Güntert, P., and Wüthrich, K. 2002. Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol. 319: 209227.[CrossRef][Medline]
Huang, Y.J., Swapna, G.V., Rajan, P.K., Ke, H., Xia, B., Shukla, K., Inouye, M., and Montelione, G.T. 2003. Solution NMR structure of ribosome binding factor a (RbfA), a coldshock adaptation protein from Escherichia coli . J. Mol. Biol. 327: 521536.[CrossRef][Medline]
Juszewski, K., Schwieters, C.D.S., Garrett, D.S., Byrd, R.A., Tjandra, N., and Clore, G.M. 2004. Completely automated, highly error-tolerant macromolecular structure determination from multi-dimensional nuclear Overhauser enhancement spectra and chemical shift assignments. J. Am. Chem. Soc. 126: 62586273.[CrossRef][Medline]
Kay, L.E., Clore, G.M., Bax, A., and Gronenborn, A.M. 1990. Four-dimensional heteronuclear triple-resonance nmr spectroscopy of interleukin-1
in solution. Science 249: 364365.
King, G.F., Shih, Y.L., Maciejewski, M.W., Bains, N.P., Pan, B., Rowland, S.L., Mullen, G.P., and Rothfield, L.I. 2000. Structural basis for the topological specificity function of MinE. Nat. Struct. Biol. 7: 10131017.[CrossRef][Medline]
Nilges, M. 1993. A calculation strategy for the structure determination of symmetric dimers by 1H NMR. Proteins 17: 297309.[CrossRef][Medline]
Nilges, M., Macais, M., Odonoghue, S., and Oschkinat, H. 1997. Automated NOESY interpretation with ambiguous distance restraints: The refined NMR solution structure of the pleckstrin homology domain from
-spectrin. J. Mol. Biol. 269: 408422.[CrossRef][Medline]
O'Donoghue, S.I., Junius, F.K., and King, G.F. 1993. Determination of the structure of symmetric coiled-coil proteins from NMR data: Application of the leucine zipper proteins Jun and GCN4. Protein Eng. 6: 557564.
O'Donoghue, S.I., Chang, X., Abseher, R., Nilges, M., and Led, J.J. 2000. Unraveling the symmetry ambiguity in a hexamer: Calculation of the R6 human insulin structure. J. Biomol. NMR 16: 93108.[CrossRef][Medline]
Oxenoid, K. and Chou, J.J. 2005. The structure of phospholamban pentamer reveals a channel-like architecture in membranes. Proc. Natl. Acad. Sci. 102: 1087010875.
Potluri, S., Yan, A.K., Chou, J.J., Donald, B.R., and Bailey-Kellogg, C. 2006. Structure determination of symmetric protein complexes by a complete search of symmetry configuration space using NMR distance restraints and van der Waals packing. Proteins 65: 203219.[CrossRef][Medline]
Revington, M., Semesi, A., Yee, A., and Shaw, G.S. 2005. Solution structure of the Escherichia coli protein ydhR: A putative mono-oxygenase. Protein Sci. 14: 31153120.
Seavey, B.R., Farr, E.A., Westler, W.M., and Markley, J. 1991. A relational database for sequence-specific protein NMR data. J. Biomol. NMR 1: 217236.[CrossRef][Medline]
Walters, K.J., Matsuo, H., and Wagner, G. 1997. A simple method to distinguish intermonomer nuclear Overhauser effects in homodimeric proteins with C 2 symmetry. J. Am. Chem. Soc. 119: 59585959.[CrossRef]
Wang, L. and Donald, B.R. 2005. An efficient and accurate algorithm for assigning nuclear Overhauser effect restraints using a rotamer library ensemble and residual dipolar couplings. In Proc IEEE Comput. Syst Bioinform Conf. (CSB2005), pp. 189202.
Wang, C.E., Pérez, T.L., and Tidor, B. 1998. AMBIPACK: A systematic algorithm for packing of macromolecular structures with ambiguous distance constraints. Proteins 32: 2642.[CrossRef][Medline]
Wiltscheck, R., Kammerer, R.A., Dames, S.A., Schulthess, T., Blommers, M.J., Engel, J., and Alexandrescu, A.T. 1997. Heteronuclear NMR assignments and secondary structure of the coiled coil trimerization domain from cartilage matrix protein in oxidized and reduced forms. Protein Sci. 6: 17341745.[Abstract]
Xu, Y., Wu, J., Gorenstein, D., and Braun, W. 1999. Automated 3D assignment and structure calculation of crambin (S22/I25) with the self-correcting distance geometry based NOAH/DIAMOD programs. J. Magn. Reson. 136: 7685.[CrossRef][Medline]
Zwahlen, C., Legault, P., Vincent, S.J.F., Greenblatt, J., Konrat, R., and Kay, L.E. 1997. Methods for measurement of intermolecular NOEs by multinuclear NMR spectroscopy: Application to a bacteriophage
N-Peptide/boxB RNA complex. J. Am. Chem. Soc. 119: 67116721.[CrossRef]