Department of Pharmacology, University of Medicine and Dentistry of New Jersey-Robert Wood Johnson Medical School, Piscataway, New Jersey 08854, USA
Reprint requests to: William J. Welsh, Department of Pharmacology, University of Medicine & Dentistry of New Jersey-Robert Wood Johnson Medical School, Piscataway, NJ 08854, USA; e-mail: welshwj{at}umdnj.edu; fax: (732) 235-3475.
(RECEIVED April 5, 2004; FINAL REVISION May 10, 2004; ACCEPTED May 18, 2004)
| Abstract |
|---|
β-rich structure known as amyloid fibril. Here we introduce a computational algorithm to detect nonnative (hidden) sequence propensity for amyloid fibril formation. Analyzing sequence-structure relationships in terms of tertiary contact (TC), we find that the hidden β-strand propensity of a query local sequence can be quantitatively estimated from the secondary structure preferences of template sequences of known secondary structure found in regions of high TC. The present method correctly pinpoints the minimal peptide fragment shown experimentally as the likely local mediator of amyloid fibril formation in β-amyloid peptide, islet amyloid polypeptide (hIAPP), α-synuclein, and human acetylcholinesterase (AChE). It also found previously unrecognized β-strand propensities in the prototypical helical protein myoglobin that has been reported as amyloidogenic. Analysis of 2358 nonhomologous protein domains provides compelling evidence that most proteins contain sequences with significant hidden β-strand propensity. The present method may find utility in many medically relevant applications, such as the engineering of protein sequences and the discovery of therapeutic agents that specifically target these sequences for the prevention and treatment of amyloid diseases.
Keywords: amyloid fibril; tertiary contacts; secondary structure; hidden β-strand propensity; HβP; SCOP
Abbreviations: Aβ, β-amyloid; AChE, human acetylcholinesterase; ANN, artificial neural network; BuChE, butyrylcholinesterase; DSSP, definition of secondary structure of proteins; HβP, hidden β propensity; hIAPP, human islet amyloid polypeptide; NAC, non-Aβ component of Alzheimer's disease amyloid; PAM, percentage accepted mutations; PDB, Protein Data Bank; SCOP, structural classification of proteins; TC, tertiary contact.
1 Present address: ArQule, Inc., Woburn, MA 01801, USA.
Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.04790604.
| Introduction |
|---|
As the conformation of a local amino-acid sequence can be influenced by its tertiary environment (Minor and Kim 1996), consideration of the tertiary context of a sequence can improve our understanding of sequence-structure relationships in local regions of proteins. It is possible to quantify the influence of tertiary context by a simple approach that counts the number of atom-to-atom tertiary contacts (TCs), rather than resorting to exhaustive energy calculations (Berezovsky and Trifonov 2001). TCs are formed between nonadjacent residues when a protein undergoes three-dimensional folding, bringing together residues that can be far apart along the linear amino-acid sequence. The notion of TCs has been applied successfully by other workers for the identification of self-stabilizing folding units (Fischer and Marqusee 2000) and for comparisons of protein-folding complexity and kinetics (Plaxco et al. 1998). Counting TCs between nonbonded atoms provides a simple yet effective way to approximate tertiary interactions and solvent accessibility, thus representing a judicious compromise between speed and rigor suitable for rapid predictions on a large scale. Here we introduce a computational algorithm that detects the nonnative (hidden) β-strand propensity (HβP) of sequences by formulating relationships between protein local sequence and secondary structure in terms of TCs. This algorithm is henceforth called the HβP method.
Algorithms for predicting native protein secondary structure typically rely on homologies between the query sequence and template sequences of known three-dimensional structure. These algorithms generally require a minimum sequence context of 14 to 17 residues to determine the unique native secondary structure of a query peptide or protein (Pan et al. 1999), whereas the present HβP method uses a much shorter sequence context (seven residues), which is more likely to retrieve multiple template secondary structures for a given query sequence (Zhou et al. 2000). Essentially, the present method partitions the structural determinants for local secondary structure propensity into two independent variables: (1) local effects (flanking sequences) and (2) nonlocal effects (TCs). The sequence similarity of flanking regions is an efficient measure of local effects on secondary structure propensity. A seven-residue sequence context was chosen in part because it represents the minimum size sufficient to account for possible (i,i + 3) side-chain interactions within helical sequences. Given the uncertain homology between similar or even identical seven-residue sequences within the context of evolutionary relationships (Zhou et al. 2000), an obvious advantage of a shorter (e.g., seven-residue) rather than a longer sequence context when searching protein fold and/or structural databases is that a greater number of similar sequences in nonhomologous proteins (e.g., diverse TC states) is likely to be retrieved. By structural analysis of this larger pool of similar sequences, it becomes possible to systematically evaluate sequence-structure relationships in terms of TCs (Fig. 1). We have implemented this scheme in a computational procedure that predicts the nonnative secondary structure preferences of a query local sequence by searching the SCOP20 (Structural Classification of Proteins) database (Brenner et al. 2000) for conformational preferences of similar local sequences that vary with respect to their TC states. By capturing the influence of variations in tertiary structural environment on local conformations, the present HβP method pinpoints those regions in a protein that exhibit high (or low) propensity for undergoing conformational change. In this report, we illustrate how this method is applied to detect the HβP in protein fragments known to be associated with amyloid fibril formation. The present HβP algorithm is not intended to ascertain whether a specific protein is amyloidogenic; however, it will detect sequences within the protein that are conducive to triggering amyloid fibril formation (i.e., strong HβP).
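The template search described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the identity-count score is an assumption standing in for whatever similarity measure the method actually uses. Only the seven-residue window and the limit of 30 templates are taken from the text.

```python
from typing import List, Tuple

WINDOW = 7          # seven-residue sequence context, as stated in the text
MAX_TEMPLATES = 30  # the text refers to a maximum of 30 top-scoring templates


def windows(sequence: str, size: int = WINDOW) -> List[Tuple[int, str]]:
    """Return (center_index, fragment) for every full window in the sequence."""
    half = size // 2
    return [(i + half, sequence[i:i + size])
            for i in range(len(sequence) - size + 1)]


def identity_score(a: str, b: str) -> int:
    """Hypothetical similarity score: count of identical aligned residues."""
    return sum(x == y for x, y in zip(a, b))


def top_templates(query: str, library: List[str],
                  n: int = MAX_TEMPLATES) -> List[str]:
    """Return up to n library fragments ranked by similarity to the query."""
    ranked = sorted(library, key=lambda t: identity_score(query, t),
                    reverse=True)
    return ranked[:n]
```

Because the windows are short, many unrelated proteins can contribute templates for one query, which is the property the method relies on to sample diverse TC states.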
β-strands, leading to fibrillar aggregation (Jimenez et al. 1999). Recent studies have shown that diverse proteins not related to amyloid disease can also aggregate into fibrils under laboratory-controlled destabilizing conditions (Chiti et al. 1999; Fändrich et al. 2001). Although amyloid fibrils share a common core of highly compact cross-β structure (Balbirnie et al. 2001; Fändrich et al. 2001), the lack of consensus sequence among amyloidogenic proteins, together with the recent observation of helical content within amyloid fibrils (Mangione et al. 2001), suggests a more varied secondary structure. Encouragingly, investigators have recently been able to establish some correlation between sequence conservation and amyloid fibril formation by conducting a large-scale sequence-structure analysis (Benyamini et al. 2003a,b). Identification of those sequences in a protein that exhibit a strong propensity for nonnative β-strand formation would help scientists to decipher the apparently complex interplay of secondary structure and conformational features that may trigger amyloid fibril formation in virtually any protein. Secondary structure prediction methods, such as the popular PHD algorithm (Rost 1996), were designed specifically to predict native secondary structure based on sequence homology. Thus, although demonstrating some proficiency in detecting nonnative β-structure in at least one study (Kallberg et al. 2001), PHD predicts virtually zero β-strand propensity for myoglobin (Fig. 2A), yet this α-helical globular protein has been shown to form amyloid fibrils under denaturing conditions (Fändrich et al. 2001). Thus far, no secondary-structure prediction method exists that will detect HβP for amyloid fibril formation in common globular protein sequences. HβP indicates the tendency of some sequences that, although α-helix or random coil in the native state, can transform to nonnative β-strands under certain conditions that are nonetheless physiologically relevant. Hence, our primary objective was to develop a sensitive measure of HβP by calculating TC-based secondary structure. This knowledge is relevant to medical applications, such as the engineering of proteins devoid of such sequences and the discovery of therapeutic agents that specifically target these regions associated with fibrillar aggregation.
| Results and Discussion |
|---|
α-helix and β-strand shows opposing trends at low and high TCs. These associations between TCs and secondary structure elements (α-helix and β-strand) suggest that TC filters can improve our ability to predict relationships between sequence and structure in local regions. Previous studies (Pan et al. 1999; Zhou et al. 2000) have suggested that a seven-residue sequence context is too short for the prediction of native secondary structure. Nevertheless, we found strong correlations between query and template sequences within a seven-residue context when TC filters are used to evaluate the TC-dependent secondary structure propensity of a given query sequence (Table 2).
α-helix in low TCs was 75%, using P(α|low)query > 0.5 as the selection criterion. This criterion provides 92% coverage of all helical fragments with low TCs in SCOP20. The prediction accuracy and percentage of coverage are inversely related and depend on the criterion chosen for P(α|low)query and P(β|high)query. For example, the more stringent criterion P(α|low)query > 0.9 yields 90% prediction accuracy but lower coverage (19%). Similarly, the prediction accuracy for β-strand in high TCs was 75% using P(β|high)query > 0.5 as the selection criterion. This criterion provides 66% coverage of all β-strand fragments with high TCs in SCOP20. The prediction accuracy and coverage are generally lower for β-strand in high TCs than for α-helix in low TCs. This result is also commonly found for native secondary structure prediction methods. Because β-strands occur less frequently than do α-helices in proteins (Table 1), it is more difficult to predict β-strand propensity than α-helical propensity. In effect, the size of the template pool is smaller for β-strand than for α-helix. This limitation is not a factor when identifying local regions that exhibit high HβP, because it is unnecessary to assign secondary structures to all residues in a sequence. In summary, validation of the present HβP method was conducted as described above by using the SCOP20 fragments to predict their native secondary structures in their native TC states. This method was then applied to predict the nonnative secondary structure propensities in nonnative TC states (whether "high" or "low").
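The trade-off between prediction accuracy and coverage can be illustrated with a short sketch. It assumes, as one reading of the text, that accuracy is the fraction of selected fragments that are truly helical and that coverage is the fraction of all helical fragments that pass the criterion. The function name and the toy data are hypothetical and are not taken from SCOP20.

```python
from typing import List, Tuple


def accuracy_and_coverage(predictions: List[Tuple[float, bool]],
                          threshold: float) -> Tuple[float, float]:
    """Score a selection criterion such as P(alpha|low)query > threshold.

    predictions holds (predicted propensity, is_native_helix) per fragment.
    """
    # Native states of the fragments that pass the selection criterion.
    selected = [is_helix for p, is_helix in predictions if p > threshold]
    total_helix = sum(is_helix for _, is_helix in predictions)
    correct = sum(selected)
    accuracy = correct / len(selected) if selected else 0.0
    coverage = correct / total_helix if total_helix else 0.0
    return accuracy, coverage


# Toy data only: a stricter threshold raises accuracy and lowers coverage.
toy = [(0.95, True), (0.8, True), (0.6, False),
       (0.55, True), (0.3, True), (0.2, False)]
loose = accuracy_and_coverage(toy, 0.5)
strict = accuracy_and_coverage(toy, 0.9)
```

With the toy data, raising the threshold from 0.5 to 0.9 keeps only the most confident fragment, which mirrors the inverse relationship reported above.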
Because the data set of 453,787 amino-acid fragments was extracted from nonhomologous SCOP20 domains, the 30 top-scoring templates for a given query sequence will likely represent diverse TC states. Inasmuch as the seven-residue templates are rarely found exclusively in low or high TC bins, a given query sequence will typically yield nonzero values for both P(α|low) and P(β|high). In contrast to other computational methods that predict the native secondary structure of a protein sequence, the present HβP method was designed to predict propensities for nonnative secondary structure. Its intended purpose is to pinpoint local regions that are more susceptible to conformational change, for example, from helical to β-strand. Given the preponderance of evidence that increased β-strand formation is a common feature in triggering aggregation of proteins into amyloid fibrils (Chiti et al. 2003; Paz and Serrano 2004), the propensity for amyloid fibril formation can be predicted by examining the HβP in high TC environments (i.e., P[β|high]) for sequences in which β-strand is not the native structure.
Pinpointing amyloid fibril-forming sequences
The wild-type β-amyloid peptide (Aβ) is well known for its strong propensity to form fibrils and, as such, serves as a highly relevant test case for the present method. Tjernberg et al. (1996) deduced that the KLVFF segment in the truncated native Aβ sequence (N-terminal residues 1-18) was of critical importance in the polymerization of amyloid fibril, whereas a mutant sequence in which KLVFF was replaced by AAVFA showed a markedly reduced tendency to form amyloid fibrils. Their work provides convincing evidence that even short sequences may govern the propensity of a protein for amyloid fibril formation. We calculated P(α|low) and P(β|high) for the wild-type Aβ (N-terminal residues 1-28) and for the corresponding AAVFA mutant sequence. Consistent with the findings of Tjernberg et al. (1996), the present HβP method predicted that the KLVFF segment in Aβ shows the strongest nonnative β-strand propensity in high TCs, whereas the alternative alanine-rich AAVFA segment shows substantially reduced β-strand propensity (Fig. 3A,B, respectively). The PHD method, although also predicting β-strand for the KLVFF sequence in Aβ, incorrectly predicts the native secondary structure of Aβ (Fig. 3A). Our HβP method, which calculates P(α|low) and P(β|high) independently, predicts strong helical propensity at low TCs in accordance with the native structure of Aβ in the solution state.
propensity revealed a strong correlation with that calculated by the present HβP method (r = 0.6) but not with that calculated by PHD (r = 0.2; Table 3). The highest HβP values calculated by the present method were associated with fragments 11 and 12 (RLANFLVHSS and LANFLVHSSN, respectively), in agreement with the results obtained by Mazor et al. (2002). In contrast, the PHD method failed to detect any β propensity (prE) in this major motif. Both the present HβP method and PHD predicted some β propensity in two minor motifs (fragments 2, 19-21). Two overlapping pentapeptides (NFLVH and FLVHS) within the major binding domain of hIAPP11-20 (RLANFLVHSS, fragment 11 in Table 3
HβP method pinpointed the NFLVH region as possessing the highest HβP (i.e., highest P[β|high] value) in the entire hIAPP sequence (Fig. 3C). The agreement of the HβP method with these experimental results for hIAPP lends credibility to the sensitivity of our TC-based approach in correctly predicting nonnative β-strand propensity even in extremely short sequences. Identification of short recognition motifs, such as NFLVH in hIAPP, is of critical value in efforts to discover "β-sheet breakers," much like Ac-QKLVFF-NH2 was shown by Tjernberg et al. (1996) to halt fibrillization of Aβ.
α-synuclein (Spillantini et al. 1997, 1998). Interestingly, the peptide derived from the central hydrophobic region of α-synuclein represents a second major intrinsic constituent of Alzheimer's plaques. This 35-amino-acid peptide, known as NAC (non-Aβ component of Alzheimer's disease amyloid), was shown to constitute about 10% of the amyloid plaque (Ueda et al. 1993). The amyloidogenicity is not uniformly distributed within NAC. For example, the C-terminal half of the peptide (NAC residues 19-35, QKTVEGAGSIAAATGFV) does not fibrillate, whereas the N-terminal fragment 3-18 (VTNVGGAVVTGVTAVA) can fibrillate (El-Agnaf et al. 1998; Bodles et al. 2001). NAC fragments 11-22 (VTGVTAVAQKTV) and 8-18 (GAVVTGVTAVA) also showed propensity to fibrillate (Bodles et al. 2001; Giasson et al. 2001). Consistent with this experimental evidence, our HβP method detected significant HβP exclusively in the N-terminal region (Fig. 3D
β-strand propensity. In sharp contrast with experimental evidence and the HβP method, PHD predicts higher β-strand propensity in the C-terminal region and a mixture of α and β propensities in the core region (VTGVTAVA).
The intact human acetylcholinesterase (AChE) C-terminal domain is α-helical in the native state, but a shorter, 14-residue fragment (AChE586-599) forms β-rich amyloid fibrils (Cottingham et al. 2003). These fibrous AChE586-599 aggregates possess all the classical hallmarks of amyloid fibrils and are neurotoxic in vitro (Cottingham et al. 2002). However, the fragment (BuChE573-586) derived from the closely related enzyme butyrylcholinesterase (BuChE) showed no detectable fibril formation (Cottingham et al. 2003). Consistent with these observations, the HβP method predicts significantly greater β-strand propensity for AChE586-599 than for BuChE573-586 (Fig. 3E,F). The PHD method predicts a helical secondary structure in both fragments and was unable to detect any β-strand propensity in AChE586-599. The ability of the HβP method to differentiate between these two highly similar sequences (i.e., AChE586-599 and BuChE573-586) with respect to their reported propensity to form amyloid fibrils (Cottingham et al. 2003) speaks to its exceptional sensitivity.
A summary of results collected for known amyloidogenic and nonamyloidogenic subsequences is assembled in Table 4 (below). P(β|high) is >> 0.5 for all amyloidogenic cases, whereas P(β|high) = 0.2 for the two nonamyloidogenic sequences (i.e., NAC sequence 19-35 of α-synuclein; BuChE573-586). The P(β|high) level of each sequence calculated by the present HβP method is consistent with corresponding experimental observations. In contrast, the PHD method generally predicted extremely low β propensity in amyloidogenic fragments except in the Aβ sequence. In the NAC sequence of α-synuclein and in AChE586-599, the helical propensity predicted by both PHD and the present method (P[α|low]) was relatively high compared with their nonamyloidogenic counterparts. This intriguing outcome concurs with the view that the amyloid fibril forming propensity of a given protein sequence is related to its ambivalent nature with respect to adopting an α-helix or β-strand conformation.
α-helical punctuated by short loops that link the helices. Its entire sequence is devoid of the β-strands associated with amyloid fibril formation, and no β-strand propensities can be detected in this protein by the PHD algorithm (Fig. 2A
β-stranded fibrils that are virtually identical to those seen in disease-associated amyloid fibrils. Given the profound implications of this experimental finding, we asked whether our TC-based HβP method would detect any significant HβP in the myoglobin sequence. Indeed, both EVLIRLF and TVVLTAL were predicted by the HβP method to show the strongest nonnative β-strand propensity in high TC environments (Fig. 2B
α-helical in the native state. Other helical sequences (VLNVWGKVEA, IKYLEFIS, and IIHVLHSK) also revealed significant HβP. Fändrich et al. (2003) recently reported an independent peptide fragment containing IKYLEFIS and IIHVLHSK as amyloidogenic, although it is known as a stable helical element in the protein and lacks a clear polar-hydrophobic sequence pattern. In accordance with these experimental findings, the present computational analysis shows remarkable conformational ambivalence (helix for low and β-strand for high TCs) for this region. This previously undetected HβP may explain why a protein that is predominantly helical in the native state is still capable of forming fibrillar aggregates.
The notion that a small number of compact sequences, such as EVLIRLF and TVVLTAL in myoglobin, can provide sufficient driving force for amyloid fibril formation might seem implausible. These two seven-residue sequences represent a small fraction of horse myoglobin's 153 residues, and they are well separated in terms of sequence as well as spatially in the crystalline state. On the other hand, the finding (Tjernberg et al. 1996) that the highly amyloidogenic behavior of Aβ can be arrested by replacing its KLVFF pentapeptide with the helix-stabilizing AAVFA attests to the apparently delicate balance between normal and amyloid structures. Likewise, our HβP method detected high β-strand propensities in only a few regions of Aβ (Fig. 3A). This invites speculation whether removal of EVLIRLF and TVVLTAL in myoglobin, either by replacing them with helix-promoting residues (e.g., alanine) or by deleting them altogether, would attenuate the tendency of myoglobin for fibril formation under the same experimental conditions used by Fändrich et al. (2001). Alternatively, addition of EVLIRLF and TVVLTAL peptides to these myoglobin solutions might serve as "β-sheet breakers," much like Ac-QKLVFF-NH2 was shown by Tjernberg et al. (1996) to halt polymerization of Aβ and subsequent formation of fibrils.
HβP in globular domains
Despite the growing number of proteins shown in various experiments to be amyloidogenic, no definitive patterns have been found as to sequence specificity or to the threshold level of additional β-strands that can trigger amyloid fibril formation. Analysis of 2358 nonhomologous domains in the SCOP20 database revealed that the domain HβP for the majority of domains is in the 0.2 to 0.5 range (Fig. 5A), meaning that 20% to 50% of residues were predicted by the present method to possess significant β-strand propensity (P[β|high] > 0.5) even though they are not β-strand in the native state. Because the present method is designed to detect nonnative as opposed to native β-strand propensity, the HβP will be greater for helical domains than for β-rich domains (Fig. 5A). For example, the predicted domain HβP was low for β-rich proteins (viz., SH3 domain and immunoglobulin light chain) and noticeably higher for Aβ and other helical proteins (viz., insulin, myoglobin; Table 5).
χ² analysis comparing the domain HβP among 23 proteins shown experimentally as amyloidogenic and the 2358 SCOP20 domains indicated no significant difference between the two distributions (Fig. 5
HβP in known amyloidogenic proteins is 0.2, whereas most globular domains in SCOP20 are predicted to possess domain HβP > 0.2. The average domain HβP was identical (HβPavg = 0.30) for both the SCOP20 domains and the 23 known amyloidogenic proteins (Fig. 5A,B). The finding that HβP exists in most SCOP20 domains corroborates the notion that amyloid fibril formation is a generic feature of proteins (Chiti et al. 1999). Normally benign proteins become toxic when they undergo fibrillization (Bucciantini et al. 2002). To the extent that in vitro observations reflect in vivo behavior (Couzin 2002), the results summarized in Figure 5
The high prevalence of nonnative β-strand propensity in local amino-acid sequences of proteins (Fig. 5) is fascinating yet disturbing when seen from the perspective that fibrillar aggregates (or their intermediates) are likely toxic in humans. Structural plasticity in a local sequence has been observed, even by a single-site mutation (Cordes et al. 2000), thus suggesting that new protein folds can evolve from existing folds without drastic or large-scale mutagenesis. A plausible interpretation is that amyloid fibrils are by-products resulting from the inherent structural plasticity of local amino-acid sequences. The ambivalent nature of local sequences as to their structural propensities might imply that peptide sequences represent a neutral pool for protein evolution that depends on the appropriate control system (e.g., molecular chaperones) to suppress protein misfolding and aggregation. In concert with the control system, the HβP of a local sequence might represent another driving force in protein evolution. This premise is supported by the significant degree of domain HβP detected in most SCOP20 sequences (Fig. 5). Identification of the HβP of local sequences provides insight into the role of amyloid fibrils in protein evolution and should contribute toward progress in our battle against amyloid diseases and related conditions.
Therapeutic implications
Although several therapeutic targets (e.g., the secretases) have been identified to block the amyloid cascade upstream of fibrillar formation (Wolfe 2002), no clinically effective drugs for these targets have yet appeared. Somewhat counterintuitively, small peptides composed of these same local sequences associated with strong HβP have been shown by in vitro experiments (Soto et al. 1998; Citron 2002; Findeis 2002) to block amyloid fibril formation in full-length proteins. These peptides, aptly called β-sheet breakers, are based on Aβ residues 17-21, which constitute the central hydrophobic core that is believed essential for Aβ assembly (Tjernberg et al. 1996; Soto et al. 1998). Encouragingly, recent in vivo studies (Permanne et al. 2002) using two different transgenic mouse models have demonstrated that systematic administration of a pentapeptide β-sheet breaker can reduce amyloid load and cerebral damage in Alzheimer's disease. These small peptides exhibited good brain penetration, reduced Aβ deposition, increased neuronal survival, and decreased brain inflammation associated with amyloid deposition.
Although promising, the β-sheet breaker approach requires prior knowledge of those sequences associated with HβP. Ready access to this information has been hampered by the absence of a consensus sequence among disease-associated amyloid-forming proteins, confounded by difficulties encountered in determining fibril structures at the molecular level using experimental techniques. These obstacles make it extremely difficult to identify target sites of β-sheet breakers and to design effective β-sheet breakers. It is hoped that the present HβP method for detecting HβP will offer some guidance in addressing this problem.
Future directions
The present method demonstrated exceptional sensitivity in detecting HβP in local regions of protein sequences. Our analysis of HβP in globular domains revealed a high degree of association with known amyloidogenic peptides. This new approach enables us to carry out proteome-wide analysis of amyloidogenic propensity in protein sequences. Currently, we are developing a Web-accessible interface that implements our TC-based HβP method within an artificial neural network (ANN) to enable fast and accurate prediction of HβP for any protein or peptide sequence along a continuum of variable TC values. This Web site will also feature a knowledge base that contains a database of short sequences that are predicted by our ANN-based HβP tool to possess strong HβP. This database, which will be updated as more protein structural information becomes available, is intended to offer guidance toward the discovery of therapeutic agents that inhibit β-strand formation and aggregation.
| Materials and methods |
|---|
4 Å apart and separated by more than four residues in sequence (Fischer and Marqusee 2000). The TCs of the middle five residues within each seven-residue sequence were counted and then sorted categorically into bins designated low, intermediate, and high. The low/intermediate and intermediate/high boundaries were defined respectively as TCavg - 20% and TCavg + 20%, where TCavg is the average number of TCs. Because individual amino acids will differ with respect to side-chain length, composition, and hydrophobicity, the TCavg value associated with each of the 20 common amino acids was precalculated from the original 453,787 fragments in the SCOP20 database (Table 6).
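The counting and binning steps can be sketched as follows, under stated assumptions. The distance comparison (at most 4 Å), the treatment of counts that fall exactly on a bin boundary, and both function names are assumptions made for illustration; the text specifies only the 4 Å cutoff, the sequence separation of more than four residues, and the TCavg ± 20% boundaries.

```python
import math
from typing import List, Tuple

Atom = Tuple[int, Tuple[float, float, float]]  # (residue index, xyz in Å)


def count_tertiary_contacts(atoms: List[Atom], residue_index: int,
                            cutoff: float = 4.0,
                            min_separation: int = 4) -> int:
    """Count atom pairs involving one residue that qualify as TCs.

    Assumption: a contact is any atom pair within `cutoff` Å whose residues
    are more than `min_separation` positions apart in sequence.
    """
    count = 0
    for res_i, xyz_i in atoms:
        if res_i != residue_index:
            continue
        for res_j, xyz_j in atoms:
            far_in_sequence = abs(res_i - res_j) > min_separation
            if far_in_sequence and math.dist(xyz_i, xyz_j) <= cutoff:
                count += 1
    return count


def tc_bin(tc_count: float, tc_avg: float, margin: float = 0.20) -> str:
    """Assign a TC count to the low, intermediate, or high bin."""
    if tc_count < tc_avg * (1 - margin):
        return "low"
    if tc_count > tc_avg * (1 + margin):
        return "high"
    return "intermediate"
```

In practice `tc_avg` would come from the residue-specific averages in Table 6; the sketch takes it as an input because those values are not reproduced here.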
P(α|low) and P(β|high)
After sorting the maximum 30 templates into three separate bins designated low, intermediate, and high in terms of number of TCs as described above, the secondary structure of the center residue of each template in the separate bins was analyzed. The occurrences of α-helix in low TC bins, P(α|low)temp, and β-strand in high TC bins, P(β|high)temp, were calculated from the templates in low and high TC bins, respectively (Fig. 1). The templates in the intermediate bin were discarded. To predict P(α|low) and P(β|high) of a query from the corresponding information obtained from the templates, we developed statistical models by defining two additional variables, N(low)temp and N(high)temp, that correspond to the total number of templates in low and high TCs, respectively (Fig. 6A,B). The inclusion of N(low)temp and N(high)temp improves our ability to predict P(α|low)query from P(α|low)temp and P(β|high)query from P(β|high)temp.
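The four template-derived quantities can be computed with a few lines of code. The sketch below assumes each template is represented by its TC bin and the secondary structure of its center residue, written with DSSP-style codes ("H" for helix, "E" for strand); this representation and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def template_statistics(templates: List[Tuple[str, str]]) -> Dict[str, float]:
    """Summarize templates given as (tc_bin, center-residue structure code).

    Templates in the intermediate bin are ignored, as described in the text.
    """
    low = [ss for tc, ss in templates if tc == "low"]
    high = [ss for tc, ss in templates if tc == "high"]
    n_low, n_high = len(low), len(high)
    return {
        "N(low)temp": n_low,
        "N(high)temp": n_high,
        # Fraction of low-TC templates whose center residue is helical.
        "P(alpha|low)temp": low.count("H") / n_low if n_low else 0.0,
        # Fraction of high-TC templates whose center residue is a strand.
        "P(beta|high)temp": high.count("E") / n_high if n_high else 0.0,
    }
```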
≥ 3 and P(α|low)temp > 0.5. We proceeded to plot the occurrence of helix in the queries, P(α|low)query, against their N(low)temp separately for three different ranges of P(α|low)temp (Fig. 6A
P(α|low)temp and N(low)temp, namely, P(α|low)temp in the 0.5 to 0.7 range and N(low)temp = 6. Because 566 of these 782 test queries are α-helical in the native state, P(α|low)query = 0.72 (i.e., 566/782). From the 453,787 SCOP20 fragments, we found 113,680 test queries for which N(high)temp ≥ 3 and P(β|high)temp > 0.5. Similarly, values of P(β|high)query versus N(high)temp were plotted separately for three different ranges of P(β|high)temp (Fig. 6B).
The curves so obtained (Fig. 6A,B) were each fitted to a nonlinear regression equation. The values of P(α|low)query and P(β|high)query for each query sequence, such as the sequence GEAVELA shown in Figure 1, were determined from these equations (Fig. 6A,B, respectively). Inspection of the curves in Figure 6 reveals, as expected, that the statistical quality of the predicted propensity of a query/test sequence improves as the number of templates, N(low)temp or N(high)temp, increases and as the propensity of the corresponding templates, P(α|low)temp or P(β|high)temp, approaches unity.
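Each point on these curves is an observed frequency among test queries that share the same template statistics. The sketch below reproduces that pooling step for the one case quoted in the text (566 helical queries out of 782). The record layout, the half-open treatment of the range boundaries, and the function name are assumptions; the regression fit itself is not shown because its functional form is not given here.

```python
from typing import List, Optional, Tuple

# (P(alpha|low)temp, N(low)temp, center residue is helical in native state)
Record = Tuple[float, int, bool]


def pooled_query_propensity(records: List[Record],
                            p_range: Tuple[float, float],
                            n_temp: int) -> Optional[float]:
    """Fraction of helical test queries within one (range, N) group."""
    lo, hi = p_range
    pool = [native for p, n, native in records
            if lo < p <= hi and n == n_temp]
    return sum(pool) / len(pool) if pool else None


# Synthetic records matching the counts quoted in the text: 782 queries
# in the 0.5 to 0.7 range with N(low)temp = 6, of which 566 are helical.
records = ([(0.6, 6, True)] * 566 + [(0.6, 6, False)] * 216
           + [(0.8, 6, True)] * 10)
p_query = pooled_query_propensity(records, (0.5, 0.7), 6)
```

Rounded to two decimals, `p_query` matches the value of 0.72 given in the text.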
Values of P(α|low) and P(β|high) for a full-length protein query sequence were calculated by using a sliding seven-residue window. The first three residues at each terminus of the protein were excluded from the prediction process because they lack the minimum three residues on one side needed to assign sequence context. Predictions of secondary structure as calculated by PHD algorithms were carried out by accessing the Predict Protein Server (http://www.embl-heidelberg.de/predictprotein/submit_def.html). The domain HβP parameter was calculated as

$$\text{domain H}\beta\text{P} = \frac{N\{\text{non}\beta,\ P(\beta|\text{high}) > 0.5\}}{N(\text{non}\beta)}$$

where N(nonβ) refers to the number of residues in a given domain with secondary structure that is not β-strand in the native structure and N{nonβ, P(β|high) > 0.5} refers to the subset of these for which P(β|high) > 0.5.
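The domain-level parameter reduces to a simple ratio, sketched below. The sketch assumes that the native secondary structure is supplied as a string of DSSP-style codes with "E" marking β-strand, and that only residues with a predicted P(β|high) are passed in; the function name is hypothetical.

```python
from typing import Sequence


def domain_hbp(native_ss: str, p_beta_high: Sequence[float],
               threshold: float = 0.5) -> float:
    """Fraction of non-strand residues whose P(beta|high) exceeds threshold."""
    # Keep predictions only for residues that are not strand in the native state.
    non_beta = [p for ss, p in zip(native_ss, p_beta_high) if ss != "E"]
    if not non_beta:
        return 0.0
    hidden = sum(p > threshold for p in non_beta)
    return hidden / len(non_beta)


# Toy example: six non-strand residues, three of them above the threshold.
example = domain_hbp("HHHHEECC", [0.9, 0.6, 0.2, 0.1, 0.9, 0.9, 0.7, 0.3])
```

Residues that are already β-strand in the native structure are excluded from both the numerator and the denominator, so a domain made entirely of strands returns zero by convention in this sketch.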
| Acknowledgments |
|---|
| References |
|---|
-sheet structure for amyloid. Proc. Natl. Acad. Sci. 98: 23752380.
Benyamini, H., Gunasekaran, K., Wolfson, H., and Nussinov, R. 2003a.
2-Microglobulin amyloidosis: Insights from conservation analysis and fibril modeling by protein docking techniques. J. Mol. Biol. 330: 159174.[CrossRef][Medline]
. 2003b. Convervation and amyloid formation: A study of the gelsolin-like family. Proteins 51: 266282.[CrossRef][Medline]
Berezovsky, I.N. and Trifonov, E.N. 2001. Van der Waals locks: Loop-n-lock structure of globular proteins. J. Mol. Biol. 307: 14191426.[CrossRef][Medline]
Bodles, A.M., Guthrie, D.J., Greeg, B., and Irvine, G.B. 2001. Identification of the region of non-A
component (NAC) of Alzheimers disease amyloid responsible for its aggregation and toxicity. J. Neurochem. 78: 334395.
Brenner, S.E., Koehl, P., and Levitt, M. 2000. The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res. 28: 254–256.
Bucciantini, M., Giannoni, E., Chiti, F., Baroni, F., Formigli, L., Zurdo, J., Taddei, N., Ramponi, G., Dobson, C.M., and Stefani, M. 2002. Inherent toxicity of aggregates implies a common mechanism for protein misfolding diseases. Nature 416: 507–511.
Chiti, F., Webster, P., Taddei, N., Clark, A., Stefani, M., Ramponi, G., and Dobson, C.M. 1999. Designing conditions for in vitro formation of amyloid protofilaments and fibrils. Proc. Natl. Acad. Sci. 96: 3590–3594.
Chiti, F., Stefani, M., Taddei, N., Ramponi, G., and Dobson, C.M. 2003. Rationalization of the effects of mutations on peptide and protein aggregation rates. Nature 424: 805–808.
Citron, M. 2002. Alzheimer's disease: Treatments in discovery and development. Nat. Neurosci. 5: 1055–1057.
Cordes, M.H., Burton, R.E., Walsh, N.P., McKnight, C.J., and Sauer, R.T. 2000. An evolutionary bridge to a new protein fold. Nat. Struct. Biol. 7: 1129–1132.
Cottingham, M.G., Hollinshead, M.S., and Vaux, D.J. 2002. Amyloid fibril formation by a synthetic peptide from a region of human acetylcholinesterase that is homologous to the Alzheimer's amyloid-β peptide. Biochemistry 41: 13539–13547.
Cottingham, M.G., Voskuil, J.L., and Vaux, D.J. 2003. The intact human acetylcholinesterase C-terminal oligomerization domain is α-helical in situ and in isolation, but a shorter fragment forms β-sheet-rich amyloid fibrils and protofibrillar oligomers. Biochemistry 42: 10863–10873.
Couzin, J. 2002. Harmless proteins twist into troublemakers. Science 296: 28–29.
El-Agnaf, O.M., Jakes, R., Currar, M.D., Middleton, D., Ingenito, R., Bianchi, E., Pesse, A., Neill, D., and Wallace, A. 1998. Aggregates from mutant and wild-type α-synuclein proteins and NAC peptide induce apoptotic cell death in human neuroblastoma cells by formation of β-sheet and amyloid-like filaments. FEBS Lett. 440: 71–75.
Fändrich, M., Fletcher, M.A., and Dobson, C.M. 2001. Amyloid fibrils from muscle myoglobin. Nature 410: 165–166.
Fändrich, M., Forge, V., Buder, K., Kittler, M., Dobson, C.M., and Diekmann, S. 2003. Myoglobin forms amyloid fibrils by association of unfolded polypeptide segments. Proc. Natl. Acad. Sci. 100: 15463–15468.
Findeis, M.A. 2002. Peptide inhibitors of β-amyloid aggregation. Curr. Top. Med. Chem. 2: 417–423.
Fischer, K.F. and Marqusee, S. 2000. A rapid test for identification of autonomous folding units in proteins. J. Mol. Biol. 302: 701–712.
Giasson, B.I., Murray, I.V., Trojanowski, J.Q., and Lee, V.M. 2001. A hydrophobic stretch of 12 amino acid residues in the middle of α-synuclein is essential for filament assembly. J. Biol. Chem. 276: 2380–2386.
Hoppener, J.W., Ahren, B., and Lips, C.J.M. 2000. Islet amyloid and type 2 diabetes mellitus. N. Engl. J. Med. 343: 411–419.
Jimenez, J.L., Guijarro, J.I., Orlova, E., Zurdo, J., Dobson, C.M., Sunde, M., and Saibil, H.R. 1999. Cryoelectron microscopy structure of an SH3 amyloid fibril and model of the molecular packing. EMBO J. 18: 815–821.
Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577–2637.
Kallberg, Y., Gustafsson, M., Persson, B., Thyberg, J., and Johansson, J. 2001. Prediction of amyloid fibril-forming proteins. J. Biol. Chem. 276: 12945–12950.
Mangione, P., Sunde, M., Giorgetti, S., Stoppini, M., Esposito, G., Gianelli, L., Obici, L., Asti, L., Andreola, A., Viglino, P., et al. 2001. Amyloid fibrils derived from the apolipoprotein A1 Leu174Ser variant contain elements of ordered helical structure. Protein Sci. 10: 187–199.
Mazor, Y., Gilead, S., Benhar, I., and Gazit, E. 2002. Identification and characterization of a novel molecular-recognition and self-assembly domain within the islet amyloid polypeptide. J. Mol. Biol. 322: 1013–1024.
Minor Jr., D.L. and Kim, P.S. 1996. Context-dependent secondary structure formation of a designed protein sequence. Nature 380: 730–734.
Pan, X.-M., Niu, W.-D., and Wang, Z.-X. 1999. What is the minimum number of residues to determine the secondary structural state? J. Protein Chem. 18: 579–584.
Paz, M. and Serrano, L. 2004. Sequence determinants of amyloid fibril formation. Proc. Natl. Acad. Sci. 101: 87–92.
Permanne, B., Adessi, C., Saborio, G.P., Fraga, S., Frossard, M.-J., Dorpe, J.V., Dewachter, I., Banks, W.A., Leuven, F.V., and Soto, C. 2002. Reduction of amyloid load and cerebral damage in a transgenic mouse model of Alzheimer's disease by treatment with a β-sheet breaker peptide. FASEB J. 8: 860–862.
Plaxco, K.W., Simons, K.T., and Baker, D. 1998. Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277: 985–994.
Rost, B. 1996. PHD: Predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymol. 266: 525–539.
Sacchettini, J.C. and Kelly, J.W. 2002. Therapeutic strategies for human amyloid diseases. Nat. Rev. Drug Discovery 1: 267–275.
Schwartz, R.M. and Dayhoff, M.O. 1978. Matrices for detecting distant relationships. In Atlas of protein sequence and structure (ed. M.O. Dayhoff), pp. 353–358. National Biomedical Research Foundation, Washington, DC.
Soto, C., Sigurdsson, E.M., Morelli, L., Kumar, R.A., Castano, E.M., and Frangione, B. 1998. β-Sheet breaker peptides inhibit fibrillogenesis in a rat brain model of amyloidosis: Implications for Alzheimer's therapy. Nat. Med. 4: 822–826.
Spillantini, M.G., Schmidt, M.L., Lee, V.M., Trojanowski, J.Q., Jakes, R., and Goedert, M. 1997. α-Synuclein in Lewy bodies. Nature 388: 839–840.
Spillantini, M.G., Crowther, R.A., Jakes, R., Hasegawa, M., and Goedert, M. 1998. α-Synuclein in filamentous inclusions of Lewy bodies from Parkinson's disease and dementia with Lewy bodies. Proc. Natl. Acad. Sci. 95: 6469–6473.
Tjernberg, L.O., Naslund, J., Lindqvist, F., Johansson, J., Karlstrom, A.R., Thyberg, J., Terenius, L., and Nordstedt, C. 1996. Arrest of β-amyloid fibril formation by a pentapeptide ligand. J. Biol. Chem. 271: 8545–8548.
Ueda, K., Fukushima, H., Masliah, E., Xia, Y., Iwai, A., Yoshimoto, M., Otero, D.A., Kondo, J., Ihara, Y., and Saitoh, T. 1993. Molecular cloning of cDNA encoding an unrecognized component of amyloid in Alzheimer disease. Proc. Natl. Acad