Protein Science
Protein Science (2007), 16:906-918. Published by Cold Spring Harbor Laboratory Press. Copyright © 2007 The Protein Society

A simple algorithm locates beta-strands in the amyloid fibril core of {alpha}-synuclein, Abeta, and tau using the amino acid sequence alone

Shahin Zibaee1,3, O. Sumner Makin1,4, Michel Goedert2, and Louise C. Serpell1,4

1 Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, United Kingdom
2 MRC Laboratory of Molecular Biology, Cambridge, Cambridge CB2 2QH, United Kingdom

(RECEIVED October 24, 2006; FINAL REVISION February 8, 2007; ACCEPTED February 18, 2007)


    Abstract
Fibrillar inclusions are a characteristic feature of the neuropathology found in the {alpha}-synucleinopathies such as Parkinson's disease, dementia with Lewy bodies, and multiple system atrophy. Familial forms of {alpha}-synucleinopathies have also been linked with missense mutations or gene multiplications that result in higher protein expression levels. In order to form these fibrils, the protein, {alpha}-synuclein ({alpha}-syn), must undergo a process of self-assembly in which its native state is converted from a disordered conformer into a beta-sheet-dominated form. Here, we have developed a novel polypeptide property calculator to locate and quantify relative propensities for beta-strand structure in the sequence of {alpha}-syn. The output of the algorithm, in the form of a simple x-y plot, was found to correlate very well with the location of the beta-sheet core in {alpha}-syn fibrils. In particular, the plot features three peaks, the largest of which is completely absent for the nonfibrillogenic protein, beta-syn. We also report similar significant correlations for the Alzheimer's disease-related proteins, Abeta and tau. A substantial region of {alpha}-syn is also capable of converting from its disordered conformation into a long amphipathic {alpha}-helical protein. We have further developed the aforementioned algorithm to locate and quantify the {alpha}-helical hydrophobic moment in the amino acid sequence of {alpha}-syn. As before, the output of the algorithm, in the form of a simple x-y plot, was found to correlate very well with the location of {alpha}-helical structure in membrane bilayer-associated {alpha}-syn.

Keywords: {alpha}-synuclein; beta-strand propensity; Alzheimer's disease; Parkinson's disease; algorithm; amyloid fibril


    Introduction
Ageing-related neurodegenerative conditions including Parkinson's disease (PD) and Alzheimer's disease (AD) are the most common causes of motor dysfunction and dementia, respectively (Goedert 2001). Neuropathological observations at post-mortem invariably reveal the presence of extracellular and intracellular proteinaceous deposits, as well as nerve cell degeneration (Alzheimer 1907; Lewy 1912). In all cases, the deposits have a significant fibrillar component which has been isolated and characterized (Eanes and Glenner 1968; Serpell et al. 2000; Berriman et al. 2003). The fibrillar component of Lewy bodies and Lewy neurites is composed of {alpha}-synuclein ({alpha}-syn) (Baba et al. 1998; Spillantini et al. 1998a). The fibrillar component of AD senile plaques is composed of Abeta peptides (Glenner and Wong 1984) while that of neurofibrillary tangles is composed of tau (Brion et al. 1985; Goedert et al. 1988).

Proteins in general are able to undergo intermolecular associations and aggregation, subject to a destabilization of the natively globular fold (Colon and Kelly 1991; Booth et al. 1997; Perutz 1997). The main effect of many missense mutations identified in cases of familial amyloidoses is likely to be a reduction in stability of tertiary or quaternary structure (Siepen and Westhead 2002). The wild-type protein in a destabilizing environment can also lead to aggregation (Colon and Kelly 1991; Booth et al. 1997; Zurdo et al. 2001). However, in the case of the natively unfolded proteins (or chemically unfolded proteins) (Guijarro et al. 1998; McParland et al. 2000), destabilization of the native state is no longer the rate-limiting step so that self-assembly of the protein in vitro may be estimated from physicochemical properties of its amino acid side chains (Lopez De La Paz et al. 2002; Tjernberg et al. 2002; Chiti et al. 2003). Recombinant DNA techniques and in vitro aggregation assays can be employed to analyze the relationship of the sequence properties to aggregation. Fibril assembly of {alpha}-syn, Abeta, and tau proteins involves a conformational change from "random coil" to beta-sheet structure (Glenner et al. 1974; Watson et al. 1998; Serpell et al. 2000; von Bergen et al. 2001).

Here, we have designed a novel algorithm to measure a derivative of beta-strand propensity which we refer to as "beta-strand contiguity" (beta-SC) by a simple treatment of Chou and Fasman's secondary structure preference numbers (Chou and Fasman 1974a). This was expected to locate "fibrillogenic hotspots" within a protein's sequence. Specifically, peaks in the beta-SC plots were found to correlate well with the location of amyloid fibril cores in {alpha}-syn, Abeta, and tau (Wischik et al. 1988; Petkova et al. 2002; Heise et al. 2005). Although the calculations were not designed to predict the absolute rate of fibril assembly, they will very likely have a significant bearing on the thermodynamics of fibril assembly. Furthermore, we suggest that the peaks of beta-SC plots may represent good pharmacological targets for inhibitors of fibril assembly for {alpha}-syn, Abeta, and tau, with possible therapeutic consequences for neurodegeneration.

In the presence of phospholipids or surfactants, {alpha}-syn has also been found to undergo a significant change from "random coil" to a long amphipathic {alpha}-helical form, lying parallel to the surface of the phospholipid membrane bilayer or surfactant micelle (Eliezer et al. 2001; Jao et al. 2004). Our program has also been adapted to measure {alpha}-helical amphipathicity ({alpha}-HA) by applying a calculation of the hydrophobic moment of a projected {alpha}-helix, similar to that used previously by Eisenberg et al. (1982). The {alpha}-HA plots correlate very well with experimental data for {alpha}-syn and beta-syn.
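
As a minimal sketch of a hydrophobic moment calculation of the kind referred to above, the moment of a peptide window projected as an ideal {alpha}-helix can be taken as the magnitude of the vector sum of the per-residue hydrophobicities, with each successive residue rotated by a fixed angle about the helix axis. The function name, the default of 100 degrees per residue, and the input values in the usage note are assumptions made for illustration; this is not the SALSA implementation itself.

```python
import math
from typing import Sequence


def hydrophobic_moment(hydrophobicities: Sequence[float],
                       angle_deg: float = 100.0) -> float:
    """Magnitude of the hydrophobic moment of one peptide window.

    hydrophobicities: one value per residue, taken from any hydrophobicity
        scale (the choice of scale is left to the caller).
    angle_deg: rotation between successive residues viewed down the helix
        axis; 100 degrees is assumed here for an ideal alpha-helix.
    """
    angle = math.radians(angle_deg)
    # Each residue contributes a vector of length H_n pointing at n * angle.
    sin_sum = sum(h * math.sin(n * angle) for n, h in enumerate(hydrophobicities))
    cos_sum = sum(h * math.cos(n * angle) for n, h in enumerate(hydrophobicities))
    return math.hypot(sin_sum, cos_sum)
```

For example, `hydrophobic_moment([1.0, -1.0], angle_deg=180.0)` gives a value close to 2, because the two residues point in opposite directions with opposite signs and therefore reinforce one another, whereas `hydrophobic_moment([1.0, 1.0], angle_deg=180.0)` is close to 0. A window with a large moment has its hydrophobic residues clustered on one face of the projected helix.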


    Results
The program's algorithm, which can use any scale of amino acid properties such as secondary structure preference numbers or hydrophobicities, is named "SALSA" (simple algorithm for sliding averages). SALSA calculates an average beta-strand propensity score of a peptide window using Chou and Fasman beta-strand preference numbers (Chou and Fasman 1974a). The algorithm samples the full length of the protein's sequence by sliding a window along, one residue at a time. The x-y plots are a compilation of these scores which are plotted as a y-coordinate against the amino acid sequence which is along the x-axis. We refer to this novel measure of beta-strand propensity as "beta-strand contiguity" (beta-SC).
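
The sliding-window step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the SALSA program: it uses a single fixed window length, it reports one mean score per window start position, and the scale `TOY_BETA_SCALE` contains made-up placeholder values rather than the published Chou and Fasman (1974a) preference numbers.

```python
from typing import Dict, List

# Placeholder values for illustration only. These are NOT the published
# Chou and Fasman (1974a) beta-strand preference numbers.
TOY_BETA_SCALE: Dict[str, float] = {
    "V": 1.6, "I": 1.6, "A": 1.0, "G": 0.8, "K": 0.7, "E": 0.3,
}


def window_means(sequence: str, scale: Dict[str, float], window: int) -> List[float]:
    """Mean scale value of every window, sliding one residue at a time.

    The score at index i belongs to the window covering residues
    i .. i + window - 1 (0-based). An empty list is returned when the
    window is longer than the sequence or shorter than one residue.
    """
    if window < 1 or window > len(sequence):
        return []
    values = [scale[residue] for residue in sequence]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]
```

Calling `window_means("VVAG", TOY_BETA_SCALE, 2)` returns three window scores, one for each of the windows VV, VA, and AG. Plotting such scores against sequence position gives an x-y profile of the general kind described in this section. Because the scale is passed as an argument, the same function could be used with hydrophobicities or any other per-residue property.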

SALSA beta-strand contiguity
{alpha}-Synuclein
The beta-SC plot for {alpha}-syn has three main peaks, spanning residues 32–89 (see Fig. 1A), with the dominant peak (III) covering residues 58–89.


Figure 1. beta-strand contiguity plots for the human protein sequences of (A) {alpha}-synuclein, (B) Abeta(1–40), and (C) tau-441.

 
Abeta(1–40)
The beta-SC plot for Abeta(1–40) has two relatively small peaks (as compared with {alpha}-syn's prominent peak) which span residues 9–40 and are separated by a significant dip over residues 23–27 (Fig. 1B). The two peaks are comparable in size to one another.

Tau
In Figure 1C, the beta-SC plot for the sequence of the longest isoform of tau with 441 amino acids (abbreviated to "tau-441") is shown. The five largest peaks are labeled I–V. The most prominent peak (III) is comparable in size to {alpha}-syn's most prominent peak (III).
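
Peak spans such as those quoted above can be read off a per-residue profile by collecting the contiguous runs of positions whose score exceeds a baseline. The sketch below is one hypothetical way to do this; the function name `find_peaks`, the default baseline of zero, and the example profile are assumptions for illustration and are not taken from the SALSA program.

```python
from typing import List, Sequence, Tuple


def find_peaks(profile: Sequence[float], threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) residue ranges of each peak.

    A peak is a contiguous run of positions whose score is strictly
    greater than the threshold.
    """
    regions: List[Tuple[int, int]] = []
    start = None
    for position, score in enumerate(profile, start=1):
        if score > threshold:
            if start is None:
                start = position          # a new peak begins here
        elif start is not None:
            regions.append((start, position - 1))  # the peak ended one residue back
            start = None
    if start is not None:                 # a peak running to the last residue
        regions.append((start, len(profile)))
    return regions
```

For the made-up profile `[0, 1, 2, 0, 0, 3]`, the function reports two peaks, at residues 2–3 and at residue 6. The relative size of each peak could then be compared by its height or by the sum of the scores within the range.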

SALSA beta-strand contiguity of homologous proteins
One of the most promising utilities of the SALSA beta-SC algorithm is for making direct comparisons of different protein sequences. This is illustrated in the sections below in which the SALSA beta-SC plot of {alpha}-syn is directly compared with that of beta-syn. This is followed by a comparison of Abeta(1–42) with Abeta(1–40), and finally four-repeat tau is compared with three-repeat tau.

{alpha}-Synuclein vs. beta-synuclein
The beta-SC plot of beta-syn is similar to that of {alpha}-syn up to and including peaks I and II (Fig. 2A). However, peak III is entirely absent from beta-syn.


Figure 2. Comparison of beta-strand contiguity plots for human protein sequences of (A) {alpha}-synuclein (red) and beta-synuclein (blue); (B) Abeta(1–42) (red circles) and Abeta(1–40) (blue); (C) four-repeat tau-441 (red) and three-repeat tau-410 (blue).

 
Abeta(1–40) vs. Abeta(1–42)
Both beta-SC plots for Abeta(1–40) and Abeta(1–42) have two small peaks spanning residues 9–40 and 9–42, respectively. Both peaks are approximately equivalent in size for Abeta(1–40). However, in the case of Abeta(1–42), the carboxy-terminal peak (II) is larger (Fig. 2B).

Four-repeat tau vs. three-repeat tau
For all six isoforms of tau that are expressed in the human central nervous system, the beta-SC plots are nearly identical. The only difference is in the size of the prominent peak (III), which is larger in isoforms with four repeats. The absence of the extra repeat is responsible for the reduced size of the prominent peak in three-repeat tau (3R-tau) (Fig. 2C). Peak III spans residues 296–326 in four-repeat tau (4R-tau) (tau-441) or 272–296 for 3R-tau (tau-410), covering most of the fourth or third repeat of the microtubule binding region, respectively.

SALSA {alpha}-helical amphipathicity
{alpha}-Synuclein
The {alpha}-HA plot of {alpha}-syn produces one long continuous peak, covering ~95 residues from the N-terminal end (Fig. 3A). Although the peak is uninterrupted, there is a distinctive dip at around residue 40. In Figure 3B, the location of the peak is indicated along a linear representation of the protein, immediately above the actual amino acid sequence. Below the sequence, we have highlighted one predicted and eight experimentally observed locations of {alpha}-helical structure.


Figure 3. (A) {alpha}-Helical amphipathicity plot for the protein sequence of human {alpha}-synuclein. (B) A color-coded representation of the secondary structure over the amino-acid sequence is shown to illustrate {alpha}-helical regions for comparison with the {alpha}-HA plot. From the top, {alpha}-HA plot; the sequence-based prediction of Davidson et al. (1998); structural data of Eliezer et al. (2001); Bussell and Eliezer (2003); Chandra et al. (2003); Jao et al. (2004); Bisaglia et al. (2005); Bussell et al. (2005); Ulmer et al. (2005). Amphipathic {alpha}-helical regions (blue). Possible {alpha}-helical region predicted/visible dip (pale blue). Undefined break in {alpha}-helical structure/no SALSA {alpha}-HA peaks (white). Hairpin linker/turn (yellow). Extended/"random coil"/greater flexibility (light green).

 
{alpha}-Synuclein vs. beta-synuclein
The {alpha}-HA plot of beta-syn is shown in Figure 4, overlapping that of {alpha}-syn. The peaks in the plots over residues 1–50 are very similar between the two synucleins (as expected from their high sequence homology). Both plots display a similar dip at around residue 40. While {alpha}-syn's {alpha}-HA peak terminates at around residue 95, beta-syn's peak terminates upstream of this at around residue 82.


Figure 4. Comparison of {alpha}-helical amphipathicity plots for human {alpha}-synuclein (red circles) and beta-synuclein (blue circles).

 

    Discussion
SALSA beta-strand contiguity algorithm
In order to examine the correlation of the amino acid sequence of {alpha}-syn, Abeta, and tau with their fibrillar forms, we have compared experimental data from the literature with the outputs of our algorithm. In the first of two parts, we show there to be a good correlation between experimental data on the location of the beta-sheet core in preformed fibrils and peaks in the algorithm's graphical output ("beta-SC plots").

In the second part, we make direct comparisons of the beta-SC plot for each protein with that of a similar protein and discuss the significance of the differences between these plots with observed fibrillogenicities and pathogenicities.

SALSA beta-strand contiguity compared with structural data of beta-sheet fibrils
SALSA beta-strand contiguity (beta-SC) samples every region of a protein's sequence for local beta-strand propensity. Without incorporating computations of any other physicochemical property, the algorithm and its plotting tool show a surprisingly good correlation with published structures of amyloid and amyloid-like fibrils of {alpha}-syn, Abeta, and tau.

{alpha}-Synuclein
The location of SALSA beta-SC peaks in {alpha}-syn (Fig. 1A) correlates well with the location of regions identified as forming beta-sheet structure in fibrils, as deduced both from direct and indirect structural analyses.

Solid-state NMR experiments on amyloid-like fibrils formed from {alpha}-syn have shown that the beta-sheet core extends from residue 38 to ~95 (Heise et al. 2005). Indirect methods included hydrogen–deuterium exchange, which found the region ~39–101 to be most protected in {alpha}-syn fibrils (Del Mar et al. 2005). Similarly, electron paramagnetic resonance of site-directed spin-labeled {alpha}-syn showed that, upon fibril assembly, intermolecular distances were reduced over the central residues from ~34 to ~101. Thus, these ~60–70 residues are implicated in the fibril core (Der-Sarkissian et al. 2003). Finally, proteinase K digestion experiments showed residues 31–109 to be resistant, implicating these 79 amino acids in the core of {alpha}-syn fibrils (Miake et al. 2002).
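
To make the comparison concrete, the overlap between the span of the beta-SC peaks (residues 32–89, see Results) and the indirectly determined core regions quoted above can be counted directly. The following is only a worked arithmetic sketch: the approximate boundaries (marked "~" in the text) are treated as exact residue numbers, and the function and variable names are hypothetical.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # 1-based inclusive residue range


def overlap(a: Range, b: Range) -> int:
    """Number of residues shared by two inclusive residue ranges."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


# Span of beta-SC peaks I-III for alpha-syn, as stated in the Results.
predicted: Range = (32, 89)

# Core regions quoted in the text; "~" boundaries are taken as exact here.
experimental: Dict[str, Range] = {
    "hydrogen-deuterium exchange (Del Mar et al. 2005)": (39, 101),
    "spin labeling (Der-Sarkissian et al. 2003)": (34, 101),
    "proteinase K digestion (Miake et al. 2002)": (31, 109),
}

predicted_length = predicted[1] - predicted[0] + 1
for method, core in experimental.items():
    shared = overlap(predicted, core)
    print(f"{method}: {shared} of {predicted_length} predicted residues "
          f"fall within {core[0]}-{core[1]}")
```

Under these assumptions, 51, 56, and 58 of the 58 residues spanned by the beta-SC peaks fall within the three experimentally implicated regions, respectively. In each case the experimental region extends further toward the C terminus than residue 89.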

While peak III of the beta-SC plot ends at residue 89, with another very small peak visible at residues 90–97, the experimental data find that the beta-sheet core can extend to between ~101 and 109. Although the significance of these discrepancies is presently unclear, we hypothesize that the beta-SC plot may represent a statistical average of a heterogeneous population of {alpha}-syn fibrils. The existence of structural heterogeneity in {alpha}-syn fibril populations has been demonstrated (Heise et al. 2005). It is further noted that in vitro aggregation assays using the 18-residue peptide, {alpha}-syn(61–78), have found it to be highly amyloidogenic, while the peptide, {alpha}-syn(79–95), was not (El-Agnaf et al. 1998a).

The beta-sheet core may also extend beyond the region predicted by the beta-SC algorithm because of further stabilizing {pi}-{pi} interactions that could occur through the phenylalanine residue at position 94.

Abeta(1–40)
The location of peaks in the beta-SC plot of Abeta(1–40) correlates well with regions that have been shown to occupy the beta-sheet core in amyloid fibrils of Abeta(1–40), as compared with experimental data from direct and indirect structural analyses.

Solid-state NMR has been used to produce high-resolution structural data for fibrils of Abeta(1–40) and has shown that residues 1–9 remain disordered, while residues 12–39 are predominantly beta-sheet (Balbach et al. 2002). In another study, energy minimization modeling, constrained by experimental data (from NMR, EM, and fiber diffraction), concluded that the N-terminal region, approximately covering the first 10 amino acids, is disordered, while residues 12–24 and 30–40 adopt a beta-sheet conformation and residues 25–29 contain a bend (Petkova et al. 2002). The beta-SC plot only predicts the location of beta-strands and is not designed to predict bends or other secondary structural motifs. However, it is noted that the location of this bend correlates well with the location of the dip between the two beta-SC peaks. Other techniques, such as solution NMR, site-directed spin labeling, and limited proteolysis, have also been used to obtain structural information on Abeta(1–40) fibrils. In hydrogen–deuterium exchange measured by NMR of Abeta(1–40) fibrils, residues 1–15 and 37–40 were not protected and were therefore unlikely to be part of the beta-sheet core, remaining unstructured instead (Whittemore et al. 2005). Residues 25 and 26 could also exchange backbone amide protons and were potentially part of a turn. Site-directed spin labeling of Abeta(1–40), as well as Abeta(1–42), showed that upon fibril assembly regions that remained highly mobile, and therefore not likely to form part of the beta-sheet core, included the N-terminal region covering the first 10–11 amino acids, residues ~23–29, and C-terminal residues from ~39 (Torok et al. 2002). Experiments in which Abeta(1–40) fibrils had been subjected to limited proteolysis showed N-terminal residues, up to between ~12 and 16, to be fully exposed. The remainder of the protein was protected from the solvent, implicating this region in the beta-sheet core of the fibrils (Kheterpal et al. 2001). 
Some cleavage also occurred at Lys28–Gly29, but it was not considered to be sufficient evidence for increased accessibility of this bond. Another indirect structural assay is scanning proline mutagenesis (Williams et al. 2004). This works on the premise that if fibril assembly of Abeta(1–40) is insensitive to a particular proline mutation, then that residue is not likely to form part of the beta-sheet core of the fibril. Regions insensitive to proline mutation included residues 1–14 and 37–40. Residues 22, 23, 29, and 30 were also insensitive to proline mutations, leading to the conclusion that these residues may occupy two turns between three beta-strands. Collectively, these findings correlate well with beta-SC peaks, except for observations that showed the C-terminal beta-strand ending prior to residue 40 (Balbach et al. 2002; Torok et al. 2002; Williams et al. 2004).

There is currently no direct structural data available for Abeta(1–42) fibrils. However, quenched hydrogen–deuterium exchange NMR has recently been performed and has indicated that residues 18–26 and 31–42 may form the beta-sheet core (Lührs et al. 2005). It has been suggested that beta-sheets in Abeta(1–42) fibrils are C-terminally shifted from their location in Abeta(1–40) fibrils and also from the peaks of SALSA beta-SC plots. This discrepancy may be a consequence of the fact that the beta-SC algorithm does not make any predictions of tertiary or intraprotein folding associations. For example, the salt bridge between Asp23 and Lys28 may influence the location of the turn and thus the precise location of the beta-strands within the resulting hairpin. However, we also consider that this discrepancy may reflect a degree of heterogeneity in the structure and assembly of Abeta fibrils, such that the precise location of beta-strands will vary. This is further supported by results of site-directed spin labeling experiments with Abeta(1–42), which showed different beta-strand positions (mentioned above) (Torok et al. 2002). We hypothesize that (as with {alpha}-syn) the location of peaks in beta-SC plots represents a statistical average of heterogeneous fibril populations, rather than a rigid structural model. NMR data for the existence of structural heterogeneity or polymorphism in amyloid fibrils have previously been reported for Abeta(1–40) (Petkova et al. 2005).

Tau
The minimal protease-resistant region of AD paired helical filaments was reported to be an ~90–100 amino acid fragment, at residues ~255–~350, which is largely overlapping with the microtubule-binding repeat (MTBR) region (Wischik et al. 1988; Crowther et al. 1989; Jakes et al. 1991; Novak et al. 1993). The prominent peak in the beta-SC plots for 3R-tau and 4R-tau, which is about 30 residues long, coincides with the center of this fragment. Electron paramagnetic spin resonance studies of tau assembly showed that, when fibrils had assembled, residues 301–320 were at the core of the fibril (Margittai and Langen 2004, 2006). Fibril assembly assays of full-length tau compared with a number of C-terminally truncated fragments demonstrated that the ability to aggregate only began to be impeded when the truncation included the C terminus from residue 321. Fibril assembly was drastically reduced by a truncation up to 314, and abolished altogether by a truncation up to 292 (Abraha et al. 2000). Therefore, while protease sensitivity experiments implicate a large region within the C terminus, spin-labeling experiments of full-length tau and in vitro aggregation assays with C-terminally truncated tau implicate a smaller region, the location of which appears to correlate more precisely with the location of the prominent peak (III) in the beta-SC plot.

SALSA beta-strand contiguity of similar proteins and relative pathogenicities
The algorithm allows direct comparison of different protein sequences and has been used to compare {alpha}-syn with beta-syn, Abeta(1–42) with Abeta(1–40), and three-repeat tau with four-repeat tau. In the following sections, we discuss how the relative differences of their beta-SC plots correlate with experimental data, as well as with observed pathogenic propensities. Each pairing of proteins is highly homologous to one another and, as a consequence, their beta-SC plots display near identical features (i.e., most peaks are in the same places and are the same sizes). However, in each case, we find that, where the sequences do differ, it is specifically the size of the prominent beta-SC peak that differs. Furthermore, the relative differences appear to correlate well with the relative fibrillogenicities in vitro and pathogenicities in vivo.

{alpha}-Synuclein vs. beta-synuclein
The beta-SC plots of {alpha}-syn and beta-syn overlap for the N-terminal regions covering peaks I and II (Fig. 2A). However, peak III is absent from beta-syn, which coincides with the difference at residues 71 and 72 and, most notably, the absence of the beta-strand-favorable amino acids found in {alpha}-syn at residues 73–83. Peak III, which is the largest of the three peaks, locates to the region reported to dominate the fibrillogenic capacity of {alpha}-syn (Han et al. 1995; Iwai et al. 1995; El-Agnaf et al. 1998a,b; Bodles et al. 2000, 2001; Giasson et al. 2001; Du et al. 2003). beta-Syn is reproducibly nonfibrillogenic (Goedert 2001) and is absent from Lewy bodies (Spillantini et al. 1997). Thus far, no mutations in beta-syn have been linked to familial PD. The difference between the two beta-SC plots complements the difference between the fibrillogenic properties of {alpha}- and beta-syn and their relationship to PD.

Abeta(1–40) vs. Abeta(1–42)
beta-SC plots for Abeta(1–40) and Abeta(1–42) have two small peaks spanning residues 9–40 and 9–42, respectively. Both peaks are approximately equivalent in size for Abeta(1–40). However, in the case of Abeta(1–42), the C-terminal peak (II) is larger (Fig. 2B).

Abeta(1–42) is more fibrillogenic than Abeta(1–40) in vitro (Jarrett et al. 1993; Murakami et al. 2002), which correlates well with the comparison of their beta-SC plots. Thus, it appears that the addition of isoleucine and alanine at the carboxyl terminus of Abeta(1–42) enhances the fibrillogenicity of the surrounding residues. In agreement with this, it is noticed that the highest MbetaP for any peptide window in Abeta(1–42) (within the range of 4–20 residues) is scored by 39VVIA42. The relationship of fibrillogenicity to disease is supported by the observation that Abeta(1–42) is more toxic than Abeta(1–40) in cell culture (Davis-Salinas et al. 1995; Murakami et al. 2002). Also, the increased ratio of Abeta(1–42) to Abeta(1–40) has been associated with an increased risk for AD (Younkin 1995), although an increase in levels of either peptide is a risk factor in itself (Gregory and Halliday 2005). In neuropathological studies, cerebrovascular deposits were found to have a higher ratio of Abeta(1–42) to Abeta(1–40) (Roher et al. 1993), as well as a higher ratio of Abeta(1–40) to Abeta(1–42) (Joachim et al. 1988; Prelli et al. 1988; Suzuki et al. 1994; Alonzo et al. 1998; McCarron et al. 2000; Fryer et al. 2003; Ingelsson et al. 2004).
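
The search for the highest-scoring peptide window over the range of 4–20 residues mentioned above can be sketched as an exhaustive scan. This is an illustration only: the function name `best_window` is hypothetical, the example sequence is a short made-up peptide rather than Abeta(1–42), and `TOY_BETA_SCALE` holds placeholder values, not the published Chou and Fasman (1974a) numbers, so the scores it produces are not those reported here.

```python
from typing import Dict, Tuple

# Placeholder values for illustration only. These are NOT the published
# Chou and Fasman (1974a) beta-strand preference numbers.
TOY_BETA_SCALE: Dict[str, float] = {
    "V": 1.6, "I": 1.6, "A": 1.0, "G": 0.8, "K": 0.7, "E": 0.3,
}


def best_window(sequence: str, scale: Dict[str, float],
                min_len: int = 4, max_len: int = 20) -> Tuple[int, int, float]:
    """Find the window with the highest mean score.

    Returns (start, end, mean) with 1-based inclusive residue numbers.
    If the sequence is shorter than min_len, (0, 0, -inf) is returned.
    Ties are resolved in favor of the first window found.
    """
    values = [scale[residue] for residue in sequence]
    best = (0, 0, float("-inf"))
    for length in range(min_len, min(max_len, len(values)) + 1):
        for start in range(len(values) - length + 1):
            mean = sum(values[start:start + length]) / length
            if mean > best[2]:
                best = (start + 1, start + length, mean)
    return best
```

With the toy scale, `best_window("KEGVVIAK", TOY_BETA_SCALE)` selects residues 4–7 (VVIA), because extending the window in either direction brings in lower-scoring residues and reduces the mean.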

Therefore, the intrinsic fibrillogenicity of these two peptides correlates with the propensity for fibril assembly in vitro, which may correlate with in vivo aggregation and ultimately with the age-of-onset or severity of disease. As with the synucleins in the previous section, a strong argument can be made for a direct relationship between amino acid sequence, fibrillogenicity, and the progression of a sporadic neurodegenerative condition. We suggest, therefore, that the beta-SC algorithm can provide insight into fibril assembly of Abeta in AD.

Four-repeat tau vs. three-repeat tau
An increased ratio of 4R-tau to 3R-tau has been associated with cases of frontotemporal dementia and Parkinsonism linked to chromosome 17 (FTDP-17) (Clark et al. 1998; Hong et al. 1998; Hutton et al. 1998; Spillantini et al. 1998b; Goedert et al. 1999; Hasegawa et al. 1999; Spillantini et al. 2000; Iseki et al. 2001; Miyamoto et al. 2001; Grover et al. 2002; Yoshida et al. 2002; Connell et al. 2005). About half of the known mutations in tau influence the splicing of tau pre-mRNA, mostly resulting in a higher ratio of 4R- to 3R-tau, but occasionally resulting in a lower ratio (Goedert and Spillantini 2006; van Swieten et al. 2007). A complete understanding of how this ratio influences the onset of a tauopathy is currently lacking. In some tauopathies, tau inclusions are composed mainly of either 4R-tau or 3R-tau (Buée-Scherrer et al. 1996; Bronner et al. 2005). The beta-SC plots suggest that in both proteins the fibrillogenic propensities are concentrated in peak III. Comparison of these peaks suggests that 4R-tau may have a higher propensity for fibril assembly than 3R-tau.

SALSA beta-SC plots are not relevant for globular proteins
Linding and colleagues have shown that the "beta-aggregation" tendency of globular proteins is almost threefold higher than that of natively unfolded proteins (Linding et al. 2004). This was measured using their algorithm (called TANGO) which identifies aggregation-nucleating regions in a protein. It is to be expected that the sequences of globular proteins, which have a higher proportion of hydrophobic residues (and beta-strand-favorable residues) than natively unfolded proteins, will score higher. It is also the case that the SALSA beta-SC of globular proteins will be significantly higher, as the algorithm will find more peptide windows with higher MbetaP scores in the sequences of globular proteins than in natively unfolded proteins. Thus, we emphasize that SALSA beta-SC is designed to calculate the propensity for fibril assembly of natively unfolded proteins only. SALSA beta-SC does not calculate the propensity for nonfibrillar forms of aggregation either.

Differences between SALSA beta-SC and other aggregation algorithms
Several aggregation algorithms have been produced by other research groups, often including a graphical format (Fernandez-Escamilla et al. 2004; Linding et al. 2004; Sánchez de Groot et al. 2005; Tartaglia et al. 2005; Thompson et al. 2006). Such outputs highlight regions in protein sequences with higher thermodynamic propensities for aggregation and are often correlated with the aggregation rate in vitro. In at least one of these algorithms, peaks of the graphs indicate where in the sequence a protein's aggregation propensity is most likely to be sensitive to mutations (Pawar et al. 2005). The beta-SC plots are unique in that the algorithm employs only a single physicochemical property of amino acids (beta-strand propensity). Thus, these plots are not intended to predict the rate of protein aggregation. The experimentally observed rates of fibril assembly by recombinant human {alpha}-syn, as well as {alpha}-syn, beta-syn, {gamma}1-syn, and {gamma}2-syn from Fugu rubripes, have been found to correlate with other physicochemical properties, including electrostatic charge repulsion, hydrophilicity, and secondary structural propensities (Yoshida et al. 2006). Similar correlations have also been observed for a number of other proteins (Chiti et al. 2002, 2003). SALSA measures an intrinsic structural propensity and is therefore insensitive to the environment (i.e., pH, temperature, salt concentrations). Extrinsic factors are taken into account by some of the algorithms that predict aggregation rates and aggregation propensities (Fernandez-Escamilla et al. 2004; Pawar et al. 2005; Tartaglia et al. 2005). Although we do observe some degree of overlap with the graphical outputs of these algorithms, SALSA beta-SC is the only algorithm that is specialized to amyloid fibrillar aggregation and not to other forms of protein aggregation, and this may be part of the reason for the difference in the various outputs.

Glycine, glutamine, asparagine, low sequence complexity, and aromatic residues
A potential source of inaccuracy in the correlation of beta-SC plots with beta-sheet fibrils is the use of Chou and Fasman (1974a) beta-strand preference numbers. It is very likely that there will be some differences between the propensity for beta-strand structure in native proteins (from which the Chou and Fasman [1974a] preference numbers were derived) and in amyloid fibrils. An example of this is for the residue, glycine, which has a very low Chou and Fasman (1974a) beta-strand preference number but, as demonstrated by spider silk proteins, is readily able to form amyloid-like beta-sheet assemblies (Kenney et al. 2002). Similarly, glutamine and asparagine, which are also found to form beta-sheet assemblies, are not predicted to do so by SALSA beta-SC (Nelson et al. 2005). We hypothesize that our algorithm might be adapted to account for these types of self-assembly by incorporating a calculation for the degree of sequence complexity (Kenney et al. 2002). Another example is the potential for {pi}{pi} stacking by certain aromatic residue side chains that enhance beta-sheet propensity to a greater extent when occurring in the context of amyloid fibril structure than in native structure. Therefore, this might not be adequately quantified by the Chou and Fasman (1974a) numbers alone (Gazit 2002; Makin et al. 2005).

SALSA {alpha}-helical amphipathicity compared with structural data of membrane-associated {alpha}-synuclein
{alpha}-Synuclein
Membrane binding by {alpha}-syn coincides with a conformational change to an {alpha}-helical-dominated form. Edmundson helical wheels (Schiffer and Edmundson 1967) of canary {alpha}-syn show that the {alpha}-helical amphipathic distribution of amino acids resides within the ~100 amino-terminal residues (Davidson et al. 1998). By alignment with human {alpha}-syn (which has very few differences from canary {alpha}-syn), this corresponds to amino acids 1–93. By the method of Segrest et al. (1992), three potential helix breakers were identified within this region, thus predicting the formation of five separate {alpha}-helices at residues 1–15, 17–37, 39–48, 50–60, and 61–93.

The structure of the {alpha}-helical region has been studied by solution NMR, using the lipid mimetic, sodium dodecyl sulphate (SDS) (Eliezer et al. 2001). In most studies, only a single break in the {alpha}-helical region has been found, either at residue 40 (Bisaglia et al. 2005) or between residues 42 and 44 (Bussell and Eliezer 2003; Chandra et al. 2003). Some backbone flexibility has also been reported for residues 1–8, 34, and 80–85 (Bussell et al. 2005). A site-directed spin-labeling study of {alpha}-syn in small unilamellar vesicles supported the existence of a single, long {alpha}-helix without any breaks (Jao et al. 2004). A titration of SDS to {alpha}-syn found that residues 3–37 and 45–92 became {alpha}-helical (Ulmer et al. 2005). In all studies, the C-terminal end of {alpha}-syn (~residues 100–140) remained unfolded.

Here, we show that, by comparison with experimental data, the {alpha}-HA plot of {alpha}-syn can map the {alpha}-helical regions of {alpha}-syn very well (Fig. 3). In Figure 3B, the location of the {alpha}-HA peaks is annotated along the amino acid sequence and compared with similar, annotated representations of one sequence-based prediction and eight experimentally observed locations of {alpha}-helical structure (as described above). Furthermore, the position of a break in the {alpha}-helix appears to correlate well with the position of a distinctive dip in the {alpha}-HA peak. We hypothesize that this may be indicative, not necessarily of the precise location of a permanent break, but of a region with a higher propensity for a break, such that, under certain conditions or constraints, the {alpha}-helix may remain unbroken, break, or fluctuate between the two states. This hypothesized variability is supported by the fact that different research groups have located the break to slightly different positions.

SALSA {alpha}-helical amphipathicity
{alpha}-Synuclein vs. beta-synuclein
The residue-by-residue conformational data set for beta-syn in the presence of SDS micelles has recently been published (in the form of NMR C{alpha} secondary shifts) (Sung and Eliezer 2006). An {alpha}-helical conformation was found covering residues ~1–85. This correlates well with the {alpha}-HA plot of beta-syn. The {alpha}-helical structure was reported to be interrupted at around residues 43 and 44, which correlates with the location of a dip in the {alpha}-HA peak of beta-syn and coincides with the position of the dip in the {alpha}-HA peak of {alpha}-syn.

The good correlation of calculated {alpha}-HA with observed {alpha}-helical structure demonstrates that the location of the membrane-associated {alpha}-helical region can be deduced from the primary structure alone. Application of the algorithm allows the direct comparison of different sequences, as has been demonstrated by comparison of {alpha}-syn with beta-syn. The algorithm uses a calculation of hydrophobic moment similar to that used by another algorithm (called "MOMENT"; http://nihserver.mbi.ucla.edu/moment/), but produces an output that compares more favorably with experimentally observed data. This is likely to be due to the method by which SALSA samples and collates scores from a larger number of peptide windows.

SALSA {alpha}-helical amphipathicity algorithm
SALSA {alpha}-helical amphipathicity ({alpha}-HA) uses the same mechanism as SALSA beta-SC for sampling and collating data into a plot, but instead of using beta-strand propensity scores, the algorithm measures the hydrophobic moment of a protein's sequence. A number of structural studies have been performed on {alpha}-syn in the context of phospholipid bilayers or surfactant micelles (Eliezer et al. 2001; Bussell and Eliezer 2003; Chandra et al. 2003; Jao et al. 2004; Bisaglia et al. 2005; Bussell et al. 2005; Ulmer et al. 2005) and these have been directly compared with SALSA {alpha}-HA plots. As with the comparisons of beta-SC plots with experimentally observed beta-sheet fibrils, a high level of correlation is seen between {alpha}-HA plots and experimental data of {alpha}-syn in its {alpha}-helical form. This provides us with a powerful bioinformatic tool with which to further probe the conformational character and function of {alpha}-syn.

Conclusion
The SALSA algorithm displays a highly favorable comparison between latent beta-strand propensity in the form of beta-SC peaks and the observed location of beta-sheet structure in fibrillar forms of {alpha}-syn, Abeta, and tau. Furthermore, the algorithm has confirmed there to be a positive correlation between a derivative of relative beta-strand propensities, relative fibrillogenicities, and pathogenic propensities. Thus, we conclude that SALSA provides new and improved insights into the sequence correlates of fibrillogenic propensities for the three proteins that are pertinent for the most common neurodegenerative conditions in humans. The data support the concept that delaying beta-sheet formation by these three proteins may also delay the pathogenic consequences of fibril assembly and therefore represent part of a therapeutically beneficial strategy. The algorithm will allow us to test this hypothesis further, both in vitro and in neurodegenerative disease models.

SALSA can map latent propensities for {alpha}-helical structure as well. This is presented in the form of {alpha}-HA plots and confirms the role of hydrophobicity in the formation of membrane-associated {alpha}-helical structure in synucleins. This algorithm will also allow us to test the role of this property in {alpha}-synucleinopathy models.


    Materials and Methods
Algorithm development
Chou and Fasman preference numbers
Chou and Fasman (1974b) secondary structural preference numbers have simply been used here as a scale. The SALSA algorithm does not apply any of the rules for secondary structure prediction as set out by Chou and Fasman, amongst others (Chou and Fasman 1974b, 1978; Williams et al. 1987). Therefore, it represents an alternative application of the numbers rather than an improvement of the original method.

Calculating SALSA beta-strand contiguity
For any selected amino acid property, the algorithm calculates the mean score of that property for all the peptide windows within a polypeptide's sequence. This can include a range of peptide window sizes which will be overlapping in the sequence. In the case of beta-strand contiguity (beta-SC), a "mean beta-strand propensity" (MbetaP) score is first calculated for each peptide window, according to Equation 1:



Formula 1

where Pbeta, P{alpha}, and Pt are the Chou and Fasman beta-strand, {alpha}-helix, and reverse turn preference numbers, respectively (Chou and Fasman 1974a). The sums of the Pbeta, P{alpha}, or Pt preference numbers for every residue in a peptide window are abbreviated to {sum} Pbeta, {sum} P{alpha}, or {sum} Pt, respectively.

In the case of beta-SC, SALSA slides a four-residue window along the entire sequence, one residue at a time, calculating the MbetaP score for each. Thus, in the case of human {alpha}-synuclein which has 140 residues, there are a total of 137 four-residue windows, each with its own MbetaP score. SALSA then repeats this calculation using a five-residue window, then a six-, seven-, eight-residue and so on up to and including a 20-residue window.

Selecting a range of peptide window sizes for beta-strand contiguity plots
The peptide window size range for beta-SC plots was selected in the context of fibril assembly of unfolded proteins. It has been found that fibrils can be formed by peptides as short as four residues in length (Reches et al. 2002; Tjernberg et al. 2002). Therefore, peptide window sizes were selected to range from 4 to 20 residues. This was also based on the hypothesis that the propensity for any 4-mer, 5-mer, 6-mer, 7-mer, etc., to form beta-strands will be directly enhanced by the propensity of neighboring residues, potentially irrespective of whether they form part of the final beta-sheet. The selection of a size range of 4–20 residues is further supported by the observation that, while beta-strands in globular proteins are on average 5.3 residues in length, the majority tends to have a range of 2–10 residues and can be as long as 20 residues. This was based on an analysis of 320 nonhomologous protein chains (R. Laskowski, pers. comm.).

Deriving a minimum threshold score for beta-strand contiguity plots
Thus, SALSA beta-SC samples every possible window for 17 different window sizes (4–20). In the case of {alpha}-syn, this produces a data set of 2193 windows, each with its own MbetaP score. A beta-SC plot of {alpha}-syn that is constructed using the scores from all of these windows is relatively featureless (see Supplemental Fig. 1A). Therefore, in order to produce meaningful plots, the data set needs to be filtered.
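The window counts quoted here follow from the sliding-window scheme itself: a sequence of length L contains L - w + 1 windows of size w. The short Python sketch below reproduces that arithmetic; the function name `count_windows` is ours for illustration and is not part of the SALSA program.

```python
def count_windows(seq_len, min_size=4, max_size=20):
    """Count every contiguous window of min_size..max_size residues.

    Window sizes longer than the sequence contribute nothing.
    """
    return sum(
        seq_len - size + 1
        for size in range(min_size, max_size + 1)
        if size <= seq_len
    )


# Human alpha-synuclein has 140 residues.
four_residue_windows = count_windows(140, 4, 4)  # 137
all_windows = count_windows(140)                 # 2193
```

The same count applied to a 10-residue peptide gives 28 windows, since only window sizes 4 to 10 fit within it.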

One way of doing this is to restrict the data set to a limited number of windows, i.e., those with the highest MbetaP scores. It was found that plots, similar to those presented in Figure 1, could be constructed using only 400 windows. The windows selected were those with the highest MbetaP scores (and therefore the process of filtering the data set simply involves discarding the remaining 1793 windows). Another method to filter the data set is to discard windows that have MbetaP scores lower than some minimum threshold score. This approach was preferred to the former, as it also allows for the direct comparison of different protein sequences (such as those presented in Fig. 2).
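The two filtering strategies can be contrasted with a minimal Python sketch. The tuple layout `(start, size, score)`, the function names, and the toy scores are assumptions made for illustration only; they are not taken from the SALSA implementation.

```python
def keep_top_n(windows, n):
    """Keep the n highest-scoring windows (ties keep their input order)."""
    return sorted(windows, key=lambda w: w[2], reverse=True)[:n]


def keep_above_threshold(windows, threshold):
    """Discard every window whose score is lower than the threshold."""
    return [w for w in windows if w[2] >= threshold]


# Toy data: (start index, window size, made-up score).
toy_windows = [(0, 4, 1.5), (1, 4, 0.9), (2, 4, 1.3), (3, 4, 1.1)]

top_two = keep_top_n(toy_windows, 2)
above_cutoff = keep_above_threshold(toy_windows, 1.2)
```

With these toy scores both filters retain the same two windows. One way to read the preference for a threshold is that a fixed cutoff does not depend on how many windows a sequence yields, whereas a fixed window count does, so longer and shorter sequences would be filtered unequally.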

The requirement for filtering the data set and the methods used to perform this have provided the algorithm with a crucial advantage, namely that the plots can, to a certain extent, be calibrated according to empirical observations. For beta-SC plots, we used a comparison of plots produced by the sequences of human {alpha}- and beta-syn to arrive at a minimum threshold score of 1.2. This was the lowest threshold score capable of producing a good overlap between peaks of the two sequences over residues 30–60, and this threshold has been used to produce all beta-SC plots presented in this paper.

Constructing SALSA beta-strand contiguity plots
In all cases, SALSA plots the property in question (such as beta-SC) on the y-axis, against the amino acid sequence, which is along the x-axis. For each residue along the sequence, the value for the y-coordinate is produced by adding together the MbetaP scores from every window that contains that particular residue. As described in the section above, this only follows a filtering step and, as a result, a residue's score may be as low as zero if all possible windows it forms a part of have MbetaP scores that are lower than the minimum threshold score. This is explicitly illustrated by the worked example shown in Figure 5. It follows therefore that residues which are located amidst several beta-strand-favorable residues will accumulate a high score, added together from numerous and often overlapping windows, while those located amidst beta-strand-unfavorable residues will tend to produce a low sum of scores, even if an individual residue has a high propensity for beta-strand.
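The summation described in this section can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the SALSA source: the window score is supplied by the caller as `score_window` (Equation 1 is not reproduced here), and `toy_score` is a deliberately artificial stand-in.

```python
def contiguity_profile(sequence, score_window, threshold,
                       min_size=4, max_size=20):
    """Sum, for each residue, the scores of all retained windows containing it.

    score_window: caller-supplied function mapping a peptide string to a score.
    Windows scoring lower than `threshold` are discarded before summation.
    """
    profile = [0.0] * len(sequence)
    largest = min(max_size, len(sequence))
    for size in range(min_size, largest + 1):
        for start in range(len(sequence) - size + 1):
            score = score_window(sequence[start:start + size])
            if score < threshold:
                continue  # filtered out: adds nothing to any residue
            # The window's score is credited to every residue it covers.
            for i in range(start, start + size):
                profile[i] += score
    return profile


def toy_score(peptide):
    """Artificial stand-in score: 2.0 for an all-valine window, else 0.0."""
    return 2.0 if set(peptide) == {"V"} else 0.0


profile = contiguity_profile("AVVVVA", toy_score, threshold=1.2)
```

In this toy case only the window VVVV passes the filter, so the four valines each receive its score and the flanking alanines remain at zero, which mirrors the point that a residue's value depends on its neighbors rather than on its own propensity alone.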


Figure 5. A worked example of SALSA beta-strand contiguity for a 10-residue peptide window, MDVFMKGLSK. Chou and Fasman (1974b) secondary structure preference numbers are listed below the sequence. A few of the windows, in order of those with the highest MbetaP scores, are shown (A). The MbetaP scores from those windows that have scored higher than the minimum threshold score of 1.2 are summed by every residue in those windows (B). These summed values are the y-coordinates which are plotted against the amino acid sequence (on the x-axis) in order to produce a SALSA beta-SC plot.

 
A worked example of SALSA beta-strand contiguity and its plotting tool
In Figure 5, SALSA and its plotting tool are illustrated with a calculation of the beta-SC for a 10-residue peptide sequence. For this peptide, there are only 28 possible windows within the range of 4–20 residues and these are placed in order of their MbetaP scores (see also screenshot in Supplemental Fig. 1B). In Figure 5A, the calculation of MbetaP for the highest scoring window (VFMK) is shown in full. The Chou and Fasman (1974b) secondary structure preference numbers (from which the MbetaPs are calculated) are shown below the sequence. The MbetaPs for the other seven windows are also listed.

In Figure 5B, the calculation for the plotting tool is illustrated. Only the top five windows have MbetaP scores greater than the threshold of 1.2 and therefore only these window scores are included in the subsequent calculation and the plot. The value for the y-axis in the beta-SC plots comes from the sum of the MbetaPs for each residue that is present in the top five windows. Each window's MbetaP score is allocated to every residue in that window equally—not to the middle residue alone.

Calculating SALSA {alpha}-helical amphipathicity
SALSA {alpha}-helical amphipathicity is calculated as the net hydrophobicity of a peptide window as though in an {alpha}-helical conformation, which is illustrated by a Schiffer and Edmundson {alpha}-helical wheel projection (Schiffer and Edmundson 1967). The net hydrophobicity is calculated by the difference of their vectors (which is similar to the measurement of the hydrophobic moment as defined by Eisenberg and colleagues) and then divided by the number of residues in that window (see Equation 2; Eisenberg et al. 1982, 1984). We refer to this as the "mean {alpha}-helical amphipathicity" (M{alpha}-HA):



Formula 2

where h is the Kyte and Doolittle hydropathy score for a residue, n is the position of the residue in the fragment (n = 0 for the first residue), {theta} is the angle subtended by each progressive residue, as though projected onto an {alpha}-helical wheel (this is ~100°), and N is the number of residues in the peptide window. For the plots presented here, we have used Kyte and Doolittle hydrophobicity numbers (Kyte and Doolittle 1982).

In Figure 6, a worked example is presented using a Schiffer and Edmundson {alpha}-helical wheel projection to illustrate how M{alpha}-HA is calculated for one 11-amino acid peptide window (MDVFMKGLSKA). The first residue in the wheel is positioned at 0° and the angle subtended is 98°. Residues at 90°–270° from the first residue should therefore contribute with the opposite sign to those at 270°–90°. The cosine function accounts for this.
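A minimal Python sketch of this calculation is given below. It assumes the cosine-projected sum suggested by the symbol definitions (h, n, {theta}, N) and by the remark that the cosine function handles the sign; whether the published Equation 2 applies any further step, such as an absolute value, cannot be recovered from the text here, so this should be read as an approximation. The function and dictionary names are ours. The hydropathy values are those of Kyte and Doolittle (1982).

```python
import math

# Kyte and Doolittle (1982) hydropathy values, one-letter residue codes.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_helical_amphipathicity(window, theta_deg=98.0):
    """Cosine-projected hydropathy of a window, divided by its length.

    Residue n (n = 0 for the first residue) sits at angle n * theta_deg
    on the helical wheel. This form is an assumption, not a quotation of
    the published equation.
    """
    total = sum(
        KYTE_DOOLITTLE[residue] * math.cos(math.radians(n * theta_deg))
        for n, residue in enumerate(window)
    )
    return total / len(window)


# The 11-residue window used in the worked example of Figure 6.
example_score = mean_helical_amphipathicity("MDVFMKGLSKA")
```

Because only the cosine component is used in this sketch, the result depends on the orientation of the first residue of each window, which may be one reason scores are collected from many overlapping windows.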


Figure 6. A Schiffer and Edmundson {alpha}-helical wheel projection illustrates the calculation of the mean {alpha}-helical amphipathicity for MDVFMKGLSKA. (The wheel projection uses an angle of 98° according to NMR [Bussell et al. 2005] and SDSL data [Jao et al. 2004].)

 
Constructing SALSA {alpha}-helical amphipathicity plots
The plotting tool for {alpha}-helical amphipathicity uses the same strategy as that used to produce beta-SC plots (which was illustrated in Fig. 5). Thus, a sliding peptide window calculates a mean score for all possible windows within a selected range of window sizes. In this case, the mean score is the mean {alpha}-helical amphipathicity (M{alpha}-HA), rather than mean beta-strand propensity (MbetaP). Once again, the plotting tool filters the large data set by discarding windows with scores below a minimum threshold. The y-coordinate for each residue comes from the sum of M{alpha}-HA scores from every window in which that residue is found. Each window's M{alpha}-HA score is allocated to every residue in that window equally—not to the middle residue only.

Selecting a range of peptide window sizes for {alpha}-helical plots
In globular proteins, the average length of {alpha}-helices is about 14 residues, ranging between 9 and 37 residues (Kumar and Bansal 1998). The {alpha}-helical regions in synucleins are encoded by a varying number of repeats which are in multiples of 11 amino acids (with five repeats in beta-syn and seven in {alpha}-syn) (Jao et al. 2004). This has been experimentally observed to constitute three turns of an {alpha}-helix in a structure described as an 11/3 {alpha}-helix (Jao et al. 2004; Bussell et al. 2005). Therefore, a range of peptide window sizes was selected to start with 11 residues and to include all windows up to and including 33 residues. This may be particularly relevant to synucleins, because the most prominent feature of their {alpha}-helices is that they are continuously amphipathic over a large number of residues (compared to {alpha}-helices found in globular proteins) and, despite the fact that the 11-mers are not always contiguous, the {alpha}-helix does not appear to be disrupted by this. Thus, "{alpha}-helical contiguity" might also be a relevant description.
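The 11/3 periodicity also accounts for the rotation per residue used in the helical wheel of Figure 6. As a short check (our arithmetic, not a formula quoted from the cited studies), three full turns spread over 11 residues give:

```latex
\theta = \frac{3 \times 360^\circ}{11} \approx 98.2^\circ
```

This agrees with the 98° angle used for the wheel projection and is slightly smaller than the ~100° per residue of a canonical {alpha}-helix with 3.6 residues per turn.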

Deriving a minimum threshold score for {alpha}-helical amphipathicity plots
The minimum threshold score for {alpha}-helical amphipathicity was determined empirically. This was based on the lowest minimum threshold score that could produce the best correlation between the integrals of the {alpha}-helical amphipathicity plots with an observed conformational change from "random coil" in phosphate buffer to {alpha}-helical in the presence of 10 mM SDS (data not shown). The proteins used for this were human {alpha}-syn, human beta-syn, and four other natively unfolded proteins. These included a late embryogenesis abundant (LEA) protein from Aphelenchus avenae (called "AavLEA1"), an LEA protein from wheat (called "Em"), human tau-441, and colicin N T-domain. (Purified recombinant AavLEA1 and Em were kindly provided by Dr. A. Tunnacliffe, University of Cambridge, United Kingdom. Purified recombinant colicin N T-domain was kindly provided by Dr. J. Lakey, University of Newcastle upon Tyne, United Kingdom.)

The best correlation was produced using a minimum threshold score of 0.8. SALSA is implemented in Java (Sun Microsystems) with a modern graphical user interface (see screenshots in Supplemental Fig. 1). To obtain the algorithm, please contact Louise Serpell (l.c.serpell{at}sussex.ac.uk).


    Footnotes
 
3 Present addresses: MRC Laboratory of Molecular Biology, Cambridge CB2 2QH, United Kingdom;

4 School of Life Sciences, University of Sussex, Falmer, Brighton BN1 9QG, United Kingdom.

Reprint requests to: Shahin Zibaee, MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, United Kingdom; e-mail: shahin{at}mrc-lmb.cam.ac.uk; fax: +44 1223 402 310; or Louise C. Serpell, Department of Biochemistry, School of Life Sciences, University of Sussex, Falmer, Brighton BN1 9QG, United Kingdom; e-mail: l.c.serpell{at}sussex.ac.uk; fax: +44 1273 678 433.

Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.062624507.

Supplemental material: see www.proteinscience.org


    Acknowledgments
We thank Dr. Michael Wise and Dr. Cyrus Chothia for helpful comments. S.Z. is an Alzheimer's Research Trust Fellow, O.S.M. is supported by the UK Biotechnology and Biological Sciences Research Council, and L.C.S. by the Wellcome Trust. Part of this work was funded by the UK Medical Research Council.


    References
Abraha, A., Ghoshal, N., Gamblin, T.C., Cryns, V., Berry, R.W., Kuret, J., and Binder, L.I. 2000. C-terminal inhibition of {tau} assembly in vitro and in Alzheimer's disease. J. Cell Sci. 113: 3737–3745.

Alonzo, N.C., Hyman, B.T., Rebeck, G.W., and Greenberg, S.M. 1998. Progression of cerebral amyloid angiopathy: Accumulation of amyloid-beta40 in affected vessels. J. Neuropathol. Exp. Neurol. 57: 353–359.

Alzheimer, A. 1907. Über eine eigenartige Erkrankung der Hirnrinde. Allg. Z. Psychiatr. 64: 146–148.

Baba, M., Nakajo, S., Tu, P.H., Tomita, T., Nakaya, K., Lee, V.M., Trojanowski, J.Q., and Iwatsubo, T. 1998. Aggregation of {alpha}-synuclein in Lewy bodies of sporadic Parkinson's disease and dementia with Lewy bodies. Am. J. Pathol. 152: 879–884.

Balbach, J.J., Petkova, A.T., Oyler, N.A., Antzutkin, O.N., Gordon, D.J., Meredith, S.C., and Tycko, R. 2002. Supramolecular structure in full-length Alzheimer's beta-amyloid fibrils: Evidence for a parallel beta-sheet organization