A simple algorithm locates β-strands in the amyloid fibril core of α-synuclein, Aβ, and tau using the amino acid sequence alone
1 Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, United Kingdom
2 MRC Laboratory of Molecular Biology, Cambridge, Cambridge CB2 2QH, United Kingdom
(RECEIVED October 24, 2006; FINAL REVISION February 8, 2007; ACCEPTED February 18, 2007)
| Abstract |
|---|
|
|
|---|
α-synucleinopathies such as Parkinson's disease, dementia with Lewy bodies, and multiple system atrophy. Familial forms of α-synucleinopathies have also been linked with missense mutations or gene multiplications that result in higher protein expression levels. In order to form these fibrils, the protein, α-synuclein (α-syn), must undergo a process of self-assembly in which its native state is converted from a disordered conformer into a β-sheet-dominated form. Here, we have developed a novel polypeptide property calculator to locate and quantify relative propensities for β-strand structure in the sequence of α-syn. The output of the algorithm, in the form of a simple x-y plot, was found to correlate very well with the location of the β-sheet core in α-syn fibrils. In particular, the plot features three peaks, the largest of which is completely absent for the nonfibrillogenic protein, β-syn. We also report similar significant correlations for the Alzheimer's disease-related proteins, Aβ and tau. A substantial region of α-syn is also capable of converting from its disordered conformation into a long amphipathic α-helical protein. We have further developed the aforementioned algorithm to locate and quantify the α-helical hydrophobic moment in the amino acid sequence of α-syn. As before, the output of the algorithm, in the form of a simple x-y plot, was found to correlate very well with the location of α-helical structure in membrane bilayer-associated α-syn.
Keywords: α-synuclein; β-strand propensity; Alzheimer's disease; Parkinson's disease; algorithm; amyloid fibril
| Introduction |
|---|
|
|
|---|
α-synuclein (α-syn) (Baba et al. 1998; Spillantini et al. 1998a). The fibrillar component of AD senile plaques is composed of Aβ peptides (Glenner and Wong 1984), while that of neurofibrillary tangles is composed of tau (Brion et al. 1985; Goedert et al. 1988).
Proteins in general are able to undergo intermolecular associations and aggregation, subject to a destabilization of the natively globular fold (Colon and Kelly 1991; Booth et al. 1997; Perutz 1997). The main effect of many missense mutations identified in cases of familial amyloidoses is likely to be a reduction in stability of tertiary or quaternary structure (Siepen and Westhead 2002). The wild-type protein in a destabilizing environment can also aggregate (Colon and Kelly 1991; Booth et al. 1997; Zurdo et al. 2001). However, in the case of the natively unfolded proteins (or chemically unfolded proteins) (Guijarro et al. 1998; McParland et al. 2000), destabilization of the native state is no longer the rate-limiting step, so that self-assembly of the protein in vitro may be estimated from physicochemical properties of its amino acid side chains (Lopez De La Paz et al. 2002; Tjernberg et al. 2002; Chiti et al. 2003). Recombinant DNA techniques and in vitro aggregation assays can be employed to analyze the relationship of the sequence properties to aggregation. Fibril assembly of α-syn, Aβ, and tau proteins involves a conformational change from "random coil" to β-sheet structure (Glenner et al. 1974; Watson et al. 1998; Serpell et al. 2000; von Bergen et al. 2001).
Here, we have designed a novel algorithm to measure a derivative of β-strand propensity, which we refer to as "β-strand contiguity" (β-SC), by a simple treatment of Chou and Fasman's secondary structure preference numbers (Chou and Fasman 1974a). This was expected to locate "fibrillogenic hotspots" within a protein's sequence. Specifically, peaks in the β-SC plots were found to correlate well with the location of amyloid fibril cores in α-syn, Aβ, and tau (Wischik et al. 1988; Petkova et al. 2002; Heise et al. 2005). Although the calculations were not designed to predict the absolute rate of fibril assembly, they are very likely to have a significant bearing on the thermodynamics of fibril assembly. Furthermore, we suggest that the peaks of β-SC plots may represent good pharmacological targets for inhibitors of fibril assembly for α-syn, Aβ, and tau, with possible therapeutic consequences for neurodegeneration.
In the presence of phospholipids or surfactants, α-syn has also been found to undergo a significant change from "random coil" to a long amphipathic α-helical form, lying parallel to the surface of the phospholipid membrane bilayer or surfactant micelle (Eliezer et al. 2001; Jao et al. 2004). Our program has also been adapted to measure α-helical amphipathicity (α-HA) by applying a calculation of the hydrophobic moment of a projected α-helix, similar to that used previously by Eisenberg et al. (1982). The α-HA plots correlate very well with experimental data for α-syn and β-syn.
| Results |
|---|
|
|
|---|
β-strand propensity score of a peptide window using Chou and Fasman β-strand preference numbers (Chou and Fasman 1974a). The algorithm samples the full length of the protein's sequence by sliding a window along it, one residue at a time. The x-y plots compile these scores, which are plotted on the y-axis against the amino acid sequence on the x-axis. We refer to this novel measure of β-strand propensity as "β-strand contiguity" (β-SC).
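The sampling-and-compilation step can be illustrated with a minimal Python sketch. The function name `collate_window_scores`, the pluggable `score_window` callable, and in particular the rule that each residue accumulates the scores of every window covering it are assumptions made for illustration only; the text above states only that window scores are compiled into a plot along the sequence.

```python
from typing import Callable, Iterable, List


def collate_window_scores(sequence: str,
                          score_window: Callable[[str], float],
                          sizes: Iterable[int]) -> List[float]:
    """Slide windows of each size along the sequence, one residue at a
    time, and add each window's score to every residue it covers.

    The accumulation rule is an assumed collation scheme, not
    necessarily the one used by the published program.
    """
    profile = [0.0] * len(sequence)
    for size in sizes:
        # A sequence of length n has n - size + 1 windows of this size.
        for start in range(len(sequence) - size + 1):
            score = score_window(sequence[start:start + size])
            for i in range(start, start + size):
                profile[i] += score
    return profile


if __name__ == "__main__":
    # Placeholder scorer: every window scores 1.0, so the profile simply
    # counts how many windows cover each residue.
    print(collate_window_scores("AAAAA", lambda w: 1.0, sizes=(4, 5)))
```

Plotting the returned list against residue number would give an x-y plot of the kind described, under the stated assumption about how scores are compiled.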
SALSA β-strand contiguity
α-Synuclein
The β-SC plot for α-syn has three main peaks, spanning residues 32–89 (see Fig. 1A), with the dominant peak (III) covering residues 58–89.
Aβ(1–40)
The β-SC plot for Aβ(1–40) has two relatively small peaks (as compared with α-syn's prominent peak), which span residues 8–40 and are separated by a significant dip over residues 23–27 (Fig. 1B). The two peaks are comparable in size to one another.
Tau
In Figure 1C, the β-SC plot for the sequence of the longest isoform of tau with 441 amino acids (abbreviated to "tau-441") is shown. The four largest peaks are labeled I-V. The most prominent peak (III) is comparable in size to α-syn's most prominent peak (III).
SALSA β-strand contiguity of homologous proteins
One of the most promising utilities of the SALSA β-SC algorithm is for making direct comparisons of different protein sequences. This is illustrated in the sections below, in which the SALSA β-SC plot of α-syn is directly compared with that of β-syn. This is followed by a comparison of Aβ(1–42) with Aβ(1–40), and finally four-repeat tau is compared with three-repeat tau.
α-Synuclein vs. β-synuclein
The β-SC plot of β-syn is similar to that of α-syn up to and including peaks I and II (Fig. 2A). However, peak III is entirely absent from β-syn.
Aβ(1–40) vs. Aβ(1–42)
The β-SC plots for Aβ(1–40) and Aβ(1–42) have two small peaks spanning residues 9–40 and 9–42, respectively. Both peaks are approximately equivalent in size for Aβ(1–40). However, in the case of Aβ(1–42), the carboxy-terminal peak (II) is larger (Fig. 2B).
Four-repeat tau vs. three-repeat tau
For all six isoforms of tau that are expressed in the human central nervous system, the β-SC plots are nearly identical. The only difference is in the size of the prominent peak (III), which is larger in isoforms with four repeats. The absence of the extra repeat is responsible for the reduced size of the prominent peak in three-repeat tau (3R-tau) (Fig. 2C). Peak III spans residues 296–326 in four-repeat tau (4R-tau) (tau-441) or 272–296 for 3R-tau (tau-410), covering most of the fourth or third repeat of the microtubule binding region, respectively.
SALSA α-helical amphipathicity
α-Synuclein
The α-HA plot of α-syn produces one long continuous peak, covering ~95 residues from the N-terminal end (Fig. 3A). Although the peak is uninterrupted, there is a distinctive dip at around residue 40. In Figure 3B, the location of the peak is indicated along a linear representation of the protein, immediately above the actual amino acid sequence. Below the sequence, we have highlighted one predicted and eight experimentally observed locations of α-helical structure.
α-Synuclein vs. β-synuclein
The α-HA plot of β-syn is shown in Figure 4, overlapping that of α-syn. The peaks in the plots over residues 1–50 are very similar between the two synucleins (as expected from their high sequence homology). Both plots display a similar dip at around residue 40. While α-syn's α-HA peak terminates at around residue 95, β-syn's peak terminates upstream of this at around residue 82.
| Discussion |
|---|
|
|
|---|
β-strand contiguity algorithm
α-syn, Aβ, and tau with their fibrillar forms, we have compared experimental data from the literature with the outputs of our algorithm. In the first of two parts, we show there to be a good correlation between experimental data on the location of the β-sheet core in preformed fibrils and peaks in the algorithm's graphical output ("β-SC plots").
In the second part, we make direct comparisons of the β-SC plot for each protein with that of a similar protein and discuss how the differences between these plots relate to observed fibrillogenicities and pathogenicities.
SALSA β-strand contiguity compared with structural data of β-sheet fibrils
SALSA β-strand contiguity (β-SC) samples every region of a protein's sequence for local β-strand propensity. Without incorporating computations of any other physicochemical property, the algorithm and its plotting tool show a surprisingly good correlation with published structures of amyloid and amyloid-like fibrils of α-syn, Aβ, and tau.
α-Synuclein
The location of SALSA β-SC peaks in α-syn (Fig. 1A) correlates well with the location of regions identified as forming β-sheet structure in fibrils, as deduced from both direct and indirect structural analyses.
Solid-state NMR experiments on amyloid-like fibrils formed from α-syn have shown that the β-sheet core extends from residue 22 to ~105 (Heise et al. 2005). Indirect methods included hydrogen–deuterium exchange, which found the region ~39–101 to be most protected in α-syn fibrils (Del Mar et al. 2005). Similarly, electron paramagnetic resonance of site-directed spin-labeled α-syn showed that, upon fibril assembly, intermolecular distances were reduced over the central residues from ~34 to ~101. Thus, these ~60–70 residues are implicated in the fibril core (Der-Sarkissian et al. 2003). Finally, proteinase K digestion experiments showed residues 31–109 to be resistant, implicating these 79 amino acids in the core of α-syn fibrils (Miake et al. 2002).
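The agreement between the β-SC peaks and these experimentally defined cores can be checked with simple interval arithmetic. The sketch below is illustrative only: the helper name `overlap` is hypothetical, the peak span (residues 32–89) and the core boundaries are the values quoted in this paper, and the approximate boundaries are treated as exact for the sake of the arithmetic.

```python
from typing import Tuple

Range = Tuple[int, int]  # inclusive (first residue, last residue)


def overlap(a: Range, b: Range) -> int:
    """Number of residues shared by two inclusive residue ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Span of the three beta-SC peaks of alpha-syn, as quoted in the Results.
PEAKS: Range = (32, 89)

# Fibril-core boundaries quoted in the paragraph above.
CORES = {
    "solid-state NMR (Heise et al. 2005)": (22, 105),
    "H/D exchange (Del Mar et al. 2005)": (39, 101),
    "EPR spin labeling (Der-Sarkissian et al. 2003)": (34, 101),
    "proteinase K (Miake et al. 2002)": (31, 109),
}

if __name__ == "__main__":
    peak_length = PEAKS[1] - PEAKS[0] + 1
    for label, core in CORES.items():
        shared = overlap(PEAKS, core)
        print(f"{label}: {shared} of {peak_length} peak residues in core")
```

With these numbers, all 58 residues of the peak region fall within the solid-state NMR and proteinase K cores, and 51 and 56 of them fall within the hydrogen–deuterium exchange and spin-labeling cores, respectively.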
While peak III of the β-SC plot ends at residue 89, with another very small peak visible at residues 90–97, the experimental data find that the β-sheet core can extend to between ~101 and 109. Although the significance of these discrepancies is presently unclear, we hypothesize that the β-SC plot may represent a statistical average of a heterogeneous population of α-syn fibrils. The existence of structural heterogeneity in α-syn fibril populations has been demonstrated (Heise et al. 2005). It is further noted that in vitro aggregation assays using the 18-residue peptide, α-syn(61–78), have found it to be highly amyloidogenic, while the peptide, α-syn(79–95), was not (El-Agnaf et al. 1998a).
Another reason that the β-sheet core may extend beyond that predicted by the β-SC algorithm may be further stabilizing π–π interactions that could occur through the phenylalanine residue at position 94.
SALSA β-strand contiguity compared with structural data of β-sheet fibrils
Aβ(1–40)
The location of peaks in the β-SC plot of Aβ(1–40) correlates well with regions that have been shown to occupy the β-sheet core in amyloid fibrils of Aβ(1–40), as compared with experimental data from direct and indirect structural analyses.
Solid-state NMR has been used to produce high-resolution structural data for fibrils of Aβ(1–40) and has shown that residues 1–9 remain disordered, while residues 12–39 are predominantly β-sheet (Balbach et al. 2002). In another study, energy minimization modeling, constrained by experimental data (from NMR, EM, and fiber diffraction), concluded that the N-terminal region, approximately covering the first 10 amino acids, is disordered, while residues 12–24 and 30–40 adopt a β-sheet conformation and residues 25–29 contain a bend (Petkova et al. 2002). The β-SC plot only predicts the location of β-strands and is not designed to predict bends or other secondary structural motifs. However, it is noted that the location of this bend correlates well with the location of the dip between the two β-SC peaks. Other techniques, such as solution NMR, site-directed spin labeling, and limited proteolysis, have also been used to obtain structural information on Aβ(1–40) fibrils. In hydrogen–deuterium exchange measured by NMR of Aβ(1–40) fibrils, residues 1–15 and 37–40 were not protected and were therefore unlikely to be part of the β-sheet core, remaining unstructured instead (Whittemore et al. 2005). Residues 25 and 26 could also exchange backbone amide protons and were potentially part of a turn. Site-directed spin labeling of Aβ(1–40), as well as Aβ(1–42), showed that upon fibril assembly regions that remained highly mobile, and therefore not likely to form part of the β-sheet core, included the N-terminal region covering the first 10–11 amino acids, residues ~23–29, and C-terminal residues from ~39 (Torok et al. 2002). Experiments in which Aβ(1–40) fibrils had been subjected to limited proteolysis showed N-terminal residues, up to between ~12 and 16, to be fully exposed. The remainder of the protein was protected from the solvent, implicating this region in the β-sheet core of the fibrils (Kheterpal et al. 2001). Some cleavage also occurred at Lys28–Gly29, but it was not considered to be sufficient evidence for increased accessibility of this bond. Another indirect structural assay is scanning proline mutagenesis (Williams et al. 2004). This works on the premise that if fibril assembly of Aβ(1–40) is insensitive to a particular proline mutation, then that residue is not likely to form part of the β-sheet core of the fibril. Regions insensitive to proline mutation included residues 1–14 and 37–40. Residues 22, 23, 29, and 30 were also insensitive to proline mutations, leading to the conclusion that these residues may occupy two turns between three β-strands. Collectively, these findings correlate well with β-SC peaks, except for observations that showed the C-terminal β-strand ending prior to residue 40 (Balbach et al. 2002; Torok et al. 2002; Williams et al. 2004).
There are currently no direct structural data available for Aβ(1–42) fibrils. However, quenched hydrogen–deuterium exchange NMR has recently been performed and has indicated that residues 18–26 and 31–42 may form the β-sheet core (Lührs et al. 2005). It has been suggested that β-sheets in Aβ(1–42) fibrils are C-terminally shifted from their location in Aβ(1–40) fibrils and also from the peaks of SALSA β-SC plots. This discrepancy may be a consequence of the fact that the β-SC algorithm does not make any predictions of tertiary or intraprotein folding associations. For example, the salt bridge between Asp23 and Lys28 may influence the location of the turn and thus the precise location of the β-strands within the resulting hairpin. However, we also consider that this discrepancy may reflect a degree of heterogeneity in the structure and assembly of Aβ fibrils, such that the precise location of β-strands will vary. This is further supported by results of site-directed spin labeling experiments with Aβ(1–42), which showed different β-strand positions (mentioned above) (Torok et al. 2002). We hypothesize that (as with α-syn) the location of peaks in β-SC plots represents a statistical average of heterogeneous fibril populations, rather than a rigid structural model. NMR data for the existence of structural heterogeneity or polymorphism in amyloid fibrils have previously been reported for Aβ(1–40) (Petkova et al. 2005).
SALSA β-strand contiguity compared with structural data of β-sheet fibrils
Tau
The minimal protease-resistant region of AD paired helical filaments was reported to be an ~90–100 amino acid fragment, at residues ~255 to ~350, which largely overlaps with the microtubule-binding repeat (MTBR) region (Wischik et al. 1988; Crowther et al. 1989; Jakes et al. 1991; Novak et al. 1993). The prominent peak in the β-SC plots for 3R-tau and 4R-tau, which is about 30 residues long, coincides with the center of this fragment. Electron paramagnetic spin resonance studies of tau assembly showed that, when fibrils had assembled, residues 301–320 were at the core of the fibril (Margittai and Langen 2004, 2006). Fibril assembly assays of full-length tau compared with a number of C-terminally truncated fragments demonstrated that the ability to aggregate only began to be impeded when the truncation included the C terminus from residue 321. Fibril assembly was drastically reduced by a truncation up to 314, and abolished altogether by a truncation up to 292 (Abraha et al. 2000). Therefore, while protease sensitivity experiments implicate a large region within the C terminus, spin-labeling experiments of full-length tau and in vitro aggregation assays with C-terminally truncated tau implicate a smaller region, the location of which appears to correlate more precisely with the location of the prominent peak (III) in the β-SC plot.
SALSA β-strand contiguity of similar proteins and relative pathogenicities
The algorithm allows direct comparison of different protein sequences and has been used to compare α-syn with β-syn, Aβ(1–42) with Aβ(1–40), and three-repeat tau with four-repeat tau. In the following sections, we discuss how the relative differences of their β-SC plots correlate with experimental data, as well as with observed pathogenic propensities. The proteins in each pairing are highly homologous to one another and, as a consequence, their β-SC plots display near identical features (i.e., most peaks are in the same places and are the same sizes). However, in each case, we find that, where the sequences do differ, it is specifically the size of the prominent β-SC peak that differs. Furthermore, the relative differences appear to correlate well with the relative fibrillogenicities in vitro and pathogenicities in vivo.
α-Synuclein vs. β-synuclein
The β-SC plots of α-syn and β-syn overlap for the N-terminal regions covering peaks I and II (Fig. 2A). However, peak III is absent from β-syn, which coincides with the difference at residues 71 and 72 and, most notably, the absence of the β-strand-favorable amino acids found in α-syn at residues 73–83. Peak III, which is the largest of the three peaks, locates to the region reported to dominate the fibrillogenic capacity of α-syn (Han et al. 1995; Iwai et al. 1995; El-Agnaf et al. 1998a,b; Bodles et al. 2000, 2001; Giasson et al. 2001; Du et al. 2003). β-Syn is reproducibly nonfibrillogenic (Goedert 2001) and is absent from Lewy bodies (Spillantini et al. 1997). Thus far, no mutations in β-syn have been linked to familial PD. The difference between the two β-SC plots complements the difference between the fibrillogenic properties of α- and β-syn and their relationship to PD.
SALSA β-strand contiguity of similar proteins and relative pathogenicities
Aβ(1–40) vs. Aβ(1–42)
The β-SC plots for Aβ(1–40) and Aβ(1–42) have two small peaks spanning residues 9–40 and 9–42, respectively. Both peaks are approximately equivalent in size for Aβ(1–40). However, in the case of Aβ(1–42), the C-terminal peak (II) is larger (Fig. 2B).
Aβ(1–42) is more fibrillogenic than Aβ(1–40) in vitro (Jarrett et al. 1993; Murakami et al. 2002), which correlates well with the comparison of their β-SC plots. Thus, it appears that the addition of isoleucine and alanine at the carboxyl terminus of Aβ(1–42) enhances the fibrillogenicity of the surrounding residues. In agreement with this, it is noticed that the highest MβP for any peptide window in Aβ(1–42) (within the range of 4–20 residues) is scored by 38VVIA42. The relationship of fibrillogenicity to disease is supported by the observation that Aβ(1–42) is more toxic than Aβ(1–40) in cell culture (Davis-Salinas et al. 1995; Murakami et al. 2002). Also, the increased ratio of Aβ(1–42) to Aβ(1–40) has been associated with an increased risk for AD (Younkin 1995), although an increase in levels of either peptide is a risk factor in itself (Gregory and Halliday 2005). In neuropathological studies, cerebrovascular deposits were found to have a higher ratio of Aβ(1–42) to Aβ(1–40) (Roher et al. 1993), as well as a higher ratio of Aβ(1–40) to Aβ(1–42) (Joachim et al. 1988; Prelli et al. 1988; Suzuki et al. 1994; Alonzo et al. 1998; McCarron et al. 2000; Fryer et al. 2003; Ingelsson et al. 2004).
Therefore, the intrinsic fibrillogenicity of these two peptides correlates with the propensity for fibril assembly in vitro, which may correlate with in vivo aggregation and ultimately with the age-of-onset or severity of disease. As with the synucleins in the previous section, a strong argument can be made for a direct relationship between amino acid sequence, fibrillogenicity, and the progression of a sporadic neurodegenerative condition. We suggest, therefore, that the β-SC algorithm can provide insight into fibril assembly of Aβ in AD.
SALSA β-strand contiguity of similar proteins and relative pathogenicities
Four-repeat tau vs. three-repeat tau
An increased ratio of 4R-tau to 3R-tau has been associated with cases of frontotemporal dementia and Parkinsonism linked to chromosome 17 (FTDP-17) (Clark et al. 1998; Hong et al. 1998; Hutton et al. 1998; Spillantini et al. 1998b; Goedert et al. 1999; Hasegawa et al. 1999; Spillantini et al. 2000; Iseki et al. 2001; Miyamoto et al. 2001; Grover et al. 2002; Yoshida et al. 2002; Connell et al. 2005). About half of the known mutations in tau influence the splicing of tau pre-mRNA, mostly resulting in a higher ratio of 4R- to 3R-tau, but occasionally resulting in a lower ratio (Goedert and Spillantini 2006; van Swieten et al. 2007). A complete understanding of how this ratio influences the onset of a tauopathy is currently lacking. In some tauopathies, tau inclusions are composed mainly of either 4R-tau or 3R-tau (Buée-Scherrer et al. 1996; Bronner et al. 2005). The β-SC plots suggest that in both proteins the fibrillogenic propensities are concentrated in peak III. Comparison of these peaks suggests that 4R-tau may have a higher propensity for fibril assembly than 3R-tau.
SALSA β-SC plots are not relevant for globular proteins
Linding and colleagues have shown that the "β-aggregation" tendency of globular proteins is almost threefold higher than that of natively unfolded proteins (Linding et al. 2004). This was measured using their algorithm (called TANGO), which identifies aggregation-nucleating regions in a protein. It is to be expected that the sequences of globular proteins, which have a higher proportion of hydrophobic residues (and β-strand-favorable residues) than natively unfolded proteins, will score higher. It is also the case that the SALSA β-SC of globular proteins will be significantly higher, as the algorithm will find more peptide windows with higher MβP scores in the sequences of globular proteins than in natively unfolded proteins. Thus, we emphasize that SALSA β-SC is designed to calculate the propensity for fibril assembly of natively unfolded proteins only. SALSA β-SC does not calculate the propensity for nonfibrillar forms of aggregation either.
Differences between SALSA β-SC and other aggregation algorithms
Several aggregation algorithms have been produced by other research groups, often including a graphical format (Fernandez-Escamilla et al. 2004; Linding et al. 2004; Sánchez de Groot et al. 2005; Tartaglia et al. 2005; Thompson et al. 2006). Such outputs highlight regions in protein sequences with higher thermodynamic propensities for aggregation and are often correlated with the aggregation rate in vitro. In at least one of these algorithms, peaks of the graphs indicate where in the sequence a protein's aggregation propensity is most likely to be sensitive to mutations (Pawar et al. 2005). The β-SC plots are unique in that the algorithm employs only a single physicochemical property of amino acids (β-strand propensity). Thus, these plots are not intended to predict the rate of protein aggregation. The experimentally observed rates of fibril assembly by recombinant human α-syn, as well as α-syn, β-syn, γ1-syn, and γ2-syn from Fugu rubripes, have been found to correlate with other physicochemical properties, including electrostatic charge repulsion, hydrophilicity, and secondary structural propensities (Yoshida et al. 2006). Similar correlations have also been observed for a number of other proteins (Chiti et al. 2002, 2003). SALSA measures an intrinsic structural propensity and is therefore insensitive to the environment (i.e., pH, temperature, salt concentrations). Extrinsic factors are taken into account by some of the algorithms that predict aggregation rates and aggregation propensities (Fernandez-Escamilla et al. 2004; Pawar et al. 2005; Tartaglia et al. 2005). Although we do observe some degree of overlap with the graphical outputs of these algorithms, SALSA β-SC is the only algorithm that is specialized to amyloid fibrillar aggregation and not to other forms of protein aggregation, and this may be part of the reason for the difference in the various outputs.
Glycine, glutamine, asparagine, low sequence complexity, and aromatic residues
A potential source of inaccuracy in the correlation of β-SC plots with β-sheet fibrils is the use of Chou and Fasman (1974a) β-strand preference numbers. It is very likely that there will be some differences between the propensity for β-strand structure in native proteins (from which the Chou and Fasman [1974a] preference numbers were derived) and in amyloid fibrils. An example of this is the residue glycine, which has a very low Chou and Fasman (1974a) β-strand preference number but, as demonstrated by spider silk proteins, is readily able to form amyloid-like β-sheet assemblies (Kenney et al. 2002). Similarly, glutamine and asparagine, which are also found to form β-sheet assemblies, are not predicted to do so by SALSA β-SC (Nelson et al. 2005). We hypothesize that our algorithm might be adapted to account for these types of self-assembly by incorporating a calculation for the degree of sequence complexity (Kenney et al. 2002). Another example is the potential for π–π stacking by certain aromatic residue side chains that enhance β-sheet propensity to a greater extent when occurring in the context of amyloid fibril structure than in native structure. Therefore, this might not be adequately quantified by the Chou and Fasman (1974a) numbers alone (Gazit 2002; Makin et al. 2005).
SALSA α-helical amphipathicity compared with structural data of membrane-associated α-synuclein
α-Synuclein
Membrane binding by α-syn coincides with a conformational change to an α-helical-dominated form. Edmundson helical wheels (Schiffer and Edmundson 1967) of canary α-syn show that the α-helical amphipathic distribution of amino acids resides within the ~100 amino-terminal residues (Davidson et al. 1998). By alignment with human α-syn (which has very few differences from canary α-syn), this corresponds to amino acids 1–93. By the method of Segrest et al. (1992), three potential helix breakers were identified within this region, thus predicting the formation of five separate α-helices at residues 1–15, 17–37, 39–48, 50–60, and 61–93.
The structure of the α-helical region has been studied by solution NMR, using the lipid mimetic, sodium dodecyl sulphate (SDS) (Eliezer et al. 2001). In most studies, only a single break in the α-helical region has been found, either at residue 40 (Bisaglia et al. 2005) or between residues 42 and 44 (Bussell and Eliezer 2003; Chandra et al. 2003). Some backbone flexibility has also been reported for residues 18, 34, and 80–85 (Bussell et al. 2005). A site-directed spin-labeling study of α-syn in small unilamellar vesicles supported the existence of a single, long α-helix without any breaks (Jao et al. 2004). A titration of SDS to α-syn found that residues 3–37 and 45–92 became α-helical (Ulmer et al. 2005). In all studies, the C-terminal end of α-syn (~residues 100–140) remained unfolded.
Here, we show that, by comparison with experimental data, the α-HA plot of α-syn can map the α-helical regions of α-syn very well (Fig. 3). In Figure 3B, the location of the α-HA peaks is annotated along the amino acid sequence and compared with similar, annotated representations of one sequence-based prediction and eight experimentally observed locations of α-helical structure (as described above). Furthermore, the position of a break in the α-helix appears to correlate well with the position of a distinctive dip in the α-HA peak. We hypothesize that this may be indicative, not necessarily of the precise location of a permanent break, but of a region with a higher propensity for a break, such that, under certain conditions or constraints, the α-helix may remain unbroken, break, or fluctuate between the two states. This hypothesized variability is supported by the fact that different research groups have located the break to slightly different positions.
SALSA α-helical amphipathicity
α-Synuclein vs. β-synuclein
The residue-by-residue conformational data set for β-syn in the presence of SDS micelles has recently been published (in the form of NMR Cα secondary [2°] shifts) (Sung and Eliezer 2006). An α-helical conformation was found covering residues ~1–85. This correlates well with the α-HA plot of β-syn. The α-helical structure was reported to be interrupted at around residues 43 and 44, which correlates with the location of a dip in the α-HA peak of β-syn and coincides with the position of the dip in the α-HA peak of α-syn.
The good correlation of calculated α-HA with observed α-helical structure demonstrates that the location of the membrane-associated α-helical region can be deduced from the primary structure alone. Application of the algorithm allows the direct comparison of different sequences, as has been demonstrated by comparison of α-syn with β-syn. The algorithm uses a calculation of hydrophobic moment similar to that used by another algorithm (called "MOMENT"; http://nihserver.mbi.ucla.edu/moment/), but produces an output that compares more favorably with experimentally observed data. This is likely to be due to the method by which SALSA samples and collates scores from a larger number of peptide windows.
SALSA α-helical amphipathicity algorithm
SALSA α-helical amphipathicity (α-HA) uses the same mechanism as SALSA β-SC for sampling and collating data into a plot, but instead of using β-strand propensity scores, the algorithm measures the hydrophobic moment of a protein's sequence. A number of structural studies have been performed on α-syn in the context of phospholipid bilayers or surfactant micelles (Eliezer et al. 2001; Bussell and Eliezer 2003; Chandra et al. 2003; Jao et al. 2004; Bisaglia et al. 2005; Bussell et al. 2005; Ulmer et al. 2005) and these have been directly compared with SALSA α-HA plots. As with the comparisons of β-SC plots with experimentally observed β-sheet fibrils, a high level of correlation is seen between α-HA plots and experimental data of α-syn in its α-helical form. This provides us with a powerful bioinformatic tool with which to further probe the conformational character and function of α-syn.
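A hydrophobic moment calculation of the kind introduced by Eisenberg et al. (1982) can be sketched as follows. This is a minimal illustration, not the SALSA implementation: the function name `hydrophobic_moment`, the unnormalized form of the moment, the 100° rotation per residue for an ideal α-helix, and the toy hydrophobicity values in the usage lines are all assumptions; a published hydrophobicity scale would be supplied in practice.

```python
import math
from typing import Mapping


def hydrophobic_moment(window: str,
                       hydrophobicity: Mapping[str, float],
                       angle_deg: float = 100.0) -> float:
    """Magnitude of the vector sum of residue hydrophobicities when the
    window is projected as a helix with `angle_deg` between successive
    side chains (100 degrees corresponds to 3.6 residues per turn)."""
    sin_sum = 0.0
    cos_sum = 0.0
    for n, residue in enumerate(window):
        angle = math.radians(angle_deg * n)
        h = hydrophobicity[residue]
        sin_sum += h * math.sin(angle)
        cos_sum += h * math.cos(angle)
    return math.hypot(sin_sum, cos_sum)


if __name__ == "__main__":
    # Toy two-letter scale, for illustration only (not a published scale).
    toy_scale = {"L": 1.0, "K": -1.0}
    # A uniformly hydrophobic window spanning whole turns has almost no
    # moment, whereas an alternating pattern is more one-sided.
    print(hydrophobic_moment("L" * 18, toy_scale))
    print(hydrophobic_moment("LKKLLKKLLKKLLKKLLK", toy_scale))
```

The same sliding-window sampling used for β-SC could then call this function for each peptide window in place of a β-strand propensity score.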
Conclusion
The SALSA algorithm displays a highly favorable comparison between latent β-strand propensity in the form of β-SC peaks and the observed location of β-sheet structure in fibrillar forms of α-syn, Aβ, and tau. Furthermore, the algorithm has confirmed there to be a positive correlation between a derivative of relative β-strand propensities, relative fibrillogenicities, and pathogenic propensities. Thus, we conclude that SALSA provides new and improved insights into the sequence correlates of fibrillogenic propensities for the three proteins that are pertinent for the most common neurodegenerative conditions in humans. The data support the concept that delaying β-sheet formation by these three proteins may also delay the pathogenic consequences of fibril assembly and therefore represent part of a therapeutically beneficial strategy. The algorithm will allow us to test this hypothesis further, both in vitro and in neurodegenerative disease models.
SALSA can map latent propensities for α-helical structure as well. This is presented in the form of α-HA plots and confirms the role of hydrophobicity in the formation of membrane-associated α-helical structure in synucleins. This algorithm will also allow us to test the role of this property in α-synucleinopathy models.
| Materials and Methods |
|---|
Calculating SALSA β-strand contiguity
For any selected amino acid property, the algorithm calculates the mean score of that property for every peptide window within a polypeptide's sequence. The windows can span a range of sizes and overlap one another in the sequence. In the case of β-strand contiguity (β-SC), a "mean β-strand propensity" (MβP) score is first calculated for each peptide window, according to Equation 1:
|
|
where Pβ, Pα, and Pt are the Chou and Fasman β-strand, α-helix, and reverse turn preference numbers, respectively (Chou and Fasman 1974a). The sums of the Pβ, Pα, or Pt preference numbers for every residue in a peptide window are abbreviated to ΣPβ, ΣPα, or ΣPt, respectively.
In the case of β-SC, SALSA slides a four-residue window along the entire sequence, one residue at a time, calculating the MβP score for each. Thus, in the case of human α-synuclein, which has 140 residues, there are a total of 137 four-residue windows, each with its own MβP score. SALSA then repeats this calculation using a five-residue window, then a six-, seven-, and eight-residue window, and so on up to and including a 20-residue window.
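The sliding-window sampling described above can be sketched in a few lines of Python. This is only an illustration of the enumeration step, not the authors' Java implementation; the function name `enumerate_windows` and its parameters are hypothetical, and no scores are calculated here.

```python
def enumerate_windows(sequence_length, min_size=4, max_size=20):
    """Yield (start, size) for every window of min_size..max_size residues.

    `start` is a 0-based index; window sizes longer than the sequence
    are skipped. Names and defaults are illustrative assumptions.
    """
    for size in range(min_size, min(max_size, sequence_length) + 1):
        for start in range(sequence_length - size + 1):
            yield start, size


# A 140-residue sequence, the length the text gives for human alpha-synuclein.
windows = list(enumerate_windows(140))
four_residue = [w for w in windows if w[1] == 4]
print(len(four_residue))  # 137 four-residue windows
print(len(windows))       # 2193 windows across sizes 4 to 20
```

The same enumeration applied to a 10-residue peptide gives 28 windows, because only sizes 4 to 10 fit within the sequence.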
Selecting a range of peptide window sizes for β-strand contiguity plots
The peptide window size range for β-SC plots was selected in the context of fibril assembly of unfolded proteins. It has been found that fibrils can be formed by peptides as short as four residues in length (Reches et al. 2002; Tjernberg et al. 2002). Therefore, peptide window sizes were selected to range from 4 to 20 residues. This was also based on the hypothesis that the propensity for any 4-mer, 5-mer, 6-mer, 7-mer, etc., to form β-strands will be directly enhanced by the propensity of neighboring residues, potentially irrespective of whether they form part of the final β-sheet. The selection of a size range of 4–20 residues is further supported by the observation that, while β-strands in globular proteins are on average 5.3 residues in length, the majority tend to have a range of 2–10 residues and they can be as long as 20 residues. This was based on an analysis of 320 nonhomologous protein chains (R. Laskowski, pers. comm.).
Deriving a minimum threshold score for β-strand contiguity plots
Thus, SALSA β-SC samples every possible window for 17 different window sizes (4–20). In the case of α-syn, this produces a data set of 2193 windows, each with its own MβP score. A β-SC plot of α-syn that is constructed using the scores from all of these windows is relatively featureless (see Supplemental Fig. 1A). Therefore, in order to produce meaningful plots, the data set needs to be filtered.
One way of doing this is to restrict the data set to a limited number of windows, i.e., those with the highest MβP scores. It was found that plots similar to those presented in Figure 1 could be constructed using only 400 windows. The windows selected were those with the highest MβP scores (and therefore the process of filtering the data set simply involves discarding the remaining 1793 windows). Another method to filter the data set is to discard windows that have MβP scores lower than some minimum threshold score. This approach was preferred to the former, as it also allows for the direct comparison of different protein sequences (such as those presented in Fig. 2).
The requirement for filtering the data set and the methods used to perform this have provided the algorithm with a crucial advantage, namely that the plots can, to a certain extent, be calibrated according to empirical observations. For β-SC plots, we used a comparison of plots produced by the sequences of human α- and β-syn to arrive at a minimum threshold score of 1.2. This was the lowest threshold score capable of producing a good overlap between peaks of the two sequences over residues 30–60, and this threshold has been used to produce all β-SC plots presented in this paper.
Constructing SALSA β-strand contiguity plots
In all cases, SALSA plots the property in question (such as β-SC) on the y-axis, against the amino acid sequence, which is along the x-axis. For each residue along the sequence, the value for the y-coordinate is produced by adding together the MβP scores from every window that contains that particular residue. As described in the section above, this only follows a filtering step and, as a result, a residue's score may be as low as zero if all possible windows it forms a part of have MβP scores that are lower than the minimum threshold score. This is explicitly illustrated by the worked example shown in Figure 5. It follows therefore that residues which are located amidst several β-strand-favorable residues will accumulate a high score, added together from numerous and often overlapping windows, while those located amidst β-strand-unfavorable residues will tend to produce a low sum of scores, even if an individual residue has a high propensity for β-strand.
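The accumulation step can be sketched as follows. This is a minimal illustration in Python, assuming a caller-supplied scoring function. The function `toy_score` is a placeholder and is NOT Equation 1; it and the residue set it uses are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def contiguity_profile(sequence, score_window, threshold, min_size=4, max_size=20):
    """Sum, for each residue, the scores of all retained windows containing it."""
    profile = [0.0] * len(sequence)
    for size in range(min_size, min(max_size, len(sequence)) + 1):
        for start in range(len(sequence) - size + 1):
            score = score_window(sequence[start:start + size])
            if score >= threshold:  # windows below the threshold are discarded
                # The window's score is credited to every residue it covers,
                # not to the middle residue alone.
                for i in range(start, start + size):
                    profile[i] += score
    return profile


def toy_score(window):
    """Placeholder score (NOT Equation 1): fraction of residues in an arbitrary set."""
    return sum(residue in "VIFYWM" for residue in window) / len(window)


profile = contiguity_profile("GGVIVIGG", toy_score, threshold=1.0, min_size=2, max_size=4)
print(profile)  # [0.0, 0.0, 3.0, 5.0, 5.0, 3.0, 0.0, 0.0]
```

In this toy case the two central residues accumulate the highest totals because they sit inside the largest number of retained windows, which mirrors the contiguity effect described in the text.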
|
β-strand contiguity and its plotting tool
β-SC for a 10-residue peptide sequence. For this peptide, there are only 28 possible windows within the range of 4–20 residues and these are placed in order of their MβP scores (see also screenshot in Supplemental Fig. 1B). In Figure 5A, the calculation of MβP for the highest scoring window (VFMK) is shown in full. The Chou and Fasman (1974b) secondary structure preference numbers (from which the MβPs are calculated) are shown below the sequence. The MβPs for the other seven windows are also listed.
In Figure 5B, the calculation for the plotting tool is illustrated. Only the top five windows have MβP scores greater than the threshold of 1.2 and therefore only these window scores are included in the subsequent calculation and the plot. The value for the y-axis in the β-SC plots comes from the sum of the MβPs for each residue that is present in the top five windows. Each window's MβP score is allocated to every residue in that window equally, not to the middle residue alone.
Calculating SALSA α-helical amphipathicity
SALSA α-helical amphipathicity is calculated as the net hydrophobicity of a peptide window as though in an α-helical conformation, which is illustrated by a Schiffer and Edmundson α-helical wheel projection (Schiffer and Edmundson 1967). The net hydrophobicity is calculated from the difference of the residues' hydrophobicity vectors (which is similar to the measurement of the hydrophobic moment as defined by Eisenberg and colleagues) and is then divided by the number of residues in that window (see Equation 2; Eisenberg et al. 1982, 1984). We refer to this as the "mean α-helical amphipathicity" (Mα-HA):
|
|
Where h is the Kyte and Doolittle hydropathy score for a residue, n is the position of the residue in the fragment (n = 0 for first residue),
is the angle subtended by each progressive residue, as though projected onto an
α-helical wheel (this is
100°), and N is the number of residues in the peptide window. For the plots presented here, we have used Kyte and Doolittle hydrophobicity numbers (Kyte and Doolittle 1982).
In Figure 6, a worked example is presented using a Schiffer and Edmundson α-helical wheel projection to illustrate how Mα-HA is calculated for one 11-amino acid peptide window (MDVFMKGLSKA). The first residue in the wheel is positioned at 0° and the angle subtended is 98°. Residues at 90°–270° from the first residue should therefore contribute with the opposite sign to those at 270°–90°. The cosine function accounts for this.
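For orientation, the following Python sketch computes the standard Eisenberg mean hydrophobic moment of a window using Kyte and Doolittle hydropathy values. It is offered as a related, well-known calculation and not as a reproduction of Equation 2, which is not shown here and may differ (for example, by using only the cosine projection relative to the first residue). The function names are hypothetical.

```python
import math

# Kyte and Doolittle (1982) hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def helical_components(window, angle_deg=100.0):
    """Return (cosine sum, sine sum) of hydropathy vectors on a helical wheel.

    The first residue (n = 0) sits at 0 degrees; each subsequent residue is
    rotated by a further `angle_deg` degrees.
    """
    cos_sum = sin_sum = 0.0
    for n, residue in enumerate(window):
        h = KYTE_DOOLITTLE[residue]
        theta = math.radians(n * angle_deg)
        cos_sum += h * math.cos(theta)
        sin_sum += h * math.sin(theta)
    return cos_sum, sin_sum


def mean_hydrophobic_moment(window, angle_deg=100.0):
    """Eisenberg-style moment: vector magnitude divided by window length."""
    cos_sum, sin_sum = helical_components(window, angle_deg)
    return math.hypot(cos_sum, sin_sum) / len(window)


# The 11-residue window from the worked example, with the 98 degree angle
# quoted for Figure 6.
value = mean_hydrophobic_moment("MDVFMKGLSKA", angle_deg=98.0)
```

The cosine sum returned by `helical_components` is the part of the calculation in which residues on the far side of the wheel from the first residue contribute with the opposite sign, as described above.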
|
α-helical amphipathicity plots
α-helical amphipathicity uses the same strategy as that used to produce β-SC plots (which was illustrated in Fig. 5). Thus, a sliding peptide window calculates a mean score for all possible windows within a selected range of window sizes. In this case, the mean score is the mean α-helical amphipathicity (Mα-HA), rather than mean β-strand propensity (MβP). Once again, the plotting tool filters the large data set by discarding windows with scores below a minimum threshold. The y-coordinate for each residue comes from the sum of Mα-HA scores from every window in which that residue is found. Each window's Mα-HA score is allocated to every residue in that window equally, not to the middle residue only.
Selecting a range of peptide window sizes for α-helical plots
In globular proteins, the average length of α-helices is about 14 residues, ranging between 9 and 37 residues (Kumar and Bansal 1998). The α-helical regions in synucleins are encoded by a varying number of repeats which are in multiples of 11 amino acids (with five repeats in β-syn and seven in α-syn) (Jao et al. 2004). Each 11-residue repeat has been experimentally observed to constitute three turns of an α-helix in a structure described as an 11/3 α-helix (Jao et al. 2004; Bussell et al. 2005). Therefore, a range of peptide window sizes was selected to start with 11 residues and to include all windows up to and including 33 residues. This may be particularly relevant to synucleins, because the most prominent feature of their α-helices is that they are continuously amphipathic over a large number of residues (compared to α-helices found in globular proteins) and, despite the fact that the 11-mers are not always contiguous, the α-helix does not appear to be disrupted by this. Thus, "α-helical contiguity" might also be a relevant description.
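As a quick arithmetic check (an inference from the 11/3 geometry, not a statement taken from the paper), three full turns spread over 11 residues correspond to the following rotation per residue:

```latex
\[
\frac{3 \times 360^\circ}{11} \approx 98.2^\circ \ \text{per residue}
\]
```

This is consistent with the 98° angle quoted for the 11-residue worked example in Figure 6.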
Deriving a minimum threshold score for α-helical amphipathicity plots
The minimum threshold score for α-helical amphipathicity was determined empirically. This was based on the lowest minimum threshold score that could produce the best correlation between the integrals of the α-helical amphipathicity plots and an observed conformational change from "random coil" in phosphate buffer to α-helical in the presence of 10 mM SDS (data not shown). The proteins used for this were human α-syn, human β-syn, and four other natively unfolded proteins. These included a late embryogenesis abundant (LEA) protein from Aphelenchus avenae (called "AavLEA1"), an LEA protein from wheat (called "Em"), human tau-441, and colicin N T-domain. (Purified recombinant AavLEA1 and Em were kindly provided by Dr. A. Tunnacliffe, University of Cambridge, United Kingdom. Purified recombinant colicin N T-domain was kindly provided by Dr. J. Lakey, University of Newcastle upon Tyne, United Kingdom.)
The best correlation was produced using a minimum threshold score of 0.8.
SALSA is implemented in Java (Sun Microsystems) with a modern graphical user interface (see screenshots in Supplemental Fig. 1). To obtain the algorithm, please contact Louise Serpell (l.c.serpell{at}sussex.ac.uk).
| Footnotes |
|---|
4 School of Life Sciences, University of Sussex, Falmer, Brighton BN1 9QG, United Kingdom.
Reprint requests to: Shahin Zibaee, MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, United Kingdom; e-mail: shahin{at}mrc-lmb.cam.ac.uk; fax: +44 1223 402 310; or Louise C. Serpell, Department of Biochemistry, School of Life Sciences, University of Sussex, Falmer, Brighton BN1 9QG, United Kingdom; e-mail: l.c.serpell{at}sussex.ac.uk; fax: +44 1273 678 433.
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.062624507.
Supplemental material: see www.proteinscience.org
| Acknowledgments |
|---|
| References |
|---|
assembly in vitro and in Alzheimer's disease. J. Cell Sci. 113: 3737–3745.
Alonzo, N.C., Hyman, B.T., Rebeck, G.W., and Greenberg, S.M. 1998. Progression of cerebral amyloid angiopathy: Accumulation of amyloid-β40 in affected vessels. J. Neuropathol. Exp. Neurol. 57: 353–359.
Alzheimer, A. 1907. Über eine eigenartige Erkrankung der Hirnrinde. Allg. Z. Psychiatr. 64: 146–148.
Baba, M., Nakajo, S., Tu, P.H., Tomita, T., Nakaya, K., Lee, V.M., Trojanowski, J.Q., and Iwatsubo, T. 1998. Aggregation of α-synuclein in Lewy bodies of sporadic Parkinson's disease and dementia with Lewy bodies. Am. J. Pathol. 152: 879–884.
Balbach, J.J., Petkova, A.T., Oyler, N.A., Antzutkin, O.N., Gordon, D.J., Meredith, S.C., and Tycko, R. 2002. Supramolecular structure in full-length Alzheimer's β-amyloid fibrils: Evidence for a parallel β-sheet organization