α and β tubulin
1 Physics Department, University of California, Santa Barbara, California 93106, USA
2 Physics Department, Norwegian Technical University, Trondheim NG, Norway N-7491
Reprint requests to: D. Kuchnir Fygenson, Physics Department, University of California, Santa Barbara, CA 93106, USA; e-mail: deborah{at}physics.ucsb.edu; fax: (805) 893-3307.
(RECEIVED June 2, 2003; FINAL REVISION September 16, 2003; ACCEPTED September 23, 2003)
3 Present address: Niels Bohr Institute, Copenhagen, Denmark.
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.03225304.
| Abstract |
|---|
α and β tubulin are well-characterized paralogs with similar structures and functions. We quantify the variability of every amino acid position in both tubulins from the aligned sequences of their numerous known orthologs. By aligning the variability profiles, we identify residues that differ significantly in variability between α and β tubulin. Most of these residues are part of well-defined secondary structures and are clustered around the nucleotide binding pocket, the site of greatest functional difference between the two paralogs. The remaining residues of large difference in variability are located in the N-terminal loop between H1 and S2. We therefore predict that certain residues in this unstructured region also contribute to a functional difference between α and β tubulin. Furthermore, we find the most restrictive variability-based alignment is nearly identical to the true structure-based alignment. Thus, by using a stringent variability-based alignment to approximate the true alignment, the method introduced here may predict sites of functional distinction between paralogous proteins even in the absence of structural information.
Keywords: sequence alignment; neutral versus functional variation; bioinformatic tools; microtubule catalytic site
| Introduction |
|---|
The modern abundance of sequence data permits quantification of the variability of amino acids in many proteins. To relate this variability to function typically requires knowing the structure of the protein. For example, it is common to ask whether amino acids in the core of a protein sustain fewer variations than do those on its surface, as might be expected due to packing constraints or interactions essential for folding. Along these lines, a weak but statistically significant correlation is generally found between variability and solvent accessible surface area (Huang et al. 1996; Goldman et al. 1998; Rodionov and Blundell 1998). More intriguing, perhaps, is an apparent conservation of the three-dimensional pattern of conserved amino acids in several families of structurally homologous proteins, indicating the existence of a folding nucleus (Mirny and Shakhnovich 1999).
In this article, we demonstrate a method for extracting functional information from quantitative variability data by using paralogous proteins. The paralogs of interest here are α and β tubulin. The numerous orthologs, known structures, and ambiguous structure–function relationships of these tubulins make them an ideal and interesting test case. The results indicate that the method introduced here may be usefully applied to other protein paralogs with structures that are not yet known.
α and β tubulin form a heterodimer (αβ) that self-assembles into hollow cylindrical filaments called microtubules. Microtubules are dynamic elements of the eukaryotic skeleton that play an essential role in a variety of cellular functions. They are best known for their role in cell division, in which dramatic fluctuations in the length of individual microtubules are required to organize and separate the chromosomes (Mitchison and Kirschner 1984). These length fluctuations are fueled by the hydrolysis of one of two molecules of guanosine-triphosphate (GTP) bound to each tubulin dimer. Just how hydrolysis changes the tubulin structure so as to destabilize the microtubule is still a mystery (Nogales 2001).
In the decade before their crystal structure was known, sequence comparisons provided valuable insights into α and β tubulin structure and function (Little et al. 1981; Little and Seehaus 1988; Burns 1991). The most thorough analysis to date is based on sequences available in 1992 (Burns and Surridge 1994). These studies emphasize how well suited α and β tubulins are for variability analysis. Alignment among orthologs and between the two paralogs is unambiguous because their amino acid sequences are highly conserved and easily distinguished, and have few insertions or deletions. For quantitative analysis, it is particularly fortunate that hundreds of complete tubulin sequences are now available in public databases, with all eukaryotic phyla well represented.
Here, we quantify the variability of every amino acid in both α and β tubulin and compare a variability-based alignment of the amino acid sequences with their true structure-based alignment (Nogales et al. 1998; Löwe et al. 2001). We find that a stringent variability-based alignment effectively reproduces the true alignment, whereas a tolerant variability-based alignment can be used to identify homologous amino acids that differ significantly in variability between the two proteins. This procedure may be especially useful in directing mutagenesis studies to loci of key functional importance.
| Results |
|---|
Because α and β tubulin are both highly conserved and broadly sequenced, it is possible to quantify the variability of each of their residues with confidence. The statistical distribution of residue variability in both tubulins is strongly peaked at low values, with >50% of residues scoring in the bottom 10% of the variability range (Fig. 1
Sα(x) − Sβ(x), using the known structural alignment (Nogales et al. 1998; Löwe et al. 2001), reveals quantitative differences in the variability of corresponding residues. The distribution of Sα(x) − Sβ(x) has a large peak about zero, but is otherwise normal (µ = 0.01, σ = 0.18; Fig. 3
Rαβ, between the profiles while varying the gap initiation penalty (Fig. 4
Rαβ = 0.42. This level of correlation is matched by the variability-based alignment as soon as the gap initiation penalty is low enough to allow any gaps at all (Fig. 4
, in the disordered C terminus (α, 442–443).
The slight discrepancies between the variability and structure-based alignments are resolved when the gap initiation penalty is lowered just enough to allow one more gap into the alignment (1.0 < G < 1.2). The new gap appears in the α C terminus (α, 450–452), and the gap between S9 and S10 increases in size (β, 351–358). At the same time, however, new discrepancies arise. The smaller of the two gaps in β splits into an equivalent pair of intermediate-sized gaps piercing H1 and bracketing the N-terminal loop (β, 39–40 → β, 14–19; α, 61–64), and the larger of the two gaps in β shifts left 10 residues, into the middle of S9 (β, 362–365 → β, 351–358). These changes lead to a modest increase in the cross-correlation coefficient, Rαβ = 0.45.
When the gap initiation penalty is reduced a little more (G < 1.0), the cross-correlation coefficient makes a large and stable jump up to Rαβ = 0.52 as a new pair of self-compensating gaps appears surrounding H4 and S5 (β, 138–140; α, 175–177). This is the optimal variability-based alignment (labeled III). It persists until the gap initiation penalty becomes so low (G
0.6) that many small, closely spaced gaps arise throughout the sequences.
| Discussion |
|---|
|Sα(x) − Sβ(x)| > 0.18; Fig. 3
The structural context of many of these 40 residues indicates that they are of particular functional importance. Half are clustered around the nucleotide binding pocket (five of these interact directly with the nucleotide), four are clustered around the taxol binding site (on β tubulin), and one participates in lateral binding between protofilaments in the microtubule (Table 1). The remainder lie in the large and enigmatic N-terminal loop (Table 2).
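The selection of residues by the size of their variability difference can be sketched in a few lines. This is only an illustration, assuming two variability profiles that are already aligned position by position, with `None` marking a gap; the function name `flag_divergent` is hypothetical, and the default threshold is the 0.18 cutoff quoted in the text.

```python
def flag_divergent(s_alpha, s_beta, threshold=0.18):
    """Return (position, difference) for aligned positions whose variability
    differs by more than `threshold` between the two profiles.

    Positions are 1-based indices along the common alignment axis.
    Positions where either profile has a gap (None) are skipped.
    """
    flagged = []
    for pos, (sa, sb) in enumerate(zip(s_alpha, s_beta), start=1):
        if sa is None or sb is None:
            continue  # no comparison is possible at a gap
        diff = sa - sb  # positive: more variable in the first profile
        if abs(diff) > threshold:
            flagged.append((pos, diff))
    return flagged
```

The sign of each returned difference indicates which paralog is the more variable at that position, which is the distinction the discussion relies on.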
α tubulin is particularly interesting as a target for directed mutagenesis because it interacts directly with the nucleotide in the crystal structure. We speculate that it may be important in preventing hydrolysis at the N-site.
All four residues near the taxol binding site on β tubulin are more variable than their counterparts on α tubulin, which do not bind taxol. One possible interpretation is that taxol-binding residues are under a "negative selective pressure" to escape susceptibility to this natural poison. Another possibility is that cellular factors (e.g., small peptides or regulatory proteins) exploit this site to regulate microtubule stability, and the variability reflects the variety of such regulatory factors in different species. The latter explanation is particularly intriguing given recent structural evidence for at least one such factor (Kar et al. 2003).
The interpretation of variability differences is less obvious in the large N-terminal loop that connects H1 and S2. Docking the high-resolution tubulin structure into the electron density map of a microtubule puts this loop in a position to participate in the lateral bonds between dimers (Nogales et al. 1999). It is, however, the area of poorest density in the structure of β tubulin and largely absent in the structure of α tubulin (Löwe et al. 2001). Therefore, unlike the rest of the protein, alignment between these two regions is based on sequence not structural homology. Using the sequence-based alignment, 15 of the 30 positions in the loop differ significantly in variability (Table 2). In contrast, the variability-based alignment has a six-residue frameshift that reduces the number of positions with significant differences to four, all of which are less variable on β tubulin (Table 2). It is possible that the high average variability of residues in the H1-S2 loop makes their sequence-based alignment unreliable and that the variability-based alignment is a better indicator of functional homology.
In both alignments, four highly variable residues on α tubulin (Q35, K40, I42, G44) and one strongly conserved residue on β tubulin (G38) differ from their counterparts on the opposite paralog. Because similar differences in variability were so plausibly connected with functional differences in the other misaligned regions (see above), we predict that these residues in the N-terminal loop also have a role in making the biochemical functions of α and β tubulin distinct. Furthermore, because the tendency is for residues on β tubulin to be more conserved, we speculate that the functional distinction is once again related to hydrolysis and that the tenuously structured, but conserved, glycines on β tubulin are involved in the hydrolysis-driven conformational change that eventually destabilizes the microtubule.
In summary, by using α and β tubulin, we have demonstrated how amino acid variability profiles can be used to identify residues that contribute to functional differences between two paralogous proteins. Our approach is based on finding the optimal alignment of the variability profiles and comparing it with the true alignment of the paralogs to reveal domains with numerous large differences in variability. We note that, under stringent conditions, variability-based alignment reproduces the structure-based alignment. Thus, a comparison between stringent and optimal variability-based alignments of paralogous protein sequences may be used to predict sites of functional distinction, even in the absence of structural information.
| Materials and methods |
|---|
α tubulins and 300 β tubulins were obtained by a BLAST 2.0 search of the non-redundant database in January 2000, using pig tubulins (P02550, P02554) as queries. Sequences <90% of the length of the query tubulins (α tubulins <406 residues, β tubulins <400 residues) were considered fragments and were not used.
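The fragment filter described above amounts to a simple length cutoff. A minimal sketch, assuming sequences are held as plain strings; the function name `drop_fragments` is hypothetical, and the cutoffs passed in would be the ones quoted in the text (406 residues for α tubulin, 400 for β tubulin).

```python
def drop_fragments(sequences, min_length):
    """Keep only sequences with at least `min_length` residues.

    Sequences shorter than the cutoff are treated as fragments and discarded.
    """
    return [seq for seq in sequences if len(seq) >= min_length]
```

For example, `drop_fragments(alpha_sequences, 406)` would apply the α tubulin cutoff to a hypothetical list `alpha_sequences`.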
Quantification of variability
Variability of the residue at every position in a primary sequence was quantified by the Shannon entropy (Shannon 1948),
$$S(x) = -\sum_{i=1}^{20} p_i(x)\,\log p_i(x) \tag{1}$$
where p_i(x), the probability of finding amino acid i at position x in the sequence, is estimated from the relative frequency of i at x. The sum was taken over all 20 amino acids i.
We note that it is common to group the amino acids into i < 20 categories on the basis of physical character or substitution propensity (Smith and Xue 1997; Atchley et al. 1999; Mirny and Shakhnovich 1999; Plaxco et al. 2000). We experimented with several amino acid groupings. The one that minimized off diagonal elements in the substitution matrix for our tubulin sequences was: (D, E), (K, R), (P, G, A, S, T, N), (F, Y, W, H), and (I, L, M, V, C, Q). However, because even this grouping had no qualitative effect on the distribution of S(x), we chose to use the simplest i = 20 definition for our measure of variability. Also for simplicity, we ignored insertions and deletions, which, as previously noted, are rare in tubulin. Among the aligned sequences, if <20% of the sequences had an amino acid at a site, no x was assigned to that site.
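The variability measure can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the input is a list of equal-length aligned sequences with `-` for gaps, the logarithm is taken as natural (the base is not stated in the text), and the names `column_entropy` and `variability_profile` are hypothetical. Gap characters are ignored when estimating frequencies, and a column receives no value (`None`) when fewer than 20% of the sequences have an amino acid there.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def column_entropy(column, min_occupancy=0.2):
    """Shannon entropy of one alignment column, or None if too sparsely filled."""
    residues = [c for c in column if c in AMINO_ACIDS]
    if len(residues) < min_occupancy * len(column):
        return None  # fewer than 20% of sequences have a residue here
    total = len(residues)
    counts = Counter(residues)
    # p_i is estimated by the relative frequency of amino acid i in the column
    return -sum((n / total) * math.log(n / total) for n in counts.values())

def variability_profile(aligned_sequences):
    """Entropy for every column of a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned_sequences)]
```

A fully conserved column scores zero, and the score grows as more amino acids share the column more evenly.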
Alignment
The profiles Sα(x) and Sβ(x) were aligned by using a standard minimization algorithm to identify optimal paths on a two-dimensional grid (x,y) with potential S(x,y). Each path was forced to start at (x,y) = (0,0) and was assigned a score S(x,y) for each point visited plus a penalty for each vertical or horizontal move. Diagonal moves correspond to alignments between the sequences. Horizontal or vertical moves represent gaps in one of the sequences. The first horizontal or vertical move after a diagonal stretch is penalized with a relatively high initiation cost G (
0.5). Subsequent moves in the same direction are penalized with a lower continuation cost g (typically of order G/10). For a given path the resulting score is therefore
$$\text{score} = \sum_{x'} S\bigl(x(x'),\,y(x')\bigr) + \sum_{i=1}^{n}\bigl[\,G + g\,(l_i - 1)\,\bigr] \tag{2}$$
where n is the number of gaps, l_i is the length of the ith gap, and x' is a position index for the aligned sequences. As both G and g are positive, the optimal path is the path with the lowest score (equation 2). This path is determined iteratively, by choosing whichever path to a point (x,y) from either (x − 1, y − 1) or (x, y − k) or (x − k, y), where k = 1, 2, ... x, minimizes the score. The first case represents alignment, whereas the latter two represent paths with a gap that terminates at (x,y).
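The path minimization can be sketched with a small dynamic program. Several choices here are assumptions rather than details taken from the text: the potential at a diagonal step is taken to be the absolute difference between the two profile values, gap steps are charged only the penalties G and g, a change of gap direction is charged a new initiation cost, and the recursion uses three path states (a common way to handle affine gap costs) instead of an explicit search over gap lengths k. The function name `align_profiles` is hypothetical.

```python
import math

def align_profiles(a, b, G=1.0, g=0.1):
    """Lowest-cost alignment of two numeric profiles with affine gap penalties.

    Returns (total_cost, pairs); pairs holds (i, j) for aligned positions,
    (i, None) for a gap in b, and (None, j) for a gap in a (0-based indices).
    """
    n, m = len(a), len(b)
    INF = math.inf
    # cost[s][i][j]: best cost of a path from (0,0) to (i,j) whose last move is
    # s = 0 diagonal, s = 1 step along a only, s = 2 step along b only.
    cost = [[[INF] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    back = [[[None] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    cost[0][0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and j:  # diagonal move: align a[i-1] with b[j-1]
                prev = min(range(3), key=lambda s: cost[s][i - 1][j - 1])
                cost[0][i][j] = cost[prev][i - 1][j - 1] + abs(a[i - 1] - b[j - 1])
                back[0][i][j] = prev
            if i:  # step along a only: open (G) or extend (g) a gap in b
                opts = [cost[s][i - 1][j] + (g if s == 1 else G) for s in range(3)]
                prev = min(range(3), key=opts.__getitem__)
                cost[1][i][j], back[1][i][j] = opts[prev], prev
            if j:  # step along b only: open (G) or extend (g) a gap in a
                opts = [cost[s][i][j - 1] + (g if s == 2 else G) for s in range(3)]
                prev = min(range(3), key=opts.__getitem__)
                cost[2][i][j], back[2][i][j] = opts[prev], prev
    # Trace the best path back from the far corner of the grid.
    state = min(range(3), key=lambda s: cost[s][n][m])
    total, i, j, pairs = cost[state][n][m], n, m, []
    while i or j:
        prev = back[state][i][j]
        if state == 0:
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif state == 1:
            pairs.append((i - 1, None)); i -= 1
        else:
            pairs.append((None, j - 1)); j -= 1
        state = prev
    pairs.reverse()
    return total, pairs
```

Lowering G in repeated calls and recording how the returned pairs change is one way to reproduce the kind of scan over gap initiation penalties described in the Results.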
For a given parameter set (G, g), the optimal alignment assigns an S-value (or a blank space) to every position x along a common axis for both profiles. We monitor the alignment by computing the correlation coefficient
$$R_{\alpha\beta} = \frac{\langle S_\alpha S_\beta\rangle - \langle S_\alpha\rangle\langle S_\beta\rangle}{\sqrt{\bigl(\langle S_\alpha^2\rangle - \langle S_\alpha\rangle^2\bigr)\bigl(\langle S_\beta^2\rangle - \langle S_\beta\rangle^2\bigr)}} \tag{3}$$
where ⟨·⟩ denotes an average over all x. The correlation coefficient measures how predictable Sα(x) is given Sβ(x) (and vice versa). It can range in absolute value from one, if the value of Sα(x) uniquely determines the value of Sβ(x), to zero, if knowing the value of Sα(x) is of no use in predicting Sβ(x).
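A correlation coefficient of this kind can be computed as below. This is a sketch assuming the standard Pearson form and assuming that positions where either profile has a blank (`None`) are left out of the averages; the text does not say how blanks are treated, and the function name `profile_correlation` is hypothetical.

```python
import math

def profile_correlation(pairs):
    """Pearson correlation over aligned (s_alpha, s_beta) value pairs.

    Pairs in which either value is None (a blank in the alignment) are skipped.
    Returns None if fewer than two usable pairs remain or a profile is constant.
    """
    usable = [(x, y) for x, y in pairs if x is not None and y is not None]
    if len(usable) < 2:
        return None
    xs, ys = zip(*usable)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in usable) / len(usable)
    var_x = sum((x - mean_x) ** 2 for x in xs) / len(xs)
    var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)
```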
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Burns, R.G. 1991. α-, β-, and γ-Tubulins: Sequence comparisons and structural constraints. Cell Motil. Cytoskeleton 20: 181–189.
Burns, R.G. and Surridge, C.D. 1994. Tubulin: Conservation and structure. In Microtubules (eds. J.S. Hyams and C.W. Lloyd), pp. 3–32. Wiley-Liss, New York.
Fitch, W.M. 2000. Homology: A personal view on some of the problems. Trends Genet. 16: 227–231.
Gaucher, E.A., Gu, X., Miyamoto, M.M., and Benner, S.A. 2002. Predicting functional divergence in protein evolution by site-specific rate shifts. Trends Biochem. Sci. 27: 315–321.
Goldman, N., Thorne, J.L., and Jones, D.T. 1998. Assessing the impact of secondary structure and solvent accessibility on protein evolution. Genetics 149: 445–458.
Huang, W., Petrosino, J., Hirsch, M., Shenkin, P.S., and Palzkill, T. 1996. Amino acid sequence determinants of β-lactamase structure and activity. J. Mol. Biol. 258: 688–703.
Kar, S., Fan, J., Smith, M.J., Goedert, M., and Amos, L.A. 2003. Repeat motifs of tau bind to the insides of microtubules in the absence of taxol. EMBO J. 22: 70–77.
Krauhs, E., Little, M., Kempf, T., Hofer-Warbinek, R., Ade, W., and Ponstingl, H. 1981. Complete amino acid sequence of β-tubulin from porcine brain. Proc. Natl. Acad. Sci. 78: 4156–4160.
Little, M. and Seehaus, T. 1988. Comparative analysis of tubulin sequences. Comp. Biochem. Physiol. B 90: 655–670.
Little, M., Krauhs, E., and Ponstingl, H. 1981. Tubulin sequence conservation. Biosystems 14: 239–246.
Löwe, J., Li, H., Downing, K.H., and Nogales, E. 2001. Refined structure of αβ-tubulin at 3.5 Å resolution. J. Mol. Biol. 313: 1045–1057.
Mirny, L.A. and Shakhnovich, E.I. 1999. Universally conserved positions in protein folds: Reading evolutionary signals about stability, folding kinetics and function. J. Mol. Biol. 291: 177–196.
Mitchison, T. and Kirschner, M. 1984. Dynamic instability of microtubule growth. Nature 312: 237–242.
Nogales, E. 2001. Structural insights into microtubule function. Ann. Rev. Biophys. Biomol. Struct. 30: 397–420.
Nogales, E., Wolf, S.G., and Downing, K.H. 1998. Structure of the αβ tubulin dimer by electron crystallography. Nature 391: 199–203.
Nogales, E., Whittaker, M., Milligan, R.A., and Downing, K.H. 1999. High-resolution model of the microtubule. Cell 96: 79–88.
Plaxco, K.W., Larson, S., Ruczinski, I., Riddle, D.S., Thayer, E.C., Buchwitz, B., Davidson, A.R., and Baker, D. 2000. Evolutionary conservation in protein folding kinetics. J. Mol. Biol. 298: 303–312.
Ponstingl, H., Krauhs, E., Little, M., and Kempf, T. 1981. Complete amino acid sequence of α-tubulin from porcine brain. Proc. Natl. Acad. Sci. 78: 2757–2761.
Rodionov, M.A. and Blundell, T.L. 1998. Sequence and structure conservation in a protein core. Proteins 33: 358–366.
Shannon, C.E. 1948. The mathematical theory of communication. Bell Systems Tech. J. 27: 623–656.
Smith, D.K. and Xue, H. 1997. Sequence profiles of immunoglobulin and immunoglobulin-like domains. J. Mol. Biol. 274: 530–545.