Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
Stephen H. Bryant, Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, Room 8N805, 8600 Rockville Pike, Bethesda, MD 20894; e-mail:bryant{at}ncbi.nlm.nih.gov; fax: (301) 435-7794.
(RECEIVED May 29, 2001; FINAL REVISION November 6, 2001; ACCEPTED November 6, 2001)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps19902.
| Abstract |
|---|
Keywords: Profile search; protein structure alignment; alignment accuracy
| Introduction |
|---|
The sensitivity of a PSSM with respect to identification of divergent family members depends on the seed alignment used to construct it. Perfect discrimination between homologous and nonhomologous sequences can be achieved only when the PSSM is at once informative enough for specific recognition, and at the same time based on sequences that encompass the overall diversity of a protein family. As more and more diverse sequences are included in a seed alignment, the accuracy of that alignment may furthermore become an issue. Use of stringent gap penalties may cause misalignment of residues forming a conserved functional site, for example, if those residues are flanked by insertions or deletions. Inaccurate seed alignments may in this way introduce noise and unnecessarily dilute the information content of the PSSM.
Protein three-dimensional structure is remarkably stable with respect to sequence divergence, such that even the most distant relatives within a protein family exhibit the same overall topology and architecture. This resiliency allows proteins to gradually evolve into families with some variety of functional sites and functions, based on the same overall scaffold (Rost 1997; Holm 1998; Murzin 1998). Although it has been noted that conservation of structural features also decreases with evolutionary distance (Chothia and Lesk 1986; Hubbard and Blundell 1987; Flores et al. 1993; Russell and Barton 1994; Wood and Pearson 1999), it nonetheless seems possible that structural alignments may prove a useful source of seed alignments for PSSM construction. Alignments based on a conserved structural scaffold may accurately identify conserved sequence features and/or functional sites, even when overall sequence similarity is low.
The use of structure alignments for PSSM construction has been investigated previously by Sternberg and colleagues (Kelley et al. 1999, 2000). These investigators initiated database searches by merging the PSSMs of different sequence-similar subfamilies, based on structure-structure alignments for representatives of these subfamilies. This procedure aims to detect relationships between protein families that are not obvious from the component PSSMs individually. These investigators showed that combined PSSMs yielded recognition rates similar to the starting PSSMs, which can be explained by the extremely low similarity between the structurally similar subfamilies they considered (Kelley et al. 1999). Further analysis showed that some PSSMs constructed by this approach could indeed detect members of divergent subfamilies (Kelley et al. 2000). The investigators attributed this to the "mosaic" nature of the combined PSSMs, which simultaneously encode the sequence motifs characteristic of two or more sequence-dissimilar subfamilies.
Here we focus on a different application of structural alignments in PSSM construction. We compare the performance of PSSMs derived from seed alignments based on different sequence-sequence alignment algorithms to those based on structure-structure alignment. The seed alignments are in each case based on exactly the same protein sequences, and it is only their alignment per se, not the diversity of family members included in the alignment, that we vary in the experiments. Our intention is to measure whether and to what extent PSSM performance improves when seed alignments are based on the shared three-dimensional scaffold detected by structure-structure superposition, as opposed to the residue-conservation patterns detected by sequence-sequence comparison.
To assess PSSM performance we employ a test set where the three-dimensional structures of both database and PSSM-template proteins are known. We may thus measure the accuracy of a PSSM-sequence alignment as that of the three-dimensional molecular model implied by that alignment and the known structure of the PSSM-template protein. We use one of the numerical measures of molecular model accuracy developed for the CASP structure-prediction competitions (Moult et al. 1997), contact specificity (Marchler-Bauer and Bryant 1997). We also measure recognition sensitivity of PSSMs based on seed alignments calculated by these different methods. To do so we examine the fraction of structurally similar proteins in the known-structure database that are identified with significant PSI-BLAST E-values (Altschul et al. 1997).
In agreement with earlier results (Kelley et al. 1999, 2000), we find that use of structural alignments in PSSM construction has a modest effect on search sensitivity. We find a much greater effect on the accuracy of the PSSM-sequence alignments. When structural alignments are used to build the seed alignment, molecular models derived from PSSM-sequence alignments are in significantly better agreement with the known structure of the modeled proteins. We thus suggest that PSSMs derived from structural alignments may be most useful for accurate detection of the core-structure scaffold characteristic of a protein family and for annotation of functional sites associated with it.
| Results |
|---|
To illustrate the contact specificity metric, we plot in Figure 1 average values obtained when molecular models based on VAST structure-structure alignments of PSSM-template and database proteins take the place of PSSM-sequence alignments. These alignments may be understood as the most accurate one might expect from any PSSM-sequence alignment method, in that knowledge of the three-dimensional structures of the template and database proteins, rather than the template PSSM and database sequence, has been used in calculation of the alignment. One may see that median contact specificity values range from around 80%, for models based on VAST alignments of protein pairs with high sequence (and structure) similarity, to around 70%, for models based on structural alignment of protein pairs with lower sequence (and structure) similarity. Contact specificity does not reach 100%, because the structures of the query and template proteins are never identical to one another. Contact specificity values around 70% correspond to molecular models with root mean square superposition residuals (for polypeptide backbone atoms, when compared to the true structure) of around 3 Angstroms (not shown).
To directly compare PSSM-sequence alignment accuracy for PSSMs from different seed alignment methods, we plot in Figure 3 average contact specificity for molecular models based on PSSMs from seeds by sequence comparison (BLAST, ClustalW, and ClustalW-pairwise) versus PSSMs from seed alignments by structure comparison (VAST). We note that the averages include all hits, not just those detected by PSSMs from all four seed alignment methods, and that average contact specificity is somewhat lower than in Figure 2 for this reason. It is apparent from Figure 3 that nearly all points fall below the diagonal, indicating that average PSSM-sequence alignment accuracy for nearly all sequences is greater when seed alignments are based on structure comparison. The only exceptions to the pattern are a few proteins where BLAST seed alignments lead to somewhat greater contact specificity; these are cases where the BLAST seed alignment is shorter than the VAST seed alignment, focusing on a highly similar region. It is striking that contact specificity sometimes rises from near zero, with PSSMs based on sequence alignments, to values over 70%, with PSSMs based on structure alignments.
Measurement of PSSM recognition sensitivity
The test set we employ in these experiments is based on 172 proteins of known structure, each of which is structurally similar to a large and diverse group of other known structures (see Materials and Methods). Each of these test-set proteins is used as a template sequence for calculation of PSSMs from seed alignments calculated by the different methods we evaluate, and our basic measure of PSSM sensitivity is simply the fraction of the structure neighbors of the test-set protein recognized with a significant PSI-BLAST E-value. To examine the effect on PSSM recognition sensitivity of increasing diversity among sequences in the seed alignment, we furthermore assign each test set protein to a particular range of seed alignment diversity. Because this assignment is based in part on the availability of structure neighbors in that diversity range, the proportion of structure neighbors with sequence similarity sufficient for recognition by PSSM-sequence comparison may vary among seed-alignment diversity ranges.
To correct for any differences in diversity in the structure-neighbor set we employ as a standard of truth, we also examine the number of structure neighbors recognized relative to the number we would expect to recognize, if PSSM performance were equivalent across different seed-alignment diversity ranges. We estimate the number of structure neighbors we would expect to recognize simply as those with greater than 12% identical residues in VAST structure-structure alignment. This threshold was previously identified as the point where homologous structure neighbors, related by descent from a common ancestral gene, begin to outnumber "analogous" structure neighbors, which may reflect convergent evolution (Matsuo and Bryant 1999). We emphasize that comparison of PSSM performance for different seed alignment methods, within a given seed alignment diversity range, is not affected by this correction, because the total number of structure neighbors recognized is simply divided by a constant. The correction is useful for comparison of PSSM sensitivity across seed-alignment diversity ranges.
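The correction described above can be illustrated with a minimal Python sketch. The function names and the list-of-percentages input are assumptions made for illustration; only the 12% identity threshold and the idea of dividing the recognized count by the expected count come from the text.

```python
def expected_recognizable(neighbor_identities, threshold=12.0):
    """Count structure neighbors whose percent identity in the
    structure-structure alignment is strictly greater than the threshold."""
    return sum(1 for pct in neighbor_identities if pct > threshold)


def relative_sensitivity(n_recognized, neighbor_identities, threshold=12.0):
    """Recognized neighbors divided by the number expected to be recognizable.

    Returns None when no neighbor exceeds the threshold, because the
    ratio is undefined in that case (a choice made for this sketch).
    """
    expected = expected_recognizable(neighbor_identities, threshold)
    if expected == 0:
        return None
    return n_recognized / expected


# Hypothetical percent identities for the pooled neighbors of one diversity range.
identities = [5.0, 12.0, 13.5, 20.0, 40.0]
print(expected_recognizable(identities))      # neighbors above 12% identity
print(relative_sensitivity(2, identities))    # recognized / expected
```

Because the denominator is the same for every seed alignment method within a diversity range, the ranking of methods inside a range is unchanged by this normalization.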
Diversity of seed alignments determines recognition sensitivity
In Figure 4 we plot structure neighbor recognition rates, for PSSMs calculated from seed alignments by different methods, for ranges of seed-alignment diversity. One pattern apparent from Figure 4 is that even the most sensitive PSSMs still miss many similarities detected by structure-structure comparison, such that average overall sensitivity is only about 20%. This reflects the difficulty of the test set, where many structure neighbors have no detectable sequence similarity. Using a different set of structure neighbors as the standard of truth, Brenner and colleagues similarly found that only a small fraction of structure neighbors may be detected by sequence-sequence or PSSM-sequence comparison methods (Brenner et al. 1998).
10e-5 is used in the PSI-BLAST search (not shown). Several investigators have suggested that recognition sensitivity should in general vary with the degree of divergence of the aligned sequences used to calculate the PSSM (Park et al. 1998; Aravind and Koonin 1999; Salamov et al. 1999; Rychlewski et al. 2000), and the current analysis supports this conclusion.
It is also apparent from Figure 5 that for seed alignments below 30% average pairwise identity or above an average of four residue types per aligned position there is some variation of recognition sensitivity with alignment type. Recognition sensitivity decreases with the pattern global-sequence (ClustalW) > local-structure (VAST)
global-sequence (ClustalW-pairwise) > local-sequence (BLAST), although it is also apparent that this variation is smaller than the effect of seed-alignment diversity. Global sequence alignments (ClustalW and ClustalW-pairwise) presumably perform better than local sequence alignment (BLAST) because the domain pairs in the test set are indeed globally similar, and the PSSMs formed from these longer alignments are more sensitive, an effect observed before (Thompson et al. 1999; Notredame et al. 2000). As can be seen from comparing the same alignment algorithm (ClustalW) used in two different ways, the multiple alignment method ClustalW performs better than ClustalW used in a pairwise fashion. Indeed, in situations where the seed alignment is constructed from different subfamilies, sequence motifs common to a subset of neighbors but not present in the test-set domain can only be aligned correctly using multiple alignment tools.
To examine the effect of gap content on PSSM recognition sensitivity we compare the recognition sensitivity of PSSMs derived from alignments with approximately the same fraction of gaps, as shown in Table 2. Only alignments with average sequence identity below 30% are included. As can be seen from Table 2, seed alignments with more gaps produce less sensitive PSSMs for all alignment algorithms, presumably because the sequences in these alignments are among the most diverse. Interestingly, for seed alignments containing equal fractions of gaps, the local alignment methods perform as well as the global alignment methods, and the local-structure (VAST) method gives the most sensitive PSSMs. The local alignments are presumably more accurate in the regions they have aligned, and when the alignment is as extensive as that from ClustalW, containing an equal fraction of gaps, PSSM sensitivity is comparable.
| Discussion |
|---|
Although the method of seed alignment construction seems to have little effect on search sensitivity, we find just the opposite result with respect to the accuracy of PSSM-sequence alignments. When we examine the accuracy of the 3D molecular model implied by the PSSM-sequence alignment, we find that structure-based seed alignments produce PSSMs that better detect and reproduce the conserved core structure characteristic of a protein family. This effect is most pronounced for PSSM-sequence alignments where the PSSM is derived from seed alignments of very diverse sequences, but it is apparent from the data presented in Figures 2 and 3 that use of structure-based seed alignments rarely leads to a decrease in alignment accuracy. Because the PSI-BLAST algorithm in general tends not to start or extend HSPs (high-scoring segment pairs) where there were gaps in the seed alignment, it is perhaps not surprising that PSSM-sequence alignments, for PSSMs from structure-alignment seeds, which have no gaps within core elements, better reproduce the conserved structural scaffold. This observation suggests a simple strategy for improving the accuracy of PSSM-sequence alignments and the reliability of annotations derived from them: whenever possible, use structure alignments as seeds for PSSM construction.
Last, it is interesting to ask why PSSMs from seed alignments by structure-structure comparison can have a strong effect on PSSM-sequence alignment accuracy, but relatively little effect on recognition sensitivity. This result at first appears paradoxical, because one might suppose that more accurate seed alignments, if this indeed accounts for the effects we observe, might lead to improvements in both sensitivity and accuracy. The explanation may simply be that characteristic sequence motifs, sufficient for sensitive recognition of family members, are in general well detected by all of the seed alignment methods we consider. The improved alignment accuracy we observe for PSSMs from structure-structure alignment suggests that they may better represent additional regions of similarity, where sequence similarity is too weak for accurate alignment by sequence comparison algorithms. Precisely because sequence similarity in these regions is weak, however, one would not expect a large change in the information content of PSSMs calculated from these seed alignments, or a large change in recognition sensitivity. We can thus suggest that the primary effect of using structure-structure alignments in PSSM construction will be to improve PSSM-sequence alignment accuracy, and as a consequence, the accuracy of annotation transfer from known family members to new sequences detected by PSSM-based search tools. If one knows the locations of active site residues in a structure-based seed alignment, for example, one may expect that a PSSM-sequence alignment based on that seed may more accurately identify the homologous active site residues in new sequences.
| Materials and methods |
|---|
We omit from the test set sequence-discontinuous domains (except for 1TDE 1), domains that have fewer than five structure neighbors within the test set, and domains longer than 250 residues. The requirement for five or more sequence-dissimilar structure neighbors is restrictive, because there are few protein families for which this many structures are known, and it reduces the test set to a total of 172 domains. Structure neighbors are again taken from the database distributed with Entrez (http://www.ncbi.nlm.nih.gov/Entrez/), as identified by the VAST structure comparison algorithm (Madej et al. 1995; Gibrat et al. 1996). We note that structural similarities detected by the VAST algorithm have previously been compared to the SCOP classification (Murzin et al. 1995; Matsuo and Bryant 1999; Przytycka et al. 1999). According to SCOP (Murzin et al. 1995), this set of 172 domains includes five different classes, 50 folds, and 73 superfamilies, and by this criterion may be considered a diverse sample of protein domains.
Homologous structure neighbors
We include in seed alignments only sequences from homologous structure neighbors. Homologous structure neighbors for each test-set domain are chosen from the complete structure neighbor set on the basis of significant sequence similarity or extensive structural similarity consistent with descent from a common ancestral gene. We examine all VAST neighbors within MMDB, but include only those neighbors classified as belonging to the same homologous superfamily by the authors of SCOP (Murzin et al. 1995), or, when no SCOP classification is available, with more than 12% sequence identity in the VAST structure alignment. These structure neighbors are then used to calculate the "homologous core substructure" (HCS) of the template domain, and any additional structure neighbors superimposing onto 90% or more of the HCS residues are recruited as additional homologous neighbors, as described previously (Matsuo and Bryant 1999).
Because we anticipate that the diversity of sequences in a seed alignment will have an effect on PSSM performance, we do not select all homologous structure neighbors of a given template domain when constructing seed alignments for PSSMs. Instead, we sample randomly among them, to select a subset of neighbors with a defined range of sequence similarity. We assign each template domain randomly to a range of sequence similarity, 0-10%, >10-20%, etc. We then randomly choose homologous structure neighbors exhibiting this range of sequence similarity with respect to the template, iterating this process until at least 5 but no more than 50 neighbors are selected. If this process fails for a given template domain and target similarity range, due to insufficient structure neighbors, that domain is randomly exchanged for another from a different similarity range, and neighbor selection begun anew.
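One possible reading of this sampling step is sketched below in Python. The function name, the dictionary input, and the decision to shuffle the eligible neighbors and keep at most 50 are assumptions made for illustration; the text specifies only the similarity bins and the limits of 5 and 50 neighbors. The exchange of a failed template for one from another similarity range is not modeled here.

```python
import random


def select_neighbors(identities, low, high, rng, min_n=5, max_n=50):
    """Randomly pick homologous neighbors within one similarity bin.

    identities: dict mapping neighbor id -> percent identity to the template.
    The bin is (low, high]; the first bin (low == 0) also includes 0%.
    Returns a list of neighbor ids, or None when fewer than min_n
    neighbors fall in the bin (the "process fails" case in the text).
    """
    def in_bin(pct):
        return (low < pct <= high) or (low == 0 and pct == 0)

    # Sort first so that results depend only on the seed of rng.
    candidates = sorted(n for n, pct in identities.items() if in_bin(pct))
    if len(candidates) < min_n:
        return None
    rng.shuffle(candidates)
    return candidates[:max_n]


# Hypothetical neighbors of one template domain assigned to the >10-20% bin.
neighbors = {f"n{i}": 15.0 for i in range(7)}
neighbors.update({"far": 5.0, "near": 45.0})
print(select_neighbors(neighbors, 10, 20, random.Random(0)))
```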
Construction of seed alignments
Structure alignments of the template domain with its homologous structure neighbors are taken directly from VAST alignments as described above. Pairwise sequence alignments between the template domain and these neighbors are calculated using the gapped BLAST algorithm (Altschul et al. 1997) and the ClustalW algorithm (Thompson et al. 1994), applied separately for the template and each neighbor sequence. To produce multiple sequence alignments we apply the ClustalW algorithm to the template and all selected neighbors. In this application ClustalW constructs the alignment progressively, grouping the most similar sequences into aligned clusters and aligning larger and larger alignment clusters with one another (Thompson et al. 1994). As a result, we end up with four different types of seed alignment, one based on automatic structure-structure alignment (VAST) and three based on different automatic sequence-sequence alignment algorithms (BLAST, ClustalW-pairwise, and ClustalW), each run with default parameters suggested by the authors of the algorithm.
We derive PSSMs from seed alignments using the default method of PSI-BLAST. Aligned sequences are projected onto the template domain (insertions relative to the template are ignored) and a PSSM calculated using the pseudocount method described previously (Altschul et al. 1997). PSSMs derived from each type of alignment are used to initialize searches of a database consisting of nonidentical sequences from PDB. We perform only a single iteration of PSI-BLAST (Altschul et al. 1997), so as to avoid any modification of the seed-derived PSSM, and collect hits with E-values below 0.01. We emphasize that seed alignments based on VAST, BLAST, ClustalW-pairwise, and ClustalW methods contain exactly the same template and neighbor sequences, differing only in the algorithm that has been employed in building the alignment.
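The projection of aligned sequences onto the template can be illustrated with a short Python sketch. It assumes gapped alignment rows of equal length that use "-" as the gap character, and the function names are hypothetical. The per-column counts shown are raw residue counts only; they do not reproduce the sequence weighting or pseudocount scheme that PSI-BLAST applies when it computes the actual PSSM.

```python
from collections import Counter


def project_onto_template(aligned_template, aligned_neighbor, gap="-"):
    """Drop every alignment column where the template has a gap.

    The result has one character per template residue, so insertions
    in the neighbor relative to the template are ignored.
    """
    if len(aligned_template) != len(aligned_neighbor):
        raise ValueError("aligned rows must have equal length")
    return "".join(n for t, n in zip(aligned_template, aligned_neighbor) if t != gap)


def column_counts(projected_rows, gap="-"):
    """Raw residue counts at each template position; gaps are not counted."""
    if not projected_rows:
        return []
    length = len(projected_rows[0])
    if any(len(row) != length for row in projected_rows):
        raise ValueError("projected rows must have equal length")
    return [
        Counter(row[i] for row in projected_rows if row[i] != gap)
        for i in range(length)
    ]


# Two hypothetical pairwise alignments of the template "ACDE" with neighbors.
rows = [
    project_onto_template("AC-DE", "ACWD-"),  # the inserted W is dropped
    project_onto_template("ACDE", "GC-E"),
]
print(rows)
print(column_counts(rows))
```

Because every projected row has the length of the template, rows from pairwise alignments produced by different methods can be stacked directly, which is what allows the same template and neighbor sequences to yield different seed alignments.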
Fold recognition sensitivity and alignment accuracy
The VAST structure neighbors of each template domain provide a standard of truth for judging the sensitivity of PSSMs constructed from each type of seed alignment. Hits to structurally similar neighbors are considered true positives, misses of structurally similar neighbors are considered false negatives. We thus calculate fold recognition sensitivity as (Ntr.pos/NVAST.neigh) and fold recognition specificity as (Ntr.pos/NPDB.hits). Here, Ntr.pos is the number of nonidentical structure neighbors detected with E-value below a given threshold, NVAST.neigh is the overall number of nonidentical structure neighbors for a given domain, and NPDB.hits the total number of sequences (with known structure) that are identified with significant PSI-BLAST E-value. We note that these counts exclude any structure neighbors used in the seed alignments or any sequences identical to them. Counts may include sequences similar to those in the seed alignment, however, because our intention is to compare PSSM performance with respect to retrieval and accurate alignment of database sequences spanning a range of similarities with respect to the sequences in the seed alignment.
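As a minimal sketch of these two ratios, the Python function below treats hits and structure neighbors as sets of identifiers. The function name and the `excluded` argument (standing for seed-alignment members and sequences identical to them) are assumptions made for illustration, and returning None for an empty denominator is a choice of this sketch rather than something stated in the text.

```python
def fold_recognition_rates(hits, structure_neighbors, excluded=frozenset()):
    """Return (sensitivity, specificity) for one PSSM search.

    hits: identifiers of known-structure sequences found with significant E-value.
    structure_neighbors: identifiers of structure neighbors of the template.
    excluded: identifiers left out of all counts.
    """
    hits = set(hits) - set(excluded)
    neighbors = set(structure_neighbors) - set(excluded)
    true_positives = hits & neighbors

    sensitivity = len(true_positives) / len(neighbors) if neighbors else None
    specificity = len(true_positives) / len(hits) if hits else None
    return sensitivity, specificity


# Hypothetical search: "s" was part of the seed alignment and is ignored.
sens, spec = fold_recognition_rates(
    hits={"a", "b", "x", "s"},
    structure_neighbors={"a", "b", "c", "d", "s"},
    excluded={"s"},
)
print(sens, spec)
```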
To evaluate the accuracy of the PSSM-sequence alignments we use the known structures of the template domains and of the database proteins identified by PSI-BLAST search. Using the known structure of the identified database protein as the standard of truth, we evaluate the accuracy of the molecular model implied by the PSSM-sequence alignment of the template domain with the sequence of that protein. We employ the numerical measure contact specificity, defined as the percent of nonlocal residue contacts in the predicted structure that are also present in the experimental structure (Marchler-Bauer and Bryant 1997): ACSpc = Ncp/Np. Here, Ncp is the number of nonlocal contacts (for residues separated along the chain by at least five peptide bonds and having C
-atoms less than 8 Angstroms apart) that occur in both the molecular model implied by PSSM-sequence alignment and in the experimental structure of the database protein. Np is the total number of nonlocal contacts in the predicted model. As in calculation of recognition sensitivity, family members present in the seed alignment are ignored in evaluation of alignment accuracy. In comparing accuracy of PSSM-sequence alignments for PSSMs derived from different seed alignments, we average contact specificity across the various database proteins identified by each PSSM.
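The contact specificity calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: each residue is represented by a single coordinate (which atom is used is left to the caller), residue indices are taken to be consecutive chain positions, and the model's contacts are obtained by carrying template contacts over to database-protein numbering through a hypothetical `alignment` mapping. All function names are invented for this example.

```python
import math


def nonlocal_contacts(coords, min_separation=5, cutoff=8.0):
    """Residue pairs at least min_separation apart in sequence whose
    representative atoms lie closer than cutoff Angstroms.

    coords: dict mapping residue index -> (x, y, z).
    """
    indices = sorted(coords)
    contacts = set()
    for pos, i in enumerate(indices):
        for j in indices[pos + 1:]:
            if j - i >= min_separation and math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


def map_contacts(template_contacts, alignment):
    """Carry template contacts over to database-protein numbering.

    alignment: dict mapping template residue index -> aligned database
    residue index. Contacts with an unaligned residue are dropped.
    """
    mapped = set()
    for i, j in template_contacts:
        if i in alignment and j in alignment:
            a, b = alignment[i], alignment[j]
            mapped.add((min(a, b), max(a, b)))
    return mapped


def contact_specificity(model_contacts, experimental_contacts):
    """Percent of model contacts also present in the experimental structure."""
    if not model_contacts:
        return None
    shared = model_contacts & experimental_contacts
    return 100.0 * len(shared) / len(model_contacts)


# Toy example with hypothetical coordinates and alignment.
template = {0: (0.0, 0.0, 0.0), 1: (3.8, 0.0, 0.0), 5: (0.0, 5.0, 0.0), 6: (20.0, 0.0, 0.0)}
model = map_contacts(nonlocal_contacts(template), {0: 10, 5: 15})
print(contact_specificity(model, {(10, 15), (30, 40)}))
```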
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Aravind, L. and Koonin, E.V. 1999. Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J. Mol. Biol. 287: 1023-1040.
Berman, H.M., Bhat, T.N., Bourne, P.E., Feng, Z., Gilliland, G., Weissig, H., and Westbrook, J. 2000. The Protein Data Bank and the challenge of structural genomics. Nat. Struct. Biol. (Suppl.) 7: 957-959.
Brenner, S.E., Chothia, C., and Hubbard, T.J. 1998. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc. Natl. Acad. Sci. 95: 6073-6078.
Chambers, J.M. 1998. Programming with data. A guide to the S language. Springer-Verlag, New York.
Chothia, C. and Lesk, A.M. 1986. The relation between the divergence of sequence and structure in proteins. EMBO J. 5: 823-826.
Eddy, S.R. 1996. Hidden Markov models. Curr. Opin. Struct. Biol. 6: 361-365.
Flores, T.P., Orengo, C.A., Moss, D.S., and Thornton, J.M. 1993. Comparison of conformational characteristics in structurally similar protein pairs. Protein Sci. 2: 1811-1826.
Gibrat, J.F., Madej, T., and Bryant, S.H. 1996. Surprising similarities in structure comparison. Curr. Opin. Struct. Biol. 6: 377-385.
Gribskov, M., McLachlan, A.D., and Eisenberg, D. 1987. Profile analysis: Detection of distantly related proteins. Proc. Natl. Acad. Sci. 84: 4355-4358.
Holm, L. 1998. Unification of protein families. Curr. Opin. Struct. Biol. 8: 372-379.
Holm, L. and Sander, C. 1994. Parser for protein folding units. Proteins 19: 256-268.
Hubbard, T.J. and Blundell, T.L. 1987. Comparison of solvent-inaccessible cores of homologous proteins: Definitions useful for protein modelling. Protein Eng. 1: 159-171.
Hughey, R. and Krogh, A. 1996. Hidden Markov models for sequence analysis: Extension and analysis of the basic method. Comput. Appl. Biosci. 12: 95-107.
Karplus, K., Sjolander, K., Barrett, C., Cline, M., Haussler, D., Hughey, R., Holm, L., and Sander, C. 1997. Predicting protein structure using hidden Markov models. Proteins Suppl. 1: 134-139.
Kelley, L.A., MacCallum, R.M., and Sternberg, M.J.E. 1999. Recognition of remote protein homologies using three-dimensional information to generate a position specific scoring matrix in the program 3D-PSSM. Proceedings of RECOMB, Lyon, France.
Kelley, L.A., MacCallum, R.M., and Sternberg, M.J. 2000. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299: 499-520.
Madej, T., Gibrat, J.F., and Bryant, S.H. 1995. Threading a database of protein cores. Proteins 23: 356-369.
Marchler-Bauer, A., Addess, K.J., Chappey, C., Geer, L., Madej, T., Matsuo, Y., Wang, Y., and Bryant, S.H. 1999. MMDB: Entrez's 3D structure database. Nucleic Acids Res. 27: 240-243.
Marchler-Bauer, A. and Bryant, S.H. 1997. Measures of threading specificity and accuracy. Proteins Suppl. 1: 74-82.
Matsuo, Y. and Bryant, S.H. 1999. Identification of homologous core structures. Proteins 35: 70-79.
Moult, J., Hubbard, T., Bryant, S.H., Fidelis, K., and Pedersen, J.T. 1997. Critical assessment of methods of protein structure prediction (CASP): Round II. Proteins Suppl. 1: 2-6.
Murzin, A.G. 1998. How far divergent evolution goes in proteins. Curr. Opin. Struct. Biol. 8: 380-387.
Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536-540.
Neuwald, A.F., Liu, J.S., Lipman, D.J., and Lawrence, C.E. 1997. Extracting protein alignment models from the sequence database. Nucleic Acids Res. 25: 1665-1677.
Notredame, C., Higgins, D.G., and Heringa, J. 2000. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302: 205-217.
Panchenko, A.R., Marchler-Bauer, A., and Bryant, S.H. 2000. Combination of threading potentials and sequence profiles improves fold recognition. J. Mol. Biol. 296: 1319-1331.
Park, J., Karplus, K., Barrett, C., Hughey, R., Haussler, D., Hubbard, T., and Chothia, C. 1998. Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J. Mol. Biol. 284: 1201-1210.
Przytycka, T., Aurora, R., and Rose, G.D. 1999. A protein taxonomy based on secondary structure. Nat. Struct. Biol. 6: 672-682.
Rost, B. 1997. Protein structures sustain evolutionary drift. Fold. Des. 2: S19-S24.
Russell, R.B. and Barton, G.J. 1994. Structural features can be unconserved in proteins with similar folds. An analysis of side-chain to side-chain contacts secondary structure and accessibility. J. Mol. Biol. 244: 332-350.
Rychlewski, L., Jaroszewski, L., Li, W., and Godzik, A. 2000. Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci. 9: 232-241.
Salamov, A.A., Suwa, M., Orengo, C.A., and Swindells, M.B. 1999. Genome analysis: Assigning protein coding regions to three-dimensional structures. Protein Sci. 8: 771-777.
Sauder, J.M., Arthur, J.W., and Dunbrack, R.L., Jr. Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins 40: 6-22.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673-4680.
Thompson, J.D., Plewniak, F., and Poch, O. 1999. A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 27: 2682-2690.
Wood, T.C. and Pearson, W.R. 1999. Evolution of protein sequences and structures. J. Mol. Biol. 291: 977-995.