1 EMBL, 69117 Heidelberg, Germany; 2 Max Delbrück Center for Molecular Medicine, 13125 Berlin-Buch, Germany
Reprint requests to: Robert B. Russell, EMBL, Meyerhofstrasse 1, 69117 Heidelberg, Germany; e-mail: russell{at}embl.de; fax: +49-6221-387517.
(RECEIVED October 21, 2004; FINAL REVISION February 1, 2005; ACCEPTED February 1, 2005)
| Abstract |
|---|
Keywords: structural genomics; protein structure; domain family; evolution
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.041187405.
| Introduction |
|---|
Structural genomics initiatives often aim to use 3D structure as a means to identify function (Zhang and Kim 2003). New structures can reveal binding sites or features that give hints about function, but the best functional inferences come when a new structure reveals an overall similarity to another structure that was not apparent from sequence comparison (Zhang et al. 2000). Such similarities can often allow the two families to be merged and much functional information to be transferred between them (Fig. 1A). This is a central goal of many initiatives, and this strategy has led to many successful annotations of function (Heger and Holm 2001; Aloy et al. 2002).
| Results |
|---|
SCOP classifies proteins into a hierarchy of similarity. Within families, protein domains are perhaps most similar in character to those contained in SMART or Pfam: homology can be very remote, but the similarity is typically detectable by sensitive sequence comparison methods. Superfamilies are the next level up the hierarchy; here a common evolutionary origin has been inferred by comparison of 3D structures, typically owing to the presence of a common active or binding site, or of unusual shared features unlikely to arise by chance. At the next level are folds, where proteins adopt similar 3D structures without any compelling evidence for a common ancestor. Proteins at this level might still be remote homologs, but there is insufficient evidence to argue the case convincingly. For this study we merged pairs of Pfam or SMART alignments by means of structures that are similar at the superfamily or fold level. Figure 1B shows the strategy. For each merged alignment we prepared profiles, which we then used to search all sequences in the domain database. We then tested whether the searches retrieved additional members of the superfamily or fold, or new domain families lacking a representative of known structure.
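To make the hierarchy concrete, the Python sketch below compares two SCOP concise classification strings (sccs, of the form class.fold.superfamily.family) and reports the deepest level they share. A pair related at the superfamily or fold level, but not in the same family, is the kind of pair merged here. The helper names `shared_scop_level` and `mergeable` and the example identifiers are hypothetical illustrations, not part of the authors' pipeline.

```python
# Hypothetical helper (not from the paper): compare two SCOP sccs strings
# of the form class.fold.superfamily.family and return the deepest level shared.
LEVELS = ("class", "fold", "superfamily", "family")


def shared_scop_level(sccs_a: str, sccs_b: str) -> str:
    a, b = sccs_a.split("."), sccs_b.split(".")
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected sccs strings like 'c.69.1.1'")
    shared = "none"
    for depth, level in enumerate(LEVELS, start=1):
        # A level counts as shared only if every field down to it agrees.
        if a[:depth] != b[:depth]:
            break
        shared = level
    return shared


def mergeable(sccs_a: str, sccs_b: str) -> bool:
    # Pairs related at the superfamily or fold level (but in different
    # families) are the candidates for structure-based merging.
    return shared_scop_level(sccs_a, sccs_b) in ("superfamily", "fold")


print(shared_scop_level("c.69.1.1", "c.69.1.2"))
print(mergeable("a.4.5.1", "a.4.1.1"))
print(mergeable("a.4.5.1", "c.69.1.1"))
```

Comparing identifier prefixes works because each sccs field is only meaningful within its parent level, so two domains share a fold exactly when their first two fields agree.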
Overall performance of structure-based profiles
Since many Pfam or SMART domains include members of known 3D structure, we could easily define positive and negative matches (true or false) in database search results as those belonging to the same or different folds. To study how merged profiles change database search performance, we plotted sensitivity and specificity against E-value threshold (Fig. 2, top and middle). Definitions of sensitivity and specificity are given in the Materials and Methods section. We also plotted true positive rate versus false positive rate (ROC curves) as a standard evaluation of performance (Fig. 2, bottom). The numbers of true positives and false positives produced by the merged and unmerged profile searches are shown in Supplementary Table 1A,B for interested readers.
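The threshold sweep behind ROC curves like those in Figure 2 can be sketched as follows, assuming each database hit has already been labeled true or false by fold membership. The function `roc_points` and the toy hit list are hypothetical. The sketch simply counts labeled hits at or below each E-value cutoff.

```python
from typing import List, Tuple

# Each hit is (E-value, is_true); is_true marks a hit in the same fold as the
# query profile. The toy values below are invented for illustration.
Hit = Tuple[float, bool]


def roc_points(hits: List[Hit], thresholds: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) for each E-value cutoff."""
    n_true = sum(1 for _, is_true in hits if is_true)
    n_false = len(hits) - n_true
    points = []
    for cutoff in sorted(thresholds):
        # A hit counts as a positive when its E-value is at or below the cutoff.
        tp = sum(1 for e, is_true in hits if is_true and e <= cutoff)
        fp = sum(1 for e, is_true in hits if not is_true and e <= cutoff)
        tpr = tp / n_true if n_true else 0.0
        fpr = fp / n_false if n_false else 0.0
        points.append((fpr, tpr))
    return points


toy_hits = [(1e-10, True), (1e-3, True), (0.2, False), (0.4, True), (5.0, False), (8.0, False)]
for fpr, tpr in roc_points(toy_hits, [0.01, 0.5, 10.0]):
    print(f"FPR={fpr:.2f} TPR={tpr:.2f}")
```

Loosening the cutoff can only add positives, so both rates rise monotonically along the curve. Comparing two such curves (merged versus unmerged profiles) is what the bottom panel of Figure 2 does.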
Overall, Figure 2 also shows that structure-based merged profiles do not perform better than their unmerged counterparts: For BLAST they are marginally better, for HMMer marginally worse. However, inspection of the results (Table 1) shows that performance varies considerably: Some merged profiles outperform their unmerged equivalents, detecting more related sequences in the database. At the superfamily level, searches with 308 merged HMMer profiles and 407 merged PSI-BLAST profiles find more sequences of the same fold (annotated with structures) or of novel families ("novels"). At the fold level, 167 merged HMMer profiles and 159 merged PSI-BLAST profiles retrieve additional members, whereas only 4.87% of unmerged alignments find more related families with PSI-BLAST and 6.25% with HMMer. The combined strategy thus finds more members of diverse families, though not all of them. This is as expected, as some very distantly related domains have conserved key functional and structural residues, while others show little beyond an overall similarity in structure. The best overall strategy is therefore to complement searches with the merged profiles by searches with their unmerged counterparts (Fig. 2, bottom).
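The combined strategy can be sketched as a union of the hits from merged and unmerged searches. This is a minimal sketch assuming hits are (sequence id, E-value) pairs and that, when both searches find a sequence, the lower E-value is kept. The paper does not specify how overlapping hits were reconciled, and E-values from different profiles are not strictly comparable. The threshold of 0.5 in the usage line mirrors the HMMer cutoff used for further analysis; `combine_hits` and the sequence ids are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# One search is an iterable of (sequence id, E-value) hits.
Search = Iterable[Tuple[str, float]]


def combine_hits(*searches: Search, threshold: float) -> Dict[str, float]:
    """Union of hits passing `threshold`, keeping the lowest E-value per sequence."""
    best: Dict[str, float] = {}
    for search in searches:
        for seq_id, evalue in search:
            if evalue > threshold:
                continue  # not a positive at this cutoff
            if seq_id not in best or evalue < best[seq_id]:
                best[seq_id] = evalue
    return best


merged = [("seqA", 1e-5), ("seqB", 0.3)]
unmerged = [("seqA", 1e-8), ("seqC", 0.04), ("seqD", 2.0)]
print(combine_hits(merged, unmerged, threshold=0.5))
```

Here seqB is contributed only by the merged profile and seqC only by the unmerged one, which is the complementarity the combined strategy relies on.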
Figure 3 shows examples of relationships found with merged profiles from particular folds. Most profiles identify more than one related family. For example, the merged profile of the SMART domains HTH_ARSR and ETS finds the families PAX, HTH_CRP, HTH_ICLR (with known structures), and HTH_DTXR (without a representative structure). The merged profile of the Pfam CheR and RrnaAD domains finds Methyltransf_3, FtsJ, Fibrillarin, PCMT (with known structures), Methyltransf_2, Met_10, and Ubie_methyltransf (without a representative structure). Families are often identified by more than one merged profile: For example, HTH_ICLR is found by both the HTH_ARSR/HSF and ETS/HTH_CRP profiles, and Met_10 by both the CheR/RrnaAD and PARP_reg/Methyltransf_3 profiles.
For further analysis (including benchmarking on genomes), we chose conservative E-value thresholds of 10⁻² (BLAST) and 0.5 (HMMer), which introduce no additional false positives (specificity close to 1) but still uncover new similarities with the merged profiles (Fig. 2).
Annotation of structures in completed genomes
To quantify the gain in sensitivity, we searched the genome sequences of Mycoplasma genitalium (Fraser et al. 1995), Escherichia coli K12 (Blattner et al. 1997), Streptococcus pneumoniae R6 (Hoskins et al. 2001), and Saccharomyces cerevisiae (Goffeau et al. 1996) with structure-based profiles. We chose these genomes because they are very well annotated. We compared our assignments to those of SUPERFAMILY (HMM profiles of SCOP 1.59; Gough and Chothia 2002), AnDom (IMPALA profiles of SCOP 1.59; Schmidt et al. 2002), and BLAST. Figure 4 plots, for each genome, the number of assignments made by our profiles but not by the other methods (the assignment difference), for HMMer (Fig. 4A) and PSI-BLAST (Fig. 4B) searches. Although the starting set is limited to alignments that can be merged via structural alignment, the resulting profiles are much more powerful than BLAST and find more distant sequences than are present in either the SUPERFAMILY or AnDom assignments. On the other hand, we miss all the assignments provided by single-member superfamilies in SCOP. Figure 4 shows a direct increase in sensitivity in identifying families of known structure. HMMer searches again clearly outperform their BLAST counterparts: For example, HMMer profiles assign approximately twice as many new domains as BLAST for the E. coli genome (i.e., over and above those assigned by BLAST with single sequences). As the Mycoplasma genitalium genome has frequently been used as a benchmark, we asked whether the combined profiles could improve its structural annotation (Teichmann et al. 1999). Compared with the assignments of Teichmann et al. (1999), we made 56 (11%) additional assignments on the Mycoplasma genitalium genome. Part of this improvement could come from increases in database size and diversity since the original study.
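The assignment difference plotted in Figure 4 amounts to a set difference between two collections of assignments. The sketch below assumes each assignment is a (protein, SCOP superfamily) pair; the paper does not state the exact unit of comparison, and the function name and identifiers are invented for illustration.

```python
from typing import Dict, Set, Tuple

# An assignment is (protein id, SCOP superfamily id); the ids below are made up.
Assignment = Tuple[str, str]


def compare_assignments(ours: Set[Assignment], other: Set[Assignment]) -> Dict[str, int]:
    """Count assignments unique to each method and those they share."""
    return {
        "only_ours": len(ours - other),  # the "assignment difference"
        "only_other": len(other - ours),
        "shared": len(ours & other),
    }


profile_hits = {("prot1", "c.37.1"), ("prot2", "c.69.1"), ("prot3", "a.4.5")}
reference_hits = {("prot1", "c.37.1"), ("prot4", "b.1.1")}
print(compare_assignments(profile_hits, reference_hits))
```

Reporting the reverse difference ("only_other") as well is useful here, since it captures assignments such as those from single-member superfamilies that the merged profiles cannot make.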
Similarities among proteins with different folds
We also found instances where profiles suggested relationships between different folds. For example, profiles for the FAD/NAD(P)-binding domain fold find members of the NAD(P)-binding Rossmann fold (Pfam pair GDI + FAD_binding_3; DAO family at an HMMer E-value of 0.0067) and of the nucleotide-binding fold (3HCDH_N family at an HMMer E-value of 0.67). These three folds have previously been proposed to share a common ancestor but to have accumulated structural changes during the course of evolution (Grishin 2001).
Similarly, the chitin-binding domain in SMART (ChtBD2) is detected with very low E-values by merged profiles from different families of knottins (e.g., EGF + EGF_Lam, HMMer E = 3.4 × 10⁻⁹). SCOP classifies representative structures of the ChtBD2 domain as the only members of the Tachycitin fold, and knottins as members of the EGF/Laminin-like fold. Literature searches and superposition of representative structures show that chitin-binding proteins in invertebrates and plants share a common chitin-binding structural motif (Suetake et al. 2000).
Merged profiles of the zinc-finger motif also found small cysteine-rich repeats in many proteins. For example, merged profiles found two CXXC repeats in the TOPRIM domain of O29238 that are not identified by SMART or Pfam HMMs. The structure of the TOPRIM domain is known and in some cases contains an inserted zinc-finger motif (Rodriguez and Stock 2002). Such hits may reflect the ability of profiles to identify striking local similarities between proteins.
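For readers unfamiliar with the motif, a CXXC repeat is two cysteines separated by any two residues. The regular-expression scan below only illustrates what such a motif looks like in a sequence. It is not the profile search used here and, applied genome-wide, would report many chance matches. The function name and the test sequence are hypothetical.

```python
import re
from typing import List

# Zero-width lookahead so overlapping motifs (e.g. CXXCXXC) are all reported.
CXXC = re.compile(r"(?=(C..C))")


def find_cxxc(sequence: str) -> List[int]:
    """Return 1-based start positions of every C-X-X-C motif in a protein sequence."""
    return [m.start() + 1 for m in CXXC.finditer(sequence.upper())]


print(find_cxxc("MKCAVCGHKCPECLL"))
```

Using a lookahead rather than a plain `C..C` pattern matters for cysteine-rich zinc-binding regions, where consecutive motifs share a cysteine and a non-overlapping scan would miss the second one.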
Unification of superfamilies in the same fold
There were also many examples where mergers linked different superfamilies within the same fold, suggesting that they might indeed share a common evolutionary origin. For example, profiles from the merged immunoglobulin (Ig) variable and fibronectin type III (FN3) domains detect the purple acid phosphoesterases family at an HMMer E-value of 0.043 (SMART pair IGv + FN3). The HMMer profile of the merged IGc1 and IGc2 domains also finds FN3 at an E-value of 5.1 × 10⁻⁶ (SMART pair IGc1 + IGc2). Within the helix-turn-helix (HTH) fold, merged profiles of the arsenic resistance operon repressor and homeodomain families find the lux regulon domain (SMART pair HTH_ARSR + HOX at HMMer E = 0.13), which SCOP classifies as a C-terminal effector domain of the bipartite response regulators superfamily within the HTH fold.
The Ndr family
The Ndr (N-myc oncogene Downstream Regulated) family is named after a representative member found in the developing mouse embryo (Shimono et al. 1999); members are also found in humans, C. elegans, and Drosophila. The Ndr-containing MESK2 gene of Drosophila is implicated in the Ras oncogene signaling pathway (Huang and Rubin 2000) and hence in cancer. However, little is known about the molecular function of these proteins. Merged alignments of the Peptidase_S9 and α/β hydrolase Pfam families (both of which belong to the SCOP α/β hydrolase superfamily) find members of the Ndr family with HMMer E-values of 0.023. The relationship is confirmed by 3D-PSSM and SUPERFAMILY and was also noticed by the Pfam annotators as the highest-scoring match in the noise of the α/β hydrolase family profile (see http://Pfam.wustl.edu).
The predicted secondary structure of Ndr fits the predicted fold well, and there are also hints of key residue conservation: e.g., two invariant glycines found at the "nucleophile elbow" next to β5 in all α/β hydrolases are also conserved in Ndr homologs. A triad comprising a nucleophile (usually serine or cysteine), an acid (glutamate or aspartate), and a histidine forms the catalytic core of α/β hydrolases, with residues forming the oxyanion hole usually located on β-strands 5 and 3. The reaction chemistry of these proteins is similar to that of the serine protease and subtilisin families (Ollis et al. 1992; Heikinheimo et al. 1999). Although the Ser, Asp, and His catalytic triad of chloroperoxidase (the best match to a known structure) is not conserved in the Ndr family (being replaced by Gly, Ser/Ala, and Gly/Asp, respectively), other conserved positions containing these residues are close in space when Ndr is modeled onto chloroperoxidase. Ndrs might thus exhibit "active-site migration," which is known to occur in α/β hydrolases (Ollis et al. 1992) and other enzyme families (e.g., Todd et al. 2001, 2002).
The accessory domain of diacylglyceryl kinase
The merged profile for the CSF2 and IL2 domains of SMART, which belong to the very diverse four-helical cytokine superfamily, finds members of the diacylglyceryl kinase accessory (DAGKa) domain (BLAST E = 0.003). This prediction was not confirmed by fold recognition methods, though the predicted secondary structure and the conservation of certain hydrophobic residues broadly support the assignment (Supplementary Fig. 1). The prediction is surprising, since all four-helical cytokines are extracellular effectors, whereas DAGKas are intracellular. There is a precedent for such a phenomenon: Several members of the fibroblast growth factor family, which are otherwise extracellular effectors, are known to be intracellular signaling molecules (Schoorlemmer and Goldfarb 2001). However, there is also a good chance that this prediction is an artifact, particularly as the BLAST E-value is comparatively high. An alignment of DAGKa family members with members of the four-helical cytokine fold is available as Supplementary Figure 1.
Overall increase in structural knowledge
We also measured the increase in links to known structure as increasingly sensitive sequence comparison methods are used for annotation (Fig. 5). Considering only the 99,023 Swissprot sequences containing SMART or Pfam domains (Fig. 5), simple identity or homology assigns links to 27,036 (PDB, 2090; HSSP, 24,946). Many more links are added based on the presence of a domain with representatives of known structure (SMART + Pfam, 12,379). The 53 assignments in Table 1 add 1067 further links ("PB or Andom" in Fig. 5), of which 95 ("Unique") come from the seven novel families (Table 2). This brings the total number of links to 40,367, or 41% of the 99,023 sequences, of which 1% comes from merged profiles.
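The percentages in this paragraph follow from the quoted counts. The short calculation below uses only numbers from the text and shows that the 1% figure appears to be measured against all 99,023 sequences rather than against the 40,367 linked ones (1067/40,367 would be about 2.6%).

```python
# Counts quoted in the text; only the ratios are computed here.
swissprot_with_domains = 99_023
pdb_links, hssp_links = 2_090, 24_946
total_links = 40_367
merged_profile_links = 1_067

direct_links = pdb_links + hssp_links               # identity/homology links
coverage = total_links / swissprot_with_domains     # share of sequences linked
merged_share = merged_profile_links / swissprot_with_domains

print(direct_links)           # 27036
print(f"{coverage:.1%}")      # 40.8%
print(f"{merged_share:.1%}")  # 1.1%
```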
| Discussion |
|---|
Structural information has been used to aid the detection of remote homologs. Structure-dependent gap penalties, residue secondary structure, accessibility, and interaction pair preferences have all been incorporated into methods of fold recognition (Williams et al. 2001; Dietmann et al. 2002; Schonbrun et al. 2002; McGuffin and Jones 2003). Most recently, these methods have added information about homologous sequences to improve sensitivity (Kelley et al. 2000; Koretke et al. 2001; Williams et al. 2001). Here we have assessed the added value of combining sequence information for families that can only be aligned using knowledge of their structures.
Our findings also largely agree with those of others (Kelley et al. 2000; Panchenko and Bryant 2002), which suggest that more diverse alignments are generally less able to find distant homologs than their separate constituents. However, we have seen that this is not systematic: Some mergers increase sensitivity, while others decrease it. This suggests that the best strategy is a combined one, in which merged and separate searches complement each other to attain the best sensitivity. New structural similarities, when exploited using this strategy, will often uncover more new members of the superfamily than searches with the unmerged families in isolation.
Structural genomics initiatives continue to find new relationships between diverse protein structures. It is important to make the best use of these discoveries to find new members of diverse sequence families. Methods like that described here and elsewhere (Pandit et al. 2002) help to do this, and thus provide a more complete picture of the structure and function of proteins within genomes.
| Materials and methods |
|---|
Searching and evaluation of accuracy
BLAST and HMMer searches were made with a library of superfamily/fold alignment profiles, using default search parameters, against databases containing all protein sequences from Pfam and SMART. Profiles generated from Pfam alignments were used to search Pfam, and those generated from SMART alignments were used to search SMART. For evaluation purposes, we considered the subset of sequences whose SCOP superfamily and fold membership we knew by homology to a known structure. From the search results, we computed sensitivity and specificity using their standard definitions (Ingelfinger et al. 1987; Russell et al. 1998):
$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. The SCOP definitions of family, superfamily, and fold are used to categorize hits. Positives are those sequences with E-values at or better than a threshold value and are divided into true or false depending on whether or not they belong to the same superfamily or fold as the query profile. Negatives are those with E-values poorer than the threshold, divided into true or false in the opposite manner. As a control, we also performed equivalent HMMer and BLAST searches with separated (i.e., unmerged) profiles.
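The labeling rule for positives and negatives can be sketched in Python, assuming hits carry SCOP sccs identifiers (class.fold.superfamily.family) and that sensitivity and specificity take their standard forms. All names and the toy hits are hypothetical; `level` selects whether membership is judged at the superfamily or the fold level.

```python
from typing import Dict, List, Tuple

# Number of leading sccs fields that define membership at each SCOP level.
DEPTH = {"fold": 2, "superfamily": 3}


def same_group(query_sccs: str, hit_sccs: str, level: str) -> bool:
    depth = DEPTH[level]
    return query_sccs.split(".")[:depth] == hit_sccs.split(".")[:depth]


def confusion_counts(query_sccs: str, hits: List[Tuple[str, float]],
                     threshold: float, level: str) -> Dict[str, int]:
    """Classify (hit sccs, E-value) pairs as TP/FP/TN/FN at one E-value cutoff."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for hit_sccs, evalue in hits:
        related = same_group(query_sccs, hit_sccs, level)
        if evalue <= threshold:  # positive
            counts["TP" if related else "FP"] += 1
        else:  # negative: true/false labels are assigned the opposite way
            counts["FN" if related else "TN"] += 1
    return counts


def sensitivity_specificity(c: Dict[str, int]) -> Tuple[float, float]:
    sens = c["TP"] / (c["TP"] + c["FN"]) if c["TP"] + c["FN"] else 0.0
    spec = c["TN"] / (c["TN"] + c["FP"]) if c["TN"] + c["FP"] else 0.0
    return sens, spec


hits = [("c.69.1.4", 1e-6), ("c.69.1.9", 0.8), ("a.4.5.1", 0.3), ("b.1.1.1", 4.0)]
counts = confusion_counts("c.69.1.1", hits, threshold=0.5, level="superfamily")
print(counts, sensitivity_specificity(counts))
```

In this toy case the related hit at E = 0.8 is a false negative at a cutoff of 0.5 but becomes a true positive if the cutoff is relaxed to 1.0, which is the trade-off the E-value sweeps in the Results explore.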
Benchmarking on genomes
Protein sequences of the Mycoplasma genitalium, Streptococcus pneumoniae R6, Escherichia coli K12, and Saccharomyces cerevisiae genomes were downloaded from http://www.ncbi.nlm.nih.gov, searched using the above library of HMMer and PSI-BLAST profiles, and compared to assignments from SUPERFAMILY (Gough et al. 2001; Gough and Chothia 2002), AnDom (Schmidt et al. 2002), and BLAST (McGinnis and Madden 2004). We also compared our assignments for Mycoplasma genitalium to a standard benchmark by Teichmann et al. (1999), which combines assignments from many methods (Huynen et al. 1998; Wolf et al. 1999). AnDom assignments on genomes were made with the SCOP 1.59 profiles and an E-value cutoff of 0.001, obtained from the authors (Schmidt et al. 2002). SUPERFAMILY assignments were filtered with an E-value cutoff of 0.01, as recommended by the SAM authors (Karplus et al. 1998). Note that we require at least two structures related at the superfamily level to represent any given fold in our profiles; hence, we miss all the assignments provided by single-member superfamilies in SCOP. All data are available at http://www.bork.embl-heidelberg.de/~shah/pub/.
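The requirement of at least two structures per superfamily can be expressed as a simple filter. This sketch reads "structures" as distinct SCOP domain entries keyed by hypothetical domain ids; the actual selection procedure may differ, and the function name is illustrative only.

```python
from collections import Counter
from typing import Dict, Set


def eligible_superfamilies(domain_sccs: Dict[str, str], min_structures: int = 2) -> Set[str]:
    """Return superfamily ids (class.fold.superfamily) with >= min_structures domains."""
    # Truncate each sccs string to its superfamily prefix and count domains.
    per_superfamily = Counter(".".join(sccs.split(".")[:3]) for sccs in domain_sccs.values())
    return {sf for sf, n in per_superfamily.items() if n >= min_structures}


domains = {"d1": "c.69.1.1", "d2": "c.69.1.7", "d3": "a.4.5.1", "d4": "b.1.1.1", "d5": "b.1.1.2"}
print(sorted(eligible_superfamilies(domains)))
```

In this toy set the single-member superfamily a.4.5 is dropped, which mirrors why assignments from single-member superfamilies in SCOP cannot be reached by the merged profiles.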
Fold prediction and sequence analysis
Fold predictions were made with the Web servers 3D-PSSM (Kelley et al. 2000), SUPERFAMILY (Gough and Chothia 2002), and FUGUE (Shi et al. 2001), using several members of each family chosen from different branches of a neighbor-joining tree built with CLUSTALX (Jeanmougin et al. 1998). Fold recognition thresholds were taken from the relevant Web sites (3D-PSSM, E ≤ 0.05; SUPERFAMILY, E ≤ 0.01; FUGUE, Z ≥ 6.0). Secondary structures were predicted using Jpred2 (Cuff et al. 1998), Psipred (McGuffin et al. 2000), and PHD (Rost and Sander 1994). Structural alignments of representative structures of the four-helical cytokine families were prepared using STAMP (Russell and Barton 1992) and merged with the SMART alignment of the DAGKa family. The Ndr family was merged with the structurally aligned (STAMP) thioesterase family members.
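A prediction can be checked against the server-specific fold recognition cutoffs with a helper like the one below. The helper is hypothetical, and the comparison directions (E-values at or below the cutoff, FUGUE Z-scores at or above it) are assumptions about how the thresholds are applied.

```python
from typing import Dict, List, Tuple

# Cutoffs quoted in the text. Treating E-value cutoffs as "<=" and the FUGUE
# Z-score cutoff as ">=" is an assumption, as is this helper itself.
THRESHOLDS: Dict[str, Tuple[str, float]] = {
    "3D-PSSM": ("E", 0.05),
    "SUPERFAMILY": ("E", 0.01),
    "FUGUE": ("Z", 6.0),
}


def confident_servers(scores: Dict[str, float]) -> List[str]:
    """Return the servers whose score for a prediction passes their cutoff."""
    passed = []
    for server, score in scores.items():
        kind, cutoff = THRESHOLDS[server]
        # Lower is better for E-values; higher is better for Z-scores.
        ok = score <= cutoff if kind == "E" else score >= cutoff
        if ok:
            passed.append(server)
    return sorted(passed)


print(confident_servers({"3D-PSSM": 0.02, "SUPERFAMILY": 0.5, "FUGUE": 7.1}))
```

Keeping the score type next to each cutoff avoids the easy mistake of applying an E-value comparison to a Z-score, where the sense of "better" is reversed.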
| References |
|---|
Aloy, P., Oliva, B., Querol, E., Aviles, F.X., and Russell, R.B. 2002. Structural similarity to link sequence space: New potential superfamilies and implications for structural genomics. Protein Sci. 11: 1101–1116.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M., and Sonnhammer, E.L. 2002. The Pfam protein families database. Nucleic Acids Res. 30: 276–280.
Blattner, F.R., Plunkett 3rd, G., Bloch, C.A., Perna, N.T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J.D., Rode, C.K., Mayhew, G.F., et al. 1997. The complete genome sequence of Escherichia coli K-12. Science 277: 1453–1474.
Cuff, J.A., Clamp, M.E., Siddiqui, A.S., Finlay, M., and Barton, G.J. 1998. JPred: A consensus secondary structure prediction server. Bioinformatics 14: 892–893.
Dietmann, S., Fernandez-Fuentes, N., and Holm, L. 2002. Automated detection of remote homology. Curr. Opin. Struct. Biol. 12: 362–367.
Eddy, S.R. 1998. Profile hidden Markov models. Bioinformatics 14: 755–763.
Eswaramoorthy, S., Gerchman, S., Graziano, V., Kycia, H., Studier, F.W., and Swaminathan, S. 2003. Structure of a yeast hypothetical protein selected by a structural genomics approach. Acta Crystallogr. D. Biol. Crystallogr. 59: 127–135.
Fraser, C.M., Gocayne, J.D., White, O., Adams, M.D., Clayton, R.A., Fleischmann, R.D., Bult, C.J., Kerlavage, A.R., Sutton, G., Kelley, J.M., et al. 1995. The minimal gene complement of Mycoplasma genitalium. Science 270: 397–403.
Goffeau, A., Barrell, B.G., Bussey, H., Davis, R.W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J.D., Jacq, C., Johnston, M., et al. 1996. Life with 6000 genes. Science 274: 546, 563–567.
Gough, J. and Chothia, C. 2002. SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucleic Acids Res. 30: 268–272.
Gough, J., Karplus, K., Hughey, R., and Chothia, C. 2001. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol. 313: 903–919.
Griffiths-Jones, S. and Bateman, A. 2002. The use of structure information to increase alignment accuracy does not aid homologue detection with profile HMMs. Bioinformatics 18: 1243–1249.
Grishin, N.V. 2001. Fold change in evolution of protein structures. J. Struct. Biol. 134: 167–185.
Heger, A. and Holm, L. 2001. Picasso: Generating a covering set of protein family profiles. Bioinformatics 17: 272–279.
Heikinheimo, P., Goldman, A., Jeffries, C., and Ollis, D.L. 1999. Of barn owls and bankers: A lush variety of α/β hydrolases. Structure Fold. Des. 7: R141–R146.
Holm, L. and Sander, C. 1996. The FSSP database: Fold classification based on structure–structure alignment of proteins. Nucleic Acids Res. 24: 206–209.
Hoskins, J., Alborn Jr., W.E., Arnold, J., Blaszczak, L.C., Burgett, S., DeHoff, B.S., Estrem, S.T., Fritz, L., Fu, D.J., Fuller, W., et al. 2001. Genome of the bacterium Streptococcus pneumoniae strain R6. J. Bacteriol. 183: 5709–5717.
Huang, A.M. and Rubin, G.M. 2000. A misexpression screen identifies genes that can modulate RAS1 pathway signaling in Drosophila melanogaster. Genetics 156: 1219–1230.
Huynen, M., Doerks, T., Eisenhaber, F., Orengo, C., Sunyaev, S., Yuan, Y., and Bork, P. 1998. Homology-based fold predictions for Mycoplasma genitalium proteins. J. Mol. Biol. 280: 323–326.
Ingelfinger, J.A., Mosteller, F., Thibodeau, L.A., and Ware, J.H. 1987. Biostatistics in clinical medicine. Macmillan, New York.
Jaroszewski, L., Rychlewski, L., and Godzik, A. 2000. Improving the quality of twilight-zone alignments. Protein Sci. 9: 1487–1496.
Jeanmougin, F., Thompson, J.D., Gouy, M., Higgins, D.G., and Gibson, T.J. 1998. Multiple sequence alignment with Clustal X. Trends Biochem. Sci. 23: 403–405.
Karplus, K., Barrett, C., and Hughey, R. 1998. Hidden Markov models for detecting remote protein homologies. Bioinformatics 14: 846–856.
Kelley, L.A., MacCallum, R.M., and Sternberg, M.J. 2000. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299: 499–520.
Koretke, K.K., Russell, R.B., and Lupas, A.N. 2001. Fold recognition from sequence comparisons. Proteins Suppl. 5: 68–75.
Letunic, I., Goodstadt, L., Dickens, N.J., Doerks, T., Schultz, J., Mott, R., Ciccarelli, F., Copley, R.R., Ponting, C.P., and Bork, P. 2002. Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res. 30: 242–244.
Lo Conte, L., Brenner, S.E., Hubbard, T.J., Chothia, C., and Murzin, A.G. 2002. SCOP database in 2002: Refinements accommodate structural genomics. Nucleic Acids Res. 30: 264–267.
McGinnis, S. and Madden, T.L. 2004. BLAST: At the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 32: W20–W25.
McGuffin, L.J. and Jones, D.T. 2003. Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics 19: 874–881.
McGuffin, L.J., Bryson, K., and Jones, D.T. 2000. The PSIPRED protein structure prediction server. Bioinformatics 16: 404–405.
Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536–540.
Ohlson, T., Wallner, B., and Elofsson, A. 2004. Profile–profile methods provide improved fold-recognition: A study of different profile–profile alignment methods. Proteins 57: 188–197.
Ollis, D.L., Cheah, E., Cygler, M., Dijkstra, B., Frolow, F., Franken, S.M., Harel, M., Remington, S.J., Silman, I., Schrag, J., et al. 1992. The α/β hydrolase fold. Protein Eng. 5: 197–211.
Orengo, C.A., Pearl, F.M., and Thornton, J.M. 2003. The CATH domain structure database. Methods Biochem. Anal. 44: 249–271.
Panchenko, A.R. and Bryant, S.H. 2002. A comparison of position-specific score matrices based on sequence and structure alignments. Protein Sci. 11: 361–370.
Pandit, S.B., Gosar, D., Abhiman, S., Sujatha, S., Dixit, S.S., Mhatre, N.S., Sowdhamini, R., and Srinivasan, N. 2002. SUPFAM: A database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: Implications for structural genomics and function annotation in genomes. Nucleic Acids Res. 30: 289–293.
Rodriguez, A.C. and Stock, D. 2002. Crystal structure of reverse gyrase: Insights into the positive supercoiling of DNA. EMBO J. 21: 418–426.
Rost, B. and Sander, C. 1994. Combining evolutionary information and neural networks to predict protein secondary structure. Proteins 19: 55–72.
Russell, R.B. and Barton, G.J. 1992. Multiple protein sequence alignment from tertiary structure comparison: Assignment of global and residue confidence levels. Proteins 14: 309–323.
Russell, R.B., Sasieni, P.D., and Sternberg, M.J. 1998. Supersites within superfolds. Binding site similarity in the absence of homology. J. Mol. Biol. 282: 903–918.
Rychlewski, L., Jaroszewski, L., Li, W., and Godzik, A. 2000. Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci. 9: 232–241.
Schmidt, S., Bork, P., and Dandekar, T. 2002. A versatile structural domain analysis server using profile weight matrices. J. Chem. Inf. Comput. Sci. 42: 405–407.
Schonbrun, J., Wedemeyer, W.J., and Baker, D. 2002. Protein structure prediction in 2002. Curr. Opin. Struct. Biol. 12: 348–354.
Schoorlemmer, J. and Goldfarb, M. 2001. Fibroblast growth factor homologous factors are intracellular signaling proteins. Curr. Biol. 11: 793–797.
Shi, J., Blundell, T.L., and Mizuguchi, K. 2001. FUGUE: Sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol. 310: 243–257.
Shimono, A., Okuda, T., and Kondoh, H. 1999. N-myc-dependent repression of ndr1, a gene identified by direct subtraction of whole mouse embryo cDNAs between wild type and N-myc mutant. Mech. Dev. 83: 39–52.
Sowdhamini, R., Burke, D.F., Huang, J.F., Mizuguchi, K., Nagarajaram, H.A., Srinivasan, N., Steward, R.E., and Blundell, T.L. 1998. CAMPASS: A database of structurally aligned protein superfamilies. Structure 6: 1087–1094.
Suetake, T., Tsuda, S., Kawabata, S., Miura, K., Iwanaga, S., Hikichi, K., Nitta, K., and Kawano, K. 2000. Chitin-binding proteins in invertebrates and plants comprise a common chitin-binding structural motif. J. Biol. Chem. 275: 17929–17932.
Teichmann, S.A., Chothia, C., and Gerstein, M. 1999. Advances in structural genomics. Curr. Opin. Struct. Biol. 9: 390–399.
Todd, A.E., Orengo, C.A., and Thornton, J.M. 2001. Evolution of function in protein superfamilies, from a structural perspective. J. Mol. Biol. 307: 1113–1143.
Todd, A.E., Orengo, C.A., and Thornton, J.M. 2002. Plasticity of enzyme active sites. Trends Biochem. Sci. 27: 419–426.
Williams, M.G., Shirai, H., Shi, J., Nagendra, H.G., Mueller, J., Mizuguchi, K., Miguel, R.N., Lovell, S.C., Innis, C.A., Deane, C.M., et al. 2001. Sequence-structure homology recognition by iterative alignment refinement and comparative modeling. Proteins Suppl. 5: 92–97.
Wolf, Y.I., Brenner, S.E., Bash, P.A., and Koonin, E.V. 1999. Distribution of protein folds in the three superkingdoms of life. Genome Res. 9: 17–26.
Zhang, C. and Kim, S.H. 2003. Overview of structural genomics: From structure to function. Curr. Opin. Chem. Biol. 7: 28–32.
Zhang, H., Huang, K., Li, Z., Banerjei, L., Fisher, K.E., Grishin, N.V., Eisenstein, E., and Herzberg, O. 2000. Crystal structure of YbaK protein from Haemophilus influenzae (HI1434) at 1.8 Å resolution: Functional implications. Proteins 40: 86–97.