1 Institute for Protein Research, Osaka University, Suita, Osaka, 565-0871, Japan
2 PRESTO, JST, Kawaguchi, Saitama 332-0012, Japan
Reprint requests to: Kengo Kinoshita, The Institute of Medical Science, The University of Tokyo, 4-6-1, Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan; e-mail: kino{at}ims.u-tokyo.ac.jp; fax: 81-3-5449-5133.
(RECEIVED August 29, 2004; FINAL REVISION November 9, 2004; ACCEPTED November 10, 2004)
| Abstract |
|---|
Keywords: structure-based function prediction; hypothetical proteins; structural genomics; protein three dimensional structure
Article published online ahead of print. Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.041080105.
| Introduction |
|---|
A similar problem also exists in the sequence–function relationship, where it is still unknown how the protein sequence determines its function. However, fruitful results have been obtained in the sequence analysis field, by putting the ultimate problem aside and using the indirect but strong correlation between sequence similarity and functional similarity, which is possibly a consequence of evolutionary pressure on functional proteins (Durbin et al. 1998). In the same way, in the structural biology of proteins, proteins with similar structures have been analyzed to infer their functions from the structural similarity.
Frequently used approaches are based on the global fold similarity (Holm and Sander 1996; Holm and Park 2000; Thornton et al. 2000). However, it is now being gradually accepted that the level of fold similarity does not always correlate with the functional similarity (Todd et al. 2001), as seen in the observation that a limited number of protein folds are used repeatedly and others are not (Orengo et al. 1994; Holm and Sander 1996; Brenner et al. 1997). Thus, several groups have started to focus their attention on the similarity of local structures in proteins. In these approaches, various types of structure representations are used to define the structural similarity, because the type of structural similarity that correlates well with the functional similarity has not been established. For example, the most straightforward representation is the spatial arrangement of atoms (Kinoshita et al. 1999; Kleywegt 1999), where all atomic positions are used explicitly, and thus if some similarity is detected, it can strongly imply the functional similarity. However, since protein structures are flexible, the explicit use of atomic position could be too sensitive to small structural change. Another approach is to use a structural template to handle the small structural changes observed in the functional sites (Moodie et al. 1996; Wallace et al. 1996, 1997; Dawe et al. 2003). From similar viewpoints, abstract side-chain models could be useful to avoid the explicit position of atoms (Artymiuk et al. 1994). These methods seemed to work fine, but quite interestingly, few similarities were found among proteins with different folds, that is, proteins possibly belonging to different evolutionary origins. In other words, the representation of an explicit or semiexplicit atomic position may not be applicable for detecting a similarity beyond the evolutionary relationship, such as in sequence analyses.
Our goal is to develop a method to predict the molecular functions of proteins from their 3D structures. The aim is to introduce the structural information, and the method should detect the similarity among proteins with different folds. We have reported the first trial of the functional annotation with the hypothetical proteins TT1542 (Handa et al. 2003) and MJ0226 (Kinoshita and Nakamura 2003), where we showed that the molecular surface representation (Connolly 1983) of proteins could be a promising method to detect the similarity beyond the fold levels.
There have been several approaches using the molecular surfaces of proteins. In particular, Wolfson and coworkers extensively used molecular surface representation to search for similar functional sites (Lin et al. 1994; Lin and Nussinov 1996; Rosen et al. 1998; Shulman-Peleg et al. 2004), where they tried to reduce the number of vertices on the molecular surface as much as possible, to enhance the search speed. It is important for the search method to work quickly, but reducing the number of vertices could make the search method insensitive to small differences in the molecular surface, because the representative vertexes can change their positions according to small structural changes.
We now describe an expansion of our previous approaches (Kinoshita et al. 2002; Kinoshita and Nakamura 2003) based on similarity searches of the molecular surfaces. In our approach, no reduction of the vertices was carried out, and all of the surface points were used for the similarity search. Because using the full molecular surface required heavy calculation, the number of functional sites to be searched was small and the application range was limited. Here, we extended the database to be searched to almost all of the heterogeneous atom binding sites that appear in the PDB, and developed a new analysis method for the search results, to solve some problems arising from the database expansion.
The expansion of the database is not a simple task because of the large variety of ligands that appear in the PDB. In a small data set, a single index of similarity and a single threshold may work well, as in the previous work. However, in a huge data set, as in this study, a single similarity index is not enough; thus, we needed additional indices to evaluate the similarity, as described later. Furthermore, there are some general problems in the study of protein–ligand interactions using the PDB: (1) a ligand in a crystal structure may not be a cognate ligand, especially in enzymes, so the interaction can differ from the physiological one; (2) heteroatoms that appear in the PDB could be molecules from the crystallization buffer, and thus such interactions may not be significant in a biological context; and (3) crystal contacts may create unphysiological binding sites. For the first problem, we tried to use as many ligand binding sites as possible, and thus we did not exclude proteins that are redundant in terms of sequence similarity, intending to incorporate as many of the alternative interactions that appear in the PDB as possible. For the second problem, we excluded the binding sites for molecules commonly used in crystallization buffers, as described later. The third problem remains unsolved in this study. In addition, a larger database simply requires more computation time. This problem was managed by massively parallel computations in the current study.
| Results and Discussion |
|---|
The normalization with the Z-score worked fine when the variety of functional site patches was not as large as in the previous small data set (Kinoshita and Nakamura 2003), because the number of corresponding vertexes for each patch, as expected from the size of the patch, was not so different, and the normalization for the size differences of the patches was not required. However, if we used a data set with a large variety of ligand binding sites, as in this study, the differences in the sizes of the functional site patches could cause trouble; that is, large patches tend to yield large Z-scores, which are not satisfactory.
To overcome this difficulty, we introduced another index to evaluate the similarity, that is, the coverage. The coverage is the ratio of the number of corresponding vertexes to that of the vertexes in the functional site patch, and so it ranges from 0.0 to 1.0. The aim of the index is to normalize the difference in the expected number of corresponding vertexes. Actually, through the following applications we observed a strong tendency for the binding sites with large Z-scores (large binding sites) to have small coverage values on average, as seen in the left-lower side of the two-dimensional (2D) plot for the coverage and the Z-score (Fig. 1). Therefore, a binding site with a larger Z-score and larger coverage is considered as a binding site with higher similarity. Hereafter, we use these two indices, the coverage and the Z-score, and the results are shown in the 2D plot (Figs. 1, 2, and so on).
For each free-form entry of the 10 proteins, we carried out similarity searches against the search database and determined the correct and incorrect answers by analyzing the results as described in the Materials and Methods section. In short, a prediction was judged to be correct when both of the following criteria were met: (1) the distance between the center of gravity of the predicted ligand and that of the ligand in the complex form is <5 Å, and (2) the predicted ligand is "similar" to the known ligand. The similarity of the ligands was judged manually by inspecting the heteroatom dictionary in the PDB. (All the correct answers in our learning data set are listed in Supplemental Table S1.)
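The two criteria can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the center of gravity is taken as the unweighted mean of the atomic coordinates (the text does not say whether atomic masses were used), and the manual ligand-similarity judgment is passed in as a boolean.

```python
import math

def center_of_gravity(coords):
    """Unweighted mean of (x, y, z) atom coordinates (assumption: no mass weighting)."""
    n = len(coords)
    return tuple(sum(atom[i] for atom in coords) / n for i in range(3))

def is_correct_prediction(predicted_coords, known_coords, ligands_similar, cutoff=5.0):
    """Criterion (1): centers of gravity closer than `cutoff` angstroms.
    Criterion (2): ligand similarity, judged manually and supplied as a boolean."""
    distance = math.dist(center_of_gravity(predicted_coords),
                         center_of_gravity(known_coords))
    return distance < cutoff and ligands_similar

# Hypothetical coordinates, in angstroms, after superimposition.
predicted = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
known = [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(is_correct_prediction(predicted, known, ligands_similar=True))
```

Here the two centers of gravity are 3 Å apart, so the prediction would count as correct only if the ligands were also judged similar.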
Figure 1 shows the distribution of correct and incorrect answers that we determined. To establish the threshold line, we used Matthews correlation coefficient (MCC) as an evaluation indicator. The MCC can be calculated by

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where TP, TN, FP, and FN are the number of true positives, true negatives, false positives, and false negatives, respectively. These numbers can be calculated when a threshold line is given. If a correct answer is placed above the given threshold line, it is considered as a true positive, otherwise it is a false negative. There seem to be no guiding principles for determining a particular mathematical function for the threshold line. Thus, we tried several types of threshold lines, and the coverage = a/(Z + b) + c was found to give a good MCC value, where a, b, and c are parameters and Z is the Z-score. The parameters were determined by maximizing the MCC by calculating all possible combinations of the parameters in the range of 0.1–5.0, −2.0–2.0, and −0.3–0.3 for a, b, and c, respectively, with 50 equal intervals, thus 50³ calculations were carried out. However, it should be noted that the number of correct examples (TP + FN) was much smaller than that of incorrect examples (TN + FP) in this case. Therefore, the maximum MCC would be achieved by reducing FP at the expense of decreasing TP. However, this would not be desirable, because our aim was to find as many similar binding sites as possible, even though some false positives would be included in the results. In other words, sufficient numbers of TP should be retained. Therefore, we introduced another constraint in the maximization of the MCC, so that the TP percentage should exceed a given value. Here, we show three results, with 90%, 70%, and 50% TP constraints. Each TP percentage must be larger than each value in the maximization processes. With the constraints, we obtained 0.68, 0.46, and 0.34 MCC values with 50%, 70%, and 90% constraints, respectively. The loosest threshold, the 90% TP line, was used for the following applications, but to evaluate the confidence of the prediction, we used all of the threshold lines.
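The constrained parameter search described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the "TP percentage" is interpreted as TP/(TP + FN), each parameter range is sampled at 50 evenly spaced values (giving 50³ combinations), and hits lying at or to the left of the asymptote Z = −b are treated as falling below the line.

```python
import math
from itertools import product

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def above_line(z, coverage, a, b, c):
    """True if the hit lies above the line coverage = a / (Z + b) + c."""
    if z + b <= 0:  # assumption: left of the asymptote counts as below the line
        return False
    return coverage > a / (z + b) + c

def confusion(hits, a, b, c):
    """hits: iterable of (z_score, coverage, is_correct). Returns (TP, TN, FP, FN)."""
    tp = tn = fp = fn = 0
    for z, cov, correct in hits:
        predicted = above_line(z, cov, a, b, c)
        if predicted and correct:
            tp += 1
        elif predicted:
            fp += 1
        elif correct:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def grid(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def fit_threshold(hits, min_tp_fraction, n=50):
    """Exhaustive search for (a, b, c) maximizing MCC with TP/(TP+FN) > min_tp_fraction.
    Returns (best_mcc, a, b, c), or None if no combination satisfies the constraint."""
    best = None
    for a, b, c in product(grid(0.1, 5.0, n), grid(-2.0, 2.0, n), grid(-0.3, 0.3, n)):
        tp, tn, fp, fn = confusion(hits, a, b, c)
        if tp + fn == 0 or tp / (tp + fn) <= min_tp_fraction:
            continue
        score = mcc(tp, tn, fp, fn)
        if best is None or score > best[0]:
            best = (score, a, b, c)
    return best
```

Calling `fit_threshold(hits, 0.9)`, `fit_threshold(hits, 0.7)`, and `fit_threshold(hits, 0.5)` on the same labeled hits would correspond to the three TP constraints discussed in the text.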
Newly determined hypothetical proteins
We applied our methods to 18 newly determined, hypothetical proteins. They were selected as described in the Materials and Methods section. The results are summarized in Figure 2A and Supplemental Figure S1, where similar binding sites above the 50% TP line, 70% TP line, and 90% TP line are indicated by red, green, and white circles, respectively. In addition, all of the detected binding sites are listed in Table 1 for the first four entries and Supplemental Table S2 for the other 14 entries.
One of the three complex forms is the protein HI0766 (1j85), which is a hypothetical protein from Haemophilus influenzae. The protein shows sequence similarity to the spoU family, which catalyzes the reaction of S-adenosylmethionine (AdoMet)-dependent tRNA/rRNA methyltransferase, with 20%–30% sequence identity, but it has a shorter polypeptide chain than spoU by ~70 amino acids. Because of the low sequence similarity, HI0766 was considered as a putative relative to spoU or as a hypothetical protein. However, Lim et al. (2003) suggested that this protein is likely to be a member of the spoU family by their X-ray crystallography analyses with and without S-adenosylhomocysteine (AdoHcy), which is a product of the methyltransfer reaction using AdoMet. Their structural determination unexpectedly revealed that HI0766 (1j85) has a novel fold and AdoHcy assumes an unusual conformation.
As a result of our similarity search, 38 similar binding sites were found, and they could be classified into seven putative binding sites according to their position on the query protein (Table 1). Among them, several strong similarities (red or green circles in the 1j85 panel of Fig. 2) were found, but no AdoHcy binding sites were detected. However, the E09 (indolequinone derivatives, 3-hydroxymethyl-5-aziridinyl-1-methyl-2-(H-indole-4,7-indione)-propenol) binding site found in NAD(P)H:Quinone oxidoreductase (PDB: 1gg5) showed relatively high similarity (Z = 4.4, coverage = 0.44), as seen in the 1j85 panel of Figure 2, and it was located at the AdoHcy (SAH) binding site revealed by Lim and colleagues (Lim et al. 2003). In the same binding site, the possibility of mono-nucleotide binding was also predicted by our methods, from the similarity to the ANP binding site in 1kor (Z = 3.5, coverage = 0.36), the UDP binding site in 1f6d (Z = 3.4, coverage = 0.33), and the NAD binding site in 1guz (Z = 4.4, coverage = 0.27) (Table 1).
As shown in the virtual complex in Figure 3, our method could successfully predict the position of the binding site but failed to predict the kind of compound that could be bound. To understand why our methods could not find the AdoHcy binding site in the query protein, we examined all of the AdoHcy structures in the PDB, and found that the AdoHcy in HI0766 (1j85) has a unique conformation, as pointed out by Lim et al. (2003). Usually, AdoHcy has an extended conformation, but in HI0766 (1j85) it has a compact conformation. This difference seemed to prevent our methods from predicting the AdoHcy binding site of this protein.
Among the other seven hypothetical proteins, three proteins (1nc5, 1o6d, and 1mw7) have distinctive homologs in the sequence database.
The conserved hypothetical protein yteR is from Bacillus subtilis (PDB: 1nc5) and is annotated as an unknown protein in the NCBI Entrez/Protein database (http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=16080064). However, from the structural viewpoint, this protein is considered to be a member of the Six-hairpin glycosyltransferase superfamily, and a weak sequence similarity to the catalytic domain of cellulases was reported in the SCOP database (Lo Conte et al. 2002). Furthermore, our sequence analysis of this protein identified some conserved residues. According to the similarity search with our methods, there were 35 significant similarities that could be classified into 15 clusters according to their positions on the molecular surface. Among them, the one with high similarity (Z = 4.4, coverage = 0.46) to the BMSC-10 ((3-nitro-5-(3-morpholin-4-yl-propylamino-carbonyl)-phenyl)-galactopyranoside) binding site on heat-labile enterotoxin (LT; PDB: 1jqy) may be promising, for two reasons. First, LT (1jqy) is known to interact with ganglioside, a compound similar to cellulose that is located on the surface of human intestinal epithelial cells (BMSC-10 is an inhibitor of this interaction). Second, the conserved residues found in yteR (1nc5; 88D, 132H, 136Y, 141W, 143D, 189H, 211W, 213R, 217W, 340Y) are located in the vicinity of the putative BMSC-10 binding site.
In the case of 1o6d, the BCX-1812 (3-(1-acetylamino-2-ethyl-butyl)-4-guanidino-2-hydroxy-cyclopentanecarboxylic acid) binding site in 1l7g was found to be similar (Z = 4.5, coverage = 0.44) to the surface of 1o6d in the vicinity of the conserved site constructed by Gly100, Gly104, and Ser120. In 1mw7, the N-benzyl-e-(-d-galactos-1-yl)-benzamide binding site in 1fd7 was found to be similar (Z = 3.5, coverage = 0.47) to the surface of 1mw7 near the conserved site consisting of Gly125, Leu127, and Phe131.
Four other cases (1nog, 1j7g, 1lpl, 1oz9) are predicted as sugar binding sites by our methods and the conservation analysis. The significance of sugar binding sites will be discussed later. The other eight entries are orphan hypothetical proteins, and thus an evaluation of our prediction is difficult. Therefore, the results of the remaining eight entries are attached as supplementary materials, Supplemental Figure S1 and Supplemental Table S2, to be open for discussion.
As seen in these examples, our prediction tended to find several putative binding sites on the molecular surfaces of the query proteins. Usually some of them can be judged to be correct when the complex form of the query protein, or of a protein homologous to it, is known, but the other cases are ambiguous. This ambiguity comes from the lack of biochemical function information. It should be noted that the known biochemical functions of a protein are only those for which an experimental assay has been carried out. In other words, there is no evidence that the protein has no other functions. For example, our search methods often detected sugar binding sites in many proteins. They are usually found in the region with low Z-scores and high coverage values. It is true that no experimental support is available for most of these sugar binding sites, but there is equally no experimental evidence that sugars cannot bind to the proteins. In contrast, the heme binding sites tend to be found in the region with high Z-scores and low coverage, and are believed to bind specifically. Therefore, our threshold line in that region may need to be improved with a larger learning data set, because there are only a few true answers in the region, as seen in Figure 1.
In summary, the application of our method to the newly determined hypothetical proteins worked well in five cases (1j85, 1qwk, 1nc5, 1o6d, and 1mw7), failed in one case (1mp2), and yielded unknowns in 12 other cases. Besides these applications, as reported previously (Handa et al. 2003; Kinoshita and Nakamura 2003), our methods have some potential to make promising predictions for hypothetical proteins. However, several problems still exist. The large structural change and the missing loop issues were pointed out above. The other problem is the biased variety of ligands in the structural database, which contains too many sugars and sugar derivatives. It will be necessary to construct a database that contains representative binding sites not from the viewpoint of the sequence homology but from that of the tertiary structures of proteins.
| Materials and methods |
|---|
2.5 Å resolution in the PDB (January 2003 release), and 26,359 binding sites appeared in the data set, which is available through the eF-site database (Kinoshita and Nakamura 2004).
Learning data set
To construct the learning data set, we first picked all the protein pairs with free and complex structures determined by X-ray crystallography with 2.5 Å resolution, where the free structures are those without any heterogeneous atoms other than water, SO4, PO4, Cl, Na, and modified residues such as selenomethionine. Furthermore, we picked up the proteins registered as single chain to avoid the problem of finding the correspondence between several chains in the calculation of the sequence identity. The correspondence between the free and complex structures was determined with the 100% sequence identity and 95% alignment coverage. The representative entries were then selected according to the sequence comparison, with the threshold of 30% sequence identity and 80% alignment coverage. As a result, we identified 192 representative pairs in January 2004 from the PDB. In order to select the various ligand sizes in the learning data set, we sorted the 192 representative pairs according to the number of atoms in the ligand and selected one in every 20 from the list. Finally, we obtained 10 entries as the learning data set, whose PDB IDs are 1af9-1d0h (NGA), 1ah6-1am1 (ADP), 1cz1-1eqc (CTS), 1gta-1gtb (PZQ), 1qj9-1qj8 (C8E), 1qlq-1g6x (EDO), 1xaa-1hex (NAD), 2plc-1aod (INS), 3app-1bxo (GOL), and 3thi-4thi (PYD). The three-letter code in the parentheses is the heteroatom name assigned in each PDB entry.
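The size-stratified sampling step can be illustrated with a short sketch. The function name and the input layout (pair identifier plus ligand atom count) are hypothetical, and the starting offset within the sorted list is an assumption, since the text only states that one pair in every 20 was taken.

```python
def sample_by_ligand_size(pairs, step=20):
    """pairs: list of (pair_id, ligand_atom_count).
    Sort by ligand size and keep every `step`-th pair, starting from the smallest
    (assumption: the starting offset is not given in the text)."""
    ordered = sorted(pairs, key=lambda pair: pair[1])
    return ordered[::step]

# Hypothetical stand-in for the 192 representative pairs.
pairs = [(f"pair{i:03d}", (i * 7) % 97 + 1) for i in range(192)]
print(len(sample_by_ligand_size(pairs)))
```

With 192 sorted pairs and a step of 20, this kind of selection yields 10 entries, which agrees with the size of the learning data set reported above.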
Clustering according to the position of the predicted ligand
Usually, several tens of similarities to the known binding sites were detected with our method. To reduce the redundancy, we carried out a cluster analysis according to the position of the predicted ligand. The position of the putative ligand is represented by the center of gravity of the ligand after the superimposition, according to the correspondence of the molecular surface. The clustering analysis was carried out by the single linkage clustering algorithm, and the final clusters were identified with a 5 Å threshold; that is, no further cluster joints would be carried out once all of the distances among the existing clusters exceeded 5 Å. The distance between a pair of clusters was measured by the minimum distance between the members of the two clusters.
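The clustering step can be sketched as follows. This is an illustrative sketch with hypothetical names, not the authors' code: it uses the fact that single linkage clustering stopped at a distance threshold gives the same clusters as the connected components of a graph that links every pair of points no farther apart than the threshold, and it assumes that a distance of exactly 5 Å still merges.

```python
import math

def single_linkage_clusters(points, threshold=5.0):
    """Group ligand centers of gravity by single linkage with a distance cutoff.
    Returns a sorted list of clusters, each a list of indices into `points`."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two clusters

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical centers of gravity of predicted ligands, in angstroms.
centers = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(single_linkage_clusters(centers))  # [[0, 1, 2], [3]]
```

In the example, the first and third points are 8 Å apart but still end up in one cluster because each is within 5 Å of the second point, which is the chaining behavior expected of single linkage.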
Assignments of correct answers in the learning data set
To assign correct answers for each free structure in the search database, we first carried out the similarity searches by surface similarity for each free structure followed by the clustering as described above with the temporary threshold line with the step function form:
![]() |
At the same time, sequence similarity searches against the same search database were carried out using FASTA (Pearson 1994). We then identified the clusters that include homologous proteins with ligands similar to the one that appears in the complex form. The similarity of the ligand was judged according to the ligand name in the heteroatom dictionary in the PDB. For the entries in these clusters, we manually checked the similarity and determined whether the entries were correct or not. All entries that we assigned as correct are shown in Supplemental Table S1.
Newly determined hypothetical proteins
We have selected the newly determined hypothetical proteins according to the following criteria; that is, the proteins were (1) released after 2003, (2) resolved by X-ray crystallography with 2.5 Å resolution, (3) free from ligand, and (4) had the monomeric structure available in January 2004. We obtained 23 entries. Four of them were a membrane protein, RNA binding proteins (two cases), and a lipid binding protein, which were excluded. One pair, 1iuk and 1iul, is identical, so 1iul was excluded from the final list due to its poorer resolution than that of 1iuk. Finally, we picked up 18 hypothetical proteins from the PDB: 1iuk, 1j27, 1j7g, 1j85, 1jhs, 1lpl, 1mp2, 1mw7, 1nc5, 1nfj, 1ng6, 1nig, 1nij, 1nog, 1o50, 1o6d, 1oz9, and 1qwk. It should be noted that these hypothetical proteins were selected by just a keyword search, so some proteins have been annotated, as in the case of 1mp2, as discussed in the Results and Discussion.
Conservation analysis
The sequence conservation analysis was done by the recent version of an evolutionary trace method (Mihalek et al. 2004). Similar sequences were searched with BLAST (Altschul et al. 1997) with the E-value threshold of 10⁻⁵. Multiple sequence alignments were constructed with ClustalW (Higgins et al. 1996). The top 20% of residues with a high degree of importance in terms of entropy measurement were selected as the conserved residues (Mihalek et al. 2004).
Supplementary material
Supplemental materials are a table of the entries in the learning data set assigned as "correct" answers (Supplemental Table S1), a table of search results for orphan hypothetical proteins (Supplemental Table S2), and a 2D plot for orphan hypothetical proteins (Supplemental Fig. S1).
| Footnotes |
|---|
1 Present address: The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, 108-8639, Japan.
| Acknowledgments |
|---|
| References |
|---|
Artymiuk, P.J., Poirrette, A.R., Grindley, H.M., Rice, D.W., and Willett, P. 1994. A graph-theoretic approach to the identification of three-dimensional patterns of amino acid side-chains in protein structures. J. Mol. Biol. 243: 327–344.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235–242.
Berman, H., Henrick, K., and Nakamura, H. 2003. Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 10: 980.
Brenner, S.E. 2001. A tour of structural genomics. Nat. Rev. Genet. 2: 801–809.
Brenner, S.E., Chothia, C., and Hubbard, T.J. 1997. Population statistics of protein structures: Lessons from structural classifications. Curr. Opin. Struct. Biol. 7: 369–376.
Connolly, M.L. 1983. Solvent-accessible surfaces of proteins and nucleic acids. Science 221: 709–713.
Dawe, J.H., Porter, C.T., Thornton, J.M., and Tabor, A.B. 2003. A template search reveals mechanistic similarities and differences in β-ketoacyl synthases (KAS) and related enzymes. Proteins 52: 427–435.
Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. 1998. Biological sequence analysis: models of proteins and nucleic acids. Cambridge University Press, Cambridge, United Kingdom.
Handa, N., Terada, T., Kamewari, Y., Hamana, H., Tame, J.R., Park, S.Y., Kinoshita, K., Ota, M., Nakamura, H., Kuramitsu, S., et al. 2003. Crystal structure of the conserved protein TT1542 from Thermus thermophilus HB8. Protein Sci. 12: 1621–1632.
Higgins, D.G., Thompson, J.D., and Gibson, T.J. 1996. Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266: 383–402.
Holm, L. and Park, J. 2000. DaliLite workbench for protein structure comparison. Bioinformatics 16: 566–567.
Holm, L. and Sander, C. 1996. Mapping the protein universe. Science 273: 595–603.
Kinoshita, K., and Nakamura, H. 2003. Identification of protein biochemical functions by similarity search using the molecular surface database eF-site. Protein Sci. 12: 1589–1595.
Kinoshita, K., and Nakamura, H. 2004. eF-site and PDBjViewer: database and viewer for protein functional sites. Bioinformatics 20: 1329–1330.
Kinoshita, K., Sadanami, K., Kidera, A., and Go, N. 1999. Structural motif of phosphate-binding site common to various protein superfamilies: All-against-all structural comparison of protein-mononucleotide complexes. Protein Eng. 12: 11–14.
Kinoshita, K., Furui, J., and Nakamura, H. 2002. Identification of protein functions from a molecular surface database, eF-site. J. Struct. Funct. Genomics 2: 9–22.
Kleywegt, G.J. 1999. Recognition of spatial motifs in protein structures. J. Mol. Biol. 285: 1887–1897.
Kraulis, P.J. 1991. MOLSCRIPT: A program to produce both detailed and schematic plots of proteins structures. J. Appl. Cryst. 24: 946–950.
Lim, K., Zhang, H., Tempczyk, A., Krajewski, W., Bonander, N., Toedt, J., Howard, A., Eisenstein, E., and Herzberg, O. 2003. Structure of the YibK methyltransferase from Haemophilus influenzae (HI0766 (1J85)): A cofactor bound at a site formed by a knot. Proteins 51: 56–67.
Lin, S.L. and Nussinov, R. 1996. Molecular recognition via face center representation of a molecular surface. J. Mol. Graph. 14: 78–90, 9577.
Lin, S.L., Nussinov, R., Fischer, D., and Wolfson, H.J. 1994. Molecular surface representations by sparse critical points. Proteins 18: 94–101.
Lo Conte, L., Brenner, S.E., Hubbard, T.J., Chothia, C., and Murzin, A.G. 2002. SCOP database in 2002: Refinements accommodate structural genomics. Nucleic Acids Res. 30: 264–267.
Mihalek, I., Res, I., and Lichtarge, O. 2004. A family of evolution-entropy hybrid methods for ranking protein residues by importance. J. Mol. Biol. 336: 1265–1282.
Moodie, S.L., Mitchell, J.B., and Thornton, J.M. 1996. Protein recognition of adenylate: An example of a fuzzy recognition template. J. Mol. Biol. 263: 486–500.
Orengo, C.A., Jones, D.T., and Thornton, J.M. 1994. Protein superfamilies and domain superfolds. Nature 372: 631–634.
Pearson, W.R. 1994. Using the FASTA program to search protein and DNA sequence databases. Methods Mol. Biol. 24: 307–331.
Rosen, M., Lin, S.L., Wolfson, H., and Nussinov, R. 1998. Molecular shape comparisons in searches for active sites and functional similarity. Protein Eng. 11: 263–277.
Shulman-Peleg, A., Nussinov, R., and Wolfson, H.J. 2004. Recognition of functional sites in protein structures. J. Mol. Biol. 339: 607–633.
Thornton, J.M., Todd, A.E., Milburn, D., Borkakoti, N., and Orengo, C.A. 2000. From structure to function: Approaches and limitations. Nat. Struct. Biol. 7(Suppl): 991–994.
Todd, A.E., Orengo, C.A., and Thornton, J.M. 2001. Evolution of function in protein superfamilies, from a structural perspective. J. Mol. Biol. 307: 1113–1143.
Wallace, A.C., Laskowski, R.A., and Thornton, J.M. 1996. Derivation of 3D coordinate templates for searching structural databases: Application to Ser-His-Asp catalytic triads in the serine proteinases and lipases. Protein Sci. 5: 1001–1013.
Wallace, A.C., Borkakoti, N., and Thornton, J.M. 1997. TESS: A geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases: Application to enzyme active sites. Protein Sci. 6: 2308–2323.