1 Pfizer Discovery Technology Center, Cambridge, Massachusetts 02139, USA
2 Bioinformatics Program and Biomedical Engineering Department, Boston University, Boston, Massachusetts 02215, USA
Reprint requests to: Enoch S. Huang, Pfizer Discovery Technology Center, Cambridge, MA 02139, USA; e-mail: enoch_huang{at}cambridge.pfizer.com; fax: (617) 551-3117.
(RECEIVED July 23, 2003; FINAL REVISION September 10, 2003; ACCEPTED September 23, 2003)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.03323604.
| Abstract |
|---|
Keywords: interactions; binding; evolution; protein structure; sequence conservation
| Introduction |
|---|
Nonetheless, several groups have successfully used conservation scores to predict protein-protein binding sites. Two independent groups (Elcock and McCammon 2001; Valdar and Thornton 2001a) concluded that conservation, in combination with other factors, can accurately discriminate genuine homodimers from crystal contacts. The majority of methods that predict protein-protein binding sites also use conservation scores (other approaches are discussed later). Those that map the conservation score to the three-dimensional structure are likely to be the most informative and include Evolutionary Trace (ET; Lichtarge et al. 1996), ConSurf (Armon et al. 2001), Rate4Site (Pupko et al. 2002), and the method of Landgraf and Eisenberg (Landgraf et al. 2001). In cases in which a three-dimensional structure is unavailable, residues that are conserved for the entire family or a subfamily within the alignment are predicted to be functional (Casari et al. 1995; Livingstone and Barton 1996; Caffrey et al. 2000; Hannenhalli and Russell 2000). However, assessing the accuracy of these methods has been difficult and usually limited to a few experimentally characterized protein families. Furthermore, we are aware of only a few published experiments that confirm previously computed predictions (Stenmark et al. 1994; Bauer et al. 1999; Sowa et al. 2001).
The physical and chemical properties of protein-protein interactions have been studied on a large number of complexes by numerous groups (Chothia and Janin 1975; Argos 1988; Janin et al. 1988; Janin and Chothia 1990; Korn and Burnett 1991; Clackson and Wells 1995; Jones and Thornton 1996, 1997; Lijnzaad et al. 1996; Tsai et al. 1996, 1997a,b; Tsai and Nussinov 1997; Xu et al. 1997; Bogan and Thorn 1998; Larsen et al. 1998; Xu and Regnier 1998; Lo Conte et al. 1999; Jones et al. 2000; Sheinerman et al. 2000; Glaser et al. 2001; Chakrabarti and Janin 2002; Sheinerman and Honig 2002). In general, interfaces tend to be planar with an area that is often proportional to the total protein size (Jones and Thornton 1996). The residue composition usually differs between transient and obligate complexes, probably because the former rely more on salt bridges and hydrogen bonds, whereas the latter rely more on hydrophobic attractions (Jones and Thornton 1997; Lo Conte et al. 1999). There are also many examples of both geometric and electrostatic complementarity between the binding interfaces (Lawrence and Colman 1993; McCoy et al. 1997; Xu et al. 1997; Lo Conte et al. 1999; Sheinerman et al. 2000). Although the interface can be quite large, it was shown in some systems that only a small fraction of the residues contribute the majority of the binding energy (Clackson and Wells 1995). Furthermore, these so-called hotspots of binding energy tend to have preferred residue types that often have a high degree of burial at the interface (Bogan and Thorn 1998). Interestingly, there is evidence (for 11 families) to suggest a relationship between the enrichment of a residue type in a hotspot and the propensity of the corresponding residue to be conserved (Hu et al. 2000).
In this study, we examine the difference in conservation between the protein interface and the rest of the protein surface for a set of 64 protein-protein interfaces. As residue conservation depends on the choice of sequences aligned, we construct two multiple-sequence alignments (MSAs) for each protein using two different strategies. The first approach attempts to include closely related sequences, whereas the second includes a more diverse set of sequences. These MSAs are generally expected to contain orthologs and paralogs, respectively, and there are arguments for choosing either MSA type. Orthologs are expected to be almost identical in function, whereas a set of paralogs is expected to have undergone some evolutionary changes so that its members can perform slightly different functions. However, nonfunctional residues are often conserved over short periods of evolutionary time, which is a source of noise that will be more prominent in orthologs. When the two approaches are examined and compared with each other, we find that the difference in conservation between the interface and the rest of the surface is marginally (but not significantly) better in MSAs of diverse homologs than in MSAs of close homologs. Furthermore, we find that obligate and transient interfaces have different physicochemical properties that influence their evolutionary rates.
| Results |
|---|
ΔASA of 1% or more, the majority of residues (86%) lose more than 5% ASA upon complex formation. For each data set, none of the chains share significant sequence identity with the other chains (see Materials and Methods).
subunit as well as the
subunit (1gotAB_B). Both Ran GTPase (1rrpAB_A; bottom, left quadrant) and calcineurin A (1tcoAB_B; bottom, left quadrant) are also known to interact with several different proteins (Griffith et al. 1995; Moroianu 1999).
In some MSAs of diverse homologs, the interface is considerably more conserved (e.g., a ratio ≥ 1.3) than the rest of the solvent-exposed surface (Fig. 1; 1apmIE_E, 1ughIE_E, 1ubsAB_A, 1scuDE_E, 1sftAB_A, and 1pkyAC_A). These are cAMP-dependent kinase, uracil DNA glycosylase, tryptophan synthase α subunit, adenylate kinase, succinyl-CoA synthetase, alanine racemase, and pyruvate kinase, respectively. With the exception of 1ubsAB_A, their interfaces overlap with their active sites, explaining the relatively high conservation. In 1ubsAB_A, the highly conserved interface serves as a conduit through which the substrate can be passed from one active site to another.
Collectively, these results indicate that the alignment type, the presence of multiple faces, and the presence of a catalytic site at the interface can influence the conservation of the interface relative to the rest of the surface.
Comparison of interface residues with other surface patches
Although the interface differs in conservation from the rest of the exposed surface for a statistically significant fraction of interfaces, a thorough prediction program will have to consider and rank a large number of candidate surface patches. To explore this, we generated a number of surface patches (one for almost every exposed residue) and used the Z test to examine whether the average conservation of the interface is significantly different from the conservation of all other patches on that protein (Fig. 3). With the exception of one protein (1k9oie_e), all patches had the same number of residues as the interface. In 1k9oie_e, 40% of the surface patches had fewer residues (minimum of 25 residues) than the actual interface (31 residues). The results of this test are summarized in Table 3, in which it can be seen that the majority of interfaces are not significantly more conserved than other surface patches (significance was taken as Z > 1.64, corresponding to the 95th percentile of the normal distribution). The MSAs of diverse homologs have slightly more significantly conserved interfaces (9/64) than MSAs of close homologs (6/64). However, the overall differences between the two alignment types are not significant.
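The patch comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `interface_z_score` and the patch scores are hypothetical, and the use of the population standard deviation of the patch scores is our assumption, as the text does not say which estimator was used.

```python
from statistics import mean, pstdev

# One-sided cutoff quoted in the text (95th percentile of the normal distribution).
Z_CUTOFF = 1.64


def interface_z_score(interface_score, patch_scores):
    """Z score of the interface's mean conservation relative to all other patches."""
    mu = mean(patch_scores)
    sigma = pstdev(patch_scores)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("patch scores have zero variance")
    return (interface_score - mu) / sigma


# Hypothetical average information scores for the surface patches of one protein.
patches = [0.52, 0.55, 0.49, 0.60, 0.58, 0.51, 0.54, 0.57]
z = interface_z_score(0.66, patches)
significant = z > Z_CUTOFF
```

With these made-up numbers the interface lies a little more than three standard deviations above the mean patch score, so it would count as significantly conserved under the Z > 1.64 criterion.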
Although most interfaces are not significantly more conserved than other patches, it is possible that the most conserved patch shares some overlap with the interface. In Figure 4, we consider the most conserved surface patch in each protein and measure its overlap with the actual interface. The degree of overlap between the most conserved surface patch and the actual interface is 39% (± 28%) and 36% (± 28%) for MSAs of diverse and close homologs, respectively. The most conserved surface patch overlaps with ≥50% of the interface in only 17 of the 64 interfaces for both alignment types (top, right quadrant). However, in the majority of proteins (39/64), the most conserved surface patch has <50% overlap with the actual interface for both alignment types (bottom, left quadrant). These results suggest that protein interfaces can rarely be predicted accurately when using conservation analysis alone, regardless of the alignment type used. Again, the interface tends to be more conserved when it forms an active site.
7%) and B-ASA should be distinguished from ASA or ΔASA (see Materials and Methods). A peripheral residue is defined as one that is only partially buried upon complex formation (B-ASA > 7%). The majority of residues (85% of peripheral residues, 94% of central residues) lose at least 5% ASA after binding. Figure 5
For heterodimers, there are both similarities and subtle preferential differences between central residues (Fig. 6A) and peripheral residues (Fig. 6B). Leucine is the most prominent conserved residue at the central interface, but is also fairly prominent at the peripheral interface, where its B-ASA ranges from 8.2% to 33.7%. There is some evidence that residues at the protein-protein interface are less flexible than the rest of the protein surface (Cole and Warwicker 2002), and this need might be met by leucine with its limited conformational diversity (Pickett and Sternberg 1993). The aromatic residues phenylalanine and tyrosine are more prominent in the central interface than the peripheral interface. In contrast, the peripheral interface prefers conserved arginine and glycine residues. This would suggest that pi-interactions of the conserved central aromatic residues are a primary driving force for heterodimerization. The preference for conserved arginines at the peripheral interface is probably due to the ability of arginine to form hydrophobic interactions, while still requiring interactions with water or polar molecules. We speculate that the role of glycine is probably more structural, given that it is important in helix caps (Fetrow et al. 1997) and loops (Crasto and Feng 2001). The other surprise at the central interface is the preference for aspartic acid. It is not clear to us why this is preferred over glutamic acid, but it might also be due to its high propensity to be in loops (Crasto and Feng 2001).
With the exception of aspartic acid and arginine, the majority of central residues are hydrophobic. These results suggest that hydrophobic forces primarily drive packing of obligate interfaces.
The frequency of gapped alignment positions at the protein-protein interface
It is generally thought that gaps in an alignment most often correspond to loops in the protein structure. It is also well known that loops are primarily exposed and often part of an active site or protein-protein interface. Many of the residues described above are commonly found in loops (Crasto and Feng 2001). Therefore, it could be argued that a prediction method should find a way to reward a candidate surface patch that contains a loop that is believed to be part of the interface. However, many scoring schemes either ignore alignment positions with gaps or introduce a gap penalty, the argument being that a residue position is unlikely to be important if it can be deleted. In this work, our conservation score uses a gap penalty, and we were interested to know how many interface residues had one or more gaps in their alignment position compared with the number found in the rest of the exposed surface. In Figure 7, obligate interfaces (homodimers and heterodimers) tend to have fewer gaps at their interface than on the rest of their protein surface. This observation is not as striking when using alignments of close homologs (Table 5). In contrast, the number of interface gaps does not significantly differ from the number of surface gaps for transient interfaces.
| Discussion |
|---|
Occasionally, the protein belonged to a large family in which each subgroup might be expected to differ from other subgroups at the interface. Although our information score assigns a relatively high score to these subgroup specific/tree-determinant sites, the MSAs of diverse homologs will not contain many sequences for a subgroup, whereas the MSAs of close homologs will contain many sequences for just one subgroup (see Materials and Methods). Some of the less-conserved interfaces are likely to be detected by methods that account for the phylogenetic relationships (Lichtarge et al. 1996; Armon et al. 2001; Pupko et al. 2002). Unfortunately, defining the correct subset of sequences is not trivial, particularly if the procedure is to be automated (de Sol Mesa et al. 2003). One strategy might be to define subgroups on the basis of gene duplication events, although this also has caveats. Combining parameters such as tree-determinant information with surface-patch conservation should lead to improved prediction of interfaces. Other parameters that might be combined include residue propensities (Ofran and Rost 2003), physical properties (Jones and Thornton 1997), and evolutionary models of variable residues believed to be functionally important (Hughes and Nei 1988; Pazos et al. 1997; Shirai et al. 2002). Efforts along these lines are underway.
| Materials and methods |
|---|
Diverse homolog selection
The objective was to have an MSA containing a diverse set of sequences that would include several paralogs whenever possible. As this is a semiautomated approach, the exact phylogeny of the sequences is unknown for each protein family. Each chain from a complex was searched against the nonredundant protein database using BLASTP with an E value cutoff of 0.001 (Altschul et al. 1997). Sequences from each search were clustered together when they shared >60% identity, using BLASTCLUST, which is part of the BLAST package (Altschul et al. 1997). The longest sequence from each cluster was taken and aligned to the structural template using CLUSTALW (Thompson et al. 1994). This prevented oversampling from a particular subgroup of sequences found in each protein family. To ensure that the alignments were of adequate quality, a number of criteria were used. Sequences that had five or more gaps at positions that were otherwise populated with residues in other sequences (75% of the alignment) were removed. This process was iterated three times. To ensure that a significant portion of the protein was crystallized, we only considered alignments in which the structural template made up 85% or more of the significant sites in the alignment. A significant site was defined as a position in the alignment where >70% of sequences had a residue present. Alignments with continuous stretches of significant sites (20 or more) that were not present in the structural template were removed, as were alignments that had 10 or fewer sequences aligned to the structural template. The structural template had to contain at least 120 residues that were aligned to residues in the other sequences. The remaining structures were compared against each other for sequence redundancy using BLASTCLUST with a cutoff of 30% identity. Finally, the alignment quality was confirmed by manual inspection with PFAAT (Johnson et al. 2003).
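Two of the quality criteria above, the definition of a significant site and the requirement that the template cover 85% or more of them, can be sketched as follows. This is a minimal sketch and not the original pipeline: the function names, the gap character, and the toy alignment are hypothetical, and we assume the template is counted among the aligned sequences.

```python
GAP = "-"  # assumption: gaps are written as '-' in the aligned sequences


def significant_sites(alignment, min_occupancy=0.70):
    """Column indices where more than `min_occupancy` of sequences have a residue."""
    n_seqs = len(alignment)
    sites = []
    for col in range(len(alignment[0])):
        occupied = sum(1 for seq in alignment if seq[col] != GAP)
        if occupied / n_seqs > min_occupancy:
            sites.append(col)
    return sites


def template_coverage(template, alignment, min_occupancy=0.70):
    """Fraction of significant sites at which the structural template has a residue."""
    sites = significant_sites(alignment, min_occupancy)
    if not sites:
        return 0.0
    return sum(1 for col in sites if template[col] != GAP) / len(sites)


# Toy alignment; the first row plays the role of the structural template.
alignment = [
    "MKT-LV",
    "MKTALV",
    "MRTALV",
    "MK-ALI",
]
coverage = template_coverage(alignment[0], alignment)
meets_coverage = coverage >= 0.85  # 85% threshold from the text
```

In this toy case every column is a significant site, but the template is gapped at one of the six, so its coverage of 5/6 falls just short of the threshold and the alignment would be rejected.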
Close homolog selection
The objective was to have an MSA containing a set of sequences that were closely related and would typically be orthologs. Again, the semiautomated approach does not guarantee that all sequences are bona fide orthologs. Depending on the taxonomy assignment of the proteins in Table 1, the proteins were grouped as belonging to eubacteria, metazoa, or euglenozoa (Wheeler et al. 2000). For eubacteria, each of the sequences was searched against the following genomes: Bacillus anthracis (Ames), Borrelia burgdorferi, Chlamydophila pneumoniae (CWL029), Escherichia coli (K12), Haemophilus influenzae, Helicobacter pylori (J99), Listeria monocytogenes, Mycoplasma penetrans, Neisseria meningitidis (MC58), Pseudomonas aeruginosa, Salmonella typhimurium (LT2), Shigella flexneri (2a), Staphylococcus aureus (MW2), Vibrio cholerae, and Xanthomonas citri. The top hit from each genome was selected if it had an E value of 1e-10 or better. Sequences belonging to the metazoa group were similarly searched against species databases that were derived from the NCBI nr protein database (Homo sapiens, Mus musculus, Caenorhabditis elegans, Drosophila melanogaster, Rattus norvegicus, Anopheles gambiae, Bos taurus, Gallus gallus, Xenopus laevis, Danio rerio, Ovis aries, Sus scrofa, and Takifugu rubripes). Sequences belonging to the euglenozoa group were searched against the entire NCBI nr database, and the top hits were hand selected from appropriate species.
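The per-genome selection rule for the eubacterial and metazoan searches can be illustrated as below. The function name, the tuple layout of a hit, and the hit identifiers are hypothetical; only the rule itself (best hit per genome, kept when its E value is 1e-10 or better) comes from the text.

```python
E_VALUE_CUTOFF = 1e-10  # threshold quoted in the text


def top_hits_per_genome(hits, cutoff=E_VALUE_CUTOFF):
    """Keep the best-scoring hit of each genome, provided its E value passes the cutoff.

    `hits` is an iterable of (genome, subject_id, e_value) tuples (assumed layout).
    """
    best = {}
    for genome, subject_id, e_value in hits:
        if genome not in best or e_value < best[genome][1]:
            best[genome] = (subject_id, e_value)
    return {genome: sid for genome, (sid, e_value) in best.items() if e_value <= cutoff}


# Hypothetical search results for one query chain.
hits = [
    ("Escherichia coli (K12)", "hitA", 1e-40),
    ("Escherichia coli (K12)", "hitB", 1e-12),
    ("Vibrio cholerae", "hitC", 1e-5),
    ("Listeria monocytogenes", "hitD", 1e-10),
]
selected = top_hits_per_genome(hits)
```

Here the weaker of the two E. coli hits is dropped in favor of the top hit, and the V. cholerae genome contributes no sequence because its best hit does not reach the cutoff.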
Residue conservation
The Shannon Entropy for a multiple alignment position can be calculated as follows:
$$S = -\sum_{x} P(x)\,\log_{20} P(x) \tag{1}$$

in which P(x) is the relative frequency of each amino acid x in the alignment position. The base of 20 ensures that all values are bounded between zero and one (assuming that we ignore entities such as "X", "Z", "B", and "-"). However, it does not account for the physicochemical similarities that are found between the different amino acids. Therefore, we calculate the Von Neumann entropy (VNE) for each alignment column. The Von Neumann entropy takes a similar form to equation 1 (Lifshitz and Pitaevskii 1980; Petz 2001):

$$\mathrm{VNE} = -\mathrm{Tr}\left(\rho\,\log_{20}\rho\right) \tag{2}$$

in which $\rho$ is a density matrix with trace = 1. Apart from normalization by the trace, the density matrix is given by the product of the relative frequencies of the amino acids in each alignment position [P(x)] and an appropriate similarity matrix $M$ (e.g., BLOSUM), that is,

$$\rho \propto \mathrm{diag}\bigl(P(x_1), \ldots, P(x_{20})\bigr)\,M \tag{3}$$

The calculation of equation 2 is facilitated by first calculating the eigenvalues $\lambda_i$ of $\rho$. In this case, equation 2 is given by the simpler and more computationally efficient equation

$$\mathrm{VNE} = -\sum_{i} \lambda_i\,\log_{20}\lambda_i \tag{4}$$

In the special case in which the similarity matrix is the identity matrix, equations 2 and 4 become identical to the Shannon Entropy in equation 1. After trial and error, we found that the BLOSUM 50 target frequencies (blosum50.qij) (Henikoff and Henikoff 1992) gave results that we considered most desirable, but other matrices give appropriate results. To incorporate sequence weights, the frequency for each amino acid is computed as follows:
$$P(aa_i) = \frac{1}{n}\sum_{j\,:\,aa_{ij}\ \text{present}} w_j \tag{5}$$

in which $aa_i$ is one of the 20 amino acids in the alignment position, $aa_{ij}$ is an occurrence of $aa_i$ in sequence $j$, $w_j$ is the sequence weight of the sequence $j$ to which $aa_{ij}$ belongs, $n$ is the number of sequences in the alignment, and the sequence weights sum to $n$. The sequence weights are computed using the method of Henikoff and Henikoff (1994), but could be derived by other means. A gap penalty is enforced using an approach similar to that used by CLUSTALX (Thompson et al. 1997). To do this, the VNE score is first transformed to its information score (IS) by subtracting it from the maximum entropy (i.e., IS = 1 − VNE). The gap penalty is the number of residues in the column, divided by the number of sequences. The information score is then multiplied by the gap penalty. An information score derived from VNE will range from 0 to 1, where a score of 1 is assigned to a 100% identical alignment column. In practice, a score will only be below 0.3 when gaps are present, as the 20 residues are not considered to be completely orthogonal. For residue propensities, we assigned an alignment position as being highly conserved when the information score was ≥ 0.85.
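To make the scoring scheme concrete, the sketch below computes weighted frequencies, the Von Neumann entropy from eigenvalues, and the gap-penalized information score for a single alignment column. Several things here are assumptions and not part of the original method: all function names are hypothetical; uniform weights stand in for the Henikoff weights; a toy similarity function stands in for the BLOSUM 50 target frequencies; the eigenvalues are taken from the symmetric matrix sqrt(P)·M·sqrt(P), which has the same eigenvalues as diag(P)·M when restricted to residues that occur in the column; and a small Jacobi routine is used only to keep the example free of dependencies (a library eigensolver would normally be used).

```python
import math


def weighted_frequencies(column, weights):
    """Weighted residue frequencies: weights of sequences carrying each residue, over n."""
    n = len(column)
    freqs = {}
    for residue, weight in zip(column, weights):
        if residue != "-":  # gaps are handled later by the gap penalty
            freqs[residue] = freqs.get(residue, 0.0) + weight / n
    return freqs


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def von_neumann_entropy(freqs, similarity, base=20):
    """Entropy of the trace-normalized density matrix built from freqs and similarity."""
    residues = sorted(freqs)
    sym = [[math.sqrt(freqs[x] * freqs[y]) * similarity(x, y) for y in residues]
           for x in residues]
    trace = sum(sym[i][i] for i in range(len(residues)))
    eigenvalues = symmetric_eigenvalues([[v / trace for v in row] for row in sym])
    return sum(-lam * math.log(lam, base) for lam in eigenvalues if lam > 1e-12)


def information_score(column, weights, similarity):
    """IS = (1 - VNE) multiplied by the fraction of sequences with a residue."""
    freqs = weighted_frequencies(column, weights)
    if not freqs:
        return 0.0  # column made only of gaps
    gap_penalty = sum(1 for residue in column if residue != "-") / len(column)
    return (1.0 - von_neumann_entropy(freqs, similarity)) * gap_penalty


def identity(x, y):
    return 1.0 if x == y else 0.0


def toy_similarity(x, y):
    # Hypothetical matrix: I and L are treated as partially interchangeable.
    return 0.5 if {x, y} == {"I", "L"} else identity(x, y)


ones = [1.0] * 4  # uniform weights that sum to n
vne_identity = von_neumann_entropy(weighted_frequencies("LLII", ones), identity)
vne_toy = von_neumann_entropy(weighted_frequencies("LLII", ones), toy_similarity)
```

With the identity matrix the result reduces to the Shannon entropy of the column, whereas the toy similarity lowers the entropy of a leucine/isoleucine column because the two residues are no longer treated as orthogonal. A fully conserved column scores 1, and the same column with one gapped sequence out of four scores 0.75.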
Defining interface residues
Interface residues were defined as those that lost >1% relative solvent accessibility upon complex formation (ΔASA > 1%). Solvent accessibilities were calculated using the algorithm of Lee and Richards with a probe size of 1.4 Å (Lee and Richards 1971). All complexes with a total interface <1500 Å² were manually inspected. This involved careful reading of the literature and the PDB files to ensure that all files contained genuine biological interfaces. Water molecules were not considered. Interface residues were further classified as peripheral or central on the basis of their solvent accessibility when bound (B-ASA). A peripheral residue has a B-ASA ≥ 7%, whereas a central residue has a B-ASA < 7%. To clarify, the relationship between all of these terms is as follows: ΔASA = (separated monomer ASA) − (B-ASA), that is, the accessibility lost on binding. Sequence logos for central and peripheral residues were generated for each category using ALPRO (Schneider and Stephens 1990).
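The residue classification can be written as a small function. This is a sketch with a hypothetical function name; it assumes relative accessibilities expressed in percent and takes ΔASA as the accessibility lost on binding (monomer ASA minus bound ASA), with the 1% and 7% thresholds from the text.

```python
def classify_residue(monomer_asa, bound_asa, interface_cutoff=1.0, central_cutoff=7.0):
    """Classify a residue from its relative ASA (percent) before and after binding."""
    delta_asa = monomer_asa - bound_asa  # assumption: loss of accessibility on binding
    if delta_asa <= interface_cutoff:
        return "non-interface"
    # Interface residues are split by how exposed they remain in the complex (B-ASA).
    return "central" if bound_asa < central_cutoff else "peripheral"


examples = {
    "buried on binding": classify_residue(40.0, 3.0),
    "partly buried": classify_residue(40.0, 20.0),
    "unaffected": classify_residue(40.0, 39.5),
}
```

The three hypothetical residues come out as central, peripheral, and non-interface, respectively.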
Surface-patch generation
We wanted to compare the interface patch with other random surface patches to see whether the former was more conserved. A surface patch was defined by taking each solvent-exposed residue and its surrounding neighbors on the unbound protein. Thus, a protein with 100 solvent-exposed residues would have 100 surface patches. To ensure that we did not measure through the protein, the following procedure was followed. A side-chain centroid was calculated for every solvent-exposed residue on the unbound protein (a whole residue centroid for glycine) and was used to calculate distances between all exposed residues. The patch was grown from the single starting (seed) residue to include all neighboring residues that were within 7 Å of it. This process was iterated using the newly acquired residues, until the total number of residues in the patch was equal to the total number of residues in the interface. When the number of neighboring residues exceeds the number of remaining places in the patch, the residues closest to the seed residue are selected first. The patch will not always expand to an adequate size, and those with <70% of the actual interface are excluded from the analysis. The average residue conservation was calculated for each surface patch and the interface patch.
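The patch-growing procedure can be sketched as a breadth-first expansion over side-chain centroids. The names `grow_patch` and `keep_patch`, the dictionary layout, and the coordinates are hypothetical; treating "within 7 Å" as an inclusive cutoff and breaking distance ties by residue identifier are our assumptions.

```python
import math


def grow_patch(seed, centroids, target_size, radius=7.0):
    """Grow a surface patch from `seed` until it holds `target_size` residues.

    `centroids` maps residue id -> (x, y, z) centroid of each solvent-exposed residue.
    """
    patch = [seed]
    members = {seed}
    frontier = [seed]
    while frontier and len(patch) < target_size:
        # Collect unvisited residues within the radius of any newly acquired residue.
        candidates = set()
        for res in frontier:
            for other, xyz in centroids.items():
                if other not in members and math.dist(centroids[res], xyz) <= radius:
                    candidates.add(other)
        # When there are too many candidates, those closest to the seed are taken first.
        ordered = sorted(candidates,
                         key=lambda r: (math.dist(centroids[seed], centroids[r]), r))
        accepted = ordered[: target_size - len(patch)]
        patch.extend(accepted)
        members.update(accepted)
        frontier = accepted
    return patch


def keep_patch(patch, interface_size, min_fraction=0.70):
    """Patches that reach less than 70% of the interface size are excluded."""
    return len(patch) >= min_fraction * interface_size


# Hypothetical centroids: four residues along a line and one isolated residue.
centroids = {0: (0.0, 0.0, 0.0), 1: (5.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0),
             3: (15.0, 0.0, 0.0), 4: (30.0, 0.0, 0.0)}
patch_from_0 = grow_patch(0, centroids, target_size=3)
patch_from_4 = grow_patch(4, centroids, target_size=3)
```

Starting from residue 0 the patch reaches residue 2 only in the second iteration, through residue 1, which is how the procedure avoids measuring straight through the protein. The isolated residue 4 cannot expand at all, so its patch would be discarded by `keep_patch`.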
Statistical measures
The Wilcoxon signed-rank test was used for all statistical comparisons. This test was chosen because it makes minimal assumptions about the underlying distribution, but is still able to take the magnitudes of the observed differences into account. Similar results were obtained when using the binomial and t tests. The Z test was used to compare the conservation of the interface relative to the conservation of all other patches on the same protein.
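For readers unfamiliar with the signed-rank test, a minimal sketch of the statistic for paired values (for example, interface versus rest-of-surface conservation per protein) is given below. It is an illustration only: the function name and data are hypothetical, zero differences are dropped, tied magnitudes receive average ranks, and the z value uses the plain normal approximation without tie or continuity corrections, which is unreliable for very small samples.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, z) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    expected = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, w_minus, (w_plus - expected) / sd


# Hypothetical conservation scores (as percentages) for five proteins.
interface = [70, 65, 80, 55, 60]
surface = [60, 66, 68, 50, 58]
w_plus, w_minus, z = wilcoxon_signed_rank(interface, surface)
```

Unlike a sign test, the statistic weights each protein by the rank of its difference, so the single protein whose surface is slightly more conserved contributes only the smallest rank.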
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Argos, P. 1988. An investigation of protein subunit and domain interfaces. Protein Eng. 2: 101-113.
Armon, A., Graur, D., and Ben-Tal, N. 2001. ConSurf: An algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information. J. Mol. Biol. 307: 447-463.
Bartlett, G.J., Porter, C.T., Borkakoti, N., and Thornton, J.M. 2002. Analysis of catalytic residues in enzyme active sites. J. Mol. Biol. 324: 105-121.
Bauer, B., Mirey, G., Vetter, I.R., Garcia-Ranea, J.A., Valencia, A., Wittinghofer, A., Camonis, J.H., and Cool, R.H. 1999. Effector recognition by the small GTP-binding proteins Ras and Ral. J. Biol. Chem. 274: 17763-17770.
Bogan, A.A. and Thorn, K.S. 1998. Anatomy of hot spots in protein interfaces. J. Mol. Biol. 280: 1-9.
Caffrey, D.R., O'Neill, L.A., and Shields, D.C. 2000. A method to predict residues conferring functional differences between related proteins: Application to MAP kinase pathways. Protein Sci. 9: 655-670.
Casari, G., Sander, C., and Valencia, A. 1995. A method to predict functional residues in proteins. Nat. Struct. Biol. 2: 171-178.
Chakrabarti, P. and Janin, J. 2002. Dissecting protein-protein recognition sites. Proteins 47: 334-343.
Chothia, C. and Janin, J. 1975. Principles of protein-protein recognition. Nature 256: 705-708.
Clackson, T. and Wells, J.A. 1995. A hot spot of binding energy in a hormone-receptor interface. Science 267: 383-386.
Cole, C. and Warwicker, J. 2002. Side-chain conformational entropy at protein-protein interfaces. Protein Sci. 11: 2860-2870.
Crasto, C.J. and Feng, J. 2001. Sequence codes for extended conformation: A neighbor-dependent sequence analysis of loops in proteins. Proteins 42: 399-413.
de Sol Mesa, D., Pazos, F., and Valencia, A. 2003. Automatic methods for predicting functionally important residues. J. Mol. Biol. 326: 1289-1302.
Elcock, A.H. and McCammon, J.A. 2001. Identification of protein oligomerization states by analysis of interface conservation. Proc. Natl. Acad. Sci. 98: 2990-2994.
Fetrow, J.S., Palumbo, M.J., and Berg, G. 1997. Patterns, structures, and amino acid frequencies in structural building blocks, a protein secondary structure classification scheme. Proteins 27: 249-271.
Gadek, T.R. and Nicholas, J.B. 2003. Small molecule antagonists of proteins. Biochem. Pharmacol. 65: 1-8.
Glaser, F., Steinberg, D.M., Vakser, I.A., and Ben-Tal, N. 2001. Residue frequencies and pairing preferences at protein-protein interfaces. Proteins 43: 89-102.
Griffith, J.P., Kim, J.L., Kim, E.E., Sintchak, M.D., Thomson, J.A., Fitzgibbon, M.J., Fleming, M.A., Caron, P.R., Hsiao, K., and Navia, M.A. 1995. X-ray structure of calcineurin inhibited by the immunophilin-immunosuppressant FKBP12-FK506 complex. Cell 82: 507-522.
Grishin, N.V. and Phillips, M.A. 1994. The subunit interfaces of oligomeric enzymes are conserved to a similar extent to the overall protein sequences. Protein Sci. 3: 2455-2458.
Hannenhalli, S.S. and Russell, R.B. 2000. Analysis and prediction of functional sub-types from protein sequence alignments. J. Mol. Biol. 303: 61-76.
Henikoff, S. and Henikoff, J.G. 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. 89: 10915-10919.
Henikoff, S. and Henikoff, J.G. 1994. Position-based sequence weights. J. Mol. Biol. 243: 574-578.
Hu, Z., Ma, B., Wolfson, H., and Nussinov, R. 2000. Conservation of polar residues as hot spots at protein interfaces. Proteins 39: 331-342.
Hughes, A.L. and Nei, M. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335: 167-170.
Janin, J. and Chothia, C. 1990. The structure of protein-protein recognition sites. J. Biol. Chem. 265: 16027-16030.
Janin, J., Miller, S., and Chothia, C. 1988. Surface, subunit interfaces and interior of oligomeric proteins. J. Mol. Biol. 204: 155-164.
Johnson, J.M., Mason, K., Moallemi, C., Xi, H., Somaroo, S., and Huang, E.S. 2003. Protein family annotation in a multiple alignment viewer. Bioinformatics 19: 544-545.
Jones, S. and Thornton, J.M. 1996. Principles of protein-protein interactions. Proc. Natl. Acad. Sci. 93: 13-20.
Jones, S. and Thornton, J.M. 1997. Analysis of protein-protein interaction sites using surface patches. J. Mol. Biol. 272: 121-132.
Jones, S., Marin, A., and Thornton, J.M. 2000. Protein domain interfaces: Characterization and comparison with oligomeric protein interfaces. Protein Eng. 13: 77-82.
Korn, A.P. and Burnett, R.M. 1991. Distribution and complementarity of hydropathy in multisubunit proteins. Proteins 9: 37-55.
Landgraf, R., Xenarios, I., and Eisenberg, D. 2001. Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins. J. Mol. Biol. 307: 1487-1502.
Larsen, T.A., Olson, A.J., and Goodsell, D.S. 1998. Morphology of protein-protein interfaces. Structure 6: 421-427.
Lawrence, M.C. and Colman, P.M. 1993. Shape complementarity at protein/protein interfaces. J. Mol. Biol. 234: 946-950.
Lee, B. and Richards, F.M. 1971. The interpretation of protein structures: Estimation of static accessibility. J. Mol. Biol. 55: 379-400.
Lichtarge, O., Bourne, H.R., and Cohen, F.E. 1996. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257: 342-358.
Lifshitz, E.M. and Pitaevskii, L.P. 1980. Statistical physics, pp. 25-26. Pergamon Press, Oxford, UK.
Lijnzaad, P., Berendsen, H.J., and Argos, P. 1996. Hydrophobic patches on the surfaces of protein structures. Proteins 25: 389-397.
Livingstone, C.D. and Barton, G.J. 1996. Identification of functional residues and secondary structure from protein multiple sequence alignment. Methods Enzymol. 266: 497-512.
Lo Conte, L., Chothia, C., and Janin, J. 1999. The atomic structure of protein-protein recognition sites. J. Mol. Biol. 285: 2177-2198.
McCoy, A.J., Chandana Epa, V., and Colman, P.M. 1997. Electrostatic complementarity at protein/protein interfaces. J. Mol. Biol. 268: 570-584.
Moroianu, J. 1999. Nuclear import and export pathways. J. Cell. Biochem. 33: 76-83.
Ofran, Y. and Rost, B. 2003. Analysing six types of protein-protein interfaces. J. Mol. Biol. 325: 377-387.
Ouzounis, C., Perez-Irratxeta, C., Sander, C., and Valencia, A. 1998. Are binding residues conserved? Pac. Symp. Biocomput. 3: 401-412.
Pazos, F., Helmer-Citterich, M., Ausiello, G., and Valencia, A. 1997. Correlated mutations contain information about protein-protein interaction. J. Mol. Biol. 271: 511-523.
Petz, D. 2001. Entropy, von Neumann and the von Neumann entropy. In John von Neumann and the foundations of quantum physics. Kluwer Academic Publishers, Dordrecht.
Philippsen, A. 2002. DINO: Visualizing structural biology. http://www.dino3d.org.
Pickett, S.D. and Sternberg, M.J. 1993. Empirical scale of side-chain conformational entropy in protein folding. J. Mol. Biol. 231: 825-839.
Pupko, T., Bell, R.E., Mayrose, I., Glaser, F., and Ben-Tal, N. 2002. Rate4Site: An algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics Suppl. 1: S71-S77.
Schneider, T.D. and Stephens, R.M. 1990. Sequence logos: A new way to display consensus sequences. Nucleic Acids Res. 18: 6097-6100.
Sheinerman, F.B. and Honig, B. 2002. On the role of electrostatic interactions in the design of protein-protein interfaces. J. Mol. Biol. 318: 161-177.
Sheinerman, F.B., Norel, R., and Honig, B. 2000. Electrostatic aspects of protein-protein interactions. Curr. Opin. Struct. Biol. 10: 153-159.
Shirai, T., Matsui, Y., Shionyu-Mitsuyama, C., Yamane, T., Kamiya, H., Ishii, C., Ogawa, T., and Muramoto, K. 2002. Crystal structure of a conger eel galectin (congerin II) at 1.45 Å resolution: Implication for the accelerated evolution of a new ligand-binding site following gene duplication. J. Mol. Biol. 321: 879-889.
Sowa, M.E., He, W., Slep, K.C., Kercher, M.A., Lichtarge, O., and Wensel, T.G. 2001. Prediction and confirmation of a site critical for effector regulation of RGS domain activity. Nat. Struct. Biol. 8: 234-237.
Stenmark, H., Valencia, A., Martinez, O., Ullrich, O., Goud, B., and Zerial, M. 1994. Distinct structural elements of rab5 define its functional specificity. EMBO J. 13: 575-583.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673-4680.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., and Higgins, D.G. 1997. The CLUSTAL_X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25: 4876-4882.
Toogood, P.L. 2002. Inhibition of protein-protein association by small molecules: Approaches and progress. J. Med. Chem. 45: 1543-1558.
Tsai, C.J. and Nussinov, R. 1997. Hydrophobic folding units at protein-protein interfaces: Implications to protein folding and to protein-protein association. Protein Sci. 6: 1426-1437.
Tsai, C.J., Lin, S.L., Wolfson, H.J., and Nussinov, R. 1996. Protein-protein interfaces: Architectures and interactions in protein-protein interfaces and in protein cores. Their similarities and differences. Crit. Rev. Biochem. Mol. Biol. 31: 127-152.
Tsai, C.J., Lin, S.L., Wolfson, H.J., and Nussinov, R. 1997a. Studies of protein-protein interfaces: A statistical analysis of the hydrophobic effect. Protein Sci. 6: 53-64.
Tsai, C.J., Xu, D., and Nussinov, R. 1997b. Structural motifs at protein-protein interfaces: Protein cores versus two-state and three-state model complexes. Protein Sci. 6: 1793-1805.
Valdar, W.S. and Thornton, J.M. 2001a. Conservation helps to identify biologically relevant crystal contacts. J. Mol. Biol. 313: 399-416.
Valdar, W.S. and Thornton, J.M. 2001b. Protein-protein interfaces: Analysis of amino acid conservation in homodimers. Proteins 42: 108-124.
Wheeler, D.L., Chappey, C., Lash, A.E., Leipe, D.D., Madden, T.L., Schuler, G.D., Tatusova, T.A., and Rapp, B.A. 2000. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 28: 10-14.
Xu, D., Tsai, C.J., and Nussinov, R. 1997. Hydrogen bonds and salt bridges across protein-protein interfaces. Protein Eng. 10: 999-1012.
Xu, W. and Regnier, F.E. 1998. Protein-protein interactions on weak-cation-exchange sorbent surfaces during chromatographic separations. J. Chromatogr. A 828: 357-364.
Yao, H., Kristensen, D.M., Mihalek, I., Sowa, M.E., Shaw, C., Kimmel, M., Kavraki, L., and Lichtarge, O. 2003. An accurate, sensitive, and scalable method to identify functional sites in protein structures. J. Mol. Biol. 326: 255-261.