Protein Science
Protein Science (2004), 13:2992-3005. Published by Cold Spring Harbor Laboratory Press. Copyright © 2004 The Protein Society

Evolutionarily conserved regions and hydrophobic contacts at the superfamily level: The case of the fold-type I, pyridoxal-5'-phosphate-dependent enzymes

Alessandro Paiardini¹, Francesco Bossa¹, and Stefano Pascarella¹,²

¹Dipartimento di Scienze Biochimiche "A. Rossi Fanelli," Istituto di Biologia e Patologia Molecolari del Consiglio Nazionale delle Ricerche, and ²Centro di Ricerca per l’Analisi dei Modelli e dell’Informazione nei Sistemi Biomedici (CISB), Università La Sapienza, 00185 Roma, Italy

Reprint requests to: Stefano Pascarella, Dipartimento di Scienze Biochimiche, Università La Sapienza, P.le A. Moro 5, 00185 Rome, Italy; e-mail: stefano.pascarella@uniroma1.it; fax: +0039-06-49917566.

(RECEIVED June 17, 2004; FINAL REVISION July 30, 2004; ACCEPTED August 2, 2004)


    Abstract
The wealth of biological information provided by structural and genomic projects opens new prospects for understanding life and evolution at the molecular level. This work shows how computational approaches can be exploited to pinpoint protein structural features that have remained invariant over long evolutionary periods in the fold-type I, PLP-dependent enzymes. A nonredundant set of 23 superposed crystallographic structures belonging to this superfamily was built. Members of this superfamily typically display high structural conservation despite low sequence identity. For each structure, a multiple-sequence alignment of orthologous sequences was obtained, and the 23 alignments were merged using the structural information to obtain a comprehensive multiple alignment of 921 sequences of fold-type I enzymes. The structurally conserved regions (SCRs), the evolutionarily conserved residues, and the conserved hydrophobic contacts (CHCs) were extracted from this data set using both sequence and structural information. This study identified a structural pattern of hydrophobic contacts that is shared by all members of the fold-type I superfamily and involved in native interactions. This pattern points to a nucleus for this fold, in which residues participating in the most conserved native interactions exhibit preferential evolutionary conservation; this conservation correlates significantly (r = 0.70) with the mean hydrophobic contact value of their apolar fraction.

Keywords: PLP-dependent enzymes; remote homology; molecular evolution; conserved hydrophobic contacts; structural stability

Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.04938104.


    Introduction
It is broadly accepted that the three-dimensional structural features of proteins sharing a common ancestor are conserved despite low sequence identity (Lesk and Chothia 1980; Chothia and Lesk 1986; Rodionov and Blundell 1998). A particularly interesting problem is how highly divergent sequences fold into similar structures (Michnick and Shakhnovich 1998). In many cases, no significant sequence similarity can be detected, except in regions of particular structural importance or, for enzymes, at residues involved in the catalytic mechanism (Russell and Barton 1994). This observation implies that not all residues of a protein sequence contribute equally to determining its final three-dimensional structure. It raises several questions:

  1. Is it possible to detect sequence information necessary to maintain a particular fold and discriminate between this signal and the noise derived from variable regions?
  2. To what extent can we relate sequence conservation at a superfamily level (throughout this work, the term "superfamily" is used according to the SCOP [Murzin et al. 1995] definition) with structural fold and function?
  3. Can we expect to find, in all members of a superfamily, a similar three-dimensional pattern of interacting residues that is reflected by conservation of physicochemical properties at those sites?

In an effort to address these questions, we investigated the interacting hydrophobic residues conserved at the primary and tertiary structure levels in the fold-type I, pyridoxal-5'-phosphate (PLP)-dependent enzymes. Although there are at least five evolutionarily unrelated superfamilies of PLP-dependent enzymes, each with a completely different fold, by far the largest and best characterized is known as fold-type I, α family, or aspartate aminotransferase family (Jansonius 1998; Schneider et al. 2000). This large group of enzymes, whose members are found in all organisms and together cover the whole range of enzymatic activities cataloged by the Enzyme Commission (John 1995), has several interesting characteristics: its members are highly divergent enzymes that display structural homology despite almost undetectable sequence similarity, and, thanks to the recent massive sequencing of several genomes and advances in protein structure determination, a wealth of experimentally well-characterized information is now available for this superfamily.

On the basis of these considerations, the present work aimed at detecting the evolutionarily conserved structural patterns possibly responsible for maintaining the fold of this protein superfamily. The analysis was carried out in two steps. First, a structural study of a nonredundant set of 23 superposed crystallographic structures extracted the features shared by this superfamily, that is, the structurally conserved regions (SCRs) and the conserved hydrophobic contacts (CHCs). Second, the initial multiple structural alignment was extended by adding sequence homologs of the enzymes of known structure, and an evolutionary analysis was carried out on the final multiple alignment of 921 sequences to detect the most conserved sequence sites. Finally, the results of the structure-based and sequence-based analyses were compared.

The role played by conserved residues in stabilizing the native structure, and their possible involvement in the folding mechanism, is then discussed in light of the most recent studies on PLP-dependent enzymes.


    Results
The data set used in this work included 23 crystallographic structures and 921 sequences of fold-type I, PLP-dependent enzymes (Table 1) from sources spanning the three domains of life: Eukarya, Bacteria, and Archaea. Pairwise sequence identity between the superimposed structures ranged from 6% to 27% (mean 12%, SD ± 3%), well inside the "twilight zone" (Rost 1999). Despite this low sequence identity, the superfamily displays remarkable structural conservation, with a mean secondary structure agreement of 64% ± 4%, computed over seven states (α-helix, 3₁₀-helix, β-bridge, extended strand, bend, hydrogen-bonded turn, and loop), and a maximum pairwise RMSD of 4.2 Å (Table 2).
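The secondary structure agreement quoted above can be read as the fraction of structurally aligned positions at which two proteins share the same state, averaged over all pairs of structures. The following is a minimal sketch under stated assumptions, not the authors' procedure: states are one-letter DSSP-style codes already projected onto the structural alignment, "L" stands for loop, "-" marks a gap, and only columns where neither structure has a gap are counted. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev

# Assumed one-letter codes (DSSP-style): H alpha-helix, G 3-10 helix,
# B beta-bridge, E extended strand, S bend, T H-bonded turn, L loop.
# "-" marks an alignment gap.
GAP = "-"


def ss_agreement(ss_a: str, ss_b: str) -> float:
    """Fraction of columns, among those where neither structure has a gap,
    in which both structures have the same secondary-structure state."""
    if len(ss_a) != len(ss_b):
        raise ValueError("aligned state strings must have equal length")
    pairs = [(a, b) for a, b in zip(ss_a, ss_b) if a != GAP and b != GAP]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def mean_pairwise_agreement(ss_strings):
    """Mean and sample SD of the agreement over all pairs of structures."""
    values = [ss_agreement(a, b) for a, b in combinations(ss_strings, 2)]
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


# Toy, hypothetical state strings (not taken from the data set).
print(ss_agreement("HHHHLLEEE", "HHHGLLEE-"))  # 7 of 8 ungapped columns agree
```

Whether gapped columns are excluded or counted as disagreements changes the absolute value; the choice above is an assumption.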


Table 1. Fold-type I enzymes data set
 

Table 2. Pairwise sequence identity, RMSD, and number of superposed Cα atoms for the structures in the data set
 
To identify the common core regions and the residues involved in structural and functional roles, the study focused on protein segments that retain a similar main-chain conformation in all of the three-dimensional structures analyzed (SCRs; see Materials and Methods), excluding the intervening regions, whose structure differs markedly among proteins. The SCRs were subjected to similar constraints during the divergent evolution of these enzymes from a common ancestor; therefore, they may contain most of the determinants necessary to maintain the fold. Seventeen regions with a mean positional RMSD ≤3.0 Å and lacking insertions and deletions were detected (Figs. 1, 2A). Positional RMSD values ranged from 0.8 Å at position 65 (residue numbering refers to alignment positions in Fig. 1) to 3.1 Å at position 126 (Table 3). Figure 2A shows that an extensive common organization of the main chain around PLP is responsible for the appropriate positioning of key residues previously identified as structural determinants for cofactor binding (Grishin et al. 1995). Five SCRs are the main constituents of this common core: one α-helix (α3, mean positional RMSD 1.59 Å) and four β-strands forming a β-sheet (β6, β9, β10, and β11, with mean positional RMSDs of 1.76 Å, 1.54 Å, 1.41 Å, and 1.52 Å, respectively).



Figure 1. Alignment of the SCRs in the fold-type I enzymes. Structurally conserved regions (SCRs) are represented as blocks separated by dashes. The top line gives the absolute position in the alignment. Alignment columns are colored according to the color scheme of Figure 2B. Each sequence is labeled with the PDB code of its structure (see Table 1). Boxes at the bottom represent secondary structure elements, labeled as follows: (α), α-helix; (β), β-strand; (L), loop; (T), turn.

 


Figure 2. (A) Superimposition of the SCRs found in fold-type I enzymes. The backbones of the 23 superposed structures are shown as a solid oval ribbon. The 17 regions with a mean positional RMSD ≤3.0 Å and lacking insertions and deletions are colored according to their RMSD value. PLP is displayed as slate CPK spheres, with oxygen atoms colored red, nitrogen atoms blue, and phosphorus purple. Each SCR is labeled as follows: (α), α-helix; (β), β-strand; (L), loop; (T), turn. (B) Sites involved in conserved hydrophobic contacts (CHCs). Positions involved in the strongest CHCs (see also Table 4) are represented as colored space-filling spheres and labeled according to the absolute alignment position shown in Figure 1. The backbones of the 23 superposed structures are shown as a solid oval ribbon colored according to the mean hydrophobic contact value. PLP is displayed as slate sticks, with oxygen atoms colored red, nitrogen atoms blue, and phosphorus purple.

 

Table 3. Structural and sequence attributes of the SCRs
 
Given the structural conservation of the residues in the SCRs and their possible relevance to the stability of fold-type I enzymes, we analyzed to what extent their properties, and consequently their functional roles, were preserved during evolution. To obtain more general information on the conservation of the physicochemical properties of each position in the multiple-structure alignment, sequence homologs of each collected structure were retrieved and aligned. Several criteria were adopted to reduce redundancy in the data set (see Materials and Methods): hits displaying >80% sequence identity with any other protein were rejected, as were distant homologs (<30% sequence identity), for which the accuracy of the alignment to the sequence of known structure cannot be guaranteed.
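The two identity thresholds above can be illustrated with a greedy filter applied to the hits aligned to one template structure. This is a minimal sketch, not the authors' pipeline: percent identity is assumed to be computed over columns where both aligned sequences have a residue, hits are assumed to be already aligned to the template, and "any other protein" is taken to mean the template plus every hit accepted so far. `percent_identity` and `filter_homologs` are hypothetical names.

```python
GAP = "-"


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where both sequences have a residue."""
    cols = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def filter_homologs(query, hits, max_identity=80.0, min_identity=30.0):
    """Greedy filter of aligned hits (name, sequence) for one template structure.

    - drop hits below min_identity to the template (alignment unreliable);
    - drop hits above max_identity to the template or to any hit already kept.
    Acceptance order matters: earlier hits win.
    """
    accepted = [query]
    kept_names = []
    for name, seq in hits:
        if percent_identity(query, seq) < min_identity:
            continue
        if any(percent_identity(seq, other) > max_identity for other in accepted):
            continue
        accepted.append(seq)
        kept_names.append(name)
    return kept_names


# Hypothetical toy sequences, already aligned to the template.
query = "ACDEFGHIKL"
hits = [
    ("hit1", "ACDEFGHIKV"),  # 90% to query: redundant
    ("hit2", "ACDEFGWWWW"),  # 60% to query: kept
    ("hit3", "ACDEFGWWWY"),  # 90% to hit2: redundant
    ("hit4", "WWWWWWWWKL"),  # 20% to query: too distant
    ("hit5", "ACDWWWHIKL"),  # 70% to query, 30% to hit2: kept
]
print(filter_homologs(query, hits))  # ['hit2', 'hit5']
```

Because the filter is greedy, the set retained depends on the order in which hits are processed; sorting hits by identity to the template first is one reasonable choice.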

The number of sequences retrieved for each structure is shown in Table 1. The multiple-structure alignment obtained from the superposition of the crystallographic structures was then used as a guide to merge the 23 multiple-sequence alignments, comprising 921 nonredundant sequences. Of the 422,740 pairwise sequence comparisons, 376,573 displayed a sequence identity in the interval 0%–20% (mean 16%, SD ±6%), suggesting that the data set samples very distant evolutionary events. After the multiple-sequence alignment was obtained, a method for identifying evolutionarily conserved residues was applied. Because BLOSUM62, on average, gave superior results compared with most other matrices in extensive tests of sequence alignment (Vogt et al. 1995), it was adopted as the substitution matrix for scoring amino acid exchanges. A weighting scheme based on sequence similarity was also adopted, to correct for evolutionary distance between sequences and for residue frequency (see Materials and Methods). The results for the SCRs, expressed in units of SD from the mean conservation value (R), are shown in Table 3. The structural role of the SCRs in maintaining the fold of this superfamily is reflected by the high sequence conservation of the corresponding alignment positions: all SCR sites score above the mean conservation value, the only exception being site 14, which has a negative score.
In particular, residues interacting with the PLP moiety are the most conserved. Asp 67, which is known to interact with the pyridinium nitrogen of PLP (Mehta and Christen 1998), was found in 919 of the 921 aligned sequences, scoring 3.3 SDs above the mean conservation value; the only exceptions are 8-amino-7-oxononanoate synthase from Mesorhizobium loti (GI 13475018) and cystathionine β-lyase from Bifidobacterium longum (GI 23336039) (Holm and Sander 1998), in which Asp is replaced by Gly and Asn, respectively. A comparable value (R = 3.2) was observed only for the Schiff base-forming lysine, which lies in a variable loop between SCRs β10 and β11 (Christen and Mehta 2001). Taken together, these two residues represent the major signature of this superfamily. Other sites interacting with the cofactor or the substrates are also strongly conserved: position 70, which interacts with the phenol oxygen of PLP (R = 1.2); the ring moiety stacking on the re side of PLP (data not shown; R = 1.6); the residue stacking on the si side (site 69, R = 1.4); the so-called glycine-rich region (positions 19, 20, and 21; R = 1.6, 1.9, and 1.0, respectively); the 5'-phosphate-binding residue at position 77 (R = 1.5); and the Arg residue ion-paired with the α-carboxyl group of many substrates bound to fold-type I enzymes (site 133, R = 2.0).
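The R values quoted above are conservation scores expressed as standard-deviation units from the mean over all alignment columns. The exact scoring and weighting used by the authors are described in Materials and Methods; the following is only a minimal sketch, assuming a weighted sum-of-pairs score per column (mean substitution score over all residue pairs, each pair weighted by the product of its sequence weights, gaps skipped) followed by a z-score transformation. `column_score`, `conservation_r`, and `toy_score` are hypothetical names, and `toy_score` is a deliberately tiny stand-in, not BLOSUM62.

```python
from itertools import combinations
from statistics import mean, pstdev

GAP = "-"


def column_score(column, weights, score):
    """Weighted mean substitution score over all residue pairs in one column.
    Pairs involving a gap are skipped."""
    num = den = 0.0
    for (i, a), (j, b) in combinations(enumerate(column), 2):
        if a == GAP or b == GAP:
            continue
        w = weights[i] * weights[j]
        num += w * score(a, b)
        den += w
    return num / den if den else 0.0


def conservation_r(alignment, weights, score):
    """Per-column scores expressed as SD units from the mean over all columns."""
    raw = [column_score(col, weights, score) for col in zip(*alignment)]
    mu, sd = mean(raw), pstdev(raw)
    return [(s - mu) / sd if sd else 0.0 for s in raw]


# Hypothetical toy scoring function standing in for BLOSUM62.
def toy_score(a, b):
    if a == b:
        return 4
    if a in "LIVF" and b in "LIVF":
        return 1  # conservative hydrophobic exchange
    return -1


alignment = ["DLG", "DIA", "DVS"]  # three toy sequences, three columns
r = conservation_r(alignment, [1.0, 1.0, 1.0], toy_score)
print([round(x, 2) for x in r])
```

In the toy run, the invariant Asp column scores highest, the column of hydrophobic exchanges sits near the mean, and the unrelated column falls below it. In practice, a full BLOSUM62 table (for example, the one distributed with Biopython) would replace `toy_score`, and the uniform weights would be replaced by a similarity-based weighting scheme.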

Besides these positions, other sites not directly involved in any interaction with the cofactor or the substrates show a degree of sequence conservation comparable to that of functionally important residues (R ≥ 1.0; Table 3). These sites fall into two categories: (1) Gly/Ala-rich sites and (2) positions mainly occupied by hydrophobic residues. For example, position 97 scores 1.7 SDs above the mean conservation value and is occupied almost invariantly by Leu or an aromatic residue in all 921 sequences, although it does not appear to play any functional role.

The positions mainly occupied by Gly or Ala (23, 80, and 92), which show a high degree of sequence conservation (R = 1.9, 1.9, and 1.0, respectively), may play important roles other than binding the PLP moiety or forming hydrophobic contacts. For example, two Ala-rich sites (23 and 92) lie in the middle of α-helices; Ala shows the strongest preference of any residue for middle-helix locations (Richardson and Richardson 1989), owing to structural features that direct and stabilize the α-helical fold (Blaber et al. 1993). The third site (80), which is Gly-rich, lies in β11, where it could help modulate the curvature of the sheet (Richardson and Richardson 1989).

To test whether the conservation of the physicochemical properties of the second group of positions was driven by selective pressure to maintain the stability of fold-type I, PLP-dependent enzymes through the involvement of the corresponding residues in hydrophobic interactions, the conserved hydrophobic contacts (CHCs) within the previously identified SCRs were analyzed. Previous comparative studies of the relationship between the sequence conservation of a protein family and the hydrophobic contacts in its available structures (see, for example, Ptitsyn 1998; Hill et al. 2002; Gromiha et al. 2004; Gunasekaran et al. 2004) considered two residues to be in contact if the distance between their Cα atoms, or between any atom of one residue and any atom of the other, was below an arbitrary threshold. In this work, a different criterion was adopted, based on a comparative analysis of the pairwise residue apolar contact areas for every possible pair of residues belonging to the SCRs. CHCs are therefore defined as residue hydrophobic contacts involving only apolar atoms (Drabløs 1999) and observed in at least two of the structures analyzed. This approach allowed us to quantify the strength of a hydrophobic contact and to assess the correlation between this quantity and the evolutionary conservation of the corresponding sites. The strongest CHC for each site belonging to the SCRs, together with the partner site involved in the hydrophobic interaction, is shown in Table 3.
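The CHC criterion (an apolar contact between the same two alignment positions in at least two structures) can be sketched as follows. This is an illustration under assumptions, not the authors' code: each structure is represented by a hypothetical dictionary mapping a pair of alignment positions to the apolar contact area (in Å²) between the residues there, as could be derived from per-atom contact areas; each pair is assumed to be listed once per structure; and, because the exact strength definition is given in Materials and Methods, the strength used here (the area averaged over all structures, counting zero where the contact is absent) is an assumption.

```python
from collections import defaultdict


def conserved_contacts(per_structure, min_structures=2):
    """Candidate CHCs from per-structure apolar contact areas.

    per_structure: one dict per structure mapping a pair of alignment
    positions (x, y) to the apolar contact area (square angstroms) between
    the residues at those positions in that structure.
    Returns {(x, y): strength}, strongest first.
    """
    n = len(per_structure)
    observed = defaultdict(list)
    for contacts in per_structure:
        for (x, y), area in contacts.items():
            if area > 0:
                observed[(min(x, y), max(x, y))].append(area)
    # Assumed strength: area averaged over all n structures (0 where absent).
    strengths = {pair: sum(areas) / n
                 for pair, areas in observed.items()
                 if len(areas) >= min_structures}
    return dict(sorted(strengths.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical toy data for three structures (positions and areas invented).
structures = [
    {(2, 9): 30.0, (5, 9): 20.0, (1, 5): 5.0},
    {(9, 2): 24.0, (5, 9): 15.0},
    {(2, 9): 18.0, (3, 7): 12.0},
]
print(conserved_contacts(structures))
```

Pairs seen in only one structure, such as (1, 5) and (3, 7) in the toy data, are not reported.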

Figure 3 plots the mean conservation values of pairs of sites involved in CHCs against their mean hydrophobic contact values. Residues interacting with the cofactor PLP, as well as the Ala/Gly-rich sites described above, were not plotted, because their high evolutionary conservation reflects functions other than stabilizing the fold through hydrophobic contacts. The two variables show a significant linear correlation (r = 0.70). The statistical significance of r was assessed with a t-test against the null hypothesis r = 0, giving P ≈ 1.7 × 10⁻⁵³ and indicating a statistically significant relationship between the strength of a CHC and the extent of evolutionary conservation of the residues involved. At contact values >16 Å², the mean conservation grade becomes comparable to that of catalytically important residues (R ≥ 1.0). The CHCs with the highest mean apolar contact areas (Table 4) can be grouped into three main clusters (Fig. 2B). The first cluster is located at the buried bottom of the conserved, PLP-binding common core of the major domain, which is formed by six SCRs (α3, β6, L8, β9, β10, and β11). The second, smaller cluster of interacting residues is centered around position 133 of the minor domain (α13, β14, α15, β16, and β17). The third cluster forms a hinge between SCRs α1 and α12, which lie at the beginning and at the end of the major domain, respectively (Fig. 2B). Amino acids belonging to the first cluster occur at positions 27, 28, and 31 in α3; 35 and 36 in β4; 42 in β5; 48, 49, and 50 in β6; 58 and 59 in L8; 63, 64, and 65 in β9; 73 in β10; and 81 and 84 in β11 (Fig. 2B; Table 4).
The five residues forming the second cluster (111 in α13, 118 in β14, 125 in α15, 128 in β16, and 134 in β17) are located near position 133 of β17, which is occupied mainly by an Arg residue (18 of the 23 structures analyzed); the α-carboxyl group of many substrates bound to fold-type I enzymes is often ion-paired to this arginine (Jansonius 1998). Residues forming the third cluster are involved in interhelical contacts in 22 of the 23 structures considered (the only exception is 1BJW, in which only two CHCs involving site 8 are conserved; Table 4). These residues (positions 4, 8, and 11 of SCR α1; 90 and 97 of SCR α12) form a vertical strip down one side of each of the helices that delimit the major domain, at sites i, i + 4, and i + 7. Site 97, described above (R = 1.7), participates in the two most extensive CHCs of the α1–α12 hinge (11–97 and 8–97 [Fig. 2B]; mean apolar contact areas of 24.6 Å² and 17.6 Å², respectively, according to Table 3) and in an additional conserved contact with position 7 (15.3 Å²).
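Why sites i, i + 4, and i + 7 line up on one face of a helix follows from α-helix geometry: with about 3.6 residues per turn, each residue is rotated by roughly 100° about the helix axis relative to the previous one. The small helical-wheel calculation below uses that textbook idealization (not a value measured on these structures); `wheel_angle` is a hypothetical helper name.

```python
DEGREES_PER_RESIDUE = 100.0  # idealized alpha-helix: ~3.6 residues per turn


def wheel_angle(offset: int) -> float:
    """Angular position of residue i + offset relative to residue i,
    folded into the interval (-180, 180]."""
    angle = (offset * DEGREES_PER_RESIDUE) % 360.0
    return angle - 360.0 if angle > 180.0 else angle


for k in (0, 4, 7, 2):
    print(f"i+{k}: {wheel_angle(k):+.0f} deg")
```

Residues i, i + 4, and i + 7 fall at 0°, +40°, and −20°, i.e., within a 60° arc on the same face (for α1, positions 4, 8, and 11), whereas i + 2 points almost to the opposite face (−160°).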



Figure 3. Mean conservation values of pairs of sites involved in CHCs plotted against their mean hydrophobic contact values. The correlation coefficient between the mean conservation grade and the CHC value (in square angstroms) is 0.70. A P-value <0.0001 supports a statistically significant relationship between the strength of a CHC and the extent of conservation of the involved residues.
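The correlation test summarized in Figure 3 can be sketched as follows: Pearson's r between the pairwise conservation of two contacting sites and their mean apolar contact area, and the standard statistic t = r·√((n − 2)/(1 − r²)) used to test the null hypothesis of zero correlation with n − 2 degrees of freedom. The pair score used here (arithmetic mean of the two sites' R values) is an assumption about how "mean conservation values between pairs of sites" was computed; the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def pair_conservation(r_scores, pairs):
    """Assumed pair score: arithmetic mean of the two sites' R values."""
    return [(r_scores[x] + r_scores[y]) / 2.0 for x, y in pairs]


# Example of the test statistic alone: r = 0.70 with a hypothetical n = 30 pairs.
print(round(t_for_r(0.70, 30), 2))
```

A two-sided P-value follows from the Student t distribution with n − 2 degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), n - 2)`, or directly from `scipy.stats.pearsonr(xs, ys)`. For a fixed r, P shrinks rapidly as the number of plotted pairs grows.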

 

Table 4. Sites involved in the strongest conserved hydrophobic contacts (CHCs with a mean hydrophobic contact area >16.0 Å²)
 

    Discussion
The present work aimed at detecting structural features that have remained invariant over long evolutionary periods in the fold-type I PLP-dependent enzymes. This protein superfamily is particularly suited for such an analysis, because its members are related by a long divergent evolution: it has been proposed that these enzymes were already present in the universal ancestor cell some 1500 million years ago (Mehta and Christen 1998). Whereas the structural homology among the different members is still recognizable, the extent of sequence similarity is not sufficient by itself to establish a common ancestry. For these reasons, this superfamily can be considered a model of evolutionary structural plasticity in proteins. Structural similarity among members of this superfamily is distributed according to the locally different functions of the secondary structure motifs. Approximately 30% of the residues form a well-conserved structural common core of secondary elements. This structural conservation is probably due to the spatial restraints imposed by the similar binding mode of the cofactor PLP, which is accommodated inside a hydrophobic cleft at the interface between two subunits. Outside this common core, the lengths of secondary structure elements and loops vary substantially among these distantly related structures, a result that is also evident in the large multiple alignment of 921 sequences, where indel-free conserved blocks are sparse. Large structural adaptations probably took place during the divergent evolution of this superfamily from a common ancestor, adjusting the catalytic apparatus to changes in reaction and substrate specificity. The loops surrounding the active-site entrance were likely the regions most affected by these structural changes (Contestabile et al. 2001).

Our work focused on the conservation of hydrophobic contacts between the structurally conserved regions identified by comparing 23 distantly related fold-type I PLP-dependent enzymes. The conservation of hydrophobic contacts is the result of the selective pressure exerted during molecular evolution to maintain a functionally competent fold. We identified three clusters of conserved hydrophobic contacts; the first and second clusters (Fig. 2B) are located near key residues responsible for the proper positioning of the cofactor and the substrate in the active site. The first cluster shows a remarkable separation between residues with a functional role (interaction with the PLP moiety and modulation of its activity), all located at the top of the conserved core region formed by SCRs α3, β6, β9, β10, and β11 (Fig. 2A,B), and residues with a structural role (maintenance of structural stability through CHCs), located instead at the inner bottom of the same functional unit. This functional and spatial arrangement, a stable scaffold folded around a mutable functional core of residues, is found in many other evolutionarily successful structural units (Nagano et al. 2002; Selvaraj and Gromiha 2003), such as TIM-barrel and Ig-like domains. It seems to offer a way to reconcile three-dimensional stability with plasticity of function, broadening substrate and reaction specificity without affecting protein fold and conformation (Todd et al. 2001; Wierenga 2001; Nagano et al. 2002).

In apparent contrast to the two clusters described above, the helix α1–helix α12 cluster of CHCs (Fig. 2B) does not seem to be involved in the positioning or stability of any active-site residue. Examination of the contact network showed that the CHCs lie along one side of each helix, forming a buried spine at positions i, i + 4, and i + 7. A similar pattern of almost absolutely conserved residue–residue contacts was previously identified by Hill et al. (2002) in cytokines and by Ptitsyn (1998) in c-type cytochromes. Both studies concluded that these residues are of critical importance for protein folding. In the case of PLP-dependent enzymes, previous experimental studies have suggested the presence of three structural nuclei responsible for the folding pattern and stability of fold-type I enzymes. Herold et al. (1991) demonstrated that the excised PLP-binding domain of aspartate aminotransferase from Escherichia coli, which contains the regions of the first and third CHC clusters, is able to fold autonomously both in vivo and in vitro and to bind PLP. More recently, Fu et al. (2003) proposed that the folding of serine hydroxymethyltransferase from E. coli proceeds in two phases: a fast phase in which two domains, containing the regions of the first and second CHC clusters, fold into their native state, and a slow final phase in which an interdomain segment comprising helix α12 folds into its native conformation, interacting with the N-terminal helix α1 of the major domain. This last step is thought to be involved in PLP binding.
The present analysis supports this hypothesis and suggests a possible mechanistic explanation for these experimental observations. It may serve as a basis for further experiments to establish sequence–structure correlations and to investigate the role of individual residues and pairwise interactions in the folding and stability of this superfamily.

A main goal of this work was to determine whether the common structural constraints on the packing of interacting residues within the protein core of fold-type I PLP enzymes are reflected in the sequence conservation pattern of the hydrophobic positions in the superfamily multiple alignment. A plot of the mean conservation grade of two interacting SCR sites against the mean hydrophobic contact value of their apolar fraction can be fit by a linear relationship (r = 0.70). In the present analysis, mean pairwise amino acid conservation was considered in addition to single-site, positional conservation. A significant advantage of considering pairwise conservation is that it accounts for compensating mutations that may occur in the amino acid sequence during evolution. Note that conserved positions are not invariant; on the contrary, correlated mutations can be detected by comparing different structures. It therefore seems that what is really conserved is the three-dimensional location of the hydrophobic interaction and its hydrophobic effect, rather than the specific identity of the side chains participating in a CHC. Although the 23 PLP enzymes considered are very distantly related, they share a structural pattern of conserved hydrophobic contacts whose potential importance in stabilizing the native fold is supported by preferential conservation across the homologous sequences.

Finally, we suggest that the strategy and algorithms used here to establish the significant correlation between sequence conservation and CHC values could be extended to other superfamilies for which suitable sequence and structural information is available, in order to train statistical predictors of protein contact maps properly (Fariselli and Casadio 2000; Pollastri et al. 2001) and to help in planning protein folding and design experiments.


    Materials and methods
Structural alignments
An initial search was carried out for nonredundant, representative members of each fold-type I family whose three-dimensional structure had been solved. Using the classification of these families in several structural databases (SCOP [Murzin et al. 1995], CATH [Orengo et al. 2003], and NCBI’s MMDB [Chen et al. 2003]), we retrieved an exhaustive set of crystallographic structures, from which 27 representative members were selected according to a hierarchical set of criteria: first, engineered enzymes bearing residue mutations were discarded; then, among orthologous enzymes, the structure with the highest resolution was chosen; finally, at comparable resolution values, the R-factor was also taken into consideration. All structures were retrieved from the Protein Data Bank (PDB; Berman et al. 2000).
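The hierarchical selection criteria can be expressed as a comparison function applied within each group of orthologs. This is a minimal sketch under explicit assumptions: the `Entry` record and its fields are hypothetical, "comparable resolution" is modeled with a hypothetical tolerance of 0.1 Å, and a lower R-factor is assumed to be preferred when resolutions are comparable.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    pdb_id: str        # hypothetical identifiers in the example below
    group: str         # orthology group the structure belongs to
    mutant: bool       # engineered / mutated enzyme
    resolution: float  # angstroms, lower is better
    r_factor: float    # assumed: lower is better


def is_better(a: Entry, b: Entry, tol: float) -> bool:
    """Higher resolution wins; within tol angstroms, fall back to R-factor."""
    if abs(a.resolution - b.resolution) > tol:
        return a.resolution < b.resolution
    return a.r_factor < b.r_factor


def pick_representatives(entries, tol=0.1):
    """Return one representative Entry per orthology group."""
    best = {}
    for e in entries:
        if e.mutant:
            continue  # criterion 1: discard engineered mutants
        current = best.get(e.group)
        if current is None or is_better(e, current, tol):
            best[e.group] = e
    return best


entries = [
    Entry("s1", "famA", False, 2.00, 0.20),
    Entry("s2", "famA", False, 1.80, 0.22),
    Entry("s3", "famA", True, 1.50, 0.18),   # mutant: ignored
    Entry("s4", "famA", False, 1.85, 0.18),  # comparable to s2, lower R-factor
    Entry("s5", "famB", False, 2.50, 0.21),
]
print({g: e.pdb_id for g, e in pick_representatives(entries).items()})
```

A tolerance-based comparison is not transitive, so with many near-equal entries the winner can depend on input order; sorting by resolution beforehand makes the outcome reproducible.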

An initial multiple alignment was obtained automatically with the combinatorial extension algorithm implemented in the program CE (Shindyalov and Bourne 1998). This alignment was used as the starting point for a manually refined structural alignment. Every possible pair of structures was visually inspected and, where necessary, the alignment was modified to optimize the matching of several structural features, including observed secondary structure elements, functionally conserved residues known to interact with the PLP moiety, and hydrophobic regions. In a few ambiguous cases, that is, insertions or deletions for which visual inspection could not discern the optimal matching between two regions, residue similarity measured by the BLOSUM62 substitution matrix (Henikoff and Henikoff 1992) was used as a guide. At the end of the manual refinement, structures displaying >30% sequence identity were discarded, leaving a nonredundant ensemble of 23 representatives of fold-type I enzymes with a maximum pairwise sequence identity of 27% and a maximum pairwise RMSD of 4.2 Å.

Identification of the structurally conserved regions
The structural alignment obtained as described above was used to identify the common core and the structurally conserved regions (SCRs) shared by members of this superfamily. SCRs were defined as regions that (1) display similar local conformation, with a mean positional RMSD of the equivalent α-carbon positions across all superposed structures ≤3.0 Å (Hill et al. 2002); (2) lack indels (insertions and deletions) in all of the structures considered; and (3) comprise at least three consecutive residues. A C-language routine was developed to extract the candidate SCRs from the three-dimensional coordinates of the superimposed structures and their associated multiple alignment. For every structurally equivalent position of the multiple structural alignment, the RMSD of the equivalent Cα atoms from their center of mass was computed. To exclude SCRs containing indels, positions with gaps were not considered. A window of w = 3 positions was then slid along the alignment to define seed positions with a mean RMSD ≤3.0 Å. Each time a seed was found, w was increased iteratively by one position until the mean RMSD rose above 3.0 Å or the window reached the end of the alignment.
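The SCR search can be sketched in a few lines of Python (the original routine was written in C and is not reproduced here). Assumptions: the Cα coordinates are already superposed and indexed by alignment position, a gapped position ends any window, and scanning resumes immediately after each accepted region. `positional_rmsd` and `find_scrs` are hypothetical names.

```python
import math


def positional_rmsd(coords):
    """Per-position RMSD of equivalent Cα atoms from their center of mass.

    coords[s][p] is the (x, y, z) tuple of the Cα of structure s at
    alignment position p, or None where structure s has a gap.
    Gapped positions get None and are excluded from SCRs.
    """
    out = []
    for p in range(len(coords[0])):
        pts = [c[p] for c in coords]
        if any(pt is None for pt in pts):
            out.append(None)
            continue
        n = len(pts)
        cx = sum(pt[0] for pt in pts) / n
        cy = sum(pt[1] for pt in pts) / n
        cz = sum(pt[2] for pt in pts) / n
        msd = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                  for x, y, z in pts) / n
        out.append(math.sqrt(msd))
    return out


def find_scrs(rmsd, w=3, cutoff=3.0):
    """Return half-open (start, end) intervals of candidate SCRs."""
    scrs, start, n = [], 0, len(rmsd)
    while start + w <= n:
        window = rmsd[start:start + w]
        if None in window or sum(window) / w > cutoff:
            start += 1                      # no seed here; slide by one
            continue
        end = start + w                     # seed found: try to extend it
        while end < n and rmsd[end] is not None:
            extended = rmsd[start:end + 1]
            if sum(extended) / len(extended) > cutoff:
                break                       # mean would exceed the cutoff
            end += 1
        scrs.append((start, end))
        start = end                         # resume after this region
    return scrs
```

Because the acceptance test uses the mean over the whole window, a single poorly superposed position can be absorbed into an otherwise well-conserved region, which the definition in the text permits.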

Identification of the conserved hydrophobic contacts
Conserved hydrophobic contacts (CHCs) in the retrieved crystallographic structures were computed with the program pdb_np_cont (Drabløs 1999), which calculates pairwise contact areas between nonpolar atoms from a standard PDB coordinate file. The output of this program was used to calculate pairwise residue contact areas for every possible pair of residues belonging to the SCRs of the structures analyzed. If the residues at two positions of the joint multiple structural alignment, x and y, are in hydrophobic contact in at least two of the structures, a candidate CHC is detected. CHCs were then classified on the basis of their strength $s_{xy}$, defined as:

$$s_{xy} = \frac{1}{N}\sum_{i=1}^{N} A_i \qquad (1)$$

where $A_i$ is the apolar contact area in the i-th structure between the residues at absolute positions x and y of the structural alignment, and N is the number of superposed structures.
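Candidate CHC detection and scoring can be illustrated as below, assuming that the strength is the mean apolar contact area over the N superposed structures, with structures lacking the contact contributing zero. The input dictionary stands in for values derived from pdb_np_cont output; its layout and the function name are hypothetical, and parsing of the program's output is not shown.

```python
def chc_strengths(contact_areas, n_structures, min_structures=2):
    """Return {(x, y): s_xy} for candidate conserved hydrophobic contacts.

    contact_areas maps a pair of alignment positions (x, y) to a list of
    n_structures apolar contact areas A_i (in Å^2), one per superposed
    structure, with 0.0 meaning no contact in that structure.
    """
    strengths = {}
    for (x, y), areas in contact_areas.items():
        if len(areas) != n_structures:
            raise ValueError(f"pair {(x, y)}: expected {n_structures} areas")
        n_in_contact = sum(1 for a in areas if a > 0.0)
        if n_in_contact >= min_structures:   # contact seen in >= 2 structures
            # Assumed form of the strength: mean area over all N structures.
            strengths[(x, y)] = sum(areas) / n_structures
    return strengths
```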

Collection and alignment of sequence homologs
Sequence searches were performed against the nonredundant database NRDB (Holm and Sander 1998) with the program BLAST (Altschul et al. 1997), using each of the sequences of the 23 superposed structures as a probe. When applicable, the following criteria were adopted to collect or discard each sequence: (1) hits were considered significant if the E-value was ≤0.0001 (if fewer than 10 sequences were collected, this threshold was relaxed to ≤0.001); (2) hits were filtered to ensure that no sequence with identity >80% or <30% to any other sequence of the multiple alignment was present in the final alignment; (3) hits shorter than 80% of the query sequence length were rejected to avoid fragmentary sequences in the final alignment.
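The three collection criteria can be combined into a single filter, sketched here under several assumptions: BLAST hits are already parsed into the hypothetical `Hit` record, pairwise identity is supplied by a caller-provided `identity(seq_a, seq_b)` helper (hypothetical), hits are accepted greedily in order of increasing E-value with the query counted as the first member, and the relaxed E-value threshold is applied when fewer than 10 homologs survive all filters.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    seq: str
    evalue: float


def select_homologs(query_name, query_seq, hits, identity, min_hits=10):
    """Return the accepted (name, seq) homologs of one query (sketch)."""

    def run(e_cut):
        kept = [(query_name, query_seq)]
        for h in sorted(hits, key=lambda h: h.evalue):
            if h.evalue > e_cut:                        # criterion 1
                continue
            if len(h.seq) < 0.8 * len(query_seq):       # criterion 3
                continue
            # criterion 2: identity to every kept sequence within 30-80%
            if all(30.0 <= identity(h.seq, s) <= 80.0 for _, s in kept):
                kept.append((h.name, h.seq))
        return kept[1:]                                 # homologs only

    selected = run(1e-4)
    if len(selected) < min_hits:                        # relaxed threshold
        selected = run(1e-3)
    return selected
```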

The filtered sequences were aligned to their corresponding query sequence with the program CLUSTALW (Thompson et al. 1994). The 23 multiple alignments were then merged, using the structural alignment of the 23 PLP enzyme sequences as a guide (Pascarella and Argos 1992). The resulting alignment, comprising 973 sequences, was checked again for redundancy, leaving a final set of 921 sequences.
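The merging step relies on the fact that each query appears both in its own family alignment and in the structural alignment, so the query row can translate family columns into structural-alignment columns. The sketch below illustrates that general anchoring idea only; it is not claimed to be the exact procedure of Pascarella and Argos (1992). As a simplifying assumption, residues that a homolog inserts relative to its query are dropped, and the function names are hypothetical.

```python
def residue_columns(gapped, gap="-"):
    """Column index of each residue (0-based, ungapped order) in a gapped row."""
    return [col for col, ch in enumerate(gapped) if ch != gap]


def project_family(struct_row, family_query_row, family_rows, gap="-"):
    """Place homologs aligned to a query onto the query's structural row.

    struct_row: the query's row in the structural alignment.
    family_query_row: the same query's row in its CLUSTALW-style alignment.
    family_rows: {name: gapped homolog row} from that family alignment.
    """
    if struct_row.replace(gap, "") != family_query_row.replace(gap, ""):
        raise ValueError("query sequence differs between the two alignments")
    struct_cols = residue_columns(struct_row, gap)
    projected = {}
    for name, row in family_rows.items():
        out = [gap] * len(struct_row)
        r = 0                                # index of the current query residue
        for q_ch, h_ch in zip(family_query_row, row):
            if q_ch == gap:
                continue                     # insertion relative to query: dropped
            out[struct_cols[r]] = h_ch       # same column as query residue r
            r += 1
        projected[name] = "".join(out)
    return projected
```

Applying `project_family` to each of the 23 families and pooling the results with the structural rows yields one merged alignment whose columns are those of the structural alignment.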

Identification of the evolutionarily conserved positions
To measure sequence conservation, each position k of the final multiple sequence alignment was assigned a score according to:


(2)

where $O_k$ is the score assigned to position k of the multiple sequence alignment; n is the number of sequences in the alignment; i and j refer to the i-th and j-th sequences; $Bscore^k_{ij}$ is the BLOSUM62 score for the residue exchange at position k between the i-th and j-th sequences, while $Bscore^k_{ii}$ and $Bscore^k_{jj}$ are the BLOSUM62 scores for matching the residue of the i-th and of the j-th sequence, respectively, with itself; and $nid_{ij}$ and $nal_{ij}$ are the number of identical residues and the number of aligned residues between the i-th and j-th sequences. Thus, for every possible exchange at a given position of the multiple alignment, a normalized conservation index based on BLOSUM62 is computed. Because BLOSUM62 scores for matching an amino acid with itself differ between residue types (e.g., 4 for an Ala/Ala match but 11 for Trp/Trp), conservation indices for invariant positions would otherwise depend on residue type; the normalization removes this dependence, so that all invariant positions receive the same score. The mean $\bar{O}$ and the standard deviation σ of the distribution of $O_k$ values were determined, and the significance of every conservation index was then calculated as $R_k = (O_k - \bar{O})/\sigma$.
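Given the per-position scores $O_k$ from Equation 2 (not reimplemented here), the significance values follow directly as z-scores. The sketch assumes the population standard deviation; the text does not state whether the population or the sample form was used, and the function name is hypothetical.

```python
import statistics


def significance(scores):
    """Return R_k = (O_k - mean(O)) / sigma for every alignment position k."""
    mean = statistics.mean(scores)
    sigma = statistics.pstdev(scores)   # population SD (an assumption)
    if sigma == 0:
        raise ValueError("all positions have the same score")
    return [(o - mean) / sigma for o in scores]
```

Positions with large positive values are the most conserved relative to the alignment as a whole.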


    Acknowledgments

This work was supported in part by the Italian "Ministero dell’Università e della Ricerca" (MIUR). This work will be submitted by A.P. in partial fulfillment of the requirements of the degree of Dottorato di Ricerca at the Università di Roma "La Sapienza." Structural and sequence alignments and the source code of the software developed for the analysis are available on request from the authors.


    References
Alexeev, D., Alexeeva, M., Baxter, R.L., Campopiano, D.J., Webster, S.P., and Sawyer, L. 1998. The crystal structure of 8-amino-7-oxononanoate synthase: A bacterial PLP-dependent, acyl-CoA-condensing enzyme. J. Mol. Biol. 284: 401–419.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235–242.

Blaber, M., Zhang, X.J., and Matthews, B.W. 1993. Structural basis of amino acid α helix propensity. Science 260: 1637–1640.

Burkhard, P., Dominici, P., Borri-Voltattorni, C., Jansonius, J.N., and Malashkevich, V.N. 2001. Structural insight into Parkinson’s disease treatment gained from drug-inhibited dopa decarboxylase. Nat. Struct. Biol. 8: 963–967.

Chen, J., Anderson, J.B., DeWeese-Scott, C., Fedorova, N.D., Geer, L.Y., He, S., Hurwitz, D.I., Jackson, J.D., Jacobs, A.R., Lanczycki, C.J., et al. 2003. MMDB: Entrez’s 3D-structure database. Nucleic Acids Res. 31: 474–477.

Chothia, C. and Lesk, A. 1986. The relation between the divergence of sequence and structure in proteins. EMBO J. 5: 823–826.

Christen, P. and Mehta, P.K. 2001. From cofactor to enzymes. The molecular evolution of pyridoxal-5'-phosphate-dependent enzymes. Chem. Rec. 1: 436–447.

Clausen, T., Huber, R., Laber, B., Pohlenz, H.D., and Messerschmidt, A. 1996. Crystal structure of the pyridoxal-5'-phosphate dependent cystathionine β-lyase from Escherichia coli at 1.83 Å. J. Mol. Biol. 262: 202–224.

Clausen, T., Schlegel, A., Peist, R., Schneider, E., Steegborn, C., Chang, Y.S., Haase, A., Bourenkov, G.P., Bartunik, H.D., and Boos, W. 2000a. X-ray structure of MalY from Escherichia coli: A pyridoxal 5'-phosphate-dependent enzyme acting as a modulator in mal gene expression. EMBO J. 19: 831–842.

Clausen, T., Kaiser, J.T., Steegborn, C., Huber, R., and Kessler, D. 2000b. Crystal structure of the cystine C-S lyase from Synechocystis: Stabilization of cysteine persulfide for FeS cluster biosynthesis. Proc. Natl. Acad. Sci. 97: 3856–3861.

Contestabile, R., Paiardini, A., Pascarella, S., di Salvo, M.L., D’Aguanno, S., and Bossa, F. 2001. L-Threonine aldolase, serine hydroxymethyltransferase and fungal alanine racemase. A subgroup of strictly related enzymes specialized for different functions. Eur. J. Biochem. 268: 6508–6525.

Drabløs, F. 1999. Clustering of non-polar contacts in proteins. Bioinformatics 15: 501–509.

Eads, J.C., Beeby, M., Scapin, G., Yu, T.W., and Floss, H.G. 1997. The crystal structure of 3-amino-5-hydroxybenzoic acid (AHBA) synthase. Biochemistry 38: 9840–9849.

Fariselli, P. and Casadio, R. 2000. Prediction of the number of residue contacts in proteins. Proc. Int. Conf. Intell. Syst. Mol. Biol. 8: 146–151.

Fu, T.F., Boja, E.S., Safo, M.K., and Schirch, V. 2003. Role of proline residues in the folding of serine hydroxymethyltransferase. J. Biol. Chem. 278: 31088–31094.

Grishin, N.V., Phillips, M.A., and Goldsmith, E.J. 1995. Modeling of the spatial structure of eukaryotic ornithine decarboxylases. Protein Sci. 4: 1291–1304.

Gromiha, M.M., Pujadas, G., Magyar, C., Selvaraj, S., and Simon, I. 2004. Locating the stabilizing residues in (α/β)8 barrel proteins based on hydrophobicity, long-range interactions, and sequence conservation. Proteins 55: 316–329.

Gunasekaran, K., Hagler, A.T., and Gierasch, L.M. 2004. Sequence and structural analysis of cellular retinoic acid-binding proteins reveals a network of conserved hydrophobic interactions. Proteins 54: 179–194.

Henikoff, S. and Henikoff, J.G. 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. 89: 10915–10919.

Hennig, M., Grimm, B., Contestabile, R., John, R.A., and Jansonius, J.N. 1997. Crystal structure of glutamate-1-semialdehyde aminomutase: An α2-dimeric vitamin B6-dependent enzyme with asymmetry in structure and active site reactivity. Proc. Natl. Acad. Sci. 94: 4866–4871.

Herold, M., Leistler, B., Hage, A., Luger, K., and Kirschner, K. 1991. Autonomous folding and coenzyme binding of the excised pyridoxal 5'-phosphate binding domain of aspartate aminotransferase from Escherichia coli. Biochemistry 30: 3612–3620.

Hester, G., Stark, W., Moser, M., Kallen, J., Markovic-Housley, Z., and Jansonius, J.N. 1999. Crystal structure of phosphoserine aminotransferase from Escherichia coli at 2.3 Å resolution: Comparison of the unligated enzyme and a complex with α-methyl-L-glutamate. J. Mol. Biol. 286: 829–850.

Hill, E.E., Morea, V., and Chothia, C. 2002. Sequence conservation in families whose members have little or no sequence similarity: The four-helical cytokines and cytochromes. J. Mol. Biol. 322: 205–233.

Hohenester, E., Keller, J.W., and Jansonius, J.N. 1994. An alkali metal ion size-dependent switch in the active site structure of dialkylglycine decarboxylase. Biochemistry 33: 13561–13570.

Holm, L. and Sander, C. 1998. Removing near-neighbour redundancy from large protein sequence collections. Bioinformatics 14: 423–429.

Isupov, M.N., Antson, A.A., Dodson, E.J., Dodson, G.G., Dementieva, I.S., Zakomirdina, L.N., Wilson, K.S., Dauter, Z., Lebedev, A.A., and Harutyunyan, E.H. 1998. Crystal structure of tryptophanase. J. Mol. Biol. 276: 603–623.

Jansonius, J. 1998. Structure, evolution and action of vitamin B6-dependent enzymes. Curr. Opin. Struct. Biol. 8: 759–769.

John, R.A. 1995. Pyridoxal phosphate-dependent enzymes. Biochim. Biophys. Acta 1248: 81–96.

Kaiser, J.T., Clausen, T., Bourenkow, G.P., Bartunik, H.D., Steinbacher, S., and Huber, R. 2000. Crystal structure of a NifS-like protein from Thermotoga maritima: Implications for iron sulphur cluster assembly. J. Mol. Biol. 297: 451–464.

Kielkopf, C.L. and Burley, S.K. 2002. X-ray structures of threonine aldolase complexes: Structural basis of substrate recognition. Biochemistry 41: 11711–11720.

Krupka, H.I., Huber, R., Holt, S.C., and Clausen, T. 2000. Crystal structure of cystalysin from Treponema denticola: A pyridoxal 5'-phosphate-dependent protein acting as a haemolytic enzyme. EMBO J. 19: 3168–3178.

Kuettner, E.B., Hilgenfeld, R., and Weiss, M.S. 2002. The active principle of garlic at atomic resolution. J. Biol. Chem. 277: 46402–46407.

Lesk, A. and Chothia, C. 1980. How different amino acid sequences determine similar protein structures: The structure and evolutionary dynamics of the globins. J. Mol. Biol. 136: 225–270.

Mehta, P.K. and Christen, P. 1998. The molecular evolution of pyridoxal-5'-phosphate-dependent enzymes. In Advances in enzymology and related areas of molecular biology: Mechanism of enzyme action, Part B (ed. D.L. Purich), pp. 129–184. John Wiley & Sons, Inc., New York.

Michnick, S.W. and Shakhnovich, E. 1998. A strategy for detecting the conservation of folding-nucleus residues in protein superfamilies. Fold. Des. 3: 239–251.

Momany, C., Ernst, S., Ghosh, R., Chang, N.L., and Hackert, M.L. 1995. Crystallographic structure of a PLP-dependent ornithine decarboxylase from Lactobacillus 30a to 3.0 Å resolution. J. Mol. Biol. 252: 643–655.

Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536–540.

Nagano, N., Orengo, C.A., and Thornton, J.M. 2002. One fold with many functions: The evolutionary relationships between TIM barrel families based on their sequences, structures and functions. J. Mol. Biol. 321: 741–765.

Nakai, T., Okada, K., Akutsu, S., Miyahara, I., Kawaguchi, S., Kato, R., Kuramitsu, S., and Hirotsu, K. 1999. Structure of Thermus thermophilus HB8 aspartate aminotransferase and its complex with maleate. Biochemistry 38: 2413–2424.

Noland, B.W., Newman, J.M., Hendle, J., Badger, J., Christopher, J.A., Tresser, J., Buchanan, M.D., Wright, T.A., Rutter, M.E., Sanderson, W.E., et al. 2002. Structural studies of Salmonella typhimurium ArnB (PmrH) aminotransferase: A 4-amino-4-deoxy-L-arabinose lipopolysaccharide-modifying enzyme. Structure 10: 1569–1580.

Okamoto, A., Nakai, Y., Hayashi, H., Hirotsu, K., and Kagamiyama, H. 1998. Crystal structures of Paracoccus denitrificans aromatic amino acid aminotransferase: A substrate recognition site constructed by rearrangement of hydrogen bond network. J. Mol. Biol. 280: 443–461.

Orengo, C.A., Pearl, F.M., and Thornton, J.M. 2003. The CATH domain structure database. Meth. Biochem. Anal. 44: 249–271.

Pascarella, S. and Argos, P. 1992. A data bank merging related protein structures and sequences. Protein Eng. 5: 121–137.

Pollastri, G., Baldi, P., Fariselli, P., and Casadio, R. 2001. Improved prediction of the number of residue contacts in proteins by recurrent neural networks. Bioinformatics Suppl. 1: 234–242.

Ptitsyn, O.B. 1998. Protein folding and protein evolution: Common folding nucleus in different subfamilies of c-type cytochromes? J. Mol. Biol. 278: 655–666.

Renwick, S.B., Snell, K., and Baumann, U. 1998. The crystal structure of human cytosolic serine hydroxymethyltransferase: A target for cancer chemotherapy. Structure 6: 1105–1116.

Richardson, J.S. and Richardson, D.C. 1989. Principles and patterns of protein conformation. In Prediction of protein structure and the principles of protein conformation (ed. G.D. Fasman), pp. 1–99. Plenum Press, New York.

Rodionov, M.A. and Blundell, T.L. 1998. Sequence and structure conservation in a protein core. Proteins 33: 358–366.

Rost, B. 1999. Twilight zone of protein sequence alignments. Protein Eng. 12: 85–94.

Russell, R.B. and Barton, G.J. 1994. Structural features can be unconserved in proteins with similar folds. An analysis of side-chain to side-chain contacts, secondary structure and accessibility. J. Mol. Biol. 244: 332–350.

Schneider, G., Käck, H., and Lindqvist, Y. 2000. The manifold of vitamin B6 dependent enzymes. Structure 8: 1–6.

Selvaraj, S. and Gromiha, M.M. 2003. Role of hydrophobic clusters and long-range contact networks in the folding of (α/β)8 barrel proteins. Biophys. J. 84: 1919–1925.

Shindyalov, I.N. and Bourne, P.E. 1998. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 11: 739–747.

Sivaraman, J., Li, Y., Larocque, R., Schrag, J.D., Cygler, M., and Matte, A. 2001. Crystal structure of histidinol phosphate aminotransferase HisC from Escherichia coli, and its covalent complex with pyridoxal-5'-phosphate and L-histidinol phosphate. J. Mol. Biol. 311: 761–776.

Steegborn, C., Messerschmidt, A., Laber, B., Streber, W., Huber, R., and Clausen, T. 1999. The crystal structure of cystathionine γ-synthase from Nicotiana tabacum reveals its substrate and reaction specificity. J. Mol. Biol. 290: 983–996.

Storici, P., Capitani, G., De Biase, D., Moser, M., John, R.A., Jansonius, J.N., and Schirmer, T. 1999. Crystal structure of GABA-aminotransferase, a target for antiepileptic drug therapy. Biochemistry 38: 8628–8634.

Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673–4680.

Todd, A.E., Orengo, C.A., and Thornton, J.M. 2001. Evolution of function in protein superfamilies, from a structural perspective. J. Mol. Biol. 307: 1113–1143.

Vogt, G., Etzold, T., and Argos, P. 1995. An assessment of amino acid exchange matrices in aligning protein sequences: The twilight zone revisited. J. Mol. Biol. 249: 816–831.

Wierenga, R.K. 2001. The TIM-barrel fold: A versatile framework for efficient enzymes. FEBS Lett. 492: 193–198.
