Protein Science (2002), 11:301-312.
Copyright © 2002 The Protein Society
Toward genomic identification of ß-barrel membrane proteins: Composition and architecture of known structures
William C. Wimley
Department of Biochemistry SL43, Tulane University Health Sciences Center, New Orleans, Louisiana 70112-2699
Reprint requests to: William C. Wimley, Department of Biochemistry SL43, Tulane University Health Sciences Center, New Orleans, LA 70112-2699; e-mail: wwimley{at}tulane.edu; fax: (504) 584-2739.
(RECEIVED July 18, 2001;
FINAL REVISION October 29, 2001;
ACCEPTED November 1, 2001)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.29402
Abstract
The amino acid composition and architecture of all ß-barrel membrane proteins of known three-dimensional structure have been examined to generate information that will be useful in identifying ß-barrels in genome databases. The database consists of 15 nonredundant structures, including several novel, recent structures. Known structures include monomeric, dimeric, and trimeric ß-barrels with between 8 and 22 membrane-spanning ß-strands each. For this analysis the membrane-interacting surfaces of the ß-barrels were identified with an experimentally derived, whole-residue hydrophobicity scale, and then the barrels were aligned normal to the bilayer and the position of the bilayer midplane was determined for each protein from the hydrophobicity profile. The abundance of each amino acid, relative to the genomic abundance, was calculated for the barrel exterior and interior. The architecture and diversity of known ß-barrels were also examined. For example, the distribution of rise-per-residue values perpendicular to the bilayer plane was found to be 2.7 ± 0.25 Å per residue, or about 10 ± 1 residues across the membrane. Also, as noted by other authors, nearly every known membrane-spanning ß-barrel strand was found to have a short loop of seven residues or fewer connecting it to at least one adjacent strand. Using this information we have begun to generate rapid screening algorithms for the identification of ß-barrel membrane proteins in genomic databases. Application of one algorithm to the genomes of Escherichia coli and Pseudomonas aeruginosa confirms its ability to identify ß-barrels, and reveals dozens of unidentified open reading frames that potentially code for ß-barrel outer membrane proteins.
Keywords: Proteomic; genomic; ß-barrel; membrane protein; outer membrane; dyad repeat
Introduction
The ß-barrel is one of two known structural motifs for membrane-spanning proteins. As many as several hundred ß-barrel species can be found in the outer membrane of Gram-negative bacteria (Schulz 2000; Alm et al. 2000; Molloy et al. 2000), and they also occur in the outer membranes of mitochondria (Benz 1994) and chloroplasts (Fischer et al. 1994). In addition to these native proteins, the ß-barrel motif is also used by a large, diverse set of secreted, membrane-permeabilizing protein toxins and antibiotics that assemble into ß-barrels on exogenous membranes (Saier 2000). In a recent review, Schulz (2000) summarized the main structural features shared by all known ß-barrel membrane proteins in a list of 10 explicit rules. In summary, known ß-barrels are composed of an even number of membrane-spanning ß-strands with an antiparallel ß-meander topology. Neighboring strands in the barrel are connected by alternating long and short loops. The lipid-interacting outer surfaces of all ß-barrels are hydrophobic and have a band of aromatics near the bilayer interfaces, while the internal residues have an intermediate polarity. Known structures contain between 8 and 22 strands and include monomeric, dimeric, and trimeric ß-barrels. Many of these features are apparent in the structure of the dimeric ß-barrel phospholipase, OmpLA, which is shown in Figure 1.

Fig. 1. Molecular graphics image of a ß-barrel outer membrane protein, the dimeric phospholipase OmpLA (Snijder et al. 1999). In this image we show the interfacial aromatic residues tryptophan and tyrosine in green and external charged residues in blue. These residues were used to orient the dimer in the bilayer plane (see text). The grid superimposed over the structure shows the protein in the bilayer-coordinate system that it was transformed to by the procedures described in the text.
One might assume that knowing these explicit rules would make the prediction of ß-barrel structure and topology and the identification of ß-barrels in genome databases readily solvable problems. In fact, several different types of structure prediction algorithms have been applied with mixed success (Schirmer and Cowan 1993; Fischbarg et al. 1995; von Heijne 1996), and recent structure prediction algorithms based on neural networks have been able to make reasonably accurate predictions of ß-barrel structure and topology (Gromiha et al. 1997; Jacoboni et al. 2001). But these predictions were made for proteins already known to be ß-barrel membrane proteins by other means. A more difficult part of the problem, and one that has not yet been solved, is the accurate identification of ß-barrel membrane proteins in genome databases from physical principles. Currently, ß-barrels are identified in genome annotations mainly by their homology to known ß-barrels. Each Gram-negative bacterial genome has hundreds of "putative" and "probable" outer membrane proteins identified in this way. It would also be useful to be able to identify them through their fundamental physical properties so that novel classes of ß-barrels can be identified, and so that the homology-based annotation can be verified. Because each bacterial genome has as many as 1000 hypothetical or unknown proteins that have not been classified at all, there are undoubtedly many ß-barrel membrane proteins that have not yet been identified.
We are broadly interested in understanding ß-barrel membrane proteins through a knowledge of their composition and physical properties and through parallel studies of how model ß-sheets assemble in membranes (Bishop et al. 2001). In theory, a thorough understanding of the fundamental physical principles should contain sufficient information to allow researchers to determine if an unknown protein sequence is a ß-barrel membrane protein. For α-helical bundle membrane proteins this idea is a proven one; prediction algorithms based on the physical principle that membrane-spanning helices will have a contiguous stretch of 19 or more hydrophobic residues have very high accuracy (Rost et al. 1995; Casadio et al. 1996; Krogh et al. 2001), exceeding 99% in recent applications (S. Jayasinghe, K. Hristova, and S.H. White, 2001). However, ß-barrel membrane proteins have been more difficult to identify from physical principles for several reasons. First, their hydrophobic, membrane-interacting residues are cryptic, hidden in the alternating inside-outside (dyad repeat) motif. Second, compared to helical membrane proteins, there are many fewer membrane-interacting residues on each strand, and this reduces the uniqueness of the membrane-spanning sequences. And third, some ß-sheets in soluble proteins have, superficially, many of the same physical properties, such as strand length and amphipathicity, as the ß-sheets of ß-barrel membrane proteins. In this work we set out to analyze the composition and architecture of all ß-barrel membrane proteins of known structure, including many new structures, and to generate a body of data that will be a useful starting point in the rapid identification of ß-barrel membrane proteins in genome databases.
Results
The ß-barrel database
All of the initial ß-barrel structures published in the early 1990s belong to the closely related class of trimeric porins of 16 or 18 membrane-spanning ß-strands. The architecture of this class of porins has been discussed in the literature (Seshadri et al. 1998). In the last few years, the total number of known ß-barrel membrane proteins has nearly doubled, and the architectural diversity of known structures has increased significantly with the addition of new ß-barrel membrane proteins having different functions, topology, and architecture. For example, three-dimensional structures are now known for the monomeric, TonB-dependent transport proteins FepA (Buchanan et al. 1999) and FhuA (Locher et al. 1998), which have 22 ß-strands each, and for the trimeric, single-barrel transporter TolC (Koronakis et al. 2000), in which each monomer contributes four ß-strands to a 12-stranded barrel. New additions also include the first known dimeric ß-barrel, OmpLA (Snijder et al. 1999), shown in Figure 1, and the adhesion protein OmpX (Vogt and Schulz 1999), a monomeric eight-stranded ß-barrel.
For this work we identified all ß-barrel membrane proteins in the Protein Data Bank (Berman et al. 2000) and used a BLAST (Altschul et al. 1990) sequence alignment to screen each sequence against all other sequences in the PDB. For closely homologous or identical sequences (i.e., those with more than 70% conserved residues) we eliminated all but one member. The ß-barrel database that we used in the calculations is described in detail in Table 1. It has 15 diverse members comprising a total of 210 membrane-spanning ß-strands with more than 2000 amino acids in the membrane-spanning segments.
Identification of membrane-spanning segments
Three features, which are present in all ß-barrel structures, were used to align the XY plane of each protein's Cartesian coordinates with the putative plane of the bilayer: the band of aromatics that lies in the bilayer interfacial region (Schiffer et al. 1992; von Heijne 1994; Yau et al. 1998), the band of charged residues just outside of the aromatics, and the band of aliphatic residues that interact with the hydrocarbon core of the bilayer (see Fig. 1 for an example). Structure coordinates were transformed as described in Materials and Methods so that the three bands of residues around each ß-barrel (aromatic, aliphatic, and charged) were aligned with the XY plane of the new coordinate system.
After aligning the structures along the bilayer normal, we identified all ß-strands in each structure using the annotation in the PDB datafile, and we identified the ß-strands that span the membrane by inspection of molecular graphics images. One additional residue beyond the designated membrane-spanning ß-sheet was also included in each strand segment. Residues in a membrane-spanning strand were designated as either exposed, internal, or involved in protein-protein interfaces. Exposed residues were those whose Cα to Cß vector extended away from the axis of the barrel and whose side chain was more than 50% "solvent" exposed on the barrel surface. Internal residues were those whose Cα to Cß vector pointed towards the interior of the barrel. The geometry of ß-sheet secondary structure places side chains on alternating inner and outer surfaces of the ß-sheet, so this distinction is unambiguous. We classified the numerous glycine residues in the ß-barrel database by the orientation of their Cα-H vectors and the exposure of the α carbon. We did not differentiate between internal residues that were exposed to water within an aqueous pore and those that were buried in the protein. Residues in protein-protein contacts were those residues whose Cα to Cß vector was oriented out from the barrel axis, but whose side chain was not exposed in the multimer structure because of protein-protein contacts. Because we are trying to characterize and exploit the unique physical properties of the membrane-interacting surfaces of these proteins, we have excluded the residues in protein-protein contacts from the database. The properties and composition of these residues, which are similar to protein-protein interfaces in soluble proteins, have been discussed (Seshadri et al. 1998).
Identification of the bilayer midplane with hydrophobicity profiles
Hydrophobicity profiles for the external and internal residues for all XY-aligned structures were calculated by summing the hydrophobicity of all ß-strand residues within a 5-Å sliding window that was moved along the axis of the bilayer normal. Examples of hydrophobicity profiles for external residues are shown in Figure 2A and B. For this analysis we used an experimentally derived hydrophobicity scale measured for peptides partitioning into bulk octanol (Wimley et al. 1996). This scale is "absolute" in the sense that it is a whole-residue hydrophobicity scale that includes contributions from both the side chains and the polypeptide backbone. Thus, negative ΔG values indicate a net preference of the polypeptide in the window for an octanol phase relative to water. For all the ß-barrel structures examined, the hydrophobicity profile of the external surfaces was very similar to the examples shown in Figure 2A and B, with a band of negative ΔG about 27 Å wide (mean ± SD: 26.5 ± 0.7 Å) flanked by regions of large positive ΔG. The 27-Å band corresponds to the width of the bacterial outer membrane. The crossover points signify the edges of the hydrophobic membrane phase.
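The profile calculation and the crossover-point midplane estimate can be sketched as follows. This is a minimal illustration, not the software used in this work: it assumes each residue has already been reduced to a (z, ΔG) pair in the XY-aligned frame, and the function names, the 1-Å step, the linear interpolation of crossovers, and the use of the outermost two crossovers are all assumptions made for the example.

```python
def hydrophobicity_profile(residues, window=5.0, z_min=-30.0, z_max=30.0, step=1.0):
    """Sum residue free energies inside a sliding window along the bilayer normal.

    residues: list of (z, dG) pairs, z in angstroms, dG from a whole-residue scale.
    Returns a list of (z_center, summed_dG) pairs.
    """
    n_steps = int(round((z_max - z_min) / step))
    profile = []
    for k in range(n_steps + 1):
        z_center = z_min + k * step
        total = sum(dg for z, dg in residues if abs(z - z_center) <= window / 2.0)
        profile.append((z_center, total))
    return profile


def crossover_points(profile):
    """Return z positions where the summed dG changes sign (linear interpolation)."""
    crossings = []
    for (z0, g0), (z1, g1) in zip(profile, profile[1:]):
        if g0 * g1 < 0:
            crossings.append(z0 + (z1 - z0) * g0 / (g0 - g1))
    return crossings


def bilayer_midplane(profile):
    """Midpoint between the outermost crossovers (assumes one negative band)."""
    crossings = crossover_points(profile)
    if len(crossings) < 2:
        raise ValueError("need at least two crossover points")
    return (crossings[0] + crossings[-1]) / 2.0


# Toy input with made-up free energies: a hydrophobic band centered at z = 3.
toy = [(float(z), -1.0 if abs(z - 3) <= 12 else 2.0) for z in range(-25, 32)]
toy_profile = hydrophobicity_profile(toy)
toy_midplane = bilayer_midplane(toy_profile)
```

Subtracting the estimated midplane from every z coordinate would then place the structure on the z = 0 bilayer coordinate system.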
The midpoint of the negative ΔG band, as delineated by the crossover points, was taken to be the midpoint of the bilayer. We transformed the coordinates of the ß-barrel structures so that the bilayer midplane for all structures was set to z = 0. This places all of the proteins in the database on a universal "bilayer" coordinate system. The transbilayer profiles for all of the ß-barrel proteins in the database (e.g., Fig. 2A,B) were remarkably similar. Composite profiles calculated from the sum of all the ß-barrels are shown in Figure 3A and B. There are several universal features of the hydrophobicity profiles that may be important for genomic identification of ß-barrel membrane proteins: the 27-Å negative ΔG band, the pronounced peaks in the distribution of external aromatic residues at ±10 Å, and the peaks in the abundance of external charged residues at ±15 Å. In Figure 3B we also show the hydrophobicity profile of the internal ß-barrel residues, which have a featureless, broadly hydrophilic character across the membrane.

Fig. 3. Composite transbilayer profiles for all ß-barrel membrane proteins of known structure. (A) Fractional abundance of external aromatic and ionized residues summed over a 5-Å sliding window. The abundance is divided by the total number of external residues within the window. (B) Composite hydrophobicity of internal and exposed amino acids in the ß-barrel membrane proteins of known structure (Table 1). The hydrophobicity scale is an absolute scale based on octanol partitioning of model peptides (Wimley et al. 1996), and was calculated using a 5-Å sliding window. Negative numbers on the X-axis signify residues closer to the periplasmic space, and negative numbers on the Y-axis of (B) signify greater hydrophobicity. The hydrophobic thickness of the membrane, 27 Å, is centered on X = 0 Å, and is shown as a gray box. Note that the hydrophobicity scale is an absolute scale that has not been normalized. The fact that the natural zero level of the octanol scale corresponds exactly to the actual membrane-spanning segments has been noted elsewhere for helical bundle membrane proteins (S. Jayasinghe, K. Hristova, and S.H. White 2001).
Composition of ß-barrels
The ß-barrel database contains 1592 amino acids in membrane-spanning ß-barrels that are either exposed or internal and about 400 additional residues that are found at protein-protein interfaces. Raw abundance (Fig. 4) was determined for residues within the 27 Å width of the bilayer, or ±13.5 Å from the bilayer midplane, and also for interfacial and hydrocarbon core regions of the bilayer separately. The bilayer thickness was subdivided, following structural models of bilayers (Wiener and White 1992), into a hydrocarbon core region ±6.5 Å from the midplane and an interfacial region between 6.5 and 13.5 Å from the midplane. Interior residues had similar abundances in both regions of the bilayer, as shown in Figure 4B and listed in Table 2. However, some external residues had very distinct abundance differences between the hydrocarbon core and the interface. For example, tyrosine is about twofold more abundant in the interface than the core, and tryptophan is about sixfold more abundant in the interface, while leucine and alanine are about half as abundant in the interface as in the hydrocarbon core. Abundance data are given in Table 2, and are available as electronic supplementary material.
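The regional bookkeeping described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the handling of residues exactly on a boundary (inclusive on the inner side) is an assumption, and the input is assumed to be (amino acid, z) pairs already expressed in the bilayer coordinate system.

```python
from collections import Counter, defaultdict


def bilayer_region(z, core_half=6.5, membrane_half=13.5):
    """Classify a z coordinate (angstroms from the midplane) into a bilayer region."""
    distance = abs(z)
    if distance <= core_half:
        return "core"
    if distance <= membrane_half:
        return "interface"
    return "outside"


def regional_abundance(residues):
    """Raw abundance per region: count of each amino acid / all residues in that region.

    residues: iterable of (one_letter_code, z) pairs for one structural subclass
    (for example, external residues only).
    """
    counts = defaultdict(Counter)
    for amino_acid, z in residues:
        counts[bilayer_region(z)][amino_acid] += 1
    return {
        region: {aa: n / sum(c.values()) for aa, n in c.items()}
        for region, c in counts.items()
    }


# Made-up example residues, not data from Table 2.
example = [("W", 10.0), ("W", -9.0), ("Y", 12.0), ("L", 0.0), ("L", 2.0), ("L", 3.0)]
example_abundance = regional_abundance(example)
```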

Fig. 4. Raw amino acid abundance for the external and internal amino acids in the database of all known ß-barrel membrane proteins. (A) External residues. (B) Internal residues. Raw abundance values are the total number of each amino acid divided by the total number of amino acids in that structural subclass. In addition to the abundance across the whole bilayer, we also show the abundance for each of two bilayer regimes, the hydrocarbon core ±6.5 Å from the bilayer midplane and the bilayer interface between 6.5 and 13.5 Å from the midplane. Abundance values are ranked, left to right, by the value for the whole bilayer.
The information content of an amino acid abundance measurement such as those shown in Figure 4A and B does not reside in the raw abundance values but instead in the deviation of the observed abundance from the expected genomic abundance. We, therefore, calculated the expected abundance of each amino acid in the database, $f_x$, using a weighted average of genomic abundances, $f_x^i$, using
$$f_x = \sum_i w_i f_x^i$$
where the relative weight, $w_i$, is for each organism, $i$. Weights were calculated by
$$w_i = \frac{n_i}{n_{total}}$$
where $n_i$ is the number of amino acids in the database that are from each organism, $i$, and $n_{total}$ is the total number of amino acids in the database. Relative ß-barrel abundance values (Table 2) were calculated by dividing raw abundance by the weighted expectation values, $f_x$. Relative abundances are plotted in Figure 5A and B and are listed in Table 2. The dotted line in the relative abundance plots (Fig. 5A,B) shows the value of 1 expected from the genomic abundance. Deviations from 1 are a measure of the information content of each amino acid (Seshadri et al. 1998). Note that the most abundant external ß-barrel residues, leucine and valine (Fig. 4A), have a smaller information content in the relative scale (Fig. 5A) because of their high natural abundance, while the aromatics have a high information content.
Architecture of ß-barrels
The goal of this work is to obtain information from known ß-barrels that will be useful in characterizing unknown sequences in genome databases. Thus, we also need to explore the architecture and architectural diversity of known structures. The most relevant architectural variable is the rise per residue of the ß-strands along the direction normal to the bilayer plane. Simulations have shown that the shear number and tilt angle of ß-barrels can vary within certain bounds (Murzin et al. 1994; Sansom and Kerr 1995), as reflected in the known structures. Although the maximum possible rise per residue is about 3.6 Å for a ß-strand perpendicular to the bilayer, known structures (Schulz 2000) and theory (Sansom and Kerr 1995) suggest that tilted strands are energetically preferred. We determined the distribution of ß-barrel rise per residue values at the bilayer midplane by calculating the value, over the three residues closest to the midplane, for each membrane-spanning strand. The results, shown in Figure 6, demonstrate the narrow range of variation in known structures. The rise per residue in the database is 2.7 ± 0.25 Å per residue, or about 10 ± 1 residues across the membrane.
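As a quick check of that conversion, assuming the 27-Å hydrophobic thickness found from the hydrophobicity profiles and treating the ±0.25 Å spread simply as a range of rise values, the number of residues $N$ needed to span the membrane is

$$N = \frac{27\ \text{Å}}{2.7\ \text{Å/residue}} = 10, \qquad \frac{27}{2.95} \approx 9.2, \qquad \frac{27}{2.45} \approx 11.0$$

which is consistent with the quoted 10 ± 1 residues.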

Fig. 6. Histogram of the rise per residue in ß-barrel membrane proteins of known structure. For each lipid-exposed ß-strand in our database we calculated the rise per residue from the three residues closest to the bilayer midplane. The scale at the top shows a conversion to the number of residues required to span the 27-Å thickness of the membrane.
We also calculated the distribution of loop length in the ß-barrels in the database. These data are shown in Figure 7. In this work, loops are defined as segments between membrane-spanning ß-strands that are outside the thickness of the membrane, in other words, more than 13.5 Å from the bilayer midplane. Note that about half of the loops are shorter than six residues, indicating that most membrane-spanning ß-strands are connected to at least one other strand by a short loop. This suggests that the ß-hairpin is the basic structural building block of ß-barrel membrane proteins. As is apparent in the example shown in Figure 1 and in Figure 2A and B, the short and long loops of ß-barrel membrane proteins are generally segregated onto opposite sides of the membrane.
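Under the loop definition given above, loop lengths can be counted directly from per-residue z coordinates. The sketch below is a simplification made for illustration: it treats every residue within 13.5 Å of the midplane as part of a membrane-spanning strand (in this work the strands were identified from PDB annotation and by inspection), and it ignores the N- and C-terminal tails. The function name is hypothetical.

```python
def loop_lengths(z_coords, half_thickness=13.5):
    """Count residues in each run that lies outside the membrane between two inside runs.

    z_coords: per-residue z positions (angstroms from the bilayer midplane),
    in sequence order. Terminal tails are not counted as loops.
    """
    lengths = []
    run = 0
    seen_inside = False
    for z in z_coords:
        if abs(z) > half_thickness:
            if seen_inside:
                run += 1  # outside the membrane, after at least one strand
        else:
            if run > 0:
                lengths.append(run)  # an inside residue closes the current loop
            run = 0
            seen_inside = True
    return lengths


# Invented coordinates: a 2-residue turn on one side, a 4-residue loop on the other.
example_z = [20, 18, 10, 0, -10, -15, -16, -12, 0, 12, 14, 15, 16, 17, 12, 0, -14]
example_loops = loop_lengths(example_z)
```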

Fig. 7. Histogram of interstrand loop lengths in the known ß-barrel membrane proteins. In this measurement, a loop is a count of all the residues between two ß-strands that are outside of the bilayer, more than 13.5 Å from the bilayer midplane. The distribution is bimodal, with about 45% of the loops shorter than eight residues and 55% of the loops longer.
Discussion
Uniqueness of membrane ß-barrel dyad repeats
Membrane-spanning ß-strands, like all ß-sheets, have a dyad repeat topology in which alternating residues are oriented toward alternating faces of the sheet. In ß-barrel membrane proteins about half of the membrane-spanning residues are hydrophobic residues that are oriented toward the membrane lipids, while the other half are more hydrophilic residues that are oriented towards the interior of the barrel. Several ß-barrel identification algorithms have been developed, in part, on the idea that membrane ß-barrels could be recognizable through the dyad repeat of hydrophobic (external) and hydrophilic (internal) residues (e.g., Fischbarg et al. 1995). However, difficulties arise when genome databases are screened for ß-barrel membrane proteins using this simple idea, because the interiors of membrane-spanning ß-barrels are not necessarily very hydrophilic, and because many soluble ß-sheets also have a similar dyad repeat motif in which one hydrophobic face of a sheet is buried and one hydrophilic face is more exposed to the aqueous phase. Our goal in this work was to use the known ß-barrels to generate a data set, based on the observed abundance of the amino acids and the architecture of ß-barrel membrane proteins, that will further help to differentiate ß-barrel membrane proteins from the abundant amphipathic ß-sheets of soluble proteins.
From the strand length distribution shown in Figure 6 we concluded that a search for a membrane-spanning segment of 10 residues will be able to identify most transmembrane ß-strands. We performed a 10-residue sliding window analysis for each protein examined. For each 10-residue sliding window in a protein's amino acid sequence we calculated a "ß-strand score" based on the two abundance data sets (interior and exposed) determined for ß-barrel membrane proteins (shown in Fig. 5A,B, and listed in Table 2) using
$$\beta\text{-strand score} = \sum_{i=1,3,5,7,9} X_i^{in} + \sum_{i=2,4,6,8,10} X_i^{out}$$
or
$$\beta\text{-strand score} = \sum_{i=1,3,5,7,9} X_i^{out} + \sum_{i=2,4,6,8,10} X_i^{in}$$
whichever is highest, where $X_i^{in}$ and $X_i^{out}$ are ln (relative abundance) values for interior (in) and exterior (out) residues (Table 2) for the ith amino acid in the sliding window. A comparison between the ß-strand scores for the membrane-spanning ß-strands of ß-barrel membrane proteins and the whole E. coli genome (Perna et al. 2001) is shown in Figure 8. The peak for the ß-barrel strands is at approximately 2.5σ from the center of the genome distribution. This is a good starting point for the distinction of membrane-spanning ß-strands in genome databases. We also made the same calculations using a simple dyad repeat of alternating octanol hydrophobicity (Wimley et al. 1996). The results of this comparison, shown in Figure 9, show that the distinction between membrane-spanning ß-strands and the genomic distribution is significantly poorer than for the scores generated with the abundance data of Table 2.
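A minimal sketch of the ß-strand score follows. It assumes that the two alternatives named in the text are the two possible in/out registers of the dyad repeat (odd positions inside and even positions outside, or the reverse), and it uses small made-up ln(relative abundance) tables in place of Table 2. The function and variable names are hypothetical.

```python
def strand_score(window, ln_in, ln_out):
    """Score one window in both dyad registers and keep the higher value.

    window: amino acid string (10 residues in the text).
    ln_in / ln_out: {amino_acid: ln(relative abundance)} for interior / exterior.
    """
    first, second = window[0::2], window[1::2]  # positions 1,3,5,... and 2,4,6,...
    register_a = sum(ln_in[r] for r in first) + sum(ln_out[r] for r in second)
    register_b = sum(ln_out[r] for r in first) + sum(ln_in[r] for r in second)
    return max(register_a, register_b)


def strand_profile(sequence, ln_in, ln_out, width=10):
    """ß-strand score for every window start position along the sequence."""
    return [
        strand_score(sequence[i:i + width], ln_in, ln_out)
        for i in range(len(sequence) - width + 1)
    ]


# Invented values for two residue types only; real values would come from Table 2.
toy_ln_in = {"L": -0.3, "S": 0.4}
toy_ln_out = {"L": 0.5, "S": -0.5}
toy_scores = strand_profile("LSLSLSLSLSL", toy_ln_in, toy_ln_out)
```

Taking the maximum over the two registers means a window scores well regardless of whether its first residue faces the lipid or the barrel interior.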
ß-barrel profiles
An example of a 10-residue sliding window score profile using the abundance data in Table 2 is shown in Figure 10A. The sequence examined is the membrane-spanning domain of the 22-stranded monomeric ß-barrel FhuA from E. coli. The actual membrane-spanning ß-strands are shown as solid black bars. For reference, the figure has a gray area between 2 and 6 that covers the range in which most membrane-spanning ß-strands are found (see Fig. 8). Note that the algorithm is successful at identifying most membrane-spanning ß-strands, although there are also some false positive peaks. A similar overprediction is encountered for the prediction of transmembrane helices in many hydropathy analyses (Zen et al. 1995; Casadio et al. 1996; Krogh et al. 2001). The results of this analysis were the same if we treated FhuA as an unknown protein and left it out of the abundance calculation.

Fig. 10. Examples of sliding window scores for the membrane-spanning segment of FhuA, a monomeric 22-stranded ß-barrel (Table 1). The actual membrane-spanning strands are shown by the horizontal bars. (A) ß-Strand score calculated as described in the text. A membrane-spanning ß-strand will have a sharp peak. The gray box represents the area in which most known membrane-spanning ß-strands fall. Note that every ß-strand in this protein has a corresponding peak in this regime. (B) ß-Hairpin score is the sum, in a 25-residue sliding window, of the highest peak in residues 1-10 and the highest peak in residues 15-25. Arrows denote the location of the short turns between known ß-strands. Note that most of the ß-hairpins in the protein are correctly identified.
To improve the ability to rapidly recognize ß-barrels in genome databases and to simplify the sliding window average, we also incorporated the architectural data (Figs. 6, 7) into a secondary sliding window calculation that gives a "ß-hairpin" score from the ß-strand score. The ß-hairpin score, as shown in Figure 10B, is the sum, in a 25-residue sliding window, of the highest ß-strand score in residues 1-10 and the highest ß-strand score in residues 15-25. The ß-hairpin score is thus highest when there are two ß-strand peaks separated by a short loop. A prototypical ß-hairpin with two 10-residue ß-strands separated by a five-residue loop (see Figs. 6, 7) will give a high, flat peak in this ß-hairpin analysis. Note in Figure 10B that most of the ß-hairpins of FhuA are correctly identified in this analysis.
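The ß-hairpin calculation can be sketched as a second pass over a list of per-position ß-strand scores. How strand scores are assigned to residue positions (window start or window center) is not stated in the text, so the sketch simply assumes one strand score per position; the function name and the toy input are likewise assumptions.

```python
def hairpin_profile(strand_scores, width=25):
    """ß-hairpin score for each 25-position window over per-position strand scores.

    Within each window, add the highest strand score in positions 1-10
    to the highest strand score in positions 15-25 (1-based, inclusive).
    """
    profile = []
    for start in range(len(strand_scores) - width + 1):
        window = strand_scores[start:start + width]
        first_strand = max(window[0:10])    # positions 1-10
        second_strand = max(window[14:25])  # positions 15-25
        profile.append(first_strand + second_strand)
    return profile


# Toy strand-score profile: two peaks 15 positions apart, zero elsewhere.
toy_strand_scores = [0.0] * 30
toy_strand_scores[3] = 4.0
toy_strand_scores[18] = 5.0
toy_hairpins = hairpin_profile(toy_strand_scores)
```

Positions 11-14 of each window are deliberately ignored, which is what lets two strand peaks separated by a short turn add up while a single isolated peak does not.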
Screening of genomic data
These analyses are being conducted so that we can begin to develop methods for rapidly identifying potential ß-barrels in genome databases. Potential ß-barrels can then be further analyzed with neural network-based structure prediction algorithms (Gromiha et al. 1997; Jacoboni et al. 2001) and with molecular biology and proteomics tools (Molloy et al. 2000). A rapid genomic screening algorithm requires a simple parameterization or scoring of each protein sequence. One feature we expect to find in all ß-barrel membrane proteins is a set of roughly 5 to 15 peaks in the ß-hairpin analysis like that in Figure 10B. The number of ß-strands or ß-hairpins is expected to scale approximately with protein size; thus, in our preliminary genomic analyses we calculated a single ß-barrel score for each protein by summing the high peaks as follows:
$$\beta\text{-barrel score} = \frac{\sum (\beta\text{-hairpin scores} > 6)}{\text{number of amino acids}}$$
and we obtained the distribution shown in Figure 11 for the E. coli genome. We chose a cutoff value of 6 because it correctly identifies 90% of the ß-hairpins in our structure database, without also including many false peaks (see Fig. 10B). Using this algorithm, we calculated scores for three sets of known ß-barrel membrane proteins: known crystal structures used in this work (Table 1), trimeric porins, and TonB-dependent outer membrane receptors. The median genomic score is 0.4, whereas all members of these three sets of ß-barrel membrane proteins are found beyond the 85th percentile at 1.0 and many score higher than the 97th percentile score at 2.0. The eight-stranded ß-barrel OmpX (Table 1), at 5.5, is the highest scoring protein in the entire E. coli genome.
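One plausible reading of this per-protein score is sketched below: add up the ß-hairpin scores that exceed the cutoff of 6 and divide by the protein length, so that the score does not simply grow with protein size. Both the normalization by length and the choice to sum every above-cutoff window position (rather than one value per peak) are assumptions of this sketch, as is the function name.

```python
def barrel_score(hairpin_scores, sequence_length, cutoff=6.0):
    """Sum of above-cutoff ß-hairpin scores, normalized by protein length.

    hairpin_scores: per-window ß-hairpin scores for one protein.
    sequence_length: number of amino acids in the protein.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    high = sum(score for score in hairpin_scores if score > cutoff)
    return high / sequence_length


# Toy input: four windows above the cutoff in a hypothetical 30-residue sequence.
toy_barrel = barrel_score([9.0, 9.0, 9.0, 9.0, 5.0, 0.0], 30)
```

Ranking every open reading frame in a genome by such a score, and then inspecting the top of the list, corresponds to the screening procedure described in the following paragraphs.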

Fig. 11. Distribution of ß-barrel scores for all proteins in the E. coli genome and in sets of known ß-barrel membrane proteins. The known proteins are from three groups: known structures from the protein data bank (Table 1 ), trimeric porins, and TonB-dependent outer membrane receptors. Note that the known outer membrane proteins have scores that fall well beyond the mean of the E. coli distribution, 0.4.
Using this simple and rapid scoring algorithm we have begun to analyze the whole genomes of Gram-negative bacteria. Here we discuss preliminary results from the genomes of Escherichia coli and Pseudomonas aeruginosa as examples. After scoring and ranking all the open reading frames in these two genomes, we examined the 125 highest scoring proteins for each genome. These proteins, which represent about 2.5% of all open reading frames, fall between 1.7 and 5.5 in ß-barrel score (Fig. 11). They have been categorized in Table 3. We find four main classes of proteins in this high-scoring group. Known outer membrane proteins and putative or probable outer membrane proteins, identified by sequence homology, comprise approximately half of the genes in the highest scoring group. This observation strongly supports the idea that this algorithm can accurately detect ß-barrel membrane proteins. Unidentified open reading frames or hypothetical proteins also comprise about half of these highest scoring proteins. It seems very likely that some of these sequences encode functional ß-barrel membrane proteins. Interestingly, we also find a significant number of fimbrial (piliar) proteins, fimbrial usher proteins, adhesin-like proteins, and exoproteins in this highest scoring group. These are all proteins that reside in, or pass through, the outer membrane. Proteins or hypothetical proteins belonging to other classes, such as probable soluble enzymes, comprise only a very small fraction of the high-scoring genes. The complete genomic lists of ß-barrel scores are provided as Electronic Supplementary Material to this manuscript.
Conclusions
We have analyzed the amino acid composition and architecture of all ß-barrel membrane proteins of known structure. These data have been used to develop a simple algorithm for rapidly screening genomes for potential ß-barrel membrane proteins. Application of this algorithm to the genomes of the Gram-negative bacteria Escherichia coli and Pseudomonas aeruginosa has revealed dozens of potential ß-barrel membrane proteins that have not previously been identified or annotated as such. Future experiments will be directed toward refinement of the screening algorithm and toward application of proteomics methods to determine if the potential ß-barrels that we have identified can be expressed as ß-barrel membrane proteins in bacterial outer membranes.
Materials and methods
Transformation of PDB coordinates to the bilayer plane
Each protein's XYZ PDB coordinates were transformed to align the "bilayer plane" of the protein with the XY plane of the coordinate system. First, the PDB coordinate file was converted to a kinemage file using PreKin (Richardson and Richardson 1994). With the program Mage (Richardson and Richardson 1994) we viewed the kinemage and used the position of the external aromatics, aliphatics, and charged residues to align each protein with the XY plane. The transformation matrix was obtained from Mage and used in a modified version of the program KinPlot (Wimley et al. 1994) to transform the coordinates and rewrite them in PDB format. The output of this procedure is a PDB format file in which the plane of the bilayer is coincident with the XY plane of the atomic coordinate system. Alignment of the proteins along the z-axis is described in the text. All the software used in this work that is not publicly obtainable is available from the author upon request.
Hydrophobicity profiles
Hydrophobicity profiles were calculated over a 5-Å sliding average window, which was moved across the protein in the bilayer coordinate system along a line normal to the bilayer. The "location" of each residue was taken to be the XYZ coordinates of the ß-carbon, or the α-carbon for glycine. We examined the differences that would occur in the locations of long polar side chains, such as lysine, if we instead used the position of the polar side-chain moiety, but we found only small net differences from the position of the ß-carbon (1 Å or less). The octanol hydrophobicity scale, which has been discussed in detail elsewhere (Wimley et al. 1996; White and Wimley 1998, 1999), is based on the partitioning of peptides of the form AcWL-X-LL into bulk octanol. The scale is less permissive of polar residues, and appears to be a good scale for mimicking the environment of membrane proteins.
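The sliding-window calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in this work: it assumes the profile value at each position is the mean scale value of all residues whose reference atom lies within half a window width of that position along z, and it takes the hydrophobicity scale as a caller-supplied dictionary. The function names and the `step` parameter are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def residue_location_atom(residue_name: str) -> str:
    """Atom whose coordinates stand for the residue's location:
    the beta-carbon, or the alpha-carbon for glycine."""
    return "CA" if residue_name == "GLY" else "CB"


def hydrophobicity_profile(
    residues: List[Tuple[str, float]],
    scale: Dict[str, float],
    window: float = 5.0,
    step: float = 1.0,
) -> List[Tuple[float, Optional[float]]]:
    """Mean hydrophobicity in a window slid along the bilayer normal.

    residues: (residue name, z coordinate of its location atom) pairs,
              already expressed in the bilayer coordinate system.
    scale:    per-residue hydrophobicity values supplied by the caller.
    Returns (window center, mean value) pairs; the mean is None where
    the window contains no residues.
    """
    if not residues:
        return []
    zs = [z for _, z in residues]
    z_min, z_max = min(zs), max(zs)
    half = window / 2.0
    n_steps = int((z_max - z_min) / step) + 1
    profile = []
    for i in range(n_steps):
        center = z_min + i * step
        values = [scale[name] for name, z in residues
                  if abs(z - center) <= half]
        mean = sum(values) / len(values) if values else None
        profile.append((center, mean))
    return profile
```

The default `window=5.0` matches the 5-Å window described in the text; the step size between window centers is not specified there and is a free choice in this sketch.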
 |
Electronic supplemental material
|
|---|
Electronic supplemental material consists of tabulated amino acid abundance data (Table 2) and tables of sorted ß-barrel scores for the complete genomes of the two Gram-negative bacteria discussed in the text: Escherichia coli and Pseudomonas aeruginosa. After the file header, the genomic data are given in five columns: ß-barrel score (sorted), protein length, number of peaks in the ß-hairpin score greater than 4.0 (Fig. 10), description of the protein in the genome annotation, and the protein's code. File name conventions are as follows: Ecoli.doc: Escherichia coli; Paeruginosa.doc: Pseudomonas aeruginosa.
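For readers who wish to work with the five-column tables programmatically, a minimal Python sketch is given here. It assumes the tables have first been exported from the .doc files to plain text with tab-separated columns, which is an assumption about the reader's workflow and not a property of the supplied files. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BarrelRecord:
    score: float          # beta-barrel score
    length: int           # protein length in residues
    hairpin_peaks: int    # beta-hairpin score peaks greater than 4.0
    description: str      # description from the genome annotation
    code: str             # the protein's code


def parse_records(lines: Iterable[str], delimiter: str = "\t") -> List[BarrelRecord]:
    """Parse five-column rows, skipping header or malformed lines."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != 5:
            continue  # file header or unexpected layout
        try:
            record = BarrelRecord(float(fields[0]), int(fields[1]),
                                  int(fields[2]), fields[3].strip(),
                                  fields[4].strip())
        except ValueError:
            continue  # non-numeric score, length, or peak count
        records.append(record)
    return records


def top_scoring(records: List[BarrelRecord], n: int) -> List[BarrelRecord]:
    """Return the n records with the highest beta-barrel scores."""
    return sorted(records, key=lambda r: r.score, reverse=True)[:n]
```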
 |
Acknowledgments
|
|---|
The New Orleans Protein Folding Intergroup is gratefully acknowledged for many invaluable discussions, and we thank Samuel J. Landry and William F. Walkenhorst for critically reading the manuscript. We are indebted to Dr. Harald Engelhardt (Max-Planck Institute for Biochemistry, Munich) for sending the coordinates of Omp32 before their release from the PDB. Funded by NIH (GM60000) and the Louisiana Board of Regents Support Fund 1999-02-RD-A-43.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
 |
References
|
|---|
Alm, R.A., Bina, J., Andrews, B.M., Doig, P., Hancock, R.E., and Trust, T.J. 2000. Comparative genomics of Helicobacter pylori: Analysis of the outer membrane protein families. Infect. Immun. 68:4155–4168.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.
Benz, R. 1994. Permeation of hydrophilic solutes through mitochondrial outer membranes: Review on mitochondrial porins. Biochim. Biophys. Acta 1197:167–196.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28:235–242.
Bishop, C.M., Walkenhorst, W.F., and Wimley, W.C. 2001. Folding of ß-sheet membrane proteins: Specificity and promiscuity in peptide model systems. J. Mol. Biol. 309:975–988.
Buchanan, S.K., Smith, B.S., Venkatramani, L., Xia, D., Esser, L., Palnitkar, M., Chakraborty, R., van der Helm, D., and Deisenhofer, J. 1999. Crystal structure of the outer membrane active transporter FepA from Escherichia coli. Nat. Struct. Biol. 6:56–63.
Casadio, R., Fariselli, P., Taroni, C., and Compiani, M. 1996. A predictor of transmembrane α-helix domains of proteins based on neural networks. Eur. Biophys. J. 24:165–178.
Cowan, S.W., Garavito, R.M., Jansonius, J.N., Jenkins, J.A., Karlsson, R., Koenig, N., Pai, E.F., Pauptit, R.A., Rizkallah, P.J., Rosenbusch, J.P., Rummel, G., and Schirmer, T. 1995. The structure of OmpF porin in a tetragonal crystal form. Structure 3:1041–1050.
Cowan, S.W., Schirmer, T., Rummel, G., Steiert, M., Ghosh, R., Pauptit, R.A., Jansonius, J.N., and Rosenbusch, J.P. 1992. Crystal structures explain functional properties of two E. coli porins. Nature 358:727–733.
Dutzler, R., Rummel, G., Alberti, S., Hernandez-Alles, S., Phale, P., Rosenbusch, J., Benedi, V., and Schirmer, T. 1999. Crystal structure and functional characterization of OmpK36, the osmoporin of Klebsiella pneumoniae. Struct. Fold. Design 7:425–434.
Fischbarg, J., Li, J., Cheung, M., Czegledy, F., Iserovich, P., and Kuang, K. 1995. Predictive evidence for a porin-type ß-barrel fold in CHIP28 and other members of the MIP family. A restricted-pore model common to water channels and facilitators. J. Membr. Biol. 143:177–188.
Fischer, K., Weber, A., Brink, S., Arbinger, B., Schunemann, D., Borchert, S., Heldt, H.W., Popp, B., Benz, R., and Link, T.A. 1994. Porins from plants. Molecular cloning and functional characterization of two new members of the porin family. J. Biol. Chem. 269:25754–25760.
Forst, D., Welte, W., Wacker, T., and Diederichs, K. 1998. Structure of the sucrose-specific porin ScrY from Salmonella typhimurium and its complex with sucrose. Nat. Struct. Biol. 5:37–46.
Gromiha, M.M., Majumdar, R., and Ponnuswamy, P.K. 1997. Identification of membrane spanning ß-strands in bacterial porins. Protein Eng. 10:497–500.
Jacoboni, I., Martelli, P.L., Fariselli, P., De, P.V., and Casadio, R. 2001. Prediction of the transmembrane regions of ß-barrel membrane proteins with a neural network-based predictor. Protein Sci. 10:779–787.
Jayasinghe, S., Hristova, K., and White, S.H. 2001. Energetics, stability, and prediction of transmembrane helices. J. Mol. Biol. 312:927–934.
Koronakis, V., Sharff, A., Koronakis, E., Luisi, B., and Hughes, C. 2000. Crystal structure of the bacterial membrane protein TolC central to multidrug efflux and protein export. Nature 405:914–919.
Kreusch, A. and Schulz, G.E. 1994. Refined structure of the porin from Rhodopseudomonas blastica. Comparison with the porin from Rhodobacter capsulatus. J. Mol. Biol. 243:891–905.
Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E.L. 2001. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J. Mol. Biol. 305:567–580.
Locher, K.P., Rees, B., Koebnik, R., Mitschler, A., Moulinier, L., Rosenbusch, J.P., and Moras, D. 1998. Transmembrane signaling across the ligand-gated FhuA receptor: Crystal structures of free and ferrichrome-bound states reveal allosteric changes. Cell 95:771–778.
Meyer, J.E.W., Hofnung, M., and Schulz, G.E. 1997. Structure of maltoporin from Salmonella typhimurium ligated with a nitrophenyl-maltotrioside. J. Mol. Biol. 266:761–775.
Molloy, M.P., Herbert, B.R., Slade, M.B., Rabilloud, T., Nouwens, A.S., Williams, K.L., and Gooley, A.A. 2000. Proteomic analysis of the Escherichia coli outer membrane. Eur. J. Biochem. 267:2871–2881.
Murzin, A.G., Lesk, A.M., and Chothia, C. 1994. Principles determining the structure of ß-sheet barrels in proteins: I. A theoretical analysis. J. Mol. Biol. 236:1369–1381.
Pautsch, A. and Schulz, G.E. 1998. Structure of the outer membrane protein A transmembrane domain. Nat. Struct. Biol. 5:1013–1017.
Perna, N.T., Plunkett III, G., Burland, V., Mau, B., Glasner, J.D., Rose, D.J., Mayhew, G.F., Evans, P.S., Gregor, J., Kirkpatrick, H.A., Posfai, G., Hackett, J., Klink, S., Boutin, A., Shao, Y., Miller, L., Grotbeck, E.J., Davis, N.W., Lim, A., Dimalanta, E.T., Potamousis, K.D., Apodaca, J., Anantharaman, T.S., Lin, J., Yen, G., Schwartz, D.C., Welch, R.A., and Blattner, F.R. 2001. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409:529–533.
Richardson, D.C. and Richardson, J.S. 1994. Kinemages: Simple macromolecular graphics for interactive teaching and publication. Trends Biochem. Sci. 19:135–138.
Rost, B., Casadio, R., Fariselli, P., and Sander, C. 1995. Transmembrane helices predicted at 95% accuracy. Protein Sci. 4:521–533.
Saier Jr., M.H. 2000. Families of proteins forming transmembrane channels. J. Membr. Biol. 175:165–180.
Sansom, M.S.P. and Kerr, I.D. 1995. Transbilayer pores formed by ß-barrels: Molecular modeling of pore structures and properties. Biophys. J. 69:1334–1343.
Schiffer, M., Chang, C.H., and Stevens, F.J. 1992. The functions of tryptophan residues in membrane proteins. Protein Eng. 5:213–214.
Schirmer, T. and Cowan, S.W. 1993. Prediction of membrane-spanning ß-strands and its application to maltoporin. Protein Sci. 2:1361–1363.
Schulz, G.E. 2000. ß-Barrel membrane proteins. Curr. Opin. Struct. Biol. 10:443–447.
Seshadri, K., Garemyr, R., Wallin, E., von Heijne, G., and Elofsson, A. 1998. Architecture of ß-barrel membrane proteins: Analysis of trimeric porins. Protein Sci. 7:2026–2032.
Snijder, H.J., Ubarretxena-Belandia, I., Blaauw, M., Kalk, K.H., Verheij, H.M., Egmond, M.R., Dekker, N., and Dijkstra, B.W. 1999. Structural evidence for dimerization-regulated activation of an integral membrane phospholipase. Nature 401:717–721.
Song, L., Hobaugh, M.R., Shustak, C., Cheley, S., Bayley, H., and Gouaux, J.E. 1996. Structure of staphylococcal α-hemolysin, a heptameric transmembrane pore. Science 274:1859–1866.
Stover, C.K., Pham, X.Q., Erwin, A.L., Mizoguchi, S.D., Warrener, P., Hickey, M.J., Brinkman, F.S., Hufnagle, W.O., Kowalik, D.J., Lagrou, M., Garber, R.L., Goltry, L., Tolentino, E., Westbrock-Wadman, S., Yuan, Y., Brody, L.L., Coulter, S.N., Folger, K.R., Kas, A., Larbig, K., Lim, R., Smith, K., Spencer, D., Wong, G.K., Wu, Z., and Paulsen, I.T. 2000. Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen. Nature 406:959–964.
Vogt, J. and Schulz, G.E. 1999. The structure of the outer membrane protein OmpX from Escherichia coli reveals possible mechanisms of virulence. Struct. Fold. Design 7:1301–1309.
von Heijne, G. 1994. Membrane proteins: From sequence to structure. Annu. Rev. Biophys. Biomol. Struct. 23:167–192.
von Heijne, G. 1996. Prediction of transmembrane protein topology. In Protein structure prediction (ed. M.J.E. Sternberg), pp. 101–110. Oxford University Press, Oxford.
Weiss, M.S. and Schulz, G.E. 1992. Structure of porin refined at 1.8 Å resolution. J. Mol. Biol. 227:493–509.
White, S.H. and Wimley, W.C. 1998. Hydrophobic interactions of peptides with membrane interfaces. Biochim. Biophys. Acta 1376:339–352.
White, S.H. and Wimley, W.C. 1999. Membrane protein folding and stability: Physical principles. Annu. Rev. Biophys. Biomol. Struct. 28:319–365.
Wiener, M.C. and White, S.H. 1992. Structure of a fluid dioleoylphosphatidylcholine bilayer determined by joint refinement of X-ray and neutron diffraction data. III. Complete structure. Biophys. J. 61:434–447.
Wimley, W.C., Creamer, T.P., and White, S.H. 1996. Solvation energies of amino acid sidechains and backbone in a family of host–guest pentapeptides. Biochemistry 35:5109–5124.
Wimley, W.C., Selsted, M.E., and White, S.H. 1994. Interactions between human defensins and lipid bilayers: Evidence for the formation of multimeric pores. Protein Sci. 3:1362–1373.
Yau, W.M., Wimley, W.C., Gawrisch, K., and White, S.H. 1998. The preference of tryptophan for membrane interfaces. Biochemistry 37:14713–14718.
Zen, K.H., Consler, T.G., and Kaback, H.R. 1995. Insertion of the polytopic membrane protein lactose permease occurs by multiple mechanisms. Biochemistry 34:3430–3437.
Zeth, K., Diederichs, K., Welte, W., and Engelhardt, H. 2000. Crystal structure of Omp32, the anion-selective porin from Comamonas acidovorans, in complex with a periplasmic peptide at 2.1 Å resolution. Struct. Fold. Design 8:981–992.
