Size and Shape of Protein Molecules at the Nanometer Level Determined by Sedimentation, Gel Filtration, and Electron Microscopy
Protein folding is driven largely by the hydrophobic effect, which acts to minimize contact of hydrophobic side chains with the solvent. As a result, most proteins fold into globular domains, which have a minimal surface area. Polypeptides of 10 to 30 kDa typically fold into a single domain; those larger than 50 kDa typically form two or more independently folded domains. However, some proteins are highly elongated, either as a string of small globular domains or stabilized by specialized structures such as coiled coils or the collagen triple helix. The ultimate structural understanding of a protein comes from an atomic-level structure obtained by X-ray crystallography or nuclear magnetic resonance (NMR). However, structural information at the nanometer level is frequently invaluable. Hydrodynamics, in particular sedimentation and gel filtration, can provide this information, and it becomes even more powerful when combined with electron microscopy (EM).
One guiding principle greatly simplifies the analysis of protein structure: the interior of protein subunits and domains consists of closely packed atoms (1). There are no substantial holes and almost no water molecules in the protein interior. As a consequence, proteins are rigid structures, with a Young’s modulus similar to that of Plexiglas (2). Engineers sometimes categorize biology as the science of “soft wet materials”. This is true of some hydrated gels, but proteins are better thought of as hard, dry plastic. Having a rigid material with which to construct the machinery of life is obviously important for all of biology. A second consequence of the closely packed interior is that all proteins have approximately the same density, about 1.37 g/cm³. For most of the following, we will use the partial specific volume, v_2, which is the reciprocal of the density. v_2 varies from 0.70 to 0.76 cm³/g for different proteins, and there is a literature on calculating or determining its value experimentally. For the present discussion, we will ignore these variations and assume the average v_2 = 0.73 cm³/g.
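The volume occupied by one protein molecule of mass M is therefore

$$V\ (\mathrm{nm^3}) = \frac{(0.73\ \mathrm{cm^3/g})\,(10^{21}\ \mathrm{nm^3/cm^3})}{6.023\times 10^{23}\ \mathrm{Da/g}}\,M\ (\mathrm{Da}) = 1.212\times 10^{-3}\,M\ (\mathrm{Da}) \tag{2.1}$$

The inverse relationship is also frequently useful: M (Da) = 825 V (nm³).

The radius of a smooth sphere with this volume, R_min, is the minimum radius of a particle that could contain the protein:

$$R_{\min}\ (\mathrm{nm}) = \left(\frac{3V}{4\pi}\right)^{1/3} = 0.066\,M^{1/3}\quad (M\ \text{in Da}) \tag{2.2}$$

Representative values are: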
Protein M (kDa) | 5 | 10 | 20 | 50 | 100 | 200 | 500
---|---|---|---|---|---|---|---
R_min (nm) | 1.1 | 1.42 | 1.78 | 2.4 | 3.05 | 3.84 | 5.21
It is important to emphasize that this is the minimum radius of a smooth sphere that could contain the given mass of protein. Since proteins have an irregular surface, even ones that are approximately spherical will have an average radius larger than the minimum.
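As a minimal sketch, assuming the average v_2 = 0.73 cm³/g and N_o = 6.023 × 10²³ used in the text, the volume and minimum radius can be computed directly from the mass. The function names are illustrative; the results agree with the tabulated R_min values to within a few hundredths of a nanometer, the small differences reflecting rounding of the constants.

```python
import math

AVOGADRO = 6.023e23   # N_o, molecules per mole (value used in the text)
V2_BAR = 0.73         # assumed average partial specific volume, cm^3/g
NM3_PER_CM3 = 1e21    # 1 cm^3 = 10^21 nm^3


def protein_volume_nm3(mass_da: float) -> float:
    """Volume in nm^3 of one protein molecule of mass M (Da): V = (M / N_o) * v2."""
    grams_per_molecule = mass_da / AVOGADRO
    return grams_per_molecule * V2_BAR * NM3_PER_CM3


def r_min_nm(mass_da: float) -> float:
    """Radius in nm of a smooth sphere having the protein's volume."""
    volume = protein_volume_nm3(mass_da)
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


if __name__ == "__main__":
    for kda in (5, 10, 20, 50, 100, 200, 500):
        mass = kda * 1000
        print(f"{kda:>4} kDa: V = {protein_volume_nm3(mass):6.1f} nm^3, "
              f"R_min = {r_min_nm(mass):.2f} nm")
```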
It is frequently useful to know the average volume of solution occupied by each molecule, or more directly, the average distance separating molecules in solution. This is a simple calculation based only on the molar concentration.
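In a 1-M solution there are 6 × 10²³ molecules/l = 0.6 molecules/nm³ (since 1 l = 10²⁴ nm³); inverting, the volume per molecule is V = 1.66 nm³ at 1 M. For a concentration C (in M), the volume per molecule is V = 1.66/C nm³, and the average distance between molecules is its cube root:

$$d\ (\mathrm{nm}) = V^{1/3} = 1.18\,C^{-1/3} \tag{3.1}$$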
Concentration | 1 M | 1 mM | 1 μM | 1 nM
---|---|---|---|---
Distance between molecules (nm) | 1.18 | 11.8 | 118 | 1,180
Two interesting examples are hemoglobin and fibrinogen. Hemoglobin is 330 mg/ml in erythrocytes, making its concentration 0.005 M. The average separation of molecules (center to center) is 6.9 nm. The diameter of a single hemoglobin molecule is about 5 nm. These molecules are very concentrated, near the highest physiological concentration of any protein (the crystallins in lens cells can be at >50% protein by weight).
Fibrinogen is a large rod-shaped molecule that forms a fibrin blood clot when activated. It circulates in plasma at a concentration of around 2.5 g/l, about 6.4 μM (taking M = 390 kDa). The fibrinogen molecules are therefore about 60 nm apart, comparable to the 46-nm length of the rod-shaped molecule.
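The spacing estimate can be sketched as follows: convert the mass concentration to molarity, compute the volume per molecule, and take its cube root as in Eq. 3.1. The molar masses used here (64 kDa for hemoglobin, 390 kDa for fibrinogen) are the values listed in the tables of this section; the function names are illustrative.

```python
AVOGADRO = 6.023e23    # molecules per mole
NM3_PER_LITRE = 1e24   # 1 l = 10^24 nm^3


def spacing_nm(molar: float) -> float:
    """Average center-to-center spacing (nm): cube root of the volume per molecule."""
    volume_per_molecule = NM3_PER_LITRE / (AVOGADRO * molar)  # nm^3
    return volume_per_molecule ** (1.0 / 3.0)


def spacing_from_mass_conc(grams_per_litre: float, mass_da: float) -> float:
    """Convert a mass concentration (g/l) to molarity, then to spacing (nm)."""
    return spacing_nm(grams_per_litre / mass_da)


hemoglobin = spacing_from_mass_conc(330.0, 64_000)   # 330 mg/ml in erythrocytes
fibrinogen = spacing_from_mass_conc(2.5, 390_000)    # about 2.5 g/l in plasma
print(f"hemoglobin: {hemoglobin:.1f} nm, fibrinogen: {fibrinogen:.0f} nm")
```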
Biochemists have long attempted to deduce the shape of a protein molecule from hydrodynamic parameters. There are two major hydrodynamic methods used to study protein molecules: sedimentation and diffusion (or gel filtration, which is equivalent to measuring the diffusion coefficient).
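The sedimentation coefficient is

$$S = \frac{M\,(1-\bar{v}_2\,\rho)}{N_o\,f} \tag{4.1}$$

where M is the mass of the protein molecule in daltons (numerically equal to its molar mass in g/mol); N_o is Avogadro’s number, 6.023 × 10²³; v_2 is the partial specific volume of the protein, typically 0.73 cm³/g; ρ is the density of the solvent (1.0 g/cm³ for H₂O); and η is the viscosity of the solvent (0.01 g/(cm·s) for H₂O), which enters through the frictional coefficient f.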
A critical factor in the equation is the frictional coefficient, f (dimensions g/s), which depends on both the size and the shape of the protein. For a given mass (or volume) of protein, f increases as the protein becomes elongated or asymmetrical (f can be replaced by an equivalent expression containing R_s, the Stokes radius, discussed later). S has the dimensions of time (seconds). For typical protein molecules, S is in the range of 2–20 × 10⁻¹³ s, and the value 10⁻¹³ s is designated a Svedberg unit, S. Thus, typical proteins have sedimentation coefficients of 2–20 S.
From the above definition of the parameters, it is clear that S depends on the solvent and temperature. In classical studies, the solvent-dependent factors were eliminated and the sedimentation coefficient was extrapolated to the value it would have at 20 °C in water (for which ρ and η are given above). This is referred to as S_20,w. In the present treatment, we will refer mostly to standard proteins that have already been characterized, or to unknown ones that are referenced to these standards in gradient sedimentation, so our use of S will always mean S_20,w.
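For reference, the conventional correction of an observed S to standard conditions rescales for solvent viscosity and buoyancy. This is a minimal sketch, assuming the standard form S_20,w = S_obs (η_T,b/η_20,w)(1 − v_2ρ_20,w)/(1 − v_2ρ_T,b) and the rounded water values given above (ρ = 1.0 g/cm³, η = 1.00 cP). It does not perform the separate extrapolation to zero protein concentration, and the buffer values in the usage line are hypothetical.

```python
def s20w(s_obs: float, eta_buffer_cp: float, rho_buffer: float,
         v2: float = 0.73, eta_20w_cp: float = 1.00, rho_20w: float = 1.0) -> float:
    """Correct an S observed in a buffer at temperature T to 20 °C water.

    Viscosities in centipoise (1 cP = 0.01 g/(cm s)); densities in g/cm^3;
    v2 in cm^3/g. Water defaults are the rounded values used in the text.
    """
    # A more viscous buffer slows sedimentation, so S_obs is scaled up.
    viscosity_factor = eta_buffer_cp / eta_20w_cp
    # A denser buffer reduces the buoyant mass (1 - v2*rho), so S_obs is scaled up.
    buoyancy_factor = (1.0 - v2 * rho_20w) / (1.0 - v2 * rho_buffer)
    return s_obs * viscosity_factor * buoyancy_factor


# Hypothetical example: 4.0 S observed in a buffer with eta = 1.5 cP, rho = 1.02 g/cm^3
print(round(s20w(4.0, 1.5, 1.02), 2))
```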
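For a smooth sphere of radius R_min, Stokes’ law gives the frictional coefficient

$$f_{\min} = 6\pi\eta R_{\min} \tag{4.2}$$

We have now designated f_min as the minimal frictional coefficient for a protein of a given mass, which would obtain if the protein were a smooth sphere of radius R_min.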
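To see where a numerical form of S_max comes from, the sketch below evaluates f_min = 6πηR_min and the buoyant mass with the constants listed above (cgs units, water at 20 °C). It is a worked check under those assumptions rather than a general tool; it shows that the maximal S scales as M^{2/3}, with a coefficient of about 0.0036 when S is in Svedbergs and M in daltons.

```python
import math

AVOGADRO = 6.023e23   # N_o, per mole
V2_BAR = 0.73         # partial specific volume, cm^3/g
RHO = 1.0             # solvent density, g/cm^3 (water)
ETA = 0.01            # solvent viscosity, g/(cm s) (water, 20 °C)
SVEDBERG = 1e-13      # seconds per Svedberg unit


def r_min_cm(mass_da: float) -> float:
    """Radius (cm) of a smooth sphere containing the protein."""
    volume_cm3 = mass_da / AVOGADRO * V2_BAR
    return (3.0 * volume_cm3 / (4.0 * math.pi)) ** (1.0 / 3.0)


def f_min(mass_da: float) -> float:
    """Stokes frictional coefficient (g/s) of that sphere."""
    return 6.0 * math.pi * ETA * r_min_cm(mass_da)


def s_max_svedberg(mass_da: float) -> float:
    """S_max = M (1 - v2 rho) / (N_o f_min), expressed in Svedbergs."""
    buoyant_mass_g = mass_da / AVOGADRO * (1.0 - V2_BAR * RHO)
    return buoyant_mass_g / f_min(mass_da) / SVEDBERG


coefficient = s_max_svedberg(100_000) / 100_000 ** (2.0 / 3.0)
print(f"S_max(100 kDa) = {s_max_svedberg(100_000):.2f} S; coefficient = {coefficient:.5f}")
```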
The actual f of a protein will always be larger than f_min, for two reasons. First, the shape of the protein normally deviates from spherical, being ellipsoidal or elongated; closely related to this, the surface of the protein is not smooth but rough on the scale of the water molecules it travels through. Second, all proteins are surrounded by a shell of bound water, one to two molecules thick, which is partially immobilized, or frozen, by contact with the protein. This water of hydration increases the effective size of the protein and thus increases f.
If one could determine the amount of water of hydration and factor it out, there would be hope that the remaining excess of f over f_min could be interpreted in terms of shape. Algorithms have been devised for estimating the amount of bound water from the amino acid sequence, but these generally do not distinguish between buried residues, which have no bound water, and surface residues, which bind water. Some attempts have been made to base the estimate of bound water on polar residues, which are mostly exposed on the surface. A value of 0.3 g H₂O per g protein is a typical estimate, but in fact this kind of guess is almost useless for analyzing f.
In earlier days, when there was some confidence in these estimates of bound water, physical chemists calculated a value called f_o, the frictional coefficient of a sphere that would contain the given protein enlarged by the estimated shell of water (other authors use f_o to designate what we term f_min (3, 4); we recommend using f_min to avoid ambiguity). The measured f for proteins was almost always larger than f_o, suggesting that the protein was asymmetrical or elongated. A very popular analysis was to model the protein as an ellipsoid of revolution and calculate the axial ratio from f/f_o, using an equation first developed by Perrin. This approach is detailed in most classical texts of physical biochemistry. In fact, the Perrin analysis always overestimates the asymmetry of proteins, typically by a factor of two to five, and it should not be used for proteins.
The problem is illustrated by an early collaborative study of phosphofructokinase, in which the laboratory of James Lee did the hydrodynamics and our laboratory did the EM (5). We found by EM that the tetrameric particles were approximately cylinders, 9 nm in diameter and 14 nm long. The shape was therefore like a rugby ball, with an axial ratio of 1.5 for a prolate ellipsoid of revolution. The Lee group measured the molecular weight and sedimentation coefficient, determined f, and estimated the water of hydration and f_o. They then used the Perrin equation to calculate the axial ratio. The ratio was five, which would suggest that the protein had the shape of a hot dog. The EM structure (later confirmed by X-ray crystallography) shows that the Perrin equation overestimated the axial ratio by a factor of about 3.
Teller et al. (6) summarized the situation: “Frequently the axial ratios resulting from such treatment are absurd in light of the present knowledge of protein structure.” They explained that the major problem with the Perrin equation is that it treats the protein as a smooth ellipsoid, when in fact the surface of the protein is quite rough. Teller et al. went on to show how the frictional coefficient can be derived from the known atomic structure of the protein by modeling its surface as a shell of small beads of radius 1.4 Å. The shell coated the surface of the protein, modeling its rugosity and increasing the size of the protein by the equivalent of a single layer of bound water. This analysis has been extended by Garcia de la Torre and colleagues (7).
If the Perrin equation is useless, is there some other way that shape can be interpreted from f? The answer is yes, at a semiquantitative level. We have discovered simple guidelines by which the ratio f/f_min can provide a good indication of whether a protein is globular, somewhat elongated, or very elongated.
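Because S is inversely proportional to f (Eq. 4.1), a protein of given mass would have its maximal sedimentation coefficient, S_max, if it were a smooth sphere with f = f_min. The ratio of S_max to the measured S is then exactly the ratio f/f_min:

$$S_{\max} = \frac{M\,(1-\bar{v}_2\,\rho)}{N_o\,f_{\min}}, \qquad \frac{S_{\max}}{S} = \frac{f}{f_{\min}} \tag{4.3a}$$

Substituting f_min = 6πηR_min, with R_min = 0.066 M^{1/3} nm and the values for water given above, gives S_max in Svedbergs for M in daltons:

$$S_{\max} = 0.00361\,M^{2/3} \tag{4.3b}$$

Representative values are: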
Protein M_r (kDa) | 10 | 25 | 50 | 100 | 200 | 500 | 1,000
---|---|---|---|---|---|---|---
S_max (Svedbergs) | 1.68 | 3.1 | 4.9 | 7.8 | 12.3 | 22.7 | 36.1
• No protein has S_max/S = f/f_min smaller than ∼1.2.
• For approximately globular proteins, S_max/S is typically between 1.2 and 1.3.
• For moderately elongated proteins, S_max/S is in the range of 1.5 to 1.9.
• For highly elongated proteins (tropomyosin, fibrinogen, extended fibronectin), S_max/S is in the range of 2.0 to 3.0.
• For very long thread-like molecules like collagen, or huge extended molecules like the tenascin hexabrachion (not shown), S_max/S can range from 3 to 4 or more.
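The guidelines above can be applied mechanically, as in this minimal sketch using S_max = 0.00361 M^{2/3} (S in Svedbergs, M in daltons). The function names, and the handling of values that fall between the published ranges (for example 1.3 to 1.5), are illustrative choices rather than part of the guidelines. The masses and S values in the usage loop are taken from the standards table that follows.

```python
def s_max(mass_da: float) -> float:
    """Maximal S (Svedbergs) for a smooth, unhydrated sphere of mass M (Da)."""
    return 0.00361 * mass_da ** (2.0 / 3.0)


def shape_class(ratio: float) -> str:
    """Map S_max/S onto the semiquantitative shape guidelines."""
    if ratio < 1.0:
        return "impossible for the assumed mass"   # S cannot exceed S_max
    if ratio < 1.2:
        return "below the usual minimum"
    if ratio <= 1.3:
        return "globular"
    if ratio < 1.5:
        return "intermediate"                      # gap between published ranges
    if ratio < 2.0:
        return "moderately elongated"
    if ratio <= 3.0:
        return "highly elongated"
    return "very long or thread-like"


for name, mass, s in [("serum albumin", 66_400, 4.6),
                      ("TNfn1-8", 78_900, 3.6),
                      ("fibrinogen", 390_000, 7.9)]:
    ratio = s_max(mass) / s
    print(f"{name}: S_max/S = {ratio:.2f} -> {shape_class(ratio)}")
```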
Protein | Dimensions (nm) | Mass (Da) | S_max (Svedbergs) | S (Svedbergs) | S_max/S
---|---|---|---|---|---
Globular protein standards (dimensions are from PDB files) | | | | |
Phosphofructokinase | 14 × 9 × 9 | 345,400 | 17.77 | 12.2 | 1.46
Catalase | 9.7 × 9.2 × 6.7 | 230,000 | 13.6 | 11.3 | 1.20
Serum albumin | 7.5 × 6.5 × 4.0 | 66,400 | 5.9 | 4.6 | 1.29
Hemoglobin | 6 × 5 × 5 | 64,000 | 5.78 | 4.4 | 1.32
Ovalbumin | 7.0 × 3.6 × 3.0 | 43,000 | 4.43 | 3.5 | 1.27
FtsZ | 4.8 × 4 × 3 | 40,300 | 4.26 | 3.4 | 1.25
Elongated protein standards: tenascin fragments (27, 28); HEAT repeat (29, 30) | | | | |
TNfn1–5 | 14.7 × 1.7 × 2.8 | 50,400 | 4.94 | 3.0 | 1.65
TNfn1–8 | 24.6 × 1.7 × 2.8 | 78,900 | 6.64 | 3.6 | 1.85
TNfnALL | 47.9 × 1.7 × 2.8 | 148,000 | 10.1 | 4.3 | 2.36
PR65/A HEAT repeat | 17.2 × 3.5 × 2.0 | 60,000 | 5.53 | 3.6 | 1.54
Fibrinogen | 46 × 3 × 6 | 390,000 | 19.3 | 7.9 | 2.44
Apart from indicating the shape of a protein, S_max/S can often give valuable information about the oligomeric state, if one has some idea of the shape. The key constraint is that the measured S can never exceed S_max calculated for the true mass. For example, if one expects a dimer and knows that the subunit is approximately globular (from EM, for example), but finds S_max/S = 2.1 when S_max is calculated for the dimer mass, this suggests that the protein in solution is actually a monomer: a globular monomer would give about 1.25 × 2^(2/3) ≈ 2.0 when S_max is calculated for twice its mass. Conversely, if one thinks a protein is a monomer but finds S_max/S < 1.0 for the monomer mass, the protein must be larger than a monomer, probably a dimer or higher oligomer.
The use of S_max/S to estimate protein shape has been described briefly in (8).
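The oligomer test can be sketched as follows: compute S_max/S for each candidate number of subunits, discard any candidate giving a ratio below 1.0 (no protein can sediment faster than a smooth sphere of its own mass), and look for the candidate consistent with the expected shape. The subunit mass and S in the usage lines are hypothetical, and the 1.2 to 1.3 window assumes a globular particle as in the guidelines above.

```python
def s_max(mass_da: float) -> float:
    """Maximal S (Svedbergs) for mass M (Da): S_max = 0.00361 M^(2/3)."""
    return 0.00361 * mass_da ** (2.0 / 3.0)


def ratio_by_oligomer(subunit_da: float, s_measured: float, max_n: int = 4) -> dict:
    """S_max/S for monomer, dimer, ... up to max_n identical subunits."""
    return {n: s_max(n * subunit_da) / s_measured for n in range(1, max_n + 1)}


def plausible_globular(ratios: dict, low: float = 1.2, high: float = 1.3) -> list:
    """Oligomeric states consistent with a compact, globular particle."""
    return [n for n, r in ratios.items() if low <= r <= high]


# Hypothetical: a globular 50-kDa subunit sedimenting at 6.1 S
ratios = ratio_by_oligomer(50_000, 6.1)
for n, r in ratios.items():
    verdict = "excluded (S > S_max)" if r < 1.0 else f"{r:.2f}"
    print(f"{n}-mer: {verdict}")
print("globular candidates:", plausible_globular(ratios))
```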
An analysis originally developed by Kirkwood (9) and later extended by Bloomfield and Garcia de la Torre (10–12) elegantly extends our understanding of how protein shape affects hydrodynamics. In its simplest application, it calculates the sedimentation coefficient of a rigid oligomeric protein composed of subunits of known S and known spacing relative to each other. In more complex applications, a protein of any shape can be modeled as a set of nonoverlapping spheres, or beads. See Byron (13) for a comprehensive review of the principles and applications of hydrodynamic bead modeling of biological macromolecules.
• A rod of three beads has about a twofold higher S than a single bead.
• S_max/S is 1.18 for the single bead (the effect of the assumed shell of water), 1.34 for the three-bead rod, and 1.93 for the straight 11-bead rod. This is consistent with the guidelines given above for globular, somewhat elongated, and very elongated particles.
• Bending the rod at 90° in the middle causes only a small increase in S. Bending it into a U-shape with the arms about one bead diameter apart increases S a bit more. Bending the same 11-bead structure more sharply, so that the two arms are in contact, causes a substantial increase in S, from 5.05 to 5.58. The guiding principle is that folding affects S when one part of the molecule is brought close enough to another to shield it from water flow.
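The bead-model numbers above can be reproduced with one common form of the Kirkwood approximation for N identical beads of radius r, S_N/S_1 = 1 + (r/N) Σ_i Σ_{j≠i} 1/R_ij, where R_ij is the center-to-center distance. This sketch is an assumption about the method, not necessarily the program used for the original calculations. Combined with the single-bead value S_max/S = 1.18 quoted above, it gives about 1.34 for the three-bead rod and 1.93 for the straight 11-bead rod, and it shows S increasing when the rod is bent.

```python
import math
from itertools import combinations


def kirkwood_ratio(centers, radius=1.0):
    """S_N / S_1 for N identical, non-overlapping beads (Kirkwood approximation).

    S_N/S_1 = 1 + (r/N) * sum_i sum_{j != i} 1/R_ij; each unordered pair
    appears twice in the double sum, hence the factor of 2.
    """
    n = len(centers)
    double_sum = sum(2.0 / math.dist(a, b) for a, b in combinations(centers, 2))
    return 1.0 + radius * double_sum / n


def straight_rod(n, radius=1.0):
    """Bead centers along x, neighbors touching (spacing = one bead diameter)."""
    return [(2.0 * radius * i, 0.0) for i in range(n)]


def bent_rod_90(n, radius=1.0):
    """The same beads, bent 90 degrees at the middle bead."""
    mid = n // 2
    return [(2.0 * radius * (mid - i), 0.0) if i <= mid
            else (0.0, 2.0 * radius * (i - mid))
            for i in range(n)]


def smax_over_s(n_beads, ratio, single_bead=1.18):
    """S_max scales as N^(2/3); 1.18 is the single-bead value quoted in the text."""
    return single_bead * n_beads ** (2.0 / 3.0) / ratio


for label, beads in [("3-bead rod", straight_rod(3)),
                     ("11-bead rod", straight_rod(11)),
                     ("11-bead rod, bent 90 deg", bent_rod_90(11))]:
    r = kirkwood_ratio(beads)
    print(f"{label}: S/S1 = {r:.2f}, S_max/S = {smax_over_s(len(beads), r):.2f}")
```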
“Gel filtration chromatography is widely used for determining protein molecular weight.” This quote from Sigma-Aldrich bulletin 891A expresses a widely held misconception. The fallacy is obscurely corrected by a later note in the bulletin that “Once a calibration curve is prepared, the elution volume for a protein of similar shape, but unknown weight, can be used to determine the MW.” The key issue is “of similar shape”. Generally, the calibration proteins are all globular, and if the unknown protein is also globular, the calibrated gel filtration column does give a good approximation of its molecular weight. The problem is that the shape of an unknown protein is generally unknown. If the unknown protein is elongated, it can easily elute at the position of a globular protein of twice its molecular weight.
The gel filtration column actually separates proteins not by their molecular weight but by their frictional coefficient. Since the frictional coefficient, f, is not an intuitive parameter, it is usually replaced by the Stokes radius, R_s, defined as the radius of a smooth sphere that would have the actual f of the protein. This is much more intuitive, since it allows one to imagine a real sphere approximately the size of the protein, or somewhat larger if the protein is elongated and has bound water.
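Stokes’ law relates R_s to the actual frictional coefficient, so Eq. 4.1 can be rewritten in terms of R_s:

$$f = 6\pi\eta R_s, \qquad S = \frac{M\,(1-\bar{v}_2\,\rho)}{N_o\,6\pi\eta R_s} \tag{6.1}$$

The Stokes radius R_s is larger than R_min because it is the radius of a smooth sphere whose f matches the actual f of the protein; it therefore accounts for both the asymmetry of the protein and its shell of bound water. More quantitatively, f/f_min = S_max/S = R_s/R_min.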
Protein | M_r (aa sequence) | S_20,w | S_max/S | R_s (nm) | Source | M_r (Siegel–Monte)
---|---|---|---|---|---|---
Ribonuclease A, beef pancreas | 14,044 | 2.0a | 1.05a | 1.64 | HBC | 13,791
Chymotrypsinogen A, beef pancreas | 25,665 | 2.6 | 1.21 | 2.09 | HBC | 22,849
Ovalbumin, hen egg | 42,910s | 3.5 | 1.27 | 3.05 | HBC | 44,888
Albumin, beef serum | 69,322 | 4.6a | 1.33 | 3.55 | S-M, HBC | 68,667
Aldolase, rabbit muscle | 157,368 | 7.3 | 1.45 | 4.81 | HBC | 147,650
Catalase, beef liver | 239,656 | 11.3 | 1.21 | 5.2 | S-M | 247,085
Apoferritin, horse spleen | 489,324 | 17.6 | 1.28 | 6.1 | HBC | 451,449
Thyroglobulin, bovine | 606,444 | 19 | 1.37 | 8.5 | HBC | 679,107
Fibrinogen, human | 387,344 | 7.9 | 2.44 | 10.7 | S-M | 355,449
The standard proteins should span R_s values above and below that of the protein of interest (although in the case of the SMC protein from B. subtilis, a short extrapolation to a larger value was used). The literature generally recommends determining the void and included volumes of the column and plotting a partition coefficient, K_AV (4). However, we have found it generally satisfactory simply to plot elution position vs R_s for the standard proteins. This usually gives an approximately linear plot; otherwise, it is satisfactory to draw lines between the points and read the R_s of the protein of interest from its elution position on this standard curve.
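Reading R_s off the standard curve amounts to piecewise-linear interpolation between the bracketing standards. In this minimal sketch the R_s values of the standards come from the table above, but the elution volumes are hypothetical placeholders, and the function deliberately refuses to extrapolate beyond the standards (a short extrapolation, as was used for the SMC protein, would be a separate, explicit decision).

```python
def rs_from_elution(elution_ml, standards):
    """Interpolate R_s (nm) from an elution position.

    standards: iterable of (elution_ml, rs_nm) pairs for the calibration proteins.
    Straight lines are drawn between neighboring standards; no extrapolation.
    """
    points = sorted(standards)                      # order by elution volume
    for (v1, r1), (v2, r2) in zip(points, points[1:]):
        if v1 <= elution_ml <= v2:
            fraction = (elution_ml - v1) / (v2 - v1)
            return r1 + fraction * (r2 - r1)
    raise ValueError("elution position lies outside the calibration standards")


# R_s values from the standards table; elution volumes are HYPOTHETICAL
STANDARDS = [
    (9.0, 8.5),    # thyroglobulin
    (10.5, 6.1),   # apoferritin
    (11.4, 5.2),   # catalase
    (11.8, 4.81),  # aldolase
    (13.0, 3.55),  # serum albumin
    (13.6, 3.05),  # ovalbumin
]

print(f"R_s at 9.75 ml: {rs_from_elution(9.75, STANDARDS):.2f} nm")
```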
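Because gel filtration is equivalent to measuring the diffusion coefficient D, R_s is also related to D by the Stokes–Einstein relation, where k is Boltzmann’s constant and T is the absolute temperature:

$$D = \frac{kT}{f} = \frac{kT}{6\pi\eta R_s} \tag{6.2}$$

$$R_s = \frac{kT}{6\pi\eta D} \tag{6.3}$$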
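A minimal sketch of the D ↔ R_s conversion in SI units, assuming water at 20 °C (η = 0.01 g/(cm·s) = 1.0 × 10⁻³ Pa·s, T = 293.15 K); other buffers or temperatures would need their own η and T. For example, D = 2.86 × 10⁻⁷ cm²/s (1/D = 0.35 × 10⁷ s/cm²) corresponds to R_s ≈ 7.5 nm.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def stokes_radius_nm(d_cm2_s, temp_k=293.15, eta_pa_s=1.0e-3):
    """R_s = kT / (6 pi eta D); D in cm^2/s, result in nm."""
    d_m2_s = d_cm2_s * 1e-4                       # cm^2/s -> m^2/s
    return K_B * temp_k / (6.0 * math.pi * eta_pa_s * d_m2_s) * 1e9


def diffusion_cm2_s(rs_nm, temp_k=293.15, eta_pa_s=1.0e-3):
    """D = kT / (6 pi eta R_s); R_s in nm, result in cm^2/s."""
    rs_m = rs_nm * 1e-9
    return K_B * temp_k / (6.0 * math.pi * eta_pa_s * rs_m) * 1e4


d = 1.0 / 0.35e7                                  # 1/D = 0.35 x 10^7 s/cm^2
print(f"D = {d:.3g} cm^2/s -> R_s = {stokes_radius_nm(d):.2f} nm")
```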
Simply knowing R_s is not very valuable in itself, except for estimating the degree of asymmetry, and that is the same analysis developed above for S_max/S. However, determining both R_s and S permits a direct determination of molecular weight, which cannot be deduced from either one alone. This is described in the next section.
With the completion of multiple genomes and increasingly good annotation, the primary sequence of almost any protein can be found in the databases. The molecular weight of every protein subunit is therefore known from its sequence. But an experimental measure is still needed to determine whether the native protein in solution is a monomer, dimer, or oligomer, or whether it forms a complex with other proteins. If one has a purified protein, the molecular weight can be determined quite accurately by sedimentation equilibrium in the analytical ultracentrifuge. This technique has made a strong comeback with the introduction of the Beckman XL-A analytical ultracentrifuge. There are a number of good reviews (14, 15), and the documentation and programs that come with the centrifuge are very instructive.
What if one does not have an XL-A centrifuge, or the protein of interest is not purified? In 1966, Siegel and Monte (4) proposed a method that achieves the results of sedimentation equilibrium, with two enormous advantages. First, it requires only a preparative ultracentrifuge for sucrose or glycerol gradient sedimentation and a gel filtration column. This equipment is available in most biochemistry laboratories. Second, the protein of interest need not be purified; one needs only an activity or an antibody to locate it in the fractions. This is a very powerful technique and should be in the repertoire of every protein biochemist.
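Combining Eq. 4.1 with f = 6πηR_s and solving for M gives

$$M = \frac{6\pi\eta N_o\,R_s\,S}{1-\bar{v}_2\,\rho} \tag{7.1a}$$

which, with v_2 = 0.73 cm³/g and the values for water given above, becomes

$$M = 4205\,(S \times R_s) \tag{7.1b}$$

where S is in Svedberg units, R_s is in nanometers, and M is in daltons.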
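Eq. 7.1a can be evaluated directly from the constants, as in this minimal sketch (cgs units, v_2 = 0.73 cm³/g, water at 20 °C). The constant it produces for S × R_s = 1 is about 4205, and it reproduces the Siegel–Monte column of the standards table; the function name is illustrative.

```python
import math

AVOGADRO = 6.023e23   # N_o, per mole
ETA = 0.01            # g/(cm s), water at 20 °C
V2_BAR = 0.73         # cm^3/g
RHO = 1.0             # g/cm^3


def siegel_monte_mass_da(s_svedberg: float, rs_nm: float) -> float:
    """M = 6 pi eta N_o R_s S / (1 - v2 rho), returned in daltons (g/mol)."""
    s_seconds = s_svedberg * 1e-13                # Svedbergs -> s
    rs_cm = rs_nm * 1e-7                          # nm -> cm
    return 6.0 * math.pi * ETA * AVOGADRO * rs_cm * s_seconds / (1.0 - V2_BAR * RHO)


print(f"constant: {siegel_monte_mass_da(1.0, 1.0):.0f} Da per (S x nm)")
print(f"fibrinogen (S = 7.9, R_s = 10.7 nm): {siegel_monte_mass_da(7.9, 10.7):,.0f} Da")
```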
Application to the SMC protein from B. subtilis. In the sections above, we showed how S of the B. subtilis SMC protein was determined to be 6.3 S from glycerol gradient sedimentation, and R_s to be 10.3 nm from gel filtration. Putting these values into Eq. 7.1b, we find that the molecular weight of the B. subtilis SMC protein is 273,000 Da. From the amino acid sequence, the molecular weight of one SMC subunit is 135,000 Da. The Siegel–Monte analysis therefore finds that the B. subtilis SMC molecule is a dimer.
Knowing that the B. subtilis SMC protein is a dimer with molecular weight 270,000 Da, we can now determine its S_max/S. S_max is 15.1 (Eq. 4.3b), so S_max/S is 2.4. The B. subtilis SMC molecule is thus expected to be highly elongated. EM (see below) confirmed this prediction.
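The two-step reasoning in this example (Siegel–Monte mass, then S_max/S for the inferred oligomer) can be combined into one small helper. This is a sketch using the rounded constants of Eqs. 7.1b and 4.3b; rounding the measured mass to the nearest whole number of subunits is an assumption that only makes sense for proteins built from a single subunit type.

```python
def analyze_homo_oligomer(s_svedberg: float, rs_nm: float, subunit_da: float):
    """Return (measured mass, subunit count, S_max/S) for a one-subunit-type protein."""
    measured_mass = 4205.0 * s_svedberg * rs_nm        # Siegel-Monte, daltons
    n_subunits = max(1, round(measured_mass / subunit_da))
    native_mass = n_subunits * subunit_da              # mass from the sequence
    s_max = 0.00361 * native_mass ** (2.0 / 3.0)       # Svedbergs
    return measured_mass, n_subunits, s_max / s_svedberg


mass, n, ratio = analyze_homo_oligomer(6.3, 10.3, 135_000)
print(f"M = {mass:,.0f} Da, {n} subunits, S_max/S = {ratio:.1f}")
```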
Since the early 1980s, electron microscopy has become a powerful technique for determining the size and shape of single protein molecules, especially ones larger than 100 kDa. Two techniques available in most EM laboratories, rotary shadowing and negative stain, can be used for imaging single molecules. Cryo-EM is becoming a powerful tool for protein structural analysis, but it requires special equipment and expertise. For a large number of applications, rotary shadowing and negative stain provide the essential structural information.
For rotary shadowing, a dilute solution of protein is sprayed on mica, the liquid is evaporated in a high vacuum, and platinum metal is evaporated onto the mica at a shallow angle. The mica is rotated during this process, so the platinum builds up on all sides of the protein molecules. The first EM images of single protein molecules were obtained by Hall and Slayter using rotary shadowing (16). Their images of fibrinogen showed a distinctive trinodular rod. However, rotary shadowing fell into disfavor because the images were difficult to reproduce: the protein tended to aggregate and collect salt, rather than spread as single molecules. In 1976, James Pullman, a graduate student at the University of Chicago, devised a protocol with one simple but crucial modification: he added 30% glycerol to the protein solution. For reasons that are still not understood, the glycerol greatly helps the protein spread as single molecules.
Pullman never published his protocol, but two labs saw his mimeographed notes and tested the effect of glycerol as part of their own attempts to improve rotary shadowing (17, 18). They obtained reproducible and compelling images of fibrinogen (the first since the original Hall and Slayter study, confirming the trinodular rod structure) and spectrin (the first-ever images of this large protein). The technique has since been used to characterize hundreds of protein molecules.
Negative stain is another EM technique capable of imaging single protein molecules. It is especially useful for imaging larger molecules with a complex internal structure, which appear only as a large blob in rotary shadowing. Importantly, noncovalent protein–protein bonds are sometimes disrupted in the rotary shadowing technique (8), but uranyl acetate, in addition to providing high-resolution contrast, fixes oligomeric protein structures within a few milliseconds (22). An excellent review of modern negative staining techniques, with a comparison to cryo-EM, is given in (23).
The simple picture of the molecule produced by EM is frequently the most straightforward and satisfying structural analysis at 1–2-nm resolution. When the structure is confirmed by hydrodynamic analysis, the interpretation is even more compelling.
The text box above showed the application of the Siegel–Monte analysis to the SMC protein from B. subtilis, which has only one type of subunit and was found to be a dimer. A similar hydrodynamic analysis can be used for multisubunit protein complexes. There are many examples in the literature; I will show here an elegant application to the DASH/Dam1 complex.
• For both gel filtration (size exclusion chromatography, Fig. 5a) and gradient sedimentation (Fig. 5b), two calibration curves of known protein standards are shown, in green and black. These are independent calibration runs. In this study, the gel filtration column was calibrated in terms of the reciprocal diffusion coefficient, 1/D, which is proportional to R_s (Eq. 6.2).
• The fractions were analyzed by Western blot for the location of two proteins of the complex, Spc34p and Hsk3p. The Methods section notes that 1-ml fractions from gel filtration were precipitated with perchloric acid and rinsed with acetone before SDS-PAGE, an essential concentration step for the dilute samples of yeast cytoplasmic extract. The two proteins eluted together in both gel filtration and sedimentation, consistent with their being part of the same complex.
• The profiles of the two proteins were identical when analyzed in their native form in yeast cytoplasmic extract and as the purified complex expressed in E. coli. This is strong evidence that the expressed protein is correctly folded and assembled.
• There is minimal trailing of any subunit. This means that there is no significant dissociation during the tens of minutes of gel filtration or the 12-h centrifugation. The complex is held together by very high-affinity bonds, making its assembly essentially irreversible.
• Combining R_s = 7.6 nm (from 1/D = 0.35 × 10⁷ s/cm², i.e., D ≈ 2.9 × 10⁻⁷ cm²/s) and S = 7.4, Eq. 7.1b gives a mass of M = 236 kDa, close to the 204 kDa obtained by adding the masses of the ten subunits. S_max is 12.6, giving S_max/S = 1.7, which suggests a moderately elongated protein.