
Online Lectures on Bioinformatics


Analysis of individual sequences


Amino acid side chains differ in their physico-chemical properties. Some of them, for example, tend to be exposed to water, i.e., they are hydrophilic, while hydrophobic amino acids tend to avoid contact with water. Charge, size, and backbone flexibility are just a few further examples of amino acid parameters. Such parameters are usually measured on a numeric scale, so that for every parameter there is a table assigning a number to each amino acid. For hydrophobicity, for example, two scales have become particularly well known: one due to Hopp and Woods [1] and the other due to Kyte and Doolittle [2]. Strictly speaking, Hopp and Woods defined a hydrophilicity scale, so its sign convention is the reverse of Kyte and Doolittle's.
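
As a minimal sketch of such a parameter table, the Python snippet below stores the Kyte-Doolittle hydropathy values as a dictionary. It uses the table to turn a sequence into per-residue values and to compute their mean, which is what the ExPASy tool ProtParam reports as GRAVY (grand average of hydropathy). The function names are hypothetical and chosen for illustration only.

```python
# Kyte-Doolittle hydropathy scale [2]: positive values = hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def residue_values(sequence: str, scale: dict[str, float]) -> list[float]:
    """Map each one-letter amino acid code to its value on the given scale."""
    return [scale[aa] for aa in sequence.upper()]


def mean_value(sequence: str, scale: dict[str, float]) -> float:
    """Average scale value over the whole sequence (GRAVY for Kyte-Doolittle)."""
    values = residue_values(sequence, scale)
    if not values:
        raise ValueError("empty sequence")
    return sum(values) / len(values)


print(residue_values("MKV", KYTE_DOOLITTLE))        # [1.9, -3.9, 4.2]
print(round(mean_value("IK", KYTE_DOOLITTLE), 2))   # 0.3
```

Any other scale, such as Hopp-Woods, can be passed as a second dictionary with the same 20 keys.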

Features of the individual amino acids also play a key role in the formation of protein secondary structure. Consequently, early secondary structure prediction methods assigned to each amino acid a preference for the secondary structure types it tends to adopt. For example, glutamate is frequently found in alpha helices, valine has a preference for beta strands, and proline is known to be strongly avoided in helices. Modern secondary structure prediction methods are, however, considerably more involved.
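
One common way to express such a preference is as a ratio. The frequency of an amino acid among residues in a given state is divided by its overall frequency, so values above 1 mean over-representation. This is similar in spirit to the Chou-Fasman conformational parameters, though not necessarily identical in normalisation. The sketch below computes these ratios from a sequence paired with a per-residue state string. The lowercase state codes (h = helix, e = strand, c = coil), the function name and the toy data are hypothetical. Real estimates require a large set of proteins with known structures.

```python
from collections import Counter


def state_propensities(sequence: str, states: str) -> dict[tuple[str, str], float]:
    """Return P(aa, state) = f(aa | state) / f(aa) for every observed pair.

    f(aa | state): fraction of residues in `state` that are amino acid `aa`.
    f(aa):         fraction of all residues that are amino acid `aa`.
    Values > 1 indicate that `aa` is over-represented in `state`.
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have the same length")
    n = len(sequence)
    aa_counts = Counter(sequence)
    state_counts = Counter(states)
    pair_counts = Counter(zip(sequence, states))
    return {
        (aa, st): (count / state_counts[st]) / (aa_counts[aa] / n)
        for (aa, st), count in pair_counts.items()
    }


# Hypothetical toy data: h = helix, e = strand, c = coil.
props = state_propensities("EEAEVVIVPG", "hhhheeeecc")
print(props[("E", "h")], props[("V", "e")])
```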

In contrast to the physico-chemical parameters, the preferential occurrence of an amino acid in a particular secondary structure is a statistically derived quantity. Taking both types of parameters together, more than 200 amino acid parameters have been published (Nishikawa et al.). There is a database listing all of them with their respective values. Subgroups of these parameters, however, are correlated with each other, so the real information content of this large collection is considerably lower than it seems. Argos [3] analyzed these correlations and selected a non-redundant subset of parameters. A tree of relatedness among the different parameters has been computed by Nishikawa.
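
To illustrate what correlated parameters means in practice, the sketch below computes the Pearson correlation coefficient between the Kyte-Doolittle and Hopp-Woods scales over the 20 standard amino acids. Hopp and Woods defined a hydrophilicity scale (positive = hydrophilic), so the two are expected to be negatively correlated. A large absolute coefficient indicates that the two scales carry largely overlapping information. Pearson's coefficient is an assumption made for illustration, not necessarily the measure Argos used. The function name is hypothetical.

```python
import math

# Kyte-Doolittle hydropathy (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hopp-Woods hydrophilicity (positive = hydrophilic).
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def scale_correlation(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson correlation of two scales over their shared amino acids."""
    keys = sorted(a.keys() & b.keys())
    xs = [a[k] for k in keys]
    ys = [b[k] for k in keys]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print(round(scale_correlation(KYTE_DOOLITTLE, HOPP_WOODS), 2))
```

Flipping the sign of one scale would turn the coefficient positive without changing its magnitude. This is why redundancy between scales is judged by the absolute value.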

The functional features of proteins captured by such parameters are manifold. Hydrophobic amino acids tend to occur in the interior of globular proteins, while hydrophilic residues are preferentially found at the protein surface. In transmembrane proteins, the membrane-spanning regions of the chain tend to be strongly hydrophobic. Certain periodicities in the occurrence of hydrophobic and hydrophilic residues may indicate helices (helical wheel representation) or strands (alternation at every other residue) that are buried on one side and exposed on the other. Because of their relation to surface exposure, hydrophobicity scales and related scales are frequently used to predict antigenic epitopes.
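
The periodicity argument can be quantified with the hydrophobic moment introduced by Eisenberg and co-workers, a technique not described in this text. Each residue's hydrophobicity is treated as a vector pointing in direction i·δ. Here δ is about 100° per residue for an alpha helix (3.6 residues per turn) and 180° for a beta strand. The length of the vector sum then measures how one-sided (amphipathic) the segment is. The sketch below is unnormalized and uses the Kyte-Doolittle values as an assumption; Eisenberg originally used a consensus scale of his own. The function name is hypothetical.

```python
import math

# Kyte-Doolittle hydropathy (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(sequence: str, angle_deg: float,
                       scale: dict[str, float] = KYTE_DOOLITTLE) -> float:
    """Length of the vector sum of h_i * (cos(i*delta), sin(i*delta))."""
    delta = math.radians(angle_deg)
    x = sum(scale[aa] * math.cos(i * delta) for i, aa in enumerate(sequence))
    y = sum(scale[aa] * math.sin(i * delta) for i, aa in enumerate(sequence))
    return math.hypot(x, y)


# An alternating hydrophobic/hydrophilic pattern is maximally one-sided
# when viewed as a strand (180 degrees), much less so as a helix (100).
pattern = "VKVKVKVK"
print(hydrophobic_moment(pattern, 180.0))
print(hydrophobic_moment(pattern, 100.0))
```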

In practice, each parameter is plotted as a curve along the amino acid chain. Values are averaged within a sliding window to smooth the curve. The choice of window width is, of course, arbitrary; widths between 9 and 15 residues generally seem appropriate. Figure 1 shows a hydrophobicity plot for human rhodopsin (AC P08100 at ExPASy).
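
A sliding-window average of this kind can be sketched as follows. Each output value is the plain (unweighted) mean over `window` consecutive residues and is plotted at the window's central residue. A sequence of length n therefore yields n - window + 1 points. The unweighted average, the 1-based centre positions and the function name are assumptions for illustration. Tools such as ProtScale may offer further options. The Kyte-Doolittle scale and default window of 9 are used to match the settings of Figure 1.

```python
# Kyte-Doolittle hydropathy (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(sequence: str, window: int = 9,
                   scale: dict[str, float] = KYTE_DOOLITTLE) -> list[tuple[int, float]]:
    """Return (centre_position, mean_value) pairs; positions are 1-based.

    For even window sizes the "centre" is the residue just right of the middle.
    """
    if not 1 <= window <= len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [scale[aa] for aa in sequence.upper()]
    half = window // 2
    running = sum(values[:window])
    profile = [(half + 1, running / window)]
    for end in range(window, len(values)):
        # Slide by one residue: add the new value, drop the oldest one.
        running += values[end] - values[end - window]
        start = end - window + 1
        profile.append((start + half + 1, running / window))
    return profile


for pos, value in window_profile("IIIKKK", window=3):
    print(pos, round(value, 2))
```

Updating a running sum keeps the cost linear in the sequence length, whatever the window width.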

Figure 1: Hydrophobicity plot of human rhodopsin (AC P08100 at ExPASy), created with the ExPASy service ProtScale. The window size is 9 and the hydrophobicity scale is that of Kyte and Doolittle [2].

exercise 1

Standard sequence analysis software offers programs that plot various parameters for a given protein. Serious software packages tend to give the user a selection of informative and non-redundant parameters. Some other packages, by contrast, pretend to offer new insights simply by plotting large numbers of parameters.
