Online Lectures on Bioinformatics
Analysis of individual sequences
Introduction
Amino acid side chains differ in their physico-chemical features.
Some of them are hydrophilic, i.e. they tend to be exposed to water,
while the hydrophobic amino acids tend to avoid exposure to water.
Charge, size, and flexibility in the backbone are further
examples of amino acid parameters. These parameters are usually measured
on a numeric scale, so that for every parameter there exists a table
assigning a number to each amino acid. For hydrophobicity, for example,
two such scales have become particularly well known. The first one is
due to Hopp and Woods [1], the other one is due to Kyte and Doolittle [2].
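Such a table maps naturally onto a lookup structure. The following Python sketch stores the Kyte and Doolittle hydropathy values [2] as they are commonly tabulated (positive values are hydrophobic) and looks up the value of each residue in a sequence. The names KYTE_DOOLITTLE and residue_values and the short peptide are illustrative assumptions, and the values should be checked against [2] before serious use.

```python
# Kyte and Doolittle hydropathy values, keyed by one-letter amino acid code.
# Values as commonly tabulated; positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def residue_values(sequence, scale=KYTE_DOOLITTLE):
    """Return the scale value of every residue in `sequence`.

    Raises KeyError for letters that are not in the scale
    (e.g. X or other ambiguity codes).
    """
    return [scale[residue] for residue in sequence.upper()]


# Made-up five-residue peptide, used only to show the lookup.
print(residue_values("MKVLA"))
```

Any other amino acid parameter can be plugged in by passing a different dictionary as the scale argument.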
Features of the individual amino acids also play a key role in protein secondary
structure formation. Consequently, early secondary structure prediction methods
assigned preferences to the amino acids according to which secondary
structure they tend to assume. For example, glutamate is frequently found
in alpha helices, while valine has a preference for beta strands and proline
is known to be strongly avoided in helices. Modern secondary structure
prediction methods are more involved, though.
In contrast to the physico-chemical parameters, the preferential occurrence
of an amino acid in a particular secondary structure is a statistically derived quantity.
Taking both types of parameters together, more than 200 amino acid parameters
have been published (Nishikawa et al.). There exists a database listing
all of them with their respective values. Subgroups of those, however,
are correlated with each other, so that the real information content of
this large number of parameters is in fact lower than it seems. Argos [3]
has analyzed these correlations and selected a non-redundant set of parameters.
A tree of relatedness among the different parameters has been computed by Nishikawa.
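How strongly two parameters are correlated can be checked directly from their tables. The sketch below computes the Pearson correlation coefficient between the Kyte and Doolittle [2] and the Hopp and Woods [1] values as they are commonly tabulated. It only illustrates the idea and is not a reconstruction of the analysis by Argos [3] or of the tree computed by Nishikawa; the function name pearson and the variable names are assumptions, and the values should be verified against the original publications.

```python
from math import sqrt

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Values as commonly tabulated, in the order given by AMINO_ACIDS.
# Kyte and Doolittle: positive = hydrophobic.
KD = dict(zip(AMINO_ACIDS, [
    1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
    3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2,
]))
# Hopp and Woods: positive = hydrophilic (opposite sign convention).
HW = dict(zip(AMINO_ACIDS, [
    -0.5, 3.0, 0.2, 3.0, -1.0, 0.2, 3.0, 0.0, -0.5, -1.8,
    -1.8, 3.0, -1.3, -2.5, 0.0, 0.3, -0.4, -3.4, -2.3, -1.5,
]))


def pearson(scale_a, scale_b):
    """Pearson correlation of two scales over the residues they share."""
    residues = sorted(set(scale_a) & set(scale_b))
    xs = [scale_a[r] for r in residues]
    ys = [scale_b[r] for r in residues]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# The coefficient is negative because one scale measures hydrophobicity
# and the other hydrophilicity.
print(round(pearson(KD, HW), 2))
```

A coefficient close to +1 or -1 for a pair of parameters suggests that plotting both adds little information beyond plotting one of them.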
The functional features of proteins that are grasped by such parameters are manifold:
Hydrophobic amino acids tend to occur in the interior of globular proteins,
while at the surface of a protein one will preferentially find hydrophilic residues.
In transmembrane proteins, the regions of the chain that span the membrane tend to
be strongly hydrophobic.
Certain periodicities in the occurrence of hydrophobic and hydrophilic residues may
indicate helices (helical wheel representation) or strands (every other residue)
that are buried on one side and exposed on the other.
Because hydrophobicity is related to surface exposure, hydrophobicity scales and
related scales are frequently used for the prediction of antigenic epitopes.
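The periodicity mentioned above can be quantified. One common measure, not described in this lecture and named here only as an illustration, is the hydrophobic moment: each residue's hydrophobicity is treated as a vector, consecutive vectors are rotated by a fixed angle, and the length of their sum is taken. The sketch below assumes a rotation of 100 degrees per residue for an alpha helix (3.6 residues per turn) and 180 degrees for an idealized strand (every other residue). The function name hydrophobic_moment and the toy input are assumptions.

```python
from math import cos, hypot, radians, sin


def hydrophobic_moment(values, angle_deg):
    """Mean hydrophobic moment of a segment.

    `values` are per-residue hydrophobicities; `angle_deg` is the assumed
    rotation between consecutive residues (about 100 for an alpha helix,
    180 for an idealized strand).
    """
    if not values:
        raise ValueError("values must not be empty")
    x = sum(v * cos(radians(i * angle_deg)) for i, v in enumerate(values))
    y = sum(v * sin(radians(i * angle_deg)) for i, v in enumerate(values))
    return hypot(x, y) / len(values)


# Toy segment in which hydrophobic (+1) and hydrophilic (-1) residues
# strictly alternate, as expected for a strand buried on one side.
alternating = [1.0, -1.0] * 6
print(hydrophobic_moment(alternating, 180))  # large: matches the strand period
print(hydrophobic_moment(alternating, 100))  # small: does not match a helix
```

A segment scoring high at 100 degrees but low at 180 degrees would instead point to an amphipathic helix.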
In practice, the various parameters are used to plot a curve along the amino
acid chain. Values are averaged within a sliding window to smooth the curve.
The choice of the window width is to some extent arbitrary;
values between 9 and 15 residues generally seem appropriate.
Figure 1 depicts a hydrophobicity plot for human Rhodopsin (AC P08100 at ExPASy).
Figure 1: Hydrophobicity plot of human Rhodopsin (AC P08100 at ExPASy),
created with the ExPASy service ProtScale. The window size is 9 and the
hydrophobicity scale is that of Kyte and Doolittle [2].
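The sliding-window averaging behind such a plot can be sketched in a few lines of Python. The function below takes a list of per-residue values for any parameter and returns the smoothed curve; the name sliding_window_mean, the choice to skip incomplete windows at the chain ends, and the toy input are assumptions, and tools such as ProtScale may treat the ends or weight the window differently.

```python
def sliding_window_mean(values, window=9):
    """Average `values` over a centered window of odd width.

    Returns (position, mean) pairs, where position is the 1-based index
    of the window's central residue. Windows that would extend past
    either end of the chain are skipped, so the result has
    len(values) - window + 1 points.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


# Toy values and a narrow window, only to show the shape of the output.
print(sliding_window_mean([1, 2, 3, 4, 5], window=3))
```

A wider window gives a smoother curve but blurs short features, which is why the window width has to be matched to the size of the feature of interest.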
 exercise 1
Standard sequence analysis software offers programs that plot various parameters
for a given protein. Serious software packages tend to provide the user with
a selection of informative and non-redundant parameters.
Other packages purport to offer new insights by plotting large numbers of parameters.
Comments are very welcome.
luz@molgen.mpg.de