Conservation and Prediction of Solvent Accessibility in Protein Families

Rost, B. & Sander, C.

1994, Proteins, 20, 216-226


Currently, the prediction of three-dimensional (3D) protein structure from sequence alone is an exceedingly difficult task. As an intermediate step, a much simpler task has been pursued extensively: predicting 1D strings of secondary structure.

Here, we present an analysis of another 1D projection from 3D structure: the relative solvent accessibility of each residue. We show that solvent accessibility is less conserved in 3D homologues than secondary structure, and hence is predicted less accurately from automatic homology modelling; the correlation coefficient of relative solvent accessibility between 3D homologues is only 0.77, and the average accuracy of prediction based on sequence alignments, measured by the same correlation, is only 0.68. The latter number provides an effective practical upper limit for the accuracy of predicting accessibility from sequence where homology modelling is not possible.
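The conservation measure quoted above can be illustrated with a minimal sketch, assuming relative accessibility is the observed accessible surface of a residue expressed as a percentage of a per-residue-type maximum, and that conservation is scored as a Pearson correlation over aligned residue pairs. The function names and the clipping at 100% are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def relative_accessibility(absolute, max_accessible):
    """Relative accessibility in percent (hypothetical helper).

    `absolute` is the observed accessible surface of a residue and
    `max_accessible` the maximum for that residue type; both must use
    the same unit. Values above 100% are clipped (an assumption).
    """
    if max_accessible <= 0:
        raise ValueError("max_accessible must be positive")
    return min(100.0, 100.0 * absolute / max_accessible)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up relative accessibilities for residue pairs aligned between
# two hypothetical homologues (illustration only, not data from the paper).
homologue_a = [0.0, 12.0, 55.0, 80.0, 5.0, 30.0]
homologue_b = [3.0, 20.0, 40.0, 90.0, 0.0, 45.0]
conservation = pearson(homologue_a, homologue_b)
```

A value near 1 would mean that accessibility is almost fully conserved across the aligned positions; the 0.77 reported for 3D homologues indicates substantial variation even between structurally similar proteins.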

We introduce a neural network system that predicts relative solvent accessibility (projected onto 10 discrete states) using evolutionary profiles of amino acid substitutions derived from multiple sequence alignments. Evaluated in a cross-validation test on 238 unique proteins, the correlation between predicted and observed relative accessibility is 0.54. For a three-state (buried, intermediate, exposed) description of relative accessibility, the fraction of correctly predicted residue states is about 58%. In absolute terms this accuracy appears poor, but given the relatively low conservation of accessibility in 3D families, the network system is not far from its likely optimal performance.
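The projections and the three-state score described above can be sketched as follows. The abstract does not give the state boundaries, so the thresholds here are assumptions: 9% and 36% for the three-state description, and square-based bins (state n covering n^2 to (n+1)^2 percent) for the 10 states. All function names are hypothetical.

```python
from math import sqrt


def three_state(rel_acc, buried_below=9.0, exposed_from=36.0):
    """Map relative accessibility (percent) to 'b', 'i' or 'e'.

    The default thresholds are assumptions, not stated in the abstract.
    """
    if rel_acc < buried_below:
        return "b"  # buried
    if rel_acc < exposed_from:
        return "i"  # intermediate
    return "e"      # exposed


def ten_state(rel_acc):
    """Map relative accessibility (percent) to a state 0..9.

    Assumed binning: state n covers [n^2, (n+1)^2) percent, so the
    bins are narrow for buried residues and wide for exposed ones.
    """
    clipped = max(0.0, min(float(rel_acc), 100.0))
    return min(9, int(sqrt(clipped)))


def fraction_correct(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Made-up values for illustration only (not data from the paper).
observed_acc = [0.0, 5.0, 20.0, 40.0, 75.0]
predicted_acc = [2.0, 15.0, 25.0, 30.0, 90.0]
observed_states = [three_state(a) for a in observed_acc]
predicted_states = [three_state(a) for a in predicted_acc]
score = fraction_correct(predicted_states, observed_states)
```

With bins of this kind, a prediction that is off by a few percent can still fall into the wrong state near a boundary, which is one reason a per-state score and a correlation coefficient can give different impressions of the same predictor.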

The most reliable fraction of the residues (40%) is predicted as accurately as by automatic homology modelling. Prediction is best for buried residues: for example, 86% of the completely buried sites are correctly predicted as having 0% relative accessibility.

Key words: evolutionary information; multiple alignments; conservation of solvent accessibility; protein families; prediction of relative solvent accessibility; neural networks; protein structure prediction.