Accuracy of PHDhtm

PHDhtm helical trans-membrane region prediction

****************************************************************************
*                                                                          *
*                                                                          *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             *
*      Prediction of helical transmembrane segments by PHDhtm:             *
*      a Profile fed neural network system from HeiDelberg                 *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             *
*                                                                          *
*      Authors:            Burkhard Rost & Chris Sander                    *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Predict-Help@EMBL-Heidelberg.DE       *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*                                                                          *
*  About the network method                                                *
*  ~~~~~~~~~~~~~~~~~~~~~~~~                                                *
*                                                                          *
*  To be quoted for publications of PHDhtm output:                         *
*     B Rost, R Casadio, P Fariselli & C Sander: Prediction of helical     *
*        transmembrane segments at 95% accuracy. Prot. Science, 1995, 4,   *
*        521-533. (Abstract)                                               *
*                                                                          *
*  The PredictProtein mail server is described in:                         *
*     B Rost:  PHD: predicting one-dimensional  protein structure by pro-  *
*        file based neural networks. Meth. in Enzym., 1996, 266, 525-539.  *
*        (Text)                                                            *
*                                                                          *
*  The network for prediction of secondary structure is described in       *
*  detail in:                                                              *
*     B Rost & C Sander: Prediction of protein structure at better than    *
*        70% accuracy. J. Mol. Biol., 1993, 232, 584-599. (Abstract)       *
*     B Rost & C Sander:  Combining evolutionary information and neural    *
*        networks to predict protein secondary structure. Proteins, 1994,   *
*        19, 55-77. (Abstract)                                             *
*                                                                          *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*                                                                          *
*  Estimated Accuracy of Prediction                                        *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  A cross validation test on 69 helical trans-membrane  proteins (in total*
*  about 30,000 residues) with less than 25% pairwise sequence identity    *
*  gave the following results:                                             *
*                                                                          *
*  ++================++-----------------------------------------+          *
*  || Qtotal = 94.7% ||      ("overall two state accuracy")     |          *
*  ++================++-----------------------------------------+          *
*                                                                          *
*  +----------------------------+-----------------------------+            *
*  | Qhelix (% of observed)=92% | Qhelix (% of predicted)=83% |            *
*  | Qloop  (% of observed)=96% | Qloop  (% of predicted)=97% |            *
*  +----------------------------+-----------------------------+            *
*                                                                          *
*..........................................................................*
*                                                                          *
*  These percentages are defined by:                                       *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*                      number of correctly predicted residues              *
*  Qtotal =            ---------------------------------------      (*100) *
*                            number of all residues                        *
*                                                                          *
*                      no of res correctly predicted to be in helix        *
*  Qhelix (% of obs) = -------------------------------------------- (*100) *
*                      no of all res observed to be in helix               *
*                                                                          *
*                                                                          *
*                      no of res correctly predicted to be in helix        *
*  Qhelix (% of pred)= -------------------------------------------- (*100) *
*                      no of all residues predicted to be in helix         *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Further measures of performance                                         *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                         *
*                                                                          *
*  Matthews correlation coefficient:                                       *
*                                                                          *
*  +---------------------------------------------+                         *
*  | Chelix = 0.84, Cloop = 0.84                 |                         *
*  +---------------------------------------------+                         *
*..........................................................................*
*                                                                          *
*  Average length of predicted transmembrane helices:                      *
*                                                                          *
*              +------------+----------+                                   *
*              |  predicted | observed |                                   *
*  +-----------+------------+----------+                                   *
*  | Lhelix  = |    24.6    |   22.2   |                                   *
*  +-----------+------------+----------+                                   *
*..........................................................................*
*                                                                          *
*  The accuracy matrix in detail:                                          *
*                                                                          *
*  +---------------------------------+                                     *
*  |    number of residues with H, L |                                     *
*  +---------+------+-------+--------+                                     *
*  |         |net H | net L |sum obs |                                     *
*  +---------+------+-------+--------+                                     *
*  | obs H   | 5214 |   492 |   5706 |                                     *
*  | obs L   | 1050 | 22423 |  23473 |                                     *
*  +---------+------+-------+--------+                                     *
*  | sum Net | 6264 | 22915 |  29179 |                                     *
*  +---------+------+-------+--------+                                     *
*                                                                          *
*  Note: This table is to be read in the following manner:                 *
*        Of the 6264 residues predicted to be in a helical trans-membrane  *
*        region, 5214 were observed to be in the lipid bilayer; the other  *
*        1050 were observed on either side of the membrane, i.e. in loop   *
*        (or non-membrane) regions. The term "observed" refers to DSSP     *
*        assignment of secondary structure calculated from 3D coordinates  *
*        of experimentally determined structures (Dictionary of Secondary  *
*        Structure  of Proteins: Kabsch & Sander (1983) Biopolymers, 22,   *
*        2577-2637) where these were available.  For all other proteins,   *
*        the assignment of trans-membrane segments has been taken from the *
*        Swissprot data bank (Bairoch, A.; Boeckmann, B.: The SWISS-PROT   *
*        protein sequence data bank. Nucl. Acids Res. 20: 2019-2022, 1992).*
*                                                                          *
*..........................................................................*
*                                                                          *
*  Overlap between predicted and observed segments:                        *
*                                                                          *
*  +-----------------+---------------+----------------+                    *
*  | segment overlap | % of observed | % of predicted |                    *
*  |   Sov helix     |      95.6%    |      95.5%     |                    *
*  |   Sov loop      |      83.6%    |      97.2%     |                    *
*  +-----------------+---------------+----------------+                    *
*  |   Sov total     |      86.0%    |      96.8%     |                    *
*  +-----------------+---------------+----------------+                    *
*                                                                          *
*        Definition of Sov in: Rost et al., JMB, 1994, 235, 13-26.         *
*                                                                          *
*        As helical trans-membrane segments are longer than helices in     *
*        globular proteins, correctly predicted segments are easy to       *
*        identify.  PHDhtm misses 5 of the 258 observed segments, predicts *
*        6 where none is observed, and in 3 cases predicts a single        *
*        helical segment that overlaps two observed regions.  Thus, in     *
*        total more than 95% of all segments are correctly predicted.      *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Entropy of prediction (information measure):                            *
*                                                                          *
*  +-----------------+                                                     *
*  | I = 0.64        |                                                     *
*  +-----------------+                                                     *
*                                                                          *
*        (For comparison: homology modelling of globular proteins in three *
*        states: I=0.62.)                                                  *
*                                                                          *
*                                                                          *
****************************************************************************
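
The measures quoted above can be recomputed from the accuracy matrix.
The following Python sketch is an illustration only and is not part of
PHDhtm: the function name two_state_measures and the variable names are
hypothetical, the residue counts are copied from the matrix, and the
correlation is assumed to be the standard two-state Matthews formula.

```python
import math

# Residue counts copied from the accuracy matrix
# (rows: observed state, columns: state predicted by the network).
TP = 5214    # observed H, predicted H
FN = 492     # observed H, predicted L
FP = 1050    # observed L, predicted H
TN = 22423   # observed L, predicted L


def two_state_measures(tp, fn, fp, tn):
    """Return per-residue accuracy measures (in %) and the Matthews
    correlation coefficient for a two-state (H/L) confusion matrix."""
    total = tp + fn + fp + tn
    # Product of the four marginal sums; zero only for degenerate input.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Qtotal": 100.0 * (tp + tn) / total,
        "Qhelix_obs": 100.0 * tp / (tp + fn),   # % of observed helix
        "Qhelix_pred": 100.0 * tp / (tp + fp),  # % of predicted helix
        "Qloop_obs": 100.0 * tn / (tn + fp),    # % of observed loop
        "Qloop_pred": 100.0 * tn / (tn + fn),   # % of predicted loop
        "C": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


measures = two_state_measures(TP, FN, FP, TN)
for name, value in measures.items():
    print(f"{name:12s} {value:8.2f}")
```

With the counts from the matrix this gives Qtotal = 94.7% and C = 0.84.
In a two-state problem the Matthews coefficient is identical for helix
and loop, which is consistent with Chelix = Cloop above.  Per-state
values derived from the matrix may differ slightly from the rounded
figures quoted in the tables.

****************************************************************************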
*                                                                          *
*                                                                          *
*  Position-specific reliability index                                     *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                     *
*                                                                          *
*  The network predicts two states, helical trans-membrane region and      *
*  rest (loop), using two output units.  The prediction is assigned by     *
*  choosing the unit with the larger output ("winner takes all").         *
*  However, the real-valued outputs contain additional information:        *
*  the difference between the two output units can be used to derive a     *
*  "reliability index".  This index is given for each residue along with   *
*  the prediction and is scaled to range from 0 (lowest reliability) to    *
*  9 (highest).                                                            *
*  The table below gives the accuracy (Qtot) to be expected for residues   *
*  whose index is at or above a particular value, together with the        *
*  fraction of such residues (%res):                                       *
*                                                                          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |    *
*  | %res |100.0| 98.8| 97.3| 95.9| 94.1| 92.3| 89.9| 86.2| 75.0| 66.8|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | Qtot | 94.7| 95.2| 95.6| 96.2| 96.7| 97.2| 97.7| 98.4| 99.4| 99.8|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | H%obs| 91.8| 92.9| 93.8| 94.4| 95.0| 95.7| 96.2| 96.8| 95.5| 78.7|    *
*  | L%obs| 95.3| 95.7| 96.1| 96.6| 97.0| 97.5| 98.1| 98.8| 99.7|100.0|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | H%prd| 82.7| 83.8| 85.0| 86.7| 88.1| 89.7| 91.4| 93.8| 96.3| 97.1|    *
*  | L%prd| 97.9| 98.3| 98.5| 98.7| 98.8| 99.0| 99.2| 99.4| 99.7| 99.9|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*                                                                          *
*  The above table gives cumulative results: e.g. 92.3% of all residues    *
*  have a reliability index of at least 5.  The overall two-state          *
*  accuracy for this subset is 97.2%; within it, 95.7% of the observed     *
*  helical trans-membrane residues are correctly predicted, and 89.7%      *
*  of all residues predicted to be in a helical trans-membrane segment     *
*  are correct.                                                            *
*                                                                          *
*                                                                          *
*                                                                          *
****************************************************************************
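
The winner-takes-all rule and a reliability index derived from the
output difference can be sketched as follows.  This is an illustration
under stated assumptions, not the PHDhtm code: it assumes that both
output units lie between 0 and 1 and that the index is the integer part
of ten times their difference, capped at 9; the exact scaling is not
given in this text.  The function names are hypothetical.

```python
def predict_with_reliability(out_helix: float, out_loop: float) -> tuple[str, int]:
    """Winner-takes-all state ('H' or 'L') plus a 0-9 reliability index.

    ASSUMPTION: outputs are in [0, 1] and the index is int(10 * difference),
    capped at 9.  Ties are arbitrarily resolved in favour of 'H'.
    """
    state = "H" if out_helix >= out_loop else "L"
    difference = abs(out_helix - out_loop)
    index = min(9, int(10 * difference))
    return state, index


def cumulative_accuracy(predicted, observed, reliability, threshold):
    """Return (%res, Qtot) for residues whose index is >= threshold,
    mirroring one column of the cumulative reliability table."""
    kept = [(p, o) for p, o, r in zip(predicted, observed, reliability)
            if r >= threshold]
    if not kept:
        return 0.0, 0.0
    pct_res = 100.0 * len(kept) / len(predicted)
    q_tot = 100.0 * sum(p == o for p, o in kept) / len(kept)
    return pct_res, q_tot


# Toy example with made-up values (not PHDhtm output).
predicted = "HHLLL"
observed = "HLLLH"
reliability = [9, 2, 7, 5, 1]
print(predict_with_reliability(0.75, 0.25))
print(cumulative_accuracy(predicted, observed, reliability, 5))
```

Raising the threshold keeps fewer residues but, as in the table above,
the retained residues are expected to be predicted more accurately.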