Example for predicting 1D structure from an SAF-formatted alignment (PHD)

(input SAF format, i.e., your alignment)


The output consists of the following parts:



The following information has been received by the server:          
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~          

________________________________________________________________________________

b.rost
EMBL, 69012 Heidelberg, Europe
rost@embl-heidelberg.de
return concise
# CYTOCHROME C OXIDASE POLYPEPTIDE I (cox1_parde)
MSAQISDSIEEKRGFFTRWFMSTNHKDIGVLYLFTAGLAGLISVTLTVYMRMELQHPGVQ
YMCLEGMRLVADAAAECTPNAHLWNVVVTYHGILMMFFVVIPALFGGFGNYFMPLHIGAP
DMAFPRLNNLSYWLYVCGVSLAIASLLSPGGSDQPGAGVGWVLYPPLSTTEAGYAMDLAI
FAVHVSGATSILGAINIITTFLNMRAPGMTLFKVPLFAWAVFITAWMILLSLPVLAGGIT
MLLMDRNFGTQFFDPAGGGDPVLYQHILWFFGHPEVYMLILPGFGIISHVISTFARKPIF
GYLPMVLAMAAIAFLGFIVWAHHMYTAGMSLTQQTYFQMATMTIAVPTGIKVFSWIATMW
GGSIEFKTPMLWALAFLFTVGGVTGVVIAQGSLDRVYHDTYYIVAHFHYVMSLGALFAIF
AGTYYWIGKMSGRQYPEWAGQLHFWMMFIGSNLIFFPQHFLGRQGMPRRYIDYPVEFSYW
NNISSIGAYISFASFLFFIGIVFYTLFAGKPVNVPNYWNEHADTLEWTLPSPPPEHTFET
LPKPEDWDRAQAHR
________________________________________________________________________________




The sequence has been interpreted as being:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

>P1; t2
(#)  cytochrome c oxidase polypeptide i (cox1_parde)
MSAQISDSIEEKRGFFTRWFMSTNHKDIGVLYLFTAGLAGLISVTLTVYMRMELQHPGVQ
YMCLEGMRLVADAAAECTPNAHLWNVVVTYHGILMMFFVVIPALFGGFGNYFMPLHIGAP
DMAFPRLNNLSYWLYVCGVSLAIASLLSPGGSDQPGAGVGWVLYPPLSTTEAGYAMDLAI
FAVHVSGATSILGAINIITTFLNMRAPGMTLFKVPLFAWAVFITAWMILLSLPVLAGGIT
MLLMDRNFGTQFFDPAGGGDPVLYQHILWFFGHPEVYMLILPGFGIISHVISTFARKPIF
GYLPMVLAMAAIAFLGFIVWAHHMYTAGMSLTQQTYFQMATMTIAVPTGIKVFSWIATMW
GGSIEFKTPMLWALAFLFTVGGVTGVVIAQGSLDRVYHDTYYIVAHFHYVMSLGALFAIF
AGTYYWIGKMSGRQYPEWAGQLHFWMMFIGSNLIFFPQHFLGRQGMPRRYIDYPVEFSYW
NNISSIGAYISFASFLFFIGIVFYTLFAGKPVNVPNYWNEHADTLEWTLPSPPPEHTFET
LPKPEDWDRAQAHR


The alignment that has been used as input to the network is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--- ------------------------------------------------------------
--- multiple sequence alignment
--- ------------------------------------------------------------
--- MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY 
--- ID           : identifier of aligned (homologous) protein
--- STRID        : PDB identifier (only for known structures)
--- PIDE         : percentage of pairwise sequence identity
--- WSIM         : percentage of weighted similarity
--- LALI         : number of residues aligned
--- NGAP         : number of insertions and deletions (indels)
--- LGAP         : number of residues in all indels
--- LSEQ2        : length of aligned sequence
--- ACCNUM       : SwissProt accession number
--- NAME         : one-line description of aligned protein
--- 
--- ALIGNMENT HEADER: SUMMARY
ID         STRID  IDE WSIM LALI NGAP LGAP LEN2 ACCNUM NAME                     
ANDR_MOUSE        100    0  176    2    8  168                          
ANDR_RAT          100    0  175    2    8  167                          
ESTR_CHICK         22    0  176    2    2  174                          
ESTR_HUMAN         22    0  175    2    2  173                          
ESTR_MOUSE         22    0  175    2    2  173                          
ESTR_RAT           22    0  175    2    2  173                          
ESTR_SALIR         27    0  175    2    2  173                          
GCR_MOUSE          53    0  175    2    8  167                          
GCR_RAT            51    0  175    2    8  167                          
GCRA_HUMAN         53    0  175    2    8  167                          
MCR_HUMAN          53    0  175    2    8  167                          
MCR_RAT            53    0  175    2    8  167                          
PRGR_CHICK         57    0  175    2    8  167                          
PRGR_HUMAN         55    0  174    2    8  166                          
PRGR_MOUSE         55    0  175    2    8  167                          
PRGR_RABIT         55    0  175    2    8  167                          
________________________________________________________________________________

--- 
--- ALIGNMENT: IN MSF FORMAT
MSF of: /home/phd/tmp/t-msf_14623.hssp from:    1 to:  176
 /home/phd/tmp/t-msf_14623.ret_msf  MSF:  176  Type: P 15-Nov-95  05:35:1  Check: 5859  ..
 
 
 Name: ANDR_HUMAN   Len:   176  Check:  750  Weight:  1.00
 Name: ANDR_MOUSE   Len:   176  Check:  786  Weight:  1.00
 Name: ANDR_RAT     Len:   176  Check:  750  Weight:  1.00
 Name: ESTR_CHICK   Len:   176  Check: 5710  Weight:  1.00
 Name: ESTR_HUMAN   Len:   176  Check: 6582  Weight:  1.00
 Name: ESTR_MOUSE   Len:   176  Check: 5496  Weight:  1.00
 Name: ESTR_RAT     Len:   176  Check: 5376  Weight:  1.00
 Name: ESTR_SALIR   Len:   176  Check: 3276  Weight:  1.00
 Name: GCR_MOUSE    Len:   176  Check:  658  Weight:  1.00
 Name: GCR_RAT      Len:   176  Check: 1906  Weight:  1.00
 Name: GCRA_HUMAN   Len:   176  Check: 1740  Weight:  1.00
 Name: MCR_HUMAN    Len:   176  Check: 3890  Weight:  1.00
 Name: MCR_RAT      Len:   176  Check: 4570  Weight:  1.00
 Name: PRGR_CHICK   Len:   176  Check: 2725  Weight:  1.00
 Name: PRGR_HUMAN   Len:   176  Check: 3885  Weight:  1.00
 Name: PRGR_MOUSE   Len:   176  Check: 3779  Weight:  1.00
 Name: PRGR_RABIT   Len:   176  Check: 3980  Weight:  1.00
 
//
 
 
           1                                                   50  
ANDR_HUMAN .QLVHVVKWA KALPGFRNLH VDDQMAVIQY SWMGLMVFAM GWRSFTNVNS
ANDR_MOUSE RQLVHVVKWA KALPGFRNLH VDDQMAVIQY SWMGLMVFAM GWRSFTNVNS
ANDR_RAT   .QLVHVVKWA KALPGFRNLH VDDQMAVIQY SWMGLMVFAM GWRSFTNVNS
ESTR_CHICK RELVHMINWA KRVPGFVDLT LHDQVHLLEC AWLEILMIGL VWRSMEH.PG
ESTR_HUMAN .ELVHMINWA KRVPGFVDLT LHDQVHLLEC AWLEILMIGL VWRSMEH.PV
ESTR_MOUSE .ELVHMINWA KRVPGFGDLN LHDQVHLLEC AWLEILMIGL VWRSMEH.PG
ESTR_RAT   .ELVHMINWA KRVPGFGDLN LHDQVHLLEC AWLEILMIGL VWRSMEH.PG
ESTR_SALIR .ELVHMIAWA KKVPGFQELS LHDQVQLLES SWLEVLMIGL IWRSIHC.PG
GCR_MOUSE  .QVIAAVKWA KAIPGFRNLH LDDQMTLLQY SWMFLMAFAL GWRSYRQASG
GCR_RAT    .QVIAAVKWA KAILGLRNLH LDDQMTLLQY SWMFLMAFAL GWRSYRQSSG
GCRA_HUMAN .QVIAAVKWA KAIPGFRNLH LDDQMTLLQY SWMFLMAFAL GWRSYRQSSA
MCR_HUMAN  .QMIQVVKWA KVLPGFKNLP LEDQITLIQY SWMCLSSFAL SWRSYKHTNS
MCR_RAT    .QMIQVVKWA KVLPGFKNLP LEDQITLIQY SWMCLSSFAL SWRSYKHTNS
PRGR_CHICK .QLLCVVKWS KLLPGFRNLH IDDQITLIQY SWMSLMVFAM GWRSYKHVSG
PRGR_HUMAN .QLLSVVKWS KSLPGFRNLH IDDQITLIQY SWMSLMVFGL GWRSYKHVSG
PRGR_MOUSE .QLLSVVKWS KSLPGFRNLH IDDQITLIQY SWMSLMVFGL GWRSYKHVSG
PRGR_RABIT .QLLSVVKWS KSLPGFRNLH IDDQITLIQY SWMSLMVFGL GWRSYKHVSG
 
           51                                                 100  
ANDR_HUMAN RMLYFAPDLV FNEYRMH.KS RMYSQCVRMR HLSQEFGWLQ ITPQEFLCMK
ANDR_MOUSE RMLYFAPDLV FNEYRMH.KS RMYSQCVRMR HLSQEFGWLQ ITPQEFLCMK
ANDR_RAT   RMLYFAPDLV FNEYRMH.KS RMYSQCVRMR HLSQEFGWLQ ITPQEFLCMK
ESTR_CHICK KLL.FAPNLL LDRNQGKCVE GMVEIFDMLL ATAARFRMMN LQGEEFVCLK
ESTR_HUMAN KLL.FAPNLL LDRNQGKCVE GMVEIFDMLL ATSSRFRMMN LQGEEFVCLK
ESTR_MOUSE KLL.FAPNLL LDRNQGKCVE GMVEIFDMLL ATSSRFRMMN LQGEEFVCLK
ESTR_RAT   KLL.FAPNLL LDRNQGKCVE GMVEIFDMLL ATSSRFRMMN LQGEEFVCLK
ESTR_SALIR KLI.FAQDLI LDRSEGDCVE GMAEIFDMLL ATVSRFGMLK LKPEEFVCLK
GCR_MOUSE  NLLCFAPDLI INEQRMT.LP CMYDQCKHML FISTELQRLQ VSYEEYLCMK
GCR_RAT    NLLCFAPDLI INEQRMS.LP CMYDQCKHML FVSSELQRLQ VSYEEYLCMK
GCRA_HUMAN NLLCFAPDLI INEQRMT.LP CMYDQCKHML YVSSELHRLQ VSYEEYLCMK
MCR_HUMAN  QFLYFAPDLV FNEEKMH.QS AMYELCQGMH QISLQFVRLQ LTFEEYTIMK
MCR_RAT    QLLYFAPDLV FNEEKMH.QS AMYELCQGMR QISLQFVRLQ LTFEEYSIMK
PRGR_CHICK QMLYFAPDLI LNEQRMK.ES SFYSLCLSMW QLPQEFVRLQ VSQEEFLCMK
PRGR_HUMAN QMLYFAPDLI LNEQRMK.ES SFYSLCLTMW QIPQEFVKLQ VSQEEFLCMK
PRGR_MOUSE QMLYFAPDLI LNEQRMK.EL SFYSLCLTMW QIPQEFVKLQ VTHEEFLCMK
PRGR_RABIT QMLYFAPDLI LNEQRMK.ES SFYSLCLTMW QIPQEFVKLQ VSQEEFLCMK
 
           101                                                150  
ANDR_HUMAN ALLLFSI... ....IPVDGL KNQKFFDELR MNYIKELDRI IACKRKNPTS
ANDR_MOUSE ALLLFSI... ....IPVDGL KNQKFFDELR MNYIKELDRI IACKRKNPTS
ANDR_RAT   ALLLFSI... ....IPVDGL KNQKFFDELR MNYIKELDRI IACKRKNPTS
ESTR_CHICK SIILLNSGVY TFLSSTLKSL EERDYIHRVL DKITDTLIHL MAKSGLSLQQ
ESTR_HUMAN SIILLNSGVY TFLSSTLKSL EEKDHIHRVL DKITDTLIHL MAKAGLTLQQ
ESTR_MOUSE SIILLNSGVY TFLSSTLKSL EEKDHIHRVL DKITDTLIHL MAKAGLTLQQ
ESTR_RAT   SIILLNSGVY TFLSSTLKSL EEKDHIHRVL DKINDTLIHL MAKAGLTLQQ
ESTR_SALIR AIILLNPGAF SFCSNSVESL HNSSAVESML DNITDALIHH ISHSGASVQQ
GCR_MOUSE  TLLLLSS... ....VPKEGL KSQELFDEIR MTYIKELGKA IVKREGNSSQ
GCR_RAT    TLLLLSS... ....VPKEGL KSQELFDEIR MTYIKELGKA IVKREGNSSQ
GCRA_HUMAN TLLLLSS... ....VPKDGL KSQELFDEIR MTYIKELGKA IVKREGNSSQ
MCR_HUMAN  VLLLLST... ....IPKDGL KSQAAFEEMR TNYIKELRKM VTKCPNNSGQ
MCR_RAT    VLLLLST... ....VPKDGL KSQAAFEEMR TNYIKELRKM VTKCPNSSGQ
PRGR_CHICK ALLLLNT... ....IPLEGL RSQSQFDEMR TSYIRELVKA IGLRQKGVVA
PRGR_HUMAN VLLLLNT... ....IPLEGL RSQTQFEEMR SSYIRELIKA IGLRQKGVVS
PRGR_MOUSE VLLLLNT... ....IPLEGL RSQSQFEEMR SSYIRELIKA IGLRQKGVVP
PRGR_RABIT VLLLLNT... ....IPLEGL RSQSQFEEMR SSYIRELIKA IGLRQKGVVS
 
           151                      176    
ANDR_HUMAN CSRRFYQLTK LLDSVQPIAR ELHQFT
ANDR_MOUSE CSRRFYQLTK LLDSVQPIAR ELHQFT
ANDR_RAT   CSRRFYQLTK LLDSVQPIAR ELHQFT
ESTR_CHICK QHRRLAQLLL ILSHIRHMSN KGMEHL
ESTR_HUMAN QHQRLAQLLL ILSHIRHMSN KGMEHL
ESTR_MOUSE QHRRLAQLLL ILSHIRHMSN KGMEHL
ESTR_RAT   QHRRLAQLLL ILSHIRHMSN KGMEHL
ESTR_SALIR QPRRQAQLLL LLSHIRHMSN KGMEHL
GCR_MOUSE  NWQRFYQLTK LLDSMHDVVE NLLSYC
GCR_RAT    NWQRFYQLTK LLDSMHEVVE NLLTYC
GCRA_HUMAN NWQRFYQLTK LLDSMHEVVE NLLNYC
MCR_HUMAN  SWQRFYQLTK LLDSMHDLVS DLLEFC
MCR_RAT    SWQRFYQLTK LLDSMHDLVS DLLEFC
PRGR_CHICK NSQRFYQLTK LMDSMHDLVK QLHLFC
PRGR_HUMAN SSQRFYQLTK LLDNLHDLVK QLHLY.
PRGR_MOUSE TSQRFYQLTK LLDSLHDLVK QLHLYC
PRGR_RABIT SSQRFYQLTK LLDNLHDLVK QLHLYC
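
The interleaved MSF block above can be turned back into one string per protein.
The following is only a minimal sketch for the layout shown here (a "//" line,
then blocks of "name chunk chunk ..." rows with numeric ruler lines in between);
it is not a complete reader for the MSF format. The function name
read_msf_alignment and the SAMPLE excerpt (two rows, re-wrapped into shorter
blocks for brevity) are illustrative assumptions, not part of the server output.

```python
def read_msf_alignment(text):
    """Collect aligned sequences from the interleaved blocks after '//'."""
    sequences = {}
    in_body = False
    for line in text.splitlines():
        if line.strip() == "//":
            in_body = True          # everything before '//' is header
            continue
        if not in_body:
            continue
        fields = line.split()
        # Skip blank lines and ruler lines such as "1 ... 50".
        if len(fields) < 2 or fields[0].isdigit():
            continue
        name, chunks = fields[0], fields[1:]
        sequences[name] = sequences.get(name, "") + "".join(chunks)
    return sequences


# Shortened excerpt of the alignment above, re-wrapped for brevity.
SAMPLE = """
 Name: ANDR_HUMAN   Len:   176  Check:  750  Weight:  1.00
 Name: ANDR_MOUSE   Len:   176  Check:  786  Weight:  1.00

//

           1                  20
ANDR_HUMAN .QLVHVVKWA KALPGFRNLH
ANDR_MOUSE RQLVHVVKWA KALPGFRNLH

           21      30
ANDR_HUMAN VDDQMAVIQY
ANDR_MOUSE VDDQMAVIQY
"""

alignment = read_msf_alignment(SAMPLE)
for name, seq in alignment.items():
    print(name, len(seq), seq)
```

Only the MSF part of the mail should be passed in; any text following the last
alignment block would otherwise be read as further rows.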

****************************************************************************
*                                                                          *
*                                                                          *
*      PredictProtein@EMBL-Heidelberg.DE                                   *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                   *
*                                                                          *
*      Prediction of:                                                      *
*                                                                          *
*      - secondary structure,                by PHDsec                     *
*      - solvent accessibility,              by PHDacc                     *
*      - and helical transmembrane regions,  by PHDhtm                     *
*                                                                          *
*      PHD: Profile fed neural network systems from HeiDelberg             *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             *
*                                                                          *
*      Author:             Burkhard Rost                                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Predict-Help@EMBL-Heidelberg.DE       *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*                                                                          *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                           *
*      Secondary structure prediction by PHDsec:                           *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                           *
*                                                                          *
*      Author:             Burkhard Rost                                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Rost@EMBL-Heidelberg.DE               *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the network method                                                *
*  ~~~~~~~~~~~~~~~~~~~~~~~~                                                *
*                                                                          *
*  The network procedure is described in detail in:                        *
*  1) Rost, Burkhard; Sander, Chris:                                       *
*     Prediction of protein structure at better than 70% accuracy.         *
*     J. Mol. Biol., 1993, 232, 584-599.                                   *
*                                                                          *
*  A brief description is given in:                                        *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Improved prediction of protein secondary structure by use of se-     *
*     quence profiles and neural networks.                                 *
*     Proc. Natl. Acad. Sci. U.S.A., 1993, 90, 7558-7562.                  *
*                                                                          *
*  The PHD mail server is described in:                                    *
*  2) Rost, Burkhard; Sander, Chris; Schneider, Reinhard:                  *
*     PHD - an automatic mail server for protein secondary structure       *
*     prediction.                                                          *
*     CABIOS, 1994, 10, 53-60.                                             *
*                                                                          *
*  The latest improvement steps (up to 72%) are explained in:              *
*  3) Rost, Burkhard; Sander, Chris:                                       *
*     Combining evolutionary information and neural networks to predict    *
*     protein secondary structure.                                         *
*     Proteins, 1994,  19, 55-72.                                          *
*                                                                          *
*  To be quoted for publications of PHD output:                            *
*     Papers 1-3 for the prediction of secondary structure and the pre-    *
*     diction server.                                                      *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the input to the network                                          *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                         *
*                                                                          *
*  The prediction is performed by a system of neural networks.             *
*  The input is a multiple sequence alignment. It is taken from an HSSP    *
*  file (produced by the program MaxHom:                                   *
*     Sander, Chris & Schneider, Reinhard: Database of Homology-Derived    *
*     Structures and the Structural Meaning of Sequence Alignment.         *
*     Proteins, 1991, 9, 56-68).                                           *
*                                                                          *
*  For optimal results the alignment should contain sequences with varying *
*  degrees of sequence similarity relative to the input protein.           *
*  The following is an ideal situation:                                    *
*                                                                          *
*  +-----------------+----------------------+                              *
*  |   sequence:     |  sequence identity   |                              *
*  +-----------------+----------------------+                              *
*  | target sequence |  100 %               |                              *
*  | aligned seq. 1  |   90 %               |                              *
*  | aligned seq. 2  |   80 %               |                              *
*  |      ...        |   ...                |                              *
*  | aligned seq. 7  |   30 %               |                              *
*  +-----------------+----------------------+                              *
*                                                                          *
****************************************************************************
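
The sequence identities in the table above can be estimated for any pair of
rows of an alignment. The sketch below counts identical residues over the
columns in which neither row has a gap. This normalisation is an assumption
made for illustration; the PIDE value reported by MaxHom may be normalised
differently. The function name percent_identity is hypothetical.

```python
def percent_identity(target, aligned, gap_chars=".-"):
    """Percentage of identical residues over columns without a gap."""
    if len(target) != len(aligned):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(target, aligned):
        if a in gap_chars or b in gap_chars:
            continue                # ignore columns with an insertion/deletion
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# First ten alignment columns of ANDR_HUMAN and ESTR_HUMAN from the MSF block.
print(percent_identity(".QLVHVVKWA", ".ELVHMINWA"))
```

A set of aligned sequences whose values are spread between roughly 30% and 90%
corresponds to the ideal situation described in the table.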
*                                                                          *
*  Estimated Accuracy of Prediction                                        *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  A careful cross validation test on some 250 protein chains (in total    *
*  about 55,000 residues) with less than 25% pairwise sequence identity    *
*  gave the following results:                                             *
*                                                                          *
*  ++================++-----------------------------------------+          *
*  || Qtotal = 72.1% ||      ("overall three state accuracy")   |          *
*  ++================++-----------------------------------------+          *
*                                                                          *
*  +----------------------------+-----------------------------+            *
*  | Qhelix (% of observed)=70% | Qhelix (% of predicted)=77% |            *
*  | Qstrand(% of observed)=62% | Qstrand(% of predicted)=64% |            *
*  | Qloop  (% of observed)=79% | Qloop  (% of predicted)=72% |            *
*  +----------------------------+-----------------------------+            *
*..........................................................................*
*                                                                          *
*  These percentages are defined by:                                       *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  |                    number of correctly predicted residues             *
*  |Qtotal =            ---------------------------------------      (*100)*
*  |                          number of all residues                       *
*  |                                                                       *
*  |                    no of res correctly predicted to be in helix       *
*  |Qhelix (% of obs) = -------------------------------------------- (*100)*
*  |                    no of all res observed to be in helix              *
*  |                                                                       *
*  |                                                                       *
*  |                    no of res correctly predicted to be in helix       *
*  |Qhelix (% of pred)= -------------------------------------------- (*100)*
*  |                    no of all residues predicted to be in helix        *
*                                                                          *
*..........................................................................*
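
The three definitions above can be written out for a single chain as follows.
This is a minimal sketch: the function name three_state_scores and the two toy
strings are made up for illustration (H = helix, E = strand, L = loop) and are
not taken from the server output.

```python
def three_state_scores(observed, predicted, states="HEL"):
    """Qtotal and, per state, % of observed and % of predicted (all in %)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    pairs = list(zip(observed, predicted))
    scores = {"Qtotal": 100.0 * sum(o == p for o, p in pairs) / len(pairs)}
    for s in states:
        n_obs = observed.count(s)       # residues observed in state s
        n_pred = predicted.count(s)     # residues predicted in state s
        n_both = sum(o == p == s for o, p in pairs)
        scores[f"Q{s}_obs"] = 100.0 * n_both / n_obs if n_obs else float("nan")
        scores[f"Q{s}_pred"] = 100.0 * n_both / n_pred if n_pred else float("nan")
    return scores


# Toy example, not real data.
observed = "LHHHHLLEEEL"
predicted = "LHHHLLLEELL"
for key, value in three_state_scores(observed, predicted).items():
    print(f"{key:8s} {value:6.1f}")
```

In the toy example three of the four observed helix residues are predicted as
helix (75% of observed), and all three residues predicted as helix are correct
(100% of predicted).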
*                                                                          *
*  Averaging over single chains                                            *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                            *
*                                                                          *
*  The most reasonable way to compute the overall accuracies is the above  *
*  quoted percentage of correctly predicted residues.  However, since the  *
*  user is mainly interested in the expected performance of the prediction *
*  for a particular protein, the mean value when averaging over protein    *
*  chains might be of help as well.  Computing first the three state       *
*  accuracy for each protein chain, and then averaging over 250 chains     *
*  yields the following average:                                           *
*                                                                          *
*  +-------------------------------====--+                                 *
*  | Qtotal/averaged over chains = 72.2% |                                 *
*  +-------------------------------====--+                                 *
*  | standard deviation          =  9.3% |                                 *
*  +-------------------------------------+                                 *
*                                                                          *
*..........................................................................*
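
The difference between the two ways of averaging can be seen in a small
sketch. The per-chain counts below are invented for illustration, and the use
of the population standard deviation (statistics.pstdev) is an assumption; the
text above does not say which estimator was used.

```python
from statistics import mean, pstdev


def pooled_and_per_chain(chains):
    """chains: list of (correctly predicted residues, residues) per chain."""
    total_correct = sum(correct for correct, _ in chains)
    total_residues = sum(length for _, length in chains)
    pooled = 100.0 * total_correct / total_residues      # per-residue accuracy
    per_chain = [100.0 * correct / length for correct, length in chains]
    return pooled, mean(per_chain), pstdev(per_chain)


# Hypothetical chains: (correct, length).
chains = [(90, 100), (30, 50), (320, 400)]
pooled, chain_mean, chain_sd = pooled_and_per_chain(chains)
print(f"pooled over residues : {pooled:.1f}%")
print(f"averaged over chains : {chain_mean:.1f}% +/- {chain_sd:.1f}%")
```

Long chains dominate the pooled value, whereas every chain counts equally in
the per-chain average, which is why the two numbers can differ.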
*                                                                          *
*  Further measures of performance                                         *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                         *
*                                                                          *
*  Matthews correlation coefficient:                                       *
*                                                                          *
*  +---------------------------------------------+                         *
*  | Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52 |                         *
*  +---------------------------------------------+                         *
*..........................................................................*
*                                                                          *
*  Average length of predicted secondary structure segments:               *
*                                                                          *
*  .           +------------+----------+                                   *
*  .           |  predicted | observed |                                   *
*  +-----------+------------+----------+                                   *
*  | Lhelix  = |    10.3    |    9.3   |                                   *
*  | Lstrand = |     5.0    |    5.3   |                                   *
*  | Lloop   = |     7.2    |    5.9   |                                   *
*  +-----------+------------+----------+                                   *
*..........................................................................*
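
Average segment lengths such as those in the table above can be computed from
a string of per-residue states by measuring runs of identical symbols. The
sketch below is an illustration only; the function name and the toy string
are assumptions.

```python
from itertools import groupby


def average_segment_lengths(structure, states="HEL"):
    """Mean length of uninterrupted runs for each secondary structure state."""
    lengths = {s: [] for s in states}
    for state, run in groupby(structure):
        if state in lengths:
            lengths[state].append(len(list(run)))
    return {s: (sum(v) / len(v) if v else 0.0) for s, v in lengths.items()}


# Toy string, not real data: runs are LL, HHHHHH, LL, EEEE, LLL, HHHH, L.
print(average_segment_lengths("LLHHHHHHLLEEEELLLHHHHL"))
```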
*                                                                          *
*  The accuracy matrix in detail:                                          *
*                                                                          *
*  +---------------------------------------+                               *
*  |    number of residues with H, E, L    |                               *
*  +---------+------+------+------+--------+                               *
*  |         |net H |net E |net L |sum obs |                               *
*  +---------+------+------+------+--------+                               *
*  | obs H   |12447 | 1255 | 3990 |  17692 |                               *
*  | obs E   |  949 | 7493 | 3750 |  12192 |                               *
*  | obs L   | 2604 | 2875 |19962 |  25441 |                               *
*  +---------+------+------+------+--------+                               *
*  | sum Net |16000 |11623 |27702 |  55325 |                               *
*  +---------+------+------+------+--------+                               *
*                                                                          *
*  Note: This table is to be read in the following manner:                 *
*        Of the 16000 residues predicted to be in helix, 12447 were        *
*        observed to be in helix, 949 belong to observed strands, and      *
*        2604 to observed loop regions.  The term "observed" refers to     *
*        the DSSP assignment of secondary structure calculated from 3D     *
*        coordinates of experimentally determined structures (Dictionary   *
*        of Secondary Structure of Proteins: Kabsch & Sander (1983)        *
*        Biopolymers, 22, 2577-2637).                                      *
*                                                                          *
****************************************************************************
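
The summary figures quoted earlier can be recomputed from the accuracy matrix
above. The sketch below uses the counts exactly as printed in the matrix. The
Matthews coefficient is computed with the standard two-class formula, treating
each state against the other two, which is assumed to be the definition used
here. Variable and function names are illustrative.

```python
from math import sqrt

STATES = ("H", "E", "L")

# COUNTS[observed][predicted], copied from the accuracy matrix above.
COUNTS = {
    "H": {"H": 12447, "E": 1255, "L": 3990},
    "E": {"H": 949, "E": 7493, "L": 3750},
    "L": {"H": 2604, "E": 2875, "L": 19962},
}


def summarize(counts):
    """Return Qtotal (%) and per-state %obs, %pred and Matthews coefficient."""
    total = sum(sum(row.values()) for row in counts.values())
    q_total = 100.0 * sum(counts[s][s] for s in STATES) / total
    per_state = {}
    for s in STATES:
        tp = counts[s][s]                           # observed s, predicted s
        n_obs = sum(counts[s].values())             # row sum
        n_pred = sum(counts[o][s] for o in STATES)  # column sum
        fn = n_obs - tp
        fp = n_pred - tp
        tn = total - tp - fn - fp
        mcc = (tp * tn - fp * fn) / sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
        )
        per_state[s] = {
            "pct_of_observed": 100.0 * tp / n_obs,
            "pct_of_predicted": 100.0 * tp / n_pred,
            "matthews": mcc,
        }
    return q_total, per_state


q_total, per_state = summarize(COUNTS)
print(f"Qtotal = {q_total:.1f}%")
for s in STATES:
    r = per_state[s]
    print(f"{s}: %obs = {r['pct_of_observed']:.1f}  "
          f"%pred = {r['pct_of_predicted']:.1f}  C = {r['matthews']:.2f}")
```

With these counts the sketch gives Qtotal = 72.1% and Matthews coefficients of
0.63, 0.53 and 0.52 for helix, strand and loop, in agreement with the values
quoted above.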
*                                                                          *
*  Position-specific reliability index                                     *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                     *
*                                                                          *
*  The network predicts the three secondary structure types using real     *
*  numbers from the output units. The prediction is assigned by choosing   *
*  the maximal unit ("winner takes all").  However, the real numbers       *
*  contain additional information.                                         *
*  E.g. the difference between the maximal and the second largest output   *
*  unit can be used to derive a "reliability index".  This index is given  *
*  for each residue along with the prediction.  The index is scaled to     *
*  have values between 0 (lowest reliability), and 9 (highest).            *
*  The accuracies (Qtot) to be expected for residues with values above a   *
*  particular value of the index are given below as well as the fraction   *
*  of such residues (%res):                                                *
*                                                                          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |    *
*  | %res |100.0| 99.2| 90.4| 80.9| 71.6| 62.5| 52.8| 42.3| 29.8| 14.1|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | Qtot | 72.1| 72.3| 74.8| 77.7| 80.3| 82.9| 85.7| 88.5| 91.1| 94.2|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | H%obs| 70.4| 70.6| 73.7| 77.1| 80.1| 83.1| 86.0| 89.3| 92.5| 96.4|    *
*  | E%obs| 61.5| 61.7| 63.7| 66.6| 69.1| 71.7| 74.6| 77.0| 77.8| 68.1|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | H%prd| 77.8| 78.0| 80.0| 82.6| 84.7| 86.9| 89.2| 91.3| 93.1| 95.4|    *
*  | E%prd| 64.5| 64.7| 67.8| 71.0| 74.2| 77.6| 81.4| 85.1| 89.8| 93.5|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*                                                                          *
*  The above table gives the cumulative results, e.g. 62.5% of all         *
*  residues have a reliability of at least 5.  The overall three-state     *
*  accuracy for this subset of almost two thirds of all residues is 82.9%. *
*  For this subset, e.g., 83.1% of the observed helices are correctly      *
*  predicted, and 86.9% of all residues predicted to be in helix are       *
*  correct.                                                                *
*                                                                          *
*..........................................................................*
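
The "winner takes all" decision and a reliability index derived from the
difference between the two largest output units could look like the sketch
below. The exact scaling is an assumption: output values are taken to lie
between 0 and 1, and the index is obtained by multiplying the difference by 10
and truncating, capped to the range 0 to 9. The text above only states that
the index is scaled to that range. The function name is hypothetical.

```python
def predict_with_reliability(outputs, states="HEL"):
    """Return (predicted state, reliability index 0..9) for one residue.

    outputs: one real number per state, assumed to lie in [0, 1].
    """
    ranked = sorted(zip(outputs, states), reverse=True)
    (best, state), (second, _) = ranked[0], ranked[1]
    # Assumed scaling: ten times the gap between the two largest units.
    index = min(9, max(0, int(10 * (best - second))))
    return state, index


# Hypothetical output units for helix, strand and loop.
print(predict_with_reliability((0.80, 0.05, 0.15)))   # clear winner
print(predict_with_reliability((0.40, 0.35, 0.25)))   # close call
```

A residue whose two largest outputs are nearly equal receives a low index,
which matches the lower expected accuracy for low index values in the table.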
*                                                                          *
*  The following table gives the non-cumulative quantities, i.e. the       *
*  values per reliability index range.  These numbers answer the question: *
*  how reliable is the prediction for all residues labeled with the        *
*  particular index i.                                                     *
*                                                                          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  | index|  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |          *
*  | %res |  8.8|  9.5|  9.3|  9.1|  9.7| 10.5| 12.5| 15.7| 14.1|          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  | Qtot | 46.6| 50.6| 57.7| 62.6| 67.9| 74.2| 82.2| 88.3| 94.2|          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  | H%obs| 36.8| 42.3| 49.5| 55.2| 61.7| 69.9| 78.8| 87.4| 96.4|          *
*  | E%obs| 44.7| 44.5| 52.1| 55.4| 60.9| 68.0| 75.9| 81.0| 68.1|          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  | H%prd| 49.9| 52.5| 60.3| 64.2| 69.2| 77.5| 85.4| 89.9| 95.4|          *
*  | E%prd| 41.7| 47.1| 53.6| 57.0| 64.0| 71.6| 78.8| 88.8| 93.5|          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*                                                                          *
*  For example, for residues with Relindex = 5, 64% of all predicted       *
*  beta-strand residues are correctly identified.                          *
*                                                                          *
*                                                                          *
****************************************************************************
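
The cumulative and the non-cumulative tables above are related: the cumulative
%res for index i is the sum of the per-index %res for all indices >= i, and
the cumulative Qtot is the corresponding %res-weighted mean of the per-index
Qtot. The sketch below checks this with the numbers printed in the two tables.
Small deviations are expected because the printed values are rounded. Names
are illustrative.

```python
# Per-index values for indices 1..9, copied from the non-cumulative table.
PCT_RES = [8.8, 9.5, 9.3, 9.1, 9.7, 10.5, 12.5, 15.7, 14.1]
QTOT = [46.6, 50.6, 57.7, 62.6, 67.9, 74.2, 82.2, 88.3, 94.2]


def cumulative(pct_res, q_tot):
    """Return [(cumulative %res, cumulative Qtot)] for index >= 1, 2, ..., 9."""
    rows = []
    res_sum = 0.0
    weighted = 0.0
    # Walk from the highest index downwards, accumulating as we go.
    for pct, q in zip(reversed(pct_res), reversed(q_tot)):
        res_sum += pct
        weighted += pct * q
        rows.append((res_sum, weighted / res_sum))
    return list(reversed(rows))


for index, (res, q) in enumerate(cumulative(PCT_RES, QTOT), start=1):
    print(f"index >= {index}: %res = {res:5.1f}  Qtot = {q:4.1f}")
```

For index >= 5 this gives 62.5% of the residues and an accuracy close to the
82.9% quoted in the cumulative table.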
*                                                                          *
*                                                                          *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                         *
*      Solvent accessibility prediction by PHDacc:                         *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                         *
*                                                                          *
*      Author:             Burkhard Rost                                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Rost@EMBL-Heidelberg.DE               *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the network method                                                *
*  ~~~~~~~~~~~~~~~~~~~~~~~~                                                *
*                                                                          *
*  The network for prediction of secondary structure is described in       *
*  detail in:                                                              *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Prediction of protein structure at better than 70% accuracy.         *
*     J. Mol. Biol., 1993, 232, 584-599.                                   *
*                                                                          *
*  The analysis of the prediction of solvent exposure is given in:         *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Conservation and prediction of solvent accessibility in protein      *
*     families.  Proteins, 1994, 20, 216-226.                              *
*                                                                          *
*  To be quoted for publications of PHD exposure prediction:               *
*     Both papers quoted above.                                            *
*                                                                          *
****************************************************************************
*                                                                          *
*  Definition of accessibility                                             *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~                                             *
*                                                                          *
*  The network was trained on accessible surface area values from DSSP     *
*  (Dictionary of Secondary Structure of Proteins; Kabsch & Sander (1983)  *
*  Biopolymers, 22, 2577-2637).  The prediction provides values for the    *
*  relative solvent accessibility.  The normalisation is the following:    *
*                                                                          *
*  |                           ACCESSIBILITY (from DSSP in Angstrom^2)     *
*  |RELATIVE_ACCESSIBILITY =   ------------------------------------- * 100 *
*  |                               MAXIMAL_ACC (amino acid type i)         *
*                                                                          *
*  where MAXIMAL_ACC (i) is the maximal accessibility of amino acid type i.*
*  The maximal values are:                                                 *
*                                                                          *
*  +----+----+----+----+----+----+----+----+----+----+----+----+           *
*  |  A |  B |  C |  D |  E |  F |  G |  H |  I |  K |  L |  M |           *
*  | 106| 160| 135| 163| 194| 197|  84| 184| 169| 205| 164| 188|           *
*  +----+----+----+----+----+----+----+----+----+----+----+----+           *
*  |  N |  P |  Q |  R |  S |  T |  V |  W |  X |  Y |  Z |                *
*  | 157| 136| 198| 248| 130| 142| 142| 227| 180| 222| 196|                *
*  +----+----+----+----+----+----+----+----+----+----+----+                *
*                                                                          *
*  Notation: one letter code for amino acid, B stands for D or N; Z stands *
*     for E or Q; and X stands for undetermined.                           *
*                                                                          *
*  The accessibility (the absolute DSSP value, not the relative one) can   *
*  be used to estimate the number of water molecules (W) in contact with   *
*  the residue:                                                            *
*                                                                          *
*  W = ACCESSIBILITY /10                                                   *
*                                                                          *
*  The prediction is given in 10 states for relative accessibility, with   *
*                                                                          *
*  RELATIVE_ACCESSIBILITY = (PREDICTED_ACC * PREDICTED_ACC)                *
*                                                                          *
*  where PREDICTED_ACC = 0 - 9.                                            *
*                                                                          *
****************************************************************************
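
The normalisation and the 10-state encoding defined above can be sketched
in Python as follows.  This is a minimal illustration, not the PHD code:
the function names are hypothetical, and mapping a relative accessibility
back to a state by taking the integer part of its square root (capped at 9)
is an assumption inferred from RELATIVE_ACCESSIBILITY = PREDICTED_ACC *
PREDICTED_ACC.

```python
import math

# Maximal accessibility per amino acid type (values from the table above).
MAXIMAL_ACC = {
    "A": 106, "B": 160, "C": 135, "D": 163, "E": 194, "F": 197,
    "G": 84, "H": 184, "I": 169, "K": 205, "L": 164, "M": 188,
    "N": 157, "P": 136, "Q": 198, "R": 248, "S": 130, "T": 142,
    "V": 142, "W": 227, "X": 180, "Y": 222, "Z": 196,
}


def relative_accessibility(residue, accessibility):
    """Relative accessibility in percent from a DSSP accessibility value."""
    return 100.0 * accessibility / MAXIMAL_ACC[residue.upper()]


def water_molecules(accessibility):
    """Estimated number of contacting water molecules, W = ACCESSIBILITY / 10."""
    return accessibility / 10.0


def state_to_relative(state):
    """Relative accessibility (percent) encoded by a predicted state 0-9."""
    if not 0 <= state <= 9:
        raise ValueError("state must be in the range 0-9")
    return state * state


def relative_to_state(rel_acc):
    """Assumed inverse mapping: integer part of the square root, capped at 9."""
    if rel_acc < 0:
        raise ValueError("relative accessibility must not be negative")
    return min(9, int(math.sqrt(rel_acc)))
```

For example, an alanine with a DSSP accessibility of 53 has a relative
accessibility of 50%, which falls into state 7 under the assumed mapping.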
*                                                                          *
*  Estimated Accuracy of Prediction                                        *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  A careful cross validation test on some 238 protein chains (in total    *
*  about 62,000 residues) with less than 25% pairwise sequence identity    *
*  gave the following results:                                             *
*                                                                          *
*                                                                          *
*  Correlation                                                             *
*  ...........                                                             *
*                                                                          *
*  The correlation between observed and predicted solvent accessibility    *
*  is:                                                                     *
*                                                                          *
*  -----------                                                             *
*  corr = 0.53                                                             *
*  -----------                                                             *
*                                                                          *
*  This value ought to be compared to the worst and best case prediction   *
*  scenario: random prediction (corr = 0.0) and homology modelling         *
*  (corr = 0.66).  (Note: homology modelling yields a relatively accurate  *
*  prediction in 3D if, and only if, a significantly identical sequence    *
*  has a known 3D structure.)                                              *
*                                                                          *
*                                                                          *
*  3-state accuracy                                                        *
*  ................                                                        *
*                                                                          *
*  Often the relative accessibility is projected onto, e.g., 3 states:     *
*     b  = buried       (here defined as < 9% relative accessibility),     *
*     i  = intermediate ( 9% <= rel. acc. < 36% ),                         *
*     e  = exposed      ( rel. acc. >= 36% ).                              *
*                                                                          *
*  A projection onto 3 states or 2 states (buried/exposed) enables the     *
*  compilation of a 3- and 2-state prediction accuracy.  PHD reaches an    *
*  overall 3-state accuracy of:                                            *
*     Q3 = 57.5%                                                           *
*  (compared to 35% for random prediction and 70% for homology modelling). *
*                                                                          *
*  In detail:                                                              *
*                                                                          *
*  +-----------------------------------+-------------------------+         *
*  | Qburied       (% of observed)=77% | Qb (% of predicted)=60% |         *
*  | Qintermediate (% of observed)= 9% | Qi (% of predicted)=44% |         *
*  | Qexposed      (% of observed)=78% | Qe (% of predicted)=56% |         *
*  +-----------------------------------+-------------------------+         *
*                                                                          *
*                                                                          *
*  10-state accuracy                                                       *
*  .................                                                       *
*                                                                          *
*  The network predicts relative solvent accessibility in 10 states, with  *
*  state i (i = 0-9) corresponding to a relative solvent accessibility of  *
*  i*i %.  The 10-state accuracy of the network is:                        *
*                                                                          *
*     Q10 = 24.5%                                                          *
*                                                                          *
*..........................................................................*
*                                                                          *
*  These percentages are defined by:                                       *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  |                     number of correctly predicted residues            *
*  |Q3                = ---------------------------------------      (*100)*
*  |                           number of all residues                      *
*  |                                                                       *
*  |                     no of res. correctly predicted to be buried       *
*  |Qburied (% of obs) = ------------------------------------------- (*100)*
*  |                     no of all res. observed to be buried              *
*  |                                                                       *
*  |                                                                       *
*  |                     no of res. correctly predicted to be buried       *
*  |Qburied (% of pred)= ------------------------------------------- (*100)*
*  |                     no of all residues predicted to be buried         *
*                                                                          *
*..........................................................................*
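
The 3-state projection and the percentages defined above can be sketched
as follows.  This is an illustrative sketch only: the function names are
hypothetical and the example values are made up, not taken from the PHD
data set.

```python
def to_three_states(rel_acc):
    """Project a relative accessibility (percent) onto b / i / e."""
    if rel_acc < 9:
        return "b"  # buried
    if rel_acc < 36:
        return "i"  # intermediate
    return "e"      # exposed


def q_scores(observed, predicted, state="b"):
    """Return (Q3, Q % of observed, Q % of predicted) for one state.

    observed and predicted are equally long sequences of 'b', 'i', 'e'.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(observed, predicted))
    correct = sum(o == p for o, p in pairs)
    hits = sum(o == p == state for o, p in pairs)
    n_obs = sum(o == state for o in observed)
    n_pred = sum(p == state for p in predicted)
    q3 = 100.0 * correct / len(pairs)
    q_obs = 100.0 * hits / n_obs if n_obs else float("nan")
    q_pred = 100.0 * hits / n_pred if n_pred else float("nan")
    return q3, q_obs, q_pred


# Made-up relative accessibilities for seven residues.
observed = [to_three_states(v) for v in (0, 4, 8, 16, 25, 49, 81)]
predicted = [to_three_states(v) for v in (0, 1, 16, 4, 36, 64, 81)]
q3, q_buried_obs, q_buried_pred = q_scores(observed, predicted, "b")
```

In this toy example four of seven residues are predicted in the correct
state, and two of the three observed buried residues are found.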
*                                                                          *
*  Averaging over single chains                                            *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                            *
*                                                                          *
*  The most reasonable way to compute the overall accuracies is the above  *
*  quoted percentage of correctly predicted residues.  However, since the  *
*  user is mainly interested in the expected performance of the prediction *
*  for a particular protein, the mean value when averaging over protein    *
*  chains might be of help as well.  Computing first the correlation       *
*  between observed and predicted accessibility for each protein chain, and*
*  then averaging over all 238 chains yields the following average:        *
*                                                                          *
*  +-------------------------------====--+                                 *
*  | corr/averaged over chains   = 0.53  |                                 *
*  +-------------------------------====--+                                 *
*  | standard deviation          = 0.11  |                                 *
*  +-------------------------------------+                                 *
*                                                                          *
*..........................................................................*
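
Averaging over chains, as opposed to pooling all residues, can be sketched
as below.  The helper names are hypothetical, and the use of the sample
standard deviation is an assumption; the text above does not say which
estimator was used.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def chain_averaged_correlation(chains):
    """chains: list of (observed, predicted) accessibility lists, one per chain.

    Returns the mean and the sample standard deviation of the per-chain
    correlation coefficients.
    """
    corrs = [pearson(obs, pred) for obs, pred in chains]
    return mean(corrs), stdev(corrs)
```

Each chain contributes one correlation value regardless of its length, so
short chains weigh as much as long ones in this average.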
*                                                                          *
*  Further details of performance accuracy                                 *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                 *
*                                                                          *
*  The accuracy matrix in detail:                                          *
*  ..............................                                          *
*                                                                          *
* -------+----------------------------------------------------+----------- *
*  \ PHD |    0    1   2   3    4    5     6     7    8    9  |  SUM  %obs *
* -------+----------------------------------------------------+----------- *
* OBS  0 | 8611  140   8  44   82  169   772   334   27    0  | 10187 16.6 *
* OBS  1 | 4367  164   0  50  106  231   738   346   44    3  |  6049  9.8 *
* OBS  2 | 3194  168   1  68  125  303   951   513   42    7  |  5372  8.7 *
* OBS  3 | 2760  159   8  80  136  327  1246   746   58   19  |  5539  9.0 *
* OBS  4 | 2312  144   2  72  166  396  1615  1245  124   19  |  6095  9.9 *
* OBS  5 | 1873   96   3  84  138  425  1979  1834  187   27  |  6646 10.8 *
* OBS  6 | 1387   67   1  60   80  278  2237  2627  231   51  |  7019 11.4 *
* OBS  7 | 1082   35   0  32   56  225  1871  3107  302   60  |  6770 11.0 *
* OBS  8 |  660   25   0  27   43  136  1206  2374  325   87  |  4883  7.9 *
* OBS  9 |  325   20   2  27   29   74   648  1159  366  214  |  2864  4.7 *
* -------+----------------------------------------------------+----------- *
* SUM    |26571 1018  25 544  961 2564 13263 14285 1706  487  |            *
* %pred  | 43.3  1.7 0.0 0.9  1.6  4.2  21.6  23.3  2.8  0.8  |            *
* -------+----------------------------------------------------+----------- *
*                                                                          *
*  Note: This table is to be read in the following manner:                 *
*        8611 of all residues predicted to have 0% relative                *
*        accessibility were observed with 0%.  However, 325 of all         *
*        residues predicted to have 0% are observed as completely exposed  *
*        (obs = 9 -> rel. acc. >= 81%).  The term "observed" refers to the *
*        DSSP compilation of area of solvent accessibility calculated from *
*        3D coordinates of experimentally determined structures (Diction-  *
*        ary of Secondary Structure  of Proteins: Kabsch & Sander (1983)   *
*        Biopolymers, 22, 2577-2637).                                      *
*                                                                          *
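
The 3-state figures can be recomputed from the 10-state accuracy matrix
above.  The sketch below assumes that states 0-2 count as buried, 3-5 as
intermediate and 6-9 as exposed, which follows from the thresholds of 9%
and 36% together with state i corresponding to i*i %.  The names are
hypothetical.

```python
# Rows: observed state 0-9; columns: predicted (PHD) state 0-9.
ACCURACY_MATRIX = [
    [8611, 140, 8, 44, 82, 169, 772, 334, 27, 0],
    [4367, 164, 0, 50, 106, 231, 738, 346, 44, 3],
    [3194, 168, 1, 68, 125, 303, 951, 513, 42, 7],
    [2760, 159, 8, 80, 136, 327, 1246, 746, 58, 19],
    [2312, 144, 2, 72, 166, 396, 1615, 1245, 124, 19],
    [1873, 96, 3, 84, 138, 425, 1979, 1834, 187, 27],
    [1387, 67, 1, 60, 80, 278, 2237, 2627, 231, 51],
    [1082, 35, 0, 32, 56, 225, 1871, 3107, 302, 60],
    [660, 25, 0, 27, 43, 136, 1206, 2374, 325, 87],
    [325, 20, 2, 27, 29, 74, 648, 1159, 366, 214],
]

# Assumed grouping of the 10 states into buried / intermediate / exposed.
GROUPS = ((0, 1, 2), (3, 4, 5), (6, 7, 8, 9))


def collapse(matrix, groups=GROUPS):
    """Sum the 10x10 matrix into a 3x3 matrix (rows observed, columns predicted)."""
    return [[sum(matrix[r][c] for r in rows for c in cols) for cols in groups]
            for rows in groups]


def three_state_summary(matrix):
    """Return Q3 and the per-state % of observed and % of predicted."""
    m3 = collapse(matrix)
    total = sum(map(sum, m3))
    q3 = 100.0 * sum(m3[k][k] for k in range(3)) / total
    pct_obs = [100.0 * m3[k][k] / sum(m3[k]) for k in range(3)]
    pct_pred = [100.0 * m3[k][k] / sum(row[k] for row in m3) for k in range(3)]
    return q3, pct_obs, pct_pred


q3, pct_obs, pct_pred = three_state_summary(ACCURACY_MATRIX)
```

With this grouping the matrix reproduces the quoted Q3 of 57.5%; the
per-state values agree with the quoted table once the decimals are dropped.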
*                                                                          *
*  Accuracy for each amino acid:                                           *
*  .............................                                           *
*                                                                          *
*  +---+------------------------------+-----+-------+------+               *
*  |AA |   Q3 b%o b%p i%o i%p e%o e%p | Q10 |  corr |    N |               *
*  +---+------------------------------+-----+-------+------+               *
*  | A | 59.0  87  60   2  38  66  57 |  31 | 0.530 | 5054 |               *
*  | C | 62.0  91  67   5  39  25  21 |  34 | 0.244 |  893 |               *
*  | D | 56.5  21  45   6  49  94  57 |  20 | 0.321 | 3536 |               *
*  | E | 60.8   9  40   3  41  98  61 |  21 | 0.347 | 3743 |               *
*  | F | 63.3  94  67   9  46  29  37 |  27 | 0.366 | 2436 |               *
*  | G | 52.1  75  51   1  31  67  53 |  22 | 0.405 | 4787 |               *
*  | H | 50.9  63  53  23  45  71  50 |  18 | 0.442 | 1366 |               *
*  | I | 64.9  95  68   6  41  30  38 |  34 | 0.360 | 3437 |               *
*  | K | 66.6   2  11   2  37  98  67 |  23 | 0.267 | 3652 |               *
*  | L | 61.6  93  65   8  44  31  40 |  31 | 0.368 | 5016 |               *
*  | M | 60.1  92  64   5  39  45  44 |  29 | 0.452 | 1371 |               *
*  | N | 55.5  45  45   8  38  87  59 |  17 | 0.410 | 2923 |               *
*  | P | 53.0  48  48   9  39  83  56 |  18 | 0.364 | 2920 |               *
*  | Q | 54.3  27  44   7  44  92  56 |  20 | 0.344 | 2225 |               *
*  | R | 49.9  15  47  36  47  76  51 |  18 | 0.372 | 2765 |               *
*  | S | 55.6  69  53   3  51  81  56 |  22 | 0.464 | 3981 |               *
*  | T | 51.8  61  51   8  38  78  53 |  21 | 0.432 | 3740 |               *
*  | V | 61.1  93  65   5  40  39  42 |  34 | 0.418 | 4156 |               *
*  | W | 56.2  85  62  20  49  29  27 |  21 | 0.318 |  891 |               *
*  | Y | 49.7  73  52  33  49  36  38 |  19 | 0.359 | 2301 |               *
*  +---+------------------------------+-----+-------+------+               *
*                                                                          *
*  Abbreviations:                                                          *
*                                                                          *
*  AA:   amino acid in one-letter code                                     *
*  b%o, i%o, e%o:   = Qburied, Qintermediate, Qexposed (% of observed),    *
*        i.e. percentage of correct prediction in each state, see above    *
*  b%p, i%p, e%p:   = Qburied, Qintermediate, Qexposed (% of predicted),   *
*        i.e. probability of correct prediction in each state, see above   *
*  Q10:  percentage of correctly predicted residues in each of the 10      *
*        states of predicted relative accessibility.                       *
*  corr: correlation between predicted and observed rel. acc.              *
*  N:    number of residues in data set                                    *
*                                                                          *
*                                                                          *
*  Accuracy for different secondary structure:                             *
*  ...........................................                             *
*                                                                          *
*  +--------+------------------------------+----+-------+-------+          *
*  | type   |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |     N |          *
*  +--------+------------------------------+----+-------+-------+          *
*  | helix  | 59.5  79  64   8  44  80  56 | 27 | 0.574 | 20100 |          *
*  | strand | 61.3  84  73   9  46  69  37 | 35 | 0.524 | 13356 |          *
*  | loop   | 54.4  64  43  11  44  78  61 | 18 | 0.442 | 27968 |          *
*  +--------+------------------------------+----+-------+-------+          *
*                                                                          *
*  Abbreviations as before.                                                *
*                                                                          *
****************************************************************************
*                                                                          *
*  Position-specific reliability index                                     *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                     *
*                                                                          *
*  The network predicts the 10 states for relative accessibility using real*
*  numbers from the output units. The prediction is assigned by choosing   *
*  the maximal unit ("winner takes all").  However, the real numbers       *
*  contain additional information.                                         *
*  E.g. the difference between the maximal and the second largest output   *
*  unit (with the constraint that the second largest output is chosen      *
*  among all units at least 2 positions off the maximal unit) can be used  *
*  to derive a "reliability index".  This index is given for each residue  *
*  along with the prediction.  The index is scaled to have values between  *
*  0 (lowest reliability), and 9 (highest).                                *
*  The accuracies (Q3, corr, etc.) to be expected for residues with an     *
*  index at or above a particular value are given below, as well as the    *
*  fraction of such residues (%res):                                       *
*                                                                          *
*  +---+------------------------------+----+-------+-------+               *
*  |RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |               *
*  +---+------------------------------+----+-------+-------+               *
*  | 0 | 57.5  77  60   9  44  78  56 | 24 | 0.535 | 100.0 |               *
*  | 1 | 59.1  76  63   9  45  82  57 | 25 | 0.560 |  91.2 |               *
*  | 2 | 61.7  79  66   4  47  87  58 | 27 | 0.594 |  77.1 |               *
*  | 3 | 66.6  87  70   1  51  89  63 | 30 | 0.650 |  57.1 |               *
*  | 4 | 70.0  89  72   0  83  91  67 | 32 | 0.686 |  45.8 |               *
*  | 5 | 72.9  92  75   0   0  93  70 | 34 | 0.722 |  35.6 |               *
*  | 6 | 76.3  95  77   0   0  93  75 | 36 | 0.769 |  24.7 |               *
*  | 7 | 79.0  97  79   0   0  93  78 | 39 | 0.803 |  16.0 |               *
*  | 8 | 80.9  98  80   0   0  91  81 | 43 | 0.824 |   9.6 |               *
*  | 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |               *
*  +---+------------------------------+----+-------+-------+               *
*                                                                          *
*  Abbreviations as before.                                                *
*                                                                          *
*  The above table gives the cumulative results, e.g. 45.8% of all         *
*  residues have a reliability of at least 4.  The correlation for this    *
*  most reliably predicted half of the residues is 0.686, i.e. a value     *
*  comparable to what could be expected if homology modelling were         *
*  possible.  For this subset of 45.8% of all residues, 89% of the buried  *
*  residues are correctly predicted, and 72% of all residues predicted to  *
*  be buried are correct.                                                  *
*                                                                          *
*..........................................................................*
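
The derivation of the reliability index described above can be sketched
as follows.  The text does not give the exact scaling, so this sketch
assumes output values between 0 and 1 and an index equal to the integer
part of 10 times the difference, clipped to 0-9.  Both the scaling and
the function name are assumptions, not the PHD implementation.

```python
def reliability_index(outputs, min_offset=2, scale=10.0):
    """Return (predicted state, reliability index) for one residue.

    outputs: the real-valued network outputs, one per accessibility state.
    The runner-up is searched only among units at least `min_offset`
    positions away from the winning unit.
    """
    winner = max(range(len(outputs)), key=lambda k: outputs[k])
    rivals = [v for k, v in enumerate(outputs) if abs(k - winner) >= min_offset]
    if not rivals:
        raise ValueError("no output unit far enough from the winner")
    difference = outputs[winner] - max(rivals)
    index = int(scale * difference)  # assumed scaling, see the note above
    return winner, max(0, min(9, index))


# Made-up output values for the 10 units of one residue.
outputs = [0.0, 0.125, 0.0, 0.0, 0.0, 0.25, 0.75, 0.875, 0.5, 0.0]
state, ri = reliability_index(outputs)
```

Here unit 7 wins; its neighbours 6 and 8 are ignored when looking for the
runner-up, so the difference is taken against unit 5.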
*                                                                          *
*  The following table gives the non-cumulative quantities, i.e. the       *
*  values per reliability index range.  These numbers answer the question: *
*  how reliable is the prediction for all residues labeled with the        *
*  particular index i.                                                     *
*                                                                          *
*  +---+------------------------------+----+-------+-------+               *
*  |RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |               *
*  +---+------------------------------+----+-------+-------+               *
*  | 0 | 40.9  79  40  16  41  21  40 | 14 | 0.175 |   8.8 |               *
*  | 1 | 45.4  61  46  28  44  48  44 | 17 | 0.278 |  14.1 |               *
*  | 2 | 47.4  53  52  10  46  80  44 | 19 | 0.343 |  19.9 |               *
*  | 3 | 52.9  75  59   4  50  77  47 | 23 | 0.439 |  11.4 |               *
*  | 4 | 60.0  81  63   0  83  84  56 | 25 | 0.547 |  10.1 |               *
*  | 5 | 65.2  82  70   0   0  93  62 | 28 | 0.607 |  10.9 |               *
*  | 6 | 71.3  90  72   0   0  94  70 | 31 | 0.692 |   8.8 |               *
*  | 7 | 76.0  94  76   0   0  95  75 | 34 | 0.762 |   6.3 |               *
*  | 8 | 80.5  97  81   0   0  94  79 | 39 | 0.808 |   3.8 |               *
*  | 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |               *
*  +---+------------------------------+----+-------+-------+               *
*                                                                          *
*  For example, for residues with RI = 4, 83% of all predicted intermediate*
*  residues are correctly predicted as such.                               *
*                                                                          *
*                                                                          *
****************************************************************************
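
The %res columns of the two reliability tables above are related by a
running sum: the cumulative share for index i is the sum of the
per-index shares for i and all higher indices.  A small sketch with a
hypothetical function name:

```python
# %res per reliability index 0-9, from the non-cumulative table above.
PCT_RES_PER_INDEX = [8.8, 14.1, 19.9, 11.4, 10.1, 10.9, 8.8, 6.3, 3.8, 5.9]


def cumulative_fraction(per_index):
    """result[i] = share of residues with a reliability index of at least i."""
    running = 0.0
    result = []
    for value in reversed(per_index):
        running += value
        result.append(running)
    return result[::-1]


cumulative = cumulative_fraction(PCT_RES_PER_INDEX)
```

Individual entries may differ from the cumulative table in the last digit,
because the tabulated percentages are rounded.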
*                                                                          *
*                                                                          *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             *
*      Prediction of helical transmembrane segments by PHDhtm:             *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             *
*                                                                          *
*      Author:             Burkhard Rost                                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Rost@EMBL-Heidelberg.DE               *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the network method                                                *
*  ~~~~~~~~~~~~~~~~~~~~~~~~                                                *
*                                                                          *
*  The PHD mail server is described in:                                    *
*     Rost, Burkhard; Sander, Chris; Schneider, Reinhard:                  *
*     PHD - an automatic mail server for protein secondary structure       *
*     prediction.                                                          *
*     CABIOS, 1994, 10, 53-60.                                             *
*                                                                          *
*  To be quoted for publications of PHDhtm output:                         *
*     Rost, Burkhard; Casadio, Rita; Fariselli, Piero; Sander, Chris:      *
*     Prediction of helical transmembrane segments at 95% accuracy.        *
*     Protein Science, 1995, 4, 521-533.                                   *
*                                                                          *
****************************************************************************
*                                                                          *
*  Estimated Accuracy of Prediction                                        *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  A cross validation test on 69 helical trans-membrane proteins (in total *
*  about 30,000 residues) with less than 25% pairwise sequence identity    *
*  gave the following results:                                             *
*                                                                          *
*  ++================++-----------------------------------------+          *
*  || Qtotal = 94.7% ||      ("overall two state accuracy")     |          *
*  ++================++-----------------------------------------+          *
*                                                                          *
*  +----------------------------+-----------------------------+            *
*  | Qhelix (% of observed)=92% | Qhelix (% of predicted)=83% |            *
*  | Qloop  (% of observed)=96% | Qloop  (% of predicted)=97% |            *
*  +----------------------------+-----------------------------+            *
*                                                                          *
*..........................................................................*
*                                                                          *
*  These percentages are defined by:                                       *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  |                    number of correctly predicted residues             *
*  |Qtotal =            ---------------------------------------      (*100)*
*  |                          number of all residues                       *
*  |                                                                       *
*  |                    no of res correctly predicted to be in helix       *
*  |Qhelix (% of obs) = -------------------------------------------- (*100)*
*  |                    no of all res observed to be in helix              *
*  |                                                                       *
*  |                                                                       *
*  |                    no of res correctly predicted to be in helix       *
*  |Qhelix (% of pred)= -------------------------------------------- (*100)*
*  |                    no of all residues predicted to be in helix        *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Further measures of performance                                         *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                         *
*                                                                          *
*  Matthews correlation coefficient:                                       *
*                                                                          *
*  +---------------------------------------------+                         *
*  | Chelix = 0.84, Cloop = 0.84                 |                         *
*  +---------------------------------------------+                         *
*..........................................................................*
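
The report does not spell out the formula behind Chelix and Cloop.  The
sketch below assumes the standard definition of the Matthews correlation
coefficient for a two-state problem and applies it to the residue counts
of the PHDhtm accuracy matrix listed further down in this section.  The
function name is hypothetical.

```python
import math


def matthews(tp, fp, fn, tn):
    """Matthews correlation coefficient from the four counts of a 2x2 table."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        raise ValueError("coefficient is undefined when a row or column is empty")
    return (tp * tn - fp * fn) / denominator


# Helix taken as the positive class:
#   tp = obs H / net H, fn = obs H / net L, fp = obs L / net H, tn = obs L / net L
c_helix = matthews(tp=5214, fp=1050, fn=492, tn=22423)
# Loop taken as the positive class: the roles of the counts are swapped.
c_loop = matthews(tp=22423, fp=492, fn=1050, tn=5214)
```

With only two states the coefficient is the same whichever state is taken
as the positive class, which is consistent with Chelix and Cloop being
equal.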
*                                                                          *
*  Average length of predicted secondary structure segments:               *
*                                                                          *
*  |           +------------+----------+                                   *
*  |           |  predicted | observed |                                   *
*  +-----------+------------+----------+                                   *
*  | Lhelix  = |    24.6    |   22.2   |                                   *
*  +-----------+------------+----------+                                   *
*..........................................................................*
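
An average helix length of this kind can be computed from a per-residue
state string.  A minimal sketch, assuming 'H' marks transmembrane helix
and any other letter marks loop; the function names and the example
string are made up for illustration.

```python
from itertools import groupby


def helix_segment_lengths(states, helix="H"):
    """Lengths of all uninterrupted runs of the helix symbol."""
    return [len(list(run)) for symbol, run in groupby(states) if symbol == helix]


def average_helix_length(states, helix="H"):
    """Mean length of helix segments, or 0.0 if there are none."""
    lengths = helix_segment_lengths(states, helix)
    return sum(lengths) / len(lengths) if lengths else 0.0


example = "LLHHHHLLLHHLL"
```

For the example string the two helix segments have lengths 4 and 2, so the
average is 3.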
*                                                                          *
*  The accuracy matrix in detail:                                          *
*                                                                          *
*  +---------------------------------+                                     *
*  |    number of residues with H, L |                                     *
*  +---------+------+-------+--------+                                     *
*  |         |net H | net L |sum obs |                                     *
*  +---------+------+-------+--------+                                     *
*  | obs H   | 5214 |   492 |   5706 |                                     *
*  | obs L   | 1050 | 22423 |  23473 |                                     *
*  +---------+------+-------+--------+                                     *
*  | sum Net | 6264 | 22915 |  29179 |                                     *
*  +---------+------+-------+--------+                                     *
*                                                                          *
*  Note: This table is to be read in the following manner:                 *
*        5214 of all residues predicted to be in a helical trans-membrane  *
*        region were observed to be in the lipid bilayer; 1050, however,   *
*        were observed either inside or outside of the protein, i.e. in    *
*        loop (or non-membrane) regions. The term "observed" refers to DSSP*
*        assignment of secondary structure calculated from 3D coordinates  *
*        of experimentally determined structures (Dictionary of Secondary  *
*        Structure  of Proteins: Kabsch & Sander (1983) Biopolymers, 22,   *
*        2577-2637) where these were available.  For all other proteins,   *
*        the assignment of trans-membrane segments has been taken from the *
*        Swissprot data bank (Bairoch, A.; Boeckmann, B.: The SWISS-PROT   *
*        protein sequence data bank. Nucl. Acids Res. 20: 2019-2022, 1992).*
*                                                                          *
*..........................................................................*
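
The two-state percentages defined earlier can be recomputed from the
accuracy matrix above.  This is a sketch with hypothetical names.  Values
obtained this way can differ by about one percentage point from the quoted
table (for example for Qhelix, % of observed); the report does not say
how the quoted figures were rounded or averaged.

```python
def two_state_scores(obs_h_net_h, obs_h_net_l, obs_l_net_h, obs_l_net_l):
    """Return Qtotal and the helix / loop percentages from a 2x2 matrix."""
    total = obs_h_net_h + obs_h_net_l + obs_l_net_h + obs_l_net_l
    return {
        "Qtotal": 100.0 * (obs_h_net_h + obs_l_net_l) / total,
        "Qhelix_obs": 100.0 * obs_h_net_h / (obs_h_net_h + obs_h_net_l),
        "Qhelix_pred": 100.0 * obs_h_net_h / (obs_h_net_h + obs_l_net_h),
        "Qloop_obs": 100.0 * obs_l_net_l / (obs_l_net_h + obs_l_net_l),
        "Qloop_pred": 100.0 * obs_l_net_l / (obs_h_net_l + obs_l_net_l),
    }


# Counts from the accuracy matrix above.
scores = two_state_scores(5214, 492, 1050, 22423)
```

The overall two-state accuracy computed from these counts is 94.7%, in
agreement with the quoted Qtotal.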
*                                                                          *
*  Overlap between predicted and observed segments:                        *
*                                                                          *
*  +-----------------+---------------+----------------+                    *
*  | segment overlap | % of observed | % of predicted |                    *
*  |   Sov helix     |      95.6%    |      95.5%     |                    *
*  |   Sov loop      |      83.6%    |      97.2%     |                    *
*  +-----------------+---------------+----------------+                    *
*  |   Sov total     |      86.0%    |      96.8%     |                    *
*  +-----------------+---------------+----------------+                    *
*                                                                          *
*        Definition of Sov in: Rost et al., JMB, 1994, 235, 13-26.         *
*                                                                          *
*        As helical trans-membrane segments are longer than globular       *
*        helices, correctly predicted segments are easy to identify.       *
*        PHDhtm misses 5 out of 258 observed segments, predicts 6 where    *
*        none is observed, and in 3 cases a predicted helical segment      *
*        overlaps two observed regions.  Thus, in total, more than 95%     *
*        of all segments are correctly predicted.                          *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Entropy of prediction (information measure):                            *
*                                                                          *
*  +-----------------+                                                     *
*  | I = 0.64        |                                                     *
*  +-----------------+                                                     *
*                                                                          *
*        (For comparison: homology modelling of globular proteins in three *
*        states: I=0.62.)                                                  *
*        Definition of Sov in: Rost et al., JMB, 1994, 235, 13-26.         *
*                                                                          *
****************************************************************************
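
The residue counts in the two-state table above determine the per-residue
measures directly.  The following is a minimal Python sketch, assuming the
four cell counts are copied by hand from that table; the function name and
dictionary keys are illustrative and not part of PHD.

```python
def two_state_measures(obs_h_net_h, obs_h_net_l, obs_l_net_h, obs_l_net_l):
    """Per-residue measures (in %) from a 2x2 observed-vs-predicted table.

    Argument names follow the table: 'obs' is the observed state (row),
    'net' is the state predicted by the network (column).
    """
    total = obs_h_net_h + obs_h_net_l + obs_l_net_h + obs_l_net_l
    return {
        # share of all residues whose predicted state equals the observed one
        "Qtot": 100.0 * (obs_h_net_h + obs_l_net_l) / total,
        # correct H as a share of all observed H, and of all predicted H
        "H_obs": 100.0 * obs_h_net_h / (obs_h_net_h + obs_h_net_l),
        "H_prd": 100.0 * obs_h_net_h / (obs_h_net_h + obs_l_net_h),
        # correct L as a share of all observed L, and of all predicted L
        "L_obs": 100.0 * obs_l_net_l / (obs_l_net_l + obs_l_net_h),
        "L_prd": 100.0 * obs_l_net_l / (obs_l_net_l + obs_h_net_l),
    }


# Counts copied from the table: rows are observed, columns are predicted.
measures = two_state_measures(5214, 492, 1050, 22423)
for name, value in measures.items():
    print(f"{name}: {value:.1f}")
```

With these counts the pooled two-state accuracy comes out at about 94.7%.
Percentages pooled over all residues in this way need not coincide exactly
with figures that were averaged differently, e.g. per protein.
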
*                                                                          *
*  Position-specific reliability index                                     *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                     *
*                                                                          *
*  The network predicts two states, helical trans-membrane region and      *
*  rest, using two output units.  The prediction is assigned by choosing   *
*  the unit with the maximal value ("winner takes all").  However, the     *
*  real-valued outputs of the units contain additional information.        *
*  For example, the difference between the two output units can be used    *
*  to derive a "reliability index".  This index is given for each          *
*  residue along with the prediction.  The index is scaled to have         *
*  values between 0 (lowest reliability) and 9 (highest).                  *
*  The accuracies (Qtot) to be expected for residues whose index is at or  *
*  above a particular value are given below, together with the fraction    *
*  of such residues (%res):                                                *
*                                                                          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |    *
*  | %res |100.0| 98.8| 97.3| 95.9| 94.1| 92.3| 89.9| 86.2| 75.0| 66.8|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | Qtot | 94.7| 95.2| 95.6| 96.2| 96.7| 97.2| 97.7| 98.4| 99.4| 99.8|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | H%obs| 91.8| 92.9| 93.8| 94.4| 95.0| 95.7| 96.2| 96.8| 95.5| 78.7|    *
*  | L%obs| 95.3| 95.7| 96.1| 96.6| 97.0| 97.5| 98.1| 98.8| 99.7|100.0|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | H%prd| 82.7| 83.8| 85.0| 86.7| 88.1| 89.7| 91.4| 93.8| 96.3| 97.1|    *
*  | L%prd| 97.9| 98.3| 98.5| 98.7| 98.8| 99.0| 99.2| 99.4| 99.7| 99.9|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*                                                                          *
*  The above table gives the cumulative results, e.g. 92.3% of all         *
*  residues have a reliability of at least 5.  The overall two-state       *
*  accuracy for this subset is 97.2%.  For this subset, e.g., 95.7% of     *
*  the observed helical trans-membrane residues are correctly predicted,   *
*  and 89.7% of all residues predicted to be in helical trans-membrane     *
*  segments are correct.                                                   *
*                                                                          *
*                                                                          *
*                                                                          *
****************************************************************************
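
The winner-takes-all rule and the cumulative reading of the reliability
table can be sketched in a few lines of Python.  This is only an
illustration: the exact scaling PHD applies to the output difference is not
stated in this header, so the "ten times the difference, truncated, capped
at 9" rule below is an assumption, and both function names are hypothetical.

```python
def predict_two_state(out_h, out_l):
    """Winner-takes-all over two output units with values in 0..1."""
    state = "H" if out_h >= out_l else "L"
    # ASSUMED scaling: 10 * difference, truncated to an integer, capped at 9.
    rel = min(9, int(10 * abs(out_h - out_l)))
    return state, rel


def cumulative_accuracy(rels, correct, threshold):
    """Return (%res, Qtot) for the residues with reliability >= threshold.

    rels    : reliability index (0-9) of each residue
    correct : True where the prediction matched the observation
    """
    kept = [ok for rel, ok in zip(rels, correct) if rel >= threshold]
    if not kept:
        return 0.0, 0.0
    pct_res = 100.0 * len(kept) / len(rels)
    qtot = 100.0 * sum(kept) / len(kept)
    return pct_res, qtot


print(predict_two_state(0.87, 0.12))
print(cumulative_accuracy([9, 9, 5, 2], [True, True, False, True], 5))
```

Each column of the table corresponds to one call of cumulative_accuracy
with the column's index as threshold, which is why %res falls and Qtot
rises from left to right.
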


The resulting network (PHD) prediction is:                             
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                             

****************************************************************************
*                                                                          *
*      PredictProtein@EMBL-Heidelberg.DE                                   *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                   *
*                                                                          *
*      PHD: Profile fed neural network systems from HeiDelberg             *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             *
*      Prediction of:                                                      *
*       - secondary structure,                  by PHDsec                  *
*       - solvent accessibility,                by PHDacc                  *
*       - and helical transmembrane regions,    by PHDhtm                  *
*                                                                          *
*      Author:             Burkhard Rost                                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Predict-Help@EMBL-Heidelberg.DE       *
*      All rights reserved.                                                *
*                                                                          *
****************************************************************************
*                                                                          *
*  The network systems are described in:                                   *
*                                                                          *
*       PHDsec: B Rost & C Sander: JMB, 1993, 232, 584-599.                *
*               B Rost & C Sander: Proteins, 1994, 19, 55-72.              *
*       PHDacc: B Rost & C Sander: Proteins, 1994, 20, 216-226.            *
*       PHDhtm: B Rost, R Casadio, P Fariselli & C Sander,                 *
*               Prot. Science, 1995, 4, 521-533.                           *
*                                                                          *
****************************************************************************
*                                                                          *
*    Some statistics                                                       *
*    ~~~~~~~~~~~~~~~                                                       *
*                                                                          *
*    Percentage of amino acids:                                            *
*    +--------------+--------+--------+--------+--------+--------+         *
*    | AA:          |    L   |    R   |    F   |    V   |    Q   |         *
*    | % of AA:     |   10.8 |    7.4 |    6.8 |    6.3 |    6.3 |         *
*    +--------------+--------+--------+--------+--------+--------+         *
*    | AA:          |    S   |    M   |    K   |    I   |    A   |         *
*    | % of AA:     |    5.7 |    5.7 |    5.7 |    4.5 |    4.5 |         *
*    +--------------+--------+--------+--------+--------+--------+         *
*    | AA:          |    N   |    D   |    Y   |    P   |    E   |         *
*    | % of AA:     |    4.0 |    4.0 |    3.4 |    3.4 |    3.4 |         *
*    +--------------+--------+--------+--------+--------+--------+         *
*    | AA:          |    T   |    H   |    G   |    W   |    C   |         *
*    | % of AA:     |    2.8 |    2.8 |    2.8 |    2.3 |    2.3 |         *
*    +--------------+--------+--------+--------+--------+--------+         *
*                                                                          *
*    Percentage of secondary structure predicted:                          *
*    +--------------+--------+--------+--------+                           *
*    | SecStr:      |    H   |    E   |    L   |                           *
*    | % Predicted: |   70.5 |    5.7 |   23.9 |                           *
*    +--------------+--------+--------+--------+                           *
*                                                                          *
*    According to the following classes:                                   *
*       all-alpha:   %H>45 and %E< 5; all-beta : %H<5 and %E>45            *
*       alpha-beta : %H>30 and %E>20; mixed:    rest,                      *
*    this means that the predicted class is:           mixed class         *
*                                                                          *
****************************************************************************
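
The class assignment above follows directly from the two predicted
percentages.  A minimal Python sketch of the stated thresholds (the function
name is illustrative, not part of PHD):

```python
def structural_class(pct_h, pct_e):
    """Assign a structural class from predicted %H and %E.

    Thresholds are taken from the class definitions listed above.
    """
    if pct_h > 45 and pct_e < 5:
        return "all-alpha"
    if pct_h < 5 and pct_e > 45:
        return "all-beta"
    if pct_h > 30 and pct_e > 20:
        return "alpha-beta"
    return "mixed"


# Values from the "Percentage of secondary structure predicted" table.
print(structural_class(70.5, 5.7))  # prints "mixed"
```

The example protein has %H well above 45, but %E = 5.7 is not below 5, so
it falls through to the mixed class.
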
*                                                                          *
*    PHD output for your protein                                           *
*    ~~~~~~~~~~~~~~~~~~~~~~~~~~~                                           *
*                                                                          *
*    Wed Nov 15 05:35:31 1995                                              *
*    Jury on:       10    different architectures (version   5.94_317 ).   *
*    Note: differently trained architectures, i.e., different versions can *
*    result in different predictions.                                      *
*                                                                          *
****************************************************************************
*                                                                          *
*    About the protein                                                     *
*    ~~~~~~~~~~~~~~~~~                                                     *
*                                                                          *
*    HEADER                                                                *
*    COMPND                                                                *
*    SOURCE                                                                *
*    AUTHOR                                                                *
*    SEQLENGTH   176                                                       *
*    NCHAIN        1 chain(s) in ANDR_HUMAN data set                       *
*    NALIGN       16                                                       *
*    (=number of aligned sequences in HSSP file)                           *
*                                                                          *
****************************************************************************
*                                                                          *
*    Abbreviations: PHDsec                                                 *
*    ~~~~~~~~~~~~~~~~~~~~~                                                 *
*                                                                          *
*    sequence:                                                             *
*       AA : amino acid sequence                                           *
*    secondary structure:                                                  *
*       HEL: H=helix, E=extended (sheet), blank=other (loop)               *
*       PHD: Profile network prediction HeiDelberg                         *
*       Rel: Reliability index of prediction (0-9)                         *
*    detail:                                                               *
*       prH: 'probability' for assigning helix                             *
*       prE: 'probability' for assigning strand                            *
*       prL: 'probability' for assigning loop                              *
*            note: the 'probabilities' are scaled to the interval 0-9;     *
*                  e.g., prH=5 means that the first output node is 0.5-0.6 *
*    subset:                                                               *
*       SUB: a subset of the prediction, for all residues with an expected *
*            average accuracy > 82% (tables in header)                     *
*            note: for this subset the following symbols are used:         *
*         L: is loop (for which above " " is used)                         *
*       ".": means that no prediction is made for this residue, as the     *
*            reliability is:  Rel < 5                                      *
*                                                                          *
*    Abbreviations: PHDacc                                                 *
*    ~~~~~~~~~~~~~~~~~~~~~                                                 *
*                                                                          *
*    solvent accessibility:                                                *
*       3st: relative solvent accessibility (acc) in 3 states:             *
*            b = 0-9%, i = 9-36%, e = 36-100%.                             *
*       PHD: Profile network prediction HeiDelberg                         *
*       Rel: Reliability index of prediction (0-9)                         *
*       P_3: predicted relative accessibility in 3 states                  *
*            note: for convenience, a blank is used for intermediate (i).  *
*       10st:relative accessibility in 10 states:                          *
*            a value n corresponds to a relative acc. of n*n %             *
*    subset:                                                               *
*       SUB: a subset of the prediction, for all residues with an expected *
*            average correlation > 0.69 (tables in header)                 *
*            note: for this subset the following symbols are used:         *
*       "I": is intermediate (for which above " " is used)                 *
*       ".": means that no prediction is made for this residue, as the     *
*            reliability is: Rel < 4                                       *
*                                                                          *
*                                                                          *
*    Abbreviations: PHDhtm                                                 *
*    ~~~~~~~~~~~~~~~~~~~~~                                                 *
*                                                                          *
*    secondary structure:                                                  *
*       HL:  T=helical transmembrane region, blank=other (loop)            *
*       PHD: Profile network prediction HeiDelberg                         *
*       PHDF:filtered prediction, i.e., too long transmembrane segments    *
*            are split, too short ones are deleted                         *
*       Rel: Reliability index of prediction (0-9)                         *
*    detail:                                                               *
*       prH: 'probability' for assigning helical transmembrane region      *
*       prL: 'probability' for assigning loop                              *
*            note: the 'probabilities' are scaled to the interval 0-9;     *
*                  e.g., prH=5 means that the first output node is 0.5-0.6 *
*    subset:                                                               *
*       SUB: a subset of the prediction, for all residues with an expected *
*            average accuracy > 82% (tables in header)                     *
*            note: for this subset the following symbols are used:         *
*         L: is loop (for which above " " is used)                         *
*       ".": means that no prediction is made for this residue, as the     *
*            reliability is:  Rel < 5                                      *
*                                                                          *
****************************************************************************
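
The relation between the 10-state and the 3-state accessibility described
under "Abbreviations: PHDacc" can be written out as a short Python sketch.
The function names are illustrative.  How the boundary values are assigned
is partly an assumption: the output shown for this protein maps n=6 (36%)
to "e", and 9% is assumed here to count as intermediate in the same way.

```python
def acc_three_state(digit):
    """Map one 10-state accessibility digit n to b, blank (i) or e."""
    rel_acc = int(digit) ** 2      # n corresponds to n*n % relative acc.
    if rel_acc >= 36:              # e = 36-100%
        return "e"
    if rel_acc >= 9:               # i = 9-36%, printed as a blank (ASSUMED
        return " "                 # that exactly 9% counts as intermediate)
    return "b"                     # b = 0-9%


def p3_line(ten_state):
    """Convert a whole 'PHD acc' line into the matching 'P_3 acc' line."""
    return "".join(acc_three_state(d) for d in ten_state)


# First 20 positions of the 'PHD acc' line of this protein.
print("|" + p3_line("97000006007607705600") + "|")
```

For these 20 positions the result agrees with the corresponding stretch of
the "P_3 acc" line in the output.
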
*                                                                          *
*    protein:       ANDR_HU        length      176                         *
*                                                                          *
                  ....,....1....,....2....,....3....,....4....,....5....,....6
         AA      |.QLVHVVKWAKALPGFRNLHVDDQMAVIQYSWMGLMVFAMGWRSFTNVNSRMLYFAPDLV|
         PHD sec | EHHHHHHHHHH         HHHHHHHHHHHHHHHHHHHHHHHHE     EEEE  HHH|
         Rel sec |911122235753253333451114345489999999997557522125421125321332|
 detail: 
         prH sec |024455566876423322113455566688899999998667644322232321134554|
         prE sec |033433331000000122212001222210000000000101133431013346511112|
         prL sec |931100001123575545663442110100000000001221121246654221244332|
 subset: SUB sec |L.......HHH..L.....L......H.HHHHHHHHHHHHHHH....L.....E......|
 
 ACCESSIBILITY 
 3st:    P_3 acc |eebbbbbebbeebeeb ebbbeeebbbbebbbbbbbbbbbbbebbe bbbebbbbbbebb|
 10st:   PHD acc |970000060076077056000776000060000000000000600750006000000600|
         Rel acc |744318323561033211202331605512436286755431241513012651451124|
 subset: SUB acc |eeb..b...be.............b.bb..b.b.bbbbbb...b.e.....bb.bb...b|
                  ....,....7....,....8....,....9....,....10...,....11...,....12
         AA      |FNEYRMH.KSRMYSQCVRMRHLSQEFGWLQITPQEFLCMKALLLFSI.......IPVDGL|
         PHD sec |H       HHHHHHHHHHHHHHHHHHHHHH  HHHHHHHHHHHHHH    EEEE  HHH |
         Rel sec |131225425889999999998668875522123676787555455257521122151314|
 detail: 
         prH sec |434332236888999999988778876644335677787677676421222321115542|
         prE sec |100010110000000000000000001122220112101222322210123444310000|
         prL sec |355556642110000000001121011222343100000000000267643233464346|
 subset: SUB sec |.....L..HHHHHHHHHHHHHHHHHHHH.....HHHHHHHHH.HH.LLL......L....|
 
 ACCESSIBILITY 
 3st:    P_3 acc |beeeebebbbbbbbbbbbbbbbbeebbebebebeebbbbebbbbbbeebbbbbbbbbeeb|
 10st:   PHD acc |077860700000000000000006600606060760000600000077000000002770|
         Rel acc |244322550026006702511321142231220513455177665033423652310630|
 subset: SUB acc |.ee...eb...b..bb..b......b.......e..bbb.bbbbb...b..bb....e..|
                  ....,....13...,....14...,....15...,....16...,....17...,....18
         AA      |KNQKFFDELRMNYIKELDRIIACKRKNPTSCSRRFYQLTKLLDSVQPIARELHQFT|
         PHD sec |  HHHHHHHHHHHHHHHHHHHHHHH    HHHHHHHHHHHHHHHHHHHHHHHH   |
         Rel sec |61348999999999999999987514544468999999999999999999985159|
 detail: 
         prH sec |24668999999999999999987643233678999999999999999999986420|
         prE sec |00000000000000000000000000000000000000000000000000000000|
         prL sec |75331000000000000000001246766321000000000000000000012469|
 subset: SUB sec |L...HHHHHHHHHHHHHHHHHHHH..L...HHHHHHHHHHHHHHHHHHHHHHH.LL|
 
 ACCESSIBILITY 
 3st:    P_3 acc |eeeebbeebbeebbeebbebbbeeeeebeeebeebbebbebbebb ebbeebbe b|
 10st:   PHD acc |77670077006700670070007677706660660060060070057007700750|
         Rel acc |63140345501424236454534133401111124026315230413454441400|
 subset: SUB acc |e..e..eeb..e.b..bbebb.e...e.......b..b..b...b..bbeeb.e..|
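
The SUB lines can be reproduced from the prediction and reliability lines
using the cut-offs given in the abbreviations.  The following Python sketch
is an illustration only; the function name and parameters are hypothetical,
and the strings are the first 20 positions of the first block above.

```python
def sub_line(pred, rel, min_rel, blank_symbol):
    """Build a SUB line from a prediction line and its reliability line.

    pred         : predicted states, with a blank for loop / intermediate
    rel          : reliability digits (0-9), same length as pred
    min_rel      : residues with Rel below this value are shown as "."
    blank_symbol : letter used in SUB where pred has a blank
    """
    out = []
    for state, r in zip(pred, rel):
        if int(r) < min_rel:
            out.append(".")
        elif state == " ":
            out.append(blank_symbol)
        else:
            out.append(state)
    return "".join(out)


# PHDsec: no prediction for Rel < 5, loop written as "L".
sec = sub_line(" EHHHHHHHHHH        ", "91112223575325333345", 5, "L")
# PHDacc: no prediction for Rel < 4, intermediate written as "I".
acc = sub_line("eebbbbbebbeebeeb ebb", "74431832356103321120", 4, "I")
print(sec)
print(acc)
```

Both results agree with the first 20 positions of the "SUB sec" and
"SUB acc" lines printed for this protein.
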