****************************************************************************
*                                                                          *
*   PredictProtein@EMBL-Heidelberg.DE                                      *
*   Prediction of helical transmembrane regions by PHDhtm                  *
*                                                                          *
*   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                  *
*   Refined prediction of the location and topology for                    *
*   transmembrane helices by PHDtopology                                   *
*   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                  *
*                                                                          *
*   Author:   Burkhard Rost                                                *
*             EMBL, Heidelberg, FRG                                        *
*             Meyerhofstrasse 1, 69 117 Heidelberg                        *
*             Internet: Rost@EMBL-Heidelberg.DE                            *
*                                                                          *
*   All rights reserved.                                                   *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*   Please quote                                                           *
*   ~~~~~~~~~~~~                                                           *
*                                                                          *
*   The PredictProtein mail server is described in:                        *
*      B Rost: PHD: predicting one-dimensional protein structure by        *
*      profile-based neural networks. Meth. in Enzym., 1996, 266, 525-539. *
*                                                                          *
*   For publications using PHDtopology output, please also quote:          *
*      B Rost, R Casadio & P Fariselli: Refining neural network            *
*      predictions for helical transmembrane proteins by dynamic           *
*      programming. In: D States et al. (eds.) "The fourth international   *
*      conference on Intelligent Systems for Molecular Biology (ISMB)",    *
*      St. Louis, U.S.A., Jun 1996, Menlo Park, CA: AAAI Press, in press.  *
*                                                                          *
*   A more thorough evaluation of PHDtopology can be found in:             *
*      B Rost, P Fariselli & R Casadio: Topology prediction for helical    *
*      transmembrane proteins at 86% accuracy. Preprint, EMBL, 69012       *
*      Germany, PDG-03/96, 1996.                                           *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*   Definition of topology                                                 *
*   ~~~~~~~~~~~~~~~~~~~~~~                                                 *
*                                                                          *
*   The topology of integral membrane proteins with transmembrane helices  *
*   describes the orientation of the helices with respect to the membrane: *
*     OUT: first residue (N-term) starting extra-cytoplasmic, i.e. outside *
*          of the membrane                                                 *
*     IN:  first residue starting intra-cytoplasmic, i.e. inside.          *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*   Estimated Accuracy of Prediction                                       *
*   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*   The method was evaluated on 131 helical transmembrane proteins in      *
*   cross-validation experiments, i.e., such that no protein used for      *
*   setting up the method had more than 25% sequence identity to any       *
*   protein used for deriving the estimates for performance accuracy.      *
*   For all integral membrane proteins used in the evaluation, the         *
*   helix locations and the topology were known from experiment.           *
*                                                                          *
*   Results of the test on 131 proteins:                                   *
*                                                                          *
*   +----------+------------------------------------------------------+    *
*   |   539    | number of transmembrane helices (HTM's) observed     |    *
*   |   552    | number of HTM's predicted                            |    *
*   |   533    | number of HTM's predicted correctly, i.e. with an    |    *
*   |          | overlap of more than 3 residues with observed HTM's  |    *
*   +----------+------------------------------------------------------+    *
*   |   99%    | percentage of residues correctly predicted/observed  |    *
*   |   97%    | percentage of residues correctly predicted/predicted |    *
*   +----------+------------------------------------------------------+    *
*                                                                          *
*   ++========++------------------------------------------------------+    *
*   ||  89%   || percentage of proteins for which all HTM's were      |    *
*   ||        || predicted correctly                                  |    *
*   ++========++------------------------------------------------------+    *
*   ||  86%   || percentage of proteins with correctly predicted      |    *
*   ||        || topology                                             |    *
*   ++========++------------------------------------------------------+    *
*                                                                          *
*   Note: The estimates for correctly predicting all HTM's (89%) and for   *
*         correctly predicting topology (86%) have an expected error of    *
*         about 6% (two standard deviations of a binomial distribution).  *
*         In other words, for your protein the chance that all HTM's are   *
*         predicted correctly is 83%-95%, and the chance that the topology *
*         is predicted correctly is 81%-91%.                               *
*                                                                          *
*..........................................................................*
*                                                                          *
*   Eukaryotes:                                                            *
*   The expected accuracy is higher than average for eukaryotic proteins:  *
*      94% correct prediction of all HTM's,                                *
*      90% correct prediction of topology.                                 *
*                                                                          *
*   Prokaryotes:                                                           *
*   The expected accuracy is lower than average for prokaryotic proteins:  *
*      76% correct prediction of all HTM's,                                *
*      73% correct prediction of topology.                                 *
*                                                                          *
*   Viral proteins:                                                        *
*   We evaluated PHDtopology on only five viral proteins. For all five,    *
*   prediction accuracy was 100%.                                          *
*                                                                          *
*   Note: The estimates for prokaryotes are based on fewer proteins, thus  *
*         the estimated error is 18% (two standard deviations).            *
*         The result for the five viral proteins can, at best, be seen as  *
*         a trend, as five proteins are much too few for deriving general  *
*         estimates for prediction accuracy.                               *
*                                                                          *
*..........................................................................*
*                                                                          *
*   Average length of transmembrane helices:                               *
*                                                                          *
*               +------------+----------+                                  *
*               | predicted  | observed |                                  *
*   +-----------+------------+----------+                                  *
*   | Lhelix =  |    20.5    |   22.3   |                                  *
*   +-----------+------------+----------+                                  *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*   Protein-specific reliability indices                                   *
*   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                   *
*                                                                          *
*   We empirically defined two reliability indices: one for the prediction *
*   of all helices and one for the prediction of topology. Both indices    *
*   are normalised to integer values between 0 (low) and 9 (high). The     *
*   following results are based on 131 proteins.                           *
*                                                                          *
*   Reliability of predicting all HTM's correctly:                         *
*                                                                          *
*   +-----------+------+-----+-----+-----+-----+-----+-----+-----+-----+   *
*   | Ri(model) |    0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   9 |   *
*   |           |      |     |     |     |     |     |     |     |     |   *
*   | Nprot     |  131 | 117 |  83 |  66 |  56 |  40 |  25 |  17 |   9 |   *
*   | Ncorr     |  117 | 108 |  79 |  65 |  55 |  39 |  25 |  17 |   9 |   *
*   |           |      |     |     |     |     |     |     |     |     |   *
*   | %prot     |  100 |  89 |  63 |  50 |  42 |  30 |  19 |  12 |   6 |   *
*   | %correct  |   89 |  92 |  95 |  98 |  98 |  97 | 100 | 100 | 100 |   *
*   +-----------+------+-----+-----+-----+-----+-----+-----+-----+-----+   *
*                                                                          *
*   Reliability of correctly predicting topology:                          *
*                                                                          *
*   +-----------+------+-----+-----+-----+-----+-----+-----+-----+-----+   *
*   | Ri(top)   |    0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   9 |   *
*   |           |      |     |     |     |     |     |     |     |     |   *
*   | Nprot     |  131 | 124 | 109 |  97 |  83 |  52 |  21 |  12 |   5 |   *
*   | Ncorr     |  113 | 110 |  99 |  89 |  79 |  49 |  19 |  12 |   5 |   *
*   |           |      |     |     |     |     |     |     |     |     |   *
*   | %prot     |  100 |  94 |  83 |  74 |  63 |  39 |  16 |   9 |   3 |   *
*   | %correct  |   86 |  88 |  90 |  91 |  95 |  94 |  90 | 100 | 100 |   *
*   +-----------+------+-----+-----+-----+-----+-----+-----+-----+-----+   *
*                                                                          *
*   Abbreviations:                                                         *
*   Nprot     cumulative number of proteins predicted at a reliability     *
*             index greater than or equal to n, with n = 0, ..., 9.        *
*   Ncorr     cumulative number of proteins predicted correctly at a       *
*             reliability index greater than or equal to n, with           *
*             n = 0, ..., 9.                                               *
*   %prot     = 100*(Nprot/131), i.e. percentage of proteins predicted.    *
*   %correct  = 100*(Ncorr/Nprot), i.e. percentage of the Nprot proteins   *
*             predicted correctly.                                         *
*                                                                          *
*   The tables above give cumulative results, e.g. 50% of all proteins are *
*   predicted at a reliability index Ri(model) >= 3; for 98% of these,     *
*   all transmembrane helices are predicted correctly. Similarly, 63% of   *
*   the proteins were predicted with an index Ri(top) >= 4; for 95% of     *
*   these, the prediction for topology was correct.                        *
*                                                                          *
*   Ri(model) and Ri(top) are linked in the following sense: in our test   *
*   analysis, proteins for which the topology prediction was wrong despite *
*   a relatively high reliability index (Ri(top) > 3) were, in almost all  *
*   cases, predicted with the wrong number of HTM's.                       *
*                                                                          *
*                                                                          *
****************************************************************************
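
The percentage rows of the two reliability tables and the quoted 6% error can be re-derived from the counts given in this header. The following is a minimal sketch, not part of PHDhtm or PHDtopology: the function and variable names are hypothetical, the %correct row is computed as Ncorr relative to Nprot (the reading that reproduces the tabulated values), and the integer truncation is an assumption inferred from those values.

```python
import math

N_TOTAL = 131  # number of test proteins stated in the header

# Cumulative counts copied from the two reliability tables.
NPROT_MODEL = [131, 117, 83, 66, 56, 40, 25, 17, 9]
NCORR_MODEL = [117, 108, 79, 65, 55, 39, 25, 17, 9]
NPROT_TOP = [131, 124, 109, 97, 83, 52, 21, 12, 5]
NCORR_TOP = [113, 110, 99, 89, 79, 49, 19, 12, 5]


def cumulative_percentages(nprot, ncorr, n_total=N_TOTAL):
    """Return the (%prot, %correct) rows as whole percentages.

    Assumption: values are truncated (not rounded), which is what
    matches the numbers printed in the tables.
    """
    pct_prot = [100 * n // n_total for n in nprot]
    pct_correct = [100 * c // n for c, n in zip(ncorr, nprot)]
    return pct_prot, pct_correct


def two_sigma_error(p, n):
    """Two standard deviations of a binomial proportion p observed on n cases."""
    return 2.0 * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    print(cumulative_percentages(NPROT_MODEL, NCORR_MODEL))
    print(cumulative_percentages(NPROT_TOP, NCORR_TOP))
    print(round(100 * two_sigma_error(0.89, N_TOTAL), 1))  # all HTM's correct
    print(round(100 * two_sigma_error(0.86, N_TOTAL), 1))  # topology correct
```

With these inputs, the two-sigma values come out between 5% and 7%, in line with the "6%" quoted in the accuracy note.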