Topology prediction for helical transmembrane proteins at 86% accuracy

Rost, Burkhard; Fariselli, Piero & Casadio, Rita

Protein Science, 1996, 5, 1704-1718.


Previously, we introduced a neural network system predicting locations of transmembrane helices based on evolutionary profiles (PHDhtm; Rost et al., 1995). Here, we describe an improvement and an extension of that system. The improvement is achieved by a dynamic programming-like algorithm that optimises helices compatible with the neural network output. The extension is the prediction of topology (orientation of the first loop region with respect to the membrane), obtained by applying to the refined prediction of all transmembrane helices the observation that positively charged residues are more abundant in extra-cytoplasmic regions. Furthermore, we introduce a method to reduce the number of false positives, i.e., proteins falsely predicted to contain membrane helices. The evaluation of prediction accuracy is based on a cross-validation set and a double-blind test set (131 proteins in total). The final method appears to be more accurate than previously published methods. (1) For almost 89% (+/- 3%) of the test proteins, all transmembrane helices are predicted correctly. (2) For more than 86% (+/- 3%) of the proteins, topology is predicted correctly. (3) We define reliability indices that correlate with prediction accuracy: for one half of the proteins, segment accuracy rises to 98%; and for two-thirds, accuracy of topology prediction is 95%. (4) The rate of proteins for which transmembrane helices are predicted falsely is below 2% (+/- 2%). Finally, the method is applied to 1616 sequences of Haemophilus influenzae. We predict 19% of the genome sequences to contain one or more transmembrane helices. This appears to be lower than what we predicted previously for yeast chromosome VIII (about 25%).
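
The abstract does not spell out the dynamic programming-like refinement step, so the following is only a minimal sketch of the general idea, not the PHDhtm algorithm itself. It assumes the network emits one helix score per residue in [0, 1], that a helix must span between min_len and max_len residues, and that a set of helices is "compatible" with the output when it maximises the summed score above a threshold. The function name refine_helices, the length limits (18 and 25) and the threshold (0.5) are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple


def refine_helices(
    scores: List[float],
    min_len: int = 18,      # assumed minimum helix length (hypothetical)
    max_len: int = 25,      # assumed maximum helix length (hypothetical)
    threshold: float = 0.5, # assumed helix/loop decision threshold (hypothetical)
) -> List[Tuple[int, int]]:
    """Return non-overlapping helices as half-open (start, end) index pairs
    that maximise the sum of (score - threshold) over helix residues."""
    n = len(scores)

    # prefix[i] = sum of (score - threshold) over residues 0 .. i-1
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s - threshold)

    # best[i]: best total achievable using only residues 0 .. i-1
    # back[i]: start of a helix ending at i, or None if residue i-1 is loop
    best = [0.0] * (n + 1)
    back: List[Optional[int]] = [None] * (n + 1)

    for i in range(1, n + 1):
        best[i] = best[i - 1]  # option 1: residue i-1 is not in a helix
        for length in range(min_len, min(max_len, i) + 1):
            start = i - length
            cand = best[start] + prefix[i] - prefix[start]
            if cand > best[i]:  # option 2: a helix covers start .. i-1
                best[i] = cand
                back[i] = start

    # Trace back from the end of the sequence to recover the chosen helices.
    helices: List[Tuple[int, int]] = []
    i = n
    while i > 0:
        start = back[i]
        if start is None:
            i -= 1
        else:
            helices.append((start, i))
            i = start
    helices.reverse()
    return helices


if __name__ == "__main__":
    # Toy profile: one clear 20-residue helix flanked by loop residues.
    toy = [0.25] * 10 + [0.75] * 20 + [0.25] * 10
    print(refine_helices(toy))
```

In this sketch two helices may touch without an intervening loop, so a long high-scoring stretch can be split into two adjacent helices; a real implementation would likely also enforce a minimum loop length.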