TOPITS: Threading One-dimensional Predictions Into Three-dimensional Structures

Rost, Burkhard

In: Rawlings, C.; Clark, D.; Altman, R.; Hunter, L.; Lengauer, T.; & Wodak, S. (eds.) "The Third International Conference on Intelligent Systems for Molecular Biology (ISMB)", Cambridge, U.K., Jul 16-19, 1995, Menlo Park, CA: AAAI Press, 314-321.


Homology modelling is currently the only theoretical tool that can successfully predict protein 3D structure. Because 3D structure is well conserved within sequence families, homology modelling makes it possible to predict 3D structure for 20% of the SWISSPROT proteins. 20% of the proteins in PDB are remote homologues of another PDB protein, i.e. the structures are homologous but the pairwise sequence identity is not significant. Threading techniques attempt to detect such remote homologues from sequence information, thereby extending the scope of homology modelling. Here, a new threading method is presented. First, for a list of PDB proteins, 3D structure was projected onto 1D strings of secondary structure and relative solvent accessibility. Then, secondary structure and solvent accessibility were predicted for a search sequence by neural network systems (PHD). Finally, the predicted and observed 1D strings were aligned by dynamic programming, and the resulting alignment was used to detect remote 3D homologues. Four results stand out. First, even for an optimal prediction of 1D strings (taken from PDB), only about half of the hits that ranked above a given threshold were correctly identified as remote homologues, and only about 25% of the first hits were correct. Second, real predictions (PHD) were not much worse: about 20% of the first hits were correct. Third, a simple filtering procedure improved prediction performance to about 30% correct first hits; with this filter, the correct hit ranked among the first three in more than 23 out of 46 cases. Fourth, combining the 1D threading with sequence alignments markedly improved the performance of the threading method TOPITS for some selected cases.

Keywords: protein structure prediction, threading, remote homologues, secondary structure, solvent accessibility, multiple alignments, dynamic programming, neural networks
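The core idea of 1D threading (align a predicted 1D string of secondary structure and accessibility against observed 1D strings of known structures by dynamic programming, then rank the templates by score) can be sketched as below. This is a minimal illustration under stated assumptions, not TOPITS itself: the three-state secondary structure alphabet combined with a two-state (buried/exposed) accessibility, the substitution scores in the hypothetical `score_pair`, the linear gap penalty, and the use of a Smith-Waterman-style local alignment are all choices made only for this example, and every function name is hypothetical.

```python
from typing import Dict, List

# ASSUMPTION: each residue is a 2-character symbol, secondary structure
# (H helix, E strand, L loop) followed by accessibility (b buried, e exposed).


def to_symbols(sec: str, acc: str) -> List[str]:
    """Combine a secondary-structure string and an accessibility string."""
    if len(sec) != len(acc):
        raise ValueError("strings must have equal length")
    return [s + a for s, a in zip(sec, acc)]


def score_pair(a: str, b: str) -> float:
    """Hypothetical substitution score for two 1D symbols, e.g. 'Hb' vs 'Le'."""
    s = 2.0 if a[0] == b[0] else -1.0   # reward secondary-structure agreement
    s += 1.0 if a[1] == b[1] else -0.5  # reward accessibility agreement
    return s


def local_align_score(query: List[str], template: List[str], gap: float = -2.0) -> float:
    """Best local alignment score (Smith-Waterman style, linear gap penalty)."""
    m = len(template)
    prev = [0.0] * (m + 1)  # DP row for the previous query position
    best = 0.0
    for i in range(1, len(query) + 1):
        cur = [0.0] * (m + 1)
        for j in range(1, m + 1):
            cur[j] = max(
                0.0,  # start a new local alignment
                prev[j - 1] + score_pair(query[i - 1], template[j - 1]),  # align pair
                prev[j] + gap,     # gap in the template
                cur[j - 1] + gap,  # gap in the query
            )
            best = max(best, cur[j])
        prev = cur
    return best


def rank_templates(query: List[str], library: Dict[str, List[str]]) -> List[str]:
    """Return template names ordered by descending alignment score."""
    scored = [(local_align_score(query, tmpl), name) for name, tmpl in library.items()]
    scored.sort(key=lambda t: -t[0])
    return [name for _, name in scored]
```

For instance, with `query = to_symbols("LHHHHLLEEEL", "ebbeebeebbe")`, a template carrying the same 1D string ranks above an all-strand, all-exposed template. In the real method the query string would come from a predictor such as PHD and the template strings from known PDB structures; here both are made up for illustration.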