
PP Help 01: Introduction

Contents

WHAT IS IT? HOW TO USE IT?

What is PredictProtein (PP)?

PP is an automatic service for protein database searches and the prediction of aspects of protein structure. You send an amino acid sequence and PP returns:

a multiple sequence alignment (i.e. database search),
ProSite sequence motifs (more info),
low-complexity regions (SEG) (more info),
ProDom domain assignments (more info),
nuclear localisation signals (more info),
and predictions of
1. secondary structure (more info),
2. solvent accessibility (more info),
3. globular regions (more info),
4. transmembrane helices (more info),
5. coiled-coil regions (more info),
6. structural switch regions (more info).

The following features are available upon request:

fold recognition by prediction-based threading (more info):
PDB is searched for possible remote homologues (sequence identity 0-25%) to your sequence,
evaluation of prediction accuracy (more info):
for a given predicted and observed secondary structure (for one or several proteins), per-residue and per-segment scores are compiled.
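
To make the 0-25% sequence identity range quoted for remote homologues concrete, here is a minimal sketch of how pairwise identity can be computed from two already aligned sequences. The function names, the gap character, and the choice to count only columns without gaps are assumptions made for illustration; the text does not say how PP itself computes identity.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap (illustrative convention)
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def is_remote_candidate(aligned_a: str, aligned_b: str, threshold: float = 25.0) -> bool:
    """True if the pair falls below the identity threshold (25% as quoted in the text)."""
    return percent_identity(aligned_a, aligned_b) < threshold
```

Other conventions (for example, dividing by the length of the shorter sequence) give different numbers, so the value is only comparable between tools that use the same definition.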

For all services, you can submit your sequence (or prediction) either by electronic mail or interactively via the World Wide Web.

How does PredictProtein work?

Generating an alignment. The following steps are performed.

the sequence database (currently only SWISSPROT) is scanned for similar sequences (by BLASTP),
a multiple sequence alignment is generated by a weighted dynamic programming method (by MaxHom),
ProSite motifs are retrieved from the ProSite database,
low-complexity regions (e.g. composition bias) are marked by the program SEG,
and your protein is compared to a domain database (ProDom).
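
To illustrate what "low-complexity" (composition bias) means in the step above, the following sketch marks windows whose amino acid composition has low Shannon entropy. This is a simplified stand-in and not the SEG algorithm itself; the function names, the window length, the entropy threshold, and the use of "x" as the masking character are illustrative assumptions.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mark_low_complexity(seq: str, window: int = 12, max_entropy: float = 2.2) -> str:
    """Replace residues in every window at or below max_entropy with 'x'."""
    seq = seq.upper()
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) <= max_entropy:
            for i in range(start, start + window):
                masked[i] = True
    return "".join("x" if m else r for r, m in zip(seq, masked))
```

A run of a single residue has entropy 0 and is always masked, whereas a window of twelve different residues has entropy log2(12), roughly 3.58 bits, and is left untouched.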

Prediction of protein structure in 1D. The multiple alignment is used as input for profile-based neural network predictions (PHD methods). The following levels of prediction accuracy have been evaluated in cross-validation experiments:

Secondary structure prediction (PHDsec or PROFsec):
expected three-state (helix, strand, rest) overall accuracy >72% (PHD) and >76% (PROF) for water-soluble globular proteins. For an automatic, continuous comparison of prediction accuracy with other programs, see EVA.
You may find details about accuracy in graphs, in tables, and in the literature: Rost 1997 (paper) and 1996 (paper); Rost & Sander 1993 (abstract) and 1994 (abstract).
Solvent accessibility prediction (PHDacc or PROFacc):
expected correlation between observed and predicted relative accessibility > 0.5.
You may find details about accuracy in graphs, in tables, and in the literature: Rost 1997 (paper) and 1996 (paper); Rost & Sander 1994 (abstract).
Transmembrane helix prediction (PHDhtm):
expected overall two-state accuracy (transmembrane, non-transmembrane) > 95%. For the refined prediction of transmembrane helices and topology: expected likelihood of predicting all helices correctly about 89%; expected accuracy of topology prediction > 86%.
You may find details about accuracy in tables, and in the literature: Rost, Casadio & Fariselli 1996 (abstract), and Rost, Casadio, Fariselli & Sander 1995 (abstract).
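
The accessibility figure above is a correlation between observed and predicted relative accessibility. As a sketch of what such a number measures, the following computes the Pearson correlation coefficient over per-residue values. That Pearson's coefficient is the intended measure is an assumption (the text only says "correlation"), and the function name and input layout are hypothetical.

```python
import math


def pearson(observed, predicted):
    """Pearson correlation between two equally long lists of per-residue values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)
```

A value of 1 means predicted accessibility rises and falls exactly with the observed values, 0 means no linear relationship, and the quoted expectation of > 0.5 lies between the two.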

Fold recognition by prediction-based threading. Predictions of secondary structure and accessibility are aligned against PDB to detect remote homologues (prediction-based threading). As with other threading methods, results should be treated with caution.

On average, the first hit of the prediction-based threading is correct in 30% of the cases.
Hits with z-scores above 3.0 are more reliable (accuracy > 60%).
In exceptional cases, the resulting alignments suffice for building correct homology-based models.

You may find details about accuracy in the literature: Rost, Schneider & Sander, 1996 (paper), Rost 1995 (abstract) and 1994 (abstract).
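
The z-score cut-off of 3.0 mentioned above can be applied to a hit list as sketched below. The text does not state how PP derives its z-scores; this sketch assumes the common definition (raw score minus the mean of all scores, divided by their standard deviation), and the function names and the dictionary of raw scores are hypothetical.

```python
import statistics


def z_scores(raw_scores):
    """Z-score of each raw score relative to the whole list (population std. dev.)."""
    if not raw_scores:
        return []
    mean = statistics.fmean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    if sd == 0:
        return [0.0] * len(raw_scores)  # all scores identical: nothing stands out
    return [(s - mean) / sd for s in raw_scores]


def reliable_hits(hits, min_z=3.0):
    """Return (name, z) pairs whose z-score exceeds min_z; hits maps name -> raw score."""
    names = list(hits)
    zs = z_scores([hits[name] for name in names])
    return [(name, z) for name, z in zip(names, zs) if z > min_z]
```

With this definition a hit only reaches z > 3.0 when it clearly stands out from a reasonably large background of other scores, which is consistent with treating lower-scoring first hits with caution.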

Evaluation of prediction accuracy. If you opt for 'evaluate prediction accuracy', we evaluate the accuracy of the secondary structure prediction provided by you. The following per-residue and per-segment scores are returned: overall three-state accuracy, single state accuracy, correlation coefficients, information entropy, fractional segment overlap, and finally the accuracy of predicting secondary structure content and structural class (Rost et al., JMB, 1994, 235, 13-26, example for output).
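
Three of the per-residue scores listed above can be sketched in a few lines. The one-letter state symbols (H for helix, E for strand, L for rest) and the function names are assumptions made for this illustration; segment overlap, information entropy, and the content and class scores are not covered here.

```python
import math


def _check(observed: str, predicted: str) -> None:
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equally long")


def q3(observed: str, predicted: str) -> float:
    """Overall three-state accuracy: percentage of residues predicted correctly."""
    _check(observed, predicted)
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def per_state_accuracy(observed: str, predicted: str, state: str):
    """Percentage of residues observed in `state` that are predicted in `state`."""
    _check(observed, predicted)
    n_obs = sum(o == state for o in observed)
    if n_obs == 0:
        return None  # state does not occur in the observation
    hits = sum(o == state and p == state for o, p in zip(observed, predicted))
    return 100.0 * hits / n_obs


def matthews(observed: str, predicted: str, state: str) -> float:
    """Matthews correlation coefficient for one state versus all others."""
    _check(observed, predicted)
    pairs = list(zip(observed, predicted))
    tp = sum(o == state and p == state for o, p in pairs)
    tn = sum(o != state and p != state for o, p in pairs)
    fp = sum(o != state and p == state for o, p in pairs)
    fn = sum(o == state and p != state for o, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, with the observation `HHHHEEEELL` and the prediction `HHHLEEELLL`, eight of ten residues agree, so `q3` returns 80.0, and three of the four observed helix residues are predicted as helix, so the helix accuracy is 75.0.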

What is META-PP?

META-PP provides a single-page interface to various World Wide Web services for sequence analysis (list of servers available at the moment). 'Single-page interface' means that you fill in your sequence only once and can then select any number of services from a list. For each selected service, you will receive the results by email. Currently, the following features of sequence analysis are covered by META-PP:

signal peptides
cleavage sites
O-glycosylation sites
cleavage sites of picornaviral proteases
chloroplast transit peptides and cleavage sites
secondary structure prediction
membrane helix prediction
threading, or remote homology modelling (searching for proteins of known 3D structure that appear structurally similar to your protein)
database searches
homology modelling (prediction of protein 3D structure by homology to a sequence-similar protein of known structure). NOTE: this will only work if there is a protein of known structure that has sufficient sequence similarity to your protein!

How to use PP and META-PP?

license

Using email (internet, not for META):
1. Prepare a file with your sequence(s) according to the required format (see below), and:
  Send sequence(s) to: PredictProtein@columbia.edu
2. Send questions to: predict_help@columbia.edu
3. Send problem reports to: pp_admin@columbia.edu
Using World Wide Web (WWW):
1. Home page: http://cubic.bioc.columbia.edu/predictprotein/predictprotein.html
2. Help page (this): http://cubic.bioc.columbia.edu/predictprotein/doc/help_entry.html
3. Submit request to PP:
  default: http://cubic.bioc.columbia.edu/predictprotein/submit_def.html
  advanced: http://cubic.bioc.columbia.edu/predictprotein/submit_adv.html
  expert: http://cubic.bioc.columbia.edu/predictprotein/submit_exp.html
4. Submit request to META-PP:
  default: http://cubic.bioc.columbia.edu/predictprotein/submit_meta.html
5. Questions, feedback: http://cubic.bioc.columbia.edu/predictprotein/send_feedback.html
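
The email route described above could be scripted roughly as follows. This is only a sketch: the request body must follow the required PP format, which is not reproduced here, so it is passed in as an opaque string; the subject line and the function name are hypothetical, and only the recipient address is taken from the text.

```python
from email.message import EmailMessage

PP_ADDRESS = "PredictProtein@columbia.edu"  # submission address given in the text


def build_request(sender: str, request_body: str) -> EmailMessage:
    """Wrap an already formatted PP request in a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = PP_ADDRESS
    msg["Subject"] = "PredictProtein request"  # hypothetical; the text names no subject
    msg.set_content(request_body)  # body must already be in the required PP format
    return msg
```

The resulting message can be handed to any mail client or to `smtplib.SMTP(...).send_message(msg)` using your own mail server.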

What can we do for you?

You have a protein sequence and want to find out anything we can say about structure?
In general, we can provide multiple sequence alignments and predictions of secondary structure, residue solvent accessibility and the location of transmembrane helices (examples for: request; and output).

You have a helical transmembrane protein sequence and want a refined prediction of the helix locations and topology?
We provide multiple sequence alignments and refined predictions for the location of transmembrane helices and for the topology, i.e. the orientation of the N-term with respect to the membrane (examples for: request; and output).

You have a protein sequence and want to search for remote homologues (i.e., homologues with <25% sequence identity)?
We find secondary structure and accessibility motifs similar between a known structure and your protein by prediction-based threading (examples for: request; and output).

You have a multiple sequence alignment and want to obtain a prediction of 1D structure based on that alignment?
We use your alignment as input to the methods predicting secondary structure, solvent accessibility and transmembrane helices (examples for: request; and output).

You have a list of sequences not in current databases and want them to be used for 1D predictions?
We align your sequences and use the resulting alignment as input to predictions of secondary structure, accessibility and transmembrane helices (examples for: request; and output).

You have a prediction of secondary structure and accessibility and want to search for similar motifs in known structures?
We base the threading procedure on your prediction (examples for: request; and output).

You have a prediction and an observation of secondary structure and want to evaluate the prediction accuracy?
We compile per-residue and per-segment scores for the evaluation of prediction accuracy (examples for: request; and output).

Responsible for PP

Burkhard Rost, CUBIC, Biochemistry, Columbia University, New York
