Brief explanation of PredictProtein options

Default programs and thresholds run by PredictProtein

  1. Blast: fast database search (cite)
    output not appended if not explicitly requested (see advanced options)
  2. Maxhom: dynamic programming based multiple sequence alignment (cite)
  3. ProSite: scanning for functional motifs annotated by experts (cite)
    reported only if hit found
  4. SEG: detection of composition-biased regions (cite)
    reported only if more than 10 residues of low-complexity found
  5. ProDom: scanning for the putative domain structure for your protein (cite)
    reported only if hit found
  6. Coils: prediction of coiled-coil regions (cite)
    reported only if hit found
  7. GLOBE: prediction of globular regions
  8. CYSPRED: prediction of bound cysteines (cite)
    reported only if hit found
  9. PHDsec: prediction of secondary structure (cite)
  10. PHDacc: prediction of solvent accessibility (cite)
  11. PHDhtm: prediction of transmembrane helices and their topology (cite)
    reported only if hit found

    NOTE: by default, the threshold for what is considered to be a membrane helix is rather restrictive. This has two consequences:
    1. almost no false positives (proteins predicted to contain membrane helices that actually do NOT contain any),
    2. some membrane proteins may be missed.
    If you want to check with higher sensitivity whether your protein is likely to contain membrane helices, please use the advanced prediction option:
    'transmembrane helices (PHDhtm)'



Methods available upon request

  1. TOPITS: prediction-based threading (detection of remote homologies) (cite)
  2. EvalSec: evaluation of secondary structure prediction accuracy (cite)

Default submission form

  1. Your email Example: rost@cubic.bioc.columbia.edu
    Your complete (and entirely correct) email address (e.g. rost@cubic.bioc.columbia.edu).
    Note: if the address contains a typo, we shall not be able to return the results.
  2. Password Example: (i.e. leave field empty!)
    Using PredictProtein is free for academic users. Only companies have to fill in their password here!
  3. Return results in HTML example
    The email you get will have the entire results attached in one HTML formatted file (which you can load into any WWW browser). Alignments and ProDom results are displayed using the program MView developed by Nigel Brown (MRC, Mill Hill, London; http://mathbio.nimr.mrc.ac.uk/~nbrown/).
    NOTE: with the option 'HTML for printouts', the file is formatted so that you can print it directly (the default is viewable in WWW browsers, but not printable, since there are too many characters per line!).
  4. Store results here, return no mail output
    We shall not return the results by mail. Instead, the results for your request will be stored on our machines for 3 days, and you will receive a mail that simply tells you how to fetch the results from here by ftp. The reason for including this option is that some requests may result in very large output files, which may be difficult for your local mail system to handle (in particular when you request HTML formatted output).
  5. One-line name of protein Example: Cytochrome C oxidase
  6. Paste, or type your sequence
    Example:
    	MSAQISDSIEEKRGFFTRWFMSTNHKDIGVLYLFTAGLAGLISVTLTVYMRMELQHPGVQ
    	YMCLEGMRLVADAAAECTPNAHL
    Please use only one-letter code amino acids. In particular, avoid numbers or '*', or '.'.
  7. SUBMIT or CLEAR
    Click on the button SUBMIT to request a prediction
    Click on the button CLEAR to clear all data you filled in (e.g. to restart, or to send a new request).
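
If you prepare sequences with a script before pasting them, the rule in item 6 (one-letter amino acid code only; no numbers, '*', or '.') can be checked up front. The following is a minimal Python sketch, not part of PredictProtein: the function name clean_sequence is hypothetical, and it assumes that only the 20 standard residues are acceptable, which the text above does not state explicitly.

```python
import re

# Assumption: only the 20 standard amino acids are accepted. Whether the
# server also takes ambiguity codes such as X, B, or Z is not stated.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Strip whitespace and upper-case the input; reject anything else."""
    seq = re.sub(r"\s+", "", raw).upper()
    bad = sorted({c for c in seq if c not in STANDARD_AA})
    if bad:
        raise ValueError(f"unexpected characters: {''.join(bad)}")
    return seq


# Line breaks and spaces from copy-and-paste are removed:
print(clean_sequence("MSAQISDSIE EKRGFFTRWF\nYMCLEGMRLV"))
```

Numbers, '*', and '.' all raise a ValueError here, so a pasted sequence with position numbers or a trailing stop symbol is caught before submission.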




Advanced submission form

  1. Which type of prediction do you require?
    1. Default: all programs will be run. Some results may be omitted in the final mail we return, if we decide that the respective signal was not above a certain threshold (this holds in particular for the prediction of coiled coil and membrane regions).
    2. TOPITS: prediction-based threading.
      We run your protein against a representative part of the PDB database (i.e. the database of proteins with known 3D structure) to find proteins similar to your sequence, that cannot be identified as similar from the sequence alone.
    3. PHDsec only: will return only the prediction of secondary structure (and the alignments, if requested).
    4. PHDacc only: will return only the prediction of solvent accessibility (and the alignments, if requested).
    5. PHDhtm only: will return only the prediction of helical transmembrane regions, as well as of the topology of helical membrane proteins.
      Note: PHDhtm is also run by default, but weaker hits are reported only if you choose this option!
    6. Evalsec: evaluates the accuracy of a secondary structure prediction for which you provide the observed secondary structure.
    7. PROF: by default 1D structure predictions are still produced by PHD. Now, the more accurate program PROF is also available.
    8. PROFsec only: will return only the PROFsec prediction of secondary structure (and the alignments, if requested).
    9. PROFacc only: will return only the PROFacc prediction of solvent accessibility (and the alignments, if requested).
  2. Specify the format for the returned multiple-sequence alignment
    1. Default: the alignment is returned in MSF format example
    2. HSSP format example
      The MaxHom alignment is returned in HSSP format (omitting profiles and the insertion list at the bottom of the HSSP file).
    3. HSSP format with profiles example
      The MaxHom alignment is returned in HSSP format including profiles of residue substitution frequencies.
    4. no alignment returned example
  3. Specify the database to be searched for similar proteins
    1. Default: SWISS-PROT protein sequence database (version 37.0, 78597 proteins)
    2. PDB: Protein Data Bank of protein structures (version 99-04, 14400 protein chains)
    3. TrEMBL: Translations of all coding sequences in the EMBL Nucleotide Sequence Database (version 99-04, 374386 proteins)
    4. SWISS-PROT + PDB + TrEMBL: combination of all the previous (total of 467383 sequences)
      Note: due to the considerable CPU time spent on searching through the combined database, we have to impose a length limit (currently < 1000 residues). Thus, if you want a search through the big database, please chop your sequence into the units that are likely to form domains (see e.g. ProDom).
      If your sequence is longer than 1000 residues, the search will be restricted to the SWISS-PROT database.
  4. Run iterated PSI-BLAST on SWISS-PROT + TrEMBL + PDB
    Limited CPU time prevents us from automatically running an iterated PSI-BLAST for all submissions. However, you can request this option. If you do, the PSI-BLAST alignment will be used for predictions. (Note: you will also have to select 'Return BLAST output', since the PSI-BLAST results will NOT be returned by default!)
  5. Return BLAST output from SWISS-PROT search example
    The (more or less unfiltered) raw output from the BLAST search against the SWISS-PROT database is additionally returned (note: by default we return only the final result from the dynamic programming search with the program MaxHom).
  6. Return additional PHD output
    1. PHD msf example
      Returns the PHD predictions additionally in an MSF format (appended to the alignment).
    2. PHD rdb example
      Returns the PHD predictions additionally in RDB format (as read and written by local versions of the programs PHD and TOPITS).
    3. PHD col example
      Returns the secondary structure and accessibility predictions additionally in a column format. (Note: this format can be used as input for a request of prediction-based threading.)
    4. PHD casp2 example
      Returns the PHD predictions additionally in the format used for the second protein structure prediction contest in Asilomar, 1996 (CASP2).
  7. Return additional PROF output
    1. PROF msf example
      Returns the PROF predictions additionally in an MSF format (appended to the alignment).
    2. PROF rdb example
      Returns the PROF predictions additionally in RDB format (as read and written by local versions of the programs PROF and TOPITS).
    3. PROF col example
      Returns the secondary structure and accessibility predictions additionally in a column format. (Note: this format can be used as input for a request of prediction-based threading.)
    4. PROF casp example
      Returns the PROF predictions additionally in the format used for the fourth protein structure prediction contest in Asilomar, 2000 (CASP4).
  8. Return additional TOPITS output
    1. TOPITS hssp example
      Returns the threading output additionally in HSSP format.
    2. TOPITS strip example
      Returns the threading output additionally in STRIP format (which displays predicted and observed secondary structure underneath one another).
    3. TOPITS own example
      Returns the threading output additionally in the format used by TOPITS.
  9. Return the result in HTML format
    1. HTML formatted results example
      The email you get will have the entire results attached in one HTML formatted file (which you can load into any WWW browser). Alignments and ProDom results are displayed using the program MView developed by Nigel Brown (MRC, Mill Hill, London; http://mathbio.nimr.mrc.ac.uk/~nbrown/).
    2. HTML for printouts example
      The email you get will have all output attached in one HTML formatted file (to display with any WWW browser) that has fewer characters per line than the normal HTML output (see "return HTML"), so that you can print the output.
    3. HTML with PHD graphs example
      The email you get will have the entire results attached in one HTML formatted file (which you can load into any WWW browser).
      • Note: the HTML files resulting from the PHD predictions may be large. To keep your mail from becoming too big, you may leave the results on our machines and fetch them by ftp to your local machine (see option "return no mail").
    4. HTML with PHD graphs for printouts example
      The email you get will have all output attached in one HTML formatted file (to display with any WWW browser) that has fewer characters per line than the normal HTML output (see "return HTML detail"), so that you can print the output. (For further information see the option "return html".)
      • Note: the HTML files resulting from the PHD predictions may be large. To keep your mail from becoming too big, you may leave the results on our machines and fetch them by ftp to your local machine (see option "return no mail").
  10. Concise output example
    Returns a concise summary of results (e.g., no tables for prediction accuracy).
  11. Further alignment options
    1. do NOT align:
      if you check this box, then we shall not align your list of FASTA or PIR formatted sequences
      Note: this option is only effective for those two cases (see sequence formats below).
      Note 2: PHD predictions are more accurate if based on alignments. Thus, please DO NOT use this option unless you have a strong reason for it!!!
    2. return full sequences:
      By default, all alignments are returned in a form that has NO insertions relative to the protein sequence you submitted. Thus, where a sequence is best aligned to your protein by cutting out residues (typically in loop regions), the two residues flanking the cut simply appear in lower case, indicating that others were deleted between them (e.g. 'AACpsHW' could indicate a deletion between P and S from an original sequence that may have been 'AACPEQGGSHW'). If you want to keep all the inserted residues, you have to select the box 'return full sequences'.
    3. do NOT filter returned alignment
      By default we filter the alignment returned to you so that only the more likely homologues are included. If you switch this filter off, be aware that most proteins aligned to your sequence will be similar NEITHER in structure NOR in function!!
    4. do NOT filter alignment used for PHD
      If the divergence found in your family is not 'well' spread, prediction accuracy may drop. In particular, too many highly similar sequences may be problematic in the absence of further diverged family members. This problem has come up only in the post-genome era, i.e. since the number of sequences started exploding. To correct for this problem, we run a crude filter on the alignment by default.
    5. do NOT return PSI-BLAST:
      if you check this box, then we shall not use the PSI-BLAST alignment as input for the prediction
      Note: at the moment iterated PSI-BLAST searches are not run; however, we are considering changing this as soon as we have more CPU resources available.
  12. Switch off default methods
    By default all following programs are executed:
    1. ProSite (scan for motifs)
    2. SEG (scan for low-complexity, or composition-biased regions)
    3. ProDom (scan for domains)
    4. PredictNLS (scan for nuclear localisation signals)
    5. COILS (find coiled-coil regions)
    6. CYSPRED (find bound cysteines)
    7. ASP (identify structural switching regions)

    By checking the respective box, you can switch a method off (e.g. to reduce the length of the output, and/or the processing time on our side).
  13. Specify the format of your sequence(s), alignment, or prediction
    1. Default= single sequence:
      single protein sequence in one-letter amino acid code
    2. Multiple-sequence alignment in SAF-format: example
      Your alignment (in the simple alignment format SAF).
      Note: I do strongly recommend this as THE option of choice for non-experts (rather than the MSF format).
    3. Multiple-sequence alignment in MSF-format: example
      Your alignment (in the multiple sequence format MSF).
      Note: I strongly recommend that non-experts use the SAF format instead (see above).
    4. List of sequences or alignment in FASTA-format: example
    5. List of sequences (PIR-format): example
    6. Swissprot identifier (SWISSID):
      Example: paho_chick
      Allows you to submit a single sequence through its SWISS-PROT identifier, i.e. you simply provide the identifier, and we shall align exactly that protein (provided it is available in our current SWISS-PROT database!).
    7. Prediction of secondary structure and solvent accessibility in COLUMN-format: example
    8. Known and predicted secondary structure in COLUMN-format: example
      Submitting secondary structure for evaluation of prediction accuracy
      NOTE: only for running Evalsec, i.e. NOT for getting predictions!
  14. Batch or interactive?
    PP has the option of providing the results interactively, i.e. you keep waiting, and eventually the results will pop up on the screen.
    However, the typical processing time is more than 5 minutes, and to avoid overloading the network connection, we actually switch to the BATCH mode after that time!
    Note: typically the interactive mode is useful:
    1. if the current PP job queue is empty (see the WAIT icon below to check)
    2. AND if your request contains an alignment already (i.e. you want only a PHD prediction)
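
The database fallback described under item 3 (a combined search only for sequences below the length limit, otherwise SWISS-PROT alone) can be sketched as follows. This is an illustration, not the server's code: the keys in DB_SIZES, the constant names, and the function effective_database are all hypothetical. The text gives the limit as "< 1000 residues" and the fallback as "longer than 1000 residues", so the treatment of exactly 1000 residues is an assumption.

```python
# Database sizes as quoted in item 3 (number of sequences).
DB_SIZES = {
    "swiss": 78597,    # SWISS-PROT 37.0
    "pdb": 14400,      # PDB 99-04 (protein chains)
    "trembl": 374386,  # TrEMBL 99-04
}
COMBINED = "swiss+pdb+trembl"  # hypothetical key for the combined database
MAX_LEN_COMBINED = 1000        # length limit quoted in the text


def effective_database(requested: str, seq_len: int) -> str:
    """Return the database that would actually be searched.

    Assumption: a sequence of exactly 1000 residues already falls back,
    following the "< 1000 residues" wording; the text is ambiguous here.
    """
    if requested == COMBINED and seq_len >= MAX_LEN_COMBINED:
        return "swiss"
    return requested


# The combined total quoted in the text is the sum of the three parts.
assert sum(DB_SIZES.values()) == 467383

print(effective_database(COMBINED, 350))   # swiss+pdb+trembl
print(effective_database(COMBINED, 1200))  # swiss
```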

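
The lower-case convention described under 'return full sequences' (item 11.2) can be illustrated with a short Python sketch. It is not the code used by MaxHom or PredictProtein; the function collapse_insertions and the '-' gap symbol are assumptions made for this example. Given a pairwise alignment in which your sequence contains gaps wherever the aligned protein has extra residues, it drops those columns and writes the two flanking residues in lower case.

```python
def collapse_insertions(query_aln: str, hit_aln: str, gap: str = "-") -> str:
    """Drop hit residues that face a gap in the query; lower-case the flanks."""
    if len(query_aln) != len(hit_aln):
        raise ValueError("aligned strings must have equal length")
    out = []
    pending = False  # True while hit residues are being skipped
    for q, h in zip(query_aln, hit_aln):
        if q == gap:
            if h != gap:
                # First skipped residue: mark the residue kept just before it.
                if out and not pending:
                    out[-1] = out[-1].lower()
                pending = True
            continue
        if pending:
            # First residue kept after the cut is marked as well.
            h = h.lower()
            pending = False
        out.append(h)
    return "".join(out)


# The example from the text: EQGG is cut out between P and S.
print(collapse_insertions("AACP----SHW", "AACPEQGGSHW"))  # AACpsHW
```

The result always has the same number of columns as the ungapped submitted sequence, which is why the default output can be read residue by residue against your protein.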



Expert submission form

well, you don't need help, you are an expert, anyway....