Example: requesting TOPITS results in the native TOPITS format

NOTE: this will be extended to a more general threading format, to be defined by researchers in the field (hopefully with the same keywords!)

TOPITS: Threading based on PHD predictions

Bold face: keywords "prediction-based threading" and "return topits"

Effect: alignments with possible remote homologues (<25% sequence identity) are additionally returned in TOPITS format


Threading based on PHD predictions

Bold face: keyword "prediction-based threading"

Effect: alignments with possible remote homologues (<25% sequence identity) are returned

NOTE: only the output specific to this option is shown!


Threading results in TOPITS format:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

# TOPITS (Threading One-D Predictions Into Three-D Structures)
# --------------------------------------------------------------------------------
# FORMAT   begin
# FORMAT   general:    - lines starting with hashes contain comments or PARAMETERS
# FORMAT   general:    - columns are delimited by tabs
# FORMAT   general:    - the data are given in BLOCKS, each introduced by a line
# FORMAT   general:      beginning with a hash and a keyword
# FORMAT   parameters: '# PARA:tab     keyword =tab value tab (further-information)'
# FORMAT   notation:   '# NOTATION:tab keyword tab explanation'
# FORMAT   info:       '# INFO:tab     text'
# FORMAT   blocks 0:   '# BLOCK        keyword'
# FORMAT   blocks 1:    column names (tab delimited)
# FORMAT   blocks n>1:  column data  (tab delimited)
# FORMAT   file end:   '//' is the end of a complete file
# FORMAT   end
# --------------------------------------------------------------------------------
# PARA     begin
# PARA     TOPITS HEADER: PARAMETERS
# PARA:	len1       =	24
# PARA:	nali       =	38
# PARA:	listName   =	/home/phd/ut/topits/mat/x2.list
# PARA:	sortMode   =	ZSCORE
# PARA:	weight1    =	NO
# PARA:	weight2    =	NO
# PARA:	smin       =	-1.00
# PARA:	smax       =	2.00
# PARA:	gapOpen    =	2
# PARA:	gapElon    =	0.2
# PARA:	indel1     =	YES
# PARA:	indel2     =	NO
# PARA:	threshold  =	ALL
# PARA:	str:seq    =	50    	(i.e. str= 50%, seq= 50%)
# PARA     end
# --------------------------------------------------------------------------------
# NOTATION begin
# NOTATION TOPITS HEADER: ABBREVIATIONS PARAMETERS
# NOTATION:	len1        :	length of search sequence, i.e., your protein
# NOTATION:	nali        :	number of alignments in file
# NOTATION:	listName    :	fold library used for threading
# NOTATION:	sortMode    :	mode of ranking the hits
# NOTATION:	weight1     :	YES if guide sequence weighted by residue conservation
# NOTATION:	weight2     :	YES if aligned sequence weighted by residue conservation
# NOTATION:	smin        :	minimal value of alignment metric
# NOTATION:	smax        :	maximal value of alignment metric
# NOTATION:	gapOpen     :	gap open penalty
# NOTATION:	gapElon     :	gap elongation penalty
# NOTATION:	indel1      :	YES if insertions in sec str regions allowed for guide seq
# NOTATION:	indel2      :	YES if insertions in sec str regions allowed for aligned seq
# NOTATION:	threshold   :	hits above this threshold included (ALL means no threshold)
# NOTATION:	str:seq     :	weight structure:sequence
# NOTATION TOPITS HEADER: ABBREVIATIONS SUMMARY
# NOTATION:	id2         :	PDB identifier of aligned structure (1pdbC -> C = chain id)
# NOTATION:	pide        :	percentage of pairwise sequence identity
# NOTATION:	lali        :	length of alignment
# NOTATION:	ngap        :	number of insertions
# NOTATION:	lgap        :	number of residues inserted
# NOTATION:	len2        :	length of aligned protein structure
# NOTATION:	Eali        :	alignment score
# NOTATION:	Zali        :	alignment z-score;  note: hits with z>3 more reliable
# NOTATION:	strh        :	secondary str identity between guide and aligned protein
# NOTATION:	ifir        :	position of first residue of search sequence
# NOTATION:	ilas        :	position of last residue of search sequence
# NOTATION:	jfir        :	pos of first res of remote homologue (e.g. DSSP number)
# NOTATION:	jlas        :	pos of last res of remote homologue  (e.g. DSSP number)
# NOTATION:	name        :	name of aligned protein structure
# NOTATION end
# --------------------------------------------------------------------------------
# INFO     begin
# INFO     TOPITS HEADER: ACCURACY
# INFO:	 Tested on 80 proteins, TOPITS found the correct remote homologue in about
# INFO:	 30% of the cases.  Detection accuracy was higher for higher z-scores:
# INFO:	 ZALI>0   => 1st hit correct in 33% of cases
# INFO:	 ZALI>3   => 1st hit correct in 50% of cases
# INFO:	 ZALI>3.5 => 1st hit correct in 60% of cases
# INFO     end
# --------------------------------------------------------------------------------
# BLOCK    TOPITS HEADER: SUMMARY
rank	id2	pide	lali	ngap	lgap	len2	Eali	Zali	strh	ifir	ilas	jfir	jlas	name
1	1ytbA	21	24	0	0	180	18.37	1.46	75	1	24	38	61	1ytb_A TATA-BOX BINDING PROTEIN (YTBP) COMPLEXED WITH DNA
2	2yhx	21	24	1	1	457	17.37	1.28	83	1	24	60	84	2yhx YEAST HEXOKINASE B (E.C.2.7.1.1) COMPLEX WITH
3	1xnb	17	23	1	1	185	17.28	1.26	96	1	23	22	45	1xnb XYLANASE (ENDO-1,4-BETA-XYLANASE) (E.C.3.2.1.8)
4	1ysc	17	24	1	2	421	16.67	1.15	79	1	24	373	398	1ysc SERINE CARBOXYPEPTIDASE (CPY, CPD-Y, OR PROTEINASE C)
5	1whi	13	23	1	1	122	16.10	1.05	91	1	23	4	27	1whi MOL_ID: 1;
6	1xxaA	13	24	0	0	220	16.02	1.03	83	1	24	37	60	1xxa_A MOL_ID: 1;
7	1wkt	26	23	0	0	87	15.95	1.02	70	1	23	24	46	1wkt MOL_ID: 1;
8	1xsoA	30	23	2	4	301	15.95	1.02	87	2	24	152	178	1xso_A CU, ZN SUPEROXIDE DISMUTASE (E.C.1.15.1.1)
9	1xsoA	30	23	2	4	301	15.95	1.02	87	2	24	1	27	1xso_A CU, ZN SUPEROXIDE DISMUTASE (E.C.1.15.1.1)
10	1yal	21	24	1	4	218	15.92	1.01	75	1	24	145	172	1yal MOL_ID: 1;
11	1whtB	20	20	0	0	410	15.83	1.00	100	4	23	24	43	1wht_B SERINE CARBOXYPEPTIDASE II (E.C.3.4.16.1) COMPLEXED WITH
12	1wit	17	23	0	0	93	15.62	0.96	87	1	23	8	30	1wit MOL_ID: 1;
13	1xxaA	13	24	0	0	220	15.18	0.88	75	1	24	110	133	1xxa_A MOL_ID: 1;
14	1xxaA	13	24	0	0	220	15.18	0.88	75	1	24	183	206	1xxa_A MOL_ID: 1;
15	1xyzA	21	19	1	4	320	14.13	0.69	74	1	23	231	249	1xyz_A MOL_ID: 1;
16	1whtB	21	24	1	21	410	14.05	0.67	88	1	24	320	364	1wht_B SERINE CARBOXYPEPTIDASE II (E.C.3.4.16.1) COMPLEXED WITH
17	1xaa	32	19	1	3	345	13.58	0.59	77	2	23	257	275	1xaa MOL_ID: 1;
18	1xis	35	23	1	4	386	12.10	0.32	52	1	23	237	263	1xis XYLOSE ISOMERASE (E.C.5.3.1.5) COMPLEX WITH MN*CL2
19	1ycc	17	23	0	0	108	9.80	-0.10	48	1	23	29	51	1ycc CYTOCHROME C (ISOZYME 1) (REDUCED)
20	9wgaA	18	22	1	2	343	9.65	-0.13	58	1	24	247	268	9wga_A WHEAT GERM AGGLUTININ (ISOLECTIN 2)
21	9wgaA	18	22	1	2	343	8.28	-0.37	54	1	24	75	96	9wga_A WHEAT GERM AGGLUTININ (ISOLECTIN 2)
22	1wdcC	22	23	0	0	360	8.28	-0.37	26	1	23	231	253	1wdc_C MOL_ID: 1;
23	1wdcB	22	23	0	0	360	8.28	-0.37	26	1	23	231	253	1wdc_B MOL_ID: 1;
24	1wdcA	22	23	0	0	360	8.28	-0.37	26	1	23	231	253	1wdc_A MOL_ID: 1;
25	1wdcA	17	24	2	16	360	6.67	-0.67	58	1	24	154	193	1wdc_A MOL_ID: 1;
26	1wdcC	17	24	2	16	360	6.67	-0.67	58	1	24	154	193	1wdc_C MOL_ID: 1;
27	1wdcB	17	24	2	16	360	6.67	-0.67	58	1	24	154	193	1wdc_B MOL_ID: 1;
28	1zfd	16	19	1	4	32	6.47	-0.70	48	1	23	1	19	1zfd MOL_ID: 1;
29	1yrnA	33	6	0	0	128	4.25	-1.11	33	18	23	12	17	1yrn_A MOL_ID: 1;
30	1yrnB	33	6	0	0	128	4.25	-1.11	33	18	23	12	17	1yrn_B MOL_ID: 1;
31	1yrnA	20	5	0	0	128	4.03	-1.15	60	1	5	124	128	1yrn_A MOL_ID: 1;
32	1yrnB	20	5	0	0	128	4.03	-1.15	60	1	5	124	128	1yrn_B MOL_ID: 1;
33	3wrp	25	4	0	0	101	3.72	-1.20	75	20	23	24	27	3wrp $TRP APOREPRESSOR
34	1wdcA	33	3	0	0	360	3.10	-1.32	100	1	3	61	63	1wdc_A MOL_ID: 1;
35	1wdcB	33	3	0	0	360	3.10	-1.32	100	1	3	61	63	1wdc_B MOL_ID: 1;
36	1wdcC	33	3	0	0	360	3.10	-1.32	100	1	3	61	63	1wdc_C MOL_ID: 1;
37	1wfbA	0	3	0	0	37	2.52	-1.42	67	21	23	35	37	1wfb_A ANTIFREEZE PROTEIN ISOFORM HPLC6 (-180 DEGREES C)
38	1xxaA	0	1	0	0	220	0.70	-1.75	100	1	1	220	220	1xxa_A MOL_ID: 1;
//
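
The FORMAT lines in the output above describe a simple layout: comment and parameter lines start with a hash, columns are tab-delimited, a "# BLOCK" line introduces a table whose first row holds the column names, and "//" ends the file. The following is a minimal sketch of a reader for that layout, assuming the columns really are separated by tabs as stated. The names parse_topits, reliable_hits and SAMPLE are illustrative and not part of TOPITS; SAMPLE is an abbreviated excerpt of the output above that keeps only three of the fifteen summary columns.

```python
from typing import Dict, List, Tuple


def parse_topits(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Parse TOPITS-format text into (parameters, summary rows)."""
    params: Dict[str, str] = {}
    rows: List[Dict[str, str]] = []
    columns: List[str] = []
    in_block = False
    for line in text.splitlines():
        line = line.rstrip()
        if line == "//":  # '//' marks the end of a complete file
            break
        if line.startswith("# PARA:"):
            # Layout: '# PARA:<tab>keyword =<tab>value<tab>(further-information)'
            fields = line.split("\t")
            if len(fields) >= 3:
                key = fields[1].strip().rstrip("=").strip()
                params[key] = fields[2].strip()
            continue
        if line.startswith("# BLOCK"):
            in_block, columns = True, []
            continue
        if line.startswith("#") or not line:
            continue  # other comment lines (FORMAT, NOTATION, INFO) and blanks
        if in_block:
            fields = line.split("\t")
            if not columns:
                columns = fields  # first row of a block: column names
            else:
                rows.append(dict(zip(columns, fields)))
    return params, rows


def reliable_hits(rows: List[Dict[str, str]], zmin: float = 3.0) -> List[Dict[str, str]]:
    """Keep hits whose Zali exceeds zmin (the notes above call z>3 more reliable)."""
    return [row for row in rows if float(row["Zali"]) > zmin]


# Abbreviated excerpt of the output above (hypothetical reduced column set).
SAMPLE = (
    "# PARA     begin\n"
    "# PARA:\tlen1       =\t24\n"
    "# PARA:\tstr:seq    =\t50    \t(i.e. str= 50%, seq= 50%)\n"
    "# PARA     end\n"
    "# BLOCK    TOPITS HEADER: SUMMARY\n"
    "rank\tid2\tZali\n"
    "1\t1ytbA\t1.46\n"
    "2\t2yhx\t1.28\n"
    "//\n"
)

params, rows = parse_topits(SAMPLE)
print(params)
print(reliable_hits(rows))
```

All values are kept as strings; convert them with int() or float() where numbers are needed. In the example output above no hit reaches z>3, so reliable_hits with its default threshold returns an empty list for it.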