CHOPIN: a web resource for the structural and functional proteome of Mycobacterium tuberculosis

Table 1.

General statistics of CHOPIN pipeline results

Category	Count
Sequences w/ FUGUE Z-score ≥15	1832
Sequences w/ FUGUE Z-score ≥6, <15	759
Sequences w/ FUGUE Z-score ≥4, <6	157
Sequences w/ FUGUE Z-score <4	759
Sequences without FUGUE hits	157
Number of significant hits (Z-score ≥4)	5268
Unique TOCCATA profiles among hits	2009
Number of multi-domain hits	523
Number of alignments	16 420
Number of unique alignments	13 169
Alignments w/apo-form templates	6071
Alignments w/liganded templates	5133
Alignments w/complexed templates	6365
Alignments w/monomeric templates	4839
Alignments w/templates in any state	5216
Average template PID (%)	24.21
Total number of models	49 218
Top models w/ ‘great’ quality rating (=4)	7026
Top models w/ ‘good’ quality rating (≥3, <4)	3187
Top models w/ ‘fair’ quality rating (≥2, <3)	2269
Top models w/ ‘poor’ quality rating (<2)	3931

A total of 16 420 alignments were constructed, although only 13 169 were unique, since in some cases the state-free alignment was identical to one of the others. In 43% (7026) and 19% (3187) of all alignments, the best models generated were assigned a ‘great’ or ‘good’ quality rating, respectively, with 14% (2269) and 24% (3931) falling under the ‘fair’ and ‘poor’ categories. When considering only the best model per hit (i.e. independently of state), the percentages are 53, 19, 13 and 15% for ‘great’, ‘good’, ‘fair’ and ‘poor’ models, respectively, suggesting that the choice of template and alignment can make a significant difference in model quality. On a per-sequence basis, the percentages are 60, 16, 12 and 12%.
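The per-alignment percentages can be checked against the counts in Table 1. The following is a minimal sketch, assuming each count is divided by the total number of alignments and rounded to a whole number; the function and variable names are illustrative and not part of the CHOPIN pipeline.

```python
# Counts of top models per quality rating, copied from Table 1.
quality_counts = {"great": 7026, "good": 3187, "fair": 2269, "poor": 3931}
total_alignments = 16420  # "Number of alignments" in Table 1


def quality_percentages(counts, total):
    """Return each count as a whole-number percentage of `total`."""
    return {label: round(100 * n / total) for label, n in counts.items()}


percentages = quality_percentages(quality_counts, total_alignments)
print(percentages)  # {'great': 43, 'good': 19, 'fair': 14, 'poor': 24}
```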

Table 2 shows the 44 mutations predicted by either SDM or mCSM to be ‘deleterious’ (defined as having an absolute ΔΔG value >2 kJ/mol) on models of at least ‘fair’ quality. Of those, 11 correspond to mutations that either belong to the high-confidence TBDReaMDB set or occur only in the MDR or XDR strains, while the rest correspond to mutations present in all strains. The full list of mutations and the analysis results are available in Supplementary Table 2 and on the website.

Table 2.

Mutations predicted to be deleterious to protein stability according to SDM and mCSM

Sequence ID	Mutation	Strain/Source	Sequence Description	SDM ΔΔG (kJ/mol)	mCSM ΔΔG (kJ/mol)
Rv0006	A74S	FLQ	DNA gyrase subunit A gyrA	−2.29	−1.15
Rv0006	D94A	FLQ	DNA gyrase subunit A gyrA	2.04	−0.79
Rv0006	G247S	DS,MDR,XDR	DNA gyrase subunit A gyrA	−3.28	−1.29
Rv0237	A240V	DS,MDR,XDR	Lipoprotein lpqI	2.18	−0.71
Rv0319	G69D	DS,MDR,XDR	Pyrrolidone-carboxylate peptidase pcp	−1.57	−2.31
Rv0404	P478H	DS,MDR,XDR	Fatty-acid-CoA ligase fadD30	1.38	−2.10
Rv0655	V144A	DS,MDR,XDR	Ribonucleotide transport ATP-binding protein ABC transporter mkl	−1.53	−2.38
Rv0667	L456S	DS,MDR,XDR	DNA-directed RNA polymerase beta subunit rpoB	−4.11	−2.66
Rv0667	I1112T	XDR	DNA-directed RNA polymerase beta subunit rpoB	−4.53	−2.43
Rv0721	A105V	DS,MDR,XDR	30S ribosomal protein S5 rpsE	2.18	−0.25
Rv0790c	F83S	DS,MDR,XDR	Hypothetical protein	−2.20	−2.66
Rv1001	T281M	DS,MDR,XDR	Arginine deiminase arcA	2.39	−0.31
Rv1039c	A67T	DS,MDR,XDR	PPE family protein	−2.48	−0.92
Rv1240	G306R	DS,MDR,XDR	Malate dehydrogenase mdh	3.41	−0.97
Rv1276c	Q79E	DS,MDR,XDR	Hypothetical protein	−0.31	−2.48
Rv1569	A171G	DS,MDR,XDR	8-Amino-7-oxononanoate synthase bioF1	−2.24	−1.39
Rv1600	S271A	DS,MDR,XDR	Histidinol-phosphate aminotransferase hisC1	2.85	−0.50
Rv1605	G145V	DS,MDR,XDR	Cyclase hisF	2.55	−0.41
Rv1638	S908I	DS,MDR,XDR	Excinuclease ABC subunit A (DNA-binding ATPase) uvrA	3.02	0.11
Rv1825	P181S	DS,MDR,XDR	Hypothetical protein	−0.81	−2.03
Rv1870c	D123G	DS,MDR,XDR	Hypothetical protein	2.51	−0.38
Rv1878	S296F	DS,MDR,XDR	Glutamine synthetase glnA3	3.03	−0.90
Rv1933c	V196A	MDR,XDR	Acyl-CoA dehydrogenase fadE18	−2.73	−2.53
Rv2000	L275P	XDR	Hypothetical protein	−6.18	−0.95
Rv2043c	A3P	PZA	Pyrazinamidase/Nicotinamidase PncA (PZase)	−3.35	−0.51
Rv2043c	Q10P	PZA	Pyrazinamidase/Nicotinamidase PncA (PZase)	−2.32	−0.49
Rv2043c	C14H	PZA	Pyrazinamidase/Nicotinamidase PncA (PZase)	−4.49	−1.44
Rv2043c	C14R	PZA	Pyrazinamidase/Nicotinamidase PncA (PZase)	−3.76	−0.63
Rv2043c	L19P	PZA	Pyrazinamidase/Nicotinamidase PncA (PZase)	−2.48	−1.46
Rv2043c	V21G	PZA	Pyrazinamidase/Nicotinamidase PncA (PZase)	−4.20	−1.60
Rv2043c	Y34S	PZA	Pyrazinamidase/Nicotinamidase PncA (PZase)	−2.47	−2.96
Rv2122c	A88D	DS,MDR,XDR	Phosphoribosyl-ATP pyrophosphohydrolase hisE	−2.70	−0.82
Rv2161c	G105A	DS,MDR,XDR	Hypothetical protein	2.23	−0.47
Rv2197c	P112S	DS,MDR,XDR	Conserved transmembrane protein	2.77	−0.56
Rv2250c	A119T	DS,MDR,XDR	Hypothetical transcriptional regulatory protein	−2.02	−0.68
Rv2464c	A99T	DS,MDR,XDR	Hypothetical DNA glycosylase	−2.84	−1.35
Rv2886c	V153A	DS,MDR,XDR	Hypothetical resolvase	−2.73	−2.48
Rv2887	S2G	DS,MDR,XDR	Hypothetical transcriptional regulatory protein	2.58	−0.24
Rv3032	Q310L	DS,MDR,XDR	Hypothetical transferase	3.07	−0.33
Rv3174	L42R	DS,MDR,XDR	Hypothetical short-chain type dehydrogenase/reductase	−2.32	−1.56
Rv3545c	I359T	DS,MDR,XDR	Cytochrome P450 125 cyp125	−2.20	−2.79
Rv3591c	F30S	DS,MDR,XDR	Hypothetical hydrolase	−3.05	−1.96
Rv3606c	L172P	DS,MDR,XDR	2-Amino-4-hydroxy-6-hydroxymethyldihydropteridine pyrophosphokinase folK	−2.74	−1.45
Rv3719	R310T	DS,MDR,XDR	Hypothetical protein	−2.20	−1.80

DS (Drug Sensitive), MDR (Multiple Drug Resistant) and XDR (eXtensively Drug Resistant) refer to the KwaZulu-Natal strains sequenced by the Broad Institute, with residue numbers given relative to the F11 reference strain. PZA and FLQ indicate various high-confidence pyrazinamide- or fluoroquinolone-resistant strains, respectively, as identified in TBDReaMDB, with residue numbers relative to the H37Rv strain.
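The selection rule behind Table 2 (an absolute ΔΔG above 2 from either predictor) can be sketched as a simple filter. This is an illustrative sketch only: the `Prediction` type and `is_deleterious` function are hypothetical names, three rows are copied from Table 2, and the last row is invented to show a mutation that falls below the cutoff.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    sequence_id: str
    mutation: str
    sdm_ddg: float   # SDM ΔΔG as listed in Table 2
    mcsm_ddg: float  # mCSM ΔΔG as listed in Table 2


THRESHOLD = 2.0  # absolute ΔΔG cutoff stated in the text


def is_deleterious(p: Prediction, threshold: float = THRESHOLD) -> bool:
    """True if either predictor gives |ΔΔG| above the threshold."""
    return abs(p.sdm_ddg) > threshold or abs(p.mcsm_ddg) > threshold


rows = [
    Prediction("Rv0006", "D94A", 2.04, -0.79),    # from Table 2
    Prediction("Rv0404", "P478H", 1.38, -2.10),   # from Table 2
    Prediction("Rv2043c", "Y34S", -2.47, -2.96),  # from Table 2
    Prediction("RvXXXX", "A1G", 0.50, -0.40),     # made-up row below the cutoff
]

flagged = [p.mutation for p in rows if is_deleterious(p)]
print(flagged)  # ['D94A', 'P478H', 'Y34S']
```

Using `or` rather than `and` mirrors the wording "predicted by either SDM or mCSM": P478H, for example, is flagged on the mCSM value alone.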

However, there are various mechanisms of resistance to a drug. Of these, SDM and mCSM address only one: they estimate the effect of a mutation on the structural stability of the protein, which in turn may affect drug binding. Mutations that generate resistance by directly interfering with the binding of a drug molecule can be detected by noting their location with respect to the drug-binding site; quantitative methods trained using the Platinum database (Pires D, Ascher D and Blundell TL, under review) and graph signatures are under development and will be incorporated later.
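One simple way to note a mutation's location with respect to a drug-binding site is a distance check between the atoms of the mutated residue and those of the bound ligand. The sketch below is an assumption-laden illustration, not the method used by CHOPIN: the 5 Å cutoff, the function names and the coordinates are all hypothetical.

```python
import math

# Assumed cutoff (in Å) for calling a residue "near" the ligand; not from the source.
CONTACT_CUTOFF = 5.0


def min_distance(residue_atoms, ligand_atoms):
    """Smallest Euclidean distance between any residue atom and any ligand atom."""
    return min(math.dist(a, b) for a in residue_atoms for b in ligand_atoms)


def near_binding_site(residue_atoms, ligand_atoms, cutoff=CONTACT_CUTOFF):
    """True if the residue has at least one atom within `cutoff` of the ligand."""
    return min_distance(residue_atoms, ligand_atoms) <= cutoff


# Hypothetical coordinates, for illustration only.
ligand = [(4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
close_residue = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
distant_residue = [(20.0, 0.0, 0.0)]

print(near_binding_site(close_residue, ligand))    # True (closest pair is 2.5 Å apart)
print(near_binding_site(distant_residue, ligand))  # False (closest pair is 14 Å apart)
```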

An example of resistance-conferring mutations that act by disrupting the stability of a protein can be found in Rv2043c/pncA. This gene encodes the nicotinamidase/pyrazinamidase responsible for converting the pro-drug pyrazinamide into its active form, pyrazinoic acid, so that disrupting either the function or the stability of the enzyme would prevent the drug from becoming active. Indeed, various mutations across the gene, including deletions, truncations and frame shifts, have been shown to confer resistance to pyrazinamide (43). In particular, Petrella et al. (44) have shown structurally how a number of mutations that lead to a loss of stability affect the catalytic activity of the enzyme. None of the nine mutations in the TBDReaMDB high-confidence PZA set are part of the active site as proposed in their paper, but seven of them were predicted to be deleterious by at least one of the programs.

Conclusion and future perspectives

The CHOPIN database provides a resource for structural information on Mtb, including a flexible and user-friendly repository of high-quality homology models and domain annotations, as well as up-to-date experimental structures. Its homology recognition step has helped to enrich the functional annotation of the proteome (45), and its models have assisted in elucidating the mechanism of action of potential drugs (46). Its focus on providing a variety of models based on specific conformational states of the templates is, as far as we know, unique and should prove valuable to applied researchers in the field, despite the simplifications that were necessary to deal with the highly complex topic of conformational variability. We aim to perform updates following those of the underlying profile and template database, TOCCATA, which itself relies on the SCOP and CATH release schedule of every year or so. We intend to hone our methods to provide more refined and flexible results, such as fully modelled complexes and specific ligands.

The structural analysis of polymorphisms, while currently limited in scope, should also be of interest to researchers in drug discovery. Our group is currently working on further methods to expand and improve the predictions of the effect of structural changes, and as better databases of polymorphisms become available (47, 48), we aim to expand our database with their analysis.

Supplementary Data

Supplementary data are available at Database Online.

Acknowledgements

The authors are grateful to Dr Jiye Shi for his advice and modifications to FUGUE and to group members for valuable discussions. The authors also thank their anonymous reviewers for their useful comments and suggestions.

Funding

This work was supported by the Bill & Melinda Gates Foundation (RG60453). The University of Cambridge provided facilities and support [to TLB]. Funding for open access charge: Bill & Melinda Gates Foundation.

References

1. Cole S.T., Brosch, Parkhill, et al. (1998) Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature, 393, 537–544.
2. Camus J.-C., Pryor M.J., Medigue, et al. (2002) Re-annotation of the genome sequence of Mycobacterium tuberculosis H37Rv. Microbiology, 148, 2967–2973.
3. Ehebauer M.T., Wilmanns (2011) The progress made in determining the Mycobacterium tuberculosis structural proteome. Proteomics, 3128–3133.
4. Pieper, Webb B.M., Barkan D.T., et al. (2011) ModBase, a database of annotated comparative protein structure models, and associated resources. Nucl. Acids Res., D465–D474.
5. Lewis T.E., Sillitoe, Andreeva, et al. (2013) Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains. Nucl. Acids Res., D499–D507.
6. Mao, Shukla, Larrouy-Maumus, et al. (2013) Functional assignment of Mycobacterium tuberculosis proteome revealed by genome-scale fold-recognition. Tuberculosis.
7. Anand, Sankaran, Mukherjee, et al. (2011) Structural annotation of Mycobacterium tuberculosis proteome. PLoS ONE, e27044.
8. Berman, Henrick, Nakamura, et al. (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucl. Acids Res., D301–D303.
9. Murzin A.G., Brenner S.E., Hubbard, et al. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536–540.
10. Greene L.H., Lewis T.E., Addou, et al. (2007) The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucl. Acids Res., D291–D297.
11. P.C., Henikoff (2002) Accounting for human polymorphisms predicted to affect protein function. Genome Res., 436–446.
12. Topham, Srinivasan, Blundell (1997) Prediction of the stability of protein mutants based on structural environment-dependent amino acid substitution and propensity tables. Protein Eng.
13. Worth C.L., Preissner, Blundell T.L. (2011) SDM—a server for predicting effects of mutations on protein stability and malfunction. Nucl. Acids Res., W215–W222.
14. Adzhubei I.A., Schmidt, Peshkin, et al. (2010) A method and server for predicting damaging missense mutations. Nat. Methods, 248–249.
15. Pires D.E.V., Ascher D.B., Blundell T.L. (2014) mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics, 335–342.
16. Reddy T.B.K., Riley, Wymore, et al. (2009) TB database: an integrated platform for tuberculosis research. Nucl. Acids Res., D499–508.
17. Goodstadt (2010) Ruffus: a lightweight Python library for computational pipelines. Bioinformatics, 2778–2779.
18. Godzik (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 1658–1659.
19. Shi, Blundell T.L., Mizuguchi (2001) FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol., 310, 243–257.
20. Sali, Blundell T.L. (1990) Definition of general topological equivalence in protein structures: a procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. J. Mol. Biol., 212, 403–428.
21. Altschul, Madden, Schaffer, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res., 3389–3402.
22. Schreyer A.M., Blundell T.L. (2013) CREDO: a structural interactomics database for drug discovery. Database, 2013, bat049.
23. Chandonia J.-M., Hon, Walker N.S., et al. (2004) The ASTRAL Compendium in 2004. Nucl. Acids Res., D189–D192.
24. Suzek B.E., Huang, McGarvey, et al. (2007) UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics, 1282–1288.
25. Katoh, Standley D.M. (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol., 772–780.
26. Eddy S.R. (2011) Accelerated profile HMM searches. PLoS Comput. Biol., e1002195.
27. Sammut S.J., Finn R.D., Bateman (2008) Pfam 10 years on: 10 000 families and still growing. Brief. Bioinf., 210–219.
28. Theobald D.L., Wuttke D.S. (2006) THESEUS: maximum likelihood superpositioning and analysis of macromolecular structures. Bioinformatics, 2171–2172.
29. Melo, Sánchez, Sali (2002) Statistical potentials for fold assessment. Protein Sci., 430–448.
30. Melo, Sali (2007) Fold assessment for comparative protein structure modeling. Protein Sci., 2412–2426.
31. Chen V.B., Arendall W.B., Headd J.J., et al. (2010) MolProbity: all-atom structure validation for macromolecular crystallography. Acta Cryst. D.
32. Eramian, Shen, Devos, et al. (2006) A composite score for predicting errors in protein structure models. Protein Sci., 1653–1666.
33. Wang, Dunbrack R.L. (2005) PISCES: recent improvements to a PDB sequence culling server. Nucl. Acids Res., W94–W98.
34. Benkert, Künzli, Schwede (2009) QMEAN server for protein model quality estimation. Nucl. Acids Res., W510–W514.
35. Kryshtafovych, Barbato, Fidelis, et al. (2014) Assessment of the assessment: evaluation of the model quality estimates in CASP10. Proteins, 112–126.
36. Ioerger T.R., Koo, E.-G., et al. (2009) Genome analysis of multi- and extensively-drug-resistant tuberculosis from KwaZulu-Natal, South Africa. PLoS ONE, e7778.
37. Sandgren, Strong, Muthukrishnan, et al. (2009) Tuberculosis drug resistance mutation database. PLoS Med., e1000002.
38. Smith R.E., Lovell S.C., Burke D.F., et al. (2007) Andante: reducing side-chain rotamer search space during comparative modeling using environment-specific substitution probabilities. Bioinformatics, 1099–1105.
39. Pires D.E.V., de Melo-Minardi R.C., da Silveira C.H., et al. (2013) aCSM: noise-free graph-based signatures to large-scale receptor-based ligand prediction. Bioinformatics, 855–861.
40. Lew J.M., Kapopoulou, Jones L.M., et al. (2011) TubercuList – 10 years after. Tuberculosis.
41. The UniProt Consortium (2007) The Universal Protein Resource (UniProt). Nucl. Acids Res., D190–D195.
42. Mizuguchi, Deane C.M., Blundell T.L., et al. (1998) JOY: protein sequence-structure representation and analysis. Bioinformatics, 617–623.
43. Singh, Mishra A.K., Malonia S.K., et al. (2006) The paradox of pyrazinamide: an update on the molecular mechanisms of pyrazinamide resistance in Mycobacteria. J. Commun. Dis., 288–298.