bc-GenExMiner 4.5: new mining module computes breast cancer differential gene expression analyses Open Access

Table 1.

PAM50 molecular subtype distribution according to the nature of the breast tissue

Nature of the tissue	Molecular intrinsic subtype (PAM50)
	Basal-like	HER2E	Luminal A	Luminal B	Normal breast-like
Healthy	0	0	89	0	3
Tumour-adjacent	0	0	101	0	3
Tumour	161	74	406	387	7

Figure 1.

Comparison of the biological characteristics of the three breast tissues by means of GESs. The first row presents the three types of breast tissue. The other rows, from top to bottom, present significant GES scores according to tissue type (green: low score; red and green checkerboard pattern: intermediate score; red: high score). This figure illustrates Supplementary Table S3.

p53 status gene expression comparison analyses

The frequency of p53-mutated status ranged from 26.6% to 32.9%, depending on the nature of the data (i.e. microarrays or RNAseq) and the method of p53 status determination (i.e. IHC, GES and sequencing) (Table 2). These results are consistent with those reported in the literature for breast cancer (i.e. 20–35%) (24, 25). However, the lowest frequency, which differed significantly from the others, was observed when p53 status was determined by applying the p53 GES to microarray data (P < 0.0001). The other frequencies were comparable (P = 0.5121).

Table 2.

Frequencies of p53 mutations according to the mode of p53 status annotation and the nature of the data

Data	TP53 status annotation	No	No (%) WT	No (%) MT
Microarrays	IHC	922	638 (69.2)	284 (30.8)
	Sequence-based	1980	1328 (67.1)	652 (32.9)
	GES	2728	2003 (73.4)	725 (26.6)
RNAseq	Sequence-based	1027	699 (68.1)	328 (31.9)

MT, mutated; No, number of; WT, wild type.
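The comparison of frequencies reported with Table 2 can be approximated with a Pearson chi-square test on the wild-type and mutated counts. The sketch below is only an illustration: the choice of test is an assumption (the text does not name the test used), and the helper names `chi_square_statistic` and `chi_square_p_value` are hypothetical. Only the counts come from Table 2.

```python
import math

# Counts (wild type, mutated) copied from Table 2.
TABLE_2 = {
    ("Microarrays", "IHC"): (638, 284),
    ("Microarrays", "Sequence-based"): (1328, 652),
    ("Microarrays", "GES"): (2003, 725),
    ("RNAseq", "Sequence-based"): (699, 328),
}


def chi_square_statistic(rows):
    """Pearson chi-square statistic for a list of (WT, MT) count pairs."""
    col_totals = [sum(row[j] for row in rows) for j in range(2)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for j, observed in enumerate(row):
            expected = row_total * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_p_value(stat, dof):
    """Upper-tail probability, using the closed forms for 2 and 3 degrees of freedom."""
    if dof == 2:
        return math.exp(-stat / 2)
    if dof == 3:
        return (math.erfc(math.sqrt(stat / 2))
                + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))
    raise ValueError("this sketch only handles 2 or 3 degrees of freedom")


# All four annotation modes, then the three that remain without microarray GES.
all_rows = list(TABLE_2.values())
other_rows = [v for k, v in TABLE_2.items() if k != ("Microarrays", "GES")]

p_all = chi_square_p_value(chi_square_statistic(all_rows), dof=len(all_rows) - 1)
p_others = chi_square_p_value(chi_square_statistic(other_rows), dof=len(other_rows) - 1)

print(f"all four annotation modes: P = {p_all:.2e}")
print(f"without microarray GES:    P = {p_others:.4f}")
```

Under this assumption, the four-way comparison gives a P-value below 0.0001, and the comparison without the microarray GES row gives a value close to the reported 0.5121.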

Expression of the 31 probes and 26 genes belonging to the p53 GES was concordant with their GES weights (−1 or +1) for IHC and sequence-based microarray data and for sequence-based RNAseq data (Supplementary Table S4).
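One way to picture such a concordance check is sketched below. It rests on assumptions that are not stated in the text: a weight of +1 is taken to mean that higher expression is expected in p53-mutated tumours, and −1 the opposite. The gene names, weights and expression values are placeholders for illustration, not the content of the p53 GES.

```python
from statistics import mean

# Hypothetical signature: gene -> weight (+1 or -1). Placeholder names only.
WEIGHTS = {"GENE_A": +1, "GENE_B": -1, "GENE_C": +1}

# Toy expression values per gene, split by p53 status (MT: mutated, WT: wild type).
EXPRESSION = {
    "GENE_A": {"MT": [2.1, 1.8, 2.4], "WT": [0.9, 1.1, 1.0]},
    "GENE_B": {"MT": [0.4, 0.6, 0.5], "WT": [1.5, 1.7, 1.4]},
    "GENE_C": {"MT": [1.0, 0.9, 1.1], "WT": [1.3, 1.2, 1.4]},
}


def is_concordant(weight, mutated, wild_type):
    """Assumed convention: weight +1 expects higher mean expression in mutated tumours."""
    difference = mean(mutated) - mean(wild_type)
    return difference * weight > 0


concordance = {
    gene: is_concordant(weight, EXPRESSION[gene]["MT"], EXPRESSION[gene]["WT"])
    for gene, weight in WEIGHTS.items()
}
print(concordance)
```

In this toy data set, the first two genes vary in the direction of their weight and the third does not. A real analysis would also test whether each difference is statistically significant.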

Furthermore, expression of the 47 probes and 40 genes belonging to the proliferation GES showed that proliferation was higher in p53-mutated tumours than in p53 wild-type tumours, irrespective of the method of p53 status determination in microarray data, and with sequence-based p53 status determination in RNAseq data (Supplementary Table S5).

Immune response according to p53 status was explored by means of eight genes representative of immune response, 20 HLA genes and 12 immune checkpoint genes. In brief, these analyses indicate that an immune response takes place in p53-mutated tumours, regardless of the nature of the transcriptomic data (i.e. microarrays or RNAseq) and of the method of p53 status determination (Supplementary Tables S6, S7 and S8).

Basal-like (PAM50) and/or TNBC (IHC)

FOXC1 expression was consistently found to be significantly elevated in basal-like (PAM50) and/or TNBC (IHC) versus non-basal-like and/or non-TNBC, and in basal-like (PAM50) versus the other intrinsic molecular subtypes (Figure 2).

Figure 2.

FOXC1 gene expression analysis in basal-like (PAM50) and/or TNBC (IHC), and intrinsic molecular subtypes (PAM50).

TNBC subtypes

Transcriptomic TNBC data of eight Affymetrix® cohorts were selected (Supplementary Table S9). Unsupervised analysis followed by annotation using clinicopathological data, IHC markers and GES, separated TNBC into three subtypes: C1 [n = 169 (24.4%)], C2 [n = 252 (36.4%)] and C3 [n = 272 (39.2%)].

The TNBC subtype profiles of the eight marker genes were concordant with expectations (Table 3). Androgen signalling markers were highly expressed in C1, and immune-response markers were highly expressed in C3. FOXC1, which is a basal-like marker, displayed higher expression in C2 than in C3, although both subtypes are basal-like. This observation is likely in line with the fact that biological aggressiveness is more pronounced in C2 than in C3.

Table 3.

Gene expression profiles of TNBC subtype–specific genes

Subtype specificity	Gene (median probe)	Biological process	TNBC profile
C1	AR	Androgen signalling	C1 > C2 = C3
	FOXA1	Androgen signalling	C1 > C2 = C3
C2	FOXC1	Development (basal-like marker)	C2 > C3 > C1
C3	CD8A	Immune response (T lymphocytes)	C3 > C1 > C2
	IGKC	Immune response (immunoglobulin)	C3 > C1 > C2
	STAT1	Immune response (interferon pathway)	C3 > C2 = C1
	CD274 (PD-L1)	Immune response (immune checkpoint)	C3 > C2 = C1
	PDCD1 (PD1)	Immune response (immune checkpoint)	C3 > C2 = C1

Customized analysis

More than 100 000 differential gene expression analyses (20 000 genes × five splitting criteria) can be performed based on microarray data. This number increases to more than 180 000 (36 000 genes × five splitting criteria) by using RNAseq data.

Increasing kinetics, from Q1 to Q4, were observed for pairs of correlated proliferation genes and of correlated T-cell cytotoxicity genes (Figure 3). Conversely, decreasing kinetics were observed for the T-cell cytotoxicity genes (GZMA and PRF1) with increasing ESR1 level. As expected, these results are consistent with the fact that immune response is triggered in ER-negative, i.e. ESR1-low, tumours.

Figure 3.

Customized expression analysis results of four demonstrative gene pairs (tested gene, splitting gene/quartile criterion): (MKI67, AURKA); (GZMA, PRF1); (GZMA, ESR1) and (PRF1, ESR1).
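The quartile criterion used in these customized analyses can be pictured as follows: samples are grouped into Q1 to Q4 by the expression of the splitting gene, and the tested gene is then summarized within each group. The sketch below is a minimal illustration under stated assumptions. The function name `mean_by_quartile`, the use of the mean as the summary and the expression values are all hypothetical, and the web tool's actual procedure may differ.

```python
from statistics import mean, quantiles


def mean_by_quartile(tested, splitting):
    """Mean of `tested` within the quartile groups (Q1..Q4) of `splitting`."""
    q1, q2, q3 = quantiles(splitting, n=4)  # three cut points
    groups = {"Q1": [], "Q2": [], "Q3": [], "Q4": []}
    for tested_value, splitting_value in zip(tested, splitting):
        if splitting_value <= q1:
            label = "Q1"
        elif splitting_value <= q2:
            label = "Q2"
        elif splitting_value <= q3:
            label = "Q3"
        else:
            label = "Q4"
        groups[label].append(tested_value)
    # Assumes every quartile group holds at least one sample.
    return {label: mean(values) for label, values in groups.items()}


# Toy data for eight samples (invented values, not taken from any cohort).
splitting_gene = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
positively_linked = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2, 4.1, 4.3]
negatively_linked = [4.3, 4.1, 3.2, 3.0, 2.1, 1.9, 1.2, 1.0]

increasing = mean_by_quartile(positively_linked, splitting_gene)
decreasing = mean_by_quartile(negatively_linked, splitting_gene)
print(increasing)
print(decreasing)
```

With the toy values, the first tested gene rises from Q1 to Q4 and the second falls, which mirrors the increasing and decreasing kinetics described above.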

Discussion

From the very beginning, the development of bc-GenExMiner has been guided by one principle: to offer breast cancer researchers and clinicians the most easy-to-use, reliable, complete, and biologically and clinically relevant web-based tool. Furthermore, a special effort was made to avoid a ‘black box’ approach. The development of this new module was no exception to these guidelines; the expression module remains as simple to use as ever. Entry screens are not cluttered, analyses are performed in very few clicks and interpretation of the results is simple.

Different strategies have been applied to optimize the reliability of our web tool. First, strict inclusion criteria were used. Second, to limit normalization bias, normalization was carried out on the cohorts included in each specific analysis. Third, because no gold standard exists for intrinsic molecular subtyping or for p53 status determination, we propose six modes of molecular subtyping, three robust subtyped cohorts (same annotation with different molecular subtype predictors) and three methods of p53 status determination. Fourth, analyses may be based on pooled microarray or RNAseq cohorts, on a single microarray cohort (METABRIC) or on single RNAseq cohorts (TCGA, SCAN-B). Concordant results across different cohorts support the conclusion that biological significance is robust. Fifth, each development of our web-based tool was validated by a large number of ‘biological tests’, whose aim was to show that the pathobiological information in the gene expression data was present and not disrupted by the bioinformatics process (1, 2). All these validation results confirmed that the bc-GenExMiner bioinformatics process is globally neutral and that this web-based tool may be used for in silico validation or discovery purposes.

bc-GenExMiner belongs to the category of disease-associated web-based tools. Furthermore, it is considered a complete tool: users can test their genes of interest in multiple ways (expression, correlation and prognostic analyses) by means of the same interface and know-how.

To the best of our knowledge, this new module includes at least two original kinds of gene expression analyses. First, users can explore gene expression simultaneously in healthy mammary, tumour-adjacent and tumour tissues. Here, healthy tissue refers to mammary tissue with no link to cancer. The biological kinetics observed between these three tissues (e.g. proliferation) demonstrated that tumour-adjacent tissue must not be equated with healthy tissue. Increasing (H < TA < T) and decreasing (H > TA > T) biological kinetics show that tumour-adjacent tissue has an intermediate pathological phenotype. Another original feature of this new module is the possibility of exploring gene expression in TNBC subtypes. We and others clearly showed that basal-like tumours may be split into two distinct subtypes, notably depending on whether the immune response is pro-tumourigenic or anti-tumourigenic. Therefore, basal-like explorations have to take basal-like heterogeneity into account.

Finally, bc-GenExMiner continues to be actively developed. Gene names are updated and new cohorts are included regularly. By further increasing the number of patients, we will be able to explore gene expression in rare breast cancer cohorts for more specific investigations.

Supplementary data

Supplementary data are available at Database online.

Acknowledgements

This paper was prepared in the context of the SIRIC ILIAD programme supported by the French National Cancer Institute (INCa), the Ministry of Health and the Institute for Health and Medical Research (Inserm) (SIRIC ILIAD, INCa-DGOS Inserm-12558). The results shown here are in part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga and GTEx portal generated on 6 May 2017 under dbGaP accession number phs000424.v8.p2.

Conflict of interest.

The authors declare no conflicts of interest or other disclosures.

References

1. Jézéquel, Campone, Gouraud et al. (2012) bc-GenExMiner: an easy-to-use online platform for gene prognostic analyses in breast cancer. Breast Cancer Res. Treat., 131, 765–775.

2. Jézéquel, Frénel J.S., Campion et al. (2013) bc-GenExMiner 3.0: new mining module computes breast cancer gene expression correlation analyses. Database (Oxford), 2013, bas060.

3. Jézéquel, Loussouarn, Guérin-Charbonnel et al. (2015) Gene-expression molecular subtyping of triple-negative breast cancer tumours: importance of immune response. Breast Cancer Res., 43.

4. Jézéquel, Guette, Lasla et al. (2019) iTRAQ-based quantitative proteomic analysis strengthens transcriptomic subtyping of triple-negative breast cancer tumors. Proteomics, e1800484.

5. Jézéquel, Kerdraon, Hondermarck et al. (2019) Identification of three subtypes of triple-negative breast cancer with potential therapeutic implications. Breast Cancer Res., 65.

6. Curtis, Shah S.P., Chin S.F. et al. (2012) The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature, 486, 346–352.

7. Miller L.D., Smeds, George et al. (2005) An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patients survival. Proc. Natl. Acad. Sci. USA, 102, 13550–13555.

8. Liu, Jiang, Gao et al. (2019) TP53 mutations promote immunogenic activity in breast cancer. J. Oncol., 2019, 5952836.

9. Karn, Pusztai, Holtrich et al. (2011) Homogeneous datasets of triple negative breast cancers enable the identification of novel prognostic and predictive signatures. PLoS One, e28403.

10. Sorlie, Tibshirani, Parker et al. (2003) Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc. Natl. Acad. Sci. USA, 100, 8418–8423.

11. Prat, Adamo, Cheang M.C. et al. (2013) Molecular characterization of basal-like and non-basal-like triple-negative breast cancer. Oncologist, 123–133.

12. Ray P.S., Wang et al. (2010) FOXC1 is a potential prognostic biomarker with functional significance in basal-like breast cancer. Cancer Res., 3870–3876.

13. Lehmann B.D., Bauer J.A., Chen et al. (2011) Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies. J. Clin. Invest., 121, 2750–2767.

14. Burstein M.D., Tsimelzon, Poage G.M. et al. (2015) Comprehensive genomic analysis identifies novel subtypes and targets of triple-negative breast cancer. Clin. Cancer Res., 1688–1698.

15. Rooney, Shukla S.A., C.J. et al. (2015) Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell, 160, –.

16. Saal L.H., Vallon-Christersson, Häkkinen et al. (2015) The Sweden Cancerome Analysis Network - Breast (SCAN-B) initiative: a large-scale multicenter infrastructure towards implementation of breast cancer genomic analyses in the clinical routine. Genome Med., 20.

17. Liao, Smyth G.K. and Shi (2013) The subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Res., e108.

18. Johnson W.E. and Rabinovic (2006) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 118–127.

19. Ben Azzouz, Michel, Lasla et al. (2020) Development of an absolute assignment predictor for triple-negative breast cancer subtyping using machine learning approaches. Comput. Biol. Med., 129, 104171.

20. Wallden, Storhoff, Nielsen et al. (2015) Development and verification of PAM50-based Prosigna breast cancer gene signature assay. BMC Med. Genomics, 54.

21. Tian and Schiemann W.P. (2009) The TGF-beta paradox in human cancer: an update. Future Oncol., 259–271.

22. Moses and Barcellos-Hoff M.H. (2011) TGF-beta biology in mammary development and breast cancer. Cold Spring Harb. Perspect. Biol., a003277.

23. Zarzynska J.M. (2014) Two faces of TGF-beta1 in breast cancer. Mediators Inflamm., 2014, 141747.