PMBC: a manually curated database for prognostic markers of breast cancer

Table 1.

Number of RNA records and unique RNA markers

RNA type	Number of records	Number of unique markers
mRNA	1089	869
lncRNA	80	70
miRNA	136	97
circRNA	49	32
Pseudogene	3	2

Basic statistics of the curated prognostic markers

We ranked the 1070 unique prognostic markers of breast cancer by how frequently they were reported. As shown in Figure 1A, HIF1A is the most frequently reported prognostic marker, and AR, EGFR and MKI67 are the second most frequently reported markers. HOX transcript antisense RNA (HOTAIR) is the most frequently reported prognostic lncRNA, while miR-125b, miR-205, miR-21 and miR-210 are the most frequently reported prognostic miRNAs. We also ranked journals by the number of prognostic markers they reported (Figure 1B). ‘Cancers (Basel)’ tops the list, followed by ‘Breast Cancer Res’ and ‘Cancer Res’.
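The distinction between records and unique markers (Table 1) and the frequency rankings in Figure 1 can be illustrated with a minimal sketch. It assumes, hypothetically, that each curated record is a (marker, RNA type, journal) tuple; the records and journal names below are toy data, not PMBC content, and PMBC's actual schema is not described in the text.

```python
from collections import Counter

# Toy records (hypothetical, for illustration only): (marker, RNA type, journal).
# Each record is one marker reported in one publication, so a marker can
# appear in several records ("records" vs "unique markers" in Table 1).
records = [
    ("HIF1A", "mRNA", "Journal A"),
    ("HIF1A", "mRNA", "Journal B"),
    ("HIF1A", "mRNA", "Journal C"),
    ("EGFR", "mRNA", "Journal A"),
    ("EGFR", "mRNA", "Journal B"),
    ("HOTAIR", "lncRNA", "Journal A"),
    ("miR-21", "miRNA", "Journal C"),
]


def summarize_by_type(records):
    """Return {RNA type: (number of records, number of unique markers)}."""
    n_records = Counter(rna_type for _, rna_type, _ in records)
    unique = {}
    for marker, rna_type, _ in records:
        unique.setdefault(rna_type, set()).add(marker)
    return {t: (n_records[t], len(unique[t])) for t in n_records}


def top_markers(records, k=3):
    """Rank markers by how many records report them (ties keep first-seen order)."""
    return Counter(marker for marker, _, _ in records).most_common(k)


def top_journals(records, k=3):
    """Rank journals by how many marker records they contributed."""
    return Counter(journal for _, _, journal in records).most_common(k)


print(summarize_by_type(records))  # {'mRNA': (5, 2), 'lncRNA': (1, 1), 'miRNA': (1, 1)}
print(top_markers(records, 2))     # [('HIF1A', 3), ('EGFR', 2)]
```

Counting per record rather than per unique marker is a design choice here; a journal ranking by unique markers would instead count distinct marker symbols per journal.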

Figure 1.

Basic statistics of the curated prognostic markers. (A) The most frequently reported prognostic markers. (B) Journals reporting the largest numbers of prognostic markers.

Database design

PMBC provides a user-friendly interface that allows users to easily query information on prognostic markers of breast cancer. An overview of PMBC is shown in Figure 2. The website includes the pages ‘Home’, ‘Browse’, ‘Search’, ‘Submit’, ‘Help’ and ‘Contact us’. On the ‘Browse’ page, users can browse prognostic markers by RNA type (miRNA, lncRNA, mRNA, circRNA and pseudogene) or by the initial letter of the marker symbol; clicking an initial returns a table of all prognostic markers beginning with that letter. On the ‘Search’ page, users can search by marker symbol. Fuzzy searching is supported, so the closest matching markers are returned even when the query is inexact. For each searched marker, PMBC returns the PubMed ID of the publication in the PubMed database of the National Center for Biotechnology Information, the RNA type of the marker, whether high expression of the marker is associated with a favorable or poor outcome for patients, and the detailed description from the literature.

The ‘Submit’ page enables researchers to submit newly reported prognostic markers of breast cancer; once a submission is approved by the review committee, the record is added to the back-end database. The ‘Help’ page provides a step-by-step tutorial so that users can quickly learn how to use the database, and the ‘Contact us’ page provides contact information.
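Browsing by initial and fuzzy searching can be sketched as follows. The text does not describe how PMBC implements fuzzy matching, so this sketch substitutes the similarity ratio from Python's standard `difflib` module; the marker list, the `cutoff` value and the function names are hypothetical.

```python
import difflib

# Hypothetical symbol list; PMBC's actual matching method is not described,
# so difflib's similarity ratio stands in for "fuzzy searching" here.
MARKERS = ["HIF1A", "EGFR", "MKI67", "AR", "HOTAIR", "NEAT1", "miR-21", "miR-205", "miR-210"]


def browse_by_initial(initial, markers=MARKERS):
    """Return all markers whose symbol starts with `initial` (case-insensitive)."""
    return sorted(m for m in markers if m[:1].upper() == initial.upper())


def fuzzy_search(query, markers=MARKERS, n=3, cutoff=0.6):
    """Return up to `n` symbols whose similarity to `query` is at least `cutoff`."""
    by_upper = {m.upper(): m for m in markers}  # compare case-insensitively
    hits = difflib.get_close_matches(query.upper(), list(by_upper), n=n, cutoff=cutoff)
    return [by_upper[h] for h in hits]


print(browse_by_initial("h"))  # ['HIF1A', 'HOTAIR']
print(fuzzy_search("MIK67"))   # ['MKI67']
```

A misspelled query such as "MIK67" still retrieves MKI67, which is the behavior fuzzy searching is meant to provide; a lower `cutoff` returns more, looser matches.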

Figure 2.

A schematic workflow of PMBC.

Functional enrichment

We performed functional enrichment analysis to explore the biological processes and pathways associated with the known prognostic markers. The markers were enriched in ‘regulation of epithelial cell proliferation’, ‘positive regulation of mitogen-activated protein kinase (MAPK) cascade’, ‘gland development’ and ‘epithelial cell proliferation’ (Figure 3A). Among the significant pathways, those containing the greatest numbers of prognostic markers were ‘Proteoglycans in cancer’, ‘MicroRNAs in cancer’, ‘PI3K-Akt signaling pathway’ and ‘MAPK signaling pathway’ (Figure 3B).

Figure 3.

Functional enrichment of prognostic markers. Functional enrichment of the known prognostic RNAs was performed for Gene Ontology (GO) biological processes and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Significant terms were ranked by the number of prognostic markers they contain, and the top 10 significant terms are shown for (A) GO and (B) KEGG.
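The text does not state which enrichment tool or statistical test was used. A common choice for this kind of analysis is a one-sided hypergeometric (over-representation) test; the minimal sketch below assumes that test, then ranks significant terms by the number of markers they contain, as described for Figure 3. The gene sets, term names and background size are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    top = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def enrich(markers, term_genes, background_size, alpha=0.05):
    """One-sided over-representation test per term; keep terms with P < alpha.

    markers: set of marker genes; term_genes: {term: set of annotated genes};
    background_size: number of genes in the annotation universe (assumed known).
    No multiple-testing correction is applied in this sketch.
    """
    results = []
    for term, genes in term_genes.items():
        overlap = len(markers & genes)
        if overlap == 0:
            continue
        p = hypergeom_sf(overlap, background_size, len(genes), len(markers))
        if p < alpha:
            results.append((term, overlap, p))
    # Rank by the number of markers in each term (as in Figure 3), then by P value.
    return sorted(results, key=lambda r: (-r[1], r[2]))


# Toy example with hypothetical genes and terms.
markers = {"A", "B", "C", "D"}
terms = {"T1": {"A", "B", "C", "D", "E"}, "T2": {"A", "X", "Y", "Z", "W"}}
print(enrich(markers, terms, background_size=20))  # only T1 passes: 4 of 4 markers
```

Ranking by overlap size rather than by P value favors large, well-populated terms, which matches the ranking criterion stated in the figure legend.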

Known prognostic markers play pivotal roles in the ceRNA network

mRNAs and lncRNAs can compete for binding to shared miRNAs and thereby function as competing endogenous RNAs (ceRNAs). We therefore systematically characterized the prognostic markers of breast cancer based on the ceRNA network. The prognostic markers were mapped to the downloaded ceRNA network, which comprises mRNAs, lncRNAs and pseudogenes. As a result, 631 markers have interactions in the ceRNA network, including 602 mRNAs, 28 lncRNAs and one pseudogene.
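The ceRNA relation and the mapping step can be illustrated with a small sketch. PMBC used a downloaded network whose construction is not described here, so the rule below (two RNAs are linked if they share at least `min_shared` miRNAs) is a hypothetical simplification, and all RNA and miRNA names are toy placeholders apart from NEAT1.

```python
from collections import Counter
from itertools import combinations


def cerna_edges(mirna_targets, min_shared=2):
    """Link two RNAs that share at least `min_shared` miRNAs (hypothetical rule)."""
    edges = set()
    for a, b in combinations(sorted(mirna_targets), 2):
        if len(mirna_targets[a] & mirna_targets[b]) >= min_shared:
            edges.add((a, b))
    return edges


def mapped_markers_by_type(markers, edges, rna_type):
    """Count markers with at least one ceRNA interaction, split by RNA type."""
    nodes = {rna for edge in edges for rna in edge}
    return Counter(rna_type[m] for m in markers if m in nodes)


# Toy data (hypothetical): miRNAs predicted to bind each RNA.
mirna_targets = {
    "NEAT1": {"miR-a", "miR-b", "miR-c"},
    "GENE1": {"miR-a", "miR-b"},
    "GENE2": {"miR-c"},
    "PSG1": {"miR-b", "miR-c"},
}
rna_type = {"NEAT1": "lncRNA", "GENE1": "mRNA", "GENE2": "mRNA", "PSG1": "pseudogene"}
edges = cerna_edges(mirna_targets)
print(sorted(edges))  # [('GENE1', 'NEAT1'), ('NEAT1', 'PSG1')]
print(mapped_markers_by_type({"NEAT1", "GENE1", "GENE2"}, edges, rna_type))
```

In this toy case GENE2 shares only one miRNA with any other RNA, so it has no ceRNA interaction and is not counted, analogous to markers that could not be mapped to the network.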

We then characterized the topological features of these prognostic markers in terms of degree, closeness and betweenness. The RNAs (nodes) in the ceRNA network were ranked by each of these three features. Comparing the topological values of the prognostic RNAs with those of random RNAs, we found that known prognostic RNAs have higher normalized closeness than random RNAs (P < 1.00E−05, Table 2, Figure 4A), suggesting that they lie at shorter distances from other nodes in the ceRNA network. We further divided the prognostic markers into three RNA types: lncRNA, mRNA and pseudogene. Compared with random RNAs, prognostic lncRNAs have both higher raw and higher normalized degree (P < 1.00E−05, Table 2, Figure 4B and C), and prognostic mRNAs have significantly higher normalized closeness (P < 1.00E−05, Table 2, Figure 4D). These results suggest that prognostic lncRNAs have many ceRNA partners and may play important roles in maintaining ceRNA interactions; their degree in the ceRNA network might therefore be used as a feature to prioritize candidate prognostic lncRNAs.
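The comparison with random RNAs can be sketched as an empirical permutation test. The text does not specify how random RNAs were drawn or how many permutations were run, so this sketch assumes uniform sampling of node sets of the same size as the marker set, computes degree and a BFS-based normalized closeness in pure Python, and omits betweenness for brevity. The toy network and node names are hypothetical.

```python
import random
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree(adj, v):
    return len(adj[v])


def normalized_closeness(adj, v):
    """(reachable nodes - 1) / sum of BFS distances to them; 0.0 for isolated nodes."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def permutation_p(adj, markers, score, n_perm=10_000, seed=0):
    """One-sided empirical P that a random node set of the same size has a mean
    score at least as high as the markers' mean score."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    scores = {v: score(adj, v) for v in nodes}
    observed = sum(scores[m] for m in markers) / len(markers)
    hits = sum(
        sum(scores[v] for v in rng.sample(nodes, len(markers))) / len(markers) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Toy star-like network (hypothetical): one hub linked to six other RNAs.
edges = [("hub", f"n{i}") for i in range(6)] + [("n0", "n1")]
adj = build_adjacency(edges)
print(permutation_p(adj, ["hub"], degree))  # observed mean degree 6.0, P close to 1/7
```

The `(hits + 1) / (n_perm + 1)` form keeps the empirical P value above zero, so the smallest reportable P depends on the number of permutations.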

Figure 4.

Significant topological features of prognostic markers. Topological values of the known prognostic RNAs were compared with those of random RNAs. (A) Normalized closeness of all prognostic markers. (B) Raw and (C) normalized degree of prognostic lncRNAs. (D) Normalized closeness of prognostic mRNAs.

Table 2.

Topological features of prognostic markers

Feature	Random	All markers	lncRNA	mRNA	Pseudogene
degree_raw	3.29E+01	3.05E+01 (P = 0.28)	9.27E+01 (P < 1.00E−05)	2.76E+01 (P = 0.43)	3.00 (P = 1.00)
degree_norm	2.28E−03	2.10E−03 (P = 0.28)	6.42E−03 (P ≤ 1.00E−05)	1.91E−03 (P = 0.43)	2.08E−04 (P = 1.00)
betweenness_raw	2.20E+04	1.26E+04 (P = 0.39)	2.32E+04 (P = 0.14)	1.22E+04 (P = 0.41)	7.95E−01 (P = 1.00)
betweenness_norm	2.12E−04	1.21E−04 (P = 0.39)	2.23E−04 (P = 0.14)	1.17E−04 (P = 0.41)	0.00 (P = 1.00)
closeness_raw	3.20E−03	2.70E−05 (P = 0.97)	2.30E−05 (P = 1.00)	2.70E−05 (P = 0.97)	2.10E−05 (P = 1.00)
closeness_norm	3.65E−01	3.83E−01 (P < 1.00E−05)	3.3E−01 (P = 1.00)	3.86E−01 (P < 1.00E−05)	3.00E−01 (P = 1.00)

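P values compare the prognostic markers with random RNAs. Topological features with P < 0.05: degree_raw and degree_norm for lncRNAs, and closeness_norm for all markers and for mRNAs.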
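Table 2 reports each feature in raw and normalized form, but the text does not define the normalizations. Assuming the conventions used by common network libraries such as igraph (an assumption, not stated by the authors), with N the number of nodes in the ceRNA network and d(u, v) the shortest-path distance, the two forms are related as follows, and the reported degree and betweenness values are consistent with these relations:

$$
\begin{aligned}
\mathrm{degree\_norm}(v) &= \frac{\mathrm{degree\_raw}(v)}{N-1},\\
\mathrm{closeness\_norm}(v) &= (N-1)\,\mathrm{closeness\_raw}(v) = \frac{N-1}{\sum_{u \neq v} d(u,v)},\\
\mathrm{betweenness\_norm}(v) &= \frac{2\,\mathrm{betweenness\_raw}(v)}{(N-1)(N-2)},\\[4pt]
\frac{3.29\times10^{1}}{2.28\times10^{-3}} &\approx 1.44\times10^{4} \approx N-1,
\qquad
\frac{1.26\times10^{4}}{1.21\times10^{-4}} \approx 1.04\times10^{8} \approx \frac{(1.44\times10^{4})^{2}}{2}.
\end{aligned}
$$

Because each normalized value is a fixed rescaling of the raw value, raw and normalized versions of the same feature should yield the same ranking of nodes.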

We further extracted the ceRNA subnetwork of the 631 prognostic markers, in which connected RNAs competitively bind the same miRNAs (Figure 5). The lncRNA NEAT1 competes with 11 RNAs, including lncRNAs and mRNAs, for binding to miRNAs. Most ceRNAs of ABAT are pseudogenes, and nearly all ceRNAs of the pseudogene PSPHP1 are also pseudogenes.
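The text does not state whether the subnetwork keeps only edges between markers or also the markers' ceRNA partners. Because Figure 5 contains pseudogene nodes although only one mapped marker is a pseudogene, this sketch assumes the latter reading: it keeps every edge with at least one marker endpoint, then tallies the RNA types of a node's partners. The partner names are hypothetical; only NEAT1 and ABAT come from the text.

```python
from collections import Counter


def marker_subnetwork(edges, markers):
    """Keep ceRNA edges with at least one prognostic marker as an endpoint (assumed rule)."""
    return {(a, b) for a, b in edges if a in markers or b in markers}


def partner_types(edges, node, rna_type):
    """Count the RNA types of a node's ceRNA partners."""
    partners = {b for a, b in edges if a == node} | {a for a, b in edges if b == node}
    return Counter(rna_type[p] for p in partners)


# Toy network (hypothetical partners; only the marker names come from the text).
edges = {
    ("NEAT1", "G1"), ("NEAT1", "L1"),
    ("ABAT", "P1"), ("ABAT", "P2"), ("ABAT", "G2"),
    ("G3", "G4"),  # no marker endpoint, so it is dropped
}
rna_type = {"NEAT1": "lncRNA", "ABAT": "mRNA", "G1": "mRNA", "G2": "mRNA",
            "G3": "mRNA", "G4": "mRNA", "L1": "lncRNA", "P1": "pseudogene", "P2": "pseudogene"}
sub = marker_subnetwork(edges, {"NEAT1", "ABAT"})
print(len(sub))                              # 5
print(partner_types(sub, "ABAT", rna_type))  # Counter({'pseudogene': 2, 'mRNA': 1})
```

Node degree in the extracted subnetwork (the basis for node size in Figure 5) is simply the total of the partner counts returned for that node.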

Figure 5.

CeRNA network of prognostic markers. The blue nodes represent mRNAs, the orange nodes represent lncRNAs and the dark gold nodes represent pseudogenes. The node size is proportional to its degree in the ceRNA network.

Future extensions

We will review and select newly reported prognostic markers of breast cancer every 6 months and update the database with user-submitted records monthly. The current ceRNA network contains no interactions for circRNAs, so circRNA markers cannot yet be characterized in the network. A future direction is to reconstruct the ceRNA network to include more RNA types, including but not limited to circRNAs. Expanding the set of prognostic markers and incorporating more comprehensive ceRNA interactions will improve the utility and coverage of the database and should provide a more global and precise understanding of prognostic markers of breast cancer.

Discussion

We have developed PMBC, a user-friendly database of curated prognostic markers for breast cancer. PMBC differs from survival analysis web tools in that it (i) is composed of prognostic markers manually curated from publications, (ii) contains various types of RNA markers, including mRNA, miRNA, lncRNA, circRNA and pseudogene, which may not all be detected simultaneously in a single dataset, (iii) enables the characterization of prognostic markers from the perspective of the ceRNA network and (iv) facilitates the prioritization of prognostic markers based on their network topology. An intuitive query interface allows quick and easy searching.

Conclusions

PMBC hosts a comprehensive collection of prognostic markers spanning mRNAs, lncRNAs, miRNAs, circRNAs and pseudogenes, and provides convenient, user-friendly interfaces to explore them. We developed PMBC as a cross-validation resource to help researchers identify potential prognostic RNA biomarkers for follow-up research. PMBC enables rapid retrieval and organization of information about breast cancer prognostic markers and helps researchers interpret the relevant evidence, which may ultimately contribute to more effective therapy and better patient outcomes. PMBC may thus help accelerate translational research from bench to bedside.

Data availability

The data presented in this study are available in the article.

Author contributions

Conceptualization, X.C., W.S. and X.W.; methodology and software, X.C., Y.Y., M.L., Y.W., W.C., G.L., L.L., J.L. and C.P.; validation, X.C., W.S. and X.W.; formal analysis, X.C., W.S. and X.W.; investigation, X.C., W.S. and X.W.; writing, original draft preparation, X.C.; writing, review and editing, W.S. and X.W.; supervision, W.S. and X.W.; project administration, X.C., W.S. and X.W.; funding acquisition, X.C., W.S. and X.W.; J.L. revised the webpage and wrote the rebuttal letter responding to the reviewers’ comments. All authors have read and agreed to the published version of the manuscript.

Funding

National Natural Science Foundation of China (62003094, 82003615); Science and Technology Projects in Guangzhou (202102020573, 202201010147); Basic Research Project (Dengfeng hospital) jointly funded by Guangzhou City and University (202201020586).

Conflict of interest statement

The authors declare no conflict of interest.
