Abstract

In a broad sense, lactic acid bacteria (LAB) is a general term for Gram-positive bacteria that produce lactic acid from fermentable carbohydrates. LAB are widely used in industry, agriculture, animal husbandry and medicine, and they are closely related to human health: they can regulate the intestinal flora and improve gastrointestinal function and immunity. Cancer, a disease in which some cells grow uncontrollably and spread to other parts of the body, is one of the leading causes of death worldwide. In recent years, the potential of LAB in cancer treatment has attracted attention, and mining knowledge from the scientific literature can significantly accelerate this application. Using 7794 publications on LAB and cancer as source data, we processed 16 543 biomedical concepts and 23 091 associations with automatic text mining tools combined with manual curation by domain experts, and constructed an ontology containing 31 434 pieces of structured data. Finally, based on this ontology, we built a knowledge graph (KG) database, Beyond ‘Lactic acid bacteria to Cancer Knowledge graph Database’ (BLAB2CancerKD), using KG and web technologies. BLAB2CancerKD presents the relevant knowledge intuitively and clearly in various forms, and its interactive functions make knowledge retrieval more efficient. BLAB2CancerKD will be continuously updated to advance the research and application of LAB in cancer therapy. Researchers can visit BLAB2CancerKD at http://110.40.139.2:18095/.

Database URL: http://110.40.139.2:18095/

Introduction

Lactic acid bacteria (LAB) is a generic term for Gram-positive bacteria that produce lactic acid; the group comprises more than 60 genera (1, 2). LAB are closely related to human life and are widely used in food processing, farming, animal husbandry and medicine production (3). LAB also have numerous beneficial effects on the human body, including regulating the gastrointestinal microbial community, enhancing the immune response and controlling endotoxin (4, 5). Cancer, also called malignant tumor, is a group of diseases involving abnormal cell growth with the potential to invade or spread to other parts of the body (6, 7). More than 100 types of cancer affect human health (8). With nearly 10 million deaths in 2020, cancer remains one of the major challenges of the 21st century (9). Recent studies have shown that LAB can inhibit the onset or progression of cancer in various ways, suggesting potential clinical value in cancer prevention and treatment (10).

Many existing databases are related to LAB. The random amplified polymorphic DNA polymerase chain reaction (RAPD-PCR) fingerprint database (11) was built by identifying and typing about a thousand LAB strains isolated from cheese with species-specific PCR and RAPD-PCR; it can be used to identify newly isolated strains, refine earlier identifications or identify LAB strains isolated from other food ecosystems. The LABiocin database (12) is designed specifically for LAB bacteriocins and records each bacteriocin's name, category, amino acid and nucleic acid sequences, target microorganisms, source, producing strain and its culture conditions, and extraction and purification methods. Similarly, the web-based bacteriocin genome mining tool BAGEL (13) also contains information about LAB bacteriocins. The GelCompar II database (14) contains ethidium bromide-stained EcoRI restriction fingerprints of total LAB DNA separated by routine agarose gel electrophoresis. LAB-Secretome (15) stores, visualizes and updates the extracellular and surface-associated proteins and the Lactobacillales Clusters of Orthologous Groups (LaCOG) of LAB. There are also many public cancer databases whose primary purpose is to describe cancer and its characteristics. The National Cancer Institute Thesaurus (16), one of the most commonly used cancer research ontologies, compiles terms across all aspects of cancer research and health care. Manually curated by experts, the Catalogue of Somatic Mutations in Cancer (17) is the world's largest and most comprehensive resource for studying the effects of variations in human cancer cells. SynLethDB (18) is a comprehensive knowledge base of synthetic lethality (SL) containing an extensive collection of SL gene pairs from various sources, aimed at discovering selective and sensitive anticancer drug targets. However, to date, no knowledge graph (KG) database has focused on the association between LAB and cancer.

Using scientific literature related to LAB and cancer as the data source, we constructed Beyond ‘Lactic acid bacteria to Cancer’ Knowledge-graph Database (BLAB2CancerKD). This KG database focuses on, but extends beyond, the association between LAB and cancer. Manually extracting structured data from unstructured biomedical literature achieves high accuracy and recall, but it is costly in time and labor (19). Existing biomedical text mining (TM) tools (20) reduce time and cost, but their accuracy is far lower than that of manual curation (21). We therefore combined TM tools and web crawler technology (22) with manual curation by domain experts in a semi-automatic workflow to extract structured knowledge from the literature, which saves time and cost while substantially improving accuracy and recall over TM alone. We established the ‘beyond lactic acid bacteria cancer’ ontology and, on the basis of these data, built BLAB2CancerKD using KG and web technology. The visual, interactive interface of the KG can help researchers quickly acquire knowledge of the association between LAB and cancer and find new strategies, methods and applications of LAB in cancer treatment and prevention.

Method

We used TM tools combined with in-depth manual curation by domain experts to obtain structured knowledge about LAB and cancer from the scientific literature. Based on these data, we adopted KG and web technologies to construct BLAB2CancerKD. The workflow is shown in Figure 1.

Figure 1.

BLAB2CancerKD flow chart.

Data source

We obtained the PubMed Unique Identifiers (PMIDs) of 7794 related articles from PubMed (https://pubmed.ncbi.nlm.nih.gov/) using ‘lactic acid bacteria&cancer’ as the search query. PubTator (23) is an automatic, web-based biomedical concept annotation system whose Application Programming Interface (API) provides convenient access to large volumes of annotated biomedical knowledge. It annotates six types of biomedical entities: gene/protein, disease, chemical, mutation, species and cell line. We used PubTator to obtain the abstracts of these articles together with their annotations for these six concept types.
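PubTator can return annotations in the plain-text PubTator format, in which title and abstract lines are written as `PMID|t|…` and `PMID|a|…` and each annotation is a tab-separated line (PMID, start offset, end offset, mention, concept type, concept ID); the ‘22611376|t|…’ sentence in the curation example below uses the same layout. The sketch below parses that format with only the standard library. The class and function names are hypothetical, relation lines are simply ignored, and the annotation line in the usage example is constructed for illustration rather than copied from PubTator output.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start: int          # offset into the title followed by the abstract
    end: int            # exclusive end offset
    mention: str        # surface text of the concept
    concept_type: str   # e.g. "Disease", "Species", "Chemical"
    concept_id: str     # e.g. a MeSH ID or NCBI Taxonomy ID ("" if absent)


@dataclass
class Document:
    pmid: str
    title: str = ""
    abstract: str = ""
    annotations: list = field(default_factory=list)


def parse_pubtator(text):
    """Parse PubTator-format text into Document objects keyed by PMID."""
    docs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate documents
        parts = line.split("|", 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1] in ("t", "a"):
            doc = docs.setdefault(parts[0], Document(parts[0]))
            if parts[1] == "t":
                doc.title = parts[2]
            else:
                doc.abstract = parts[2]
            continue
        cols = line.split("\t")
        # Annotation lines: PMID, start, end, mention, type[, id]
        if len(cols) >= 5 and cols[1].isdigit() and cols[2].isdigit():
            doc = docs.setdefault(cols[0], Document(cols[0]))
            concept_id = cols[5] if len(cols) > 5 else ""
            doc.annotations.append(
                Annotation(int(cols[1]), int(cols[2]), cols[3], cols[4], concept_id))
        # Any other line (e.g. a relation line) is ignored in this sketch.
    return docs


if __name__ == "__main__":
    sample = (
        "22611376|t|Cholesterol-lowering probiotics as potential "
        "biotherapeutics for metabolic diseases.\n"
        # Constructed annotation line, for illustration only.
        "22611376\t65\t83\tmetabolic diseases\tDisease\tMESH:D008659\n"
    )
    doc = parse_pubtator(sample)["22611376"]
    ann = doc.annotations[0]
    print(doc.title[ann.start:ann.end], ann.concept_type, ann.concept_id)
```

Keeping offsets alongside each mention makes it possible to highlight concepts in the abstract later, as the text annotation module does.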

Construction of LAB cancer–related biomedical concepts

Medical Subject Headings (MeSH) (24) is a controlled-vocabulary thesaurus maintained by the United States National Library of Medicine (NLM) and used to index PubMed articles; it includes 19 categories of biomedical concepts. The Taxonomy database (25) provides the classification and nomenclature of all organisms in the public sequence databases. Based on MeSH and the Taxonomy database, we supplemented and refined the biomedical concepts returned by PubTator using crawler technology combined with in-depth expert curation, and obtained a dictionary of biomedical concepts related to LAB and cancer, called LAB cancer–related biomedical concepts (LAB2CancerBC).

Because we focus on LAB, cancer and related diseases, we classify diseases and specific strains in more detail. We subdivided the disease category according to the ‘Diseases’ category of MeSH. For specific strains, we labeled each strain at the genus level according to the classification standard of the Taxonomy database, to show more clearly the associations between different strains and cancer and related diseases.
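The two classification rules above can be sketched as simple lookups. The branch names below are top-level categories of the MeSH ‘Diseases’ [C] tree, and the two species-to-genus entries reproduce the examples given later in this section (NCBI Taxonomy IDs 1613 and 47715). Both tables are tiny hypothetical stand-ins for the full MeSH tree and the Taxonomy database, and the function names are illustrative.

```python
# Top-level branches of the MeSH "Diseases" [C] tree (a subset, for illustration).
MESH_DISEASE_BRANCHES = {
    "C01": "Infections",
    "C04": "Neoplasms",
    "C06": "Digestive System Diseases",
    "C07": "Stomatognathic Diseases",
    "C14": "Cardiovascular Diseases",
    "C20": "Immune System Diseases",
    "C23": "Pathological Conditions, Signs and Symptoms",
}

# Species-level NCBI Taxonomy ID -> current genus (only the two examples in the text).
SPECIES_TO_GENUS = {
    1613: "Limosilactobacillus",   # Limosilactobacillus fermentum
    47715: "Lacticaseibacillus",   # Lacticaseibacillus rhamnosus
}


def disease_categories(tree_numbers):
    """Map MeSH tree numbers to top-level disease categories (ordered, no duplicates).

    A descriptor with tree numbers in several branches receives several
    categories, as Barrett's esophagus does (neoplasm and digestive disease).
    """
    categories = []
    for tree_number in tree_numbers:
        branch = tree_number.split(".", 1)[0]
        name = MESH_DISEASE_BRANCHES.get(branch)
        if name is not None and name not in categories:
            categories.append(name)
    return categories


def label_strain(strain_name, species_taxid):
    """Return a (strain, genus, taxid) label in the style of LAB2CancerBC."""
    genus = SPECIES_TO_GENUS.get(species_taxid)
    if genus is None:
        raise KeyError(f"no genus recorded for taxid {species_taxid}")
    return (strain_name, genus, species_taxid)
```

For example, a descriptor with the made-up tree numbers C04.100 and C06.300 would map to both Neoplasms and Digestive System Diseases, and `label_strain("Lactobacillus fermentum HFY06", 1613)` yields the Limosilactobacillus label used in the text.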

Relation extraction

Using LAB2CancerBC as a reference, we used Python regular expressions (https://docs.python.org/3/library/re.html) to split the relevant abstracts into sentences for preprocessing. We then used the Open Domain Information Extraction (OpenIE) system (26) together with expert curation to complete the relation extraction (RE) task. OpenIE is an RE tool that extracts relational tuples from text without requiring a training corpus; for example, from the sentence ‘I like playing basketball’ it can extract the triple (I, like playing, basketball). We prefer this approach because it preserves the original semantic structure of the literature instead of restricting relations to predefined types, which supports deeper curation and yields structured data with richer associations.
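A minimal sketch of the sentence-splitting and dictionary-filtering step follows. It assumes a simple boundary rule (split after ., ! or ? when the next word starts with an uppercase letter or digit, so abbreviations such as ‘L. rhamnosus’ stay intact) and a hypothetical rule that a sentence is passed to OpenIE only if it mentions at least two LAB2CancerBC concepts. The paper does not give its regular expressions; the pattern and function names here are illustrative.

```python
import re

# Sentence boundary: whitespace after ., ! or ? that is followed by an uppercase
# letter or a digit. Species abbreviations such as "L. rhamnosus" are not split
# because the next word starts with a lowercase letter.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(abstract):
    """Split an abstract into sentences with a simple regular expression."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(abstract) if s.strip()]


def concept_hits(sentence, concepts):
    """Return dictionary concepts found in the sentence as whole words (case-insensitive)."""
    return [c for c in concepts
            if re.search(r"\b" + re.escape(c) + r"\b", sentence, flags=re.IGNORECASE)]


def candidate_sentences(abstract, concepts, min_concepts=2):
    """Keep (sentence, hits) pairs that mention at least `min_concepts` concepts.

    min_concepts=2 is a hypothetical rule: a relation needs two entities.
    """
    selected = []
    for sentence in split_sentences(abstract):
        hits = concept_hits(sentence, concepts)
        if len(hits) >= min_concepts:
            selected.append((sentence, hits))
    return selected
```

The selected sentences would then be handed to OpenIE, and the recorded hits tell curators which dictionary concepts each extracted triple should connect.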

Manual curation of experts

We implemented a system for manual curation by experts that visualizes all structured data and frees experts from tedious file operations. The system is shown in Figure 2. In constructing LAB2CancerBC, the domain experts curated entities for four kinds of problems: concepts left unlabeled by PubTator, incorrectly labeled concepts, incompletely labeled concepts and specific strains that PubTator processed unsuitably. Concepts curated by experts are continuously added to LAB2CancerBC, progressively reducing labor costs.

Figure 2.

System of manual curation of experts.

Unlabeled concepts

Barrett’s esophagus is a marker of severe reflux and a precursor of esophageal adenocarcinoma. Although it is included in MeSH, PubTator did not label it. We labeled it as ‘Barrett’s esophagus, neoplasms, MESH: D001471’ and ‘Barrett’s esophagus, digestive system disease, MESH: D001471’. We also focus on concepts in categories that PubTator does not label, such as analytical, diagnostic and therapeutic techniques and equipment; phenomena and processes; technology; and food and beverages. These concepts are also processed by experts.

Incorrectly labeled concepts

LF-HFY06 is the abbreviation of Limosilactobacillus fermentum HFY06. PubTator annotated it as a cell line, which we corrected to ‘LF-HFY06, Limosilactobacillus, 1613’. Lipopolysaccharide (LPS) is a component of the outer membrane of Gram-negative bacteria and a type of endotoxin. However, PubTator identified it either as a disease, lip-pits syndrome, or as the mouse Tlr4 (toll-like receptor 4) gene. We corrected it to ‘LPS, Chemical, MESH: D008070’.

Incompletely labeled concepts

For oral cancer, PubTator labels only the word ‘cancer’, which is not specific enough for our work. We therefore label it ‘oral cancer, neoplasms, MESH: D009062’ and ‘oral cancer, stomatognathic disease, MESH: D009062’.

Annotation of the genus of specific LAB strains

We labeled Lactobacillus fermentum HFY06 as ‘Lactobacillus fermentum HFY06, Limosilactobacillus, 1613’ and Lactobacillus rhamnosus ATCC 7469 as ‘Lactobacillus rhamnosus ATCC 7469, Lacticaseibacillus, 47715’, that is, each strain is assigned to its current genus together with the Taxonomy ID of its species.
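Once experts have recorded their corrections in a dictionary, the four correction types can be applied mechanically. The sketch below uses a hypothetical `CURATED` table holding the corrections described above. It finds curated mentions in a sentence (longest first), lets them replace any overlapping PubTator span, which covers both wrong labels and truncated spans such as ‘cancer’ inside ‘oral cancer’, and keeps PubTator spans that no curated entry touches. Spans are `(start, end, labels)` tuples by assumption; this illustrates the idea rather than the authors' curation system.

```python
import re

# Hypothetical curated dictionary: lower-cased mention -> list of (category, ID).
# The entries reproduce the corrections described in the text.
CURATED = {
    "barrett's esophagus": [("neoplasms", "MESH:D001471"),
                            ("digestive system disease", "MESH:D001471")],
    "lf-hfy06": [("Limosilactobacillus", "1613")],
    "lps": [("Chemical", "MESH:D008070")],
    "oral cancer": [("neoplasms", "MESH:D009062"),
                    ("stomatognathic disease", "MESH:D009062")],
    "lactobacillus rhamnosus atcc 7469": [("Lacticaseibacillus", "47715")],
}


def _overlaps(start, end, spans):
    """True if [start, end) overlaps any (s, e, labels) span."""
    return any(start < e and s < end for s, e, _ in spans)


def curated_spans(text):
    """Locate curated mentions; longer mentions win over overlapping shorter ones."""
    spans = []
    for mention in sorted(CURATED, key=len, reverse=True):
        pattern = r"\b" + re.escape(mention) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if not _overlaps(m.start(), m.end(), spans):
                spans.append((m.start(), m.end(), CURATED[mention]))
    return sorted(spans)


def merge_annotations(text, pubtator_spans):
    """Curated spans replace PubTator spans they overlap; other PubTator spans are kept."""
    curated = curated_spans(text)
    kept = [p for p in pubtator_spans if not _overlaps(p[0], p[1], curated)]
    return sorted(curated + kept)
```

Adding each new expert correction to the dictionary means it is applied automatically to every later abstract, which is how curation effort can shrink over time.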

In the RE task, the triples extracted by OpenIE also contain inaccuracies, such as redundant or missing triples. To ensure the accuracy of the RE results, we performed further manual curation on the OpenIE output. For instance, OpenIE extracted nothing from the sentence ‘22611376|t|Cholesterol-lowering probiotics as potential biotherapeutics for metabolic diseases.’, so we extracted the triple manually and obtained ‘cholesterol-lowering probiotics, as potential biotherapeutics for metabolic diseases’.

Construction of BLAB2Cancer ontology and BLAB2CancerKG

We combined LAB2CancerBC with the triples to obtain a structured BLAB2Cancer ontology containing the relevant biomedical concepts, their categories, National Center for Biotechnology Information (NCBI)_ids, associations and the corresponding source statements. Neo4j (https://neo4j.com/) is a graph database management system with native graph storage and processing; data are stored as nodes, edges and attributes. We used Python scripts to import the BLAB2Cancer ontology into Neo4j to obtain the LAB cancer KG, BLAB2CancerKG.
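The import scripts are not described further, so the following is only a sketch of one way to load ontology rows into Neo4j. OpenIE relation phrases are free text, and Cypher cannot take a relationship type as a query parameter, so this sketch stores each phrase as a `relation` property on a generic `:RELATED_TO` relationship; that schema, the `:Entity` label and the field names are assumptions. The driver call in the closing comment uses the official neo4j Python driver, with placeholder connection details.

```python
# Hypothetical Cypher template (schema is an assumption, not the authors').
IMPORT_QUERY = """
UNWIND $rows AS row
MERGE (h:Entity {name: row.head})
  SET h.category = row.head_category, h.ncbi_id = row.head_id
MERGE (t:Entity {name: row.tail})
  SET t.category = row.tail_category, t.ncbi_id = row.tail_id
MERGE (h)-[r:RELATED_TO {relation: row.relation}]->(t)
  SET r.sentence = row.sentence
"""

# Column order follows Table 2 (Entity 1, classification, ID, Relation, Entity 2, ...).
FIELDS = ("head", "head_category", "head_id", "relation",
          "tail", "tail_category", "tail_id", "sentence")


def to_rows(records):
    """Turn 8-column ontology records into parameter dicts for IMPORT_QUERY."""
    rows = []
    for record in records:
        if len(record) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(record)}")
        rows.append(dict(zip(FIELDS, record)))
    return rows


def batches(rows, size=1000):
    """Yield successive slices of `rows` so a large import runs in chunks."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]


# With the official neo4j Python driver (connection details are placeholders):
# from neo4j import GraphDatabase
# with GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password")) as driver:
#     for batch in batches(to_rows(records)):
#         driver.execute_query(IMPORT_QUERY, rows=batch)
```

Using `MERGE` rather than `CREATE` means an entity that appears in many triples becomes a single node, and re-running the import does not duplicate data.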

Construction of BLAB2CancerKD

We developed BLAB2CancerKD based on BLAB2CancerKG. The front end of BLAB2CancerKD is built with the Python Django web framework (https://www.djangoproject.com/), and the back end is implemented with Neo4j and MySQL (https://www.mysql.com/cn/). We use pyecharts (https://pyecharts.org/#/) and cytoscape.js (https://js.cytoscape.org/) to visualize the KG data.
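cytoscape.js draws graphs from a list of element objects whose `data` field holds an `id` (plus `source` and `target` for edges). A minimal sketch of converting triples into that JSON, which a Django view could return to the front end, is shown below. The `category` field used for legend coloring, the edge IDs and the function name are assumptions, and the example triples are illustrative.

```python
import json


def to_cytoscape_elements(triples, categories=None):
    """Convert (head, relation, tail) triples into Cytoscape.js element objects.

    categories: optional mapping from entity name to category, copied into the
    node data so the front end can color nodes or build a legend (an assumption).
    """
    categories = categories or {}
    nodes, edges = {}, []
    for i, (head, relation, tail) in enumerate(triples):
        for name in (head, tail):
            if name not in nodes:
                data = {"id": name, "label": name}
                if name in categories:
                    data["category"] = categories[name]
                nodes[name] = {"group": "nodes", "data": data}
        edges.append({"group": "edges",
                      "data": {"id": f"e{i}", "source": head,
                               "target": tail, "label": relation}})
    return list(nodes.values()) + edges


if __name__ == "__main__":
    elements = to_cytoscape_elements(
        [("LGG", "reduces", "tumor load"), ("LGG", "alleviates", "colitis")],
        {"LGG": "Lacticaseibacillus"},
    )
    print(json.dumps(elements, indent=2))  # JSON that a view could return
```

Listing each node once before the edges keeps the payload small when a hub such as LGG appears in many triples.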

Result

LAB2CancerBC

Combining PubTator, web crawler techniques and manual curation by domain experts yielded LAB2CancerBC, which contains 19 020 relevant biomedical concepts of 157 types. Part of LAB2CancerBC is shown in Table 1. Figure 3 compares the annotation of PubTator with that of LAB2CancerBC; after processing, LAB2CancerBC is clearly richer in both the number of entities and the number of categories. Figure 4a shows the tree diagram of disease categories. The category ‘neoplasms’ accounts for the largest proportion (18.48%) and includes colon cancer, Barrett’s esophagus, breast tumor and gastric cancer, followed by ‘pathological conditions, signs and symptoms’, which includes bacteremia of various causes, sepsis and diarrhea. Because we focus mainly on cancer, Figure 4b shows the tree diagram of neoplasm categories: colorectal cancer (CRC) accounts for the largest share (16.45%), followed by colon cancer (11.66%) and gastric cancer (3.20%); categories with small counts were merged into ‘other neoplasms’. Figure 4c shows the tree diagram of strain categories. Lactobacillus accounts for the largest proportion, including Lactobacillus plantarum, Lactobacillus acidophilus and L. fermentum, followed by Lacticaseibacillus, mainly including Lactobacillus casei, Lactobacillus paracasei and L. rhamnosus.
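Percentages such as those in Figure 4 are shares of labeled concepts per category. The sketch below shows one way to compute such shares and pool rare categories into ‘other neoplasms’. It assumes each concept contributes one category label, and the `min_share` cut-off is hypothetical because the text does not state the threshold used; the example labels are made up.

```python
from collections import Counter


def category_shares(labels, min_share=0.01, other="other neoplasms"):
    """Percentage of each category label; labels below `min_share` are pooled into `other`.

    min_share is a hypothetical cut-off; the text does not give the value used.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    shares = {}
    for category, n in counts.most_common():
        key = category if n / total >= min_share else other
        shares[key] = shares.get(key, 0.0) + 100.0 * n / total
    return {key: round(value, 2) for key, value in shares.items()}
```

For instance, with ten made-up labels (five CRC, three colon cancer and two singletons) and `min_share=0.15`, the two singletons are pooled into a single 20% ‘other neoplasms’ slice.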

Table 1.

Partial content of LAB2CancerBC

Entity name | Entity type | NCBI_ID
Lymphangioleiomyomatosis | Immune system diseases | MESH: D018192
enteritis | Digestive system disease | MESH: D004751
L. pentosus strains LB2F2 | Lactiplantibacillus | 1589
L. rhamnosus P1 | Lacticaseibacillus | 47715
Bacteremia | Infections | MESH: D016470

Figure 3.

Comparison of annotation of PubTator with annotation of LAB2CancerBC.

Figure 4.

Statistics of major entity categories in LAB2CancerBC.

LAB2Cancer ontology and BLAB2CancerKG

Combining OpenIE relation extraction with expert curation, we obtained 31 434 structured triples, which we combined with LAB2CancerBC to obtain the LAB2Cancer ontology. Table 2 shows partial results of the LAB2Cancer ontology. In each triple, Entity 1 becomes Node 1, the association becomes the edge, and Entity 2 becomes Node 2; each node carries its entity category and NCBI_ID from LAB2CancerBC. BLAB2CancerKG, built with Neo4j, contains 16 543 nodes, 23 091 edges and 72 720 attributes. Figure 5 shows part of the graph centered on ‘LGG’, that is, ‘L. rhamnosus GG’.
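The triple-to-graph mapping just described can be sketched in plain Python. This sketch assumes that identical triples collapse into a single edge and that entities missing from the concept dictionary are flagged for curation with empty attributes; both are illustrative choices, the dictionary stands in for LAB2CancerBC, and the sketch is not meant to reproduce the node, edge and attribute counts reported above.

```python
def build_graph(triples, dictionary):
    """Map (head, relation, tail) triples to nodes and edges.

    dictionary: entity name -> (category, ncbi_id), standing in for LAB2CancerBC.
    Returns (nodes, edges, missing): node attributes keyed by name, a set of unique
    edges (identical triples collapse into one edge), and names absent from the
    dictionary, whose attributes are left as None.
    """
    nodes, edges, missing = {}, set(), set()
    for head, relation, tail in triples:
        for name in (head, tail):
            if name not in nodes:
                category, ncbi_id = dictionary.get(name, (None, None))
                nodes[name] = {"category": category, "ncbi_id": ncbi_id}
                if name not in dictionary:
                    missing.add(name)
        edges.add((head, relation, tail))
    return nodes, edges, missing
```

Returning the missing names separately gives curators a worklist of concepts to add to the dictionary before the graph is loaded.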

Table 2.

Partial results of LAB2Cancer ontology

Entity 1 | Entity 1 classification | Entity 1 ID | Relation | Entity 2 | Entity 2 classification | Entity 2 ID | Sentence
L. helveticus R389 | Lactobacillus | 1587 | Decreasing Interleukin-6 in | Mammary glands | Anatomy | MESH:D042361 | 20550747|milk fermented with…
Hepatic abscess | Digestive system disease | MESH:D008100 | As first manifestation of | Renal carcinoma | Neoplasms | MESH:D002292 | 12073672|t|[Hepatic abscess…
L. lactis 332 | Lactococcus | 1358 | Caused an accumulation of | Macrophages | Anatomy | MESH:D008264 | 9564792|Intraperitoneal injection of…
Invasive Pneumococcal Disease (IPD) | Immune system disease | MESH:C564468 | Increase in population with | Multiple myeloma | Cardiovascular disease | MESH:D009101 | 20429967|Compared to…
CRIP1 | Gene | 1396 | Activated by | E. faecalis | Species | 1351 | 27836662|t|CRIP1, a novel…

Figure 5.

Part of BLAB2CancerKG instance.

BLAB2CancerKD

BLAB2CancerKD allows users to browse and query knowledge about LAB and cancer quickly and to download data. It includes a KG module, a data list module and a text annotation module. Users can access BLAB2CancerKD at http://110.40.139.2:18095/. Its home page is shown in Figure 6, and the site is divided into the following three functional modules:

Figure 6.

Home page of BLAB2CancerKD.

  • BLAB2CancerKG module: This module enables fast interaction with the data, displaying biomedical concepts, entity categories, NCBI_IDs and the associations between concepts, and provides search and data export functions. It is divided into four submodules according to their specific functions.

  • BLAB2CancerKG module 1 (adjustable category display): This submodule focuses on data display. Users can search for a ‘concept’ or an ‘association’ to obtain a KG and then click the entity-category legend to hide or show entities of each category. The submodule supports data export in Comma-Separated Values (CSV), JavaScript Object Notation (JSON) and Portable Network Graphics (PNG) formats, as shown in Figure 7a.

  • BLAB2CancerKG module 2 (adjustable search object), as shown in Figure 7b: after a search returns a subgraph, users can switch between two KG layouts by clicking the ‘switch layout’ button and download the data in CSV format. In addition, clicking a node displays the KG centered on that node, together with Google and NCBI search links for it, as shown in Figure 8.

  • BLAB2CancerKG module 3 (adjustable graph depth) lets users view subgraphs of different node depths, from 1 to 4 layers. Its other functions are the same as those of module 2, as shown in Figure 7c.

Figure 7.

BLAB2CancerKG module.

Figure 8.

Node expansion of the KG module.

  • BLAB2CancerKG module 4 (focus on cancer): this submodule restricts BLAB2CancerKG to cancer-related content, so that all information users retrieve here is related to cancer and can be accessed more efficiently. The page also provides subgraphs of the eight tumor categories that account for the largest proportions, as shown in Figure 7d.

  • BLAB2Cancer Sheet module: This module displays all structured data of the BLAB2Cancer ontology as a list, provides search and data export functions and links each record to PubMed so that its source can be verified. The module is shown in Figure 9.

  • Text annotation module: This module includes all 7794 scientific literature abstracts related to LAB and cancer. Users can search the abstracts by PMID or keyword. All relevant biomedical concepts are highlighted, and hovering the cursor over a concept displays its category and NCBI_ID. The module is shown in Figure 10.
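The depth setting of module 3 and the cancer focus of module 4 amount to a bounded neighborhood query around the searched node. The sketch below performs it with breadth-first search over an in-memory edge list and ignores edge direction. The 1 to 4 bound mirrors the range stated for module 3; the function name, the `keep_node` predicate standing in for a cancer-only filter and the example edges are hypothetical.

```python
from collections import deque


def subgraph(edges, center, depth, keep_node=None):
    """Return (nodes, edges) within `depth` hops of `center`, ignoring edge direction.

    keep_node: optional predicate; neighbors failing it are not visited, e.g. a
    filter that admits only cancer-related entities (hypothetical).
    """
    if not 1 <= depth <= 4:
        raise ValueError("depth must be between 1 and 4")
    adjacency = {}
    for head, _, tail in edges:
        adjacency.setdefault(head, []).append(tail)
        adjacency.setdefault(tail, []).append(head)
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(node, []):
            if neighbor in distance:
                continue
            if keep_node is not None and not keep_node(neighbor):
                continue
            distance[neighbor] = distance[node] + 1
            queue.append(neighbor)
    kept_edges = [e for e in edges if e[0] in distance and e[2] in distance]
    return set(distance), kept_edges
```

Capping the depth matters in practice because the number of reachable nodes grows quickly around hub entities, and a four-layer view of a well-studied strain can already be large.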

Figure 9.

BLAB2Cancer sheet module.

Figure 10.

BLAB2CancerKD overview.

Discussion

Retrieving LAB- and cancer-related literature through PubMed requires researchers to invest considerable effort to extract the relevant knowledge, which is time-consuming and laborious. BLAB2CancerKD integrates unstructured knowledge from the literature and presents it in multiple formats, allowing researchers to access it more easily and quickly. For example, researchers who want a quick overview of Lactobacillus reuteri can search for L. reuteri in the Lactic Acid Bacteria Knowledge Graph interface, and BLAB2CancerKD will display all the specific strains of this species together with their related entities, as shown in Figure 11.

Figure 11.

KG of L. reuteri.

Below, we describe how BLAB2CancerKD accelerates knowledge acquisition and promotes knowledge discovery regarding ‘Lactobacillus for cancer treatment or prevention’, and use overall data and specific examples to explain why we call it ‘beyond’.

LAB and cancer therapy

BLAB2CancerKD contains 50 genera, including core genera such as Lactobacillus, Connostrea and Amphicoccus, as well as more peripheral genera such as Aerococcus, Botulus and Enterococcus. Because experts pay increasing attention to the potential application of specific strains in cancer treatment, we collected 591 specific strains. The associations between these LAB and cancer and related diseases can be seen clearly through the KG and the ontology data list. For example, L. rhamnosus GG (LGG) can reduce tumor load in a CD8+ T-cell-dependent manner (27); Lactobacillus E6-1 inhibited Cal-27 oral cancer cells and induced their apoptosis (28); and L. rhamnosus Probio-M9 can suppress inflammation and colitis-associated tumorigenesis by regulating the intestinal environment (29). Using the multiple data presentation forms of BLAB2CancerKD, researchers can quickly identify which LAB strains show potential in cancer therapy and adjuvant cancer therapy and then plan targeted research without first reading an enormous amount of literature, thus improving research efficiency.

LAB and cancer prevention

LAB, including specific strains, can help prevent cancer, and we have also collected the relevant data. For example, eating Bifidobacterium longum SPM1207 can help improve mild to moderate hypercholesterolemia and thereby help prevent CRC (30). Lactobacillus salivarius Ren, isolated from centenarians living in Bama, China, a region renowned for the longevity of its inhabitants, can prevent dimethylhydrazine-induced CRC through protein kinase B inhibition (31). Lactobacillus reuteri ATCC-PTA-6475 can inhibit mammary carcinogenesis (32). Lactobacillus casei BL23 can prevent colitis-associated CRC (33). Lactobacillus casei strain Shirota can inhibit 3-methylcholanthrene-induced tumorigenesis (34). Prophylactic administration of L. casei ATCC 393 delays the onset of cancer (35).

Microbial communities and cancer

Through BLAB2CancerKG, we learned that microbial communities may also affect human health and cancer treatment. For example, compared with healthy people, patients with CRC often show a disturbed gut microbial balance, with more Enterococcus faecalis but fewer Lactobacillus acidophilus and Lactobacillus plantarum (36, 37). Likewise, an abundance of Fusobacterium nucleatum CTI-2 and a decrease in Streptococcus pneumoniae were found in the oral microbiota of oral cancer patients. This knowledge suggests that modulating microbiota balance may reduce cancer incidence and that targeted microbiome analysis could complement existing cancer screening approaches.

LAB and cancer-related diseases

LAB may help manage cancer complications, relieve the suffering of cancer patients and improve their quality of life. BLAB2CancerKD lists 1919 cancer-related diseases and summarizes how LAB act on them. For example, oral mucositis (OM) is a common complication of chemotherapy for head and neck cancer, and Lactobacillus reuteri DSM 17938 and L. reuteri ATCC PTA 5289 can improve OM (38). Lactobacillus casei YIT9018 can prevent leukopenia during radiotherapy in cancer patients (39); it acts as an adjuvant immunotherapy and can be used in combination with radiotherapy.

Not all LAB are good for human health. For example, Streptococcus pyogenes, or Group A Streptococcus, can cause acute pharyngitis (40) and necrotizing fasciitis (41). Enterococcus faecalis infection may be involved in the progression of chronic pancreatitis and eventually lead to the development of pancreatic cancer (42).

Why ‘Beyond’? Other related entities

We describe BLAB2CancerKD as ‘beyond’ because it also contains concepts such as genes, chemicals, cell lines, phenomena and processes, technology, food and beverages, and analytical, diagnostic and therapeutic techniques and equipment. These entities show other extended concepts and relationships of LAB and cancer more comprehensively and systematically, and can broaden research ideas. The structured knowledge in the KG may help uncover hidden interactions between LAB and cancers. For example, milk fermented with Lactobacillus helveticus R389 could delay breast cancer growth by reducing serum IL-6 and increasing IL-10 in mammary glands and tumor-infiltrating immune cells (43). Some probiotic Lactobacillus species in kimchi, such as Anaerobia and L. plantarum, can be used not only as antibacterial agents in foods but also to develop functional foods that reduce the risk of colon cancer (44, 45). Lactobacillus plantarum 299v can improve nutritional status, enteral nutrition tolerance and quality of life in cancer patients (46). Lunasin, a cancer-preventive peptide, is synthesized by LAB during sourdough fermentation (47). Regular intake of aspirin and other nonsteroidal anti-inflammatory drugs (NSAIDs) reduces the risk of CRC and adenomas (48). Catechol can inhibit breast cancer cell proliferation and mammosphere formation (49).

Explore and discovery

When searching PubMed, researchers must read a large amount of literature and then summarize and integrate the information themselves, which is time-consuming and laborious. In contrast, our knowledge base integrates the unstructured knowledge scattered across the literature into graphs, so researchers can search it more conveniently and quickly.

Inference over the KG can reveal potential relationships that give researchers ideas and starting points for subsequent experimental validation. The following examples show how BLAB2CancerKG can facilitate biological discovery. Figure 12 shows that inulin can reduce the risk of colon cancer, while salicylic acid can induce the cytotoxic effects of LGG against colon cancer. We speculate that inulin and salicylic acid may act synergistically; future research and development could consider combining them in health products or drugs for specific cancers.

As shown in Figure 13, L. salivarius inhibits colorectal carcinogenesis and shows anticancer potential in oral cancer. The substances that induce these two cancers may therefore share properties on which L. salivarius acts. Meanwhile, periodontal disease is associated with oral cancer, so there may also be an association between periodontal disease and colorectal carcinogenesis. Patients with periodontal disease should therefore be alerted to the possibility of colorectal carcinogenesis as well as to the prevention of oral cancer.
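The L. salivarius example follows a path through the graph: periodontal disease → oral cancer → L. salivarius → colorectal carcinogenesis. The sketch below surfaces such indirect links with breadth-first search, returning the shortest path to every node that is two or more hops away. This is a generic path-finding technique in the spirit of literature-based ‘ABC’ discovery, not a method described by the authors; its output is a list of hypotheses to be checked experimentally, not evidence. Edge direction is ignored, and the function name and example edges are illustrative.

```python
from collections import deque


def indirect_links(edges, source, max_hops=3):
    """Return {node: shortest path} for nodes 2..max_hops hops from `source`.

    Nodes directly connected to `source` are excluded, since they are already
    known associations. Edge direction is ignored.
    """
    adjacency = {}
    for head, _, tail in edges:
        adjacency.setdefault(head, set()).add(tail)
        adjacency.setdefault(tail, set()).add(head)
    parent = {source: None}
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # stop expanding at the hop limit
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in parent:
                parent[neighbor] = node
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    results = {}
    for node, d in dist.items():
        if d >= 2:
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            results[node] = path[::-1]
    return results
```

Starting from ‘periodontal disease’ with the three associations in Figure 13, this returns ‘colorectal carcinogenesis’ with the path through oral cancer and L. salivarius, which is the kind of candidate link that would then be passed to experts for judgment.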

Figure 12.

Speculation on the synergistic effect of inulin and salicylic acid.

Figure 13.

Potential association between periodontal disease and CRC.

Conclusion

We have constructed a knowledge graph database, BLAB2CancerKD, for the association between LAB and cancer using text mining techniques and in-depth expert curation. To date, it has systematically collected and displayed 31 434 pieces of structured knowledge from 7794 publications. All data are available at http://110.40.139.2:18095/, together with PubMed links to the corresponding literature for verification and accuracy checking.

BLAB2CancerKD integrates knowledge for the research and application of LAB in cancer prevention and treatment. The KG, an efficient visual representation, can accelerate research and help researchers discover hidden and previously unknown biological associations. BLAB2CancerKD will continue to provide data support for related research, and incremental data updates remain part of our future work. We also plan to combine intelligent question answering and deep learning to further explore the association between LAB and cancer and provide new biological insights.

Author contributions

J.G. designed and supervised the study. Y.J., B.F. and J.L. carried out TM work and the development of BLAB2CancerKD, and Z.S. and Y.W. guided the manual curation work for the data.

Funding

The Inner Mongolia Science and Technology Major Special Projects (2019ZD016 and 2021ZD0005), Natural Science Foundation of Inner Mongolia Autonomous Region (2019MS03014) and Inner Mongolia Autonomous Region Graduate Research Innovation Project (S20210228Z).

Conflict of interest statement

None declared.

Acknowledgements

We would like to thank Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry for the support of this work. We would also like to thank Zhihong Sun’s team and Yufei Wang’s team for the support for data curation.

References

1. Wang Y., Wu J., Lv M. et al. (2021) Metabolism characteristics of lactic acid bacteria and the expanding applications in food industry. Front. Bioeng. Biotechnol., 9, 612285.

2. Chen W. (2019) Lactic acid bacteria. In: Omics and Functional Evaluation. Springer, Singapore.

3. Wang C., Cui Y. and Qu X. (2018) Mechanisms and improvement of acid resistance in lactic acid bacteria. Arch. Microbiol., 200, 195–201.

4. Tsai Y.-T., Cheng P.-C. and Pan T.-M. (2012) The immunomodulatory effects of lactic acid bacteria for improving immune functions and benefits. Appl. Microbiol. Biotechnol., 96, 853–862.

5. Gilliland S.E. (1990) Health and nutritional benefits from lactic acid bacteria. FEMS Microbiol. Rev., 87, 175–188.

6. Suresh S. (2007) Biomechanics and biophysics of cancer cells. Acta Biomater., 3, 413–438.

7. National Cancer Institute (2021) What Is Cancer? https://www.cancer.gov/about-cancer/understanding/what-is-cancer (11 October 2021, date last accessed).

8. National Cancer Institute. Defining Cancer. 17 September 2007. Retrieved 28 March 2018.

9. Sung H., Ferlay J., Siegel R.L. et al. (2021) Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin., 71, 209–249.

10. Liu C., Zheng J., Ou X. et al. (2021) Anti-cancer substances and safety of lactic acid bacteria in clinical treatment. Front. Microbiol., 12, 722052.

11. Rossetti L. and Giraffa G. (2005) Rapid identification of dairy lactic acid bacteria by M13-generated, RAPD-PCR fingerprint databases. J. Microbiol. Methods, 63, 135–144.

12. Kassaa I.A., Rafei R., Moukhtar M. et al. (2019) LABiocin database: a new database designed specifically for lactic acid bacteria bacteriocins. Int. J. Antimicrob. Agents, 54, 771–779.

13. de Jong A., van Hijum S.A., Bijlsma J.J. et al. (2006) BAGEL: a web-based bacteriocin genome mining tool. Nucleic Acids Res., 34, W273–W279.

14. Chan R.K., Wortman C.R., Smiley B.K. et al. (2003) Construction and use of a computerized DNA fingerprint database for lactic acid bacteria from silage. J. Microbiol. Methods, 55, 565–574.

15. Zhou M., Theunissen D., Wels M. et al. (2010) LAB-Secretome: a genome-scale comparative analysis of the predicted extracellular and surface-associated proteins of lactic acid bacteria. BMC Genom., 11, 1–16.

16. Golbeck J., Fragoso G., Hartel F. et al. (2003) The National Cancer Institute’s thesaurus and ontology. J. Web Semant., 1, 75–80.

17. Tate J.G., Bamford S., Jubb H.C. et al. (2019) COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res., 47, D941–D947.

18. Wang J., Min W., Huang X. et al. (2022) SynLethDB 2.0: a web-based knowledge graph database on synthetic lethality for novel anticancer drug discovery. Database, 2022, baac030.

19. Baumgartner Jr W.A., Cohen K.B., Fox L.M. et al. (2007) Manual curation is not sufficient for annotation of genomic databases. Bioinformatics, 23, i41–i48.

20. Cohen A.M. and Hersh W.R. (2005) A survey of current work in biomedical text mining. Brief. Bioinformatics, 6, 57–71.

21. Karp P.D. (2016) Can we replace curation with information extraction software? Database (Oxford), 2016, baw150.

22. Mirtaheri S.M. et al. (2014) A brief history of web crawlers. In: CASCON '13: Proceedings of the 2013 Conference of the Center for Advanced Studies on Collaborative Research, Ontario, Canada, November 2013. pp. 40–54.

23. Wei C.-H., Allot A., Leaman R. et al. (2019) PubTator central: automated concept annotation for biomedical full text articles. Nucleic Acids Res., 47, W587–W593.

24. Lipscomb C.E. (2000) Medical subject headings (MeSH). Bull. Med. Libr. Assoc., 88, 265.

25. Federhen S. (2012) The NCBI taxonomy database. Nucleic Acids Res., 40, D136–D143.

26. Angeli G., Premkumar M.J. and Manning C.D. (2015) Leveraging linguistic structure for open domain information extraction. In: Proceedings of the Association for Computational Linguistics (ACL), Beijing, China, July. pp. 344–354.

27. Owens J.A., Saeedi B.J., Naudin C.R. et al. (2021) Lactobacillus rhamnosus GG orchestrates an antitumor immune response. Cell. Mol. Gastroenterol. Hepatol., 12, 1311–1327.

28. Gao L., Sun Z.-H., Yu H. et al. (2019) Inhibiting effect and mechanism of Lactobacillus E6-1 on oral cancer cell line Cal-27. Shanghai J. Stomatol., 28, 605.

29. Xu H., Hiraishi K., Kurahara L.-H. et al. (2021) Inhibitory effects of breast milk-derived Lactobacillus rhamnosus Probio-M9 on colitis-associated carcinogenesis by restoration of the gut microbiota in a mouse model. Nutrients, 13, 1143.

30. Lee D.K., Jang S., Baek E.H. et al. (2009) Lactic acid bacteria affect serum cholesterol levels, harmful fecal enzyme activity, and fecal water content. Lipids Health Dis., 8, 1–8.

31. Dong Y., Zhu J., Zhang M. et al. (2020) Probiotic Lactobacillus salivarius Ren prevent dimethylhydrazine-induced colorectal cancer through protein kinase B inhibition. Appl. Microbiol. Biotechnol., 104, 7377–7389.

32. Lakritz J.R., Poutahidis T., Levkovich T. et al. (2014) Beneficial bacteria stimulate host immune cells to counteract dietary and genetic predisposition to mammary cancer in mice. Int. J. Cancer, 135, 529–540.

33. Jacouton E., Chain F., Sokol H. et al. (2017) Probiotic strain Lactobacillus casei BL23 prevents colitis-associated colorectal cancer. Front. Immunol., 8, 1553.

34. Takagi A., Ikemura H., Matsuzaki T. et al. (2008) Relationship between the in vitro response of dendritic cells to Lactobacillus and prevention of tumorigenesis in the mouse. J. Gastroenterol., 43, 661–669.

35. Choi S.S., Kim Y., Han K.S. et al. (2006) Effects of Lactobacillus strains on cancer cell proliferation and oxidative stress in vitro. Lett. Appl. Microbiol., 42, 452–458.

36. D’ Asheesh T.I.A., Hussen B.M., Al-Marzoqi A.H. et al. (2021) Assessment of oncogenic role of intestinal microbiota in colorectal cancer patients. J. Gastrointest. Cancer, 52, 1016–1021.

37. Motamedi H., Ari M.M., Shahlaei M. et al. (2023) Designing multi-epitope vaccine against important colorectal cancer (CRC) associated pathogens based on immunoinformatics approach. BMC Bioinform., 24, 1–31.

38. Gupta N., Ferreira J., Hong C.H.L. et al. (2020) Lactobacillus reuteri DSM 17938 and ATCC PTA 5289 ameliorates chemotherapy-induced oral mucositis. Sci. Rep., 10, 1–11.

39. Okawa T., Kita M., Arai T. et al. (1989) Phase II randomized clinical trial of LC9018 concurrently used with radiation in the treatment of carcinoma of the uterine cervix. Its effect on tumor reduction and histology. Cancer, 64, 1769–1776.

40. Chandnani H.K., Jain R. and Patamasucon P. (2015) Group C streptococcus causing rheumatic heart disease in a child. J. Emerg. Med., 49, 12–14.

41. Stetzner Z.W., Li D., Feng W. et al. (2015) Serotype M3 and M28 group A streptococci have distinct capacities to evade neutrophil and TNF-α responses and to invade soft tissues. PLoS One, 10, e0129417.

42. Maekawa T., Fukaya R., Takamatsu S. et al. (2018) Possible involvement of Enterococcus infection in the pathogenesis of chronic pancreatitis and cancer. Biochem. Biophys. Res. Commun., 506, 962–969.

43. de Leblanc A.D.M. and Perdigón G. (2010) The application of probiotic fermented milks in cancer and intestinal inflammation. Proc. Nutr. Soc., 69, 421–428.

44. Chang J.-H., Shim Y.Y., Cha S.-K. et al. (2012) Effect of Lactobacillus acidophilus KFRI342 on the development of chemically induced precancerous growths in the rat colon. J. Med. Microbiol., 61, 361–368.

45. Park J., Oh S. and Cha Y. (2014) Lactobacillus brevis OPK-3 isolated from kimchi inhibits adipogenesis and exerts anti-inflammation in 3T3-L1 adipocyte. J. Sci. Food Agric., 94, 2514–2520.

46. Kaźmierczak-Siedlecka K., Folwarski M., Skonieczna-Żydecka K. et al. (2020) The use of Lactobacillus plantarum 299v (DSM 9843) in cancer patients receiving home enteral nutrition–study protocol for a randomized, double-blind, and placebo-controlled trial. Nutr. J., 19, 1–8.

47. Rizzello C.G., Nionelli L., Coda R. et al. (2012) Synthesis of the cancer preventive peptide lunasin by lactic acid bacteria during sourdough fermentation. Nutr. Cancer, 64, 111–120.

48. Narisawa T. (1998) An overview on chemoprevention of colorectal cancer. Nihon Geka Gakkai Zasshi, 99, 362–367.

49. Choi H.S., Kim J.-H., Kim S.-L. et al. (2018) Catechol derived from aronia juice through lactic acid bacteria fermentation inhibits breast cancer stem cell formation via modulation Stat3/IL‐6 signaling pathway. Mol. Carcinog., 57, 1467–1479.

Author notes

Co-first author.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.