CPMKG: a condition-based knowledge graph for precision medicine

Author Notes

Abstract

Personalized medicine tailors treatments and dosages based on a patient’s unique characteristics, particularly its genetic profile. Over the decades, stratified research and clinical trials have uncovered crucial drug-related information—such as dosage, effectiveness, and side effects—affecting specific individuals with particular genetic backgrounds. This genetic-specific knowledge, characterized by complex multirelationships and conditions, cannot be adequately represented or stored in conventional knowledge systems. To address these challenges, we developed CPMKG, a condition-based platform that enables comprehensive knowledge representation. Through information extraction and meticulous curation, we compiled 307 614 knowledge entries, encompassing thousands of drugs, diseases, phenotypes (complications/side effects), genes, and genomic variations across four key categories: drug side effects, drug sensitivity, drug mechanisms, and drug indications. CPMKG facilitates drug-centric exploration and enables condition-based multiknowledge inference, accelerating knowledge discovery through three pivotal applications. To enhance user experience, we seamlessly integrated a sophisticated large language model that provides textual interpretations for each subgraph, bridging the gap between structured graphs and language expressions. With its comprehensive knowledge graph and user-centric applications, CPMKG serves as a valuable resource for clinical research, offering drug information tailored to personalized genetic profiles, syndromes, and phenotypes.

Database URL: https://www.biosino.org/cpmkg/

Introduction

The primary challenge in drug therapy is the wide variation in individual responses to medications. This is due to differences in drug metabolism and physiological conditions [1–3]. Precision medicine, unlike the traditional one-size-fits-all approach, tailors treatments to each patient’s unique genetic makeup and health profile [4, 5]. It recognizes the individuality of each person, customizing therapies accordingly. Currently, the most accurate drug information is found on labels and in the Clinical Pharmacogenetics Implementation Consortium (CPIC) guidelines, which explain how genomic data can inform decisions on dosages, metabolism, and potential adverse reactions to certain drugs [6, 7]. However, the FDA has detailed genomic information for only ∼380 drugs on their labels [8]. A significant portion of precision medication data remains buried in basic drug research databases and academic literature [9].

Existing data resources like PharmGKB [10], focusing on pharmacology and pharmacogenomics, CTD [11], with its specialization in toxicological data, and DrugBank [12], offering comprehensive drug information, contain a wealth of drug-related knowledge. However, their usefulness is hindered by a lack of a unified knowledge representation framework and their scattered presence across various platforms. This fragmentation makes it difficult for researchers and clinicians to fully utilize these resources [13]. Other knowledge resources such as the Precision Medicine Knowledgebase (PreMedKB) [14] consolidate ∼500 000 structured precision medicine relationships. But a detailed analysis shows that many of these relationships are oversimplified, labeled merely as “associate” or “effect,” and fail to reflect the nuanced information in the original text. This issue arises from the traditional reliance on “triples” in knowledge graphs as the basic unit of knowledge representation. Traditional triples—comprising a subject, predicate, and object—are limited in conveying complex biomedical information, especially in representing the conditions for the establishment of knowledge [15]. They fail to capture the detailed genomic context required for precision medicine. Therefore, there is an urgent need for methods that can more accurately represent precision medicine knowledge by integrating these genetic conditions into knowledge representation. Furthermore, strategies in integration, mining, and governance should be developed to generate knowledge resources that better meet the requirements of precision medicine. Such resources would resolve ambiguities in current knowledge bases and focus on precise and accurate knowledge representation and dissemination.

To tackle existing challenges in precision medicine knowledge representation, we have developed Condition-based Precision Medicine Knowledge Graph (CPMKG), a comprehensive and advanced knowledge graph based on conditional precision medicine data. Our achievements include the following:

We introduced the “Hyper-Triple,” a novel framework that redefines the core data units in knowledge graphs. This framework overcomes the limitations of traditional triples by incorporating specific conditions (genetic backgrounds), ensuring that certain relationships are valid only under specific circumstances. This approach enhances the accuracy and depth of our knowledge representation, distinguishing CPMKG from traditional knowledge graphs.
We developed a “knowledge pattern” approach for organizing data. This method summarizes events involving multiple entities into a model, serving as an abstraction of conditional domain knowledge derived from the literature or expert input. These patterns are represented using the “Hyper-Triple” framework, guiding the collection of domain-specific knowledge.
CPMKG defines four key knowledge patterns for precision medicine research and application. It integrates 307 614 pieces of knowledge, addressing the needs of personalized medicine and drug discovery. The graph presents explicit relationships and constraints, enhancing the precision of therapies.
The knowledge graph offers a drug-centric exploration landscape, merging insights from molecular and clinical research into a comprehensive reasoning map. This facilitates the discovery of valuable evidence, supports medication synergy, and incorporates pharmacogenomics for holistic drug recommendations.
CPMKG employs a large language model (LLM) to improve understanding of the knowledge graph. It provides clear explanations for subgraphs generated through system reasoning, balancing structured information with language expression for better user comprehension.

Materials and methods

Data collection

To delve into precision medicine knowledge, CPMKG has aggregated and restructured 15 727 studies from nine key drug-related databases: PharmGKB [10], SIDER [16], CIViC [17], DrugBank [12], TTD [18], CTD [11], DCDB [19], DoCM [20], and PharmacotherapyDB (https://github.com/dhimmel/indications). We meticulously extracted and mined information on relationships between various entities, such as drug side effects, sensitivities, molecular mechanisms, and treatments.

Knowledge pattern and conditional knowledge representation

CPMKG knowledge pattern

Biomedical literature serves as a vital repository of knowledge, where authors craft sentences to define and delineate key concepts. However, not every sentence contains critical information. Effective distillation of pertinent knowledge enhances the precision of text mining tools, bolstering their ability to expand knowledge bases and clarify the application scope of knowledge graphs. Our research primarily examines pharmacogenomics and drug research papers. In clinical drug studies, the focus often lies on treatment and side effects, whereas drug discovery research prioritizes understanding drug sensitivity and mechanisms. Despite varying linguistic expressions, these knowledge sources share semantic and entity-level commonalities. We harness these similarities through knowledge patterns to capture essential information accurately.

In CPMKG, knowledge is organized into four distinct patterns: Pattern 1 links drug side effects to disease treatment, highlighting genetic variations that signal increased risk (e.g. the C allele of olanzapine and metabolic syndrome in schizophrenia, Fig. 1a) [21–23]. Pattern 2 focuses on drug sensitivity, showing how genetic variations impact treatment outcomes (e.g. reduced tacrolimus need in liver transplant patients with the CC genotype, Fig. 1b) [24–26]. Pattern 3 elucidates drug mechanisms by connecting drug usage to changes in gene expression influenced by genetic variations (e.g. reduced AKR1C3 enzyme activity with the A allele during daunorubicin treatment, Fig. 1c) [27]. Pattern 4 outlines drug indications, providing therapeutic insights for specific drugs or combination therapies (e.g. enzalutamide for prostate cancer, Fig. 1d) [28]. These knowledge patterns often use genetic variations as conditions, highlighting the need for a more nuanced representation.

Figure 1.

Knowledge patterns in CPMKG. (a) Side effects refer to side effects or complications that occur during the use of medication. Example: association between olanzapine and metabolic syndrome risk in schizophrenia patients. (b) Drug sensitivity refers to an individual’s propensity to exhibit a heightened or exaggerated response to medication compared to the average population. Example: influence of the CC genotype on tacrolimus requirement in liver transplant patients. (c) Drug mechanism refers to the relationship between an individual’s genome and their response to medications. Example: reduction of AKR1C3 enzyme activity during daunorubicin treatment in patients with the A allele variant. (d) Drug indication refers to the formal recommendation for medication use in treating specific diseases or pathological conditions. Example: utilization of enzalutamide in prostate cancer treatment.

Open in new tab Download slide

Conditional knowledge representation framework and storage

The knowledge patterns we designed, including node-to-edge (e.g. conditional) and edge-to-edge (e.g. causal) connections, cannot be intrinsically represented by conventional knowledge graphs, which only support node-to-node triples. To address this limitation, we extend to a “hypergraph” that accommodates complex information and allows for these non-node-to-node connections. In our knowledge graph, tuples can denote relationships between both entities and relations. For a tuple (S, R, and O), where S, R, and O stand for subject, relation, and object, respectively, R acts as a predicate, while S and O can be entities or other tuples. This structure is defined as follows:

$$G = {\ }\langle V,E\rangle \\[-6pt]$$

$$E = {E_{{\rm{vv}}}},{E_{{\rm{ev}}}},{E_{{\rm{ee}}}}$$

Here, |${E_{{\rm{vv}}}}$| represents edges that connect a vertex or an entity to another. |${E_{{\rm{ev}}}}$| denotes edges that link a vertex to an edge or vice versa. Finally, |${E_{{\rm{ee}}}}$| signifies edges that connect two edges. In a semantic context, |${E_{ev}}$| is often utilized to denote a constraint condition for a tuple, while |${E_{ee}}$| usually describes how one tuple (or event) leads to another. This framework allows for a more nuanced representation of complex relationships and conditions within the knowledge graph (Supplementary Fig. S2).

Our graph structure incorporates causes, conditions, and other crucial information, enabling detailed exploration of relationships. This intricate hypergraph comprises three fundamental structures: node-to-node, node-to-relation, and relation-to-relation connections, as shown in Supplementary Fig. S1a–c. Our knowledge representation framework also includes concept composition, representing combined medication as a collective “ALL” union of various drugs. We introduce “gate” nodes, inspired by logic gates, to amalgamate concepts and relations into new entities. We use two primary gates: the “AND gate,” integrating all members, and the “OR gate,” combining some members (see Supplementary Fig. S1d). Multiple nodes or edges directly connected without a gate node are considered independent. Additionally, our framework allows for the expression of negation and likelihood in all relations, making it highly expressive and adaptable for various scenarios.

Due to our updated knowledge representation, the classic knowledge graph storage method cannot accommodate our framework. To address this, we can adapt the data structure to better fit the storage capabilities of contemporary graph databases. Our approach involves integrating a helper node within a relationship, serving as a meaningful predicate. Specifically, we augment relationship predicates to function similarly to entities. We introduce a special relationship node that represents the original relationship predicate. This node uses “from” and “to” edges to indicate the direction of the relationship between the subject and object nodes. As a result, a triple’s relationship is routed through this node, facilitating connections to entities or other triples. Furthermore, we can insert an auxiliary node within relationships to convey complex relationships more effectively, as demonstrated in Supplementary Fig. S2D. Additionally, this method enhances the flexibility and scalability of our knowledge graph, allowing for more intricate data representations and improving query performance.

Conditional knowledge mining

To ensure better alignment with our knowledge patterns, we refined our approach using several methods for knowledge mining and data integration.

We employed automated entity and relationship extraction techniques, utilizing regular expressions to parse unstructured text from databases such as PharmGKB [10], DrugBank [12], and CTD [11]. This approach allowed us to identify and extract entities and their relationships, converting raw text into structured data for our knowledge patterns. For a comprehensive description, refer to the Supplementary Methods for Processing Each Database.

From PharmGKB [10], we extracted 13 055 entries, and from CTD [11], we extracted 143 280 entries, which were then standardized and integrated.

To create a unified dataset for databases like SIDER [16], where data were scattered across multiple files, we merged and standardized various data tables. This process ensured consistency and completeness, resulting in a consolidated dataset of 89 491 entries.

For databases like TTD [18], which provided data in HTML format, we extracted relevant information by parsing the HTML content and reorganized it into a standardized dataset format, processing a total of 1002 entries.

In databases like PharmacotherapyDB (https://github.com/dhimmel/indications), where files contained duplicates, we performed deduplication by comparing data points and removing redundant entries. This process ensured each entry was unique and accurate, resulting in 11 751 unique entries post-deduplication.

For databases like DCDB [19], which listed multiple drugs in a single table, we isolated each drug entry and combined relevant data points to form a comprehensive dataset, processing 496 entries.

For databases like CIViC [17] and DoCM [20], which lacked structured relationships or complete data, we employed manual curation. Experts identified and added missing entities and relationships to ensure completeness and accuracy. This resulted in the curation of 1998 entries from CIViC [17] and 89 entries from DoCM [20], filling gaps and ensuring accurate representation.

Graph interpretation by LLM

To improve the user experience of CPMKG, we have incorporated the ChatGPT API, specifically the gpt-4o, into our web application. The integration of the AI-generated content model equips our web application with advanced natural language processing capabilities, significantly enhancing its interactivity and intelligence. We have designed prompts for four distinct application scenarios (Supplementary Table S3). The graphic descriptions generated in response to these prompts are tailored to user needs and supported by evidence from our knowledge graph. In this setup, the LLM refines and consolidates our knowledge, producing content that is both user-friendly and accurate. Importantly, we ensure that the model strictly adheres to the information in the knowledge graph, preventing the introduction of unsupported information and avoiding the risk of model hallucination.

Entity disambiguation

Entity disambiguation is a critical process for resolving ambiguities among entities sharing the same name. In CPMKG, this technique is applied to standardize five distinct types of entities: drugs, diseases, phenotypes, genetic variations, and genes. Each of these entities is associated with its own ontology, encompassing controlled vocabularies of standard names, synonyms, and IDs (the ontologies used for these entities are detailed in Supplementary Table S2).

For entities where a direct correlation exists between the source database entity ID and the target ontology ID, we employ ID mapping for straightforward standardization. For entities lacking this direct link, name mapping is utilized. Using controlled vocabulary M = (m1, m2, …, mn), we map the original structured names N of these entities against M to identify the most accurate terms. When this mapping results in a unique ID, it is adopted as the entity’s external ID. In cases where the mapping leads to multiple “best matches,” manual correction is undertaken.

$${{{\Gamma }}_{{\rm{best}}}} = {\rm{argma}}{{\rm{x}}_{{\Gamma }}}\mathop {\mathop \sum}\limits^n_{i = 0} \varphi ({m_i},N)$$

Results

Statistics on entities and knowledge in CPMKG

CPMKG aims to advance precision medicine and drug discovery in clinical research. Utilizing a unified knowledge representation framework, CPMKG consolidates comprehensive pharmaceutical knowledge through processes like knowledge acquisition, element mining, and restructuring (Supplementary Fig. S3). This process has yielded 307 614 pieces of detailed drug knowledge, including 139 824 entries on side effects, 9819 on drug sensitivity, 144 269 on drug mechanisms, and 13 702 on drug indications. This comprehensive data encompasses 2150 drugs, 1689 diseases, 1719 phenotypes, 5029 genetic variations, and 20 111 genes (Table 1). For a more detailed statistical breakdown, including filtering and merging across various databases, refer to Supplementary Table S1.

Table 1.

Open in new tab

Statistics on entities and knowledge in CPMKG

Knowledge pattern	Knowledge source	Knowledge	Drug	Disease	Phenotype	Variant	Gene
Side effects	SIDER, PharmGKB, and DrugBank	139 824	1712	544	1719	1447	597
Drug sensitivity	CIViC, TTD, DrugBank, DoCM, and PharmGKB	9819	496	336	–	3478	880
Drug mechanisms	CIViC, CTD, and PharmGKB	144 269	1039	–	–	682	19 984
Drug indications	PharmacotherapyDB, DCDB, and SIDER	13 702	1394	1544	–	–	–
Total	–	307 614	2150	1689	1719	5029	20 111

Knowledge pattern	Knowledge source	Knowledge	Drug	Disease	Phenotype	Variant	Gene
Side effects	SIDER, PharmGKB, and DrugBank	139 824	1712	544	1719	1447	597
Drug sensitivity	CIViC, TTD, DrugBank, DoCM, and PharmGKB	9819	496	336	–	3478	880
Drug mechanisms	CIViC, CTD, and PharmGKB	144 269	1039	–	–	682	19 984
Drug indications	PharmacotherapyDB, DCDB, and SIDER	13 702	1394	1544	–	–	–
Total	–	307 614	2150	1689	1719	5029	20 111

En-dashes (–) : data is not available for this knowledge pattern.

Table 1.

Open in new tab

Statistics on entities and knowledge in CPMKG

Knowledge pattern	Knowledge source	Knowledge	Drug	Disease	Phenotype	Variant	Gene
Side effects	SIDER, PharmGKB, and DrugBank	139 824	1712	544	1719	1447	597
Drug sensitivity	CIViC, TTD, DrugBank, DoCM, and PharmGKB	9819	496	336	–	3478	880
Drug mechanisms	CIViC, CTD, and PharmGKB	144 269	1039	–	–	682	19 984
Drug indications	PharmacotherapyDB, DCDB, and SIDER	13 702	1394	1544	–	–	–
Total	–	307 614	2150	1689	1719	5029	20 111

Knowledge pattern	Knowledge source	Knowledge	Drug	Disease	Phenotype	Variant	Gene
Side effects	SIDER, PharmGKB, and DrugBank	139 824	1712	544	1719	1447	597
Drug sensitivity	CIViC, TTD, DrugBank, DoCM, and PharmGKB	9819	496	336	–	3478	880
Drug mechanisms	CIViC, CTD, and PharmGKB	144 269	1039	–	–	682	19 984
Drug indications	PharmacotherapyDB, DCDB, and SIDER	13 702	1394	1544	–	–	–
Total	–	307 614	2150	1689	1719	5029	20 111

En-dashes (–) : data is not available for this knowledge pattern.

In terms of entity disambiguation, the process resulted in the standardization of 30 698 entities, with 918 entities not aligned with external database mappings. Notably, there are 618 entities in CPMKG that can be classified as both diseases and phenotypes. These entities are represented in different knowledge patterns: diseases are associated with treatments, while phenotypes are linked to side effects. Users can choose the appropriate classification based on their specific use case.

Unlike traditional methods that integrate databases from diverse sources, CPMKG focuses on aligning data sources to predefined knowledge patterns. This methodology involves literature mining and manual curation based on original evidence within these patterns. This strategy not only gathers essential knowledge elements, such as drug interactions and genomic variations, but also enhances existing data by introducing new knowledge patterns.

Conditional knowledge-based schema of CPMKG

The conditional knowledge-based schema of CPMKG is constructed from four primary knowledge patterns: drug side effects, drug sensitivity, drug mechanisms, and drug indication. These patterns form the foundational elements of the schema, including entities such as drugs, diseases, phenotypes, genes, and variations. The relationships among these entities establish the schema’s structure (Fig. 2). Differing from traditional knowledge graphs, CPMKG incorporates critical yet long-missing causal and conditional associations, embodied in relationships between entities and triples, as well as among triples themselves. This allows for a nuanced representation of precision medicine knowledge, highlighting differences in individual genetic backgrounds and population characteristics.

Figure 2.

Conditional knowledge-based schema of CPMKG. This schema includes foundational elements such as drugs, diseases, phenotypes, genes, and variations. “Drugs” cover pharmacological substances, “diseases” encompass pathological conditions, “variations” refer to differences in the human genome, “phenotypes” include side effects or complications, and “genes” pertain to human genes. This schema illustrates the integration of these entities and their detailed relationships, highlighting the four conditional knowledge patterns in precision medicine.

Open in new tab Download slide

Drug-centered conditional knowledge exploration

CPMKG empowers researchers to delve into drug-centered research in precision medicine. It offers users access to knowledge across four categories, including medication recommendations and pharmacogenomics, anchored in core elements like drugs and genetic variations. For example, Fig. 3a displays a knowledge list centered on “warfarin.” It provides switchable lists covering four types of precision medicine knowledge, enabling users to deeply understand and compare warfarin’s effects across different genetic backgrounds and explore personalized drug recommendations via entity-condition-relationship pairs.

Figure 3.

Knowledge exploration in CPMKG. (a) Precision medicine knowledge list. A list centered on “warfarin,” comprising four distinct patterns. (b) Knowledge unit details. Illustrated by “warfarin treatment side effects,” it includes graphical representation, established conditions, evidence sources, and entity details. (c) Multiple evidence explorations. Subgraph exploration centered on metformin, along with knowledge description. (d) Systematic reasoning. Illustration of pemetrexed’s efficacy in MPM treatment and its correlation with reduced KRAS expression, suggesting a shared mechanism with methotrexate, supporting methotrexate’s potential effectiveness in MPM treatment.

Open in new tab Download slide

Furthermore, CPMKG presents each knowledge unit graphically, allowing users to visualize the type of knowledge, the conditions under which it was established, and its evidence sources within the knowledge graph. This is accompanied by detailed knowledge descriptions and annotations for each entity. For instance, Fig. 3b demonstrates that under the genetic background NC_000010.11:g.94981296A>C, warfarin treatment for venous thromboembolism heightens bleeding risk [29]. This graphical representation provides an intuitive understanding of the knowledge, while the descriptions and entity annotations offer an in-depth comprehension of the graph.

Knowledge inference with multiple evidence

Each knowledge unit, representing a specific knowledge pattern, can effectively communicate the author’s intended information. However, the scope of knowledge conveyed by a single piece of literature remains confined. CPMKG, as a knowledge graph, empowers researchers to integrate multiple knowledge instances into subgraphs, each drawing on numerous evidence sources. This allows for systematic reasoning about pertinent knowledge connections within these subgraphs.

Take, for example, the frequent occurrence of KRAS mutations (KRAS is a proto-oncogene that encodes a GTPase) in cancer, a factor in >20% of human cancers. These mutations are also present in patients with malignant pleural mesothelioma (MPM). As shown in Fig. 3d, pemetrexed effectively treats MPM and reduces KRAS protein expression, a mechanism shared with methotrexate [30]. This shared mechanism led us to hypothesize methotrexate’s effectiveness in treating MPM, a hypothesis supported by the literature [31].

To make these subgraphs more accessible, we have innovatively combined LLMs with precise, specialized medical knowledge from our database, including reference articles, to provide clear descriptions for each subgraph. We offer four distinct scenarios for varied graph descriptions, skillfully connecting structured graphs with narrative language. For instance, Fig. 3c displays a subgraph centered on “metformin,” illustrating gene variations and their impact on diabetic patients’ responses and side effects. The rs784888(G) allele correlates with a better response to metformin, reducing hyperglycemia severity compared to rs784888(C) [32], while the rs5219(T) allele is linked to an increased likelihood of treatment failure compared to rs5219(C) [33]. Common side effects for hyperglycemia patients taking metformin include blurred vision, urticaria, pruritus, skin rash, tremor, lethargy, hypertension, and syncope.

Advanced application of CPMKG (case study)

CPMKG enhances the drug usage experience with three user-centric applications: personalized drug suggestion, which offers tailored medical advice; pharmacogenomics application, accelerating drug mechanism research and discovering new applications for existing drugs; and medication synergy assistant, aiding in the selection of effective drugs or drug combinations.

Case study 1: personalized drug suggestion

Personalized drug suggestion aids clinical research by enabling individualized medical advice based on patients’ diagnostic outcomes and genetic backgrounds. Consider Fig. 4a, which delineates crucial factors for prescribing medication to breast cancer patients carrying the NC_000002.12:g.38071060G>A,C genetic variant. For the GG genotype, docetaxel may be ineffective [34]. In contrast, the CC genotype can increase nausea risk with doxorubicin [35] and potentially lead to diminished efficacy with epirubicin [36]. However, this variant does not impact the effectiveness of gemcitabine and paclitaxel [37]. Importantly, cyclophosphamide has the potential to reduce peripheral neuropathy risk in patients with the C allele [38]. CPMKG provides valuable genotype-specific references to support prescription choices in clinical research.

Figure 4.

Advanced application of CPMKG. (a) Personalized drug suggestion offers tailored medical advice based on diagnostic outcomes and genetic backgrounds. Example: crucial factors for prescribing medication to breast cancer patients with the NC_000002.12:g.38071060G>A,C variant. (b) Pharmacogenomics focuses on understanding drug mechanisms for personalized medicine and novel drug discovery. Example: ivacaftor’s effects on various CFTR alleles and genotypes in CF. (c) Medication synergy assistant optimizes treatment outcomes and patient safety, particularly with multiple drugs. Example: effects of interferon α-2b and ribavirin in treating chronic hepatitis C.

Open in new tab Download slide

Case study 2: pharmacogenomics application

Pharmacogenomics plays a crucial role in understanding drug mechanisms for personalized medicine and novel drug discovery, particularly in diseases like cystic fibrosis (CF). CF, caused by mutations in the CF transmembrane conductance regulator (CFTR) gene [39], can be treated with ivacaftor, a CFTR potentiator that enhances CFTR protein function [40]. Figure 4b demonstrates ivacaftor’s effects on various CFTR alleles and genotypes, highlighting 11 genomic variations that significantly influence the drug’s pharmacological response in the human body. For instance, ivacaftor treatment alters CFTR activity in the NC_000007.14:g.117603654T>A,C and NC_000007.14:g.117611620A>C variants. Additionally, the NC_000007.14:g.117559592_117559594del variant is linked with increased CFTR transport [41] but does not affect the protein’s thermal stability [42]. Such insights are invaluable for developing targeted treatments for patients with CFTR-related conditions.

Case study 3: medication synergy assistant

In drug indication, both efficacy and side effects are of paramount importance. Medication synergy assistants can optimize treatment outcomes and bolster patient safety, particularly when multiple drugs are used, either in combination or individually. For example, Fig. 4c demonstrates the effects of interferon α-2b and ribavirin in treating chronic hepatitis C [43]. Both drugs, whether used separately or together, significantly increase the risk and severity of anemia. However, patients with the NC_000020.11:g.3271278A>C and NC_000020.11:g.3213247A>C genetic variants experience less severe anemia after ribavirin treatment [44, 45]. Consequently, ribavirin therapy is recommended for patients with these specific genetic profiles.

Discussion

Despite the advancements achieved with CPMKG, some detailed aspects still require deeper exploration. Precision medicine demands a thorough understanding of both entities and their attributes, such as gene variant genotypes and clinical indices. Transitioning from traditional triples to hyper-triples presents challenges in making accurate inferences due to the detailed and specific conditions involved. However, genetic information in drug databases is limited, and even the original literature often lacks necessary genomic details. As this area is under-researched, we aim to refine this in future studies and encourage broader contributions. This shift also underscores the need for further research into advanced knowledge reasoning methods. Our forward-looking approach leverages the sophisticated understanding capabilities of LLMs to decode complex semantics and hyper-triples. Additionally, we aim to utilize natural language interpretation based on intricate knowledge reasoning, driving the advancement of application-focused knowledge graphs.

Conclusion

CPMKG revolutionizes traditional drug knowledge by incorporating refined elements like specific conditions, making it ideal for precision medicine. Our knowledge graph offers personalized medication recommendations based on patients’ genetic profiles, serving as a reference for clinical practice. It also supports researchers by facilitating drug metabolism studies and targeted drug discovery. Unique in its approach, CPMKG employs the “hyper-triple” concept in knowledge representation, capturing the complex nuances of precision medicine with remarkable accuracy. It merges and rationalizes various precision medicine knowledge pieces through innovative knowledge graph construction methods. This process not only uncovers information overlooked in current research but also enhances the understanding and application of these knowledge graphs in clinical research. Furthermore, the hypergraph structure can be seamlessly integrated into any graph database, accommodating existing database technologies while ensuring minimal information loss compared to the original research publications. This effectively preserves the depth and complexity of the relationships, providing a robust and comprehensive foundation for future clinical-related research.

To make our knowledge graph more user-friendly, we have integrated LLM for graph interpretation. This integration not only advances our construction methods but also enriches the fusion of structured graphs with textual data. It broadens the spectrum of user engagement with knowledge graphs, paving the way for new perspectives in their representation, storage, and interpretation.

Acknowledgements

We would like to thank Dr Qingwei Xu from Ezhou Industrial Technology Research Institute, Huazhong University of Science and Technology for his support of website development.

Supplementary data

Supplementary data is available at Database online.

Conflict of interest

None declared.

Funding

This research was supported by the National Key Research and Development Program of China (Grant Nos 2021YFC2301502, 2021YFF0703702, 2016YFC0901904, and 2023YFA0915501); Key disciplines in the three-year Plan of Shanghai municipal public health system (2023–2025) (GWVI-11.1-42); Shanghai Science and Technology Innovation Action Plan (Grant No. 23JS1401500); Shanghai Municipal Science and Technology Major Project; and R&D Program of Guangzhou National Laboratory (Grant No. GZNL2024A01002).

Data availability

CPMKG is publicly accessible through https://www.biosino.org/cpmkg/. All data and resources hosted on the platform are freely accessible.

References

Jian

Gao

et al.

Pharmacokinetics in pharmacometabolomics: towards personalized medication

Pharmaceuticals (Basel)

2023

;

:1568. doi:

10.3390/ph16111568

Google Scholar

OpenURL Placeholder Text

WorldCat

Crossref

Hamburg

Collins

The path to personalized medicine

N Engl J Med

2010

;

363

301

–

. doi:

Dingemanse

Appel-Dingemanse

Integrated pharmacokinetics and pharmacodynamics in drug development

Clin Pharmacokinet

2007

;

713

–

. doi:

10.2165/00003088-200746090-00001

Schee Genannt Halfmann

Evangelatos

Schröder-Bäck

et al.

European healthcare systems readiness to shift from ‘one-size fits all’ to personalized medicine

Per Med

2017

;

–

Naithani

Sinha

Misra

et al.

Precision medicine: concept and tools

Med J Armed Forces India

2021

;

249

–

Caudle

Klein

Hoffman

et al.

Incorporation of pharmacogenomics into routine clinical practice: the Clinical Pharmacogenetics Implementation Consortium (CPIC) guideline development process

Curr Drug Metab

2014

;

209

–

Relling

Klein

Gammal

et al.

The Clinical Pharmacogenetics Implementation Consortium: 10 years later

Clin Pharmacol Ther

2020

;

107

171

–

Kim

Ceccarelli

Pharmacogenomic biomarkers in US FDA-approved drug labels (2000-2020)

J Pers Med

2021

;

:179.

Google Scholar

OpenURL Placeholder Text

WorldCat

Scott

Personalizing medicine with clinical pharmacogenetics

Genet Med

2011

;

987

–

10.

Barbarino

Whirl-Carrillo

Altman

et al.

PharmGKB: a worldwide resource for pharmacogenomic information

Wiley Interdiscip Rev Syst Biol Med

2018

;

:e1417.

Google Scholar

OpenURL Placeholder Text

WorldCat

11.

Davis

Wiegers

et al.

CTD tetramers: a new online tool that computationally links curated chemicals, genes, phenotypes, and diseases to inform molecular mechanisms for environmental health

Toxicol Sci

2023

;

195

155

–

12.

Wishart

Feunang

Guo

et al.

DrugBank 5.0: a major update to the DrugBank database for 2018

Nucleic Acids Res

2018

;

D1074

–

D1082

13.

Wilkinson

Dumontier

Aalbersberg

et al.

The FAIR Guiding Principles for scientific data management and stewardship

Sci Data

2016

;

:160018.

Google Scholar

OpenURL Placeholder Text

WorldCat

14.

Wang

Xia

et al.

PreMedKB: an integrated precision medicine knowledgebase for interpreting relationships between diseases, genes, variants and drugs

Nucleic Acids Res

2019

;

D1090

–

101

15.

Anderson

Müller

Hanbury

et al.

Formal ontologies in biomedical knowledge representation

Yearb Med Inform

2013

;

132

–

Google Scholar

OpenURL Placeholder Text

WorldCat

16.

Kuhn

Letunic

Jensen

et al.

The SIDER database of drugs and side effects

Nucleic Acids Res

2016

;

D1075

–

17.

Griffith

Spies

Krysiak

et al.

CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer

Nat Genet

2017

;

170

–

18.

Zhou

Zhang

Zhao

et al.

TTD: Therapeutic Target Database describing target druggability information

Nucleic Acids Res

2024

;

D1465

–

19.

Liu

Wei

et al.

DCDB 2.0: a major update of the drug combination database

Database (Oxford)

2014

;

2014

:bau124.

Google Scholar

OpenURL Placeholder Text

WorldCat

20.

Ainscough

Griffith

Coffman

et al.

DoCM: a database of curated mutations in cancer

Nat Methods

2016

;

806

–

21.

Mulder

Franke

van Der-beek van der

et al.

The association between HTR2C gene polymorphisms and the metabolic syndrome in patients with schizophrenia

J Clin Psychopharmacol

2007

;

338

–

22.

Risselada

Vehof

Bruggeman

et al.

Association between HTR2C gene polymorphisms and the metabolic syndrome in patients using antipsychotics: a replication study

Pharmacogenomics J

2012

;

–

23.

Maimaitirexiati

Zhang

et al.

HTR2C polymorphisms, olanzapine-induced weight gain and antipsychotic-induced metabolic syndrome in schizophrenia patients: a meta-analysis

Int J Psychiatry Clin Pract

2014

;

229

–

24.

Chen

Han

Xue

et al.

Personalized tacrolimus dose requirement by CYP3A5 but not ABCB1 or ACE genotyping in both recipient and donor after pediatric liver transplantation

PLoS One

2014

;

:e109464.

Google Scholar

OpenURL Placeholder Text

WorldCat

25.

Wei-lin

Jing

Shu-sen

et al.

Tacrolimus dose requirement in relation to donor and recipient ABCB1 and CYP3A5 gene polymorphisms in Chinese liver transplant patients

Liver Transpl

2006

;

775

–

26.

Monostory

Tóth

Kiss

et al.

Personalizing initial calcineurin inhibitor dosing by adjusting to donor CYP3A-status in liver transplant patients

Br J Clin Pharmacol

2015

;

1429

–

27.

Bains

Grigliatti

Reid

et al.

Naturally occurring variants of human aldo-keto reductases with reduced in vitro metabolism of daunorubicin and doxorubicin

J Pharmacol Exp Ther

2010

;

335

533

–

28.

Lanman

Kong

et al.

Inhibition of the erythropoietin-producing receptor EPHB4 antagonizes androgen receptor overexpression and reduces enzalutamide resistance

J Biol Chem

2020

;

295

5470

–

29.

Bryk

Wypasek

Plens

et al.

Bleeding predictors in patients following venous thromboembolism treated with vitamin K antagonists: association with increased number of single nucleotide polymorphisms

Vascul Pharmacol

2018

;

106

–

30.

Moran

Trusk

Pry

et al.

KRAS mutation status is associated with enhanced dependency on folate metabolism pathways in non-small cell lung cancer cells

Mol Cancer Ther

2014

;

1611

–

31.

Kuribayashi

Miyata

Fukuoka

et al.

Methotrexate and gemcitabine combination chemotherapy for the treatment of malignant pleural mesothelioma

Mol Clin Oncol

2013

;

639

–

32.

Goswami

Yee

Stocker

et al.

Genetic variants in transcription factors are associated with the pharmacokinetics and pharmacodynamics of metformin

Clin Pharmacol Ther

2014

;

370

–

33.

Sesti

Laratta

Cardellini

et al.

The E23K variant of KCNJ11 encoding the pancreatic beta-cell adenosine 5ʹ-triphosphate-sensitive potassium channel subunit Kir6.2 is associated with an increased risk of secondary failure to sulfonylurea in patients with type 2 diabetes

J Clin Endocrinol Metab

2006

;

2334

–

34.

Tulsyan

Chaturvedi

Singh

et al.

Assessment of clinical outcomes in breast cancer patients treated with taxanes: multi-analytical approach

Gene

2014

;

543

–

35.

Tecza

Pamula-Pilat

Lanuszewska

et al.

Pharmacogenetics of toxicity of 5-fluorouracil, doxorubicin and cyclophosphamide chemotherapy in breast cancer patients

Oncotarget

2018

;

9114

–

36.

Le Morvan

Litière

Laroche-Clary

et al.

Identification of SNPs associated with response of breast cancer patients to neoadjuvant chemotherapy in the EORTC-10994 randomized phase III trial

Pharmacogenomics J

2015

;

–

37.

Lee

Park

et al.

Genetic polymorphisms of SLC28A3, SLC29A1 and RRM1 predict clinical outcome in patients with metastatic breast cancer receiving gemcitabine plus paclitaxel chemotherapy

Eur J Cancer

2014

;

698

–

705

38.

Abraham

Guo

Dorling

et al.

Replication of genetic polymorphisms reported to be associated with taxane-related sensory neuropathy in patients with early breast cancer treated with Paclitaxel

Clin Cancer Res

2014

;

2466

–

39.

Riordan

Rommens

Kerem

et al.

Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA

Science

1989

;

245

1066

–

40.

Van Goor

Burton

et al.

Effect of ivacaftor on CFTR forms with missense mutations associated with defects in protein processing or function

J Cyst Fibros

2014

;

–

41.

Keating

Marigowda

Burr

et al.

VX-445-tezacaftor-ivacaftor in patients with cystic fibrosis and one or two Phe508del alleles

N Engl J Med

2018

;

379

1612

–

42.

Liu

Dawson

Cystic fibrosis transmembrane conductance regulator (CFTR) potentiators protect G551D but not ΔF508 CFTR from thermal instability

Biochemistry

2014

;

5613

–

43.

Gane

Stedman

Hyland

et al.

Efficacy of nucleotide polymerase inhibitor sofosbuvir plus the NS5A inhibitor ledipasvir or the NS5B non-nucleoside inhibitor GS-9669 against HCV genotype 1 infection

Gastroenterology

2014

;

146

736

–

743.e731

44.

Thompson

Fellay

Patel

et al.

Variants in the ITPA gene protect against ribavirin-induced hemolytic anemia and decrease the need for ribavirin dose reduction

Gastroenterology

2010

;

139

1181

–

45.

Fellay

Thompson

et al.

ITPA gene variants protect against anaemia in patients treated for chronic hepatitis C

Nature

2010

;

464

405

–

Author notes

‡

Equal contribution.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Download all slides

Month:	Total Views:
September 2024	114
October 2024	312
November 2024	206
December 2024	108

Article Contents

CPMKG: a condition-based knowledge graph for precision medicine

Abstract

Introduction

Materials and methods

Data collection

Knowledge pattern and conditional knowledge representation

CPMKG knowledge pattern

Conditional knowledge representation framework and storage

Conditional knowledge mining

Graph interpretation by LLM

Entity disambiguation

Results

Statistics on entities and knowledge in CPMKG

Conditional knowledge-based schema of CPMKG

Drug-centered conditional knowledge exploration

Knowledge inference with multiple evidence

Advanced application of CPMKG (case study)

Case study 1: personalized drug suggestion

Case study 2: pharmacogenomics application

Case study 3: medication synergy assistant

Discussion

Conclusion

Acknowledgements

Supplementary data

Conflict of interest

Funding

Data availability

References

Author notes

Supplementary data

Citations

Views

Altmetric

Citing articles via

Latest

Most Read

Most Cited

Article Contents

CPMKG: a condition-based knowledge graph for precision medicine

Abstract

Introduction

Materials and methods

Data collection

Knowledge pattern and conditional knowledge representation

CPMKG knowledge pattern

Conditional knowledge representation framework and storage

Conditional knowledge mining

Graph interpretation by LLM

Entity disambiguation

Results

Statistics on entities and knowledge in CPMKG

Conditional knowledge-based schema of CPMKG

Drug-centered conditional knowledge exploration

Knowledge inference with multiple evidence

Advanced application of CPMKG (case study)

Case study 1: personalized drug suggestion

Case study 2: pharmacogenomics application

Case study 3: medication synergy assistant

Discussion

Conclusion

Acknowledgements

Supplementary data

Conflict of interest

Funding

Data availability

References

Author notes

Supplementary data

Citations

Views

Altmetric

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only