Integrated ACMG-approved genes and ICD codes for the translational research and precision medicine Open Access

List of American College of Medical Genetics and Genomics (ACMG) genes

Number	Genes	Name of the disease
1	BRCA1	Hereditary breast and ovarian cancers
2	BRCA2
3	PALB2
4	TP53	Li-Fraumeni syndrome
5	STK11	Peutz-Jeghers syndrome
6	MLH1	Lynch syndrome
7	MSH2
8	MSH6
9	PMS2
10	APC	Familial adenomatous polyposis
11	MUTYH	MYH-associated polyposis, adenomas, multiple colorectal, FAP type 2, colorectal adenomatous polyposis, autosomal recessive, with pilomatricomas
12	BMPRA1	Juvenile polyposis
13	SMAD4	Juvenile polyposis
14	VHL	Von Hippel–Lindau syndrome
15	MEN1	Multiple endocrine neoplasia type 1
16	RET	Multiple endocrine neoplasia type 2 and familial medullary thyroid cancer
17	PTEN	PTEN hamartoma tumor syndrome
18	RB1	Retinoblastoma
19	SDHD	Hereditary paraganglioma-pheochromocytoma syndrome
20	SDHAF2
21	SDHC
22	SDHB
23	MAX
24	TMEM127
25	TSC1	Tuberous sclerosis complex
26	TSC2	Tuberous sclerosis complex
27	WT1	WT1-related Wilms tumor
28	NF2	Neurofibromatosis type 2
29	COL3A1	Vascular-type Ehlers-Danlos syndrome
30	FBN1	Marfan syndrome, Loeys-Dietz syndromes and familial thoracic aortic aneurysms and dissections
31	TGFBR1
32	TGFBR2
33	SMAD3
34	ACTA2
35	MYH11
36	MYBPC3	Hypertrophic cardiomyopathy, dilated cardiomyopathy
37	MYH7
38	TNNT2
39	TNNI3
40	TPM1
41	MYL3
42	ACTC1
43	PRKAG2
44	GLA
45	MYL2
46	LMNA
47	FLNC
48	TTN
49	RYR2	Catecholaminergic polymorphic ventricular tachycardia
50	CASQ2
51	TRDN
52	PKP2	Arrhythmogenic right ventricular cardiomyopathy
53	DSP
54	DSC2
55	TMEM43
56	DSG2
57	KCNQ1	Romano-Ward long-QT syndrome types 1, 2 and 3 and Brugada syndrome
58	KCNH2
59	SCN5A
60	LDLR	Familial hypercholesterolemia
61	APOB
62	PCSK9
63	ATP7B	Wilson disease
64	OTC	Ornithine transcarbamylase deficiency
65	BTD	Biotinidase deficiency
66	GAA	Pompe disease
67	RYR1	Malignant hyperthermia susceptibility
68	CACNA1S	Malignant hyperthermia susceptibility
59	HFE	Hereditary haemochromatosis
70	ACVRL1	Hereditary haemorrhagic telangiectasia
71	ENG	Hereditary haemorrhagic telangiectasia
72	HNF1A	Maturity-onset diabetes of the young
73	RPE65	RPE65-related retinopathy

Number	Genes	Name of the disease
1	BRCA1	Hereditary breast and ovarian cancers
2	BRCA2
3	PALB2
4	TP53	Li-Fraumeni syndrome
5	STK11	Peutz-Jeghers syndrome
6	MLH1	Lynch syndrome
7	MSH2
8	MSH6
9	PMS2
10	APC	Familial adenomatous polyposis
11	MUTYH	MYH-associated polyposis, adenomas, multiple colorectal, FAP type 2, colorectal adenomatous polyposis, autosomal recessive, with pilomatricomas
12	BMPRA1	Juvenile polyposis
13	SMAD4	Juvenile polyposis
14	VHL	Von Hippel–Lindau syndrome
15	MEN1	Multiple endocrine neoplasia type 1
16	RET	Multiple endocrine neoplasia type 2 and familial medullary thyroid cancer
17	PTEN	PTEN hamartoma tumor syndrome
18	RB1	Retinoblastoma
19	SDHD	Hereditary paraganglioma-pheochromocytoma syndrome
20	SDHAF2
21	SDHC
22	SDHB
23	MAX
24	TMEM127
25	TSC1	Tuberous sclerosis complex
26	TSC2	Tuberous sclerosis complex
27	WT1	WT1-related Wilms tumor
28	NF2	Neurofibromatosis type 2
29	COL3A1	Vascular-type Ehlers-Danlos syndrome
30	FBN1	Marfan syndrome, Loeys-Dietz syndromes and familial thoracic aortic aneurysms and dissections
31	TGFBR1
32	TGFBR2
33	SMAD3
34	ACTA2
35	MYH11
36	MYBPC3	Hypertrophic cardiomyopathy, dilated cardiomyopathy
37	MYH7
38	TNNT2
39	TNNI3
40	TPM1
41	MYL3
42	ACTC1
43	PRKAG2
44	GLA
45	MYL2
46	LMNA
47	FLNC
48	TTN
49	RYR2	Catecholaminergic polymorphic ventricular tachycardia
50	CASQ2
51	TRDN
52	PKP2	Arrhythmogenic right ventricular cardiomyopathy
53	DSP
54	DSC2
55	TMEM43
56	DSG2
57	KCNQ1	Romano-Ward long-QT syndrome types 1, 2 and 3 and Brugada syndrome
58	KCNH2
59	SCN5A
60	LDLR	Familial hypercholesterolemia
61	APOB
62	PCSK9
63	ATP7B	Wilson disease
64	OTC	Ornithine transcarbamylase deficiency
65	BTD	Biotinidase deficiency
66	GAA	Pompe disease
67	RYR1	Malignant hyperthermia susceptibility
68	CACNA1S	Malignant hyperthermia susceptibility
59	HFE	Hereditary haemochromatosis
70	ACVRL1	Hereditary haemorrhagic telangiectasia
71	ENG	Hereditary haemorrhagic telangiectasia
72	HNF1A	Maturity-onset diabetes of the young
73	RPE65	RPE65-related retinopathy

A list of 73 ACMG-approved genes for which specific mutations are known to be causative of disorders with defined phenotypes that are clinically actionable by an accepted intervention is included. The disease phenotype associated with each gene is also included.

Table 1.

List of American College of Medical Genetics and Genomics (ACMG) genes

Number	Genes	Name of the disease
1	BRCA1	Hereditary breast and ovarian cancers
2	BRCA2
3	PALB2
4	TP53	Li-Fraumeni syndrome
5	STK11	Peutz-Jeghers syndrome
6	MLH1	Lynch syndrome
7	MSH2
8	MSH6
9	PMS2
10	APC	Familial adenomatous polyposis
11	MUTYH	MYH-associated polyposis, adenomas, multiple colorectal, FAP type 2, colorectal adenomatous polyposis, autosomal recessive, with pilomatricomas
12	BMPRA1	Juvenile polyposis
13	SMAD4	Juvenile polyposis
14	VHL	Von Hippel–Lindau syndrome
15	MEN1	Multiple endocrine neoplasia type 1
16	RET	Multiple endocrine neoplasia type 2 and familial medullary thyroid cancer
17	PTEN	PTEN hamartoma tumor syndrome
18	RB1	Retinoblastoma
19	SDHD	Hereditary paraganglioma-pheochromocytoma syndrome
20	SDHAF2
21	SDHC
22	SDHB
23	MAX
24	TMEM127
25	TSC1	Tuberous sclerosis complex
26	TSC2	Tuberous sclerosis complex
27	WT1	WT1-related Wilms tumor
28	NF2	Neurofibromatosis type 2
29	COL3A1	Vascular-type Ehlers-Danlos syndrome
30	FBN1	Marfan syndrome, Loeys-Dietz syndromes and familial thoracic aortic aneurysms and dissections
31	TGFBR1
32	TGFBR2
33	SMAD3
34	ACTA2
35	MYH11
36	MYBPC3	Hypertrophic cardiomyopathy, dilated cardiomyopathy
37	MYH7
38	TNNT2
39	TNNI3
40	TPM1
41	MYL3
42	ACTC1
43	PRKAG2
44	GLA
45	MYL2
46	LMNA
47	FLNC
48	TTN
49	RYR2	Catecholaminergic polymorphic ventricular tachycardia
50	CASQ2
51	TRDN
52	PKP2	Arrhythmogenic right ventricular cardiomyopathy
53	DSP
54	DSC2
55	TMEM43
56	DSG2
57	KCNQ1	Romano-Ward long-QT syndrome types 1, 2 and 3 and Brugada syndrome
58	KCNH2
59	SCN5A
60	LDLR	Familial hypercholesterolemia
61	APOB
62	PCSK9
63	ATP7B	Wilson disease
64	OTC	Ornithine transcarbamylase deficiency
65	BTD	Biotinidase deficiency
66	GAA	Pompe disease
67	RYR1	Malignant hyperthermia susceptibility
68	CACNA1S	Malignant hyperthermia susceptibility
59	HFE	Hereditary haemochromatosis
70	ACVRL1	Hereditary haemorrhagic telangiectasia
71	ENG	Hereditary haemorrhagic telangiectasia
72	HNF1A	Maturity-onset diabetes of the young
73	RPE65	RPE65-related retinopathy

Number	Genes	Name of the disease
1	BRCA1	Hereditary breast and ovarian cancers
2	BRCA2
3	PALB2
4	TP53	Li-Fraumeni syndrome
5	STK11	Peutz-Jeghers syndrome
6	MLH1	Lynch syndrome
7	MSH2
8	MSH6
9	PMS2
10	APC	Familial adenomatous polyposis
11	MUTYH	MYH-associated polyposis, adenomas, multiple colorectal, FAP type 2, colorectal adenomatous polyposis, autosomal recessive, with pilomatricomas
12	BMPRA1	Juvenile polyposis
13	SMAD4	Juvenile polyposis
14	VHL	Von Hippel–Lindau syndrome
15	MEN1	Multiple endocrine neoplasia type 1
16	RET	Multiple endocrine neoplasia type 2 and familial medullary thyroid cancer
17	PTEN	PTEN hamartoma tumor syndrome
18	RB1	Retinoblastoma
19	SDHD	Hereditary paraganglioma-pheochromocytoma syndrome
20	SDHAF2
21	SDHC
22	SDHB
23	MAX
24	TMEM127
25	TSC1	Tuberous sclerosis complex
26	TSC2	Tuberous sclerosis complex
27	WT1	WT1-related Wilms tumor
28	NF2	Neurofibromatosis type 2
29	COL3A1	Vascular-type Ehlers-Danlos syndrome
30	FBN1	Marfan syndrome, Loeys-Dietz syndromes and familial thoracic aortic aneurysms and dissections
31	TGFBR1
32	TGFBR2
33	SMAD3
34	ACTA2
35	MYH11
36	MYBPC3	Hypertrophic cardiomyopathy, dilated cardiomyopathy
37	MYH7
38	TNNT2
39	TNNI3
40	TPM1
41	MYL3
42	ACTC1
43	PRKAG2
44	GLA
45	MYL2
46	LMNA
47	FLNC
48	TTN
49	RYR2	Catecholaminergic polymorphic ventricular tachycardia
50	CASQ2
51	TRDN
52	PKP2	Arrhythmogenic right ventricular cardiomyopathy
53	DSP
54	DSC2
55	TMEM43
56	DSG2
57	KCNQ1	Romano-Ward long-QT syndrome types 1, 2 and 3 and Brugada syndrome
58	KCNH2
59	SCN5A
60	LDLR	Familial hypercholesterolemia
61	APOB
62	PCSK9
63	ATP7B	Wilson disease
64	OTC	Ornithine transcarbamylase deficiency
65	BTD	Biotinidase deficiency
66	GAA	Pompe disease
67	RYR1	Malignant hyperthermia susceptibility
68	CACNA1S	Malignant hyperthermia susceptibility
59	HFE	Hereditary haemochromatosis
70	ACVRL1	Hereditary haemorrhagic telangiectasia
71	ENG	Hereditary haemorrhagic telangiectasia
72	HNF1A	Maturity-onset diabetes of the young
73	RPE65	RPE65-related retinopathy

Table 2.

PAS-GDC database description and statistics

Categories	Count
Genes	73
Diseases	1788
ICD-9	2101
ICD-10	2589
Gene–disease combination (ICD-9)	7918
Gene–disease combination (ICD-10)	11 799

PAS-GDC database includes genes, diseases and ICD-9 and ICD-10 codes, as well as the relevant gen–disease combinations for each ICD code.

Table 2.

PAS-GDC database description and statistics

Categories	Count
Genes	73
Diseases	1788
ICD-9	2101
ICD-10	2589
Gene–disease combination (ICD-9)	7918
Gene–disease combination (ICD-10)	11 799

PAS-GDC database includes genes, diseases and ICD-9 and ICD-10 codes, as well as the relevant gen–disease combinations for each ICD code.

Figure 1.

PAS-GDC components’ design, development and data flow. PAS-GDC is an online application developed using MySQL database, PHP scripting language and UNIX-based web and database servers.

Relational database modelling

The main objective of the database was to make the compiled information easily searchable and parsed so that all searches from the website would be up to date. Additionally, the database design is needed to support easy integration of future ICD codes to ensure that up-to-date information is reflected in our website. To meet these requirements, the database was created in MySQL (open-source relational database management system) Workbench and consisted of seven relations. The seven relations included ACMG’s 73 actionable genes, diseases, ICD-9 codes, ICD-10 codes, gene–disease pairings, gene–disease–ICD-9 pairings and gene–disease–ICD-10 pairings. The gene–disease–ICD-9 and gene–disease–ICD-10 relations are created from the relations that manage the genes, diseases and their respective ICD codes (Figure 2). This ensures that there are no duplicate values. This database is unique and accessible through our freely available, open-source web application.

Figure 2.

PAS-GDC relational database. PAS-GDC database includes six relations, genes, diseases, ICD9, ICD10, gene–disease, gene–disease–ICD-9 and gene–disease–ICD-10.

Web development and search

PAS-GDC is a web application that has been developed using Hypertext Markup Language (HTML) and JavaScript with its jQuery packages. Additionally, we have used Cascading Style Sheets with a Bootstrap framework on HTML to enhance the presentation and provide a user-friendly interface to our users. The database is connected to our web application using server-side PHP (general-purpose scripting language for web application development) language and its ‘mysqli’ package. Visual Studio Code was the primary Integrated Development Environment used in the creation of the source code as well as testing. The testing of the website involved using Red Hat localhost servers. During development, testers used macOS, iOS, Windows and Android operating systems along with a variety of different browsers that include but are not limited to Google Chrome, Safari and Firefox to ensure that the website performs typically and is configured correctly regardless of the environment. SSL certificates were utilized in the PAS-GDC website, and the communication between the browser and server was encrypted. The search allows the user to perform searches based on the ICD-9 codes, ICD-10 codes, gene or disease category (Figure 3). The ICD-9 and ICD-10 searches allow for their independent gene and disease search allowing the user to retrieve the respective gene–disease pairing based on the desired ICD selection. Additionally, the website allows for a simple and easy export feature that allows the users to store and share their desired results as a comma-separated values (CSV) file. The user interface of PAS-GDC is explained in the supplementary material attached.

Figure 3.

PAS-GDC graphical user interfaces workflow. PAS-GDC GUI includes Main, Gene, Disease and ICD Code (9 and 10) interfaces.

Results

The gene–disease–ICD code database is a flexible and dynamic database. The database design in SQL allows for more genes and ICD codes to be integrated as they are made available and updated automatically on the PAS-GDC website. PAS-GDC is a simple-to-use, robust search engine that utilizes minimalistic features and an internet connection to retrieve results. The graphical user interface includes a search capability of three features, namely (1) ICD Codes, (2) Genes and (3) Diseases as a simple check box that gives the users the capability to choose the feature they desire. Additionally, PAS-GDC provides the users the option to search their results against ICD-9 codes and ICD-10 codes.

Case study: gene

To test the effectiveness and functionality of the PAS-GDC web application, we created three different case studies exploring the ‘gene’ search feature (Figure 4). The genes that were included in this case study were BRCA1, MYBPC3 and APC. The results were exported and collected in a tabular format with three columns: genes, diseases and ICD codes. The BRCA1 gene codes for proteins that are vital to a multitude of cellular processes (30). Mutations in this gene can lead to a predisposition to breast and ovarian cancers (30–32). The search results for the BRCA1 gene present 57 distinct diseases that are directly linked to this gene. These diseases include but are not limited to breast, ovarian and pancreatic cancers as well as fallopian tube carcinoma. Additionally, the search uncovered a total of 126 ICD-9 (Figure 4A(1)) and 243 ICD-10 codes associated with BRCA1 (Figure 4A(2)). ICD-9 codes starting with 17 seemed to repeat for the BRCA1 gene as this category denotes breast cancer. Similarly, ICD-10 codes starting with C4 and C5 were the most common in the search. The search criteria were repeated for the gene MYBPC3. Mutations in this gene are usually linked to cardiovascular diseases such as cardiomyopathy and atrial fibrillation (33). Seventeen other diseases were also found to be linked to this gene through our web application. These diseases included but were not limited to diastolic heart failure, cardiac arrest and heart disease. Currently, there are 59 ICD-9 (Figure 4B(1)) and 104 ICD-10 codes linked to MYBPC3 (Figure 4B(2)). One of the most common diagnoses linked to this gene was cardiovascular disease (heart disease), the leading cause of death in the USA (34, 35). Thirty-three of the fifty-nine ICD-9 codes and fifty-eight of the 104 ICD-10 codes are linked to heart disease showing their prevalence and impact on patients with a genetic mutation in the MYBPC3 gene. The third case study focused on the APC gene, which is known to lead to a predisposition to colorectal and lung cancers (36, 37). Based on the results from the PAS-GDC web application, it was observed that there are 43 diseases that are associated with this gene. Some of the diseases highlighted in the results included but were not limited to lung cancer , thyroid cancer (38) and breast cancer (39). Seventy-six ICD-9 (Figure 4C(1)) and 186 ICD-10 codes were retrieved for the APC gene (Figure 4C(2)). Notably, the most common diagnoses linked to this gene included breast and lung cancers. While APC had been linked to lung cancer in previous studies, the relation with breast cancer has not yet been established. Sixteen out of the seventy-six ICD-9 and eighty out of the 186 ICD-10 codes for the APC gene were associated with breast cancer.

Figure 4.

PAS-GDC use case—gene. This figure presents three different case studies exploring the ‘gene’ search feature: BRCA1 (A), MYBPC3 (B) and APC (C).

Case study: disease

The link between common disease nomenclature and international classification was exemplified through three case studies of breast cancer, heart disease and Alzheimer’s disease (Figure 5). The disease case studies were chosen because of their prevalence in the general population. Breast cancer is one of the leading causes of death for women worldwide and has an incidence of 1 in 10 cancer diagnoses each year (39, 40). Our web application highlights the impact of this disease by returning 323 ICD-9 (Figure 5A(1)) and 1449 ICD-10 codes (Figure 5A(2)). Additionally, a total of 16 gene records were retrieved for breast cancer, which includes but is not limited to BRCA1, RB1, APC and PTEN. Like cancer, the term ‘heart disease’ encompasses several different subtypes, and one of the most common forms, congenital heart disease, continues to be a growing burden on health care systems (41). A search of heart disease retrieved 540 ICD-9 (Figure 5B(1)) and 937 ICD-10 codes (Figure 5B(2)). A total of 18 gene records were retrieved for heart disease, the most common gene being ACTC1, which has been documented to cause cardiomyopathies (42). Our final case is Alzheimer’s disease that is characteristically prevalent in older adults, and current research implicates a complex relationship between genetic and environmental factors (43). The search yielded three ICD-9 (Figure 5C(1)) and nine ICD-10 codes (Figure 5C(2)). Additionally, two gene records, LDLR and HFE, were associated with this disease. While these genes have been studied in other forms of neurological diseases, their effects on and interactions with neurodegenerative diseases are not as widely studied. Since ICD-10 diagnostic code set allows for greater specificity in the disease aetiology, anatomic site and severity (44), there are a greater number of codes available, as seen in all three case studies.

Figure 5.

PAS-GDC use case—disease. This figure presents three different case studies exploring the ‘disease’ search feature: breast cancer (A), heart disease (B), and Alzheimer’s disease (C).

Case study: ICD code

The third search feature utilized by our web application is based on the ICD codes. The three ICD-9 codes that were included in this case study were 104, 233 and 770. A search on our web application for the ICD-9 code 104 returns 16 unique genes, which include but are not limited to ACTC1, APOB, PKP2 and MYH7, as well as to two common diseases, heart disease and bone fracture (Figure 6A(1)). Notably, the ACTC1 gene is also associated with other forms of cardiovascular disease as stated previously. The ICD-9 code, 233, yields one distinct disease, breast cancer, which is the most common type of cancer (Figure 6B(1)). Additionally, the search returns 16 unique genes, which include but are not limited to APC, BRCA1, MLH1, PTEN and RB1. A search in our third case study for the ICD-9 code, 770, shows that this code is associated with Alzheimer’s disease as well as with two distinct genes, LDLR and HFE (Figure 6C(1)), which have been observed to be linked to Alzheimer’s disease based on our previous queries. We also utilized our ICD-10 database for three distinct case studies involving the codes 202, 411 and 1316. A search for the ICD-10 code, 202, returns 12 unique genes and 2 diseases, heart disease and ptosis (Figure 6A(2)). The code, 411, is associated mainly with breast cancer as well as other phenotypic variations of this disease such as breast giant fibroadenoma and breast benign neoplasm. Additionally, the search yields 19 distinct genes, which are linked to this unique ICD code (Figure 6B(2)). The code 1316 is linked to one distinct disease, Alzheimer’s, and two genes, LDLR and HFE (Figure 6C(2)).

Figure 6.

PAS-GDC use case—ICD code. This figure presents three different case studies exploring ICD-9 and ICD-10 codes. (A) The results for ICD-9 and ICD-10 codes starting with ‘104’ and ‘202’, respectively. (B) The results for ICD-9 and ICD-10 codes starting with ‘233’ and ‘411’, respectively. (C) The results for ICD-9 and ICD-10 codes starting with ‘770’ and ‘1316’, respectively.

The connection to our in-house database does not require the user to install any tools or external modifications. When users search for their desired keywords, it triggers the database and cross-references for the exact or similar keywords. Once the database retrieves the results, it is presented to the user in a table format displaying the gene, disease and ICD code as separate columns. Additionally, our web application allows users to save their desired results as a text (CSV) file. The intelligent search feature of the PAS-GDC removes the need to cross-verify genes or diseases on other web applications or databases by integrating and providing an all-in-one (gene, ICD and disease) search capability to the users. Updates to the database might include but are not limited to new additions to the ACMG-approved genes, new associations between genes and diseases and the addition of another version of the ICD code.

Discussion

Recent developments in sequencing technologies and analysis of gene expression and variant data have helped advance the field of precision medicine (37). Genomic and transcriptomic analyses have the potential to be a driver for clinically reliable predictions of complex diseases and disorders. In a clinical research setting, the exploratory and dynamic nature of precision medicine yields promising results in discovering new gene–disease relationships, variants and diverse genotyping (3). NGS has aided in the implementation of personalized treatments for patients with cardiovascular disease and neurodegenerative conditions (3). Some of the applications of NGS include but are not limited to genomic data models to support clinical decision-making, identification of robust epigenetic biomarkers as well as clinical translation (3). Additionally, the latest research indicates that there is merit in integrating untargeted metabolomic profiling with genomic analysis for individuals at the ends of phenotypic expression (26). This approach demonstrates that integrated genomics helps narrow the gap between treatment and disease by leveraging streamlined analysis of a patient’s genome, thus saving critical diagnosis time and money for the patient and institution of care (45). However, there are still many constraints when trying to integrate genomic and clinical data. These constraints include but are not limited to lack of standardization when linking genes to their disease phenotypes (20), difficulty in integrating huge amounts of genetic and clinical data (46) and absence of a platform that contains up-to-date genome and clinical data (20, 47, 48). To address these limitations, we have created PAS-GDC, a web application, that is easy to navigate and freely available on many platforms. This graphical user interface was designed so that it can be used by non-computational users, such as physicians and geneticists, allowing for the integration of precision medicine in the clinical field. Beyond a computational lens, clinicians and patients can interpret clinical and genomic data by learning the implications of one or more mutations in the genome and present actionable steps in a more effective, personalized treatment plan. Additionally, researchers in various fields could use our web application to support their work, especially those seeking connections between genomics and phenotypical manifestation.

One of the immediate implications of our web application is to support the downstream bioinformatic analysis involving gene–disease relationships. The produced outcome of the PAS-GDC is based on the integrated information including authentic and ACMG-approved genes, WHO-provided ICD codes and associated diagnoses. To the best of our knowledge, so far, there is no comprehensive, dedicated, user-friendly, manually curated and detailed application exists, which is mainly proposed, designed and developed to share such important information for supporting the objectives of translational research and precision medicine (49). One of the most important benefits of the outcome of PAS-GDC is to help in filling the gaps between basic sciences and clinical research. Most of the time, the outcomes of the research based on the high-throughput genomics data analysis cannot be linked to the EHR of the patients, precisely. The reason for this is that in the clinical world, one disease can be represented with several phenotypes and classified using variable ICD codes available through the different health systems, e.g. EPIC, NextGen and Cerner. To efficiently link the outcomes of genomics research, it is important to first classify authentic disease-causing genes and then link those to the relevant ICD codes. There are many advancements, when it comes to the development of bioinformatic tools and the application of machine learning and Artificial Intelligence approaches for the processing, analysis and integration of clinical data with multi-omics/genomics data (6, 26, 46). Here, the challenges and limitations are when it comes to usability and interpretation, as most of such applications require strong computation background from their users. Furthermore, they might need its users to install, learn and practice programming languages (e.g. R, Python, SAS and MATLAB), relational databases (e.g. MySQL and Microsoft SQL), high-performance and secure computing environments (e.g. cloud and data cluster) and handling of text files of variables sizes and structures. This can increase their time to access and interpret data, cost to afford and reduce the liberty to exercise and experiment with it. However, the availability and application of user-friendly platforms such as PAS-GDC will help addressing such challenges at global levels and can benefit millions of users from diverse backgrounds expertise.

The fundamental aspect of our PAS-GDC database is the 73 ACMG codes as well as the ICD-9 and ICD-10 codes. The backend development was labor-intensive, and we chose actionable genes that have been shown to be causative of disorders as a strategic start. There are seven relationship databases created to compile the information cohesively and serve as a base for future updates, allowing our web application to remain up to date. To optimize our process, we are exploring different methods to address the time-consuming aspect of data curation. Furthermore, we are interested in using Artificial Intelligence and machine learning algorithms for data mining. A solid foundation was created, and the tools to build out the database to a more robust capacity are readily available. We are extending the scope of our project by implementing more disease-causing genes in our database as well as different versions of the ICD code as they are made available. With the copious amounts of data available and the development of systems that can interpret them on a large scale, the focus of treatment can shift from symptom-based to prevention and early intervention in unprecedented ways. A world with precision medicine would challenge the current health care system by centering care on maintaining health instead of addressing the lack thereof.

Conclusion and future recommendations

PAS-GDC is a cross-platform online application compatible with Microsoft Windows and macOS operating systems. It has been developed using relational database management systems and programming languages including but not limited to HTML/Cascading Style Sheet with Bootstrap Framework, PHP, MySQL and JavaScript. The current version of PAS-GDC is based on the set of 73 genes approved by the ACMG. In the future, we are looking forward to increase the number of genes with the identification and inclusion of more authentic genes and variants, produced as the result of genome-wide association studies (GWAS) (50, 51) and other important analyses. PAS-GDC can execute in smartphones (e.g. iOS and Android-based devices) but using installed web browsers. To increase its visibility, application and benefits at the global level, in the future, we aim to develop the desktop and iOS-based user interfaces of PAS-GDC and integrate them into the PROMIS-APP-SUITE (25) and Visualizing Genes with disease causing Variants (52). PAS is an iOS application designed to simplify navigation across the landscape of gene annotation resources by an efficient mobile record search engine, which is based on standardized genes and related diseases to help explore multi-purpose clinical and genomics concepts in meaningful ways (25).

Supplementary material

Supplementary material is available at Database online.

Data availability

Data are freely available and can be accessed via a https://promis.rutgers.edu/pas/. All results produced and discussed in this article are incorporated into the article and its online supplementary material.

Author contributions

Z.A. proposed, supervised and led this study. R.W. programmed the web interface of the application. A.P. modelled a relational database and implemented data extraction, transfer and loading modules to efficiently parse and insert data into the designed database. K.P. and A.S.N. performed in data curation, integration and management. W.P.L. and H.A. participated in post-development analysis and theoretical research. H.A., S.B. and S.M. did quality testing. D.M. supported the establishment of web and database servers and supported clinical and genomics data management. All authors have participated in writing and review and have approved this manuscript for publication.

Notes on contributors

R.W. is a research assistant at the Ahmed Lab, Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, New Brunswick.

A.P. is a research assistant at the Ahmed Lab, Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, New Brunswick.

A.S.R. is a research assistant at the Ahmed Lab, Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, New Brunswick.

W.P.L. is a research assistant at the Ahmed Lab, Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, New Brunswick.

S.B. is an intern at the Ahmed Lab, Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, New Brunswick.

S.M. is an intern at the Ahmed Lab, Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, New Brunswick.

K.P. is a research assistant at the Ahmed Lab, Department of Genetics and Genome Sciences, UConn Health.

D.M. is the lead software engineer at the Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, New Brunswick.

H.A. is a senior research assistant and project manager at the Ahmed Lab, Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, New Brunswick.

Z.A. is an assistant professor of medicine and a tenure track and core faculty member at the Rutgers Institute for Health, Health Care Policy and Aging Research; and Department of Medicine, Division of General Internal Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers Biomedical and Health Sciences, Rutgers University, New Brunswick.

Funding

The Institute for Health, Health Care Policy and Aging Research; Department of Medicine, Robert Wood Johnson Medical School; and Rutgers Biomedical and Health Sciences at Rutgers, The State University of New Jersey. Research reported in this publication was supported in part by the Office of the Director of the National Institutes of Health under award number R33AG068931.

Conflict of interest statement

The authors declare no competing financial or non-financial interests.

Acknowledgements

We appreciate the great support by the Rutgers Institute for Health, Health Care Policy, and Aging Research (IFH); Department of Medicine, Rutgers Robert Wood Johnson Medical School (RWJMS); and Rutgers Biomedical and Health Sciences (RBHS), at Rutgers, The State University of New Jersey. We thank the members and collaborators of Ahmed Lab at Rutgers (I.F.H., R.W.J.M.S., R.B.H.S.) for their support, participation and contribution to this study.

This study was completed in part by research services and/or survey/data resources provided by the Institute for Health Survey/Data Core at Rutgers University.

The authors acknowledge the Office of Advanced Research Computing (OARC) at Rutgers, The State University of New Jersey for providing access to the Amarel cluster and associated research computing resources that have contributed to the results reported here.

References

Zeeshan

Xiong

Liang

B.T.

et al. (

2020

)

100 years of evolving gene-disease complexities and scientific debutants

Brief. Bioinformatics

885

–

905

Ahmed

(

2020

)

Practicing precision medicine with intelligently integrative clinical and multi-omics data analysis

Hum. Genomics

, 35.

Ahmed

(

2022

)

Multi-omics strategies for personalized and predictive medicine: past, current, and future translational opportunities

Emerg. Top Life Sc.

215

–

225

Ahmed

Zeeshan

Foran

D.J.

et al. (

2021

)

Integrative clinical, genomics and metabolomics data analysis for mainstream precision medicine to investigate COVID-19

BMJ Innov.

Hou

Y.C.C.

H.C.

Martin

et al. (

2020

)

Precision medicine integrating whole-genome sequencing, comprehensive metabolomics, and advanced imaging

Proc. Natl. Acad. Sci.

117

3053

–

3062

Abdelhalim

Berber

Lodi

et al. (

2022

)

Artificial intelligence, healthcare, clinical genomics, and pharmacogenomics approaches in precision medicine

Front. Genet.

, 929736.

Faulkner

Holtorf

A.P.

Liu

C.Y.

et al. (

2020

)

Being precise about precision medicine: what should value frameworks incorporate to address precision medicine? A report of the Personalized Precision Medicine Special Interest Group

Value Health

529

–

539

Khoury

M.J.

Iademarco

M.F.

and

Riley

W.T.

(

2016

)

Precision public health for the era of precision medicine

Am. J. Prev. Med.

398

–

401

Richards

C.S.

Bale

Bellissimo

D.B.

et al.

Molecular Subcommittee of the ACMG Laboratory Quality Assurance Committee

. (

2008

)

ACMG recommendations for standards for interpretation and reporting of sequence variations: revisions 2007

Genet. Med.

294

–

300

10.

Roth

S.C.

(

2019

)

What is genomic medicine?

JMLA

107

, 442.

11.

International Human Genome Sequencing Consortium

. (

2001

)

Initial sequencing and analysis of the human genome

Nature

409

860

–

921

PubMed

12.

Montes

Sanford

B.L.

Comiskey

D.F.

et al. (

2019

)

RNA splicing and disease: animal models to therapies

Trends Genet.

–

13.

Zhao

(

2019

)

Alternative splicing, RNA-seq and drug discovery

Drug Discov. Today

1258

–

1267

14.

Meng

and

Jin

(

2021

)

RNA structures in alternative splicing and back‐splicing

Wiley Interdiscip. Rev. RNA

, e1626.

15.

Sanger

Nicklen

and

Coulson

A.R.

(

1977

)

DNA sequencing with chain-terminating inhibitors

Proc. Natl. Acad. Sci.

5463

–

5467

16.

Liu

et al. (

2012

)

Comparison of next-generation sequencing systems

J. Biomed. Biotechnol.

2012

–

PubMed

17.

De Coster

and

Van Broeckhoven

(

2019

)

Newest methods for detecting structural variations

Trends Biotechnol.

973

–

982

18.

Cock

P.J.A.

Fields

C.J.

Goto

et al. (

2010

)

The sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants

Nucleic Acids Res.

1767

–

1771

19.

Hoogstrate

Jenster

and

van de Werken

H.J.G.

(

2021

)

FASTAFS: File System Virtualisation of Random Access Compressed FASTA Files

BMC Bioinforma.

–

20.

Ahmed

Zeeshan

Xiong

et al. (

2019

)

Debutant iOS app and gene-disease complexities in clinical genomics and precision medicine

Clin. Transl. Med.

–

21.

Petersen

B.S.

Fredrich

Hoeppner

M.P.

et al. (

2017

)

Opportunities and challenges of whole-genome and-exome sequencing

BMC Genet.

–

22.

Wronikowska

M.W.

Malycha

Morgan

L.J.

et al. (

2021

)

Systematic review of applied usability metrics within usability evaluation methods for hospital electronic healthcare record systems: metrics and evaluation methods for eHealth systems

J. Eval. Clin. Pract.

1403

–

1416

23.

Ahmed

Renart

E.G.

Mishra

et al. (

2021

)

JWES: a new pipeline for whole genome/exome sequence data processing, management, and gene‐variant discovery, annotation, prediction, and genotyping

FEBS Open Bio.

2441

–

2452

24.

Ahmed

Kim

and

Liang

B.T.

(

2019

)

MAV-clic: management, analysis, and visualization of clinical data

JAMIA Open

–

25.

Ahmed

Zeeshan

Mendhe

et al. (

2020

)

Human gene and disease associations for clinical-genomics and precision medicine research

Clin. Transl. Med.

297

–

318

26.

Vadapalli

Abdelhalim

Zeeshan

et al. (

2022

)

Artificial intelligence and machine learning approaches using gene expression and variant data for personalized medicine

Brief. Bioinformatics

, bbac191.

. https://www.cdc.gov/nchs/icd/icd9cm.htm (

27.

Escalona

Rocha

and

Posada

(

2016

)

A comparison of tools for the simulation of genomic next-generation sequencing data

Nat. Rev. Genet.

459

–

469

28.

Centers for Disease Control and Prevention

ICD - ICD-9-CM - International Classification of Diseases, Ninth Revision, Clinical Modification

Centers for Disease Control and Prevention

November

, 2021, date last accessed).

29.

Drug Administration. Science Information Facility, Drug Administration. Bureau of Drugs. Office of Scientific Coordination, Drug Administration. Drug Listing Branch, Center for Drug Evaluation, & Research (US). Product Information Management Branch

. (

1976

)

National Drug Code Directory (Vol. 2)

Consumer Protection and Environmental Health Service, Public Health Service, US Department of Health, Education, and Welfare

30.

Yoshida

and

Miki

(

2004

)

Role of BRCA1 and BRCA2 as regulators of DNA repair, transcription, and cell cycle in response to DNA damage

Cancer Sci.

866

–

871

31.

Werner

(

2022

)

BRCA1: an endocrine and metabolic regulator

Front Endocrinol (Lausanne)

, 844575.

32.

Jhanwar-Uniyal

(

2003

)

BRCA1 in cancer, cell cycle and genomic stability

Front. Biosci

s1107

–

s1117

33.

Helms

A.S.

Tang

V.T.

O’Leary

T.S.

et al. (

2020

)

Effects of MYBPC3 loss-of-function mutations preceding hypertrophic cardiomyopathy

JCI Insight

, e133782.

34.

Mc Namara

Alzubaidi

and

Jackson

J.K.

(

2019

)

Cardiovascular disease as a leading cause of death: how are pharmacists getting involved?

Integr. Pharm. Res. Pract.

–

35.

Stewart

Manmathan

and

Wilkinson

(

2017

)

Primary prevention of cardiovascular disease: a review of contemporary guidance and literature

JRSM Cardiovasc. Dis.

, 2048004016687211.

36.

Fodde

(

2002

)

The APC gene in colorectal cancer

Eur. J Cancer (Oxford, England: 1990)

867

–

871

37.

Liu

Zhou

et al. (

2021

)

APC gene promoter methylation as a potential biomarker for lung cancer diagnosis: a meta-analysis

Thorac. Cancer

2907

–

2913

38.

Łukasiewicz

Czeczelewski

Forma

et al. (

2021

)

Breast cancer-epidemiology, risk factors, classification, prognostic markers, and current treatment strategies-an updated review

Cancers

, 4287.

39.

Alkabban

F.M.

and

Ferguson

(

2018

)

Breast Cancer

Nih.gov; StatPearls Publishing

Google Preview

40.

Yan

Liu

et al. (

2020

)

Diagnosis and treatment of breast cancer in the precision medicine era

Methods Mol. Biol.

2204

–

41.

Mutluer

F.O.

and

Çeliker

(

2018

)

General concepts in adult congenital heart disease

Balkan Med. J.

–

42.

Ohtaki

Wanibuchi

Kataoka-Sasaki

et al. (

2017

)

ACTC1 as an invasion and prognosis marker in glioma

J. Neurosurg.

126

467

–

475

43.

Lane

C.A.

Hardy

and

Schott

J.M.

(

2017

)

Alzheimer’s disease

Eur. J. Neurol.

–

44.

Manchikanti

Falco

F.J.E.

and

Hirsch

J.A.

(

2011

)

Ready or not! Here comes ICD-10

J. Neurointerv. Surg.

–

45.

Alaimo

J.T.

Glinton

K.E.

Liu

et al. (

2020

)

Integrated analysis of metabolomic profiling and exome data supplements sequence variant interpretation, classification, and diagnosis

Genet. Med.

1560

–

1566

46.

Ahmed

Mohamed

Zeeshan

et al. (

2020

)

Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine

Database

2020

, baaa010.

47.

Xuan

Qing

et al. (

2013

)

Next-generation sequencing in the clinic: promises and challenges

Cancer Lett.

340

284

–

295

48.

Biesecker

L.G.

Nussbaum

R.L.

and

Rehm

H.L.

(

2018

)

Distinguishing variant pathogenicity from genetic diagnosis: how to know whether a variant causes a condition

JAMA

320

1929

–

1930

49.

Kim

M.O.

Coiera

and

Magrabi

(

2017

)

Problems with health information technology and their effects on care delivery and patient outcomes: a systematic review

JAMIA

246

–

250

PubMed

50.

Witte

J.S.

(

2010

)

Genome-wide association studies and beyond

Annu. Rev. Public Health

–

51.

Chang

and

Cai

(

2018

)

An overview of genome-wide association studies

Methods Mol. Biol. (Clifton, N.J.)

1754

–

108

52.

Ahmed

Renart

E.G.

Zeeshan

et al. (

2021

)

Advancing clinical genomics and precision medicine with GVViZ: FAIR bioinformatics platform for variable gene-disease annotation, visualization, and expression analysis

Hum. Genomics

, 37.