DAPredict: a database for drug action phenotype prediction Open Access

Table 1.

Data in DAPredict. Data: This column represents the type of stored data, including known information obtained from existing databases and relationship pairs predicted by our original algorithm. Source: This column represents the source of stored data. Number: This column represents the number of entries of stored data. Description: This column represents the detailed description of stored data. (Abbreviation, SUB: Substructure)

Data	Source	Number	Description
Approved drug	DrugBank	1748	Approved drugs for predicting the action phenotype.
ADR	SIDER	454	Adverse reactions to be predicted.
ATC	DrugBank	178	Drug therapy effects to be predicted.
Compound	PubChem	111 428 061	Compounds for predicting the action phenotype online.
Drug-ADR	Original algorithm	305 981	Drug-ADR predicted relationships.
Drug-ATC	Original algorithm	83 117	Drug-ATC predicted relationships.
SUB-ADR	Original algorithm	5808	Substructure-ADR predicted relationships.
SUB-ATC	Original algorithm	1748	Substructure-ATC predicted relationships.
SUB-Domain	Original algorithm	33 334	Substructure-domain predicted relationships.
Domain-ADR	Original algorithm	30 874	Domain-ADR predicted relationships.
Domain-ATC	Original algorithm	2788	Domain-ATC predicted relationships.

Relationship score grade

To make relationship scores easier to interpret, all scores of each relationship are used as the background and divided by quintiles into five grades: lowest, low, medium, high and highest. The thresholds for each relationship are shown in Table 2.
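The quintile split can be illustrated with a minimal Python sketch. It assumes the background is a plain list of scores and uses the standard-library `statistics.quantiles`. The source does not state which quantile interpolation method was used, so `method="inclusive"` is an assumption, and the function name and toy values are hypothetical.

```python
from statistics import quantiles

GRADES = ["lowest", "low", "medium", "high", "highest"]


def quintile_cutpoints(background_scores):
    """Return the four cut points that split the background into five grades.

    The interpolation method is an assumption; the source only says that the
    scores are divided "by quintiles".
    """
    return quantiles(background_scores, n=5, method="inclusive")


# Toy background (illustrative values only, not DAPredict data).
background = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cuts = quintile_cutpoints(background)

# Four cut points delimit five grades, from lowest to highest.
for grade, upper in zip(GRADES, cuts + [float("inf")]):
    print(f"{grade}: upper bound {upper}")
```

Four cut points are enough for five grades because the highest grade is unbounded above, which matches the `+∞` upper limit used in the thresholds.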

Table 2.

Score grade threshold. Relationship: This column shows the names of relationship pairs. Grade: This column represents the five levels of correlation, namely highest, high, medium, low and lowest. Threshold: This column shows the correlation range of each level. (Abbreviation, SUB: Substructure)

Relationship	Grade	Threshold
Drug-ADR	Highest	(0.000630035,+∞)
Drug-ADR	High	(0.0000172,0.000630035]
Drug-ADR	Medium	(1.64E-10,0.0000172]
Drug-ADR	Low	(3.56E-21,1.64E-10]
Drug-ADR	Lowest	(0,3.56E-21]
Drug-ATC	Highest	(0.00124725,+∞)
Drug-ATC	High	(0.00000675,0.00124725]
Drug-ATC	Medium	(7.45E-10,0.00000675]
Drug-ATC	Low	(3.28E-22,7.45E-10]
Drug-ATC	Lowest	(0,3.28E-22]
Drug-SUB-ADR	Highest	(0.1976036,+∞)
Drug-SUB-ADR	High	(0.00054703,0.1976036]
Drug-SUB-ADR	Medium	(3.480196E-16,0.00054703]
Drug-SUB-ADR	Low	(5.73442E-26,3.480196E-16]
Drug-SUB-ADR	Lowest	(0,5.73442E-26]
Drug-SUB-ATC	Highest	(0.749394,+∞)
Drug-SUB-ATC	High	(0.296104,0.749394]
Drug-SUB-ATC	Medium	(0.000727606,0.296104]
Drug-SUB-ATC	Low	(3.13E-13,0.000727606]
Drug-SUB-ATC	Lowest	(0,3.13E-13]
Drug-Domain-ADR	Highest	(18.3027,+∞)
Drug-Domain-ADR	High	(10.7562,18.3027]
Drug-Domain-ADR	Medium	(6.08470,10.7562]
Drug-Domain-ADR	Low	(2.63413,6.08470]
Drug-Domain-ADR	Lowest	(0,2.63413]
Drug-Domain-ATC	Highest	(24.3574,+∞)
Drug-Domain-ATC	High	(15.7606,24.3574]
Drug-Domain-ATC	Medium	(10.2282,15.7606]
Drug-Domain-ATC	Low	(4.94737,10.2282]
Drug-Domain-ATC	Lowest	(0,4.94737]

Because compound prediction is performed online, the relationship scores between all compounds and phenotypes cannot be computed in advance to build a background. We therefore treat the pre-predicted approved drugs as representative of all compounds and apply the approved-drug score thresholds to online predictions.
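As a rough sketch of how a fixed threshold set can grade a newly predicted score, the Python snippet below looks up a score against the right-closed intervals of Table 2. Only two relationships are copied in, to keep it short. The names `THRESHOLDS` and `grade_score` are hypothetical, and this is not DAPredict's own implementation.

```python
from bisect import bisect_left

GRADES = ["lowest", "low", "medium", "high", "highest"]

# Upper bounds of the lowest, low, medium and high grades, copied from Table 2.
THRESHOLDS = {
    "Drug-ADR": [3.56E-21, 1.64E-10, 0.0000172, 0.000630035],
    "Drug-Domain-ADR": [2.63413, 6.08470, 10.7562, 18.3027],
}


def grade_score(relationship, score):
    """Map a score to its grade using intervals of the form (lower, upper]."""
    if score <= 0:
        # Table 2 intervals start at an open lower bound of 0.
        raise ValueError("score must be greater than 0")
    cuts = THRESHOLDS[relationship]
    # bisect_left keeps a score equal to a threshold in the lower grade,
    # which matches the right-closed intervals in Table 2.
    return GRADES[bisect_left(cuts, score)]


print(grade_score("Drug-ADR", 0.001))
print(grade_score("Drug-Domain-ADR", 10.7562))
```

Using `bisect_left` rather than `bisect_right` is what makes each upper bound inclusive. A score exactly equal to a threshold stays in the lower of the two adjacent grades.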

Web interface and visualization

The website is divided into five sections: home, search, tool, download and help. The home section introduces DAPredict. The search section lets users directly search the pre-predicted full action phenotype spectrum, potential action domains and action phenotype-related substructures of 1748 approved drugs. The tool section provides online real-time prediction of the same three outputs for the more than 110 000 000 compounds in the PubChem database. The download section provides downloadable resources, including the pre-predicted full action phenotype relationship data of the 1748 approved drugs and the relationship data between substructures and ADR, ATC and target protein domains. DAPredict also includes drug-target relationship data from the DrugBank database and target-domain relationship data from the Pfam database. The help section describes browser compatibility and provides a usage tutorial.

All results in DAPredict are presented as interactive graphs and tables. Graphs can be shown as line plots (the default) or bar plots (Figure 2). Score grades are distinguished by color, and users can choose which grades to display. Because space on the web page is limited, the graphs include a sliding window that users can drag to inspect a range of results in more detail. The interactive table supports column sorting and a secondary search that filters the search results by keyword (Figure 3). Hyperlinks in the table are underlined and lead to the related sites. Both graphs and tables can be downloaded.
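The same secondary search and column sorting can be applied offline to a downloaded result table. The Python sketch below assumes the rows have been loaded as dictionaries. The column names (`phenotype`, `score`, `grade`), the function names and the toy rows are all hypothetical and do not reflect the actual DAPredict export format.

```python
def secondary_search(rows, keyword):
    """Keep rows in which any cell contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        row for row in rows
        if any(needle in str(cell).lower() for cell in row.values())
    ]


def sort_by_column(rows, column, descending=False):
    """Return the rows sorted by one column, like clicking a table header."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


# Made-up toy rows for illustration only.
rows = [
    {"phenotype": "nausea", "score": 0.002, "grade": "highest"},
    {"phenotype": "headache", "score": 0.00001, "grade": "medium"},
    {"phenotype": "dizziness", "score": 0.0003, "grade": "high"},
]

hits = secondary_search(rows, "HIGH")  # substring match: "high" and "highest"
ranked = sort_by_column(hits, "score", descending=True)
for row in ranked:
    print(row["phenotype"], row["score"], row["grade"])
```

Because the filter is a substring match, searching for "high" also returns rows graded "highest". An exact-match filter on the grade column would be needed to separate the two.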

Figure 2.

Interactive graph. ① Sliding window: Users can drag the handles at both ends to adjust the size of the sliding window. Drag the sliding window to further explore a specific small range of data. ② Save: Users can save the current graph. ③ Restore: Users can restore the graph to its original form. ④ Bar plot: Users can click this button to switch the default line chart to a bar chart. When the score is very small, it is recommended to view the bar chart. ⑤ Rank: The predicted relevance score is divided into five grades (see the search and tool sections for details), and users can click the label to filter the results by rank attribute. ⑥ Interactive nodes: Users can hover over nodes to view details.

Alt text: This figure demonstrates the use of interactive graphs in the database.

Figure 3.

Interactive table. ① Search: users can search within the table to further filter the results. ② Format conversion: users can switch the display format of the result table. ③ Column display: users can choose which columns are displayed. ④ Export: users can export the results on the current page in CSV or Excel format (to download all results, set the display number to all). ⑤ Arrange: users can sort the table in ascending or descending order by a column. ⑥ Related link: hyperlinks are underlined in blue, and users can click them to go to the relevant page. ⑦ Display number: the number of results displayed on the current page. ⑧ Page: users can turn pages here.

Alt text: This figure demonstrates the use of interactive tables in the database.

Discussion

We have developed a comprehensive and easy-to-use platform for searching and predicting drug action phenotypes, based on our original prediction algorithm. It includes the pre-predicted full action phenotype profiles of 1748 approved drugs and can predict online the profiles of more than 110 000 000 known compounds. The predicted profiles can guide the repositioning of approved drugs, which requires less manpower, money and time than de novo drug development. Screening lead compounds is also one of the most important steps in developing new drugs, and predicting the action phenotypes of known compounds, including natural products that are currently less studied, can provide a reference for that screening. More importantly, DAPredict also provides information on the mechanism of action of a drug or compound and on the substructures related to the action phenotype, which is valuable for druggability evaluation and structural optimization in new drug development.

The number of known compounds is constantly increasing, so the database will need to be updated regularly. In addition, the predicted results may still need to be checked manually, which is time-consuming given the huge number of compounds. We hope that DAPredict will be a convenient and valuable data resource for drug developers and pharmacogenomics researchers.

Data availability

All data in DAPredict are available in the download section, and users can download them on demand.

Contribution statement

Conceptualization, Qingkang Meng and Xiujie Chen; Data curation, Yiyang Cai; Formal analysis, Qingkang Meng, Kun Zhou, Fei Xu and Diwei Huo; Funding acquisition, Xiujie Chen and Denan Zhang; Methodology, Qingkang Meng and Yiyang Cai; Project administration, Qingkang Meng and Xiujie Chen; Software, Kun Zhou, Fei Xu, Diwei Huo, Hongbo Xie, Meini Yu and Denan Zhang; Writing—original draft, Qingkang Meng; Writing—review and editing, Yiyang Cai, Kun Zhou, Fei Xu, Diwei Huo, Hongbo Xie, Meini Yu, Denan Zhang and Xiujie Chen.

Funding

Funding for open access charge: National Human Genome Research Institute (2P41HG02273-07). National Natural Science Foundation of China (Grant Number: 61671191, 62072144).

Conflict of interest statement

None declared.

Code availability

No custom code was used. Software tools used for processing are mentioned in the Methods section.
