SynLethDB 2.0: a web-based knowledge graph database on synthetic lethality for novel anticancer drug discovery

Table 1.

Quantitative Scores Assigned to SLs According to Experimental Methods.

Method	Score
CRISPR interference	0.85
Drug inhibition	0.75
RNAi	0.75
Low throughput	0.80
High throughput	0.50

In the integration step, we integrated the scores of different types of sources into a normalized confidence score for every SL pair. Different weights were assigned according to the source types. The integration formula for the final confidence score of an SL pair is

$$ S_c=\frac{w_{m} s_{m}+w_{d} s_{d}+w_{p} s_{p}+w_{t} s_{t}}{w_{m}+w_{d}+w_{p}+w_{t}}, \tag{2} $$

where s_m, s_d, s_p and s_t are the quantified scores from biochemical experiments, existing databases, computational prediction and text mining, respectively, and w_m, w_d, w_p and w_t are the corresponding weight factors, with default values of 0.8, 0.5, 0.3 and 0.2. Users can customize these weights according to their own experience or preference when querying and ranking the SLs on the web interface.
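As a minimal sketch of Equation (2), the weighted average can be written in a few lines of Python. The function name `confidence_score`, the dictionary keys and the example scores are illustrative assumptions rather than part of SynLethDB. In particular, treating a source type with no evidence as a score of 0 is an assumption made here, since the text does not state how missing sources are handled.

```python
# Default weights from the text: biochemical experiment (m), existing
# databases (d), computational prediction (p) and text mining (t).
DEFAULT_WEIGHTS = {"m": 0.8, "d": 0.5, "p": 0.3, "t": 0.2}


def confidence_score(scores, weights=DEFAULT_WEIGHTS):
    """Weighted average of per-source scores, following Equation (2).

    `scores` maps a source type to its quantified score in [0, 1].
    Assumption: a source type absent from `scores` contributes 0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    weighted = sum(w * scores.get(k, 0.0) for k, w in weights.items())
    return weighted / total_weight


# Hypothetical SL pair: CRISPR evidence (0.85, Table 1) plus an assumed
# database score of 0.75 chosen only for illustration.
pair = {"m": 0.85, "d": 0.75}
default_score = confidence_score(pair)
# A user who only trusts wet-lab evidence can zero out the other weights.
wetlab_score = confidence_score(pair, {"m": 1.0, "d": 0.0, "p": 0.0, "t": 0.0})
```

Because the denominator is the sum of the weights, rescaling all weights by a common factor leaves the score unchanged; only their ratios affect the ranking.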

Gene set enrichment analysis

Given a gene g, let G denote the set of all SL partner genes of g. The enrichment analysis identifies the pathways and GO terms from each of the three ontologies (i.e. biological process, molecular function and cellular component) that occur in the gene set G significantly more often than expected by chance. We implemented two enrichment analysis methods, based on degree information and on P-values, respectively.

Degree-based gene set enrichment analysis

An SLPR (Synthetic Lethality PageRank) score inspired by PageRank (44) was computed for each pathway or GO term associated with the gene set G. The pathways and GO terms can be ranked based on their SLPR scores. A larger SLPR score means that a pathway or GO term is more closely associated with the gene set. The SLPR score is defined as:

$$ SLPR = (1 - d) + d \times \sum_{l \in L}\left[(1 - q) + q \times S_c(g, l) \times degree(l)^{w}\right], \tag{3} $$

where d is a damping factor set to 0.85, q is another damping factor set to 0.8 and w is set to −1 to reflect a negative correlation. For a specific pathway or GO term, L represents the subset of genes in set G that are directly connected with it. Given a gene $l \in L$, $S_c(g, l)$ is the confidence score of the SL pair (g, l) and degree(l) is the number of pathways or GO terms associated with l.
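The SLPR score of Equation (3) can be sketched as follows. This is an illustration under stated assumptions, not the database's implementation: the function name `slpr`, the input layout (a list of confidence–degree pairs, one per gene in L) and the example values are hypothetical.

```python
def slpr(partners, d=0.85, q=0.8, w=-1.0):
    """SLPR score of one pathway or GO term, following Equation (3).

    `partners` lists, for every gene l in L, the pair
    (S_c(g, l), degree(l)); degree(l) must be positive.
    """
    inner = sum((1 - q) + q * s_c * degree ** w for s_c, degree in partners)
    return (1 - d) + d * inner


# Hypothetical term linked to two SL partners of g: one with confidence 0.9
# and 2 associated terms, one with confidence 0.5 and 10 associated terms.
example = slpr([(0.9, 2), (0.5, 10)])
```

With w = −1, a partner gene annotated with many pathways or GO terms contributes less to each of them, so terms supported by specific, high-confidence partners rise in the ranking.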

P-value-based gene set enrichment analysis

Assume that M is the number of genes in G and N is the number of genes having SL partners in the whole database. Given a specific pathway or GO term, n is the total number of genes associated with it and m is the number of genes in G associated with it. To show the enrichment of the gene set G with the pathway or GO term, we calculate a P-value as follows (45):

$$ P=1-\sum_{i=0}^{m-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}. \tag{4} $$

Thus, we attain a list of pathways or GO terms sorted in order of the P-values. A smaller P-value means that G is more enriched with the given pathway or GO term.
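Equation (4) is the upper tail of a hypergeometric distribution, which can be evaluated exactly with integer binomial coefficients. The sketch below uses only the Python standard library; the function name `enrichment_p_value` and the toy numbers are assumptions for illustration.

```python
from math import comb


def enrichment_p_value(N, M, n, m):
    """Upper-tail hypergeometric P-value, following Equation (4).

    N: genes with SL partners in the whole database
    M: genes in the partner set G
    n: genes associated with the pathway or GO term
    m: genes in G associated with the pathway or GO term
    """
    if not (0 <= m <= min(M, n) and max(M, n) <= N):
        raise ValueError("expected 0 <= m <= min(M, n) and M, n <= N")
    below = sum(comb(M, i) * comb(N - M, n - i) for i in range(m))
    return 1 - below / comb(N, n)


# Toy numbers (hypothetical): 10 genes, 4 in G, a term with 3 genes, 2 shared.
p = enrichment_p_value(N=10, M=4, n=3, m=2)  # probability of overlap >= 2
```

For very small P-values, subtracting from 1 loses precision in floating point; a survival-function routine such as `scipy.stats.hypergeom.sf` is likely a safer choice in that regime.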

SynLethDB 2.0 portal

A user-friendly web interface has been developed for SynLethDB to facilitate data visualization, analysis and interpretation. Compared with SynLethDB 1.0, SynLethDB 2.0 provides more ways for searching and browsing SLs. For example, users may wish to start with a type of cancer to find SLs associated with the cancer. Thus, in SynLethDB 2.0, in addition to searching by genes, users can also search SLs by a disease name. SynLethDB 2.0 also allows users to browse the part of knowledge graph related to an SL gene pair with a graph viewer to help understand the mechanism underlying the SL. Besides, it allows users to customize the integration weights of confidence scores so that those SLs more reliable or interesting to users would be prioritized at the top of the list of query results. In addition to the enrichment analysis tool based on P-values, SynLethDB 2.0 also provides a gene set enrichment analysis tool based on the graph degree, which ranks pathways and GO terms by their relevance to a gene set inferred based on the network topology of the knowledge graph.

On the home page of the website of SynLethDB, we provide a general introduction to the database, as well as the search bar for looking up SLs by gene symbols or gene IDs. Other functionalities of SynLethDB can be accessed by menu tabs on the website as follows.

Searching and browsing the SLs

In the first version of SynLethDB, users could only search for SLs by genes. In this new version, we collected 14 116 gene–cancer relationships and 56 921 gene–compound relationships for the genes involved in SLs from DisGeNET (46), DrugBank (47) and BindingDB (48). Based on these new data, we offer two new search options, namely ‘search SL by disease’ and ‘search SL by compound’, with autocompletion over the list of all cancers or compounds available in SynLethDB. The search results are shown in a table viewer.

Customizable confidence scores for SLs

A confidence score reflects an SL’s credibility based on its sources and can be used to rank SLs. As mentioned earlier, we use a two-step scoring procedure (i.e. quantification and integration) to assign a confidence score based on the sources of the SL. In the quantification step, we assign quantitative scores to SL pairs according to their experimental methods, as shown in Table 1. In the integration step, we provide default values for the weight factors but allow users to customize these weights so that they can extract the SLs from the source types they are most interested in. When searching and browsing SLs by genes, users can adjust the weight factors of the source types and rank the results by confidence score.

Searching and browsing the knowledge graph SynLethKG

SynLethKG contains relationships that describe various features for genes, cancers and drugs. With the ‘Inspect SL’ functionality, all these relationships are categorized by their node types and can be browsed through an interactive graph viewer. Starting with SL genes to be inspected, users only need to click on the node they are about to inspect, and the graph viewer can fetch and visualize the results. The type of relationships and the number of edges to be displayed can be specified by the users. Properties of the nodes and edges, such as data sources and entity IDs, can also be viewed through an infobox in the upper right corner.

Gene set enrichment analysis of SL partners

We developed two methods for gene set enrichment analysis based on P-values and node degrees, respectively. Both methods take a gene symbol as input and conduct gene set enrichment analysis for the SL partners of this gene. The output includes the rank of the pathways and GO terms separately. A higher ranking of a pathway or a GO term indicates that the SL partners of this gene are more enriched with this pathway or GO term. The P-value-based enrichment analysis tool ranks the results by P-value calculated in Equation (4), and a lower P-value corresponds to higher ranking. Meanwhile, the degree-based enrichment analysis tool ranks the pathways and GO terms based on the SLPR score as calculated in Equation (3), and a higher SLPR score corresponds to a higher ranking.

Data access and download

We provide a download page to make it easy for users to retrieve a large amount of data. All the SL gene pairs are classified by species and can be downloaded in either CSV or JSON format. We provide the files of SynLethKG in the formats of CSV, JSON and GraphML for users to download. In particular, the datasets in GraphML format can be imported to other software tools such as Gephi and Cytoscape for analysis and visualization. For users who prefer the triplet format, we also provide a CSV file that contains all the relationships in the format (source, relationship, target). All the data can be freely accessed and downloaded without a login requirement. RESTful Application Programming Interfaces (APIs) are also provided for users to access and analyze the data by running the scripts in programming languages such as Python and R.
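As a rough illustration of working with the triplet format, the sketch below parses (source, relationship, target) rows with Python's standard `csv` module and groups them by relationship type. The header names, the in-memory sample and the function name `load_triplets` are assumptions; the actual downloaded file may use different column names and should be checked first.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a downloaded triplet file. The header names and rows are
# assumptions for illustration; check the actual file before relying on them.
TRIPLET_CSV = """source,relationship,target
BRCA1,synthetic lethal,PARP2
Olaparib,binds,PARP2
BRCA1,participates,DNA Damage Response
"""


def load_triplets(handle):
    """Group (source, target) pairs by relationship type."""
    by_relationship = defaultdict(list)
    for row in csv.DictReader(handle):
        by_relationship[row["relationship"]].append((row["source"], row["target"]))
    return by_relationship


edges = load_triplets(io.StringIO(TRIPLET_CSV))
```

With a real file, the `io.StringIO` object would be replaced by a handle from `open(path, newline="")`.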

User manual

To lower the learning curve for new users of SynLethDB, we offer a web page containing a user manual, which gives an introduction to every functionality of SynLethDB, as well as examples of using the web interface, RESTful APIs and SynLethKG.

Results

Comparison with other databases

In this section, we compare SynLethDB 2.0 with existing databases of SL. SLKG (36) is a knowledge graph about SL, and it focuses on drug repositioning for tumor-specific treatments based on the concepts of SL and synthetic dosage lethality (SDL). It contains the relationships among genes, drugs and cancers. There are 19 987 SLs and 3039 SDLs in SLKG. Compared with SLKG, SynLethKG is focused on collecting existing SLs and related knowledge, and it includes more types of relationships and a more comprehensive list of SLs. SynLethDB 2.0 contains 35 943 human SLs and, in addition to relationships among genes, drugs and cancer types, it contains the relationships between genes and pathways, drugs and pharmacologic classes and so on. SynLethDB 1.0 is the first comprehensive database of SL. Based on that, SynLethDB 2.0 is even more comprehensive and user-friendly, as we have made extensive and important updates to the database in the following aspects.

Firstly, SynLethDB 2.0 is arguably the most up-to-date and most comprehensive database for SL. It contains 50 868 SL pairs in total, almost doubling the number of SL pairs in SynLethDB 1.0. In particular, SynLethDB 2.0 contains 35 943 human SLs, 381 mouse SLs, 439 fly SLs, 14 000 yeast SLs and 105 worm SLs as shown in Table 2. The number of human SLs in SynLethDB 2.0 is almost 1.8 times that in SynLethDB 1.0. Consistent with SynLethDB 1.0, we also provide the HUGO Gene Nomenclature Committee gene symbols, Entrez gene IDs, PubMed IDs of its original publications, types of sources and the confidence score calculated according to the sources for each SL pair in SynLethDB 2.0. Note that we updated the confidence scores by considering new sources of SLs such as CRISPR screening and allowing user-defined weight factors.

Table 2.

Comparison of Statistics Between Two Versions of SynLethDB.

	SynLethDB 1.0	SynLethDB 2.0
# Human SLs	19 952	35 943
# Mouse SLs	366	381
# Fly SLs	423	439
# Worm SLs	105	105
# Yeast SLs	13 241	14 000
KG	No	Yes
Annotation	SLs only	Yes
Offline dataset	Yes	Yes
RESTful APIs	No	Yes

Secondly, SynLethDB 2.0 provides more types of biomedical knowledge. SynLethDB 1.0 comprises mainly the SL relationships between genes. By adding the knowledge graph SynLethKG, SynLethDB 2.0 contains many more types of entities and relationships, including biological processes, pathways, molecular functions and cellular components for genes, pharmacologic classes and side effects for drugs, and symptoms and anatomies for cancers. Overall, there are 37 341 entities (nodes) and 1 405 652 relationships (edges) in SynLethKG, as shown in Table 3. The types of relationships and their numbers are listed in Table 4. In addition, SynLethDB 2.0 retains the annotations of SLs from SynLethDB 1.0 and corrects them. It also adds annotations to the nodes and edges in SynLethKG, such as the name of the entity, the data source and the link to the entity in the original data source, as well as other annotations such as the organisms of genes and the thresholds used when extracting the relationships. Therefore, SynLethDB 2.0 can provide users with more comprehensive annotations for the entries and relationships. Table 5 shows the number of entities of each type in SynLethKG and the average numbers of annotations and relationships per entity of each type. The average number of relationships for a node type was computed by summing the numbers of incident edges over all nodes of that type and dividing the sum by the number of nodes of that type.

Table 3.

Statistics About the Knowledge Graph SynLethKG

Human SLs	# genes	9856
Human SLs	# interactions	35 943
Human SLs	Density	0.07%
SynLethKG	# entity types	11
SynLethKG	# relationship types	27
SynLethKG	# nodes	37 341
SynLethKG	# edges	1 405 652

Table 4.

Numbers of the Relationships in SynLethKG.

Type	# Edges
(Anatomy, downregulates, Gene)	31
(Anatomy, expresses, Gene)	358 005
(Anatomy, upregulates, Gene)	26
(Compound, binds, Gene)	11 453
(Compound, causes, Side Effect)	135 063
(Compound, downregulates, Gene)	17 506
(Compound, palliates, Cancer)	42
(Compound, resembles, Compound)	5500
(Compound, treats, Cancer)	282
(Compound, upregulates, Gene)	13 573
(Cancer, associates, Gene)	7708
(Cancer, downregulates, Gene)	988
(Cancer, localizes, Anatomy)	1444
(Cancer, presents, Symptom)	1048
(Cancer, resembles, Cancer)	106
(Cancer, upregulates, Gene)	1263
(Gene, covaries, Gene)	16 985
(Gene, interacts, Gene)	87 103
(Gene, non-synthetic lethal, Gene)	2831
(Gene, participates, Biological Process)	393 049
(Gene, participates, Cellular Component)	59 054
(Gene, participates, Molecular Function)	65 207
(Gene, participates, Pathway)	41 790
(Gene, regulates, Gene)	147 639
(Gene, synthetic lethal, Gene)	35 943
(Gene, synthetic rescue, Gene)	895
(Pharmacologic Class, includes, Compound)	1118

Table 5.

Statistics About the Entities in SynLethKG.

Labels (n)	Size	Avg_Ann^a	Avg_Rel^b
SideEffect	5664	5.00	23.85
Gene	14 100	8.00	112.99
BiologicalProcess	12 141	5.00	32.37
Compound	1898	7.00	100.12
MolecularFunction	3012	5.00	21.65
Anatomy	390	6.64	921.81
CellularComponent	1619	5.00	36.48
Pathway	2069	5.00	20.63
Symptom	325	5.00	3.224
PharmacologicClass	357	6.00	3.13
Cancer	53	5.00	245.04

^a The average number of annotations of each type of nodes.

^b The average number of relationships of each type of nodes.
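The averaging rule for Avg_Rel can be checked against the edge counts in Table 4. The sketch below does this for the Cancer and Anatomy node types; the names `EDGES`, `SIZES` and `average_relationships` are hypothetical, and counting an edge between two nodes of the same type once per endpoint is an assumption made here rather than a rule stated in the text.

```python
# Edge counts copied from Table 4, restricted to edges touching Cancer or Anatomy.
EDGES = {
    ("Anatomy", "downregulates", "Gene"): 31,
    ("Anatomy", "expresses", "Gene"): 358005,
    ("Anatomy", "upregulates", "Gene"): 26,
    ("Compound", "palliates", "Cancer"): 42,
    ("Compound", "treats", "Cancer"): 282,
    ("Cancer", "associates", "Gene"): 7708,
    ("Cancer", "downregulates", "Gene"): 988,
    ("Cancer", "localizes", "Anatomy"): 1444,
    ("Cancer", "presents", "Symptom"): 1048,
    ("Cancer", "resembles", "Cancer"): 106,
    ("Cancer", "upregulates", "Gene"): 1263,
}
SIZES = {"Cancer": 53, "Anatomy": 390}  # node counts from Table 5


def average_relationships(label):
    """Incident edge endpoints of nodes of type `label`, divided by node count."""
    incident = 0
    for (head, _, tail), count in EDGES.items():
        # Assumption: an edge between two nodes of the same type counts twice.
        incident += count * ((head == label) + (tail == label))
    return incident / SIZES[label]


cancer_avg = average_relationships("Cancer")
anatomy_avg = average_relationships("Anatomy")
```

Under these assumptions the sketch gives about 245.04 for Cancer and 921.81 for Anatomy, in line with the Avg_Rel values in Table 5.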

Thirdly, SynLethDB 2.0 provides additional ways to access the data. In SynLethDB 1.0, users could only access the SLs by using a gene name as the query. In SynLethDB 2.0, users can also search by the names of drugs or cancers. In addition to searching from the web interface, users can download the raw dataset from the website. Besides, SynLethDB 2.0 provides RESTful APIs, which allow users to access the data from programming languages such as Python and R as well as from the command line.

SynLethKG for cancer driver genes

In SynLethKG, we collected various relationships for the genes involved in the SL pairs, including gene–gene relationships (gene expression covariation, gene interaction and gene regulation), GO annotations and pathways, as shown in Table 4. In particular, SynLethKG has 14 100 genes, 12 141 biological processes, 3012 molecular functions, 1619 cellular components, 2026 pathways, etc. as nodes (Table 5), with their relationships as edges.

Moreover, SynLethKG also contains 9959 relationships between the genes and 53 cancers from DisGeNET database (46) and 42 532 relationships between the genes and 1898 compounds from DrugBank database (47). For cancers, 325 symptoms and 390 anatomies are included as entities to describe the cancer features. For drugs, 357 pharmacologic classes and 5664 side effects are included as entities to describe drug features. Based on the same strategy as in SL-BioDP (35), we counted the numbers of cancer driver genes and genes from hallmark cancer pathways in 32 cancer types from The Cancer Genome Atlas (TCGA) contained in SynLethKG, as well as the numbers of their SL partners and related drugs.

Figure 2 shows the numbers of cancer driver genes, their SL partners and related drugs. We can observe that several cancers have quite a few SL pairs and drugs related to their driver genes, including BLCA (Bladder urothelial carcinoma), BRCA (Breast invasive carcinoma), CESC (Cervical squamous cell carcinoma and endocervical adenocarcinoma), COADREAD (Colorectal adenocarcinoma), HNSC (Head and neck squamous cell carcinoma), LGG (Brain lower grade glioma), LIHC (Liver hepatocellular carcinoma), SKCM (Skin cutaneous melanoma) and UCEC (Uterine corpus endometrial carcinoma). Figure 2 demonstrates that our database contains useful information about many genes, SLs and drugs related to cancers, making it a powerful tool for data-driven discovery and analysis of anticancer drug targets.

Figure 2.

SLs and drugs in SynLethKG for driver genes of 32 cancers and pan-cancer. The bar chart shows the numbers of cancer driver genes in SynLethKG. In the left panel, the line chart represents the numbers of SLs containing the cancer driver genes in SynLethKG, and in the right panel, the line chart represents the numbers of drugs associated with the driver genes in SynLethKG.

Case study

To demonstrate how to use SynLethDB 2.0 to discover drug targets, we present a case study that searches for SL partner genes of BRCA1 in breast cancer through the web interface, as shown in Figure 3. First, with the ‘Search SL by disease’ module, we choose ‘breast cancer’ as the disease and select the relationship ‘Disease Associates Gene’. The first line of the results shows that breast cancer is associated with breast cancer associated gene 1 (BRCA1). By clicking the ‘Search SL’ button in the ‘Function’ column, we search for the SL partner genes of BRCA1. The result shows that poly (ADP-ribose) polymerase 2 (PARP2) is an SL partner of BRCA1 with a high confidence score (0.87). Hence, we choose this SL pair for further inspection. By clicking the ‘Inspect’ button in the ‘Function’ column, we can browse more knowledge about this SL pair. Different types of biomedical relationships can be browsed by clicking the nodes in the graph. For example, we can see that both BRCA1 and PARP2 are associated with the ‘ovarian cancer’ and ‘breast cancer’ diseases, and BRCA1 participates in the ‘DNA Damage Response’ pathway, consistent with the literature. The annotations of any node or edge can be viewed in the infobox at the upper right corner by hovering the mouse over the node or edge. When we hover the mouse over the edge between ‘BRCA1’ and ‘PARP2’, the infobox displays the annotations of the SL relationship between them. The ‘pubmed_id’ attribute shows the PubMed IDs of the papers that reported this SL; ‘cell_line’ shows the cell lines or cancer types in which this SL has been verified; ‘statistic_score’ is the confidence score of the SL; ‘unbiased’ indicates whether a relationship is bidirectional (when its value is true) or unidirectional (when its value is false); ‘source’ shows that this SL was collected from the Decipher project and the Syn-Lethality database.

As BRCA1 is known to be downregulated in breast cancer and PARP2 is an SL partner gene of BRCA1, we search for compounds that downregulate PARP2 as candidate drugs for breast cancer. As shown in the knowledge graph at the lower left corner of Figure 3, Rucaparib, Talazoparib, Niraparib and Olaparib all bind to PARP2. Conversely, we can search for SLs related to Rucaparib, Talazoparib, Niraparib and Olaparib using the ‘Search SL by compound’ option. In this way, we find that PARP2 is indeed a drug target, as shown in the right half of Figure 3. This case study illustrates the basic functionalities of the SynLethDB 2.0 web interface, which can be used to explore potential anticancer drug targets based on SL or to analyze the biological mechanisms behind SLs.
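The same two-hop lookup (gene, then SL partner, then binding compounds) can be sketched offline over triplets in the (source, relationship, target) format. The triplets below are taken from the case study text; the function name `candidate_compounds` and the relationship labels as plain strings are assumptions for illustration, not the portal's query interface.

```python
# Triplets taken from the case study text; a real analysis would load the
# full triplet file instead of this hand-written subset.
TRIPLETS = [
    ("BRCA1", "synthetic lethal", "PARP2"),
    ("Rucaparib", "binds", "PARP2"),
    ("Talazoparib", "binds", "PARP2"),
    ("Niraparib", "binds", "PARP2"),
    ("Olaparib", "binds", "PARP2"),
]


def candidate_compounds(gene, triplets):
    """Map each SL partner of `gene` to the compounds that bind it."""
    partners = set()
    for source, relationship, target in triplets:
        if relationship == "synthetic lethal" and gene in (source, target):
            # SL is treated as symmetric, so take whichever end is not `gene`.
            partners.add(target if source == gene else source)
    return {
        partner: sorted(
            s for s, r, t in triplets if r == "binds" and t == partner
        )
        for partner in partners
    }


result = candidate_compounds("BRCA1", TRIPLETS)
```

On this subset, the lookup returns PARP2 as the only SL partner of BRCA1, together with the four PARP inhibitors named in the text.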

Figure 3.

A case study on BRCA1 in breast cancer. Using ‘Search SL by disease’, the SL genes associated with the disease will be shown. BRCA1 is a gene that has SL partners and is downregulated in breast cancer. Click the ‘Search SL’ button, and it shows that PARP2 is an SL partner of BRCA1 with a high confidence score (0.87). Inspecting this pair of SL genes, we notice that BRCA1 and PARP2 are both associated with the ‘ovarian cancer’ and ‘breast cancer’ diseases, and BRCA1 participates in the ‘DNA Damage Response’ pathway. The infobox at the upper right corner shows the annotations of the SL. The synthetic lethal relationship between BRCA1 and PARP2 has been reported in the literature on human cancer and cell lines including A549, PC3 and MDA468. The result that breast cancer downregulates BRCA1 and PARP2 is an SL partner of BRCA1 indicates that PARP2 is a drug target for breast cancer. Rucaparib, Talazoparib, Niraparib and Olaparib all bind with PARP2. Using ‘Search SL by compound’, we also identify PARP2 as a potential drug target of Rucaparib, Talazoparib, Niraparib and Olaparib.

Discussion and conclusion

With the development of RNAi and CRISPR screening technologies, data about SL have increased rapidly in the past few years. We have been continuously collecting SL data, integrating them into SynLethDB and improving the annotation quality. In this version, we have integrated more biomedical knowledge about human SLs into a knowledge graph called SynLethKG. The additional knowledge can provide more features for SL prediction and improve the performance of predictive models. A similar procedure can be applied to predicting drugs based on SLs. We also provide a new web interface with online services for data browsing, visualization and analysis. For instance, ‘Search SL by disease’ can facilitate SL-based cancer drug discovery. The ‘Inspect SL’ functionality displays relationships between a pair of SL genes from multiple sources in one intuitive graph. The enrichment analysis tools help identify the pathways and GO terms most relevant to a gene’s SL partners. SynLethDB has been used as a source of training or testing datasets by many computational methods for SL prediction, and this new version of SynLethDB provides a larger and more comprehensive dataset for these methods. In addition, we realize that the data in SynLethDB are enriched with SLs of some hub genes, such as Kirsten ras proto-oncogene (KRAS), because these genes have been studied more extensively in experiments. This kind of data skewness may introduce bias, which can lead a model to learn superficial patterns and achieve inflated performance.

The overall goal of SynLethDB is to increase the understanding of SL mechanisms and to facilitate drug discovery. In the future, we will continue to collect new SLs and enhance the functionalities of the database. For instance, we will add genomics data and cell line annotations to make SLs more context-specific. In addition, we can create more efficient path queries based on the graph database to find the pathways shared between SL pairs and interactions between SLs and drugs.

Acknowledgements

We would like to thank William R. Sellers for kindly answering our questions about synthetic lethal gene pairs identified by GEMINI from CRISPR screens and sharing the data.

Funding

ShanghaiTech University startup grant.

Conflict of interest

The authors declare that they have no competing interests.

Author contributions statement

J.Z., M.W. and H.L. conceived the study. J.W. and S.Z. collected the data and performed the analysis. J.W., X.H. and L.W. developed the SynLethKG knowledge graph and the SynLethDB database. J.W. drafted the manuscript with critical input from J.Z. and M.W. All authors reviewed the manuscript.

References

Dobzhansky

(

1946

)

Genetics of natural populations. XIII. Recombination and variability in populations of Drosophila pseudoobscura

Genetics

269

–

290

. doi:

10.1093/genetics/31.3.269

O’Neil

N.J.

Bailey

M.L.

and

Hieter

(

2017

)

Synthetic lethality and cancer

Nat. Rev. Genet.

613

–

623

. doi:

Hartwell

L.H.

Szankasi

Roberts

C.J.

et al. . (

1997

)

Integrating genetic approaches into the discovery of anticancer drugs

Science

278

1064

–

1068

. doi:

10.1126/science.278.5340.1064

Bryant

H.E.

Schultz

Thomas

H.D.

et al. . (

2005

)

Specific killing of BRCA2-deficient tumours with inhibitors of poly (ADP-ribose) polymerase

Nature

434

913

–

917

. doi:

Roemer

and

Boone

(

2013

)

Systems-level antimicrobial drug and drug synergy discovery

Nat. Chem. Biol.

222

–

231

. doi:

10.1038/nchembio.1205

10.1371/journal.pone.0210859

Kaelin

W.G.

(

1999

)

Choosing anticancer drug targets in the postgenomic era

J. Clin. Invest.

104

1503

–

1506

. doi:

Kaelin

W.G.

(

2005

)

The concept of synthetic lethality in the context of anticancer therapy

Nat. Rev. Cancer

689

–

698

. doi:

Heinzel

Marhold

Mayer

et al. . (

2019

)

Synthetic lethality guiding selection of drug combinations in ovarian cancer

PLoS ONE

, e0210859. doi:

10.1016/j.ccr.2010.04.025

O’Hare

Zaberezhnyy

Williams

R.T.

et al. . (

2010

)

Wnt/Ca2+/NFAT signaling maintains survival of Ph+ leukemia cells upon inhibition of Bcr-Abl

Cancer Cell

–

. doi:

10.

Bartz

S.R.

Zhang

Burchard

et al. . (

2006

)

Small interfering RNA screens reveal enhanced cisplatin cytotoxicity in tumor cells having both BRCA network and TP53 disruptions

Mol. Cell. Biol.

9377

–

9386

. doi:

11.

Chang

J.-G.

Chen

C.-C.

Y.-Y.

et al. . (

2016

)

Uncovering synthetic lethal interactions for therapeutic targets and predictive markers in lung adenocarcinoma

Oncotarget

73664

–

73680

. doi:

10.18632/oncotarget.12046

12.

Luo

Emanuele

M.J.

et al. . (

2009

)

A genome-wide RNAi screen identifies multiple synthetic lethal interactions with the Ras oncogene

Cell

137

835

–

848

. doi:

10.1016/j.cell.2009.05.006

13.

Blank

J.L.

Liu

X.J.

Cosmopoulos

et al. . (

2013

)

Novel DNA damage checkpoints mediating cell death induced by the NEDD8-activating enzyme inhibitor MLN4924

Cancer Res.

225

–

234

. doi:

10.1158/0008-5472.CAN-12-1729

14.

Han

Jeng

E.E.

Hess

G.T.

et al. . (

2017

)

Synergistic drug combinations for cancer identified in a CRISPR screen for pairwise genetic interactions

Nat. Biotechnol.

463

–

474

. doi:

15.

Shen

J.P.

Zhao

Sasik

et al. . (

2017

)

Combinatorial CRISPR–Cas9 screens for de novo mapping of genetic interactions

Nat. Methods.

573

–

576

. doi:

16.

Jerby-Arnon

Pfetzer

Waldman

Y.Y.

et al. . (

2014

)

Predicting cancer-specific vulnerability via data-driven detection of synthetic lethality

Cell

158

1199

–

1209

. doi:

10.1016/j.cell.2014.07.027

17. Lee J.S., Das, Jerby-Arnon et al. (2018) Harnessing synthetic lethality to predict the response to cancer treatment. Nat. Commun.

18. Srihari, Singla, Wong et al. (2015) Inferring synthetic lethal interactions from mutual exclusivity of genetic events in cancer. Biol. Direct. doi: 10.1186/s13062-015-0086-1

19. Hao, Zhang, Chen et al. (2016) Ranking novel cancer driving synthetic lethal gene pairs using TCGA data. Oncotarget, 55352–55367.

20. Guo, Liu and Zheng (2016) SynLethDB: synthetic lethality database toward discovery of selective and sensitive anticancer drug targets. Nucleic Acids Res., D1011–D1017.

21. Schmidt E.E., Pelz, Buhlmann et al. (2013) GenomeRNAi: a database for cell-based and in vivo RNAi phenotypes, 2013 update. Nucleic Acids Res., D1021–D1026.

22. Leung, McAdam et al. (2019) The BioGRID interaction database: 2019 update. Nucleic Acids Res., D529–D541.

23. Ryan C.J., Lord C.J. and Ashworth (2014) DAISY: picking synthetic lethals from cancer genomes. Cancer Cell, 306–308. doi: 10.1016/j.ccr.2014.08.008

24. Liany, Jeyasekharan and Rajan (2020) Predicting synthetic lethal interactions using heterogeneous data sources. Bioinformatics, 2209–2216. doi: 10.1093/bioinformatics/btz893

25. Cai, Chen, Fang et al. (2020) Dual-dropout graph convolutional network for predicting synthetic lethality in human cancers. Bioinformatics, 4458–4465. doi: 10.1093/bioinformatics/btaa211

26. Das, Deng, Camphausen et al. (2019) DiscoverSL: an R package for multi-omic data driven prediction of synthetic lethality in cancers. Bioinformatics, 701–702. doi: 10.1093/bioinformatics/bty673

27. Yuxuan, Chen, Ding et al. (2019) Optimal control nodes in disease-perturbed networks as targets for combination therapy. Nat. Commun.

28. Wang, Han, Zhao et al. (2019) Link synthetic lethality to drug sensitivity of cancer cells. Brief. Bioinform., 1295–1307.

29. Cui X.L., Han, Liu et al. (2021) siGCD: a web server to explore survival interaction of genes, cells and drugs in human cancers. Brief. Bioinform. doi: 10.1093/bib/bbab058

30. Wong A.S.L., Choi G.C.G., Cui C.H. et al. (2016) Multiplexed barcoded CRISPR-Cas9 screening enabled by CombiGEM. Proc. Natl. Acad. Sci. U.S.A., 113, 2544–2549. doi: 10.1073/pnas.1517883113

31. Zhao, Badur M.G., Luebeck et al. (2018) Combinatorial CRISPR-Cas9 metabolic screens reveal critical redox control points dependent on the KEAP1-NRF2 regulatory axis. Mol. Cell, 699–708. doi: 10.1016/j.molcel.2018.01.017

32. Wang, Hughes N.W. et al. (2017) Gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic Ras. Cell, 168, 890–903. doi: 10.1016/j.cell.2017.01.013

33. Steinhart, Pavlovic, Chandrashekhar et al. (2017) Genome-wide CRISPR screens reveal a Wnt–FZD5 signaling circuit as a druggable vulnerability of RNF43-mutant pancreatic tumors. Nat. Med.

34. Zamanighomi, Jain S.S., Ito et al. (2019) GEMINI: a variational Bayesian approach to identify genetic interactions from combinatorial CRISPR screens. Genome Biol. doi: 10.1186/s13059-019-1745-9

35. Deng, Das, Valdez et al. (2019) SL-BioDP: multi-cancer interactive tool for prediction of synthetic lethality and response to cancer treatment. Cancers, 1682. doi: 10.3390/cancers11111682

36. Zhang, Tang, Yao et al. (2021) The tumor therapy landscape of synthetic lethality. Nat. Commun.

37. Sinha, Thomas, Chan et al. (2017) Systematic discovery of mutation-specific synthetic lethals by mining pan-cancer human primary tumor data. Nat. Commun.

38. Liu et al. (2019) SL²MF: predicting synthetic lethality in human cancers via logistic matrix factorization. IEEE/ACM Trans. Comput. Biol. Bioinf., 748–757. doi: 10.1109/TCBB.2019.2909908

39. Huang et al. (2019) Predicting synthetic lethal interactions in human cancers using graph regularized self-representative matrix factorization. BMC Bioinform. doi: 10.1186/s12859-019-3197-3

40. Himmelstein D.S., Lizee, Hessler et al. (2017) Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife, e26726. doi: 10.7554/eLife.26726

41. Höfken and Schiebel (2004) Novel regulation of mitotic exit by the Cdc42 effectors Gic1 and Gic2. J. Cell Biol., 164, 219–231. doi: 10.1083/jcb.200309080

42. Yunyan, Wang, Han et al. (2018) A landscape of synthetic viable interactions in cancer. Brief. Bioinform., 644–655.

43. Maglott, Ostell, Pruitt K.D. et al. (2010) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res., D52–D57.

44. Page, Brin, Motwani and Winograd (1999) The PageRank citation ranking: bringing order to the web. Technical report. Stanford InfoLab.

45. James Hung H.M., Robert T.O.N., Peter et al. (1997) The behavior of the p-value when the alternative hypothesis is true. Biometrics.

46. Piñero, Ramírez-Anguita J.M., Saüch-Pitarch et al. (2020) The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res., D845–D855.