Abstract

The exponential growth of genomic and genetic data in the era of Big Data demands new solutions for making these data findable, accessible, interoperable and reusable. In this article, we present a web-based platform named Gene Expression Time-Course Research (GETc) Platform that enables the discovery and visualization of time-course gene expression data and analytical results from the NIH/NCBI-sponsored Gene Expression Omnibus (GEO). The analytical results are produced by an analytic pipeline based on the ordinary differential equation model. Furthermore, extracting scientific insights from these results and disseminating the scientific findings require close and efficient collaboration between data scientists and domain experts from the biomedical and scientific fields. Therefore, GETc provides several recommendation functions and tools to facilitate effective collaborations. The GETc platform is a useful tool for researchers in the biomedical genomics community to present and communicate large numbers of analysis results from GEO. It is generalizable and broadly applicable across different biomedical research areas. GETc is a user-friendly and efficient web-based platform, freely accessible at http://genestudy.org/

Introduction

Over the past few decades, substantial funding and resources have been invested to generate biomedical datasets at many levels, ranging from the nucleic acid and gene level to the population level, in order to understand, treat and prevent various diseases and to protect public health. Under the data sharing policies of the National Institutes of Health (NIH) and other government agencies, many of the aforementioned datasets must be shared with the general research community. Consequently, vast amounts of biomedical data are accumulating in databases and data repositories. However, the use or reuse of these existing datasets by third-party researchers is still not as common as expected.

Gene expression data from various diseases under different experimental conditions are mostly deposited in the NIH/NCBI-sponsored Gene Expression Omnibus (GEO) data repository (1). Like many biomedical databases, GEO was originally created as a data repository to comply with data sharing policies. Such data sharing platforms are often designed and organized for easy and convenient data submission by experimentalists, not for data retrieval and analysis. Further, it is not easy to identify the particular GEO datasets that address a specific biological question for a specific disease, because the experimental design and study description are documented in unstructured free text. Hence, text mining and natural language processing (NLP) technologies are needed to restructure the existing repository so that its datasets become findable, accessible and reusable.

This article describes a web-based platform that addresses the difficulties in finding, accessing and reusing biomedical datasets, specifically from GEO, as well as the difficulties in finding and forming collaborations. The novel platform, named the Gene Expression Time-Course Research (GETc) platform (http://genestudy.org/), is built on top of an analytical method based on the ordinary differential equation (ODE) model for analyzing time-course gene expression data. GETc offers the following services and functions:

  • Hosts time-course gene expression datasets from GEO annotated with disease and cell types.

  • Provides user-friendly navigation and search functions.

  • Hosts analysis results of the time-course gene expression datasets produced by the ODE analytic pipeline.

  • Recommends relevant datasets for users based on their research interests.

  • Recommends relevant papers and collaborators for each dataset hosted in the platform.

The rest of the article is organized as follows: Section 2 discusses the background of the analytic pipeline and recommendation systems. Section 3.1 presents the datasets used for developing the GETc platform. Section 3.2 describes the methodology used for the analytic pipeline, the recommendation systems and the platform implementation. Section 4 presents and discusses the results. Finally, conclusions are presented in Section 5.

Background

In this section, we review the background of the three main parts of our work: (i) repositories developed for archiving biomedical datasets and their metadata, (ii) analytic pipelines for analyzing gene expression data and (iii) dataset, literature and collaborator recommendation systems.

Dataset repositories

Researchers increasingly make their data publicly available for reproducibility and reuse. Many repositories and knowledge bases have been established for different types of data in many domains. GEO (www.ncbi.nlm.nih.gov/geo/), UK Biobank (www.ukbiobank.ac.uk/), ImmPort (www.immport.org/home) and TCGA (portal.gdc.cancer.gov) are a few examples of repositories in the biomedical domain. DATA.GOV archives the U.S. Government's open data on agriculture, climate, education, etc. for research use. However, users from the biomedical community have to visit and search each repository separately to find data for their research, which can be time-consuming and tedious.

DataMed (datamed.org) started an initiative to solve this issue for the biomedical community by combining biomedical repositories and enhancing query search with advanced NLP techniques (2, 3). DataMed indexes and searches diverse categories of biomedical datasets (3). DataCite is another data discovery index, which includes 16 187 835 works from many different domains (4). However, these resources provide neither insight into the data nor help in finding collaborators, both of which remain challenging tasks.

Analytic pipelines for gene expression data

The study of gene regulation related to different biological functions, such as cell growth, division, development and response to environmental stimuli, is critical to understanding the underlying mechanism of each function. In addition, gene regulatory networks (GRNs) have been shown to be useful for investigating the interactions among genes involved in a biological process or among genes responsive to an external stimulus. Many computational approaches have been proposed for inferring GRNs from gene expression data, for example, information theory models (5–7), Boolean networks (8–11) and Bayesian networks (12–15). However, these approaches are either inefficient at describing dynamic regulation between genes or restricted to small-scale networks. Meanwhile, responses to environmental stimuli, such as the immune response to viral infection or the response to aberrant activation of a particular pathway, are dynamic processes that require careful analysis of time-course gene expression data. This is an ultra-high-dimensional problem that calls for advanced statistical and computational approaches. Therefore, we implement an alternative, comprehensive approach that exploits the ODE models and gene regulatory network analysis developed in (16–18). This model takes into account the dynamic and temporal behavior of genes and learns the dynamic relations between genes, in the form of stimulators or inhibitors of each other. First, genes (or probes) with significant expression level changes over time are identified as dynamic response genes. Then the top 3000 dynamic response genes are clustered into groups according to their expression patterns over time. Finally, a regulatory network is established by the ODE model (19).
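The idea behind the final ODE step can be illustrated with a deliberately simplified sketch. It assumes a linear model dX/dt = BX for the mean expression curves of K gene clusters. It estimates derivatives by finite differences and then fits B by ordinary least squares. These are simpler stand-ins for the smoothing and variable-selection methods of (16–18), not the pipeline's own code, and the function name estimate_linear_ode is hypothetical. A positive entry B[i, j] is read as cluster j stimulating cluster i, and a negative entry as inhibition.

```python
import numpy as np


def estimate_linear_ode(t, X):
    """Two-stage sketch for a linear ODE model dX/dt = B X.

    t: shape (T,) sampling times; X: shape (T, K) mean expression of K clusters.
    Returns B with shape (K, K); B[i, j] > 0 is read as cluster j stimulating
    cluster i, and B[i, j] < 0 as cluster j inhibiting cluster i.
    """
    t = np.asarray(t, dtype=float)
    X = np.asarray(X, dtype=float)
    # Stage 1: derivative of every curve (np.gradient supports uneven spacing).
    dX = np.gradient(X, t, axis=0)
    # Stage 2: least squares fit of dX ~ X @ B.T, solved for B.T.
    Bt, *_ = np.linalg.lstsq(X, dX, rcond=None)
    return Bt.T


if __name__ == "__main__":
    # Synthetic check: two uncoupled clusters, one decaying and one growing.
    t = np.linspace(0.0, 2.0, 201)
    X = np.column_stack([np.exp(-0.5 * t), np.exp(0.2 * t)])
    print(np.round(estimate_linear_ode(t, X), 2))
```

With real data, only a few time points are available per condition, so the derivative and regression stages are much noisier than in this dense synthetic example. This is why the published methods rely on smoothing and regularization rather than raw least squares.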

Recommendation systems

A recommendation system is an enabling mechanism for overcoming information overload. Work in this area can be broadly grouped into content-based and collaborative filtering-based recommendation systems. Next, we review the literature related to the types of recommendation systems developed in this work.

Dataset recommendation

There are many dataset repositories in the biomedical domain, and many datasets are added to each repository every day. For example, 34 datasets were added to the GEO repository per day in 2019. Hence, researchers are likely to be overwhelmed by the available data, and they have to visit each repository to search for datasets. Platforms such as DataMed addressed this problem, so that researchers only had to visit DataMed to search for datasets. However, DataMed has not been updated recently. Moreover, the intent behind a search is difficult to identify (20). A dataset recommendation system based on a researcher's profile may therefore be helpful for information filtering. A few experiments have been performed on data linking (21–23), in which similar datasets were clustered together using different semantic features. However, most of these works linked datasets to similar datasets rather than recommending datasets to researchers.

Literature recommendation

The usefulness of literature recommendation is demonstrated by the wide adoption of Google Scholar, Semantic Scholar, PubMed, etc. The CiteSeer project (24, 25) was the first of its kind to offer research paper recommendation. Later, many scientific article recommendation systems were developed. Science Concierge is a content-based article recommendation system that uses distributional semantics (latent semantic analysis, LSA) and relevance feedback (the Rocchio algorithm). It recommends articles for any number of input articles, based on the articles of the 2015 Society for Neuroscience conference (26). (27) proposed a citation-based collaborative filtering recommendation system for research articles using Jaccard similarity. Similar article recommendation systems have been developed using TF-IDF (28), topic modeling (29) and citation or author network analysis (30). TF-IDF was the most frequently applied weighting scheme for recommendation tasks (25).

SciMiner is a web-based platform that identifies gene names in text based on user input and provides literature from MEDLINE for the corresponding genes (31). PURE, a content-based PubMed article recommendation system, was developed using Expectation Maximization (32); it recommends articles to users based on their preferred articles. (33) developed a probabilistic topic-based model for content similarity, called 'pmra', on publications from MEDLINE, and this model has been used for the related article search function in PubMed. Most of the proposed literature recommendation systems use embedding methods to convert text into vectors and then calculate the similarity between articles.

Once researchers find a dataset suitable for their study, they may need the literature related to that dataset. In this scenario, a literature recommendation system that retrieves PubMed literature for each dataset may be a helpful tool.

Collaborator recommendation

Academic collaborator recommendation has long been regarded as a useful application in the academic environment, which aims to find potential collaborators for a given researcher by exploiting big academic data. In the past few years, several works on collaborator recommendation have been proposed (34–37).

Co-author network information has mainly been incorporated to enhance collaborator recommendation (35, 37, 38). (38) proposed a random walk with restart model based on co-author order, the latest collaboration time point and the number of collaborations. (37) developed a collaborator recommendation system using collaborative entity embedding learned from the topic words collected from researchers' publications. Cross-domain collaborator recommendation is another important aspect of this task; (36) proposed a cross-domain collaborator recommendation method using co-author matching, topic matching and cross-domain topic learning.

(35) proposed CollabSeer based on the co-author network and lexical similarity. However, it is difficult for new researchers or students to obtain recommendations based on the co-author network or lexical similarity, as they have few or no publications. (39) proposed a collaborator recommender for new researchers or students using input keywords, organizational relationships, ratings and the activity level of the collaborators.

When researchers find suitable data for their study, they may look for collaborators to work with on that dataset. In this scenario, a collaborator recommendation system for each dataset may be helpful.

Materials and methods

Data

GEO Metadata collection

GEO is one of the most popular public repositories for functional genomics data. As of 18 December 2019, 122 222 series of datasets were available in GEO. Metadata of GEO datasets, such as title, summary, date of publication and names of authors, were collected from GEO using a web crawler. The PMIDs of the articles associated with each dataset were also collected. Many datasets did not have associated articles.

Time-course datasets: This study focused on time-course datasets from GEO; however, time-course datasets are not explicitly labeled on the GEO website. They can be identified manually by reading the dataset descriptions or by scanning the associated data, which is a time-consuming and tedious task. Therefore, a keyword-based NLP method was applied to identify time-course datasets: we implemented a regular expression-based approach to extract time point information from the GEO metadata. For example, phrases such as '12 time points' and '7 developmental stages; harvest at 10 hrs, 12 hrs' were used to extract the time point information. The regular expression-based system correctly identified 167 of 200 randomly selected datasets, an accuracy of 83.5%. Further, a total of 555 datasets were selected manually for processing from the 862 datasets identified by the system. More details on identifying time-course datasets can be found in (40). Once the datasets were identified, their GSE numbers were fed to the pipeline (Section 3.2.1), which automatically retrieved the data and metadata corresponding to each GSE number. In addition to the time points, diseases, organisms and/or cell types were identified from the title and summary of each dataset. MetaMap (41) was applied to the metadata, and Human Disease Ontology (DOID) terms were detected in the annotated text of each dataset (42). Datasets can therefore be filtered by both cell type and disease.
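To make the regular expression step concrete, the sketch below extracts both declared counts ('12 time points', '7 developmental stages') and explicit time values ('10 hrs, 12 hrs') from a description. The patterns, the function names and the 8-point threshold in is_candidate are hypothetical illustrations modeled on the example phrases above. They are not the expressions described in (40). The threshold simply mirrors the pipeline's minimum of 8 time points. Heuristics like these produce both false positives and false negatives, which is why the automatically identified datasets were then filtered manually.

```python
import re

# Hypothetical patterns modeled on the example phrases in the text;
# not the exact expressions used to build GETc.
COUNT_PATTERN = re.compile(
    r"\b(\d+)\s+(?:time\s*points?|developmental\s+stages?)\b", re.IGNORECASE
)
VALUE_PATTERN = re.compile(
    r"\b(\d+(?:\.\d+)?)\s*(h|hrs?|hours?|min|minutes?|days?)\b", re.IGNORECASE
)


def extract_time_points(text):
    """Return declared time-point counts and distinct (value, unit) mentions."""
    declared = [int(n) for n in COUNT_PATTERN.findall(text)]
    values = sorted(
        {(float(value), unit.lower()) for value, unit in VALUE_PATTERN.findall(text)}
    )
    return {"declared_counts": declared, "time_values": values}


def is_candidate(text, min_points=8):
    """Flag a description whose best evidence reaches min_points time points."""
    info = extract_time_points(text)
    evidence = max(info["declared_counts"] + [len(info["time_values"])])
    return evidence >= min_points


if __name__ == "__main__":
    print(extract_time_points("7 developmental stages; harvest at 10 hrs, 12 hrs"))
    print(is_candidate("Samples were collected at 12 time points"))
```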

MEDLINE Articles

To develop the dataset recommender, we collected researchers' publications from PubMed. To develop the literature and collaborator recommenders, we collected MEDLINE articles from PubMed, which comprises more than 29 million biomedical and life science research articles. These articles include information such as title, abstract, authors, affiliations, Medical Subject Headings (MeSH) terms and publisher name.

However, the articles collected from PubMed cover a wide variety of biomedical and life science topics, many of which are unrelated to the gene expression datasets in GEO and are therefore unsuitable for building a recommendation system for these datasets. Articles published before 1998 were removed, because research on microarray data started in that year (43). In addition, a MeSH term-based filtering method was implemented to remove unrelated articles from the MEDLINE articles; the details of the filtering method can be found in (43). A total of 770 537 articles were used for developing the literature and collaborator recommenders.

Methods

Analytic pipeline for time-course gene expression data

We integrated a series of statistical and modeling methods for time-course gene expression data into an analytic pipeline (19) consisting of eight steps, as shown in Figure 1.

Figure 1. Time-course gene expression analytic pipeline.

The final analysis results of the pipeline can be reported as initial bioinformatics findings for narrowing down the analysis and framing scientific questions, leading toward new collaborative publications. The pipeline can be applied to each time-course gene expression dataset under one experimental or biological condition. Furthermore, simple functions for comparing two or more datasets across experimental conditions and/or from different studies are currently under development. We have published the source code of the analytic pipeline so that others can use the pipeline and extend its functionality (original code: github.com/j142857z/Pipeline; updated code: github.com/AutumnTail/Pipeline).

Recommendation systems

Data Recommendation: Data recommendation is an essential part of the GETc platform. The dataset recommendation function recommends datasets to researchers based on their publications. The datasets used for this recommendation system include data not only from GEO but also from other sources such as TCGA, ArrayExpress, SRA and Clinical Trials. We used only the textual information of datasets (title and summary) and publications (title and abstract).

A researcher may have multiple research interests. To identify them, we implemented a non-parametric clustering algorithm, the Dirichlet process mixture model (DPMM). More details on the DPMM and its parameter tuning for obtaining an appropriate number of clusters can be found in (44). To receive dataset recommendations, each researcher provides their name and curriculum vitae (CV) or list of publications. The researcher's name is searched in PubMed to retrieve publications (title, abstract and year of publication). This search may return publications by other researchers with the same name. This ambiguity was resolved by checking whether the title of each retrieved publication appears in the CV or list of publications provided by the researcher. Finally, the researcher's publications were clustered using the DPMM to obtain research topics. For each topic, datasets are recommended by calculating the cosine similarity between the research field (cluster) vector and the dataset vectors. The detailed methodology and evaluation can be found in our previous publication on dataset recommendation (44).
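The per-topic ranking step can be sketched as follows. It assumes that publications and datasets are already represented as numeric vectors and that each publication has a cluster label (for example, produced by a DPMM). The sketch also assumes that a research field vector is the mean (centroid) of the vectors in its cluster. The paper does not specify how the cluster vector is built, and the function names and dataset IDs are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


def recommend_per_topic(pub_vectors, pub_labels, dataset_vectors, k=10):
    """Rank datasets for each research topic (cluster) of one researcher.

    pub_vectors: publication vectors; pub_labels: cluster id per publication;
    dataset_vectors: {dataset_id: vector}. The topic vector is assumed to be
    the centroid of the publication vectors in that cluster.
    """
    clusters = defaultdict(list)
    for vec, label in zip(pub_vectors, pub_labels):
        clusters[label].append(vec)
    recommendations = {}
    for label, vecs in clusters.items():
        centroid = [sum(column) / len(vecs) for column in zip(*vecs)]
        ranked = sorted(
            dataset_vectors,
            key=lambda ds: cosine(centroid, dataset_vectors[ds]),
            reverse=True,
        )
        recommendations[label] = ranked[:k]
    return recommendations


if __name__ == "__main__":
    pubs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(recommend_per_topic(pubs, [0, 0, 1], {"D1": [1.0, 0.0], "D2": [0.0, 1.0]}, k=1))
```

Ranking separately per cluster keeps a researcher's minor interests from being drowned out by their dominant topic. A single profile vector averaged over all publications would tend to favor the dominant topic.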

Literature Recommendation: The literature recommendation system recommends literature for datasets. The literature most similar to a dataset can be determined simply by computing the cosine similarity between the dataset vector and the paper vectors. For the literature recommendation system in GETc, we used BM25, as it achieved a higher precision at 10 than other text representation methods such as TF-IDF, word2vec and doc2vec (43). Finally, we used title-based weighted re-ranking and text normalization methods to improve the retrieved results. The detailed methods, experiments and results can be found in our previous publication (43).
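For readers unfamiliar with BM25, the following is a minimal sketch of the standard Okapi BM25 scoring function. A dataset's text serves as the query and the candidate articles serve as the document collection. The parameters k1 = 1.2 and b = 0.75 are common textbook defaults assumed here, not values reported for GETc. The sketch uses the non-negative IDF variant ln(1 + (N - df + 0.5)/(df + 0.5)). The tokenizer is a crude stand-in, and the title-based re-ranking and normalization steps described above are not included.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a crude stand-in for real text normalization."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over a fixed article collection (k1, b are assumed defaults)."""

    def __init__(self, documents, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.term_freqs = [Counter(tokenize(doc)) for doc in documents]
        self.lengths = [sum(tf.values()) for tf in self.term_freqs]
        self.avgdl = sum(self.lengths) / len(self.lengths)
        doc_freq = Counter()
        for tf in self.term_freqs:
            doc_freq.update(tf.keys())
        n_docs = len(self.term_freqs)
        # Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {
            term: math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            for term, df in doc_freq.items()
        }

    def score(self, query, index):
        """BM25 score of article `index` for the query text."""
        tf, dl = self.term_freqs[index], self.lengths[index]
        total = 0.0
        for term in tokenize(query):
            freq = tf.get(term, 0)
            if freq == 0:
                continue
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * freq * (self.k1 + 1) / (freq + norm)
        return total

    def rank(self, query, top_n=10):
        """Indices of the top_n articles, best first."""
        order = sorted(range(len(self.term_freqs)),
                       key=lambda i: self.score(query, i), reverse=True)
        return order[:top_n]


if __name__ == "__main__":
    articles = [
        "time course gene expression in breast cancer cells",
        "protein folding simulation",
        "estradiol time course in MCF-7 breast cancer",
    ]
    print(BM25(articles).rank("estradiol breast cancer time course", top_n=3))
```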

Collaborator Recommendation: For each dataset, the recommendation system suggests collaborators based on the recommended literature. The rationale is that the authors of the literature most similar to a dataset may be suitable collaborators for work on that dataset, because they may have experience with the dataset and may already have published articles using it. Accordingly, collaborators are recommended for each dataset by ranking the unique authors of the retrieved similar articles. For a dataset d, the score of each unique author of the similar articles is calculated using Equation (1).

$$\begin{equation} \textrm{AuthorScore}_{i} = \sum_{j=1}^{n} \textrm{SimScore}_{j} \cdot \textrm{weight}_{ij} \tag{1} \end{equation}$$

$$\begin{equation*} \textrm{weight}_{ij} = \begin{cases} 0 & \textrm{if } A_i \notin P_j\\ 1 & \textrm{if } A_i \textrm{ is the first or last author of } P_j\\ 0.1 & \textrm{if } A_i \textrm{ is neither the first nor the last author of } P_j \end{cases} \end{equation*}$$

where $\textrm{AuthorScore}_i$ is the score of the ith author $A_i$, calculated over all retrieved similar articles $P = \{P_1, P_2, \ldots, P_n\}$ for d; n is the total number of articles retrieved for d; and $\textrm{SimScore}_j$ is the similarity score between d and the jth article $P_j$.

Higher weights were assigned to the first and last authors of each similar article, and lower weights to all other authors. Finally, the authors with the highest scores were recommended as collaborators for d.
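Equation (1) translates into a few lines of code. In the minimal sketch below, the input format (a list of pairs of a similarity score and an author byline in order) and the function name author_scores are hypothetical. Authors are matched by name string only, so name disambiguation is ignored. Authors absent from an article receive weight 0 for it implicitly, because only that article's byline is visited. A single-author article gives its author weight 1, since that author is both first and last.

```python
from collections import defaultdict


def author_scores(articles):
    """Rank authors by Equation (1).

    articles: list of (sim_score, [author names in byline order]) for one dataset.
    First/last authors get weight 1, middle authors 0.1; authors not on an
    article contribute nothing for it (weight 0).
    """
    scores = defaultdict(float)
    for sim_score, authors in articles:
        last = len(authors) - 1
        for position, author in enumerate(authors):
            weight = 1.0 if position in (0, last) else 0.1
            scores[author] += sim_score * weight
    # Highest score first; Python's sort is stable, so ties keep insertion order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    retrieved = [(0.9, ["A", "B", "C"]), (0.5, ["B", "D"])]
    for name, score in author_scores(retrieved):
        print(name, round(score, 3))
```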

The top 1000 publications recommended by the literature recommender for a dataset were used to identify collaborators for that dataset. Furthermore, the authors' affiliations given in the papers were parsed using the affiliation_parser package (github.com/titipata/affiliation_parser). The distance between each recommended collaborator's location and the user's current location was then calculated using the geopy package (geopy.readthedocs.io) to indicate the geographic proximity of the user and the collaborators.
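The distance-based sorting can be illustrated without external dependencies. geopy provides its own distance functions, such as geopy.distance.geodesic. To stay self-contained, this sketch swaps in the haversine great-circle formula, which approximates the Earth as a sphere and is therefore slightly less accurate than a geodesic distance. It also assumes that affiliations have already been geocoded to latitude/longitude pairs; geocoding itself is not shown. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def sort_by_distance(user_location, collaborator_locations):
    """Return (name, km) pairs, nearest collaborator first.

    user_location: (lat, lon); collaborator_locations: {name: (lat, lon)},
    assumed to be geocoded already from the parsed affiliations.
    """
    distances = {
        name: haversine_km(*user_location, *location)
        for name, location in collaborator_locations.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(sort_by_distance((0.0, 0.0), {"far": (0.0, 10.0), "near": (0.0, 1.0)}))
```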

GETc Platform

In this work, we developed an interactive web-based platform, GETc, to facilitate collaboration and to share with the general research community the results of applying our analytic pipeline to time-course gene expression data from GEO. We identified 555 time-course gene expression datasets with at least 8 time points in GEO and applied our analytic pipeline to 37 of them (results in Section 4). For each dataset, the pipeline outputs a folder of files containing intermediate and final analytic results, tables, graphics/plots and documents, including an automatically generated analysis report.

Platform users can interactively search, browse and identify particular datasets and the corresponding results of interest. They can visualize and review the analysis results, including figures and tables, which can be easily downloaded via the platform's web-based user interface. For unprocessed time-course gene expression datasets included in the platform, users can request execution of the pipeline. The platform also provides recommendations using the recommendation systems described in Section 3.2.2. It recommends literature and potential collaborators for each time-course gene expression dataset to help extract scientific insights from the analytic results, and it also recommends datasets to researchers. Figure 2 shows an overview of the GETc platform; the platform executes the tasks shown inside the green box.

Figure 2. High-level architecture of the GETc platform.

Users of the platform can search for a time-course dataset using keywords and phrases and can view the available literature, significant gene lists, gene clusters and prospective collaborators for that dataset. A screenshot of the search and view functionalities is shown in Figure 3. A dataset is retrieved if any of the search keywords matches its dataset ID, title, abstract or platform organism. The retrieved datasets can be filtered by disease or cell type using the tree view on the left or the pie charts on the right. The disease types are extracted from the Human Disease Ontology (40).
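The retrieval rule just described ("any keyword matches") and the subsequent filtering can be sketched as follows. The DatasetRecord fields, the case-insensitive substring matching and the example IDs are hypothetical simplifications. The sketch does not reflect GETc's actual data model or search backend.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical, simplified dataset record for illustration only."""
    gse_id: str
    title: str
    summary: str
    organism: str
    diseases: set = field(default_factory=set)
    cell_types: set = field(default_factory=set)


def search(records, query, disease=None, cell_type=None):
    """Return records where ANY query keyword occurs in ID, title, summary or organism,
    optionally restricted to a disease and/or cell type annotation."""
    keywords = [word.lower() for word in query.split()]
    hits = []
    for record in records:
        haystack = " ".join(
            [record.gse_id, record.title, record.summary, record.organism]
        ).lower()
        if not any(keyword in haystack for keyword in keywords):
            continue
        if disease is not None and disease not in record.diseases:
            continue
        if cell_type is not None and cell_type not in record.cell_types:
            continue
        hits.append(record)
    return hits


if __name__ == "__main__":
    demo = [
        DatasetRecord("GSE_EXAMPLE_1", "Estradiol time course",
                      "Breast cancer cells treated with estradiol", "Homo sapiens",
                      {"breast cancer"}, {"MCF-7"}),
        DatasetRecord("GSE_EXAMPLE_2", "Hypoxia response",
                      "Tumor cells under hypoxia", "Mus musculus", {"glioblastoma"}),
    ]
    print([r.gse_id for r in search(demo, "cells", disease="glioblastoma")])
```

OR semantics favor recall, which suits browsing. The disease and cell type filters then narrow the result set, much as the tree view and pie charts do in the interface.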

Figure 3. Search and view datasets in the GETc research platform.

Results and discussion

Table A1 presents the results of the analytic pipeline applied to 37 time-course gene expression cancer datasets from GEO. For each dataset and condition, the table shows the number of dynamic response genes (DRGs), the number of gene response modules (GRMs), the number of time points, the cancer type, the cell line, the experimental setting (in vitro, ex vivo or in vivo) and the species (human, mouse or rat). MCF-10A, MCF-7, HeLa and other widely used cell lines were tested in these datasets. These cell lines originate from various types of cancer, such as breast cancer, cervical cancer and leukemia. In addition, the treatments in these datasets target several essential cancer pathways, such as NF-κB, EGFR and hedgehog. These classifications will help researchers perform meta-analyses to identify common/key genes and GRNs in a particular type of cancer.

Evaluating the recommendation systems is challenging because no benchmark or prior ground-truth annotation exists for either dataset recommendation or dataset-driven literature recommendation. We therefore performed a manual evaluation in which expert human judges rated the systems' recommendations for relevance on a one-to-three-star scale (1: not relevant, 2: partially relevant, 3: most relevant).

We evaluated the recommendation systems using strict and partial precision at 10 (P@10). Strict P@10 counts only 3-star results as relevant, while partial P@10 counts both 2- and 3-star results. The dataset recommendation system was evaluated by five judges who had worked on the datasets before, and it obtained a strict P@10 of 0.61 and a partial P@10 of 0.78. For literature recommendation, we evaluated 36 datasets on which the human judges had previously worked; the system obtained a strict P@10 of 0.80 and a partial P@10 of 0.87.
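The two precision variants differ only in the star threshold, as the short sketch below shows. It assumes that scores are averaged over the evaluated items (datasets or researchers) and that missing results in a top-10 list count as non-relevant. The averaging scheme is our assumption, and the ratings in the example are illustrative, not data from this study.

```python
def precision_at_k(ratings, k=10, min_stars=3):
    """Fraction of the top-k results rated >= min_stars (missing results count as misses)."""
    return sum(1 for stars in ratings[:k] if stars >= min_stars) / k


def mean_strict_partial(rating_lists, k=10):
    """Average strict (3-star only) and partial (2- or 3-star) P@k over evaluated items."""
    n = len(rating_lists)
    strict = sum(precision_at_k(r, k, min_stars=3) for r in rating_lists) / n
    partial = sum(precision_at_k(r, k, min_stars=2) for r in rating_lists) / n
    return strict, partial


if __name__ == "__main__":
    # Illustrative star ratings for the top 10 results of one evaluated item.
    print(mean_strict_partial([[3, 3, 2, 1, 3, 2, 2, 1, 1, 3]]))
```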

No gold-standard dataset for evaluating collaborator recommendation is available to date. As with literature recommendation, evaluating our collaborator recommendation system is challenging, because collaborators can provide feedback on the system's output only after working together for some time. We are currently working with several additional collaborators to evaluate the output of the system and to collect feedback that we can use to assess the system's quality in the future.

Figure 4 shows a screenshot of the literature (top right corner) and collaborator (bottom right corner) recommendations for dataset GSE14103. For a dataset selected in the platform UI, the literature recommendation system generates a list of related papers. The recommended collaborators can be sorted by name or distance. We plan to implement a search function that will allow users to search for collaborators in a preferred city.

Figure 4. A screenshot of recommended literature and collaborators for GSE14103.

We believe the functions of GETc are useful for researchers in the biomedical genomics community to present and communicate large numbers of analysis results. In addition to datasets from GEO, we are currently expanding the platform with new time-course datasets from other repositories such as TCGA, SRA and ImmPort. We applied ODEs to construct the high-dimensional gene regulatory network, and at least 8 time points were essential for the identifiability of the corresponding model. Thus, only datasets with at least 8 time points can be processed with our pipeline.

Conclusion

In this work, we developed a novel research platform, GETc, for sharing the data and analytic results of time-course gene expression datasets from GEO to improve dataset reusability. It is built on top of an analytical method based on the ODE model for analyzing time-course gene expression data. The GETc platform provides the means to efficiently search for and retrieve data and results, and it facilitates collaboration by recommending related literature and potential collaborators for each dataset. The platform also hosts a dataset recommendation system that helps researchers in the biomedical domain find datasets based on their publications. We hope this will lead to a better data reuse experience. We believe that the proposed idea and computational platform could also be applied to other types of data from different databases or data repositories.

Acknowledgement

We thank Dr H.M and the other members of the GEO Big Data Working Group at the Center for Big Data in Health Sciences (CBD-HS) for suggestions and comments on the platform design that greatly improved the website.

Funding

This project is mainly supported by the Center for Big Data in Health Sciences (CBD-HS) at the School of Public Health, University of Texas Health Science Center at Houston (UTHealth), and partially supported by grants from the National Institutes of Health (R01 AI087135, 1R01AG066749-01, 1UL1TR003167-01) and the Cancer Prevention and Research Institute of Texas (CPRIT RP170668) at UTHealth.

Conflicts of interest. None declared.

Appendix

Table A1.

Results statistics from the cancer datasets (ORG: Organism, in vitro: ivr, ex vivo: evv, in vivo: ivv, SP: Species, Homo sapiens: HS, Rattus norvegicus: RN, Mus musculus: MM, DRG: dynamic response gene, GRM: gene response module)

SLGEO accessionTime pointCancer typeCell lineORGSPCondition# of DRG# of GRM
1GSE186411Breast cancerZR-75.1ivrHS17β-estradiol127285
17β-estradiol (dye swap)114534
2GSE31138Colorectal carcinomaEcR-RKO/KLF4ivrHSPonasterone334944
Control158950
3GSE77010Prostate adenocarcinomaLNCaP C4-2ivrHSIrradiation422772
4GSE164010Kaposi’s sarcomaBCBL-1ivrHSCidofovir rep1473146
Cidofovir rep2453137
Cidofovir rep3301164
Control rep1643125
Control rep2504121
Control rep3568142
5GSE904814N/AEmbryonic stemivrMMHDRep121 34930
HDRep221 34937
HD_LIF20 20936
6GSE985410OsteosarcomaU2OSivrHSGFP640084
HIC1706261
7GSE141038Colorectal carcinomaHCT116ivrHSNocodazole629533
8GSE170189StomachGIST-T1ivrHSImatinib mesylate Rep113 12142
9Imatinib mesylate Rep213 12142
8Imatinib mesylate Rep323 00234
9GSE203618Breast cancerMCF-7ivrHS17β-estradiol20
10GSE209888Mediastinal (thymic) large B-cell lymphomaK1106ivrHSJAK2 inhibitor476646
11GSE2295516Breast cancerSUM-225ivrHSHER-2 inhibitor CP724,71411 72584
12GSE2313516Breast cancerMCF-10AivrHSGefitinib10 04650
13GSE2313616Breast cancerMCF-10HER-2ivrHSGefitinib12 18449
Table A1.

(Continued)

SLGEO accessionTime pointCancer typeCell lineORGSPCondition# of DRG# of GRM
14GSE1868428Prostate adenocarcinomaLNCaPivrHSR1881_Rep1-1866641
28R1881_Rep1-29293121
9R1881_Rep2-1427249
9R1881_Rep2-2405251
15GSE216188Breast cancerMCF-7ivrHSTamR_Control667647
TamR_E2485938
TamR_E2_Tamoxifen11 31436
TamR_HRG14 76438
TamR_HRG_Tamoxifen10 24331
TamR_Tamoxifen10 34535
WT_E2760635
WT_E2_Rep1861939
WT_E2_Rep2326741
WT_E2_Tamoxifen605937
WT_HRG837034
WT_HRG_Rep111 72432
WT_HRG_Rep2927442
WT_HRG_Tamoxifen609335
WT_Tamoxifen353037
16GSE4107219Acute T cell leukemiaJurkat or primary T cellsivrHSJurkat Roc14 38244
12Tcell Roc852059
17GSE260028GlioblastomaTRP mouse modelivvMMTRPhet132845
18GSE3862313Skin cancerMouse whole back skinevvMMUVB11 225104
19GSE296418Breast cancerDU145; HT29; MCF7ivrHSHypoxia632526
Hypoxia765129
Hypoxia816929
20GSE410348Diffuse large B-cell lymphomaHBL-1ivrHSIkB kinase beta inhibitor MLN120B15 27843
Table A1.

(Continued)

SLGEO accessionTime pointCancer typeCell lineORGSPCondition# of DRG# of GRM
21GSE2313716Breast cancerMCF-10HER-2ivrHSHER-2 inhibitor CP724,71411 469140
22GSE2313816Breast cancerMCF-10AivrHSHER-2 inhibitor CP724,7148811120
23GSE2313916Breast cancerMCF-10HER-2/E7ivrHSHER-2 inhibitor CP724,714922196
24GSE3286911Pancreas adenocarcinomaAR42JivrRNGastrin718181
12Control659492
11Gastrin5515105
12Control6282144
25GSE414918Breast cancerDU145; HT29; MCF7ivrHSHypoxia612727
Hypoxia740630
Hypoxia801124
26GSE4470012B-cell Precursor leukemia cell lineBLaER1ivrHSE2 treatment rep131 58348
E2 treatment rep223 76768
27GSE4604514Desmoplastic cerebellar medulloblastomaDaoyivrHSControl_median7176216
EGF_median15 65948
EGF_SHH_median17 97251
SHH_median10 770237
28GSE495838Pancreatic carcinomaPrimary pancreatic stellate cellsivrHSTumor-cell supernatant446948
29GSE495848Pancreatic carcinomaMiaPaca2ivrHSControl544144
30GSE495869Pancreatic carcinomaMiaPaca2ivrHSStellate-cell supernatant14 60137
31GSE506248Acute T cell leukemiaJurkativrHSCDK7 inhibitor30 0139
CDK7 inhibitor2980413
Table A1.

(Continued)

SLGEO accessionTime pointCancer typeCell lineORGSPCondition# of DRG# of GRM
32GSE5271010Hodgkin lymphomaL428ivrHSLNA-antimiR-9392156
LNA-Scramble173266
33GSE153279Non-small cell lung cancerNCI-H1975ivrHSH2O2244675
Menadione891224
34GSE5098823OsteosarcomaU2OSivrHSThymidine-nocodazole7763792
20Thymidine rep118 894166
24Thymidine rep29593390
24Thymidine rep324 583199
35GSE6407317Breast cancerMCF7ivrHSDHMEQ20233
16HRG15533102
16HRG + DHMEQ16 57362
16HRG + LY29400212 128174
17LY29400214 30948
17Control6427193
36GSE7172111Burkitt lymphomaPrimary lymphomaevvHSanti human IgM F(ab)2 fragment rep1629458
10anti human IgM F(ab)2 fragment rep2447962
10anti human IgM F(ab)2 fragment rep3447962
Table A1.

(Continued)

SLGEO accessionTime pointCancer typeCell lineORGSPCondition# of DRG# of GRM
37GSE155238Skin cancerBJ NMycivrHSN-MycER(delta-MbII)387544
N-MycER274154
38GSE177089Lung adenocarcinomaA549ivrHSTGFb120 29657
39GSE188178Diffuse large B-cell lymphomaHBL-1ivrHSMLN120B11 86551
40GSE3422826Lung adenocarcinomaPC9ivrHSGefitinib30 56573
41GSE2124510Pancreatic adenocarcinomaLNCaPivrHSDihydrotestosterone miRNA array143188
Dihydrotestosterone miRNA array13 63693
42GSE3424317N/APgk12.1ivrMMDifferentiation induction343 73849
43GSE459588Breast cancerControlivrHS2gy Radiation56 56044
6gy Radiation2719146
R6gy43 65046
44GSE763688Breast cancerMCF-7ivrHSStarvation322951
45GSE8409611Non-small cell lung cancerNCI-H1975evHSEGF944390
8Control705964

References

1. Barrett,T. et al. (2012) NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Research, 41, D991–D995.

2. Roberts,K. et al. (2017) Information retrieval for biomedical datasets: the 2016 bioCADDIE dataset retrieval challenge. Database, 2017.

3. Chen,X. et al. (2018) DataMed – an open source discovery index for finding biomedical datasets. Journal of the American Medical Informatics Association, 25, 300–308.

4. Brase,J. (2009) DataCite - a global registration agency for research data. In: 2009 Fourth International Conference on Cooperation and Promotion of Information Resources in Science and Technology. IEEE, pp. 257–261.

5. Steuer,R. et al. (2002) The mutual information: detecting and evaluating dependencies between variables. Bioinformatics, 18, S231–S240.

6. Stuart,J.M. et al. (2003) A gene-coexpression network for global discovery of conserved genetic modules. Science, 302, 249–255.

7. Margolin,A.A. et al. (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, 7, S7.

8. Thomas,R. (1973) Boolean formalization of genetic control circuits. Journal of Theoretical Biology, 42, 563–585.

9. Akutsu,T. et al. (2000) Inferring qualitative relations in genetic networks and metabolic pathways. Bioinformatics, 16, 727–734.

10. Shmulevich,I. et al. (2002) Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics, 18, 261–274.

11. Bornholdt,S. (2008) Boolean network models of cellular regulation: prospects and limitations. Journal of the Royal Society Interface, 5, S85–S94.

12. Friedman,N. et al. (2000) Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7, 601–620.

13. Kim,S.Y. et al. (2003) Inferring gene networks from time series microarray data using dynamic Bayesian networks. Briefings in Bioinformatics, 4, 228–235.

14. Zou,M. and Conzen,S.D. (2004) A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics, 21, 71–79.

15. Needham,C.J. et al. (2007) A primer on learning in Bayesian networks for computational biology. PLoS Computational Biology, 3, e129.

16. Lu,T. et al. (2011) High-dimensional ODEs coupled with mixed-effects modeling techniques for dynamic gene regulatory network identification. Journal of the American Statistical Association, 106, 1242–1258.

17. Wu,S. et al. (2014) Modeling genome-wide dynamic regulatory network in mouse lungs with influenza infection using high-dimensional ordinary differential equations. PLoS One, 9, e95276.

18. Linel,P. et al. (2014) Dynamic transcriptional signatures and network responses for clinical symptoms in influenza-infected human subjects using systems biology approaches. Journal of Pharmacokinetics and Pharmacodynamics, 41, 509–521.

19. Carey,M. et al. (2018) A big data pipeline: identifying dynamic gene regulatory networks from time-course Gene Expression Omnibus data with applications to influenza infection. Statistical Methods in Medical Research, 27, 1930–1955.

20. Jansen,B.J. et al. (2007) Determining the user intent of web search engine queries. In: Proceedings of the 16th International Conference on World Wide Web. ACM, pp. 1149–1150.

21. Nunes,B.P. et al. (2013) Combining a co-occurrence-based and a semantic measure for entity linking. In: Extended Semantic Web Conference. Springer, pp. 548–562.

22. Ellefi,M.B. et al. (2016) Dataset recommendation for data linking: an intensional approach. In: European Semantic Web Conference. Springer, pp. 36–51.

23. Srivastava,K.S. (2018) Predicting and recommending relevant datasets in complex environments. US Patent App. 15/721,122.

24. Bollacker,K.D. et al. (1998) CiteSeer: an autonomous web agent for automatic retrieval and identification of interesting publications. In: Proceedings of the Second International Conference on Autonomous Agents. ACM, pp. 116–123.

25. Beel,J. et al. (2016) Research-paper recommender systems: a literature survey. International Journal on Digital Libraries, 17, 305–338.

26. Achakulvisut,T. et al. (2016) Science Concierge: a fast content-based recommendation system for scientific publications. PLoS One, 11, e0158423.

27. Haruna,K. et al. (2017) A collaborative approach for research paper recommender system. PLoS One, 12, e0184516.

28. Beel,J. et al. (2013) Introducing Docear’s research paper recommender system. In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries. ACM, pp. 459–460.

29. Wang,C. and Blei,D.M. (2011) Collaborative topic modeling for recommending scientific articles. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 448–456.

30. Huynh,T. et al. (2012) Scientific publication recommendations based on collaborative citation networks. In: 2012 International Conference on Collaboration Technologies and Systems (CTS). IEEE, pp. 316–321.

31. Hur,J. et al. (2009) SciMiner: web-based literature mining tool for target identification and functional enrichment analysis. Bioinformatics, 25, 838–840.

32. Yoneya,T. and Mamitsuka,H. (2007) PURE: a PubMed article recommendation system based on content-based filtering. Genome Informatics, 18, 267–276.

33. Lin,J. and Wilbur,W.J. (2007) PubMed related articles: a probabilistic topic-based model for content similarity. BMC Bioinformatics, 8, 423.

34. Sun,Y. et al. (2011) Co-author relationship prediction in heterogeneous bibliographic networks. In: 2011 International Conference on Advances in Social Networks Analysis and Mining. IEEE, pp. 121–128.

35. Chen,H.-H. et al. (2011) CollabSeer: a search engine for collaboration discovery. In: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries. ACM, pp. 231–240.

36. Tang,J. et al. (2012) Cross-domain collaboration recommendation. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 1285–1293.

37. Liu,Z. et al. (2018) Context-aware academic collaborator recommendation. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, pp. 1870–1879.

38. Li,J. et al. (2014) ACRec: a co-authorship based random walk model for academic collaboration recommendation. In: Proceedings of the 23rd International Conference on World Wide Web. ACM, pp. 1209–1214.

39. Huynh,T. et al. (2014) Collaborator recommendation for isolated researchers. In: 2014 28th International Conference on Advanced Information Networking and Applications Workshops. IEEE, pp. 639–644.

40. Zhu,Y. et al. (2008) GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics, 24, 2798–2800.

41. Demner-Fushman,D. et al. (2017) MetaMap Lite: an evaluation of a new Java implementation of MetaMap. Journal of the American Medical Informatics Association, 24, 841–844.

42. Chen,G. et al. (2019) Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis. Database, 2019.

43. Patra,B.G. et al. (2020) A content-based literature recommendation system for datasets to improve data reusability – a case study on Gene Expression Omnibus (GEO) datasets. Journal of Biomedical Informatics, 103399.

44. Patra,B.G. et al. (2020) A content-based dataset recommendation system for researchers – a case study on Gene Expression Omnibus (GEO) repository. Database, 2020.

Author notes

Citation details: Patra,B.G., Soltanalizadeh,B., Deng,N., et al., An informatics research platform to make public gene expression time-course datasets reusable for more scientific discoveries. Database (2020) Vol. 00: article ID baaa074; doi:10.1093/database/baaa074

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.