DGPD: a knowledge database of dense granule proteins of the Apicomplexa

Abstract

Apicomplexan parasites cause severe diseases in humans and livestock. Dense granule proteins (GRAs), which are specific to the Apicomplexa, help maintain intracellular parasitism of host cells. GRAs are strongly immunogenic and have emerged as important candidates in vaccine development. Although studies on GRAs have increased steadily in recent years, the data are incomplete and scattered, which makes it difficult for biologists to use the information comprehensively. There is therefore a pressing need for a user-friendly resource that integrates existing GRA data. In this paper, we developed the Dense Granule Protein Database (DGPD), the first knowledge database dedicated to the integration and analysis of typical GRA properties. The current version of DGPD includes annotated metadata for 245 GRAs derived from multiple web repositories and literature mining, covering five species that cause common diseases (Plasmodium falciparum, Toxoplasma gondii, Hammondia hammondi, Neospora caninum and Cystoisospora suis). We explored the baseline characteristics of GRAs and found that the numbers of introns and transmembrane domains in GRAs differ markedly from those of non-GRAs. Furthermore, we used the data in DGPD to explore prediction algorithms for GRAs. We hope DGPD will be a useful resource for researchers studying GRAs.

Database URL: http://dgpd.tlds.cc/DGPD/index/

Introduction

Apicomplexan parasites, including Plasmodium falciparum, Toxoplasma gondii, Hammondia hammondi, Neospora caninum and Cystoisospora suis, cause diseases not only in animals but also in humans (1). Nearly all creatures can serve as hosts for apicomplexan species (2). P. falciparum and T. gondii are the causative agents of two important human diseases, malaria and toxoplasmosis, respectively (3, 4). Toxoplasmosis is also associated with reproductive failure in sows (5). N. caninum causes neosporosis, which leads to infectious abortion in cattle worldwide (6). Thus, apicomplexan parasites have a great influence on human health and animal husbandry, resulting in public health problems and economic losses (7, 8).

Dense granule proteins (GRAs) are a class of immunogenic proteins secreted from dense granules, which are secretory organelles of apicomplexan parasites. Most GRAs localize to the parasitophorous vacuole (PV), where the parasite multiplies, and they help maintain intracellular parasitism in nearly all nucleated host cells, mainly by modifying the PV at the interface between the host cell and the parasite (9). In addition, several GRAs are secreted into the nucleus or cytoplasm of infected host cells (10). The functions of GRAs vary with their localization and include participating in the formation of tubular membranes (11), regulating signaling pathways in host cells (12) and affecting the transport of substances across the vacuolar membrane (13). Even so, the exact biological mechanisms of GRAs are not fully understood.

Dense granules were traditionally isolated by biochemical fractionation, but heavy contamination with parasite and/or host material limited the application of this approach (14). More recently, the proximity-dependent biotin identification (BioID) technique has been widely used to screen for GRAs, but it also suffers from contamination by non-specific proteins (15). The heavy experimental workload is inconvenient and wastes resources. Next-generation sequencing provides new ideas for peptide research, and bioinformatics methods are now commonly used to discover new functional peptides. Although GRAs form a distinct class of proteins, different GRAs share a few common features, which are often used for GRA identification (6). Two genomics resources, PlasmoDB and ToxoDB, already cover Plasmodium and Toxoplasma and include GRAs of P. falciparum, T. gondii and other species (16, 17). However, there is currently no integrated database for GRAs, which makes it difficult for researchers to analyze the functional characteristics of GRAs and to develop prediction tools.

With the development of modern technologies, many studies have reported that GRAs have potential applications in several areas (18, 19). However, little has been done to build a gold-standard benchmark GRA dataset, so a dedicated database is urgently needed. Here, we integrate extensive GRA data to develop the Dense Granule Protein Database (DGPD) to address these problems. We also use the available data to investigate GRA prediction algorithms. Comprehensive information on molecular weight, introns, signal peptides, etc., is available at http://dgpd.tlds.cc/DGPD/index/.

In this study, our main contributions are summarized as follows:

We integrate GRAs from five apicomplexan species. To the best of our knowledge, this is the first effort to collect and study biological information on GRAs in depth.
We explore and analyze the baseline characteristics of GRAs in DGPD by comparison with non-GRAs and find that GRAs tend to have fewer introns and more transmembrane domains.
The database can serve as a publicly available gold-standard benchmark dataset for developing and evaluating methods for predicting novel GRAs.

Materials and methods

Design of database

The construction and analysis of DGPD comprise the following steps, as shown in Figure 1.

(1) Search and collect GRA-related literature from PubMed.

(2) Sort out the specific information and relevant features of the genes reported in the literature, according to the database design and requirements.

(3) Manually screen positive and negative samples, followed by feature engineering.

(4) Develop prediction models for GRAs.

(5) Build the website and complete the relevant tests.

Figure 1.

Workflow for data curation in the DGPD database. Experimentally validated GRAs are classified as ‘confirmed GRAs’ (blue arrows). Highly suspected GRAs reported in the main text or supplementary material of the literature are classified as ‘likely GRAs’ (orange arrows). Homologs of known dense granule proteins in the PlasmoDB and ToxoDB databases are classified as ‘predicted GRAs’ (green arrows).

Acquisition of protein data

The core idea of this work is to analyze existing data on GRAs, so our first task was to collect the biological information. Most of the GRAs in the database were obtained from the relevant literature, and a small portion came from our previous study. First, we searched PubMed for literature on GRAs with a set of keywords such as ‘dense granule protein’, ‘GRA’, ‘TgGRA’ and ‘NcGRA’. This step yielded more than 1200 articles. We then prioritized two model organisms, P. falciparum and T. gondii, together with closely related species, as our literature screening strategy. During this process, we removed papers without full text and papers unrelated to GRAs. Next, we screened the articles by title and abstract and downloaded the full-text PDFs. For each protein, we extracted the corresponding metadata from the papers, including a brief description and related properties. Detailed protein information was then downloaded from PlasmoDB, ToxoDB, NCBI, UniProt and PDB using the gene accession numbers given in the literature.

Notably, most homologous proteins possess identical or similar functions (20). Therefore, we also used experimentally validated GRAs to search PlasmoDB and ToxoDB for homologous GRAs in other species and added them to DGPD. This method was used to collect data for species other than Plasmodium and Toxoplasma. Many studies have shown that certain typical characteristics play a part in the biological function of a protein, such as the presence or absence of signal peptides (21), domains (22) and the number of introns (23). Hence, we focused on collecting the protein characteristics that contribute to GRA research, in order to improve the usefulness of the database.

Data integration and processing

Each GRA entry in DGPD contains searchable gene and species names, signal peptide (SignalP), intron number, transmembrane domains (TMHMM), molecular weight and level of evidence. Data standardization and annotation are essential for sharing information, so the collected data need to be processed into a format that users can access.

In addition to the data mined directly from the literature, we acquired GRAs from existing resources. For example, ToxoDB is a database dedicated to Toxoplasma that has matured gradually since its initial release (24). On the ToxoDB website, the number of exons was obtained from the Gene Model section, and one was subtracted to obtain the number of introns. The molecular weight of the protein was obtained from the Protein Feature and Properties section. The presence of a signal peptide was determined from the ‘yes’ or ‘no’ value in the ‘Has SignalP’ column. To collect the TMHMM and domain information, we inspected the table in the attributes and protein browser section and confirmed whether a domain exists from the InterPro Domains track: one or more bands indicate presence, and no band indicates absence. Likewise, the number of purple bands under the transmembrane domains track equals the number of transmembrane domains predicted by TMHMM for the protein.
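The derivation rules described above are simple enough to state as code. The following is a minimal sketch only: the `GeneFeatures` record, the `derive_features` function and the example gene identifier are hypothetical names introduced for illustration, and the inputs are assumed to have been read by hand from the ToxoDB gene page as described (exon count, the ‘Has SignalP’ value and the list of transmembrane bands).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GeneFeatures:
    """Hypothetical record holding the per-gene features described in the text."""
    gene_id: str
    intron_count: int
    has_signal_peptide: bool
    tm_domain_count: int


def derive_features(
    gene_id: str,
    exon_count: int,
    has_signalp: str,
    tm_segments: List[Tuple[int, int]],
) -> GeneFeatures:
    """Apply the rules from the text to values read from a gene page."""
    if exon_count < 1:
        raise ValueError("a gene model needs at least one exon")
    return GeneFeatures(
        gene_id=gene_id,
        # Number of introns = number of exons minus one.
        intron_count=exon_count - 1,
        # The 'Has SignalP' column holds 'yes' or 'no'.
        has_signal_peptide=has_signalp.strip().lower() == "yes",
        # One band on the transmembrane track = one predicted domain.
        tm_domain_count=len(tm_segments),
    )


# Illustrative call with made-up values (not taken from DGPD).
example = derive_features("EXAMPLE_0001", 4, "yes", [(12, 34), (60, 82)])
print(example)
```

Keeping the rules in one function makes it easy to apply them uniformly if the same fields are later pulled from a bulk download rather than read from individual pages.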

Combining this with the information obtained from the literature, we divided the data into three evidence levels based on their source. GRAs with experimental evidence are assigned the highest level of confidence (Level 1). Highly suspected GRAs documented in the main text or supplementary materials of articles are the likely GRAs (Level 2). Homologs of identified GRAs, which usually share their function, are regarded as predicted GRAs (Level 3). We also added a download function to DGPD so that biomedical researchers can explore, visualize and analyze these data.
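The three evidence levels can be sketched as a small decision rule. This is an illustration under stated assumptions rather than the curation procedure itself: the `EvidenceLevel` enum, the `assign_level` function and its three boolean inputs are hypothetical, and the precedence (experimental evidence first, then literature report, then homology) is our reading of the description above.

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    CONFIRMED = 1  # experimentally validated GRA
    LIKELY = 2     # highly suspected GRA reported in main text or supplement
    PREDICTED = 3  # homolog of an identified GRA


def assign_level(
    experimentally_validated: bool,
    reported_in_literature: bool,
    homolog_of_known_gra: bool,
) -> EvidenceLevel:
    """Return the strongest evidence level supported by the flags.

    The ordering of the checks is an assumption: stronger evidence wins
    when more than one flag is set.
    """
    if experimentally_validated:
        return EvidenceLevel.CONFIRMED
    if reported_in_literature:
        return EvidenceLevel.LIKELY
    if homolog_of_known_gra:
        return EvidenceLevel.PREDICTED
    raise ValueError("no evidence supports classifying this protein as a GRA")


print(assign_level(False, True, True).name)
```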

GRAs prediction

To assess the reliability of the data and demonstrate the usefulness of DGPD, we constructed a binary classification model to distinguish GRAs from non-GRAs. The workflow for developing the prediction algorithm is shown in Figure 2. We first built the training datasets from DGPD and ToxoDB. Feature engineering was then performed to extract sequence features. Finally, five machine learning models were compared for predicting GRAs, as elaborated below. We took the 245 proteins in DGPD as positive samples for the prediction experiment. For negative samples, we retrieved 15 621 proteins from ToxoDB for the same five parasite species as the positive samples. To increase the likelihood that these proteins are not GRAs while retaining enough proteins for the dataset, we used only proteins with the description ‘unspecified product’ or ‘hypothetical protein’ in ToxoDB. A total of 2826 proteins were collected in this manner. After deduplication, we obtained 1706 proteins as putative non-GRAs. The final dataset therefore contained 245 positive and 1706 negative proteins.
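The negative-set filtering described above can be sketched as follows. Several details are assumptions made for illustration: records are plain dictionaries with `id`, `description` and `sequence` keys, deduplication is taken to mean exact sequence identity (the criterion is not specified in the text), and proteins already in the positive set are excluded explicitly.

```python
from typing import Dict, Iterable, List, Set

# Descriptions accepted for putative non-GRAs, as stated in the text.
KEEP_DESCRIPTIONS = {"unspecified product", "hypothetical protein"}


def select_putative_negatives(
    records: Iterable[Dict[str, str]],
    positive_ids: Set[str],
) -> List[Dict[str, str]]:
    """Filter protein records down to a deduplicated negative set."""
    seen_sequences: Set[str] = set()
    negatives: List[Dict[str, str]] = []
    for rec in records:
        if rec["id"] in positive_ids:
            continue  # never let a known GRA into the negative set
        if rec["description"].strip().lower() not in KEEP_DESCRIPTIONS:
            continue  # keep only uncharacterized proteins
        if rec["sequence"] in seen_sequences:
            continue  # assumed rule: drop exact duplicate sequences
        seen_sequences.add(rec["sequence"])
        negatives.append(rec)
    return negatives
```

A stricter deduplication, for example clustering by sequence identity, would need an external tool and is outside the scope of this sketch.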

Figure 2.

The workflow of prediction models for identifying GRAs.

iLearn is an integrated platform and meta-learner for modeling DNA, RNA and protein sequence data (25). We used iLearn to extract a variety of features from the protein sequences, including CTD (composition/transition/distribution), CKSAAP (composition of k-spaced amino acid pairs), SOCNumber (sequence-order-coupling number), CTDD (distribution) and CTDC (composition) (26). In this paper, we chose the CTD features, which describe the distribution patterns of particular groups of amino acids.
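To make the CTD idea concrete, the sketch below computes the composition and transition parts for a single physicochemical property. It is not the iLearn implementation: the three-class hydrophobicity grouping used here is one commonly used grouping and is an assumption, the function names are hypothetical, and residues outside the 20 standard amino acids are simply skipped.

```python
# Assumed three-class hydrophobicity grouping (polar, neutral, hydrophobic).
GROUPS = {
    "1": set("RKEDQN"),
    "2": set("GASTPHY"),
    "3": set("CLVIMFW"),
}


def encode(sequence: str) -> str:
    """Map each residue to its class label; unknown residues are dropped."""
    labels = []
    for aa in sequence.upper():
        for label, members in GROUPS.items():
            if aa in members:
                labels.append(label)
                break
    return "".join(labels)


def ctd_composition(sequence: str) -> list:
    """Fraction of residues falling in each of the three classes."""
    classes = encode(sequence)
    if not classes:
        raise ValueError("sequence contains no standard residues")
    return [classes.count(g) / len(classes) for g in "123"]


def ctd_transition(sequence: str) -> list:
    """Fraction of adjacent pairs that switch between two given classes."""
    classes = encode(sequence)
    pairs = list(zip(classes, classes[1:]))
    if not pairs:
        raise ValueError("need at least two standard residues")
    result = []
    for a, b in (("1", "2"), ("1", "3"), ("2", "3")):
        switches = sum(1 for x, y in pairs if {x, y} == {a, b})
        result.append(switches / len(pairs))
    return result


print(ctd_composition("MKRLSG"), ctd_transition("MKRLSG"))
```

A full CTD descriptor repeats this for several properties and adds the distribution part, so the final feature vector is considerably longer than the six values shown here.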

Five classical machine learning algorithms, i.e. decision tree, random forest, extremely randomized trees, Gaussian naïve Bayes and support vector machine (SVM), were selected to develop the classifiers. We adopted two evaluation metrics, the area under the precision–recall curve (AUPRC) (27) and the area under the ROC curve (AUC) (28), to evaluate overall performance in the prediction experiment. Because known GRAs are far fewer than non-GRAs, we used AUPRC as the primary metric, as it penalizes false positives more heavily in the evaluation process (29, 30). Other metrics, including recall, specificity, precision, accuracy (ACC) and F1-score, were also calculated to compare the machine learning methods.
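For readers less familiar with the two primary metrics, the following sketch computes them from labels and prediction scores in plain Python. It is illustrative only: the function names are hypothetical, AUC is computed by direct pairwise comparison, and AUPRC is approximated by average precision with ties between scores broken by sort order (an assumption that matters only when scores repeat).

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision values taken at the rank of each positive."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank  # precision at this rank
    return ap / total_pos


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores), average_precision(labels, scores))
```

Because average precision divides by the rank of each positive, every negative ranked above a positive lowers it directly, which is why it reacts more strongly to false positives than AUC does on imbalanced data.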

Results and discussion

Statistics of database

At present, DGPD provides 245 GRAs covering five typical species: T. gondii (70.2%), C. suis (3.7%), H. hammondi (11.8%), N. caninum (6.5%) and P. falciparum (7.8%). Important protein metadata are included in DGPD, such as the protein sequences, intron, thnum, etc. In particular, we labeled each protein with an evidence level based on its source to support data credibility: 110 confirmed GRAs (Level 1), 37 likely GRAs (Level 2) and 98 predicted GRAs (Level 3). Table 1 shows the detailed database statistics. In DGPD, ambiguous GRAs and those whose functions or features are unclear are placed in the ‘likely’ or ‘predicted’ group. We welcome users to contact us through the Submit panel or the email address provided on the website when they find novel GRAs; each request will be validated. In addition, we will continue to collect experimentally proven GRAs, and DGPD will be updated periodically.

Table 1. Statistics in DGPD

Species	Level 1	Level 2	Level 3	Total
Toxoplasma gondii	66	26	80	172
Hammondia hammondi	11	0	18	29
Plasmodium falciparum	8	11	0	19
Neospora caninum	16	0	0	16
Cystoisospora suis	9	0	0	9
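The percentages and level totals quoted in the text follow directly from Table 1, as the short sketch below shows. The counts are copied from the table; the variable and function names are hypothetical.

```python
# Counts per species as (Level 1, Level 2, Level 3), copied from Table 1.
TABLE_1 = {
    "Toxoplasma gondii": (66, 26, 80),
    "Hammondia hammondi": (11, 0, 18),
    "Plasmodium falciparum": (8, 11, 0),
    "Neospora caninum": (16, 0, 0),
    "Cystoisospora suis": (9, 0, 0),
}


def level_totals(table: dict) -> tuple:
    """Column sums: number of GRAs at each evidence level."""
    return tuple(sum(counts[i] for counts in table.values()) for i in range(3))


def species_shares(table: dict) -> dict:
    """Percentage of all GRAs contributed by each species."""
    grand_total = sum(sum(counts) for counts in table.values())
    return {
        species: 100.0 * sum(counts) / grand_total
        for species, counts in table.items()
    }


print(level_totals(TABLE_1))
for species, share in species_shares(TABLE_1).items():
    print(f"{species}: {share:.1f}%")
```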

Implementation of database website

DGPD provides a user-friendly interactive website on which users can browse, search and download the data. The back end of DGPD uses the Django framework with a MySQL database. LayUI, an open-source web framework, is used to build the front end. The DGPD website consists of five panels, and Figure 3 shows the details of the web interface.

Figure 3.

The web interface of the DGPD database. (A) GRA repository panel. A statistics visualization is displayed on the right. Gene information can be viewed by submitting keywords in the search bar. (B) Gene information panel. Detailed information on the gene that the user searched for is shown on this panel. (C) Database introduction and help panel. Users can find help and a brief introduction to the database functions. The catalog is displayed at the top left. (D) Download panel. All data are available through this panel. (E) Data submission panel. Information on novel GRAs can be submitted in this panel. (F) Contact panel. Several ways of contacting us are provided.

Home panel to search/browse proteins

In this panel, users can browse proteins by selecting a species or gene name. Users can also search with specific terms (e.g. organism name) (Figure 3A). After the search criteria are submitted, the webpage redirects to the gene browsing page (Figure 3B), which shows the corresponding data (e.g. gene sequence). Users can click the hyperlinks of genes or PMIDs to reach detailed information on the corresponding NCBI pages. In addition, we provide a 3D view of each protein to help visualize the secondary and tertiary structure of GRAs.
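The kind of query behind a species or gene-name search can be sketched with a small in-memory table. This is a stand-in, not the DGPD back end: it uses Python's built-in sqlite3 module instead of Django with MySQL, and the table layout, column names and `search` function are hypothetical. The two rows use gene identifiers and names mentioned elsewhere in this paper.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a hypothetical, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gra (gene_id TEXT PRIMARY KEY, gene_name TEXT, species TEXT)"
    )
    conn.executemany(
        "INSERT INTO gra VALUES (?, ?, ?)",
        [
            ("TGME49_227280", "GRA3", "Toxoplasma gondii"),
            ("TGME49_268900", "GRA10", "Toxoplasma gondii"),
        ],
    )
    return conn


def search(
    conn: sqlite3.Connection,
    species: Optional[str] = None,
    gene_name: Optional[str] = None,
) -> List[Tuple[str, str, str]]:
    """Filter by species and/or gene name using parameterized SQL."""
    query = "SELECT gene_id, gene_name, species FROM gra WHERE 1=1"
    params: List[str] = []
    if species:
        query += " AND species = ?"
        params.append(species)
    if gene_name:
        query += " AND gene_name LIKE ?"
        params.append(f"%{gene_name}%")
    query += " ORDER BY gene_id"
    return conn.execute(query, params).fetchall()


conn = build_demo_db()
print(search(conn, gene_name="GRA3"))
```

Passing user input as bound parameters rather than formatting it into the SQL string is the usual way to keep such a search form safe from injection.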

About panel to introduce the database

The catalog in this panel helps users make proper use of DGPD (Figure 3C). Separate tabs let users view help and information about the database, such as a brief introduction, the web browser requirements and usage instructions.

Download panel to obtain the protein data

All GRA data in DGPD are openly available. Researchers can obtain the detailed protein information by clicking the download button (Figure 3D).

Submit panel to upload the new data

In recent years, research on GRAs has developed rapidly, and novel GRAs continue to emerge. To keep DGPD current, we will update the database regularly. Furthermore, we welcome users to provide information on new GRAs through the Submit panel (Figure 3E). After a submission, we will review it and send feedback to the submitted email address.

Contact panel to stay in touch with us

Our address and email are listed in this panel (Figure 3F). We hope that more researchers will contribute valuable comments on our database, and we encourage users to communicate with us on relevant topics and issues.

Exploration of GRAs characteristics

Gene-related attributes can generally be inferred from a few dominant characteristics. For instance, the number of introns affects gene expression (31), and the number of transmembrane domains predicted by TMHMM (hereafter the TMHMM number) influences protein transport (32). We therefore carried out a characteristic analysis on the curated dataset. We found that the median intron number is close to 0 in identified GRAs, whereas it is close to 3 in the negative samples (Figure 4A). For example, the intron number of TGME49_227280 (GRA3) is 0, whereas that of TGGT1_209200 (a non-GRA) is 11. Excessive introns can cause dysregulated expression of gene products, and the low intron number of GRAs may help them avoid aberrant expression (33). As for transmembrane domains, the median TMHMM number in GRAs is close to 2. In contrast, the TMHMM number of the negative samples is usually lower, with a median of 0 (Figure 4B). For instance, the TMHMM number of TGME49_268900 (GRA10) is 2, whereas that of TGME49_208760 (a non-GRA) is 0. Transmembrane domains are essential for transmembrane proteins and are mostly composed of hydrophobic amino acids (32). As secreted proteins that are mostly type I transmembrane proteins, GRAs usually contain a variable number of transmembrane domains, which may affect the structure of the intravacuolar network membrane (9). Signal peptides play an important role in protein translocation (34). We also investigated the signal peptide pattern of GRAs by comparing them with putative non-GRAs. The bar plots in Figure 4C show that GRAs in DGPD are more likely to contain a signal peptide than non-GRAs (P < 2.2e−16, Fisher’s exact test). These results may provide new ideas for the discovery of dense granule proteins.
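For reference, a two-sided Fisher's exact test on a 2×2 table (signal peptide present or absent, GRA or non-GRA) can be sketched in plain Python as below. The function name is hypothetical, and the counts in the usage line are made up purely to show the call; they are not the DGPD counts, which are not reproduced here. The two-sided rule used, summing the probabilities of all tables no more likely than the observed one, is the common convention and is assumed.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided P-value for the 2x2 table [[a, b], [c, d]].

    Rows could be GRA / non-GRA and columns signal peptide yes / no.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is at most as probable as the observed one;
    # the small relative tolerance guards against float round-off.
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical counts, for illustration only.
print(fisher_exact_two_sided(3, 1, 1, 3))
```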

Figure 4.

Feature analysis between positive and negative samples across species. Orange and blue represent GRAs and non-GRAs, respectively.

Development of prediction model for GRAs

Here, to demonstrate how the data in DGPD can be used, we conducted a case study to develop machine learning-based prediction models for GRAs. After generating the dataset of positive and negative samples described above, we extracted the CTD feature descriptors (35). The full dataset and the quantified features were then used to fit the models. We tuned the models to maximize the mean AUC, AUPRC and other evaluation metrics under 5-fold cross-validation (36). SVM, a machine learning method based on statistical learning theory that is suited to small sample sets, is a common choice for binary classification problems (37, 38). Here, SVM outperformed the other models on most metrics (Figure 5). The average AUC and AUPRC of the best model are 0.9372 and 0.7815, respectively. We therefore selected SVM to build the prediction model and made the source code and training data available on the Download page. In addition, we conducted hyperparameter optimization and found that it had only a small influence on model performance. Figure 5 illustrates the performance of the different classification algorithms on the dataset with 5-fold cross-validation.
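The 5-fold cross-validation split can be illustrated with a small index generator. This is a sketch under assumptions: the text does not state whether the folds were stratified, so the stratification by class used here (which keeps the GRA to non-GRA ratio roughly equal in every fold) is our choice, and the function name and seed are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stratified_k_fold(
    labels: Sequence[int], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k class-balanced folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Same class sizes as the dataset in the text: 245 positives, 1706 negatives.
labels = [1] * 245 + [0] * 1706
for train, test in stratified_k_fold(labels, k=5, seed=1):
    n_pos = sum(labels[i] for i in test)
    print(len(train), len(test), n_pos)
```

Each model is then fitted on the training indices and scored on the held-out fold, and the five scores are averaged to give the mean AUC and AUPRC.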

Figure 5.

Performance of different machine learning-based models.

In practice, non-GRAs greatly outnumber true GRAs, so even a small false positive rate results in a large number of false positive predictions. As can be seen in Figure 5, the AUC is generally higher than the AUPRC for each machine learning algorithm. This may be partly because the data imbalance biases predictions toward the negative class. This bias means that more samples are classified as non-GRAs, resulting in a higher specificity and hence a relatively high AUC. In contrast, because negatives are so numerous, the false positives that remain lower the precision, which is the crucial factor that decreases the AUPRC (29).
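The effect of class imbalance on precision can be shown with a short calculation. The class sizes below are those of the dataset described earlier (245 GRAs and 1706 putative non-GRAs); the operating point (recall 0.90, false positive rate 0.05) is hypothetical and chosen only for illustration, as is the function name.

```python
def precision_at(recall: float, fpr: float, n_pos: int, n_neg: int) -> float:
    """Precision implied by a recall and a false positive rate."""
    true_pos = recall * n_pos   # positives correctly predicted
    false_pos = fpr * n_neg     # negatives wrongly predicted as positive
    if true_pos + false_pos == 0:
        raise ValueError("no positive predictions at this operating point")
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point applied to the dataset's class sizes.
imbalanced = precision_at(0.90, 0.05, n_pos=245, n_neg=1706)
# Same operating point on a balanced dataset, for comparison.
balanced = precision_at(0.90, 0.05, n_pos=1000, n_neg=1000)
print(f"imbalanced: {imbalanced:.3f}, balanced: {balanced:.3f}")
```

The same recall and false positive rate, and therefore the same point on the ROC curve, give a noticeably lower precision when negatives outnumber positives about seven to one, which is consistent with the gap between AUC and AUPRC discussed above.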

Conclusion

Dense granule proteins have been shown to play a major role in multiple complex diseases caused by apicomplexan parasites. Deeper research on GRAs is therefore essential for understanding and treating these diseases. However, the field is progressing slowly owing to difficulties in collecting and mining GRA data. In this paper, we integrate extensive existing GRA data to build the first dedicated repository, DGPD. It contains proteins from five representative species, with basic information and additional annotations for each. Users can use the information in DGPD to conduct targeted research on existing GRAs and to further understand their mechanisms of action.

As science and technology develop, more novel GRAs will be discovered and more efficient prediction algorithms will be developed. Our current work still has limitations, such as the shallow exploration of machine learning algorithms for GRA prediction. With the advent of deep learning, GRA prediction capabilities should be strengthened. In the future, we will continue to follow the latest research results in the GRA field and incorporate more new GRAs and species into DGPD. Because the exact biological functions of GRAs are still controversial, we will devote ourselves to exploring the biological problems related to GRAs. We hope that DGPD will be a unique platform for further investigation of GRA functions and mechanisms of action in disease treatment.

Funding

This work was supported by grants from the Science and Technology Department of Anhui Province (Natural Science Young Foundation of Anhui, 2008085QC136, 2008085QF293), the National Natural Science Foundation of China (62102004), the Natural Science Young Foundation of Anhui Agricultural University (2019zd12) and the Introduction and Stabilization of Talent Project of Anhui Agricultural University (yj2019-32).

Conflict of interest

None declared.

References

1. Mercier, Adjogble K.D.Z., Däubener et al. (2005) Dense granules: are they key organelles to help understand the parasitophorous vacuole of all apicomplexa parasites? Int. J. Parasitol., 829–849. doi: 10.1016/j.ijpara.2005.03.011

2. Egea P.F. (2020) Crossing the vacuolar rubicon: structural insights into effector protein trafficking in apicomplexan parasites. Microorganisms, 865. doi: 10.3390/microorganisms8060865

3. Hill and Dubey J.P. (2002) Toxoplasma gondii: transmission, diagnosis and prevention. Clin. Microbiol. Infect., 634–640. doi: 10.1046/j.1469-0691.2002.00485.x

4. Feleke S.M., Reichert E.N., Mohammed et al. (2021) Plasmodium falciparum is evolving to escape malaria rapid diagnostic tests in Ethiopia. Nat. Microbiol., 1289–1299. doi: 10.1038/s41564-021-00962-4

5. Dubey J.P., Lago E.G., Gennari S.M. et al. (2012) Toxoplasmosis in humans and animals in Brazil: high prevalence, high burden of disease, and epidemiology. Parasitology, 139, 1375–1424. doi: 10.1017/S0031182012000765

6. Yang, Wang, Liu et al. (2021) Biotinylation of the Neospora caninum parasitophorous vacuole reveals novel dense granule proteins. Parasit. Vectors, 521. doi: 10.1186/s13071-021-05023-7

7. Dessì, Tamponi, Pasini et al. (2022) A survey on Apicomplexa protozoa in sheep slaughtered for human consumption. Parasitol. Res., 121, 1437–1445. doi: 10.1007/s00436-022-07469-9

8. Schares, Globokar Vrhovec, Tuschy et al. (2021) A real-time quantitative polymerase chain reaction for the specific detection of Hammondia hammondi and its differentiation from Toxoplasma gondii. Parasit. Vectors, 78. doi: 10.1186/s13071-020-04571-8

9. Rome M.E., Beck J.R., Turetzky J.M. et al. (2008) Intervacuolar transport and unique topology of GRA14, a novel dense granule protein in Toxoplasma gondii. Infect. Immun., 4865–4875.

10. Achbarou, Mercereau-Puijalon, Sadak et al. (1991) Differential targeting of dense granule proteins in the parasitophorous vacuole of Toxoplasma gondii. Parasitology, 103, 321–329. doi: 10.1017/S0031182000059837

11. Travier, Mondragon, Dubremetz J.-F. et al. (2008) Functional domains of the Toxoplasma GRA2 protein in the formation of the membranous nanotubular network of the parasitophorous vacuole. Int. J. Parasitol., 757–773. doi: 10.1016/j.ijpara.2007.10.010

12. Braun, Brenier-Pinchart M.-P., Yogavel et al. (2013) A Toxoplasma dense granule protein, GRA24, modulates the early immune response to infection by promoting a direct and sustained host p38 MAPK activation. J. Exp. Med., 210, 2071–2086.

13. Heaslip A.T., Nelson S.R. and Warshaw D.M. (2016) Dense granule trafficking in Toxoplasma gondii requires a unique class 27 myosin and actin filaments. Mol. Biol. Cell, 2080–2089. doi: 10.1091/mbc.E15-12-0824

14. Petry and Harris J.R. (1999) Ultrastructure, fractionation and biochemical analysis of Cryptosporidium parvum sporozoites. Int. J. Parasitol., 1249–1260. doi: 10.1016/S0020-7519(99)00080-6

15. Kimmel, Kehrer, Frischknecht et al. (2022) Proximity-dependent biotinylation approaches to study apicomplexan biology. Mol. Microbiol., 117, 553–568.

16. Harb O.S. and Roos D.S. (2020) ToxoDB: functional genomics resource for toxoplasma and related organisms. In: Tonkin (ed). Toxoplasma Gondii, Methods in Molecular Biology, Vol. 2071. Springer, New York.

17. Aurrecoechea, Brestelli, Brunk B.P. et al. (2009) PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res., D539–D543.

18. Fox B.A., Sanders K.L., Rommereim L.M. et al. (2016) Secretion of rhoptry and dense granule effector proteins by nonreplicating Toxoplasma gondii uracil auxotrophs controls the development of antitumor immunity. PLoS Genet., e1006189. doi: 10.1371/journal.pgen.1006189

19. Mercer H.L., Snyder L.M., Doherty C.M. et al. (2020) Toxoplasma gondii dense granule protein GRA24 drives MyD88-independent p38 MAPK activation, IL-12 production and induction of protective immunity. PLoS Pathog., e1008572. doi: 10.1371/journal.ppat.1008572

20. Overington J.P. (1992) Comparison of three-dimensional structures of homologous proteins. Curr. Res. Struct. Biol., 394–401. doi: 10.1016/0959-440X(92)90231-U

21. Mercier and Cesbron-Delauw M.-F. (2015) Toxoplasma secretory granules: one population or more? Trends Parasitol. doi: 10.1016/j.pt.2014.12.002

22. Quevillon, Silventoinen, Pillai et al. (2005) InterProScan: protein domains identifier. Nucleic Acids Res., W116–W120.

23. Davis E.O., Thangaraj H.S., Brooks P.C. et al. (1994) Evidence of selection for protein introns in the recAs of pathogenic mycobacteria. EMBO J., 699–703. doi: 10.1002/j.1460-2075.1994.tb06309.x

24. Gajria, Bahl, Brestelli et al. (2007) ToxoDB: an integrated Toxoplasma gondii database resource. Nucleic Acids Res., D553–D556.

25. Chen, Zhao et al. (2020) iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data. Brief. Bioinf., 1047–1057.

26. Chen, Zhao et al. (2018) iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics, 2499–2502. doi: 10.1093/bioinformatics/bty140

27. Ozenne, Subtil and Maucort-Boulch (2015) The precision–recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. J. Clin. Epidemiol., 855–859. doi: 10.1016/j.jclinepi.2015.02.010

28. Lobo J.M., Jiménez-Valverde and Real (2008) AUC: a misleading measure of the performance of predictive distribution models. Glob. Ecol. Biogeogr., 145–151. doi: 10.1111/j.1466-8238.2007.00358.x

29. Davis and Goadrich (2006) The relationship between precision–recall and ROC curves. In: Proceedings of the 23rd international conference on Machine learning, ICML’06. ACM Press, Pittsburgh, Pennsylvania, pp. 233–240.

30. Yue, Chu and Xia (2021) PredCID: prediction of driver frameshift indels in human cancer. Brief. Bioinf., bbaa119. doi: 10.1093/bib/bbaa119

31. Buchman A.R. and Berg (1988) Comparison of intron-dependent and intron-independent gene expression. Mol. Cell. Biol., 4395–4405. doi: 10.1128/mcb.8.10.4395-4405.1988

32. Käll, Krogh and Sonnhammer E.L.L. (2004) A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol., 338, 1027–1036. doi: 10.1016/j.jmb.2004.03.016

33. Grabski D.F., Broseus, Kumari et al. (2021) Intron retention and its impact on gene expression and protein diversity: a review and a practical guide. Wiley Interdiscip. Rev. RNA, e1631. doi: 10.1002/wrna.1631

34. Choo K.H., Tan T.W. and Ranganathan (2005) SPdb – a signal peptide database. BMC Bioinform., 249. doi: 10.1186/1471-2105-6-249

35. Charoenkwan, Nantasenamat, Hasan M.M. et al. (2022) StackDPPIV: a novel computational approach for accurate prediction of dipeptidyl peptidase IV (DPP-IV) inhibitory peptides. Methods, 204, 189–198. doi: 10.1016/j.ymeth.2021.12.001

36. Fushiki (2011) Estimation of prediction error by using K-fold cross-validation. Stat. Comput., 137–146. doi: 10.1007/s11222-009-9153-8

37. Huang, Cai, Pacheco P.P. et al. (2018) Applications of support vector machine (SVM) learning in cancer genomics. Cancer Genomics Proteomics.

38. Mohammadi, Rashid T.A., Karim S.H.T. et al. (2021) A comprehensive survey and taxonomy of the SVM-based intrusion detection systems. J. Netw. Comput. Appl., 178, 102983. doi: 10.1016/j.jnca.2021.102983

Author notes

† contributed equally to this work.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
