Peptipedia: a user-friendly web application and a comprehensive database for peptide research supported by Machine Learning approach

Table 1.

Summary of main categories for biological activities

Biological activity	Brief description
Propeptide	Precursor with no biological activity. This molecule can be activated after a post-translational modification, such as the cleavage of a region or the addition of another molecule. Some of the molecules in this group could be in the ‘therapeutic category’ in their active form.
Signal	Used as a post-translational modification or translocation, because these peptides are useful for marking the protein secretion pathway and target location. These molecules are commonly used for the recombinant protein production, diagnosis and vaccination.
Transit	Involved in the transport of a protein encoded by a nuclear gene to a particular organelle, such as mitochondrion, chloroplast and peroxisome, among others.
Cell sensing	Activities related to cell detection mechanisms, such as quorum sensing, chemotactic movement, cell-to-cell communication and defence mechanisms, among others.
Drug delivery vehicle	Substance that helps a drug to be safely delivered to its therapeutic target, reducing toxic effects or degradation. Some examples are emulsions, polymers, semi-solid products, nanoparticles and encapsulations, among others.
Therapeutic	Able to be used for sickness treatments. The specific activity of this peptide will depend on the therapeutic target. This category includes antimicrobial, anticancer, toxic, metabolic and bioactive peptides.
Neurological	Activity related to the neurons or the nervous system. This group includes neuropeptides, brain peptides and antinociceptive activities.
Immunological	Activities related to the immune response against foreign substances. It could be related to defence mechanisms, immunomodulatory activities and wound healing, allergenic reactions and cell degranulation mechanism, among others.
Other	The activities of this group are not directly related to the other main categories. Includes four main subdivisions: mammalian and cancer cell peptides, protein peptides and surface-immobilized peptides.

One of the essential services of Peptipedia is the biological activity classification system for peptide sequences based on machine learning strategies. Models were trained by applying supervised learning algorithms combined with sequence encoding approaches that use physicochemical properties and digital signal processing, according to the strategies proposed by Medina-Ortiz et al., 2020 (15). In this way, we generated assembled binary models to recognize activities of peptide sequences using the categories proposed in this work. The training process was based on binary data sets that evaluate two classes: presence or absence of an activity. Each data set was generated using the one-vs-rest strategy, keeping class imbalance to a minimum. Finally, for models with low performance, we used recursive binary partition strategies, according to the method proposed by Medina-Ortiz et al., 2020 (17), to improve the performance of the assembled classification models.
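The one-vs-rest construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `one_vs_rest_dataset`, the record layout and the use of random undersampling to limit class imbalance are all assumptions, and the toy sequences and labels are made up.

```python
import random

def one_vs_rest_dataset(records, target, seed=0):
    """Build a balanced binary data set for one activity (hypothetical helper).

    records: list of (sequence, set_of_activity_labels) pairs.
    Returns a shuffled list of (sequence, label) with label 1 = activity present.
    """
    rng = random.Random(seed)
    positives = [seq for seq, acts in records if target in acts]
    negatives = [seq for seq, acts in records if target not in acts]
    # Assumption: undersample the larger class so both classes have equal size.
    n = min(len(positives), len(negatives))
    sample = [(s, 1) for s in rng.sample(positives, n)]
    sample += [(s, 0) for s in rng.sample(negatives, n)]
    rng.shuffle(sample)
    return sample

# Toy records with made-up sequences and labels (illustrative only).
records = [
    ("KWKLFKKIGAVLKVL", {"therapeutic"}),
    ("GIGAVLKVLTTGLPALIS", {"therapeutic", "cell sensing"}),
    ("MKTLLLTLVVVTIVCLDLGYT", {"signal"}),
    ("MLSLRQSIRFFKPATRTLCSSRYLL", {"transit"}),
    ("RRRRRRRRR", {"drug delivery"}),
]
dataset = one_vs_rest_dataset(records, "therapeutic")
```

A peptide with several reported activities counts as a positive example for each of them, which is why the labels are stored as a set.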

Implementation and Availability

Peptipedia was designed using the Model View Controller (MVC) design pattern. The view component and the controllers were implemented in JavaScript using the Express framework. Display components were optimized using Bootstrap 4. All model components, including all services provided by the tool proposed in this work, were developed in Python 3, supported by the libraries DMAKit-Lib (16) and Scikit-Learn (21). Both the proposed software architecture and the implementation features are detailed in Section 2 of Supplementary Information.

Results and Discussion

Peptipedia is a user-friendly web application system to search, analyse, evaluate and characterize peptide sequences using different strategies, including machine learning and data mining techniques. This web tool has a NoSQL database system with 92 055 peptides registered and described, making it the most extensive database of peptide sequences with reported activities to date. The tool reports different types of information for each sequence, considering structural, physicochemical and phylogenetic properties. Additionally, the activities previously identified for each peptide are reported, together with the databases or repositories from which they were extracted. Peptipedia also provides information on published patents related to previously reported peptides. Finally, statistical properties related to the percentage of residues for each sequence and the average per category are included in the database, providing useful and easy-to-understand information for researchers (see Figure 1).

Figure 1.

Representative scheme of building and characteristics of Peptipedia. Peptipedia is a computational tool for peptide sequence analysis. The information presented by our tool was consolidated from 30 databases, considering information on the sequence, taxonomy and different properties of stored peptides. Searching for sequences and relevant information in our web application is easy, personalized and intuitive, allowing download of the information in multiple formats. Peptipedia has enabled different tools that will help characterize and analyse sequences, as well as functionalities supported by machine learning methods that facilitate the development of predictive models and a biological activity predictor system.

Relevant tools and services available in Peptipedia

Searches, visualization and downloads

Different types of searches can be run in Peptipedia, either with the sequence or through information related to its activity, physicochemical properties and frequency of residues, among other relevant information. In addition, different filters can be applied to tailor the exploration to the user’s interests.

We provide a general summary for each search, showing statistical descriptions and various visualizations of the information. Furthermore, we present specific details for each peptide, including thermodynamic properties, taxonomy, phylogeny, biological activity and sequence descriptors, as well as the databases where the peptide sequence was previously reported. Remarkably, Peptipedia offers specific information such as IC₅₀, assay information, organism evaluation and other relevant characteristics for particular activities such as the antihypertensive, anti-HIV and antiviral subcategories.

Peptipedia has general and specific modules for downloading data, making it easier to obtain information and facilitate the download in CSV, FASTA and JSON formats. The complete database can even be downloaded in easily manipulatable forms that include both the sequence and its reported information.
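To make the three download formats concrete, the sketch below serializes a small list of peptide records to FASTA, JSON and CSV using only the Python standard library. The field names (`id`, `sequence`, `activity`) and the example records are hypothetical; the actual layout of Peptipedia's exports may differ.

```python
import csv
import io
import json

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": "pep1", "sequence": "GIGAVLKVLTTGLPALIS", "activity": "therapeutic"},
    {"id": "pep2", "sequence": "RRRRRRRRR", "activity": "drug delivery"},
]

def to_fasta(records):
    """One header line ('>' + identifier) followed by the sequence."""
    return "".join(f">{r['id']}\n{r['sequence']}\n" for r in records)

def to_json(records):
    """Keep every field of every record."""
    return json.dumps(records, indent=2)

def to_csv(records):
    """One row per peptide with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "sequence", "activity"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

FASTA keeps only the identifier and the sequence, whereas JSON and CSV can carry the reported information alongside the sequence.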

Updating information

To keep the information in the persistent storage system up to date, the data download service is executed periodically against the databases that make up Peptipedia. The downloaded sequences are compared with those in the current collection, and the records are updated where applicable. In addition, a systematic search is carried out for new tools, libraries or databases reported since the last update, and the appropriate records are inserted. Although this type of maintenance is not the most suitable for a platform of this kind, it is preferred because it allows curation of the information and control over the records inserted into the web tool. Nevertheless, for future updates of Peptipedia, we are working on a system for capturing new records in real time based on data mining and semantic web strategies to optimise and automate the updating process.

Services

Different services were implemented in Peptipedia to facilitate analysis and characterization of peptide sequences. We propose various services that allow characterization through physicochemical and thermodynamic properties, using the ModLamp (19) library. We also provide modules that enable the estimation of statistical properties for peptide sequences.

Bioinformatics tools such as sequence alignments are available in our web tool: using the Edlib library (26), it is possible to align any sequence against those registered in our database.
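Edlib computes alignments based on edit distance. As a rough illustration of what aligning a query against the registered sequences involves, the sketch below uses a plain dynamic-programming edit distance as a stand-in; it does not call Edlib, and the helper names and the toy database are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (pure-Python stand-in)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def closest_sequences(query, database, top=3):
    """Rank database sequences by increasing edit distance to the query."""
    return sorted(database, key=lambda s: edit_distance(query, s))[:top]

# Toy database with made-up sequences.
database = ["GIGAVLKVL", "RRRRRRRR", "MKTLLLTLVV"]
best = closest_sequences("GIGAVLKV", database, top=1)
```

A dedicated library is far faster than this quadratic-time loop, which matters when a query is compared against tens of thousands of sequences.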

Another relevant service is the peptide biological activity classification system supported by assembled predictive models: the user can upload a list of amino acid sequences, and our tool classifies them by the categories proposed in this work, evaluating each of them. Furthermore, a peptide-encoding service is implemented using common strategies such as one-hot encoding and more sophisticated ones such as embeddings through the Tape library.
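As an illustration of the simplest of these encodings, the sketch below one-hot encodes a peptide over the 20 standard amino acids. The fixed `max_length`, the zero padding and the handling of non-standard residues are assumptions made for this example, not a description of the service's behaviour.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequence, max_length):
    """Return a flat vector of max_length * 20 zeros and ones.

    Shorter sequences are zero-padded and longer ones are truncated
    (both are assumptions of this sketch).
    """
    vector = [0] * (max_length * len(AMINO_ACIDS))
    for pos, aa in enumerate(sequence[:max_length].upper()):
        if aa in INDEX:  # non-standard residues are left as all zeros
            vector[pos * len(AMINO_ACIDS) + INDEX[aa]] = 1
    return vector

encoded = one_hot_encode("ACD", max_length=5)
```

Each position contributes 20 entries, so the vector length grows linearly with `max_length`; learned embeddings avoid this sparsity at the cost of requiring a pretrained model.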

Finally, Peptipedia allows the generation of predictive models for sets of peptides with specific user requirements through supervised learning algorithms and cross-validation techniques. The hyperparameter configuration, the coding strategy and the validation method are selectable. The tool reports the performance of the model generated by the user and allows it to be downloaded for local use. In addition, this service supports the interpretation of the results by giving different recommendations about them.

Relevant information in Peptipedia

Using 10 previously proposed categories, we analysed the peptide sequences and found that therapeutic peptides, signal peptides and peptides with sensory activity are the most prevalent in our records. Peptides with immunological, transit or neurological activity were the least common in our database (see Figure 2A).

Figure 2.

Visualization of registered peptides on Peptipedia. Representation of the information contained in Peptipedia. A: distribution of peptides according to the categories proposed in this work. B: analysis of the relationship of simultaneous activities for the same type of peptide; the most significant trends are seen between therapeutics and cell sensing peptides, and between propeptides and signal.

It is important to highlight the moonlighting characteristics of peptides, that is, the ability of a peptide to present different activities at the same time (8). The main moonlighting tendencies found are between therapeutic and cell sensing peptides, and between propeptides and signal peptides. The latter overlap makes sense because propeptides generally contain a signal peptide in their sequence (34), which they lose once processed (see Figure 2B). This type of property reflects the potential of a peptide when acting as a drug or in different biotechnological uses.

Residue frequency analysis allows amino acid trends to be evaluated for particular activities. We compared trends for the main reported categories and found a clear preference for arginine residues in drug delivery peptides. This can be explained because these peptides are usually designed to cross membranes, so they need a chemical affinity for negatively charged membranes, which is provided by the positive charge of arginine. In contrast, signal peptides, transit peptides and propeptides generally show similar trends, although no major visible patterns were identified (see Section 4 in Supplementary Information).
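The residue frequency analysis can be sketched as the percentage of each amino acid per sequence, averaged over the sequences of a category. The helper names are hypothetical, and the two toy sequences are made up to show an arginine-rich group.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def residue_percentages(sequence):
    """Percentage of each standard amino acid in one sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}

def category_average(sequences):
    """Average the per-sequence percentages over a group of sequences."""
    per_sequence = [residue_percentages(s) for s in sequences]
    return {
        aa: sum(p[aa] for p in per_sequence) / len(per_sequence)
        for aa in AMINO_ACIDS
    }

# Toy arginine-rich group (made-up sequences).
average = category_average(["RRRR", "RRKK"])
```

Averaging percentages rather than raw counts keeps long sequences from dominating the category profile.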

The Peptipedia web interface

Peptipedia offers a user-friendly web application designed to improve usability and provide a good user experience for researchers who wish to work with the computational tool proposed in this work. Figure 3 shows different visualizations available in the web interface. Upon entering the platform, the user sees a summary of the characteristics of Peptipedia and the availability of the information (Figure 3A). Remarkably, the information extraction is transparent, and links to the tools used to integrate the database collection represented by Peptipedia are provided (Figure 3B). Searching for information in the system requires selecting the biological activity to be analysed. In addition, different filters can be applied to personalize the queries according to the user’s requirements. The results are shown in a summary table and can easily be downloaded for local work (Figure 3C and D). The services enabled in Peptipedia are simple to run: the user enters sequences in FASTA format and executes the service. Depending on the selected service, the output can be a table of characteristics, as in the property characterization service (Figure 3E); a summary graph of amino acid trends, as in the statistical characterization of sequences (Figure 3F); or more complex graphs such as hydrophobicity profiles and helical wheels (Figure 3G). Finally, the tool has glossaries and messages that facilitate both the interpretation of the results and the understanding of the different concepts, properties and characteristics used in the application (Figure 3H).

Figure 3.

Different visualizations of the Peptipedia web interface. Home page of Peptipedia (A). Web tools and previously reported databases used to generate the integrated collection of peptides in the tool proposed in this work (B). Searching in Peptipedia is easy: the user can filter by activity and add more specific filters to personalize the search. The results are presented in a summary table (C and D). Peptipedia has different services or tools to increase the usability of the web platform; to use any tool, the user inserts a sequence in FASTA format and executes it. The results are displayed according to the selected service, as a summary table, a graphic summary or a specialized graphic (E, F and G). Finally, a glossary of terms is available on Peptipedia to help in understanding the results generated by the application (H).

Binary classification of biological activities supported by assembled models

We designed and implemented 44 assembled binary classification models for the biological activity of peptide sequences, considering the categories and subcategories proposed in this work. First, we encoded the sequences using amino acid encoding of physicochemical properties and their representation in frequency space, based on the strategies presented by Medina-Ortiz et al., 2020 (15). Next, we trained predictive models based on supervised learning algorithms and assembled learning strategies (15). In addition, we employed recursive binary division strategies to optimize performance measures (17). As a validation strategy, we used k-fold cross-validation to avoid model overfitting. Remarkably, all the models generated presented an accuracy of over 83% (see Table 2 and Section 5 of Supplementary Information for details). We previously compared the results obtained with this type of strategy against classical sequence coding methods, demonstrating better results (15). Furthermore, we compared our results with previously developed classification models for peptide sequences. Xiao et al., 2013 (39) proposed a classification system for antimicrobial peptides with 86% accuracy; for the same task, our model achieves 88.7%. Similarly, Yi et al., 2019 (40) proposed a classification system for anticancer peptides using a deep learning long short-term memory model, achieving an accuracy of 81.48%, while our model achieves 83.54%. Another relevant example is the identification of quorum sensing peptides (QSPs): Rajput et al., 2015 (22) proposed an identification system for QSPs based on sequence features combined with support vector machine algorithms, obtaining 93% accuracy; our accuracy for these peptides is lower, at 86.4%. Thus, in particular situations our models perform worse than previously developed methods. Nevertheless, the proposed strategy is generic and could be applied to biological activity classification of peptide sequences, prediction of properties and multiple problems in protein engineering (15). Notably, we validated all our models using statistical methods. Each data set was created by selecting random samples and repeating this process 100 times, providing statistical support and demonstrating the robustness of the biological activity classification models implemented in Peptipedia.
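The encoding step, a physicochemical property mapped along the sequence and then represented in frequency space, can be sketched as below. This is only an illustration under stated assumptions: the Kyte-Doolittle hydropathy scale is used as an example property, the signal is zero-padded to a fixed length, and a direct discrete Fourier transform is computed with the standard library. The cited method (15) may use different properties and processing.

```python
import cmath

# Kyte-Doolittle hydropathy scale, used here only as an example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def encode_property(sequence, length):
    """Map residues to property values and zero-pad to a fixed length."""
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in sequence.upper()[:length]]
    return values + [0.0] * (length - len(values))

def magnitude_spectrum(signal):
    """Normalized magnitudes of the non-negative DFT frequencies."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) / n)
    return spectrum

signal = encode_property("GIGAVLKVLTTGLPALIS", length=32)
features = magnitude_spectrum(signal)
```

The resulting fixed-length feature vector can be passed to any supervised learning algorithm regardless of the original sequence length.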

Table 2.

Weighted performance for binary classification models for the nine main categories proposed in this work.

Number	Category	Data set size	Weighted performance
1.	Cell sensing peptides	19 982	85.27
2.	Drug delivery	4912	86.02
3.	Therapeutic	50 000	87.32
4.	Neurological	2712	89.33
5.	Immunological	2178	86.12
6.	Other activities	490	82.98
7.	Transit peptide	1350	88.48
8.	Signal peptide	26 794	86.41
9.	Propeptide	17 768	88.63

Case study: how to use Peptipedia to develop predictive models

The study of anti-HIV peptides is relevant due to their potential therapeutic applications. They interact with a specific domain of the glycoprotein 41, which is their pharmacological target for inhibiting the virus fusion and entry to the host cell. Different efforts have focused on designing new sequences, either through traditional techniques such as directed evolution or rational design strategies. Both strategies currently benefit from the application of machine learning since it facilitates the simulation of the effects of new variants (36, 38).

To demonstrate the usability of different services of Peptipedia, we implemented a theoretical IC₅₀ predictive model for anti-HIV peptides. First, we identified all the anti-HIV peptides using the search tool available on the web platform and obtained all their information with the download tool. We then kept only the sequences with a quantitative IC₅₀ measure and a defined unit of measurement, discarding the cases in which the value was expressed qualitatively (low, medium or high). Finally, we prepared the data set for the training process by selecting only the peptide sequences, and the IC₅₀ values were standardized so that all the records had the same unit of measurement, in this case nM (see Table 3 for a summary of the conversions applied depending on the initial unit of measure). Based on these conditions, a data set with 428 examples was generated, with sequence lengths between 5 and 150 residues and IC₅₀ values distributed between 0.01 µM and 500 nM.

Table 3.

Summary of transformations applied to standardize IC₅₀ values to nM units of measure.

Number	Original unit of measurement	Conversion to nM
1.	mg/ml	value × 10⁹/mw
2.	ng/ml	value × 10³/mw
3.	µM	value × 10³
4.	µg/ml	value × 10⁶/mw
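The conversions in Table 3 can be written as a small lookup, sketched below. The function name `ic50_to_nm` is hypothetical, `mw` is assumed to be the molecular weight of the peptide in g/mol, and the pass-through entry for values already in nM is an addition for convenience.

```python
# Multipliers taken from Table 3; True means the result is divided by mw.
TO_NM = {
    "mg/ml": (1e9, True),
    "ng/ml": (1e3, True),
    "µg/ml": (1e6, True),
    "µM": (1e3, False),
    "nM": (1.0, False),  # pass-through, not part of Table 3
}

def ic50_to_nm(value, unit, mw=None):
    """Convert an IC50 value to nM; mw is assumed to be in g/mol."""
    factor, needs_mw = TO_NM[unit]
    if needs_mw:
        if mw is None or mw <= 0:
            raise ValueError("a positive molecular weight is required")
        return value * factor / mw
    return value * factor
```

For example, `ic50_to_nm(0.5, "µM")` gives 500 nM, and 2 µg/ml of a peptide with a molecular weight of 2000 g/mol corresponds to 1000 nM.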

Using the generated data set and the predictive model training service enabled in Peptipedia, we trained predictive models of IC₅₀ values for anti-HIV peptide sequences. For the service configuration, we selected coding by phylogenetic properties and applied post-processing using the alpha-structure property as the strategy for pre-processing the data set. We chose random forest as the supervised learning algorithm and k-fold cross-validation with k = 10 as the validation strategy.
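The validation scheme can be illustrated with a plain k-fold splitter. This sketch only shows how 428 examples are partitioned into 10 folds; the function name, the shuffling and the seed are assumptions, and the random forest training itself is not reproduced here.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 428 examples, as in the anti-HIV data set described above.
splits = list(k_fold_indices(428, k=10))
```

Every example is used for testing exactly once, and each model is evaluated only on examples it did not see during training.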

The tool reported the model’s performance, achieving a Pearson coefficient of 0.8 (see Figure 4A). Furthermore, Peptipedia allows us to analyse the randomness of the prediction error to determine whether there are biases in the generated predictions (see Figure 4B). In this way, we demonstrated the usability of the services available on Peptipedia in a specific case study on developing predictive models for anti-HIV peptides. Despite the high performance achieved by the predictive model generated using Peptipedia, more elaborate validations are still needed, as are comparisons with different coding strategies and supervised learning algorithms, the exploration of deep learning architectures and the application of assembled learning strategies. Designing and implementing predictive models for the IC₅₀ of anti-HIV peptide sequences is not the objective of this work.
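The two diagnostics mentioned above, the Pearson coefficient between observed and predicted values and the distribution of prediction errors, can be computed as in the following sketch. The helper names and the toy numbers are illustrative and are not taken from the anti-HIV model.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def residuals(observed, predicted):
    """Prediction errors (predicted minus observed)."""
    return [p - o for o, p in zip(observed, predicted)]

# Toy values for illustration only.
observed = [10.0, 25.0, 40.0, 80.0]
predicted = [12.0, 22.0, 45.0, 76.0]
r = pearson(observed, predicted)
errors = residuals(observed, predicted)
mean_error = sum(errors) / len(errors)
```

A mean error close to zero and a roughly symmetric error histogram suggest that the predictions are not systematically biased in one direction.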

Figure 4.

Predictive modelling of IC₅₀ for anti-HIV peptides using Peptipedia. A: scatter plot prediction vs reality, denoting the performance of the predictive model. In general, there is no tendency to over-adjust or under-adjust in any particular range, which shows that the cross-validation strategies were correctly applied. B: histogram of the error distribution. The probability of error analysis indicates no tendency for significant errors that adversely alter the model predictions. The errors are mainly concentrated between -5 and 5, which is quite acceptable considering the nature of the entered values, where the largest reach 100 and the smallest are close to zero.

Conclusions

We designed and implemented Peptipedia, a web application supported by machine learning algorithms and data mining strategies to characterize and analyse peptide sequences. Additionally, our tool has the most extensive database of peptides with biological activity reported so far, with a total of 92 055 amino acid sequences integrated from 30 databases or repositories of previously reported peptides. Peptipedia has enabled different tools that help in characterizing sequences and obtaining statistical properties, bioinformatics analysis supported by sequence alignments, and services that facilitate the development of predictive models.

Additionally, the sequence and the reported biological activity information of the registered peptides are integrated into a robust binary classification system, implemented through machine learning strategies, that allows putative peptide activities to be predicted. Moreover, as a step prior to wet-lab experimental work, these services can be used to screen novel peptides with unknown biological activity. Peptipedia’s tools could also improve the design of peptides, since they help to find residue patterns based on activity.

Both the usability and the wide range of services available on Peptipedia, as well as the robustness of the predictive systems implemented, considerably improve the current state of the art, becoming an attractive alternative to existing traditional applications and a good support for research in peptide engineering and its biotechnological applications.

Supplementary data

Supplementary data are available at Database Online.

Acknowledgement

This work was supported mainly by the Centre for Biotechnology and Bioengineering, CeBiB (PIA project FB0001, ANID, Chile), the Fondecyt 1180882 project and Universidad de Magallanes through the MAG1895 project. D.M.-O. gratefully acknowledges ANID, Chile, for Ph.D. fellowship 21181435. J.A.-H. gratefully acknowledges ANID for Ph.D. fellowship 21182109. B.A.-G. gratefully acknowledges ANID for Masters fellowship 22212066. A.S.-D. thanks PAI Programme (I7818010006). Powered@NLHPC: this research was partially supported by the supercomputing infrastructure of the NLHPC (ECM-02).

Conflict of interest

None declared.

Data availability

All code is available at the authors’ GitHub repository https://github.com/ProteinEngineering-PESB2/peptipedia.

Author contributions statement

Conceptualization, D.M.O. and C.Q.; methodology, C.Q. and D.M.O.; data curation, Y.B.S. and D.M.O.; validation, Á-O.N. and D.M.O.; investigation, J.A.H., A.S.D., D.M.O. and B.A.G.; writing and original draft preparation, D.M.O., A.S.D., B.A.G. and J.A.H.; writing, review and editing, Á-O.N., A.S.D., D.M.O. and J.A.H.; supervision, Á-O.N. and D.M.O.; visualization, D.M.O. and A.S.D.; project administration, Á-O.N.; and funding resources, Á-O.N.

References

1. Basith, Manavalan, Hwan Shin et al. (2020) Machine intelligence in peptide therapeutics: a next-generation tool for rapid disease screening. Med. Res. Rev., 1276–1314.

2. Blythe M.J., Doytchinova I.A. and Flower D.R. (2002) Jenpep: a database of quantitative functional peptide data for immunology. Bioinformatics, 434–439. doi: 10.1093/bioinformatics/18.3.434

3. UniProt Consortium. (2015) Uniprot: a hub for protein information. Nucleic Acids Res., D204–D212.

4. Das, Jaiswal, Khan F.N. et al. (2020) Plantpepdb: a manually curated plant peptide database. Sci. Rep.

5. D’Aloisio, Dognini, Hutcheon G.A. et al. (2021) Peptherdia: database and structural composition analysis of approved peptide therapeutics and diagnostics. Drug Discovery Today, 1409–1419. doi: 10.1016/j.drudis.2021.02.019

6. Guzmán, Barberis and Illanes (2007) Peptide synthesis: chemical or enzymatic. Electron. J. Biotechnol., 279–314.

7. Hammami, Zouhir, Le Lay et al. (2010) Bactibase second release: a database and tool platform for bacteriocin characterization. BMC Microbiol. doi: 10.1186/1471-2180-10-22

8. Jeffery C.J. (1999) Moonlighting proteins. Trends Biochem. Sci. doi: 10.1016/S0968-0004(98)01335-8

9. Kastin (2013) Handbook of Biologically Active Peptides. Academic Press.

10. Korber B.T.M., Moore J.P., Brander et al. (1998) HIV molecular immunology compendium. Los Alamos National Laboratory, Theoretical Biology and Biophysics, Los Alamos, NM.

11. Kumar, Chaudhary, Sharma et al. (2015) Ahtpdb: a comprehensive platform for analysis and presentation of antihypertensive peptides. Nucleic Acids Res., D956–D962.

12. Latham P.W. (1999) Therapeutic peptides revisited. Nat. Biotechnol., 755–757.

13. Lau J.L. and Dunn M.K. (2018) Therapeutic peptides: historical perspectives, current development trends, and future directions. Bioorg. Med. Chem., 2700–2707. doi: 10.1016/j.bmc.2017.06.052

14. Lien and Lowman H.B. (2003) Therapeutic peptides. Trends Biotechnol., 556–562. doi: 10.1016/j.tibtech.2003.10.005

15.

Medina-Ortiz

Contreras

Amado-Hinojosa

et al.

Combination of digital signal processing and assembled predictive models facilitates the rational design of proteins

. arXiv preprint arXiv:2010.03516 (

2020

16.

Medina-Ortiz

Contreras

Quiroz

et al. (

2020

)

Dmakit: a user-friendly web platform for bringing state-of-the-art data analysis techniques to non-specific users

Inf. Syst.

, 101557.doi:

10.1016/j.is.2020.101557

17.

Medina-Ortiz

Contreras

Quiroz

et al. (

2020

)

Development of supervised learning predictive models for highly non-linear biological, biomedical, and general datasets

Front. Mol. Biosci.

, 13.doi:

10.3389/fmolb.2020.00013

18.

Morrison

R.T.

and

Boyd

R.N.

(

1973

Organic Chemistry

. 3rd edn.

Allyn and Bacon

, Boston, USA.

Google Preview

10.1093/bioinformatics/btx285

19.

Müller

A.T.

Gabernet

Hiss

J.A.

et al. (

2017

)

modlAMP: python for antimicrobial peptides

Bioinformatics

2753

–

2755

.doi:

20.

Novković

Simunić

Bojović

et al. (

2012

)

Dadp: the database of anuran defense peptides

Bioinformatics

1406

–

1407

21.

Pedregosa

Varoquaux

Gramfort

et al. . (

2011

)

Scikit-learn: machine learning in python

J. Mach. Learn. Res.

2825

–

2830

10.1371/journal.pone.0120066

22.

Rajput

Gupta

A.K.

Kumar

et al. (

2015

)

Prediction and analysis of quorum sensing peptides based on sequence features

PLoS One

, e0120066.doi:

23.

Rammensee

H.G.

Bachmann

Emmerich

N.P.N.

et al. (

1999

)

Syfpeithi: database for mhc ligands and peptide motifs

Immunogenetics

213

–

219

.doi:

10.1007/s002510050595

24.

Rodríguez

Asenjo

J.A.

and

Andrews

B.A.

(

2014

)

Design and implementation of a high yield production system for recombinant expression of peptides

Microb. Cell Fact.

–

PubMed

10.1093/bioinformatics/btw753

25.

Schönbach

Koh

J.L.Y.

Sheng

et al. (

2000

)

Fimm, a database of functional molecular immunology

Nucleic Acids Res.

222

–

224

26.

Sosic

and

Sikic

(

2017

)

Edlib: a C/C++ library for fast, exact sequence alignment using edit distance

Bioinformatics

1394

–

1395

.doi:

27.

Srivastava

(ed.) (

2019

Peptide Therapeutics

The Royal Society of Chemistry

, Drug Discovery.

28.

Tossi

and

Sandri

(

2002

)

Molecular diversity in gene-encoded, cationic antimicrobial polypeptides

Curr. Pharm. Des.

743

–

761

.doi:

10.2174/1381612023395475

29.

Uhlig

Kyprianou

and

Martinelli

F.G.

et al. . (

2014

)

The emergence of peptides in the pharmaceutical business: from exploration to exploitation

EuPA Open Proteomics

–

. doi:

10.1016/j.euprot.2014.05.003

10.1371/journal.pone.0181748

30.

Usmani

S.S.

Bedi

and

Samuel

J.S.

et al. . (

2017

)

Thpdb: database of FDA-approved peptide and protein therapeutics

PLoS One

, e0181748.doi:

31.

Usmani

S.S.

Kumar

and

Kumar

et al. . (

2018

)

Antitbpdb: a knowledgebase of anti-tubercular peptides

Database

2018

, bay025 doi:

10.1093/database/bay025

10.1016/j.drudis.2009.10.009

32.

Vlieghe

Lisowski

and

Martinez

et al. . (

2010

)

Synthetic therapeutic peptides: science and market

Drug Discovery Today

–

. doi:

33.

Wang

and

Wang

(

2016

)

Apd3: the antimicrobial peptide database as a tool for research and education

Nucleic Acids Res.

D1087

–

D1093

. doi:

34.

Wang

Yin

and

Xiao

et al. . (

2018

)

StraPep: a structure database of bioactive peptides

Database

2018

, bay038. doi:

10.1093/database/bay038

10.1016/j.sbi.2021.01.008

35.

Wang

and

Wang

(

2004

)

Apd: the antimicrobial peptide database

Nucleic Acids Res.

D590

–

D592

. doi:

36.

Wittmann

B.J.

Johnston

K.E.

and

et al. . (

2021

)

Advances in machine learning for directed evolution

Curr. Opin. Struct. Biol.

–

. doi:

37.

and

et al. . (

2019

)

Recent progress in machine learning-based prediction of peptide activity for drug discovery

Curr. Top. Med. Chem.

–

. doi:

10.2174/1568026619666190122151634

38.

Johnston

K.E.

and

Arnold

F.H.

et al. . (

2021

)

Protein sequence design with deep generative models

Curr. Opin. Chem. Biol.

–

. doi:

10.1016/j.cbpa.2021.04.004

39.

Xiao

Wang

and

Lin

W.-Z.

et al. . (

2013

)

iamp-2l: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types

Anal. Biochem.

436

168

–

177

. doi:

10.1016/j.ab.2013.01.019

40.

H.-C.

You

Z.-H.

and

Zhou

et al. . (

2019

)

Acp-dl: a deep learning long short-term memory model to predict anticancer peptides using high-efficiency feature representation

Mol. Ther. Nucleic Acids

–

. doi:

10.1016/j.omtn.2019.04.025

41.

Zhao

and

et al. . (

2013

)

Lamp: a database linking antimicrobial peptides

PLoS One

, e66557. doi:

10.1371/journal.pone.0066557