Abstract

Emerging studies highlight the importance of protein isoforms, which, despite originating from the same gene, often exhibit distinct functional roles and contribute to physiological diversity, disease mechanisms, and phenotypic variation. However, comprehensive resources that characterize proteins at the isoform level remain limited. IsoProDB is an integrative, unified, one-stop database that aligns human protein isoforms from RefSeq and UniProtKB and enables cross-sequence visualization for protein isoform analysis. It integrates features such as domain architecture, intrinsically disordered regions, sequence variants, transmembrane topology, and 52 distinct types of post-translational modifications (PTMs), mapped to protein isoforms from multiple resources. Currently, IsoProDB enables users to perform gene-wise comparative analyses of all integrated features across 110 149 protein isoforms derived from 20 536 protein-coding genes, supported by effective visualizations. These analyses provide insights into conserved and nonconserved PTM sites and domains, isoform-specific membrane localization, the impact of variants on protein function, and disease relevance across protein isoforms. As specific isoforms emerge as markers and theragnostic targets for various disorders, IsoProDB is integrated with multiple global resources for easy navigation and exploration of multiomics information on isoforms.

Database URL:  https://ciods.in/isoprodb

Introduction

Despite originating from the same gene, protein isoforms can exhibit distinct biological roles. This has fuelled growing interest in exploring the structural and functional diversity of proteins at the isoform level. Mechanisms such as alternative splicing, intron retention, and alternative transcription start/stop sites diversify mRNA sequences, yielding different protein isoforms [1,2]. Previous studies estimate that >90% of human genes undergo alternative splicing and that each gene yields, on average, four protein isoforms [1,3]. These isoforms often differ in sequence and, while sharing some properties, may have distinct structural and functional characteristics. Moreover, aberrant regulation of protein isoforms has been associated with the development and progression of various diseases, and isoforms display functional diversity within these disease contexts. For instance, spleen tyrosine kinase (SYK), a nonreceptor tyrosine kinase, plays a dual role in cancer. It acts as an oncogenic driver in leukemia [4] and, in certain solid tumours such as lung and colorectal cancers [5], promotes cell proliferation, survival, and metastasis. In contrast, SYK acts as a tumour suppressor in breast cancer by inhibiting tumour initiation and progression [6,7]. These opposing functions are largely attributed to changes in the regulation of its isoforms [8,9]. Recent studies also indicate that drugs can miss their targets when an isoform switch alters the target protein. For instance, trastuzumab, a HER2-targeted monoclonal antibody, is widely used to treat patients with metastatic breast tumours overexpressing HER2; however, an alternatively spliced HER2 variant is poorly recognized by trastuzumab, resulting in drug resistance [10–12].
In addition, protein isoforms are increasingly recognized as diagnostic biomarkers for a range of diseases, including various cancers [13–15]. This growing recognition underscores the importance of uncovering structural and expression-level differences among protein isoforms. Such analyses provide valuable insights for diagnostics, prognostics, and therapeutics and can drive global efforts to explore disease-specific sequence, structural, and functional divergence.

Among the factors that influence protein function, post-translational modifications (PTMs) have significant functional consequences [16]. PTMs are essential for the diversity of cellular functions, including signalling pathways [17], protein stability [18,19], protein–protein interactions, protein localization [20], and enzymatic activity [21]. They are also linked to several diseases, such as cancer [22] and cardiovascular [23], renal [24], neurological [25], and metabolic disorders [26,27]. Rapid advances in structural biology, proteomics, and pharmacology have made PTMs key targets in disease research. This focus extends to protein isoforms, as isoform-specific diversity can modify the regulation and overall function of a protein [28–30]. Notably, the interplay between alternative splicing and PTMs adds a further layer of complexity to protein regulation, as isoform variation can lead to the loss or gain of PTM sites, resulting in altered protein function. Such isoform-level variation is also significant for transmembrane (TM) proteins, which constitute over 20% of the human proteome and include enzymes, transporters, ion channels, and receptors for various physiological and nonphysiological ligands. TM proteins are critical for many cellular functions, including ion/molecule transport, cell adhesion, ligand–receptor interaction, catalysis of molecular reactions in biological membranes, and, more generally, mediating cell–cell interactions and intracellular signal transduction [31,32]. Many of them serve as key drug targets due to their accessibility and regulatory roles in cellular signalling. Alternative splicing in these proteins can also lead to distinct TM topologies through variation in the number or amino acid positions of transmembrane regions (TMRs), extracellular loops, or cytoplasmic domains [33–35].
This in turn affects ligand binding, protein interactions, ion selectivity, and signalling [36].

Accounting for these features, protein isoforms may also diverge functionally owing to differences in the composition of their domains and intrinsically disordered regions (IDRs). Domains are typically highly conserved segments of the protein sequence that confer defined structural or functional roles [37]. Although IDRs lack a fixed three-dimensional structure, they are important for diverse cellular processes, including signal transduction [38], transcriptional and translational regulation [39], RNA processing, cell cycle control, and small molecule storage [40,41]. Isoform-specific variation in these regions may alter protein stability, interaction networks, and functional outcomes, underscoring the need to consider domain and IDR differences when studying protein function at the isoform level. Although several tools exist to analyse IDRs in proteins, isoform-level analyses remain largely unexplored [42–45].

Initial large-scale efforts produced resources and studies that facilitate comprehensive analyses of protein expression and diversity [46,47]. In this context, the recent establishment of various splicing variant databases, such as OncoSplicing [48], AScancerAtlas [49], APPRIS [50], ISOexpresso [51], FLIBase [52], ISOdb [53], and ASpdb [54], has yielded comprehensive information on isoforms. However, resources that provide comprehensive annotation of protein-level features, including TM topology, domain organization, IDRs, and PTMs, in a single platform remain limited. To meet this need, we developed IsoProDB (https://ciods.in/isoprodb), an integrated, unified database for analysing the similarities and differences among human protein isoforms. The database includes protein isoforms from primary resources (RefSeq and UniProtKB) and provides information on PTMs, protein domains, isoform-specific IDRs, the predicted topology of TM proteins, and known genetic variants of protein isoforms, along with their conservation across isoforms. Currently, IsoProDB covers 110 149 isoforms corresponding to 20 536 protein-coding genes from RefSeq and UniProtKB, with 46 547 protein isoforms matched across these resources. IsoProDB is accessible to researchers from diverse backgrounds, enabling them to explore and compare protein isoforms with ease, regardless of their bioinformatics expertise.

Materials and methods

Data resources

Experimentally validated and predicted protein isoforms corresponding to 20 536 protein-coding genes, together with gene information, were downloaded from the RefSeq (Release 230) and UniProtKB (Release 2025_02) databases. The RefSeq dataset contained 198 024 transcripts with 101 440 protein isoforms, while the UniProtKB dataset contained 227 203 protein isoform sequences, including 42 534 reviewed (Swiss-Prot) and 184 668 unreviewed (TrEMBL) entries. RefSeq and UniProt accessions were mapped using in-house Python scripts based on exact full-length sequence matching: a RefSeq protein was considered a valid match to a UniProtKB entry only when the two sequences were 100% identical across their entire length. All RefSeq protein isoforms were first mapped against the 42 534 reviewed (Swiss-Prot) isoforms, and the remaining unmapped RefSeq isoforms were then compared against the 184 668 unreviewed (TrEMBL) isoforms. This mapping yielded 46 547 protein isoforms shared between RefSeq and UniProtKB, whereas 8710 and 56 340 protein isoforms were found exclusively in UniProtKB (reviewed entries) and RefSeq, respectively. IsoProDB therefore includes 110 149 protein isoforms corresponding to 20 536 genes, comprising all reviewed UniProtKB isoforms, only those unreviewed (TrEMBL) isoforms that mapped to RefSeq, and all RefSeq isoforms. Notably, RefSeq may assign multiple accessions to identical protein sequences derived from different transcript IDs. To remove this redundancy, identical proteins are grouped by the 'isoform' column of the RefSeq transcript table, and only one representative protein accession is displayed in the PTM, domain, sequence variant, and topology sections of the web interface.
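The two-pass, exact full-length matching described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' in-house script: it assumes sequences have already been parsed from FASTA into dictionaries keyed by accession, and all accessions and sequences in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def build_sequence_index(entries: Dict[str, str]) -> Dict[str, List[str]]:
    """Index accessions by their full-length amino acid sequence."""
    index: Dict[str, List[str]] = {}
    for accession, sequence in entries.items():
        index.setdefault(sequence, []).append(accession)
    return index


def map_refseq_to_uniprot(
    refseq: Dict[str, str],
    swissprot: Dict[str, str],
    trembl: Dict[str, str],
) -> Dict[str, Tuple[str, List[str]]]:
    """Map RefSeq proteins to UniProtKB entries by 100% full-length identity.

    Swiss-Prot is searched first; only RefSeq proteins left unmapped are
    then compared against TrEMBL.
    """
    swissprot_index = build_sequence_index(swissprot)
    trembl_index = build_sequence_index(trembl)
    mapping: Dict[str, Tuple[str, List[str]]] = {}
    for refseq_acc, sequence in refseq.items():
        if sequence in swissprot_index:
            mapping[refseq_acc] = ("Swiss-Prot", swissprot_index[sequence])
        elif sequence in trembl_index:
            mapping[refseq_acc] = ("TrEMBL", trembl_index[sequence])
    return mapping


# Hypothetical toy data for illustration only.
refseq = {"NP_TOY1.1": "MKTAYIAK", "NP_TOY2.1": "MKTAYIAKQR", "NP_TOY3.1": "MSSHHG"}
swissprot = {"P0TOY1": "MKTAYIAK"}
trembl = {"A0TOY1": "MKTAYIAKQR", "A0TOY2": "MKTAYIAK"}

mapping = map_refseq_to_uniprot(refseq, swissprot, trembl)
unmapped = sorted(set(refseq) - set(mapping))
print(mapping)
print(unmapped)
```

In this toy case, NP_TOY1.1 resolves to the Swiss-Prot entry even though an identical TrEMBL sequence exists, reflecting the reviewed-first order, and NP_TOY3.1 remains RefSeq-only. Keying the index on the whole sequence string turns each exact-match check into a single hash lookup.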

Topology of TM proteins

Although TM proteins account for over 20% of the human proteome, the difficulty of determining their structures has resulted in only a limited number of TM protein structures being deposited in the Protein Data Bank [55,56]. To obtain the topology of all TM proteins, we used DeepTMHMM v1.0.1 [57], a deep learning method built on a protein language model with an encoder–decoder sequence-to-sequence architecture, which takes a protein sequence as input and outputs the corresponding sequence of per-residue labels. The labels are signal peptide (S), inside cell/cytosol (I), alpha membrane (M), beta membrane (B), periplasm (P), and outside cell/lumen of ER/Golgi/lysosomes (O), and the sequence of residue labels defines the topology of the protein. The FASTA sequences of all 110 149 protein isoforms were submitted to the tool, which predicted 22 049 protein isoforms corresponding to 5098 genes to be TM proteins, while 16 732 genes were found to encode at least one isoform with zero TMRs, i.e. a nontransmembrane protein isoform. Of the TM protein isoforms, 99.92% have alpha-helical TMRs, and 14 are predicted to contain beta-sheet (beta membrane) segments. Among the 5098 genes, 1294 are predicted to have at least one nontransmembrane protein isoform alongside TM protein isoforms; these 1294 genes encode 11 118 protein isoforms, of which 3885 are nontransmembrane and 7233 have TMRs. Interestingly, 1230 of the 5098 genes had isoforms with varying numbers of TMRs, and the 9098 isoforms with differing TMR composition included both multipass and single-pass membrane proteins.
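To make the label-based topology concrete, the following Python sketch derives TMRs from a per-residue label string using the six labels listed above. It is a minimal illustration, not the pipeline used for IsoProDB: it assumes the DeepTMHMM labels have already been extracted as one string per isoform (parsing DeepTMHMM's own output files is not shown), and it counts each contiguous run of M or B as one membrane segment. The label strings are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Per-residue labels as described for DeepTMHMM:
# S signal peptide, I inside/cytosol, M alpha membrane, B beta membrane,
# P periplasm, O outside/lumen.
MEMBRANE_RUN = re.compile(r"M+|B+")


def transmembrane_regions(labels: str) -> List[Tuple[int, int, str]]:
    """Return 1-based inclusive (start, end, label) for each membrane segment."""
    return [
        (match.start() + 1, match.end(), match.group()[0])
        for match in MEMBRANE_RUN.finditer(labels)
    ]


def classify_isoform(labels: str) -> str:
    """Classify an isoform by the number of predicted membrane segments."""
    count = len(transmembrane_regions(labels))
    if count == 0:
        return "nontransmembrane"
    return "single-pass" if count == 1 else "multipass"


def has_variable_tmr_count(isoform_labels: Dict[str, str]) -> bool:
    """True if the isoforms of one gene differ in their number of TMRs."""
    counts = {len(transmembrane_regions(labels)) for labels in isoform_labels.values()}
    return len(counts) > 1


# Hypothetical label strings for two isoforms of one gene.
iso_a = "S" * 4 + "O" * 4 + "M" * 6 + "I" * 4 + "M" * 6 + "O" * 3
iso_b = "I" * 12

print(transmembrane_regions(iso_a))  # [(9, 14, 'M'), (19, 24, 'M')]
print(classify_isoform(iso_a), classify_isoform(iso_b))  # multipass nontransmembrane
print(has_variable_tmr_count({"iso_a": iso_a, "iso_b": iso_b}))  # True
```

Aggregating `has_variable_tmr_count` over all genes is one way to count genes whose isoforms differ in TMR number, such as the 1230 genes reported above.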

PTMs

To collate known PTMs of protein isoforms, PTM data were downloaded from the dbPTM [58], qPTM [59], and PhosphoSitePlus [60] databases. The data included the PTM site, modification type, and protein accession (UniProt accession), along with the corresponding references. All PTM sites retrieved for each protein isoform were uniformly and nonredundantly mapped to the respective isoform sequences by matching UniProt accessions. The modified residue positions were further cross-verified against the corresponding isoform sequences in the database to ensure positional accuracy and data integrity. In total, 840 734 PTM sites featuring 52 types of PTMs were mapped across 31 325 protein isoforms corresponding to 19 124 genes. Phosphorylation was the most frequently observed PTM, mapped to 636 320 sites in 27 834 protein isoforms, followed by ubiquitination (165 420 sites in 20 338 protein isoforms) and methylation (131 404 sites in 20 275 protein isoforms). H3C4 (P68431) was the most extensively modified protein (18 PTM types), while 1107 proteins had no known PTMs in any of the databases.
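The accession-based mapping with residue cross-verification described above might look like the following Python sketch. It is an illustration under assumptions: the `PtmRecord` fields are hypothetical stand-ins for the columns of the source databases (whose real export formats differ), and a "nonredundant site" is taken here to mean one (accession, position, PTM type) combination, with contributing sources merged.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class PtmRecord(NamedTuple):
    """One PTM row from a source database (field names are hypothetical)."""
    accession: str  # UniProt accession of the isoform
    position: int   # 1-based residue position
    residue: str    # modified amino acid reported by the source
    ptm_type: str   # e.g. "Phosphorylation"
    source: str     # e.g. "dbPTM", "qPTM", "PhosphoSitePlus"


SiteKey = Tuple[str, int, str]  # (accession, position, PTM type)


def map_ptm_sites(
    records: List[PtmRecord], sequences: Dict[str, str]
) -> Tuple[Dict[SiteKey, Set[str]], List[PtmRecord]]:
    """Merge records into nonredundant sites after checking residue positions.

    A record is kept only if its accession is known and the residue at the
    reported position matches; duplicate sites from different sources are
    merged, and the contributing sources are recorded.
    """
    sites: Dict[SiteKey, Set[str]] = {}
    rejected: List[PtmRecord] = []
    for rec in records:
        sequence = sequences.get(rec.accession)
        in_range = sequence is not None and 1 <= rec.position <= len(sequence)
        if not in_range or sequence[rec.position - 1] != rec.residue:
            rejected.append(rec)
            continue
        key = (rec.accession, rec.position, rec.ptm_type)
        sites.setdefault(key, set()).add(rec.source)
    return sites, rejected


# Hypothetical isoform sequence and PTM records.
sequences = {"P0TOY1": "MSTYKSR"}
records = [
    PtmRecord("P0TOY1", 2, "S", "Phosphorylation", "dbPTM"),
    PtmRecord("P0TOY1", 2, "S", "Phosphorylation", "PhosphoSitePlus"),
    PtmRecord("P0TOY1", 5, "K", "Ubiquitination", "qPTM"),
    PtmRecord("P0TOY1", 4, "S", "Phosphorylation", "dbPTM"),  # residue 4 is Y
]
sites, rejected = map_ptm_sites(records, sequences)
print(len(sites), len(rejected))  # 2 1
```

Rejecting mismatched positions rather than silently keeping them is what guards against annotations that refer to a different isoform's numbering.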

To assess the conservation and uniqueness of PTM sites among protein isoforms, a sequence alignment-based conservation analysis was incorporated. Isoform sequences are aligned using BioMSA, a JavaScript library that performs local alignment of DNA or protein sequences within the browser. After alignment, a nine-amino acid window centred on each PTM site is evaluated to assess sequence conservation across isoforms of the same protein; a PTM site is considered conserved if this nine-residue region is aligned and preserved across all protein isoforms. The same alignment also identifies sites that are specific to individual protein isoforms.
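The nine-residue window rule can be expressed as a short Python sketch. This does not reproduce BioMSA: it assumes a pre-computed alignment in which all isoform strings have equal length and "-" marks gaps. It also assumes that a window running past either end of the alignment counts as not conserved; the original text does not specify how sites near the termini are handled. The alignment below is hypothetical.

```python
from typing import Dict

GAP = "-"


def residue_to_column(aligned: str, position: int) -> int:
    """Convert a 1-based ungapped residue position to a 0-based alignment column."""
    seen = 0
    for column, char in enumerate(aligned):
        if char != GAP:
            seen += 1
            if seen == position:
                return column
    raise ValueError(f"position {position} exceeds the sequence length")


def is_site_conserved(
    alignment: Dict[str, str], isoform: str, position: int, flank: int = 4
) -> bool:
    """Check whether the (2 * flank + 1)-residue window around a site is
    gap-free and identical in every aligned isoform (flank=4 gives nine residues)."""
    column = residue_to_column(alignment[isoform], position)
    start, end = column - flank, column + flank + 1
    if start < 0 or end > len(alignment[isoform]):
        return False  # window runs off the alignment; treated as not conserved
    window = alignment[isoform][start:end]
    if GAP in window:
        return False
    return all(seq[start:end] == window for seq in alignment.values())


# Hypothetical pre-computed alignment of two isoforms (equal length, "-" = gap).
alignment = {
    "iso1": "MKTAYSPKLRQEEGASTRV",
    "iso2": "MKTAYSPKLRQ----STRV",
}
print(is_site_conserved(alignment, "iso1", 6))  # S6: True (window KTAYSPKLR shared)
print(is_site_conserved(alignment, "iso1", 8))  # K8: False (window overlaps iso2 gap)
```

A site that fails this check in every other isoform is a candidate isoform-specific site, which corresponds to the specificity reported in the site table.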

Disordered regions and functional domains

Although various tools exist to explore IDRs in proteins, how these regions vary across isoforms is unexplored, and information on experimentally validated IDRs is limited. Disordered regions in protein isoforms were therefore predicted using InterProScan, a standalone tool developed by InterPro [61], which uses MobiDB, a comprehensive database that provides annotations and predictions of IDRs from protein sequences. This analysis predicted IDRs in 32 389 protein isoforms belonging to 11 748 genes. Domain details of protein isoforms were likewise analysed using InterProScan, which integrates sequence-based domain architecture data from six databases: Pfam [62], Conserved Domains Database (CDD) [63], PROSITE [64], CATH-Gene3D [65], PRINTS [66], and SMART [67]. As a result, IsoProDB contains domain architecture for 40 205 protein isoforms corresponding to 14 261 genes, which is useful for comparing isoform-specific domain architectures from different domain resources and across isoforms of the same protein.
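As an illustration of how domain and IDR predictions can be separated per isoform, the Python sketch below reads InterProScan's tab-separated output, in which column 1 is the protein accession, column 4 the analysis (member database), columns 5 and 6 the signature accession and description, and columns 7 and 8 the match start and stop. The analysis labels in `DOMAIN_ANALYSES` and the disorder label `MobiDBLite` are assumptions that should be checked against the InterProScan version in use. The two-line input is hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Analysis names as they appear in column 4 of InterProScan TSV output
# (assumed; check the labels emitted by your InterProScan version).
DOMAIN_ANALYSES = {
    "Pfam", "CDD", "ProSiteProfiles", "ProSitePatterns", "Gene3D", "PRINTS", "SMART",
}
DISORDER_ANALYSIS = "MobiDBLite"


def parse_interproscan_tsv(lines: Iterable[str]):
    """Split InterProScan TSV matches into per-isoform domains and IDRs."""
    domains: Dict[str, List[Tuple[str, str, str, int, int]]] = defaultdict(list)
    idrs: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 8:
            continue  # skip blank or truncated lines
        accession, analysis = row[0], row[3]
        start, stop = int(row[6]), int(row[7])
        if analysis == DISORDER_ANALYSIS:
            idrs[accession].append((start, stop))
        elif analysis in DOMAIN_ANALYSES:
            # (source database, signature accession, description, start, stop)
            domains[accession].append((analysis, row[4], row[5], start, stop))
    return dict(domains), dict(idrs)


# Hypothetical two-line excerpt; columns: accession, MD5, length, analysis,
# signature accession, description, start, stop, score, status, date.
sample = (
    "ISO_TOY1\tmd5\t600\tPfam\tPF00005\tABC transporter\t400\t550\t1.2E-30\tT\t01-01-2025\n"
    "ISO_TOY1\tmd5\t600\tMobiDBLite\tmobidb-lite\tconsensus disorder prediction\t1\t40\t-\tT\t01-01-2025\n"
)
domains, idrs = parse_interproscan_tsv(io.StringIO(sample))
print(domains["ISO_TOY1"])
print(idrs["ISO_TOY1"])
```

Keeping the source database alongside each domain match preserves the per-database colouring used in the Domain section.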

Clinically relevant sequence variants among isoforms

Sequence variation among protein isoforms plays a critical role in defining their structural and functional diversity and similarity, making it essential to analyse isoform-specific sequences when investigating protein isoform characteristics. To enable analysis of the impact of sequence variation on protein isoforms, variant data were obtained from the ClinVar [68] and gnomAD [69] databases. From both databases, we extracted the transcript-level variation, the resulting change in the protein sequence, the variant type, clinical significance, transcript identifier, and protein accession, and removed redundant records. These variant data were mapped to protein isoforms using RefSeq protein accessions, yielding a catalogue of 65 039 protein isoforms corresponding to 18 498 genes with sequence variants.
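The redundancy removal and accession-based mapping can be sketched in Python as follows. The original text does not define what counts as a redundant record, so this sketch assumes exact duplicate records are redundant; the `Variant` fields, accessions, and HGVS strings are hypothetical placeholders rather than real ClinVar or gnomAD exports.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Variant(NamedTuple):
    """One variant record (field names are hypothetical)."""
    protein_accession: str      # RefSeq protein accession of the affected isoform
    transcript_change: str      # HGVS c. notation
    protein_change: str         # HGVS p. notation
    variant_type: str
    clinical_significance: str
    source: str                 # "ClinVar" or "gnomAD"


def catalogue_variants(
    variants: Iterable[Variant], isoform_accessions: Set[str]
) -> Dict[str, List[Variant]]:
    """Drop exact duplicate records and group the rest by isoform accession.

    Variants whose protein accession is not an IsoProDB isoform are skipped.
    """
    seen: Set[Variant] = set()
    catalogue: Dict[str, List[Variant]] = defaultdict(list)
    for variant in variants:
        if variant.protein_accession not in isoform_accessions or variant in seen:
            continue
        seen.add(variant)
        catalogue[variant.protein_accession].append(variant)
    return dict(catalogue)


# Hypothetical records; accessions are placeholders, not real RefSeq IDs.
missense = Variant("NP_TOY1.1", "c.100A>G", "p.Lys34Glu", "missense",
                   "Uncertain significance", "ClinVar")
records = [
    missense,
    missense,  # exact duplicate, removed
    Variant("NP_TOY2.1", "c.7del", "p.Val3fs", "frameshift", "Pathogenic", "ClinVar"),
    Variant("NP_OTHER.1", "c.5C>T", "p.Ala2Val", "missense", "not provided", "gnomAD"),
]
catalogue = catalogue_variants(records, {"NP_TOY1.1", "NP_TOY2.1"})
print({acc: len(vs) for acc, vs in catalogue.items()})  # {'NP_TOY1.1': 1, 'NP_TOY2.1': 1}
```

Because records are keyed by RefSeq protein accession, a variant annotated on one transcript is attached only to the isoform that transcript encodes, which is what allows isoform-specific variant comparison.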

Technological framework for development of IsoProDB

The IsoProDB backend was developed using the Django framework running in a Docker container, and the frontend was built with React.js and styled with Tailwind CSS. The MySQL database management system was used to store and manage the data, providing a robust and scalable solution that makes data storage and retrieval simple and fast for the application. All code and scripts used for data analysis and integration in this study are available at the GitHub repository https://github.com/sree-pathappillil/IsoProtDB. A schematic overview of the methodology is provided in Fig. 1.

Figure 1. Schematic representation of the workflow, methods, and data sources involved in the development of IsoProDB.

Results and discussion

The web interface

IsoProDB is an integrative platform developed to explore the diversity of human protein isoforms. It uniquely aligns protein isoforms from primary resources and enables cross-sequence visualization of integrated features such as protein domains, IDRs, sequence variants, TM topology, and 52 types of PTMs mapped across isoforms. IsoProDB currently supports interactive visualization and comparative analysis of sequence, structure, and function for 110 149 protein isoforms derived from 20 536 human protein-coding genes. The web interface is organized into distinct sections that enable comparative analysis of protein isoforms in terms of PTMs, domains, IDRs, sequence variants, and TM topology. Figure 2 shows an overview of the IsoProDB user interface using the example gene ABCC4.

Figure 2. Overview of the IsoProDB web interface, illustrating the protein isoform summary, topology (TMRs), domain/IDR features, PTMs, and sequence variants, with corresponding visualizations for ABCC4. The search bar navigates to the summary page, where (A) the transcript table lists all transcripts and isoforms of ABCC4 along with protein information. (B) The Topology section shows the number of TMRs in each protein isoform along with their ranges in the protein sequence. (C) Domain details of three protein isoforms, with the contributing parent databases highlighted in different colours. (D) IDRs of three protein isoforms, shown in the disordered region section within the Domain tab. (E) The site table lists all modified sites in three protein isoforms; each site is labelled with all reported modifications and a sequence window, and aligned sites across isoforms are shown in a single row. (F) All variants in the protein isoforms of ABCC4 are listed in the table along with their transcript- and protein-level changes.

The Home page provides statistics and an overview of the database along with a search bar (Fig. S1). Users can browse IsoProDB by gene symbol (e.g. ABCC4 or BRCA1) through the search bar, which directs them to the protein summary page, where information on the query gene is provided together with a table of its protein isoforms and transcripts (Fig. 2A; Fig. S2A). The page also provides a sequence alignment of all isoforms for identifying regions that are conserved or variable across isoforms (Fig. S2B). In addition to the summary page, users can access the PTM, TM topology, domain, and sequence variant sections, each of which provides information on and visualizations of the unique protein isoforms of the gene of interest. Whereas the summary page lists all transcripts and isoforms of the queried gene, only unique isoforms are represented in the other tabs.

Adjacent to the protein summary, the TM Topology section provides the predicted TM topology of each protein isoform of the query gene. The accompanying table (Fig. S3) and visualization support comparative analysis across protein isoforms, allowing users to easily identify similarities and differences in the number and ranges of TMRs (Fig. 2B). This section also reports nontransmembrane isoforms, indicating whether they are predicted to lie inside or outside the membrane.

The PTMs section in IsoProDB covers 52 types of PTMs mapped to protein isoforms of 19 124 genes. For each queried gene, only the PTM types reported for its protein isoforms are listed, and users can choose any of them for analysis. Once a type is selected, the sites reported with that PTM are displayed on the corresponding protein isoforms in the alignment view, alongside a detailed data table. The section also provides a site conservation analysis, which identifies PTM sites conserved across protein isoforms; this helps to infer the potential occurrence of PTMs in isoforms in which these sites have not yet been experimentally detected (Fig. 3). The site table lists all reported PTMs across protein isoforms and shows how specific each is to a given isoform (Fig. 2E). Here, specificity refers to whether a modified site is unique to a specific protein isoform or conserved across several isoforms, as determined by the sequence alignment of all protein isoforms. Together with interactive visualizations, this enables clear and effective comparative analysis of selected PTMs across isoforms.

Figure 3. Steps in the PTM conservation analysis of ABCC4, including PTM selection, the distribution and conservation of modification sites across protein isoforms, and their positions in the aligned sequence. (A) ABCC4 is reported with five PTM types, any of which the user can select for conservation analysis; selecting ubiquitination lists the number of reported ubiquitinated sites across the protein isoforms of ABCC4. (B) The conservation analysis button shows all ubiquitinated sites conserved among the isoforms and highlights them in blue in all isoforms within the alignment viewer. Diamond symbols indicate the reported ubiquitinated sites in each protein isoform.

The Domain section integrates protein domain and IDR information, with a visualization and data table for all protein isoforms of the queried gene (Fig. 2C and D). The data table includes domain names, amino acid ranges, and references to the parent databases for more information (Figs S4 and S5). This display helps in understanding domain loss or gain, isoform-specific structural flexibility, and potential regulatory regions that contribute to the functional diversity of protein isoforms. The Sequence Variant section integrates variant information from the gnomAD and ClinVar databases. The page provides information on each variant, including its transcript- and protein-level consequences and identifiers, along with entry identifiers from the reference database (Fig. 2F), and users can filter the results by protein isoform and variant type. This section is designed to help users understand how genetic variation may affect different isoforms, enabling insights into isoform-specific impacts of variants and their potential clinical relevance. In addition to the tabular data, a graphical representation is provided in the variant section to improve clarity and usability. Users can download the data for the queried gene from each section of the database, and information on all genes and isoforms in IsoProDB is available through the database's download option. This improves data transparency and enables users to better explore and analyse IsoProDB data. The Help page and FAQ section provide additional guidance on accessing and using the database, and users can reach the team through the Contact page for further queries.

Intended functionality and user benefits

IsoProDB is an integrative, user-friendly visualization platform developed to explore the diversity of human protein isoforms, particularly because their annotation often varies across primary resources and many remain unintegrated. To this end, IsoProDB aligns protein isoforms from UniProtKB and RefSeq and provides a mapping between accessions from both databases. It integrates protein features such as domain architecture, PTMs, IDRs, TM topology, and sequence variants at the isoform level, providing a consolidated view of these attributes within a single framework. Recent studies underscore that variation in the number of TMRs among protein isoforms deserves greater attention, as it can lead to distinct patterns of tissue-specific expression, subcellular localization, and functional divergence. Vascular endothelial growth factor receptor 1 (VEGFR1/FLT1), for example, exists as both TM and soluble isoforms, illustrating how isoform diversity can have diagnostic relevance. The soluble VEGFR1 (sFLT1) isoforms serve as clinical biomarkers: their quantification is used to determine the sFLT1/PlGF ratio, a diagnostic indicator that predicts the absence of pre-eclampsia. Although the antibodies used in this assay recognize all VEGFR1 isoforms, the test effectively measures only the soluble FLT1 isoforms, since the others are TM forms (Fig. S6) [15,29,70]. These examples support the view that TMR-based isoform diversity is a key factor in understanding protein function at the isoform level and should be more broadly discussed in the scientific community, and the TM topology section of IsoProDB offers a new perspective on this question. IsoProDB provides information on the TMRs of all protein isoforms, allowing users to analyse nontransmembrane and transmembrane isoforms and variation in TMR number for proteins of interest.
In a similar vein, PTMs show notable differences across protein isoforms, adding another layer of functional complexity. For example, STAT3 and its phosphorylation sites, including Y705, have long been considered promising targets in cancer therapy [71–75]. In our analysis, Y705 is conserved in 16 STAT3 protein isoforms, being absent only from NP_001371917.1, yet phosphorylation at this site has been reported in only three of them (Fig. S7). Similarly, in hepatocellular carcinoma (HCC), the spleen tyrosine kinase isoforms SYK(L) and SYK(S) exhibit distinct expression patterns and functional roles. SYK(L) expression is generally downregulated in HCC tumour samples compared with normal liver tissue, whereas SYK(S) levels are elevated. This differential regulation is linked to CHK1-mediated phosphorylation, which promotes proteasomal degradation of SYK(L) in HCC [76]. The phosphorylation site targeted by CHK1 (S295) lies within the spliced (DEL) region, which is present only in SYK(L) and absent in SYK(S); consequently, elevated CHK1 levels lead to the selective loss of full-length SYK(L) without affecting SYK(S) expression (Fig. S8). Functionally, SYK(S) enhances cellular invasion, whereas SYK(L) suppresses metastasis in HCC [77]. The PTM section in IsoProDB allows users to explore patterns of conservation and specificity of reported PTM sites among protein isoforms, contributing to a better grasp of protein function and regulation and of potential therapeutic or biomarker applications. Beyond PTMs and TM topology, IsoProDB provides a quick overview of protein isoforms in terms of domains, IDRs, and sequence variants, and the visualization in each section supports a rapid comparative view of differences and similarities across protein isoforms.
Beyond comparative analysis, IsoProDB enables researchers to investigate the specificity or conservation of particular sites among isoforms and to explore their functional relevance in terms of domains, IDRs, PTMs, and clinical significance associated with sequence variants, all within a single platform.

Comparison of IsoProDB with other databases

Various isoform databases have been developed in recent years, providing diverse information on protein isoforms, such as OncoSplicing [48], AScancerAtlas [49], FLIBase [52], ISOdb [53], CanIsoNet [78], APPRIS [50], and ASpdb [54]. Most of these resources focus on specific diseases such as cancer. OncoSplicing and AScancerAtlas provide information on splicing events in cancer, while FLIBase characterizes and catalogues full-length isoforms using long-read RNA sequencing. CanIsoNet focuses on disease-specific isoform interaction networks in humans, and APPRIS annotates and identifies principal splice isoforms of protein-coding genes by integrating structural, functional, and domain-based information.

Moreover, while ASpdb provides structural analysis and explores the relationships among splicing variants, diseases, and drugs in humans, and APPRIS annotates splice isoforms with protein structure, domains, and TM helices, no other resource brings together TM topology, domain composition, IDRs, PTMs, and clinical variants in one database. IsoProDB provides detailed information on these features for all unique protein isoforms of a queried gene through a structured, unified platform. In addition, most of these databases provide information at the transcript level rather than the protein level, whereas IsoProDB characterizes all protein isoforms derived from the RefSeq and UniProtKB databases. IsoProDB covers 110 149 isoforms corresponding to 20 536 protein-coding genes and matches 46 547 protein isoforms across these resources. This comprehensive integration and coverage position IsoProDB as a valuable resource for in-depth isoform-level protein analysis that broadens the scope of existing resources. A comparison of IsoProDB with existing isoform databases is provided in Table S1.

Future development and maintenance

IsoProDB addresses extensive isoform-level data; however, it has a few limitations that we plan to address in future updates. First, IsoProDB assesses PTM conservation among isoforms using primary sequence alignment because well-defined tertiary structures are available for only a limited number of isoforms. This restricts our ability to account for structural context when evaluating the functional conservation of PTM sites. We intend to integrate tertiary structural information in the future, enabling a more precise, spatially informed understanding of PTM conservation across protein isoforms. Second, the current version of IsoProDB does not include isoform-level protein interactors; adding isoform-specific interaction networks would provide deeper insights into the distinct functional roles and regulatory mechanisms of individual protein isoforms. Third, although clinical relevance is partially addressed through the integration of sequence variant data, direct associations between specific isoforms and disease phenotypes are not yet established. We intend to incorporate such information as an additional data layer to further support disease-focused and clinical research applications.

Conclusion

IsoProDB serves as an integrative platform for understanding and analysing protein isoform diversity in humans. In recent years, protein isoform characterization has gained considerable attention because differences among isoforms can have significant functional implications in diverse areas, particularly therapeutics. The regulatory mechanisms, cellular localization, and functional properties of protein isoforms in terms of PTMs, domains, IDRs, and TM topology are not jointly explored in existing databases, making IsoProDB unique. Its protein-level characterization of all isoforms, regardless of their expression in specific diseases, establishes IsoProDB as a reference database for users worldwide. Moreover, the inclusion of all protein isoforms from UniProtKB and RefSeq, together with cross-references between the two databases, provides a user-friendly, unified platform for analysis. Although IsoProDB currently provides protein isoform information at the sequence level, future updates will extend its coverage to structural-level details. Together, IsoProDB addresses a critical gap in the field by providing a powerful, isoform-centric platform that can serve as a foundational resource for future discoveries in genomics, structural biology, and translational research.

Acknowledgements

We thank Yenepoya (Deemed to be University), Mangalore, India, for its support in establishing the computational facility at CIODS. This article contains no studies with human participants or animals performed by any of the authors.

Author contributions

R.R.: Conceptualization; S.U., S.S., and S.P.S.: Methodology, Software (database development); S.P.S., S.U., P.B.S., and M.N.: Data curation, Formal analysis; S.P.S.: Writing – original draft; P.R., M.A., Y.S., and R.R.: Validation, Writing – review & editing.

Conflicts of interest

The authors declare that they have no competing interests.

Funding

M.A. thanks the Ongoing Research Funding Program (ORF-2026-984), King Saud University, Riyadh, Saudi Arabia. S.P.S. is a recipient of the Senior Research Fellowship from Yenepoya (Deemed to be University).

Data availability

All code and scripts used for data analysis and integration in this study are available in the GitHub repository https://github.com/sree-pathappillil/IsoProtDB. The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References

1. Wang ET, Sandberg R, Luo S et al. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008;456:470–76.

2. Reyes A, Huber W. Alternative start and termination sites of transcription drive most transcript isoform differences across human tissues. Nucleic Acids Res. 2018;46:582–92.

3. Melamud E, Moult J. Structural implication of splicing stochastics. Nucleic Acids Res. 2009;37:4862–72.

4. Buchner M, Fuchs S, Prinz G et al. Spleen tyrosine kinase is overexpressed and represents a potential therapeutic target in chronic lymphocytic leukemia. Cancer Res. 2009;69:5424–32.

5. Denis V, Cassagnard N, Del Rio M et al. Targeting the splicing isoforms of spleen tyrosine kinase affects the viability of colorectal cancer cells. PLoS One. 2022;17:e0274390.

6. Coopman PJ, Do MT, Barth M et al. The Syk tyrosine kinase suppresses malignant growth of human breast cancer cells. Nature. 2000;406:742–47.

7. Wang L, Devarajan E, He J et al. Transcription repressor activity of spleen tyrosine kinase mediates breast tumor suppression. Cancer Res. 2005;65:10289–97.

8. Krisenko MO, Geahlen RL. Calling in SYK: SYK’s dual role as a tumor promoter and tumor suppressor in cancer. Biochim Biophys Acta Mol Cell Res. 2015;1853:254–63.

9. Zhang J, Manley JL. Misregulation of pre-mRNA alternative splicing in cancer. Cancer Discov. 2013;3:1228–37.

10. Wang L, Wang Y, Li Y et al. Resistance mechanisms and prospects of trastuzumab. Front Oncol. 2024;14:1389390.

11. Wang ZH, Zheng ZQ, Jia SC et al. Trastuzumab resistance in HER2-positive breast cancer: mechanisms, emerging biomarkers and targeting agents. Front Oncol. 2022;12:1006429.

12. Mitra D, Brumlik MJ, Okamgba SU et al. An oncogenic isoform of HER2 associated with locally disseminated breast cancer and trastuzumab resistance. Mol Cancer Ther. 2009;8:2152–62.

13. Higgins G, Roper KM, Watson IJ et al. Variant Ciz1 is a circulating biomarker for early-stage lung cancer. Proc Natl Acad Sci. 2012;109:E3128–35.

14. Coverley D, Higgins G, West D et al. A quantitative immunoassay for lung cancer biomarker CIZ1b in patient plasma. Clin Biochem. 2017;50:336–43.

15. Rowson S, Reddy M, De Guingand DL et al. Comparison of circulating total sFLT-1 to placental-specific sFLT-1 e15a in women with suspected preeclampsia. Placenta. 2022;120:73–78.

16. Crowl S, Coleman MB, Chaphiv A et al. Systematic analysis of the effects of splicing on the diversity of post-translational modifications in protein isoforms using PTM-POSE. Cell Syst. 2025;16:101318.

17. Pawson T, Scott JD. Protein phosphorylation in signaling—50 years and counting. Trends Biochem Sci. 2005;30:286–90.

18. Lee JM, Hammaren HM, Savitski MM et al. Control of protein stability by post-translational modifications. Nat Commun. 2023;14:201.

19. Ciechanover A. Proteolysis: from the lysosome to ubiquitin and the proteasome. Nat Rev Mol Cell Biol. 2005;6:79–87.

20. Kuwahara H, Nishizaki M, Kanazawa H. Nuclear localization signal and phosphorylation of Serine350 specify intracellular localization of DRAK2. J Biochem. 2008;143:349–58.

21. Kubatzky KF, Gao Y, Yu D. Post-translational modulation of cell signalling through protein succinylation. Explor Target Antitumor Ther. 2023;4:1260–85.

22. Fraga MF, Ballestar E, Villar-Garea A et al. Loss of acetylation at Lys16 and trimethylation at Lys20 of histone H4 is a common hallmark of human cancer. Nat Genet. 2005;37:391–400.

23. Li S, Zhu Z, Xue M et al. Fibroblast growth factor 21 protects the heart from angiotensin II-induced cardiac hypertrophy and dysfunction via SIRT1. Biochim Biophys Acta Mol Basis Dis. 2019;1865:1241–52.

24. Xu L, Zhou Y, Wang G et al. The UDPase ENTPD5 regulates ER stress-associated renal injury by mediating protein N-glycosylation. Cell Death Dis. 2023;14:166.

25. Li C, Gotz J. Tau-based therapies in neurodegeneration: opportunities and challenges. Nat Rev Drug Discov. 2017;16:863–83.

26. Lee JH, Liu R, Li J et al. Stabilization of phosphofructokinase 1 platelet isoform by AKT promotes tumorigenesis. Nat Commun. 2017;8:949.

27. Jeon SM, Lim JS, Park SH et al. Wnt signaling promotes tumor development in part through phosphofructokinase 1 platelet isoform upregulation. Oncol Rep. 2021;46:234.

28. Voss K, Gamblin TC. GSK-3beta phosphorylation of functionally distinct tau isoforms has differential, but mild effects. Mol Neurodegen. 2009;4:18.

29. Kjer-Hansen P, Phan TG, Weatheritt RJ. Protein isoform-centric therapeutics: expanding targets and increasing specificity. Nat Rev Drug Discov. 2024;23:759–79.

30. Yuen S, Ogut O, Brozovich FV. MYPT1 protein isoforms are differentially phosphorylated by protein kinase G. J Biol Chem. 2011;286:37274–79.

31. Chataigner LMP, Leloup N, Janssen BJC. Structural perspectives on extracellular recognition and conformational changes of several type-I transmembrane receptors. Front Mol Biosci. 2020;7:129.

32. Aguayo-Ortiz R, Creech J, Jimenez-Vazquez EN et al. A multiscale approach for bridging the gap between potency, efficacy, and safety of small molecules directed at membrane proteins. Sci Rep. 2021;11:16580.

33. Xing Y, Xu Q, Lee C. Widespread production of novel soluble protein isoforms by alternative splicing removal of transmembrane anchoring domains. FEBS Lett. 2003;555:572–78.

34. Gonzalez A, Borquez M, Trigo CA et al. The splice variant of the V2 vasopressin receptor adopts alternative topologies. Biochemistry. 2011;50:4981–86.

35. Bidaux G, Beck B, Zholos A et al. Regulation of activity of transient receptor potential melastatin 8 (TRPM8) channel by its short isoforms. J Biol Chem. 2012;287:2948–62.

36. Clark MB, Wrzesinski T, Garcia AB et al. Long-read sequencing reveals the complex splicing profile of the psychiatric risk gene CACNA1C in human brain. Mol Psychiatry. 2020;25:37–47.

37. Basu MK, Poliakov E, Rogozin IB. Domain mobility in proteins: functional and evolutionary implications. Briefings Bioinf. 2009;10:205–16.

38. Iakoucheva LM, Brown CJ, Lawson JD et al. Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol. 2002;323:573–84.

39. Miao J, Chong S. Roles of intrinsically disordered protein regions in transcriptional regulation and genome organization. Curr Opin Genet Dev. 2025;90:102285.

40. Lieutaud P, Ferron F, Uversky AV et al. How disordered is my protein and what is its disorder for? A guide through the “dark side” of the protein universe. Intrinsically Disordered Proteins. 2016;4:e1259708.

41. McConnell BS, Parker MW. Protein intrinsically disordered regions have a non-random, modular architecture. Bioinformatics. 2023;39:btad732.

42. Hatos A, Hajdu-Soltesz B, Monzon AM et al. DisProt: intrinsic protein disorder annotation in 2020. Nucleic Acids Res. 2020;48:D269–D76.

43. Piovesan D, Necci M, Escobedo N et al. MobiDB: intrinsically disordered proteins in 2021. Nucleic Acids Res. 2021;49:D361–67.

44. Meszaros B, Erdos G, Dosztanyi Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 2018;46:W329–37.

45. Orlando G, Raimondi D, Codice F et al. Prediction of disordered regions in proteins with recurrent neural networks and protein dynamics. J Mol Biol. 2022;434:167579.

46. Kim MS, Pinto SM, Getnet D et al. A draft map of the human proteome. Nature. 2014;509:575–81.

47. Aravind A, Nandakumar R, Ahmed M et al. REMEMProt: a resource of membrane-enriched proteome profiles, their disease associations, and biomarker status. Life Sci Allian. 2024;7:e202302443.

48. Zhang Y, Liu K, Xu Z et al. OncoSplicing 3.0: an updated database for identifying RBPs regulating alternative splicing events in cancers. Nucleic Acids Res. 2025;53:D1460–66.

49. Wu S, Huang Y, Zhang M et al. ASCancer Atlas: a comprehensive knowledgebase of alternative splicing in human cancers. Nucleic Acids Res. 2023;51:D1196–204.

50. Rodriguez JM, Pozo F, Cerdan-Velez D et al. APPRIS: selecting functionally important isoforms. Nucleic Acids Res. 2022;50:D54–59.

51. Yang IS, Son H, Kim S et al. ISOexpresso: a web-based platform for isoform-level expression analysis in human cancer. BMC Genomics. 2016;17:631.

52. Shi Q, Li X, Liu Y et al. FLIBase: a comprehensive repository of full-length isoforms across human cancers and tissues. Nucleic Acids Res. 2024;52:D124–33.

53. Xie SQ, Han Y, Chen XZ et al. ISOdb: a comprehensive database of full-length isoforms generated by iso-seq. Int J Genomics. 2018;2018:1.

54. Yang Y, Kumar H, Xie Y et al. ASpdb: an integrative knowledgebase of human protein isoforms from experimental and AI-predicted structures. Nucleic Acids Res. 2025;53:D331–39.

55. Zhu J, Lu P. Computational design of transmembrane proteins. Curr Opin Struct Biol. 2022;74:102381.

56. Duart G, Grana-Montes R, Pastor-Cantizano N et al. Experimental and computational approaches for membrane protein insertion and topology determination. Methods. 2024;226:102–19.

57. Hallgren J, Tsirigos KD, Pedersen MD et al. DeepTMHMM predicts alpha and beta transmembrane proteins using deep neural networks. bioRxiv. 2022;2022.2004.

58. Chung CR, Tang Y, Chiu YP et al. dbPTM 2025 update: comprehensive integration of PTMs and proteomic data for advanced insights into cancer research. Nucleic Acids Res. 2025;53:D377–86.

59. Yu K, Wang Y, Zheng Y et al. qPTM: an updated database for PTM dynamics in human, mouse, rat and yeast. Nucleic Acids Res. 2023;51:D479–87.

60. Hornbeck PV, Zhang B, Murray B et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 2015;43:D512–20.

61. Paysan-Lafosse T, Blum M, Chuguransky S et al. InterPro in 2022. Nucleic Acids Res. 2023;51:D418–27.

62. Mistry J, Chuguransky S, Williams L et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 2021;49:D412–19.

63. Wang J, Chitsaz F, Derbyshire MK et al. The conserved domain database in 2023. Nucleic Acids Res. 2023;51:D384–88.

64. Sigrist CJ, de Castro E, Cerutti L et al. New and continuing developments at PROSITE. Nucleic Acids Res. 2013;41:D344–47.

65. Sillitoe I, Bordin N, Dawson N et al. CATH: increased structural coverage of functional space. Nucleic Acids Res. 2021;49:D266–73.

66. Attwood TK, Coletta A, Muirhead G et al. The PRINTS database: a fine-grained protein sequence annotation and analysis resource—its status in 2012. Database. 2012;2012:bas019.

67. Letunic I, Khedkar S, Bork P. SMART: recent updates, new developments and status in 2020. Nucleic Acids Res. 2021;49:D458–60.

68. Landrum MJ, Lee JM, Riley GR et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014;42:D980–85.

69. Karczewski KJ, Francioli LC, Tiao G et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–43.

70. Zeisler H, Llurba E, Chantraine F et al. Predictive value of the sFlt-1:plGF ratio in women with suspected preeclampsia. N Engl J Med. 2016;374:13–22.

71. Berkley K, Zalejski J, Sharma A. Targeting STAT3 for cancer therapy: focusing on Y705, S727, or dual inhibition? Cancers. 2025;17:755.

72. Liang B, Li SY, Gong HZ et al. Clinicopathological and prognostic roles of STAT3 and its phosphorylation in glioma. Dis Markers. 2020;2020:1.

73. Hashemi M, Sabouni E, Rahmanian P et al. Deciphering STAT3 signaling potential in hepatocellular carcinoma: tumorigenesis, treatment resistance, and pharmacological significance. Cell Mol Biol Lett. 2023;28:33.

74. Fukuda A, Wang SC, Morris JP et al. Stat3 and MMP7 contribute to pancreatic ductal adenocarcinoma initiation and progression. Cancer Cell. 2011;19:441–55.

75. Wang Y, Wang S, Wu Y et al. Suppression of the growth and invasion of human head and neck squamous cell carcinomas via regulating STAT3 signaling and the miR-21/beta-catenin axis with HJC0152. Mol Cancer Ther. 2017;16:578–90.

76. Hong J, Hu K, Yuan Y et al. CHK1 targets spleen tyrosine kinase (L) for proteolysis in hepatocellular carcinoma. J Clin Invest. 2012;122:2165–75.

77. Hong J, Yuan Y, Wang J et al. Expression of variant isoforms of the tyrosine kinase SYK determines the prognosis of hepatocellular carcinoma. Cancer Res. 2014;74:1845–56.

78. Karakulak T, Szklarczyk D, Saylan CC et al. CanIsoNet: a database to study the functional impact of isoform switching events in diseases. Bioinform Adv. 2023;3:vbad050.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data