IsoProDB: an integrated map of human protein isoforms for accelerated research

Abstract

Emerging studies highlight the importance of protein isoforms, which often exhibit distinct functional roles and contribute to physiological diversity, disease mechanisms, and phenotypic variation, despite originating from the same gene. However, comprehensive isoform-level resources that characterize protein isoforms remain limited. IsoProDB is an integrative and unified one-stop database that aligns protein isoforms from RefSeq and UniProtKB, enabling cross-sequence visualization for protein isoform analysis in humans. It integrates features such as domain architecture, intrinsically disordered regions, sequence variants, transmembrane topology, and 52 distinct post-translational modifications (PTMs) mapped to protein isoforms from multiple resources. Currently, IsoProDB enables users to perform gene wise comparative analyses across 110 149 protein isoforms derived from 20 536 protein-coding genes for all integrated features, supported by effective visualizations. This provides insights into conserved and nonconserved PTM sites, domains, isoform-specific membrane localization, the impact of variants on protein function, and disease relevance across protein isoforms. With specific isoforms emerging as markers and theragnostic targets for various disorders, IsoProDB is integrated with multiple global resources for easy navigation and exploration of multiomics information on isoforms.

Database URL: https://ciods.in/isoprodb

Introduction

Despite originating from the same gene, protein isoforms have the potential to exhibit distinct biological roles. This has contributed to a growing interest in exploring the structural and functional diversity of proteins at the isoform level. Mechanisms such as alternative splicing, intron retention, and alternative transcription start/stop sites serve to diversify mRNA sequences, yielding different protein isoforms [1,2]. It is estimated that >90% of human genes undergo alternative splicing, and previous studies report that each gene yields, on average, four protein isoforms [1,3]. These isoforms often differ in sequence and may have distinct structural and functional properties while retaining similarities. Moreover, aberrant regulation of protein isoforms has been associated with the development and progression of various diseases. Furthermore, isoforms display functional diversity within these disease contexts. For instance, spleen tyrosine kinase (SYK), a nonreceptor tyrosine kinase, exhibits a dual role in cancer. It acts as an oncogenic driver in leukemia [4], while in certain solid tumours, such as lung and colorectal cancers [5], it promotes cell proliferation, survival, and metastasis. In contrast, SYK acts as a tumour suppressor in breast cancer by inhibiting tumour initiation and progression [6,7]. These opposing functions are largely attributed to changes in the regulation of protein isoforms [8,9]. Moreover, recent studies indicate that drugs can miss their targets when an isoform switch occurs during synthesis of the target protein. For instance, trastuzumab (a HER2-targeted therapy) is widely used for the treatment of patients with metastatic breast tumours overexpressing HER2. However, an exon-spliced variant of HER2 is poorly recognized by the monoclonal antibody trastuzumab, resulting in drug resistance [10–12].
In addition, protein isoforms are increasingly being recognized as diagnostic biomarkers for a range of diseases, including various cancers [13–15]. This growing recognition underscores the significance of uncovering structural and expression-level differences in protein isoforms. Such analysis provides significant insights across various domains, including diagnostics, prognostics, and therapeutics, to drive global efforts in exploring disease-specific sequence, structural, and functional divergences.

Among the factors that influence protein function, post-translational modifications (PTMs) have been shown to have significant functional consequences [16]. PTMs are essential for the diversity of cellular functions, including signalling pathways [17], protein stability [18,19], protein–protein interactions, protein localization [20], and enzymatic activity [21]. They are also linked to several diseases, such as cancer [22], cardiovascular [23], renal [24], neurological [25], and metabolic disorders [26,27]. The rapid advancement of technologies in structural biology, proteomics, and pharmacology has made PTMs key targets in disease research. This focus also extends to protein isoforms, as the isoform-specific diversity of proteins can modify the regulation and overall function of a protein [28–30]. Notably, the interplay between alternative splicing and PTMs adds a further layer of complexity to protein regulation, as isoform variation can lead to the loss or gain of PTM sites, resulting in altered protein function. Such protein isoform-level variations are also significant for transmembrane (TM) proteins, which constitute over 20% of the human proteome and include enzymes, transporters, ion channels, and receptors of various physiological and nonphysiological ligands. They are critical for various cellular functions, including ion/molecule transport, cell adhesion, ligand–receptor interaction, catalysis of molecular reactions in biological membranes, and more generally, mediating cell–cell interactions and intracellular signal transduction [31,32]. Many of them serve as key drug targets due to their accessibility and regulatory roles in cellular signalling. Alternative splicing in these proteins can also lead to distinct TM topology attributable to variations in the number or amino acid position of transmembrane regions (TMRs), extracellular loops, or cytoplasmic domains [33–35]. This in turn affects ligand binding, protein interactions, ion selectivity, and signalling [36].

Accounting for these features, protein isoforms may demonstrate functional divergence due to variant constitution of their domains and intrinsically disordered regions (IDRs). Domains are typically highly conserved segments of the protein sequence that confer defined structural or functional roles [37]. Although IDRs lack a fixed three-dimensional structure, they are significant for diverse cellular processes, including signal transduction [38], transcriptional and translational regulation [39], RNA processing, cell cycle control, and small molecule storage [40,41]. Isoform-specific variation in these regions may alter protein stability, interaction networks, and functional outcomes, underscoring the need to consider domain and IDR differences when studying protein function at the isoform level. Although there are several tools to analyse the IDR in proteins, analyses at the isoform level remain largely unexplored [42–45].

Initial large-scale efforts were made to develop resources and studies that facilitate comprehensive analyses of protein expression and diversity [46,47]. In this context, the recent establishment of various splicing variant databases, such as OncoSplicing [48], AScancerAtlas [49], APPRIS [50], ISOexpresso [51], FLIBase [52], ISOdb [53], and ASpdb [54], has yielded comprehensive information pertaining to isoforms. However, resources that provide comprehensive annotation of protein-level features, including TM topology, domain organization, IDRs, and PTMs, in a single platform remain limited. Therefore, to meet these demands, we developed IsoProDB (https://ciods.in/isoprodb), an integrated unified database for the analysis of similarities and differences of protein isoforms in humans. This database includes protein isoforms from primary resources (RefSeq and UniProtKB) and provides information on PTMs, protein domain regions including isoform-specific IDRs, predicted topology of TM proteins, and a map of known genetic variants of protein isoforms along with their conservation across isoforms. Currently, IsoProDB covers 110 149 isoforms corresponding to 20 536 protein-coding genes from RefSeq and UniProtKB, offering successful matches for 46 547 protein isoforms across these resources. IsoProDB is accessible to researchers from diverse backgrounds, enabling them to explore and compare protein isoforms with ease, regardless of their bioinformatics expertise.

Materials and methods

Data resources

Experimentally validated and predicted protein isoforms corresponding to 20 536 protein-coding genes, together with the gene information, were downloaded from the RefSeq (Release 230) and UniProtKB (Release 2025_02) databases. The RefSeq dataset contained 198 024 transcripts with 101 440 protein isoforms, while the UniProtKB dataset contained 227 203 protein isoform sequences, including 42 534 reviewed (Swiss-Prot) and 184 668 unreviewed (TrEMBL) sequences. Mapping between the RefSeq and UniProt accessions of these protein isoforms was performed using in-house Python scripts based on exact full-length sequence matching. RefSeq proteins with 100% sequence identity across the entire sequence length to a UniProtKB entry were considered valid matches. Initially, all RefSeq protein isoforms were mapped against the 42 534 reviewed (Swiss-Prot) protein isoforms, and the remaining unmapped RefSeq protein isoforms were subsequently compared against the 184 668 unreviewed (TrEMBL) UniProtKB protein isoforms. This mapping identified 46 547 protein isoforms shared between RefSeq and UniProtKB, while 8710 and 56 340 protein isoforms were found exclusively in UniProtKB (reviewed entries) and RefSeq, respectively. As a result, 110 149 protein isoforms corresponding to 20 536 genes were included in IsoProDB, comprising all reviewed UniProtKB protein isoforms, only the mapped unreviewed (TrEMBL) protein isoforms from UniProtKB, and all RefSeq protein isoforms. Notably, RefSeq assigns multiple accessions to identical proteins that arise from different transcript IDs. To remove redundancy, these identical proteins are grouped using the ‘isoform’ column of the transcript table provided by RefSeq, and only one representative protein accession is displayed in the PTMs, domain, sequence variant, and topology sections of the web interface.
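The two-tier exact-match mapping described above can be illustrated with a minimal Python sketch. It is not the authors' in-house script: the function names, the tie-breaking rule (the first accession seen for a given sequence is kept), and the toy accessions and sequences are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def index_by_sequence(entries: Dict[str, str]) -> Dict[str, str]:
    """Index accessions by sequence; the first accession seen wins on ties (assumption)."""
    index: Dict[str, str] = {}
    for accession, sequence in entries.items():
        index.setdefault(sequence.strip().upper(), accession)
    return index


def map_refseq_to_uniprot(
    refseq: Dict[str, str],
    swissprot: Dict[str, str],
    trembl: Dict[str, str],
) -> Dict[str, Tuple[str, str]]:
    """Map RefSeq accessions to UniProtKB accessions by identical full-length sequence.

    Reviewed (Swiss-Prot) entries are tried first; unreviewed (TrEMBL) entries
    are consulted only for RefSeq proteins that remain unmapped.
    """
    reviewed = index_by_sequence(swissprot)
    unreviewed = index_by_sequence(trembl)
    mapping: Dict[str, Tuple[str, str]] = {}
    for accession, sequence in refseq.items():
        key = sequence.strip().upper()
        if key in reviewed:
            mapping[accession] = (reviewed[key], "Swiss-Prot")
        elif key in unreviewed:
            mapping[accession] = (unreviewed[key], "TrEMBL")
    return mapping


# Toy sequences and made-up accessions, for illustration only.
refseq = {"NP_TOY1": "MKTAYIAK", "NP_TOY2": "MKTAYIAKQR", "NP_TOY3": "MSSQ"}
swissprot = {"P_TOY1": "MKTAYIAK"}
trembl = {"A_TOY1": "MKTAYIAKQR", "A_TOY2": "MKTAYIAK"}

mapping = map_refseq_to_uniprot(refseq, swissprot, trembl)
for refseq_acc, (uniprot_acc, tier) in mapping.items():
    print(refseq_acc, uniprot_acc, tier)
```

Hashing on the full sequence string makes each lookup a dictionary access, so the comparison scales linearly with the number of sequences rather than requiring pairwise alignment. In this toy example NP_TOY3 stays unmapped, corresponding to a RefSeq-only isoform.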

Topology of TM proteins

Although TM proteins account for over 20% of the human proteome, the difficulty of determining their structures has limited the number of TM protein structures deposited in the Protein Data Bank [55,56]. To obtain the topology of all TM proteins, we used DeepTMHMM v1.0.1 [57], a protein language model-based tool built on a deep learning encoder–decoder sequence-to-sequence model that takes a protein sequence as input and outputs the corresponding per-residue sequence of labels. The per-residue labels are signal peptide (S), inside cell/cytosol (I), alpha membrane (M), beta membrane (B), periplasm (P), and outside cell/lumen of ER/Golgi/lysosomes (O). The sequence of residue labels defines the topology of the protein. We submitted the FASTA sequences of all 110 149 protein isoforms to the tool. The analysis classified 22 049 protein isoforms corresponding to 5098 genes as TM proteins, while 16 732 genes were found to have protein isoforms with zero TMRs, i.e. nontransmembrane protein isoforms. Of the TM proteins, 99.92% have alpha-helical membrane regions, and 14 protein isoforms are predicted to have beta-sheet membrane regions. Among the 5098 genes, 1294 are predicted to have at least one nontransmembrane protein isoform along with TM protein isoforms. These 1294 genes account for 11 118 protein isoforms, of which 3885 are nontransmembrane and 7233 have TMRs. Interestingly, 1230 of the 5098 genes had isoforms with varying numbers of TMRs. The 9098 isoforms with different TMR composition included both multipass and single-pass membrane proteins.
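As an illustration of how a per-residue label string defines a topology, the following Python sketch collapses the labels into contiguous segments and counts the membrane-spanning ones. It assumes the labels are already available as a plain string using the letters listed above; the function names and the example label string are hypothetical, and parsing of the actual DeepTMHMM output files is not shown.

```python
from itertools import groupby
from typing import List, Tuple

MEMBRANE_LABELS = {"M", "B"}  # alpha membrane and beta membrane


def topology_segments(labels: str) -> List[Tuple[str, int, int]]:
    """Collapse per-residue labels into (label, start, end) runs, 1-based inclusive."""
    segments = []
    position = 1
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, position, position + length - 1))
        position += length
    return segments


def count_tmrs(labels: str) -> int:
    """Count contiguous membrane-spanning runs (labels M or B)."""
    return sum(1 for label, _, _ in topology_segments(labels) if label in MEMBRANE_LABELS)


# Hypothetical 16-residue label string: inside, membrane, outside, membrane, inside.
example_labels = "IIIMMMMOOOMMMMII"
print(topology_segments(example_labels))
print(count_tmrs(example_labels))  # 2
```

Comparing `count_tmrs` across the isoforms of one gene is one simple way to flag genes whose isoforms differ in TMR number, or that mix nontransmembrane (count of zero) and TM isoforms.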

PTMs

To collate known PTMs from protein isoforms, PTM data were downloaded from the dbPTM [58], qPTM [59], and PhosphoSitePlus [60] databases. The data included PTM site, modification type, and protein accessions (UniProt accessions) along with their corresponding references. All PTM sites corresponding to each protein isoform retrieved from the databases were uniformly and nonredundantly mapped to the respective protein isoform sequences by matching UniProt accessions. The mapping process further involved cross-verification of the modified amino acid residue positions against the corresponding protein isoform sequences in the database to ensure positional accuracy and data integrity. This resulted in the mapping of 840 734 PTM sites featuring 52 types of PTMs across 31 325 protein isoforms corresponding to 19 124 genes. Phosphorylation was the most frequently observed PTM, mapped to 636 320 sites corresponding to 27 834 protein isoforms, followed by ubiquitination (165 420 sites in 20 338 protein isoforms) and methylation (131 404 sites in 20 275 protein isoforms). H3C4 (P68431) was the most heavily modified protein (18 PTM types), while 1107 proteins had no known PTMs in any of the databases.
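The nonredundant mapping and positional cross-verification can be sketched in Python as follows. This is an illustrative assumption about how such a check might look, not the authors' code: the record layout (accession, 1-based position, residue, PTM type), the function name, and the toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

PtmRecord = Tuple[str, int, str, str]  # (accession, 1-based position, residue, PTM type)


def collate_ptm_sites(
    records: Iterable[PtmRecord], sequences: Dict[str, str]
) -> List[PtmRecord]:
    """Merge PTM records from several sources, keeping each verified site once.

    A record is kept only if its accession is known and the residue reported
    at the given position matches the isoform sequence.
    """
    verified = set()
    for accession, position, residue, ptm_type in records:
        sequence = sequences.get(accession)
        if sequence is None:
            continue  # accession not present in the isoform set
        if 1 <= position <= len(sequence) and sequence[position - 1] == residue:
            verified.add((accession, position, residue, ptm_type))
    return sorted(verified)


# Toy isoform sequence and records pooled from several hypothetical sources.
sequences = {"ISO_A": "MSTYK"}
records = [
    ("ISO_A", 2, "S", "Phosphorylation"),
    ("ISO_A", 2, "S", "Phosphorylation"),  # duplicate reported by a second source
    ("ISO_A", 4, "S", "Phosphorylation"),  # residue mismatch: position 4 is Y
    ("ISO_A", 5, "K", "Ubiquitination"),
    ("ISO_A", 9, "K", "Methylation"),      # position beyond the sequence end
    ("ISO_B", 1, "M", "Acetylation"),      # unknown accession
]
sites = collate_ptm_sites(records, sequences)
print(sites)
```

Using a set keyed on the full record removes duplicates that arise when the same site is reported by more than one source database.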

To assess the conservation and uniqueness of PTM sites among protein isoforms, a sequence alignment-based conservation analysis was incorporated. The conservation of sites is analysed by aligning the protein isoform sequences using BioMSA, a JavaScript library that enables local alignment of sequences (DNA or protein) within the browser. After alignment, a nine-amino acid window centred on each PTM site was evaluated to assess sequence conservation across isoforms of the same protein. A PTM site was considered conserved if this nine-amino acid region was aligned and preserved across all protein isoforms. Along with the conservation, the alignment provides sites which are specific to protein isoforms.
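The nine-residue window criterion can be expressed as a short Python sketch. IsoProDB performs the alignment in the browser with BioMSA; the code below instead assumes that gapped, equal-length aligned sequences are already available and only illustrates the window test. Treating any gap inside the window as nonconserved, and truncating the window at the alignment ends, are assumptions of this sketch, as are the function names and toy sequences.

```python
from typing import Dict


def residue_to_column(aligned: str, position: int) -> int:
    """Return the 0-based alignment column of a 1-based ungapped residue position."""
    seen = 0
    for column, char in enumerate(aligned):
        if char != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError("position lies outside the sequence")


def is_site_conserved(
    alignment: Dict[str, str], isoform: str, position: int, flank: int = 4
) -> bool:
    """Check whether the window around a PTM site is identical in all isoforms.

    With flank=4 the window spans nine alignment columns centred on the site.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    column = residue_to_column(alignment[isoform], position)
    start = max(0, column - flank)
    end = min(lengths.pop(), column + flank + 1)
    windows = {seq[start:end] for seq in alignment.values()}
    # Conserved only if every isoform shows the same gap-free window.
    return len(windows) == 1 and "-" not in next(iter(windows))


# Toy alignment: iso3 lacks the first four residues.
alignment = {
    "iso1": "MKTAYIAKQRQISFVK",
    "iso2": "MKTAYIAKQRQISFVK",
    "iso3": "----YIAKQRQISFVK",
}
print(is_site_conserved(alignment, "iso1", 10))  # True: window IAKQRQISF is shared
print(is_site_conserved(alignment, "iso1", 5))   # False: iso3 has gaps in the window
```

A site that fails this test in some isoforms but is present in others corresponds to the isoform-specific sites mentioned above.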

Disordered regions and functional domains

Although various tools exist for exploring IDRs in proteins, their variation across isoforms remains unexplored. Moreover, information on experimentally validated IDRs in proteins is limited. The disordered regions in protein isoforms were predicted using InterProScan, a standalone tool developed by InterPro [61]. InterProScan uses MobiDB, a comprehensive database that provides annotations and predictions of IDRs in proteins from protein sequences. The analysis resulted in the prediction of IDRs in 32 389 protein isoforms belonging to 11 748 genes. Similarly, domain details of protein isoforms were analysed using InterProScan, which integrates FASTA sequence-driven domain architecture data from six different databases: Pfam [62], Conserved Domains Database (CDD) [63], PROSITE [64], CATH-Gene3D [65], PRINTS [66], and SMART [67]. As a result, IsoProDB contains comprehensive domain architecture for 40 205 protein isoforms corresponding to 14 261 genes, which is useful for comparing isoform-specific domain architecture from different domain resources and for comparative analysis with other isoforms of the same protein.

Clinically relevant sequence variants among isoforms

Sequence variation among protein isoforms plays a critical role in defining their structural/functional diversity and similarity, making it essential to analyse isoform-specific sequences when investigating protein isoform characteristics. To enable analysis of the impact of sequence variation on protein isoforms, variant data were obtained from the ClinVar [68] and gnomAD [69] databases. The data included the variation in the transcript, the subsequent change in the protein sequence, variant type, clinical significance, transcript identifier, and protein accession from both databases, and redundant entries were removed. These variant data were mapped to protein isoforms by RefSeq protein accession, resulting in a catalogue of 65 039 protein isoforms corresponding to 18 498 genes with sequence variants.
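A minimal Python sketch of the merging step is given below. It is an assumption about one reasonable way to remove redundancy and group variants by RefSeq protein accession, not the pipeline actually used: the field names, the deduplication key (protein accession, transcript change, protein change), and the toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def merge_variants(clinvar: List[dict], gnomad: List[dict]) -> Dict[str, List[dict]]:
    """Merge variant records from two sources and group them by protein accession.

    Records sharing the same accession, transcript change, and protein change
    are treated as one variant; the first record seen supplies the annotation
    and the contributing sources are tracked in a "sources" list.
    """
    merged: Dict[tuple, dict] = {}
    for source, records in (("ClinVar", clinvar), ("gnomAD", gnomad)):
        for record in records:
            key = (
                record["protein_accession"],
                record["transcript_change"],
                record["protein_change"],
            )
            entry = merged.setdefault(key, {**record, "sources": []})
            if source not in entry["sources"]:
                entry["sources"].append(source)
    by_isoform: Dict[str, List[dict]] = defaultdict(list)
    for (accession, _, _), entry in merged.items():
        by_isoform[accession].append(entry)
    return dict(by_isoform)


# Toy records with made-up accessions, for illustration only.
clinvar = [{
    "protein_accession": "NP_TOY1",
    "transcript_change": "c.10A>G",
    "protein_change": "p.Lys4Glu",
    "clinical_significance": "Pathogenic",
}]
gnomad = [
    {"protein_accession": "NP_TOY1", "transcript_change": "c.10A>G",
     "protein_change": "p.Lys4Glu"},
    {"protein_accession": "NP_TOY2", "transcript_change": "c.5C>T",
     "protein_change": "p.Thr2Ile"},
]
variants = merge_variants(clinvar, gnomad)
print(variants["NP_TOY1"][0]["sources"])
```

Processing ClinVar first means that, when both sources report the same variant, the retained record carries the clinical significance annotation.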

Technological framework for development of IsoProDB

The IsoProDB backend was developed using the Django framework running in a Docker container, and the frontend was built with React.js and styled using Tailwind CSS. The MySQL database management system was used for the storage and management of data, providing a robust and scalable solution. Together with its supporting packages, the database makes storing and retrieving data simple and fast for the application. All code and scripts used for data analysis and integration in this study are available at the GitHub repository https://github.com/sree-pathappillil/IsoProtDB. A schematic overview of the methodology is provided in Fig. 1.

Schematic representation of the workflow, methods, and data sources involved in the development of IsoProDB.

Figure 1

The schematic illustrates the methods and resources used for the development of IsoProDB.

Results and discussion

The web interface

IsoProDB is an integrative platform developed to explore the diversity of human protein isoforms. It uniquely aligns various protein isoforms from primary resources and enables cross-sequence visualization with integrated features such as protein domains, IDRs, sequence variants, TM topology, and 52 types of PTMs mapped across isoforms. IsoProDB currently supports the interactive visualization and comparative analysis of sequence, structure, and function for 110 149 protein isoforms derived from 20 536 human protein-coding genes. The web interface is organized into distinct sections that enable comparative analysis of protein isoforms in terms of PTMs, domains, and IDRs, as well as sequence variants and TM topology. Figure 2 shows an overview of the IsoProDB user interface using the example gene ABCC4.

Overview of the IsoProDB web interface with subfigures A–F, illustrating protein isoform summary, topology (TMRs), domain/IDR features, PTMs, and sequence variants with corresponding visualizations of ABCC4.

Figure 2

The figure illustrates the isoform-specific exploration of ABCC4. The search bar navigates to the summary page, where (A) the transcript table lists all the transcripts and isoforms of ABCC4 along with the protein information. (B) The topology section shows the number of TMRs in all protein isoforms along with their range in the protein sequence. (C) The domain details of three protein isoforms are shown, with the parent databases that contribute the domain details highlighted in different colours. (D) The IDRs of three protein isoforms are shown in the disordered region section within the domain tab. (E) The site table lists all the modified sites in three protein isoforms. Each site is labelled with all the reported modifications along with a sequence window, and the aligned sites are given in one row per site. (F) All the variants in the protein isoforms are listed in the table along with the transcript- and protein-level changes of ABCC4.

The Home page provides the statistics and an overview of the database along with a search bar (Fig. S1). Users can browse IsoProDB by gene symbol (e.g. ABCC4 and BRCA1) through the search bar on the homepage, which directs them to the protein summary page, where information on the query gene is provided along with a table listing its protein isoforms and transcripts (Fig. 2A; Fig. S2A). The page also provides a sequence alignment of all isoforms for analysing the regions that are conserved and those that vary across isoforms (Fig. S2B). Starting from the summary page, the user can access all five sections of the database, including PTMs, TM topology, domain, and sequence variants, each of which provides information and visualization for the unique protein isoforms of the gene of interest. Although the summary page lists all the transcripts and isoforms of the queried gene, only unique isoforms are represented in the remaining tabs.

Adjacent to the protein summary, the TM Topology section provides the predicted TM topology for each protein isoform of the query gene. The accompanying table (Fig. S3) and visualization support comparative analysis across protein isoforms, allowing users to easily identify similarities and differences in the number and range of TMRs among them (Fig. 2B). This section also includes results for nontransmembrane proteins, indicating whether they are predicted to lie inside or outside the membrane.

The PTMs section in IsoProDB includes 52 types of PTMs detected in 20 536 protein-coding genes. For each queried gene, only those of the 52 PTM types that are reported for its protein isoforms are highlighted, and users can choose any of them for analysis. Once a PTM type is selected, the sites reported with that PTM type are displayed on the corresponding protein isoform within the alignment view section, alongside a detailed data table. The section also provides a site conservation analysis, which identifies regions with PTM sites conserved across protein isoforms; this helps to infer the potential occurrence of PTMs in protein isoforms where these sites have not yet been experimentally detected (Fig. 3). The site table lists all reported PTMs across protein isoforms and shows how specific they are to each protein isoform (Fig. 2E). In this context, specificity refers to whether a modified site is unique to a specific protein isoform or conserved across various isoforms, as determined by the sequence alignment of all protein isoforms. This, along with interactive visualizations, enables a clear and effective comparative analysis of selected PTMs across different isoforms.

Figure illustrating the steps in PTM conservation analysis of ABCC4, including PTM selection, distribution and conservation of modification sites across protein isoforms, and their positions in the aligned sequence.

Figure 3

The figure illustrates the conservation analysis of PTM sites in ABCC4. (A) ABCC4 is reported with a total of five PTMs, and the user can select any of them for conservation analysis. Selecting ubiquitination lists the number of reported ubiquitinated sites across the protein isoforms of ABCC4. (B) The conservation analysis button shows all the ubiquitinated sites conserved among the isoforms and highlights them in blue within the alignment viewer in all isoforms. The diamond symbol indicates the reported ubiquitinated sites in protein isoforms.

The Domain section integrates both protein domain and IDR information along with the visualization and data table for all the protein isoforms of the queried gene (Fig. 2C and D). The data table includes domain names, amino acid ranges, and corresponding references to parent databases for more information (Figs S4 and S5). This display helps in understanding domain loss or gain, isoform-specific structural flexibility, and potential regulatory regions that contribute to the functional diversity of protein isoforms. Following the domain, the sequence variant section integrates variant information from the gnomAD and ClinVar databases. The page provides information on the variants, including transcript and protein-level consequences and identifiers, along with entry identifiers from the reference database (Fig. 2F). Moreover, the user can filter the protein isoforms and variant types in the results. This section is designed to help users understand how genetic variation may affect different isoforms, enabling insights into isoform-specific impacts of variants and their potential clinical relevance. Moreover, a graphical representation is provided in the variant section, in addition to the existing tabular data, to improve clarity and usability. Users can download the data corresponding to the queried gene in each section of the database. Furthermore, the information on genes and isoforms in IsoProDB is accessible through the download option in the database. This improves data transparency and enables users to better explore and analyse IsoProDB data. The Help page and FAQ section provide additional information to guide users when accessing and using the database. For further queries, users can reach out to the team through the Contact page.

Intended functionality and user benefits

IsoProDB is an integrative, user-friendly platform with built-in visualization, developed to explore the diverse protein isoforms in humans, particularly because their annotation often varies across primary resources and many of them remain unintegrated. To this end, IsoProDB aligns diverse protein isoforms from UniProtKB and RefSeq, providing mapping between accessions from both databases. It focuses on integrating protein features, such as domain architecture, PTMs, IDRs, TM topology, and sequence variants at the protein isoform level, providing a consolidated view of these attributes within a single framework. Recent studies underscore that variation in the number of TMRs among protein isoforms is a significant factor that deserves greater attention, as it can lead to distinct patterns of tissue-specific expression, subcellular localization, and functional divergence. For example, vascular endothelial growth factor receptor 1 (VEGFR1/FLT1) exists as both TM and soluble isoforms, illustrating how isoform diversity can have diagnostic relevance. The soluble VEGFR1 (sFLT1) isoforms serve as clinical biomarkers, as their quantification is used to determine the sFLT1:PlGF ratio, a diagnostic indicator that predicts the absence of pre-eclampsia. Although the antibodies used for this assay recognize all VEGFR1 isoforms, the test effectively measures only the soluble FLT1 isoforms since the others are TM forms (Fig. S6) [15,29,70]. In light of this, the TM topology section of IsoProDB supports the view that TMR-based isoform diversity should be recognized as a key factor in understanding protein function at the isoform level and should be more broadly discussed in the scientific community. IsoProDB provides information on the TMRs of all protein isoforms, allowing users to analyse nontransmembrane and TM isoforms, as well as variations in TMR number, for the proteins of interest.
In a similar vein, PTMs show notable differences across protein isoforms, adding another layer of functional complexity. For example, STAT3 and its phosphorylated sites have been a promising target in cancer treatments for years, and Y705 is one of them [71–75]. In our analysis, it is found that Y705 is conserved across all 16 STAT3 protein isoforms except NP_001371917.1. Phosphorylation at this site has been reported in only three isoforms, despite its conservation in 16 protein isoforms (Fig. S7). Similarly, in hepatocellular carcinoma (HCC), the isoforms of spleen tyrosine kinase, SYK(L) and SYK(S), exhibit distinct expression patterns and functional roles. SYK(L) expression is generally downregulated in HCC tumour samples compared to normal liver tissue, whereas SYK(S) levels are elevated. This differential regulation is linked to CHK1-mediated phosphorylation, which promotes proteasomal degradation of SYK(L) in HCC [76]. The phosphorylation site (S295) targeted by CHK1 is located within the spliced (DEL) region present only in SYK(L) but absent in SYK(S); consequently, elevated CHK1 levels lead to the selective loss of the full-length SYK(L) without affecting SYK(S) expression (Fig. S8). Functionally, SYK(S) enhances cellular invasion, whereas SYK(L) suppresses metastasis in HCC [77]. The PTM section in IsoProDB allows users to explore and enhance their understanding of patterns of conservation and specificity of reported PTM sites among the protein isoforms, which contributes to a better grasp of protein function, regulation, and potential therapeutic or biomarker applications. Beyond PTMs and TM topology, IsoProDB provides a quick overview of protein isoforms in terms of domain and IDR, as well as sequence variants. Moreover, the data visualization in each section is intended to provide a comparative analysis across protein isoforms for the quick understanding of differences and similarities across protein isoforms. 
Beyond comparative analysis, IsoProDB enables researchers to investigate the specificity or conservation of particular sites among isoforms and to explore their functional relevance in terms of domains, IDRs, PTMs, and clinical significance associated with sequence variants, all within a single platform.

Comparison of IsoProDB with other databases

Various isoform databases have been developed in recent years, providing a range of information on protein isoforms; these include OncoSplicing [48], AScancerAtlas [49], FLIBase [52], ISOdb [53], CanIsoNet [78], APPRIS [50], and ASpdb [54]. Most of these resources focus on specific diseases such as cancer. OncoSplicing and AScancerAtlas provide information on splicing events in cancer, while FLIBase characterizes and catalogues full-length isoforms using long-read RNA sequencing techniques. In contrast, CanIsoNet focuses on disease-specific isoform interaction networks in humans. APPRIS annotates and identifies principal splice isoforms of protein-coding genes by integrating structural, functional, and domain-based information.

Moreover, while ASpdb provides structural analysis and explores the relationship between splicing variants, diseases, and drugs in humans, and APPRIS annotates splice isoforms with protein structure, domains, and TM helices, no other resource brings together important features such as TM topology, domain composition, IDRs, PTMs, and clinical variants in one database. IsoProDB aims to provide detailed information on these features for all listed unique protein isoforms of a queried gene through a structured, unified platform. Notably, most of these databases provide information at the transcript level rather than at the protein level, whereas IsoProDB characterizes all protein isoforms derived from the RefSeq and UniProtKB databases. IsoProDB covers 110 149 isoforms corresponding to 20 536 protein-coding genes and successfully matches 46 547 protein isoforms across these resources. This comprehensive integration and unique coverage position IsoProDB as a valuable resource for in-depth isoform-level protein analysis and broaden the scope of existing resources. A comparison of IsoProDB with the existing isoform databases is provided in Table S1.

Future development and maintenance

IsoProDB brings together extensive isoform-level data. However, it has a few limitations that we anticipate addressing in future updates. First, IsoProDB assesses PTM conservation among isoforms on the basis of primary sequence alignment, owing to the limited availability of well-defined tertiary structures. This limitation restricts our ability to account for structural context when evaluating the functional conservation of PTM sites. We intend to integrate tertiary structural information in the future, facilitating a more precise and spatially informed understanding of PTM conservation across protein isoforms. Additionally, the current version of IsoProDB does not incorporate interactors of proteins at the isoform level. The inclusion of isoform-specific interaction networks in future updates would provide deeper insights into the distinct functional roles and regulatory mechanisms of individual protein isoforms. Furthermore, although clinical relevance has been partially addressed through the integration of sequence variant data, direct associations between specific isoforms and disease phenotypes are not yet established. We intend to incorporate such information as an additional data layer to further support disease-focused and clinical research applications.

Conclusion

IsoProDB serves as an integrative platform for the understanding and analysis of protein isoform diversity in humans. In recent years, protein isoform characterization has gained considerable attention since differences among isoforms can have significant functional implications in diverse areas, particularly therapeutics. The regulatory mechanisms, cellular localization, and functional properties of protein isoforms in terms of PTMs, domains, IDRs, and TM topology are not explored together in existing databases, making IsoProDB unique. These protein-level characterizations of all isoforms, regardless of their expression in specific diseases, establish IsoProDB as a reference database for users worldwide. Moreover, the inclusion of all protein isoforms from UniProtKB and RefSeq, together with the cross-references between the two databases, provides a user-friendly unified platform for analysis. Although IsoProDB currently provides protein isoform information at the sequence level, future updates will extend its coverage to include structural-level details. Overall, IsoProDB addresses a critical gap in the field by providing a powerful, isoform-centric platform, serving as a foundational resource for future discoveries in genomics, structural biology, and translational research.

Acknowledgements

We thank Yenepoya (Deemed to be University), Mangalore, India, for its support in establishing the computational facility at CIODS. This article contains no studies with human participants or animals performed by any of the authors.

Author contributions

R.R.: Conceptualization; S.U., S.S., and S.P.S.: Methodology, Software (database development); S.P.S., S.U., P.B.S., and M.N.: Data curation, Formal analysis; S.P.S.: Writing – original draft; P.R., M.A., Y.S., and R.R.: Validation, Writing – review & editing.

Conflicts of interest

The authors declare that they have no competing interests.

Funding

M.A. is thankful to the Ongoing Research Funding Program (ORF-2026-984), King Saud University, Riyadh, Saudi Arabia. S.P.S. is a recipient of the Senior Research Fellowship from Yenepoya (Deemed to be University).

Data availability

All code and scripts used for data analysis and integration in this study are available in the GitHub repository https://github.com/sree-pathappillil/IsoProtDB. The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
