Lev Tsarin, Polina V Shcherbakova, PolED: a manually curated database of functional studies of POLE and POLD1 variants reported in humans, Database, Volume 2025, 2025, baaf076, https://doi.org/10.1093/database/baaf076
Abstract
Human POLE and POLD1 genes encode DNA polymerases responsible for genome replication and proofreading of DNA synthesis errors. Germline and somatic POLE/POLD1 mutations compromising the polymerase fidelity cause cancers with high mutational burden. Ultramutation is associated with a better prognosis and immunotherapy response, highlighting the need to define tumour POLE/POLD1 status unambiguously. Prior studies assessed the functional significance of numerous POLE/POLD1 variants in experimental models. However, the data remain scattered and difficult to evaluate by non-specialists, limiting their utility for research and clinical applications. Through manual literature curation, we integrated data from functional studies of clinically relevant POLE and POLD1 variants into PolED, a publicly available database (https://poled-db.org). PolED compiles information on variant effects in biochemical assays, yeast, mammalian cells, and mouse tumour models along with supporting references. It also includes a concise summary of functional significance for each variant. PolED aims to assist in clinical decision-making, guide personalized therapy, and promote further research.
Introduction
Genome stability relies on multiple mechanisms, including DNA repair pathways, dNTP pool maintenance, accurate DNA replication, and cell cycle checkpoints. Among these, high-fidelity DNA synthesis by replicative polymerases ε (Pol ε) and δ (Pol δ) is the most critical for preventing spontaneous mutations [1, 2]. The catalytic subunits of Pol ε and Pol δ, encoded by the POLE and POLD1 genes in humans, contain a DNA polymerase domain for accurate processive DNA synthesis and an exonuclease domain for correcting rare misincorporation errors. Mutations affecting nucleotide selection by the polymerase domain or proofreading by the exonuclease domain reduce the accuracy of DNA synthesis, increase mutagenesis, and cause cancer predisposition in experimental models [3–17].
Over 12 000 non-silent POLE/POLD1 variants have been reported in humans (Supplementary Tables S1–S3). A subset of these variants impairs polymerase fidelity and drives the development of tumours with exceptionally high mutation burdens. Somatic POLE/POLD1 mutations occur across many cancer types [18] but are particularly frequent in colorectal and endometrial cancer [19, 20] and in brain tumours of children with constitutional DNA mismatch repair deficiency (CMMRD) [21, 22]. Germline POLE/POLD1 mutations cause high-penetrance adult-onset cancer predisposition syndromes [23–26] and have also been associated with paediatric malignancies [27–33]. Ultramutated tumours exhibit improved outcomes due to high neoantigen production and an enhanced anti-tumour immune response. They also respond well to immune checkpoint inhibitors [34, 35]. Thus, POLE/POLD1 mutation status is a promising predictive biomarker for immunotherapy. Additionally, detecting germline POLE/POLD1 drivers is crucial for risk assessment and long-term patient management in families with these hereditary syndromes [36, 37].
Distinguishing driver POLE/POLD1 alleles from benign variants is not straightforward. Pol ε and Pol δ are essential for DNA replication. Mutations result from error-prone DNA synthesis by these enzymes; therefore, only missense variants that preserve robust DNA synthesis capacity could theoretically drive ultramutation. Loss-of-function variants (deletions, insertions, premature stop codons) prevent production of the enzyme and cannot cause ultramutated cancers. The functional impact of most missense variants, except those at catalytic amino acid residues, is difficult to predict without experimental analysis. Since the discovery of Pol ε and Pol δ genes in the late 1980s [38–40], structure-function studies and genetic screens have identified many amino acid substitutions that disrupt polymerase fidelity and increase mutation rates [4–7, 10, 12, 13, 15, 16, 41–57]. Some of this information was instrumental in establishing the pathogenicity of the first reported human variants. The discovery of a multitude of POLE/POLD1 mutations in human cancers in the 2010s further boosted functional analysis efforts, which now focus specifically on cancer-associated variants with suspected clinical significance [22, 23, 58–85]. Due to the high conservation of polymerase functional regions, yeast model systems have been used extensively to engineer mutations analogous to human variants and determine their impact on the mutation rate. In vitro assays with purified Pol ε and Pol δ helped clarify the effects of the variants on exonucleolytic proofreading. Some variants have been engineered in cultured human cells and mice to assess the impact on mutation rates, tumour susceptibility, and immunotherapy response. The availability of functional analysis data remains an important consideration in gauging the pathogenicity of new variants. However, much of the available experimental data on POLE/POLD1 variants remains unconsolidated. In the early literature, studies of the polymerase variants do not typically discuss the link to cancer, making these data even more difficult to find. Moreover, interpreting experimental data on mutator phenotypes and DNA polymerase fidelity requires specialized expertise not easily available to clinicians or interdisciplinary researchers.
To address these issues, we developed PolED, a manually curated database that compiles functional studies on POLE and POLD1 variants reported in humans. PolED consolidates and structures data from diverse experimental systems—purified proteins, yeast, cultured mammalian cells, and mouse models—to aid in variant classification. The data are reviewed for reliability and presented in a user-friendly format, making them accessible to non-specialists. PolED assists clinicians in making patient management decisions. It also helps researchers navigate through previously investigated variants and design further studies (Fig. 1).

Figure 1. PolED features and application in variant interpretation. Most POLE/POLD1 mutations are classified as variants of unknown significance. PolED compiles functional analysis data on POLE/POLD1 variants reported in humans. This information can assist clinicians and researchers in interpreting clinical variants.
Data collection and curation
We assessed 12 820 variants (8736 POLE and 4084 POLD1; Supplementary Tables S1 and S2) for the availability of functional analysis data. We extracted these variants from cBioPortal, gnomAD, ClinVar, COSMIC, the Human Gene Mutation Database (HGMD), the Single Nucleotide Polymorphism Database (dbSNP), the Leiden Open Variation Database (LOVD) v.3.0, OncoKB, the Cancer Cell Line Encyclopedia (CCLE), and an internal Shcherbakova laboratory literature database maintained since 1990 and cross-checked against PubMed searches for the terms ‘POLE’ and ‘mutation’, or ‘POLD1’ and ‘mutation’ (Fig. 2A; Supplementary Table S3). Most were missense variants, and ∼1.5% were in-frame deletions or insertions. Among the 8736 POLE variants, 2204 were reported both in the germline and as somatic mutations in tumours, 5872 only as germline variants, and 660 only as somatic variants. Among the 4084 POLD1 variants, 1180 were reported in both germline and somatic contexts, 2480 only as germline variants, and 424 only as somatic variants (Fig. 2B).
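The germline and somatic breakdown above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the dictionary layout and function names are illustrative choices, not part of PolED.

```python
# Variant counts exactly as stated in the text (germline/somatic overlap per gene).
REPORTED = {
    "POLE": {"both": 2204, "germline_only": 5872, "somatic_only": 660, "total": 8736},
    "POLD1": {"both": 1180, "germline_only": 2480, "somatic_only": 424, "total": 4084},
}
GRAND_TOTAL = 12820  # all variants assessed


def category_sum(gene_counts: dict) -> int:
    """Add up the three mutually exclusive categories for one gene."""
    return (
        gene_counts["both"]
        + gene_counts["germline_only"]
        + gene_counts["somatic_only"]
    )


def counts_are_consistent(reported: dict, grand_total: int) -> bool:
    """True if the categories sum to each gene total and the gene totals sum to the grand total."""
    per_gene_ok = all(category_sum(c) == c["total"] for c in reported.values())
    overall_ok = sum(c["total"] for c in reported.values()) == grand_total
    return per_gene_ok and overall_ok


if __name__ == "__main__":
    print(counts_are_consistent(REPORTED, GRAND_TOTAL))
```

The three categories partition each gene's variants, so their sum should equal the per-gene total, and the two gene totals should add up to the number of variants assessed.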

Figure 2. Literature curation and database content. (A) Publications reporting human POLE/POLD1 variants and their functional studies in experimental models. See Supplementary Tables S3 and S4 for the complete list of publications. (B) Venn diagrams of germline and somatic POLE/POLD1 variants reported in the literature and public databases as of August 2024. See Supplementary Tables S1 and S2 for the complete list of variants. (C) Venn diagrams of POLE/POLD1 variants, for which functional analysis data are included in PolED. (D) Summary of PolED data by experimental model.
To collect functional analysis data, we manually curated 90 publications reporting studies of Pol ε and Pol δ variants in experimental assays. We extracted these publications, dating from 1991 to 2025, from the Shcherbakova laboratory literature database cross-checked via extensive PubMed searches. From this literature, we selected data on the variants reported in humans. The data underwent a thorough review for evidence of scientific rigour and statistical significance before inclusion in PolED. At this time, PolED references only original publications reporting a variant’s effect for the first time. Subsequent studies repeating the analysis are cited only if they add new information (e.g. the phenotype of heterozygous vs. homozygous cells, biochemical effects on different polymerase subcomplexes, or tissue-specific vs. whole-body knock-ins in mice). Figure 2A and Supplementary Table S4 show functional analysis publications included in the current version of PolED. In rare cases, PolED includes unpublished data from our laboratory if there are no analogous published reports. In these cases, the experimental assays use the same methodology and adhere to the same quality standards as our published work. These data are available upon request.
The current PolED version contains functional analysis data for 67 POLE and 69 POLD1 variants. Most occurred both in the germline and as somatic mutations in tumours. Seventeen POLE and 24 POLD1 variants were reported only in the germline, and seven POLE and five POLD1 variants were reported only as somatic mutations (Fig. 2C). Experimental models include in vitro biochemical assays with purified polymerases, in vivo mutation rate assays in yeast Saccharomyces cerevisiae or Schizosaccharomyces pombe, ex vivo mutation rate assays in cultured human or mouse cells, and in vivo mouse models (Fig. 2D). PolED also provides functional information on 32 catalytic amino acid residue variants with no direct experimental data, because structural and functional studies of other polymerases established the importance of these residues for catalysis of the exonuclease reaction.
Database design and features
The PolED web application is built with FastAPI (v0.115.12, a Python web framework) and served by Uvicorn (0.34.1, an ASGI server). It runs in a Python 3.10.12 virtual environment and is managed by systemd on a Linux server (Fig. 3, top). Variants and associated information are stored in SQLite (3.37.2) and accessed via SQLAlchemy (2.03.23). The interface is rendered using HTML, CSS, and JavaScript, with static content served by FastAPI. Visual styling uses Bootstrap (v5.3.5), and the interface is adapted for both desktop and mobile browsers. Processing scripts are written in Python and JavaScript. PolED supports access via modern web browsers, including Chrome, Firefox, and Safari.
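To make the storage layer more concrete, the following minimal sketch shows how per-gene variant records might be kept in SQLite and retrieved as JSON-ready dictionaries. It is an illustration under stated assumptions, not the PolED implementation: the table name, column names, and rows are hypothetical, and the standard-library sqlite3 module stands in for SQLAlchemy to keep the example self-contained.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical 'variants' table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets rows be converted to dictionaries
    conn.execute(
        "CREATE TABLE variants ("
        " gene TEXT NOT NULL,"
        " protein_change TEXT NOT NULL,"
        " domain TEXT,"
        " significant INTEGER NOT NULL)"
    )
    # Illustrative rows only; they are not copied from PolED.
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?, ?)",
        [
            ("POLE", "P286R", "exonuclease", 1),
            ("POLE", "L424V", "exonuclease", 1),
            ("POLD1", "S478N", "exonuclease", 1),
        ],
    )
    return conn


def variants_for_gene(conn: sqlite3.Connection, gene: str) -> list:
    """Return all rows for one gene as plain dictionaries (ready for JSON)."""
    rows = conn.execute(
        "SELECT gene, protein_change, domain, significant FROM variants "
        "WHERE gene = ? ORDER BY protein_change",
        (gene.upper(),),  # parameter binding avoids SQL injection
    )
    return [dict(row) for row in rows]


if __name__ == "__main__":
    demo = build_demo_db()
    for record in variants_for_gene(demo, "pole"):
        print(record)
```

In a web framework such as FastAPI, a function like the hypothetical `variants_for_gene` would typically sit behind a route handler, and the returned list of dictionaries would be serialized to JSON.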

Figure 3. PolED architecture and interface. The PolED database is implemented as a FastAPI-based web application, deployed using Uvicorn, and managed via systemd. The backend stores variant data in an SQLite database and serves both static files and dynamic content. Users can interact with PolED through a web-based interface or programmatically via the API. The interface allows users to explore data on individual variants or view and download summary tables of all variants with demonstrated functional significance. The API enables integration with external tools and workflows (e.g. NGS pipelines, annotation tools, external databases).
Users access the PolED database through an intuitive web interface. It includes options to explore data on individual POLE/POLD1 variants through the variant browser or view and download summary tables of all variants with demonstrated functional significance (Fig. 3, bottom left). Variants with evidence of functional significance are marked by colour in the variant browser (blue = significant; grey = not significant). Each variant has a dedicated page with a brief summary of functional effects and links to detailed summaries of available data from specific experimental systems (biochemical assays, yeast, human cells, mouse cells, or mouse tumour models). Both the variant-specific pages and the summary tables provide hyperlinks to the corresponding literature references. Users can contribute new functional analysis data through the “Submit data” page.
PolED also provides a representational state transfer (REST) application programming interface (API) (Fig. 3, bottom right). The API is documented and can be explored interactively through Swagger UI. It returns data in JavaScript Object Notation (JSON) format and supports the following two endpoints:
https://poled-db.org/API/variants/{gene} retrieves all variants for the specified gene, including information about domain location, effect on catalytic residues, a general summary, and functional significance.
https://poled-db.org/API/{variant}/ retrieves available functional analysis data for the specified variant across different experimental models.
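The two endpoints above could be called from a script roughly as follows. This is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the exact identifier format expected for `{variant}` and the structure of the returned JSON are not specified in the text, so both should be checked against the Swagger documentation before use.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://poled-db.org/API"  # base path taken from the endpoints above


def gene_variants_url(gene: str) -> str:
    """Build the URL that lists all variants for a gene (e.g. 'POLE' or 'POLD1')."""
    return f"{BASE_URL}/variants/{quote(gene, safe='')}"


def variant_url(variant: str) -> str:
    """Build the URL for one variant's functional data.

    The identifier format accepted for 'variant' is an assumption here.
    """
    return f"{BASE_URL}/{quote(variant, safe='')}/"


def fetch_json(url: str, timeout: float = 30.0):
    """Send a GET request and decode the JSON body; the schema is not assumed."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires network access; prints whatever JSON the server returns.
    print(fetch_json(gene_variants_url("POLE")))
```

Keeping URL construction separate from the network call makes the helpers easy to reuse in an annotation pipeline and to test without contacting the server.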
Conclusions and future developments
In this report, we introduce PolED, a web-based database that allows clinicians and researchers to access and analyse functional data on POLE and POLD1 variants in a user-friendly format. By consolidating experimental findings from diverse model systems and presenting them in a structured way, PolED addresses the current gap in the functional annotation of these clinically relevant variants. It emphasizes variant effects on properties relevant to the development of ultramutated cancers: exonuclease and polymerase activities, DNA synthesis fidelity, mutation rate, and tumour susceptibility. The resource facilitates variant interpretation and aids in clinical decision-making, particularly in cancer patient management, where ultramutated phenotypes have emerged as biomarkers for immunotherapy. PolED also helps interpret germline POLE and POLD1 variants to facilitate risk assessment and surveillance of families with cancer predisposition syndromes. Additionally, PolED stimulates further studies of ultramutation in cancer by systematically organizing existing information and providing a catalogue of alleles available as research tools.
We expect that the number of reported POLE and POLD1 variants will grow, and more variants will be examined in experimental assays. We will continue to evaluate new findings and incorporate them into PolED. Data discovered through literature searches or submitted by users are manually curated, processed, and incorporated into the database release within a week of discovery or submission. These data will be of interest to both clinicians and a broad community of researchers in the fields of genome instability and cancer biology.
Acknowledgements
We thank Maggie Luong, Stephanie Barbari, Dmitry Gordenin, Doug Levine, and Diego Castrillon for valuable feedback on the database.
Conflict of interest
None declared.
Funding
This work was supported by the National Institutes of Health grants CA239688 and ES015869 to P.V.S.
Data availability
PolED primarily compiles published data and is freely available at https://poled-db.org/. Unpublished data included in the database are available upon reasonable request to the corresponding author.