ScLineageAtlas: a comprehensive single-cell genomics database for characterizing cellular clones in cancer

Abstract

Accurate identification of clonal relationships between cell populations is crucial for investigating cellular differentiation trajectories and gaining insights into the underlying mechanisms of cancer initiation and development. The Single Cell Lineage Atlas (ScLineageAtlas; https://www.scladb.geneis.org.cn) is a comprehensive single-cell genomics database that characterizes cellular clones across various cancer types. The database currently includes 24 processed single-cell RNA sequencing datasets spanning 13 different cancer types. ScLineageAtlas leverages advanced computational methods to identify cellular clones, providing researchers with a detailed understanding of clone relationships and evolutionary dynamics. Additionally, the database offers comprehensive metadata for each sample, enabling researchers to explore contextual information and sample characteristics. The spatial visualization of cell clones presented in the ScLineageAtlas provides a valuable tool for enhancing our understanding of the genetic heterogeneity within the tumour microenvironment. Through the analysis of biological differences between these diverse cell populations, researchers can explore key genes and signalling pathways associated with cancer initiation, development, and therapeutic efficacy. In summary, the ScLineageAtlas serves as a user-friendly platform for data operations on cellular clones, facilitating the understanding of tumour heterogeneity, differentiation trajectories, and evolution. It thus contributes significantly to cancer research and clinical practice.

Introduction

The identification of cellular differentiation trajectories offers valuable insights into the mechanisms underlying cancer initiation and development [1–3]. In model organisms, lineage tracing can be accomplished by introducing heritable genetic sequences into individual cells and monitoring their alterations in descendant cells, enabling researchers to delineate the differentiation pathways of cells within tissues [4, 5]. However, the application of these methods in human studies remains impractical.

The hierarchical acquisition of genetic mutations, including single nucleotide variants (SNVs) and copy number variations (CNVs), is a fundamental characteristic of cellular clonal evolution [3, 6–8]. Despite technological advancements, the use of whole-genome sequencing to detect genetic mutations in individual cells still encounters challenges, such as high costs and elevated error rates [9]. Additionally, genetic mutations within the cell nucleus are relatively infrequent [10]. As a result, reconstructing cell lineages based exclusively on nuclear genetic information remains a significant challenge.

Tumour heterogeneity is a key factor driving tumour evolution and treatment resistance [11, 12]. Different subclones respond variably to targeted therapies, and resistant subclones can expand under treatment pressure, leading to disease progression. Furthermore, even tumours of the same pathological type can exhibit different molecular characteristics in different patients. Therefore, personalizing treatment based on the unique tumour features of each patient can significantly improve treatment success rates.

Mutations in mitochondrial DNA (mtDNA) are effective endogenous genetic markers for reconstructing cellular clonal structures [13–15]. mtDNA mutations are prevalent across various tumours and are considered key factors influencing tumour heterogeneity. Research indicates that the burden of mtDNA mutations exhibits significant lineage specificity among different tumour types, with certain cancers showing markedly higher mutation rates than others. These mutations not only impact cellular metabolic homeostasis but may also exacerbate tumour heterogeneity by promoting the clonal evolution of tumour cells [16]. Kwok et al. [14] introduced MQuad, a method for identifying mitochondrial variants (mtSNVs) from single-cell RNA sequencing (scRNA-seq) and the assay for transposase-accessible chromatin with high throughput sequencing (ATAC-Seq) data for clonality inference. However, the integration of large-scale scRNA-seq data presents significant challenges, including the need for substantial data processing expertise and computational resources, as well as addressing the presence of batch effects. Therefore, establishing a platform capable of comprehensively characterizing cellular clones by integrating published data across studies is critically needed.

Recent studies have also shown that clonal evolution exhibits significant heterogeneity [16]. Clonal evolution is considered one of the most critical aspects of cancer and plays a key role in its treatment. The ability to identify new clones, particularly at their emergence, and to determine which therapeutic strategies are effective against these clones will become increasingly important in clinical settings [17]. This understanding can potentially guide more personalized and effective treatment approaches, ultimately improving patient outcomes.

Here, we present Single Cell Lineage Atlas (ScLineageAtlas), a manually curated single-cell clone database for human tumours. By integrating publicly available scRNA-seq data and characterizing clones across various cancers, ScLineageAtlas elucidates clonal relationships and evolutionary patterns, enhancing our understanding of tumour heterogeneity and differentiation.

Materials and methods

scRNA-seq data collection

We systematically collected scRNA-seq data by searching PubMed for scRNA-seq studies with the query ((cancer[Title/Abstract]) OR (tumour[Title/Abstract])) AND ((single cell RNA sequencing[Title/Abstract]) OR (scRNA-seq [Title/Abstract])). We then manually checked each publication to confirm that the raw FASTQ data generated by the CellRanger mkfastq command were publicly available from the Sequence Read Archive (SRA) repository. We downloaded the data from SRA and manually mapped FASTQ files to the corresponding samples. Through a manual review of all relevant literature and supplementary materials, we collected clinical information for each dataset, including patient age, gender, and tissue type. We also recorded details of the sequencing platforms and library preparation methods, encompassing platform type, library strategy, source, and selection criteria. Comprehensive sample metadata are provided in Supplementary Table 1.

Genotyping single cells

The pipeline genotypes each cell in the scRNA-seq data in several steps. The input consists of the FASTQ files collected as described above, from which BAM files are generated using the CellRanger count command. cellSNP-lite [18], a C/C++-based tool, efficiently genotypes bi-allelic single-nucleotide polymorphisms (SNPs) in single cells. We used cellSNP-lite (v1.2.1) with default parameters to perform pileup analysis on raw reads from the BAM files, generating two output types: (1) an SNP-by-cell matrix in VCF format and (2) a sparse matrix detailing allele depth (AD) and total sequencing depth (DP) for each cell at variant sites. Importantly, the output includes all SNPs detected in the mitochondrial genome and therefore contains significant technical noise and many non-informative variants.
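
As a minimal sketch of how the AD and DP counts relate, the snippet below computes a per-cell allele frequency as AD/DP from sparse (variant, cell) entries. The dictionary layout, the function name, and the toy counts are illustrative assumptions and are not part of cellSNP-lite.

```python
def allele_frequency(ad, dp):
    """Return {(variant, cell): AD / DP} for every entry with non-zero depth.

    ad, dp: dicts mapping (variant_index, cell_index) -> read/UMI count,
    i.e. a sparse matrix stored as triplets (hypothetical layout).
    """
    af = {}
    for key, depth in dp.items():
        if depth > 0:
            # A variant/cell pair with coverage but no AD entry has AD = 0.
            af[key] = ad.get(key, 0) / depth
    return af


# Toy data (invented): 2 variants x 2 cells; missing keys mean zero coverage.
dp = {(0, 0): 20, (0, 1): 10, (1, 0): 5}
ad = {(0, 0): 18, (1, 0): 1}
af = allele_frequency(ad, dp)
```

Entries with zero depth are left out rather than set to zero, so that uncovered sites are not mistaken for reference-allele calls.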

Quality control on inferred variant data and clonal assignments

The SNP-by-cell matrix includes all SNPs located in the mitochondrial genome derived from raw reads, resulting in significant noise and many uninformative variants. To address this, we employed MQuad (v0.1.6) to identify high-quality, informative variants using a binomial mixture model. Compared with Gaussian mixture models, the binomial mixture model treats heteroplasmy as a proportion and works directly on raw read counts. In this model, the number of reads (or UMIs) supporting the alternative allele (AD) at a given SNP is assumed to follow a binomial distribution, where the number of trials equals the combined depth of both alleles (DP) and the success probability depends on whether the variant is present. MQuad uses a model selection approach to evaluate the informativeness of each variant, which is more robust than relying solely on raw allele frequencies. It then identifies the ‘knee’ point in the distribution of SNPs to select the outlier SNPs with the highest clonal discriminative power, enabling the robust identification of genetically distinct cellular populations. Finally, vireoSNP [17] (v0.5.3) uses variational inference to cluster cells into clones based on the SNPs selected by MQuad.
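
The model selection idea can be sketched as follows: for one variant, fit a single binomial (all cells share one allele frequency) and a two-component binomial mixture (some cells carry the variant), then compare the two fits. This is an illustrative sketch only, not MQuad's implementation; the use of BIC as the criterion, the EM starting values, the function names, and the toy counts are all assumptions.

```python
import math


def log_binom(k, n, p):
    """Log binomial pmf, with p clipped away from 0 and 1 for stability."""
    p = min(max(p, 1e-9), 1 - 1e-9)
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coef + k * math.log(p) + (n - k) * math.log(1 - p)


def loglik_one(ad, dp):
    """Maximised log-likelihood when all cells share one allele frequency."""
    p = sum(ad) / sum(dp)
    return sum(log_binom(k, n, p) for k, n in zip(ad, dp))


def mixture_terms(k, n, p1, p2, w):
    """Log joint terms for the low (p1) and high (p2) components; w weights p2."""
    return math.log(1 - w) + log_binom(k, n, p1), math.log(w) + log_binom(k, n, p2)


def loglik_two(ad, dp, n_iter=200):
    """Log-likelihood of a two-component binomial mixture fitted by EM."""
    p1, p2, w = 0.05, 0.5, 0.5  # assumed starting values
    for _ in range(n_iter):
        # E-step: responsibility of the high-frequency component for each cell.
        resp = []
        for k, n in zip(ad, dp):
            a, b = mixture_terms(k, n, p1, p2, w)
            m = max(a, b)
            resp.append(math.exp(b - m) / (math.exp(a - m) + math.exp(b - m)))
        # M-step: update the mixing weight and both allele frequencies.
        w = min(max(sum(resp) / len(resp), 1e-9), 1 - 1e-9)
        p2 = sum(r * k for r, k in zip(resp, ad)) / max(
            sum(r * n for r, n in zip(resp, dp)), 1e-9)
        p1 = sum((1 - r) * k for r, k in zip(resp, ad)) / max(
            sum((1 - r) * n for r, n in zip(resp, dp)), 1e-9)
    total = 0.0
    for k, n in zip(ad, dp):
        a, b = mixture_terms(k, n, p1, p2, w)
        m = max(a, b)
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


def delta_bic(ad, dp):
    """BIC(one component) - BIC(two components); larger means more informative."""
    n_cells = len(ad)
    bic_one = 1 * math.log(n_cells) - 2 * loglik_one(ad, dp)  # 1 free parameter
    bic_two = 3 * math.log(n_cells) - 2 * loglik_two(ad, dp)  # p1, p2, w
    return bic_one - bic_two


# Toy counts (invented): 8 cells, depth 20 each.
informative = delta_bic([0, 0, 1, 0, 10, 9, 11, 10], [20] * 8)
noise = delta_bic([2, 1, 2, 3, 2, 1, 2, 2], [20] * 8)
```

In this toy example the variant present in only half of the cells receives a positive score, whereas the variant with uniformly low allele frequency is penalised for the extra parameters and scores below zero.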

Quality control and batch effect correction on scRNA-seq data

The scRNA-seq expression matrix was obtained from the Gene Expression Omnibus, from the website provided by the original paper, or from the output of the CellRanger count command. Using scanpy (v1.8.2), we filtered out low-quality cells and rarely expressed genes according to quality control parameters. For each dataset, after normalization to UPM (UMI per million) and integration across samples, batch effects were removed using Harmony and BBKNN [19].
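
For illustration, UPM normalization rescales the UMI counts of each cell so that they sum to one million. The sketch below shows the arithmetic for a single cell in plain Python; the function name and the toy counts are assumptions, and in practice this step would be carried out with the scanpy toolkit named above.

```python
def normalize_upm(counts):
    """Scale one cell's UMI counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        # An empty cell stays all-zero instead of raising a division error.
        return {gene: 0.0 for gene in counts}
    return {gene: umi * 1_000_000 / total for gene, umi in counts.items()}


# Toy cell (invented counts) with 100 UMIs in total.
cell = {"EGR1": 3, "ELF3": 1, "ACTB": 96}
upm = normalize_upm(cell)
```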

Cell clustering and annotation

BBKNN constructs a k-nearest-neighbour graph for the whole dataset while accounting for batch effects. Using this graph, scanpy performs Leiden clustering, an unsupervised graph-based method, to cluster cells. Cell-type annotation of the scRNA-seq data follows the methods described in the corresponding paper. When the metadata provided by the original authors already contain cell-type annotations, we use them rather than repeating the whole procedure.

Visualization of individual cells

Cell-type visualization of a dataset was performed using both t-distributed stochastic neighbour embedding and the uniform manifold approximation and projection (UMAP) method. The UMAP embedding was computed from the BBKNN graph.

NEBULA analysis of scRNA-seq data

The negative binomial mixed model using large sample approximation (NEBULA) is a new fast algorithm for differential gene expression analysis of scRNA-seq data [20]. Using the single-cell data and the clone labels as input, we discern the differentially expressed genes (DEGs) between two specific clones within a particular cell type.

Database construction

The front-end of the ScLineageAtlas website was developed using Vue 3.3.0 and Element Plus version 2.3.5. The back-end was developed using Java and Spring Boot. Data storage and management were performed using MySQL version 5.7. ECharts version 5.4.2 was used to create interactive tables and visualize the results. All upstream and downstream analyses were performed using R version 4.1.2 and Python version 2.7 on the Linux operating system.

Results

Overview of ScLineageAtlas

ScLineageAtlas employs a standardized workflow that includes quality control, normalization, batch correction, and meticulous manual curation of meta-information. Coupled with an integrated clonality discovery pipeline (cellSNP-MQuad-VireoSNP), this workflow identifies mtDNA variants and infers clonal relationships between cells (Fig. 1). The platform also provides a suite of visualization tools that facilitate the elucidation of clonal structures and evolutionary patterns within cancer tissues.

Figure 1.

Scheme of the ScLineageAtlas portal.

Summary of datasets in ScLineageAtlas

Currently, the database comprises 24 datasets and 129 samples across 13 cancer types (Fig. 2a and b). The total number of clones is 397, ranging from two to six across various datasets (Fig. 2c). The number of cells assigned to clones totals 526 697, ranging from 90 to 18 512 across various datasets (Fig. 2d). In ScLineageAtlas, the three cancers with the highest total number of clones are gastric cancer, prostate cancer, and ovarian cancer (Fig. 2e). Conversely, the cancers with the highest average number of clones per sample are pancreatic cancer, followed by gastric cancer and prostate cancer (Fig. 2f). Pancreatic cancer is a highly lethal malignancy characterized by significant inter- and intra-tumoural heterogeneity [21, 22].

Figure 2.

Statistics in ScLineageAtlas. (a) Number of datasets summarized by cancer type. (b) Number of samples summarized by cancer type. (c) Clone counts summarized by dataset. (d) Assigned and unassigned cell counts summarized by dataset. (e) and (f) illustrate the distribution of total clone numbers and average clone numbers per sample across different cancer types, respectively. PC: pancreatic cancer; OV: ovarian cancer; CRC: colorectal cancer; GC: gastric cancer; HNSCC: head and neck squamous cell carcinoma; PTC: papillary thyroid carcinoma; PCa: prostate cancer; BRCA: breast cancer; HSTCL: hepatosplenic T-cell lymphoma; LEU: leukaemia; LUAD: lung adenocarcinoma; MESO: mesothelioma; and MM: multiple myeloma.

Comparison with other mainstream single-cell genomics databases

We have systematically compiled and analysed ScLineageAtlas alongside existing databases, examining various dimensions such as primary functionality, biological focus, and database scale. The results of this comparison are presented in Table 1.

Table 1.

Comparison of ScLineageAtlas with other mainstream single-cell genomics databases.

Database	Main functionality	Biological focus	Cell count	Dataset count	Disease/cancer types	Species
scMethBank	Single-base resolution single-cell DNA methylation profiling	Epigenetic heterogeneity, cell state regulation	8328	15	2	Human, mouse
scCancerExplorer	Integrated analysis and visualization of single-cell multi-omics data (genome + epigenome + transcriptome)	Pan-cancer molecular mechanisms, multi-omics correlations	6 200 000	161	50	Human
HSCGD	Single-cell whole-genome mutation profiling (SNVs/CNVs)	Genomic instability, somatic mutation accumulation	74 154	63	8	Human
ScLineageAtlas	Clonal lineage reconstruction and evolutionary analysis at single-cell resolution	Tumour clonal evolution, heterogeneity, and differentiation	526 697	24	13	Human

ScLineageAtlas is the first platform to analyse tumour heterogeneity through the lens of clonal dynamics. Its lineage tracing feature offers an essential tool for investigating the mechanisms of treatment resistance and metastasis.

User-friendly searching modules to efficiently retrieve data

The home page of ScLineageAtlas utilizes interactive human organism maps to visually represent the cancer type information stored in the database (Fig. 3a). Users are able to click on the organ icon for direct navigation to the corresponding dataset of interest. Besides visual navigation, the home page also provides a search function, enabling users to query the database with multiple parameters, including organ, cancer type, bioproject ID, the Gene Expression Omnibus (GEO) accession, and sample alias.

Figure 3.

Data retrieval functionality. (a) Interactive human organism maps visually representing cancer type information; (b) a search interface based on clinical information; and (c) a search interface based on reference metadata.

The ScLineageAtlas platform also provides a dedicated ‘Search’ page for users to quickly obtain the information of interest. This search interface includes two distinct approaches: searching based on clinical information (Fig. 3b) and searching based on reference metadata (Fig. 3c). To facilitate more targeted searches, users can utilize a drop-down menu to select specific criteria, such as organ, cancer type, or data source. All search results are efficiently displayed in a tabular format and are readily downloadable.

Comprehensive multiple-dimensional online data exploration

ScLineageAtlas provides users with eight interactive modules designed to facilitate a comprehensive and in-depth exploration of clone relationships and evolutionary dynamics. By clicking the ‘Detail’ button within the search result table, users can access all the analysis results for their selected samples.

Exploring the spatial distribution of clones

ScLineageAtlas empowers users to explore the spatial distribution of clones through the readily accessible ‘Overview’ menu. The platform delivers two key visualizations to facilitate this crucial analysis. The UMAP module depicts the classification of individual cells in a visually compelling manner (Fig. 4a). Distinct colours are utilized to represent the different cell types, with the intensity of the colour corresponding to a specific clone subtype. The Clone Abundance Module displays the percentage of each clone within the specific cell type (Fig. 4b). Each colour within this module signifies a distinct clone, providing invaluable insights into the relative abundance of each clone across the multifarious cell types. These interactive visualizations empower users to gain a comprehensive understanding of the spatial distribution and clonal composition of the samples within the ScLineageAtlas.
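
The quantity shown in the Clone Abundance Module, the percentage of each clone within a cell type, can be sketched as a simple tally. The input layout (one (cell type, clone) pair per cell), the function name, and the toy labels below are illustrative assumptions rather than the platform's actual code.

```python
from collections import Counter, defaultdict


def clone_fractions(cells):
    """Return {cell_type: {clone: percent of that cell type's cells}}.

    cells: iterable of (cell_type, clone) pairs, one pair per cell.
    """
    counts = defaultdict(Counter)
    for cell_type, clone in cells:
        counts[cell_type][clone] += 1
    return {
        cell_type: {
            clone: 100.0 * n / sum(per_clone.values())
            for clone, n in per_clone.items()
        }
        for cell_type, per_clone in counts.items()
    }


# Toy labels (invented): three T cells and one fibroblast.
cells = [
    ("T cell", "clone 1"),
    ("T cell", "clone 1"),
    ("T cell", "clone 2"),
    ("Fibroblast", "clone 2"),
]
fractions = clone_fractions(cells)
```

Percentages are computed within each cell type, so the values for one cell type sum to 100 regardless of how many cells it contains.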

Figure 4.

Data exploration functionality. (a) UMAP of cell types, where each dot represents a cell and different colours indicate distinct cell types. (b) Bar plot displaying clone fractions in different cell types. (c) Heatmap showing the probability of each cell being assigned to each clone, where each row represents a cell and each column represents a clone. (d) Heatmap of mean allelic frequency of mtDNA variants in each clone, with each row representing an mtDNA SNV and each column representing a clone. (e) Heatmap of allele frequency for clonally discriminative mtDNA variants, where each row represents an mtDNA SNV and each column represents a cell. (f) A volcano plot comparing two specific clones was generated based on DEGs. (g) A bubble plot illustrating the Gene Ontology (GO) enrichment analysis for the two clones based on DEGs. (h) A bubble plot displaying the Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis for the two clones based on DEGs.

Evaluating genetic heterogeneity

The ‘Identification of Clones’ menu provides users with access to critical information pertaining to cell clone assignment probabilities and allele frequencies of clonally informative mtDNA variants. The Cell Assignment Probability module presents a visualization of the likelihood of each cell being assigned to a specific clone (Fig. 4c), empowering researchers to assess the confidence of the cell assignments. The Mean mtSNV Allelic Frequency module (Fig. 4d) and mtSNV Allelic Frequency module (Fig. 4e) decipher the mtSNV mutational profiles of individual clones, allowing users to identify highly clone-specific variants and evaluate the extent of genetic heterogeneity. These modules offer invaluable insights into the characteristics and genetic makeup of the individual clones within the analysed samples.
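
To make the two kinds of summary concrete, the sketch below turns a cell-by-clone probability table into clone labels and then averages the allele frequency of one variant within each clone. The 0.9 confidence threshold, the function names, and the toy values are assumptions for illustration and are not taken from the ScLineageAtlas pipeline.

```python
def assign_clones(prob_rows, threshold=0.9):
    """Label each cell with its most probable clone, or None if not confident.

    prob_rows: one list of clone probabilities per cell.
    threshold: assumed minimum probability for a confident assignment.
    """
    labels = []
    for row in prob_rows:
        best = max(range(len(row)), key=lambda i: row[i])
        labels.append(best if row[best] >= threshold else None)
    return labels


def mean_af_by_clone(af_values, labels):
    """Average one variant's allele frequency over the cells of each clone."""
    totals, counts = {}, {}
    for af, clone in zip(af_values, labels):
        if clone is None:
            continue  # unassigned cells do not contribute to any clone
        totals[clone] = totals.get(clone, 0.0) + af
        counts[clone] = counts.get(clone, 0) + 1
    return {clone: totals[clone] / counts[clone] for clone in totals}


# Toy data (invented): four cells, two clones, one variant.
prob_rows = [[0.95, 0.05], [0.50, 0.50], [0.02, 0.98], [0.99, 0.01]]
labels = assign_clones(prob_rows)
mean_af = mean_af_by_clone([0.8, 0.4, 0.0, 0.6], labels)
```

A variant whose mean allele frequency is high in one clone and near zero in the others, as in this toy case, is the kind of clone-specific marker the heatmaps are meant to reveal.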

Characterization of biological significance across different cell populations

Functional characterization at the gene and pathway levels of individual clones can shed invaluable light on the mechanisms driving tumour progression and relapse. By utilizing the ‘Comparison between Clones’ menu, users can access the results of DEGs and pathways enrichment analysis across different clones (Fig. 4f–h). These advanced features empower users to conduct rigorous comparative assessments of the biological significance among distinct cell populations.

Example application

To demonstrate the utility and potential applications of ScLineageAtlas, we analysed Cela-010-022-119 (tumour sample) and Cela-010-022-121 (benign sample) from the same patient with prostate cancer. In the tumour sample, we identified a highly clone-specific mutation, 785C>T, in clone 1, which was also observed in normal samples, suggesting that clone 1 may represent a progenitor clone. Additionally, novel mutations were detected in clone 2 (2226T>C), clone 3 (7368T>C), clone 4 (4124T>C), and clone 5 (7395T>C) within the tumour sample, indicating that these cell populations may represent novel clones. These novel clones are predominantly found in endothelial cells, epithelial cells, fibroblasts, T cells, monocytes, and NK cells. Furthermore, by examining DEGs between the mtDNA clones, we identified DEGs in specific subclones. For instance, compared with clone 1, the expression of EGR1 is up-regulated in clone 2. EGR1 plays a significant role in promoting the progression and metastasis of prostate cancer [23, 24]. Ho et al. [25] elucidated the critical regulatory role of EGR1 in renal inflammation and fibrosis, suggesting its potential as a therapeutic target for human renal diseases. Additionally, research has demonstrated that EGR1 can inhibit cholestasis-induced hepatic inflammatory responses [26], highlighting its significant value in the treatment of liver injury. Compared with clone 3, the expression of ELF3 is down-regulated in clone 4. ELF3 activates the NF-κB signalling pathway and drives prostate cancer [27]. Studies have demonstrated that ELF3 expression levels are significantly correlated with tumour metastatic potential [28]. Numerous studies have indicated that ELF3 plays a regulatory role in mesenchymal–epithelial transition and epithelial–mesenchymal transition (EMT) [29, 30]. EMT enhances the migratory and invasive capabilities of tumour cells, facilitating their dissemination from primary tumours to distant sites. Additionally, EMT is often associated with the acquisition of cancer stem cell properties, which contributes to increased therapeutic resistance. Gene Ontology (GO) enrichment analysis demonstrates that the pathways related to adaptive immunity, tumour necrosis factor, and inflammatory response are dysregulated during clonal evolution.

Summary and future developments

As a freely and openly accessible database platform, ScLineageAtlas enables researchers to explore clonal evolution in human tumours. It integrates multi-source scRNA-seq data and offers cell-type annotations along with visualizations of differential gene expression analysis results, all based on a standardized analytical workflow that includes quality control, dimensionality reduction, and clustering. These features address the needs of fundamental research while providing open data downloads and an interactive analysis interface, thereby significantly reducing the operational barriers for non-bioinformatics users. In contrast to other databases, ScLineageAtlas specifically focuses on the genetic heterogeneity of cancer cell clones, enhancing our understanding of tumour heterogeneity, differentiation trajectories, and evolution. Consequently, it makes substantial contributions to both cancer research and clinical practice.

Planned updates cover several aspects. Future versions will centre on multi-omics integration, including genomics, transcriptomics, and epigenomics. Currently, ScLineageAtlas is limited to mitochondrial variants because of the technical challenges of detecting low-frequency mutations from scRNA-seq data. As single-cell genomics techniques advance, nuclear mutations will be incorporated into our analyses. Several computational tools have already been developed for lineage inference using CNVs, nuclear DNA mutations, and mtSNVs; notable examples include Cardelino [6], LineageOT [31], PhylEx [3], MutaSeq [32], and EMBLEM [13]. We plan to systematically evaluate and compare the performance of these tools to inform the development of next-generation lineage inference methods. Furthermore, beyond cancer research, elucidating lineage relationships between cells is crucial for understanding the cellular origins of disease development, determining cell fate during embryonic development, and mapping the dynamic trajectories of tissue and organ formation. This knowledge offers essential insights into the complexity of life processes and the fundamental mechanisms underlying disease emergence. Therefore, we plan to expand our dataset to encompass a wider range of disease conditions and tissue types. Additionally, we intend to introduce a ‘Compare’ page, which will allow users to conduct cross-dataset comparisons, facilitating the analysis of gene expression alterations and pathway activities among samples across various cancer types and tissues. We will also provide batch-corrected data using multiple correction techniques, enabling users to select the method that best fits their research needs. These enhancements aim to improve the functionality and usability of ScLineageAtlas, ultimately facilitating a deeper understanding of clonal relationships and evolutionary patterns.

Acknowledgements

We thank Beijing Zhilan Technology Co., Ltd for their help in developing the web interface for ScLineageAtlas.

Author contributions

J.Y. and G.T. made substantial contributions to the study’s conception and design. J.L. was in charge of data analysis and interpretation, as well as drafting the research article. R.H. provided invaluable guidance and suggestions for the analysis. J.X., T.H., H.T., Y.L., and M.Z. carried out comprehensive dataset filtering and web-page testing. All authors participated in the finalization of the manuscript.

Conflict of interest

The authors declare no conflicts of interest.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62302156), the Natural Science Foundation of Hunan Province (Grant No. 2023JJ40180), and the 2024 Hebei Province Talent Introduction Project (25120201B) “Identification of Key Genes in CAFs of Colorectal Cancer Liver Metastasis Using Single-Cell Sequencing Technology” (China).

Data availability

All public datasets we gathered in ScLineageAtlas are available from GEO. All public datasets that were processed and integrated into the database are available at https://www.scladb.geneis.org.cn, with no login requirement.

References

Gulati

Sikandar

Wesche

et al.

Single-cell transcriptional diversity is a hallmark of developmental potential

Science

2020

;

367

405

–

411

10.1126/science.aax0249

Kim

Lee

et al.

Single-cell RNA sequencing demonstrates the molecular and cellular reprogramming of metastatic lung adenocarcinoma

Nat Commun

2020

;

2285

10.1038/s41467-020-16164-1

Jun

S-H

Toosi

Mold

et al.

Reconstructing clonal tree for phylo-phenotypic characterization of cancer using single-cell transcriptomics

Nat Commun

2023

;

982

10.1038/s41467-023-36202-y

Kester

van Oudenaarden

Single-cell transcriptomics meets lineage tracing

Cell Stem Cell

2018

;

166

–

179

10.1016/j.stem.2018.04.014

Woodworth

Girskis

Walsh

Building a lineage from single cells: genetic techniques for cell lineage tracking

Nat Rev Genet

2017

;

230

–

244

Mccarthy

Rostom

Huang

et al.

Cardelino: computational integration of somatic clonal substructure and single-cell transcriptomes

Nat Methods

2020

;

414

–

421

10.1038/s41592-020-0766-3

Campbell

Steif

Laks

et al.

clonealign: statistical integration of independent single-cell RNA and DNA sequencing data from human cancers

Genome Biol

2019

;

10.1186/s13059-019-1645-z

Gao

Bai

Henderson

et al.

Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes

Nat Biotechnol

2021

;

599

–

608

10.1038/s41587-020-00795-2

Oota

Somatic mutations—evolution within the individual

Methods

2020

;

176

–

10.1016/j.ymeth.2019.11.002

10.

Acuna-Hidalgo

Veltman

Hoischen

New insights into the generation and role of de novo mutations in health and disease

Genome Biol

2016

;

241

10.1186/s13059-016-1110-1

11.

Dagogo-Jack

Shaw

Tumour heterogeneity and resistance to cancer therapies

Nat Rev Clin Oncol

2018

;

–

10.1038/nrclinonc.2017.166

12.

Veneziani

Gonzalez-Ochoa

Alqaisi

et al.

Heterogeneity and treatment landscape of ovarian carcinoma

Nat Rev Clin Oncol

2023

;

820

–

842

10.1038/s41571-023-00819-1

13.

Nuno

Litzenburger

et al.

Single-cell lineage tracing by endogenous mutations enriched in transposase accessible mitochondrial DNA

Elife

2019

;

e45105

14. Kwok AWC, Qiao, Huang, et al. MQuad enables clonal substructure discovery using single cell mitochondrial variants. Nat Commun 2022;1205. 10.1038/s41467-022-28845-0

15. Ludwig, Lareau, Ulirsch, et al. Lineage tracing in humans enabled by mitochondrial mutations and single-cell genomics. Cell 2019;176:1325–1339.e1322. 10.1016/j.cell.2019.01.022

16. Ding, Raphael, Chen, et al. Advances for studying clonal evolution in cancer. Cancer Lett 2013;340:212–219. 10.1016/j.canlet.2012.12.028

17. Wang, Zhang, Wang. Cellular barcoding: from developmental tracing to anti-tumor drug discovery. Cancer Lett 2023;567:216281. 10.1016/j.canlet.2023.216281

18. Huang. Cellsnp-lite: an efficient tool for genotyping single cells. Bioinformatics 2021;4569–4571. 10.1093/bioinformatics/btab358

19. Polański, Young, Miao, et al. BBKNN: fast batch alignment of single cell transcriptomes. Bioinformatics 2020;964–965. 10.1093/bioinformatics/btz625

20. Davila-Velderrain, Sumida, et al. NEBULA is a fast negative binomial mixed model for differential or co-expression analysis of large-scale multi-subject single-cell data. Commun Biol 2021;629. 10.1038/s42003-021-02146-6

21. Huang, Zhang, Tang, et al. Personalized pancreatic cancer therapy: from the perspective of mRNA vaccine. Mil Med Res 2022. 10.1186/s40779-022-00416-w

22. Sherman, Beatty. Tumor microenvironment in pancreatic cancer pathogenesis and therapeutic resistance. Annu Rev Pathol: Mech Dis 2023;123–148. 10.1146/annurev-pathmechdis-031621-024600


23. Gregg, Fraizer. Transcriptional regulation of EGR1 by EGF and the ERK signaling pathway in prostate cancer cells. Genes Cancer 2011;900–909. 10.1177/1947601911431885

24. Ameri, Wang, et al. EGR1 regulates angiogenic and osteoclastogenic factors in prostate cancer and promotes metastasis. Oncogene 2019;6241–6255. 10.1038/s41388-019-0873-8

25. Li-C, Sung J-M, Shen Yi-T, et al. Egr-1 deficiency protects from renal inflammation and fibrosis. J Mol Med 2016;933–942. 10.1007/s00109-016-1403-6

26. Zhang, et al. Anti-inflammatory, anti-oxidative stress and novel therapeutic targets for cholestatic liver injury. BioSci Trends 2019. 10.5582/bst.2018.01247

27. Longoni, Sarti, Albino, et al. ETS transcription factor ESE1/ELF3 orchestrates a positive feedback loop that constitutively activates NF-κB and drives prostate cancer progression. Cancer Res 2013;4533–4547. 10.1158/0008-5472.Can-12-4537

28. Tao, Liu, et al. Consensus nonnegative matrix factorization reveals metastatic gene expression program and identifies E74-like ETS transcription factor 3 confers to the lymph nodes metastasis in papillary thyroid cancer. Endocrine 2025;798–819. 10.1007/s12020-025-04205-y

29. MacParland, Liu, X-Z, et al. Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations. Nat Commun 2018;4383. 10.1038/s41467-018-06318-7

30. Suzuki, Saito-Adachi, Arai, et al. E74-like factor 3 is a key regulator of epithelial integrity and immune response genes in biliary tract cancer. Cancer Res 2021;489–500. 10.1158/0008-5472.Can-19-2988

31. Forrow, Schiebinger. LineageOT is a unified framework for lineage tracing and trajectory inference. Nat Commun 2021;4940. 10.1038/s41467-021-25133-1

32. Velten, Story, Hernández-Malmierca, et al. Identification of leukemic and pre-leukemic stem cells by clonal tracking from single-cell transcriptomics. Nat Commun 2021;1366. 10.1038/s41467-021-21650-1

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

