Abstract

RNA interference (RNAi) is a gene-silencing process in living cells that is controlled by the RNA-induced silencing complex in a sequence-specific manner. In flies and mice, pseudogene transcripts can be processed into short interfering RNAs (siRNAs) that regulate protein-coding genes through the RNAi pathway. Following these findings, we constructed a comprehensive database to elucidate siRNA-mediated mechanisms in human transcribed pseudogenes (TPGs). To investigate TPGs that produce siRNAs regulating protein-coding genes, we mapped the TPGs to small RNAs (sRNAs) supported by publicly available deep sequencing data from various sRNA libraries and constructed the TPG-derived siRNA-target interactions. In addition, we show that TPGs can act as targets for miRNAs that regulate the parental gene. To enable the systematic compilation and updating of these results and additional information, we developed a database, pseudoMap, that captures various types of information, including sequence data, TPG and cognate annotation, deep sequencing data, RNA-folding structure, gene expression profiles, miRNA annotation and target prediction. To our knowledge, pseudoMap is the first database to demonstrate two mechanisms of human TPGs: encoding siRNAs and decoying miRNAs that target the parental gene. pseudoMap is freely accessible at http://pseudomap.mbc.nctu.edu.tw/.

Database URL: http://pseudomap.mbc.nctu.edu.tw/

Introduction

Pseudogenes are genomic DNA sequences that are homologous to functional genes but are not translated into proteins (1). Although pseudogenes are often considered structurally defective, non-functional copies of protein-coding genes, the human genome contains more pseudogenes than corresponding functional genes (2). Despite the earlier view of pseudogenes as genomic fossils, genome-wide investigations have demonstrated actively transcribed pseudogenes (TPGs) with functional potential (3–12). For instance, the TPG of nitric oxide synthase (ψNOS) acts as an antisense regulator of neuronal NOS protein synthesis in snails (13, 14). Another study reported that binding of a transcriptional repressor to the receptor of ψmakorin1-p1 could activate the homologous parental gene Mkrn1 (15), although a contradictory result was also reported (16). In addition, the TPG PTENP1 (ψPTEN), a highly conserved processed pseudogene of the tumour suppressor PTEN, acts as a miRNA decoy by binding to PTEN-targeting miRNAs (17). Moreover, the human pseudogene myosin light chain kinase pseudogene 1 is partially duplicated from the original MYLK gene and promotes cancer cell proliferation (18). These findings suggest that the non-coding RNA products of TPGs may play important roles in biogenesis pathways and functional processes.

RNA interference (RNAi) is an important component of the RNA modulation pathway and acts through the RNA-induced silencing complex (RISC) in a sequence-specific manner (19). In mice and fruit flies, double-stranded RNAs arising from the antisense/sense transcripts of a processed pseudogene and its cognate gene, or from hairpin structures formed by inversion and duplication, are cut by Dicer into 21-nt endogenous short interfering RNAs (esiRNAs) that can bind RISC and regulate the expression of the parental gene (20–25). Whether such a regulatory mechanism operates in humans remains unclear.

To demonstrate that in humans, as in animal models, TPGs may generate naturally occurring siRNAs and Piwi-interacting RNAs (piRNAs) that regulate the expression of protein-coding genes, we developed a computational pipeline and constructed a database, pseudoMap, the map for studying pseudogenes. pseudoMap pre-processes raw public microarray and deep sequencing data into gene expression profiles for each TPG and its cognate gene and into small RNA (sRNA) profiles for TPG-derived esiRNAs. pseudoMap further combines the gene expression profiles to construct the TPG-derived esiRNA-target interactions (eSTIs). In addition, because a previous study showed that the pseudogene PTENP1 acts as a miRNA decoy by binding to cognate-targeting miRNAs (17), pseudoMap also provides a ‘miRNA regulator’ view to elucidate the relationship between a TPG and its cognate gene under miRNA target regulation.

Data generation

In total, more than 20 000 human pseudogenes and their cognate genes were obtained from the Ensembl Genome Browser (Ensembl 63, GRCh37) (26) using BioMart (http://www.ensembl.org/index.html). The Affymetrix GeneChip® Human Genome U133A/U133Plus2 microarrays are composed of oligonucleotide probes that measure the transcription level of each represented sequence, including transcribed pseudogenes. A total of 1404 pseudogenes were detectable on these chips and were therefore considered to be transcribed; they are referred to as TPGs. Functional sRNAs (fsRNAs) with sequence lengths between 18 and 40 nt were collected from the Functional RNA Database (27), which hosts a large collection of known/predicted non-coding RNA sequences from public databases: H-invDB v5.0 (6), FANTOM3 (28), miRBase 17.0 (29, 30), NONCODE v1.0 (31), Rfam v8.1 (32), RNAdb v2.0 (33) and snoRNA-LBME-db rel. 3 (34). The public deep sequencing data from sRNA libraries (35–38) were generated from human embryonic stem cells, liver tissues or hepatocellular carcinoma (HCC) tissues. Supplementary Table 1 summarizes the statistics of the deep sequencing data from the various sRNA libraries. The genomic sequences were obtained from UCSC hg19 (39). Table 1 lists the integrated databases and tools for mining potential regulators and functions of human TPGs.
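The 18 to 40 nt length window applied to the fsRNAs can be illustrated with a minimal Python sketch. The function name `filter_by_length` and the toy sequences are hypothetical and are not taken from the pseudoMap pipeline; only the two length bounds come from the text.

```python
# Length bounds for functional sRNAs as stated in the text (18 to 40 nt).
MIN_LEN = 18
MAX_LEN = 40


def filter_by_length(sequences, min_len=MIN_LEN, max_len=MAX_LEN):
    """Return {name: seq} for sequences with min_len <= len(seq) <= max_len.

    `sequences` is a plain dict of name -> sequence string (hypothetical input
    format chosen for this illustration).
    """
    return {
        name: seq
        for name, seq in sequences.items()
        if min_len <= len(seq) <= max_len
    }


# Toy records, invented for illustration only.
example = {
    "too_short": "ACGU" * 4,         # 16 nt
    "in_range": "ACGU" * 5 + "AC",   # 22 nt
    "too_long": "ACGU" * 11,         # 44 nt
}
kept = filter_by_length(example)
print(sorted(kept))
```

Both bounds are treated as inclusive here, which is an assumption; the text does not say whether sequences of exactly 18 or 40 nt were retained.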

Table 1

Supported databases and tools in pseudoMap

| Integrated database or tools | Dataset | Description |
| --- | --- | --- |
| miRBase (29, 30) | miRNA annotation | This database not only provides published miRNA sequences and annotations but also supplies known/predicted targets |
| Functional RNA Database (27) | sRNA annotation | A database to support mining and annotation of functional RNAs |
| Ensembl Genome Browser (26) | Pseudogene, protein-coding gene | It produces genome databases for vertebrates and other eukaryotic species |
| UCSC Genome Browser (39) | Conserved region and genomic view of genes | This browser provides a rapid and reliable display of any requested portion of genomes at any scale, together with dozens of aligned annotation tracks |
| GeneCards (52) | Gene annotation | GeneCards is a searchable, integrated database of human genes that provides concise genomic-related information on all known and predicted human genes |
| Mfold (40) | RNA folding tool | Folding RNA structure |
| GEO (47) | Gene expression profiles and deep sequencing data | A public functional genomics data repository |
| BLAST (51) | Sequence alignment tool | BLAST finds regions of similarity between biological sequences |

System flow of pseudoMap

The system flow of pseudoMap is shown in Figure 1. It comprises the collection of datasets (TPGs, parental genes, fsRNAs, sRNA deep sequencing data and expression profiles), the integration of various tools, and the identification of the functions and regulation of TPGs. Using a genome-wide computational pipeline based on sequence alignment, this work constructed the pseudoMap database to elucidate two major findings: TPG-derived eSTIs and the miRNA-decoy mechanism of TPGs. The detailed analyses are described below.

Figure 1

System flow of pseudoMap. The system flow mainly includes the collection of datasets such as TPGs, parental genes, miRNAs, piRNAs, sRNA deep sequencing data and expression profiles; the integration of various tools; and the identification of the functions and regulation of TPGs. Based on a genome-wide computational pipeline of sequence-alignment approaches and gene expression profiles, this work constructed the pseudoMap database to elucidate two major findings: TPG-derived esiRNA-target interactions and the miRNA-decoy mechanism of TPGs.

Identification of TPG-derived esiRNAs by public next-generation sequencing data

A computational pipeline was developed to test the hypothesis that human TPGs may generate esiRNAs that regulate protein-coding genes (Figure 2). Candidate TPG-derived esiRNAs were first identified by aligning the sequences of TPGs and fsRNAs. These candidates were then verified using deep sequencing data from various sRNA libraries (35–38) generated from human embryonic stem cells, liver tissues or HCC tissues. The hairpin structure was then determined with Mfold (40) using the extended sequences of the esiRNA candidates. Of the 1404 human TPGs characterized in pseudoMap, 1232 may produce esiRNAs, as profiled by deep sequencing data. Information on these TPGs is given in Supplementary File 1. The results showed that 4 miRNAs and 326 piRNAs may derive from TPGs. Among them, the miRNA hsa-miR-622 was found to derive from keratin 18 pseudogene 27 at nt 858–879, consistent with the miRBase annotation. Table 2 summarizes the statistics of pseudoMap.
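The first step, locating sRNA sequences within a TPG sequence, can be sketched as follows. This is only an illustration under stated assumptions: the text does not name the alignment method used at this step, so exact string matching on both strands is used here as a simple stand-in, and the function names and toy sequences are hypothetical.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def normalize(seq):
    """Upper-case a sequence and write RNA (U) in the DNA alphabet (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Reverse complement of a normalized DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def find_exact_hits(tpg_seq, srna_seq):
    """Return (start, end, strand) tuples, 1-based and inclusive, for every
    exact occurrence of the sRNA on either strand of the TPG sequence.

    Exact matching is an assumption made for this sketch; a real pipeline
    would normally use a dedicated aligner that tolerates mismatches.
    """
    tpg = normalize(tpg_seq)
    srna = normalize(srna_seq)
    if not srna:
        return []
    hits = []
    for strand, query in (("+", srna), ("-", reverse_complement(srna))):
        pos = tpg.find(query)
        while pos != -1:
            hits.append((pos + 1, pos + len(query), strand))
            pos = tpg.find(query, pos + 1)
    return hits


# Toy sequences, invented for illustration only.
toy_tpg = "TTGACCATGCAAGGCTTAGC"
sense_hits = find_exact_hits(toy_tpg, "CAUGCAAGG")
antisense_hits = find_exact_hits(toy_tpg, "GCTAAGCC")
print(sense_hits, antisense_hits)
```

Reporting the strand separately keeps sense and antisense candidates distinguishable, which matters because the double-stranded RNA precursors described in the Introduction can arise from antisense/sense transcript pairs.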

Figure 2

Computational pipeline for identification of TPG-derived esiRNA-target interactions.

Table 2

Summary of the statistics of pseudoMap

| Dataset | Counts |
| --- | --- |
| No. of miRNA regulators | 5771/1014 (a) |
| No. of TPG-derived miRNAs | 4 |
| No. of TPG-derived piRNAs | 326 |
| Deep sequencing data for profiling TPG-derived esiRNAs | |
| Human embryo stem cell (hB) | 247 |
| Human embryo stem cell (hESC) | 553 |
| Human embryo stem cell (hues6) | 190 |
| Human embryo stem cell (hues6NP) | 81 |
| Human embryo stem cell (hues6Neuron) | 16 |
| HBV(+) adjacent tissue sample 1 | 917 |
| HBV(+) adjacent tissue sample 2 | 4377 |
| HBV(+) distal tissue sample 1 | 1011 |
| HBV(+) HCC tissue sample 1 | 1281 |
| HBV(+) HCC tissue sample 2 | 2649 |
| HBV-infected liver tissue | 3056 |
| HBV(+) side tissue sample 1 | 1087 |
| HCV(+) adjacent tissue sample | 14 297 |
| HCV(+) HCC tissue sample | 9277 |
| HBV(−) HCV(−) adjacent tissue sample | 2324 |
| HBV(−) HCV(−) HCC tissue sample | 6579 |
| Human normal liver tissue sample 1 | 1220 |
| Human normal liver tissue sample 2 | 1290 |
| Human normal liver tissue sample 3 | 1209 |
| Severe chronic hepatitis B liver tissue | 1247 |

(a) 1014 distinct miRNAs are involved in the 5771 miRNA regulators.

Identification of TPG-derived esiRNA-target interactions

Our previous approach (41) was modified to identify TPG-derived esiRNA targets. Briefly, esiRNA target sites within the conserved regions of the coding sequence, 5’-UTR and 3’-UTR of genes were identified in 12 metazoan genomes using three computational approaches: TargetScan (42–44), miRanda (45) and RNAhybrid (46). The minimum free energy (MFE) threshold was −20 kcal/mol with a score of at least 150 for miRanda, and default parameters were used for TargetScan and RNAhybrid. Targets were identified using the following criteria: (i) potential target sites were predicted by at least two approaches; (ii) genes with multiple target sites were prioritized and (iii) target sites had to be located in accessible regions. Finally, we provided the gene expression profiles of each TPG and its cognate gene to construct the eSTIs.
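Criteria (i) and (ii) can be sketched in Python as a consensus filter over predictions that have already been parsed from the three tools. This is a simplified illustration, not the authors' implementation: the `SitePrediction` record, the gene names and the toy values are hypothetical; support is counted per gene rather than per overlapping site; the MFE and score cut-offs are assumed to apply to miRanda predictions only; and criterion (iii), site accessibility, is not modelled.

```python
from dataclasses import dataclass
from typing import Optional

MFE_THRESHOLD = -20.0     # kcal/mol, as stated in the text
MIRANDA_MIN_SCORE = 150   # as stated in the text


@dataclass(frozen=True)
class SitePrediction:
    """One predicted target site (hypothetical record layout)."""
    tool: str                      # "TargetScan", "miRanda" or "RNAhybrid"
    gene: str
    start: int
    end: int
    mfe: Optional[float] = None    # only used for miRanda in this sketch
    score: Optional[float] = None  # only used for miRanda in this sketch


def passes_threshold(pred):
    """Apply the MFE/score cut-offs to miRanda hits; accept other tools as-is."""
    if pred.tool == "miRanda":
        return (
            pred.mfe is not None
            and pred.score is not None
            and pred.mfe <= MFE_THRESHOLD
            and pred.score >= MIRANDA_MIN_SCORE
        )
    return True


def consensus_targets(predictions, min_tools=2):
    """Return [(gene, n_sites)] for genes supported by >= min_tools tools,
    ordered so that genes with more distinct sites come first."""
    tools_by_gene = {}
    sites_by_gene = {}
    for pred in predictions:
        if not passes_threshold(pred):
            continue
        tools_by_gene.setdefault(pred.gene, set()).add(pred.tool)
        sites_by_gene.setdefault(pred.gene, set()).add((pred.start, pred.end))
    kept = [
        (gene, len(sites_by_gene[gene]))
        for gene, tools in tools_by_gene.items()
        if len(tools) >= min_tools
    ]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Toy predictions for one esiRNA, invented for illustration only.
toy_predictions = [
    SitePrediction("miRanda", "GENE_A", 10, 30, mfe=-25.0, score=160),
    SitePrediction("TargetScan", "GENE_A", 12, 19),
    SitePrediction("RNAhybrid", "GENE_A", 100, 121),
    SitePrediction("miRanda", "GENE_B", 5, 25, mfe=-15.0, score=170),
    SitePrediction("RNAhybrid", "GENE_B", 5, 25),
    SitePrediction("miRanda", "GENE_C", 40, 60, mfe=-22.0, score=150),
    SitePrediction("TargetScan", "GENE_C", 40, 47),
]
ranked = consensus_targets(toy_predictions)
print(ranked)
```

In this toy input, GENE_B is dropped because its miRanda hit fails the MFE cut-off, leaving support from a single tool.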

Gene expression analysis

The mRNA abundances of TPGs and protein-coding genes were obtained from the Gene Expression Omnibus (47), including GDS596, covering 79 physiologically normal human tissues (48); GSE2109, covering 2158 samples from 61 tumour tissues; GSE3526, covering 353 samples from 65 normal tissues (49); and GSE5364, covering primary human tumours and adjacent non-tumour tissues, including 270 tumours and 71 normal-cancer pairs from patients with breast, colon, liver, lung, oesophageal and thyroid cancers (50). Moreover, the Pearson correlation coefficient was computed between the expression profiles of TPGs and protein-coding genes.
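For reference, the Pearson correlation between two expression profiles measured over the same samples can be computed as in the sketch below. The function name and the toy expression values are hypothetical; the formula is the standard sample Pearson coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of samples")
    n = len(x)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Toy expression values across four samples, invented for illustration only.
tpg_profile = [1.0, 2.0, 3.0, 4.0]
parent_profile = [2.0, 4.0, 6.0, 8.0]
r = pearson(tpg_profile, parent_profile)
print(round(r, 6))
```

A coefficient near +1 indicates that the TPG and the protein-coding gene rise and fall together across samples, while a value near −1 indicates opposite trends.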

Determination of miRNA-target interactions

According to the study by Poliseno et al. (17), the pseudogenes PTENP1 and KRAS1P act as ‘miRNA decoys’: they bind miRNAs and thereby reduce their effective cellular concentration, allowing their cognate genes to escape miRNA-mediated repression. In this study, we examined miRNA-target interactions (MTIs) to analyse the relationship between each TPG and its cognate gene under the miRNA decoy mechanism, using the following pipeline. First, the parental genes were obtained by mapping the TPGs to genomic sequences with BLAST (51). The MTIs of TPGs and parental genes were then investigated using our previous approach (41). The MFE threshold was −20 kcal/mol with a score of at least 150 for miRanda, and default parameters were used for TargetScan and RNAhybrid. Finally, the TPGs and their cognates co-regulated by miRNAs were obtained. The miRNA and 3’UTR sequences were obtained from miRBase R18 (29, 30) and Ensembl Genome Browser release 63 (26), respectively. The analysis indicated that 874 miRNAs with MFE ≤ −20 and score ≥ 150 interact with many possible target sites in 248 TPGs and their cognate genes and might co-regulate these TPG and parental gene pairs (Supplementary File 2).
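The final step, finding miRNAs predicted to target both a TPG and its parental gene, amounts to a set intersection after thresholding. The sketch below assumes that each prediction is available as a (miRNA, MFE, score) tuple; this input layout, the function names and the miRNA labels are hypothetical, while the two cut-offs are those stated in the text.

```python
MFE_MAX = -20.0    # kcal/mol, as stated in the text
SCORE_MIN = 150.0  # as stated in the text


def passing_mirnas(hits):
    """Set of miRNA names with at least one hit meeting both cut-offs.

    `hits` is an iterable of (mirna, mfe, score) tuples (assumed layout).
    """
    return {
        mirna
        for mirna, mfe, score in hits
        if mfe <= MFE_MAX and score >= SCORE_MIN
    }


def coregulating_mirnas(tpg_hits, parent_hits):
    """miRNAs predicted to target both the TPG and its parental gene."""
    return sorted(passing_mirnas(tpg_hits) & passing_mirnas(parent_hits))


# Toy predictions with placeholder miRNA names, invented for illustration.
toy_tpg_hits = [
    ("miR-x1", -24.1, 162.0),
    ("miR-x2", -18.0, 170.0),   # fails the MFE cut-off
    ("miR-x3", -21.0, 155.0),
]
toy_parent_hits = [
    ("miR-x1", -22.5, 151.0),
    ("miR-x2", -25.0, 180.0),
    ("miR-x4", -30.0, 200.0),
]
shared = coregulating_mirnas(toy_tpg_hits, toy_parent_hits)
print(shared)
```

Only miRNAs that pass the cut-offs on both transcripts are reported, which mirrors the decoy model in which the TPG and its cognate gene compete for the same miRNAs.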

Web interface

As a web-based system, pseudoMap provides comprehensive information on human TPGs, including TPGs that act as miRNA regulators and TPG-derived eSTIs. There are two ways to access pseudoMap: by browsing the database content or by searching for a particular TPG. Figure 3A displays the output of the browse gateway. The interface contains general information on TPGs; the relationships between a TPG and its cognate gene under miRNA-mediated repression, termed ‘miRNA Regulator’; TPG-derived eSTIs, named ‘esiRNA’; and ‘Expression’, which shows the gene expression profiles. Figure 3B provides a detailed view of the miRNA regulator, which displays more fine-grained information. These results indicate relationships between TPGs and cognate genes through a miRNA decoy mechanism such as that observed by Poliseno et al. (17). The ‘Expression’ view presents the gene expression profiles of not only each TPG and its corresponding parental gene but also the TPG referenced to its cognate in various experimental conditions (Figure 3C). Moreover, the esiRNA view shows the TPG-derived esiRNAs and a graphical display of the deep sequencing data (Figure 3D), in which the red line represents the TPG and the blue lines represent esiRNAs. We also present the predicted eSTIs (Figure 3E) and the RNA folding structure of each TPG-derived esiRNA (Figure 3F). All results and sequences can be downloaded for further experimental tests. pseudoMap also links to external sources, such as the UCSC Genome Browser (39) for a genomic view, GeneCards (52) for gene annotation and miRBase (29) for miRNA annotation (Figure 3G). In addition, pseudoMap includes a tutorial and background knowledge on pseudogenes.

Figure 3

Web interface of pseudoMap. (A) The browse interface of pseudoMap shows general information on TPGs, miRNA regulators, esiRNAs and gene expression profiles. (B) The miRNA regulator view indicates the miRNA decoy mechanism between a TPG and its cognate. (C) Gene expression profiles of a TPG and its cognate gene in various experimental conditions. (D) The esiRNA diagram shows TPG-derived siRNAs as profiled by deep sequencing data, with more fine-grained information on (E) the esiRNA-target interaction and (F) the RNA folding structure of the TPG-derived esiRNA. In addition, pseudoMap incorporates external sources, such as (G) the UCSC Genome Browser for a genomic view, GeneCards for gene annotation and miRBase for miRNA annotation.

In the search gateway, a TPG ID, Ensembl ID, TPG symbol or parental gene symbol can be used as the query. Figure 4 displays the output of a search for a particular TPG ID, Ensembl ID, TPG symbol or parental gene symbol. The interface contains the general information of the TPG, miRNA regulators, gene expression profiles and TPG-derived esiRNAs.

Figure 4

Search interface of pseudoMap.

Construction and content

In pseudoMap, the various datasets are integrated and maintained in a MySQL (http://www.mysql.com/) relational database management system. pseudoMap runs on an Apache HTTP server (http://www.apache.org/) with PHP (http://www.php.net/) on a Linux operating system (http://www.linux.com) and was constructed using the Smarty template engine (http://www.smarty.net). Built with PHP, JavaScript (http://www.javascriptsource.com/), CSS (http://www.w3schools.com/css/) and HTML (http://www.w3schools.com/html/), the web interface supports dynamic MySQL queries with user-friendly graphics. All of these are open source technologies.

Discussion and conclusions

Comparison with other previous databases related to pseudogenes

A few databases have been constructed to explore pseudogenes. In particular, the PseudoGene database (53) identifies pseudogenes in genomes using various computational methods; HOPPSIGEN (54) presents the homologous processed pseudogenes shared between the mouse and human genomes, with location information and potential function; PseudoGeneQuest (55) is a web-based system that identifies novel human pseudogenes based on a user-provided protein sequence; and the University of Iowa’s UI Pseudogenes website contains human pseudogenes and candidates for gene conversion (56). However, these databases focus on the automatic detection of pseudogenes using a variety of homology-based approaches. Our database, pseudoMap, aims to provide a comprehensive resource for the genome-wide identification of the functions and regulators of human TPGs. Briefly, three major features differentiate it from current public pseudogene databases. First, pseudoMap elucidates the relationship between each TPG and its cognate gene under the miRNA decoy mechanism. Second, to explore the interaction between a TPG and its parental gene, pseudoMap provides the gene expression profiles of both in various experimental conditions. Third, pseudoMap curates the TPG-derived esiRNAs, which are supported by deep sequencing data, as well as their interacting gene targets in the human genome. Table 3 lists the detailed comparisons of pseudoMap with other databases related to pseudogenes.

Table 3

Comparisons of pseudoMap with currently public databases of pseudogenes

Supported featurespseudoMap (our database)PseudoGene databaseUI pseudogeneHoppsigen
Web interfacehttp://pseudomap.mbc.nctu.edu.tw/http://www.pseudogene.org/https://genome.uiowa.edu/pseudogenes/http://pbil.univ-lyon1.fr/databases/hoppsigen.html
DescriptionpseudoMap provides a comprehensive resource for genome-wide identifying the functions and regulators of human pseudogenes.This site contains a comprehensive database of identified pseudogenes, utilities used to find pseudogenes, various publication data sets and a pseudogene knowledgebase.This site serves as a repository for all pseudogenes in the human genome and provides a ranked list of human pseudogenes that have been identified as candidates for gene conversion.Hoppsigen is a nucleic database of homologous processed pseudogenes.
Species supportedHumanEukaryote and prokaryoteHumanHuman
Sequence downloadYesYesYesYes
Pseudogene informationYesYesYesYes
Parental gene informationYesYes
Knowledge of pseudogenesYesYesYes
miRNA–pseudogene interactionsYes
miRNA–parental gene interactionsYes
Gene expression profilesYes (both pseudogene and its parental gene)
Pseudogene-derived siRNAsYes
Deep sequencing data for profiling TPG-derived siRNAsYes

Applications

pseudoMap provides two major applications. The first concerns the non-coding RNA products of TPGs, which, as in animal models, may generate esiRNAs that regulate protein-coding genes in humans. For this purpose, pseudoMap supplies next-generation sequencing data from sRNA libraries to support the candidate TPG-derived esiRNAs, and gene expression profiles to verify the target interactions. The second concerns cases in which both a gene and its pseudogene contain miRNA target sites: if the pseudogene competes for the freely available repressor molecules, the gene is relieved of miRNA-mediated repression. In other words, the pseudogene may act as a ‘miRNA decoy’ that releases the repression of its cognate gene. pseudoMap thus provides further insight into MTIs that involve TPG-mediated mechanisms.

Conclusion

In this study, we developed a computational pipeline to identify TPG-derived esiRNA-target interactions and constructed a comprehensive database to represent the potential functions and regulators of human TPGs. To our knowledge, pseudoMap is the first database that enables biologists and bioinformaticians to examine two major findings for TPGs: the relationship between a TPG and its cognate gene under the miRNA decoy mechanism, and TPG-derived eSTIs. Efforts are underway in our laboratory to extend the methods used in pseudoMap to other species such as mice, fruit flies and plants. pseudoMap will be updated frequently by continually surveying experimentally validated sRNAs and will be maintained with long-term support from National Chiao Tung University and the National Science Council of Taiwan. This novel resource is now freely available at http://pseudomap.mbc.nctu.edu.tw/.

Funding

National Science Council of the Republic of China [NSC 99-2320-B-037-006-MY3 to J.G.C., NSC 98-2314-B039-010MY3 to W.K.Y., NSC 98-2311-B-009-004-MY3 to H.D.H., NSC 99-2627-B-009-003 to H.D.H., NSC 101-2311-B-009-003-MY3 to H.D.H. and NSC 100-2627-B-009-002 to H.D.H.]; UST-UCSD International Center of Excellence in Advanced Bio-engineering sponsored by the Taiwan National Science Council I-RiCE Program [NSC 101-2911-I-009 -101 - to H.D.H., in part]; Veterans General Hospitals and University System of Taiwan (VGHUST) Joint Research Program [VGHUST101-G5-1-1 to H.D.H., in part]; MOE ATU [in part]. Funding for open access charge: National Science Council of the Republic of China.

Conflict of interest statement. None declared.

Author notes

Citation details: Chan,W.-L., Yang,W.-K., Huang,H.-D. et al. pseudoMap: an innovative and comprehensive resource for identification of siRNA-mediated mechanisms in human transcribed pseudogenes. Database (2013) Vol. 2013: article ID bat001; doi:10.1093/database/bat001.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
