AnaLysis of Expression on human chromosome 21, ALE-HSA21: a pilot integrated web resource Open Access

Number of analyzed gene elements collected in ALE-HSA21 database

	Genes	Transcripts	Promoters	Exons	Introns	3′ UTR
Coding	238	1713	1713	14 177	12 454	1201^a
Noncoding	296	462				244 ^b
Total	534	2175	1713	14 177	12 454	1445

	Genes	Transcripts	Promoters	Exons	Introns	3′ UTR
Coding	238	1713	1713	14 177	12 454	1201^a
Noncoding	296	462				244 ^b
Total	534	2175	1713	14 177	12 454	1445

^aThe number of 3′ UTRs analyzed for the presence of MREs is smaller than the total number of transcripts because some of them lack UTR in the GENCODE annotation or are currently annotated as ‘processed transcripts’, ‘retained introns’ and other categories lacking a typical 3′ UTR.

^bFor noncoding RNAs, the entire sequences of 220 lincRNAs and 24 pseudogenes have been analyzed for the presence of MREs.

Table 1.

Number of analyzed gene elements collected in ALE-HSA21 database

	Genes	Transcripts	Promoters	Exons	Introns	3′ UTR
Coding	238	1713	1713	14 177	12 454	1201^a
Noncoding	296	462				244 ^b
Total	534	2175	1713	14 177	12 454	1445

	Genes	Transcripts	Promoters	Exons	Introns	3′ UTR
Coding	238	1713	1713	14 177	12 454	1201^a
Noncoding	296	462				244 ^b
Total	534	2175	1713	14 177	12 454	1445

^bFor noncoding RNAs, the entire sequences of 220 lincRNAs and 24 pseudogenes have been analyzed for the presence of MREs.

For the identification of miRNA responsive elements (MREs) within the mRNA 3′ UTRs, the complete list of 6121 mammalian miRNAs of TargetScan database was filtered for 4582 nonhuman miRNAs. Thus, targetscan_60.pl script downloaded from TargetScan was used to analyze 1201 3′UTRs of protein-coding transcripts, searching for binding sites of 1539 miRNAs. Of note, the number of 3′UTRs analyzed for the presence of MREs is smaller than the total number of transcripts because some of them lack UTR in the GENCODE annotation or are annotated as ‘processed transcripts’, ‘retained introns’ and other categories lacking the typical 3′UTR.

The same analysis was performed also to predict MREs within the entire nucleotide sequences of 220 lincRNAs' and 24 pseudogenes' transcripts mapping on HSA21.

RNAfold web server, with default parameters (19), was used to predict—and visualize—the secondary structure of 19 snoRNA and 30 pre-miRNA sequences (figures available on ALE-HSA21). Such analysis represents predictions of secondary structures based on minimum free energy and partition functions, and the images do not represent in vivo structures.

For the 19 HSA21 miRNAs annotated in miRBase (20), the mature miRNA sequence was also highlighted in such drawings. miRNA target genes (MiTGs) were predicted by miRWalk [using algorithms both for ‘validated’ and ‘predicted’ genes; (21)] and CoMeTa databases (22). Complete data were available for five miRNAs (miR99a, miR125b, miR155, miR802 and let-7 c). Finally, the three lists of MiTGs were intersected to determine a common pool using Venny (23).

Database development and description

ALE-HSA21 is a database-driven Web site, more properly driven on a Relational Database Management System. It allows to structure the information contained in the web portal and to display them in tables. Overall, the database contains 534 HSA21 genes, consisting of 33.394 different genomic elements (Table 1). Database was implemented using server version 5.1.67—10.04.1 (Ubuntu)—and a web server Apache/2.2.14 (Ubuntu). MySQL client version 5.1.67—10.04.1 (Ubuntu)—and the free tool phpMyAdmin version 3.3.2 deb1ubuntu were used to handle the administration of MySQL over the World Wide Web.

ALE-HSA21 Web site

The web-oriented side was created using the scripting language PHP. The Javascript technology for dynamic contents, the markup language HTML for static contents and style sheet CSS 2.0 were also used. All 3D images have been implemented using the ‘clickable image maps’ method, thus making only a certain portion of the image sensitive to mouse clicks, linking to different destinations and contents. The ALE-HSA21 code is validated according to the standard web of the international community W3C (World Wide Web Consortium: http://www.w3.org/) and therefore, although optimized for Mozilla Firefox, it is easily visible and accessible by all browsers and smartphones.

The web resource will be regularly updated on the basis of the progress of our study.

Results

Global organization of ALE-HSA21 web portal

ALE-HSA21 is an open-access user-friendly web resource built on a relational database. This web portal is a comprehensive ‘transcript-centric’ database for HSA21, in which data from different sources are dynamically integrated (Figure 1A). ALE-HSA21 contains 2164 annotated transcripts, generated through AS from 534 genes. Particularly, 1702 transcripts derive from 238 protein-coding genes and 462 from 296 noncoding genes (Table 1; Supplementary Figure S2). Moreover, ALE-HSA21 also collects 11 transcripts (of protein-coding genes) identified—and validated—through the computational reanalysis of our RNA-Seq data sets.

The web resource consists of five main sections, schematically shown in the block diagram of Supplementary Figure S3 and in the ‘Sitemap’ of ALE-HSA21. The core is represented by the sections ‘Coding genes’—divided in ‘Annotated’ and ‘Novel’—and ‘Noncoding genes’. The ‘Novel’ subsection hosts transcripts identified de novo by the analysis of our RNA-Seq data sets by merging custom computational workflows to open source tools and public databases (Figure 1B). In this subsection, only experimentally validated new HSA21 transcripts are reported. For all of them, sequences have been submitted and approved by EMBL Nucleotide Sequence Database.

Both sections, ‘Coding genes’ and ‘Noncoding genes’, contain two types of data: genomic/structural and regulatory. In particular, for all HSA21 transcripts, ALE-HSA21 provides a brief gene/category description, sequences in FASTA format and 3D structures. These dynamic images are linked to the regulatory data, consisting of computationally predicted motifs within nucleotide sequences. In addition, for coding genes, this web resource integrates links to public repositories of expression data, gene networks and ontologies, nucleotide variations, proteins and association to disease (Gene Expression Atlas, Gene Networks, dbSNP, UniProt and OMIM, respectively).

In silico identification and experimental validation of novel HSA21 transcripts

Our RNA-Seq data sets (11) were reanalyzed using a custom computational workflow (see ‘Materials and Methods’ section; schematized in Figure 1B). Through this in silico approach we identified different intronic/intergenic regions possibly representing novel transcripts (Supplementary File S2). Additionally, we checked the presence/absence of intronic/intergenic HCPs in two independent RNA-Seq data sets from HeLa and K562 cell lines (24) and tumor samples (manuscript in preparation), observing these are not cell-specific. Similarly, for known HSA21 genes putative extended UTRs and novel splice junctions—according to our categorization method—were also detected (Supplementary Files S3 and Supplementary Data). Filtering out signals (i.e. mapped reads) from pre-mRNAs and repeats, and considering their overlap with AceView predictions and/or expressed sequence tags, we confirmed the presence of 11 novel transcripts arising from six HSA21 genes (Table 2). Their bona fide was confirmed in progenitor cells used in our previous study (11), as well as in other cell lines (data not shown). Nucleotide sequences of these new transcripts were submitted to the EMBL Nucleotide Archive, and accession numbers are listed in Table 2. Newly identified transcripts were added to ALE-HSA21 database and included in the computational analyses, further described.

Table 2.

Novel transcripts identified by RNA-Seq analysis

Gene symbol	Transcript	Accession number	Primer sequence (5′–3′)
Gene symbol	Transcript	Accession number	Forward primer	Reverse primer
IFNAR2	IFNAR2_var1	HG380509	CTGGGAGTCCGCTTTCGTT	GGAGACTTTATTACTGCTTGC
MCM3AP	MCM3AP_HF584748	HF584748	AGTGCTGAGCGAACCGGAAG	GGCTCAACAGGAAATGGTAAA
NRIP1	NRIP1_HF584749	HF584749	GAGAGCTGCTGAAGAAGTAG	TAAATGAGAAAAAATGCATTGTC
NRIP1	NRIP1_HF584750	HF584750	GAGAGCTGCTGAAGAAGTAG	TAAATGAGAAAAAATGCATTGTC
POFUT2	POFUT2_var1	HG380510	GGGCCATGGCGACACTCA	TGTGTTTCTCAGCAGCAGGG
POFUT2	POFUT2_var2	HG380511	GGGCCATGGCGACACTCA	TTTATCCCTGGCGCTGCAC
SAMSN1	SAMSN1_all_skip	HG380514	GCACACTGCTGACTGTTTTC	ATCTTCCTCTCCTATTTGACG
SAMSN1	SAMSN1_var1	HG380512	GCACACTGCTGACTGTTTTC	ACTATAGAAGTGCTTGGTACT
SAMSN1	SAMSN1_var2	HG380513	GCACACTGCTGACTGTTTTC	ATCTTCCTCTCCTATTTGACG
DYRK1A	DYRK1A_var1	HF584751	TGTTATAGTTTTGCCGCTGGA	CTGTTGGTCACTTATGTTTGG
DYRK1A	DYRK1A_var2	HF584752	TGTTATAGTTTTGCCGCTGGA	CTGTTGGTCACTTATGTTTGG

Gene symbol	Transcript	Accession number	Primer sequence (5′–3′)
Gene symbol	Transcript	Accession number	Forward primer	Reverse primer
IFNAR2	IFNAR2_var1	HG380509	CTGGGAGTCCGCTTTCGTT	GGAGACTTTATTACTGCTTGC
MCM3AP	MCM3AP_HF584748	HF584748	AGTGCTGAGCGAACCGGAAG	GGCTCAACAGGAAATGGTAAA
NRIP1	NRIP1_HF584749	HF584749	GAGAGCTGCTGAAGAAGTAG	TAAATGAGAAAAAATGCATTGTC
NRIP1	NRIP1_HF584750	HF584750	GAGAGCTGCTGAAGAAGTAG	TAAATGAGAAAAAATGCATTGTC
POFUT2	POFUT2_var1	HG380510	GGGCCATGGCGACACTCA	TGTGTTTCTCAGCAGCAGGG
POFUT2	POFUT2_var2	HG380511	GGGCCATGGCGACACTCA	TTTATCCCTGGCGCTGCAC
SAMSN1	SAMSN1_all_skip	HG380514	GCACACTGCTGACTGTTTTC	ATCTTCCTCTCCTATTTGACG
SAMSN1	SAMSN1_var1	HG380512	GCACACTGCTGACTGTTTTC	ACTATAGAAGTGCTTGGTACT
SAMSN1	SAMSN1_var2	HG380513	GCACACTGCTGACTGTTTTC	ATCTTCCTCTCCTATTTGACG
DYRK1A	DYRK1A_var1	HF584751	TGTTATAGTTTTGCCGCTGGA	CTGTTGGTCACTTATGTTTGG
DYRK1A	DYRK1A_var2	HF584752	TGTTATAGTTTTGCCGCTGGA	CTGTTGGTCACTTATGTTTGG

Table 2.

Open in new tab Download slide

Novel transcripts identified by RNA-Seq analysis

Gene symbol	Transcript	Accession number	Primer sequence (5′–3′)
Gene symbol	Transcript	Accession number	Forward primer	Reverse primer
IFNAR2	IFNAR2_var1	HG380509	CTGGGAGTCCGCTTTCGTT	GGAGACTTTATTACTGCTTGC
MCM3AP	MCM3AP_HF584748	HF584748	AGTGCTGAGCGAACCGGAAG	GGCTCAACAGGAAATGGTAAA
NRIP1	NRIP1_HF584749	HF584749	GAGAGCTGCTGAAGAAGTAG	TAAATGAGAAAAAATGCATTGTC
NRIP1	NRIP1_HF584750	HF584750	GAGAGCTGCTGAAGAAGTAG	TAAATGAGAAAAAATGCATTGTC
POFUT2	POFUT2_var1	HG380510	GGGCCATGGCGACACTCA	TGTGTTTCTCAGCAGCAGGG
POFUT2	POFUT2_var2	HG380511	GGGCCATGGCGACACTCA	TTTATCCCTGGCGCTGCAC
SAMSN1	SAMSN1_all_skip	HG380514	GCACACTGCTGACTGTTTTC	ATCTTCCTCTCCTATTTGACG
SAMSN1	SAMSN1_var1	HG380512	GCACACTGCTGACTGTTTTC	ACTATAGAAGTGCTTGGTACT
SAMSN1	SAMSN1_var2	HG380513	GCACACTGCTGACTGTTTTC	ATCTTCCTCTCCTATTTGACG
DYRK1A	DYRK1A_var1	HF584751	TGTTATAGTTTTGCCGCTGGA	CTGTTGGTCACTTATGTTTGG
DYRK1A	DYRK1A_var2	HF584752	TGTTATAGTTTTGCCGCTGGA	CTGTTGGTCACTTATGTTTGG

Gene symbol	Transcript	Accession number	Primer sequence (5′–3′)
Gene symbol	Transcript	Accession number	Forward primer	Reverse primer
IFNAR2	IFNAR2_var1	HG380509	CTGGGAGTCCGCTTTCGTT	GGAGACTTTATTACTGCTTGC
MCM3AP	MCM3AP_HF584748	HF584748	AGTGCTGAGCGAACCGGAAG	GGCTCAACAGGAAATGGTAAA
NRIP1	NRIP1_HF584749	HF584749	GAGAGCTGCTGAAGAAGTAG	TAAATGAGAAAAAATGCATTGTC
NRIP1	NRIP1_HF584750	HF584750	GAGAGCTGCTGAAGAAGTAG	TAAATGAGAAAAAATGCATTGTC
POFUT2	POFUT2_var1	HG380510	GGGCCATGGCGACACTCA	TGTGTTTCTCAGCAGCAGGG
POFUT2	POFUT2_var2	HG380511	GGGCCATGGCGACACTCA	TTTATCCCTGGCGCTGCAC
SAMSN1	SAMSN1_all_skip	HG380514	GCACACTGCTGACTGTTTTC	ATCTTCCTCTCCTATTTGACG
SAMSN1	SAMSN1_var1	HG380512	GCACACTGCTGACTGTTTTC	ACTATAGAAGTGCTTGGTACT
SAMSN1	SAMSN1_var2	HG380513	GCACACTGCTGACTGTTTTC	ATCTTCCTCTCCTATTTGACG
DYRK1A	DYRK1A_var1	HF584751	TGTTATAGTTTTGCCGCTGGA	CTGTTGGTCACTTATGTTTGG
DYRK1A	DYRK1A_var2	HF584752	TGTTATAGTTTTGCCGCTGGA	CTGTTGGTCACTTATGTTTGG

Of note, potentially new transcripts reported in the Supplementary Files S2–S4, not yet validated by reverse transcriptase-polymerase chain reaction and Sanger sequencing, are, however, supported by RNA-Seq data (HCPs with uniquely mapped reads and/or a sufficient number of reads mapping on the splice junctions). However, as RNA-Seq data sets come from fragment libraries of 50 bp reads, and given the heuristic nature of algorithms for reads' alignment, despite our checks, false-positive alignments may have occurred.

Computational analysis of regulatory sequences in protein-coding transcripts

Predictive in silico analysis of regulatory sequences within gene promoters, exons, introns and 3′ UTRs was performed for both annotated and newly identified HSA21 protein-coding transcripts (Table 1). The ‘consensus’ sequences for 78 human TFs were predicted within gene promoters. Similarly to the RSAT output, the results are provided to the final user as tables with a ‘weight score’ column measured by the ‘Background model estimation method’ (17). Such tables, dynamically integrated on the web portal, can be accessed through the clickable ‘Promoter’ button in the 3D structure of each transcript (Figure 2C and D). In addition, to provide the users with ChIP-Seq (chromatin immunoprecipitation followed by massive sequencing) data for TFs of the ENCODE project, we also integrated a clickable button linked to these tracks, loaded into an open-access custom session of UCSC Genome Browser. This kind of approach, based both on computational predictions of TFs' binding motifs and experimental large-scale data, represents a starting point to investigate differential TFs' binding among distinct genes and, more interestingly, among different transcripts of the same gene.

Figure 2.

Screenshots from ALE-HSA21 web resource. Panel (A) shows the Homepage with Navigation Bar; panel (B) shows the list of HSA21 transcripts in the ‘Coding genes’ section in tabular format. Official gene symbol, ID, genomic coordinates, the sense of transcription, the number of exons and UniProt IDs are reported. The Black arrow and circle indicate an example of a clickable item (SOD1 gene in the example). By clicking there, the users access the Gene Description page, depicted in Panel (C). Interactive 3D graphical representation for each transcript is embedded in this web page. Each gene element is linked to results of in silico analysis. Colored circles—red for ‘Promoter’, green for ‘exons’, light blue for ‘introns’ and gray for ‘3′ UTRs’—correspond to the clickable elements of the 3D images. The same color scheme is used in panel (D) to indicate the relative results for the computational analyses of those elements.

Moreover, as ‘noncanonical’ exonic and intronic sequences are known to affect splicing—and mutations herein can cause monogenic or can be associated to complex disorders—we computationally predicted ESE/ESS and ISE/ISS within all exons/introns of each HSA21 transcript. Similar to the predictive analysis of TFs' binding sites, data in tabular format were included within ALE-HSA21 and dynamically integrated in 3D transcripts' models. The user can access these results by clicking the exon/intron of interest on the 3D structures (Figure 2C and D).

In addition, given the crucial role of miRNAs in the posttranscriptional regulation of mRNAs through their binding to 3′ UTRs (25, 26), we added to our resource this layer of gene regulation. Particularly, we computationally predicted the binding sites of all annotated human miRNAs within 3′ UTRs sequences of HSA21 mapping mRNAs. Results of such predictive analysis can be accessed through the ‘3′ UTR’ button integrated within the 3D graphical representation of each transcript (Figure 2C and D). Predicting the presence of differential MREs within distinct transcripts of the same gene can be rapidly and easily assessed browsing data of our computational analyses integrated in ALE-HSA21.

Computational analysis for noncoding transcripts

Recent evidences have shown that pseudogenes sequester miRNAs and act as competitive endogenous RNAs, and a similar mechanism has been proposed for lincRNAs (27). The role of these transcripts as miRNAs' sponges has been directly linked to carcinogenesis and muscle differentiation (28–30), and proposed for the onset of neurodegenerative diseases (31). Given these considerations, using prediction algorithms we searched for MREs in HSA21 lincRNAs and pseudogenes. The results were dynamically integrated in tabular format within the ‘Noncoding genes’ section of ALE-HSA21, accessible by clicking the ‘miRNA binding sites’ button located in the web page of the related transcript. In addition, using three different prediction algorithms, we also independently predicted—for the five HSA21 miRNAs available in CoMeTa and miRWalk databases—the putative MiTGs, not limiting such analysis to HSA21 genes. Data integrated in the web portal are provided in tabular format (Figure 3A). Moreover, the intersections between the three above-mentioned lists of computationally predicted MITGs—one for each prediction algorithm used—are shown as Venn diagrams (Figure 3B). These data allow rapidly observing a common pool of target genes, possibly regulated by the same HSA21 miRNA. Finally, as miRNAs and snoRNAs have a peculiar folding, which in turn determines their biological functions, we predicted their secondary structures and integrated these data as static images within ALE-HSA21 (Figure 3C). More in detail, in each miRNA and snoRNA drawing, nucleotides are colored by the base-pair probabilities, according to local measures of reliability, as described in (32). For all 19 HSA21 miRNAs, currently annotated in miRBase, we highlighted the nucleotide sequences corresponding to the mature—and functional—form of these miRNAs.

Figure 3.

Example of the data provided for miRNAs in ALE-HSA21 web portal. In panel (A) and (B) are shown the results of the computational prediction of MiTGs in tabular format and Venn diagrams, respectively. Such data are accessible by clicking the ‘Target genes’ button embedded within miRNA web pages. ‘Validated’, ‘predicted’ and ‘co-expressed’ correspond to the target genes according to miRWalk and CoMeTa databases. Panel (C) shows a prediction of the secondary pre-miRNA structure obtained by RNAfold. Mature miRNA sequences are indicated by black brackets.

Open in new tab Download slide

Exploring ALE-HSA21

The intuitive interface of ALE-HSA21 makes this web resource an easy and fast—although scientifically accurate, comprehensive and updated—tool to retrieve, in few clicks, relevant information about HSA21 genes and transcripts. A detailed user's guide on how to browse ALE-HSA21 and how to extract useful information is provided as Supplementary File S5 and also available on the web portal. The homepage is shown in Figure 2A. From the navigation bar the user can access to the different sections of the portal. Within the ‘Coding genes’ section there is the complete and updated list of HSA21 transcripts, both known and newly identified in this study, in a user-friendly tabular format (Figure 2B). For each transcript, there is the official gene symbol, ID (NM and/or ENST according to the RefSeq and the GENCODE annotations), the genomic coordinates with a link to an open-access session of UCSC Genome Browser, ad hoc created with chromosome 21 coverage files, the sense of transcription (indicated by ‘+/-’ strand), the number of exons and the corresponding, if any, UniProt IDs. Each feature has a link to other sections of the web portal as well as to other external widely used databases. From the gene list table (Figure 2B) the user can easily access an interactive 3D graphical representation of all the transcripts. A ‘Search by Gene’ button allows a direct and quick access to all information for a given gene of interest. For each gene, such information consist of (i) full gene name and gene symbol, (ii) brief gene description with eventual literature references, (iii) links to gene expression data, gene networks and ontologies, (iv) single-nucleotide polymorphisms, (v) involvement in human diseases and (vi) 3D structures of all transcripts generated by AS (Figure 2C). In the 3D structures, each gene element (i.e. promoter, exons, introns and 3′ UTR) is clickable, allowing to directly access the results of our in silico analysis of regulatory sequences. As previously described, these data are provided in easily comprehensible tabular format, as illustrated in Figure 2D. In addition, for each element, below the results table, the nucleotide sequences are viewable and downloadable (FASTA format).

The ‘Noncoding genes’ section contains the comprehensive list of all HSA21 noncoding transcripts, divided in 10 categories, as depicted in Figure 1B. Similar to the ‘Coding genes’ section, by clicking on a specific transcript ID, the users can retrieve structural and regulatory data, described in the previous paragraph. In particular, for lincRNAs and pseudogenes, clickable ‘miRNA binding sites’ button allows accessing a complete list of predicted MREs in tabular format. Furthermore, the ‘Target genes’ button, included within the web pages dedicated to miRNAs, is directly linked to the results of our analysis on genes predicted to be targets of the selected miRNA.

Finally, a ‘Search by Features’ section has been implemented to ease the browsing ALE-HSA21 for specific features of interest, such as Gene Symbol, Transcript ID and UniProt ID.

Conclusion

RNA-Seq studies have clearly shown that the vast majority of—if not all—human genes undergo AS, generating different transcripts (33, 34). These can be transcribed from alternative promoters and may possibly be regulated by different TFs as well as epigenetic factors (i.e. methylation and/or histone modifications). Transcripts with alternative 3′ UTRs may also undergo different co- and posttranscriptional regulation by miRNAs. In light of this, ALE-HSA21 is designed to provide, at the transcript level, computational predictions of TFs' binding sites in promoters as well as MREs in 3′ UTRs. Nonetheless, the presence of in silico predicted binding motifs does not guarantee a specific regulatory factor will bind that sequence. Future studies, based on ChIP-Seq data sets for TFs and sequencing of small RNAs, will surely allow us to improve the accuracy of this analysis. Understanding if alternative transcripts may be targets of different miRNAs, or if their transcription may be triggered by the same TFs, is clearly important to address their differential regulation, both in physiological and pathological conditions.

Moreover, it is known that different transcripts can arise from the same gene through the usage of alternative canonical splice sites, as well as noncanonical splicing enhancer/silencer sequences. Mutations within these regions affect splicing, causing diseases (35). Among them, familial isolated growth hormone deficiency type II is caused by different mutations occurring in the 5′ splice site, ISE and ESE that increase splicing of the exon 3 of GH1 gene (36–38). In addition, aberrant splicing events—such as those caused by mutations in WT1 gene in Frasier syndrome—may alter isoform abundance, affecting several cell processes (39–41). Thus, cataloguing the predicted binding sites for auxiliary splicing factors within all the splicing isoforms of a gene is an added value for clinicians studying human diseases.

Moreover, incomplete annotation of transcripts can lead to misinterpret the effect of nucleotide variations, both mutations and single-nucleotide polymorphisms. On the opposite, the exact knowledge of all splicing isoforms is crucial for clinicians to identify disease-causing mutations. The discovery of a 3′ terminal exon of RPGR gene—mutated in 60% of X-linked retinitis pigmentosa patients—is one of the first and most convincing examples of the link between AS and human diseases (6).

Such evidence highlights the importance to study at the isoform level, rather than at gene level, both physiological processes and disease mechanisms. The recently developed DataBase of Alternative Transcripts Expression (DBATE), valuable source of expression data for AS variants, is a good example (42).

In this ‘scenario’, our pilot transcript-centric database represents a fast and intuitive resource for medical geneticists interested in HSA21-related pathologies (such as DS), as well as for researchers investigating any molecular pathway or cell process involving HSA21 genes. Because ALE-HSA21 is an easy-to-use resource, it is accessible to all scientists with low experience in computational biology or informatics. ALE-HSA21 has been conceived to simply and rapidly provide the user with data usually dispersed in distinct databases or accessible by independently using different computational tools. The presence of genomic data, as well as of in silico predictions of regulatory sequences, links to gene expression, mutation and gene network databases in a unique Web site is an added value of ALE-HSA21.

Finally, our resource is in line with the growing interest for ncRNAs—supported by the wide diffusion of databases such as miRBase (20), miRWalk (21), lncRNome (43), lncRNA db (44), NONCODE (45) and Pseudofam (46). ALE-HSA21 provides open-access computational predictions about the presence of regulatory sequences for pseudogenes, lncRNAs and miRNAs, known to be involved in several biological processes.

This study is likely to represent an interesting proof-of-concept and a starting point for implementing similar resources with the aim to integrate information available in different databases to ‘omics’ data, generated by next-generation sequencing (RNA-Seq, ChIP-Seq and MeDIP-Seq) or by other large-scale technologies.

Acknowledgements

We thank Luciano De Leo for technical assistance to the web portal and Manuela Marescotti for reading the manuscript. We also thank the anonymous reviewers for the helpful comments and suggestions. We also acknowledge the ComBOlab (Computational & Biology Open laboratory).

Funding

This work was funded by the FLAGSHIP “InterOmics” Project (PB.P05) to A.C. and C.A., supported by the Italian MIUR and CNR organizations. Both laboratories are members of the COST-Action (BM1006) “Next-Generation Sequencing Data Analysis Network” funded by the EU.

Conflict of interest. None declared.

References

Kapranov

Cheng

Dike

, et al. ,

RNA maps reveal new RNA classes and a possible function for pervasive transcription

Science

2007

, vol.

316

(pg.

1484

1488

)

Birney

Stamatoyannopoulos

Dutta

, et al. ,

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

Nature

2007

, vol.

447

(pg.

799

816

)

Jacquier

. ,

The complex eukaryotic transcriptome: unexpected pervasive transcription and novel small RNAs

Nat. Rev. Genet.

2009

, vol.

(pg.

833

844

)

Clark

Amaral

Schlesinger

, et al. ,

The reality of pervasive transcription

PLoS Biol.

2011

, vol.

pg.

e1000625

Harrow

Frankish

Gonzalez

, et al. ,

GENCODE: the reference human genome annotation for The ENCODE project

Genome Res.

2012

, vol.

(pg.

1760

1774

)

Crossref

Vervoort

Lennon

Bird

, et al. ,

Mutational hot spot within a new RPGR exon in X-linked retinitis pigmentosa

Nat. Genet.

2000

, vol.

(pg.

462

466

)

Liu

Mattick

Taft

. ,

A meta-analysis of the genomic and transcriptomic composition of complex life

Cell Cycle

2013

, vol.

(pg.

2061

2072

)

Barrett

Fletcher

Wilton

. ,

Regulation of eukaryotic gene expression by the untranslated gene regions and other non-coding elements

Cell Mol. Life Sci.

2012

, vol.

(pg.

3613

3634

)

Yandell

Ence

. ,

A beginner's guide to eukaryotic genome annotation

Nat. Rev. Genet.

2012

, vol.

(pg.

329

342

)

Costa

Angelini

De Feis

, et al. ,

Uncovering the complexity of transcriptomes with RNA-Seq

J. Biomed. Biotechnol.

2010

2010, 853916

OpenURL Placeholder Text

Costa

Angelini

D'Apice

, et al. ,

Massive-scale RNA-Seq analysis of non ribosomal transcriptome in human trisomy 21

PLoS One

2011

, vol.

pg.

e18493

Trapnell

Pachter

Salzberg

. ,

TopHat: discovering splice junctions with RNA-Seq

Bioinformatics

2009

, vol.

(pg.

1105

1111

)

Quinlan

Hall

. ,

BEDTools: a flexible suite of utilities for comparing genomic features

Bioinformatics

2010

, vol.

(pg.

841

842

)

Handsaker

Wysoker

, et al. ,

The Sequence alignment/map format and SAMtools

Bioinformatics

2009

, vol.

(pg.

2078

2079

)

Zhang

Liu

Meyer

, et al. ,

Model-based analysis of ChIP-Seq (MACS)

Genome Biol.

2008

, vol.

pg.

R137

Benjamini

Hochberg

. ,

Controlling the false discovery rate: a practical and powerful approach to multiple testing

J. R. Stat. Soc. Ser. B

1995

, vol.

(pg.

289

300

)

OpenURL Placeholder Text

van Helden

. ,

Regulatory sequence analysis tools

Nucleic Acids Res.

2003

, vol.

(pg.

3593

3596

)

Huang

Chien

Jen

, et al. ,

RegRNA: an integrated web server for identifying regulatory RNA motifs and elements

Nucleic Acids Res.

2006

, vol.

(pg.

W429

W434

)

Hofacker

. ,

Vienna RNA secondary structure server

Nucleic Acids Res.

2003

, vol.

(pg.

3429

3431

)

Griffiths-Jones

Grocock

van Dongen

, et al. ,

miRBase: microRNA sequences, targets and gene nomenclature

Nucleic Acids Res.

2006

, vol.

(pg.

D140

D144

)

Dweep

Sticht

Pandey

Gretz

. ,

miRWalk–database: prediction of possible miRNA binding sites by “walking” the genes of three genomes

J. Biomed. Inform.

2011

, vol.

(pg.

839

847

)

Gennarino

D'Angelo

Dharmalingam

, et al. ,

Identification of microRNA-regulated gene networks by expression analysis of target genes

Genome Res.

2012

, vol.

(pg.

1163

1172

)

Oliveros

. ,

2007

VENNY. An interactive tool for comparing lists with Venn Diagrams. http://bioinfogp.cnb.csic.es/tools/venny/index.html

Tilgner

Raha

Habegger

, et al. ,

Accurate identification and analysis of human mRNA isoforms using deep long read sequencing

G3 (Bethesda)

2013

, vol.

(pg.

387

397

)

Bartel

. ,

MicroRNAs: target recognition and regulatory functions

Cell

2009

, vol.

136

(pg.

215

233

)

Guo

Ingolia

Weissman

, et al. ,

Mammalian microRNAs predominantly act to decrease target mRNA levels

Nature

2010

, vol.

466

(pg.

835

840

)

Salmena

Poliseno

Tay

, et al. ,

A ceRNA hypothesis: the Rosetta stone of a hidden RNA language?

Cell

2011

, vol.

146

(pg.

353

368

)

Cesana

Cacchiarelli

Legnini

, et al. ,

A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA

Cell

2011

, vol.

147

(pg.

358

369

)

Karreth

Tay

Perna

, et al. ,

In vivo identification of tumor-suppressive PTEN ceRNAs in an oncogenic BRAF-induced mouse model of melanoma

Cell

2011

, vol.

147

(pg.

382

395

)

Tay

Kats

Salmena

, et al. ,

Coding-independent regulation of the tumor suppressor PTEN by competing endogenous mRNAs

Cell

2011

, vol.

147

(pg.

344

357

)

Costa

Esposito

Aprile

, et al. ,

Non-coding RNA and pseudogenes in neurodegenerative diseases: “The (un)Usual Suspects”

Front. Genet.

2012

, vol.

pg.

231

Gruber

Lorenz

Bernhart

, et al. ,

The Vienna RNA websuite

Nucleic Acids Res.

2008

, vol.

(pg.

W70

W74

)

Pan

Shai

Lee

, et al. ,

Deep surveying of alternative splicing complexity in the human transcriptome by highthroughput sequencing

Nat. Genet.

2008

, vol.

(pg.

1413

1415

)

Keren

Lev-Maor

Ast

. ,

Alternative splicing and evolution: diversification, exon definition and function

Nat. Rev. Genet.

2010

, vol.

(pg.

345

355

)

Kornblihtt

Schor

Alló

, et al. ,

Alternative splicing: a pivotal step between eukaryotic transcription and translation

Nat. Rev. Mol. Cell Biol.

2013

, vol.

(pg.

153

165

)

Binder

Brown

Parks

. ,

Mechanisms responsible for dominant expression of human growth hormone gene mutations

J. Clin. Endocrinol. Metab.

1996

, vol.

(pg.

4047

4050

)

PubMed

OpenURL Placeholder Text