PGD: a pangolin genome hub for the research community

Table 1.

Summary statistics of two pangolin genome and transcriptome datasets in PGD

Genome	Malayan pangolin	Chinese pangolin
Number of scaffolds	81,732	87,621
Estimated coverage (X)	146	56
Estimated genome size (bp)	2,549,959,554	2,205,289,822
N50 (bp)	204,525	157,892
Number of protein-coding genes	23,446	20,298
Number of annotated genes	21,451 (91%)	19,287 (95%)
Number of pseudogenes	4,660	2,416
Number of transcripts	89,751	NA

Assembly statistics of the pangolin genomes. Adapted from ‘Pangolin genomes and the evolution of mammalian scales and immunity’ by Choo et al., 2016.

Transcriptome data

PGD stores the sequences of expressed genes from eight organs of the Malayan pangolin: the cerebellum, cerebrum, liver, heart, kidney, lung, thymus and spleen. Briefly, these transcriptomes were sequenced using an Illumina HiSeq 100 bp paired-end strategy. To generate a representative catalog of pangolin genes, the sequencing reads from all organs were pooled and assembled de novo with three different assemblers: SOAPdenovo (18), Velvet (19) and Trinity (20). To obtain a high-quality gene set, we accepted only the transcripts or genes that were assembled in common by all three assemblers. The assembled data were then filtered and clustered by sequence similarity, and the longest transcript in each cluster was selected as a unigene. This resulted in a set of 89,751 unigenes in the Malayan pangolin transcriptome (Table 1).
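The last step described above, keeping the longest transcript of each similarity cluster as the unigene, can be illustrated with a minimal Python sketch. The cluster labels are assumed to come from an upstream clustering step that is not specified here; the function name, the input dictionaries and the example sequences are hypothetical and are not taken from the PGD pipeline.

```python
from collections import defaultdict


def select_unigenes(transcripts, cluster_of):
    """Return the ID of the longest transcript in each cluster.

    transcripts: dict mapping transcript ID -> nucleotide sequence
    cluster_of:  dict mapping transcript ID -> cluster label
                 (hypothetical output of a similarity-clustering step)
    """
    clusters = defaultdict(list)
    for tid in transcripts:
        clusters[cluster_of[tid]].append(tid)
    # The longest sequence wins; ties are broken by transcript ID so that
    # the result is deterministic.
    return {
        label: max(members, key=lambda t: (len(transcripts[t]), t))
        for label, members in clusters.items()
    }


# Toy data: t1 and t2 fall in the same cluster, t3 is on its own.
example = {"t1": "ATGCC", "t2": "ATGCCGTA", "t3": "GGC"}
labels = {"t1": "c1", "t2": "c1", "t3": "c2"}
unigenes = select_unigenes(example, labels)
```

In this toy input, `t2` is chosen for cluster `c1` because it is longer than `t1`.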

Pseudogene annotation data

To the best of our knowledge, the Malayan and Chinese pangolin genomes are the first whole-genome sequencing efforts for the order Pholidota; therefore, information regarding pangolin pseudogenes has not previously been available. PGD offers information about the putative pangolin pseudogenes. To predict pseudogenes, we screened the whole-genome sequences with the PseudoPipe pipeline (21), a well-established pipeline for whole-genome pseudogene screening (22–24), using the MAKER-generated protein-coding genes (parent genes) as queries. The protein sequences derived from the MAKER annotation were searched against the whole pangolin genomes using BLASTALL (25). After all homologous BLAST hits were reported, PseudoPipe removed all hits that belonged to the parent genes themselves, merged neighboring short hits separated by small gaps and output the results as potential pseudogene candidates. These candidates were then screened using the tFasty (26) tool to report insertions, deletions, premature stop codons and frameshift mutations that may disrupt the normal functions of the genes, yielding the final pseudogene dataset. Stringent thresholds were used to filter out false-positive candidates (E-value of 1E−10, 70% parent gene coverage and 40% gene identity). All pseudogenes were reported as either duplicated or processed pseudogenes based on the nature of the reported hits. After applying these filtering criteria, 4660 pseudogenes were reported in the Malayan pangolin genome and 2416 in the Chinese pangolin genome (Table 1).
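As an illustration of the threshold filter described above, the following Python sketch keeps only candidates that meet the three stated cutoffs. It is not PseudoPipe code: the `Candidate` record, its field names and the example values are hypothetical, and treating the cutoffs as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    evalue: float      # expect value of the hit against the parent gene
    parent_cov: float  # fraction of the parent gene covered (0 to 1)
    identity: float    # fraction identity to the parent gene (0 to 1)


def passes_filters(c, max_evalue=1e-10, min_cov=0.70, min_identity=0.40):
    """Apply the three cutoffs quoted in the text (assumed inclusive)."""
    return (
        c.evalue <= max_evalue
        and c.parent_cov >= min_cov
        and c.identity >= min_identity
    )


# Hypothetical candidates, one failing each criterion in turn.
candidates = [
    Candidate("cand_ok", 1e-30, 0.85, 0.62),
    Candidate("cand_weak_evalue", 1e-5, 0.90, 0.80),
    Candidate("cand_low_coverage", 1e-40, 0.50, 0.75),
    Candidate("cand_low_identity", 1e-20, 0.95, 0.30),
]
kept = [c.name for c in candidates if passes_filters(c)]
```

Only `cand_ok` survives; each of the other three fails exactly one cutoff.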

PGD implementation

The PGD hub was developed on a four-tier web application architecture (client workstation, web server, application server and database server) and implemented under Linux using common software packages including Apache, MySQL, PHP and Perl. The website was designed in PHP and HTML5, using CodeIgniter as the back-end framework and Twitter Bootstrap as the front-end framework (Figure 1). In addition, the website separates logic, presentation and application data into three interconnected parts following the Model-View-Controller (MVC) pattern. On the client side, jQuery, a feature-rich JavaScript library, enhances user interaction with the web pages and uses AJAX (Asynchronous JavaScript and XML) to transfer data between the client and the server. The genomic annotation data generated by in-house scripts and published software are stored in a MySQL database management system.

Figure 1.

PGD four-tier web application architecture (client workstation, web server, application server and database server).

PGD contains four main features/tabs: (i) a ‘Browse’ tab for browsing all pangolin genome annotation and transcript information; (ii) a ‘Tools’ tab that lists analysis tools for pangolin genomic data; (iii) a ‘Genome Browser’ tab that incorporates two genome browsers, the UCSC Genome Browser and JBrowse, for visualizing pangolin genome sequences and annotations; and (iv) a ‘Download’ tab that allows users to download all pangolin genome/transcript data and annotations. In addition, PGD implements a quick search function, allowing users to rapidly search for genes of interest stored in the MySQL database. Furthermore, users can search the pangolin genome sequences with their own sequences of interest using the incorporated web-based BLAT and BLAST tools. Our web architecture allows users to access PGD and submit their tasks securely through the front end, which communicates with the web server via the Internet. We normalized and optimized the database schema to reduce redundancy in the pangolin genomic data, so that MySQL performs efficiently without loss of data integrity. As such, users can submit tasks and search, browse and retrieve the genomic data through PGD in a robust and efficient manner.

Graphical User Interface Design

Overview

The homepage of PGD contains general descriptions of pangolins in the main panel and manually compiled information about pangolins, such as the latest news and conferences, blogs and published articles, in the right-side panel. At the top of the PGD homepage, several option tabs give users access to the different features of PGD. For instance, the Browse tab allows users to quickly browse the detailed annotation data of the Malayan and Chinese pangolins, as well as the transcriptomic data for the Malayan pangolin. In addition, the Tools tab gives access to analysis tools such as the BLAST (25) and BLAT (27) packages that we have incorporated into PGD. Users can also visualize the genome and its genomic features, and use the real-time keyword search feature for fast and smooth searching of genes of interest (Figure 2).

Figure 2.

Schematic structure of the PGD.

Browsing pangolin genomic data in PGD

By clicking on the Browse tab at the top of the PGD homepage, users can access the gene information of both pangolin species (Figure 3). All genes are displayed after users click on the tab. Users can open the gene details page, which contains the annotation and functional information of a particular gene, by clicking on the ‘Details’ button associated with the gene of interest. This page displays all information about the gene of interest, including the sequence name, scaffold ID, start and stop positions of the gene, putative functions, gene sequences and BLAST2GO alignment results such as ‘Top Hit Species’, ‘Min E-value’, annotated ‘GOs’ information and ‘InterPro Scan’ domain information. JBrowse is also incorporated in this page, allowing users to visualize and explore the locus and neighboring regions of the gene within the pangolin genome.

Figure 3.

A screenshot of the gene details page. This page displays information about a gene, including its sequences and functional annotation.

Keyword and sequence searches

PGD contains a large volume of pangolin genomic and transcriptomic data and annotations. It is therefore necessary to provide an intuitive web-based graphical user interface (GUI) that allows users to search these data rapidly. We have implemented a keyword search system in PGD that lets users search for genes of interest and returns results in real time. Alternatively, users can search by sequence. PGD has built-in BLAST (25) and BLAT (27) tools, which allow users to search or compare a query sequence against the pangolin genome and transcriptome data. Using the BLAST tools, users can perform nucleotide searches against each annotated sequence (BLASTN), nucleotide searches against the whole genome (BLAST Whole Genome), protein searches against each annotated sequence (BLASTP) and translated nucleotide searches against each annotated protein sequence (BLASTX). Users can choose the pangolin species of interest (either Malayan pangolin or Chinese pangolin) and set the desired cutoffs and options (e.g. the expect value and the filter for regions of low compositional complexity) for their sequence searches. Alternatively, users can perform a sequence search using BLAT, which is structured differently from BLAST: it also searches for similarity to a query sequence, but it requires an exact or nearly exact match to report a hit. Unlike BLAST, which is a local alignment tool, BLAT is also designed to map transcripts/ESTs, which lack introns, back to the reference pangolin genomes. Users can thus examine the location of a transcript and its gene structure within the genome. Using the provided web-based BLAT, users can set parameters such as (i) minimum number of matches, (ii) minimum score, (iii) minimum identity percentage, (iv) maximum gap, (v) tile size and (vi) maximum intron size. The BLAT outputs are linked to the genome browser, allowing instant visual comparison of each hit.
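The six tunable parameters listed above correspond to options of the standalone BLAT command-line program (`-minMatch`, `-minScore`, `-minIdentity`, `-maxGap`, `-tileSize` and `-maxIntron`). The Python sketch below only assembles such a command line; how the PGD web form actually invokes BLAT is not described in the text, so the function, the file names and the default values are illustrative assumptions.

```python
def build_blat_command(database, query, output, *, min_match=2, min_score=30,
                       min_identity=90, max_gap=2, tile_size=11,
                       max_intron=750000):
    """Assemble an argument list for a standalone BLAT run.

    database, query and output are file paths (hypothetical names below).
    The keyword defaults are illustrative values, not PGD settings.
    """
    return [
        "blat",
        database,
        query,
        f"-minMatch={min_match}",        # (i) minimum number of tile matches
        f"-minScore={min_score}",        # (ii) minimum alignment score
        f"-minIdentity={min_identity}",  # (iii) minimum identity percentage
        f"-maxGap={max_gap}",            # (iv) maximum gap between tiles
        f"-tileSize={tile_size}",        # (v) tile (seed) size
        f"-maxIntron={max_intron}",      # (vi) maximum intron size
        output,
    ]


# Hypothetical file names; the list could be passed to subprocess.run().
cmd = build_blat_command("pangolin_genome.2bit", "query.fa", "hits.psl",
                         min_identity=95)
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later handed to `subprocess.run`.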

Interactive Pangolin Genome Data Visualization

A fully dynamic, real-time genome browser, JBrowse, is incorporated into PGD, allowing users to browse and visualize pangolin genome sequence and annotation data on the fly. JBrowse provides several useful features: (i) fast and smooth genome navigation, (ii) support for multiple data formats, (iii) real-time genome browsing with light demands on server resources and (iv) high-speed visualization of data, sequences and genome annotation results.

Using the pangolin genome browser (JBrowse) (Figure 4), users can choose whether to visualize the data in the Malayan pangolin genome or the Chinese pangolin genome. Currently, more annotations (e.g. transcriptomic data) are available for the Malayan pangolin than for the Chinese pangolin, because many research projects on the Malayan pangolin are ongoing under the IPaRC. Once users have chosen a genome to explore, they can see all relevant annotation tracks in the genome browser. For instance, the reference genome sequence of the selected genome is provided as a basic track. Different annotation tracks can be toggled on and off for easier visualization and analysis. These tracks are classified into categories such as ‘Genes and Annotations’, ‘Repeat’, ‘Transcriptome’, ‘Transcript Reads Mapping’, ‘Organ Specific Expression’ and ‘Variation’, which represent our annotations from different analyses.

Figure 4.

Pangolin genome browser. Users can turn on and off the annotation tracks in the left panel.

Genes and annotations tracks

This category shows the MAKER gene models, including the evidence-based and de novo gene predictions, together with the predicted pseudogenes. In the pseudogene track, users can visualize the predicted pseudogenes in the selected pangolin genome. These pseudogenes are labeled ‘PSSD’ (processed) or ‘DUP’ (duplicated), as categorized by the PseudoPipe pipeline (21). The pseudogene tracks report the key features of each pseudogene, including parent gene coverage ‘Frac’, identity ‘Ident’, insertion ‘Ins’, deletion ‘Del’, frameshift ‘Shift’ and stop codon ‘Stop’. This information provides a better understanding of the attributes of the pseudogenes found in both pangolins.

Repeat tracks

The repetitive element information is provided by different software and displayed in three tracks: RepeatMasker, RepeatModeler and Structural Simple Repeat. The RepeatMasker track shows the repetitive elements identified in the pangolin genomes using two repeat libraries: the carnivore repeat library and the mammal repeat library. We used SSRIT (28) to scan for simple sequence repeats (also called structural sequence repeats) in both pangolin genomes. SSRIT reports all simple sequence repeats, allowing users to study microsatellites in the pangolin genomes.
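To make the notion of a simple sequence repeat concrete, the sketch below finds perfect tandem repeats with a regular expression. It is not SSRIT's algorithm; the motif length range (2 to 6 bp), the minimum of five repeat units and the function name are assumptions chosen for illustration.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=5):
    """Return (start, end, motif, repeat_count) for perfect tandem repeats.

    Coordinates are 0-based and half-open. Only A/C/G/T motifs are matched.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-mer followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. ATAT or AAA); those are reported at the shorter length
            # if it lies within the requested motif range.
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


# Six copies of the dinucleotide AT flanked by unrelated bases.
ssrs = find_ssrs("GGG" + "AT" * 6 + "CCC")
```

For this toy sequence the only hit is the (AT)6 run spanning positions 3 to 15.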

Transcriptome tracks

Users can explore and visualize the genes expressed in the Malayan pangolin through the tracks under this category. The expressed transcripts, generated by pooling all reads from the eight pangolin organs and assembling them with three different assemblers, are available here. These common transcripts were mapped onto the reference genome using GMAP (29). Moreover, users can examine which transcripts/genes are expressed in a specific pangolin organ, such as the cerebrum, cerebellum or liver, through the organ-specific transcript tracks.

Transcript reads mapping tracks

Users can also examine the expression level of each Malayan pangolin gene across different organs. Briefly, for each organ-specific transcriptome, we mapped the RNA-Seq reads to the Malayan pangolin genome sequence; the read mapping data for the eight organs (cerebrum, cerebellum, liver, heart, kidney, thymus, spleen and lung) are available in the ‘Transcript Reads Mapping’ category.

Transcriptome profile tracks

Raw transcript read mapping results from the eight pangolin organs were normalized as reads per kilobase per million mapped reads (RPKM) and converted into bigWig-format tracks that visualize the expression profile of each organ. Users can examine the expression profiles of a gene of interest across the eight organs simultaneously. Of note, the PGD genome browser also links some features (e.g. genes and repeats) to external resources. For example, repeat elements are linked to the RepeatMasker database, and hits in the BLASTX evidence tracks are linked to well-established databases, namely the ENSEMBL database, the PANTHER database (30), the Dog Genome SNP database (31) and UniProt (32). These links are opened by right-clicking on a feature in the track and may help users find further information about the feature.
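For readers unfamiliar with the RPKM normalization mentioned above, the standard definition is the read count scaled per kilobase of feature length and per million mapped reads. The minimal Python sketch below applies that definition; the function name and the example numbers are illustrative and are not taken from the PGD data.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Illustrative numbers: 500 reads on a 2 kb gene in a library of
# 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)
```

With these inputs the gene has 250 reads per kilobase, and dividing by the 10 million mapped reads gives an RPKM of 25.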

Multiple mammalian genome structural alignment

We performed a multispecies structural alignment of the two pangolin genomes to other mammalian genomes (dog, cat, cow, horse, human and mouse) using the Progressive Cactus software (33). The animal genome data used in the alignment are shown in Table 2.

Table 2.

Genome assembly version for each mammal genome used for multiple sequence alignment

Animal	Scientific name	Genome assembly
Dog	Canis familiaris	CanFam3.1
Cat	Felis catus	Felis_catus_8.0
Cow	Bos taurus	Bos_taurus_3.1
Horse	Equus caballus	EquCab_2.0
Human	Homo sapiens	GRCh37.p5
Mouse	Mus musculus	GRCm38.p4

Prior to the structural alignment, repetitive regions of the assemblies were masked with RepeatMasker (34). The phylogenetic tree used to perform the alignment is given in Figure 5.

Figure 5.

The phylogenetic tree of the species involved in the structural alignment.

Pangolin genome conservation tracks

Basewise conservation scores were obtained for the Malayan and Chinese pangolin genomes from the multispecies structural alignment using the HAL-phyloP tool from the HAL package (35). HAL-phyloP wraps the original phyloP algorithm (36) and is designed to efficiently process multispecies structural alignments produced by Progressive Cactus.

Coding sequences (CDSs) of Malayan pangolin genes were used to train a neutral evolution model for phyloP. Next, phyloP was run in the conservation and acceleration (‘CONACC’) mode to obtain the corresponding P values for each position in both pangolin genomes. The scores are reported as the negative common logarithms of the basewise P values (−log10 P), with the sign indicating the direction of the departure from neutrality: positions under conservation have positive values, whereas positions under acceleration are flagged by making their values negative.
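The sign convention described above can be written out as a small Python sketch. This is not phyloP code; the function name and the boolean flag are hypothetical and simply illustrate how a basewise P value maps to a signed score.

```python
import math


def conacc_score(p_value, conserved):
    """Signed -log10(P): positive for conservation, negative for acceleration."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    magnitude = -math.log10(p_value)
    return magnitude if conserved else -magnitude


# The same P value of 0.001 gives scores of equal size and opposite sign.
score_conserved = conacc_score(1e-3, conserved=True)
score_accelerated = conacc_score(1e-3, conserved=False)
```

Under this convention, a P value of 0.001 yields a score close to 3 for a conserved position and close to −3 for an accelerated one.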

Assembly hub for UCSC Genome Browser

The UCSC Genome Browser (37) is a well-known and widely used tool for bioinformatics analysis. Alongside the PGD database, we provide an assembly hub (38) for the UCSC Genome Browser that contains the datasets from PGD and provides a means of viewing them in the genome browser. The hub also presents the structural alignment between the genomes of both pangolin species and other mammals in the form of snake tracks (39), together with the genomic conservation tracks obtained from the alignment with the HAL-phyloP tool.

In addition to presenting the pangolin genome annotation tracks, the hub provides access to other features and tools available in the UCSC Genome Browser, including the Table Browser (40) for manipulating the tracks and obtaining nucleotide sequences of the features.

Data download

PGD provides a user-friendly interface for downloading pangolin genome and transcriptome data and annotations. The raw sequencing reads from libraries of different sizes are available for download. Users can also download the assembled reference genome sequences of the Malayan and Chinese pangolins for downstream analyses. Other data and annotations available for download include the MAKER-generated gene sequences (nucleotide CDS, exons and protein sequences), the expressed genes from the Malayan pangolin transcriptomes and the raw RNA-Seq data.

There are two ways in which users can download these data and annotations. First, users can select the data or annotations of interest on the ‘Download’ page and download them through the provided web interface. Second, users can download the raw data and annotations through the File Transfer Protocol (FTP) interface provided on the download page (Figure 6).

Figure 6.

Web interfaces for data download in PGD.

Conclusion

We anticipate that PGD will become an invaluable hub, allowing researchers to access, browse, retrieve and analyze pangolin genomic and expression data and annotations. This hub should facilitate research in pangolin biology, particularly in the conservation of these critically endangered animals, and will also enhance our understanding of mammalian biology and evolution. We will continue updating PGD by incorporating more data, annotations and analysis tools, particularly from our IPaRC consortium, as they become available. We also welcome researchers to provide suggestions and/or share data for the improvement of the PGD hub.

Availability

PGD is accessible at http://pangolin-genome.um.edu.my. The UCSC Genome Browser hub configuration file is available at http://public.dobzhanskycenter.ru/PangolinHub/hub.txt. Users can download all the raw sequences and datasets used in this paper from the PGD website.

Acknowledgements

We thank all members of GIRG for helpful discussions and valuable comments. We also thank Mark Diekhans, Benedict Paten, Joel Armstrong, Glenn Hickey and Brian Raney of the UCSC Genomics Institute for their support in the use of the Progressive Cactus tool and the HAL package, and Tan Shi Yang for providing IT and bioinformatics support for this project.

Funding

This project was supported by University of Malaya and Ministry of Education, Malaysia under the High Impact Research (HIR) grant UM.C/625/HIR/MOHE/CHAN-08.

References

1. Kondrashov, Agadjanian A.K. (2012) A nearly complete skeleton of Ernanodon (Mammalia, Palaeanodonta) from Mongolia: morphofunctional analysis. J. Vertebr. Paleontol., 983–1001.

2. Kawashima, Thorington R.W., Bohaska P.W., et al. (2015) Anatomy of shoulder girdle muscle modifications and walking adaptation in the scaly Chinese pangolin (Manis pentadactyla pentadactyla: Pholidota) compared with the partially osteoderm-clad armadillos (Dasypodidae). Anat. Rec., 298, 1217–1236.

3. Lin M.F., Chang C.Y., Yang C.W., et al. (2015) Aspects of digestive anatomy, feed intake and digestion in the Chinese pangolin (Manis pentadactyla) at Taipei zoo. Zoo Biol., 262–270.

4. Chin S.C., Lien C.Y., Chan Y.T., et al. (2015) Hematologic and serum biochemical parameters of apparently healthy rescued Formosan pangolins (Manis pentadactyla pentadactyla). J. Zoo Wildlife Med.

5. IUCN 2015. The IUCN Red List of Threatened Species. Version 2015-4. <http://www.iucnredlist.org>.

6. Guo, Zou, Chen, et al. (1997) Sustainability of wildlife use in traditional Chinese medicine. Conserv. China’s Biodivers., 190–220.

7. Zhou Z.M., Zhou, Newman, et al. (2014) Scaling up pangolin protection in China. Front. Ecol. Environ.

8. Luo, Yan, Song, et al. (2013) A strategy for trade monitoring and substitution of the organs of threatened animals. Sci. Rep.

9. Simpson J.T., Durbin (2012) Efficient de novo assembly of large genomes using compressed data structures. Genome Res., 549–556.

10. Cantarel B.L., Korf, Robb S.M., et al. (2008) MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res., 188–196.

11. Shirak, Grabherr, Di Palma, et al. (2010) Identification of repetitive elements in the genome of Oreochromis niloticus: Tilapia Repeat Masker. Mar. Biotechnol., 121–125.

12. Smith C.D., Edgar R.C., Yandell M.D., et al. (2007) Improved repeat identification and masking in Dipterans. Gene, 389.

13. Cunningham, Amode M.R., Barrell, et al. (2015) Ensembl 2015. Nucleic Acids Res., D662–D669.

14. Conesa, Götz, García-Gómez J.M., et al. (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics, 3674–3676.

15. Ashburner, Ball C.A., Blake J.A., et al. (2000) Gene Ontology: tool for the unification of biology. Nat. Genet.

16. Kanehisa, Goto (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res.

17. Hunter, Apweiler, Attwood T.K., et al. (2009) InterPro: the integrative protein signature database. Nucleic Acids Res., D211–D215.

18. Xie Y.L., Wu G.X., Tang J.B., et al. (2014) SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics, 1660–1666.

19. Zerbino D.R., Birney (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res., 821–829.

20. Haas B.J., Papanicolaou, Yassour, et al. (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc., 1494–1512.

21. Zhang Z.L., Carriero, Zheng D.Y., et al. (2006) PseudoPipe: an automated pseudogene identification pipeline. Bioinformatics, 1437–1439.

22. Sisu, Pei B.K., Leng, et al. (2014) Comparative analysis of pseudogenes across three phyla. Proc. Natl Acad. Sci. USA, 111, 13361–13366.

23. Pei B.K., Sisu, Frankish, et al. (2012) The GENCODE pseudogene resource. Genome Biol.

24. Karro J.E., Yan Y.P., Zheng D.Y., et al. (2007) Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation. Nucleic Acids Res., D55–D60.

25. Altschul S.F., Gish, Miller, et al. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

26. Pearson W.R. (1999) Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol., 132, 185–219.

27. Kent W.J. (2002) BLAT—the BLAST-like alignment tool. Genome Res., 656–664.

28. Temnykh, DeClerck, Lukashova, et al. (2001) Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential. Genome Res., 1441–1452.

29. Wu T.D., Watanabe C.K. (2005) GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics, 1859–1875.

30. Thomas P.D., Campbell M.J., Kejariwal, et al. (2003) PANTHER: a library of protein families and subfamilies indexed by function. Genome Res., 2129–2141.

31. Bai, Zhao W.M., Tang B.X., et al. (2015) DoGSD: the dog and wolf genome SNP database. Nucleic Acids Res., D777–D783.

32. Chan W.M., UniProt Consortium (2010) The UniProt Knowledgebase (UniProtKB): a freely accessible, comprehensive and expertly curated protein sequence database. Genet. Res.

33. Paten, Diekhans, Earl, et al. (2011) Cactus graphs for genome comparisons. J. Comput. Biol., 469–481.

34. Smit A.F.A., Hubley R., Green P. (2004) RepeatMasker Open-3.0. Seattle (WA): Institute for Systems Biology.

35. Hickey, Paten, Earl, et al. (2013) HAL: a hierarchical format for storing and analyzing multiple genome alignments. Bioinformatics, 1341–1342.

36. Pollard K.S., Hubisz M.J., Rosenbloom K.R., et al. (2010) Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res., 110–121.

37. Kent W.J., Sugnet C.W., Furey T.S., et al. (2002) The human genome browser at UCSC. Genome Res., 996–1006.

38. Raney B.J., Dreszer T.R., Barber G.P., et al. (2014) Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics, 1003–1005.

39. Nguyen, Hickey, Raney B.J., et al. (2014) Comparative assembly hubs: web-accessible browsers for comparative genomics. Bioinformatics, 3293–3301.

40. Karolchik, Hinrichs A.S., Furey T.S., et al. (2004) The UCSC Table Browser data retrieval tool. Nucleic Acids Res., D493–D496.

41. Choo S.W., Rayko, Tan T.K., et al. (2016) Pangolin genomes and the evolution of mammalian scales and immunity. Genome Res., gr-203521.