iMITEdb: the genome-wide landscape of miniature inverted-repeat transposable elements in insects

Author Notes

Miniature inverted-repeat transposable elements (MITEs) have attracted much attention due to their widespread occurrence and high copy numbers in eukaryotic genomes. However, the systematic knowledge about MITEs in insects and other animals is still lacking. In this study, we identified 6012 MITE families from 98 insect species genomes. Comparison of these MITEs with known MITEs in the NCBI non-redundant database and Repbase showed that 5701(∼95%) of 6012 MITE families are novel. The abundance of MITEs varies drastically among different insect species, and significantly correlates with genome size. In general, larger genomes contain more MITEs than small genomes. Furthermore, all identified MITEs were included in a newly constructed database (iMITEdb) (http://gene.cqu.edu.cn/iMITEdb/), which has functions such as browse, search, BLAST and download. Overall, our results not only provide insight on insect MITEs but will also improve assembly and annotation of insect genomes. More importantly, the results presented in this study will promote studies of MITEs function, evolution and application in insects.

Database URL: http://gene.cqu.edu.cn/iMITEdb/

Introduction

Miniature inverted-repeat transposable elements (MITEs) were first discovered in plants, and are widely distributed in eukaryotes (1–5). MITEs belong to class II (or DNA) transposable elements (TEs), and are non-autonomous elements derived from the internal-deletion of autonomous DNA transposons (6, 7). However, they can be mobilized by transposases encoded by their parental autonomous transposons (called trans-mobilization) or non-parental elements (called cross-mobilization) (8, 9). MITEs can be classified into different superfamilies based on the nucleotide composition of terminal inverted repeats (TIRs) and target site duplications (TSDs). Unlike other DNA transposons, MITEs often have some obvious characteristics: shorter sequence length (<800 bp), high AT content, insertion preference in or near genes and high copy numbers in a genome (10–12).

MITEs have attracted widespread attention due to their roles in gene expression, genome evolution and phenotypic diversity (13–16). MITEs not only up-regulate the expression of nearby genes by acting as new cis-regulatory elements but also down-regulate or silence the expression of some genes by small RNAs derived from these elements at the transcriptional and/or post-transcriptional levels (14, 17–20). Besides, MITEs make a great contribution on the evolution of genome size (12, 16). Furthermore, MITEs are considered as a good genetic source applied in DNA makers, transgenic vectors and effective insertion mutagen (21–24). However, most of above results were obtained from studies of plants.

Since more and more genome sequences become available, several computer programs have been developed to identify MITEs in genomes, and a larger number of MITEs have been identified in the eukaryotic genomes especially in plant genomes (13, 20, 25–28). Although several studies tried to identify MITEs in the insect genomes (3, 5, 29), the number of reported MITEs could be just the tip of the iceberg with rapidly increasing insect genome were released.

In the present study, MITEs from 98 insect genomes were identified, classified and annotated using MITE-Hunter and Repetitive Sequence with Precise Boundaries (RSPB) as well as a series of Perl scripts. We identified 6012 MITE families belonging to 16 known superfamilies in these genomes. In total 5701 of 6012 MITEs families are novel and have no matches to the previously known MITEs in the databases of Repbase and NCBI non-redundant nucleotide database. The abundance of MITEs varies greatly among the different insect species and significantly correlated with genome size. Finally, all identified MITEs are made available in a newly constructed database called iMITEdb.

Materials and methods

Data sources used in this study

Ninety-eight released insect genomes including Coleoptera (7 species), Diptera (48 species), Hemiptera (8 species), Hymenoptera (20 species), Lepidoptera (9 species), Strepsiptera (1 species), Orthoptera (1 species), Odonata (1 species), Isoptera (1 species), Thysanoptera (1 species) and Ephemeroptera (1 species) were downloaded from NCBI (http://www.ncbi.nlm.nih.gov/) (as of 8 March 2015) (Supplementary Table S1).

Identification, classification and characterization of insect MITEs

MITE-Hunter and RSPB were used to search for MITEs in 98 insect genomes (20, 27). Briefly, the pipeline for MITEs identification included four steps (Supplementary Figure S1): (i) First, MITE-Hunter was used to search insect genomes for candidate MITEs. Then, RSPB was used to identify potential insect MITEs. In RSPB, the hunter2ref.pl script, a Perl script of RSPB, was used to skip the confirmed MITEs identified by MITE-Hunter; (ii) Each candidate MITE was used as a query in BLASTN (e-value < e⁻⁶) search against the corresponding genome sequence. Candidate MITE families with copy numbers <3 were discarded. Then, multiple sequences retrieved by each candidate MITE were aligned using MUSCLE (30); (iii) Consensus sequence was generated using a Perl script, and consensus sequences >800 bp in length were discarded; (iv) Finally, the TSDs and TIRs of each MITEs were retrieved using Perl script. MITEs from each species were assigned into families through all-versus-all BLAST method. The same family was defined by nucleotide identity>80%, BLAST e-value < e⁻⁶ and percent query coverage >80%. MITEs were classified into superfamily based on TIRs and TSDs (16).

Construction of insect MITEs database

A database containing the information of all insect MITEs identified in this study was constructed using Linux, PHP, Apache, MySQL and Perl as well as Common Gateway Interface.

Results and discussion

Identification, classification and abundance of MITEs in 98 insect genomes

In this study, a total of 6012 MITE families were identified in 98 insect genomes. The consensus sequences of these MITE families were used as queries in BLASTN (e-value < 10⁻¹⁰) searches against the Repbase and NCBI non-redundant nucleotide database. Both databases include almost all known MITEs. We found that 5701 (∼95%) MITE families did not match to any known TEs in both databases. Therefore, these families were defined as novel MITE families. MITEs like other TEs are huge challenges for host genome sequencing, assembly and annotation due to their repeatability. Thus, the larger number of novel MITE families identified in this study will greatly improve sequencing, assembly and annotation of insect genomes, and facilitate the evolutionary and functional studies of MITEs in the future.

Based on the characteristics of TSDs and TIRs, 5531 MITE families were classified into 16 known superfamilies including TC1-Mariner, PHIS, P, Kolobok, PiggyBac, CMC, Sola1, hAT, Merlin, Sola2, Ginger, MULE, Sola3, Academ, Transib and Zator. Meanwhile, 481 families could not be readily assigned to any known DNA transposon superfamilies, and were designated as unknown (Figure 1A). The abundance of different MITE superfamilies in the 98 insect genomes varies markedly. The largest MITE superfamily is the TC1-Mariner constituting of 124.84 megabase (Mb) in the insect genomes studied. The smallest superfamily is Transib cover 0.19 Mb (Figure 1B). The numbers of families and copies also vary greatly among the different MITE superfamilies. The number of families ranges from 2 to 1,661 and the number of copies ranges from 404 to 334, 508. TC1-Mariner is also the largest superfamily with high numbers of families and copies (Figure 1C). However, the average sequence length of TC1-Mariner (∼373 bp) superfamily is smaller than Academ (∼470 bp), Kolobok (∼386 bp), Transib (∼482 bp), Sola3 (∼428 bp) and Merlin (∼383 bp) superfamilies (Figure 1D).

Figure 1.

Characteristics of each MITEs superfamily in insect genomes. (A) Structure of each superfamily. TSDs sequence and TIRs are shown. (B) Amount of nucleotide covered of each superfamily in 98 insect genomes. (C) The number of families and copies of each superfamily in the investigated insect genomes. Numbers in parenthesis represents ‘families/copies’. (D) The distribution of consensus sequence length for each MITE superfamily.

Open in new tab Download slide

Such abundance of the TC1-Mariner superfamily in the insect genomes could be in part explained by its short TSDs because short TSDs likely have much more target sites in host genomes. TC1-Mariner transposons are prevalent in eukaryotes, and feature di-nucleotide (5′-TA-3′) TSDs (31). TC1-Mariner TSDs are the shortest among known DNA transposon superfamilies (PiggyBac is characterized by 5′-TTAA-3′ TSDs, P is 7-8 bp TSDs, Academ is 3 bp TSDs etc.) (32). TSDs of Academ are shorter than PiggyBac, hAT, Merlin etc. However, among the superfamilies, Academ has the lowest abundance. Thus, the number of target sites can not completely explain the abundance variation of transposon superfamilies.

The result of correlation analysis revealed that the abundance of each superfamily in insect genomes significantly correlated with the numbers of its families and copies (Supplementary Figure S2A and S2B), and have no significant correlation to its length (Supplementary Figure S2C). In general, the numbers of families and copies of a transposon superfamily are affected by their transposition activities, removal rate or host TE regulation and so on. Whether TC1-Mariner in insect genomes has higher transposition activity is to be experimentally verified in the future. If this is case, TC1-Mariner could be exploited as a good vector in insect transgenic technology.

MITE abundance in 98 insect genomes

To estimate the abundance of MITEs in each insect genome, the consensus sequence of each MITE family was used as query in a BLASTN search against corresponding genome. The results show that the abundance of MITEs varies greatly among the different genomes (Figure 2A). For instance, MITEs constitute ∼84.60 Mb (occupied ∼6.4% of the genome sequence) in the Aedes aegypti genome. But Anopheles darlingii harbors only ∼0.04 Mb (∼0.03% of genome) MITE sequences in its genome. Similarly, there are ∼2.18 Mb MITE sequences in the Onthophagus taurus genome, whereas only ∼0.9 Mb in the Agrilus planipennis genome. In addition, the numbers of MITE superfamilies and families also vary markedly among the different genomes (Figure 2B). For example, 80 families with 7 known superfamilies were detected in the Megachile rotundata genome, whereas only one family was identified in the Apis florea genome.

Figure 2.

Distribution and abundance of MITEs in 98 insect genomes. (A) Amount of nucleotide covered of MITEs in each insect genome. Same color bars represent the same insect order. Numbers represent MITEs abundance (in megabase) in different insect genomes. (B) Distribution of MITEs superfamily in each insect genome, color boxes indicated presence; numbers within the color box represent the number of families.

Open in new tab Download slide

We performed a correlation analysis between the abundance of MITEs in a species and the corresponding genome size and found a very significant correlation between the two characteristics (r = 0.41, P-value = 2.214e-5), indicating that the abundance of MITEs influences the insect genome size (Figure 3). This is consistent with the observation in plant genomes (16). When compared with other TEs, MITEs are shorter in a given genome. However, MITEs usually have high copy numbers (12). Thus, they may have an important role in the evolution of the genome size.

Figure 3.

Correlation between the abundance of MITEs and genome size. Histogram above the graph (in red) represents distribution of genome size (unit—1000 megabase). Histogram below the graph (in blue) represents the distribution of MITEs abundance (unit—megabase). Correlation analysis was performed using the R program with the Pearson’s method.

Open in new tab Download slide

Construction of insect MITEs database

We constructed a database, called iMITEdb, using the MITE sequences identified in this study and other available MITE information. The iMITEdb contains the following functions: browse, search, BLAST and download (http://gene.cqu.edu.cn/iMITEdb/) (Figure 4). Each MITE family in the database includes species, superfamily name, family name, TIRs, TSDs, copies, length and consensus sequence. This database allows browsing the information of interesting MITEs based on insect species, superfamily and family. Users can also perform individual search based on a MITE name to obtain information about each MITE family. In BLAST searches, users can enter a sequence in FASTA format or load a DNA sequence containing file to perform BLASTN search against all identified MITEs. All MITE sequences available in the database can also be downloaded. This will greatly facilitate the studies of function and evolution of MITEs in insects in future.

Figure 4.

The web interface of iMITEdb. The interfaces had browse, search, blast, download, links and contacts.

Open in new tab Download slide

Acknowledgements

We thank all members of Dai’s group for their laboratory assistance and useful comments on this article, and thank Dr Cédric Feschotte at Department of Human Genetics, University of Utah, USA, for his helpful discussion during the study.

Funding

This work was supported by the National Natural Science Foundation of China (No. 31401106 to M.J.H., No. 31471197 to Z.Z. and No. 31560308 to Z.H.H.); Fundamental and Advanced Research Project of Chongqing Municipality (No. cstc2016jcyjA0258 to M.J.H.); the Hi-Tech Research and Development (863) Program of China (No. 2013AA102507 to F.Y.D.); Fundamental Research Funds for the Central Universities (XDJK2016C009 and SWU115035 to M.J.H.).

Supplementary data

Supplementary data are available at Database Online.

Conflict of interest: None declared.

References

Wessler

S.R.

Varagona

M.J.

(

1985

)

Molecular basis of mutations at the waxy locus of maize: correlation with the fine structure genetic map

Proc. Natl. Acad. Sci. USA

4177

–

4181

Google Scholar

Crossref

WorldCat

Bureau

T.E.

Wessler

S.R.

(

1992

)

Tourist: a large family of small inverted repeat elements frequently associated with maize genes

Plant Cell

1283

–

1249

(

2001

)

Eight novel families of miniature inverted repeat transposable elements in the African malaria mosquito, Anopheles gambiae

Proc. Natl. Acad. Sci. USA

1699

–

1704

Google Scholar

Crossref

WorldCat

Oki

Yano

Okumoto

et al. . (

2008

)

A genome-wide view of miniature inverted-repeat transposable elements (MITEs) in rice, Oryza sativa ssp. japonica

Genes Genet. Syst

321

–

329

Han

M.J.

Shen

Y.H.

Gao

Y.H.

et al. . (

2010

)

Burst expansion, distribution and diversification of MITEs in the silkworm genome

BMC Genomics

520.

Feschotte

Mouchès

(

2000

)

Evidence that a family of miniature inverted-repeat transposable elements (MITEs) from the Arabidopsis thaliana genome has arisen from a pogo-like DNA transposon

Mol. Biol. Evol

730

–

737

Feschotte

Swamy

Wessler

S.R.

(

2003

)

Genome-wide analysis of mariner-like transposable elements in rice reveals complex relationships with stowaway miniature inverted repeat transposable elements (MITEs)

Genetics

163

747

–

758

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Feschotte

Jiang

Wessler

S.R.

(

2002

)

Plant transposable elements: where genetics meets genomics

Nat. Rev. Genet

329

–

341

Yang

Nagel

D.H.

Feschotte

et al. . (

2009

)

Tuned for transposition: molecular determinants underlying the hyperactivity of a Stowaway MITE

Science

325

1391

–

1394

Bureau

T.E.

Wessler

S.R.

(

1994

)

Stowaway: a new family of inverted repeat elements associated with the genes of both monocotyledonous and dicotyledonous plants

Plant Cell

907

–

916

Smit

A.F.

Riggs

A.D.

(

1996

)

Tiggers and DNA transposon fossils in the human genome

Proc. Natl. Acad. Sci. USA

1443

–

1448

Google Scholar

Crossref

WorldCat

Jiang

Bao

Zhang

et al. . (

2003

)

An active DNA transposon family in rice

Nature

421

163

–

167

Wessler

S.R.

Bureau

T.E.

White

S.E.

(

1995

)

LTR-retrotransposons and MITEs: important players in the evolution of plant genomes

Curr. Opin. Genet. Dev

814

–

821

Naito

Zhang

Tsukiyama

et al. . (

2009

)

Unexpected consequences of a sudden and massive transposon amplification on rice gene expression

Nature

461

1130

–

1134

Chen

Zhang

et al. . (

2012

)

Miniature inverted-221 repeat transposable elements (MITEs) in rice were originated and amplified predominantly after the divergence of Oryza and Brachypodium and contributed considerable diversity to the species

Mob. Genet. Elements

127

–

132

Chen

Zhang

et al. . (

2014

)

P-MITE: a database for plant miniature inverted-repeat transposable elements

Nucleic Acids Res

D1176

–

D1181

Yang

Lee

Y.H.

Jiang

et al. . (

2005

)

A two-edged role for the transposable element Kiddo in the rice ubiquitin2 promoter

Plant Cell

1559

–

1568

Hollister

J.D.

Gaut

B.S.

(

2009

)

Epigenetic silencing of transposable elements: a trade-off between reduced transposition and deleterious effects on neighboring gene expression

Genome Res

1419

–

1428

Kuang

Padmanabhan

et al. . (

2009

)

Identification of miniature inverted-repeat transposable elements (MITEs) and biogenesis of their siRNAs in the Solanaceae: new functional implications for MITEs

Genome Res

–

Chen

Zhang

et al. . (

2012

)

Miniature inverted-repeat transposable elements (MITEs) have been accumulated through amplification bursts and play important roles in gene expression and species diversity in Oryza sativa

Mol. Biol. Evol

1005

–

1017

Amundsen

Rotter

et al. . (

2011

)

Miniature inverted-repeat transposable element identification and genetic marker development in Agrostis

Crop Sci

854

–

861

Google Scholar

Crossref

WorldCat

Hancock

C.N.

Zhang

Floyd

et al. . (

2011

)

The rice miniature inverted repeat transposable element mPing is an effective insertional mutagen in soybean

Plant Physiol

157

552

–

562

Yaakov

Ceylan

Domb

et al. . (

2012

)

Marker utility of miniature inverted-repeat transposable elements for wheat biodiversity and evolution

Theor. Appl. Genet

124

1365

–

1373

Fattash

Bhardwaj

Hui

et al. . (

2013

)

A rice stowaway MITE for gene transfer in yeast

PLoS One

e64135.

Yang

Hall

T.C.

(

2003

)

MAK, a computational tool kit for automated MITE analysis

Nucleic Acids Res

3659

–

3665

Chen

Zhou

et al. . (

2009

)

MUST: a system for identification of miniature inverted-repeat transposable elements and applications to Anabaena variabilis and Haloquadratum walsbyi

Gene

436

–

Han

Wessler

S.R.

(

2010

)

MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences

Nucleic Acids Res

e199.

Yang

(

2013

)

MITE Digger, an efficient and accurate algorithm for genome wide discovery of miniature inverted repeat transposable elements

BMC Bioinformatics

186.

Nene

Wortman

J.R.

Lawson

et al. . (

2007

)

Genome sequence of Aedes aegypti, a major arbovirus vector

Science

316

1718

–

1723

Edgar

R.C.

(

2004

)

MUSCLE: multiple sequence alignment with high accuracy and high throughput

Nucleic Acids Res

1792

–

1797

Shao

(

2001

)

Expanding the diversity of the IS630-Tc1-mariner superfamily: discovery of a unique DD37E transposon and reclassification of the DD37D and DD39D transposons

Genetics

159

1103

–

1115

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Yuan

Y.W.

Wessler

S.R.

(

2011

)

The catalytic domain of all eukaryotic cut-and-paste transposase superfamilies

Proc. Natl. Acad. Sci. USA

108

7884

–

7889

Google Scholar

Crossref

WorldCat

Author notes

†

These authors contributed equally to this work.

Citation details: Han,M.J., Zhou,Q.Z., Zhang,H.H. et al. iMITEdb: the genome-wide landscape of miniature inverted-repeat transposable elements in insects. Database (2016) Vol. 2016: article ID baw148; doi:10.1093/database/baw148

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Download all slides

Month:	Total Views:
January 2017	17
February 2017	16
March 2017	11
April 2017	10
May 2017	9
June 2017	3
July 2017	5
August 2017	7
September 2017	27
October 2017	11
November 2017	14
December 2017	17
January 2018	17
February 2018	16
March 2018	13
April 2018	15
May 2018	12
June 2018	6
July 2018	18
August 2018	33
September 2018	6
October 2018	8
November 2018	24
December 2018	5
January 2019	14
February 2019	35
March 2019	40
April 2019	51
May 2019	33
June 2019	11
July 2019	17
August 2019	8
September 2019	17
October 2019	117
November 2019	31
December 2019	65
January 2020	51
February 2020	27
March 2020	11
April 2020	28
May 2020	12
June 2020	30
July 2020	17
August 2020	6
September 2020	11
October 2020	60
November 2020	26
December 2020	14
January 2021	6
February 2021	5
March 2021	19
April 2021	22
May 2021	27
June 2021	24
July 2021	27
August 2021	16
September 2021	17
October 2021	26
November 2021	22
December 2021	14
January 2022	10
February 2022	9
March 2022	27
April 2022	18
May 2022	8
June 2022	10
July 2022	8
August 2022	13
September 2022	13
October 2022	11
November 2022	15
December 2022	6
January 2023	10
February 2023	3
March 2023	8
April 2023	7
May 2023	6
June 2023	7
July 2023	9
August 2023	21
September 2023	21
October 2023	17
November 2023	16
December 2023	29
January 2024	24
February 2024	31
March 2024	10
April 2024	12
May 2024	8
June 2024	11
July 2024	13
August 2024	5
September 2024	9
October 2024	16
November 2024	10
December 2024	2
January 2025	18
February 2025	4
March 2025	6
April 2025	8
May 2025	6
June 2025	8
July 2025	16
August 2025	14
September 2025	14
October 2025	7
November 2025	8
December 2025	11
January 2026	6

Article Contents

iMITEdb: the genome-wide landscape of miniature inverted-repeat transposable elements in insects

Introduction

Materials and methods

Data sources used in this study

Identification, classification and characterization of insect MITEs

Construction of insect MITEs database

Results and discussion

Identification, classification and abundance of MITEs in 98 insect genomes

MITE abundance in 98 insect genomes

Construction of insect MITEs database

Acknowledgements

Funding

Supplementary data

References

Author notes

Citations

Views

Altmetric

Citing articles via

Latest

Most Read

Most Cited

Article Contents

iMITEdb: the genome-wide landscape of miniature inverted-repeat transposable elements in insects Open Access

Introduction

Materials and methods

Data sources used in this study

Identification, classification and characterization of insect MITEs

Construction of insect MITEs database

Results and discussion

Identification, classification and abundance of MITEs in 98 insect genomes

MITE abundance in 98 insect genomes

Construction of insect MITEs database

Acknowledgements

Funding

Supplementary data

References

Author notes

Citations

Views

Altmetric

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only

Gift article access

Gift article access

Gift article access

Gift article access

iMITEdb: the genome-wide landscape of miniature inverted-repeat transposable elements in insects