Abstract

Carrot ( Daucus carota L.) is an economically important vegetable worldwide and is the largest source of carotenoids and provitamin A in the human diet. Given the importance of this vegetable to humans, research and breeding communities on carrot should obtain useful genomic and transcriptomic information. The first whole-genome sequences of ‘DC-27’ carrot were de novo assembled and analyzed. Transcriptomic sequences of 14 carrot genotypes were downloaded from the Sequence Read Archive (SRA) database of National Center for Biotechnology Information (NCBI) and mapped to the whole-genome sequence before assembly. Based on these data sets, the first Web-based genomic and transcriptomic database for D. carota (CarrotDB) was developed (database homepage: http://apiaceae.njau.edu.cn/car rotdb ). CarrotDB offers the tools of Genome Map and Basic Local Alignment Search Tool. Using these tools, users can search certain target genes and simple sequence repeats along with designed primers of ‘DC-27’. Assembled transcriptomic sequences along with fragments per kilobase of transcript sequence per millions base pairs sequenced information (FPKM) information of 14 carrot genotypes are also provided. Users can download de novo assembled whole-genome sequences, putative gene sequences and putative protein sequences of ‘DC-27’. Users can also download transcriptome sequence assemblies of 14 carrot genotypes along with their FPKM information. A total of 2826 transcription factor (TF) genes classified into 57 families were identified in the entire genome sequences. These TF genes were embedded in CarrotDB as an interface. The ‘GERMPLASM’ part of CarrotDB also offers taproot photos of 45 carrot genotypes and a table containing accession numbers, names, countries of origin and colors of cortex, phloem and xylem parts of taproots corresponding to each carrot genotype. CarrotDB will be continuously updated with new information.

Database URL:http://apiaceae.njau.edu.cn/carrotdb/

Introduction

Carrot ( Daucus carota L.) species belong to the family Apiaceae, which comprises up to 3700 species across 450 genera worldwide ( 1 , 2 ). Carrot is among the top 10 vegetable crop members in terms of production area and market value with >20 million tons of production a year worldwide according to data from the Food and Agriculture Organization of the United Nations ( http://fao stat3.fao.org ). Most cultivated carrots are functional diploids (2n = 2× = 18) with a genome size of ∼480 Mb ( 3 ). The modern cultivated carrot genus ( D. carota spp. sativus ) is genetically diverse and is classified into two groups, namely, carotene ( D. carota ssp. sativus var. sativus ) and anthocyanin groups ( D. carota ssp. sativus var. atrorubens Alef.) ( 4 ). Carotene carrot cultivars are an important source of carotenoids and provitamin A and account for the majority of the species of carrot. Carotene carrot cultivars have been cultivated as root crops for ∼1100 years, whereas anthocyanin group carrots have a history dating back 3000 years ( 3 , 5 ). Over the centuries, carrot transformed from being a small, tough and bitter root to a fleshy, sweet, pigmented, unbranched and edible vegetable.

In recent years, with increasing number of genomic and genetic studies on carrot, some carrot data sets have become publicly available. De novo assembly of the inbred carrot line ‘B493B’ mitochondrial genome has been reported ( 6 ). This assembly provided new information on intercompartmental interactions and cell evolution in this species. Transcriptomic sequences of three carrot lines have also been de novo assembled ( 3 ), and these sequences revealed new information, such as novel genes and markers. RNA-seq raw data sets of 11 carrot genotypes have been uploaded to the NCBI SRA database ( http://www.ncbi.nlm.nih.gov/sra/ ). Many expressed sequence tags (ESTs) are also available on NCBI. These data are useful for genomic and transcriptomic analysis. However, no database for carrot is currently available despite that carrot is among the top 10 important vegetable crops worldwide. Given the importance of carrot to humans, a carrot genomic and transcriptomic database should be available for researchers to explore important agronomic characteristics of carrots, such as contents of β-carotene, anthocyanins or sugar; resistance to disease; and taproot tolerance to abiotic stress. Based on the completion of the whole-genome sequence of an inbred line ‘DC-27’ ( D. carota ssp. sativus L.) in our laboratory using next-generation sequencing, a genomic and transcriptomic database for D. carota (CarrotDB) has been constructed for the use of the global scientific community. CarrotDB is now publicly available ( http://apiaceae.njau.edu.cn/carrotdb ).

In this Web site, the de novo assembled draft genome sequence of ‘DC-27’ is available. Users can also search the transcriptomic sequences of 14 carrot genotypes and their expected number of fragments per kilobase of transcript sequence per millions base pairs sequenced information (FPKM). The transcriptome sequences have been assembled by mapping reads to whole-genome sequences. Based on these resources, easy-to-use interfaces were also developed to help researchers find sequences of the scaffolds, putative genes or gene fragments, simple sequence repeat (SSR) markers, expressed genes in the transcriptome and FPKM information of expressed genes by keyword search or homology search. Researchers can check gene annotation and submit information to the GENOME MAP group. This database will be useful for scientists worldwide who are working on carrot genetics.

Data resources

The data sets of information in the current CarrotDB version were based on the analysis of the genome sequence of ‘DC-27’ (an F 8 inbred line of a commercial variety ‘Kuroda’ from our laboratory) and transcriptome sequences of 14 carrot genotypes. The carrot materials were obtained from Lab of Apiaceae Plant Genetics and Germplasm Enhancement, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University. We provided the whole draft genome sequences, nucleotide sequences of putative genes and amino acid sequences of putative proteins. We also provided SSRs and designed primers of ‘DC-27’ carrot. We assembled transcriptomic sequences along with FPKM information of 14 carrot genotypes in CarrotDB.

Genome sequencing and de novo assembly

DNA was extracted from ‘DC-27’ carrot and sequenced using Roche 454GS FLX + Titanium sequencer and Illumina HiSeq 2000 (CA, USA). We obtained ∼14.04 Gb of raw data. The 4.7 million shotgun reads and 120.0 million raw reads reached up to 32× coverage of the genome sequence, and these reads were generated in Roche 454 sequencing and HiSeq 2000 sequencing, respectively. SOAPdenovo software ( http://soap.genomics.org.cn/soap denovo.html ) ( 7 ) and Newbler Assembler software ( http://swes.cals.arizona.edu/maier_lab/kartchner/documentation/index.php/home/docs/newbler ) were used for assembly. After assembly, merging with Minimus2 software ( http://sourceforge.net/apps/mediawiki/amos/index.php?title =Mi ni mus2 ) was performed. An assembly of 371.6 Mb was produced. Details of assembly procedure are shown in Figure 1 .

Figure 1.

Workflow for processing and analyzing of genomic sequences of ‘DC-27’ and transcriptomic sequences specific to 14 carrot genotypes.

Gene prediction, annotation and similarity analysis

Using Augustus software, 78 935 putative genes or gene fragments were predicted ( 8 ). The total, average and maximum lengths were 65.8 Mb, 833 bp and 8217 bp, respectively. These putative genes were annotated using Blast2GO with Gene Ontology and Eukaryotic Clusters of Orthologous Groups databases ( 9–11 ).

Identification of transcription factor genes

Transcription factors (TFs) play a vital role in regulating gene expression and plant growth development. We used HMMER (v3.1b1; cutoff E -value:1e-5) to predict TFs ( 12 ). A total of 2826 TF genes were identified in the whole-genome sequences. These putative TFs genes were classified into 57 families. MYB gene family was the largest among all these predicted TF families, followed by ABI3-VP1 and AP2-EREBP gene families. MYB gene family contained 386 gene members.

Development of genome-wide SSR markers

SSR markers are among the most important molecular markers in plants and have been widely used in carrots ( 3 , 13–17 ). Based on the draft genome sequence of ‘DC-27’, SSRs were identified using MIcroSAtellite (MISA) identification tool ( http://pgrc.ipk-gatersle ben.de/misa/ ). A certain minimum number of units were 10, 6, 5, 5, 5 or 5 for mono-, di-, tri-, tetra-, penta- or hexanucleotide repeats, respectively. A total of 65 942 SSRs were found in the whole-genome sequence of ‘DC-27’, including 48 398, 15 172, 2316, 4, 8 and 44 for mono-, di-, tri-, tetra-, penta- and hexanucleotide repeats, respectively ( Table 1 ). To develop SSR markers, 57 519 pairs of primers specific to SSRs were designed using Primer3 software ( 18 ).

Table 1.

Identified SSRs in whole genome sequence of ‘DC-27’

Repeat type of SSRMinimum unitsQuantity
Mononucleotide repeats1048 398
Dinucleotide repeats615 172
Trinucleotide repeats52316
Tetranucleotide repeats54
Pentanucleotide repeats58
Hexanucleotide repeats544
Total65 942
Repeat type of SSRMinimum unitsQuantity
Mononucleotide repeats1048 398
Dinucleotide repeats615 172
Trinucleotide repeats52316
Tetranucleotide repeats54
Pentanucleotide repeats58
Hexanucleotide repeats544
Total65 942
Table 1.

Identified SSRs in whole genome sequence of ‘DC-27’

Repeat type of SSRMinimum unitsQuantity
Mononucleotide repeats1048 398
Dinucleotide repeats615 172
Trinucleotide repeats52316
Tetranucleotide repeats54
Pentanucleotide repeats58
Hexanucleotide repeats544
Total65 942
Repeat type of SSRMinimum unitsQuantity
Mononucleotide repeats1048 398
Dinucleotide repeats615 172
Trinucleotide repeats52316
Tetranucleotide repeats54
Pentanucleotide repeats58
Hexanucleotide repeats544
Total65 942

Assembly of transcriptomic sequences of 14 carrot genotypes

The RNA-seq raw data of 14 carrot genotypes were downloaded from the NCBI SRA database. The variety names of carrots and accession numbers of data sets specific to the carrot genotypes are listed in Table 2 . The raw read sets of ‘B493xQAL’, ‘B6274’ and ‘B7262’ have been de novo assembled previously ( 3 ), whereas the raw read sets of other 11 carrot genotypes have not been assembled. Based on the whole-genome information of ‘DC-27’, we assembled or reassembled these read sets to obtain more accurate transcriptome assemblies. NGS QC Toolkit (v2.3.2) was used to remove reads with 10% or more low-quality bases (Phred score <20) ( 19 ). High-quality reads of 14 carrot genotypes were then mapped to the genome sequences of ‘DC-27’ with TopHat (v2.0.10) and Bowtie2 (2.1.0), separately, followed by assembly with cufflinks ( 20 , 21 ). The nine resulting assemblies corresponding to ‘B493xQAL’, ‘B6274’ and ‘B7262’ were merged to three assemblies according to genotype. A total of 14 assemblies within each genotype containing thousands of contigs were generated. The quantities of scaffolds in each assembly are listed in Table 2 . Resulting scaffolds in each genotype transcriptome assembly were separately used for Basic Local Alignment Search Tool (BLAST) searches and annotation against nucleotide (NT), non-redundant (NR), Swiss-Prot and Cluster of Orthologous Groups of proteins (COG) databases using BLAST software ( 22 ). A total of 283 469 contigs, which were mapped to the genome sequence, were produced for 14 carrot genotypes with the maximum contig length of 26 633 bp for ‘B493xQAL’. The FPKM information of each contig was also generated in the assemblies. To generate non-redundant contig assembly, the resulting 14 assemblies within each genotype were merged to one assembly by using Cufflinks (v2.1.1) ( 23 ). A total of 61 986 contigs were generated with the maximum contig length of 26 726 bp. Details of assembly procedure are shown in Figure 1 .

Table 2.

Process and analysis of transcriptomes of 14 carrot genotypes

Genotype of carrotAccessionReadsBases (bp) HQ reads aHQ bases (bp)Number of contigsBases (bp)
B493xQALSRR18775571033257.17E+0813987081412695086198697732153
SRR187756130889221.6E+0991858571120674554
SRR187757106851046.52E+089966187607937407
SRR18775875332187.61E+081538700155408700
B6274SRR187759113100041.38E+09819736510000785304849990887030
SRR18776098835806.03E+089228881562961741
SRR18776178992117.98E+0877616278392362
B7262SRR187762136432811.66E+0993842511448786224622985503687
SRR187763119180247.27E+0811065168674975248
Wild carrots from Lachish, IsraelERR1859296768111.02E+085947718921565022641561741
ChantenayERR18593054845588.23E+0845385596807838503091834748896
FlakkeeERR18593159869798.98E+0849375237406284501393012793328
Wild carrots from Meijendel, NetherlandsERR185932686691030035061360920400015960288
BerlikumerERR18593312357841.85E+08104664515699675036292787541
Amsterdamse BakERR18593418870612.83E+08157085323562795036732438495
Wild carrots from Esposende, PortugalERR18593558427748.76E+0849563017434451501625117443806
ParijseERR18593678433371.18E+0966096719914506502183025778255
Wild carrots from Schermer Polder, NetherlandsERR18593783533981.25E+09736381611045724002171220885790
Wild carrots from Trencin, SlovakiaERR18593841762066.26E+0836427975464195501530515209873
NantesERR18593952310597.85E+0843082166462324001453515628203
Genotype of carrotAccessionReadsBases (bp) HQ reads aHQ bases (bp)Number of contigsBases (bp)
B493xQALSRR18775571033257.17E+0813987081412695086198697732153
SRR187756130889221.6E+0991858571120674554
SRR187757106851046.52E+089966187607937407
SRR18775875332187.61E+081538700155408700
B6274SRR187759113100041.38E+09819736510000785304849990887030
SRR18776098835806.03E+089228881562961741
SRR18776178992117.98E+0877616278392362
B7262SRR187762136432811.66E+0993842511448786224622985503687
SRR187763119180247.27E+0811065168674975248
Wild carrots from Lachish, IsraelERR1859296768111.02E+085947718921565022641561741
ChantenayERR18593054845588.23E+0845385596807838503091834748896
FlakkeeERR18593159869798.98E+0849375237406284501393012793328
Wild carrots from Meijendel, NetherlandsERR185932686691030035061360920400015960288
BerlikumerERR18593312357841.85E+08104664515699675036292787541
Amsterdamse BakERR18593418870612.83E+08157085323562795036732438495
Wild carrots from Esposende, PortugalERR18593558427748.76E+0849563017434451501625117443806
ParijseERR18593678433371.18E+0966096719914506502183025778255
Wild carrots from Schermer Polder, NetherlandsERR18593783533981.25E+09736381611045724002171220885790
Wild carrots from Trencin, SlovakiaERR18593841762066.26E+0836427975464195501530515209873
NantesERR18593952310597.85E+0843082166462324001453515628203

a High-quality (HQ) reads and bases were obtained after removing reads with 10% or more low-quality bases (PHRED score <20).

Table 2.

Process and analysis of transcriptomes of 14 carrot genotypes

Genotype of carrotAccessionReadsBases (bp) HQ reads aHQ bases (bp)Number of contigsBases (bp)
B493xQALSRR18775571033257.17E+0813987081412695086198697732153
SRR187756130889221.6E+0991858571120674554
SRR187757106851046.52E+089966187607937407
SRR18775875332187.61E+081538700155408700
B6274SRR187759113100041.38E+09819736510000785304849990887030
SRR18776098835806.03E+089228881562961741
SRR18776178992117.98E+0877616278392362
B7262SRR187762136432811.66E+0993842511448786224622985503687
SRR187763119180247.27E+0811065168674975248
Wild carrots from Lachish, IsraelERR1859296768111.02E+085947718921565022641561741
ChantenayERR18593054845588.23E+0845385596807838503091834748896
FlakkeeERR18593159869798.98E+0849375237406284501393012793328
Wild carrots from Meijendel, NetherlandsERR185932686691030035061360920400015960288
BerlikumerERR18593312357841.85E+08104664515699675036292787541
Amsterdamse BakERR18593418870612.83E+08157085323562795036732438495
Wild carrots from Esposende, PortugalERR18593558427748.76E+0849563017434451501625117443806
ParijseERR18593678433371.18E+0966096719914506502183025778255
Wild carrots from Schermer Polder, NetherlandsERR18593783533981.25E+09736381611045724002171220885790
Wild carrots from Trencin, SlovakiaERR18593841762066.26E+0836427975464195501530515209873
NantesERR18593952310597.85E+0843082166462324001453515628203
Genotype of carrotAccessionReadsBases (bp) HQ reads aHQ bases (bp)Number of contigsBases (bp)
B493xQALSRR18775571033257.17E+0813987081412695086198697732153
SRR187756130889221.6E+0991858571120674554
SRR187757106851046.52E+089966187607937407
SRR18775875332187.61E+081538700155408700
B6274SRR187759113100041.38E+09819736510000785304849990887030
SRR18776098835806.03E+089228881562961741
SRR18776178992117.98E+0877616278392362
B7262SRR187762136432811.66E+0993842511448786224622985503687
SRR187763119180247.27E+0811065168674975248
Wild carrots from Lachish, IsraelERR1859296768111.02E+085947718921565022641561741
ChantenayERR18593054845588.23E+0845385596807838503091834748896
FlakkeeERR18593159869798.98E+0849375237406284501393012793328
Wild carrots from Meijendel, NetherlandsERR185932686691030035061360920400015960288
BerlikumerERR18593312357841.85E+08104664515699675036292787541
Amsterdamse BakERR18593418870612.83E+08157085323562795036732438495
Wild carrots from Esposende, PortugalERR18593558427748.76E+0849563017434451501625117443806
ParijseERR18593678433371.18E+0966096719914506502183025778255
Wild carrots from Schermer Polder, NetherlandsERR18593783533981.25E+09736381611045724002171220885790
Wild carrots from Trencin, SlovakiaERR18593841762066.26E+0836427975464195501530515209873
NantesERR18593952310597.85E+0843082166462324001453515628203

a High-quality (HQ) reads and bases were obtained after removing reads with 10% or more low-quality bases (PHRED score <20).

Identification of structural genes involved in anthocyanin biosynthesis from whole-genome and merge transcriptomic sequence assemblies

Anthocyanins are abundant in taproots of purple carrot cultivars, whereas cyanidin-based anthocyanins represent almost all of the anthocyanins in carrots ( 24 ). To detect the quality of transcriptomes assembled in this study, we searched the structural genes involved in cyanidin-based anthocyanin biosynthesis pathway. All the candidate structural genes, including 17 genes in 10 gene families, were involved in the cyanidin-based anthocyanin pathway. These genes were found in the whole draft genome sequence assembly and the merge transcriptome assembly using BLASTN ( Table 3 ). Chalcone–flavonone isomerase (CHI) , flavonoid 3′-monooxygenase (F3’H) and UDP-galactose: flavonoid 3-O-galactosyltransferase ( UF3GaT ) gene sequences were identified in carrot for the first time. These sequences have not been reported in previous de novo assembled transcripts ( 3 ). These results emphasize the accuracy and depth of our genome and transcriptome database.

Table 3.

Results of BLASTN of structural genes involved in anthocyanins biosynthesis in carrot genome and transcriptome assemblies

Gene familyReference sequence sourceGenBank IDScaffold IDs in genome sequence with hits to referencePutative gene IDs in genome sequence with hits to referenceContigs in merged transcriptome sequencewith hits to reference
PALD. carotaD85850.122943300412, 00410
D. carotaAB089813.116648, 16649, 16924317456, 17457, 7615817301, 17302, 59170
D. carotaAB435640.116648, 16649, 16924317456, 17457, 7615817301, 17302, 59170
CA4HAmmi majusAY219918.1257492473324075, 33981
4CLPetroselinum crispumX13324.11380198802081
P. crispumX13325.11380198802081
CHSD. carotaAJ006779.127333, 2140825908, 2133920975, 25160
D. carotaD16255.1139899, 493261298, 622706288, 48099
D. carotaD16256.1139899, 493261298, 622706288, 48099
CHICamellia nitidissimaHQ269805.188761038910363
F3HD. carotaAF184270.1172261795017727
F3’HPetunia x hybridaAF155332.1365313197430143
DFRD. carotaAF184271.11955, 1956, 3579631520, 2745, 315202876, 29801, 29802
D. carotaAF184272.1440463605933353
LDOXD. carotaAF184273.120348, 182749, 20349, 2035020467, 78025, 2047220139, 20140, 20142
D. carotaAF184274.120348, 182749, 20349, 2035020467, 78025, 20470, 2047261071, 20140, 20142
UF3GaTVigna mungoAB009370.1146346, 2123864830, 2120220841, 50490
Gene familyReference sequence sourceGenBank IDScaffold IDs in genome sequence with hits to referencePutative gene IDs in genome sequence with hits to referenceContigs in merged transcriptome sequencewith hits to reference
PALD. carotaD85850.122943300412, 00410
D. carotaAB089813.116648, 16649, 16924317456, 17457, 7615817301, 17302, 59170
D. carotaAB435640.116648, 16649, 16924317456, 17457, 7615817301, 17302, 59170
CA4HAmmi majusAY219918.1257492473324075, 33981
4CLPetroselinum crispumX13324.11380198802081
P. crispumX13325.11380198802081
CHSD. carotaAJ006779.127333, 2140825908, 2133920975, 25160
D. carotaD16255.1139899, 493261298, 622706288, 48099
D. carotaD16256.1139899, 493261298, 622706288, 48099
CHICamellia nitidissimaHQ269805.188761038910363
F3HD. carotaAF184270.1172261795017727
F3’HPetunia x hybridaAF155332.1365313197430143
DFRD. carotaAF184271.11955, 1956, 3579631520, 2745, 315202876, 29801, 29802
D. carotaAF184272.1440463605933353
LDOXD. carotaAF184273.120348, 182749, 20349, 2035020467, 78025, 2047220139, 20140, 20142
D. carotaAF184274.120348, 182749, 20349, 2035020467, 78025, 20470, 2047261071, 20140, 20142
UF3GaTVigna mungoAB009370.1146346, 2123864830, 2120220841, 50490

PAL , Phenylalanine ammonia-lyase; CA4H , Cinnammate 4-hydroxylase; 4CL , 4-coumaroyl-coenzyme A ligase; CHS , Chalcone synthase; CHI , Chalcone–flavanone isomerase; F3H , Flavanone 3-hydroxylase; F3’H’ , Flavonoid 3’-monooxygenase; DFR , Dihydroflavonol 4-reductase; LDOX , Leucoanthocyanidin dioxygenase; UF3GaT , UDP-galactose: flavonoid 3-O-galactosyltransferase.

Table 3.

Results of BLASTN of structural genes involved in anthocyanins biosynthesis in carrot genome and transcriptome assemblies

Gene familyReference sequence sourceGenBank IDScaffold IDs in genome sequence with hits to referencePutative gene IDs in genome sequence with hits to referenceContigs in merged transcriptome sequencewith hits to reference
PALD. carotaD85850.122943300412, 00410
D. carotaAB089813.116648, 16649, 16924317456, 17457, 7615817301, 17302, 59170
D. carotaAB435640.116648, 16649, 16924317456, 17457, 7615817301, 17302, 59170
CA4HAmmi majusAY219918.1257492473324075, 33981
4CLPetroselinum crispumX13324.11380198802081
P. crispumX13325.11380198802081
CHSD. carotaAJ006779.127333, 2140825908, 2133920975, 25160
D. carotaD16255.1139899, 493261298, 622706288, 48099
D. carotaD16256.1139899, 493261298, 622706288, 48099
CHICamellia nitidissimaHQ269805.188761038910363
F3HD. carotaAF184270.1172261795017727
F3’HPetunia x hybridaAF155332.1365313197430143
DFRD. carotaAF184271.11955, 1956, 3579631520, 2745, 315202876, 29801, 29802
D. carotaAF184272.1440463605933353
LDOXD. carotaAF184273.120348, 182749, 20349, 2035020467, 78025, 2047220139, 20140, 20142
D. carotaAF184274.120348, 182749, 20349, 2035020467, 78025, 20470, 2047261071, 20140, 20142
UF3GaTVigna mungoAB009370.1146346, 2123864830, 2120220841, 50490
Gene familyReference sequence sourceGenBank IDScaffold IDs in genome sequence with hits to referencePutative gene IDs in genome sequence with hits to referenceContigs in merged transcriptome sequencewith hits to reference
PALD. carotaD85850.122943300412, 00410
D. carotaAB089813.116648, 16649, 16924317456, 17457, 7615817301, 17302, 59170
D. carotaAB435640.116648, 16649, 16924317456, 17457, 7615817301, 17302, 59170
CA4HAmmi majusAY219918.1257492473324075, 33981
4CLPetroselinum crispumX13324.11380198802081
P. crispumX13325.11380198802081
CHSD. carotaAJ006779.127333, 2140825908, 2133920975, 25160
D. carotaD16255.1139899, 493261298, 622706288, 48099
D. carotaD16256.1139899, 493261298, 622706288, 48099
CHICamellia nitidissimaHQ269805.188761038910363
F3HD. carotaAF184270.1172261795017727
F3’HPetunia x hybridaAF155332.1365313197430143
DFRD. carotaAF184271.11955, 1956, 3579631520, 2745, 315202876, 29801, 29802
D. carotaAF184272.1440463605933353
LDOXD. carotaAF184273.120348, 182749, 20349, 2035020467, 78025, 2047220139, 20140, 20142
D. carotaAF184274.120348, 182749, 20349, 2035020467, 78025, 20470, 2047261071, 20140, 20142
UF3GaTVigna mungoAB009370.1146346, 2123864830, 2120220841, 50490

PAL , Phenylalanine ammonia-lyase; CA4H , Cinnammate 4-hydroxylase; 4CL , 4-coumaroyl-coenzyme A ligase; CHS , Chalcone synthase; CHI , Chalcone–flavanone isomerase; F3H , Flavanone 3-hydroxylase; F3’H’ , Flavonoid 3’-monooxygenase; DFR , Dihydroflavonol 4-reductase; LDOX , Leucoanthocyanidin dioxygenase; UF3GaT , UDP-galactose: flavonoid 3-O-galactosyltransferase.

Database construction

Database system implementation

A database system for integrating the produced carrot draft genome sequence, putative genes sequences, coding sequences, SSRs with specific primers, predicted TFs, transcript sequences and FPKM information was developed. An Apache HTTP server in a Linux (CentOS6.2) operating system was used to develop CarrotDB. We used PHP5, Perl scripts, HTML and JavaScript to build the user-friendly interface and to write the Web pages. Genome visualization was developed using the Generic Genome Browser (GBrowse1.7) package ( 25 ). The CarrotDB system also contains an installed BLAST server. An overview of the database architecture is shown in Figure 2 .

Figure 2.

Overview of CarrotDB architecture.

Genome map

In the genome map section, CarrotDB provides Genome Map tools to visualize the whole draft genome sequence of ‘DC-27’ using GBrowse ( Figure 3 ). All 185 376 de novo assembled scaffolds were used to construct the genome browser. Five icons for each scaffold can be tracked to the detailed view: (i) gene, (ii) mRNA, (iii) coding sequences (CDS), (iv) transcript and (v) SSR. A total of 78 935 putative genes predicted from 185 376 assembled scaffolds of’DC-27’ are displayed here. Users can link to exons, introns and UTR sequences along with the locations of a putative gene on a selected assembled scaffold by clicking the gene icon. Clicking an mRNA icon leads users to the exons and UTR sequences along with the locations of a putative gene on a selected assembled scaffold. The CDS icon directs the user to the exon sequences along with the locations of a putative gene on a selected assembled scaffold. A total of 345 455 contigs from 14 assemblies corresponding to 14 carrot genotypes along with the merged assembles are also provided. Clicking on a transcript icon leads users to the transcript sequence along with the location on a selected assembled scaffold and FPKM information. Given the important uses of SSR markers in carrots, 65 942 SSRs are provided in this block. Clicking the SSR icon provides the link to the SSR sequence and primer sequences specific to SSRs. Users can freely obtain useful information from this section.

Figure 3.

Organizational structure of CarrotDB Web pages.

BLAST

To help users perform sequence alignment, BLAST was embedded in the database using the wwwBlast program (v2.2.26) to provide a graphic user interface through the Web forms ( Figure 3 ) ( 26 ). The whole draft genome sequence, putative gene sequences, protein sequences assemblies of ‘DC-27’, transcriptome sequences of 14 carrot genotypes and merged transcriptome sequences are used for BLAST. Users can perform similarity searches against each type of sequences using various BLAST search forms (BLASTn, BLASTp, BLASTx, tBLASTn and tBLASTx). The nucleotide acid or amino acid sequences in FASTA format can be submitted by entering the data in the frame or uploading a file. Users can set some suitable parameters or simply select the default parameters before performing the search. Clicking the search icon provides the link to the query results interface. Therefore, the query sequences along with their position on the whole-genome sequences of ‘DC-27’ were produced and ordered according to the expected value. This section also provides genome, gene and protein sequence databases of Arabidopsis thaliana for BLAST.

Transcription factor

In plants, TFs control the transcription of downstream genes by binding to specific DNA sequences. For the users’ convenience, 2826 putative genes that have been classified into 57 TF families from the whole-genome sequence assembly of carrot ‘DC-27’ are listed in a table in this section. The quantities of putative TF genes in each family are shown in Figure 3 . Clicking the download icon below the ‘Hmmer outfile’ will lead users to interfaces containing information from an HMMER outfile, such as target gene ID, expected value and HMMER score. Users can link to the interface of nucleotide sequences and deduced amino acid sequences of all the putative TF genes from each TF family by clicking the ‘download’ icons below CDS sequence’ or ‘Protein sequence’.

Germplasm

Many genotypes of D. carota cultivars have been collected and stored in our laboratory. A total of 45 genotypes are displayed in this section (as shown in Figure 3 ). Every genotype of carrot has an accession number. The accession number, names, countries of origin and colors of cortex, phloem and xylem parts of taproots corresponding to each carrot genotype are listed in the table embedded in this section. The taproots of these carrots are in various colors, including orange, red, yellow, purple and white. These carrots also have different taproot shapes. We obtained some photos of the taproots of these carrots and provided the images in this interface.

Download

Besides the BLAST section, the genome and transcriptome information of D. carota are also available ( Figure 3 ) in this section. HTTP links are provided for users who want to freely download the released assembled genome sequences, nucleotide sequences of putative genes, gene annotation, amino acid sequences of putative proteins of ‘DC-27’ and transcriptome sequence assemblies of 14 carrot genotypes along with the merged transcriptome sequence assembly for research.

Discussion

Carrot is among the top 10 economically important vegetable crops worldwide and is the most essential source of provitamin A carotenoids in the human diet. Given the importance of carrots to humans, a database containing the genome and transcriptome information of D. carota , CarrotDB, was developed. CarrotDB provides the whole draft genome sequences, nucleotide sequences of putative genes and amino acid sequences of putative proteins, and SSRs with the designed primers of ‘DC-27’ carrot, as well as assembled transcriptomic sequences with FPKM information of 14 carrot genotypes. We developed an easy-to-use Genome Map and BLAST tool interfaces for these data sets. These interfaces enable users to search and acquire target gene sequences efficiently. All the putative TF families were identified based on the whole draft genome sequence and were embedded in CarrotDB. We also provided photos of the taproots of 45 carrot genotypes. To our knowledge, CarrotDB is the first public genomic and transcriptomic database for carrot and for all members of the Apiaceae family.

The CDS of putative genes from the genome data set of ‘DC-27’ were computationally predicted and not experimentally proven. Whether these putative gene CDS are accurate or can produce alternative splices cannot be determined. Transcriptomes of three carrot genotypes were only de novo assembled in a previous work ( 3 ). Transcriptome sequence data sets are useful to improve accuracy of these putative gene sequences, whereas the whole genome sequence assembly is helpful for assembling transcriptome sequences. Transcriptomes of 14 carrot genotypes were assembled by mapping to the whole genome sequences, producing more accurate and deeper transcripts of carrots. The expression patterns of putative genes can be somehow represented by the FPKM information of transcripts. Researchers may also find gene alternative splicing in the transcripts of the 14 carrot genotypes.

SSR markers have been applied in many genomic research aspects, such as genetic polymorphism detection, genetic relationship determination, genetic map construction and gene localization. In carrots, SSRs have been developed based on EST sequences and are widely applied in genetic relationships and diversity assessment ( 13–16 , 27 ), linkage mapping ( 15 , 17 ) and gene location ( 28 , 29 ). Based on the whole genome sequence, various genotypes of SSRs were identified in intergenic and intragenic regions. These SSRs along with the designed primers will be valuable for further research on carrots.

Future directions

The current database version will be updated with new information. With increasing research on carrots, sequencing data will also emerge significantly. We will continuously collect more transcriptome data for further analysis and store such data in CarrotDB. Single-nucleotide polymorphisms in various genotypes of carrots will also be identified and released on the CarrotDB after we collect more transcriptome sequence data sets. Users are encouraged to send us feedback for further improvement of this database. We believe that CarrotDB will be a useful database for researchers and breeders who are working on carrots.

Funding

The work was supported by the New Century Excellent Talents in University (NCET-11-0670); Jiangsu Natural Science Foundation (BK20130027); National Natural Science Foundation of China (31272175); China Postdoctoral Science Foundation (2014M551609); Priority Academic Program Development of Jiangsu Higher Education Institutions and Jiangsu Shuangchuang Project.

Conflict of interest . None declared.

References

1

Constance
L.
(
1971
)
History of the classification of Umbelliferae.(Apiaceae)
.
Bot. J. Linn. Soc
,
64
,
1
11
.

2

Pimenov
M.G.
Leonov
M.
(
1993
)
The Genera of the Umbelliferae: a Nomenclator
.
Royal Botanic Gardens
,
Kew, London, UK
.

3

Iorizzo
M.
Senalik
D.A.
Grzebelus
D.
et al.  . (
2011
)
De novo assembly and characterization of the carrot transcriptome reveals novel genes, new markers, and genetic diversity
.
BMC Genomics
,
12
,
389
.

4

Pistrick
K.
(
2001
)
Umbelliferae (Apiaceae)
. In
Hanelt
P.
(ed.)
Mansfeld's encyclopedia of agricultural and horticultural crops
.
Berlin Heidelberg New York
,
Springer
, pp.
1259
1267
.

5

Kammerer
D.
Carle
R.
Schieber
A.
(
2003
)
Detection of peonidin and pelargonidin glycosides in black carrots ( Daucus carota ssp. sativus var. atrorubens Alef.) by high-performance liquid chromatography/electrospray ionization mass spectrometry
.
Rapid Commun. Mass Spectrom.
,
17
,
2407
2412
.

6

Iorizzo
M.
Senalik
D.
Szklarczyk
M.
et al.  . (
2012
)
De novo assembly of the carrot mitochondrial genome using next generation sequencing of whole genomic DNA provides first evidence of DNA transfer into an angiosperm plastid genome
.
BMC Plant Biol.
,
12
,
61
.

7

Li
R.
Zhu
H.
Ruan
J.
et al.  . (
2010
)
De novo assembly of human genomes with massively parallel short read sequencing
.
Genome Res.
,
20
,
265
272
.

8

Stanke
M.
et al.  . (
2004
)
AUGUSTUS: a web server for gene finding in eukaryotes
.
Nucleic Acids Res.
,
32
,
W309
W312
.

9

Ashburner
M.
Ball
C.A.
Blake
J.A.
et al.  . (
2000
)
Gene ontology: tool for the unification of biology
.
Nat. Genet.
,
25
,
25
29
.

10

Tatusov
R.L.
Fedorova
N.D.
Jackson
J.D.
et al.  . (
2003
)
The COG database: an updated version includes eukaryotes
.
BMC Bioinformatics
,
4
,
41
.

11

Conesa
A.
Götz
S.
García-Gómez
J.M.
et al.  . (
2005
)
Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research
.
Bioinformatics
,
21
,
3674
3676
.

12

Eddy
S.R.
(
2011
)
Accelerated profile HMM searches
.
PLoS Computat. Biol.
,
7
,
e1002195
.

13

Baranski
R.
Maksylewicz-Kaul
A.
Nothnagel
T.
et al.  . (
2012
)
Genetic diversity of carrot (Daucus carota L.) cultivars revealed by analysis of SSR loci
.
Genet. Res. Crop Evol.
,
59
,
163
170
.

14

ISSR, E.C.D.M.
(
2001
)
A comparative study on the use of ISSR, microsatellites and RAPD markers for varietal identification of carrot genotypes
.
Acta. Hort.
,
546
,
377
385
.

15

Cavagnaro
P.F.
Chung
S.-M.
Manin
S.
et al.  . (
2011
)
Microsatellite isolation and marker development in carrot-genomic distribution, linkage mapping, genetic diversity analysis and marker transferability across Apiaceae
.
BMC Genomics
,
12
,
386
.

16

Jhang
T.
Kaur
M.
Kalia
P.
Sharma
T.
(
2010
)
Efficiency of different marker systems for molecular characterization of subtropical carrot germplasm
.
J. Agric. Sci.
,
148
,
171
181
.

17

Vivek
B.
Simon
P.
(
1999
)
Linkage relationships among molecular markers and storage root traits of carrot (Daucus carota L. ssp. sativus)
.
Theor. Appl. Genet.
,
99
,
58
64
.

18

Rozen
S.
Skaletsky
H.
(
1999
),
Primer3 on the WWW for general users and for biologist programmers
. In
Bioinformatics Methods and Protocols
.
Humana Press
,
Springer
, pp.
365
386
.

19

Patel
R.K.
Jain
M.
(
2012
)
NGS QC Toolkit: a toolkit for quality control of next generation sequencing data
.
PLoS One
,
7
,
e30619
.

20

Trapnell
C.
Pachter
L.
Salzberg
S.L.
(
2009
)
TopHat: discovering splice junctions with RNA-Seq
.
Bioinformatics
,
25
,
1105
1111
.

21

Langmead
B.
Salzberg
S.L.
(
2012
)
Fast gapped-read alignment with Bowtie 2
.
Nat. Methods
,
9
,
357
359
.

22

Camacho
C.
Coulouris
G.
Avagyan
V.
et al.  . (
2009
)
BLAST+: architecture and applications
.
BMC Bioinformatics
,
10
,
421
.

23

Mortazavi
A.
Williams
B.A.
McCue
K.
et al.  . (
2008
)
Mapping and quantifying mammalian transcriptomes by RNA-Seq
.
Nat. Methods
,
5
,
621
628
.

24

Montilla
E.C.
Arzaba
M.R.
Hillebrand
S.
Winterhalter
P.
(
2011
)
Anthocyanin composition of black carrot (Daucus carota ssp. sativus var. atrorubens Alef.) cultivars Antonina, Beta Sweet, Deep Purple, and Purple Haze
.
J. Agric. Food Chem.
,
59
,
3385
3390
.

25

Stein
L.D.
Mungall
C.
Shu
S.
et al.  . (
2002
)
The generic genome browser: a building block for a model organism system database
.
Genome Res.
,
12
,
1599
1610
.

26

Johnson
M.
Zaretskaya
I.
Raytselis
Y.
et al.  . (
2008
)
NCBI BLAST: a better web interface
.
Nucleic Acids Res.
,
36
,
W5
W9
.

27

Maksylewicz
A.
Baranski
R.
(
2013
)
Intra-population genetic diversity of cultivated carrot (Daucus carota L.) assessed by analysis of microsatellite markers
.
Acta Biochimica Polonica
,
60
,
753
760
.

28

GyuDong
O.
EunMi
H.
EunJo
S.
et al.  . (
2010
)
EST profiling for seed-hair characteristic and development of EST-SSR and SNP markers in carrot
.
Korean J. Hortic. Sci. Technol.
,
28
,
1025
1038
.

29

GyuDong
O.
EunJo
S.
SangJin
J.
YoungDoo
P.
(
2013
)
Construction of cDNA library and EST analysis related to seed-hair characteristics in carrot
.
Korean J. Hortic. Sci. Technol.
,
31
,
782
789
.

Author notes

These authors contributed equally to this work.

Citation details: Xu,Z.-S., Tan,H.-W., Wang,F., et al . CarrotDB: a genomic and transcriptomic database for carrot. Database (2014) Vol. 2014: article ID bau096; doi:10.1093/database/bau096

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.