The Comprehensive Phytopathogen Genomics Resource: a web-based resource for data-mining plant pathogen genomes

Table 1.

List of genomes and transcriptome projects within the Comprehensive Phytopathogen Genome Resource Warehouse (number of projects by status)

Taxonomic group	Finished	Draft	In progress	ESTs
Bacteria	34	15	14	0
Fungi	7	13	14	22
Nematodes	0	2	0	14
Oomycetes	0	6	0	6
Virus	623	0	0	0
Viroid	36	0	0	0
Total	700	36	28	42

For each entry in the CPGR Warehouse, we provide the name of the organism, the NCBI Taxonomy identifier (http://www.ncbi.nlm.nih.gov/taxonomy), the organism group (Virus, Viroid, Bacterium, Fungus, Oomycete or Nematode), the disease caused by the organism, the status of the project (Finished, Draft, In progress or EST), the genome size or number of ESTs, GenBank accession numbers (if available), the PubMed accession number (if available) and the Genome Center or laboratory that performed the work (Figure 1). Where available, these metadata are hyperlinked to web pages that give the user additional information. A tool at the top of the CPGR Warehouse page filters the Warehouse contents by taxonomic group or status, and each column in the Warehouse display page can be sorted.
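The filtering and sorting behaviour described above can be illustrated with a minimal Python sketch. This is not the CPGR implementation: the `WarehouseEntry` class, the `filter_entries` and `sort_entries` helpers and the placeholder rows are all hypothetical, and only a subset of the Warehouse columns is modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WarehouseEntry:
    """One Warehouse row (hypothetical model; a subset of the columns only)."""
    organism: str
    taxon_id: int
    group: str   # Virus, Viroid, Bacterium, Fungus, Oomycete or Nematode
    status: str  # Finished, Draft, In progress or EST


def filter_entries(entries: List[WarehouseEntry],
                   group: Optional[str] = None,
                   status: Optional[str] = None) -> List[WarehouseEntry]:
    """Keep rows matching the requested group and/or status (None = no filter)."""
    return [e for e in entries
            if (group is None or e.group == group)
            and (status is None or e.status == status)]


def sort_entries(entries: List[WarehouseEntry], column: str,
                 descending: bool = False) -> List[WarehouseEntry]:
    """Sort rows by any column name, mirroring a sortable table display."""
    return sorted(entries, key=lambda e: getattr(e, column), reverse=descending)


# Placeholder rows for illustration only; these are not real Warehouse records.
entries = [
    WarehouseEntry("Example bacterium B", 2, "Bacterium", "Draft"),
    WarehouseEntry("Example fungus", 3, "Fungus", "EST"),
    WarehouseEntry("Example bacterium A", 1, "Bacterium", "Finished"),
]
bacteria = sort_entries(filter_entries(entries, group="Bacterium"), "organism")
print([e.organism for e in bacteria])
```

Filtering for one group and then sorting by organism name corresponds to the view shown in Figure 1.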

Figure 1.

CPGR Warehouse. Output derived from filtering the CPGR Warehouse for bacterial genome projects. The output has been alphabetized by organism name and only the first four genome projects are listed. The output provides the organism name including strain designation, NCBI Taxon ID (http://www.ncbi.nlm.nih.gov/taxonomy), warehouse group, disease, genome status, genome size, number of ESTs (none in this example as these are genome projects), GenBank accession numbers hyperlinked to GenBank (http://www.ncbi.nlm.nih.gov/genbank/), PubMed accession numbers hyperlinked to PubMed (http://www.ncbi.nlm.nih.gov/pubmed) and the Genome Center or laboratory that completed the work.

Transcript assemblies (TAs)

Complete genome sequences are not available for all organisms, yet large-scale sequence data sets exist in the form of mRNAs and ESTs that sample the genic regions of the genome. Within the CPGR, all publicly available mRNAs and ESTs from eukaryotic plant pathogens are downloaded from GenBank and the NCBI dbEST database (http://www.ncbi.nlm.nih.gov/projects/dbEST/), respectively. The data are clustered and assembled into a set of transcript assemblies (TAs, or contigs) and singleton ESTs using the TGICL package (3). The resulting TAs, along with the unassembled singleton ESTs, are annotated for function through searches against a protein database and loaded into the Plant Pathogen Transcript Assemblies database. Each TA has a unique identifier (e.g. TA279_62688; Figure 2), in which 279 is a unique number within a specific TA build and 62688 is the NCBI taxon identifier for the species (here, Blumeria graminis f. sp. hordei from NCBI Taxonomy). Singleton ESTs are represented by their GenBank accession numbers, e.g. EB530721. The current TA database contains transcripts from 82 different species (Table 2) representing 811 Mb of total sequence.
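As an illustration of the identifier scheme, the following Python sketch splits a TA identifier into its two parts. The function name `parse_ta_id` and the assumption that both parts are plain decimal digits are ours, inferred from the single example above; singleton ESTs, which carry GenBank accessions, are deliberately rejected.

```python
import re
from typing import NamedTuple


class TranscriptAssemblyId(NamedTuple):
    assembly_number: int  # unique within a specific TA build
    taxon_id: int         # NCBI taxon identifier of the species


# Assumed format: "TA", digits, underscore, digits (e.g. TA279_62688).
_TA_PATTERN = re.compile(r"^TA(\d+)_(\d+)$")


def parse_ta_id(identifier: str) -> TranscriptAssemblyId:
    """Split a TA identifier into its build-local number and taxon identifier."""
    match = _TA_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a transcript assembly identifier: {identifier!r}")
    return TranscriptAssemblyId(int(match.group(1)), int(match.group(2)))


print(parse_ta_id("TA279_62688"))
```

The taxon identifier recovered this way could be used to group TAs by species or to look up the organism in NCBI Taxonomy.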

Figure 2.

Example of a CPGR plant pathogen transcript assembly. (A) Summary metrics on Blumeria graminis f. sp. hordei TA279_62688. (B) Annotation of TA279_62688. (C) Putative simple sequence repeats for TA279_62688. (D) Primers predicted with Primer3 for putative SSR 4532. (E) Assembly diagram for TA279_62688.

Table 2.

Transcript assemblies of phytopathogens

Group	No. of species	No. of ESTs and mRNAs	No. of Transcript Assemblies
Fungi	58	1 049 338	401 952
Nematodes	17	162 300	76 681
Oomycetes	7	317 936	126 336
Total	82	1 529 574	604 969

A comprehensive report page is available for each TA and lists the species, taxon identifier, number of component sequences, orientation and length (Figure 2). Functional annotation is provided as the ‘top match’ from a BLAST search of the TA against the UniRef100 protein reference database, including the annotation, percent identity and percent length of the match. The individual sequence accessions used in the assembly, together with their orientation and relative length, are depicted in the assembly diagram and tabulated in the assembly component table. Predicted SSRs and primers that flank each putative SSR provide a further form of annotation. The FASTA-formatted sequence of the TA or singleton is provided to facilitate sequence-based searches by the user.

Genome and gene level tools

For whole bacterial, fungal and oomycete genomes whose sequence and/or annotation data sets are published and publicly available, we have imported the data sets into the CPGR and provide access to the sequence and annotation through a series of interfaces that permit browsing, download and query by the user. Not all genomes in the Warehouse are available as full data sets in the CPGR, because some are still in progress and others are not currently available for redistribution to third-party sites such as the CPGR. For publicly available genomes, the primary method for visualizing a genome is the Genome Browser (Figure 3), open-source genome visualization software made available through the GMOD project (45). Currently, Genome Browser views are available for 74 annotated genomes [60 bacteria (http://cpgr.plantbiology.msu.edu/cgi-bin/gbrowse/bacteria/), 12 fungi (http://cpgr.plantbiology.msu.edu/cgi-bin/gbrowse/fungi/) and 2 oomycetes (http://cpgr.plantbiology.msu.edu/cgi-bin/gbrowse/oomycete/)]. Additional plant pathogenic bacterial, fungal, oomycete and nematode genomes will be added in the near future. The genome browsers can be accessed directly through the Genome Browser Selection tool in the top menu bar, which displays a default genome for each of the phyla. Other genomes within each phylum can be viewed by selecting the GBrowse Selection Tool (http://cpgr.plantbiology.msu.edu/cgi-bin/bact_gateway.pl?page=bact_gbrowse_select; http://cpgr.plantbiology.msu.edu/cgi-bin/fungi_gateway.pl?page=fungi_gbrowse_select; http://cpgr.plantbiology.msu.edu/cgi-bin/oomycete_gateway.pl?page=oomycete_gbrowse_select) either from inside each of the three phylum-level genome browsers or from the top menu bar. The CPGR genome browsers include tracks representing the loci, gene models (fungi and oomycetes only), rRNA and tRNA genes, putative SSRs, GC content and six-frame translation (Figure 3).
The Scroll/Zoom tools allow the user to visualize loci throughout the entire genome at variable resolution. The reports/analysis tools generate decorated FASTA files and high-resolution images of the genome browser. In the CPGR fungal and oomycete genome browsers, the loci are linked to a Genome Browser detail page containing coordinate, function and sequence information. Each locus in the CPGR bacterial genome browsers and each gene model in the fungal and oomycete genome browsers is hyperlinked to the respective CPGR Gene Report Page (Figure 4), which collates an array of metrics and annotation at the locus/gene model level. The CPGR Gene Report Page also provides functional annotation of the gene, with locus name, gene name (if available), Pfam domain matches (46), InterPro database matches (47), orthologous group membership (bacteria only) and gene name assignment. Relatedness to known proteins is shown through sequence similarity search results of the predicted protein sequence against UniRef100 (38). While the genome browser provides a visual entry point to genomes and genes, the Bacterial/Fungal/Oomycete Genome Gene List page (e.g. http://cpgr.plantbiology.msu.edu/cgi-bin/bact_gateway.pl?page=all_gene_list), rRNA Gene List page (e.g. http://cpgr.plantbiology.msu.edu/cgi-bin/bact_gateway.pl?page=rna_gene_list), Pfam Domain Page (e.g. http://cpgr.plantbiology.msu.edu/cgi-bin/bact_gateway.pl?page=pfam_list) and InterPro Domain Page (e.g. http://cpgr.plantbiology.msu.edu/cgi-bin/bact_gateway.pl?page=interpro_list) provide tabular gene sets from each genome.

Figure 3.

Display of bacterial genome through Genome Browser. A region of the Xylella fastidiosa 9a5c chromosome is displayed with tracks representing the loci, tRNA genes, putative Simple Sequence Repeats, GC content and 6-frame translation.

Figure 4.

Gene Report Page for Acidovorax avenae subsp. citrulli AAC00-1 Aave_2450. (A) Hyperlinks to download gene sequence and genome browser display of the locus. (B) Locus name, functional annotation and gene name (if available). (C) Gene attributes including molecule location (chromosome, plasmid), coordinates and protein metrics. (D) Gene structure. (E) Pfam domain matches with scores. (F) InterPro hits including position of matches and E-value. (G) Orthologous groups from β-Proteobacteria clustering. (H) BLASTP search results from an all-versus-all search of bacterial proteins within the CPGR. (I) Partial listing of UniRef100 top matches.

Precompiled whole-genome Mauve (41) alignments (http://cpgr.plantbiology.msu.edu/cpgr_asap_mauve.shtml) of within-genus groups of phytopathogenic bacteria are available for download to assist users interested in comparative genomics. The provided links download a Java applet and the sequence alignment. Mauve provides a graphical view of multiple genomes, with features and sequence, that can be scaled from an overview of the entire genome down to individual nucleotides. Features within the alignment link out to feature pages within ASAP, as well as to GenBank.

For bacterial, fungal and oomycete genomes, a suite of tools is available for searching at the gene level. These include a BLAST search page (e.g. http://cpgr.plantbiology.msu.edu/cpgr_bact_blast.shtml), a Locus ID query page (e.g. http://cpgr.plantbiology.msu.edu/cgi-bin/bact_gateway.pl?page=locus_id_search), a Putative Function search page (e.g. http://cpgr.plantbiology.msu.edu/cgi-bin/bact_gateway.pl?page=pfunc_search), a Pfam domain search page (e.g. http://cpgr.plantbiology.msu.edu/cgi-bin/bact_gateway.pl?page=pfam_search) and an InterPro domain search page (e.g. http://cpgr.plantbiology.msu.edu/cgi-bin/bact_gateway.pl?page=interpro_search). Due to the limited phylogenetic coverage of fungal and oomycete genomes, an orthologous group search page is available only for bacterial genomes within the CPGR (http://cpgr.plantbiology.msu.edu/cgi-bin/bact_gateway.pl?page=orthomcl_search).

Updates to each genome are tracked in the Bacteria, Fungi and Oomycete Genome Information pages. These pages display the current CPGR version of the imported genomes and annotation, the date it was released in the CPGR and the source of the original genome download. Data files for older versions such as database dumps, sequence files and BLAST databases will be maintained on the CPGR FTP server.

rDNA database

rDNA sequences, including the internal transcribed spacer (ITS) region, are widely used to develop molecular diagnostic markers for plant pathogens. To facilitate marker development for all plant pathogens, we created an rDNA database that includes sequences not only from plant pathogen taxa but also from closely related taxa, and that allows stringent filtering of sequences for marker design. The rDNA sequences and associated annotations are downloaded from GenBank and stored in a MySQL relational database. A dedicated search page (http://cpgr.plantbiology.msu.edu/cgi-bin/cpgr_rdna/cpgr_rdna_db.pl) allows users to choose nuclear and/or mitochondrial rDNA loci and the output format, and then to select specific loci by genus and/or species. The current rDNA database (Version 1) contains 131 755 sequences from 17 613 species, of which 65 232 sequences are from 3760 plant pathogenic species. All sequences can be downloaded through the CPGR FTP site (ftp://ftp.plantbiology.msu.edu/pub/data/CPGR/).

Marker identification tools

SSR identification tool

To facilitate rapid discovery and testing of diagnostic SSR markers, we developed an online tool for plant pathologists and diagnosticians to identify candidate SSRs within a query sequence and to design primers for amplification of the SSR using Primer3 (http://primer3.sourceforge.net/) (48). Users can select which type of SSR to predict, specifying both the repeat unit (mononucleotide, dinucleotide, trinucleotide, etc.) and the number of repeats (Figure 5). As described above, SSRs have been annotated in all of the Plant Pathogen Transcript Assemblies and are viewable, along with predicted primers to amplify the SSR, in the Transcript Assembly Report Page (Figure 2). SSRs have also been annotated in all 74 genomes and are viewable through the Genome Browser or through the Bacteria/Fungal/Oomycete Putative SSR list query page (e.g. http://cpgr.plantbiology.msu.edu/cgi-bin/bact_gateway.pl?page=putative_ssrs).
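To make the two user-selectable parameters concrete, the sketch below finds perfect SSRs of a given repeat-unit length and minimum repeat count using a regular expression. It is an illustration under our own assumptions, not the algorithm used by the CPGR tool: the names `find_ssrs` and `SSR` are hypothetical, only uninterrupted repeats of A, C, G and T are detected, and primer design with Primer3 is not included.

```python
import re
from typing import List, NamedTuple


class SSR(NamedTuple):
    motif: str
    repeats: int
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def _is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AA')."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif
                   for k in range(1, n))


def find_ssrs(sequence: str, motif_len: int, min_repeats: int) -> List[SSR]:
    """Find perfect tandem repeats of a motif of the given length."""
    if motif_len < 1 or min_repeats < 2:
        raise ValueError("motif_len must be >= 1 and min_repeats >= 2")
    # One motif copy, followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if not _is_primitive(motif):
            continue  # e.g. skip 'AAAAAA' when searching for dinucleotides
        repeats = len(match.group(0)) // motif_len
        hits.append(SSR(motif, repeats, match.start() + 1, match.end()))
    return hits


for hit in find_ssrs("GGCATATATATATCC", motif_len=2, min_repeats=5):
    print(hit)
```

The reported start and end coordinates could then be passed to a primer design program as the target region to be flanked.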

Figure 5.

Simple Sequence Repeat search tool. (A) Selection options for SSR type and length. (B) Primer selection criteria for putative SSRs. (C) SSR Report page showing SSRs identified, motif, number of motifs and position of start/stop. (D) Predicted primers for putative SSR.

Bacterial unique loci marker search tool

Genome-level sequencing of multiple taxa permits the rapid identification of unique loci that could serve as diagnostic markers. The Unique Loci Candidate Search Tool (http://cpgr.plantbiology.msu.edu/cgi-bin/bact_gateway.pl?page=unique_loci) facilitates mining of bacterial loci that are restricted in their phylogenetic distribution. From the Unique Loci List Query Page (http://cpgr.plantbiology.msu.edu/cgi-bin/bact_gateway.pl?page=unique_loci), the user selects a specific bacterial genome, and a list of unique loci is shown with their gene symbol (if known), putative function, genome coordinates and resident molecule. Each locus is hyperlinked to the Gene Report Page, from which the sequence can be downloaded and primers can be picked using Primer3.
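The idea of a locus with restricted phylogenetic distribution can be sketched as a simple set comparison. The criteria actually used by the CPGR are not described here, so the following is only an assumed model: `hits` is a hypothetical mapping from each locus of the target genome to the set of genomes in which a similar sequence was detected (for example by BLAST), and all locus and genome names are placeholders.

```python
from typing import Dict, List, Set


def unique_loci(target_genome: str, hits: Dict[str, Set[str]]) -> List[str]:
    """Return loci whose similarity hits are confined to the target genome."""
    return sorted(locus for locus, genomes in hits.items()
                  if genomes <= {target_genome})


# Placeholder data for illustration only.
hits = {
    "locus_0001": {"genome_A"},              # found only in the target
    "locus_0002": {"genome_A", "genome_B"},  # shared with another genome
    "locus_0003": set(),                     # no hit recorded at all
}
print(unique_loci("genome_A", hits))
```

A locus with an empty hit set is treated as unique in this sketch; a stricter design might require at least a self-hit before reporting a candidate marker.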

Other tools and resources

BLAST search tool

A primary method to identify sequences is through sequence-based searches such as BLAST (49). BLAST search pages are available for users to search genes, transcripts and genomes within the CPGR. These include dedicated BLAST servers for searching bacterial, fungal and oomycete loci within the CPGR at the nucleotide or protein level (http://cpgr.plantbiology.msu.edu/cpgr_bact_blast.shtml; http://cpgr.plantbiology.msu.edu/cpgr_fungi_blast.shtml; http://cpgr.plantbiology.msu.edu/cpgr_oomycete_blast.shtml). Additional BLAST searches can be performed against the Plant Pathogen Transcript Assemblies (http://cpgr.plantbiology.msu.edu/cpgr_blast.shtml), with taxon selection by the user, and against plant pathogen sequences downloaded from GenBank (http://cpgr.plantbiology.msu.edu/cpgr_pp_genbank_blast.shtml), including ESTs from dbEST, cloned genes/mRNAs/cDNAs, whole genomes and draft genome sequences and assemblies. Search results are typically available in less than 1 min and are also held temporarily at a private URL for 24 h.

FTP site

The Plant Pathogen Transcript Assemblies and the collated GenBank plant pathogen sequences within the CPGR are available through FTP (ftp://ftp.plantbiology.msu.edu/pub/data/CPGR/).

Discussion

The diversity and breadth of organisms that can cause diseases on plants are vast. To facilitate access to the growing body of plant pathogen genome sequence, we have created the CPGR, which serves as a portal to all publicly available plant pathogen genome sequence data and projects. Establishment of the CPGR Warehouse with accompanying metadata provides a broad yet detailed view of the status of plant pathogen genome sequence data. Not only are complete, publicly available data sets available, but planned and in-progress projects are collated. The CPGR enables researchers to quickly assess and obtain the genome sequence for their organisms of interest, obviating the need either to have personal knowledge of the status of genomics initiatives or to search multiple locations for information. In addition to the Warehouse, the CPGR offers display, search and access tools to the genome sequence and annotation of 74 genomes and 82 transcriptomes. Finally, rDNA sequences are provided for 17 613 species to facilitate diagnostic marker development.

Existing web-based databases for plant pathogen genome data differ not only in the number and diversity of genomes they encompass, but also in the types of data analyses supported. A number of databases support comparative genomics analysis of fungal plant pathogens; for example, e-Fungi, COGEME and CFGP all integrate a wide variety of fungal genomes. The Fungal Genome Initiative at the Broad Institute and the Fungal Genomics Program at the DOE Joint Genome Institute include genome sequences of select fungal plant pathogens. The Phytophthora Functional Genomics Database (PFGD) (50) and VMD are databases dedicated to plant pathogenic oomycetes including Phytophthora and Hyaloperonospora. Furthermore, there are a number of databases such as the Candida Genome Database (CGD) (51), FGDB and the Aspergillus Genome Database (AspGD) (52), which include genome sequences and other information from specific plant pathogens. Whereas these pathogen-specific databases do not support comparative analyses across a range of plant pathogen taxa, the IMG system maintained by JGI supports comparative analysis and annotation of a wide variety of microbial genomes in a comprehensive integrated context comparable with CPGR.

Genomics has resulted in fundamental improvements in the breadth and depth of our understanding of plant pathogens. For example, extensive sequencing of oomycete pathogens has revealed classes of effector molecules that modulate the host–parasite interaction (14, 53, 54). Genome-scale microarrays were used in comparative genome hybridizations to determine the core and the variable genes within the Ralstonia solanacearum genome (55). Surprisingly, of the 5074 R. solanacearum genes placed on the array, only 53% were present in all of the strains examined, forming the ‘core genome’, whereas 46% were variable and present in a subset of strains. Sequencing of Fusarium graminearum, the causal agent of Fusarium head blight of wheat and barley, coupled with expression profiling experiments, revealed a set of 408 genes expressed exclusively during infection of barley that, based on single nucleotide polymorphism frequency, were more divergent than other genes in the genome (17).

Whereas these examples show the power of genomics to advance basic research, genomics can and will have a significant role in deciphering pathogen population structure and its relationship to disease, as well as in the development of diagnostic markers for plant pathogens. For example, a number of detection methods rely on DNA-based markers in which one or more targeted loci are amplified from the pathogen using PCR (56, 57) or detected through hybridization (58–62). Typically, these DNA-based markers can be scored in a binary fashion (present/absent), by size polymorphism, or by the kinetics of the amplification reaction (real-time PCR). Perhaps the most challenging aspect of developing a DNA-based marker for diagnostics is identifying unique or distinguishing loci within the target organism that provide high-resolution detection, perhaps at the pathovar or race level. The usefulness of the CPGR as a resource was validated by Lang et al. (63) in the development of highly specific PCR-based diagnostic markers that distinguished Xanthomonas oryzae pv oryzae and Xanthomonas oryzae pv oryzicola, the causal agents of bacterial blight and bacterial leaf streak of rice, respectively. These pathovars, which are on the USDA-APHIS Select Agent list (http://www.aphis.usda.gov/programs/ag_selectagent/ag_bioterr_toxinlist.shtml), cannot easily be differentiated by morphological or physiological characteristics in culture. Using the comparative and computational resources within the CPGR, sets of unique and conserved loci were identified. These lists of candidate markers were then screened in a panel of Xanthomonas strains using PCR to validate the bioinformatics prediction of their phylogenetic distribution. Because genome sequences were available not only from the target species (X. oryzae pv oryzae and X. oryzae pv oryzicola) but also from other species of Xanthomonas, delineation of bona fide markers from the candidate list was straightforward, demonstrating the power of genomics, coupled with bioinformatics, to facilitate diagnostic marker development.

Next-generation sequencing methodologies, in which ultra-high-throughput sequencing capabilities are coupled with highly reduced costs (18–20, 64), enable new research directions because of the paradigm-changing scale of data generation. Data handling and mining will certainly be a large challenge and a bottleneck that needs to be addressed. However, bioinformatics solutions such as Galaxy, an open-source platform for next-generation sequencing computational efforts (65), are emerging to handle and process these data sets. The CPGR is already incorporating data from these methodologies and merging them with data generated from ‘first generation sequencing platforms’. Assembled genomes sequenced with next-generation sequencing technologies can be readily incorporated into the ASAP and CPGR databases. In fact, 17 of the genomes obtained from ASAP were generated using next-generation sequencing technologies and were seamlessly incorporated into the CPGR. Other amenable data sets include RNA-seq data sets (66), in which mRNA is converted into cDNA and sequenced using short-read next-generation platforms. Algorithms are available to perform de novo assemblies (67) of these transcriptomes, which can be readily incorporated into the CPGR Transcript Assemblies database. Whereas the CPGR can currently handle the volume of plant pathogen genomes being deposited in NCBI, the pace at which genomes are being generated, along with the large range in quality of the genomes and transcriptomes generated, will become prohibitive. As a consequence, standards for the quality of the underlying sequence will need to be imposed for inclusion in the CPGR. For example, for new genome assemblies, i.e. those without a quality reference genome, N50 contig sizes need to be large enough to permit reasonably accurate gene prediction.
For transcriptome data, the quality of the underlying reads is critical to successful transcript assembly, and imposing high-quality thresholds on the sequence reads would permit more robust transcript assemblies. As these next-generation sequencing methods improve, quality criteria for reads, assemblies and annotations will stabilize and permit community-defined quality standards for genome projects that can be applied to target genomes for the CPGR.
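For readers unfamiliar with the N50 contig size used as an assembly quality measure, the Python sketch below computes it from a list of contig lengths using the common definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` is ours, and no particular threshold for inclusion in the CPGR is implied.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly length defines the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running sum always reaches half the total")


print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 depends only on contig lengths, it says nothing about base accuracy, so it would complement rather than replace read-level quality thresholds.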

Clearly, there is enormous potential for genomic data to shape biology, including plant pathology, and the CPGR provides a portal for plant pathologists to determine the genome sequence status of their organism of interest, data mine these bacterial and eukaryotic genomes and identify candidate markers for diagnostic marker development (68).

Funding

USDA National Institute for Food and Agriculture (grant nos. 2006-55605-16645 and 2006-55605-04558 to C.R.B., J.E.L. and N.A.T.); the joint CPGR–ASAP work is funded by the USDA National Institute for Food and Agriculture (grant no. 2009-65109-05719 to C.R.B. and N.P.). Funding for open access charge: USDA National Institute for Food and Agriculture grant no. 2009-65109-05719.

Conflict of interest. None declared.

References

1. Kamoun, Hraber, Sobral, et al. Initial assessment of gene diversity for the oomycete pathogen Phytophthora infestans based on expressed sequences. Fungal Genet. Biol. 1999 (pg. 106).

2. Childs, Hamilton, Zhu, et al. The TIGR Plant Transcript Assemblies database. Nucleic Acids Res. 2007 (pg. D846–D851).

3. Pertea, Huang, Liang, et al. TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 2003 (pg. 651–652).

4. Sayers, Barrett, Benson, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2010 (pg. D16).

5. Yin, Chen, Wang, et al. Generation and analysis of expression sequence tags from haustoria of the wheat stripe rust fungus Puccinia striiformis f. sp. tritici. BMC Genomics 2009 (pg. 626).

6. Zhang, Zheng, et al. Stage-specific gene expression during urediniospore germination in Puccinia striiformis f. sp tritici. BMC Genomics 2008 (pg. 203).

7. Brown, Cheung, Proctor, et al. Comparative analysis of 87,000 expressed sequence tags from the fumonisin-producing fungus Fusarium verticillioides. Fungal Genet. Biol. 2005 (pg. 848–861).

8. Keon, Antoniw, Rudd, et al. Analysis of expressed sequence tags from the wheat leaf blotch pathogen Mycosphaerella graminicola (anamorph Septoria tritici). Fungal Genet. Biol. 2005 (pg. 376–389).

9. Neumann, Dobinson. Sequence tag analysis of gene expression during pathogenic growth and microsclerotia development in the vascular wilt pathogen Verticillium dahliae. Fungal Genet. Biol. 2003.

10. Simpson, Reinach, Arruda, et al. The genome sequence of the plant pathogen Xylella fastidiosa. The Xylella fastidiosa Consortium of the Organization for Nucleotide Sequencing and Analysis. Nature 2000, vol. 406 (pg. 151–159).

11. Dean, Talbot, Ebbole, et al. The genome sequence of the rice blast fungus Magnaporthe grisea. Nature 2005, vol. 434 (pg. 980–986).

12. Kamper, Kahmann, Bolker, et al. Insights from the genome of the biotrophic fungal plant pathogen Ustilago maydis. Nature 2006, vol. 444 (pg. 101).

13. Opperman, Bird, Williamson, et al. Sequence and genetic map of Meloidogyne hapla: A compact nematode genome for plant parasitism. Proc. Natl Acad. Sci. USA 2008, vol. 105 (pg. 14802–14807).

14. Tyler, Tripathy, Zhang, et al. Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science 2006, vol. 313 (pg. 1261–1266).

15. Buell, Joardar, Lindeberg, et al. The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. Proc. Natl Acad. Sci. USA 2003, vol. 100 (pg. 10181–10186).

16. Salanoubat, Genin, Artiguenave, et al. Genome sequence of the plant pathogen Ralstonia solanacearum. Nature 2002, vol. 415 (pg. 497–502).

17. Cuomo, Guldener, et al. The Fusarium graminearum genome reveals a link between localized polymorphism and pathogen specialization. Science 2007, vol. 317 (pg. 1400–1402).

18. Shendure. Next-generation DNA sequencing. Nat. Biotechnol. 2008 (pg. 1135–1145).

19. Munroe, Harris. Third-generation sequencing fireworks at Marco Island. Nat. Biotechnol. 2010 (pg. 426–428).

20. Metzker. Sequencing technologies - the next generation. Nat. Rev. Genet. 2010.

21. Duan, Zhou, Hall, et al. Complete genome sequence of citrus huanglongbing bacterium, 'Candidatus Liberibacter asiaticus' obtained through metagenomics. Mol. Plant Microbe Interact. 2009 (pg. 1011–1020).

22. da Silva, Ferro, Reinach, et al. Comparison of the genomes of two Xanthomonas pathogens with differing host specificities. Nature 2002, vol. 417 (pg. 459–463).

23. Tettelin, Riley, Cattuto, et al. Comparative genomics: the bacterial pan-genome. Curr. Opin. Microbiol. 2008 (pg. 472–477).

24. Overbeek, Bartels, Vonstein, et al. Annotation of bacterial and archaeal genomes: improving accuracy and consistency. Chem. Rev. 2007, vol. 107 (pg. 3431–3447).

25. Angiuoli, Dunning Hotopp, Salzberg, et al. Improving pan-genome annotation using whole genome multiple alignment. BMC Bioinformatics 2011 (pg. 272).

26. Otto, Dillon, Degrave, et al. RATT: Rapid Annotation Transfer Tool. Nucleic Acids Res. 2011 (pg. e57).

27. Tripathy, Pandey, Fang, et al. VMD: a community annotation database for oomycetes and microbial genomes. Nucleic Acids Res. 2006 (pg. D379–D381).

28. Guldener, Mannhaupt, Munsterkotter, et al. FGDB: a comprehensive fungal genome resource on the plant pathogen Fusarium graminearum. Nucleic Acids Res. 2006 (pg. D456–D458).

29. Wong, Walter, Lee, et al. FGDB: revisiting the genome annotation of the plant pathogen Fusarium graminearum. Nucleic Acids Res. 2011 (pg. D637–D639).

30. Hedeler, Wong, Cornell, et al. e-Fungi: a data resource for comparative analysis of fungal genomes. BMC Genomics 2007 (pg. 426).

31. Markowitz, Chen, Palaniappan, et al. The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Res. 2010 (pg. D382–D390).

32. Winnenburg, Urban, Beacham, et al. PHI-base update: additions to the pathogen-host interaction database. Nucleic Acids Res. 2008 (pg. D572–D576).

33. Soanes, Skinner, Keon, et al. Genomics of phytopathogenic fungi and the development of bioinformatic resources. Mol. Plant Microbe Interact. 2002 (pg. 421–427).

34. Choi, Park, Kim, et al. Fungal Secretome Database: integrated platform for annotation of fungal secretomes. BMC Genomics 2010 (pg. 105).

35. Wise, Caldo, Hong, et al. BarleyBase/PLEXdb. Methods Mol. Biol. 2007, vol. 406 (pg. 347–363).

36. Mungall, Emmert. A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics 2007 (pg. i337–i346).

37. Glasner, Rusch, Liss, et al. ASAP: a resource for annotating, curating, comparing, and disseminating genomic data. Nucleic Acids Res. 2006 (pg. D41–D45).

38. Suzek, Huang, McGarvey, et al. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 2007 (pg. 1282–1288).

39. Wootton, Federhen. Analysis of compositionally biased regions in sequence databases. Methods Enzymol. 1996, vol. 266 (pg. 554–571).

40. Rozen, Skaletsky. Primer3 on the WWW for general users and for biologist programmers. In: Krawetz, Misener (eds). Bioinformatics Methods and Protocols: Methods in Molecular Biology. 2000. Totowa, NJ: Humana Press (pg. 365–386).

Darling

Mau

Perna

. ,

progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement

PLoS One

2010

, vol.

pg.

e11147

Chen, Mackey, Vermunt, et al. Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS One, 2007, pg. e383.

Stoeckert, Roos. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res., 2003, pg. 2178-2189.

Gish, States. Identification of protein coding regions by database similarity search. Nat. Genet., 1993, pg. 266-272.

Stein, Mungall, Shu, et al. The generic genome browser: a building block for a model organism system database. Genome Res., 2002, pg. 1599-1610.

Finn, Mistry, Schuster-Bockler, et al. Pfam: clans, web tools and services. Nucleic Acids Res., 2006, pg. D247-D251.

Mulder, Apweiler, Attwood, et al. New developments in the InterPro database. Nucleic Acids Res., 2007, pg. D224-D228.

Rozen, Skaletsky. Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol., 2000, vol. 132, pg. 365-386.

Altschul, Madden, Schaffer, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 1997, pg. 3389-3402.

Gajendran, Gonzales, Farmer, et al. Phytophthora functional genomics database (PFGD): functional genomics of phytophthora-plant interactions. Nucleic Acids Res., 2006, pg. D465-D470.

Arnaud, Costanzo, Skrzypek, et al. The Candida Genome Database (CGD), a community resource for Candida albicans gene and protein information. Nucleic Acids Res., 2005, pg. D358-D363.

Arnaud, Chibucos, Costanzo, et al. The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community. Nucleic Acids Res., 2010, pg. D420-D427.

Haas, Kamoun, Zody, et al. Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans. Nature, 2009, vol. 461, pg. 393-398.

Win, Bos, Krasileva, et al. Adaptive evolution has targeted the C-terminal domain of the RXLR effectors of plant pathogenic oomycetes. The Plant Cell, 2007, pg. 2349-2369.

Guidot, Prior, Schoenfeld, et al. Genomic structure and phylogeny of the plant pathogen Ralstonia solanacearum inferred from gene distribution analysis. J. Bacteriol., 2007, vol. 189, pg. 377-387.

Chen, Zhang, Liu, et al. A real-time PCR assay for the quantitative detection of Ralstonia solanacearum in the horticultural soil and plant tissues. J. Microbiol. Biotechnol., 2010, pg. 193-201.

Kubota, Vine, Alvarez, et al. Detection of Ralstonia solanacearum by loop-mediated isothermal amplification. Phytopathology, 2008, pg. 1045-1051.

Fessehaie, De Boer, Levesque. An oligonucleotide array for the identification and differentiation of bacteria pathogenic on potato. Phytopathology, 2003, pg. 262-269.

Aittamaa, Somervuo, Pirhonen, et al. Distinguishing bacterial pathogens of potato using a genome-wide microarray approach. Mol. Plant Pathol., 2008, pg. 705-717.

Agindotan, Perry. Macroarray detection of plant RNA viruses using randomly primed and amplified complementary DNAs from infected plants. Phytopathology, 2007, pg. 119-127.

Robideau, Caruso, Oudemans, et al. Detection of cranberry fruit rot fungi using DNA array hybridization. Can. J. Plant Pathol., 2008, pg. 226-240.

Uehara, Kushida, Momota. Rapid and sensitive identification of Pratylenchus spp. using reverse dot blot hybridization. Nematology, 1999, pg. 549-555.

Lang, Hamilton, Diaz MGQ, et al. Genomics-based diagnostic marker development for Xanthomonas oryzae pv. oryzae and X. oryzae pv. oryzicola. Plant Dis., 2010, pg. 311-319.

Margulies, Egholm, Altman, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature, 2005, vol. 437, pg. 376-380.