Published online 6 November 2013
Nucleic Acids Research, 2014, Vol. 42, Database issue D1229–D1236
doi:10.1093/nar/gkt1064

CottonGen: a genomics, genetics and breeding database for cotton research

Jing Yu1, Sook Jung1, Chun-Huai Cheng1, Stephen P. Ficklin1, Taein Lee1, Ping Zheng1, Don Jones2, Richard G. Percy3 and Dorrie Main1,*

1Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA, 2Cotton Incorporated, Cary, NC 27513, USA and 3Crop Germplasm Research Unit, USDA-ARS-SPARC, College Station, TX 77845, USA

Received August 23, 2013; Revised October 12, 2013; Accepted October 14, 2013

ABSTRACT

CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supersedes CottonDB and the Cotton Marker Database, with enhanced tools for easier sharing, mining, visualization and retrieval of cotton research data. CottonGen contains annotated whole genome sequences, unigenes from expressed sequence tags (ESTs), markers, trait loci, genetic maps, genes, taxonomy, germplasm, publications and communication resources for the cotton community. Annotated whole genome sequences of Gossypium raimondii are available with aligned genetic markers and transcripts. These whole genome data can be accessed through genome pages, search tools and GBrowse, a popular genome browser. Most of the published cotton genetic maps can be viewed and compared using CMap, a comparative map viewer, and are searchable via map search tools. Search tools also exist for markers, quantitative trait loci (QTLs), germplasm, publications and trait evaluation data. CottonGen also provides online analysis tools such as NCBI BLAST and Batch BLAST.

INTRODUCTION

Cotton (Gossypium spp.) is the world’s leading natural textile fibre crop and a significant contributor of oilseed.
Consisting of 50 species with different levels of ploidy, Gossypium has long served as a model for studying fundamental biological questions on genome evolution, plant development, polyploidization and crop productivity (1–5). The application of new sequencing technologies and high-throughput genotyping has improved understanding of diploid and polyploid cotton species and has resulted in a wealth of genetics, genomics and breeding information for cotton over the last two decades. These publicly available resources include 49 genetic maps, 24 000 markers, >1000 quantitative trait loci (QTL) representing >30 agronomically important traits, phenotype data from >15 000 germplasm accessions, >650 000 NCBI sequences derived from 181 DNA libraries, 18 000 genes and gene products, 460 000 expressed sequence tags (ESTs) and expression data in the form of microarrays and RNA-Seq from high-throughput sequencing. More recently, two genome assemblies and annotations of Gossypium raimondii have become available (6,7). The availability of the cotton genome sequence provides a major source of candidate genes with potential for the genetic improvement of cotton quality and productivity. Integrating these whole genome data with other genomic and genetic data in an online database that is easy to query, view and download is essential to maximize the utility of these valuable research data.

Three online databases traditionally hosted much of the available genomic and genetic cotton data prior to 2012. CottonDB (8) was founded in 1995 as part of a national USDA-ARS program to develop plant genome databases for all agricultural commodities. Using a hybrid database system, the genomic, genetic, taxonomic and bibliographic data were stored in an object-oriented AceDB database (9), while the genetic maps and genome sequences were maintained in a MySQL relational database.
Initiated in 2004, the Cotton Marker Database (CMD) (10) was funded by Cotton Incorporated to provide centralized access to all publicly available cotton simple sequence repeat (SSR) markers and accelerate basic and applied research in molecular breeding and genetic mapping. It used a custom MySQL database with search interfaces developed in the Perl programming language. The third database, TropGene Cotton (11), was developed as part of a larger project to manage genetic, molecular and phenotypic data on tropical crop species. It uses a custom MySQL database with search interfaces developed in the Java programming language. The majority of public cotton data from TropGene was shared with CottonDB.

CottonDB, while rich in data, was limited by older technology, which resulted in a relatively unfriendly query interface and made further development difficult. CMD, although more user friendly, was limited primarily to marker data and used a custom database schema that limited the integration of other types of data. CottonGen, therefore, was created to address these limitations by consolidating and expanding cotton data from CottonDB, CMD and TropGene into a new, standards-based, freely accessible scientific database for worldwide cotton research.

*To whom correspondence should be addressed. Tel: +1 509 335 2774; Fax: +1 509 335 8690; Email: email@example.com

The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.

© The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Another feature developed in CottonDB and adopted by CottonGen is the hosting of the website for the International Cotton Genome Initiative (ICGI). ICGI is a non-profit organization created in 2000 to increase knowledge of the structure and function of the cotton genome for the benefit of the global community. It facilitates global communication, collaboration and education; knowledge and resource integration; and technology and resource development; and it coordinates research planning. The CottonGen team agreed to redevelop and host the ICGI website within CottonGen as part of its mission to serve as a centralized resource for the cotton community.

CottonGen is developed using Tripal (12), a toolkit for construction of online genomic and genetic databases. Tripal is based on a community-derived database schema named Chado (13) and uses controlled vocabularies such as the Sequence Ontology (14), Gene Ontology (15) and others to ensure standardization of data storage. Tripal is currently used for several genome databases (16–21). Additionally, Tripal simplifies site development by merging the power of Drupal (http://drupal.org), a popular web Content Management System that allows non-programmers to contribute content, with that of Chado.

Migration of data from CottonDB to CottonGen was initiated on 1 October 2011, and CottonGen was released one year later, superseding CottonDB and CMD with additional data and enhanced functionality. As of 15 August 2013, CottonGen includes (i) the Gossypium raimondii whole genome assemblies and annotation, (ii) annotated unigene for the Gossypium genus, (iii) extensive genetic and QTL maps, markers and traits, (iv) trait evaluation data, (v) enhanced user interfaces including various search tools with downloadable results and (vi) resources to support community activities and to facilitate communication among cotton researchers. Here we describe the data and the functionality in CottonGen.
DATABASE DESCRIPTION

CottonGen Data and Web Interface

CottonGen contains various genetics, genomics and trait evaluation data including annotated whole genome sequences, EST sequences, markers, traits, genetic maps, genes, taxonomy, germplasm and publications. All CottonGen web pages have a common navigation menu for easy access. The navigation menu provides links for general information, data, search, tools, help and community resources for the ICGI. The data section lists major data classes in CottonGen (Table 1), such as gene, genome, germplasm, map, marker, publication, species and trait. Users can view a summary of the data, and various links to access the data. The search section lists various search tools such as for genes, germplasm, markers, QTL, publications and trait evaluation. Each search tool provides options for customization by applying restrictions in the query. From the search result site or the downloads page, users can download the entire data set and/or go to the various data details pages. Major CottonGen data and the web interface to the data are described below.

Genomics data

Whole genome sequence data

CottonGen includes genome assemblies of the first fully sequenced cotton species, Gossypium raimondii, from two independent research teams (6,7). On CottonGen, these assemblies are titled the ‘Gossypium raimondii (D5) genome JGI assembly v2.0 (annot v2.1)’ (6) (referred to hereafter as the JGI version) and the ‘Gossypium raimondii (D5) Draft Genome BGI-CGP v1.0 Assembly & Annotation’ (7) (referred to hereafter as the draft BGI version).
The predicted genes from these assemblies have been further annotated by the CottonGen team to include homology to proteins in other well annotated or closely related species, and in silico annotation of InterPro protein domains, GO terms and Kyoto Encyclopedia of Genes and Genomes database (KEGG) pathway terms, providing information on probable pathways and traits. Additional annotation by the CottonGen team includes the alignment of cotton genetic markers, and cotton transcripts such as CottonGen Unigene version v1, Udall cotton Unigene contigs (22), PlantGDB Cotton Unigene and NCBI Cotton ESTs from all major Gossypium species. Single nucleotide polymorphisms (SNPs) between the diploid genomes of A and D and those between the tetraploid genomes of AT and DT (T represents tetraploid) were also aligned to the JGI version of the G. raimondii reference genome (23,24).

The annotated sequence data can be accessed in CottonGen via the genome page, gene and sequence search tools and GBrowse (25). The genome pages, found under the data navigation menu, contain various downloadable files including the FASTA files of predicted gene transcripts, coding sequences (CDS) and predicted gene peptides. Excel files of protein homologues with cotton genes and other species, including those found in databases such as Swiss-Prot and TrEMBL (26) and NCBI nr (27), are also available with hyperlinks to these databases. Other downloadable files include ESTs and genetic markers in FASTA and Excel format that map to the whole genome sequences and functional annotation files from protein, InterPro and KEGG alignments. In the gene and sequence search tools, whole genome data can be found by filtering by name, GO terms, InterPro domains or KEGG pathway terms (28) (Figure 1). From the alignment page, users can go to GBrowse. Using GBrowse, site visitors can view genomic features aligned to the genome, such as gene models, repeats, SNPs, as well as alignments of ESTs, repeats, genetic markers and genes from other plant model species. Each feature in GBrowse is hyperlinked to a page with sequences and additional information, and hyperlinks to external databases where applicable. The chloroplast genome sequences and annotations of Gossypium hirsutum, Gossypium barbadense, Gossypium arboreum and G. raimondii are also available in GBrowse.

Table 1. Number of CottonGen entries by data type (15 August 2013)

Data type          Number of entries   Details
BLAST              20                  5 peptide data sets, 15 nucleotide data sets (genome sequences, markers, unigenes, ESTs) for BLAST searching.
Genome             2                   Draft BGI v1.0 and JGI annot v2.1 G. raimondii genome projects.
Gene               119 971             1269 cotton genes from NCBI gene (06/12/2013); 40 976 and 77 726 CDS from the BGI v1.0 and JGI annot v2.1 G. raimondii genome projects, respectively, and 21 698 contigs from CottonGen Gossypium Unigene v1.0.
Germplasm          14 959              From 14 collections.
Marker             23 935              19 074 SSRs, 3541 RFLPs, 2146 AFLPs, 1018 SNPs and 310 other types.
Map                49                  34 559 loci.
QTL                988                 Representing 25 traits.
Publication        10 731              Journal articles, conference proceedings, patents, book chapters and theses.
Species            49                  Origin, genome group, germplasm, haploid number, sequences and libraries.
Trait evaluation   73 296              From 6871 accessions.

Annotated EST unigene

CottonGen contains all Gossypium ESTs publicly available from dbEST at NCBI as of 12 September 2012. To reduce inherent redundancy in ESTs and generate a data set representing the genes of cotton, we developed the CottonGen v1.0 unigene.
Routine processing involved sequence filtering for contamination against the NCBI UniVec database and species-specific chloroplast, mitochondrial, tRNA and rRNA sequences using the BLAST algorithm with NCBI UniVec-recommended parameters; trimming of low quality sequence; assembly into contigs using CAP3 (29) with an overlap percentage parameter of 90% (-p 90); and annotation. A total of 437 185 filtered sequences were assembled into 21 698 contigs and 128 218 singletons to make a unigene set of 149 916 sequences. The CottonGen annotation procedure includes comparison of both the filtered ESTs and the EST contig consensus sequences using BLASTX against the SWISS-PROT, TrEMBL, InterPro, TAIR (30) and other well annotated species protein databases. The top 10 matches with an expectation value <1e-6 are recorded for each EST and contig. Results of in silico functional annotations of Gene Ontology (GO) terms and functional classification by pathways from KEGG are also recorded in the database. The 21 698 contigs from the v1.0 unigene can be searched using the gene and sequence search tools by name, InterPro domain, GO term, or KEGG term or gene, and the results can be downloaded as Excel files from the search page. The complete unigene data set and annotations can also be obtained from the downloads page.

Additional sequence annotation includes computational analysis of SSRs found in the unigene contigs using the method described in Jung et al. (16). Of the 21 698 contigs, 24.6% had one or more SSRs, with 493 motifs detected in 6979 SSRs. The results may be obtained from the Downloads page as an Excel file with details for each SSR-containing sequence including motif, motif length, location in the sequence, location relative to the ORF, suggested primers and expected product size.
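To make the kind of SSR report described above concrete, the following is a minimal Python sketch of perfect-repeat detection in a contig sequence. It is not the method of Jung et al. used by CottonGen: the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen purely for illustration, and primer design and ORF-relative location are not attempted.

```python
import re
from typing import Dict, List

# Hypothetical thresholds (motif length -> minimum number of repeats).
# These are illustrative assumptions, not the values used by CottonGen.
MIN_REPEATS: Dict[int, int] = {2: 5, 3: 4, 4: 3, 5: 3}


def find_ssrs(seq: str) -> List[dict]:
    """Return perfect SSRs with motif, motif length, repeat count and
    1-based inclusive start/end positions in the sequence."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # Capture one motif, then require at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves built from a shorter unit,
            # e.g. "AAAA" or "ATAT", which belong to a smaller motif class.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append(
                {
                    "motif": motif,
                    "motif_length": motif_len,
                    "repeats": (match.end() - match.start()) // motif_len,
                    "start": match.start() + 1,
                    "end": match.end(),
                }
            )
    return sorted(hits, key=lambda h: h["start"])
```

Calling `find_ssrs` on each contig consensus sequence and writing the resulting dictionaries to a spreadsheet would give a table with the motif, motif length and location columns mentioned above; the remaining columns would need an ORF predictor and a primer design tool.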
NCBI genes

All Gossypium sequences from the NCBI nucleotide database were downloaded, parsed for gene, mRNA, CDS, 5′ UTR and 3′ UTR features and imported to CottonGen. As with predicted genes from whole genome sequences, genes parsed from NCBI have been further annotated by homology to genes in other species, InterPro protein domains, GO terms and KEGG pathway terms. The distinct gene names in Gossypium are stored separately in the database to build a community-driven gene database for cotton. Each gene, unique in the Gossypium genus, is currently linked to all the NCBI genes from various species and will serve as a base entity to be linked to other associated data such as predicted genes from whole genome sequences, QTL, genetic markers and mutant phenotypes as annotation progresses. All genes and mRNAs that are parsed out from NCBI sequences are searchable in the gene search site.

Map, marker and QTL data

CottonGen provides access to the cotton genetic, QTL and physical (FPC) maps, including the underlying molecular markers, QTL and mapping populations. For sequence-based markers such as SSRs, Amplified Fragment Length Polymorphisms (AFLPs), Sequence Related Amplified Polymorphisms (SRAPs) and cDNA-Restriction Fragment Length Polymorphisms (RFLPs), CottonGen provides details on experimental conditions, such as the primer, amplicon-sequence information and the PCR amplification conditions. CottonGen currently has 49 maps, which cover Gossypium genome groups AD, A, D and G, consisting of approximately 34 000 marker loci and a thousand QTLs. Markers can be browsed and searched using various search interfaces (found under search and then markers in the navigation menu). All markers can be searched by marker source, map information or nearby loci. The advanced marker search interface allows researchers to search by various categories in combination (Figure 2). Researchers can also browse/search only the mapped markers with sequences using various categories. From the search result page, researchers can go to the details pages of markers, maps, sequences, germplasm and species. From the marker details page, relevant data such as marker source, primers, polymorphisms, map information and anchored position in the genome can be accessed.

Figure 1. Gene/Sequence search site in CottonGen. (A) Genes/sequences can be searched using various categories, such as by name, GO terms, InterPro protein domain name or KEGG pathway term. The example shows the InterPro term search site. (B) The search result page has links to the download, gene/sequence detail page and external database. (C) The Gene detail page has various tabs to show the data. The annotated sequence page is highlighted. (D) The alignment tab of the gene detail page shows the position in the whole genome with link to GBrowse. (E) The GBrowse page linked from the alignment tab of the gene detail page. Users can go back to the gene detail page from GBrowse.

CottonGen houses 273 QTLs with associated data such as CottonGen curator-assigned QTL label, published symbol, trait name, alias, population, map position, associated markers and statistical values. The QTL search page allows searching for QTLs by trait name, published symbol and QTL label. Search results are hyperlinked to CMap (31) and downloadable in Excel format.

Germplasm and trait evaluation data

CottonGen includes information for each of the 50 Gossypium species such as genome groups, geographic origins, inter-species compatibilities and germplasm. About 15 000 germplasm accessions are stored in CottonGen.
These individuals were identified from >47 000 entries that consist mainly of the USDA-ARS Germplasm Resources Information Network (32) cotton collection, the cotton germplasm collection of the China Cotton Research Institute, the Chinese Academy of Agricultural Sciences and the cotton germplasm collection of Uzbekistan Center of Genomics and Bioinformatics, Academy of Sciences of Uzbekistan. Germplasm data include aliases, pedigrees, publicly available passport information, stock collection centre, associated maps, libraries and sequences. In addition, trait evaluation data, with >118 000 trait scores, from 9000 germplasm are available.

Figure 2. Marker search site in CottonGen. (A) Multiple marker search sites are available based on the type of information users are interested in. (B) An example search interface where users can view and search for marker source information. (C) A Marker details page with various links to detailed information. (D) The Map position tab of the marker page shows all the maps where the marker has been mapped. (E) From the marker page users can go to CMap. (F) For the markers that are anchored to the genome, CMap provides hyperlinks to GBrowse. From GBrowse users can follow the links to go back to CMap, the marker detail page or the Sequence Retrieval Tool.

The Gossypium species summary page (found under data and then species in the navigation menu) provides a list of species along with information such as genome group, haploid chromosome number and geographic origin. The summary of data available in CottonGen is also given: number of germplasm, sequences and DNA libraries. The species name in the table leads to a species page, which shows more details such as common name, images and additional data as seen in the summary table.
The species page also shows the results of functional analysis of the genes, both from NCBI and whole genome sequences, which include KEGG and GO analysis reports.

Several germplasm search pages provide access to different types of data (Figure 3). The search by collection page provides a list of germplasm along with stock collection centre information. The search can be filtered by collection centre name, germplasm name and/or accession name in the stock centre. The search by pedigree page provides an interface to search germplasm by pedigree and the search germplasm by country page searches by the country of origin. From the germplasm search page, researchers can go to the germplasm details page, which shows detailed information such as pedigree, passport, collection centre, image and associated genotypic and phenotypic data. Germplasm can also be searched based on their trait evaluation data. Both the qualitative and quantitative trait evaluation search sites allow the trait values of up to three trait descriptors to be specified to view the germplasm trait data. Data from all the search result sites can be downloaded in Excel files.

Publications

CottonGen houses information about publications that are important to cotton researchers. Details about publications were imported to CottonGen from NCBI PubMed (http://www.ncbi.nlm.nih.gov/pubmed) and the USDA National Agricultural Library (NAL) (http://agricola.nal.usda.gov/) databases. Additionally, details of publications from other journals not present in PubMed or the USDA NAL databases were manually imported to CottonGen. In addition, CottonGen maintains reference information and abstracts for works published in cotton research conference proceedings such as the ICGI Conferences and the Plant and Animal Genome Conferences. Book chapters, theses and patents are also collected.
In total, CottonGen houses 10 731 references. Publications can be found using a combination of keywords (in the abstract or title), all or partial titles, authors and other categories. Search results link to publication pages that contain the abstract, citation, external link to the full article and other details about the publication.

Figure 3. Germplasm search site in CottonGen. (A) Multiple germplasm search sites are available based on the type of information users are interested in. (B) An example search where users can view and search for germplasm and their collection centre. (C) A germplasm details page with various tabs to show the detailed information. (D) The Map tab of a germplasm page shows all the maps for which the germplasm has been used. (E) From the map page users can open CMap for further exploration.

Online analysis tools

CottonGen contains several online analysis tools. These include an instance of NCBI’s wwwBLAST tool (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/wwwblast/) and a custom Batch BLAST tool where users can perform pair-wise BLAST alignments using their sequences against the current 20 CottonGen data sets. The Batch BLAST server supports upload of large data sets for pair-wise comparison. It executes BLAST, and parses the output into an Excel file. Users are notified by email when the job is complete and directed to a website to download result files. The same data sets are available in both BLAST servers for alignment. Protein data sets available for BLAST include Gossypium proteins from GenBank and UniProtKB and G. raimondii protein sequences from the draft BGI v1.0 and JGI v2.1 genome data.
Nucleotide sequence databases include GenBank Gossypium sequences, Gossypium dbSNP, CottonGen SSR, RFLP, and SNP/InDel marker sequences, CottonGen Gossypium unigene v1.0, DFCI Cotton Gene Index v11 (http://compbio.dfci.harvard.edu/tgi/plant.html), PlantGDB (http://www.plantgdb.org/) unigene from several Gossypium species, Udall 2012 transcript contigs and predicted genes and genome sequences from the BGI and JGI genome data.

The Sequence Retrieval tool enables download of sequences including full chromosomes, scaffolds, genes, full transcripts, transcript coding sequences, proteins, genetic markers aligned to chromosomes, unigene contigs and ESTs. Users supply a list of sequence names to retrieve, and can filter by a specific genome assembly, unigene or other project data. For features aligned to a whole genome, such as genes, transcripts and genetic markers, a user can include a specified number of upstream and downstream bases in the sequence.

Community resources

CottonGen houses the resources for the ICGI. It maintains the ICGI membership database and information for the ICGI biennial international research conferences, hosts the biennial elections and provides tools for registration and manuscript submission for the 2012 ICGI Conference. The CottonGen home page includes rotating pictures for recent research stories or community news, brief project descriptions, a news section for the cotton community and a section to quickly find newly added site functionality or data. Email mailing lists for both CottonGen and ICGI are available for communication with the community, and the mailing list archives can be viewed online. Other resources in the help section provide a Frequently Asked Questions page for CottonGen and ICGI and user tutorials for both.
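The upstream/downstream option of the Sequence Retrieval tool, described under Online analysis tools, can be illustrated with a short Python sketch. This is only an assumption about how such a retrieval might work, not CottonGen's implementation: the function names, the 1-based inclusive coordinate convention and the clamping at sequence ends are all hypothetical choices made for the example.

```python
# Translation table for complementing DNA (N is left unchanged).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_with_flanks(
    chrom_seq: str,
    start: int,
    end: int,
    upstream: int = 0,
    downstream: int = 0,
    strand: str = "+",
) -> str:
    """Return a feature plus flanking bases.

    start/end are 1-based inclusive forward-strand coordinates (an
    assumption of this sketch). Flanks are clamped at the sequence ends.
    """
    if not (1 <= start <= end <= len(chrom_seq)):
        raise ValueError("feature coordinates fall outside the sequence")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if strand == "-":
        # On the reverse strand, upstream lies at higher forward coordinates.
        upstream, downstream = downstream, upstream
    lo = max(1, start - upstream)
    hi = min(len(chrom_seq), end + downstream)
    region = chrom_seq[lo - 1:hi]
    return reverse_complement(region) if strand == "-" else region
```

For example, `extract_with_flanks("AAACCCGGGTTT", 4, 6, upstream=2, downstream=1)` returns the feature `CCC` with two bases before it and one after it. Swapping the flank sizes for reverse-strand features keeps "upstream" meaningful relative to the direction of transcription.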
FUNDING

Cotton Incorporated; the USDA-ARS Crop Germplasm Research Unit at College Station, TX; Southern Association of Agricultural Experiment Station Directors; Bayer CropScience; Dow/Phytogen; Monsanto. Components of the infrastructure for CottonGen were created under funding for Tripal development for other databases (USDA NIFA [2009-51181-06036, 2009-51181-05808]). As these databases all use the same underlying Tripal infrastructure, source code was shared amongst all of these databases. That code is also freely available on the Tripal website at http://tripal.info. Funding for open access charge: CottonGen Grant.

FUTURE PLANS

CottonGen will be updated as new data become available and new or improved functionality is added to the site. This includes adding GBrowse-syn, a GBrowse-based synteny browser (33), to view multiple sequence alignment data, synteny or co-linearity data from closely related or useful species such as cacao and Arabidopsis. A comprehensive breeders toolbox, similar to that developed for the Rosaceae community as part of the USDA NIFA SCRI-funded project RosBREED (Grant number #2009-51181-06036), is planned for future implementation. In addition, a digital image library will be created for over one hundred thousand images created from the USDA-ARS Research Project: ‘Genotypic and Phenotypic Analysis and Digital Imaging of Accessions in the US National Cotton Germplasm Collection’. The associated phenotypic data will also be stored in CottonGen.

CONCLUSION

CottonGen is now the consolidated cotton genomics, genetics and breeding database for the cotton community. It aims to provide a comprehensive, integrated, online resource that serves basic, translational and applied cotton research. It is constructed using the open-source Tripal genome database toolkit, which merges the power of Drupal, a popular web Content Management System, with that of Chado, a community-derived database schema for storage of genomic and genetic data.
Data types in CottonGen include maps and markers, whole genome assemblies and annotations, genes and sequences with analyzed data, taxonomic and germplasm data and publication data. CottonGen maintains online resources for ICGI, a non-profit organization created as a global affinity group with common goals and interests. From its release on 1 March 2012 to 15 August 2013, CottonGen had 11 111 visits by 4756 unique visitors from 94 countries who accessed 75 551 pages.

ACKNOWLEDGEMENTS

We acknowledge with thanks our funding sources, the cotton research community providing data and feedback and the Tripal community of developers for developing and sharing Tripal modules and code.

Conflict of interest statement. None declared.

REFERENCES

1. Brubaker,C.L. and Wendel,J.F. (1994) Reevaluating the origin of domesticated cotton (Gossypium hirsutum; Malvaceae) using nuclear restriction fragment length polymorphisms (RFLPs). Am. J. Bot., 81, 1309–1326.
2. Brubaker,C.L., Bourland,F.M. and Wendel,J.F. (1999) The origin and domestication of cotton. In: Smith,C.W. and Cothren,J.T. (eds), ‘Cotton: Origin, History, Technology and Production’. Wiley, New York, pp. 3–31.
3. Percy,R.G. and Wendel,J.F. (1990) Allozyme evidence for the origin and diversification of Gossypium barbadense L. Theor. Appl. Genet., 79, 529–542.
4. Wendel,J.F. (1989) New World tetraploid cottons contain Old World cytoplasm. Proc. Natl Acad. Sci. USA, 86, 4132–4136.
5. Wendel,J.F., Small,R.L., Cronn,R.C. and Brubaker,C.L. (1999) Genes, jeans, and genomes: reconstructing the history of cotton. In: van Raamsdonk,L.W.D. and den Nijs,J.C.M. (eds), Plant Evolution in Man-Made Habitats. Proceedings of the VIIth International Symposium of the International Organization of Plant Biosystematists. Hugo de Vries Laboratory, Amsterdam, The Netherlands, pp. 133–161.
6. Paterson,A.H., Wendel,J.F., Gundlach,H., Guo,H., Jenkins,J., Jin,D., Llewellyn,D., Showmaker,K.C., Shu,S., Udall,J. et al. (2012) Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature, 492, 423–427.
7. Wang,K.B., Wang,Z.W., Li,F.G., Ye,W.W., Wang,J.Y., Song,G.L., Yue,Z., Cong,L., Shang,H.H., Zhu,S.L. et al. (2012) The draft genome of a diploid cotton Gossypium raimondii. Nat. Genet., 44, 1098–1103.
8. Yu,J., Kohel,R., Hinze,L., Yu,J.Z., Frelichowski,J., Ficklin,S.G., Main,D. and Percy,R.G. (2012) CottonDB. In: Proceedings of the International Plant and Animal Genome Conference: January 14–18, 2012, San Diego, CA, USA.
9. Durbin,R. and Thierry-Mieg,J. (1994) The ACEDB genome database. In: Suhai,S. (ed.), Computational Methods in Genome Research. Springer, USA, pp. 45–55.
10. Blenda,A., Scheffler,J., Scheffler,B., Palmer,M., Lacape,J., Yu,J.Z., Jesudurai,C., Jung,S., Muthukumar,S., Yellambalase,P. et al. (2006) CMD: a cotton microsatellite database resource for Gossypium genomics. BMC Genomics, 7, 132.
11. Hamelin,C., Sempere,G., Jouffe,V. and Ruiz,M. (2013) TropGeneDB, the multi-tropical information system updated and expanded. Nucleic Acids Res., 41, D1172–D1175.
12. Ficklin,S.P., Sanderson,L., Cheng,C.H., Staton,M.E., Lee,T., Cho,I.H., Jung,S., Bett,K.E. and Main,D. (2011) Tripal: a construction toolkit for online genome databases. Database (Oxford), 2011, bar044.
13. Mungall,C.J., Batchelor,C. and Eilbeck,K. (2011) Evolution of the Sequence Ontology terms and relationships. J. Biomed. Inform., 44, 87–93.
14. Yandell,M., Stein,L., Durbin,R. and Ashburner,M. (2005) The sequence ontology: a tool for unification of genome annotations. Genome Biol., 6, R44.
15. Gene Ontology Consortium. (2013) Gene Ontology annotations and resources. Nucleic Acids Res., 41, D530–D535.
16. Jung,S., Staton,M.E., Lee,T., Blenda,A., Svancara,R., Abbott,A.G. and Main,D. (2008) GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data. Nucleic Acids Res., 36, D1034–D1040.
17. Sanderson,L.A., Krilow,C., Vandenberg,B., Tom,W., Taran,B. and Bett,K. (2012) KnowPulse: a breeder-focused web portal that integrates genetics and genomics of pulse crops with model genomes. In: Proceedings of the International Plant and Animal Genome Conference: January 14–18, 2013, San Diego, CA, USA.
18. Droc,G., Lariviere,D., Guignon,V., Yahiaoui,N., This,D., Garsmeur,O., Dereeper,A., Hamelin,C., Argout,X., Dufayard,J.F. et al. (2013) The banana genome hub. Database (Oxford), 2013, bat035.
19. Staton,M.E., Barakat,A.U., Ficklin,S.P., Tomkins,J.P., DiLoreto,D.S. and Carlson,J.E. (2009) Genomic tool development for the Fagaceae. In: Proceedings of the International Plant and Animal Genome Conference: January 10–14, 2013, San Diego, CA, USA.
20. Main,D., Jung,S., Ficklin,S.P., Zheng,P., Cheng,C.-H., Olmstead,M., Abbott,A.G., Blenda,A.V., Lee,T., Chen,C. et al. (2013) Citrus genome database: updates and new functionality. In: Proceedings of the International Plant and Animal Genome Conference: January 14–18, 2013, San Diego, CA, USA.
21. Ficklin,S.P., Cook,M.D., Morris,P.J., Gemmill,J.B., Chapman,R.W. and Warr,G.W. (2009) A review of the marine genomics project, a web-based genomic and transcriptional database. In: Proceedings of the International Plant and Animal Genome Conference: January 10–14, 2013, San Diego, CA, USA.
22. Udall,J.A., Swanson,J.M., Haller,K., Rapp,R.A., Sparks,M.E., Hatfield,J. and Wendel,J.F. (2006) A global assembly of cotton ESTs. Genome Res., 16, 441–450.
23. Page,J.T., Huynh,M.D., Liechty,Z.S., Grupp,K., Stelly,D., Hulse,A.M., Ashrafi,H., Deynze,A.V., Wendel,J.F. and Udall,J.A. (2013) Insights into the evolution of cotton diploids and polyploids from whole-genome re-sequencing. G3 (Bethesda), 3, 1809–1818.
24. Page,J.T., Gingle,A.R. and Udall,J.A. (2013) PolyCat: a resource for genome categorization of sequencing reads from allopolyploid organisms. G3 (Bethesda), 3, 517–525.
25. Stein,L.D., Mungall,C., Shu,S.Q., Caudy,M., Mangone,M., Day,A., Nickerson,E., Stajich,J.E., Harris,T.W., Arva,A. et al. (2002) The Generic Genome Browser: a building block for a model organism system database. Genome Res., 12, 1599–1610.
26. Uniprot Consortium. (2013) Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res., 41, D43–D47.
27. NCBI Resources Coordinators. (2013) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 41, D8–D20.
28. Kanehisa,M., Goto,S., Sato,Y., Furumichi,M. and Tanabe,M. (2012) KEGG for integration and interpretation of large-scale molecular datasets. Nucleic Acids Res., 40, D109–D114.
29. Huang,X. and Madan,A. (1999) CAP3: a DNA sequence assembly program. Genome Res., 9, 868–877.
30. Lamesch,P., Berardini,T.Z., Li,D., Swarbreck,D., Wilks,C., Sasidharan,R., Muller,R., Dreher,K., Alexander,D.L., Garcia-Hernandez,M. et al. (2011) The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res., 40, D1202–D1210.
31. Youens-Clark,K., Faga,B., Yap,I.V., Stein,L. and Ware,D. (2009) CMap 1.01: a comparative mapping application for the Internet. Bioinformatics, 25, 3040–3042.
32. Wiersema,J.H. (1994) Taxonomic information on cultivated plants in the USDA/ARS germplasm resources information network (GRIN). In: Tukey,H.B. Jr (ed.), Acta Horticulturae 413. International Society for Horticultural Science. Proceedings of the Second International Symposium on the Taxonomy of Cultivated Plants, 10–14 August 1994, Seattle, Washington, USA. Leuven, Belgium.
33. McKay,S.J., Vergara,I.A. and Stajich,J.E. (2010) Using the Generic Synteny Browser (GBrowse_syn). Curr. Protoc. Bioinformatics, Chapter 9, Unit 9.12.