There are many specialized databases and analysis tools for the design and research of bio-microarrays. CD Genomics develops and integrates various analytical methods on this basis, so that users can share data and achieve fast data analysis.

dbSNP: The Single Nucleotide Polymorphism Database was established by NCBI in collaboration with the National Human Genome Research Institute. It contains SNPs, InDels, microsatellites and ISSR, as well as information about their sources, detection and validation methods, genotypes, upstream and downstream sequences, population frequencies, etc.
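As a rough illustration of how dbSNP can be queried programmatically, the sketch below builds a search URL for the NCBI E-utilities `esearch` endpoint with `db=snp`. The helper name `build_snp_search_url` and the example term are assumptions made for illustration; the code only constructs the URL and sends no request.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_snp_search_url(term: str, retmax: int = 20) -> str:
    """Return an esearch URL that searches dbSNP (db=snp) for `term`.

    Hypothetical helper: it only assembles the query string and
    performs no network access.
    """
    params = {
        "db": "snp",        # Entrez database name for dbSNP
        "term": term,       # free-text search term
        "retmax": retmax,   # maximum number of IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Example: search dbSNP records matching a gene symbol.
url = build_snp_search_url("BRCA1")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client; NCBI's usage guidelines on request rates and API keys apply to real queries.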

TSC: The SNP Consortium. A graphical genome browsing interface shows SNPs mapped onto the genome assembly in the context of externally available gene predictions and other features. SNP allele frequency and genotype data are available via FTP-download and on individual SNP report web pages. SNP linkage maps are available for download and for browsing in a comparative map viewer.

TCGA: The Cancer Genome Atlas, a landmark cancer genomics program, molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types. This joint effort between the National Cancer Institute and the National Human Genome Research Institute began in 2006, bringing together researchers from diverse disciplines and multiple institutions.

GEO: Gene Expression Omnibus is a public functional genomics data repository supporting MIAME-compliant data submissions. Array- and sequence-based data are accepted.

UALCAN: a comprehensive, user-friendly, and interactive web resource for analyzing cancer OMICS data. It is built on PERL-CGI, with high-quality graphics rendered using JavaScript and CSS.

SNPedia: a wiki investigating human genetics. It shares information about the effects of variations in DNA, citing peer-reviewed scientific publications. It is used by Promethease to create a personal report linking your DNA variations to the information published about them.

GENCODE: The goal of the GENCODE project is to identify and classify all gene features in the human and mouse genomes with high accuracy based on biological evidence, and to release these annotations for the benefit of biomedical research and genome interpretation.

ALFRED: designed to make allele frequency data on human population samples readily available for use by the scientific and educational communities. ALFRED now has data on 664,708 polymorphisms, 762 populations and 66,726,252 frequency tables (one population typed for one site).
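To make the idea of a frequency table (one population typed for one site) concrete, the sketch below derives allele frequencies from genotype counts at a single biallelic site. The function name and the counts are invented for illustration; they are not taken from ALFRED.

```python
def allele_frequencies(genotype_counts):
    """Compute allele frequencies from diploid genotype counts.

    `genotype_counts` maps a pair of alleles, e.g. ("A", "G"), to the
    number of individuals carrying that genotype. Each individual
    contributes two alleles to the total.
    """
    allele_totals = {}
    for (allele_1, allele_2), n_individuals in genotype_counts.items():
        for allele in (allele_1, allele_2):
            allele_totals[allele] = allele_totals.get(allele, 0) + n_individuals

    total_alleles = sum(allele_totals.values())
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    return {allele: count / total_alleles for allele, count in allele_totals.items()}


# Hypothetical sample of 100 individuals typed at one site.
counts = {("A", "A"): 30, ("A", "G"): 50, ("G", "G"): 20}
freqs = allele_frequencies(counts)
print(freqs)  # A: 110 of 200 alleles, G: 90 of 200 alleles
```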

AFND: The Allele Frequency Net Database provides a central source, freely available to all, for the storage of allele frequencies from different polymorphic areas in the Human Genome. Users can contribute the results of their work into one common database and can perform database searches on information already available.

SNPs3D: a website that assigns molecular functional effects of non-synonymous SNPs based on structure and sequence analysis.

COSMIC: The Catalogue of Somatic Mutations in Cancer is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer.

cBioPortal: integrates data from 126 cancer genome studies, including large cancer research projects such as TCGA and ICGC, covering 28,000 samples. Some samples also include phenotypic information such as clinical prognosis.

UCSC Cancer Genomics Browser: a web-based tool that integrates, visualizes, and analyses cancer genomics and clinical data. The platform currently has 355 data sets, including genome-wide data from 71,870 samples.

ArrayMap: a curated reference database and bioinformatics resource targeting copy number profiling data in human cancer. The ArrayMap database provides an entry point for meta-analysis and systems level data integration of high-resolution oncogenomic CNA data.

Cancer Hotspots: This resource is maintained by the Kravis Center for Molecular Oncology at Memorial Sloan Kettering Cancer Center. It provides information about statistically significant recurrent mutations identified in large-scale cancer genomics data.

OncoKB: a comprehensive, curated precision oncology knowledge base that offers oncologists detailed, evidence-based information about individual somatic mutations and structural alterations present in patient tumors, with the goal of supporting optimal treatment decisions.

ArrayExpress: the Archive of Functional Genomics Data stores data from high-throughput functional genomics experiments and provides these data to the research community for reuse.

Oncomine: ranges from web applications to translational bioinformatics services and provides solutions for individual researchers and multinational companies. It offers robust, peer-reviewed analysis methods and a powerful set of analysis functions that compute gene expression signatures, clusters and gene-set modules, automatically extracting biological insights from the data.

CRN: systematically collects RNA-seq datasets from The Cancer Genome Atlas (TCGA), the Sequence Read Archive (SRA) and the NCBI Gene Expression Omnibus (GEO). The collection comprises 89 cancer RNA-seq datasets, including 325 subsets and 12,167 samples.

DIANA: provides algorithms, databases and software for interpreting and archiving data in a systematic framework, ranging from the analysis of expression regulation from deep sequencing data and the annotation of miRNA regulatory elements and targets to the interpretation of the role of ncRNAs in various diseases and pathways.

LNCediting: provides a comprehensive resource for the functional prediction of RNA editing in long noncoding RNAs (lncRNAs).

NPInter: expands the data set to 491,416 interactions in 188 tissues (or cell lines) from 68 kinds of experimental technologies. NPInter also improves the user interface and adds new web services, including a local UCSC Genome Browser to visualize binding sites. Additionally, NPInter defined a high-confidence set of interactions and predicted the functions of lncRNAs in human and mouse based on the interactions curated in the database.

LncReg: predicts the binding information of lncRNA and microRNAs based on 259 research results.

LNCipedia: a public database for long non-coding RNA (lncRNA) sequence and annotation. The current release contains 127,802 transcripts and 56,946 genes.

LncRNAMAP: This site has four major features that distinguish it from other public databases. First, it identifies lncRNAs and provides expression profiles for lncRNAs and their homologous protein-coding genes. Second, it provides the miRNA regulators of lncRNAs as well as of their homologous genes. Third, it detects lncRNA-derived endogenous siRNAs (esiRNAs) supported by sRNA deep sequencing data and constructs the interactions between lncRNA-derived esiRNAs and their gene targets in the human genome. Finally, it presents the genes neighboring lncRNAs.

LncRNASNP: a database providing comprehensive resources on single nucleotide polymorphisms (SNPs) in human and mouse lncRNAs. It contains SNPs in lncRNAs, SNP effects on lncRNA structure, mutations in lncRNAs and lncRNA:miRNA binding.

starBase: focuses mainly on miRNA-target interactions. It is an open-source platform for studying miRNA-ncRNA, miRNA-mRNA, ncRNA-RNA, RNA-RNA, RBP-ncRNA and RBP-mRNA interactions from CLIP-seq, degradome-seq and RNA-RNA interactome data.

Lnc2Meth: provides a comprehensive resource and web tool for clarifying the regulatory relationships between human lncRNAs and associated DNA methylation in diverse diseases.

lncSNP: a database developed by researchers at Huazhong University of Science and Technology to collect information on lncRNAs and SNPs in human and mouse.

RegRNA: an integrated web server for identifying functional RNA motifs in an input RNA sequence. It is a widely used tool for identifying regulatory RNA motifs and incorporates additional analytical methods and updated data sources.

ChIPBase: an open database for studying the transcription factor binding sites and motifs, and decoding the transcriptional regulatory networks of lncRNAs, miRNAs, other ncRNAs and protein-coding genes from ChIP-seq data. The database currently contains ~10,200 curated peak datasets derived from ChIP-seq methods in 10 species.

lncRNAdb: provides users with a comprehensive, manually curated reference database of 287 eukaryotic lncRNAs that have been described independently in the scientific literature.

LncRNADisease: The LncRNADisease database is both a resource that curates experimentally supported lncRNA-disease association data and a platform that integrates tools for predicting novel lncRNA-disease associations. In addition, LncRNADisease curates lncRNA interactions at various levels, including protein, RNA, miRNA, and DNA.

NONCODE: an integrated knowledge database dedicated to non-coding RNAs (excluding tRNAs and rRNAs). NONCODE currently covers 17 species.

NRED: a public repository, which provides gene expression information for thousands of long ncRNAs in humans and mice. The database contains both microarray and in situ hybridization data.

MiRcode: provides "whole transcriptome" human microRNA target predictions based on the comprehensive GENCODE gene annotation, including >10,000 long non-coding RNA genes. Coding genes are also covered, including atypical regions such as 5'UTRs and CDS.

Linc2GO: a web resource that aims to provide comprehensive functional annotations for human lincRNA. MicroRNA-mRNA and microRNA-lincRNA interaction data were integrated to generate lincRNA functional annotations based on the 'competing endogenous RNA hypothesis'.

Plant snoRNA Database: brings together information from three independent computer-assisted searches of the Arabidopsis genome for box C/D snoRNA genes and from studies of ncRNAs. To date, the Arabidopsis box C/D snoRNAs have been used to identify approximately 250 genes from different non-Arabidopsis plant species and these sequences are included as alignments in the Database. Finally, the Database provides a unifying nomenclature for all of the plant snoRNA genes.

PlantCARE: a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. It lists 435 different names of plant transcription sites, describing more than 159 plant promoters.

GabiPD: constitutes a repository and analysis platform for a wide array of heterogeneous data from high-throughput experiments in several plant species. Data from different 'omics' fronts are incorporated (i.e. genomics, transcriptomics, proteomics and metabolomics), originating from 14 different model or crop species.

PlantMarkers: a genetic marker database that contains a comprehensive pool of predicted molecular markers.

PlantProm: a plant promoter database, an annotated, non-redundant collection of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start sites (TSSs) from various plant species.

POGs/PlantRBP: a relational database that integrates data from rice, Arabidopsis, and maize by placing the complete Arabidopsis and rice proteomes and available maize sequences into ‘putative orthologous groups’ (POGs).

TIGR Plant: a repository of expressed sequences collected from the NCBI GenBank Nucleotide database for the construction of transcript assemblies. The sequences include expressed sequence tags (ESTs) and full-length and partial cDNAs, but exclude computationally predicted gene sequences.

ppdb: a plant promoter database that provides promoter annotation of Arabidopsis and rice. The database contains information on promoter structures, transcription start sites (TSSs) that have been identified from full-length cDNA clones and also a vast amount of TSS tag data.

TropGeneDB: a database that manages genomic, genetic and phenotypic information about tropical crops. It is organised on a crop basis.

GrainGenes: a digital platform that serves small grains research communities as their central data repository and as a facilitator for community activities, primarily for wheat, barley, oat, and rye.

BarleyBase and its successor PLEXdb are public resources for large-scale gene expression analysis for plants and plant pathogens.

PathoPlant: a database on plant-pathogen interactions and components of signal transduction pathways related to plant pathogenesis. PathoPlant also harbors gene expression data from Arabidopsis thaliana microarray experiments to enable searching for specific genes regulated upon certain stimuli like pathogen infection, elicitor treatment, or abiotic stress. Validation of short DNA sequences as cis-elements responsive to different stimuli can also be performed in PathoPlant.

Phytome: an online comparative genomics resource that can be applied to functional plant genomics, molecular breeding and evolutionary studies. It contains predicted protein sequences, protein family assignments, multiple sequence alignments, phylogenies and functional annotations for proteins from a large, phylogenetically diverse set of plant taxa. Phytome links disparate plant gene databases, both by identifying the evolutionary relationships among orthologous and paralogous protein sequences from different species and by enabling cross-references between different versions of the same gene curated independently by different database groups.

PlantGDB: a plant genome sequence database consisting mainly of EST data. It also provides gene annotations, EST-to-genome mappings and links to other databases for these data.

The Plant Ontology (PO): describes plant anatomy, morphology, and the stages of plant development, and offers a database of plant genomics annotations associated with the PO terms. The scope of the PO has grown from its original design covering only rice, maize, and Arabidopsis, and now includes terms to describe all green plants from angiosperms to green algae.

EasyGO: a database that provides functional annotations for a list of query genes, as well as microarray probe information. It currently includes more than 40 data types from 15 species (mainly plants).

PlantTFDB: has identified 320,370 transcription factors from 165 species and classified them into 58 families. Species included in PlantTFDB cover the main lineages of green plants, providing genomic TF repertoires across green plants.

PlnTFDB: a public database arising from efforts to identify and catalogue all plant genes involved in transcriptional control.

PLEXdb: in partnership with community databases, this database supports comparisons of gene expression across multiple plant and pathogen species, encouraging individuals and/or consortia to upload genome-scale data sets and contrast them with previously archived data. These analyses facilitate the interpretation of the structure, function and regulation of genes in economically important plants.

PlantQTL-GE: provides access to genes, gene expression information from microarray data, expressed sequence tags (ESTs) and genetic markers for Oryza sativa and Arabidopsis thaliana. PlantQTL-GE is an integrated database system that helps the user focus on candidate genes restricted to the quantitative trait loci (QTLs) of interest. Data were collected from the literature and various public databases.
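Restricting candidate genes to a QTL of interest amounts to an interval-overlap filter. The sketch below shows one minimal way to express it; the `Gene` and `QTL` records, the coordinates and the identifiers are all hypothetical and do not reflect PlantQTL-GE's actual schema or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chromosome: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


@dataclass(frozen=True)
class QTL:
    name: str
    chromosome: str
    start: int
    end: int


def candidate_genes(genes, qtl):
    """Return the genes that overlap the QTL interval on the same chromosome."""
    return [
        gene
        for gene in genes
        if gene.chromosome == qtl.chromosome
        and gene.start <= qtl.end
        and gene.end >= qtl.start
    ]


# Hypothetical annotation and QTL interval.
genes = [
    Gene("geneA", "chr1", 1_000, 2_500),
    Gene("geneB", "chr1", 9_000, 12_000),
    Gene("geneC", "chr2", 1_500, 3_000),
]
qtl = QTL("qtl_example", "chr1", 2_000, 10_000)

hits = [gene.gene_id for gene in candidate_genes(genes, qtl)]
print(hits)  # geneA and geneB overlap the interval; geneC is on another chromosome
```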

harvEST: initiated during the EST sequencing era for Barley, Brachypodium, Cassava, Citrus, Coffee, Cowpea, Musa (banana/plantain), Soybean, Rice and Wheat. HarvEST originated as EST database-viewing software in support of gene function analyses and oligonucleotide design, then grew to support activities including microarray content design, SNP identification, genotyping platform design, comparative genomics and the coupling of physical and genetic maps.

Gramene: a curated, open-source, integrated data resource for comparative functional genomics in crops and model plant species.

