Overview

Gemma (pronounced: Jemma) is an open-source and open-data project for the curation, re-analysis and re-use of transcriptomic data. The Gemma database provides harmonized data, experimental design annotations, and differential expression analysis results for over 20,000 diverse microarray and RNA-seq experiments for human, mouse and rat.

Key features:

All data sets are manually curated and QC’d in a standardized pipeline.
Support for a variety of expression technologies, including scRNA-seq, bulk RNA-seq, Affymetrix, Illumina and other oligonucleotide arrays, and one-channel and ratiometric cDNA arrays.
Re-annotated microarray platforms at the sequence level, which allows for more consistent cross-platform comparisons.
Re-processed raw sequencing data (single cell and bulk tissue RNA-seq).
Re-analysis of data sets for differential expression.
GUI and programmatic access.

Data sources

We are indebted to the many researchers who have made data publicly available. Lists of published papers that relate to the data included in Gemma are available here (full list) and here (search).

If your data is in Gemma but your paper is not listed, please let us know.

Contact

If you find a problem or need help, you can file a new GitHub issue or contact us at pavlab-support@msl.ubc.ca.

Credits

Financial support

As of 2023, Gemma is primarily supported by a grant from NIMH, with additional support from NSERC and CFI, for which we are grateful!

Citing

If you use any of the Gemma tools or data for your research, please cite one of the following papers:

Lim et al. Curation of over 10 000 transcriptomic studies to enable data reuse. Database, 2021.

Zoubarev et al. Gemma: A resource for the re-use, sharing and meta-analysis of expression profiling data. Bioinformatics, 2012.

Project lead: Paul Pavlidis, Ph.D.