Gemma is a set of tools for the meta-analysis of genomics data, currently targeted primarily at gene expression profiles. Gemma contains data from thousands of public studies, referencing thousands of published papers. Users can search, access and visualize coexpression and differential expression results.

This webpage serves as an introduction and end-user documentation.

Key features:

Important Q&A

How do you pronounce ‘Gemma’?

The ‘G’ is soft, as in ‘general’.

How do you map probes to genes?

Essentially as described by Barnes et al., 2005. Gene annotations are obtained from NCBI and UCSC.

Isn’t expression data very noisy?

Yes, sometimes. This is a motivation for performing meta-analyses: to look for results that are in some sense consistent across laboratories.

Isn’t data quality a problem?

Yes; see the question about noisy data. We have been working hard to ensure that the data sets we use for analysis are of high quality, or to “clean up” those that have problems. One problem we have observed is the presence of outlier samples, which are flagged and removed. Where possible, we apply batch correction and analyze data from raw sources (CEL files or FASTQ files). We also have a dataset scoring system in place.


Terms and Conditions

This documentation is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.

The rest of this section concerns the use of the Gemma website and the Gemma Web Services, including the Gemma RESTful API (which is documented here).

Please read the full agreement, available by clicking here.

Non-binding summary of terms:

Using the Gemma website

These guides will help you navigate and use the tools provided through the Gemma website.


The API has its own interactive documentation, where you will find all the information needed to interact with Gemma programmatically. Please follow this link to the RESTful API documentation.
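As a rough illustration of programmatic access, the following Python sketch builds a dataset search request and fetches the JSON response. The base URL, the `datasets` endpoint and the `query`, `offset` and `limit` parameter names are assumptions made for this example and should be checked against the interactive API documentation; the helper names `dataset_search_url` and `fetch_json` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed base URL of the REST API; confirm it in the interactive API docs.
BASE_URL = "https://gemma.msl.ubc.ca/rest/v2"


def dataset_search_url(query, offset=0, limit=20):
    """Build a dataset search URL (endpoint and parameter names are assumed)."""
    params = urlencode({"query": query, "offset": offset, "limit": limit})
    return f"{BASE_URL}/datasets?{params}"


def fetch_json(url, timeout=30):
    """Send a GET request and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(dataset_search_url("hippocampus", limit=5))` would then return the decoded response, provided the endpoint behaves as assumed. Building the URL separately from sending the request makes it easy to inspect the request before anything goes over the network.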

Data sources

We are indebted to the many researchers who have made data publicly available. Lists of published papers that relate to the data included in Gemma are available here (full list) and here (search).

If your data is in Gemma, and your paper is not listed, please let us know.

Reference data

GO and UCSC Genome Annotations were last updated in March 2018.


If you find a problem or need help, you can file a new GitHub issue, or contact us at pavlab-support@msl.ubc.ca.


Financial support

National Institutes of Health (NIGMS/NIMH), Grant: GM0769990; Canada Foundation for Innovation; Michael Smith Foundation for Health Research; NeuroDevNet; Canadian Institutes of Health Research; Genome British Columbia; Natural Sciences and Engineering Research Council of Canada.


If you use any of the Gemma tools in your research, please cite:

Zoubarev, A., et al., Gemma: A resource for the re-use, sharing and meta-analysis of expression profiling data. Bioinformatics, 2012.

Other publications

Lee, H.K., et al., Coexpression analysis of human genes across many microarray data sets. Genome Research, 2004. 14: p. 1085-1094.


Project lead: Paul Pavlidis, Ph.D.

Developers: Matthew Jacobson, Justin Leong, Manuel Belmadani, Stepan Tesar.

Data curation: Brenna Li, James Liu, Patrick Savage, Nathan Holmes, Jenni Hantula, Nathan Eveleigh, John Choi, Artemis Lai, Cathy Kwok, Celia Siu, Luchia Tseng, Lydia Xu, Mark Lee, Olivia Marais, Roland Au, Suzanne Lane, Tianna Koreman, Willie Kwok, Yiqi Chen, Brandown Huntington, John Phan, Jimmy Liu, Cindy-Lee Crichlow, Sophia Ly, Ellie Hogan

System administration: Kevin Griffin, Stephen Macdonald, Dima Vavilov


The following people have contributed code, algorithms, implementations of algorithms, or other computational work relating to Gemma.


Other contributors to the early stages of Gemma include David Quigley, Anshu Sinha and Gozde Cozen. Gemma’s precursor was TMM, which was developed by Homin Lee, Jon Sajdak, Jie Qin and Amy Hsu. Martin Krzywinski has provided helpful advice on visualization.

Copyright © University of British Columbia