R/convenience.R
get_differential_expression_values.Rd
Retrieves the differential expression result set(s) associated with the dataset.
To get more information about the contrasts in individual resultSets and
the annotation terms associated with them, use get_dataset_differential_expression_analyses().
get_differential_expression_values(
dataset = NA_character_,
resultSets = NA_integer_,
keepNonSpecific = FALSE,
readableContrasts = FALSE,
memoised = getOption("gemma.memoised", FALSE)
)
dataset
A dataset identifier.
resultSets
resultSet identifiers. If a dataset is not provided, all result sets will be downloaded. If it is provided, it will only be used to ensure all result sets belong to the dataset.
keepNonSpecific
logical. FALSE by default. If TRUE, results from probesets that are not specific to the gene will also be returned.
readableContrasts
If FALSE (default), the returned columns will use internal contrast IDs as names. Details about the contrasts can be accessed using get_dataset_differential_expression_analyses. If TRUE, the IDs will be replaced with human-readable contrast information.
memoised
Whether or not to save the result to cache for future calls with the same inputs, and to use the cached result if one is already saved. Setting options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.
A list of data tables with differential expression values per result set.
In Gemma each result set corresponds to the estimated effects associated with a single factor in the design, and each can have multiple contrasts (for each level compared to baseline). Thus a dataset with a 2x3 factorial design will have two result sets, one of which will have one contrast, and one having two contrasts.
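The counting rule above can be sketched in base R. This is only an illustration: the factor names and levels are hypothetical, and the sketch assumes one result set per factor with one contrast per non-baseline level, as described in the paragraph above.

```r
# Hypothetical 2x3 design; factor names and levels are invented for illustration.
design <- list(
  genotype  = c("wild type", "knockout"),            # 2 levels
  treatment = c("vehicle", "low dose", "high dose")  # 3 levels
)

# One result set per factor; one contrast per non-baseline level.
contrasts_per_result_set <- vapply(
  design,
  function(lv) length(lv) - 1L,
  integer(1)
)

n_result_sets <- length(design)
print(n_result_sets)
print(contrasts_per_result_set)
```

Here the two-level factor yields a single contrast and the three-level factor yields two, matching the description of a 2x3 design.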
The methodology for differential expression is explained in "Curation of over 10000 transcriptomic studies to enable data reuse". Briefly, differential expression analysis is performed on the dataset based on the annotated experimental design with up to three potentially nested factors. Gemma attempts to automatically assign baseline conditions for each factor. In the absence of a clear control condition, a baseline is arbitrarily selected. A generalized linear model with empirical Bayes shrinkage of t-statistics is fit to the data for each platform element (probe/gene) using an implementation of the limma algorithm. For RNA-seq data, we use weighted regression, applying the voom algorithm to compute weights from the mean–variance relationship of the data. Contrasts of each condition are then computed compared to the selected baseline. In some situations, Gemma will split the data into subsets for analysis. This typically happens when a ‘batch’ factor is present and confounded with another factor, in which case the subsets are determined by the levels of the confounding factor.
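To make the idea of contrasts against a selected baseline concrete, here is a minimal base R sketch for a single platform element. It is not Gemma's implementation: it uses ordinary least squares via lm() rather than limma's empirical Bayes shrinkage or voom weights, and the condition names and expression values are simulated for illustration only.

```r
set.seed(1)

# Simulated log2 expression for one platform element; all values are made up.
condition  <- factor(rep(c("control", "drugA", "drugB"), each = 4))
expression <- c(rnorm(4, mean = 5), rnorm(4, mean = 6), rnorm(4, mean = 5.5))

# Select the baseline explicitly; every other level is compared against it.
condition <- relevel(condition, ref = "control")

# Ordinary least squares fit (no shrinkage, no observation weights).
fit   <- lm(expression ~ condition)
coefs <- summary(fit)$coefficients

# Drop the intercept: the remaining rows are the contrasts versus the baseline.
contrasts_vs_baseline <- coefs[-1, c("Estimate", "t value", "Pr(>|t|)"), drop = FALSE]
print(contrasts_vs_baseline)
```

Each remaining row plays the role of one contrast: the estimate is the difference between that condition's mean and the baseline mean, alongside its t-statistic and p-value. Changing the `ref` level changes which comparisons are reported, which is why the choice of baseline matters when reading the contrast columns.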
get_differential_expression_values("GSE2018")
#> $`573187`
#> Probe NCBIid GeneSymbol
#> <char> <char> <char>
#> 1: 210279_at 2841 GPR18
#> 2: 204529_s_at 9760 TOX
#> 3: 202746_at 9452 ITM2A
#> 4: 206761_at 10225 CD96
#> 5: 204352_at 7188 TRAF5
#> ---
#> 18011: 218682_s_at 22950 SLC4A1AP
#> 18012: 202853_s_at 6259 RYK
#> 18013: 203293_s_at 3998 LMAN1
#> 18014: 212708_at 339287 MSL1
#> 18015: 201060_x_at 2040 STOM
#> GeneName pvalue
#> <char> <num>
#> 1: G protein-coupled receptor 18 4.108e-15
#> 2: thymocyte selection associated high mobility group box 1.243e-14
#> 3: integral membrane protein 2A 5.784e-12
#> 4: CD96 molecule 9.116e-12
#> 5: TNF receptor associated factor 5 1.156e-11
#> ---
#> 18011: solute carrier family 4 member 1 adaptor protein 9.995e-01
#> 18012: receptor like tyrosine kinase 9.995e-01
#> 18013: lectin, mannose binding 1 9.998e-01
#> 18014: MSL complex subunit 1 9.997e-01
#> 18015: stomatin 9.999e-01
#> corrected_pvalue rank contrast_1_coefficient contrast_1_log2fc
#> <num> <num> <num> <num>
#> 1: 9.153e-11 4.488e-05 9.6010e-01 9.6010e-01
#> 2: 1.385e-10 8.975e-05 6.4080e-01 6.4080e-01
#> 3: 4.296e-08 1.000e-04 1.7128e+00 1.7128e+00
#> 4: 5.079e-08 2.000e-04 5.5470e-01 5.5470e-01
#> 5: 5.152e-08 2.000e-04 9.8710e-01 9.8710e-01
#> ---
#> 18011: 9.998e-01 9.997e-01 -5.8720e-05 -5.8720e-05
#> 18012: 9.998e-01 9.998e-01 -5.2710e-05 -5.2710e-05
#> 18013: 9.998e-01 1.000e+00 -3.6000e-05 -3.6000e-05
#> 18014: 9.998e-01 9.999e-01 2.9150e-05 2.9150e-05
#> 18015: 9.999e-01 1.000e+00 6.6370e-06 6.6370e-06
#> contrast_1_tstat contrast_1_pvalue
#> <num> <num>
#> 1: 13.37250000 4.219e-15
#> 2: 12.86830000 1.243e-14
#> 3: 10.27070000 5.784e-12
#> 4: 10.09150000 9.116e-12
#> 5: 9.99860000 1.156e-11
#> ---
#> 18011: -0.00060000 9.995e-01
#> 18012: -0.00060000 9.995e-01
#> 18013: -0.00030000 9.998e-01
#> 18014: 0.00040000 9.997e-01
#> 18015: 0.00007557 9.999e-01
#>
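Once a result set table is in hand, a common next step is filtering it. The sketch below is a hedged illustration: it builds a small stand-in table from a few of the rows printed above (as a plain data.frame rather than a data.table, so that it runs without extra packages), and the two cutoffs are arbitrary choices rather than recommendations.

```r
# Stand-in for one element of the returned list, copied from rows printed above.
res <- data.frame(
  Probe = c("210279_at", "204529_s_at", "202746_at", "218682_s_at", "201060_x_at"),
  GeneSymbol = c("GPR18", "TOX", "ITM2A", "SLC4A1AP", "STOM"),
  corrected_pvalue = c(9.153e-11, 1.385e-10, 4.296e-08, 9.998e-01, 9.999e-01),
  contrast_1_log2fc = c(0.9601, 0.6408, 1.7128, -5.872e-05, 6.637e-06),
  stringsAsFactors = FALSE
)

# Keep elements that pass the corrected p-value cutoff and show at least a
# two-fold change (|log2FC| >= 1). Both thresholds are arbitrary.
hits <- res[res$corrected_pvalue < 0.05 & abs(res$contrast_1_log2fc) >= 1, ]
print(hits$GeneSymbol)
```

With a real call, the same subsetting would be applied to each element of the returned list, for example with lapply(). Note that the contrast column names depend on readableContrasts.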