Retrieves the differential expression result set(s) associated with the dataset. For more information about the contrasts in individual result sets and the annotation terms associated with them, use get_dataset_differential_expression_analyses().

get_differential_expression_values(
  dataset = NA_character_,
  resultSets = NA_integer_,
  keepNonSpecific = FALSE,
  readableContrasts = FALSE,
  memoised = getOption("gemma.memoised", FALSE)
)

Arguments

dataset

A dataset identifier.

resultSets

resultSet identifiers. If no dataset is provided, all of the listed result sets are downloaded. If a dataset is provided, it is only used to check that all listed result sets belong to that dataset.

keepNonSpecific

logical. FALSE by default. If TRUE, results from probesets that are not specific to the gene will also be returned.

readableContrasts

If FALSE (default), the returned columns use internal contrast IDs as names. Details about the contrasts can be accessed using get_dataset_differential_expression_analyses(). If TRUE, the IDs are replaced with human-readable contrast information.

memoised

Whether to save the result to the cache for future calls with the same inputs, and to reuse the cached result if one has already been saved. Setting options(gemma.memoised = TRUE) ensures that the cache is always used. Use forget_gemma_memoised() to clear the cache.
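The caching behaviour described for memoised follows the general memoisation pattern. As a minimal sketch in base R (the helper memoise_sketch and the toy slow_square function are hypothetical, and this is not how gemma.R implements its cache), results can be stored in an environment keyed by the call's arguments:

```r
# Hypothetical helper: wraps a function so that repeated calls with the same
# arguments are served from an in-memory cache (an environment).
# Illustration only; not gemma.R's caching implementation.
memoise_sketch <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(...) {
    # Build a cache key from the deparsed argument values
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)  # first call: compute and store
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Toy "expensive" function that counts how often it actually runs
n_calls <- 0
slow_square <- function(x) {
  n_calls <<- n_calls + 1
  x^2
}

fast_square <- memoise_sketch(slow_square)
fast_square(4)  # computed: 16
fast_square(4)  # served from the cache: 16
n_calls         # 1, because slow_square ran only once
```

Because the key is built from the deparsed arguments, calls that spell the same input differently (for example named versus positional arguments) are cached separately in this sketch.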

Value

A list of data.tables, one per result set, containing the differential expression values.

Details

In Gemma, each result set corresponds to the estimated effects associated with a single factor in the design, and each result set can have multiple contrasts (one for each non-baseline level, compared to the baseline). Thus a dataset with a 2x3 factorial design will have two result sets: one with one contrast and one with two contrasts.
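The counting rule above (one result set per factor, one contrast per non-baseline level) matches R's default treatment coding. The following sketch uses a hypothetical 2x3 design with made-up factor names, main effects only, and the first level of each factor as the baseline; Gemma chooses baselines itself, so this illustrates the arithmetic rather than Gemma's procedure:

```r
# Hypothetical 2x3 factorial design; the first level of each factor is the
# baseline (an assumption for illustration only).
design <- expand.grid(
  genotype  = factor(c("wt", "ko"), levels = c("wt", "ko")),
  treatment = factor(c("control", "drugA", "drugB"),
                     levels = c("control", "drugA", "drugB"))
)

# One result set per factor; each has (number of levels - 1) contrasts
n_contrasts <- sapply(design, nlevels) - 1
n_contrasts  # genotype: 1, treatment: 2

# Treatment (baseline) coding: each non-baseline level gets its own column,
# so each coefficient is a contrast against that factor's baseline level.
mm <- model.matrix(~ genotype + treatment, data = design)
colnames(mm)  # "(Intercept)" "genotypeko" "treatmentdrugA" "treatmentdrugB"
```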

The methodology for differential expression is explained in Curation of over 10000 transcriptomic studies to enable data reuse. Briefly, differential expression analysis is performed on the dataset based on its annotated experimental design, with up to three potentially nested factors. Gemma attempts to assign a baseline condition to each factor automatically; in the absence of a clear control condition, a baseline is selected arbitrarily. A generalized linear model with empirical Bayes shrinkage of t-statistics is fit to the data for each platform element (probe or gene) using an implementation of the limma algorithm. For RNA-seq data, we use weighted regression, applying the voom algorithm to compute weights from the mean–variance relationship of the data. Each condition is then contrasted against the selected baseline.

In some situations, Gemma splits the data into subsets for analysis. A typical case is a ‘batch’ factor that is confounded with another factor; the subsets are then determined by the levels of the confounding factor.
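To make the empirical Bayes step more concrete, here is a simplified base R sketch of a limma-style moderated t-statistic on simulated data. Assumptions: a two-group design, unweighted fits (no voom weights), and fixed prior hyperparameters (d0 = 4 prior degrees of freedom, prior variance set to the median residual variance). limma itself estimates the prior from the data, so this only shows how shrinkage combines each gene's variance with the prior:

```r
set.seed(1)
n_genes <- 1000
group   <- factor(rep(c("baseline", "treated"), each = 4),
                  levels = c("baseline", "treated"))
X <- model.matrix(~ group)  # intercept + treated-vs-baseline column

# Simulated log2 expression (genes x samples); the first 50 genes get a true shift
Y <- matrix(rnorm(n_genes * length(group)), nrow = n_genes)
Y[1:50, group == "treated"] <- Y[1:50, group == "treated"] + 3

# Ordinary least squares for all genes at once: B = Y X (X'X)^-1
XtX_inv <- solve(crossprod(X))
B     <- Y %*% X %*% XtX_inv
resid <- Y - B %*% t(X)
d     <- ncol(Y) - ncol(X)        # residual degrees of freedom per gene
s2    <- rowSums(resid^2) / d     # per-gene residual variance

# ASSUMPTION: fixed prior; limma estimates d0 and s0^2 from the data instead
d0  <- 4
s02 <- median(s2)

# Shrink each gene's variance towards the prior, then form the moderated t
s2_post     <- (d0 * s02 + d * s2) / (d0 + d)
se_unscaled <- sqrt(XtX_inv[2, 2])  # unscaled SE of the treated coefficient
t_mod <- B[, 2] / (sqrt(s2_post) * se_unscaled)
p_mod <- 2 * pt(abs(t_mod), df = d + d0, lower.tail = FALSE)

# Ordinary t-statistic for comparison (no shrinkage, df = d)
t_ord <- B[, 2] / (sqrt(s2) * se_unscaled)
```

The moderated statistic gains d0 extra degrees of freedom and is less sensitive to genes whose variance happens to be estimated as very small, which is the motivation for shrinkage with few samples per group.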

Examples

get_differential_expression_values("GSE2018")
#> $`573187`
#>              Probe NCBIid GeneSymbol
#>             <char> <char>     <char>
#>     1:   210279_at   2841      GPR18
#>     2: 204529_s_at   9760        TOX
#>     3:   202746_at   9452      ITM2A
#>     4:   206761_at  10225       CD96
#>     5:   204352_at   7188      TRAF5
#>    ---                              
#> 18011: 218682_s_at  22950   SLC4A1AP
#> 18012: 202853_s_at   6259        RYK
#> 18013: 203293_s_at   3998      LMAN1
#> 18014:   212708_at 339287       MSL1
#> 18015: 201060_x_at   2040       STOM
#>                                                      GeneName    pvalue
#>                                                        <char>     <num>
#>     1:                          G protein-coupled receptor 18 4.108e-15
#>     2: thymocyte selection associated high mobility group box 1.243e-14
#>     3:                           integral membrane protein 2A 5.784e-12
#>     4:                                          CD96 molecule 9.116e-12
#>     5:                       TNF receptor associated factor 5 1.156e-11
#>    ---                                                                 
#> 18011:       solute carrier family 4 member 1 adaptor protein 9.995e-01
#> 18012:                          receptor like tyrosine kinase 9.995e-01
#> 18013:                              lectin, mannose binding 1 9.998e-01
#> 18014:                                  MSL complex subunit 1 9.997e-01
#> 18015:                                               stomatin 9.999e-01
#>        corrected_pvalue      rank contrast_1_coefficient contrast_1_log2fc
#>                   <num>     <num>                  <num>             <num>
#>     1:        9.153e-11 4.488e-05             9.6010e-01        9.6010e-01
#>     2:        1.385e-10 8.975e-05             6.4080e-01        6.4080e-01
#>     3:        4.296e-08 1.000e-04             1.7128e+00        1.7128e+00
#>     4:        5.079e-08 2.000e-04             5.5470e-01        5.5470e-01
#>     5:        5.152e-08 2.000e-04             9.8710e-01        9.8710e-01
#>    ---                                                                    
#> 18011:        9.998e-01 9.997e-01            -5.8720e-05       -5.8720e-05
#> 18012:        9.998e-01 9.998e-01            -5.2710e-05       -5.2710e-05
#> 18013:        9.998e-01 1.000e+00            -3.6000e-05       -3.6000e-05
#> 18014:        9.998e-01 9.999e-01             2.9150e-05        2.9150e-05
#> 18015:        9.999e-01 1.000e+00             6.6370e-06        6.6370e-06
#>        contrast_1_tstat contrast_1_pvalue
#>                   <num>             <num>
#>     1:      13.37250000         4.219e-15
#>     2:      12.86830000         1.243e-14
#>     3:      10.27070000         5.784e-12
#>     4:      10.09150000         9.116e-12
#>     5:       9.99860000         1.156e-11
#>    ---                                   
#> 18011:      -0.00060000         9.995e-01
#> 18012:      -0.00060000         9.995e-01
#> 18013:      -0.00030000         9.998e-01
#> 18014:       0.00040000         9.997e-01
#> 18015:       0.00007557         9.999e-01
#>
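The returned list can be filtered with ordinary subsetting. This sketch builds a stand-in for the return value from five rows of the example output above, using a base data.frame so it runs without data.table (the same bracket expression also works on a data.table). The thresholds are arbitrary choices, and the column name contrast_1_log2fc depends on the contrast IDs of the result set:

```r
# Stand-in for the list returned by get_differential_expression_values():
# one table per result set, named by result set ID. Rows copied from the
# example output above; a real result set has many more rows.
res <- list(`573187` = data.frame(
  Probe = c("210279_at", "204529_s_at", "202746_at", "206761_at", "204352_at"),
  GeneSymbol = c("GPR18", "TOX", "ITM2A", "CD96", "TRAF5"),
  corrected_pvalue = c(9.153e-11, 1.385e-10, 4.296e-08, 5.079e-08, 5.152e-08),
  contrast_1_log2fc = c(0.9601, 0.6408, 1.7128, 0.5547, 0.9871),
  stringsAsFactors = FALSE
))

# Keep rows with corrected p-value < 0.05 and |log2 fold change| > 1
hits <- lapply(res, function(tbl) {
  tbl[tbl$corrected_pvalue < 0.05 & abs(tbl$contrast_1_log2fc) > 1, ]
})
hits$`573187`$GeneSymbol  # "ITM2A"
```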