gsr.Rd
The goal of this method is the same as for ora: to provide a p value for each
gene set. The key difference is that ORA requires you to select a threshold
for “gene selection”, whereas GSR does not.
GSR uses all the gene scores for the genes in a gene set to produce a score. This means that genes that do not meet a statistical threshold for selection can still contribute to the score. In addition, more of the information contained in the gene scores is preserved than in ORA, because ORA is essentially rank-based, whereas GSR uses the gene scores themselves. (The precision-recall method is even closer to the ORA method.)
In practice, ORA and GSR can yield similar results; however, we have found that GSR tends to be more robust than ORA (because there is no threshold to set) and can give interesting results in situations where ORA doesn’t work as well (when no genes meet the predetermined selection threshold).
GSR is appropriate when you have reasonably good trust in the values of your gene scores, as opposed to the ranking. Its key advantage over ORA is that you do not have to set a threshold gene score. If you trust the ranking of gene scores more, the precision-recall or ROC methods might be useful.
Method overview taken from: http://erminej.msl.ubc.ca/help/tutorials/running-an-analysis-resampling/
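The resampling idea described above can be sketched in base R. This is a simplified illustration, not ermineJ's actual implementation: the gene names and scores are simulated, the helper resample_pvalue is hypothetical, and summarising a gene set by its mean score is an assumption made for the example.

```r
# Simplified illustration of gene score resampling (not ermineJ's code).
set.seed(1)

# Simulated gene scores for 1000 genes; larger means more interesting
# (for example, -log10 transformed p values).
gene_scores <- setNames(rexp(1000), paste0("gene", 1:1000))

# Hypothetical gene set of 25 genes whose scores are shifted upward.
set_genes <- paste0("gene", 1:25)
gene_scores[set_genes] <- gene_scores[set_genes] + 1

# Hypothetical helper: compare a gene set's mean score with the mean
# scores of random gene sets of the same size.
resample_pvalue <- function(scores, members, iterations = 10000) {
  observed <- mean(scores[members])
  null_means <- replicate(
    iterations,
    mean(sample(scores, length(members)))
  )
  # Add-one correction so the estimate is never exactly zero.
  (sum(null_means >= observed) + 1) / (iterations + 1)
}

p_enriched <- resample_pvalue(gene_scores, set_genes, iterations = 2000)
p_random <- resample_pvalue(gene_scores, paste0("gene", 501:525),
                            iterations = 2000)
```

No score threshold appears anywhere in the sketch: every gene in the set contributes to the set's score, which is the property that distinguishes GSR from ORA.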
gsr(
scores,
scoreColumn = 1,
bigIsBetter = FALSE,
logTrans = FALSE,
annotation = NULL,
aspects = c("Molecular Function", "Cellular Component", "Biological Process"),
iterations = 10000,
geneReplicates = c("mean", "best"),
pAdjust = c("FDR", "Bonferroni"),
geneSetDescription = "Latest_GO",
customGeneSets = NULL,
minClassSize = 20,
maxClassSize = 200,
output = NULL,
return = TRUE
)
A data.frame. Row names must be gene identifiers (e.g. probes) and must be
unique; any number of columns may follow. The column used for scoring is
chosen by scoreColumn. See
http://erminej.msl.ubc.ca/help/input-files/gene-scores/ for
information about how to specify scores. (for test = ORA, GSR and ROC)
Integer or character. Which column of the scores data.frame to use as scores.
Defaults to the first column of scores. See
http://erminej.msl.ubc.ca/help/input-files/gene-scores/ for details.
(for test = ORA, GSR and ROC)
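As a minimal sketch of the input format described above, the following builds a small scores data.frame in base R. The probe identifiers, column names and values are invented for illustration.

```r
# Hypothetical differential expression results; all values are made up.
scores <- data.frame(
  pvalue = c(0.001, 0.20, 0.03, 0.75),
  logFC = c(2.1, -0.3, 1.2, 0.1),
  row.names = c("probe_1", "probe_2", "probe_3", "probe_4")
)

# Row names serve as the gene identifiers, so they must be unique.
stopifnot(!anyDuplicated(rownames(scores)))

# A score column can be referred to by position or by name,
# mirroring the integer or character forms of scoreColumn.
by_position <- scores[[1]]
by_name <- scores[["pvalue"]]
same_column <- identical(by_position, by_name)
```

Here scoreColumn = 1 and scoreColumn = "pvalue" would refer to the same column.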
Logical. If TRUE, larger scores are considered better. FALSE by default
(as in p values).
Logical. Should the data be -log10 transformed? Recommended for p values.
FALSE by default.
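To see why the transformation is recommended for p values, here is a small base R sketch with invented gene names and p values. It only demonstrates the arithmetic of -log10 and does not show how gsr applies it internally.

```r
# Invented p values for three hypothetical genes.
p_values <- c(geneA = 1e-5, geneB = 0.01, geneC = 0.5)

# After the transform, small p values become large scores, and
# differences between very small p values are spread out.
transformed <- -log10(p_values)

# The most significant gene has the smallest p value and the
# largest transformed score.
best_by_p <- names(which.min(p_values))
best_by_score <- names(which.max(transformed))
```

On the raw scale geneA and geneB differ by less than 0.01, while on the transformed scale they differ by three units, so the transformed scores keep more of the distinction between strongly significant genes.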
Annotation. A file path, a data.frame or a platform short name (e.g. GPL127).
If given a platform short name, the annotation will be downloaded from the
annotation repository of the Pavlidis Lab (https://gemma.msl.ubc.ca/annots/).
To get a list of available annotations, use listGemmaAnnotations.
Note that if there is a file or folder with the same name as the platform
name in the directory, that file will be read instead of getting a copy from
the Pavlidis Lab. If this file isn't a valid annotation file, the function will fail.
If providing a custom annotation file, see makeAnnotation to create it from
R, or http://erminej.msl.ubc.ca/help/input-files/gene-annotations/ to create it manually.
If you are providing a custom gene set, you can leave annotation as NULL.
Character vector. Which GO aspects to include in the analysis.
Can be in long form (e.g. 'Molecular Function') or short form (e.g. c('M','C','B')).
Number of iterations. We suggest a starting value of 10000 iterations. Once you have settled on the other parameters, we recommend a larger number of iterations (perhaps 200,000 or more). This gives sufficient precision in the p values for multiple-test correction to work correctly. (for test = GSR, CORR and precRecall methods only)
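The link between iteration count and multiple-test correction can be made concrete with a little arithmetic. The sketch below assumes a plain empirical p value, whose smallest non-zero value is roughly 1 / iterations; ermineJ may use approximations that behave differently. The number of gene sets and the significance level are hypothetical.

```r
# The two iteration counts mentioned in the text above.
iterations <- c(10000, 200000)

# Assumed resolution of a plain empirical p value.
resolution <- 1 / iterations

# Hypothetical analysis: 5000 gene sets tested at alpha = 0.05.
n_gene_sets <- 5000
alpha <- 0.05

# Per-test threshold a p value must reach under a Bonferroni correction.
bonferroni_threshold <- alpha / n_gene_sets

# Can a p value small enough to survive correction be observed at all?
can_reach <- resolution <= bonferroni_threshold
```

Under these assumptions, 10000 iterations cannot resolve p values below 1e-4, which is above the corrected threshold of 1e-5, whereas 200000 iterations can.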
What to do when genes have multiple scores in the input file (due to multiple probes per gene). Either "mean" or "best".
Which multiple test correction method to use. Can be "FDR" or 'Westfall-Young' (slower).
"Latest_GO", a file path that leads to a GO XML or OBO file, or a URL that leads to a GO ontology file that ends with rdf-xml.gz.
If you left annotation as NULL and provided customGeneSets, this argument is
not required and will default to NULL. Otherwise, by default it will be set to
"Latest_GO", which downloads the latest available GO XML file. This option won't work
without an internet connection. To get a frozen file that you can use later,
see goToday, goAtDate and getGoDates.
See http://erminej.msl.ubc.ca/help/input-files/gene-set-descriptions/
for details.
Path to a directory that contains custom gene set files, paths to the custom gene set files themselves, or a named list of character strings. Use this option to create your own gene sets. If you provide a directory, you can specify probes or gene symbols to include in your gene sets. See http://erminej.msl.ubc.ca/help/input-files/gene-sets/ for information about the format of these files. If you are providing a list, only gene symbols are accepted.
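A minimal sketch of the list form of customGeneSets, assuming that a named list of character vectors of gene symbols is what is meant by a named list of character strings. The gene symbols, set names and scores are invented, and minClassSize is lowered only because the example sets are tiny. The call itself is guarded by a hypothetical run_analysis flag because it needs the ermineR package to be installed.

```r
# Hypothetical custom gene sets, keyed by set name.
gene_sets <- list(
  set_one = c("GENE1", "GENE2", "GENE3"),
  set_two = c("GENE3", "GENE4", "GENE5")
)

# Invented p values, with gene symbols as row names.
scores <- data.frame(
  pvalue = c(0.001, 0.02, 0.3, 0.8, 0.04),
  row.names = paste0("GENE", 1:5)
)

# Set to TRUE to run the analysis; requires ermineR to be installed.
run_analysis <- FALSE
if (run_analysis && requireNamespace("ermineR", quietly = TRUE)) {
  result <- ermineR::gsr(
    scores = scores,
    scoreColumn = "pvalue",
    logTrans = TRUE,          # recommended for p values
    annotation = NULL,        # may stay NULL with custom gene sets
    customGeneSets = gene_sets,
    minClassSize = 2,         # default of 20 would drop these small sets
    maxClassSize = 200
  )
}
```

With real data the gene sets would normally be large enough to keep the default class size limits.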
Minimum class size.
Maximum class size.
Output file name.
Logical. Whether results should be returned. Set to FALSE if you only want the output file.
A list containing a "results" component and a "details" component. "results" is a data.frame containing the main output. The columns of this table are:
Name: the name of the gene set.
ID: the id of the gene set.
NumProbes: the number of elements (e.g. probes) in the gene set.
NumGenes: the number of genes in the gene set.
RawScore: the raw statistic for the gene set. For explanations see this page.
Pval: the p value for the gene set.
CorrectedPvalue: the corrected p value. See this page for more information.
MFPvalue: the p value after multifunctionality correction. Might be missing if correction was not performed.
CorrectedMFPvalue: like CorrectedPvalue, but for the multifunctionality “corrected” p value.
Multifunctionality: how biased the genes in the set are towards multifunctional genes.
Same as: a list of gene sets which have the exact same members as this one. Such gene sets are not listed anywhere else.
GeneMembers: if you selected the “Include genes” option when saving, this will contain a list of the genes that are in the gene set, separated by “|”.
The "details" component contains the settings that were used to run the analysis.
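As a sketch of how the returned list might be used, the following works on a mock object that mimics the documented structure. Only a subset of the documented columns is included, all values are invented, and the real "results" table may differ in column types.

```r
# Mock of the documented return value; names and values are made up.
result <- list(
  results = data.frame(
    Name = c("set A", "set B", "set C"),
    ID = c("set_1", "set_2", "set_3"),
    NumGenes = c(25, 40, 120),
    Pval = c(0.0004, 0.2, 0.01),
    CorrectedPvalue = c(0.0012, 0.2, 0.015),
    stringsAsFactors = FALSE
  ),
  details = list()
)

# Order gene sets from most to least significant.
res <- result$results
res <- res[order(res$Pval), ]

# Keep gene sets that pass a corrected p value cutoff of 0.05.
significant <- res[res$CorrectedPvalue < 0.05,
                   c("Name", "ID", "CorrectedPvalue")]
```

The same subsetting applies to the MFPvalue and CorrectedMFPvalue columns when multifunctionality correction was performed.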