This function calls DESeq2::DESeq
and
DESeq2::results
on a pre-existing
DESeqDataSet
object and returns a DESeqResults
table for one or
more pairwise comparisons. However, unlike a standard call to
DESeq2::results
using the contrast
argument, this function
subsets the dataset so that DESeq2 only estimates dispersion for the samples
being compared, and not for all samples present.
getDESeqResults(
dds,
contrast.numer,
contrast.denom,
comparisons = NULL,
sizeFactors = NULL,
alpha = 0.1,
lfcShrink = FALSE,
args.DESeq = NULL,
args.results = NULL,
args.lfcShrink = NULL,
ncores = getOption("mc.cores", 2L),
quiet = FALSE
)
A DESeqDataSet object, produced using either
getDESeqDataSet
from this package
or DESeqDataSet
from DESeq2
. If
dds
was not created using getDESeqDataSet
, dds
must be
made with design = ~condition
such that a unique condition
level exists for each sample/treatment condition.
A string naming the condition
to use as the
numerator in the DESeq2 comparison, typically the perturbative condition.
A string naming the condition
to use as the
denominator in the DESeq2 comparison, typically the control condition.
As an optional alternative to supplying a single
contrast.numer
and contrast.denom
, users can supply a list of
character vectors containing numerator-denominator pairs, e.g.
list(c("B", "A"), c("C", "A"), c("C", "B"))
. comparisons
can
also be a dataframe in which each row is a comparison, the first column
contains the numerators, and the second column contains the denominators.
A vector containing DESeq2 sizeFactors
to apply to
each sample. Each sample's readcounts are divided by its respective
DESeq2 sizeFactor
. A warning will be generated if the
DESeqDataSet
already contains sizeFactors
, and the previous
sizeFactors
will be over-written.
The significance threshold passed to DESeqResults
, which
is used for independent filtering of results (see DESeq2 documentation).
Logical indicating if log2FoldChanges and their standard
errors should be shrunk using lfcShrink
.
LFC shrinkage is very useful for making fold-change values meaningful, as
low-expression/high variance genes are given low fold-changes.
Set to FALSE
by default.
Additional arguments passed to
DESeq
, given as a list of argument-value pairs,
e.g. list(fitType = "local", useT = TRUE)
. All arguments given here
will be passed to DESeq
except for object
and
parallel
. If no arguments are given, all defaults will be used.
Additional arguments passed to
DESeq2::results, given as a list of argument-value
pairs, e.g. list(altHypothesis = "greater", lfcThreshold = 1.5)
. All
arguments given here will be passed to results
except for
object
, contrast
, alpha
, and parallel
. If no
arguments are given, all defaults will be used.
Additional arguments passed to
lfcShrink
, given as a list of
argument-value pairs. All arguments given here will be passed to
lfcShrink
except for dds
, coef
, contrast
, and
parallel
. If no arguments are given, all defaults will be used.
The number of cores to use for parallel processing. Multicore
processing is only used if more than one comparison is being made (i.e.
argument comparisons
is used), and the number of cores utilized will
not be greater than the number of comparisons being performed.
If TRUE
, all output messages from calls to DESeq
and results
will be suppressed, although passing option quiet
in args.DESeq
will supersede this option for the call to
DESeq
.
For a single comparison, the output is the DESeqResults
result
table. If a comparisons
is used to make multiple comparisons, the
output is a named list of DESeqResults
objects, with elements named
following the pattern "X_vs_Y"
, where X
is the name of the
numerator condition, and Y
is the name of the denominator condition.
ncores > 1
If this function returns an error,
set ncores = 1
. Whether or not this occurs can depend on whether
users are using alternative BLAS libraries (e.g. OpenBLAS or Apple's
Accelerate framework) and/or how DESeq2 was installed. This is because some
DESeq2 functions (e.g.
nbinomWaldTest
) use C code that can be compiled to use parallelization,
and this conflicts with our use of process forking (via the
parallel package
) when
ncores > 1
.
#--------------------------------------------------#
# getDESeqDataSet
#--------------------------------------------------#
suppressPackageStartupMessages(require(DESeq2))
data("PROseq") # import included PROseq data
data("txs_dm6_chr4") # import included transcripts
# divide PROseq data into 6 toy datasets
ps_a_rep1 <- PROseq[seq(1, length(PROseq), 6)]
ps_b_rep1 <- PROseq[seq(2, length(PROseq), 6)]
ps_c_rep1 <- PROseq[seq(3, length(PROseq), 6)]
ps_a_rep2 <- PROseq[seq(4, length(PROseq), 6)]
ps_b_rep2 <- PROseq[seq(5, length(PROseq), 6)]
ps_c_rep2 <- PROseq[seq(6, length(PROseq), 6)]
ps_list <- list(A_rep1 = ps_a_rep1, A_rep2 = ps_a_rep2,
B_rep1 = ps_b_rep1, B_rep2 = ps_b_rep2,
C_rep1 = ps_c_rep1, C_rep2 = ps_c_rep2)
# make flawed dataset (ranges in txs_dm6_chr4 not disjoint)
# this means there is double-counting
# also using discontinuous gene regions, as gene_ids are repeated
dds <- getDESeqDataSet(ps_list,
txs_dm6_chr4,
gene_names = txs_dm6_chr4$gene_id,
ncores = 1)
dds
#> class: DESeqDataSet
#> dim: 111 6
#> metadata(1): version
#> assays(1): counts
#> rownames(111): FBgn0267363 FBgn0266617 ... FBgn0039924 FBgn0027101
#> rowData names(2): tx_name gene_id
#> colnames(6): A_rep1 A_rep2 ... C_rep1 C_rep2
#> colData names(2): condition replicate
#--------------------------------------------------#
# getDESeqResults
#--------------------------------------------------#
res <- getDESeqResults(dds, "B", "A")
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
res
#> log2 fold change (MLE): condition B vs A
#> Wald test p-value: condition B vs A
#> DataFrame with 111 rows and 6 columns
#> baseMean log2FoldChange lfcSE stat pvalue
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> FBgn0267363 0.254249 -1.4228145 4.997084 -0.28472898 0.775852
#> FBgn0266617 10.800541 0.1563201 1.033936 0.15118944 0.879826
#> FBgn0265633 2.057706 0.0175256 2.125748 0.00824443 0.993422
#> FBgn0264617 66.127353 0.2087670 0.467572 0.44649198 0.655242
#> FBgn0085432 4678.846158 -0.2333084 0.236723 -0.98557490 0.324342
#> ... ... ... ... ... ...
#> FBgn0039928 685.650 0.126857 0.258212 0.491292 0.623220
#> FBgn0052017 130.983 0.239103 0.418935 0.570741 0.568175
#> FBgn0002521 174.375 -0.110680 0.342843 -0.322829 0.746825
#> FBgn0039924 161.131 0.322486 0.456555 0.706346 0.479973
#> FBgn0027101 361.983 0.233889 0.294518 0.794142 0.427113
#> padj
#> <numeric>
#> FBgn0267363 0.997413
#> FBgn0266617 0.997413
#> FBgn0265633 0.997413
#> FBgn0264617 0.997413
#> FBgn0085432 0.997413
#> ... ...
#> FBgn0039928 0.997413
#> FBgn0052017 0.997413
#> FBgn0002521 0.997413
#> FBgn0039924 0.997413
#> FBgn0027101 0.997413
reslist <- getDESeqResults(dds, comparisons = list(c("B", "A"), c("C", "A")),
ncores = 1)
names(reslist)
#> [1] "B_vs_A" "C_vs_A"
reslist$B_vs_A
#> log2 fold change (MLE): condition B vs A
#> Wald test p-value: condition B vs A
#> DataFrame with 111 rows and 6 columns
#> baseMean log2FoldChange lfcSE stat pvalue
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> FBgn0267363 0.254249 -1.4228145 4.997084 -0.28472898 0.775852
#> FBgn0266617 10.800541 0.1563201 1.033936 0.15118944 0.879826
#> FBgn0265633 2.057706 0.0175256 2.125748 0.00824443 0.993422
#> FBgn0264617 66.127353 0.2087670 0.467572 0.44649198 0.655242
#> FBgn0085432 4678.846158 -0.2333084 0.236723 -0.98557490 0.324342
#> ... ... ... ... ... ...
#> FBgn0039928 685.650 0.126857 0.258212 0.491292 0.623220
#> FBgn0052017 130.983 0.239103 0.418935 0.570741 0.568175
#> FBgn0002521 174.375 -0.110680 0.342843 -0.322829 0.746825
#> FBgn0039924 161.131 0.322486 0.456555 0.706346 0.479973
#> FBgn0027101 361.983 0.233889 0.294518 0.794142 0.427113
#> padj
#> <numeric>
#> FBgn0267363 0.997413
#> FBgn0266617 0.997413
#> FBgn0265633 0.997413
#> FBgn0264617 0.997413
#> FBgn0085432 0.997413
#> ... ...
#> FBgn0039928 0.997413
#> FBgn0052017 0.997413
#> FBgn0002521 0.997413
#> FBgn0039924 0.997413
#> FBgn0027101 0.997413
# or using a dataframe
reslist <- getDESeqResults(dds, comparisons = data.frame(num = c("B", "C"),
den = c("A", "A")),
ncores = 1)
reslist$B_vs_A
#> log2 fold change (MLE): condition B vs A
#> Wald test p-value: condition B vs A
#> DataFrame with 111 rows and 6 columns
#> baseMean log2FoldChange lfcSE stat pvalue
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> FBgn0267363 0.254249 -1.4228145 4.997084 -0.28472898 0.775852
#> FBgn0266617 10.800541 0.1563201 1.033936 0.15118944 0.879826
#> FBgn0265633 2.057706 0.0175256 2.125748 0.00824443 0.993422
#> FBgn0264617 66.127353 0.2087670 0.467572 0.44649198 0.655242
#> FBgn0085432 4678.846158 -0.2333084 0.236723 -0.98557490 0.324342
#> ... ... ... ... ... ...
#> FBgn0039928 685.650 0.126857 0.258212 0.491292 0.623220
#> FBgn0052017 130.983 0.239103 0.418935 0.570741 0.568175
#> FBgn0002521 174.375 -0.110680 0.342843 -0.322829 0.746825
#> FBgn0039924 161.131 0.322486 0.456555 0.706346 0.479973
#> FBgn0027101 361.983 0.233889 0.294518 0.794142 0.427113
#> padj
#> <numeric>
#> FBgn0267363 0.997413
#> FBgn0266617 0.997413
#> FBgn0265633 0.997413
#> FBgn0264617 0.997413
#> FBgn0085432 0.997413
#> ... ...
#> FBgn0039928 0.997413
#> FBgn0052017 0.997413
#> FBgn0002521 0.997413
#> FBgn0039924 0.997413
#> FBgn0027101 0.997413