R/signal_counting.R
getPausingIndices.Rd
Pausing index (PI) is calculated for each gene (within matched
promoters.gr
and genebodies.gr
) as promoter-proximal (or pause
region) signal counts divided by genebody signal counts. If
length.normalize = TRUE
(recommended), the signal counts within each
range in promoters.gr
and genebodies.gr
are divided by their
respective range widths (region lengths) before pausing indices are
calculated.
getPausingIndices(
dataset.gr,
promoters.gr,
genebodies.gr,
field = "score",
length.normalize = TRUE,
remove.empty = FALSE,
blacklist = NULL,
melt = FALSE,
region_names = NULL,
expand_ranges = FALSE,
ncores = getOption("mc.cores", 2L)
)
A GRanges object in which signal is contained in metadata (typically in the "score" field), or a named list of such GRanges objects.
A GRanges object containing promoter-proximal regions of interest.
A GRanges object containing genebody regions of interest.
The metadata field of dataset.gr
to be counted. If
length(field) > 1
, a dataframe is returned containing the pausing
indices for each region in each field. If field
not found in
names(mcols(dataset.gr))
, will default to using all fields found in
dataset.gr
. If dataset.gr
is a list, a single field
should be given, or length(field)
should be the equal to the number
of datasets in dataset.gr
.
A logical indicating if signal counts within regions
of interest should be length normalized. The default is TRUE
, which
is recommended, especially if input regions don't all have the same width.
A logical indicating if genes without any signal in
promoters.gr
should be removed. No genes are filtered by default. If
dataset.gr
is a list of datasets, or if length(field) > 1
,
regions are filtered unless they have promoter signal in all datasets.
An optional GRanges object containing regions that should be
excluded from signal counting. If length.normalize = TRUE
,
blacklisted positions will be excluded from length calculations. Users
should take care to note if regions of interest substantially overlap
blacklisted positions.
If melt = TRUE
, a dataframe is returned containing a
column for regions and another column for pausing indices. If multiple
datasets are given (if dataset.gr
is a list or if
length(field) > 1
), the output dataframe is melted to contain a
third column indicating the sample names. (See section on return values
below).
If melt = TRUE
, an optional vector of names for
the regions in regions.gr
. If left as NULL
, indices of
regions.gr
are used instead.
Logical indicating if ranges in dataset.gr
should
be treated as descriptions of single molecules (FALSE
), or if ranges
should be treated as representing multiple adjacent positions with the same
signal (TRUE
). See
getCountsByRegions
.
Multiple cores will only be used if dataset.gr
is a list
of multiple datasets, or if length(field) > 1
.
A vector parallel to the input genelist, unless remove.empty =
TRUE
, in which case the vector may be shorter. If dataset.gr
is a
list, or if length(field) > 1
, a dataframe is returned, containing a
column for each field. However, if melt = TRUE
, dataframes contain
one column to indicate regions (either by their indices, or by
region_names
, if given), another column to indicate signal, and a
third column containing the sample name (unless dataset.gr
is a
single GRanges object).
data("PROseq") # load included PROseq data
data("txs_dm6_chr4") # load included transcripts
#--------------------------------------------------#
# Get promoter-proximal and genebody regions
#--------------------------------------------------#
# genebodies from +300 to 300 bp before the poly-A site
gb <- genebodies(txs_dm6_chr4, 300, -300, min.window = 400)
# get the transcripts that are large enough (>1kb in size)
txs <- subset(txs_dm6_chr4, tx_name %in% gb$tx_name)
# for the same transcripts, promoter-proximal region from 0 to +100
pr <- promoters(txs, 0, 100)
#--------------------------------------------------#
# Calculate pausing indices
#--------------------------------------------------#
pidx <- getPausingIndices(PROseq, pr, gb)
length(txs)
#> [1] 318
length(pidx)
#> [1] 318
head(pidx)
#> [1] 0.00000 3.50500 0.00000 0.00000 29.34224 28.41707
#--------------------------------------------------#
# Without length normalization
#--------------------------------------------------#
head( getPausingIndices(PROseq, pr, gb, length.normalize = FALSE) )
#> [1] 0.00000000 0.50000000 0.00000000 0.00000000 0.09316770 0.07329843
#--------------------------------------------------#
# Removing empty means the values no longer match the genelist
#--------------------------------------------------#
pidx_signal <- getPausingIndices(PROseq, pr, gb, remove.empty = TRUE)
length(pidx_signal)
#> [1] 238