R/dataset_functions.R
makeGRangesBRG.Rd
makeGRangesBRG splits up all ranges in dataset.gr to be each 1 basepair
wide. For any range that is split up, all metadata information belonging to
that range is inherited by its daughter ranges, and therefore the
transformation is non-destructive. isBRG checks whether an object is a
basepair resolution GRanges object.
makeGRangesBRG(dataset.gr, ncores = getOption("mc.cores", 2L))
isBRG(x)
dataset.gr: A disjoint GRanges object, or a list of such objects.
ncores: If dataset.gr is a list, the number of cores to use for computations.
x: Object to be tested.
makeGRangesBRG returns a GRanges object for which
length(output) == sum(width(dataset.gr)), and for which
all(width(output) == 1).
isBRG(x) returns TRUE if x is a GRanges object with the above
characteristics.
Note that makeGRangesBRG doesn't perform any transformation
on the metadata in the input. This function assumes that for an input
GRanges object, any metadata for each range is equally correct when
inherited by each individual base in that range. In other words, the
dataset's "signal" (usually readcounts) fundamentally belongs to a single
basepair position.
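A small sketch of this inheritance, assuming a hypothetical two-range GRanges object with a made-up score column (the coordinates and scores are illustrative only, not taken from a real dataset):

```r
library(GenomicRanges)
library(BRGenomics)

# Hypothetical input: one 3-bp range and one 1-bp range (illustrative values)
gr <- GRanges(seqnames = "chr1",
              ranges = IRanges(start = c(1, 10), end = c(3, 10)),
              strand = "+",
              score = c(5, 2))

brg <- makeGRangesBRG(gr)

# Every output range is 1 bp wide
isBRG(brg)

# One output range per input base
length(brg) == sum(width(gr))

# Each base inherits its parent range's score unchanged
sum(score(brg)) == sum(score(gr) * width(gr))
```

The score of the 3-bp range is copied to each of its three bases rather than divided among them, which is why the input signal should already be a per-base quantity.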
The motivating case for this function is a bigWig file (e.g. one imported by
rtracklayer), as bigWig files typically use run-length compression on the
data signal (the 'score' column), such that adjacent bases sharing the same
signal are combined into a single range. As basepair-resolution genomic data
is typically sparse, this compression saves little memory, and removing it
greatly simplifies data handling, as each index (each range) of the GRanges
object then corresponds to a single genomic position.
If working with a GRanges object containing whole reads, one can obtain
base-pair resolution information by using the strand-specific function
GenomicRanges::resize to select a single base from each read: set width = 1
and use the fix argument to choose the strand-specific 5' or 3' end. Then,
strand-specific coverage can be calculated using getStrandedCoverage.
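As a minimal sketch of the resize step described above, assuming a hypothetical GRanges object of two 50-bp reads (the object name and coordinates are illustrative):

```r
library(GenomicRanges)

# Hypothetical whole reads: one on each strand (illustrative coordinates)
reads <- GRanges(seqnames = "chr1",
                 ranges = IRanges(start = c(101, 201), width = 50),
                 strand = c("+", "-"))

# resize() is strand-aware: fix = "start" keeps the 5' end of each read,
# fix = "end" keeps the 3' end
ends_5p <- resize(reads, width = 1, fix = "start")
ends_3p <- resize(reads, width = 1, fix = "end")

start(ends_5p)  # 5' positions: 101 (plus strand), 250 (minus strand)
start(ends_3p)  # 3' positions: 150 (plus strand), 201 (minus strand)
```

Because resize accounts for strand, the 5' end of the minus-strand read is its rightmost coordinate. The resulting single-base ranges can then be passed on for strand-specific coverage.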
The GPos class is a more suitable container for data of this type, as the
GPos class is specific to 1-bp-wide ranges. However, in early testing, we
encountered some compatibility limitations with the newer GPos class, and
have not re-tested it since. If you have feedback on switching to this
class, please contact the author. Users can readily coerce a
basepair-resolution GRanges object to a GPos object via
gp <- GPos(gr, score = score(gr)).
if (.Platform$OS.type == "unix") {

    #--------------------------------------------------#
    # Make a bigWig file single width
    #--------------------------------------------------#

    # get local address for an included bigWig file
    bw_file <- system.file("extdata", "PROseq_dm6_chr4_plus.bw",
                           package = "BRGenomics")

    # BRGenomics::import_bigWig automatically applies makeGRangesBRG;
    # therefore will import using rtracklayer
    bw <- rtracklayer::import.bw(bw_file)
    strand(bw) <- "+"

    range(width(bw))
    length(bw)

    # make basepair-resolution (single-width)
    gr <- makeGRangesBRG(bw)

    isBRG(gr)
    range(width(gr))
    length(gr)
    length(gr) == sum(width(bw))
    sum(score(gr)) == sum(score(bw) * width(bw))

    #--------------------------------------------------#
    # Reverse using getStrandedCoverage
    #--------------------------------------------------#

    # -> for more examples, see getStrandedCoverage

    undo <- getStrandedCoverage(gr, ncores = 1)

    isBRG(undo)
    range(width(undo))
    length(undo) == length(bw)
    all(score(undo) == score(bw))

}
#> Warning: longer object length is not a multiple of shorter object length
#> [1] FALSE