Merge GRanges objects

Merges 2 or more GRanges objects by combining all of their ranges and associated signal (e.g. readcounts). If multiplex = TRUE, the input datasets are reversibly combined into a multiplexed GRanges containing a field for each input dataset.

mergeGRangesData(
  ...,
  field = "score",
  multiplex = FALSE,
  makeBRG = TRUE,
  exact_overlaps = FALSE,
  ncores = getOption("mc.cores", 2L)
)

Arguments

...: Any number of GRanges objects in which signal (e.g. readcounts) are contained within metadata. Lists of GRanges can also be passed, but they must be named lists if multiplex = TRUE. Multiple lists can be passed, but if any inputs are lists, then all inputs must be lists.
field: One or more input metadata fields to be combined, typically the "score" field. Fields typically contain coverage information. If only a single field is given (i.e. all input GRanges use the same field), that same field will be used for the output. Otherwise, the "score" metadata field will be used by default. The output metadata fields are different if multiplex is enabled.
multiplex: When set to FALSE (the default), input GRanges are merged irreversibly into a single new GRange, effectively combining the reads from different experiments. When multiplex = TRUE, the input GRanges data are reversibly combined into a multiplexed GRanges object, such that each input GRanges object has its own metadata field in the output.
makeBRG: If TRUE (the default), the output GRanges will be made "basepair-resolution" via makeGRangesBRG. This is always set to codeFALSE if exact_overlaps = TRUE.
exact_overlaps: By default (FALSE), any ranges that cover more than a single base are treated as "coverage" signal (see getCountsByRegions). If exact_overlaps = TRUE, all input ranges are conserved exactly as they are; ranges are only combined if they overlap exactly, and the signal of any combined range is the sum of the input ranges that were merged.
ncores: Number of cores to use for computations.

Value

A disjoint, basepair-resolution (single-width) GRanges object comprised of all ranges found in the input GRanges objects.

If multiplex = FALSE, single fields from each input are combined into a single field in the output, the total signal of which is the sum of all input GRanges.

If multiplex = TRUE, each field of the output corresponds to an input GRanges object.

Subsetting a multiplexed GRanges object

If multiplex = TRUE, the datasets are only combined into a single object, but the data themselves are not combined. To subset field_i, corresponding to input dataset_i:

multi.gr <- mergeGRangesData(gr1, gr2, multiplex = TRUE) subset(multi.gr, gr1 != 0, select = gr1) # select gr1

Author

Mike DeBerardine

Examples

data("PROseq") # load included PROseq data

#--------------------------------------------------#
# divide & recombine PROseq (no overlapping positions)
#--------------------------------------------------#

thirds <- floor( (1:3)/3 * length(PROseq) )
ps_1 <- PROseq[1:thirds[1]]
ps_2 <- PROseq[(thirds[1]+1):thirds[2]]
ps_3 <- PROseq[(thirds[2]+1):thirds[3]]

# re-merge
length(PROseq)
#> [1] 47380
length(ps_1)
#> [1] 15793
length(mergeGRangesData(ps_1, ps_2, ncores = 1))
#> [1] 31586
length(mergeGRangesData(ps_1, ps_2, ps_3, ncores = 1))
#> [1] 47380

#--------------------------------------------------#
# combine PRO-seq with overlapping positions
#--------------------------------------------------#

gr1 <- PROseq[10:13]
gr2 <- PROseq[12:15]

PROseq[10:15]
#> GRanges object with 6 ranges and 1 metadata column:
#>       seqnames    ranges strand |     score
#>          <Rle> <IRanges>  <Rle> | <integer>
#>   [1]     chr4     42618      + |         1
#>   [2]     chr4     42619      + |         2
#>   [3]     chr4     42621      + |         1
#>   [4]     chr4     42622      + |         2
#>   [5]     chr4     42652      + |         3
#>   [6]     chr4     42657      + |         1
#>   -------
#>   seqinfo: 7 sequences from an unspecified genome

mergeGRangesData(gr1, gr2, ncores = 1)
#> GRanges object with 6 ranges and 1 metadata column:
#>       seqnames    ranges strand |     score
#>          <Rle> <IRanges>  <Rle> | <integer>
#>   [1]     chr4     42618      + |         1
#>   [2]     chr4     42619      + |         2
#>   [3]     chr4     42621      + |         2
#>   [4]     chr4     42622      + |         4
#>   [5]     chr4     42652      + |         3
#>   [6]     chr4     42657      + |         1
#>   -------
#>   seqinfo: 7 sequences from an unspecified genome

#--------------------------------------------------#
# multiplex separate PRO-seq experiments
#--------------------------------------------------#

multi.gr <- mergeGRangesData(gr1, gr2, multiplex = TRUE, ncores = 1)
multi.gr
#> GRanges object with 6 ranges and 2 metadata columns:
#>       seqnames    ranges strand |       gr1       gr2
#>          <Rle> <IRanges>  <Rle> | <integer> <integer>
#>   [1]     chr4     42618      + |         1         0
#>   [2]     chr4     42619      + |         2         0
#>   [3]     chr4     42621      + |         1         1
#>   [4]     chr4     42622      + |         2         2
#>   [5]     chr4     42652      + |         0         3
#>   [6]     chr4     42657      + |         0         1
#>   -------
#>   seqinfo: 7 sequences from an unspecified genome

#--------------------------------------------------#
# subset a multiplexed GRanges object
#--------------------------------------------------#

subset(multi.gr, gr1 > 0)
#> GRanges object with 4 ranges and 2 metadata columns:
#>       seqnames    ranges strand |       gr1       gr2
#>          <Rle> <IRanges>  <Rle> | <integer> <integer>
#>   [1]     chr4     42618      + |         1         0
#>   [2]     chr4     42619      + |         2         0
#>   [3]     chr4     42621      + |         1         1
#>   [4]     chr4     42622      + |         2         2
#>   -------
#>   seqinfo: 7 sequences from an unspecified genome

subset(multi.gr, gr1 > 0, select = gr1)
#> GRanges object with 4 ranges and 1 metadata column:
#>       seqnames    ranges strand |       gr1
#>          <Rle> <IRanges>  <Rle> | <integer>
#>   [1]     chr4     42618      + |         1
#>   [2]     chr4     42619      + |         2
#>   [3]     chr4     42621      + |         1
#>   [4]     chr4     42622      + |         2
#>   -------
#>   seqinfo: 7 sequences from an unspecified genome

Arguments

Value

Subsetting a multiplexed GRanges object

See also

Author

Examples