Parse a Dorado BAM file with MM/ML base modification tags
Source:R/parse_dorado.R
dot-parseDorado.RdReads a Dorado-aligned BAM file containing MM (base modification) and
ML (modification likelihood) tags, and aggregates per-read modification
probabilities into per-site beta values. This is an internal function called
by commaData when caller = "dorado".
Arguments
- file
Character string. Path to the Dorado-aligned BAM file. An accompanying
.baiindex file must exist (required byRsamtools).- sample_name
Character string. Sample name (used in messages only).
- mod_type
Character vector or
NULL. If provided, only sites with a matching modification type are returned.NULLretains all types.- min_coverage
Integer. Minimum read depth required to retain a site. Default
5L.
Value
A data.frame with columns:
chromChromosome name (character).
position1-based genomic position (integer).
strandStrand,
"+"or"-"(character).mod_typeModification type:
"6mA","5mC", or"4mC"(character).betaProportion of reads called methylated, range 0–1 (numeric).
coverageTotal read depth at this site (integer).
Details
The function reads the BAM using scanBam and
processes the MM/ML tags from each aligned read. The MM tag encodes which
bases carry modifications and their positions within the read sequence
(as inter-base offsets); the parallel ML tag provides modification
probabilities (0–255 scaled to 0–1).
A base is called modified when its ML probability is \(> 0.5\). Per-site statistics are aggregated by counting modified reads (\(n_{\text{mod}}\)) and total reads at each genomic position (\(n_{\text{total}}\)). The beta value is \(\beta = n_{\text{mod}} / n_{\text{total}}\).
CIGAR operations are used to map read positions to reference coordinates.
Soft-clipped (S) and hard-clipped (H) bases are excluded;
insertions (I) consume read positions without advancing the reference.
Recommended workflow: For most users, it is simpler to first run
modkit pileup on the Dorado BAM, then load the resulting BED file
with caller = "modkit". Direct BAM parsing is provided for users who
prefer to avoid the modkit step.