Parse a Megalodon per-read methylation file into a tidy per-site data frame
Source:R/parse_megalodon.R
dot-parseMegalodon.RdReads a Megalodon per-read modification output file, aggregates per-read
calls to per-site beta values and coverage, and returns a tidy data frame
compatible with the commaData constructor. This is an internal
function called when caller = "megalodon".
Arguments
- file
Character string. Path to the Megalodon per-read BED file.
- sample_name
Character string. Sample name (used in messages).
- mod_type
Character string or
NULL. Modification type to assign to all sites (e.g.,"6mA"). Megalodon files are modification- type-specific, so the type cannot be auto-detected from the file alone. IfNULL, defaults to"6mA"with a warning.- min_coverage
Integer. Minimum read depth. Default
5.
Details
Megalodon per-read output (modified_bases.5mC.bed or similar) has
the format:
chrom start end read_id score strand ... mod_prob
where mod_prob is the per-read probability of modification. This
function aggregates across reads at each site by computing:
beta= mean of per-read probabilities at each sitecoverage= number of reads overlapping each site
Sites with coverage < min_coverage are dropped.