API Reference¶
consenrich.core¶
Note
Many parameters do not require tuning in practice but are listed here for completeness.
Notation¶
For interval \(i\) and replicate \(j\):
\(y_{[j,i]}\) is the observed track value
\(\mathbf{x}_{[i]} = (x_{[i,0]}, x_{[i,1]})^\top\) is the latent level/slope state
\(g_{[i]}\) is the shared zero-centered background
\(b_j\) and \(a_j\) are replicate bias and observation-scale terms
\(v_{[j,i]}\) is the plugin observation variance track derived from the given data
\(b(i)\) maps interval \(i\) to a block index
\(q_b\) is the corresponding process-noise scale
\(\lambda_{[j,i]}\) and \(\kappa_{[i]}\) are precision weights
Model¶
Observations
Prior
The latent state vector \(\mathbf{x}_{[i]}\) evolves according to a first-order process with fat-tailed innovations:
Here \(g_{[i]}\) is a shared zero-centered smooth background. The outer loop updates \(g_{[i]}\) while the plugin observation-variance track stays fixed within each inner solve.
- class consenrich.core.inputParams(bamFiles, bamFilesControl, treatmentSources=None, controlSources=None)[source]¶
Parameters related to the input data for Consenrich.
- Parameters:
bamFiles (List[str]) – A list of paths to distinct coordinate-sorted and indexed BAM files.
bamFilesControl (List[str], optional) – A list of paths to distinct coordinate-sorted and indexed control BAM files. e.g., IgG control inputs for ChIP-seq.
treatmentSources (List[inputSource] | None) – Parsed treatment input sources
controlSources (List[inputSource] | None) – Parsed control input sources
- class consenrich.core.genomeParams(genomeName, chromSizesFile, blacklistFile, sparseBedFile, chromosomes, excludeChroms, excludeForNorm)[source]¶
Specify assembly-specific resources, parameters.
- Parameters:
genomeName (str) – If supplied, default resources for the assembly (sizes file, blacklist, and ‘sparse’ regions) in src/consenrich/data are used.
ce10, ce11, dm6, hg19, hg38, mm10, mm39have default resources available.chromSizesFile (str) – A two-column TSV file with chromosome names and sizes (in base pairs).
blacklistFile (str, optional) – A BED file with regions to exclude.
sparseBedFile (str, optional) – A BED file with ‘sparse regions’ that are mutually exclusive with or devoid of the targeted signal. Used to estimate noise levels. See
getMuncTrack().chromosomes (List[str]) – A list of chromosome names to analyze. If None, all chromosomes in chromSizesFile are used.
excludeChroms (List[str]) – A list of chromosome names to exclude from analysis.
excludeForNorm (List[str]) – A list of chromosome names to exclude when summing up the ‘effective genome size’ during normalization. This can be useful to avoid bias from poorly assembled, highly repetitive, and/or sex-specific chromosomes (e.g., chrM, chrUn, etc.). For reference, see effective genome size.
- class consenrich.core.countingParams(intervalSizeBP, backgroundBlockSizeBP, scaleFactors, scaleFactorsControl, normMethod, fragmentsGroupNorm, fixControl, globalWeight, logOffset, logMult)[source]¶
Parameters related to counting aligned reads
- Parameters:
intervalSizeBP (int) – Length (bp) of each genomic interval \(i=1\ldots n\) that comprise the larger genomic region (contig, chromosome, etc.) The default value is generally robust, but users may consider increasing this value when expected feature size is large and/or sequencing depth is low (less than \(\approx 5 \textsf{million}\), depending on assay).
backgroundBlockSizeBP (int) – Length (bp) of blocks used to estimate local statistics (background, noise, etc.). If a negative value is provided (default), this value is inferred from the data using
consenrich.core.getContextSize().normMethod (str) –
Method used to normalize read counts for sequencing depth / library size.
EGS: Effective Genome Size normalization (seeconsenrich.detrorm.getScaleFactor1x()) only appropriate for alignment coverage, not fragments pseudobulksSF: Median of ratios scale factors (seeconsenrich.cconsenrich.cSF()). Restricted to analyses with>= 3samples (no input control).RPKM/CPM: Scale factors based on emitted counts per million mapped units fragments pseudobulks use emitted insertions rather than raw fragment totals
fragmentsGroupNorm (str | None) – Optional extra normalization for fragments pseudobulks NONE keeps library-size scaling only and CELLS additionally divides by selected cell count
fixControl (bool, optional) – If True, treatment samples are not upscaled, and control samples are not downscaled.
globalWeight (float, optional) – Preprocessing centering weight. Any positive value applies subtraction of the dense centering offset estimated from the transformed coverage track, while non-positive values skip preprocessing centering entirely.
logOffset (float, optional) – A small constant added to read normalized counts before log-transforming (pseudocount). For example, \(\log(x + 1)\) for
logOffset = 1. Default is1.0.logMult (float, optional) – Multiplicative factor applied to log-scaled and normalized counts. For example, setting
logMult = 1 / \log(2)will yield log2-scaled counts after transformation, and settinglogMult = 1.0yields natural log-scaled counts.scaleFactors (List[float] | None)
scaleFactorsControl (List[float] | None)
- Seealso:
consenrich.cconsenrich.cTransform()
Treatment vs. Control Extension Lengths in Single-End Data
For single-end data, cross-correlation-based estimates for fragment length in control inputs can be biased due to a comparative lack of structure in strand-specific coverage tracks.
This can create artifacts during counting, so it is common to use the estimated treatment fragment length for both treatment and control samples. Consenrich resolves inferred control extension lengths from their paired treatments.
- class consenrich.core.scParams(barcodeTag='CB', defaultCountMode='cutsite', fragmentsGroupNorm='NONE', defaultFragmentPositionMode='insertionEndpoints')[source]¶
Parameters related to single-cell and fragments inputs
- Parameters:
barcodeTag (str | None) – Default alignment tag used to read cell barcodes from single-cell BAM inputs
defaultCountMode (str | None) – Default count mode for fragments inputs when a source does not set countMode
fragmentsGroupNorm (str | None) – Optional extra normalization for fragments pseudobulks NONE keeps library-size scaling only and CELLS additionally divides by selected cell count
defaultFragmentPositionMode (str | None) – Default fragments endpoint interpretation mode.
- class consenrich.core.processParams(deltaF=-1.0, minQ=0.00025, maxQ=1000.0, offDiagQ=0.0)[source]¶
Parameters related to the process model of Consenrich.
- Parameters:
deltaF (float) – Integration step size in the two-state transition \(x_{[i+1,0]} = x_{[i,0]} + \delta_F x_{[i,1]}\). If
deltaF < 0, the CLI centers a narrow search around0.5 * intervalSizeBP / medianFragmentLength.minQ (float) – Minimum process noise scale (diagonal in \(\mathbf{Q}_{[i]}\)) on the primary state variable (signal level). If
minQ < 0, a small value scales the minimum observation noise level (observationParams.minR) and is used for numerical stability.maxQ (float) – Maximum process noise scale. If
maxQ < 0, no effective upper bound is enforced.offDiagQ (float) – Off-diagonal value in the process noise covariance \(\mathbf{Q}_{[i,01]}\)
- Seealso:
- class consenrich.core.observationParams(minR, maxR, samplingIters, samplingBlockSizeBP, binQuantileCutoff, EB_minLin, EB_use, EB_setNu0, EB_setNuL, numNearest, restrictLocalAR1ToSparseBed, pad)[source]¶
Parameters related to the observation model of Consenrich.
The observation model supplies the plugin variance track used by the inner filter-smoother.
- Parameters:
minR (float | None) – Genome-wide lower bound for replicate-specific observation noise levels.
maxR (float | None) – Genome-wide upper bound for the replicate-specific observation noise levels.
samplingIters (int | None) – Number of blocks (within-contig) to sample while building the empirical absMean-variance trend in
consenrich.core.fitVarianceFunction().samplingBlockSizeBP (int | None) – Expected size (in bp) of contiguous blocks that are sampled when fitting AR1 parameters to estimate \((\lvert \mu_b \rvert, \sigma^2_b)\) pairs. Note, during sampling, each block’s size (unit: genomic intervals) is drawn from truncated \(\textsf{Geometric}(p=1/\textsf{samplingBlockSize})\) to reduce artifacts from fixed-size blocks. If None or ` < 1`, then this value is inferred using
consenrich.core.getContextSize().binQuantileCutoff (float | None) – When fitting the variance function, pairs \((\lvert \mu_b \rvert, \sigma^2_b)\) are binned by their (absolute) means. This parameter sets the quantile of variances within each bin to use when fitting the global mean-variance trend. Increasing this value toward 1.0 can raise the prior trend for observation noise levels and therefore yield stiffer signal estimates overall.
EB_minLin (float | None) – Require that the fitted trend in
consenrich.core.getMuncTrack()satisfy: \(\textsf{variance} \geq \textsf{minLin} \cdot |\textsf{mean}|\). SeefitVarianceFunction().EB_use (bool | None) – If True, shrink ‘local’ noise estimates to a prior trend dependent on amplitude. See
consenrich.core.getMuncTrack().EB_setNu0 (int | None) – If provided, manually set \(\nu_0\) to this value (rather than computing via
consenrich.core.EB_computePriorStrength()).EB_setNuL (int | None) – If provided, manually set local model df, \(\nu_L\), to this value.
numNearest (int | None) – If
> 0and an explicit sparse BED is supplied, estimate the local observation variance from the nearest sparse regions instead of the default rolling AR(1) local variance. In this sparse-nearest mode, the same nearest sparse blocks also define a signed local intercept track that is subtracted before fitting and evaluating the global mean-variance prior.restrictLocalAR1ToSparseBed (bool | None) – If True, and a sparse BED mask is supplied to
consenrich.core.getMuncTrack(), restrict the default rolling AR(1) local observation noise level estimates to windows fully contained in sparse BED regions. This only affects the local rolling AR(1) model, the global prior fit and sparse-nearest mode are unchanged.pad (float | None) – A small constant added to the observation noise variance estimates for conditioning
- Seealso:
consenrich.core.getMuncTrack(),consenrich.core.fitVarianceFunction(),consenrich.core.EB_computePriorStrength(),consenrich.cconsenrich.cinnerEM()
- class consenrich.core.stateParams(stateInit, stateCovarInit, boundState, stateLowerBound, stateUpperBound, effectiveInfoRescale, effectiveInfoBlockLengthBP)[source]¶
Parameters related to state variables and covariances.
- Parameters:
stateInit (float) – Initial value of the ‘primary’ state/signal at the first genomic interval: \(x_{[1]}\)
stateCovarInit (float) – Initial state covariance (covariance) scale. Note, the initial state uncertainty \(\mathbf{P}_{[1]}\) is a multiple of the identity matrix \(\mathbf{I}\). Final results are typically insensitive to this parameter, since the filter effectively ‘forgets’ its initialization after processing a moderate number of intervals and backward smoothing.
boundState (bool) – If True, the primary state estimate for \(x_{[i]}\) is reported within stateLowerBound and stateUpperBound. Note that the internal filtering is unaffected.
stateLowerBound (float) – Lower bound for the state estimate.
stateUpperBound (float) – Upper bound for the state estimate.
effectiveInfoRescale (bool | None) – If True, inflate the default model-based uncertainty track by block-level effective-information correction factors estimated from standardized one-step-ahead innovations via Bartlett/Newey-West long-run variance.
effectiveInfoBlockLengthBP (int | None) – Genomic block length, in bp, for HAC uncertainty scaling. Non-positive values fall back to chromosome-wide scaling.
- class consenrich.core.outputParams(convertToBigWig, roundDigits, writeUncertainty, writeJackknifeSE, applyJackknife)[source]¶
Parameters related to output files.
- Parameters:
convertToBigWig (bool) – If True, output bedGraph files are converted to bigWig format.
roundDigits (int) – Number of decimal places to round output values (bedGraph)
writeUncertainty (bool) – If True, write the state uncertainty track to bedGraph. By default this is \(\sqrt{c_{\mathrm{eff},i}\,\widetilde{P}_{[i,0,0]}}\) when
stateParams.effectiveInfoRescale=True, where \(c_{\mathrm{eff},i} \ge 1\) is a correction factor estimated from standardized innovations. Otherwise it is \(\sqrt{\widetilde{P}_{[i,0,0]}}\).writeJackknifeSE (bool) – If True, write the standard error of the signal level estimates across jackknife replicates to bedGraph. This is only relevant if applyJackknife is True.
applyJackknife (bool) – If True, estimate replicate-level sampling variability in the signal level estimates with the jackknife
- class consenrich.core.matchingParams(enabled, randSeed, tau0, numBootstrap, thresholdZ, dependenceSpan, gamma, selectionPenalty, gammaScale, nestedRoccoIters, nestedRoccoBudgetScale, exportFilterUncertaintyMultiplier)[source]¶
Parameters related to post-fit peak calling.
Consenrich uses the within-package dynamic-programming peak caller based on ROCCO.
- Parameters:
enabled (bool) – If True, run post-fit ROCCO peak calling on the emitted state bedGraph.
randSeed (Optional[int]) – Random seed used for bootstrap calibration and any stochastic tie-breaking.
tau0 (Optional[float]) – Shrinkage-score pseudovariance parameter; direct ROCCO scoring uses the fitted Consenrich state values.
numBootstrap (Optional[int]) – Number of dependent wild-bootstrap null draws used for budget calibration.
thresholdZ (Optional[float]) – One-sided null tail threshold on the ROCCO score, on a Gaussian
zscale.dependenceSpan (Optional[int]) – Optional fixed dependence span in intervals for bootstrap calibration. If not provided, it is estimated from the score track.
gamma (Optional[float]) – ROCCO boundary penalty. Non-negative values are used directly (default
0.5); negative values request data-driven estimation from local score scale and context size.selectionPenalty (Optional[float]) – Optional direct per-bin selection penalty override.
gammaScale (Optional[float]) – Multiplicative scale used when converting estimated context size into a ROCCO boundary penalty.
nestedRoccoIters (Optional[int]) – Number of nested, monotone-shrinking local ROCCO refinement iterations to run within first-pass peak regions. Set to
0to disable.nestedRoccoBudgetScale (Optional[float]) – Optional fraction of each eligible parent peak region available to the nested local ROCCO subproblem. Values below
1make the local pass stricter. The default is0.5.exportFilterUncertaintyMultiplier (Optional[float]) – Non-negative multiplier
cin the final export filtermedianState < -c * median(local uncertainty). The default is2.5. Settingc=0requires exported peaks to have positive median signal.
- Seealso:
consenrich.peaks,outputParams.
Primary functions¶
- consenrich.core.runConsenrich(matrixData, matrixMunc, deltaF, minQ, maxQ, offDiagQ, stateInit, stateCovarInit, boundState, stateLowerBound, stateUpperBound, blockLenIntervals, projectStateDuringFiltering=False, pad=0.0001, disableCalibration=False, EM_maxIters=50, EM_innerRtol=0.0001, EM_tNu=8.0, EM_useObsPrecReweight=True, EM_useProcPrecReweight=True, EM_useAPN=False, EM_useReplicateBias=True, EM_repBiasShrink=0.0, EM_outerIters=3, EM_outerRtol=0.001, EM_backgroundSmoothness=1.0, returnScales=True, returnReplicateOffsets=False, applyJackknife=False, jackknifeEM_maxIters=5, jackknifeEM_innerRtol=0.01, useWhiteAccel=False, useDiscreteConstAccel=False, autoDeltaF=True, autoDeltaF_low=0.0001, autoDeltaF_high=2.0, autoDeltaF_init=0.01, autoDeltaF_maxEvals=25, effectiveInfoRescale=True, effectiveInfoBandwidthIntervals=None, effectiveInfoBlockLengthIntervals=None)[source]¶
Run Consenrich over a contiguous genomic region
Consenrich estimates a shared signal level from multiple replicate tracks using a two-state linear smoother plus an outer calibration loop.
The observation model is
\[y_{[j,i]} = g_{[i]} + x_{[i,0]} + b_j + \epsilon_{[j,i]}, \qquad \mathrm{Var}(\epsilon_{[j,i]}) = \frac{v_{[j,i]} + \mathrm{pad}}{\lambda_{[j,i]}}.\]Here \(g_{[i]}\) is a shared zero-centered smooth background, \(b_j\) is a replicate-level bias term, and \(v_{[j,i]}\) is the plugin observation variance supplied by
matrixMunc.The latent state follows
\[\mathbf{x}_{[i+1]} = \mathbf{F}(\delta_F)\mathbf{x}_{[i]} + \eta_{[i]}, \qquad \mathrm{Var}(\eta_{[i]}) = \frac{\mathbf{Q}_0}{\kappa_{[i]}}.\]If
EM_useAPN=True, the forward filter instead uses the adaptive-process-noise D-statistic update to scale \(\mathbf{Q}_0\) and process-precision reweighting is disabled.If
effectiveInfoRescale=True, the default state variance output is multiplied by effective-information correction factors estimated from standardized one-step-ahead innovations via Bartlett/Newey-West long-run variance.This wrapper ties together several fundamental routines written in Cython:
consenrich.cconsenrich.cforwardPass(): Forward filter (predict, update)consenrich.cconsenrich.cbackwardPass(): Backward fixed-interval smootherconsenrich.cconsenrich.cinnerEM(): Joint optimization of robust precision reweighting and replicate-level observation calibration
- Seealso:
consenrich.core.getMuncTrack(),consenrich.cconsenrich.cTransform(),consenrich.cconsenrich.cforwardPass(),consenrich.cconsenrich.cbackwardPass(),consenrich.cconsenrich.cinnerEM()- Parameters:
matrixData (ndarray)
matrixMunc (ndarray)
deltaF (float)
minQ (float)
maxQ (float)
offDiagQ (float)
stateInit (float)
stateCovarInit (float)
boundState (bool)
stateLowerBound (float)
stateUpperBound (float)
blockLenIntervals (int)
projectStateDuringFiltering (bool)
pad (float)
disableCalibration (bool)
EM_maxIters (int)
EM_innerRtol (float)
EM_tNu (float)
EM_useObsPrecReweight (bool)
EM_useProcPrecReweight (bool)
EM_useAPN (bool)
EM_useReplicateBias (bool)
EM_repBiasShrink (float)
EM_outerIters (int)
EM_outerRtol (float)
EM_backgroundSmoothness (float)
returnScales (bool)
returnReplicateOffsets (bool)
applyJackknife (bool)
jackknifeEM_maxIters (int)
jackknifeEM_innerRtol (float)
useWhiteAccel (bool)
useDiscreteConstAccel (bool)
autoDeltaF (bool)
autoDeltaF_low (float)
autoDeltaF_high (float)
autoDeltaF_init (float)
autoDeltaF_maxEvals (int)
effectiveInfoRescale (bool)
effectiveInfoBandwidthIntervals (int | None)
effectiveInfoBlockLengthIntervals (int | None)
- consenrich.core.readSegments(sources, chromosome, start, end, intervalSizeBP, readLengths, scaleFactors, oneReadPerBin, samThreads, samFlagExclude, bamInputMode='auto', defaultCountMode='coverage', shiftForward5p=0, shiftReverse5p=0, extendFrom5pBP=None, maxInsertSize=1000, inferFragmentLength=0, minMappingQuality=0, minTemplateLength=-1)[source]¶
Read binned tracks from generic input sources
this is the source-agnostic entry point for counting.
For BAM inputs,
bamInputModecontrols whether we count template spans, per-read alignments, or only read1 tags from paired-end BAM. CombineshiftForward5p/shiftReverse5pwithextendFrom5pBPorinferFragmentLengthto emulate MACS--shiftand--extsizebehavior.- Parameters:
sources (List[inputSource])
chromosome (str)
start (int)
end (int)
intervalSizeBP (int)
readLengths (List[int])
scaleFactors (List[float])
oneReadPerBin (int)
samThreads (int)
samFlagExclude (int)
bamInputMode (str | None)
defaultCountMode (str | None)
shiftForward5p (int | None)
shiftReverse5p (int | None)
extendFrom5pBP (List[int] | int | None)
maxInsertSize (int | None)
inferFragmentLength (int | None)
minMappingQuality (int | None)
minTemplateLength (int | None)
- Return type:
ndarray[tuple[Any, …], dtype[float32]]
- consenrich.core.getMuncTrack(chromosome, intervals, values, intervalSizeBP, samplingBlockSizeBP=None, samplingIters=25000, randomSeed=42, excludeMask=None, useEMA=True, excludeFitCoefs=None, binQuantileCutoff=0.5, EB_minLin=0.0, EB_use=True, EB_setNu0=None, EB_setNuL=None, sparseIntervalIndices=None, sparseRegionMask=None, numNearest=0, restrictLocalAR1ToSparseBed=False, EB_localQuantile=0.0, verbose=False, eps=0.01)[source]¶
Approximate initial sample-specific (M)easurement (unc)ertainty tracks
For an individual experimental sample (replicate), quantify positional observation noise levels over genomic intervals \(i=1,2,\ldots n\) spanning
chromosome. These tracks (per-sample) comprise thematrixMuncinput torunConsenrich(), \(\mathbf{R}[:,:] \in \mathbb{R}^{m \times n}\).Variance is modeled as a function of the absolute mean signal level. For
EB_use=True, local variance estimates are shrunk toward a signal level dependent global variance fit.- Parameters:
chromosome (str) – chromosome/contig name
values (np.ndarray) – normalized/transformed signal measurements over genomic intervals (e.g.,
consenrich.cconsenrich.cTransform()output)intervals (np.ndarray) – genomic intervals positions (start positions)
intervalSizeBP (int)
samplingBlockSizeBP (int | None)
samplingIters (int)
randomSeed (int)
excludeMask (ndarray | None)
useEMA (bool | None)
excludeFitCoefs (Tuple[int, ...] | None)
binQuantileCutoff (float)
EB_minLin (float)
EB_use (bool)
EB_setNu0 (int | None)
EB_setNuL (int | None)
sparseIntervalIndices (ndarray | None)
sparseRegionMask (ndarray | None)
numNearest (int)
restrictLocalAR1ToSparseBed (bool)
EB_localQuantile (float)
verbose (bool)
eps (float)
- Return type:
tuple[ndarray[tuple[Any, …], dtype[float32]], float]
See
consenrich.core.observationParamsfor other parameters.
- consenrich.core.getContextSize(vals, minSpan=3, maxSpan=64, bandZ=1.0, maxOrder=5)[source]¶
(Experimental) Heuristic estimator for characteristic feature width from local peak widths
Candidate features are detected on a smoothed log-scale track, half-height widths are measured locally, and width uncertainty is estimated by a local residual bootstrap before EB shrinkage on the log-width scale
- Parameters:
vals (ndarray)
minSpan (int | None)
maxSpan (int | None)
bandZ (float)
maxOrder (int)
- Return type:
tuple[int, int, int]
consenrich.peaks¶
- consenrich.peaks.getROCCOBudget(state, uncertainty=None, tau0=1.0, statistic='integrated', numBootstrap=128, dependenceSpan=None, kernel='bartlett', thresholdZ=2.0, bulkQuantile=0.6, randomSeed=0, nullQuantile=0.8, pooledNullFloor=None, returnDetails=False)[source]¶
Estimate a chromosome ‘budget’ from the fitted Consenrich state.
- Parameters:
state (ArrayLike)
uncertainty (ArrayLike | None)
tau0 (float)
statistic (str)
numBootstrap (int)
dependenceSpan (int | None)
kernel (str)
thresholdZ (float)
bulkQuantile (float)
randomSeed (int)
nullQuantile (float)
pooledNullFloor (Dict[str, Any] | None)
returnDetails (bool)
- Return type:
float | Tuple[float, Dict[str, Any]]
- consenrich.peaks.solveRocco(stateBedGraphFile, uncertaintyBedGraphFile=None, chromosomes=None, tau0=1.0, numBootstrap=128, thresholdZ=2.0, dependenceSpan=None, gamma=0.5, selectionPenalty=None, gammaScale=0.5, nestedRoccoIters=3, nestedRoccoBudgetScale=0.5, exportFilterUncertaintyMultiplier=2.5, randSeed=42, outPath=None, metaPath=None, verbose=False)[source]¶
Run Consenrich+ROCCO peak caller directly on bedGraphs.
- Parameters:
stateBedGraphFile (str)
uncertaintyBedGraphFile (str | None)
chromosomes (Iterable[str] | None)
tau0 (float)
numBootstrap (int)
thresholdZ (float)
dependenceSpan (int | None)
gamma (float | None)
selectionPenalty (float | None)
gammaScale (float)
nestedRoccoIters (int)
nestedRoccoBudgetScale (float)
exportFilterUncertaintyMultiplier (float)
randSeed (int)
outPath (str | None)
metaPath (str | None)
verbose (bool)
- Return type:
str