Consenrich¶
Consenrich is an adaptive linear filter for genome-wide estimation of signals hidden in noisy multi-sample HTS datasets. A corresponding manuscript preprint is available on bioRxiv. Consenrich supports a variety of functional genomics assays including ATAC-seq, DNase-seq, ChIP-seq, CUT&RUN, etc.

Usage¶
A brief but nontrivial analysis is carried out below for demonstration.
Publicly available data from ENCODE (H3K27ac ChIP-seq data) is used in this analysis.
Input Data¶
The input dataset consists of four donors’ treatment and control epidermal samples.
Experiment |
Biosample |
H3K27ac Alignment |
Control Alignment |
---|---|---|---|
Epidermis/Female/71 |
|||
Epidermis/Male/67 |
|||
Epidermis/Female/80 |
|||
Epidermis/Male/75 |
Downloading alignment files
Run the following shell script to obtain the input BAM files for this demo. You can also use curl -O <URL> in place of wget <URL>.
encodeFiles=https://www.encodeproject.org/files
for file in ENCFF793ZHL ENCFF647VPO ENCFF809VKT ENCFF295EFL; do
wget "$encodeFiles/$file/@@download/$file.bam"
done
for ctrl in ENCFF444WVG ENCFF619NYP ENCFF898LKJ ENCFF490MWV; do
wget "$encodeFiles/$ctrl/@@download/$ctrl.bam"
done
samtools index -M *.bam
Using a YAML Configuration file¶
Now, with the needed alignment files downloaded and indexed, we can copy and paste the following YAML into a file named demoHistoneChIPSeq.yaml.
Feel free to tinker with parameters (<process,observation,etc.>Params
), but we will use defaults in the demo.
Note that Consenrich can be run programmatically using the API instead of a command-line interface (CLI) used below.
For a quick trial (\(\approx\) 1 minute), you can restrict analysis to chromosome 22 and still reproduce the results shown in the IGV browser snapshot below by specifying genomeParams.chromosomes: ['chr22']
.
experimentName: demoHistoneChIPSeq
genomeParams.name: hg38
genomeParams.excludeForNorm: ['chrX', 'chrY']
inputParams.bamFiles: [ENCFF793ZHL.bam,
ENCFF647VPO.bam,
ENCFF809VKT.bam,
ENCFF295EFL.bam]
inputParams.bamFilesControl: [ENCFF444WVG.bam,
ENCFF619NYP.bam,
ENCFF898LKJ.bam,
ENCFF490MWV.bam]
Control inputs. For assays like ATAC-seq, DNase-seq, or CUT&RUN, control samples are optional: simply omit inputParams.bamFilesControl
.
Run Consenrich¶
With the config file saved, we can invoke the command-line interface to run Consenrich:
consenrich --config demoHistoneChIPSeq.yaml
Output bedGraph/bigWig files will be saved to the current working directory, prefixed by consenrichOutput_<experimentName>
.
IGV snapshot: demoHistoneChIPSeq

The Consenrich-estimated signal track is displayed in the top row (blue) over a generic 25kb region of chromosome 22.
For reference, we also show ENCODE’s fold change over control bigwig tracks for each sample (red) and the input treatment/control alignments (black).
Note
Refer to the <process,observation,etc.>Params
classes in module consenrich.core for complete documentation configuration options.
Installation¶
From PyPI¶
Consenrich distributes multiple wheels on PyPI for different Python versions and platforms. To install the latest version, run:
python -m pip install consenrich
This is a more convenient but potentially less flexible option across platforms and Python versions. See below to build from source.
Build from source¶
Building and installing from source is recommended to ensure compatibility across platforms and Python versions.
To build from source, first ensure you have the necessary package development tools
python -m pip install setuptools wheel Cython build
If you prefer to use a virtual environment, see the YAML file consenrichEnv
.
This can be used to create an isolated setup with all required dependencies using
conda or mamba.
Clone the repository:
git clone https://github.com/nolan-h-hamilton/Consenrich.git
Build and install the package:
cd Consenrich python -m build python -m pip install .
Previous versions¶
To install a specific version of Consenrich from PyPI, you can specify the version number in the pip install command, for example:
python -m pip install consenrich==0.1.13b1
API Reference¶
Core API: consenrich.core¶
The core module implements the main aspects of Consenrich.
- class consenrich.core.processParams(deltaF, minQ, maxQ, offDiagQ, dStatAlpha, dStatd, dStatPC)[source]¶
Parameters related to the process model of Consenrich. The process model governs the signal and variance propagation through the state transition \(\mathbf{F} \in \mathbb{R}^{2 \times 2}\) and process noise covariance \(\mathbf{Q}_{[i]} \in \mathbb{R}^{2 \times 2}\) matrices.
- Parameters:
deltaF (float) – Propagation length. Informally, how far forward/backward to project the estimate and covariance at the previous genomic interval to obtain the initial prediction of the state and covariance at the current genomic interval \(i\): \(x_{[i|i-1]}\) and covariance \(\mathbf{P}_{[i|i-1]}\).
minQ (float) – Minimum process noise variance (diagonal in \(\mathbf{Q}_{[i]}\)) for each state variable.
maxQ (float) – Maximum process noise variance (diagonal in \(\mathbf{Q}_{[i]}\)) for each state variable.
offDiagQ (float) – Initial off-diagonal noise covariance between states.
dStatAlpha (float) – Innovation-based model mismatch threshold \(\alpha_D\). If we observe \(D_{[i]} > \alpha_D\), we consider the process model to be unreliable and therefore scale-up the process noise covariance to favor the observation model (the data) instead.
dStatd (float) – Constant \(d\) in the scaling expression \(\sqrt{d|D_{[i]} - \alpha_D| + c}\) that is used to up/down-scale the process noise covariance in the event of a model mismatch.
dStatPC (float) – Constant \(c\) in the scaling expression \(\sqrt{d|D_{[i]} - \alpha_D| + c}\) that is used to up/down-scale the process noise covariance in the event of a model mismatch.
- class consenrich.core.observationParams(minR, maxR, useALV, useConstantNoiseLevel, noGlobal, numNearest, localWeight, globalWeight, approximationWindowLengthBP, lowPassWindowLengthBP, returnCenter)[source]¶
Parameters related to the observation model of Consenrich. The observation model is used to integrate sequence alignment count data from the multiple input samples and account for region-and-sample-specific noise processes corrupting data. The observation model matrix \(\mathbf{H} \in \mathbb{R}^{m \times 2}\) maps from the state dimension (2) to the dimension of measurements/data (\(m\)).
- Parameters:
minR (float) – The minimum observation noise variance for each sample \(j=1\ldots m\) in the observation noise covariance matrix \(\mathbf{R}_{[i, (11:mm)]}\).
maxR (float) – The maximum observation noise variance for each sample \(j=1\ldots m\) in the observation noise covariance matrix \(\mathbf{R}_{[i, (11:mm)]}\).
numNearest (int) – The number of nearest nearby sparse features to use for local variance calculation.
localWeight (float) – The weight for the local variance in the observation model.
globalWeight (float) – The weight for the global noise level in the observation model.
approximationWindowLengthBP (int) – The length of the approximation window in base pairs (BP) for the local variance calculation.
sparseBedFile (str, optional) – The path to a BED file of ‘sparse’ regions for the local variance calculation.
noGlobal (bool) – If True, only the ‘local’ variances are used to approximate observation noise covariance \(\mathbf{R}_{[:, (11:mm)]}\).
useALV (bool) – Whether to use average local variance (ALV) to approximate observation noise covariances per-sample, per-interval.
useConstantNoiseLevel (bool) – Whether to use a constant noise level in the observation model.
lowPassWindowLengthBP (int)
returnCenter (bool)
- class consenrich.core.stateParams(stateInit, stateCovarInit, boundState, stateLowerBound, stateUpperBound)[source]¶
Parameters related to state and uncertainty bounds and initialization.
- Parameters:
stateInit (float) – Initial (primary) state estimate at the first genomic interval: \(x_{[1]}\)
stateCovarInit (float) – Initial state covariance (covariance) scale. The state uncertainty \(\mathbf{P_{[1]}}\) is a multiple of \(\mathbf{I}_2\)
boundState (bool)
stateLowerBound (float)
stateUpperBound (float)
- Param:
boundState: If True, the primary state estimate for \(x_{[i]}\) is constrained within stateLowerBound and stateUpperBound.
- class consenrich.core.detrendParams(useOrderStatFilter, usePolyFilter, detrendTrackPercentile, detrendSavitzkyGolayDegree, detrendWindowLengthBP)[source]¶
Parameters related detrending and background-removal
- Parameters:
useOrderStatFilter (bool) – Whether to use order statistics for filtering the read density data.
usePolyFilter (bool) – Whether to use polynomial fitting for filtering the read density data.
detrendSavitzkyGolayDegree (int) – The polynomial degree of the Savitzky-Golay filter to use for detrending
detrendTrackPercentile (float) – The percentile to use for detrending the read density data.
detrendWindowLengthBP (int) – The length of the window in base pairs for detrending the read density data.
- class consenrich.core.inputParams(bamFiles, bamFilesControl)[source]¶
Parameters related to the input data for Consenrich.
- Parameters:
bamFiles (List[str]) – A list of paths to distinct coordinate-sorted and indexed BAM files.
bamFilesControl (List[str], optional) – A list of paths to distinct coordinate-sorted and indexed control BAM files. e.g., IgG control inputs for ChIP-seq.
- class consenrich.core.genomeParams(genomeName, chromSizesFile, blacklistFile, sparseBedFile, chromosomes, excludeChroms, excludeForNorm)[source]¶
- Parameters:
genomeName (str)
chromSizesFile (str)
blacklistFile (str | None)
sparseBedFile (str | None)
chromosomes (List[str])
excludeChroms (List[str])
excludeForNorm (List[str])
- class consenrich.core.countingParams(stepSize, scaleDown, scaleFactors, scaleFactorsControl, numReads)[source]¶
Parameters related to counting reads in genomic intervals.
- Parameters:
stepSize (int) – Step size for the genomic intervals.
scaleDown (bool, optional) – If using paired treatment and control BAM files, whether to scale down the larger of the two before computing the difference/ratio
scaleFactors (List[float], optional) – Scale factors for the read counts.
scaleFactorsControl (List[float], optional) – Scale factors for the control read counts.
numReads (int) – Number of reads to sample.
- class consenrich.core.samParams(samThreads, samFlagExclude, oneReadPerBin, chunkSize)[source]¶
Parameters related to reading BAM files
- Parameters:
samThreads (int) – The number of threads to use for reading BAM files.
samFlagExclude (int) – The SAM flag to exclude certain reads.
oneReadPerBin (int) – If 1, only the interval with the greatest read overlap is incremented.
chunkSize (int) – maximum number of intervals’ data to hold in memory before flushing to disk.
Note
For an overview of SAM flags see https://broadinstitute.github.io/picard/explain-flags.html
- class consenrich.core.matchingParams(templateNames, cascadeLevels, iters, alpha, minMatchLengthBP, maxNumMatches, minSignalAtMaxima)[source]¶
Parameters related the experimental wavelet-based matched filter for pattern recognition.
- Parameters:
templateNames (List[str]) – The names of the templates to match against.
cascadeLevels (List[int]) – Templates are currently derived from cascade-approximated wavelets at level=cascadeLevel.
iters (int) – Number of iterations to use for sampling block maxima while building the empirical null
alpha (float) – Significance level for the empirical null distribution.
minMatchLength (int) – Minimum length around response maxima to qualify matches.
minMatchLengthBP (int | None)
maxNumMatches (int | None)
minSignalAtMaxima (float | None)
- consenrich.core.getChromRanges(bamFile, chromosome, chromLength, samThreads, samFlagExclude)[source]¶
Get the start and end positions of reads in a chromosome from a BAM file.
- Parameters:
bamFile (str) – See
inputParams
.chromosome (str) – the chromosome to read in bamFile.
chromLength (int) – Base pair length of the chromosome.
samThreads (int) – See
samParams
.samFlagExclude (int) – See
samParams
.
- Returns:
Tuple of start and end positions (nucleotide coordinates) in the chromosome.
- Return type:
Tuple[int, int]
- Seealso:
getChromRangesJoint()
,cconsenrich.cgetFirstChromRead()
,cconsenrich.cgetLastChromRead()
- consenrich.core.getChromRangesJoint(bamFiles, chromosome, chromSize, samThreads, samFlagExclude)[source]¶
For multiple BAM files, reconcile a single start and end position over which to count reads, where the start and end positions are defined by the first and last reads across all BAM files.
- Parameters:
bamFiles (List[str]) – List of BAM files to read.
chromosome (str) – Chromosome to read.
chromSize (int) – Size of the chromosome.
samThreads (int) – Number of threads to use for reading the BAM files.
samFlagExclude (int) – SAM flag to exclude certain reads.
- Returns:
Tuple of start and end positions.
- Return type:
Tuple[int, int]
- Seealso:
getChromRanges()
,cconsenrich.cgetFirstChromRead()
,cconsenrich.cgetLastChromRead()
- consenrich.core.getReadLength(bamFile, numReads, maxIterations, samThreads, samFlagExclude)[source]¶
Infer read length from mapped reads in a BAM file. Samples at least numReads reads passing criteria given by samFlagExclude and returns the median read length.
- Parameters:
bamFile (str) – See
inputParams
.numReads (int) – Number of reads to sample.
maxIterations (int) – Maximum number of iterations to perform.
samThreads (int) – See
samParams
.samFlagExclude (int) – See
samParams
.
- Returns:
The median read length.
- Return type:
int
- Raises:
ValueError – If the read length cannot be determined after scanning maxIterations reads.
- Seealso:
cconsenrich.cgetReadLength()
- consenrich.core.readBamSegments(bamFiles, chromosome, start, end, stepSize, readLengths, scaleFactors, oneReadPerBin, samThreads, samFlagExclude)[source]¶
Calculate tracks of read counts (or a function thereof) for each BAM file.
See
cconsenrich.creadBamSegment()
for the underlying implementation in Cython.- Parameters:
bamFiles (List[str]) – See
inputParams
.chromosome (str) – Chromosome to read.
start (int) – Start position of the genomic segment.
end (int) – End position of the genomic segment.
readLengths (List[int]) – List of read lengths for each BAM file.
scaleFactors (List[float]) – List of scale factors for each BAM file.
stepSize (int) – See
countingParams
.oneReadPerBin (int)
samThreads (int)
samFlagExclude (int)
- Return type:
ndarray[tuple[Any, …], dtype[float64]]
- consenrich.core.getAverageLocalVarianceTrack(values, stepSize, approximationWindowLengthBP, lowPassWindowLengthBP, minR, maxR)[source]¶
Approximate local noise levels in a segment using an ALV approach.
First, computes a segment-length simple moving average of values with a bp-length window approximationWindowLengthBP.
Second, computes a segment-length simple moving average of squared values.
Between these two averages, the difference between the latter and the square of the former approximates the local variance of the segment. These local variances are then combined with a median filter of length lowPassWindowLengthBP.
- Parameters:
values (np.ndarray) – An array of read-density-based values (typically from a single row in a sample-by-interval matrix)
stepSize (int) – See
countingParams
.observationParams (observationParams) – See
observationParams
approximationWindowLengthBP (int) – The length of the approximation window in base pairs (BP).
lowPassWindowLengthBP (int) – The length of the low-pass filter window in base pairs (BP).
minR (float)
maxR (float)
- Seealso:
- Return type:
ndarray[tuple[Any, …], dtype[float64]]
- consenrich.core.constructMatrixF(deltaF)[source]¶
Build the state transition matrix for the process model
- Parameters:
deltaF (float) – See
processParams
.- Returns:
The state transition matrix \(\mathbf{F}\)
- Return type:
npt.NDArray[np.float64]
- Seealso:
- consenrich.core.constructMatrixQ(minDiagQ, offDiagQ=0.0)[source]¶
Build the initial process noise covariance matrix \(\mathbf{Q}_{[1]}\).
- Parameters:
minDiagQ (float) – See
processParams
.offDiagQ (float) – See
processParams
.
- Returns:
The initial process noise covariance matrix \(\mathbf{Q}_{[1]}\).
- Return type:
npt.NDArray[np.float64]
- Seealso:
- consenrich.core.constructMatrixH(m, coefficients=None)[source]¶
Build the observation model matrix \(\mathbf{H}\).
- Parameters:
m (int) – Number of observations.
coefficients (Optional[npt.NDArray[np.float64]]) – Optional coefficients for the observation model, which can be used to weight the observations manually.
- Returns:
The observation model matrix \(\mathbf{H}\).
- Return type:
npt.NDArray[np.float64]
- Seealso:
observationParams
, class:inputParams
- consenrich.core.runConsenrich(matrixData, matrixMunc, deltaF, minQ, maxQ, offDiagQ, dStatAlpha, dStatd, dStatPC, stateInit, stateCovarInit, boundState, stateLowerBound, stateUpperBound, chunkSize, progressIter, coefficientsH=None, residualCovarInversionFunc=None, adjustProcessNoiseFunc=None)[source]¶
Run consenrich on a contiguous segment (e.g. a chromosome) of read-density-based data. Completes the forward and backward passes given data and approximated observation noise covariance matrices \(\mathbf{R}_{[1:n, (11:mm)]}\).
- Parameters:
matrixData (npt.NDArray[np.float64]) – Read density data for a single chromosome or general contiguous segment, possibly preprocessed. Two-dimensional array of shape \(m \times n\) where \(m\) is the number of samples/tracks and \(n\) the number of genomic intervals.
matrixMunc (npt.NDArray[np.float64]) – Uncertainty estimates for the read density data, e.g. local variance. Two-dimensional array of shape \(m \times n\) where \(m\) is the number of samples/tracks and \(n\) the number of genomic intervals. :seealso:
getAverageLocalVarianceTrack()
,getMuncTrack()
.deltaF (float) – See
processParams
.minQ (float) – See
processParams
.maxQ (float) – See
processParams
.offDiagQ (float) – See
processParams
.dStatAlpha (float) – See
processParams
.dStatd (float) – See
processParams
.dStatPC (float) – See
processParams
.stateInit (float) – See
stateParams
.stateCovarInit (float) – See
stateParams
.chunkSize (int) – Number of genomic intervals’ data to keep in memory before flushing to disk.
progressIter (int) – The number of iterations after which to log progress.
coefficientsH (Optional[npt.NDArray[np.float64]]) – Optional coefficients for the observation model matrix \(\mathbf{H}\). If None, the coefficients are set to 1.0 for all samples.
residualCovarInversionFunc (Optional[Callable]) – Callable function to invert the observation covariance matrix \(\mathbf{E}_{[i]}\). If None, defaults to
cconsenrich.cinvertMatrixE()
.adjustProcessNoiseFunc (Optional[Callable]) – Function to adjust the process noise covariance matrix \(\mathbf{Q}_{[i]}\). If None, defaults to
cconsenrich.updateProcessNoiseCovariance()
.boundState (bool)
stateLowerBound (float)
stateUpperBound (float)
- Returns:
Tuple of three numpy arrays: - state estimates \(\widetilde{\mathbf{x}}_{[i]}\) of shape \(n \times 2\) - state covariance estimates \(\widetilde{\mathbf{P}}_{[i]}\) of shape \(n \times 2 \times 2\) - post-fit residuals \(\widetilde{\mathbf{y}}_{[i]}\) of shape \(n \times m\)
- Return type:
Tuple[npt.NDArray[np.float64], npt.NDArray[np.float64], npt.NDArray[np.float64]]
- Raises:
ValueError – If the number of samples in matrixData is not equal to the number of samples in matrixMunc.
- consenrich.core.getPrimaryState(stateVectors, roundPrecision=3)[source]¶
Get the primary state estimate from each vector after running Consenrich.
- Parameters:
stateVectors (npt.NDArray[np.float64]) – State vectors from
runConsenrich()
.roundPrecision (int)
- Returns:
A one-dimensional numpy array of the primary state estimates.
- Return type:
npt.NDArray[np.float64]
- consenrich.core.getStateCovarTrace(stateCovarMatrices, roundPrecision=3)[source]¶
Get a one-dimensional array of state covariance traces after running Consenrich
- Parameters:
stateCovarMatrices (npt.NDArray[np.float64]) – Estimated state covariance matrices \(\widetilde{\mathbf{P}}_{[i]}\)
roundPrecision (int)
- Returns:
A one-dimensional numpy array of the traces of the state covariance matrices.
- Return type:
npt.NDArray[np.float64]
- consenrich.core.getPrecisionWeightedResidual(postFitResiduals, matrixMunc, roundPrecision=3)[source]¶
Get a one-dimensional precision-weighted array residuals after running Consenrich.
This is essentially an estimate of the residuals with respect to the observation noise covariance \(\mathbf{R}_{[:, (11:mm)]}\).
Applies an inverse-variance weighting (with respect to the observation noise levels) of the post-fit residuals \(\widetilde{\mathbf{y}}_{[i]}\) and returns a one-dimensional array of “precision-weighted residuals”.
- Parameters:
postFitResiduals (npt.NDArray[np.float64]) – Post-fit residuals from
runConsenrich()
.matrixMunc (npt.NDArray[np.float64]) – an \(m \times n\) numpy array where each column stores the diagonal entries of the observation noise covariance matrix \(\mathbf{R}_{[:, (11:mm)]}\) for each sample \(j=1,2,\ldots,m\) and each genomic interval \(i=1,2,\ldots,n\).
roundPrecision (int)
- Returns:
A one-dimensional array of “precision-weighted residuals”
- Return type:
npt.NDArray[np.float64]
- consenrich.core.getMuncTrack(chromosome, intervals, stepSize, rowValues, minR, maxR, useALV, useConstantNoiseLevel, noGlobal, localWeight, globalWeight, approximationWindowLengthBP, lowPassWindowLengthBP, returnCenter, sparseMap=None)[source]¶
Get observation noise variance \(R_{[:,jj]}\) for the sample \(j\).
- Parameters:
chromosome (str) – Tracks are approximated for this chromosome.
intervals (ndarray) – Genomic intervals for which to compute the noise track.
stepSize (int) – See
countingParams
.rowValues (npt.NDArray[np.float64]) – Read-density-based values for the sample \(j\) at the genomic intervals \(i=1,2,\ldots,n\).
minR (float) – See
observationParams
.maxR (float) – See
observationParams
.useALV (bool) – See
observationParams
.useConstantNoiseLevel (bool) – See
observationParams
.noGlobal (bool) – See
observationParams
.localWeight (float) – See
observationParams
.globalWeight (float) – See
observationParams
.approximationWindowLengthBP (int) – See
observationParams
and/orgetAverageLocalVarianceTrack()
.lowPassWindowLengthBP (int) – See
observationParams
and/orgetAverageLocalVarianceTrack()
.sparseMap (Optional[dict[int, int]]) – Optional mapping (dictionary) of interval indices to the nearest sparse regions. See
getSparseMap()
.returnCenter (bool)
- Returns:
A one-dimensional numpy array of the observation noise track for the sample \(j\).
- Return type:
npt.NDArray[np.float64]
- consenrich.core.sparseIntersection(chromosome, intervals, sparseBedFile)[source]¶
If using an annotation of sparse features to complement approximation of observation noise levels, this function returns intervals in the chromosome that overlap with the sparse features. :param chromosome: The chromosome name. :type chromosome: str :param intervals: The genomic intervals to consider. :type intervals: np.ndarray :param sparseBedFile: Path to the sparse BED file. :type sparseBedFile: str :return: A numpy array of start positions of the sparse features that overlap with the intervals :rtype: np.ndarray[Tuple[Any], np.dtype[Any]]
- Parameters:
chromosome (str)
intervals (ndarray)
sparseBedFile (str)
- Return type:
ndarray[tuple[Any, …], dtype[int64]]
- consenrich.core.adjustFeatureBounds(feature, stepSize)[source]¶
Adjust the start and end positions of a BED feature to be centered around a step.
- Parameters:
feature (Interval)
stepSize (int)
- Return type:
Interval
- consenrich.core.getSparseMap(chromosome, intervals, numNearest, sparseBedFile)[source]¶
Build a map between each genomic interval and numNearest sparse features
- Parameters:
chromosome (str) – The chromosome name. Note, this function only needs to be run once per chromosome.
intervals (npt.NDArray[np.int64]) – The genomic intervals to map.
numNearest (int) – The number of nearest sparse features to consider
sparseBedFile (str) – path to the sparse BED file.
- Returns:
A dictionary mapping each interval index to the indices of the nearest sparse regions.
- Return type:
dict[int, np.ndarray]
Background/Trend Removal and Scaling: consenrich.detrorm¶
Note
See consenrich.core.detrendParams
for relevant parameters.
- consenrich.detrorm.getScaleFactor1x(bamFile, effectiveGenomeSize, readLength, excludeChroms, chromSizesFile, samThreads)[source]¶
Generic normalization factor based on effective genome size and number of mapped reads in non-excluded chromosomes.
- Parameters:
bamFile (str) – See
consenrich.core.inputParams
.effectiveGenomeSize (int) – Effective genome size in base pairs. See
consenrich.constants.getEffectiveGenomeSize()
.readLength (int) – Read length (base pairs). See
consenrich.core.getReadLength()
.excludeChroms (List[str]) – List of chromosomes to exclude from the analysis.
chromSizesFile (str) – Path to the chromosome sizes file.
samThreads (int) – See
consenrich.core.samParams
.
- Returns:
Scale factor for 1x normalization.
- Return type:
float
- consenrich.detrorm.getScaleFactorPerMillion(bamFile, excludeChroms)[source]¶
Generic normalization factor based on number of mapped reads in non-excluded chromosomes.
- Parameters:
bamFile (str) – See
consenrich.core.inputParams
.excludeChroms (List[str]) – List of chromosomes to exclude when counting mapped reads.
- Returns:
Scale factor accounting for number of mapped reads (only).
- Return type:
float
- consenrich.detrorm.getPairScaleFactors(bamFileA, bamFileB, effectiveGenomeSizeA, effectiveGenomeSizeB, readLengthA, readLengthB, excludeChroms, chromSizesFile, samThreads, scaleDown=True)[source]¶
Get scaling constants that normalize two alignment files to each other (e.g. ChIP-seq treatment and control) with respect to sequence coverage.
- Parameters:
bamFileA (str) – Path to the first BAM file.
bamFileB (str) – Path to the second BAM file.
effectiveGenomeSizeA (int) – Effective genome size for the first BAM file.
effectiveGenomeSizeB (int) – Effective genome size for the second BAM file.
readLengthA (int) – Read length for the first BAM file.
readLengthB (int) – Read length for the second BAM file.
excludeChroms (List[str]) – List of chromosomes to exclude from the analysis.
chromSizesFile (str) – Path to the chromosome sizes file.
samThreads (int) – Number of threads to use for reading BAM files.
scaleDown (bool)
- Returns:
A tuple containing the scale factors for the first and second BAM files.
- Return type:
Tuple[float, float]
- consenrich.detrorm.detrendTrack(values, stepSize, detrendWindowLengthBP, useOrderStatFilter, usePolyFilter, detrendTrackPercentile, detrendSavitzkyGolayDegree)[source]¶
Detrend tracks using either an order statistic filter or a polynomial filter.
- Parameters:
values (np.ndarray) – Values to detrend.
stepSize (int) – see
consenrich.core.countingParams
.detrendWindowLengthBP (int) – See
consenrich.core.detrendParams
.useOrderStatFilter (bool) – Whether to use a sliding order statistic filter.
usePolyFilter (bool) – Whether to use a sliding polynomial/least squares filter.
detrendTrackPercentile (float) – Percentile to use for the order statistic filter.
detrendSavitzkyGolayDegree (int) – Degree of the polynomial for the Savitzky-Golay/Polynomial filter.
- Returns:
Detrended values.
- Return type:
np.ndarray
- Raises:
ValueError – If the detrend window length is not greater than 3 times the step size or if the values length is less than the detrend window length.
Genomic resources and constants: consenrich.constants¶
Note
This module is provided for convenience. If a genome is not listed here, users can still specify resources manually. For any genome assembly, you can provide the following manually.
- consenrich.constants.getEffectiveGenomeSize(genome, readLength)[source]¶
Get the effective genome size for a given genome and read length.
- Parameters:
genome (str) – Name of the genome. See
consenrich.constants.resolveGenomeName()
andconsenrich.core.genomeParams
.readLength (int) – Length of the reads. See
consenrich.core.getReadLength()
.
- Raises:
ValueError – If the genome is not recognized or if the read length is not available for the genome.
- Returns:
Effective genome size in base pairs.
- Return type:
int
- consenrich.constants.getGenomeResourceFile(genome, fileType, dir_='data')[source]¶
Get the path to a genome resource file.
- Parameters:
genome (str) – the genome assembly. See
consenrich.constants.resolveGenomeName()
andconsenrich.core.genomeParams
.fileType (str) – One of ‘sizes’, ‘blacklist’, ‘sparse’.
dir_ (str)
- Returns:
Path to the resource file.
- Return type:
str
- Raises:
ValueError – If not a sizes, blacklist, or sparse file.
FileNotFoundError – If the resource file does not exist.
- consenrich.constants.resolveGenomeName(genome)[source]¶
Standardize the genome name for consistency :param genome: Name of the genome. See
consenrich.core.genomeParams
. :type genome: str :return: Standardized genome name. :rtype: str :raises ValueError: If the genome is not recognized.- Parameters:
genome (str)
- Return type:
str
Cython functions: consenrich.cconsenrich¶
Several functions are implemented in Cython for efficiency in the main loop and during matrix construction.
- consenrich.cconsenrich.creadBamSegment(bamFile, chromosome, start, end, stepSize, readLength, oneReadPerBin, samThreads, samFlagExclude)¶
Count reads in a BAM file for a given chromosome and range, returning a numpy array of counts.
See
consenrich.core.readBamSegments()
for the multi-sample python wrapper- Parameters:
bamFile (str) – See
consenrich.core.inputParams
.chromosome (str) – Chromosome name.
start (uint32_t) – Start position of the range in base pairs.
end (uint32_t) – End position of the range in base pairs.
stepSize (uint32_t) – Size of the bins to count reads in.
readLength (uint32_t) – Length of the reads. If greater than stepSize, counts reads in bins defined by read start and end positions. See
consenrich.core.getReadLength()
.oneReadPerBin (uint8_t) – See
consenrich.core.samParams
.samThreads (uint16_t) – See
consenrich.core.samParams
.samFlagExclude (uint16_t) – See
consenrich.core.samParams
.
- Returns:
A numpy array of counts for each bin in the specified range.
- Return type:
cnp.ndarray[cnp.uint32_t, ndim=1]
- consenrich.cconsenrich.cinvertMatrixE(muncMatrixIter, priorCovarianceOO)¶
Invert the residual covariance matrix during the forward pass.
- Parameters:
muncMatrixIter (cnp.ndarray[cnp.float64_t, ndim=1]) – The diagonal elements of the covariance matrix at a given genomic interval.
priorCovarianceOO (cnp.float64_t) – The a priori ‘primary’ state variance \(P_{[i|i-1,11]}\).
- Returns:
The inverted covariance matrix.
- Return type:
cnp.ndarray[cnp.float64_t, ndim=2]
- consenrich.cconsenrich.updateProcessNoiseCovariance(matrixQ, matrixQCopy, dStat, dStatAlpha, dStatd, dStatPC, inflatedQ, maxQ, minQ)¶
Adjust process noise covariance matrix \(\mathbf{Q}_{[i]}\)
- Parameters:
matrixQ – Current process noise covariance
matrixQCopy – A copy of the initial original covariance matrix \(\mathbf{Q}_{[.]}\)
inflatedQ – Flag indicating if the process noise covariance is inflated
- Returns:
Updated process noise covariance matrix and inflated flag
- Return type:
tuple
- consenrich.cconsenrich.csampleBlockStats(values, expectedBlockSize, iters, randSeed)¶
Sample contiguous blocks in the response sequence, record maxima, and repeat.
Used to determine significance threshold in the response sequence. See
consenrich.matching.matchWavelet()
- Parameters:
values (cnp.ndarray[cnp.float64_t, ndim=1]) – The response sequence to sample from.
expectedBlockSize (int) – The expected size (geometric) of the blocks to sample.
iters (int) – The number of blocks to sample.
randSeed (int) – Random seed for reproducibility.
- Returns:
An array of sampled block maxima.
- Return type:
cnp.ndarray[cnp.float64_t, ndim=1]