Consenrich

Consenrich is an adaptive linear filter for genome-wide estimation of signals hidden in noisy multi-sample HTS datasets. A corresponding manuscript preprint is available on bioRxiv. Consenrich supports a variety of functional genomics assays including ATAC-seq, DNase-seq, ChIP-seq, CUT&RUN, etc.

Simplified schematic of Consenrich

Usage

A brief but nontrivial analysis is carried out below for demonstration.

Publicly available data from ENCODE (H3K27ac ChIP-seq data) is used in this analysis.

Input Data

The input dataset consists of four donors’ treatment and control epidermal samples.

Input Data

Experiment

Biosample

H3K27ac Alignment

Control Alignment

ENCSR214UZE

Epidermis/Female/71

ENCFF793ZHL.bam

ENCFF444WVG.bam

ENCSR334DRN

Epidermis/Male/67

ENCFF647VPO.bam

ENCFF619NYP.bam

ENCSR340ZTB

Epidermis/Female/80

ENCFF809VKT.bam

ENCFF898LKJ.bam

ENCSR386CKJ

Epidermis/Male/75

ENCFF295EFL.bam

ENCFF490MWV.bam

Downloading alignment files

Run the following shell script to obtain the input BAM files for this demo. You can also use curl -O <URL> in place of wget <URL>.

encodeFiles=https://www.encodeproject.org/files

for file in ENCFF793ZHL ENCFF647VPO ENCFF809VKT ENCFF295EFL; do
    wget "$encodeFiles/$file/@@download/$file.bam"
done

for ctrl in ENCFF444WVG ENCFF619NYP ENCFF898LKJ ENCFF490MWV; do
    wget "$encodeFiles/$ctrl/@@download/$ctrl.bam"
done

samtools index -M *.bam

Using a YAML Configuration file

Now, with the needed alignment files downloaded and indexed, we can copy and paste the following YAML into a file named demoHistoneChIPSeq.yaml. Feel free to tinker with parameters (<process,observation,etc.>Params), but we will use defaults in the demo.

Note that Consenrich can be run programmatically using the API instead of a command-line interface (CLI) used below.

For a quick trial (\(\approx\) 1 minute), you can restrict analysis to chromosome 22 and still reproduce the results shown in the IGV browser snapshot below by specifying genomeParams.chromosomes: ['chr22'].

experimentName: demoHistoneChIPSeq
genomeParams.name: hg38
genomeParams.excludeForNorm: ['chrX', 'chrY']
inputParams.bamFiles: [ENCFF793ZHL.bam,
ENCFF647VPO.bam,
ENCFF809VKT.bam,
ENCFF295EFL.bam]
inputParams.bamFilesControl: [ENCFF444WVG.bam,
ENCFF619NYP.bam,
ENCFF898LKJ.bam,
ENCFF490MWV.bam]

Control inputs. For assays like ATAC-seq, DNase-seq, or CUT&RUN, control samples are optional: simply omit inputParams.bamFilesControl.

Run Consenrich

With the config file saved, we can invoke the command-line interface to run Consenrich:

consenrich --config demoHistoneChIPSeq.yaml

Output bedGraph/bigWig files will be saved to the current working directory, prefixed by consenrichOutput_<experimentName>.

IGV snapshot: demoHistoneChIPSeq

Output Consenrich Signal Estimates
  • The Consenrich-estimated signal track is displayed in the top row (blue) over a generic 25kb region of chromosome 22.

  • For reference, we also show ENCODE’s fold change over control bigwig tracks for each sample (red) and the input treatment/control alignments (black).

Note

Refer to the <process,observation,etc.>Params classes in module consenrich.core for complete documentation configuration options.

Installation

From PyPI

Consenrich distributes multiple wheels on PyPI for different Python versions and platforms. To install the latest version, run:

python -m pip install consenrich

This is a more convenient but potentially less flexible option across platforms and Python versions. See below to build from source.

Build from source

Building and installing from source is recommended to ensure compatibility across platforms and Python versions.

To build from source, first ensure you have the necessary package development tools

python -m pip install setuptools wheel Cython build

If you prefer to use a virtual environment, see the YAML file consenrichEnv. This can be used to create an isolated setup with all required dependencies using conda or mamba.

Clone the repository:

git clone https://github.com/nolan-h-hamilton/Consenrich.git

Build and install the package:

cd Consenrich
python -m build
python -m pip install .

Previous versions

To install a specific version of Consenrich from PyPI, you can specify the version number in the pip install command, for example:

python -m pip install consenrich==0.1.13b1

API Reference

Core API: consenrich.core

The core module implements the main aspects of Consenrich.

class consenrich.core.processParams(deltaF, minQ, maxQ, offDiagQ, dStatAlpha, dStatd, dStatPC)[source]

Parameters related to the process model of Consenrich. The process model governs the signal and variance propagation through the state transition \(\mathbf{F} \in \mathbb{R}^{2 \times 2}\) and process noise covariance \(\mathbf{Q}_{[i]} \in \mathbb{R}^{2 \times 2}\) matrices.

Parameters:
  • deltaF (float) – Propagation length. Informally, how far forward/backward to project the estimate and covariance at the previous genomic interval to obtain the initial prediction of the state and covariance at the current genomic interval \(i\): \(x_{[i|i-1]}\) and covariance \(\mathbf{P}_{[i|i-1]}\).

  • minQ (float) – Minimum process noise variance (diagonal in \(\mathbf{Q}_{[i]}\)) for each state variable.

  • maxQ (float) – Maximum process noise variance (diagonal in \(\mathbf{Q}_{[i]}\)) for each state variable.

  • offDiagQ (float) – Initial off-diagonal noise covariance between states.

  • dStatAlpha (float) – Innovation-based model mismatch threshold \(\alpha_D\). If we observe \(D_{[i]} > \alpha_D\), we consider the process model to be unreliable and therefore scale-up the process noise covariance to favor the observation model (the data) instead.

  • dStatd (float) – Constant \(d\) in the scaling expression \(\sqrt{d|D_{[i]} - \alpha_D| + c}\) that is used to up/down-scale the process noise covariance in the event of a model mismatch.

  • dStatPC (float) – Constant \(c\) in the scaling expression \(\sqrt{d|D_{[i]} - \alpha_D| + c}\) that is used to up/down-scale the process noise covariance in the event of a model mismatch.

class consenrich.core.observationParams(minR, maxR, useALV, useConstantNoiseLevel, noGlobal, numNearest, localWeight, globalWeight, approximationWindowLengthBP, lowPassWindowLengthBP, returnCenter)[source]

Parameters related to the observation model of Consenrich. The observation model is used to integrate sequence alignment count data from the multiple input samples and account for region-and-sample-specific noise processes corrupting data. The observation model matrix \(\mathbf{H} \in \mathbb{R}^{m \times 2}\) maps from the state dimension (2) to the dimension of measurements/data (\(m\)).

Parameters:
  • minR (float) – The minimum observation noise variance for each sample \(j=1\ldots m\) in the observation noise covariance matrix \(\mathbf{R}_{[i, (11:mm)]}\).

  • maxR (float) – The maximum observation noise variance for each sample \(j=1\ldots m\) in the observation noise covariance matrix \(\mathbf{R}_{[i, (11:mm)]}\).

  • numNearest (int) – The number of nearest nearby sparse features to use for local variance calculation.

  • localWeight (float) – The weight for the local variance in the observation model.

  • globalWeight (float) – The weight for the global noise level in the observation model.

  • approximationWindowLengthBP (int) – The length of the approximation window in base pairs (BP) for the local variance calculation.

  • sparseBedFile (str, optional) – The path to a BED file of ‘sparse’ regions for the local variance calculation.

  • noGlobal (bool) – If True, only the ‘local’ variances are used to approximate observation noise covariance \(\mathbf{R}_{[:, (11:mm)]}\).

  • useALV (bool) – Whether to use average local variance (ALV) to approximate observation noise covariances per-sample, per-interval.

  • useConstantNoiseLevel (bool) – Whether to use a constant noise level in the observation model.

  • lowPassWindowLengthBP (int)

  • returnCenter (bool)

class consenrich.core.stateParams(stateInit, stateCovarInit, boundState, stateLowerBound, stateUpperBound)[source]

Parameters related to state and uncertainty bounds and initialization.

Parameters:
  • stateInit (float) – Initial (primary) state estimate at the first genomic interval: \(x_{[1]}\)

  • stateCovarInit (float) – Initial state covariance (covariance) scale. The state uncertainty \(\mathbf{P_{[1]}}\) is a multiple of \(\mathbf{I}_2\)

  • boundState (bool)

  • stateLowerBound (float)

  • stateUpperBound (float)

Param:

boundState: If True, the primary state estimate for \(x_{[i]}\) is constrained within stateLowerBound and stateUpperBound.

class consenrich.core.detrendParams(useOrderStatFilter, usePolyFilter, detrendTrackPercentile, detrendSavitzkyGolayDegree, detrendWindowLengthBP)[source]

Parameters related detrending and background-removal

Parameters:
  • useOrderStatFilter (bool) – Whether to use order statistics for filtering the read density data.

  • usePolyFilter (bool) – Whether to use polynomial fitting for filtering the read density data.

  • detrendSavitzkyGolayDegree (int) – The polynomial degree of the Savitzky-Golay filter to use for detrending

  • detrendTrackPercentile (float) – The percentile to use for detrending the read density data.

  • detrendWindowLengthBP (int) – The length of the window in base pairs for detrending the read density data.

class consenrich.core.inputParams(bamFiles, bamFilesControl)[source]

Parameters related to the input data for Consenrich.

Parameters:
  • bamFiles (List[str]) – A list of paths to distinct coordinate-sorted and indexed BAM files.

  • bamFilesControl (List[str], optional) – A list of paths to distinct coordinate-sorted and indexed control BAM files. e.g., IgG control inputs for ChIP-seq.

class consenrich.core.genomeParams(genomeName, chromSizesFile, blacklistFile, sparseBedFile, chromosomes, excludeChroms, excludeForNorm)[source]
Parameters:
  • genomeName (str)

  • chromSizesFile (str)

  • blacklistFile (str | None)

  • sparseBedFile (str | None)

  • chromosomes (List[str])

  • excludeChroms (List[str])

  • excludeForNorm (List[str])

class consenrich.core.countingParams(stepSize, scaleDown, scaleFactors, scaleFactorsControl, numReads)[source]

Parameters related to counting reads in genomic intervals.

Parameters:
  • stepSize (int) – Step size for the genomic intervals.

  • scaleDown (bool, optional) – If using paired treatment and control BAM files, whether to scale down the larger of the two before computing the difference/ratio

  • scaleFactors (List[float], optional) – Scale factors for the read counts.

  • scaleFactorsControl (List[float], optional) – Scale factors for the control read counts.

  • numReads (int) – Number of reads to sample.

class consenrich.core.samParams(samThreads, samFlagExclude, oneReadPerBin, chunkSize)[source]

Parameters related to reading BAM files

Parameters:
  • samThreads (int) – The number of threads to use for reading BAM files.

  • samFlagExclude (int) – The SAM flag to exclude certain reads.

  • oneReadPerBin (int) – If 1, only the interval with the greatest read overlap is incremented.

  • chunkSize (int) – maximum number of intervals’ data to hold in memory before flushing to disk.

Note

For an overview of SAM flags see https://broadinstitute.github.io/picard/explain-flags.html

class consenrich.core.matchingParams(templateNames, cascadeLevels, iters, alpha, minMatchLengthBP, maxNumMatches, minSignalAtMaxima)[source]

Parameters related the experimental wavelet-based matched filter for pattern recognition.

Parameters:
  • templateNames (List[str]) – The names of the templates to match against.

  • cascadeLevels (List[int]) – Templates are currently derived from cascade-approximated wavelets at level=cascadeLevel.

  • iters (int) – Number of iterations to use for sampling block maxima while building the empirical null

  • alpha (float) – Significance level for the empirical null distribution.

  • minMatchLength (int) – Minimum length around response maxima to qualify matches.

  • minMatchLengthBP (int | None)

  • maxNumMatches (int | None)

  • minSignalAtMaxima (float | None)

consenrich.core.getChromRanges(bamFile, chromosome, chromLength, samThreads, samFlagExclude)[source]

Get the start and end positions of reads in a chromosome from a BAM file.

Parameters:
  • bamFile (str) – See inputParams.

  • chromosome (str) – the chromosome to read in bamFile.

  • chromLength (int) – Base pair length of the chromosome.

  • samThreads (int) – See samParams.

  • samFlagExclude (int) – See samParams.

Returns:

Tuple of start and end positions (nucleotide coordinates) in the chromosome.

Return type:

Tuple[int, int]

Seealso:

getChromRangesJoint(), cconsenrich.cgetFirstChromRead(), cconsenrich.cgetLastChromRead()

consenrich.core.getChromRangesJoint(bamFiles, chromosome, chromSize, samThreads, samFlagExclude)[source]

For multiple BAM files, reconcile a single start and end position over which to count reads, where the start and end positions are defined by the first and last reads across all BAM files.

Parameters:
  • bamFiles (List[str]) – List of BAM files to read.

  • chromosome (str) – Chromosome to read.

  • chromSize (int) – Size of the chromosome.

  • samThreads (int) – Number of threads to use for reading the BAM files.

  • samFlagExclude (int) – SAM flag to exclude certain reads.

Returns:

Tuple of start and end positions.

Return type:

Tuple[int, int]

Seealso:

getChromRanges(), cconsenrich.cgetFirstChromRead(), cconsenrich.cgetLastChromRead()

consenrich.core.getReadLength(bamFile, numReads, maxIterations, samThreads, samFlagExclude)[source]

Infer read length from mapped reads in a BAM file. Samples at least numReads reads passing criteria given by samFlagExclude and returns the median read length.

Parameters:
  • bamFile (str) – See inputParams.

  • numReads (int) – Number of reads to sample.

  • maxIterations (int) – Maximum number of iterations to perform.

  • samThreads (int) – See samParams.

  • samFlagExclude (int) – See samParams.

Returns:

The median read length.

Return type:

int

Raises:

ValueError – If the read length cannot be determined after scanning maxIterations reads.

Seealso:

cconsenrich.cgetReadLength()

consenrich.core.readBamSegments(bamFiles, chromosome, start, end, stepSize, readLengths, scaleFactors, oneReadPerBin, samThreads, samFlagExclude)[source]

Calculate tracks of read counts (or a function thereof) for each BAM file.

See cconsenrich.creadBamSegment() for the underlying implementation in Cython.

Parameters:
  • bamFiles (List[str]) – See inputParams.

  • chromosome (str) – Chromosome to read.

  • start (int) – Start position of the genomic segment.

  • end (int) – End position of the genomic segment.

  • readLengths (List[int]) – List of read lengths for each BAM file.

  • scaleFactors (List[float]) – List of scale factors for each BAM file.

  • stepSize (int) – See countingParams.

  • oneReadPerBin (int)

  • samThreads (int)

  • samFlagExclude (int)

Return type:

ndarray[tuple[Any, …], dtype[float64]]

consenrich.core.getAverageLocalVarianceTrack(values, stepSize, approximationWindowLengthBP, lowPassWindowLengthBP, minR, maxR)[source]

Approximate local noise levels in a segment using an ALV approach.

First, computes a segment-length simple moving average of values with a bp-length window approximationWindowLengthBP.

Second, computes a segment-length simple moving average of squared values.

Between these two averages, the difference between the latter and the square of the former approximates the local variance of the segment. These local variances are then combined with a median filter of length lowPassWindowLengthBP.

Parameters:
  • values (np.ndarray) – An array of read-density-based values (typically from a single row in a sample-by-interval matrix)

  • stepSize (int) – See countingParams.

  • observationParams (observationParams) – See observationParams

  • approximationWindowLengthBP (int) – The length of the approximation window in base pairs (BP).

  • lowPassWindowLengthBP (int) – The length of the low-pass filter window in base pairs (BP).

  • minR (float)

  • maxR (float)

Seealso:

observationParams

Return type:

ndarray[tuple[Any, …], dtype[float64]]

consenrich.core.constructMatrixF(deltaF)[source]

Build the state transition matrix for the process model

Parameters:

deltaF (float) – See processParams.

Returns:

The state transition matrix \(\mathbf{F}\)

Return type:

npt.NDArray[np.float64]

Seealso:

processParams

consenrich.core.constructMatrixQ(minDiagQ, offDiagQ=0.0)[source]

Build the initial process noise covariance matrix \(\mathbf{Q}_{[1]}\).

Parameters:
Returns:

The initial process noise covariance matrix \(\mathbf{Q}_{[1]}\).

Return type:

npt.NDArray[np.float64]

Seealso:

processParams

consenrich.core.constructMatrixH(m, coefficients=None)[source]

Build the observation model matrix \(\mathbf{H}\).

Parameters:
  • m (int) – Number of observations.

  • coefficients (Optional[npt.NDArray[np.float64]]) – Optional coefficients for the observation model, which can be used to weight the observations manually.

Returns:

The observation model matrix \(\mathbf{H}\).

Return type:

npt.NDArray[np.float64]

Seealso:

observationParams, class:inputParams

consenrich.core.runConsenrich(matrixData, matrixMunc, deltaF, minQ, maxQ, offDiagQ, dStatAlpha, dStatd, dStatPC, stateInit, stateCovarInit, boundState, stateLowerBound, stateUpperBound, chunkSize, progressIter, coefficientsH=None, residualCovarInversionFunc=None, adjustProcessNoiseFunc=None)[source]

Run consenrich on a contiguous segment (e.g. a chromosome) of read-density-based data. Completes the forward and backward passes given data and approximated observation noise covariance matrices \(\mathbf{R}_{[1:n, (11:mm)]}\).

Parameters:
  • matrixData (npt.NDArray[np.float64]) – Read density data for a single chromosome or general contiguous segment, possibly preprocessed. Two-dimensional array of shape \(m \times n\) where \(m\) is the number of samples/tracks and \(n\) the number of genomic intervals.

  • matrixMunc (npt.NDArray[np.float64]) – Uncertainty estimates for the read density data, e.g. local variance. Two-dimensional array of shape \(m \times n\) where \(m\) is the number of samples/tracks and \(n\) the number of genomic intervals. :seealso: getAverageLocalVarianceTrack(), getMuncTrack().

  • deltaF (float) – See processParams.

  • minQ (float) – See processParams.

  • maxQ (float) – See processParams.

  • offDiagQ (float) – See processParams.

  • dStatAlpha (float) – See processParams.

  • dStatd (float) – See processParams.

  • dStatPC (float) – See processParams.

  • stateInit (float) – See stateParams.

  • stateCovarInit (float) – See stateParams.

  • chunkSize (int) – Number of genomic intervals’ data to keep in memory before flushing to disk.

  • progressIter (int) – The number of iterations after which to log progress.

  • coefficientsH (Optional[npt.NDArray[np.float64]]) – Optional coefficients for the observation model matrix \(\mathbf{H}\). If None, the coefficients are set to 1.0 for all samples.

  • residualCovarInversionFunc (Optional[Callable]) – Callable function to invert the observation covariance matrix \(\mathbf{E}_{[i]}\). If None, defaults to cconsenrich.cinvertMatrixE().

  • adjustProcessNoiseFunc (Optional[Callable]) – Function to adjust the process noise covariance matrix \(\mathbf{Q}_{[i]}\). If None, defaults to cconsenrich.updateProcessNoiseCovariance().

  • boundState (bool)

  • stateLowerBound (float)

  • stateUpperBound (float)

Returns:

Tuple of three numpy arrays: - state estimates \(\widetilde{\mathbf{x}}_{[i]}\) of shape \(n \times 2\) - state covariance estimates \(\widetilde{\mathbf{P}}_{[i]}\) of shape \(n \times 2 \times 2\) - post-fit residuals \(\widetilde{\mathbf{y}}_{[i]}\) of shape \(n \times m\)

Return type:

Tuple[npt.NDArray[np.float64], npt.NDArray[np.float64], npt.NDArray[np.float64]]

Raises:

ValueError – If the number of samples in matrixData is not equal to the number of samples in matrixMunc.

consenrich.core.getPrimaryState(stateVectors, roundPrecision=3)[source]

Get the primary state estimate from each vector after running Consenrich.

Parameters:
  • stateVectors (npt.NDArray[np.float64]) – State vectors from runConsenrich().

  • roundPrecision (int)

Returns:

A one-dimensional numpy array of the primary state estimates.

Return type:

npt.NDArray[np.float64]

consenrich.core.getStateCovarTrace(stateCovarMatrices, roundPrecision=3)[source]

Get a one-dimensional array of state covariance traces after running Consenrich

Parameters:
  • stateCovarMatrices (npt.NDArray[np.float64]) – Estimated state covariance matrices \(\widetilde{\mathbf{P}}_{[i]}\)

  • roundPrecision (int)

Returns:

A one-dimensional numpy array of the traces of the state covariance matrices.

Return type:

npt.NDArray[np.float64]

consenrich.core.getPrecisionWeightedResidual(postFitResiduals, matrixMunc, roundPrecision=3)[source]

Get a one-dimensional precision-weighted array residuals after running Consenrich.

This is essentially an estimate of the residuals with respect to the observation noise covariance \(\mathbf{R}_{[:, (11:mm)]}\).

Applies an inverse-variance weighting (with respect to the observation noise levels) of the post-fit residuals \(\widetilde{\mathbf{y}}_{[i]}\) and returns a one-dimensional array of “precision-weighted residuals”.

Parameters:
  • postFitResiduals (npt.NDArray[np.float64]) – Post-fit residuals from runConsenrich().

  • matrixMunc (npt.NDArray[np.float64]) – an \(m \times n\) numpy array where each column stores the diagonal entries of the observation noise covariance matrix \(\mathbf{R}_{[:, (11:mm)]}\) for each sample \(j=1,2,\ldots,m\) and each genomic interval \(i=1,2,\ldots,n\).

  • roundPrecision (int)

Returns:

A one-dimensional array of “precision-weighted residuals”

Return type:

npt.NDArray[np.float64]

consenrich.core.getMuncTrack(chromosome, intervals, stepSize, rowValues, minR, maxR, useALV, useConstantNoiseLevel, noGlobal, localWeight, globalWeight, approximationWindowLengthBP, lowPassWindowLengthBP, returnCenter, sparseMap=None)[source]

Get observation noise variance \(R_{[:,jj]}\) for the sample \(j\).

Parameters:
Returns:

A one-dimensional numpy array of the observation noise track for the sample \(j\).

Return type:

npt.NDArray[np.float64]

consenrich.core.sparseIntersection(chromosome, intervals, sparseBedFile)[source]

If using an annotation of sparse features to complement approximation of observation noise levels, this function returns intervals in the chromosome that overlap with the sparse features. :param chromosome: The chromosome name. :type chromosome: str :param intervals: The genomic intervals to consider. :type intervals: np.ndarray :param sparseBedFile: Path to the sparse BED file. :type sparseBedFile: str :return: A numpy array of start positions of the sparse features that overlap with the intervals :rtype: np.ndarray[Tuple[Any], np.dtype[Any]]

Parameters:
  • chromosome (str)

  • intervals (ndarray)

  • sparseBedFile (str)

Return type:

ndarray[tuple[Any, …], dtype[int64]]

consenrich.core.adjustFeatureBounds(feature, stepSize)[source]

Adjust the start and end positions of a BED feature to be centered around a step.

Parameters:
  • feature (Interval)

  • stepSize (int)

Return type:

Interval

consenrich.core.getSparseMap(chromosome, intervals, numNearest, sparseBedFile)[source]

Build a map between each genomic interval and numNearest sparse features

Parameters:
  • chromosome (str) – The chromosome name. Note, this function only needs to be run once per chromosome.

  • intervals (npt.NDArray[np.int64]) – The genomic intervals to map.

  • numNearest (int) – The number of nearest sparse features to consider

  • sparseBedFile (str) – path to the sparse BED file.

Returns:

A dictionary mapping each interval index to the indices of the nearest sparse regions.

Return type:

dict[int, np.ndarray]

Background/Trend Removal and Scaling: consenrich.detrorm

Note

See consenrich.core.detrendParams for relevant parameters.

consenrich.detrorm.getScaleFactor1x(bamFile, effectiveGenomeSize, readLength, excludeChroms, chromSizesFile, samThreads)[source]

Generic normalization factor based on effective genome size and number of mapped reads in non-excluded chromosomes.

Parameters:
Returns:

Scale factor for 1x normalization.

Return type:

float

consenrich.detrorm.getScaleFactorPerMillion(bamFile, excludeChroms)[source]

Generic normalization factor based on number of mapped reads in non-excluded chromosomes.

Parameters:
  • bamFile (str) – See consenrich.core.inputParams.

  • excludeChroms (List[str]) – List of chromosomes to exclude when counting mapped reads.

Returns:

Scale factor accounting for number of mapped reads (only).

Return type:

float

consenrich.detrorm.getPairScaleFactors(bamFileA, bamFileB, effectiveGenomeSizeA, effectiveGenomeSizeB, readLengthA, readLengthB, excludeChroms, chromSizesFile, samThreads, scaleDown=True)[source]

Get scaling constants that normalize two alignment files to each other (e.g. ChIP-seq treatment and control) with respect to sequence coverage.

Parameters:
  • bamFileA (str) – Path to the first BAM file.

  • bamFileB (str) – Path to the second BAM file.

  • effectiveGenomeSizeA (int) – Effective genome size for the first BAM file.

  • effectiveGenomeSizeB (int) – Effective genome size for the second BAM file.

  • readLengthA (int) – Read length for the first BAM file.

  • readLengthB (int) – Read length for the second BAM file.

  • excludeChroms (List[str]) – List of chromosomes to exclude from the analysis.

  • chromSizesFile (str) – Path to the chromosome sizes file.

  • samThreads (int) – Number of threads to use for reading BAM files.

  • scaleDown (bool)

Returns:

A tuple containing the scale factors for the first and second BAM files.

Return type:

Tuple[float, float]

consenrich.detrorm.detrendTrack(values, stepSize, detrendWindowLengthBP, useOrderStatFilter, usePolyFilter, detrendTrackPercentile, detrendSavitzkyGolayDegree)[source]

Detrend tracks using either an order statistic filter or a polynomial filter.

Parameters:
  • values (np.ndarray) – Values to detrend.

  • stepSize (int) – see consenrich.core.countingParams.

  • detrendWindowLengthBP (int) – See consenrich.core.detrendParams.

  • useOrderStatFilter (bool) – Whether to use a sliding order statistic filter.

  • usePolyFilter (bool) – Whether to use a sliding polynomial/least squares filter.

  • detrendTrackPercentile (float) – Percentile to use for the order statistic filter.

  • detrendSavitzkyGolayDegree (int) – Degree of the polynomial for the Savitzky-Golay/Polynomial filter.

Returns:

Detrended values.

Return type:

np.ndarray

Raises:

ValueError – If the detrend window length is not greater than 3 times the step size or if the values length is less than the detrend window length.

Genomic resources and constants: consenrich.constants

Note

This module is provided for convenience. If a genome is not listed here, users can still specify resources manually. For any genome assembly, you can provide the following manually.

consenrich.constants.getEffectiveGenomeSize(genome, readLength)[source]

Get the effective genome size for a given genome and read length.

Parameters:
Raises:

ValueError – If the genome is not recognized or if the read length is not available for the genome.

Returns:

Effective genome size in base pairs.

Return type:

int

consenrich.constants.getGenomeResourceFile(genome, fileType, dir_='data')[source]

Get the path to a genome resource file.

Parameters:
Returns:

Path to the resource file.

Return type:

str

Raises:
  • ValueError – If not a sizes, blacklist, or sparse file.

  • FileNotFoundError – If the resource file does not exist.

consenrich.constants.resolveGenomeName(genome)[source]

Standardize the genome name for consistency :param genome: Name of the genome. See consenrich.core.genomeParams. :type genome: str :return: Standardized genome name. :rtype: str :raises ValueError: If the genome is not recognized.

Parameters:

genome (str)

Return type:

str

Cython functions: consenrich.cconsenrich

Several functions are implemented in Cython for efficiency in the main loop and during matrix construction.

consenrich.cconsenrich.creadBamSegment(bamFile, chromosome, start, end, stepSize, readLength, oneReadPerBin, samThreads, samFlagExclude)

Count reads in a BAM file for a given chromosome and range, returning a numpy array of counts.

See consenrich.core.readBamSegments() for the multi-sample python wrapper

Parameters:
Returns:

A numpy array of counts for each bin in the specified range.

Return type:

cnp.ndarray[cnp.uint32_t, ndim=1]

consenrich.cconsenrich.cinvertMatrixE(muncMatrixIter, priorCovarianceOO)

Invert the residual covariance matrix during the forward pass.

Parameters:
  • muncMatrixIter (cnp.ndarray[cnp.float64_t, ndim=1]) – The diagonal elements of the covariance matrix at a given genomic interval.

  • priorCovarianceOO (cnp.float64_t) – The a priori ‘primary’ state variance \(P_{[i|i-1,11]}\).

Returns:

The inverted covariance matrix.

Return type:

cnp.ndarray[cnp.float64_t, ndim=2]

consenrich.cconsenrich.updateProcessNoiseCovariance(matrixQ, matrixQCopy, dStat, dStatAlpha, dStatd, dStatPC, inflatedQ, maxQ, minQ)

Adjust process noise covariance matrix \(\mathbf{Q}_{[i]}\)

Parameters:
  • matrixQ – Current process noise covariance

  • matrixQCopy – A copy of the initial original covariance matrix \(\mathbf{Q}_{[.]}\)

  • inflatedQ – Flag indicating if the process noise covariance is inflated

Returns:

Updated process noise covariance matrix and inflated flag

Return type:

tuple

consenrich.cconsenrich.csampleBlockStats(values, expectedBlockSize, iters, randSeed)

Sample contiguous blocks in the response sequence, record maxima, and repeat.

Used to determine significance threshold in the response sequence. See consenrich.matching.matchWavelet()

Parameters:
  • values (cnp.ndarray[cnp.float64_t, ndim=1]) – The response sequence to sample from.

  • expectedBlockSize (int) – The expected size (geometric) of the blocks to sample.

  • iters (int) – The number of blocks to sample.

  • randSeed (int) – Random seed for reproducibility.

Returns:

An array of sampled block maxima.

Return type:

cnp.ndarray[cnp.float64_t, ndim=1]