Quantitative Biology

PyPeakRankR builds reproducible feature tables to rank ATAC‑seq peaks

June 17, 2026 | arXiv: 2606.18179v1

This paper presents PyPeakRankR, an open‑source Python tool that gathers many quantitative measures for genomic “peaks” into a single, reproducible table. Peaks are candidate regulatory DNA regions identified by assays such as ATAC‑seq (which finds open chromatin). The authors say labs often use different ad hoc scripts to compute these features, which makes it hard to compare or reproduce peak prioritization across studies. PyPeakRankR aims to standardize the upstream step: turning raw signal and sequence files into a portable peak-by-feature matrix.

At a high level, the tool reads peak coordinates, extracts several types of features for each peak, and writes them to a tab-separated values (TSV) file. Features include BigWig signal summaries (BigWig is a common file format for continuous genomic signals), GC content (the fraction of G or C bases in the sequence), PhyloP conservation scores (per-base scores that measure evolutionary conservation), and signal distribution moments such as kurtosis (how peaked or heavy-tailed the signal distribution is), skewness (asymmetry), and a measure of bimodality. The package also computes cell‑type specificity rankings. PyPeakRankR separates deterministic feature extraction from ranking: the same inputs always produce the same feature table, and users can then apply different ranking strategies to that one table.
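To make the feature types above concrete, here is a minimal sketch of how a peak-by-feature table could be assembled. It is not PyPeakRankR's actual code or API: the function names, the column names, and the toy input are hypothetical, the signal values stand in for what would normally be read from a BigWig file, and Sarle's bimodality coefficient (population form) is used only as one plausible "measure of bimodality", since the summary does not say which one the package computes.

```python
import csv
import io
import math

def gc_content(seq):
    """Fraction of G or C among unambiguous (A/C/G/T) bases; NaN if none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt

def shape_moments(signal):
    """Return (skewness, excess kurtosis, bimodality) of the signal values.

    Uses population (biased) central moments. Bimodality is Sarle's
    coefficient, (skew^2 + 1) / kurtosis, an assumed choice of measure.
    """
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((x - mean) ** 2 for x in signal) / n
    if m2 == 0:  # flat signal: shape statistics are undefined
        return float("nan"), float("nan"), float("nan")
    m3 = sum((x - mean) ** 3 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    bimodality = (skew ** 2 + 1.0) / (excess_kurtosis + 3.0)
    return skew, excess_kurtosis, bimodality

# Hypothetical column layout for the output table.
COLUMNS = ["peak_id", "gc_content", "signal_mean",
           "skewness", "kurtosis", "bimodality"]

def build_feature_rows(peaks):
    """peaks: iterable of dicts with 'peak_id', 'sequence', and 'signal'."""
    rows = []
    for peak in peaks:
        skew, kurt, bimod = shape_moments(peak["signal"])
        rows.append({
            "peak_id": peak["peak_id"],
            "gc_content": round(gc_content(peak["sequence"]), 4),
            "signal_mean": round(sum(peak["signal"]) / len(peak["signal"]), 4),
            "skewness": round(skew, 4),
            "kurtosis": round(kurt, 4),
            "bimodality": round(bimod, 4),
        })
    return rows

def to_tsv(rows):
    """Serialize rows to TSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Toy input: made-up sequences and per-base signal values.
toy_peaks = [
    {"peak_id": "peak_1", "sequence": "GGCCAATT", "signal": [0, 0, 1, 1]},
    {"peak_id": "peak_2", "sequence": "GCGCGCAT", "signal": [1, 2, 8, 2, 1]},
]
feature_tsv = to_tsv(build_feature_rows(toy_peaks))
print(feature_tsv)
```

Because every step is a pure function of its inputs and the column order is fixed, running the sketch twice on the same peaks yields byte-identical TSV text, which is the reproducibility property the summary describes.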

Why this matters: many downstream experiments need a short, reliable list of peaks to test in the lab. By producing a standard, portable matrix of biologically motivated features, PyPeakRankR makes it easier to compare ranking methods and to choose candidates for experiments such as enhancer discovery or adeno-associated virus (AAV) tool design. The authors report validation evidence: the R predecessor, PeakRankR, ranked among the top three of 16 methods in a BRAIN Initiative Cell Census Network (BICCN) community challenge for predicting cell‑type‑specific enhancers. In a recent basal ganglia study, the PyPeakRankR workflow was used in a Cross‑species Enhancer Ranking Pipeline (CERP) and helped identify enhancer‑AAV tools with over 70% on‑target specificity, with some enhancers exceeding 90%.
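The idea of comparing ranking methods on one fixed feature table can be illustrated with a short sketch. Everything here is an assumption made for illustration: the column names, the values, and both ranking strategies are hypothetical and are not taken from PyPeakRankR, PeakRankR, or CERP.

```python
import csv
import io

# Hypothetical feature table, as it might be read back from a TSV file.
FEATURE_TSV = "\n".join([
    "\t".join(["peak_id", "signal_mean", "phylop_mean", "specificity"]),
    "\t".join(["peak_a", "12.0", "0.8", "0.95"]),
    "\t".join(["peak_b", "30.0", "0.1", "0.40"]),
    "\t".join(["peak_c", "18.0", "0.6", "0.90"]),
]) + "\n"

def load_table(text):
    """Parse TSV text into row dicts, converting feature columns to float."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {key: (value if key == "peak_id" else float(value))
         for key, value in row.items()}
        for row in reader
    ]

def rank_by_feature(rows, feature):
    """Strategy 1: order peaks by a single feature, highest first."""
    ordered = sorted(rows, key=lambda row: row[feature], reverse=True)
    return [row["peak_id"] for row in ordered]

def rank_by_rank_sum(rows, features):
    """Strategy 2: sum each peak's per-feature rank; lowest total wins.

    Ordering by rank sum is the same as ordering by mean rank.
    Ties keep the order in which peaks appear in the table.
    """
    totals = {row["peak_id"]: 0 for row in rows}
    for feature in features:
        for position, peak_id in enumerate(rank_by_feature(rows, feature), 1):
            totals[peak_id] += position
    return sorted(totals, key=lambda peak_id: totals[peak_id])

rows = load_table(FEATURE_TSV)
by_signal = rank_by_feature(rows, "signal_mean")
by_consensus = rank_by_rank_sum(
    rows, ["signal_mean", "phylop_mean", "specificity"])
print(by_signal)     # ['peak_b', 'peak_c', 'peak_a']
print(by_consensus)  # ['peak_a', 'peak_c', 'peak_b']
```

The two strategies disagree on the top candidate even though they read the same table, which is why keeping feature extraction fixed makes such comparisons meaningful: any difference in the final list comes from the ranking rule alone.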