PeakATail

PeakATail detects poly(A) sites (PAS) at single-cell resolution from any scRNA-seq BAM that carries cell-barcode (CB) and UMI (UB) tags — STARsolo, CellRanger, Alevin-fry, or any aligner that emits the standard 10x-style tag schema. It clusters cells by their 3' UTR usage patterns and tests for differential alternative polyadenylation (APA) between cell types or conditions. The tool is packaged as the ema CLI, installable via pip or uv.

Input requirements

PeakATail does not correct cell barcodes — your aligner must already have applied a barcode whitelist. The BAM must carry CB:Z (corrected barcode) and UB:Z (corrected UMI) tags. See Preparing your BAM below for the recommended STAR/STARsolo command.
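Before a long run, it can be worth confirming that the tags are really there. The sketch below is not part of PeakATail: it is a small, hypothetical helper (`missing_tag_counts`) that scans SAM text records, for example the output of `samtools view sample.bam | head -n 10000`, and counts how many lack a CB:Z or UB:Z tag. The three records are made up for illustration.

```python
REQUIRED_TAGS = ("CB", "UB")

def missing_tag_counts(sam_lines, required=REQUIRED_TAGS):
    """Return (number of records, {tag: records lacking that string tag})."""
    missing = {tag: 0 for tag in required}
    n_records = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        n_records += 1
        fields = line.rstrip("\n").split("\t")
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        present = {f[:2] for f in fields[11:] if f[2:5] == ":Z:"}
        for tag in required:
            if tag not in present:
                missing[tag] += 1
    return n_records, missing

# Made-up records: r1 has both tags, r2 lacks UB, r3 has only a raw CR tag.
records = [
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAACCCAAGAAACACT-1\tUB:Z:ACGTACGTAC",
    "r2\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAACCCAAGAAACACT-1",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF\tCR:Z:NNNN",
]

n_records, missing = missing_tag_counts(records)
print(n_records, missing)
```

A modest fraction of untagged records is usually normal, since aligners leave reads with uncorrectable barcodes untagged. If nearly every record is missing CB or UB, the BAM was probably produced without the tag options described under Preparing your BAM.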

Why PeakATail

  • Discover APA isoforms per cell type. Peak-calling runs per-strand on sorted BAM files to identify PAS coordinates; each PAS is associated with its nearest gene via GTF-based UTR lookup.
  • Cluster cells by 3' UTR usage. PAS-by-cell count matrices are processed through TF-IDF + LSI dimensionality reduction and Leiden community detection so clusters reflect 3' isoform choice, not total expression level.
  • Identify cell-type-specific PAS switching. ema switch diff runs Fisher's exact tests or negative binomial regression across every cluster pair and reports differentially used PAS with FDR control.
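To illustrate why TF-IDF weighting makes clusters reflect isoform choice rather than sequencing depth, here is a minimal pure-Python sketch. It assumes one common TF-IDF variant (per-cell term frequency multiplied by a log-scaled inverse document frequency); the exact formula PeakATail uses is not stated here, and the function name `tfidf` and the toy matrix are illustrative only. The LSI (truncated SVD) and Leiden steps are left out.

```python
import math

def tfidf(counts):
    """counts[i][j] is the UMI count of PAS i in cell j; returns weights of the same shape."""
    n_pas = len(counts)
    n_cells = len(counts[0])
    # Total counts per cell, used to turn raw counts into within-cell fractions (TF).
    cell_totals = [sum(counts[i][j] for i in range(n_pas)) for j in range(n_cells)]
    weights = []
    for row in counts:
        n_detected = sum(1 for c in row if c > 0)
        # IDF: a PAS seen in few cells gets more weight than a ubiquitous one.
        idf = math.log(1 + n_cells / n_detected) if n_detected else 0.0
        weights.append([
            (c / cell_totals[j]) * idf if cell_totals[j] else 0.0
            for j, c in enumerate(row)
        ])
    return weights

# Toy matrix: 3 PAS (rows) x 4 cells (columns).
counts = [
    [10, 20, 0, 0],   # PAS used only by cells 0 and 1
    [0, 0, 5, 10],    # PAS used only by cells 2 and 3
    [10, 20, 5, 10],  # PAS used by every cell
]

weights = tfidf(counts)
for row in weights:
    print([round(w, 3) for w in row])
```

Cells 0 and 1 differ twofold in depth but end up with identical weights, because each uses its PAS in the same proportions. The ubiquitous third PAS is down-weighted relative to the cell-group-specific ones.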

Install

PeakATail requires Python 3.11 or later, plus samtools and bedtools on your PATH.

# with pip
pip install peakatail

# or with uv
uv pip install peakatail

# or from source
git clone https://github.com/BMGLab/PeakATail.git
cd PeakATail
uv sync          # installs all deps from uv.lock
# or: pip install -e .

System dependencies

sudo apt-get install samtools bedtools   # Debian / Ubuntu

Preparing your BAM

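PeakATail reads only what is already in your BAM: it does not correct barcodes, demultiplex reads, or align FASTQs. You need a coordinate-sorted BAM with the standard 10x-style CB:Z (corrected cell barcode) and UB:Z (corrected UMI) tags. The recommended STARsolo invocation, which matches the protocol the reference benchmarks were run on, is shown below. Its barcode geometry (16 bp barcode, 10 bp UMI) and whitelist correspond to 10x 3' v2 chemistry; for v3, set --soloUMIlen 12 and use 3M-february-2018.txt.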

STAR \
    --runThreadN 16 \
    --genomeDir <STAR_INDEX> \
    --readFilesIn R2.fastq.gz R1.fastq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate \
    --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM \
    --soloType CB_UMI_Simple \
    --soloCBstart 1  --soloCBlen 16 \
    --soloUMIstart 17 --soloUMIlen 10 \
    --soloCBwhitelist /path/to/737K-august-2016.txt

Barcode whitelist is required

--soloCBwhitelist is mandatory. PeakATail trusts the CB:Z tag: if you skip the whitelist, barcodes containing sequencing errors will appear as thousands of spurious "cells". For 10x the whitelists ship with CellRanger (737K-august-2016.txt for v2, 3M-february-2018.txt for v3). For Drop-seq, inDrops, Smart-seq3 and other protocols, use the protocol-specific whitelist that your aligner accepts (see your aligner's docs).
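To make the role of the whitelist concrete, the sketch below shows what the geometry flags above select from the barcode read and how a simple whitelist correction collapses a one-base sequencing error back onto a real barcode. This is a deliberately simplified illustration, not STARsolo's or PeakATail's code: the function names are hypothetical, the two-entry whitelist is a toy, and real aligners use more refined rules for ambiguous matches.

```python
# Geometry from the STAR command above (1-based starts).
CB_START, CB_LEN = 1, 16      # --soloCBstart / --soloCBlen
UMI_START, UMI_LEN = 17, 10   # --soloUMIstart / --soloUMIlen

def split_barcode_read(seq):
    """Cut the cell barcode and UMI out of the barcode read."""
    cb = seq[CB_START - 1 : CB_START - 1 + CB_LEN]
    umi = seq[UMI_START - 1 : UMI_START - 1 + UMI_LEN]
    return cb, umi

def correct_barcode(cb, whitelist):
    """Return the whitelist barcode that matches exactly or at a unique
    Hamming distance of 1; return None if there is no match or several."""
    if cb in whitelist:
        return cb
    candidates = set()
    for i in range(len(cb)):
        for base in "ACGT":
            if base != cb[i]:
                candidate = cb[:i] + base + cb[i + 1:]
                if candidate in whitelist:
                    candidates.add(candidate)
    return candidates.pop() if len(candidates) == 1 else None

# Toy whitelist (assumption); real 10x whitelists hold hundreds of thousands of entries.
whitelist = {"AAACCCAAGAAACACT", "AAACCCAAGAAACCAT"}

# Barcode read with one error in the last base of the barcode (T read as A).
cb, umi = split_barcode_read("AAACCCAAGAAACACA" + "ACGTACGTAC" + "TT")
print(cb, umi, correct_barcode(cb, whitelist))

# One mismatch away from both whitelist entries: ambiguous, so rejected here.
print(correct_barcode("AAACCCAAGAAACAAT", whitelist))
```

Without a whitelist there is nothing to correct against, so the erroneous barcode in the first example would be kept as its own "cell".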

What about CellRanger / Alevin-fry / kallisto|bustools?

Any aligner that emits the standard CB/UB tag schema works. CellRanger BAMs work out-of-the-box. Alevin-fry emits the same tags via its --sketch / --rad-then-convert flow. For salmon or kallisto|bustools, you need to convert the BUS file back to a tagged BAM before passing it to PeakATail.

Quick start

Step 1 — Run the full pipeline (peak-calling + clustering):

ema run --config example.yaml

The example.yaml at the repo root shows the full schema. At minimum, provide a datasets block pointing to your BAM file(s), a gtf path, and read-geometry parameters:

datasets:
  - id: sample1
    merge_strategy: none
    bams:
      - /path/to/cellranger_output/possorted_genome_bam.bam
gtf: /path/to/gencode.v44.annotation.gtf
seqlen: 150
cb_len: 16
barcode_tag: CB
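A quick pre-flight check of the config can catch typos before a long run. The sketch below is an assumption-laden illustration, not PeakATail's own validation: it takes the config as an already-parsed dictionary (for instance loaded with a YAML library), treats every key from the example above as required, and uses a hypothetical helper name `config_problems`. Consult example.yaml for the authoritative schema.

```python
# Keys taken from the example above; treating all of them as required is an assumption.
REQUIRED_KEYS = ("datasets", "gtf", "seqlen", "cb_len", "barcode_tag")

def config_problems(cfg):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    for dataset in cfg.get("datasets", []):
        if not dataset.get("id"):
            problems.append("dataset without id")
        if not dataset.get("bams"):
            problems.append(f"dataset {dataset.get('id')!r} lists no BAMs")
    for key in ("seqlen", "cb_len"):
        value = cfg.get(key)
        if key in cfg and not (isinstance(value, int) and value > 0):
            problems.append(f"{key} must be a positive integer")
    return problems

# The example config above, written as the dictionary a YAML parser would produce.
config = {
    "datasets": [
        {
            "id": "sample1",
            "merge_strategy": "none",
            "bams": ["/path/to/cellranger_output/possorted_genome_bam.bam"],
        }
    ],
    "gtf": "/path/to/gencode.v44.annotation.gtf",
    "seqlen": 150,
    "cb_len": 16,
    "barcode_tag": "CB",
}

print(config_problems(config))
```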

Step 2 — Inspect per-dataset clusters in the timestamped output directory (default: emaout/). Each dataset gets a clusters.h5ad inside per_dataset/<id>/.

Step 3 — Test for differential APA between cluster pairs:

ema switch diff \
  --h5ad emaout/per_dataset/sample1/clusters.h5ad \
  --strategy fisher \
  --fdr 0.05

Results land in switch_diff_<timestamp>/ inside the same run directory.
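For intuition about the fisher strategy, the sketch below runs a two-sided Fisher's exact test on one 2x2 table per gene (proximal vs. distal PAS counts in two clusters) and then applies a Benjamini-Hochberg adjustment. It is a generic textbook illustration under assumptions: the table layout, function names, gene labels, and counts are all hypothetical, and PeakATail's actual implementation may build its tables and control the FDR differently.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: (proximal in cluster A, distal in A, proximal in B, distal in B).
tables = {
    "GENE_A": (10, 0, 0, 10),  # complete switch between clusters
    "GENE_B": (3, 1, 1, 3),    # weak, noisy difference
}

genes = list(tables)
pvalues = [fisher_exact_two_sided(*tables[g]) for g in genes]
adjusted = benjamini_hochberg(pvalues)
significant = [g for g, q in zip(genes, adjusted) if q <= 0.05]
print(significant)
```

The 0.05 cutoff mirrors the --fdr 0.05 flag in the command above. With many genes and cluster pairs, the adjustment step matters far more than it does in this two-gene toy.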

Status and citation

PeakATail is developed at the BMG Lab. Source code and issue tracker: github.com/BMGLab/PeakATail.

If you use PeakATail in your research, please cite the repository until a peer-reviewed publication is available.