PeakATail
PeakATail detects poly(A) sites (PAS) at single-cell resolution from any scRNA-seq BAM
that carries cell-barcode (CB) and UMI (UB) tags — STARsolo, CellRanger, Alevin-fry,
or any aligner that emits the standard 10x-style tag schema. It clusters cells by their
3' UTR usage patterns and tests for differential alternative polyadenylation (APA) between
cell types or conditions. The tool is packaged as the ema CLI, installable via pip or
uv.
Input requirements
PeakATail does not correct cell barcodes — your aligner must already have applied a
barcode whitelist. The BAM must carry CB:Z (corrected barcode) and UB:Z (corrected UMI)
tags. See Preparing your BAM below for the recommended STAR/STARsolo
command.
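One way to check the tag requirement before a run is to inspect a sample of alignment records. The sketch below is not part of PeakATail. It assumes SAM-format text lines as input, and the helper name missing_tag_counts is hypothetical. Treating a "-" value as "no usable barcode" is also an assumption about how some aligners mark unmatched reads.

```python
from collections import Counter

REQUIRED_TAGS = ("CB", "UB")  # tag names from the requirement above


def missing_tag_counts(sam_lines, required=REQUIRED_TAGS):
    """Return (records_seen, Counter of records lacking each required Z tag)."""
    missing = Counter()
    total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        total += 1
        # Optional fields begin at column 12 and look like TAG:TYPE:VALUE.
        optional = line.rstrip("\n").split("\t")[11:]
        values = {f[:2]: f[5:] for f in optional if f[2:5] == ":Z:"}
        for tag in required:
            # Assumption: an empty value or "-" means no usable barcode/UMI.
            if values.get(tag, "") in ("", "-"):
                missing[tag] += 1
    return total, missing


# Two made-up records: the second one has no UB tag.
demo = [
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:AAACCTGA\tUB:Z:GGTTAACC",
    "r2\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:AAACCTGA",
]
total, missing = missing_tag_counts(demo)
for tag in REQUIRED_TAGS:
    print(f"{tag}: missing on {missing[tag]} of {total} records")
```

In practice the lines could come from the output of samtools view on your BAM, read through sys.stdin. A large share of records without CB or UB suggests the aligner was not configured to emit corrected tags.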
Why PeakATail
- Discover APA isoforms per cell type. Peak-calling runs per-strand on sorted BAM files to identify PAS coordinates; each PAS is associated with its nearest gene via GTF-based UTR lookup.
- Cluster cells by 3' UTR usage. PAS-by-cell count matrices are processed through TF-IDF + LSI dimensionality reduction and Leiden community detection so clusters reflect 3' isoform choice, not total expression level.
- Identify cell-type-specific PAS switching. ema switch diff runs Fisher exact tests or negative binomial regression across every cluster pair and reports differentially used PAS with FDR control.
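To make the Fisher exact test and FDR control concrete, here is a minimal pure-Python sketch. It is not PeakATail's implementation: the 2x2 table layout (UMIs at one PAS versus the gene's other PAS, in two clusters), the PAS names, and the use of the Benjamini-Hochberg procedure for FDR control are all assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts for one cluster pair. Each tuple is
# (this PAS in cluster A, other PAS in cluster A,
#  this PAS in cluster B, other PAS in cluster B).
tables = {
    "geneX_proximal": (40, 10, 12, 38),
    "geneY_distal": (25, 25, 24, 26),
    "geneZ_proximal": (5, 45, 30, 20),
}
pvals = [fisher_exact_two_sided(*t) for t in tables.values()]
for name, p, q in zip(tables, pvals, benjamini_hochberg(pvals)):
    print(f"{name}\tp={p:.3g}\tq={q:.3g}")
```

The adjusted values are what an FDR threshold would be applied to. A negative binomial regression would need a statistics library and is not sketched here.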
Install
PeakATail requires samtools and bedtools on your system path, and Python 3.11 or later.
Preparing your BAM
PeakATail reads only what is already in your BAM: it does not correct
barcodes, demultiplex reads, or align FASTQs. You need a coordinate-sorted BAM
with the standard 10x-style CB:Z (corrected cell barcode) and UB:Z (corrected
UMI) tags. The recommended STARsolo invocation, which matches the protocol
used for the reference benchmarks, is:
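# 10x Chromium v2 geometry: 16 bp cell barcode followed by a 10 bp UMI.
# For v3 chemistry, use --soloUMIlen 12 with 3M-february-2018.txt.
# --readFilesIn takes the cDNA read (R2) first, then the barcode/UMI read (R1).
STAR \
  --runThreadN 16 \
  --genomeDir <STAR_INDEX> \
  --readFilesIn R2.fastq.gz R1.fastq.gz \
  --readFilesCommand zcat \
  --outFileNamePrefix sample_ \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM \
  --soloType CB_UMI_Simple \
  --soloCBstart 1 --soloCBlen 16 \
  --soloUMIstart 17 --soloUMIlen 10 \
  --soloCBwhitelist /path/to/737K-august-2016.txt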
Barcode whitelist is required
--soloCBwhitelist is mandatory. PeakATail trusts the CB:Z tag — if you
skip the whitelist, sequencing errors will appear as thousands of
spurious "cells". For 10x v2/v3 the whitelists ship with CellRanger
(737K-august-2016.txt for v2, 3M-february-2018.txt for v3). For Drop-seq,
inDrops, Smart-seq3, and other protocols, use the protocol-specific whitelist
that your aligner accepts (see your aligner's docs).
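The effect of a missing whitelist can be checked directly by comparing the CB values seen in a BAM with the whitelist file. The following is a rough sketch under stated assumptions, not a PeakATail feature: the helper name whitelist_summary is hypothetical, the barcodes are toy values, and any "-1" style suffix on a CB value is assumed not to be part of the whitelist entry.

```python
from collections import Counter


def whitelist_summary(cb_values, whitelist):
    """Compare observed CB tag values with a set of whitelisted barcodes."""
    reads = Counter()
    for cb in cb_values:
        # Assumption: a trailing "-1" style suffix (as CellRanger writes) is
        # not part of the whitelist entry, so compare only the sequence part.
        reads[cb.split("-", 1)[0]] += 1
    on = {bc for bc in reads if bc in whitelist}
    off = set(reads) - on
    return {
        "distinct_barcodes": len(reads),
        "distinct_off_whitelist": len(off),
        "reads_off_whitelist": sum(reads[bc] for bc in off),
    }


whitelist = {"AAAC", "CCGT"}                       # toy whitelist
observed = ["AAAC-1", "AAAC-1", "CCGT", "AAAT-1"]  # "AAAT" is a 1-base error
print(whitelist_summary(observed, whitelist))
```

With a correctly applied whitelist, the off-whitelist counts should be zero or close to it. Many distinct off-whitelist barcodes with few reads each is the pattern that uncorrected sequencing errors would produce.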
What about CellRanger / Alevin-fry / kallisto|bustools?
Any aligner that emits the standard CB/UB tag schema works. CellRanger
BAMs work out-of-the-box. Alevin-fry emits the same tags via its
--sketch / --rad-then-convert flow. For salmon/kallisto-bustools you
need to convert the BUS file back to a tagged BAM before passing it to
PeakATail.
Quick start
Step 1 — Run the full pipeline (peak-calling + clustering):
The example.yaml at the repo root shows the full schema. At minimum, provide a datasets block
pointing to your BAM file(s), a gtf path, and read-geometry parameters:
datasets:
  - id: sample1
    merge_strategy: none
    bams:
      - /path/to/cellranger_output/possorted_genome_bam.bam
gtf: /path/to/gencode.v44.annotation.gtf
seqlen: 150
cb_len: 16
barcode_tag: CB
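For runs with several samples it can be convenient to generate the config rather than edit it by hand. The sketch below writes only the keys shown in the example above. Two things are assumptions: that additional datasets entries repeat the same shape, and that gtf and the read-geometry keys sit at the top level. Check example.yaml for the authoritative schema. The function name render_config and the sample paths are hypothetical.

```python
def render_config(samples, gtf, seqlen=150, cb_len=16, barcode_tag="CB"):
    """Render a config as text using only the keys from the example above.

    samples maps a dataset id to a list of BAM paths.
    """
    lines = ["datasets:"]
    for sample_id, bams in samples.items():
        lines.append(f"  - id: {sample_id}")
        lines.append("    merge_strategy: none")  # only value shown in the docs
        lines.append("    bams:")
        lines.extend(f"      - {bam}" for bam in bams)
    lines += [
        f"gtf: {gtf}",
        f"seqlen: {seqlen}",
        f"cb_len: {cb_len}",
        f"barcode_tag: {barcode_tag}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical sample ids and paths, for illustration only.
text = render_config(
    {"sample1": ["/path/to/s1.bam"], "sample2": ["/path/to/s2.bam"]},
    gtf="/path/to/gencode.v44.annotation.gtf",
)
print(text)
```

Building the text with plain string formatting keeps the sketch free of third-party dependencies. A YAML library would be the safer choice if paths may contain characters that need quoting.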
Step 2 — Inspect per-dataset clusters in the timestamped output directory (default: emaout/).
Each dataset gets a clusters.h5ad inside per_dataset/<id>/.
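A short sketch for collecting the per-dataset cluster files after a run is shown below. It assumes only what is stated above, namely that each clusters.h5ad sits in a per_dataset/<id>/ directory somewhere under the output root. The function name find_cluster_files is hypothetical, and the exact name of the timestamped run directory is deliberately not assumed.

```python
from pathlib import Path


def find_cluster_files(run_root):
    """Map dataset id -> clusters.h5ad path found under any per_dataset/ dir."""
    found = {}
    # rglob searches run_root and every subdirectory, so the timestamped
    # run directory does not need to be known in advance.
    for path in sorted(Path(run_root).rglob("per_dataset/*/clusters.h5ad")):
        found[path.parent.name] = path  # the parent directory name is the id
    return found


# Usage, assuming the default output directory name from the text:
#   for dataset_id, path in find_cluster_files("emaout").items():
#       print(dataset_id, path)
```

Each path can then be opened with anndata.read_h5ad, assuming the anndata package is installed. If several timestamped runs contain the same dataset id, this sketch keeps the path that sorts last.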
Step 3 — Test for differential APA between cluster pairs:
Results land in switch_diff_<timestamp>/ inside the same run directory.
Documentation map
- CLI reference. Every ema subcommand, flag, and option documented with types, defaults, and examples.
- Strategies. The algorithms behind peak calling, clustering, quantification, differential testing, and visualisation.
- Tutorials. Step-by-step guides: single-sample run, multi-sample atlas mode, cluster-pair differential APA, and more.
- Concepts. Data flow through the pipeline, output file layout, and the YAML configuration schema.
Status and citation
PeakATail is developed at the BMG Lab. Source code and issue tracker: github.com/BMGLab/PeakATail.
If you use PeakATail in your research, please cite the repository until a peer-reviewed publication is available.