Strategy reference

PeakATail is built around pluggable strategy registries. Every algorithm choice — how cells are clustered, how APA usage is quantified, how differential tests are run, how clusters are matched across datasets, and which figures are produced — is a named strategy that can be swapped without touching the rest of the pipeline. This page lists every registered strategy and links to its full reference.

Clustering

Clustering groups cells by their 3' UTR usage so that downstream APA tests have biologically meaningful contrasts. See clustering strategies.

Strategy	CLI flag	When to use
`leiden_tfidf`	`--cluster-method leiden_tfidf`	Sparse PAS counts (10x Chromium, Drop-seq). Recommended default.
`leiden_libsize`	`--cluster-method leiden_libsize`	Denser, gene-count-like data, or when TF-IDF produces too many small clusters.
`external`	`--cluster-method external`	Import Seurat / Scanpy clusters produced from gene expression.
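The two built-in clustering strategies differ mainly in how the cells x PAS matrix is normalized before graph clustering. The exact normalization PeakATail applies is not described on this page, so the sketch below is an assumption: it shows library-size normalization and one common log TF-IDF variant (term frequency = count / cell total, inverse document frequency = cells / cells detecting the PAS). The function names and the `scale`/`target` constants are hypothetical.

```python
import numpy as np


def libsize_normalize(counts, target: float = 1e4) -> np.ndarray:
    """Scale each cell (row) to `target` total counts, then log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Cells with zero counts stay all-zero instead of dividing by zero.
    scaled = np.divide(counts * target, totals,
                       out=np.zeros_like(counts), where=totals > 0)
    return np.log1p(scaled)


def tfidf_normalize(counts, scale: float = 1e4) -> np.ndarray:
    """One common log TF-IDF variant for a sparse cells x PAS matrix."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Term frequency: each PAS count as a fraction of the cell's total.
    tf = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    # Inverse document frequency: PAS detected in fewer cells get more weight.
    n_cells = counts.shape[0]
    n_detected = np.maximum((counts > 0).sum(axis=0), 1)
    idf = n_cells / n_detected
    return np.log1p(tf * idf * scale)


# Toy matrix: 3 cells x 3 PAS; the last cell has no reads.
counts = np.array([[4, 0, 1],
                   [2, 2, 0],
                   [0, 0, 0]])
print(libsize_normalize(counts).round(2))
print(tfidf_normalize(counts).round(2))
```

In the second cell both PAS have two reads, but TF-IDF weights the PAS seen in only one cell more heavily. That up-weighting of rare features is why TF-IDF suits sparse PAS counts, and also why it can fragment denser data into many small clusters.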

Quantification (PDUI)

Quantification converts the PAS count matrix into per-cell summaries of APA usage, either a single score or a full per-PAS usage vector. See quantification strategies.

Strategy	CLI flag	Output file	When to use
`classic`	`--pdui-method classic`	`pdui_classic.tsv`	Binary proximal/distal question on a per-gene or per-isoform basis.
`proportion`	`--pdui-method proportion`	`proportion.tsv`	Full per-PAS usage vector per cell; feeds the `proportion_heatmap` plot.
`shannon`	`--pdui-method shannon`	`entropy_shannon.tsv`	Heterogeneity of PAS usage; genes with many PAS, or pseudotime analyses.
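For a single gene in a single cell, the three quantities can be sketched as follows. This assumes PAS counts are ordered from most proximal to most distal and that `classic` reports the fraction of reads at the most distal PAS (the usual PDUI definition); how PeakATail groups PAS for multi-PAS genes or isoforms is not stated here. The entropy is computed in bits, which is also an assumption. All function names are hypothetical.

```python
import math
from typing import List, Optional


def pas_proportions(counts: List[float]) -> Optional[List[float]]:
    """Per-PAS usage fractions for one gene in one cell (None if no reads)."""
    total = sum(counts)
    if total == 0:
        return None  # APA usage is undefined for a cell with no reads
    return [c / total for c in counts]


def pdui_classic(counts: List[float]) -> Optional[float]:
    """Distal fraction, assuming counts are ordered proximal -> distal."""
    props = pas_proportions(counts)
    return None if props is None else props[-1]


def shannon_entropy(counts: List[float]) -> Optional[float]:
    """Shannon entropy (bits) of the PAS usage distribution."""
    props = pas_proportions(counts)
    if props is None:
        return None
    return -sum(p * math.log2(p) for p in props if p > 0)


gene_counts = [6, 2]  # reads at the proximal and distal PAS
print(pas_proportions(gene_counts))            # [0.75, 0.25]
print(pdui_classic(gene_counts))               # 0.25
print(round(shannon_entropy(gene_counts), 3))  # 0.811
```

Entropy is 0 when one PAS takes all reads and reaches log2(k) when k PAS are used equally, which is why it is more informative than a single distal fraction for genes with many PAS.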

Differential APA

Differential testing identifies PAS whose usage shifts between two or more clusters. See diff strategies.

Strategy	CLI flag	Multi-condition	When to use
`fisher`	`--diff-method fisher`	No	Fast exploratory screen; within-gene framing (DEXSeq-style).
`nb_pairwise`	`--diff-method nb_pairwise`	No	Cell-level NB GLM; corrects for library size; no pseudo-replication.
`nb_multi`	`--diff-method nb_multi`	Yes	Omnibus test across all clusters at once.
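The within-gene framing of `fisher` can be sketched as a 2x2 table per PAS: reads at the tested PAS versus reads at all other PAS of the same gene, in cluster A versus cluster B. The table layout is an assumption about what the strategy builds; the test itself is the standard two-sided Fisher exact test, written with `math.comb` so the sketch has no dependencies (`scipy.stats.fisher_exact` computes the same p-value in practice). `fisher_exact_2x2` is a hypothetical name.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)
    # With all margins fixed, the top-left cell x follows a hypergeometric law.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Two-sided: sum every table no more probable than the observed one
    # (small relative tolerance guards against floating-point ties).
    p = sum(pr for pr in probs.values() if pr <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Rows: cluster A, cluster B. Columns: reads at the tested PAS,
# reads at all other PAS of the same gene.
print(round(fisher_exact_2x2(8, 2, 1, 5), 4))  # 0.035
```

Because the comparison is against the gene's other PAS rather than against total library counts, a uniform change in gene expression does not by itself produce a significant result; only a change in the PAS's share of the gene's reads does.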

Cross-dataset matching

Matching assigns canonical cluster IDs when the same experiment is processed in multiple batches or from multiple samples. See match strategies.

Strategy	CLI flag	When to use
`marker_overlap`	`--match-method marker_overlap`	Default. Datasets share PAS features; reasonably similar protocols.
`mnn`	`--match-method mnn`	Low marker overlap; different capture technologies or library sizes.
`jaccard`	`--match-method jaccard`	Same physical cells in both runs (regression testing or split-BAM experiments).
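When both runs contain the same physical cells, clusters can be compared directly by their cell barcodes. The sketch below assumes each cluster is a set of barcodes and maps every query cluster to the reference cluster with the highest Jaccard index, |A ∩ B| / |A ∪ B|. It is a greedy, many-to-one assignment chosen for illustration; whether the `jaccard` strategy enforces one-to-one matches or applies a score threshold is not stated here. Function names are hypothetical.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_by_jaccard(reference: Dict[str, Set[str]],
                     query: Dict[str, Set[str]]) -> Dict[str, Tuple[str, float]]:
    """Map each query cluster to its best reference cluster and score."""
    matches = {}
    for q_id, q_cells in query.items():
        # max() keeps the first reference cluster on ties.
        best_id, best_score = max(
            ((r_id, jaccard(q_cells, r_cells)) for r_id, r_cells in reference.items()),
            key=lambda pair: pair[1],
        )
        matches[q_id] = (best_id, best_score)
    return matches


reference = {"A": {"c1", "c2", "c3"}, "B": {"c4", "c5"}}
query = {"0": {"c1", "c2"}, "1": {"c3", "c4", "c5"}}
for q_id, (r_id, score) in match_by_jaccard(reference, query).items():
    print(q_id, "->", r_id, round(score, 2))
```

The same `jaccard` helper applies unchanged to sets of significant PAS IDs, which is the overlap the `diff_agreement` plot reports between test strategies.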

Visualization

Every plot type is a registered strategy with optional engine variants (matplotlib, plotly, scanpy). See viz strategies.

Plot type	Registered names	What it shows
`umap`	`umap_matplotlib`, `umap_plotly`, `umap_scanpy`	2-D UMAP colored by cluster.
`cluster_sizes`	`cluster_sizes_matplotlib`, `cluster_sizes_plotly`	Bar chart of cell counts per cluster.
`peak_qc`	`peak_qc_matplotlib`, `peak_qc_plotly`	4-panel peak and cell QC distributions.
`volcano`	`volcano_matplotlib`, `volcano_plotly`	Differential APA volcano: log2fc vs -log10(q).
`pdui_distribution`	`pdui_distribution_matplotlib`, `pdui_distribution_plotly`, `pdui_distribution_scanpy`	Per-cluster PDUI violin plots.
`proportion_heatmap`	`proportion_heatmap_matplotlib`, `proportion_heatmap_plotly`	PAS x cluster mean proportion heatmap.
`entropy_distribution`	`entropy_distribution_matplotlib`, `entropy_distribution_plotly`	Per-cluster Shannon entropy violin plots.
`diff_agreement`	`diff_agreement_matplotlib`, `diff_agreement_plotly`	Jaccard overlap between significant PAS sets from different test strategies.
`length_shifts`	`length_shifts_matplotlib`, `length_shifts_plotly`	Gene x cluster-pair PDUI delta heatmap.
`pas_overlap`	`pas_overlap_matplotlib`, `pas_overlap_plotly`	UpSet / Venn of PAS detected across datasets.
`atlas_snap_diag`	`atlas_snap_diag_matplotlib`, `atlas_snap_diag_plotly`	Atlas-snap result: snapped vs unsnapped counts + distance histogram.
`cluster_match_sankey`	`cluster_match_sankey_matplotlib`, `cluster_match_sankey_plotly`	Alluvial chart of original → canonical cluster correspondence.
`match_confidence`	`match_confidence_matplotlib`, `match_confidence_plotly`	Heatmap of match confidence scores per (dataset, cluster) pair.
`tile_timing`	`tile_timing_matplotlib`, `tile_timing_plotly`	Bar chart of per-tile peak-calling wall time, outliers highlighted.
`resource_timeline`	`resource_timeline_matplotlib`, `resource_timeline_plotly`	Dual-axis RAM (GB) and CPU% over elapsed seconds.
`gene_track`	`gene_track_matplotlib`, `gene_track_plotly`	Per-gene: isoform structure + per-cluster PAS coverage bars.
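Every registered name in the table follows the pattern `<plot type>_<engine>`. A small sketch shows how such names could be grouped so that a plot type and an engine resolve to one registered name. The helpers `engines_by_plot_type` and `resolve` are hypothetical, not part of PeakATail; only the naming pattern is taken from the table.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Engine suffixes that appear in the table above.
ENGINES = ("matplotlib", "plotly", "scanpy")


def engines_by_plot_type(registered: Iterable[str]) -> Dict[str, List[str]]:
    """Split names of the form <plot_type>_<engine> into a lookup table."""
    table: Dict[str, List[str]] = defaultdict(list)
    for name in registered:
        # rpartition splits on the LAST underscore, so multi-word plot types
        # such as pdui_distribution stay intact.
        plot_type, _, engine = name.rpartition("_")
        if not plot_type or engine not in ENGINES:
            raise ValueError(f"unexpected viz strategy name: {name!r}")
        table[plot_type].append(engine)
    return dict(table)


def resolve(table: Dict[str, List[str]], plot_type: str, engine: str) -> str:
    """Return the registered name for plot_type + engine, or raise KeyError."""
    available = table.get(plot_type, [])
    if engine not in available:
        raise KeyError(f"{plot_type!r} has no {engine!r} variant; available: {available}")
    return f"{plot_type}_{engine}"


names = ["umap_matplotlib", "umap_plotly", "umap_scanpy",
         "pdui_distribution_scanpy", "gene_track_plotly"]
table = engines_by_plot_type(names)
print(table["umap"])                                  # ['matplotlib', 'plotly', 'scanpy']
print(resolve(table, "pdui_distribution", "scanpy"))  # pdui_distribution_scanpy
```

Splitting on the last underscore matters because plot types such as `cluster_match_sankey` themselves contain underscores.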