Umi-pipeline-nf

Umi-pipeline-nf creates highly accurate single-molecule consensus sequences for unique molecular identifier (UMI)-tagged amplicons from nanopore sequencing data.
The pipeline processes FastQ files (typically from the fastq_pass folder of your nanopore run) and outputs high-quality aligned consensus sequences in BAM format for each UMI cluster. The optional variant-calling step writes a VCF file containing all variants found in the consensus sequences.
The newest version of the pipeline supports live analysis of the clusters during sequencing and seamless polishing of the clusters as soon as enough clusters are found.

Umi-pipeline-nf originated from a Snakemake-based analysis pipeline (pipeline-umi-amplicon by nanoporetech, based on the method of Karst et al., Nat Methods 18:165–169, 2021). We have migrated the pipeline to Nextflow and incorporated several optimizations and additional functionalities.

Workflow

The pipeline is organized into four main subworkflows, each with its own processing steps and outputs:

LIVE UMI PROCESSING
- Purpose: Real-time processing of raw FastQ files.
- Steps:
  - Merge and filter raw FastQ files.
  - Align reads to the reference genome.
  - Extract UMI sequences.
  - Cluster UMI-tagged reads.
- Outputs:
  - Processed UMI clusters are passed on to later stages.
  - Raw alignment files (e.g., in <output>/<barcodeXX>/raw/align/ or <output>/<barcodeXX>/<target>/fastq_filtered/raw/).
  - Filtered FastQ files and clustering statistics.

To stop the pipeline while it is running in live mode, create a file named CONTINUE in the output directory:

touch <output>/CONTINUE

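
The live-mode stop mechanism can be pictured as a polling loop: new FastQ files are picked up as they appear, and the loop ends once the CONTINUE file exists. The Python sketch below is illustrative only; the function name collect_fastqs_until_continue, the polling interval, and the *.fastq* file pattern are assumptions, not the pipeline's Nextflow implementation.

```python
import time
from pathlib import Path


def collect_fastqs_until_continue(fastq_dir, output_dir, poll_seconds=5.0, max_polls=None):
    """Yield FastQ files from fastq_dir as they appear, until <output_dir>/CONTINUE exists.

    Hypothetical helper for illustration; not the pipeline's implementation.
    max_polls optionally bounds the number of polling rounds (useful for testing).
    """
    fastq_dir, output_dir = Path(fastq_dir), Path(output_dir)
    seen = set()
    polls = 0
    while True:
        # Check the stop flag before scanning, so every file written before
        # CONTINUE was created is still picked up by this final scan.
        stop = (output_dir / "CONTINUE").exists()
        for path in sorted(fastq_dir.glob("*.fastq*")):
            if path not in seen:
                seen.add(path)
                yield path
        if stop:
            return
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return
        time.sleep(poll_seconds)
```

Checking for CONTINUE before each directory scan means that files written before the flag was created are still collected in the final sweep.
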
OFFLINE UMI PROCESSING
- Purpose: Batch processing with an optional subsampling step.
- Steps:
  - Merge and filter FastQ files.
  - Optionally subsample the merged reads.
  - Perform alignment, UMI extraction, and clustering similar to LIVE processing.
- Outputs:
  - Processed UMI clusters.
  - Alignment and subsampling reports (e.g., in <output>/<barcodeXX>/raw/subsampling/ and <output>/<barcodeXX>/<target>/fastq_filtered/raw/).
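
Subsampling a large merged FastQ file can be done in a single pass with reservoir sampling. This is a hypothetical sketch: the pipeline's own subsampling tool, its parameters, and its seed handling are not described here, and subsample_reads is an illustrative name.

```python
import random


def subsample_reads(records, n, seed=42):
    """Return up to n records drawn uniformly at random in one pass (reservoir sampling).

    Hypothetical helper; the method and default seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < n:
            # Fill the reservoir with the first n records.
            reservoir.append(record)
        else:
            # Keep record i with probability n / (i + 1), replacing a random slot.
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = record
    return reservoir
```

A fixed seed makes the subsample reproducible between runs, and because the reads are streamed, memory use is bounded by n rather than by the size of the input.
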

UMI POLISHING
- Purpose: Refine UMI clusters to generate high-quality consensus sequences.
- Steps:
  - Polish clusters using medaka.
  - Realign consensus sequences to the reference genome.
  - Re-extract and re-cluster UMIs from consensus reads.
  - Parse final consensus clusters.
- Outputs:
  - Consensus BAM and FastQ files (e.g., in <output>/<barcodeXX>/<target>/align/consensus/ and <output>/<barcodeXX>/<target>/fastq/consensus/).
  - Polishing logs and detailed cluster statistics.
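
Re-clustering UMIs means grouping near-identical UMI sequences, since sequencing errors in the UMI itself would otherwise split one molecule into several clusters. The sketch below uses a simplified greedy centroid scheme with Levenshtein distance; the function names, the max_dist threshold, and the abundance ordering are assumptions for illustration and not necessarily what the pipeline uses.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def greedy_umi_clusters(umi_counts, max_dist=2):
    """Group UMIs around the most abundant ones within max_dist edits.

    umi_counts maps UMI sequence -> number of supporting reads.
    Simplified stand-in for illustration; returns {centroid: [member UMIs]}.
    """
    clusters = {}
    # Visit UMIs from most to least abundant so frequent UMIs become centroids.
    for umi in sorted(umi_counts, key=lambda u: (-umi_counts[u], u)):
        for centroid in clusters:
            if edit_distance(umi, centroid) <= max_dist:
                clusters[centroid].append(umi)
                break
        else:
            clusters[umi] = [umi]
    return clusters
```

Processing UMIs in order of abundance means a rare UMI carrying a sequencing error is absorbed into the cluster of its abundant, error-free neighbour rather than founding its own cluster.
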

VARIANT CALLING
- Purpose: Identify genetic variants from the consensus data.
- Steps:
  - Perform variant calling using one of the supported callers: freebayes, lofreq, or mutserve.
- Outputs:
  - VCF files with variant calls (e.g., in <output>/<barcodeXX>/<target>/<freebayes/mutserve/lofreq>/).
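
Because the variant-calling step writes VCF files, downstream inspection needs only a small parser. This minimal sketch (read_vcf_records is a hypothetical helper) reads the fixed VCF columns and the INFO field; it ignores FORMAT/sample columns and does not validate the header.

```python
def read_vcf_records(lines):
    """Yield (chrom, pos, ref, alt_alleles, info) tuples from VCF text lines.

    Minimal illustrative parser: skips header lines, ignores sample columns,
    and stores INFO flags (keys without '=') as True.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
        info_dict = {}
        if info != ".":
            for entry in info.split(";"):
                key, sep, value = entry.partition("=")
                info_dict[key] = value if sep else True
        yield chrom, int(pos), ref, alt.split(","), info_dict
```

INFO keys differ between callers (lofreq, for example, reports AF and DP), so check which keys your caller emits before filtering on them.
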

See the output documentation for a detailed overview of the pipeline outputs and directory structure.

Main Adaptations

- Docker and Singularity containers make installation simple, the pipeline easy to use on clusters, and results highly reproducible.
- The pipeline is optimized for parallelization.
- An additional UMI cluster splitting step removes admixed UMI clusters.
- The read filtering strategy per UMI cluster was adapted to preserve the highest-quality reads.
- Three commonly used variant callers (freebayes, lofreq, and mutserve) are supported.
- The raw reads can optionally be subsampled.
- The raw reads can be filtered by read length and quality.
- GPU acceleration for cluster polishing with medaka is available when using the docker profile (tested with an RTX 4080 SUPER GPU, 16 GB).
- Multi-line BED files are supported, so the pipeline can run on several targets at once.
- Live analysis of the clusters during sequencing is supported, with seamless polishing of the clusters as soon as enough clusters are found.
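
Length and quality filtering of raw reads can be sketched as follows. The thresholds and the function names (read_fastq, mean_phred, filter_reads) are hypothetical; see the usage documentation for the pipeline's actual parameters. Mean quality is computed here by averaging per-base error probabilities and converting back to a Phred score, one common convention for nanopore reads, rather than averaging the Phred values directly.

```python
import math


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FastQ records."""
    it = iter(lines)
    # zip over the same iterator consumes four lines per record;
    # a truncated final record is silently dropped.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def mean_phred(quality, offset=33):
    """Mean read quality via the average per-base error probability (Phred+33)."""
    if not quality:
        return 0.0
    errors = [10 ** (-(ord(c) - offset) / 10) for c in quality]
    return -10 * math.log10(sum(errors) / len(errors))


def filter_reads(records, min_length, max_length, min_qual):
    """Keep reads within [min_length, max_length] whose mean quality is >= min_qual."""
    for header, seq, qual in records:
        if min_length <= len(seq) <= max_length and mean_phred(qual) >= min_qual:
            yield header, seq, qual
```

Averaging in error-probability space keeps a few very low-quality bases from being masked by many high-quality ones.
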

See the usage documentation for all of the available parameters of the pipeline.

Quick Start

1. Install Nextflow.
2. Download the pipeline and test it on a minimal dataset with a single command:

nextflow run genepi/umi-pipeline-nf -r v1.0.0-beta -profile test,docker

3. Start running your own analysis!
3.1 Download config/custom.config and adapt it with the paths to your data (both relative and absolute paths are supported). Then run the pipeline with either the docker or the singularity profile:

nextflow run genepi/umi-pipeline-nf -r v1.0.0-beta -c <custom.config> -profile custom,<docker,singularity>

Citation

If you use the pipeline, please cite our paper:

Amstler S, Streiter G, Pfurtscheller C, Forer L, Di Maio S, Weissensteiner H, Paulweber B, Schoenherr S, Kronenberg F, Coassin S. Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex lipoprotein(a) KIV-2 VNTR. Genome Med 16, 117 (2024). https://doi.org/10.1186/s13073-024-01391-8

Credits

The pipeline was written by @StephanAmstler.
Nextflow template pipeline: EcSeq.
Snakemake-based ONT pipeline for UMI nanopore sequencing analysis: nanoporetech/pipeline-umi-amplicon.
UMI-corrected nanopore sequencing analysis first shown by: SorenKarst/longread_umi.
