Attempt to auto-detect if SWIFT/SNAP protocol is being used and add a warning #178

Merged
merged 10 commits into from
Apr 28, 2021
30 changes: 16 additions & 14 deletions CHANGELOG.md
@@ -7,22 +7,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### :warning: Major enhancements

-* Pipeline has been re-implemented in [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html).
-* All software containers are now exclusively obtained from [Biocontainers](https://biocontainers.pro/#/registry).
-* Updated Nextflow version to `v21.04.0-edge` (see [nextflow#572](https://github.com/nextflow-io/nextflow/issues/1964)).
-* Default human `--kraken2_db` link has been changed from Zenodo to an AWS S3 bucket for more reliable downloads.
-* Illumina and Nanopore runs containing the same 48 samples sequenced on both platforms have been uploaded to the nf-core AWS account for full-sized tests on release.
-* Variant graph processes to call variants relative to the reference genome directly from _de novo_ assemblies have been deprecated and removed.
-* Variant calling with Varscan 2 has been deprecated and removed due to [licensing restrictions](https://github.com/dkoboldt/varscan/issues/12).
+* Pipeline has been re-implemented in [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html)
+* All software containers are now exclusively obtained from [Biocontainers](https://biocontainers.pro/#/registry)
+* Updated Nextflow version to `v21.04.0-edge` (see [nextflow#572](https://github.com/nextflow-io/nextflow/issues/1964))
+* Default human `--kraken2_db` link has been changed from Zenodo to an AWS S3 bucket for more reliable downloads
+* Illumina and Nanopore runs containing the same 48 samples sequenced on both platforms have been uploaded to the nf-core AWS account for full-sized tests on release
+* Variant graph processes to call variants relative to the reference genome directly from _de novo_ assemblies have been deprecated and removed
+* Variant calling with Varscan 2 has been deprecated and removed due to [licensing restrictions](https://github.com/dkoboldt/varscan/issues/12)

### Other enhancements & fixes

-* Updated pipeline template to nf-core/tools `1.13.3`.
-* Bumped Nextflow version `19.10.0` -> `21.03.0-edge`.
-* Optimise MultiQC configuration and input files for faster run-time on huge sample numbers.
-* [#122](https://github.com/nf-core/viralrecon/issues/122) - Single SPAdes command to rule them all.
-* [#138](https://github.com/nf-core/viralrecon/issues/138) - Problem masking the consensus sequence.
-* [#142](https://github.com/nf-core/viralrecon/issues/142) - Unknown method invocation `toBytes` on String type.
+* Updated pipeline template to nf-core/tools `1.13.3`
+* Bumped Nextflow version `19.10.0` -> `21.04.0-edge`
+* Optimise MultiQC configuration and input files for faster run-time on huge sample numbers
+* [#122](https://github.com/nf-core/viralrecon/issues/122) - Single SPAdes command to rule them all
+* [#138](https://github.com/nf-core/viralrecon/issues/138) - Problem masking the consensus sequence
+* [#142](https://github.com/nf-core/viralrecon/issues/142) - Unknown method invocation `toBytes` on String type
+* [#170](https://github.com/nf-core/viralrecon/issues/170) - ivar trimming of Swift libraries new offset feature

### Parameters

@@ -42,6 +43,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
| | `--enable_conda` |
| | `--fast5_dir` |
| | `--fastq_dir` |
+| | `--ivar_trim_offset` |
| | `--kraken2_assembly_host_filter` |
| | `--kraken2_variants_host_filter` |
| | `--min_barcode_reads` |
@@ -108,7 +110,7 @@ Note, since the pipeline is now using Nextflow DSL2, each process will be run wi
| `cutadapt` | 2.10 | 3.2 |
| `ivar` | 1.2.2 | 1.3.1 |
| `kraken2` | 2.0.9beta | 2.1.1 |
-| `nanoplot` | | 1.32.1 |
+| `nanoplot` | | 1.36.1 |
| `markdown` | 3.2.2 | |
| `minimap2` | 2.17 | |
| `mosdepth` | 0.2.6 | 0.3.1 |
45 changes: 34 additions & 11 deletions lib/Workflow.groovy
@@ -210,10 +210,10 @@ class Workflow {
count++
if (count > 1) {
log.warn "=============================================================================\n" +
-" This pipeline does not officially support multi-fasta genome files!\n\n" +
-" The parameters and processes are tailored for viral genome analysis.\n" +
-" Please amend the '--fasta' parameter.\n" +
-"==================================================================================="
+" This pipeline does not officially support multi-fasta genome files!\n\n" +
+" The parameters and processes are tailored for viral genome analysis.\n" +
+" Please amend the '--fasta' parameter.\n" +
+"==================================================================================="
break
}
}
@@ -254,13 +254,36 @@ class Workflow {
}
if (total != (left + right)) {
log.warn "=============================================================================\n" +
-" Please check the name field (column 4) in the file supplied via --primer_bed.\n\n" +
-" All of the values in that column do not end with those supplied by:\n" +
-" --primer_left_suffix : $primer_left_suffix\n" +
-" --primer_right_suffix: $primer_right_suffix\n\n" +
-" This information is required to collapse the primer intervals into amplicons\n" +
-" for the coverage plots generated by the pipeline.\n" +
-"==================================================================================="
+" Please check the name field (column 4) in the file supplied via --primer_bed.\n\n" +
+" All of the values in that column do not end with those supplied by:\n" +
+" --primer_left_suffix : $primer_left_suffix\n" +
+" --primer_right_suffix: $primer_right_suffix\n\n" +
+" This information is required to collapse the primer intervals into amplicons\n" +
+" for the coverage plots generated by the pipeline.\n" +
+"==================================================================================="
}
}
+
+// Check if the primer BED file supplied to the pipeline is from the SWIFT/SNAP protocol
+public static void checkIfSwiftProtocol(primer_bed_file, name_prefix, log) {
+def count = 0
+def line = null
+primer_bed_file.withReader { reader ->
+while (line = reader.readLine()) {
+def name = line.split('\t')[3]
+if (name.contains(name_prefix)) {
+count++
+if (count > 1) {
+log.warn "=============================================================================\n" +
+" Found '${name_prefix}' in the name field of the primer BED file!\n" +
+" This suggests that you have used the SWIFT/SNAP protocol to prep your samples.\n" +
+" If so, please set '--ivar_trim_offset 5' as suggested in the issue below:\n" +
+" https://github.com/nf-core/viralrecon/issues/170\n" +
+"==================================================================================="
+break
+}
+}
+}
+}
+}
}

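For reviewers who want to try the detection rule without running the pipeline, here is a minimal plain-Groovy sketch of the same idea. It is not the code merged in this PR: `countPrefixedPrimers` and `looksLikeSwift` are hypothetical names and the BED records are invented; only the `covid19genome` prefix and the `> 1` threshold come from the diff. Unlike `checkIfSwiftProtocol`, it keeps reading past empty lines (the merged `while (line = reader.readLine())` loop stops at the first empty line, because an empty string is false in Groovy) and skips lines with fewer than four tab-separated columns instead of indexing `[3]` directly.

```groovy
// Hypothetical standalone version of the SWIFT/SNAP check in Workflow.checkIfSwiftProtocol.
// Counts BED records whose name field (column 4) contains the given prefix.
int countPrefixedPrimers(String bedText, String namePrefix) {
    int count = 0
    bedText.eachLine { String line ->
        def fields = line.split('\t')
        // Skip blank, header or otherwise short lines instead of failing on fields[3]
        if (fields.length > 3 && fields[3].contains(namePrefix)) {
            count++
        }
    }
    return count
}

// Same threshold as the pipeline: more than one matching record triggers the warning
boolean looksLikeSwift(String bedText, String namePrefix = 'covid19genome') {
    return countPrefixedPrimers(bedText, namePrefix) > 1
}

// Invented records, for illustration only
def swiftLikeBed = [
    'MN908947.3\t10\t35\tcovid19genome_example_1_F\t1\t+',
    '',
    'MN908947.3\t200\t225\tcovid19genome_example_1_R\t1\t-'
].join('\n')
def articLikeBed = 'MN908947.3\t30\t54\tnCoV-2019_1_LEFT\tnCoV-2019_1\t+'

println looksLikeSwift(swiftLikeBed)   // true
println looksLikeSwift(articLikeBed)   // false
```

Keeping the `> 1` threshold means a single stray match does not trigger the warning, which mirrors the merged behaviour.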
70 changes: 70 additions & 0 deletions modules/nf-core/software/artic/guppyplex/functions.nf
@@ -0,0 +1,70 @@
/*
 * -----------------------------------------------------
 * Utility functions used in nf-core DSL2 module files
 * -----------------------------------------------------
 */

/*
 * Extract name of software tool from process name using $task.process
 */
def getSoftwareName(task_process) {
    return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()
}

/*
 * Function to initialise default values and to generate a Groovy Map of available options for nf-core modules
 */
def initOptions(Map args) {
    def Map options = [:]
    options.args = args.args ?: ''
    options.args2 = args.args2 ?: ''
    options.args3 = args.args3 ?: ''
    options.publish_by_meta = args.publish_by_meta ?: []
    options.publish_dir = args.publish_dir ?: ''
    options.publish_files = args.publish_files
    options.suffix = args.suffix ?: ''
    return options
}

/*
 * Tidy up and join elements of a list to return a path string
 */
def getPathFromList(path_list) {
    def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries
    paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes
    return paths.join('/')
}

/*
 * Function to save/publish module results
 */
def saveFiles(Map args) {
    if (!args.filename.endsWith('.version.txt')) {
        def ioptions = initOptions(args.options)
        def path_list = [ ioptions.publish_dir ?: args.publish_dir ]
        if (ioptions.publish_by_meta) {
            def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta
            for (key in key_list) {
                if (args.meta && key instanceof String) {
                    def path = key
                    if (args.meta.containsKey(key)) {
                        path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key]
                    }
                    path = path instanceof String ? path : ''
                    path_list.add(path)
                }
            }
        }
        if (ioptions.publish_files instanceof Map) {
            for (ext in ioptions.publish_files) {
                if (args.filename.endsWith(ext.key)) {
                    def ext_list = path_list.collect()
                    ext_list.add(ext.value)
                    return "${getPathFromList(ext_list)}/$args.filename"
                }
            }
        } else if (ioptions.publish_files == null) {
            return "${getPathFromList(path_list)}/$args.filename"
        }
    }
}
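A quick way to see what the pure helpers in `functions.nf` return is to paste them into a plain Groovy script. The sketch below does that for `getSoftwareName`, `initOptions` and `getPathFromList` (bodies as in the file above, except that `def Map` is written as `Map`); `saveFiles` is left out because it expects the arguments Nextflow passes to `saveAs`. The process name, option string and path fragments are illustrative values, not ones taken from the pipeline.

```groovy
// Copies of three helpers from functions.nf, called with illustrative inputs
def getSoftwareName(task_process) {
    return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()
}

def initOptions(Map args) {
    Map options = [:]
    options.args            = args.args ?: ''
    options.args2           = args.args2 ?: ''
    options.args3           = args.args3 ?: ''
    options.publish_by_meta = args.publish_by_meta ?: []
    options.publish_dir     = args.publish_dir ?: ''
    options.publish_files   = args.publish_files
    options.suffix          = args.suffix ?: ''
    return options
}

def getPathFromList(path_list) {
    def paths = path_list.findAll { item -> !item?.trim().isEmpty() }      // drop blank entries
    paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") }      // strip whitespace and slashes at either end
    return paths.join('/')
}

// Only the first '_' token of the last ':' segment of the process name is kept
println getSoftwareName('NFCORE_VIRALRECON:NANOPORE:ARTIC_GUPPYPLEX')   // artic

// Keys that are not supplied fall back to empty defaults; publish_files stays null
def opts = initOptions([args: '--some-flag'])
println opts.args            // --some-flag
println opts.args2.isEmpty() // true
println opts.publish_files   // null

// Blank entries are dropped and leading/trailing slashes are removed from each entry
println getPathFromList(['/results/', '  ', 'artic', 'sample1/'])   // results/artic/sample1
```

Note that `getPathFromList` only guards `trim()` with the safe-navigation operator, so a `null` entry in the list would still fail on `isEmpty()`; the callers in `saveFiles` pass strings.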
@@ -20,7 +20,7 @@ process ARTIC_GUPPYPLEX {

input:
tuple val(meta), path(fastq_dir)

output:
tuple val(meta), path("*.fastq.gz"), emit: fastq
path "*.version.txt" , emit: version
@@ -34,9 +34,8 @@ process ARTIC_GUPPYPLEX {
$options.args \\
--directory $fastq_dir \\
--output ${prefix}.fastq

pigz -p $task.cpus *.fastq

pigz -p $task.cpus *.fastq
echo \$(artic --version 2>&1) | sed 's/^.*artic //; s/ .*\$//' > ${software}.version.txt
"""
}
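The version line in this script relies on two `sed` substitutions applied to the word-split output of `artic --version`. The hypothetical `parseArticVersion` below reproduces them in Groovy so their effect is easy to check. It assumes that `artic --version` prints the tool name followed by a version number (e.g. `artic 1.2.1`); that output format is an assumption, not something stated in this PR.

```groovy
// Hypothetical Groovy equivalent of:
//   echo $(artic --version 2>&1) | sed 's/^.*artic //; s/ .*$//'
String parseArticVersion(String rawOutput) {
    // `echo $(...)` word-splits the output, collapsing newlines and runs of spaces into single spaces
    String echoed = rawOutput.trim().split(/\s+/).join(' ')
    // s/^.*artic //  : drop everything up to and including the last "artic "
    String withoutPrefix = echoed.replaceFirst(/^.*artic /, '')
    // s/ .*$//       : drop everything from the first remaining space onwards
    return withoutPrefix.replaceFirst(/ .*$/, '')
}

println parseArticVersion('artic 1.2.1\n')   // 1.2.1
```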
44 changes: 44 additions & 0 deletions modules/nf-core/software/artic/guppyplex/meta.yml
@@ -0,0 +1,44 @@
name: artic_guppyplex
description: Aggregates fastq files with demultiplexed reads
keywords:
  - artic
  - aggregate
  - demultiplexed reads
tools:
  - artic:
      description: ARTIC pipeline - a bioinformatics pipeline for working with virus sequencing data sequenced with nanopore
      homepage: https://artic.readthedocs.io/en/latest/
      documentation: https://artic.readthedocs.io/en/latest/
      tool_dev_url: https://github.com/artic-network/fieldbioinformatics
      doi: ""
      licence: ['MIT']

input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - fastq_dir:
      type: directory
      description: Directory containing the fastq files with demultiplexed reads
      pattern: "*"

output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - fastq:
      type: file
      description: Aggregated FastQ files
      pattern: "*.{fastq.gz}"
  - version:
      type: file
      description: File containing software version
      pattern: "*.{version.txt}"

authors:
  - "@joseespinosa"
  - "@drpatelh"
6 changes: 3 additions & 3 deletions modules/nf-core/software/nanoplot/main.nf
@@ -11,11 +11,11 @@ process NANOPLOT {
mode: params.publish_dir_mode,
saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) }

-conda (params.enable_conda ? "bioconda::nanoplot=1.32.1" : null)
+conda (params.enable_conda ? "bioconda::nanoplot=1.36.1" : null)
if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
-container "https://depot.galaxyproject.org/singularity/nanoplot:1.32.1--py_0"
+container "https://depot.galaxyproject.org/singularity/nanoplot:1.36.1--pyhdfd78af_0"
} else {
-container "quay.io/biocontainers/nanoplot:1.32.1--py_0"
+container "quay.io/biocontainers/nanoplot:1.36.1--pyhdfd78af_0"
}

input:
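The NANOPLOT bump only changes version strings, but the surrounding directives decide which string is used: the Galaxy depot Singularity image is chosen only when Singularity is the engine and `params.singularity_pull_docker_container` is false, and every other case falls through to the quay.io BioContainers image. A minimal Groovy sketch of that branch follows; `nanoplotCondaSpec` and `nanoplotContainer` are hypothetical helpers, and the version and build tag are passed in rather than hard-coded.

```groovy
// Hypothetical helpers mirroring the conda/container directives in the NANOPLOT module
String nanoplotCondaSpec(boolean enableConda, String version) {
    return enableConda ? "bioconda::nanoplot=${version}" : null
}

String nanoplotContainer(String containerEngine, boolean pullDockerContainer, String version, String build) {
    if (containerEngine == 'singularity' && !pullDockerContainer) {
        // Pre-built Singularity image served by the Galaxy depot
        return "https://depot.galaxyproject.org/singularity/nanoplot:${version}--${build}"
    }
    // All other cases use the BioContainers image on quay.io
    return "quay.io/biocontainers/nanoplot:${version}--${build}"
}

println nanoplotContainer('singularity', false, '1.36.1', 'pyhdfd78af_0')
// https://depot.galaxyproject.org/singularity/nanoplot:1.36.1--pyhdfd78af_0
println nanoplotContainer('singularity', true, '1.36.1', 'pyhdfd78af_0')
// quay.io/biocontainers/nanoplot:1.36.1--pyhdfd78af_0
println nanoplotCondaSpec(true, '1.36.1')   // bioconda::nanoplot=1.36.1
println nanoplotCondaSpec(false, '1.36.1')  // null
```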
2 changes: 1 addition & 1 deletion modules/nf-core/software/nanoplot/meta.yml
@@ -26,7 +26,7 @@ input:
- summary_txt:
type: file
description: |
-List of sequenicng_summary.txt files from running basecalling.
+List of sequencing_summary.txt files from running basecalling.
output:
- meta:
type: map
4 changes: 4 additions & 0 deletions nextflow.config
@@ -53,6 +53,7 @@ params {
callers = 'ivar'
min_mapped_reads = 1000
ivar_trim_noprimer = false
+ivar_trim_offset = null
filter_duplicates = false
primer_left_suffix = '_LEFT'
primer_right_suffix = '_RIGHT'
@@ -205,6 +206,9 @@ profiles {
test_full_nanopore { includeConfig 'conf/test_full_nanopore.config' }
}

+// Increase time available to build Conda environment
+conda { createTimeout = "120 min" }
+
// Export these variables to prevent local Python/R libraries from conflicting with those in the container
env {
PYTHONNOUSERSITE = 1
6 changes: 6 additions & 0 deletions nextflow_schema.json
@@ -290,6 +290,12 @@
"description": "This option unsets the '-e' parameter in 'ivar trim' to discard reads without primers.",
"fa_icon": "fas fa-cut"
},
+"ivar_trim_offset": {
+"type": "integer",
+"description": "This option sets the '-x' parameter in 'ivar trim' so that reads that occur at the specified offset positions relative to primer positions will also be trimmed.",
+"fa_icon": "fas fa-cut",
+"help_text": "This parameter will need to be set for some amplicon-based sequencing protocols (e.g. SWIFT) as described and implemented [here](https://github.com/andersen-lab/ivar/pull/88)"
+},
"filter_duplicates": {
"type": "boolean",
"fa_icon": "fas fa-clone",
11 changes: 10 additions & 1 deletion workflows/illumina.nf
@@ -95,6 +95,7 @@ if (!params.save_reference) {

def ivar_trim_options = modules['illumina_ivar_trim']
ivar_trim_options.args += params.ivar_trim_noprimer ? '' : Utils.joinModuleArgs(['-e'])
+ivar_trim_options.args += params.ivar_trim_offset ? Utils.joinModuleArgs(["-x ${params.ivar_trim_offset}"]) : ''

def ivar_trim_sort_bam_options = modules['illumina_ivar_trim_sort_bam']
if (params.skip_markduplicates) {
@@ -170,12 +171,20 @@ workflow ILLUMINA {
.fasta
.map { Workflow.isMultiFasta(it, log) }

-// Check primer BED file only contains suffixes provided --primer_left_suffix / --primer_right_suffix
if (params.protocol == 'amplicon' && !params.skip_variants) {
+// Check primer BED file only contains suffixes provided --primer_left_suffix / --primer_right_suffix
PREPARE_GENOME
.out
.primer_bed
.map { Workflow.checkPrimerSuffixes(it, params.primer_left_suffix, params.primer_right_suffix, log) }
+
+// Check if the primer BED file supplied to the pipeline is from the SWIFT/SNAP protocol
+if (!params.ivar_trim_offset) {
+PREPARE_GENOME
+.out
+.primer_bed
+.map { Workflow.checkIfSwiftProtocol(it, 'covid19genome', log) }
+}
}

/*
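The two additions to `workflows/illumina.nf` work together: when `--ivar_trim_offset` is set, `-x` followed by the offset is appended to the `ivar trim` arguments, and the SWIFT/SNAP name check only runs when it is not set. The sketch below reproduces that logic in plain Groovy. All names are hypothetical; `joinArgs` stands in for `Utils.joinModuleArgs`, whose implementation is not part of this diff and is assumed here to join the options with spaces behind a leading space. The base arguments `-m 30` are illustrative only.

```groovy
// Hypothetical stand-in for Utils.joinModuleArgs (assumed behaviour: space-prefixed, space-joined)
String joinArgs(List opts) {
    return ' ' + opts.join(' ')
}

// Mirrors how ivar_trim_options.args is extended in workflows/illumina.nf
String buildIvarTrimArgs(String baseArgs, boolean noPrimer, Integer offset) {
    String trimArgs = baseArgs
    trimArgs += noPrimer ? '' : joinArgs(['-e'])
    trimArgs += offset ? joinArgs(["-x ${offset}".toString()]) : ''
    return trimArgs
}

// Mirrors the gating around Workflow.checkIfSwiftProtocol in the ILLUMINA workflow
boolean runsSwiftCheck(String protocol, boolean skipVariants, Integer offset) {
    return protocol == 'amplicon' && !skipVariants && !offset
}

println buildIvarTrimArgs('-m 30', false, 5)     // -m 30 -e -x 5
println buildIvarTrimArgs('-m 30', true, null)   // -m 30
println runsSwiftCheck('amplicon', false, null)  // true
println runsSwiftCheck('amplicon', false, 5)     // false
```

Because both conditions rely on Groovy truth, `--ivar_trim_offset 0` behaves like leaving the parameter at its default of `null`: no `-x` is added and the SWIFT/SNAP check still runs.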
16 changes: 8 additions & 8 deletions workflows/nanopore.nf
@@ -55,7 +55,6 @@ artic_minion_options.args += params.artic_minion_aligner == 'bwa' ? Utils.joi
def multiqc_options = modules['nanopore_multiqc']
multiqc_options.args += params.multiqc_title ? Utils.joinModuleArgs(["--title \"$params.multiqc_title\""]) : ''

-include { ARTIC_GUPPYPLEX } from '../modules/local/artic_guppyplex' addParams( options: modules['nanopore_artic_guppyplex'] )
include { ARTIC_MINION } from '../modules/local/artic_minion' addParams( options: artic_minion_options )
include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['csv':'']] )
include { MULTIQC } from '../modules/local/multiqc_nanopore' addParams( options: multiqc_options )
@@ -89,13 +88,14 @@ include { SNPEFF_SNPSIFT } from '../subworkflows/local/snpeff_snpsift'
/*
* MODULE: Installed directly from nf-core/modules
*/
-include { PYCOQC } from '../modules/nf-core/software/pycoqc/main' addParams( options: modules['nanopore_pycoqc'] )
-include { NANOPLOT } from '../modules/nf-core/software/nanoplot/main' addParams( options: modules['nanopore_nanoplot'] )
-include { BCFTOOLS_STATS } from '../modules/nf-core/software/bcftools/stats/main' addParams( options: modules['nanopore_bcftools_stats'] )
-include { QUAST } from '../modules/nf-core/software/quast/main' addParams( options: modules['nanopore_quast'] )
-include { PANGOLIN } from '../modules/nf-core/software/pangolin/main' addParams( options: modules['nanopore_pangolin'] )
-include { MOSDEPTH as MOSDEPTH_GENOME } from '../modules/nf-core/software/mosdepth/main' addParams( options: modules['nanopore_mosdepth_genome'] )
-include { MOSDEPTH as MOSDEPTH_AMPLICON } from '../modules/nf-core/software/mosdepth/main' addParams( options: modules['nanopore_mosdepth_amplicon'] )
+include { PYCOQC } from '../modules/nf-core/software/pycoqc/main' addParams( options: modules['nanopore_pycoqc'] )
+include { NANOPLOT } from '../modules/nf-core/software/nanoplot/main' addParams( options: modules['nanopore_nanoplot'] )
+include { ARTIC_GUPPYPLEX } from '../modules/nf-core/software/artic/guppyplex/main' addParams( options: modules['nanopore_artic_guppyplex'] )
+include { BCFTOOLS_STATS } from '../modules/nf-core/software/bcftools/stats/main' addParams( options: modules['nanopore_bcftools_stats'] )
+include { QUAST } from '../modules/nf-core/software/quast/main' addParams( options: modules['nanopore_quast'] )
+include { PANGOLIN } from '../modules/nf-core/software/pangolin/main' addParams( options: modules['nanopore_pangolin'] )
+include { MOSDEPTH as MOSDEPTH_GENOME } from '../modules/nf-core/software/mosdepth/main' addParams( options: modules['nanopore_mosdepth_genome'] )
+include { MOSDEPTH as MOSDEPTH_AMPLICON } from '../modules/nf-core/software/mosdepth/main' addParams( options: modules['nanopore_mosdepth_amplicon'] )

/*
* SUBWORKFLOW: Consisting entirely of nf-core/modules