West Nile Virus (WNV) Global and Washington Focused Builds

Build Overview

Default Build Name: WNV Global
State Based Build Name: WNV Washington Focused Build
Pathogen/Strain: West Nile Virus
Scope: Full genome
Purpose: This repository analyzes West Nile virus (WNV) genomes using Nextstrain to understand the circulation and transmission of WNV globally (WNV Global build) and within Washington State (WNV Washington Focused Build). It was developed from the WNV repository used for the "Twenty years of West Nile Virus in the Americas" Nextstrain Narrative.
Nextstrain Build/s Location/s: [Insert the URL for the Nextstrain build on Nextstrain Groups] [Insert another URL for instances when more than one Nextstrain build exists]

Getting Started

Some high-level features and capabilities specific to this build include:

Lineage Designation: We use Pathoplexus for clade calling, based on a Nextclade dataset in this PR.
Subsampling: The WNV Washington Focused Build uses a tiered subsampling strategy that filters NCBI data by geographic location. The subsampling criteria are set to select all sequences from Washington, neighboring states, and the region, up to a maximum of 5,000 sequences. Additionally, up to 300 sequences are randomly selected from other states. These criteria can be modified as needed.
Mapping Specific Locations: We have added the option to map specific locations using coordinates in the WNV Washington Focused Build. This feature is useful for a state that needs to map the locations of mosquito traps, for example.

Data Sources & Inputs

This build pulls WNV genomes that are publicly available from NCBI.

Sequence and Metadata Data: NCBI
Expected Inputs:
- ingest/data/sequences.fasta (containing WNV genome sequences)
- ingest/data/metadata.tsv (with relevant sample information)
Private geolocation data, if applicable:
- phylogenetic/defaults/wa/annotations.tsv (containing location name, latitude, and longitude information)

Setup & Dependencies

Installation

Follow the standard installation instructions for Nextstrain's suite of software tools.

Clone the repository

git clone https://github.com/nextstrain/WNV.git
cd WNV

Try running Augur and Auspice

augur --help
auspice --help

Run the build

This build can process and output global or Washington state focused WNV information.

To run the build one workflow at a time, first run the ingest workflow:

nextstrain build ingest

Inside the ingest folder there should be two output files: metadata.tsv and sequences.fasta.
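
Before moving on, it can help to confirm that the two ingest outputs describe the same set of samples. The following is a minimal sketch, not part of the repository: the `accession` ID column name is an assumption, and the paths are taken from the Expected Inputs list, so adjust both to match your checkout.

```python
import csv
from pathlib import Path


def read_fasta_ids(fasta_path):
    """Return sequence IDs (the first word of each '>' header line)."""
    ids = []
    with open(fasta_path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    ids.append(header.split()[0])
    return ids


def read_metadata_ids(metadata_path, id_column="accession"):
    """Return the ID column of a tab-separated metadata file.

    The default column name 'accession' is an assumption for illustration.
    """
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row[id_column] for row in reader]


def compare_ingest_outputs(fasta_path, metadata_path, id_column="accession"):
    """Report IDs that appear in only one of the two ingest outputs."""
    fasta_ids = set(read_fasta_ids(fasta_path))
    metadata_ids = set(read_metadata_ids(metadata_path, id_column))
    return {
        "sequences_without_metadata": sorted(fasta_ids - metadata_ids),
        "metadata_without_sequences": sorted(metadata_ids - fasta_ids),
    }


if __name__ == "__main__":
    # Paths follow the Expected Inputs list; change them if yours differ.
    fasta = Path("ingest/data/sequences.fasta")
    metadata = Path("ingest/data/metadata.tsv")
    if fasta.exists() and metadata.exists():
        print(compare_ingest_outputs(fasta, metadata))
    else:
        print("Ingest outputs not found; run the ingest workflow first.")
```

Two empty lists mean every sequence has a metadata row and vice versa.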

Next, run the phylogenetic workflow. To execute the global build:

nextstrain build phylogenetic

Or, to execute the Washington focused build:

nextstrain build phylogenetic --configfile build-configs/washington-state/config.yaml

Inside the phylogenetic/auspice folder there should be at least one output file: WNV_{build name}.json.

Repository File Structure Overview

This Nextstrain build follows the structure detailed in the Pathogen Repo Guide. Mainly, this build contains two workflows for the analysis of WNV data:

- ingest/: Download data from NCBI, then clean, format, and curate it and assign clades.
- phylogenetic/: Subsample data and make phylogenetic trees for use in Nextstrain.

Expected Outputs

After successfully running the build there will be two output folders containing the build results.

phylogenetic/auspice/ folder contains: a file called WNV_{build name}.json
results/ folder contains: multiple intermediate files which include the aligned sequences, subsampled sequences, and phylogenetic trees in .nwk format
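
A quick way to sanity-check the exported dataset is to load the JSON and count the tips in its tree. This is a minimal sketch that assumes the file follows the Auspice v2 dataset layout (top-level `meta` and `tree` keys, with nested `children` lists). The path passed in is whatever WNV_{build name}.json your run produced.

```python
import json


def count_tips(node):
    """Count terminal nodes below a tree node, iteratively to avoid deep recursion."""
    tips = 0
    stack = [node]
    while stack:
        current = stack.pop()
        children = current.get("children", [])
        if children:
            stack.extend(children)
        else:
            tips += 1
    return tips


def summarize_dataset(json_path):
    """Return the title, last-updated date, and tip count of an Auspice dataset."""
    with open(json_path, encoding="utf-8") as handle:
        dataset = json.load(handle)
    meta = dataset.get("meta", {})
    return {
        "title": meta.get("title"),
        "updated": meta.get("updated"),
        "tips": count_tips(dataset["tree"]),
    }
```

Comparing the tip count against the number of subsampled sequences is a simple check that nothing was dropped unexpectedly between subsampling and export.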

Scientific Decisions

The following are critical decisions that were made during the development of the WNV build that should be kept in mind when analyzing the data.

Global and Washington Focused Outputs

This build can process and output global or Washington state focused WNV information. To accomplish this, a washington-state/config.yaml file was added to build-configs, which specifies the Washington subsampling preferences. This file can be adopted and modified to accommodate sampling preferences appropriate to other regions or states.

Reference Selection

The Global and the Washington focused WNV builds use different references.

The Global WNV build uses the reference sequence AF260968, which is the first WNV L1 (cluster 1) strain recovered in Egypt in 1951. Mencattelli, G., Ndione, M.H.D., Silverj, A. et al. Spatial and temporal dynamics of West Nile virus between Africa and Europe. Nat Commun 14, 6440 (2023). https://doi.org/10.1038/s41467-023-42185-7

The Washington focused WNV build uses the reference sequence AF481864 as this is the sequence that is most closely related to the sequences isolated from New York in 1999. Hadfield J, Brito AF, Swetnam DM, Vogels CBF, Tokarz RE, Andersen KG, Smith RC, Bedford T, Grubaugh ND. Twenty years of West Nile virus spread and evolution in the Americas visualized by Nextstrain. PLoS Pathog. 2019 Oct 31;15(10):e1008042. doi: 10.1371/journal.ppat.1008042. PMID: 31671157; PMCID: PMC6822705.

Lineage Designation

For global lineage designations, we query Pathoplexus.

Host mapping to Host Genus and Host Type

We further refined the information in the NCBI Host column by categorizing it into Host_Genus and Host_Type, creating broader groupings for more effective data analysis. For example, the Host Homo sapiens is classified under Host_Genus as Homo and Host_Type as Human. This broader categorization is particularly useful for visualizing host information on the phylogenetic tree. Instead of distinguishing between individual mosquito species, you can use the broader categories like Host_Genus Culex or the higher-level category Host_Type Mosquito to color the tips of the tree.
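
The grouping described above can be sketched as a simple lookup. This is an illustration only: apart from the Homo/Human and Culex/Mosquito examples given in the text, the genus-to-type table, the "Other" and "Unknown" fallbacks, and the function name are assumptions, and the build's actual mapping may differ.

```python
# Hypothetical genus -> host type table. Only the Homo and Culex entries
# come from the text above; the others are illustrative assumptions.
GENUS_TO_TYPE = {
    "Homo": "Human",
    "Culex": "Mosquito",
    "Aedes": "Mosquito",
    "Corvus": "Bird",
}


def classify_host(host):
    """Derive Host_Genus and Host_Type from a free-text NCBI Host value."""
    host = (host or "").strip()
    if not host:
        return {"Host_Genus": "Unknown", "Host_Type": "Unknown"}
    # Treat the first word of the binomial name as the genus.
    genus = host.split()[0].capitalize()
    return {"Host_Genus": genus, "Host_Type": GENUS_TO_TYPE.get(genus, "Other")}


if __name__ == "__main__":
    print(classify_host("Homo sapiens"))
    # {'Host_Genus': 'Homo', 'Host_Type': 'Human'}
    print(classify_host("Culex pipiens"))
    # {'Host_Genus': 'Culex', 'Host_Type': 'Mosquito'}
```

Keeping the table as data rather than logic makes it easy to extend when new host genera appear in the NCBI records.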

Determination of Minimum Genome Length

The average genome length of WNV is 10,948 bp. We evaluated minimum genome length thresholds of 90% (9,800 bp), 80% (8,700 bp), 75% (8,200 bp), and 70% (7,700 bp). For each threshold, we ran the Washington-focused build and compared: (1) the number of sequences included, (2) data gap locations in the alignment files using an alignment viewer, and (3) the topology and lineage assignments from the phylogenetic tree outputs to determine the optimal threshold. We concluded that a minimum genome length of 75% (8,200 bp) included a higher number of sequences while balancing alignment quality. Lastly, we validated this threshold using the global build.

To modify the minimum sequence length in the WNV Global build, set the desired threshold for the --min-length <MIN_LENGTH> parameter listed in the defaults/config.yaml file.
To modify the minimum sequence length in the WNV Washington focused build, set the desired threshold for the --min-length <MIN_LENGTH> parameter listed in the washington-state/config.yaml file.
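
The first comparison described above (how many sequences each threshold retains) can be approximated outside the workflow. The sketch below is not the procedure the authors ran. It uses the four thresholds quoted in the text and counts only A, C, G, and T characters, which is our understanding of how Augur's `--min-length` filter measures length; verify that assumption against your Augur version.

```python
import re

# Thresholds evaluated in the text, as (label, minimum length in bp).
THRESHOLDS = [("90%", 9800), ("80%", 8700), ("75%", 8200), ("70%", 7700)]


def unambiguous_length(sequence):
    """Count only A, C, G, and T characters, case-insensitively."""
    return len(re.findall(r"[ACGT]", sequence.upper()))


def sequences_retained(sequences, thresholds=THRESHOLDS):
    """Return, for each threshold label, how many sequences meet the minimum."""
    lengths = [unambiguous_length(seq) for seq in sequences]
    return {
        label: sum(length >= minimum for length in lengths)
        for label, minimum in thresholds
    }
```

Counts like these cover only the first of the three comparisons. Alignment gaps and tree topology still need to be inspected separately, as described above.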

Customization for Local Adaptation

This build can be customized for use by other demes, such as states, cities, counties, or countries.

Subsampling

The Washington focused WNV build retrieves all available WNV sequences from NCBI and filters the data within the phylogenetic workflow based on criteria defined in the build-configs/washington-state/config.yaml file. For details on the current subsampling configuration and instructions on modifying the criteria, refer to the phylogenetic/build-configs/washington-state README.md.
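
To make the tiered logic concrete, here is a minimal sketch in plain Python of the selection summarized in Getting Started: keep focal sequences up to a cap of 5,000, then add up to 300 randomly chosen sequences from elsewhere. It does not use Augur and is not the workflow's implementation. The `state` field name and the focal set passed in are assumptions for illustration.

```python
import random


def tiered_subsample(records, focal_states, max_focal=5000, max_other=300, seed=0):
    """Select focal records up to a cap, plus a random sample of the rest.

    records: list of dicts with a 'state' key (hypothetical field name).
    focal_states: set of states treated as the focal tier.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same sample
    focal = [r for r in records if r.get("state") in focal_states]
    other = [r for r in records if r.get("state") not in focal_states]
    if len(focal) > max_focal:
        focal = rng.sample(focal, max_focal)
    if len(other) > max_other:
        other = rng.sample(other, max_other)
    return focal + other


if __name__ == "__main__":
    # Made-up records purely for demonstration.
    demo = [{"strain": f"wa{i}", "state": "Washington"} for i in range(10)]
    demo += [{"strain": f"tx{i}", "state": "Texas"} for i in range(500)]
    picked = tiered_subsample(demo, {"Washington", "Oregon", "Idaho"})
    print(len(picked))  # 310
```

Adapting the build to another state amounts to changing the focal set and the two caps, which is what the build-config file controls in the real workflow.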

Incorporating Additional Metadata

We have added the option to integrate additional metadata, which can include either public or sensitive information. This feature is especially useful for state health departments that need to annotate the phylogenetic trees or map visualizations in Auspice. For example, in the Washington focused WNV build, we mapped the centroids of zip codes where mosquito traps are located. This information is located in the phylogenetic/data-private/metadata.tsv file. For more details on the current metadata configuration and instructions on modifying it, refer to the phylogenetic/data-private/README.md.
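
As an illustration of preparing coordinates for mapping, the sketch below formats named centroids as tab-separated rows of resolution, name, latitude, and longitude. That four-column layout is the one we understand Augur to accept through the `--lat-longs` option of `augur export v2`; whether this build consumes coordinates that way is an assumption, so check the README referenced above. The site names and coordinates are invented.

```python
def format_lat_long_rows(resolution, centroids):
    """Format centroids as 'resolution<TAB>name<TAB>latitude<TAB>longitude' rows.

    centroids: dict mapping a location name to a (latitude, longitude) tuple.
    """
    rows = []
    for name, (latitude, longitude) in sorted(centroids.items()):
        if not -90 <= latitude <= 90:
            raise ValueError(f"latitude out of range for {name}: {latitude}")
        if not -180 <= longitude <= 180:
            raise ValueError(f"longitude out of range for {name}: {longitude}")
        rows.append(f"{resolution}\t{name}\t{latitude:.4f}\t{longitude:.4f}")
    return rows


if __name__ == "__main__":
    # Invented example sites; replace with your own trap or zip-code centroids.
    demo = {"Trap Site B": (46.25, -119.75), "Trap Site A": (47.0, -120.5)}
    for row in format_lat_long_rows("location", demo):
        print(row)
```

Validating the ranges up front catches swapped latitude and longitude columns, a common cause of points landing in the wrong place on the map.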

Contributing

Please submit questions to our [Discussions](insert link here) page. Software issues and requests can be logged as a Git [Issue](insert link here).

License

Acknowledgements

[add acknowledgements to those who have contributed to this work]
