feat: Explain our QC, clades and alignment better #3592

anna-parker · 2025-01-31T08:17:03Z

We should at least link somewhere to the nextclade dataset that we use. Even better, as @emmahodcroft suggested, would be to add an info page where we describe our annotation process in detail.

- Link to the reference more clearly
- Link to the nextclade dataset: Surface more info on preprocessing tools #3301
- Describe our QC scores and clade annotation

emmahodcroft · 2025-01-31T08:19:23Z

Do we have public nextclade datasets available for each pathogen? I can't remember - I guess that would be a 'step 1' for any where we don't (actually put them on Nextclade)

emmahodcroft · 2025-01-31T08:20:18Z

As another idea (though it may be too manual or too much work): at Nextstrain we have this figure for SC2, which I use all the time. It doesn't work ideally, but something like it for each pathogen (if we could get it automatically generated and updated) would probably be the most useful thing for a naive user:

https://github.com/nextstrain/ncov-clades-schema

anna-parker · 2025-01-31T08:26:54Z

> Do we have public nextclade datasets available for each pathogen? I can't remember - I guess that would be a 'step 1' for any where we don't (actually put them on Nextclade)

They are all on nextclade, but some are just on branches and not ~~public~~ merged on main.

Datasets to merge:

- CCHF: Cornelius cchfv nextstrain/nextclade_data#200 - no clades or qc
- West Nile: West Nile Virus overview nextstrain/nextclade_data#197 - should be ready to merge
- Ebola: Reference-only Ebola Zaire and Sudan nextstrain/nextclade_data#184 - no clades or qc

emmahodcroft · 2025-01-31T08:39:53Z

Slightly related to this, we may want to prioritize #3194
Pulling in more QC scores so that people have more metrics by which to judge data quality. This was requested specifically by Aine/Andrew in mpox context, but probably applies more broadly.

(But posting this mostly for visibility, we don't have to try and tackle all of this at once)

j23414 · 2025-02-04T17:52:58Z

+1 for documenting and merging the WNV dataset.

In nextstrain/WNV, we've been using the Pathoplexus API to get the draft WNV global lineage calls:

Ingest: fetch and append pathoplexus global lineage calls nextstrain/WNV#40

but would love to have more documentation on the reference, QC scores, and other nextclade-specific parameters for the WNV dataset.

anna-parker added the high_priority Work on this as soon as possible (potentially post-MVP) label Jan 31, 2025
