Pull in more QC fields from Nextclade #3194

Open
emmahodcroft opened this issue Nov 11, 2024 · 1 comment
Labels
discussion Open questions feature Feature proposal preprocessing Issues related to the preprocessing component

Comments

@emmahodcroft
Member

From discussion in the 11/11 dev meeting: we talked about giving users more ways to filter sequences based on QC scores. In particular, since we 'let everything in', we should give users useful ways to filter out sequences that may be questionable.

There are many ways we could run with this afterwards, but as a first step, we should bring these fields in.

Current status:

So far we have pulled in only the 'objective' QC fields (e.g. counts of missing nucleotides) and avoided those that are specific to Nextclade.
However, these alone may not be ideal for getting an overall feel for the quality of a sequence, or for giving users one good place to filter out 'bad stuff'.

Possible Objectives:

  • Pull in some of the 'less-objective' fields from Nextclade (ex: overall QC score)
    • Need a list of possible fields
  • Make it clear(er) these are from Nextclade and how they're generated/what they mean
  • Make it clear these fields may change over time (ex: a 'bad' seq may become 'good' if it's now recognised correctly as a recombinant rather than an outlier)
@emmahodcroft
Member Author

emmahodcroft commented Nov 15, 2024

I'd suggest we pull in:

'Overall QC Score'
'Overall QC Status'

The following as statuses (ex: 'good', 'mediocre'):
'Missing Data'
'Mixed Sites'
'Private Mutations'
'Mutation Clusters'
'Frame Shifts'
'Stop Codons'

We could also discuss whether these should be pulled in as numeric scores too (if available; I am not sure whether they all have numeric values out of 100). That would allow users more fine-grained filtering, but would add more fields to the TSV, details page, etc.
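To illustrate the trade-off between the two options, here is a minimal Python sketch comparing a status-based filter with a score-based one. The record layout, the field names `qc_status` and `qc_score`, and the sample values are all made up for illustration; they are not existing fields.

```python
from typing import Optional

# Hypothetical records: field names and values are illustrative only.
records = [
    {"accession": "SEQ_1", "qc_status": "good", "qc_score": 4.0},
    {"accession": "SEQ_2", "qc_status": "mediocre", "qc_score": 45.5},
    {"accession": "SEQ_3", "qc_status": "bad", "qc_score": 210.0},
    {"accession": "SEQ_4", "qc_status": None, "qc_score": None},  # QC not available
]


def filter_by_status(rows: list[dict], allowed: set[str]) -> list[dict]:
    """Coarse filter: keep rows whose status is in the allowed set."""
    return [r for r in rows if r["qc_status"] in allowed]


def filter_by_score(rows: list[dict], max_score: float) -> list[dict]:
    """Fine-grained filter: keep rows with a score at or below max_score.

    Rows without a score are dropped rather than guessed at.
    """
    kept = []
    for r in rows:
        score: Optional[float] = r["qc_score"]
        if score is not None and score <= max_score:
            kept.append(r)
    return kept


by_status = [r["accession"] for r in filter_by_status(records, {"good", "mediocre"})]
by_score = [r["accession"] for r in filter_by_score(records, 20.0)]
```

With statuses only, a user can pick from a few categories; with numeric scores, they can choose their own cut-off, at the cost of the extra fields.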

I'd propose prefixing all of these with 'Nextclade' to make it clear where they come from, as they aren't purely quantitative.
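As a rough sketch of how the proposed fields could be pulled out and given a 'Nextclade' prefix, something like the following Python might work. The column names on the left are what I believe Nextclade writes to its TSV output, but they should be checked against the Nextclade version in use; the display names and the helper `extract_qc_fields` are hypothetical.

```python
# Assumed Nextclade TSV column -> proposed display name.
# Column names are an assumption to verify; display names follow the proposal above.
NEXTCLADE_QC_FIELDS = {
    "qc.overallScore": "Nextclade Overall QC Score",
    "qc.overallStatus": "Nextclade Overall QC Status",
    "qc.missingData.status": "Nextclade Missing Data",
    "qc.mixedSites.status": "Nextclade Mixed Sites",
    "qc.privateMutations.status": "Nextclade Private Mutations",
    "qc.snpClusters.status": "Nextclade Mutation Clusters",
    "qc.frameShifts.status": "Nextclade Frame Shifts",
    "qc.stopCodons.status": "Nextclade Stop Codons",
}


def extract_qc_fields(nextclade_row: dict) -> dict:
    """Pick the QC columns out of one parsed Nextclade row and rename them.

    Columns missing from the row are returned as None instead of raising,
    so a change in Nextclade's output does not break processing outright.
    """
    return {
        display_name: nextclade_row.get(column)
        for column, display_name in NEXTCLADE_QC_FIELDS.items()
    }


# Usage with a made-up, partial row:
example_row = {"seqName": "SEQ_1", "qc.overallScore": "12.5", "qc.overallStatus": "good"}
qc = extract_qc_fields(example_row)
```

Keeping the mapping in one table would also make it straightforward to add the per-rule numeric scores later, if we decide to include them.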

It would be ideal if we could add some kind of (?) icon with a mouseover next to these fields on the sequence details page, so that people could read more about how they are calculated and/or follow a link to the Nextclade descriptions of how they are generated. (I imagine this could be implemented as a generic text-box mouseover, with a config specifying what the text should be and whether to include a link.) However, this may be a separate PR.
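To make the config idea concrete, here is a minimal Python sketch of a per-field help entry with text and an optional link. The names `FieldHelp`, `FIELD_HELP`, and `render_help`, as well as the help text itself, are placeholders and not part of any existing config; the real thing would presumably live in the website config and be rendered by the frontend.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldHelp:
    """Mouseover help for one metadata field (hypothetical structure)."""
    text: str
    link: Optional[str] = None  # omit to show text only


# Placeholder entries: wording and link target would need to be agreed on.
FIELD_HELP = {
    "Nextclade Overall QC Status": FieldHelp(
        text="Overall quality category assigned by Nextclade; may change over time.",
        link="https://docs.nextstrain.org/projects/nextclade/",
    ),
    "Nextclade Frame Shifts": FieldHelp(
        text="Nextclade QC status for frame shifts.",
    ),
}


def render_help(field_name: str) -> Optional[str]:
    """Return the mouseover text for a field, or None if no help is configured."""
    entry = FIELD_HELP.get(field_name)
    if entry is None:
        return None
    if entry.link is None:
        return entry.text
    return f"{entry.text} (more: {entry.link})"
```

Fields without an entry simply get no (?) icon, so the same mechanism could be reused for non-Nextclade fields.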
