
feat: use GFF3 genemap instead of genbank reference so that genenames are OPG #69

Merged


corneliusroemer
Member

@corneliusroemer corneliusroemer commented Jun 17, 2022

Needs `augur translate` to be patched before merging. @rneher

victorlin and others added 7 commits June 13, 2022 22:26
Note: this requires forthcoming PRs in augur and auspice
Previously, we would backfill `rule_traversal` with wildcards ("*")
if we could not find a matching rule for a particular `raw_geolocation`.
This would NOT work for cases where there are partial rule matches AND
wildcard rules that match. I realized the flaw in this logic
while responding to @victorlin's post-merge review:
#41 (comment)

This commit updates the logic for when there are no matching rules. The
`rule_traversal` is reset to the last index that is not currently a
wildcard, and that value is changed to a wildcard. This allows the
recursive function to try different versions of `rule_traversal` with
different combinations of raw values and wildcards.
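The search order described above (prefer the raw value, fall back to a wildcard, and back up to the deepest non-wildcard level when a branch dead-ends) can be sketched as a depth-first search. This is a minimal sketch, not the `apply-geolocation-rules` code: it swaps the commit's iterative reset of `rule_traversal` for plain recursive backtracking, and the nested-dict rule layout, `find_rule`, and `EXAMPLE_RULES` are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical shapes: a raw geolocation is (region, country, division, location),
# and rules are nested dicts keyed level by level, with an annotated tuple at the leaf.
Geolocation = Tuple[str, str, str, str]
RuleTree = Dict[str, Any]

WILDCARD = "*"


def find_rule(rules: RuleTree, raw: Geolocation, depth: int = 0) -> Optional[Geolocation]:
    """Depth-first search that prefers the raw value at each level, falls back
    to the wildcard, and backtracks when a branch has no matching leaf."""
    if depth >= len(raw):
        return None
    for key in (raw[depth], WILDCARD):
        child = rules.get(key)
        if child is None:
            continue
        if isinstance(child, tuple):
            # Leaves only count as a match at the last level of the traversal.
            if depth == len(raw) - 1:
                return child
            continue
        result = find_rule(child, raw, depth + 1)
        if result is not None:
            return result
    return None


# Hypothetical rules: one exact Europe/UK/England/London rule plus a
# Europe/UK/*/* wildcard rule.
EXAMPLE_RULES: RuleTree = {
    "Europe": {
        "UK": {
            "England": {"London": ("Europe", "United Kingdom", "England", "London")},
            WILDCARD: {WILDCARD: ("Europe", "United Kingdom", WILDCARD, WILDCARD)},
        }
    }
}
```

With these rules, `("Europe", "UK", "England", "Manchester")` partially matches the London rule and dead-ends at the location level. The search then retries with a wildcard division and returns the `Europe/UK/*/*` annotation. Backfilling only the failing position with `"*"` would have tried `Europe/UK/England/*` and found nothing.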
This is necessary for the temporal colour scale introduced in the
preceding commit.

For searchability, I'll paste the error message
that you would get when running with <16.0.0 (which shouldn't be
reachable, as the Snakemake pipeline should exit up front):

ERROR: 'temporal' is not one of ['continuous', 'ordinal', 'categorical', 'boolean']
Validation of config/auspice_config_mpxv.json failed.
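The commit expects the pipeline to exit up front on versions below 16.0.0. A minimal sketch of such a version gate follows; the helper names are hypothetical, and how the Snakemake workflow actually obtains and compares the version isn't shown in this PR.

```python
from typing import Tuple

# Minimum version stated in the commit message above.
MINIMUM_VERSION = (16, 0, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading numeric components, e.g. "16.0.0.dev1" -> (16, 0, 0).

    Missing components are padded with zeros so that "16" compares as 16.0.0.
    """
    parts = []
    for piece in version.strip().split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def supports_temporal_scale(version: str) -> bool:
    """True if `version` is at least MINIMUM_VERSION."""
    return parse_version(version) >= MINIMUM_VERSION
```

Padding to three components matters because Python compares tuples lexicographically, and `(16,)` would otherwise sort below `(16, 0, 0)`.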
trvrb and others added 16 commits June 17, 2022 15:09
The previous strategy of assigning colors used the entire metadata.tsv to assign country colors, whereas the hmpxv1 build target has a subset of 18 of these 30 countries and the mpxv build target has a subset of 26 of these 30 countries. Consequently, we weren't using the color spectrum as efficiently as we could, especially for the hmpxv1 target.

This commit updates augur filter to output filtered metadata for both hmpxv1 and mpxv and uses this filtered metadata to assign colors.

Additionally, this commit updates country order in update_colours.py to start with Africa rather than Asia. For ncov, I had started with Asia as this was the basal region. For monkeypox this is Africa.
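The idea of provisioning colors per build can be sketched as follows: collect only the countries present in one target's filtered metadata, order them by region with Africa first, and spread colors across just those countries. This is a minimal sketch under stated assumptions. The region order after Africa and both function names are hypothetical. The evenly spaced HSV hues from Python's `colorsys` stand in for whatever palette `update_colours.py` actually uses.

```python
import colorsys
import csv
import io
from typing import Dict, List

# Assumed region order: the commit says monkeypox ordering starts with Africa;
# the rest of this order is hypothetical.
REGION_ORDER = ["Africa", "Europe", "North America", "South America", "Asia", "Oceania"]


def countries_in_order(filtered_metadata_tsv: str) -> List[str]:
    """Unique countries from one build's filtered metadata, grouped by region."""
    reader = csv.DictReader(io.StringIO(filtered_metadata_tsv), delimiter="\t")
    region_of: Dict[str, str] = {}
    for row in reader:
        region_of.setdefault(row["country"], row["region"])

    def sort_key(country: str):
        region = region_of[country]
        rank = REGION_ORDER.index(region) if region in REGION_ORDER else len(REGION_ORDER)
        return (rank, country)

    return sorted(region_of, key=sort_key)


def assign_colours(countries: List[str]) -> Dict[str, str]:
    """Spread hues evenly over only the countries present in this build."""
    colours: Dict[str, str] = {}
    for i, country in enumerate(countries):
        hue = 0.8 * i / len(countries)  # stop short of wrapping back to red
        r, g, b = colorsys.hsv_to_rgb(hue, 0.6, 0.85)
        colours[country] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colours
```

Because the hue step is `0.8 / len(countries)`, a target with 18 countries gets wider gaps between neighbouring colors than one built from all 30. That wider spacing is the efficiency gain the commit describes.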
Rather than replacing "accession" with "strain" in the initial wrangle_metadata.py script, keep the "accession" column and create an additional "strain" column. This allows "augur export" to recognize the "accession" column and include a properly linked field in tooltips.

Remove "genbank_accession_rev" as "Accession" coloring. Now "Accession" will be automatically recognized.
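Keeping "accession" while adding "strain" amounts to copying a column instead of renaming it. A minimal stdlib sketch, assuming the new "strain" value simply mirrors "accession" (the commit doesn't say how wrangle_metadata.py fills it), with a hypothetical function name:

```python
import csv
import io


def add_strain_column(metadata_tsv: str) -> str:
    """Keep `accession` and add a `strain` column holding the same value.

    Assumes `strain` should mirror `accession`; an existing `strain` column
    is overwritten rather than duplicated.
    """
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    fieldnames = list(reader.fieldnames or [])
    if "strain" not in fieldnames:
        fieldnames.insert(0, "strain")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["strain"] = row["accession"]
        writer.writerow(row)
    return out.getvalue()
```

For example, a header of `accession`, `country` becomes `strain`, `accession`, `country`, and each row carries its accession in both columns.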
Provision colors separately for mpxv and hmpxv1 build targets
…eolocation-rules

Fix typos in `apply-geolocation-rules`
ingest/apply-geolocation-rules: update general rules logic
feat: rename nextalign -> nextalign2 to work with base image
Member

@jameshadfield jameshadfield left a comment


Tested using nextstrain/augur#976 and all is working 👍

This fixes the issue with having a number of gray "outbreak_associated" tips. We may want to drop the coloring at some point, but I was still finding it valuable when putting together presentations to easily highlight the outbreak clade B.1.
@jameshadfield
Member

Note one difference. Current builds have "gene" names such as MPXV-UK_P2-XXX. With this change we'll now use names such as OPGXXX (this change introduced in #67).

trvrb and others added 2 commits June 20, 2022 18:29
…om ingest

This way we don't get clade_x and clade_y when merging nextclade in ingest
…eak-association-column

ingest(fix/feat): remove old clade and outbreak association column from ingest
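The names `clade_x` and `clade_y` match what pandas `DataFrame.merge` produces by default (`suffixes=("_x", "_y")`) when both inputs share a non-key column. Dropping the stale columns before the join avoids that. A minimal stdlib sketch of the fix, in which the column names, the `accession` join key, and `merge_nextclade` are all hypothetical:

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]

# Hypothetical names for the stale columns; the commit only describes them as
# "old clade and outbreak association" columns.
STALE_COLUMNS = {"clade", "outbreak_associated"}


def merge_nextclade(metadata: Iterable[Row], nextclade: Dict[str, Row], key: str = "accession") -> List[Row]:
    """Left-join Nextclade rows onto metadata rows by `key`, dropping stale
    columns first so the Nextclade values are the only ones kept."""
    merged: List[Row] = []
    for row in metadata:
        cleaned = {k: v for k, v in row.items() if k not in STALE_COLUMNS}
        extra = nextclade.get(row[key], {})
        overlap = (set(cleaned) & set(extra)) - {key}
        if overlap:
            # Any other shared column would need suffixing; fail loudly instead.
            raise ValueError(f"columns present in both inputs: {sorted(overlap)}")
        cleaned.update(extra)
        merged.append(cleaned)
    return merged
```

Raising on any remaining overlap is a design choice for this sketch: it surfaces new collisions immediately instead of silently producing suffixed columns.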
@corneliusroemer
Member Author

Indeed, @jameshadfield, that's on purpose too: the UK gene names were arbitrary, whereas OPG is the standard recently suggested by NCBI.
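Switching to a GFF3 genemap means gene names come from each gene feature's attribute column rather than from the GenBank record. A minimal sketch of pulling gene names out of GFF3 text, following the GFF3 layout of nine tab-separated columns with `key=value;...` attributes. This is not augur's parser. Which attribute carries the OPG name (`gene`, then `Name`, then `ID` here) is an assumption, and the test records use made-up coordinates.

```python
from typing import Dict, List, Tuple
from urllib.parse import unquote


def parse_attributes(column: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column like "ID=gene-1;gene=OPG001"."""
    attrs: Dict[str, str] = {}
    for pair in column.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


def gene_names(gff3_text: str) -> List[Tuple[str, int, int, str]]:
    """Return (name, start, end, strand) for every `gene` feature."""
    genes = []
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        attrs = parse_attributes(fields[8])
        name = attrs.get("gene") or attrs.get("Name") or attrs.get("ID", "")
        genes.append((name, int(fields[3]), int(fields[4]), fields[6]))
    return genes
```

Only `gene` features are kept, so a CDS line that repeats the same `gene=` attribute does not produce a duplicate entry.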

@corneliusroemer corneliusroemer marked this pull request as ready for review June 24, 2022 19:23
@corneliusroemer corneliusroemer merged commit cc2b491 into chore/upgrade-reference Jun 24, 2022
@corneliusroemer corneliusroemer deleted the chore/upgrade-reference-use-genemap branch June 24, 2022 19:26
6 participants