
Inherited clade definitions #823

Closed
jameshadfield opened this issue Jan 4, 2022 · 1 comment · Fixed by #846
Labels
enhancement New feature or request

Comments

@jameshadfield
Member

jameshadfield commented Jan 4, 2022

Currently, clades are defined independently of one another in the provided TSV, so a descendant clade's definition often duplicates the mutations of its parent clade. For example, 21L is a descendant of 21M, so we have the following:

21M (Omicron) 	nuc 	23525 	T
21M (Omicron) 	nuc 	23599 	G
21L (Omicron) 	nuc 	23525 	T  ## mutation actually defines 21M
21L (Omicron) 	nuc 	23599 	G  ## mutation actually defines 21M
21L (Omicron) 	nuc 	24424 	T

We should allow a clade to inherit its parent clade's definition, e.g.:

21M (Omicron) 	nuc 	23525 	T
21M (Omicron) 	nuc 	23599 	G
21L (Omicron) 	clade 	21M (Omicron)
21L (Omicron) 	nuc 	24424 	T
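A minimal sketch of how the proposed rows could be parsed. It assumes tab-separated fields in the order clade, gene, site, alt (inferred from the example above), and that an inheritance row carries the literal `clade` in the second field and the parent's name in the third. `parse_clade_rows` is a hypothetical helper, not augur's actual API; rejecting a second parent follows the single-parent suggestion in the considerations.

```python
from collections import defaultdict


def parse_clade_rows(lines):
    """Split clade-definition rows into direct mutations and parent links.

    Hypothetical helper. Assumes tab-separated fields: clade, gene, site, alt.
    A row whose gene field is the literal "clade" is an inheritance row whose
    third field names the parent clade.
    """
    mutations = defaultdict(list)  # clade -> [(gene, site, alt), ...]
    parents = {}                   # clade -> parent clade name
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing "## ..." notes
        if not line:
            continue
        fields = [f.strip() for f in line.split("\t")]
        clade, gene = fields[0], fields[1]
        if gene == "clade":
            parent = fields[2]
            # Single-parent restriction: a second, different parent is an error.
            if parents.get(clade, parent) != parent:
                raise ValueError(f"{clade!r} has more than one parent clade")
            parents[clade] = parent
        else:
            mutations[clade].append((gene, int(fields[2]), fields[3]))
    return dict(mutations), parents


rows = [
    "21M (Omicron)\tnuc\t23525\tT",
    "21M (Omicron)\tnuc\t23599\tG",
    "21L (Omicron)\tclade\t21M (Omicron)",
    "21L (Omicron)\tnuc\t24424\tT",
]
mutations, parents = parse_clade_rows(rows)
print(mutations["21L (Omicron)"])  # [('nuc', 24424, 'T')]
print(parents)                     # {'21L (Omicron)': '21M (Omicron)'}
```

With inheritance, 21L stores only its own mutation plus a pointer to 21M; the shared mutations live in one place.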

There are a few considerations here:

  • This introduces the potential for circular dependencies (A descended from B descended from A), which should be fatal errors.
  • When both a parent and descendant clade are annotated on the same branch, the branch label should represent the descendant clade.
  • Should multiple parent clades be allowed? It is probably easiest to limit the current implementation to a single parent lineage.
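With at most one parent per clade, the first two considerations reduce to walking a chain of parent links. The sketch below assumes inheritance has already been parsed into a hypothetical `parents` dict (clade name to parent name). `ancestry` raises on a cycle, and `branch_label` keeps the clade that is not an ancestor of any other clade annotated on the same branch; how to break a tie between unrelated clades is not covered by the issue, so the sketch simply takes the first.

```python
def ancestry(clade, parents):
    """Return [clade, parent, grandparent, ...] by following parent links.

    Raises ValueError on a circular dependency such as A -> B -> A.
    """
    chain = [clade]
    seen = {clade}
    while chain[-1] in parents:
        parent = parents[chain[-1]]
        if parent in seen:
            cycle = " -> ".join(chain + [parent])
            raise ValueError(f"circular clade inheritance: {cycle}")
        chain.append(parent)
        seen.add(parent)
    return chain


def branch_label(clades_on_branch, parents):
    """Choose the label for a branch annotated with several clades.

    Drops every clade that is an ancestor of another clade on the same
    branch, so a descendant wins over its parent. If unrelated clades
    remain, the first one in input order is returned (an arbitrary choice).
    """
    ancestors = set()
    for clade in clades_on_branch:
        ancestors.update(ancestry(clade, parents)[1:])
    candidates = [c for c in clades_on_branch if c not in ancestors]
    return candidates[0]


parents = {"21L (Omicron)": "21M (Omicron)"}
print(ancestry("21L (Omicron)", parents))  # ['21L (Omicron)', '21M (Omicron)']
print(branch_label(["21M (Omicron)", "21L (Omicron)"], parents))  # 21L (Omicron)
```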


Possible solution

There seem to be two possible implementations:

  1. Expand the TSV on parsing, replacing each parent-clade reference with the mutations of that parent clade.
  2. Consider only the subtree defined by the parent clade, then search it for the clade defined by the extra mutations (e.g. 24424T in the example above).
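Solution 1 amounts to flattening the inheritance before clade assignment. A minimal sketch, assuming definitions have already been parsed into two hypothetical dicts: `mutations` (clade to its own `(gene, site, alt)` tuples) and `parents` (clade to its single parent). Since the output has the same shape as a TSV without inheritance, existing assignment logic could presumably consume it unchanged.

```python
def expand_inherited(mutations, parents):
    """Replace each parent reference with the parent's (expanded) mutations.

    mutations: clade -> [(gene, site, alt), ...] defined directly on the clade
    parents:   clade -> parent clade name (at most one parent per clade)
    Returns clade -> [(gene, site, alt), ...] including inherited mutations.
    """
    expanded = {}

    def resolve(clade, visiting):
        if clade in expanded:
            return expanded[clade]
        if clade in visiting:
            raise ValueError(f"circular clade inheritance involving {clade!r}")
        if clade not in mutations and clade not in parents:
            raise KeyError(f"parent clade {clade!r} is not defined")
        visiting.add(clade)
        inherited = resolve(parents[clade], visiting) if clade in parents else []
        # Parent mutations first, then the clade's own; drop exact duplicates.
        result = list(dict.fromkeys(inherited + mutations.get(clade, [])))
        visiting.discard(clade)
        expanded[clade] = result
        return result

    for clade in list(mutations) + list(parents):
        resolve(clade, set())
    return expanded


mutations = {
    "21M (Omicron)": [("nuc", 23525, "T"), ("nuc", 23599, "G")],
    "21L (Omicron)": [("nuc", 24424, "T")],
}
parents = {"21L (Omicron)": "21M (Omicron)"}
for row in expand_inherited(mutations, parents)["21L (Omicron)"]:
    print("21L (Omicron)", *row, sep="\t")
```

The loop prints the three 21L rows (23525 T, 23599 G, 24424 T), i.e. the same duplicated definition as the first table in this issue.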

I prefer solution 2, but I don't expect the results to differ.
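Solution 2 can be illustrated on a toy tree. This is a simplified sketch, not how augur assigns clades: it assumes a hypothetical tree given as a `children` dict, per-branch mutations stored as `{(gene, site): alt}`, and that a clade starts at the first node (pre-order from the search start) whose accumulated genotype carries every defining mutation. In this toy example, searching only the parent clade's subtree for 24424T lands on the same node as searching the whole tree with the fully expanded definition, consistent with the expectation above that the two approaches agree.

```python
def genotypes(children, branch_mutations, root):
    """Accumulate {(gene, site): alt} along every root-to-node path."""
    geno = {root: dict(branch_mutations.get(root, {}))}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            g = dict(geno[node])
            g.update(branch_mutations.get(child, {}))
            geno[child] = g
            stack.append(child)
    return geno


def find_clade_root(children, geno, start, defining):
    """First node (pre-order from start) carrying every defining mutation."""
    stack = [start]
    while stack:
        node = stack.pop()
        if all(geno[node].get((gene, site)) == alt for gene, site, alt in defining):
            return node
        stack.extend(reversed(children.get(node, [])))
    return None


def assign_within_parent(children, geno, root, parent_defining, own_defining):
    """Solution 2: locate the parent clade, then search only its subtree."""
    parent_root = find_clade_root(children, geno, root, parent_defining)
    if parent_root is None:
        return None
    return find_clade_root(children, geno, parent_root, own_defining)


# Toy tree: root -> A -> B and root -> C; C carries 24424T independently.
children = {"root": ["A", "C"], "A": ["B"]}
branch_mutations = {
    "A": {("nuc", 23525): "T", ("nuc", 23599): "G"},
    "B": {("nuc", 24424): "T"},
    "C": {("nuc", 24424): "T"},
}
geno = genotypes(children, branch_mutations, "root")
m21 = [("nuc", 23525, "T"), ("nuc", 23599, "G")]
own_21l = [("nuc", 24424, "T")]
print(assign_within_parent(children, geno, "root", m21, own_21l))  # B
print(find_clade_root(children, geno, "root", m21 + own_21l))       # B
```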

@corneliusroemer
Member

I will implement solution 1, since it's straightforward and simplifies the clade.tsv.

We can always switch to solution 2 later.

Repository owner moved this from Prioritized to Done in Nextstrain planning (archived) Feb 7, 2022
2 participants