-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix titer substitution model tree annotations #1555
Conversation
Fixes a bug in the `augur titers sub` tree annotations in the JSON output where antigenic weights were assigned per branch for the opposite substitution values than expected. Specifically, the bug caused each branch to get the antigenic weight associated with substitutions in the form of "<derived allele><position><ancestral allele>" instead of "<ancestral allele><position><derived allele>". For example, when the substitution model found antigenic weights for HA1 substitutions N193S and S193N, the tree annotations would have assigned the N193S weight to branches with the S193N substitution. This commit expands an existing functional test to check for a data-specific antigenic weight. After confirming this test failed, I fixed the bug and confirmed that the test passed. Note that this bug dates back to December 2018 when I first added the code to annotate the tree with the substitution model weights [1]. This bug had no impact on the accuracy of the underlying model itself, so we would never have noticed it without manually comparing the titer drops in the tree to the "substitution" array in the JSON output. [1] 6721b7d#diff-96cfc90794aa092cbb7b8577ca92d5757a777325d78078da3761ea26ee4c5956R67
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1555 +/- ##
=======================================
Coverage 70.22% 70.22%
=======================================
Files 74 74
Lines 7952 7952
Branches 1945 1945
=======================================
Hits 5584 5584
Misses 2082 2082
Partials 286 286 ☔ View full report in Codecov by Sentry. |
Co-authored-by: Thomas Sibley <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, John! Good catch.
Description of proposed changes
Fixes a bug in the
augur titers sub
tree annotations in the JSON output where antigenic weights were assigned per branch for the opposite substitution values than expected. Specifically, the bug caused each branch to get the antigenic weight associated with substitutions in the form of<derived allele><position><ancestral allele>
instead of<ancestral allele><position><derived allele>
. For example, when the substitution model found antigenic weights for HA1 substitutions N193S and S193N, the tree annotations would have assigned the N193S weight to branches with the S193N substitution.This commit expands an existing functional test to check for a data-specific antigenic weight. After confirming this test failed, I fixed the bug and confirmed that the test passed.
Note that this bug dates back to December 2018 when I first added the code to annotate the tree with the substitution model weights. This bug had no impact on the accuracy of the underlying model itself, so we would never have noticed it without manually comparing the titer drops in the tree to the "substitution" array in the JSON output.
I found this issue today when I noticed:
The following screenshot shows what the cumulative antigenic advance from the substitution model looks like for the H1 HA dataset mentioned above before the bug is fixed (note small advance for nodes with HA1:155E):
This screenshot shows what the advance looks like after the bug is fixed (note larger advance for 155E nodes):
As another point of comparison, here is the scatterplot of the antigenic advance from the tree model on the x-axis and the substitution model on the y-axis with the bug. Note the low correlation value between the data.
Here is the updated scatterplot view after fixing the bug with a much higher correlation between the models.
Checklist