-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/integer branch length #618
Conversation
Codecov Report
@@ Coverage Diff @@
## master #618 +/- ##
==========================================
- Coverage 28.37% 28.32% -0.06%
==========================================
Files 39 39
Lines 5332 5345 +13
Branches 1306 1311 +5
==========================================
+ Hits 1513 1514 +1
- Misses 3766 3778 +12
Partials 53 53
Continue to review full report at Codecov.
|
This fails if a timetree is not being inferred. I think we need to tweak the initialisation of the
|
Good call. Hadn't thought of the case without timetree. But hey, at least we got a useful error message! I'll push a fix |
could you retest this? I stuck the ancestral reconstruction into the block where we do the mutation units to not slow down the step otherwise. |
Adds functional tests based on the Zika build including commands that run the usual time tree inference, use refine to label internal nodes without running time tree, and use refine to specify the type of divergence unit to report (mutations or mutations per site). Note that the last of these tests currently fails because the command asks for mutations as the divergence unit, but the input does not include an alignment FASTA. In this case, augur refine should throw an error alerting the user that they are missing a required input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code works for me under the normal time tree case and in James’s case where no time tree is requested but an alignment is still provided. This code will silently produce meaningless branch lengths if no time tree is requested and no alignment is provided, though. As refine
is written now, we allow users to run the command this way to assign names to internal nodes.
I think we should add a check immediately after arguments have been parsed for this invalid request (“mutations” as a divergence unit without an alignment input), print a helpful error message to stderr, and exit.
I will push a commit in a minute that adds functional tests for refine
and includes a failing test for this scenario. It also includes tests for the normal execution of refine with timetree and James’s example.
I also wonder if we can move the different
function to TreeTime's seq_utils.py
so augur does not need to know about the implementation of profile_maps
. I added some comments inline about this. I get that this is an annoying bit of code shuffling that will require a new TreeTime release, but I think it will make this code easier to maintain and also provide a helpful new utility that we'll want to use elsewhere. I didn't know about the profile_maps
before, but I would definitely re-use a function like the one in this PR for other analyses.
(A new TreeTime release would also have the nice side effect of getting rid of those VisibleDeprecationWarning
messages from numpy and the amino acid models arrays.)
for node in node_data['nodes']: | ||
if not args.timetree: | ||
tt.infer_ancestral_sequences() | ||
nuc_map = profile_maps['nuc'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use the "nuc_nogap" map here instead of the "nuc" map, won't that allow us to remove the checks for "-" and "N" characters in the different
function below? It seems like this map will ignore those gap bases in the calculations below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my main reason for doing this check for N
and -
was speed (without profiling). I think we should use the same alphabet as the build though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok. Thank you for clarifying this!
Makes minor edits that should improve readability for future maintainers who may not immediately recognize the meaning of the single-letter abbreviations.
Rewrites the functional test for refine with mutational divergence units to expect an error message and exit code. Also, updates the error logging from augur refine to write to stderr instead of stdout both for improved user experience and to simplify functional tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I updated the functional tests to pass with the new error message.
for node in node_data['nodes']: | ||
if not args.timetree: | ||
tt.infer_ancestral_sequences() | ||
nuc_map = profile_maps['nuc'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok. Thank you for clarifying this!
Description of proposed changes
make branch length a proper integer when units are mutations by counting mutations associated with branches. Our previous branch length calculation simply multiplied the expected number of mutations per site by the length of the genome. This resulted in slightly off numbers due to numerical inaccuracies or missing data.
Related issue(s)
#617
Testing
check branch length. these are up at
https://nextstrain.org/staging/ncovRefocus/europe
https://nextstrain.org/staging/ncovRefocus/north-america
Thank you for contributing to Nextstrain!