Refactor filters into separate functions #745
Merged
Description of proposed changes
Refactors filter logic into separate functions that share the signature func(metadata, **kwargs) and return a set of strain names that pass the filter. Although this work does not reduce the complexity of the code by itself, it sets up a pattern that will allow us to move all filters into a single loop through all user-requested filters. This change should simplify the main logic and also allow us to short-circuit evaluation when filters remove all possible strains (e.g., --exclude-all), avoiding unnecessary checks.
This refactoring also includes new functions for sequence-based filters. As part of these sequence-based functions, we update the sequence index data frame to be indexed by strain name to be consistent with the metadata data frame.
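The pattern described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the filter names, the apply_filters helper, and the use of a plain dict in place of the metadata data frame are all assumptions made to keep the sketch dependency-free.

```python
# Hypothetical stand-in for the metadata data frame: strain name -> record.
metadata = {
    "strain_a": {"region": "Asia", "length": 29000},
    "strain_b": {"region": "Europe", "length": 12000},
    "strain_c": {"region": "Asia", "length": 29900},
}


def filter_by_region(metadata, region=None, **kwargs):
    """Hypothetical filter: keep strains from the given region."""
    return {strain for strain, record in metadata.items()
            if record["region"] == region}


def filter_by_min_length(metadata, min_length=0, **kwargs):
    """Hypothetical filter: keep strains with at least min_length bases."""
    return {strain for strain, record in metadata.items()
            if record["length"] >= min_length}


def exclude_all(metadata, **kwargs):
    """Hypothetical filter: no strain passes."""
    return set()


def apply_filters(metadata, filters):
    """Apply (function, kwargs) pairs in order; stop once nothing is left.

    Returns the surviving strain names and the number of filters evaluated.
    """
    kept = set(metadata)
    evaluated = 0
    for filter_function, filter_kwargs in filters:
        # Every filter has the same signature and returns a set of strains.
        kept &= filter_function(metadata, **filter_kwargs)
        evaluated += 1
        if not kept:
            # Short-circuit: later filters cannot bring strains back.
            break
    return kept, evaluated


kept, evaluated = apply_filters(metadata, [
    (filter_by_region, {"region": "Asia"}),
    (filter_by_min_length, {"min_length": 29500}),
])
print(sorted(kept), evaluated)
```

Because every filter returns a set, combining filters reduces to set intersection, and an empty set is a natural signal to stop early.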
One side effect of this refactoring is the addition of a functional test for both the --include-where and --exclude-where filters to make sure these are properly implemented and no regressions occur during refactoring. The lack of this test initially allowed the refactoring of the --exclude-where logic to introduce a bug.
Finally, we also define a new function to include strains by a query. Note that this implementation relies on the same query parser used by the --exclude-where argument, which allows the negation operator and lowercases strings before comparison. This change is backward compatible, however, and only adds functionality that is consistent with the --exclude-where functionality.
Related issue(s)
Builds on work in the pandas-metadata branch and PR #743.
Testing
Adds doctests for all new filter functions. Test these functions locally by installing augur with dev dependencies (python3 -m pip install .[dev]) and then running ./run_tests -k filter_by.
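For readers unfamiliar with doctests, a filter function with one might look like the sketch below. The function name, the "column=value" / "column!=value" query syntax, and the dict-based metadata are assumptions for illustration only; they are not copied from this PR. The run_doctests helper is likewise hypothetical and only shows one way to run a function's doctests directly with the standard library.

```python
import doctest


def filter_by_exclude_where(metadata, exclude_where, **kwargs):
    """Return strains that do NOT match the given query (hypothetical).

    Values are lowercased before comparison, and "!=" negates the match.

    >>> metadata = {
    ...     "strain_a": {"region": "Asia"},
    ...     "strain_b": {"region": "Europe"},
    ... }
    >>> sorted(filter_by_exclude_where(metadata, "region=asia"))
    ['strain_b']
    >>> sorted(filter_by_exclude_where(metadata, "region!=asia"))
    ['strain_a']
    """
    if "!=" in exclude_where:
        column, value = exclude_where.split("!=", 1)
        negate = True
    else:
        column, value = exclude_where.split("=", 1)
        negate = False

    value = value.lower()
    kept = set()
    for strain, record in metadata.items():
        matches = str(record.get(column, "")).lower() == value
        if negate:
            matches = not matches
        # Strains matching the query are excluded; the rest pass the filter.
        if not matches:
            kept.add(strain)
    return kept


def run_doctests(func):
    """Run the doctests in func's docstring and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, name=func.__name__,
                            globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_doctests(filter_by_exclude_where)
print(results)
```

Keeping the examples in the docstring means the documented behavior and the tested behavior cannot drift apart silently.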