-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
merge: Support table-specific id columns and delimiters #1594
Conversation
49e6467
to
a9d821f
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1594 +/- ##
==========================================
+ Coverage 71.03% 71.06% +0.03%
==========================================
Files 79 79
Lines 8258 8267 +9
Branches 2004 2007 +3
==========================================
+ Hits 5866 5875 +9
Misses 2101 2101
Partials 291 291 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad the Nextclade and metadata merge is a useful example! Seems reasonable to me 👍
augur/merge.py
Outdated
def pairs(xs: Iterable[str]) -> Iterable[Tuple[str, str]]: | ||
""" | ||
Split an iterable of ``k=v`` strings into an iterable of ``(k,v)`` tuples. | ||
|
||
>>> pairs(["abc=123", "eight nine ten=el em en"]) | ||
[('abc', '123'), ('eight nine ten', 'el em en')] | ||
|
||
Strings missing a ``k`` and/or a ``v`` part get an empty string. | ||
|
||
>>> pairs(["v", "=v", "k=", "=", ""]) | ||
[('', 'v'), ('', 'v'), ('k', ''), ('', ''), ('', '')] | ||
|
||
``k`` ends at the first ``=``. | ||
|
||
>>> pairs(["abc=123=xyz", "=v=v"]) | ||
[('abc', '123=xyz'), ('', 'v=v')] | ||
""" | ||
return [tuple(x.split("=", 1)) if "=" in x else ("", x) for x in xs] # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking
This seems like it would be a useful custom argparse action. The k=v
is such a common pattern we use for CLI options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, perhaps! I tend to write these sorts of utility functions immediately adjacent to my current need for them and then refactor/extract as desired when I reach for them again later. (Unless I concretely foresee needing them again elsewhere very soon.)
Coverage again points me at error messages I missed testing. I'll add those tomorrow or next week. |
a9d821f
to
f0d87e1
Compare
Fixed. Will wait for #1593 to merge before this one. Merge there is pending name mangling discussion. |
40f4e2e
to
2a73eff
Compare
When trying out `augur merge` for our common Nextclade metadata merge step¹, I realized it would be good to allow table-specific values for these settings. Table-specific values avoid problems with impossible orderings, e.g. two files both have columns A and B but you want to use A as the id column in the first and B in the second. Maybe that's an edge case, but it's also just much easier to reason about table-specific values than reason about how an ordered list of N values applies to all M files. A shared list of values is still good for when you don't know what they'll be ahead of time. ¹ <https://github.com/nextstrain/pathogen-repo-guide/blob/89b3c5dbc9b8f6a009f4a19c3ac56113bc5511ee/ingest/rules/nextclade.smk#L75-L95>
2a73eff
to
232698b
Compare
f0d87e1
to
68d1a71
Compare
Added a changelog entry. |
When trying out
augur merge
for our common Nextclade metadata merge step¹, I realized it would be good to allow table-specific values for these settings.Table-specific values avoid problems with impossible orderings, e.g. two files both have columns A and B but you want to use A as the id column in the first and B in the second. Maybe that's an edge case, but it's also just much easier to reason about table-specific values than reason about how an ordered list of N values applies to all M files. A shared list of values is still good for when you don't know what they'll be ahead of time.
¹ https://github.com/nextstrain/pathogen-repo-guide/blob/89b3c5dbc9b8f6a009f4a19c3ac56113bc5511ee/ingest/rules/nextclade.smk#L75-L95
Checklist