ENH: Provide helpful error message when metadata file doesn't contain "strain" column #905
Comments
Interestingly, when reading in a metadata file, we seem to be ok with Should we remove support for augur/augur/util_support/metadata_file.py Lines 6 to 12 in 4b71e7d
We do support searching for multiple arbitrary strain ids when reading in metadata with the An alternate solution to #906 is to use
Replaces a call to the older `utils.read_metadata` function with the newer `io.read_metadata` function while processing metadata for export to an Auspice JSON. The new function returns a pandas DataFrame indexed by the first viable strain name column found in the metadata file (removing this column from the data itself), while the original function returns a dictionary indexed by strain name (keeping the original named column like `strain` or `name` in the data). To avoid changing the downstream code that consumes the metadata, this commit converts the pandas DataFrame to a dictionary that matches the output of the original function. The main advantage is that the calling code does not need to know what the id column is named, since `io.read_metadata` handles this and indexes the data frame by that column. This commit also adds functional tests for the expected behavior of export v2 with metadata inputs. Fixes #905
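For readers following along, here is a rough sketch of the kind of conversion the commit message describes: turning a DataFrame indexed by strain name back into a dictionary keyed by strain, with the id column restored in each record. The helper name `metadata_frame_to_dict` and its `id_column` parameter are hypothetical and are not taken from the Augur code base; the actual commit may do this differently.

```python
import pandas as pd


def metadata_frame_to_dict(metadata: pd.DataFrame, id_column: str = "strain") -> dict:
    """Hypothetical helper: convert a strain-indexed DataFrame to a dict of records.

    Assumes the index holds unique strain names, since
    `to_dict(orient="index")` raises a ValueError on duplicate index values.
    """
    # orient="index" yields {index_value: {column: value, ...}, ...}
    records = metadata.to_dict(orient="index")

    # Put the id column back into each record so the result resembles a
    # dictionary that keeps the named column (e.g. `strain`) in the data.
    for strain, record in records.items():
        record[id_column] = strain

    return records


# Small usage example with made-up metadata.
frame = pd.DataFrame(
    {"strain": ["A/1", "B/2"], "country": ["X", "Y"]}
).set_index("strain")
metadata = metadata_frame_to_dict(frame)
print(metadata["A/1"])
```

Downstream code that looks records up by strain name and reads the id column from each record would then keep working without knowing how the file was parsed.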
A lot of users seem to get the following type of error:
https://discussion.nextstrain.org/t/error-in-job-3-exporting-data-files-for-for-auspice/493/4
It's a common discussion topic on our forum and also in emails we get to [email protected]
I think it would help users a lot if we raised a more informative error so that users know directly how to fix it.
Also, we don't seem to have documented the requirement that the metadata needs to contain a column called `strain` with strain names. Both should be addressed.
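To make the request concrete, something along these lines could produce a clearer message. This is only a sketch using the standard library, not Augur's actual implementation: `find_id_column`, `MissingStrainColumnError`, and `ID_COLUMNS` are hypothetical names, and the accepted columns (`strain` or `name`) are taken from the discussion in this thread.

```python
import csv
import io

# Column names accepted as strain identifiers (assumption based on this thread).
ID_COLUMNS = ("strain", "name")


class MissingStrainColumnError(ValueError):
    """Hypothetical error raised when no strain id column is present."""


def find_id_column(handle, delimiter="\t"):
    """Return the first accepted id column found in the metadata header."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        raise MissingStrainColumnError("The metadata file is empty.")

    for column in ID_COLUMNS:
        if column in header:
            return column

    # Tell the user what was expected and what was actually found.
    expected = " or ".join(f"'{column}'" for column in ID_COLUMNS)
    raise MissingStrainColumnError(
        f"The metadata file must contain a column named {expected} with strain "
        f"names, but only these columns were found: {', '.join(header)}."
    )


# Usage example with an in-memory file that lacks an id column.
try:
    find_id_column(io.StringIO("virus\tdate\nA/1\t2020-01-01\n"))
except MissingStrainColumnError as error:
    print(error)
```

The point is that the message names the required column and lists the columns that were found, so users can see directly what to rename.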