Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiple data configs in evaluate #283

Merged
merged 1 commit into from
Jun 14, 2024

Conversation

athewsey
Copy link
Contributor

Issue #, if available: #269

Description of changes:

Extend EvalAlgorithmInterface.evaluate() interface to support specifying a list of multiple data_config objects. evaluate() already returns a list of results by dataset, because when run with no data_config argument all applicable built-in datasets are analyzed. As mentioned in the attached issue, it was weird and confusing that users couldn't explicitly specify a set of more than one datasets to use.

Testing done:

  • Added unit tests to cover all scenarios of get_dataset_configs()
  • Added one integration test to validate multi-dataset functionality for the only evaluator (FactualKnowledge) where multiple integration test datasets had already been defined.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

return [DATASET_CONFIGS[dataset_name] for dataset_name in EVAL_DATASETS[eval_name]]
elif isinstance(data_config, list):
return data_config
elif isinstance(data_config, tuple):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we handling the case where data_config is a tuple?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Python is a dynamic language with type annotations, rather than strictly enforcing types, it seemed like passing in a tuple would be an easy mistake for users to make and falling through to the else to return [(cfg1, cfg2)] would be a needlessly annoying/hard-to-debug consequence for that

  1. There could be fancier options for more complete Sequence support, like checking hasattr(data_config, "__iter__") and __getitem__ (which I believe would include dicts, not ideal) or checking isinstance(data_config, collections.abc.Sequence) (which would include strings - also weird)... But idk if you might want to support more sequence-like properties on DataConfig itself one day?
  2. We could explicitly check isinstance(data_config, DataConfig) rather than having a plain else, and throw an error if none of the branches match, but doing isinstance on our custom class felt like an even bigger violation of Python's duck typing convention

...So my compromise was just to explicitly support both the basic builtin array-likes but not try to handle other weird collections that might crop up more rarely? 🤷‍♂️ But agree it's a little odd.

Copy link
Contributor

@danielezhu danielezhu Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion for or against adding the elif branch for handling tuples, but I personally think it makes the code somewhat confusing, since the type annotation for data_config doesn't include the tuple case.

I don't think it's very likely for someone to pass in a tuple, as lists are predominantly used. To be honest, I can't even remember the last time I encountered a function that accepted a tuple instead of a list.

Extend EvalAlgorithmInterface.evaluate() interface to support
specifying a list of multiple data_config objects. Previously the
function already returned a list of results by dataset because when
run with no data_config argument, all built-in datasets would be
tested. However, users were not able to specify multiple datasets via
data_config.
@athewsey
Copy link
Contributor Author

Rebased to current main. As I understood from the original review there wasn't actually a need for any change, but let me know if this is not the case!

Appreciate if y'all can help get this merged so we can have a more intuitive API 🙏

@keerthanvasist keerthanvasist merged commit 41c506a into aws:main Jun 14, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants