Support multiple data configs in evaluate #283

athewsey · 2024-05-29T05:35:29Z

Issue #, if available: #269

Description of changes:

Extend the EvalAlgorithmInterface.evaluate() interface to support specifying a list of multiple data_config objects. evaluate() already returns a list of results by dataset, because when it is run with no data_config argument, all applicable built-in datasets are analyzed. As mentioned in the linked issue, it was weird and confusing that users couldn't explicitly specify more than one dataset to use.
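A minimal sketch of the argument normalization this change implies, so that evaluate() can always iterate over a list of configs. Everything here is a simplified stand-in and an assumption: `DataConfig`, the `DATASET_CONFIGS` and `EVAL_DATASETS` registries, and the `normalize_data_configs` name are illustrative, not fmeval's actual definitions. Judging from the testing notes, the real logic lives in `get_dataset_configs()` in `src/fmeval/eval_algorithms/util.py`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class DataConfig:
    # Hypothetical stand-in for fmeval's DataConfig: only a name is modeled.
    dataset_name: str


# Hypothetical registries: built-in dataset configs, and which datasets each algorithm uses.
DATASET_CONFIGS: Dict[str, DataConfig] = {
    "builtin_a": DataConfig("builtin_a"),
    "builtin_b": DataConfig("builtin_b"),
}
EVAL_DATASETS: Dict[str, List[str]] = {"my_algorithm": ["builtin_a", "builtin_b"]}


def normalize_data_configs(
    data_config: Optional[Union[DataConfig, List[DataConfig], Tuple[DataConfig, ...]]],
    eval_name: str,
) -> List[DataConfig]:
    """Return the list of dataset configs an evaluation should run over (sketch)."""
    if data_config is None:
        # No config given: fall back to every built-in dataset for this algorithm.
        return [DATASET_CONFIGS[name] for name in EVAL_DATASETS[eval_name]]
    if isinstance(data_config, (list, tuple)):
        # Several configs given explicitly: copy them into a fresh list.
        return list(data_config)
    # A single config: wrap it so callers can always iterate.
    return [data_config]
```

Folding the list and tuple cases into one `isinstance(data_config, (list, tuple))` check and copying with `list(...)` is a choice made for this sketch; the diff under review keeps them as separate branches and returns a list argument as-is.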

Testing done:

Added unit tests to cover all scenarios of get_dataset_configs()
Added one integration test to validate multi-dataset functionality for the only evaluator (FactualKnowledge) where multiple integration test datasets had already been defined.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

danielezhu · 2024-05-30T06:32:21Z

src/fmeval/eval_algorithms/util.py

+        return [DATASET_CONFIGS[dataset_name] for dataset_name in EVAL_DATASETS[eval_name]]
+    elif isinstance(data_config, list):
+        return data_config
+    elif isinstance(data_config, tuple):


Why are we handling the case where data_config is a tuple?

Since Python's type annotations are hints rather than strictly enforced types, it seemed like passing in a tuple would be an easy mistake for users to make, and falling through to the else to return [(cfg1, cfg2)] would be a needlessly annoying, hard-to-debug consequence.
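To illustrate that failure mode, here is a hypothetical normalizer that only special-cases `list` (the built-in default branch is omitted). A tuple falls through to the single-config branch and comes back wrapped as one element. `DataConfig` and `normalize_list_only` are illustrative stand-ins, not fmeval code.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class DataConfig:
    # Hypothetical stand-in for fmeval's DataConfig.
    dataset_name: str


def normalize_list_only(data_config: Union[DataConfig, List[DataConfig]]) -> list:
    """Sketch of the normalizer WITHOUT a tuple branch."""
    if isinstance(data_config, list):
        return data_config
    # Anything else is assumed to be a single DataConfig...
    return [data_config]


cfg1, cfg2 = DataConfig("a"), DataConfig("b")
result = normalize_list_only((cfg1, cfg2))
# ...so the tuple comes back as a one-element list holding the whole tuple.
print(result)  # [(DataConfig(dataset_name='a'), DataConfig(dataset_name='b'))]
print(len(result))  # 1
```

Downstream code expecting a DataConfig would then receive a tuple, and the resulting error would surface far from the call site.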

There could be fancier options for more complete Sequence support, like checking hasattr(data_config, "__iter__") and "__getitem__" (which I believe would also accept dicts, not ideal) or checking isinstance(data_config, collections.abc.Sequence) (which would accept strings too, equally weird)... And I don't know whether you might want DataConfig itself to support sequence-like behavior one day, which would make checks like these ambiguous.
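A quick standard-library check of the two alternatives mentioned above. `looks_indexable_iterable` is a hypothetical helper written only for this illustration.

```python
from collections.abc import Sequence

# isinstance(..., Sequence) accepts lists and tuples, but also str; dicts are not Sequences.
print(isinstance([1, 2], Sequence))    # True
print(isinstance((1, 2), Sequence))    # True
print(isinstance("my_dataset", Sequence))  # True
print(isinstance({"a": 1}, Sequence))  # False


def looks_indexable_iterable(obj) -> bool:
    # Duck-typing check: anything iterable that also supports obj[...].
    return hasattr(obj, "__iter__") and hasattr(obj, "__getitem__")


print(looks_indexable_iterable([1, 2]))        # True
print(looks_indexable_iterable({"a": 1}))      # True: dicts pass too
print(looks_indexable_iterable("my_dataset"))  # True: strings pass too
```

Neither check cleanly separates "a collection of configs" from other built-in types, which is the trade-off described above.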

We could explicitly check isinstance(data_config, DataConfig) rather than having a plain else, and raise an error if none of the branches match, but doing isinstance on our custom class felt like an even bigger violation of Python's duck-typing convention.
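For comparison, a sketch of that stricter alternative: no plain `else`, and anything that is neither a list/tuple nor a DataConfig raises `TypeError`. `DataConfig` and `normalize_strict` are hypothetical stand-ins, and the built-in default (`None`) branch is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class DataConfig:
    # Hypothetical stand-in for fmeval's DataConfig.
    dataset_name: str


def normalize_strict(
    data_config: Union[DataConfig, List[DataConfig], Tuple[DataConfig, ...]]
) -> List[DataConfig]:
    """Sketch of the strict variant: unknown argument types are rejected."""
    if isinstance(data_config, (list, tuple)):
        return list(data_config)
    if isinstance(data_config, DataConfig):
        return [data_config]
    raise TypeError(
        "data_config must be a DataConfig or a list/tuple of DataConfig, "
        f"got {type(data_config).__name__}"
    )


try:
    # A plain string is neither a DataConfig nor a list/tuple, so this raises.
    normalize_strict("my_dataset")
except TypeError as err:
    print(err)
```

The cost is the one raised above: a duck-typed config object that is not a DataConfig instance would be rejected, even if it would otherwise work.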

...So my compromise was to explicitly support just the basic built-in array-likes (list and tuple), without trying to handle rarer collection types? 🤷‍♂️ But I agree it's a little odd.

I don't have a strong opinion for or against adding the elif branch for handling tuples, but I personally think it makes the code somewhat confusing, since the type annotation for data_config doesn't include the tuple case.
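If the tuple branch stays, one way to address this is to spell the tuple case out in the annotation so the signature matches the runtime branches. This is a sketch under assumptions: the `DataConfigArg` alias and `count_configs` function are hypothetical, and `DataConfig` is a stand-in.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union, get_args, get_origin


@dataclass(frozen=True)
class DataConfig:
    # Hypothetical stand-in for fmeval's DataConfig.
    dataset_name: str


# Hypothetical alias listing every shape the runtime branches accept,
# so the tuple case is visible in the signature, not only in the body.
DataConfigArg = Optional[Union[DataConfig, List[DataConfig], Tuple[DataConfig, ...]]]


def count_configs(data_config: DataConfigArg = None) -> int:
    """Toy function whose signature documents the tuple case explicitly."""
    if data_config is None:
        return 0  # built-in defaults are not modeled in this sketch
    if isinstance(data_config, (list, tuple)):
        return len(data_config)
    return 1


# The container types named in the annotation (list and tuple both appear).
accepted_origins = [get_origin(arg) for arg in get_args(DataConfigArg)]
```

With the alias in place, a static type checker would also accept tuple arguments instead of flagging them.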

I don't think it's very likely for someone to pass in a tuple, as lists are predominantly used. To be honest, I can't even remember the last time I encountered a function that accepted a tuple instead of a list.

Extend the EvalAlgorithmInterface.evaluate() interface to support specifying a list of multiple data_config objects. The function already returned a list of results by dataset, because when run with no data_config argument, all built-in datasets would be tested. However, users were not able to specify multiple datasets via data_config.

athewsey · 2024-06-14T04:03:41Z

Rebased onto current main. As I understood from the original review, no changes were actually needed, but let me know if that's not the case!

I'd appreciate it if y'all could help get this merged so we can have a more intuitive API 🙏

danielezhu reviewed May 30, 2024


danielezhu approved these changes May 30, 2024


athewsey force-pushed the feat/multidata branch from f65d84e to 0e251bd on June 14, 2024 03:11

danielezhu approved these changes Jun 14, 2024


keerthanvasist approved these changes Jun 14, 2024


keerthanvasist merged commit 41c506a into aws:main Jun 14, 2024
3 checks passed


athewsey commented May 29, 2024

danielezhu May 30, 2024

athewsey May 31, 2024

danielezhu Jun 14, 2024 (edited)

athewsey commented Jun 14, 2024
