
refactor: update evaluate_dataset to take in a dataset instead of dataset config #232

Merged: 2 commits merged into aws:main on Mar 27, 2024

Conversation

@danielezhu (Contributor) commented Mar 26, 2024

Description of changes:
This PR refactors the evaluate_dataset method to consume a dataset directly, instead of a dataset config. This will allow evaluate_dataset to be compatible with more eval algorithms.
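
A rough sketch of the shape of this change, using hypothetical stand-ins for the fmeval types and helpers (DataConfig, Dataset, EvalScore, load_dataset, and run_transforms are all simplified placeholders; the real signatures carry more parameters):

from typing import List


class DataConfig:
    """Hypothetical stand-in for a dataset config."""


class Dataset:
    """Hypothetical stand-in for a loaded dataset."""


class EvalScore:
    """Hypothetical stand-in for an eval score record."""


def load_dataset(config: DataConfig) -> Dataset:
    """Hypothetical loader, shown only to illustrate the old flow."""
    return Dataset()


def run_transforms(dataset: Dataset) -> List[EvalScore]:
    """Hypothetical scoring pipeline shared by both versions."""
    return [EvalScore()]


# Before: evaluate_dataset resolved the dataset from a config internally,
# tying it to a single loading path.
def evaluate_dataset_old(dataset_config: DataConfig) -> List[EvalScore]:
    dataset = load_dataset(dataset_config)
    return run_transforms(dataset)


# After: the caller loads (and can pre-process) the dataset first, so
# algorithms with custom preparation steps can still reuse this helper.
def evaluate_dataset_new(dataset: Dataset) -> List[EvalScore]:
    return run_transforms(dataset)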

The verify_model_determinism function has been updated in preparation for its use in Summarization Accuracy Semantic Robustness (SASR). By taking in a prompt template and a model input column, it can verify model determinism before the prompt-generation and model-invocation transforms execute, which allows SASR's evaluate method to follow the same overall template as all the other algorithms.
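
A minimal sketch of that idea, assuming a hypothetical ModelRunner-like object with a predict(prompt) method and a $model_input placeholder convention; the names and signatures here are illustrative, not the exact fmeval API:

from typing import List


def verify_model_determinism(
    model,                    # hypothetical ModelRunner-like object
    records: List[dict],      # a few sample records from the dataset
    prompt_template: str,     # e.g. "Summarize the following: $model_input"
    model_input_column: str,  # column holding the raw model input
) -> bool:
    """Return True if the model gives identical output for a repeated prompt."""
    for record in records:
        # Compose the prompt from the raw input column, before any
        # prompt-generation transform has run on the dataset.
        prompt = prompt_template.replace("$model_input", record[model_input_column])
        if model.predict(prompt) != model.predict(prompt):
            return False
    return True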

This PR additionally removes the BertscoreHelperModel class, as we now use BertscoreModel.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

else:
    try:
        validate_dataset(dataset, [DatasetColumns.MODEL_OUTPUT.value.name])
    except EvalAlgorithmClientError:
@danielezhu (author):

The try-except is just for providing a more specific error message than the generic one raised by validate_dataset.
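
A self-contained sketch of that pattern, with hypothetical stand-ins for validate_dataset, the column name, and the error message (none of these are copied from the PR):

class EvalAlgorithmClientError(Exception):
    pass


def validate_dataset(dataset: dict, required_columns: list) -> None:
    """Hypothetical validator that raises a generic error message."""
    for column in required_columns:
        if column not in dataset:
            raise EvalAlgorithmClientError(f"Missing required column: {column}")


def require_model_output(dataset: dict) -> None:
    # Catch the generic validation error and re-raise with a message that
    # explains why the column is required in this particular code path.
    try:
        validate_dataset(dataset, ["model_output"])
    except EvalAlgorithmClientError:
        raise EvalAlgorithmClientError(
            "A model output column is required when no ModelRunner is provided."
        )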

@@ -140,14 +147,10 @@ def get_helper_scores(self, text_input: List[str]) -> Dict[str, List[float]]: #
        """
        inputs = self._tokenizer(text_input, return_tensors="pt", truncation=True, padding=True).to(self._model.device)
        scores = torch.sigmoid(self._model(**inputs)[0]).cpu().detach().numpy()
        results = {}
@danielezhu (author):

Since text_input is always a List and never a str, we don't need the isinstance(text_input, str) branch. The function's type annotation indicates that it is always a List, and everywhere DetoxifyHelperModel.get_helper_scores is called, we pass a list: see DetoxifyHelperModel.__call__ and evaluate_sample in toxicity.py.

@danielezhu (author):

Not to mention that if we kept the old code, the output type annotation of this function would be wrong: for a str input, the returned type would be Dict[str, float] rather than Dict[str, List[float]]. I have already verified this manually.
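
A toy reconstruction of the issue (the real scoring uses the Detoxify model; the score names here are an illustrative subset and the values are placeholders):

from typing import Dict, List, Union

SCORE_NAMES = ["toxicity", "severe_toxicity"]  # illustrative subset


def get_helper_scores_old(text_input: Union[str, List[str]]) -> Dict[str, List[float]]:
    single_input = isinstance(text_input, str)
    if single_input:
        text_input = [text_input]
    scores = {name: [0.1] * len(text_input) for name in SCORE_NAMES}
    if single_input:
        # This unwrapping is what made the annotation wrong: for a str
        # input the function actually returned Dict[str, float].
        return {name: values[0] for name, values in scores.items()}
    return scores


def get_helper_scores_new(text_input: List[str]) -> Dict[str, List[float]]:
    # Callers always pass a list, so the str branch is gone and the
    # annotation matches the actual return type.
    return {name: [0.1] * len(text_input) for name in SCORE_NAMES}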

@danielezhu merged commit cb3b30e into aws:main on Mar 27, 2024
2 of 3 checks passed
@danielezhu deleted the refactor_evaluate_dataset branch on March 27, 2024 at 01:37