feat: Add `BERT_SCORE` to `QAAccuracy` and update unit/integration tests #314

kirupang-code · 2024-07-19T22:19:14Z

Added BERT_SCORE to qa_accuracy.py by creating SplitWithDelimiter, a transform that uses target_output_delimiter to split a target output string into a list of possible targets. This lets us compute a BertScore for each possible target and take the maximum.
The integration tests for qa_accuracy were also taking over 10 minutes to run on 100 records because of the BertScore model's runtime, so I created a smaller dataset, triviaQA_sample_small, with 4 records for the tests to use.
Updated qa_accuracy_semantic_robustness.py to import and use QA_ACCURACY_SCORE_NAMES (the QAAccuracy metrics without BERT_SCORE) instead of SCORE_NAMES.
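
For readers unfamiliar with the split-and-max idea described above, here is a minimal sketch. It is not the code from this PR: the function names, the `"<OR>"` default delimiter, and the whitespace stripping are assumptions made for illustration, and `toy_overlap_score` is a cheap stand-in for the BertScore model so the example runs without downloading anything.

```python
from typing import Callable, List


def split_with_delimiter(target_output: str, delimiter: str = "<OR>") -> List[str]:
    """Split a target output string into a list of possible targets.

    The "<OR>" default and the stripping are assumptions for this sketch.
    """
    return [target.strip() for target in target_output.split(delimiter)]


def max_score_over_targets(
    model_output: str,
    target_output: str,
    score_fn: Callable[[str, str], float],
    delimiter: str = "<OR>",
) -> float:
    """Score the model output against every possible target and keep the best."""
    targets = split_with_delimiter(target_output, delimiter)
    return max(score_fn(target, model_output) for target in targets)


def toy_overlap_score(target: str, model_output: str) -> float:
    """Hypothetical stand-in for BertScore: Jaccard overlap of lowercase tokens."""
    target_tokens = set(target.lower().split())
    output_tokens = set(model_output.lower().split())
    if not target_tokens or not output_tokens:
        return 0.0
    return len(target_tokens & output_tokens) / len(target_tokens | output_tokens)


best = max_score_over_targets("London", "London<OR>Paris", toy_overlap_score)
```

Swapping `toy_overlap_score` for a real BertScore call changes the cost per target but not the structure: one score per possible target, then a max.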

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

cr: https://code.amazon.com/reviews/CR-135854933

danielezhu · 2024-07-25T17:55:36Z

src/fmeval/eval_algorithms/qa_accuracy.py

+]
+
+# for all metrics in qa_accuracy (metrics from both the QAAccuracyScores Transform and the BertScore Transform)
+SCORE_NAMES = QA_ACCURACY_SCORE_NAMES + [BERT_SCORE]


This is an incompatible change we should keep tabs on. It's less "dangerous" since we're augmenting the list instead of deleting elements from it, but at the end of the day, we're changing the value of a constant.
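
A small sketch of why this matters for callers. The score name strings and the `collect_scores` helper below are placeholders invented for illustration, not the library's actual values or API; only the `SCORE_NAMES = QA_ACCURACY_SCORE_NAMES + [BERT_SCORE]` line mirrors the diff.

```python
from typing import Dict, List

# Placeholder values for illustration; the real list in qa_accuracy.py is longer.
QA_ACCURACY_SCORE_NAMES = ["f1_score", "exact_match_score"]
BERT_SCORE = "bertscore"

# Mirrors the line added in the diff: the constant now contains one more name.
SCORE_NAMES = QA_ACCURACY_SCORE_NAMES + [BERT_SCORE]


def collect_scores(record: Dict[str, float], score_names: List[str]) -> Dict[str, float]:
    """Hypothetical downstream helper that expects every listed score in a record."""
    missing = [name for name in score_names if name not in record]
    if missing:
        raise KeyError(f"record is missing scores: {missing}")
    return {name: record[name] for name in score_names}


# A record produced before BERT_SCORE existed.
legacy_record = {"f1_score": 0.8, "exact_match_score": 1.0}

# Code that switches to the narrower constant keeps working.
legacy_scores = collect_scores(legacy_record, QA_ACCURACY_SCORE_NAMES)

# Code that still iterates SCORE_NAMES now asks for a score the record lacks.
try:
    collect_scores(legacy_record, SCORE_NAMES)
    broke = False
except KeyError:
    broke = True
```

This is the same reason qa_accuracy_semantic_robustness.py was switched to QA_ACCURACY_SCORE_NAMES in this PR.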

oyangz

(non-blocking): should we move the BertScore class out of summarization accuracy metrics since it’s used for multiple eval algos now?

danielezhu · 2024-08-12T16:09:10Z

> (non-blocking): should we move the BertScore class out of summarization accuracy metrics since it’s used for multiple eval algos now?

If we do, it will be a breaking change that we should keep track of prior to the next release.

kirupang-code added 26 commits July 3, 2024 13:24

Added metric to factual knowledge + unit/integration tests

6d66b62

cr: https://code.amazon.com/reviews/CR-135854933

fixed changes from PR comments

cc866ed

Deleted metrics.py and restored code in util.py

843d9f6

added factual knowledge metrics to constants.py

c2f9efb

Merge branch 'main' of github.com:aws/fmeval

d7e5fa5

added factual knowledge metrics to be included in binary score

8d9bf4f

updated score descriptions for factual knowledge

1aee116

feat: add configurable param logical_operator (OR/AND) to factual knoweldge

d8c29da

Merge branch 'main' of github.com:aws/fmeval

8715749

Merge branch 'main' of github.com:aws/fmeval

0ca7c47

fixed changes from PR comments

ba43b92

added warning and fixed typo

5e7bb50

Merge branch 'main' into main

411e0fa

modified warnings and fixed invalid config tests for factual_knowledge

f1f9792

Merge branch 'main' of github.com:aws/fmeval

43a9a48

Merge branch 'main' of github.com:kirupang-code/fmeval

5f2d1d4

Merge branch 'main' of github.com:kirupang-code/fmeval

b684515

Merge branch 'main' of github.com:kirupang-code/fmeval

e13bd5c

Merge branch 'main' of github.com:aws/fmeval

d51f3ff

feat: Adding BERTScore to QAAccuracy + QAAccuracySemanticRobustness

2a165cf

fix: documentation and tests for qa accuracy + qa accuracy semantic robustness

634e85f

fix: lint checks

c019be5

fix: created dataset for qa_accuracy, reverted to js_model_runner

0988ac7

fix: integration tests by adding approx for BertScore

28b449e

fix: moved BertScoreWithDelimiter to qa_accuracy and updated tests

5b99691

fix: restored qa_accuracy_semantic_robustness

e6c6f33

kirupang-code changed the title ~~feat: Added BERTScore to QAAccuracy and QAAccuracySemanticRobustness~~ feat: Added BERT_SCORE to QAAccuracy Jul 24, 2024

kirupang-code changed the title ~~feat: Added BERT_SCORE to QAAccuracy~~ feat: Added BERT_SCORE to QAAccuracy and updated unit/integration tests Jul 24, 2024

kirupang-code added 2 commits July 24, 2024 11:30

fix: smaller dataset for integ tests to reduce runtime

e3d02d6

fix: smaller dataset for integ tests to reduce runtime

7052ef2

Merge branch 'main' of github.com:kirupang-code/fmeval

4859531

kirupang-code commented Jul 24, 2024

src/fmeval/eval_algorithms/qa_accuracy.py (outdated, resolved)

kirupang-code commented Jul 25, 2024

test/unit/eval_algorithms/test_factual_knowledge.py (outdated, resolved)

danielezhu changed the title ~~feat: Added BERT_SCORE to QAAccuracy and updated unit/integration tests~~ feat: Add BERT_SCORE to QAAccuracy and update unit/integration tests Jul 25, 2024

danielezhu requested changes Jul 25, 2024

kirupang-code added 2 commits July 25, 2024 15:28

Add BertScoreMax transform for qa_accuracy

fdb99b1

fix: lint checks

cf6448a

kirupang-code requested a review from danielezhu July 25, 2024 22:31

kirupang-code commented Jul 25, 2024

src/fmeval/transforms/common.py (outdated, resolved)

fix: cleaning up code and checking reporting folder for changes

c9f565d

danielezhu requested changes Jul 29, 2024

src/fmeval/transforms/common.py (outdated, resolved)

src/fmeval/transforms/common.py (outdated, resolved)

kirupang-code added 8 commits July 30, 2024 16:41

saving changes while working on another branch

7076976

save changes before moving onto diff branch

e1e1a52

tested new summarization accuracy metrics w qa_accuracy

02182f1

refactor: using BertScore in qa_accuracy

a2a80c2

fixed summarization-accuracy_metrics errors

4f7e233

Merge branch 'main' of github.com:aws/fmeval

545cef3

Updated PR with refactored code from summarization accuracy metrics

edab8ea

fix: fixed merge issues

1fda12e

oyangz approved these changes Aug 9, 2024

kirupang-code requested a review from danielezhu August 10, 2024 20:59

danielezhu approved these changes Aug 12, 2024

xiaoyi-cheng merged commit 4164457 into aws:main Aug 12, 2024
3 checks passed