add basic stat metric, and model performance report #136
Conversation
# Model Performance Metrics
def multicategory_confusion_matrix(
It would be good to add an example in the docstring for reference; I had to look at the test to fully understand it.
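For reference, a docstring example along these lines could be what's being asked for here; the exact return format is an assumption, inferred from the parameter descriptions later in this review (rows are true labels, columns are predicted labels):

```python
# A hypothetical docstring example -- the return format shown is an assumption,
# based on ":return: Matrix JSON where rows refer to true labels, and columns
# refer to predicted labels" later in this review.
#
#     >>> import pandas as pd
#     >>> true_labels = pd.Series(["cat", "dog", "cat", "bird"])
#     >>> predicted_labels = pd.Series(["cat", "cat", "dog", "bird"])
#     >>> multicategory_confusion_matrix(true_labels, predicted_labels, ["bird", "cat", "dog"])
#     {"bird": {"bird": 1, "cat": 0, "dog": 0},
#      "cat": {"bird": 0, "cat": 1, "dog": 1},
#      "dog": {"bird": 0, "cat": 1, "dog": 0}}
```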
src/smclarify/bias/report.py
label_name: str,
metrics: List[MetricResult],
binary_confusion_matrix: List[float],
confusion_matrix: Optional[Dict] = None,
Needs a better name. Is this the multiclass confusion matrix?
I agree this is not a good name, but changing it would also change the output JSON key, so I'll keep them consistent and clarify this in the docstring.
assert label_dist == basic_stats.observed_label_distribution(dfB[0], sensitive_facet_index, dfB_pos_label_idx)


def test_performance_metrics():
Could we also add a test which computes these metrics directly from dfBinary and dfMulticategory?
Per offline discussion, I need to test that the right number of TP/TN/FP/FN is calculated from the df directly, so I moved that logic to a helper function.
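A minimal sketch of what such a helper could look like, assuming binary labels held in pandas Series columns of the test df (the function name and signature here are hypothetical, not the ones in this PR):

```python
import pandas as pd


def count_binary_outcomes(label_series: pd.Series, predicted_label_series: pd.Series, positive_label) -> dict:
    """Count TP/TN/FP/FN directly from the true and predicted label columns."""
    actual_pos = label_series == positive_label
    predicted_pos = predicted_label_series == positive_label
    return {
        "TP": int((actual_pos & predicted_pos).sum()),
        "TN": int((~actual_pos & ~predicted_pos).sum()),
        "FP": int((~actual_pos & predicted_pos).sum()),
        "FN": int((actual_pos & ~predicted_pos).sum()),
    }


# Example: compute the counts directly from a small test DataFrame
df = pd.DataFrame({"y": [1, 0, 1, 1, 0], "yhat": [1, 0, 0, 1, 1]})
assert count_binary_outcomes(df["y"], df["yhat"], positive_label=1) == {"TP": 2, "TN": 1, "FP": 1, "FN": 1}
```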
:param label_series: Label Data Series
:param predicted_label_series: Predicted Label Data Series
:param unique_label_values: List of unique label values computed from the set of true and predicted labels
:return: Matrix JSON where rows refer to true labels, and columns refer to predicted labels
What are the values in each column? Are they normalized? Looking at the code, it does not look like the values are normalized, but in confusion_matrix we normalize them. Why the difference?
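To make the raw-versus-normalized distinction being asked about concrete, here is a rough sketch; this is not the PR's actual implementation, and the function names are illustrative only:

```python
import pandas as pd


def raw_confusion_counts(label_series: pd.Series, predicted_label_series: pd.Series, unique_label_values) -> dict:
    """Unnormalized matrix: entry [true][pred] is the number of rows with that label pair."""
    matrix = {t: {p: 0 for p in unique_label_values} for t in unique_label_values}
    for true_val, pred_val in zip(label_series, predicted_label_series):
        matrix[true_val][pred_val] += 1
    return matrix


def normalized_confusion_counts(label_series: pd.Series, predicted_label_series: pd.Series, unique_label_values) -> dict:
    """Normalized matrix: each entry is divided by the total number of rows."""
    total = len(label_series)
    counts = raw_confusion_counts(label_series, predicted_label_series, unique_label_values)
    return {t: {p: c / total for p, c in row.items()} for t, row in counts.items()}
```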
)
df.columns = ["x", "y", "z", "yhat"]

expected_value = {
Same comment as the previous one: red is not found in the observed labels.
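For context, the docstring earlier in this review says the unique label values are computed from the set of true and predicted labels, so a label such as red that appears only among the predictions would still show up in the matrix; a quick illustration with made-up data (only the y and yhat column names are borrowed from the test above):

```python
import pandas as pd

# Hypothetical data: "red" never occurs among the observed (true) labels,
# but does occur among the predicted labels.
df = pd.DataFrame({"y": ["blue", "green", "blue"], "yhat": ["blue", "red", "green"]})

# Union of true and predicted labels, per the docstring quoted earlier
unique_label_values = sorted(set(df["y"]) | set(df["yhat"]))
print(unique_label_values)  # ['blue', 'green', 'red']
```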
Issue #, if available:
Cherry-picked Pranav's changes and fixed a typo.
Description of changes:
See details in #135
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.