Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: update example notebooks to pip install the fmeval package #158

Merged
merged 9 commits into from
Dec 14, 2023

Conversation

danielezhu
Copy link
Contributor

Issue #, if available:

Description of changes:
This PR updates how the example notebooks install the fmeval package now that it's available through PyPI. It also cleans up the notebooks by adding more explanatory comments and using a uniform style so they all have the same look and feel. I've also deleted the JS toxicity example notebook as I feel it does not add any value given that we already have a JS stereotyping notebook.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -11,9 +11,9 @@
"tags": []
Copy link
Contributor

@malhotra18 malhotra18 Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we keep below commented, add an additional comment mentioning the use case of this envirnment variable?

os.environ["PARALLELIZATION_FACTOR"] = "1"

Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that almost all users (aside from contributors who are very familiar with the internals of our codebase) no longer need to be aware of this env variable. We originally needed to include this as a hacky way to avoid encountering some niche Ray bugs, but I have since fixed those bugs in this PR. I think it's best not to expose details about infra internals (i.e. Ray details) as much as possible, so as to not overwhelm the average user.

This env variable is currently only being used to determine the number of actors in the ActorPools that we create. While too many actors can lead to OOM issues in theory, in our use case, OOM issues are almost never caused by having too many ActorPool actors, since those actors are always cheap (memory and CPU usage-wise).

The only time OOM would be an issue for us in the example notebooks is if we're dealing with an eval algo that uses the BertscoreHelperModel, which is a relatively expensive (memory-wise) actor. But in this cases, we don't use ActorPool, so PARALLELIZATION_FACTOR doesn't apply. In such cases, the user just needs to use a notebook instance that has enough memory to load the Bert model.

Copy link
Contributor

@rvasahu-amazon rvasahu-amazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with minor comments/questions

@danielezhu danielezhu merged commit 66b8e10 into aws:main Dec 14, 2023
@danielezhu danielezhu deleted the update_example_notebooks branch December 14, 2023 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants