add training code for reward model #222
Thanks for the PR! First step: Please run/install pre-commit, it is mandatory for all code that enters this repo.
.vscode/settings.json
Outdated
@@ -1,4 +1,4 @@
{
"python.formatting.provider": "black",
Sorry, but we require all contributors to use the same pre-commit rules.
just updated, please revise
I still see the provider as autopep8, when it should be black. Do you maybe have a local commit that you didn't push yet?
Overall nice training code! Thanks a lot ... also for instantly responding to change requests on discord.
from rank_datasets import DataCollatorForPairRank, HFSummary, WebGPT
from torch.utils.data import DataLoader
from transformers import AutoTokenizer

very useful file, a short docstring at the beginning would be nice to explain how it is used during dev/purpose (e.g. dataloader test, batch-shape inspection)
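As a rough illustration of the kind of docstring requested here, the sketch below pairs a module-level docstring with a small shape-inspection helper. The names `shape_of` and `describe_batch` and the toy batch are hypothetical and not taken from the PR; the actual script builds its batches with `DataCollatorForPairRank` and a `DataLoader`, whose output format is not shown in this conversation.

```python
"""Development helper for eyeballing dataloader output (hypothetical example).

Purpose: build one batch and report the shape of every field, so that
collation problems show up before a full training run.
"""


def shape_of(value):
    """Return the shape of a tensor-like object or of a (nested) list."""
    if hasattr(value, "shape"):
        return tuple(value.shape)
    shape = []
    # Walk down the first element of each nesting level.
    while isinstance(value, (list, tuple)):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


def describe_batch(batch):
    """Map each field name in a batch dict to its shape."""
    return {name: shape_of(value) for name, value in batch.items()}


# Toy stand-in for a collated batch: 2 sequences of 3 token ids each.
toy_batch = {
    "input_ids": [[101, 7592, 102], [101, 2088, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}
print(describe_batch(toy_batch))
```

In the real file, the same helper could be applied to the first batch yielded by the `DataLoader`, assuming that batch is a dict of tensors.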
just reset the formatting provider in settings.json to black, otherwise LGTM, thank you very much!
@yk yeah, it's my problem. just reset the format setting
Trainer code to train a single-score reward model. Currently supports WebGPT and the raw human-feedback summarization datasets from OpenAI. See the README and rank_datasets.py for more details.
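For readers new to single-score reward models: such a model assigns one scalar to each candidate reply, and training on ranked pairs typically pushes the score of the preferred reply above that of the rejected one. The sketch below shows one common pairwise formulation, `-log(sigmoid(s_preferred - s_rejected))`, in plain Python. It is an assumption for illustration only; the loss actually used by the trainer in this PR is not shown in this conversation, and the function name is hypothetical.

```python
import math


def pairwise_rank_loss(preferred_score, rejected_score):
    """Compute -log(sigmoid(preferred - rejected)) in a numerically stable way."""
    margin = preferred_score - rejected_score
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch so exp never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss shrinks as the preferred reply is scored further above the rejected one.
for preferred, rejected in [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]:
    print(preferred, rejected, pairwise_rank_loss(preferred, rejected))
```

With equal scores the loss is `log(2)`; it approaches zero when the preferred reply scores much higher, and grows roughly linearly when the ordering is wrong.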