Supervised finetuning minor changes #456
Conversation
Can we force label_mask to be boolean at the collator stage, so that we don't need .bool() in trainer.py line 73?
But I think it's just minor, so I'm going to merge it.
Good point, that's how I had it, but it didn't work with DeepSpeed; I assume it casts tensors in a specific way. I am going to fix it in the next commit.
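The idea discussed above can be sketched as a collator that builds label_mask directly as a torch.bool tensor, so the trainer never needs a .bool() cast. This is a minimal illustration, not the PR's actual collator: the function name, the 'label_positions' field, and the padding scheme are all assumptions made for the example.

```python
import torch

def collate(batch, pad_token_id=0):
    """Pad variable-length examples and return a boolean label_mask.

    Each example is a dict with 'input_ids' (list[int]) and
    'label_positions' (list[int] of token indices that count toward the loss).
    Both field names are hypothetical, chosen for this sketch.
    """
    max_len = max(len(ex["input_ids"]) for ex in batch)
    input_ids = torch.full((len(batch), max_len), pad_token_id, dtype=torch.long)
    # Create the mask as torch.bool from the start, instead of 0/1 integers
    # that would later need .bool() in the trainer.
    label_mask = torch.zeros((len(batch), max_len), dtype=torch.bool)
    for i, ex in enumerate(batch):
        ids = torch.tensor(ex["input_ids"], dtype=torch.long)
        input_ids[i, : len(ids)] = ids
        label_mask[i, ex["label_positions"]] = True
    return {"input_ids": input_ids, "label_mask": label_mask}
```

The trainer could then index the per-token loss with `loss[batch["label_mask"]]` directly. As the reply above notes, this may still break under DeepSpeed if it recasts batch tensors during preparation, which is why the `.bool()` call ended up in trainer.py.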
@@ -0,0 +1,187 @@
# Taken from https://github.com/sleekmike/Finetune_GPT-J_6B_8-bit/blob/master/gpt-j-6b-8-bit.py

import torch
@theblackcat102 Maybe instead use bitsandbytes for a generic solution for 8-bit quantization? Are there any downsides to that?
Not really! Feel free to edit.
Ok, I'm adding an option to train with 8-bit Adam from BNB, as suggested here.
Also, I'm pretty sure I've previously seen a generic library that implements a whole pile of different adapter types; that could be convenient.
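The 8-bit Adam option mentioned above could look roughly like this: prefer bitsandbytes' Adam8bit when the library is available and fall back to standard AdamW otherwise. This is a sketch; the flag name `quantize_optimizer` and the fallback behavior are assumptions for illustration, not the PR's actual interface.

```python
import torch

def build_optimizer(model, lr=1e-5, quantize_optimizer=True):
    """Return an 8-bit Adam optimizer from bitsandbytes when requested
    and available; otherwise fall back to full-precision AdamW."""
    if quantize_optimizer:
        try:
            import bitsandbytes as bnb  # typically needs a CUDA-capable install
            return bnb.optim.Adam8bit(model.parameters(), lr=lr)
        except ImportError:
            pass  # fall through to the full-precision optimizer
    return torch.optim.AdamW(model.parameters(), lr=lr)
```

The 8-bit optimizer keeps its state in quantized form, which cuts optimizer memory substantially for large models at a small cost in precision.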
Small changes