
Supervised finetuning minor changes #456

Merged: theblackcat102 merged 8 commits into main from sft-gptjt-qa-labels on Jan 7, 2023

Conversation

sanagno (Collaborator) commented Jan 6, 2023

Small changes

  • small typo when masking with padding (see the sketch after this list)
  • better collator with comments
  • started quantization
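
For context, a minimal sketch of the kind of padding masking the first bullet refers to; the helper name and the -100 ignore-index convention are illustrative assumptions, not the exact code in this PR:

```python
import torch


def mask_padding_in_labels(labels: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Hypothetical helper: replace padded positions so the loss ignores them."""
    labels = labels.clone()
    labels[labels == pad_token_id] = -100  # -100 is CrossEntropyLoss's default ignore_index
    return labels
```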

sanagno added the ml label Jan 6, 2023
sanagno mentioned this pull request Jan 6, 2023

theblackcat102 (Collaborator) left a comment

Can we force label_mask to be boolean at the collator stage, so that we don't need .bool() in trainer.py line 73?
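
A minimal sketch of what a boolean-at-the-collator approach could look like; the function and field names here are illustrative, not the PR's actual collator:

```python
import torch


def collate_label_masks(masks: list[list[int]], max_len: int) -> torch.Tensor:
    """Hypothetical collator step: pad per-example masks and return a bool tensor."""
    label_mask = torch.zeros(len(masks), max_len, dtype=torch.bool)
    for i, mask in enumerate(masks):
        label_mask[i, : len(mask)] = torch.tensor(mask, dtype=torch.bool)
    # Returning bool here would make the .bool() cast in trainer.py unnecessary.
    return label_mask
```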

theblackcat102 (Collaborator) commented

But I think it's just minor, so I'm going to merge it.

theblackcat102 merged commit 64a8543 into main Jan 7, 2023
theblackcat102 deleted the sft-gptjt-qa-labels branch January 7, 2023 00:27

sanagno (Collaborator, Author) commented Jan 7, 2023

Good point, that's how I had it, but it didn't work with DeepSpeed; I assume it casts tensors in specific ways. I am going to fix it in the next commit.
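
For reference, a sketch of one way to read the workaround above: keep label_mask as an integer tensor through the collator and cast it to bool only where it is consumed in the trainer. The function name and shapes are assumptions for illustration:

```python
import torch
import torch.nn.functional as F


def masked_cross_entropy(logits: torch.Tensor, targets: torch.Tensor, label_mask: torch.Tensor) -> torch.Tensor:
    """Cast the mask at the use-site instead of in the collator."""
    mask = label_mask.bool().view(-1)  # the trainer-side cast this thread discusses
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1), reduction="none")
    return (loss * mask).sum() / mask.sum()
```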

@@ -0,0 +1,187 @@
# Taken from https://github.com/sleekmike/Finetune_GPT-J_6B_8-bit/blob/master/gpt-j-6b-8-bit.py

import torch

mrcabbage972 (Contributor) commented Jan 8, 2023

@theblackcat102 Maybe instead use bitsandbytes for a generic solution for 8-bit quantization? Are there any downsides to that?
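
For reference, a hedged sketch of what the bitsandbytes route can look like via the transformers 8-bit integration (assuming bitsandbytes and accelerate are installed); the checkpoint is a placeholder and this is not code from this PR:

```python
from transformers import AutoModelForCausalLM

# Load weights quantized to int8 by bitsandbytes instead of a model-specific conversion script.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",  # placeholder checkpoint
    load_in_8bit=True,
    device_map="auto",      # let accelerate place layers across available devices
)
```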

sanagno (Collaborator, Author) replied

Not really! Feel free to edit.

mrcabbage972 (Contributor) commented Jan 9, 2023

Ok, I'm adding an option to train with 8-bit Adam from BNB, as suggested here.

Also, I'm pretty sure I've previously seen a generic library that implements a whole pile of different adapter types; it could be convenient.
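
A minimal sketch of the 8-bit Adam option from bitsandbytes; the stand-in model and hyperparameters are placeholders, not the trainer's actual setup:

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(16, 16)  # stand-in for the actual SFT model
# bitsandbytes' 8-bit Adam stores optimizer state in 8-bit, reducing optimizer memory.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-5, betas=(0.9, 0.95))
```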

mrcabbage972 (Contributor) commented

@sanagno Opened PR here
