Bth5032/78 blackcat trainer #313
@bth5032 your wandb loss looks good, but I wonder why the validation accuracy is missing? Not trying to nitpick, just curious: why not make tokenizer_name default to model_name in the argument parser? Then we could remove this if/else in the trainer. Overall the trainer looks fine.
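The suggestion above could look roughly like the sketch below. It assumes a plain `argparse` parser; the `parse_args` helper and the `--model_name` / `--tokenizer_name` flags are illustrative names, not necessarily the ones used in this repository's own argument utility.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser: tokenizer_name falls back to model_name."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", required=True)
    # Leave the default unset so we can tell whether the user passed a value.
    parser.add_argument("--tokenizer_name", default=None)
    args = parser.parse_args(argv)

    # Resolve the fallback once, here, so the trainer never needs an if/else.
    if args.tokenizer_name is None:
        args.tokenizer_name = args.model_name
    return args


if __name__ == "__main__":
    # Example invocation with only the model name supplied.
    args = parse_args(["--model_name", "t5-base"])
    print(args.model_name, args.tokenizer_name)
```

With the fallback resolved at parse time, the trainer can read `args.tokenizer_name` unconditionally.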
looks good
Thanks a lot!
@bth5032 thank you! could you run |
Thanks!
Ah, I normally use hydra/omegaconf for this and didn't realize you had that utility function. I cleaned that logic up.
I've been trying to figure that out myself. I'll keep looking into it and submit a PR if I figure it out. @theblackcat102 @sanagno this should be ready for merge.
looks good
models.py is fine
#78
As per discussion with @theblackcat102, I built the rankgen trainer on top of their framework (wandb). The model seems to be training now in fp32; apparently T5 has some issue with fp16. Blackcat suggested that scaling the weights might help, as in 1. This could be a good next step if we want to move forward with this model. Until then, I think this code is worth committing since it shows how to add new models which are not AutoModelForSequenceClassification.