Bth5032/78 blackcat trainer #313

Merged (6 commits, Jan 5, 2023)

Conversation

bth5032 (Contributor) commented Jan 3, 2023

#78

As per discussion with @theblackcat102, I built the RankGen trainer on top of their framework (wandb). The model seems to be training now in fp32; apparently T5 has issues with fp16. Blackcat suggested that scaling the weights might help, as in [1]. That could be a good next step if we want to move forward with this model. Until then, I think this code is worth committing, since it shows how to add new models that are not `AutoModelForSequenceClassification`.
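For context, a minimal sketch of what a reward-model wrapper outside `AutoModelForSequenceClassification` could look like. This is illustrative only, not the PR's code; the class name, the mean-pooling, and the `google/t5-v1_1-base` default are assumptions:

```python
from torch import nn
from transformers import T5EncoderModel


class RankGenRewardModel(nn.Module):
    """Illustrative sketch: scores a (prefix, suffix) pair via the dot
    product of their pooled encoder representations."""

    def __init__(self, model_name: str = "google/t5-v1_1-base"):
        super().__init__()
        # Encoder-only T5, kept in fp32 since T5 is known to be unstable in fp16.
        self.encoder = T5EncoderModel.from_pretrained(model_name)

    def _encode(self, input_ids, attention_mask):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Mean-pool over non-padding tokens.
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

    def forward(self, prefix_ids, prefix_mask, suffix_ids, suffix_mask):
        prefix_emb = self._encode(prefix_ids, prefix_mask)
        suffix_emb = self._encode(suffix_ids, suffix_mask)
        # Reward = similarity between the prompt and the candidate continuation.
        return (prefix_emb * suffix_emb).sum(dim=-1)
```

A projection or scoring head could replace the raw dot product; the point is only that the trainer needs a forward pass mapping tokenized pairs to scalar rewards rather than relying on the sequence-classification auto class.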

@yk yk linked an issue Jan 3, 2023 that may be closed by this pull request
@yk yk requested review from theblackcat102 and sanagno and removed request for yk January 3, 2023 09:21
@yk yk added the ml label Jan 3, 2023
theblackcat102 (Collaborator) commented Jan 3, 2023

@bth5032 your wandb loss looks good, but I wonder why the validation accuracy is missing?

Not trying to nitpick, but just curious: why not make the default value of tokenizer_name the same as model_name in the argument_parser? Then we could remove this if/else in the trainer here.
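A sketch of that suggestion, assuming a plain argparse setup (the repo's actual argument_parser utility and flag names may differ):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--model_name", type=str, default="google/t5-v1_1-base")
parser.add_argument("--tokenizer_name", type=str, default=None)
args = parser.parse_args()

# Fall back to the model name when no tokenizer is given,
# so the trainer never needs its own if/else.
if args.tokenizer_name is None:
    args.tokenizer_name = args.model_name
```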

Overall the trainer looks fine

theblackcat102 (Collaborator) left a comment

looks good

sanagno (Collaborator) left a comment

Thanks a lot!

yk (Collaborator) commented Jan 3, 2023

@bth5032 thank you! Could you run `pre-commit run --all-files` to make the linters happy?

bth5032 (Contributor, Author) commented Jan 4, 2023

> @bth5032 thank you! Could you run `pre-commit run --all-files` to make the linters happy?

Thanks!

> Not trying to nitpick, but just curious: why not make the default value of tokenizer_name the same as model_name in the argument_parser? Then we could remove this if/else in the trainer here.

Ah, I normally use hydra/omegaconf for this and didn't realize you had that utility function; I've cleaned that logic up.

> @bth5032 your wandb loss looks good, but I wonder why the validation accuracy is missing?

I've been trying to figure that out myself. The `compute_metrics` function never runs for me... I found this thread and tried setting `evaluation_strategy=IntervalStrategy.STEPS`, among a few other things. I verified that the condition they listed here does evaluate to true in `prediction_step`, but I still never see `compute_metrics` being called...
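For reference, a hedged sketch of the usual prerequisites for `compute_metrics` to fire in the HF Trainer. The `output_dir`, `eval_steps` value, and metric logic here are illustrative assumptions, not the PR's configuration:

```python
# compute_metrics only runs during evaluation: evaluation must be enabled in
# TrainingArguments, and both eval_dataset and compute_metrics must be passed
# to the Trainer. If eval_dataset is None, the hook is silently skipped.
import numpy as np
from transformers import TrainingArguments, IntervalStrategy

training_args = TrainingArguments(
    output_dir="output",                         # hypothetical output path
    evaluation_strategy=IntervalStrategy.STEPS,  # run eval every eval_steps
    eval_steps=100,
)

def compute_metrics(eval_pred):
    # eval_pred carries the stacked outputs collected by prediction_step.
    predictions, labels = eval_pred.predictions, eval_pred.label_ids
    accuracy = float((np.argmax(predictions, axis=-1) == labels).mean())
    return {"accuracy": accuracy}

# Trainer(model=..., args=training_args, eval_dataset=..., compute_metrics=compute_metrics)
```

In particular, if no eval_dataset is passed, the evaluation loop never runs, so `compute_metrics` is skipped even when `evaluation_strategy` is set.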

I'll keep looking into it and submit a PR if I figure it out.

@theblackcat102 @sanagno should be ready for merge.

@bth5032 bth5032 requested review from theblackcat102 and removed request for andreaskoepf January 4, 2023 21:01
theblackcat102 (Collaborator) left a comment

looks good

theblackcat102 (Collaborator) left a comment

`models.py` is fine

@theblackcat102 theblackcat102 merged commit 325c978 into LAION-AI:main Jan 5, 2023
Successfully merging this pull request may close these issues.

Train a reward model based on RankGen