RoPE fixes for 1.5, bfloat16 support in prepare_dataset, gradient_accumulation grad norm undefined fix #107
Conversation
Thank you!
RoPE_BASE_HEIGHT = transformer.config.sample_height * VAE_SCALE_FACTOR_SPATIAL
RoPE_BASE_WIDTH = transformer.config.sample_width * VAE_SCALE_FACTOR_SPATIAL
In case of DeepSpeed we won't be able to access transformer.config. Let's use model_config?
We can access it at this point because it is still not wrapped by DeepSpeed.
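A minimal sketch of the ordering being discussed, assuming the usual accelerator.prepare() flow; the model repo id and VAE_SCALE_FACTOR_SPATIAL value are illustrative, and the module/config access pattern after wrapping is an assumption, not the PR's code:

```python
import torch
from accelerate import Accelerator
from diffusers import CogVideoXTransformer3DModel

accelerator = Accelerator()
VAE_SCALE_FACTOR_SPATIAL = 8  # illustrative value

transformer = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Before accelerator.prepare(), `transformer` is still the raw diffusers module,
# so transformer.config is directly accessible.
RoPE_BASE_HEIGHT = transformer.config.sample_height * VAE_SCALE_FACTOR_SPATIAL
RoPE_BASE_WIDTH = transformer.config.sample_width * VAE_SCALE_FACTOR_SPATIAL

transformer = accelerator.prepare(transformer)

# After wrapping (e.g. by DeepSpeed), the config has to be reached through the
# inner module instead.
model_config = (
    transformer.module.config if hasattr(transformer, "module") else transformer.config
)
```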
@@ -828,6 +833,10 @@ def load_model_hook(models, input_dir):
gradient_norm_before_clip = get_gradient_norm(transformer.parameters())
accelerator.clip_grad_norm_(transformer.parameters(), args.max_grad_norm)
gradient_norm_after_clip = get_gradient_norm(transformer.parameters())
logs.update({
Should we not consider whether we're using DeepSpeed and log it accordingly? I'm referring to the following comment:
# gradnorm + deepspeed: https://github.com/microsoft/DeepSpeed/issues/4555
We cannot calculate grad_norm for DeepSpeed, so this will be NaN or 0, I think. This is because the gradient step is handled internally by DeepSpeed, and the gradients get cleared before we have access to them. Getting access would require a backward hook on the last module, but we can skip that for now and revisit it in the future, since it didn't work before these changes either.
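One way this could be handled (a sketch, not the PR's code, assuming accelerator, transformer, args, get_gradient_norm, and logs from the surrounding training loop): skip the manual norm computation and clipping when Accelerate reports DeepSpeed, since DeepSpeed clips and clears gradients internally per the linked issue.

```python
from accelerate.utils import DistributedType

if accelerator.distributed_type != DistributedType.DEEPSPEED:
    # Gradients are still available here, so the norm can be measured around clipping.
    gradient_norm_before_clip = get_gradient_norm(transformer.parameters())
    accelerator.clip_grad_norm_(transformer.parameters(), args.max_grad_norm)
    gradient_norm_after_clip = get_gradient_norm(transformer.parameters())
    logs.update({
        "gradient_norm_before_clip": gradient_norm_before_clip,
        "gradient_norm_after_clip": gradient_norm_after_clip,
    })
else:
    # DeepSpeed clips gradients internally (see microsoft/DeepSpeed#4555) and clears
    # them before they can be read here, so any norm computed would be 0 or NaN.
    pass
```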
def save_image(image: torch.Tensor, path: pathlib.Path) -> None:
-    image = to_pil_image(image)
+    image = image.to(dtype=torch.float32).clamp(-1, 1)
+    image = to_pil_image(image.float())
     image.save(path)
Could you explain this change?
F.ToPILImage does not support bfloat16, so if someone were to do precomputation in bfloat16 and try to save the images/videos in a viewable format, it would error out. Casting to fp32 fixes this.
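A minimal repro of the failure mode, assuming a bfloat16 tensor already in [0, 1]; the tensor shape and file name are illustrative:

```python
import torch
from torchvision.transforms.functional import to_pil_image

image = torch.rand(3, 64, 64, dtype=torch.bfloat16)

# to_pil_image(image) would raise here because bfloat16 is not a supported dtype
# for the tensor-to-numpy conversion. Casting to float32 first makes it work.
pil_image = to_pil_image(image.to(dtype=torch.float32))
pil_image.save("preview.png")
```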