add I2V sft and fix an error #97

Merged
merged 5 commits on Dec 5, 2024
Conversation

jiashenggu
Contributor

@jiashenggu jiashenggu commented Nov 26, 2024

I am not sure how to set `ofs` in training; I just followed the inference pipeline setting.
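For context, the I2V inference pipeline constructs an `ofs` tensor only when the transformer config defines an ofs embedding, filled with a fixed value of 2.0. A minimal sketch of that behavior (function name and shapes are assumptions, not the repo's actual code):

```python
import torch

def make_ofs(latents: torch.Tensor, ofs_embed_dim):
    # Hypothetical helper mirroring the inference-time setting: models
    # without an ofs embedding (e.g. older checkpoints) get None.
    if ofs_embed_dim is None:
        return None
    # One scalar per batch element, matching the latents' dtype/device.
    return latents.new_full((latents.shape[0],), fill_value=2.0)

latents = torch.randn(2, 13, 16, 60, 90)
print(make_ofs(latents, ofs_embed_dim=512))   # tensor([2., 2.])
print(make_ofs(latents, ofs_embed_dim=None))  # None
```

Reusing the same fixed value during training keeps the conditioning consistent with what the model sees at inference.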

@sayakpaul
Collaborator

Thanks! Could you provide some example results as well?

@sayakpaul sayakpaul requested review from sayakpaul and a-r-r-o-w and removed request for sayakpaul November 29, 2024 09:42

@sayakpaul sayakpaul left a comment


I left some preliminary comments. I will let @a-r-r-o-w comment on the new script first.

LEARNING_RATES=("1e-4")
LR_SCHEDULES=("cosine_with_restarts")
OPTIMIZERS=("adamw")
MAX_TRAIN_STEPS=("20000")
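A sketch of how arrays like these are typically consumed in a sweep script, mirroring the loop structure of train_text_to_video_sft.sh (the launch command and flag names in the comment are assumptions):

```shell
LEARNING_RATES=("1e-4")
LR_SCHEDULES=("cosine_with_restarts")
OPTIMIZERS=("adamw")
MAX_TRAIN_STEPS=("20000")

# Iterate over every combination of hyperparameters, one run each.
for lr in "${LEARNING_RATES[@]}"; do
  for schedule in "${LR_SCHEDULES[@]}"; do
    for optimizer in "${OPTIMIZERS[@]}"; do
      for steps in "${MAX_TRAIN_STEPS[@]}"; do
        # The real script would launch the trainer here, e.g. via
        # accelerate launch, passing --learning_rate, --lr_scheduler,
        # --optimizer, and --max_train_steps.
        echo "run: lr=$lr schedule=$schedule optimizer=$optimizer steps=$steps"
      done
    done
  done
done
```

With single-element arrays this produces exactly one run; adding values to any array grows the sweep combinatorially.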
Collaborator


20000 steps!?

Contributor Author


I just followed the settings from train_text_to_video_sft.sh.

Collaborator


Have you performed any experiments yourself?

@@ -799,12 +799,15 @@ def load_model_hook(models, input_dir):
# (this is the forward diffusion process)
noisy_video_latents = scheduler.add_noise(video_latents, noise, timesteps)
noisy_model_input = torch.cat([noisy_video_latents, image_latents], dim=2)

model_config.patch_size_t if hasattr(model_config, "patch_size_t") else None,
Collaborator


This seems wrong. No assignment to a variable. Is this expected?

Contributor Author


Forgot to delete an earlier edit. Fixed it.
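For reference, the stray expression in the diff presumably becomes an assignment in the fix; a minimal sketch, using a hypothetical stand-in for the real transformer config:

```python
from types import SimpleNamespace

# Hypothetical stand-in for the real transformer model config.
model_config = SimpleNamespace(patch_size=2)

# The stray expression turned into an assignment; getattr's default
# replaces the `x if hasattr(...) else None` pattern from the diff.
patch_size_t = getattr(model_config, "patch_size_t", None)
print(patch_size_t)  # None

# Configs that do define a temporal patch size yield it directly.
model_config.patch_size_t = 2
patch_size_t = getattr(model_config, "patch_size_t", None)
print(patch_size_t)  # 2
```

Falling back to `None` keeps older model configs without a temporal patch size working unchanged.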

Owner

@a-r-r-o-w a-r-r-o-w left a comment


Thank you for the script and OFS fixes! Do you have any publicly available training runs to verify correctness? For example, we did a 20000-step run on T2V SFT before sharing the script, so something similar to verify I2V would be nice (even with a lower number of steps).

@jiashenggu
Contributor Author

jiashenggu commented Dec 4, 2024

Thank you for the script and OFS fixes! Do you have any publicly available training runs to verify correctness? For example, we did a 20000-step run on T2V SFT before sharing the script, so something similar to verify I2V would be nice (even with a lower number of steps).

I did a 24000-step run, but I'm not sure it met expectations. It seems somewhat better.

valid prompt:
A black-and-white animated scene featuring three characters in a static setting. Mickey Mouse-like character stands on one leg, hands on hips, with a playful expression. Center character has an exaggerated open mouth, caught in mid-motion, suggesting singing or surprise. Female character in a tutu and flower-adorned hat dances, arms raised. Background features a plain wall with scattered musical notes. The characters maintain their positions and expressions, with no changes in lighting, environment, or camera perspective, focusing on their interaction within this continuous moment.

valid image: [image attached]
base model output:

output_base.mp4

24000 step run model output:

output_24000.mp4

@jiashenggu jiashenggu requested a review from a-r-r-o-w December 4, 2024 06:53
@sayakpaul
Collaborator

Seems definitely better to me. At least it learned semantics and better motion (IMO).

@a-r-r-o-w
Owner

@jiashenggu Looks good to merge! Could you rebase against the main branch? It looks like some changes that are already in main were also made here.

@a-r-r-o-w
Owner

Thank you so much for this! I've verified that it works. Please feel free to open PRs for speedups or other suggestions for improvements :)

@a-r-r-o-w a-r-r-o-w merged commit 80d1150 into a-r-r-o-w:main Dec 5, 2024