Thank you very much for your excellent work and for sharing the repository!
I was wondering if you could provide more details on the hyperparameters used for fine-tuning on SUN RGBD and Kinetics-400 (in Table 2).
My understanding is that I should start from the ImageNet-1k pre-trained model (ImageSwin) and fine-tune following Supplement A, is that right?
Also, was the Omnivore model's performance in Table 2 obtained without using a pre-trained model?
Hi,
Thanks for your interest.
Yes, the baseline numbers are finetuned from the ImageSwin checkpoint. The video numbers in Table 2 come from the Video Swin Transformer, so you can refer to that paper for the finetuning parameters. For SUN finetuning you can use the hyperparameters from Appendix B.
In Table 2, the Omnivore model is trained from scratch on the 3 datasets, so yes, it does not use any pre-trained model.
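In case it helps, here is a minimal sketch of that recipe in PyTorch, using torchvision's Swin-T as a stand-in for the ImageSwin checkpoint; the class count, learning rate, and weight decay below are placeholder assumptions, not the paper's values:

```python
import torch
import torchvision

# Sketch only: torchvision's Swin-T stands in for the ImageSwin checkpoint;
# the hyperparameters below are illustrative, not the paper's.
model = torchvision.models.swin_t(weights="IMAGENET1K_V1")

# Replace the ImageNet-1k head with one sized for the target dataset
# (19 here is a placeholder class count).
model.head = torch.nn.Linear(model.head.in_features, 19)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

# One dummy finetuning step to show the shape of the loop.
images = torch.randn(2, 3, 224, 224)
labels = torch.randint(0, 19, (2,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```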
@rohitgirdhar
Thank you very much!
That clears things up.
I have an additional question.
Do you plan to publish all of the finetuning config files (for Table 3)?
Because the hyperparameters differ for each finetuning dataset, I am struggling to reproduce Omnivore's performance.
If possible, I would appreciate it if you would consider releasing all of the finetuning configs used for Table 3.
I'm also interested in how video data is handled during training, given that the RGB-D inputs carry a depth channel. Do you simply set the depth channel of video frames to zero, or is there another approach?
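To make the question concrete, here is the kind of zero-filling I have in mind (my own sketch, not code from this repo):

```python
import torch

# My own illustration of the "set depth to zero" option: pad RGB video
# frames with an all-zero depth channel so every modality is 4-channel.
rgb_frames = torch.randn(16, 3, 224, 224)                 # T x C x H x W clip
zero_depth = torch.zeros(16, 1, 224, 224)                 # placeholder depth
rgbd_frames = torch.cat([rgb_frames, zero_depth], dim=1)  # 16 x 4 x 224 x 224
```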