Sharded distributed sampler for cached dataloading in DDP #195
Conversation
Example output:
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/hpc/mydata/ziwen.liu/anaconda/2022.05/x86_64/envs/viscy/lib/python3.11/site-packages/lightning/pytorch/trainer/setup.py:177: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/3
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/3
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/3
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 3 processes
----------------------------------------------------------------------------------------------------
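A log like this could come from a CPU-only DDP run with three processes. A minimal sketch of such a setup (assumed configuration, not the actual benchmark script):

```python
# Minimal sketch (assumed setup): a 3-process CPU DDP run, which makes Lightning
# pick the gloo backend and emit the log above.
from lightning.pytorch import Trainer

trainer = Trainer(
    accelerator="cpu",  # triggers the "GPU available but not used" warning
    devices=3,          # "Starting with 3 processes"
    strategy="ddp",     # distributed_backend=gloo on CPU
)
# trainer.fit(model, datamodule=data_module)  # model/data_module defined elsewhere
```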
* update torch >2.4.1
* black
* ruff
This reverts commit 8c13f49.
GPU transform for FCMAE pre-training
The FcmaeUNet class can now do both pre-training and fine-tuning. This is controlled via the
@edyoshikun do you want to keep improving the prototype in
remove the hcs_ram.py
* fix spelling in docstring and comment
* add batched zoom transform for tta
* add standalone lightning module for arbitrary TTA
* fix composition of different zoom factors
This LGTM. The pending thing would be to properly write the augmentations now. Happy to merge this first and then add the augmentations.
Can you elaborate?
I mean the tiling in 3D. I was thinking of the neuromast or mantis datasets, where if the XY dimension is smaller than one patch, we only crop/tile the top-left corner. The equivalent thing would happen in the Z dimension.
IIRC this only affects validation? Let's open an issue and fix the tiling transform separately.
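To illustrate the edge case being discussed, here is a hypothetical helper (not the transform added in this PR) for computing tile start positions along one axis:

```python
# Hypothetical helper illustrating the tiling edge case discussed above
# (not the PR's transform): tile start positions along a single axis.
def tile_starts(dim_size: int, patch_size: int) -> list[int]:
    if dim_size <= patch_size:
        # axis smaller than one patch: only the "top-left corner" tile is produced
        return [0]
    # non-overlapping tiles; the remainder at the far edge is dropped
    return list(range(0, dim_size - patch_size + 1, patch_size))


tile_starts(48, 64)   # [0] -> only the corner is ever cropped/tiled
tile_starts(256, 64)  # [0, 64, 128, 192]
```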
I have super minor comments. Otherwise, I've tested this with the phase pre-training and it works well.
Let's fix the tiling in a separate issue+PR.
batch_size: int = 16,
num_workers: int = 8,
val_subsample_ratio: int = 30,
Minor preference, but when I see ratios I typically think of values between 0 and 1, like the masking ratio, the train/val ratio, etc.
What would be a better name?
Let's keep this for now as I am the only user at the moment.
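For context on the naming discussion: an integer `val_subsample_ratio` presumably acts as a stride, keeping every Nth validation sample (a 1/N fraction) rather than a 0-1 fraction. A sketch of that assumed semantics (hypothetical helper, not the data module's code):

```python
# Sketch of the assumed semantics of an integer val_subsample_ratio:
# keep every Nth validation sample, i.e. a 1/N fraction of the set.
def subsample_val(indices: list[int], val_subsample_ratio: int = 30) -> list[int]:
    return indices[::val_subsample_ratio]


subsample_val(list(range(90)))  # [0, 30, 60] -> 1/30 of the validation samples
```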
* caching dataloader
* caching data module
* black
* ruff
* Bump torch to 2.4.1 (#174)
* update torch >2.4.1
* black
* ruff
* adding timeout to ram_dataloader
* bandaid to cached dataloader
* fixing the dataloader using torch collate_fn
* replacing dictionary with single array
* loading prior to epoch 0
* Revert "replacing dictionary with single array" (this reverts commit 8c13f49)
* using multiprocessing manager
* add sharded distributed sampler
* add example script for ddp caching
* format and lint
* adding the custom distributed sampler to hcs_ram.py
* adding sampler to val/train dataloader
* fix divisibility of the last shard
* hcs_ram format and lint
* data module that only crops and does not collate
* wip: execute transforms on the GPU
* path for if not ddp
* fix randomness in inversion transform
* add option to pop the normalization metadata
* move gpu transform definition back to data module
* add tiled crop transform for validation
* add stack channel transform for gpu augmentation
* fix typing
* collate before sending to gpu
* inherit gpu transforms for livecell dataset
* update fcmae engine to apply per-dataset augmentations
* format and lint hcs_ram
* fix abc type hint
* update docstring style
* disable grad for validation transforms
* improve sample image logging in fcmae
* fix dataset length when batch size is larger than the dataset
* fix docstring
* add option to disable normalization metadata
* inherit gpu transform for ctmc
* remove duplicate method override
* update docstring for ctmc
* allow skipping caching for large datasets
* make the fcmae module compatible with image translation
* remove prototype implementation
* fix import path
* Arbitrary prediction time transforms (#209)
* fix spelling in docstring and comment
* add batched zoom transform for tta
* add standalone lightning module for arbitrary TTA
* fix composition of different zoom factors
* add docstrings
* fix typo in docstring

Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>
Add a distributed sampler that only permutes indices within ranks, improving the cache hit rate in DDP. See `viscy/scripts/shared_dict.py` for usage. Also includes changes from #196.
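The idea, as a minimal sketch (an illustration built on the stock `torch.utils.data.DistributedSampler`, not the exact code added by this PR): shuffle indices only inside each rank's contiguous shard, so every epoch a rank revisits the same subset of samples and its in-memory cache keeps hitting.

```python
# Sketch of a cache-friendly distributed sampler: each rank always reads the
# same contiguous shard of the dataset, and shuffling happens only inside that
# shard. Illustration only, not necessarily the PR's implementation.
import torch
from torch.utils.data import DistributedSampler


class ShardedDistributedSampler(DistributedSampler):
    def __iter__(self):
        # full index list, padded (or truncated) so it splits evenly across ranks
        indices = list(range(len(self.dataset)))
        padding = self.total_size - len(indices)
        if padding > 0:
            indices += indices[:padding]
        indices = indices[: self.total_size]

        if self.shuffle:
            g = torch.Generator()
            g.manual_seed(self.seed + self.epoch)
            # permute within each rank-sized contiguous shard, not globally
            shards = torch.tensor(indices).view(self.num_replicas, self.num_samples)
            shards = torch.stack(
                [s[torch.randperm(self.num_samples, generator=g)] for s in shards]
            )
            indices = shards.flatten().tolist()

        # this rank's contiguous shard (the default sampler takes a strided slice instead)
        start = self.rank * self.num_samples
        return iter(indices[start : start + self.num_samples])
```

Usage follows the stock sampler: pass it to the DataLoader as `sampler=ShardedDistributedSampler(dataset, shuffle=True)` and call `set_epoch(epoch)` at the start of each epoch so the per-shard permutation changes between epochs.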