
Sharded distributed sampler for cached dataloading in DDP #195

Merged: 54 commits into main from simple-cache on Jan 2, 2025
Conversation

@ziw-liu (Collaborator) commented Oct 21, 2024

Add a distributed sampler that only permutes indices within each rank's shard, improving the cache hit rate in DDP.

See viscy/scripts/shared_dict.py for usage.

Also includes changes from #196.
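
For context, a minimal sketch of the idea: each rank draws from a fixed, contiguous shard of the dataset and shuffling happens only within that shard, so whatever a rank caches in the first epoch stays useful in later epochs. The class below is an illustrative assumption, not necessarily the code added in this PR; see the PR diff and viscy/scripts/shared_dict.py for the actual implementation and usage.

```python
# Illustrative sketch only (names and details assumed, not viscy's code).
import torch
from torch.utils.data import DistributedSampler


class ShardedDistributedSampler(DistributedSampler):
    """Shuffle indices only within each rank's contiguous shard."""

    def __iter__(self):
        # DistributedSampler already computed
        # self.num_samples = ceil(len(dataset) / num_replicas).
        start = self.rank * self.num_samples
        shard = list(range(start, min(start + self.num_samples, len(self.dataset))))
        if len(shard) < self.num_samples:
            # Pad the last shard so every rank yields the same number of samples.
            shard += shard[: self.num_samples - len(shard)]
        if self.shuffle:
            g = torch.Generator()
            g.manual_seed(self.seed + self.epoch)
            order = torch.randperm(len(shard), generator=g).tolist()
            shard = [shard[i] for i in order]
        return iter(shard)
```

In a Lightning setup, such a sampler would typically be passed to the DataLoader explicitly, with the trainer's automatic distributed-sampler injection disabled so it is not replaced by the stock DistributedSampler.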

@ziw-liu ziw-liu marked this pull request as ready for review October 21, 2024 21:17
@ziw-liu ziw-liu requested a review from edyoshikun October 21, 2024 21:17
@ziw-liu ziw-liu added enhancement New feature or request translation Image translation (VS) labels Oct 21, 2024
@ziw-liu (Collaborator, Author) commented Oct 21, 2024

Example output:

GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/hpc/mydata/ziwen.liu/anaconda/2022.05/x86_64/envs/viscy/lib/python3.11/site-packages/lightning/pytorch/trainer/setup.py:177: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/3
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/3
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/3
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 3 processes
----------------------------------------------------------------------------------------------------

=== Initializing cache pool for rank 0 ===
=== Initializing cache pool for rank 1 ===
=== Initializing cache pool for rank 2 ===

  | Name  | Type   | Params | Mode
0 | layer | Linear | 2      | train

2 Trainable params
0 Non-trainable params
2 Total params
0.000 Total estimated model params size (MB)
1 Modules in train mode
0 Modules in eval mode

Adding 31 to cache dict on rank 1
Adding 32 to cache dict on rank 2
Adding 38 to cache dict on rank 2
Adding 42 to cache dict on rank 0
Adding 30 to cache dict on rank 0
Adding 36 to cache dict on rank 0
Adding 37 to cache dict on rank 1
Adding 43 to cache dict on rank 1
Adding 48 to cache dict on rank 0
Adding 34 to cache dict on rank 1
Adding 49 to cache dict on rank 1
Adding 30 to cache dict on rank 2
Adding 44 to cache dict on rank 2
Adding 41 to cache dict on rank 2
Adding 35 to cache dict on rank 2
Adding 40 to cache dict on rank 1
Adding 46 to cache dict on rank 1
Adding 39 to cache dict on rank 0
Adding 33 to cache dict on rank 0
Adding 47 to cache dict on rank 2
Adding 45 to cache dict on rank 0
Adding 24 to cache dict on rank 2
Adding 13 to cache dict on rank 1
Adding 0 to cache dict on rank 0
Adding 20 to cache dict on rank 2
Adding 4 to cache dict on rank 0
Adding 29 to cache dict on rank 2
Adding 19 to cache dict on rank 1
Adding 26 to cache dict on rank 2
Adding 28 to cache dict on rank 2
=== Starting training ===
=== Starting training epoch 0 ===
Adding 8 to cache dict on rank 0
Adding 15 to cache dict on rank 1
Adding 3 to cache dict on rank 0
Adding 21 to cache dict on rank 2
Adding 11 to cache dict on rank 1
Adding 7 to cache dict on rank 0
Adding 23 to cache dict on rank 2
Adding 27 to cache dict on rank 2
Adding 22 to cache dict on rank 2
Adding 1 to cache dict on rank 0
Adding 9 to cache dict on rank 0
Adding 5 to cache dict on rank 0
Adding 17 to cache dict on rank 1
Adding 6 to cache dict on rank 0
Adding 18 to cache dict on rank 1
Adding 16 to cache dict on rank 1
Adding 14 to cache dict on rank 1
Adding 10 to cache dict on rank 1
Adding 25 to cache dict on rank 2
Adding 2 to cache dict on rank 0
Adding 12 to cache dict on rank 1
=== Starting training epoch 1 ===
=== Starting training epoch 2 ===
=== Starting training epoch 3 ===
=== Starting training epoch 4 ===
Trainer.fit stopped: max_epochs=5 reached.

@ziw-liu ziw-liu changed the base branch from ram_dataloader to main October 21, 2024 23:28
@ziw-liu ziw-liu added this to the v0.4.0 milestone Nov 12, 2024
@ziw-liu (Collaborator, Author) commented Nov 14, 2024

The FcmaeUNet class can now do both pre-training and fine-tuning, controlled via the pretraining flag in model_config. This keeps the old VSUNet with its sliding-window dataset integration, while also enabling GPU-accelerated augmentations for virtual staining training.
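
For illustration, here is a toy sketch of that pattern: a single LightningModule whose objective is switched by a pretraining flag in model_config. Everything except the pretraining flag is an assumption made up for this example; it is not the actual FcmaeUNet code.

```python
import torch
from torch import nn
import lightning.pytorch as pl


class ToyFlagUNet(pl.LightningModule):
    """Toy stand-in: one module doing either pre-training or fine-tuning."""

    def __init__(self, model_config: dict):
        super().__init__()
        self.pretraining = model_config["pretraining"]
        self.backbone = nn.Conv3d(model_config["in_channels"], 8, 3, padding=1)
        self.recon_head = nn.Conv3d(8, model_config["in_channels"], 1)
        self.stain_head = nn.Conv3d(8, model_config["out_channels"], 1)

    def training_step(self, batch, batch_idx):
        feat = self.backbone(batch["source"])
        if self.pretraining:
            # self-supervised: reconstruct the (masked) source channels
            return nn.functional.mse_loss(self.recon_head(feat), batch["source"])
        # fine-tuning: supervised virtual staining against the target channels
        return nn.functional.mse_loss(self.stain_head(feat), batch["target"])

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=2e-4)
```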

@ziw-liu (Collaborator, Author) commented Nov 15, 2024

@edyoshikun do you want to keep improving the prototype in hcs_ram.py, or should we remove it?

@edyoshikun (Contributor) commented:
Remove hcs_ram.py.

ziw-liu and others added 3 commits December 2, 2024 15:33
* fix spelling in docstring and comment

* add batched zoom transform for tta

* add standalone lightning module for arbitrary TTA

* fix composition of different zoom factors
@edyoshikun (Contributor) left a comment:
This LGTM. The pending thing would be to properly write the augmentations now. Happy to merge this first and then add the augmentations.

@ziw-liu (Collaborator, Author) commented Jan 2, 2025

properly write the augmentations now

Can you elaborate?

@edyoshikun (Contributor) commented Jan 2, 2025

I mean the tiling in 3D. I was thinking of the neuromast or mantis datasets, where, if the XY dimensions are smaller than one patch, we only crop/tile the top-left corner. The equivalent thing would happen in the Z dimension.

@ziw-liu (Collaborator, Author) commented Jan 2, 2025

I mean the tiling in 3D. I was thinking of the neuromast or mantis datasets, where, if the XY dimensions are smaller than one patch, we only crop/tile the top-left corner. The equivalent thing would happen in the Z dimension.

IIRC this only affects validation? Let's open an issue and fix the tiling transform separately.
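
To make the behavior under discussion concrete, here is a rough sketch of corner-anchored tiling along a single axis. The helper name and the numbers are made up for illustration; this is not the actual viscy tiling transform.

```python
def tile_starts(extent: int, patch: int) -> list[int]:
    """Start coordinates of non-overlapping tiles along one axis."""
    if extent <= patch:
        # Axis shorter than one patch: only a single corner-anchored tile,
        # which would need clipping or padding to fit the image.
        return [0]
    return list(range(0, extent - patch + 1, patch))


print(tile_starts(2048, 384))  # [0, 384, 768, 1152, 1536]; remainder past 1920 is not tiled
print(tile_starts(256, 384))   # [0]; only the top-left corner is covered
```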

@edyoshikun (Contributor) left a comment:
I have super minor comments. Otherwise, I've tested this with the phase pretraining and it works well.

Let's fix the tiling in a separate issue+PR.

```python
batch_size: int = 16,
num_workers: int = 8,
val_subsample_ratio: int = 30,
```
@edyoshikun (Contributor):
Minor preference, but when I see "ratio" I typically think of values between 0 and 1, like the masking ratio, the train/val ratio, etc.

@ziw-liu (Collaborator, Author):
What would be a better name?

@ziw-liu (Collaborator, Author):
Let's keep this for now as I am the only user at the moment.
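
For concreteness on the naming question above: assuming the integer is a subsampling factor (keep one of every N validation samples), which is an assumption rather than something confirmed here, the two conventions would look like this:

```python
n_val = 3000

val_subsample_ratio = 30                        # integer factor: keep every 30th sample
kept_a = n_val // val_subsample_ratio           # 100

val_subsample_fraction = 1 / 30                 # 0-1 value, the sense "ratio" usually suggests
kept_b = round(n_val * val_subsample_fraction)  # 100
```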

@ziw-liu ziw-liu merged commit 05af2e6 into main Jan 2, 2025
4 checks passed
@ziw-liu ziw-liu deleted the simple-cache branch January 2, 2025 22:42
edyoshikun added a commit that referenced this pull request Feb 13, 2025
* caching dataloader

* caching data module

* black

* ruff

* Bump torch to 2.4.1 (#174)

* update torch >2.4.1

* black

* ruff

* adding timeout to ram_dataloader

* bandaid to cached dataloader

* fixing the dataloader using torch collate_fn

* replacing dictionary with single array

* loading prior to epoch 0

* Revert "replacing dictionary with single array"

This reverts commit 8c13f49.

* using multiprocessing manager

* add sharded distributed sampler

* add example script for ddp caching

* format and lint

* adding the custom distributed sampler to hcs_ram.py

* adding sampler to val train dataloader

* fix divisibility of the last shard

* hcs_ram format and lint

* data module that only crops and does not collate

* wip: execute transforms on the GPU

* path for if not ddp

* fix randomness in inversion transform

* add option to pop the normalization metadata

* move gpu transform definition back to data module

* add tiled crop transform for validation

* add stack channel transform for gpu augmentation

* fix typing

* collate before sending to gpu

* inherit gpu transforms for livecell dataset

* update fcmae engine to apply per-dataset augmentations

* format and lint hcs_ram

* fix abc type hint

* update docstring style

* disable grad for validation transforms

* improve sample image logging in fcmae

* fix dataset length when batch size is larger than the dataset

* fix docstring

* add option to disable normalization metadata

* inherit gpu transform for ctmc

* remove duplicate method override

* update docstring for ctmc

* allow skipping caching for large datasets

* make the fcmae module compatible with image translation

* remove prototype implementation

* fix import path

* Arbitrary prediction time transforms (#209)

* fix spelling in docstring and comment

* add batched zoom transform for tta

* add standalone lightning module for arbitrary TTA

* fix composition of different zoom factors

* add docstrings

* fix typo in docstring

---------

Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>