[Bugfix] Fix bug in cross entropy loss #3457

mmeendez8 · 2023-11-30T11:37:32Z

Motivation

Fixes #3412

Modification

The fix is to create the tensor with torch.stack() instead of torch.tensor().
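For context, here is a minimal sketch of why the two calls differ, assuming the list being converted holds multi-element tensors. The values and the names `class_weight`, `label`, and `rows` are illustrative and not copied from the patch. `torch.tensor()` expects scalars or nested Python sequences and rejects a list of multi-element tensors, while `torch.stack()` joins them along a new dimension.

```python
import torch

# Illustrative values; not taken from the patch.
class_weight = torch.tensor([1.0, 5.0, 6.0])
label = torch.tensor([[0, 2], [1, 1]])  # 2-D label map: iterating yields 1-D rows

# Each entry is a 1-D tensor, because each row of `label` indexes `class_weight`.
rows = [class_weight[row] for row in label]

try:
    torch.tensor(rows)  # a list of multi-element tensors is not accepted
    tensor_call_failed = False
except (ValueError, TypeError, RuntimeError):
    tensor_call_failed = True

# torch.stack joins the rows along a new leading dimension.
label_weights = torch.stack(rows)
```

In this toy case, plain advanced indexing (`class_weight[label]`) produces the same tensor without a Python loop. Whether that fits the actual loss code is not something this sketch establishes.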

call560 · 2024-01-02T02:25:30Z

After applying this fix to resolve the WCE loss error reported during K-Net training, I hit an assertion error again. The error message is as follows:

../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [96,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [97,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [98,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [99,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [100,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [101,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [102,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [103,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [104,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [105,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [106,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [107,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [108,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [109,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [110,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [111,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [112,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [113,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [114,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [115,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [116,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [117,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [118,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [119,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [120,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [121,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [122,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [123,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [124,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [125,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [126,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [96,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [97,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [98,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [99,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [100,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [101,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [102,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [103,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [104,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [105,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [106,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [107,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [108,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [109,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [110,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [111,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [112,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [113,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [114,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [115,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [116,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [117,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [118,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [119,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [120,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [121,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [122,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [123,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [124,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [125,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [126,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [96,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [97,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [98,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [99,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [100,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [101,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [102,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [103,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [104,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [105,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [106,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [107,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [108,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [109,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [110,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [111,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [112,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [113,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [114,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [115,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [116,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [117,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [118,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [119,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [120,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [121,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [122,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [123,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [124,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [125,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [126,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
File "tools/train.py", line 104, in <module>
main()
File "tools/train.py", line 100, in main
runner.train()
File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
model = self.train_loop.run() # type: ignore
File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/mmengine/runner/loops.py", line 278, in run
self.run_iter(data_batch)
File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/mmengine/runner/loops.py", line 301, in run_iter
outputs = self.runner.model.train_step(
File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 114, in train_step
losses = self._run_forward(data, mode='loss') # type: ignore
File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 346, in _run_forward
results = self(**data, mode=mode)
File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/segmentors/base.py", line 94, in forward
return self.loss(inputs, data_samples)
File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 178, in loss
loss_decode = self._decode_head_forward_train(x, data_samples)
File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 139, in _decode_head_forward_train
loss_decode = self.decode_head.loss(inputs, data_samples,
File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/decode_heads/decode_head.py", line 262, in loss
losses = self.loss_by_feat(seg_logits, batch_data_samples)
File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/decode_heads/knet_head.py", line 456, in loss_by_feat
loss = self.kernel_generate_head.loss_by_feat(
File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/decode_heads/decode_head.py", line 324, in loss_by_feat
loss[loss_decode.loss_name] = loss_decode(
File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/losses/cross_entropy_loss.py", line 288, in forward
loss_cls = self.loss_weight * self.cls_criterion(
File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/losses/cross_entropy_loss.py", line 73, in cross_entropy
avg_factor = label_weights.sum()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
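
One possible cause, offered as an assumption rather than a diagnosis: the config below sets `seg_pad_val=255` alongside a five-entry `class_weight`, so if padded or ignored pixels reach a per-class weight lookup, index 255 is out of range. The sketch below checks for this on the CPU, where the indexing error is raised synchronously with a readable message. The helper name `out_of_range_labels` and the toy label map are hypothetical, and zero-weighting the invalid pixels is only one way to handle them.

```python
import torch


def out_of_range_labels(label, num_classes):
    """Return the sorted label values that cannot index a per-class weight vector."""
    values = torch.unique(label)
    mask = (values < 0) | (values >= num_classes)
    return values[mask].tolist()


# Class weights as listed in the config; the label map is a made-up example
# containing one pixel with the padding value 255.
class_weight = torch.tensor([1.0, 5.133, 5.9931, 5.0811, 4.3589])
label = torch.tensor([[0, 4], [255, 1]])
num_classes = len(class_weight)

bad = out_of_range_labels(label, num_classes)

# On the CPU the same lookup fails immediately with an IndexError.
try:
    class_weight[label]
    cpu_lookup_failed = False
except IndexError:
    cpu_lookup_failed = True

# Look up weights only for valid pixels; invalid ones keep weight 0.
valid = (label >= 0) & (label < num_classes)
safe_weights = torch.zeros(label.shape)
safe_weights[valid] = class_weight[label[valid]]
```

Running the training script with `CUDA_LAUNCH_BLOCKING=1`, as the message above suggests, should also make the traceback point at the line that actually fails.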

This is my configuration file:

albu_train_transforms = [
dict(limit=45, p=0.5, type='SafeRotate'),
dict(p=0.5, type='Flip'),
dict(
p=0.3,
transforms=[
dict(p=1, type='RandomBrightnessContrast'),
dict(p=1, scale=0.4, type='RandomToneCurve'),
],
type='OneOf'),
dict(
n=2,
p=0.3,
transforms=[
dict(blur_limit=(
9,
11,
), p=1.0, type='GaussianBlur'),
dict(p=1.0, type='GridDistortion'),
dict(clip_limit=4.0, p=1, tile_grid_size=(
8,
8,
), type='CLAHE'),
dict(
alpha=(
0.8,
1.0,
),
blur_limit=(
11,
31,
),
p=1,
threshold=0,
type='UnsharpMask'),
dict(
color_shift=(
0.1,
0.3,
),
intensity=(
0.3,
0.5,
),
p=1.0,
type='ISONoise'),
dict(p=0.3, type='RandomGravel'),
],
type='SomeOf'),
dict(
p=0.1,
transforms=[
dict(
alpha_coef=0.1,
fog_coef_lower=0.2,
fog_coef_upper=0.5,
p=0.5,
type='RandomFog'),
dict(brightness_coefficient=0.8, p=1.0, type='RandomRain'),
dict(
brightness_coeff=1.0,
p=0.5,
snow_point_lower=0.2,
snow_point_upper=0.5,
type='RandomSnow'),
dict(
angle_lower=0.5,
flare_roi=(
0,
0,
1,
0.5,
),
p=0.2,
src_radius=50,
type='RandomSunFlare'),
dict(
num_shadows_lower=1,
num_shadows_upper=1,
p=0.2,
type='RandomShadow'),
dict(
cutout_threshold=(
0.3,
0.6,
),
mean=0.4,
p=0.2,
std=0.3,
type='Spatter'),
],
type='OneOf'),
dict(
p=0.1,
transforms=[
dict(
p=1.0,
quality_lower=30,
quality_upper=70,
type='ImageCompression'),
dict(p=1.0, type='RingingOvershoot'),
],
type='OneOf'),
]
checkpoint_file = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/swin/swin_large_patch4_window7_224_22k_20220308-d5bdebaf.pth'
conv_kernel_size = 1
crop_size = (
512,
512,
)
data_preprocessor = dict(
bgr_to_rgb=True,
mean=[
123.675,
116.28,
103.53,
],
pad_val=0,
seg_pad_val=255,
size=(
512,
512,
),
std=[
58.395,
57.12,
57.375,
],
type='SegDataPreProcessor')
data_root = './data/coco/'
dataset_type = 'ZBr10KDataset'
default_hooks = dict(
checkpoint=dict(
by_epoch=False,
interval=2500,
max_keep_ckpts=2,
save_best='mIoU',
type='CheckpointHook'),
logger=dict(interval=100, log_metric_by_epoch=False, type='LoggerHook'),
param_scheduler=dict(type='ParamSchedulerHook'),
sampler_seed=dict(type='DistSamplerSeedHook'),
timer=dict(type='IterTimerHook'),
visualization=dict(type='SegVisualizationHook'))
default_scope = 'mmseg'
env_cfg = dict(
cudnn_benchmark=True,
dist_cfg=dict(backend='nccl'),
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
img_ratios = [
0.5,
0.75,
1.0,
1.25,
1.5,
1.75,
]
launcher = 'none'
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=False)
model = dict(
auxiliary_head=dict(
align_corners=False,
channels=256,
concat_input=False,
dropout_ratio=0.1,
in_channels=768,
in_index=2,
loss_decode=dict(
class_weight=[
1.0,
5.133,
5.9931,
5.0811,
4.3589,
],
loss_weight=0.4,
type='CrossEntropyLoss',
use_sigmoid=False),
norm_cfg=dict(requires_grad=True, type='BN'),
num_classes=5,
num_convs=1,
type='FCNHead'),
backbone=dict(
attn_drop_rate=0.0,
depths=[
2,
2,
18,
2,
],
drop_path_rate=0.3,
drop_rate=0.0,
embed_dims=192,
mlp_ratio=4,
num_heads=[
6,
12,
24,
48,
],
out_indices=(
0,
1,
2,
3,
),
patch_norm=True,
qk_scale=None,
qkv_bias=True,
type='SwinTransformer',
use_abs_pos_embed=False,
window_size=7),
data_preprocessor=dict(
bgr_to_rgb=True,
mean=[
123.675,
116.28,
103.53,
],
pad_val=0,
seg_pad_val=255,
size=(
512,
512,
),
std=[
58.395,
57.12,
57.375,
],
type='SegDataPreProcessor'),
decode_head=dict(
kernel_generate_head=dict(
align_corners=False,
channels=512,
dropout_ratio=0.1,
in_channels=[
192,
384,
768,
1536,
],
in_index=[
0,
1,
2,
3,
],
loss_decode=dict(
class_weight=[
1.0,
5.133,
5.9931,
5.0811,
4.3589,
],
loss_weight=1.0,
type='CrossEntropyLoss',
use_sigmoid=False),
norm_cfg=dict(requires_grad=True, type='BN'),
num_classes=5,
pool_scales=(
1,
2,
3,
6,
),
type='UPerHead'),
kernel_update_head=[
dict(
conv_kernel_size=1,
dropout=0.0,
feat_transform_cfg=dict(
act_cfg=None, conv_cfg=dict(type='Conv2d')),
feedforward_channels=2048,
ffn_act_cfg=dict(inplace=True, type='ReLU'),
in_channels=512,
kernel_updator_cfg=dict(
act_cfg=dict(inplace=True, type='ReLU'),
feat_channels=256,
in_channels=256,
norm_cfg=dict(type='LN'),
out_channels=256,
type='KernelUpdator'),
num_classes=5,
num_ffn_fcs=2,
num_heads=8,
num_mask_fcs=1,
out_channels=512,
type='KernelUpdateHead',
with_ffn=True),
dict(
conv_kernel_size=1,
dropout=0.0,
feat_transform_cfg=dict(
act_cfg=None, conv_cfg=dict(type='Conv2d')),
feedforward_channels=2048,
ffn_act_cfg=dict(inplace=True, type='ReLU'),
in_channels=512,
kernel_updator_cfg=dict(
act_cfg=dict(inplace=True, type='ReLU'),
feat_channels=256,
in_channels=256,
norm_cfg=dict(type='LN'),
out_channels=256,
type='KernelUpdator'),
num_classes=5,
num_ffn_fcs=2,
num_heads=8,
num_mask_fcs=1,
out_channels=512,
type='KernelUpdateHead',
with_ffn=True),
dict(
conv_kernel_size=1,
dropout=0.0,
feat_transform_cfg=dict(
act_cfg=None, conv_cfg=dict(type='Conv2d')),
feedforward_channels=2048,
ffn_act_cfg=dict(inplace=True, type='ReLU'),
in_channels=512,
kernel_updator_cfg=dict(
act_cfg=dict(inplace=True, type='ReLU'),
feat_channels=256,
in_channels=256,
norm_cfg=dict(type='LN'),
out_channels=256,
type='KernelUpdator'),
num_classes=5,
num_ffn_fcs=2,
num_heads=8,
num_mask_fcs=1,
out_channels=512,
type='KernelUpdateHead',
with_ffn=True),
],
num_stages=3,
type='IterativeDecodeHead'),
pretrained=
'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/swin/swin_large_patch4_window7_224_22k_20220308-d5bdebaf.pth',
test_cfg=dict(mode='whole'),
train_cfg=dict(),
type='EncoderDecoder')
norm_cfg = dict(requires_grad=True, type='BN')
num_stages = 3
optim_wrapper = dict(
clip_grad=dict(max_norm=1, norm_type=2),
optimizer=dict(
betas=(
0.9,
0.999,
), lr=6e-05, type='AdamW', weight_decay=0.0005),
paramwise_cfg=dict(
custom_keys=dict(
absolute_pos_embed=dict(decay_mult=0.0),
norm=dict(decay_mult=0.0),
relative_position_bias_table=dict(decay_mult=0.0))),
type='OptimWrapper')
optimizer = dict(lr=0.01, momentum=0.9, type='SGD', weight_decay=0.0005)
param_scheduler = [
dict(
begin=0, by_epoch=False, end=1000, start_factor=0.001,
type='LinearLR'),
dict(
begin=1000,
by_epoch=False,
end=80000,
milestones=[
60000,
72000,
],
type='MultiStepLR'),
]
randomness = dict(seed=0)
resume = False
test_cfg = dict(type='TestLoop')
test_dataloader = dict(
batch_size=1,
dataset=dict(
data_prefix=dict(
img_path='images/test', seg_map_path='annotations/test'),
data_root='./data/coco/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(keep_ratio=True, scale=(
2048,
1024,
), type='Resize'),
dict(type='LoadAnnotations'),
dict(type='PackSegInputs'),
],
type='ZBr10KDataset'),
num_workers=4,
persistent_workers=True,
sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(
iou_metrics=[
'mIoU',
'mDice',
'mFscore',
], type='IoUMetric')
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(keep_ratio=True, scale=(
2048,
1024,
), type='Resize'),
dict(type='LoadAnnotations'),
dict(type='PackSegInputs'),
]
train_cfg = dict(max_iters=40000, type='IterBasedTrainLoop', val_interval=500)
train_dataloader = dict(
batch_size=6,
dataset=dict(
data_prefix=dict(
img_path='images/train', seg_map_path='annotations/train'),
data_root='./data/coco/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(
keep_ratio=True,
ratio_range=(
0.5,
2.0,
),
scale=(
2048,
1024,
),
type='RandomResize'),
dict(
cat_max_ratio=0.75, crop_size=(
512,
512,
), type='RandomCrop'),
dict(
transforms=[
dict(limit=45, p=0.5, type='SafeRotate'),
dict(p=0.5, type='Flip'),
dict(
p=0.3,
transforms=[
dict(p=1, type='RandomBrightnessContrast'),
dict(p=1, scale=0.4, type='RandomToneCurve'),
],
type='OneOf'),
dict(
n=2,
p=0.3,
transforms=[
dict(
blur_limit=(
9,
11,
),
p=1.0,
type='GaussianBlur'),
dict(p=1.0, type='GridDistortion'),
dict(
clip_limit=4.0,
p=1,
tile_grid_size=(
8,
8,
),
type='CLAHE'),
dict(
alpha=(
0.8,
1.0,
),
blur_limit=(
11,
31,
),
p=1,
threshold=0,
type='UnsharpMask'),
dict(
color_shift=(
0.1,
0.3,
),
intensity=(
0.3,
0.5,
),
p=1.0,
type='ISONoise'),
dict(p=0.3, type='RandomGravel'),
],
type='SomeOf'),
dict(
p=0.1,
transforms=[
dict(
alpha_coef=0.1,
fog_coef_lower=0.2,
fog_coef_upper=0.5,
p=0.5,
type='RandomFog'),
dict(
brightness_coefficient=0.8,
p=1.0,
type='RandomRain'),
dict(
brightness_coeff=1.0,
p=0.5,
snow_point_lower=0.2,
snow_point_upper=0.5,
type='RandomSnow'),
dict(
angle_lower=0.5,
flare_roi=(
0,
0,
1,
0.5,
),
p=0.2,
src_radius=50,
type='RandomSunFlare'),
dict(
num_shadows_lower=1,
num_shadows_upper=1,
p=0.2,
type='RandomShadow'),
dict(
cutout_threshold=(
0.3,
0.6,
),
mean=0.4,
p=0.2,
std=0.3,
type='Spatter'),
],
type='OneOf'),
dict(
p=0.1,
transforms=[
dict(
p=1.0,
quality_lower=30,
quality_upper=70,
type='ImageCompression'),
dict(p=1.0, type='RingingOvershoot'),
],
type='OneOf'),
],
type='Albu'),
dict(type='PackSegInputs'),
],
type='ZBr10KDataset'),
num_workers=2,
persistent_workers=True,
sampler=dict(shuffle=True, type='InfiniteSampler'))
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(
keep_ratio=True,
ratio_range=(
0.5,
2.0,
),
scale=(
2048,
1024,
),
type='RandomResize'),
dict(cat_max_ratio=0.75, crop_size=(
512,
512,
), type='RandomCrop'),
dict(
transforms=[
dict(limit=45, p=0.5, type='SafeRotate'),
dict(p=0.5, type='Flip'),
dict(
p=0.3,
transforms=[
dict(p=1, type='RandomBrightnessContrast'),
dict(p=1, scale=0.4, type='RandomToneCurve'),
],
type='OneOf'),
dict(
n=2,
p=0.3,
transforms=[
dict(blur_limit=(
9,
11,
), p=1.0, type='GaussianBlur'),
dict(p=1.0, type='GridDistortion'),
dict(
clip_limit=4.0,
p=1,
tile_grid_size=(
8,
8,
),
type='CLAHE'),
dict(
alpha=(
0.8,
1.0,
),
blur_limit=(
11,
31,
),
p=1,
threshold=0,
type='UnsharpMask'),
dict(
color_shift=(
0.1,
0.3,
),
intensity=(
0.3,
0.5,
),
p=1.0,
type='ISONoise'),
dict(p=0.3, type='RandomGravel'),
],
type='SomeOf'),
dict(
p=0.1,
transforms=[
dict(
alpha_coef=0.1,
fog_coef_lower=0.2,
fog_coef_upper=0.5,
p=0.5,
type='RandomFog'),
dict(brightness_coefficient=0.8, p=1.0, type='RandomRain'),
dict(
brightness_coeff=1.0,
p=0.5,
snow_point_lower=0.2,
snow_point_upper=0.5,
type='RandomSnow'),
dict(
angle_lower=0.5,
flare_roi=(
0,
0,
1,
0.5,
),
p=0.2,
src_radius=50,
type='RandomSunFlare'),
dict(
num_shadows_lower=1,
num_shadows_upper=1,
p=0.2,
type='RandomShadow'),
dict(
cutout_threshold=(
0.3,
0.6,
),
mean=0.4,
p=0.2,
std=0.3,
type='Spatter'),
],
type='OneOf'),
dict(
p=0.1,
transforms=[
dict(
p=1.0,
quality_lower=30,
quality_upper=70,
type='ImageCompression'),
dict(p=1.0, type='RingingOvershoot'),
],
type='OneOf'),
],
type='Albu'),
dict(type='PackSegInputs'),
]
tta_model = dict(type='SegTTAModel')
tta_pipeline = [
dict(file_client_args=dict(backend='disk'), type='LoadImageFromFile'),
dict(
transforms=[
[
dict(keep_ratio=True, scale_factor=0.5, type='Resize'),
dict(keep_ratio=True, scale_factor=0.75, type='Resize'),
dict(keep_ratio=True, scale_factor=1.0, type='Resize'),
dict(keep_ratio=True, scale_factor=1.25, type='Resize'),
dict(keep_ratio=True, scale_factor=1.5, type='Resize'),
dict(keep_ratio=True, scale_factor=1.75, type='Resize'),
],
[
dict(direction='horizontal', prob=0.0, type='RandomFlip'),
dict(direction='horizontal', prob=1.0, type='RandomFlip'),
],
[
dict(type='LoadAnnotations'),
],
[
dict(type='PackSegInputs'),
],
],
type='TestTimeAug'),
]
val_cfg = dict(type='ValLoop')
val_dataloader = dict(
batch_size=1,
dataset=dict(
data_prefix=dict(
img_path='images/val', seg_map_path='annotations/val'),
data_root='./data/coco/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(keep_ratio=True, scale=(
2048,
1024,
), type='Resize'),
dict(type='LoadAnnotations'),
dict(type='PackSegInputs'),
],
type='ZBr10KDataset'),
num_workers=4,
persistent_workers=True,
sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(
iou_metrics=[
'mIoU',
'mDice',
'mFscore',
], type='IoUMetric')
vis_backends = [
dict(type='LocalVisBackend'),
]
visualizer = dict(
name='visualizer',
type='SegLocalVisualizer',
vis_backends=[
dict(type='LocalVisBackend'),
])
work_dir = './work_dirs/ZBr10KDataset-KNet-albu-loss'

This is my repository version information:
sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 4090
CUDA_HOME: /usr/local/cuda-11.8
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 2.0.1+cu118
PyTorch compiling details: PyTorch built with:

GCC 9.3
C++ Version: 201703
Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.8
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
CuDNN 8.7
Magma 2.6.1
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.15.2+cu118
OpenCV: 4.8.1
MMEngine: 0.10.1
MMSegmentation: 1.2.1+cbf9af1

shiomi326 · 2024-02-07T03:35:31Z

I have the same issue.

hadariru · 2024-04-17T00:53:09Z

Is there any progress on this?
I found that the label index is 255, which is larger than any index defined in class_weight.
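
To illustrate the observation above, here is a minimal sketch of how a label value of 255 can trigger an out-of-bounds lookup when per-class weights are gathered by label. It is not the MMSegmentation implementation: the three-class `class_weight`, the sample `label` tensor, and the assumption that 255 is the ignore label are all hypothetical and chosen only for illustration.

```python
import torch

ignore_index = 255  # assumption: 255 marks pixels that should be ignored
class_weight = torch.tensor([1.0, 2.0, 0.5])  # hypothetical weights for 3 classes
label = torch.tensor([0, 2, 255, 1])  # one pixel carries the ignore label

# Direct lookup fails: 255 is not a valid index into a tensor of size 3.
# On CPU this raises IndexError; on CUDA it shows up as a device-side assert.
try:
    class_weight[label]
    raised = False
except IndexError:
    raised = True

# Possible workaround (a sketch): gather weights only for valid labels and
# leave a weight of zero at the ignored positions.
valid = label != ignore_index
weights = torch.zeros_like(label, dtype=class_weight.dtype)
weights[valid] = class_weight[label[valid]]
```

Masking before indexing keeps the lookup within bounds. Whether ignored pixels should get zero weight or be dropped from the average depends on how the loss is reduced.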

Fix bug in cross entropy loss

1253791

xiexinch changed the base branch from main to dev-1.x December 4, 2023 06:13

xiexinch approved these changes Dec 4, 2023

xiexinch merged commit e51f511 into open-mmlab:dev-1.x Dec 4, 2023

mmeendez8 mentioned this pull request Dec 7, 2023

issue with class weight and cross entropy loss #3412

Closed
