[🐛BUG] A question about the DCN model. #1294

fightinghzx · 2022-05-24T12:58:49Z

Following the RecBole article on Zhihu (https://zhuanlan.zhihu.com/p/273251140), I wrote a run.py:
from recbole.quick_start import run_recbole

run_recbole(dataset='ml-100k', model='DCN')
Running it raises the following error:
Traceback (most recent call last):
File "/home/hzx/PycharmProjects/fedrec/examples/run_recbole.py", line 15, in
run_recbole(model=args.model, dataset=args.dataset, config_file_list=config_file_list)
File "/home/hzx/anaconda3/envs/pytorch/lib/python3.9/site-packages/recbole/quick_start/quick_start.py", line 56, in run_recbole
best_valid_score, best_valid_result = trainer.fit(
File "/home/hzx/anaconda3/envs/pytorch/lib/python3.9/site-packages/recbole/trainer/trainer.py", line 335, in fit
train_loss = self._train_epoch(train_data, epoch_idx, show_progress=show_progress)
File "/home/hzx/anaconda3/envs/pytorch/lib/python3.9/site-packages/recbole/trainer/trainer.py", line 181, in _train_epoch
losses = loss_func(interaction)
File "/home/hzx/anaconda3/envs/pytorch/lib/python3.9/site-packages/recbole/model/context_aware_recommender/dcn.py", line 119, in calculate_loss
return self.loss(output, label) + l2_loss
File "/home/hzx/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hzx/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/loss.py", line 612, in forward
return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
File "/home/hzx/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/nn/functional.py", line 3065, in binary_cross_entropy
return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
RuntimeError: all elements of input should be between 0 and 1

Process finished with exit code 1
All the other models ran successfully; only the DCN model fails.


2017pxy · 2022-05-25T11:52:08Z

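@fightinghzx Hello, this happens because the timestamp column was not normalized. When the DCN model has many layers, the unnormalized values cause the gradients to explode, which ultimately produces this error.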
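To illustrate how exploding gradients can end in this particular message, here is a minimal pure-Python sketch (not RecBole or PyTorch code). It assumes that the explosion eventually turns a logit into NaN; a NaN passed through a sigmoid stays NaN, and NaN fails any "between 0 and 1" range check of the kind `binary_cross_entropy` performs. The helpers `sigmoid` and `in_unit_interval` are hypothetical names used only for this illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (illustrative helper)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def in_unit_interval(p: float) -> bool:
    """Mimics a 'all elements between 0 and 1' check for one value."""
    return 0.0 <= p <= 1.0


# A very large but finite logit still maps into [0, 1].
large_ok = in_unit_interval(sigmoid(1e6))

# Assumption: overflowed weights produce inf - inf style arithmetic, i.e. NaN.
nan_logit = float("inf") - float("inf")
nan_ok = in_unit_interval(sigmoid(nan_logit))

print(large_ok, nan_ok)
```

Under this assumption, a sigmoid output is never out of range because it is too large; the check fails only because the value is no longer a number at all.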

Thanks for your feedback. We have fixed this issue in #1295.

FIX: bug fix for normalize_all in ml-100k.yaml. (fix for #1294)
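As a rough illustration of what normalizing a float column such as `timestamp` achieves, the sketch below applies min-max scaling in plain Python. `min_max_normalize` is a hypothetical helper, not RecBole's implementation (which may differ in detail), and the sample timestamps are made-up values chosen only to resemble Unix timestamps.

```python
def min_max_normalize(values):
    """Scale numbers linearly into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    span = hi - lo
    return [(v - lo) / span for v in values]


# Illustrative values only: raw magnitudes near 1e9 versus scaled values in [0, 1].
timestamps = [874724710, 881250949, 893286638]
scaled = min_max_normalize(timestamps)

print(scaled)
```

On a RecBole version that predates the fix, a possible workaround is to pass `config_dict={'normalize_all': True}` to `run_recbole`, assuming the installed version honors that option for this dataset.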

fightinghzx added the bug label May 24, 2022

Wicknight self-assigned this May 25, 2022

chenyushuo added a commit to chenyushuo/RecBole that referenced this issue May 25, 2022

FIX: bug fix for normalize_all in ml-100k.yaml. (fix for RUCAIBox#1294) 274e416

2017pxy added a commit that referenced this issue May 25, 2022

Merge pull request #1295 from chenyushuo/master

925d707

FIX: bug fix for normalize_all in ml-100k.yaml. (fix for #1294)

2017pxy closed this as completed May 29, 2022
