This repository was archived by the owner on Nov 21, 2023. It is now read-only.
I get negative or NaN losses while training keypoints, as shown below, and cannot figure out the reason. The COCO data was downloaded from the official website. I tried e2e_keypoint_rcnn_R-50-FPN_1x.yaml (among other configs) and got the same errors. Printing out the losses:
print(np.array([self.losses_and_metrics[k] for k in self.model.losses]))
[ nan nan nan 1.87069760e+26
3.42327067e+24 nan nan nan
nan 1.00860528e+28 nan 2.96251020e+27
nan]
total: nan
CRITICAL train_net.py: 239: Loss is NaN, exiting...
INFO loader.py: 126: Stopping enqueue thread
INFO loader.py: 113: Stopping mini-batch loading thread
INFO loader.py: 113: Stopping mini-batch loading thread
INFO loader.py: 113: Stopping mini-batch loading thread
INFO loader.py: 113: Stopping mini-batch loading thread
rbgirshick changed the title from "keypoint training error" to "keypoint training nan (#resolved: you need to apply the linear scaling rule when adjusting the number of GPUs during training)" on Jan 24, 2018.
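For reference, the linear scaling rule means that if you train on fewer GPUs than the config was written for, you shrink the learning rate and stretch the iteration schedule by the same factor. Below is a minimal sketch of that arithmetic; the 8-GPU baseline numbers (BASE_LR 0.02, MAX_ITER 90000, steps at 60000/80000) are assumptions taken from the stock 1x schedule, and `scale_schedule` is a hypothetical helper, not part of Detectron's API — always check the values in your own yaml.

```python
# Illustrative sketch of the linear scaling rule for Detectron-style schedules:
# if the number of GPUs (and hence the effective minibatch size) is divided by k,
# divide BASE_LR by k and multiply MAX_ITER / STEPS by k.
# Baseline values below are assumed from e2e_keypoint_rcnn_R-50-FPN_1x.yaml.

def scale_schedule(num_gpus,
                   base_gpus=8,
                   base_lr=0.02,
                   base_max_iter=90000,
                   base_steps=(0, 60000, 80000)):
    """Return (BASE_LR, MAX_ITER, STEPS) rescaled for `num_gpus`."""
    k = base_gpus / num_gpus
    lr = base_lr / k                       # smaller minibatch -> smaller learning rate
    max_iter = int(base_max_iter * k)      # keep the amount of data seen constant
    steps = tuple(int(s * k) for s in base_steps)
    return lr, max_iter, steps

# Example: training on 1 GPU instead of 8
lr, max_iter, steps = scale_schedule(num_gpus=1)
print(lr, max_iter, steps)  # 0.0025 720000 (0, 480000, 640000)
```

Keeping the original BASE_LR while dropping from 8 GPUs to 1 or 2 effectively trains with an 8x (or 4x) too-large learning rate for the smaller minibatch, which is a common way to end up with exploding losses and NaNs like the ones shown above.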