All nan values : runtime.step=1900, loss.cls_loss=nan, loss.cls_loss_rt=nan, loss.loc_loss=nan, loss.loc_loss_rt=nan, loss.loc_elem=[nan, nan, nan, nan, nan, nan, nan] #144
Comments
Could you please change your title? It looks terrible. Does the 'nan' appear in the first log line, or only after some steps? Please don't use a relative model dir path. I will add code to check this in the next update.
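A check along those lines might look like the sketch below. This is only an illustration, not the code that was later added to the repository; the function name `check_model_dir` is hypothetical.

```python
from pathlib import Path


def check_model_dir(model_dir: str) -> Path:
    """Return model_dir as a Path, rejecting relative paths.

    Hypothetical helper: the name and behaviour are assumptions,
    not part of the project's train.py.
    """
    path = Path(model_dir).expanduser()
    if not path.is_absolute():
        # Suggest the absolute form relative to the current working directory.
        raise ValueError(
            f"model_dir must be an absolute path, got {model_dir!r}; "
            f"did you mean {path.resolve()}?"
        )
    return path
```

With the command from this issue, `check_model_dir("../model_pytorch")` would raise a `ValueError`, while an absolute path is returned unchanged.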
The checkpoint voxelnet-7750.tckpt has already been dumped, but my TensorBoard results are all empty and the loss shows no exponential decay. I'm using 2 RTX 2080 Ti GPUs. Kindly help.
Kindly help.
Could you provide the log.txt from the model dir?
I tried my trained model with simple-inference.ipynb for validation. It seems I'm not getting any bounding boxes with car.lite.config. Kindly suggest how to proceed.
You need to use my pretrained model with simple-inference to debug...
I am able to detect car bounding boxes with your trained model for all 3 types, but I'm not getting any results with my own model. Kindly help. I followed the same procedure as you listed in the doc.
Can you train on the KITTI dataset correctly? If you are using custom data, did you use the web visualization tool to check the bounding boxes?
The problem was with the GPU. I reformatted the system and it is working now. Thanks for your inputs.
@chowkamlee81 Hi, I have run into the same problem as you. Would you please give me some details about how this problem happened and how you solved it?
I also got the NaN issues when training with all.pp.lowa.config.
After I implemented the above fixes and workarounds, I did not see the NaN issues again. Hope those also help your case.
Hello, how did you generate those graphs? Could you please give me some details or hints on generating them?
Hello @Sreeni1204, you have to install tensorboard and tensorflow.
When executing
python ./pytorch/train.py train --config_path=/home/ubuntu/LIDAR/Traveller59/second/configs/pointpillars/car/xyres_16.config --model_dir=../model_pytorch
I am getting the NaN values below. Kindly help.
runtime.step=1900, runtime.steptime=0.1566, loss.cls_loss=nan, loss.cls_loss_rt=nan, loss.loc_loss=nan, loss.loc_loss_rt=nan, loss.loc_elem=[nan, nan, nan, nan, nan, nan, nan], loss.cls_pos_rt=nan, loss.cls_neg_rt=nan, loss.dir_rt=nan, rpn_acc=0.9963, pr.prec@10=0.0, pr.rec@10=0.0, pr.prec@30=0.0, pr.rec@30=0.0, pr.prec@50=0.0, pr.rec@50=0.0, pr.prec@70=0.0, pr.rec@70=0.0, pr.prec@80=0.0, pr.rec@80=0.0, pr.prec@90=0.0, pr.rec@90=0.0, pr.prec@95=0.0, pr.rec@95=0.0, misc.num_vox=10896, misc.num_pos=92, misc.num_neg=23658, misc.num_anchors=23883, misc.lr=0.0003174
runtime.step=1950, runtime.steptime=0.1747, loss.cls_loss=nan, loss.cls_loss_rt=nan, loss.loc_loss=nan, loss.loc_loss_rt=nan, loss.loc_elem=[nan, nan, nan, nan, nan, nan, nan], loss.cls_pos_rt=nan, loss.cls_neg_rt=nan, loss.dir_rt=nan, rpn_acc=0.9962, pr.prec@10=0.0, pr.rec@10=0.0, pr.prec@30=0.0, pr.rec@30=0.0, pr.prec@50=0.0, pr.rec@50=0.0, pr.prec@70=0.0, pr.rec@70=0.0, pr.prec@80=0.0, pr.rec@80=0.0, pr.prec@90=0.0, pr.rec@90=0.0, pr.prec@95=0.0, pr.rec@95=0.0, misc.num_vox=7509, misc.num_pos=96, misc.num_neg=11006, misc.num_anchors=11237, misc.lr=0.0003183
runtime.step=2000, runtime.steptime=0.1833, loss.cls_loss=nan, loss.cls_loss_rt=nan, loss.loc_loss=nan, loss.loc_loss_rt=nan, loss.loc_elem=[nan, nan, nan, nan, nan, nan, nan], loss.cls_pos_rt=nan, loss.cls_neg_rt=nan, loss.dir_rt=nan, rpn_acc=0.9962, pr.prec@10=0.0, pr.rec@10=0.0, pr.prec@30=0.0, pr.rec@30=0.0, pr.prec@50=0.0, pr.rec@50=0.0, pr.prec@70=0.0, pr.rec@70=0.0, pr.prec@80=0.0, pr.rec@80=0.0, pr.prec@90=0.0, pr.rec@90=0.0, pr.prec@95=0.0, pr.rec@95=0.0, misc.num_vox=13329, misc.num_pos=114, misc.num_neg=21720, misc.num_anchors=22004, misc.lr=0.0003193
runtime.step=2050, runtime.steptime=0.1915, loss.cls_loss=nan, loss.cls_loss_rt=nan, loss.loc_loss=nan, loss.loc_loss_rt=nan, loss.loc_elem=[nan, nan, nan, nan, nan, nan, nan], loss.cls_pos_rt=nan, loss.cls_neg_rt=nan, loss.dir_rt=nan, rpn_acc=0.9963, pr.prec@10=0.0, pr.rec@10=0.0, pr.prec@30=0.0, pr.rec@30=0.0, pr.prec@50=0.0, pr.rec@50=0.0, pr.prec@70=0.0, pr.rec@70=0.0, pr.prec@80=0.0, pr.rec@80=0.0, pr.prec@90=0.0, pr.rec@90=0.0, pr.prec@95=0.0, pr.rec@95=0.0, misc.num_vox=12526, misc.num_pos=106, misc.num_neg=19512, misc.num_anchors=19785, misc.lr=0.0003202
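To answer whether the NaN appears in the first log line or only after some steps, the log can be scanned for the first step at which a loss becomes NaN. The following is a minimal sketch, assuming log.txt uses the same comma-separated `key=value` layout as the lines above; the helper names `parse_log_line` and `first_nan_step` are hypothetical and not part of the project.

```python
import math
import re
from typing import Iterable, Optional

# Matches scalar "key=value" pairs; list-valued fields such as
# loss.loc_elem=[...] are skipped because the value may not start with "[".
_PAIR = re.compile(r"([\w.@]+)=([^,\[\]\s]+)")


def parse_log_line(line: str) -> dict:
    """Parse the scalar key=value fields of one training log line."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        try:
            fields[key] = float(raw)  # float("nan") parses to NaN
        except ValueError:
            fields[key] = raw  # keep non-numeric values as strings
    return fields


def first_nan_step(
    lines: Iterable[str],
    keys=("loss.cls_loss", "loss.loc_loss"),
) -> Optional[int]:
    """Return the first runtime.step whose listed losses contain NaN, else None."""
    for line in lines:
        fields = parse_log_line(line)
        step = fields.get("runtime.step")
        if not isinstance(step, float):
            continue  # not a metrics line
        for key in keys:
            value = fields.get(key)
            if isinstance(value, float) and math.isnan(value):
                return int(step)
    return None
```

Calling `first_nan_step(open(path_to_log))` on the full log would then report the earliest affected step, which helps narrow down whether the run diverged from the start or partway through training.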