
Check failed: error == cudaSuccess an illegal memory access was encountered #383

Closed
shiyongde opened this issue Apr 19, 2018 · 1 comment

Comments


shiyongde commented Apr 19, 2018

When I train RetinaNet with IMS_PER_BATCH: 2, it only uses 1631 MB of memory.
Logs:
I0419 19:38:00.144428 30279 context_gpu.cu:305] GPU 0: 1631 MB
I0419 19:38:00.144462 30279 context_gpu.cu:309] Total: 1631 MB

But if I set IMS_PER_BATCH: 4, I get this error:
...
I0419 19:29:35.016454 27192 context_gpu.cu:309] Total: 5582 MB
I0419 19:29:35.218364 27190 context_gpu.cu:305] GPU 0: 5712 MB
I0419 19:29:35.218410 27190 context_gpu.cu:309] Total: 5712 MB
E0419 19:30:51.564118 27190 net_dag.cc:188] Exception from operator chain starting at '' (type 'Conv'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:155] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/retnet_bbox_conv_n1_fpn3" input: "gpu_0/retnet_bbox_pred_fpn3_w" input: "gpu_0/retnet_bbox_pred_fpn3_b" output: "gpu_0/retnet_bbox_pred_fpn3" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
E0419 19:30:51.564137 27193 net_dag.cc:188] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:155] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/retnet_cls_conv_n1_fpn5" input: "gpu_0/retnet_cls_pred_fpn3_w" input: "gpu_0/__m0_shared" output: "gpu_0/retnet_cls_pred_fpn3_w_grad" output: "gpu_0/retnet_cls_pred_fpn3_b_grad" output: "gpu_0/__m546_shared" name: "" type: "ConvGradient" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN" is_gradient_op: true
F0419 19:30:51.564193 27190 context_gpu.h:106] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
F0419 19:30:51.564163 27191 context_gpu.h:106] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
F0419 19:30:51.564193 27190 context_gpu.h:106] Check failed: error == cudaSuccess an illegal memory access was encountered
F0419 19:30:51.564221 27193 context_gpu.h:106] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
Aborted (core dumped)

I am using a K40, which has 11439 MiB of memory. Is this a bug?
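For reference, a rough check of the logged allocator totals against the K40's capacity (a minimal sketch; the "expected" value for batch 4 is just my naive linear extrapolation from the batch-2 total, not a measurement):

```python
# Compare the memory totals logged by context_gpu.cu with the K40's capacity.
# Numbers are copied from the logs above.
K40_TOTAL_MIB = 11439

logged = {2: 1631, 4: 5712}     # IMS_PER_BATCH -> MB reported in the logs
expected_at_4 = logged[2] * 2   # naive linear scaling from batch 2

print("expected at batch 4: ~%d MB" % expected_at_4)               # ~3262 MB
print("logged   at batch 4:  %d MB" % logged[4])                    # 5712 MB
print("K40 capacity:         %d MiB" % K40_TOTAL_MIB)               # 11439 MiB
print("headroom remaining:   %d MiB" % (K40_TOTAL_MIB - logged[4])) # ~5727 MiB
```

So even at IMS_PER_BATCH: 4 the logged usage is well below the card's capacity, which is why I don't think this is a plain out-of-memory failure.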

System information

  • Operating system: Ubuntu 14.04.5 LTS

  • CUDA version: 8.0

  • cuDNN version: 7.1

  • python --version output: Python 2.7.6

nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.12 Driver Version: 387.12 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40m Off | 00000000:02:00.0 Off | 0 |
| N/A 34C P0 63W / 235W | 3757MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40m Off | 00000000:03:00.0 Off | 0 |
| N/A 34C P0 62W / 235W | 5044MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K40m Off | 00000000:83:00.0 Off | 0 |
| N/A 34C P0 63W / 235W | 3534MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K40m Off | 00000000:84:00.0 Off | 0 |
| N/A 46C P0 150W / 235W | 1874MiB / 11439MiB | 100% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 1 23337 C python 2514MiB |
| 1 23339 C python 2514MiB |
| 3 28212 C python 1863MiB |
+-----------------------------------------------------------------------------+
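In case it helps narrow this down, here is a minimal sketch of how I can rerun training with synchronous CUDA launches so the error is reported at the kernel that actually failed rather than at a later check. CUDA_LAUNCH_BLOCKING is a standard CUDA environment variable; the script and config paths below are placeholders for my local setup, not exact values:

```python
# Sketch: rerun Detectron training with synchronous CUDA kernel launches
# to get a more precise location for the illegal-memory-access error.
# Paths below are placeholders for my setup.
import os
import subprocess

env = dict(os.environ)
env["CUDA_LAUNCH_BLOCKING"] = "1"   # serialize kernel launches for debugging
env["CUDA_VISIBLE_DEVICES"] = "0"   # reproduce on a single GPU

subprocess.check_call(
    ["python", "tools/train_net.py",
     "--cfg", "configs/retinanet_R-50-FPN_1x.yaml",  # placeholder config path
     "OUTPUT_DIR", "/tmp/detectron-output"],
    env=env,
)
```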

rbgirshick (Contributor) commented

Looks like a duplicate of #32.
