
Check failed: error == cudaSuccess an illegal memory access was encountered #383

Closed
shiyongde opened this issue Apr 19, 2018 · 1 comment

Comments


shiyongde commented Apr 19, 2018

When I train RetinaNet with IMS_PER_BATCH: 2, it only uses 1631 MB of memory.
Logs:
I0419 19:38:00.144428 30279 context_gpu.cu:305] GPU 0: 1631 MB
I0419 19:38:00.144462 30279 context_gpu.cu:309] Total: 1631 MB

But if I set IMS_PER_BATCH: 4, I get this error:
...
I0419 19:29:35.016454 27192 context_gpu.cu:309] Total: 5582 MB
I0419 19:29:35.218364 27190 context_gpu.cu:305] GPU 0: 5712 MB
I0419 19:29:35.218410 27190 context_gpu.cu:309] Total: 5712 MB
E0419 19:30:51.564118 27190 net_dag.cc:188] Exception from operator chain starting at '' (type 'Conv'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:155] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/retnet_bbox_conv_n1_fpn3" input: "gpu_0/retnet_bbox_pred_fpn3_w" input: "gpu_0/retnet_bbox_pred_fpn3_b" output: "gpu_0/retnet_bbox_pred_fpn3" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
E0419 19:30:51.564137 27193 net_dag.cc:188] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:155] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/retnet_cls_conv_n1_fpn5" input: "gpu_0/retnet_cls_pred_fpn3_w" input: "gpu_0/__m0_shared" output: "gpu_0/retnet_cls_pred_fpn3_w_grad" output: "gpu_0/retnet_cls_pred_fpn3_b_grad" output: "gpu_0/__m546_shared" name: "" type: "ConvGradient" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN" is_gradient_op: true
F0419 19:30:51.564193 27190 context_gpu.h:106] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
F0419 19:30:51.564163 27191 context_gpu.h:106] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
F0419 19:30:51.564193 27190 context_gpu.h:106] Check failed: error == cudaSuccess an illegal memory access was encountered
F0419 19:30:51.564221 27193 context_gpu.h:106] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
Aborted (core dumped)

I am using a K40, which has 11439 MiB of memory. Is this a bug?
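For reference, a rough check of the logged allocator totals against the K40's capacity (a minimal sketch; the "expected" value for batch 4 is just my naive linear extrapolation from the batch-2 total, not a measurement):

```python
# Compare the memory totals logged by context_gpu.cu with the K40's capacity.
# Numbers are copied from the logs above.
K40_TOTAL_MIB = 11439

logged = {2: 1631, 4: 5712}     # IMS_PER_BATCH -> MB reported in the logs
expected_at_4 = logged[2] * 2   # naive linear scaling from batch 2

print("expected at batch 4: ~%d MB" % expected_at_4)               # ~3262 MB
print("logged   at batch 4:  %d MB" % logged[4])                    # 5712 MB
print("K40 capacity:         %d MiB" % K40_TOTAL_MIB)               # 11439 MiB
print("headroom remaining:   %d MiB" % (K40_TOTAL_MIB - logged[4])) # ~5727 MiB
```

So even at IMS_PER_BATCH: 4 the logged usage is well below the card's capacity, which is why I don't think this is a plain out-of-memory failure.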

System information

  • Operating system: Ubuntu 14.04.5 LTS

  • CUDA version: 8.0

  • cuDNN version: 7.1

  • python --version output: Python 2.7.6

nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.12 Driver Version: 387.12 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40m Off | 00000000:02:00.0 Off | 0 |
| N/A 34C P0 63W / 235W | 3757MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40m Off | 00000000:03:00.0 Off | 0 |
| N/A 34C P0 62W / 235W | 5044MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K40m Off | 00000000:83:00.0 Off | 0 |
| N/A 34C P0 63W / 235W | 3534MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K40m Off | 00000000:84:00.0 Off | 0 |
| N/A 46C P0 150W / 235W | 1874MiB / 11439MiB | 100% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 1 23337 C python 2514MiB |
| 1 23339 C python 2514MiB |
| 3 28212 C python 1863MiB |
+-----------------------------------------------------------------------------+
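In case it helps narrow this down, here is a minimal sketch of how I can rerun training with synchronous CUDA launches so the error is reported at the kernel that actually failed rather than at a later check. CUDA_LAUNCH_BLOCKING is a standard CUDA environment variable; the script and config paths below are placeholders for my local setup, not exact values:

```python
# Sketch: rerun Detectron training with synchronous CUDA kernel launches
# to get a more precise location for the illegal-memory-access error.
# Paths below are placeholders for my setup.
import os
import subprocess

env = dict(os.environ)
env["CUDA_LAUNCH_BLOCKING"] = "1"   # serialize kernel launches for debugging
env["CUDA_VISIBLE_DEVICES"] = "0"   # reproduce on a single GPU

subprocess.check_call(
    ["python", "tools/train_net.py",
     "--cfg", "configs/retinanet_R-50-FPN_1x.yaml",  # placeholder config path
     "OUTPUT_DIR", "/tmp/detectron-output"],
    env=env,
)
```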

rbgirshick (Contributor) commented

Looks like a duplicate of #32.
