When I train RetinaNet with IMS_PER_BATCH: 2, it uses only 1631 MB of GPU memory.
The logs show:
I0419 19:38:00.144428 30279 context_gpu.cu:305] GPU 0: 1631 MB
I0419 19:38:00.144462 30279 context_gpu.cu:309] Total: 1631 MB
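For context, this is roughly how I change the batch size. The snippet below is only a minimal sketch: it assumes Detectron's config helpers in detectron.core.config (merge_cfg_from_file, merge_cfg_from_list, assert_and_infer_cfg) and uses a placeholder path for the RetinaNet YAML config, so adjust it to the actual file.

```python
# Minimal sketch of how the per-GPU batch size is set (config path is a placeholder).
from detectron.core.config import (
    cfg, merge_cfg_from_file, merge_cfg_from_list, assert_and_infer_cfg
)

# Load a RetinaNet baseline config (hypothetical path, replace with the real one).
merge_cfg_from_file('configs/retinanet_R-50-FPN_1x.yaml')

# Override the per-GPU image batch size; 2 trains fine, 4 crashes as shown below.
merge_cfg_from_list(['TRAIN.IMS_PER_BATCH', '4'])
assert_and_infer_cfg()

print(cfg.TRAIN.IMS_PER_BATCH)
```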
But if I set IMS_PER_BATCH: 4, I get this error:
...
I0419 19:29:35.016454 27192 context_gpu.cu:309] Total: 5582 MB
I0419 19:29:35.218364 27190 context_gpu.cu:305] GPU 0: 5712 MB
I0419 19:29:35.218410 27190 context_gpu.cu:309] Total: 5712 MB
E0419 19:30:51.564118 27190 net_dag.cc:188] Exception from operator chain starting at '' (type 'Conv'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:155] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/retnet_bbox_conv_n1_fpn3" input: "gpu_0/retnet_bbox_pred_fpn3_w" input: "gpu_0/retnet_bbox_pred_fpn3_b" output: "gpu_0/retnet_bbox_pred_fpn3" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
E0419 19:30:51.564137 27193 net_dag.cc:188] Secondary exception from operator chain starting at '' (type 'ConvGradient'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:155] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/retnet_cls_conv_n1_fpn5" input: "gpu_0/retnet_cls_pred_fpn3_w" input: "gpu_0/__m0_shared" output: "gpu_0/retnet_cls_pred_fpn3_w_grad" output: "gpu_0/retnet_cls_pred_fpn3_b_grad" output: "gpu_0/__m546_shared" name: "" type: "ConvGradient" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN" is_gradient_op: true
F0419 19:30:51.564193 27190 context_gpu.h:106] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
F0419 19:30:51.564163 27191 context_gpu.h:106] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
F0419 19:30:51.564193 27190 context_gpu.h:106] Check failed: error == cudaSuccess an illegal memory access was encountered
F0419 19:30:51.564221 27193 context_gpu.h:106] Check failed: error == cudaSuccess an illegal memory access was encountered
*** Check failure stack trace: ***
Aborted (core dumped)
I am using K40 GPUs, each with 11439 MiB of memory. Is this a bug?
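To check how much memory the training run actually peaks at (and whether other processes on the card are a factor), I can run a small monitor alongside training. The sketch below only shells out to nvidia-smi; the polling loop and per-GPU logging are my own additions for illustration, not part of Detectron.

```python
import subprocess
import time


def gpu_memory_snapshot():
    """Return a list of (used_mib, total_mib) tuples, one per GPU, via nvidia-smi."""
    out = subprocess.check_output([
        'nvidia-smi',
        '--query-gpu=memory.used,memory.total',
        '--format=csv,noheader,nounits',
    ])
    rows = out.decode('utf-8').strip().splitlines()
    return [tuple(int(v) for v in row.split(',')) for row in rows]


if __name__ == '__main__':
    # Poll every 5 seconds while training runs in another process.
    while True:
        for gpu_id, (used, total) in enumerate(gpu_memory_snapshot()):
            print('GPU %d: %d / %d MiB' % (gpu_id, used, total))
        time.sleep(5)
```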
System information
Operating system: Ubuntu 14.04.5 LTS
CUDA version: 8.0
cuDNN version: 7.1
python --version output: Python 2.7.6
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.12 Driver Version: 387.12 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40m Off | 00000000:02:00.0 Off | 0 |
| N/A 34C P0 63W / 235W | 3757MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40m Off | 00000000:03:00.0 Off | 0 |
| N/A 34C P0 62W / 235W | 5044MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K40m Off | 00000000:83:00.0 Off | 0 |
| N/A 34C P0 63W / 235W | 3534MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K40m Off | 00000000:84:00.0 Off | 0 |
| N/A 46C P0 150W / 235W | 1874MiB / 11439MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 1 23337 C python 2514MiB |
| 1 23339 C python 2514MiB |
| 3 28212 C python 1863MiB |
+-----------------------------------------------------------------------------+