Inference Time Explanation #13
The explanation is correct; the "Y" time is indeed unoptimized CPU code. The fact that it's often so small is why it's left unoptimized :). The main point is that when considering how fast a model is, we can take the timing to be essentially just X, because Y can be made much smaller with some engineering effort (e.g., the Y for Mask R-CNN is mostly time spent upsampling 100 predicted masks, one at a time, not in parallel; this could be replaced with a parallelized GPU implementation and take almost no time at all).
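If it helps make the additivity concrete, here is a minimal, self-contained sketch of how the two numbers compose. It uses plain NumPy stand-ins (the shapes, scale factor, and function names are made up for illustration, not Detectron's actual code): X is the forward pass, Y is the serial per-mask post-processing, and the reported total is X + Y.

```python
import time
import numpy as np

# Everything here is a stand-in, not Detectron code: a dummy "forward pass"
# and a dummy per-mask upsampling step, just to show how X and Y add up.

def forward_pass(image):
    # Stand-in for the optimized GPU inference ("X" in the tables).
    w = np.random.rand(image.shape[-1], 256).astype(np.float32)
    return image @ w

def upsample_masks_serially(masks, scale=8):
    # Stand-in for the unoptimized CPU post-processing ("Y"):
    # each of the ~100 predicted masks is upsampled one at a time.
    out = []
    for m in masks:
        out.append(np.repeat(np.repeat(m, scale, axis=0), scale, axis=1))
    return out

image = np.random.rand(800, 1333).astype(np.float32)   # illustrative image size
masks = np.random.rand(100, 28, 28).astype(np.float32) # illustrative mask batch

t0 = time.time(); _ = forward_pass(image); x = time.time() - t0
t0 = time.time(); _ = upsample_masks_serially(masks); y = time.time() - t0

# Total inference time per image is X + Y; Y could shrink to almost nothing
# with a batched/parallel GPU implementation of the mask upsampling.
print(f"X (model, GPU in practice): {x * 1000:.1f} ms")
print(f"Y (post-processing, CPU):   {y * 1000:.1f} ms")
print(f"X + Y (total):              {(x + y) * 1000:.1f} ms")
```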
So, if I got this right, the total inference time is always X + Y, i.e. some parts of the inference run on the GPU and some on the CPU? From the explanation I thought X is the inference time on the GPU and Y is the inference time on the CPU, i.e. the same algorithm on different hardware. But I guess the "+" expresses exactly that :) Does the inference time also relate to the hardware of , run in parallel?
I see the confusion. Yes, the total time is additive, as in X plus Y. When the
Isn't it the other way around? X is always > Y in the tables.