Inference Time Explanation #13
The explanation is correct; the "Y" time is indeed unoptimized CPU code. The fact that it's often so small is why it's left unoptimized :). The main point is that when considering how fast a model is, we can take the timing to be essentially just X, because Y can be made much smaller with some engineering effort (e.g., the Y for Mask R-CNN is mostly time spent upsampling 100 predicted masks, one at a time, not in parallel; this could be replaced with a parallelized GPU implementation and take almost no time at all).
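If it helps make the additivity concrete, here is a minimal, self-contained sketch of how the two numbers compose. It uses plain NumPy stand-ins (the shapes, scale factor, and function names are made up for illustration, not Detectron's actual code): X is the forward pass, Y is the serial per-mask post-processing, and the reported total is X + Y.

```python
import time
import numpy as np

# Everything here is a stand-in, not Detectron code: a dummy "forward pass"
# and a dummy per-mask upsampling step, just to show how X and Y add up.

def forward_pass(image):
    # Stand-in for the optimized GPU inference ("X" in the tables).
    w = np.random.rand(image.shape[-1], 256).astype(np.float32)
    return image @ w

def upsample_masks_serially(masks, scale=8):
    # Stand-in for the unoptimized CPU post-processing ("Y"):
    # each of the ~100 predicted masks is upsampled one at a time.
    out = []
    for m in masks:
        out.append(np.repeat(np.repeat(m, scale, axis=0), scale, axis=1))
    return out

image = np.random.rand(800, 1333).astype(np.float32)   # illustrative image size
masks = np.random.rand(100, 28, 28).astype(np.float32) # illustrative mask batch

t0 = time.time(); _ = forward_pass(image); x = time.time() - t0
t0 = time.time(); _ = upsample_masks_serially(masks); y = time.time() - t0

# Total inference time per image is X + Y; Y could shrink to almost nothing
# with a batched/parallel GPU implementation of the mask upsampling.
print(f"X (model, GPU in practice): {x * 1000:.1f} ms")
print(f"Y (post-processing, CPU):   {y * 1000:.1f} ms")
print(f"X + Y (total):              {(x + y) * 1000:.1f} ms")
```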
So, if I got this right, the total inference time is always X + Y, i.e. some parts of the inference run on the GPU and some on the CPU? From the explanation I thought X is the inference time on the GPU and Y is the inference time on the CPU, i.e. the same algorithm on different hardware. But I guess the "+" expresses exactly that :) Does the inference time also relate to the hardware of , run in parallel?
I see the confusion. Yes, the total time is additive, as in X plus Y. When the
Isn't it the other way around? X is always > Y in the tables.