Sync Guardrails Within Benchmarking Loop #6182
Unanswered
DhruvSrikanth asked this question in Q&A
Replies: 0 comments
In Triton's `do_bench()` (found here), do we need a cross-device synchronize before each `fn()` call in the speed trial loop? Specifically here?

I found something like this to report a far lower time than the computation actually takes. One way around this is to add a synchronize step to `fn()` itself instead of directly passing `forward(**get_inputs())`. However, this seems like something that should be covered within the benchmarking function for robustness.

Curious to get the community's take on something like this.
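For reference, here is a minimal sketch of the workaround described above: wrapping the callable so that a synchronize runs after every call. The names `with_sync` and `_default_synchronize` are hypothetical helpers of mine, not part of Triton, and the default assumes a PyTorch CUDA setup where `torch.cuda.synchronize()` is the right barrier.

```python
from typing import Any, Callable, Optional


def _default_synchronize() -> None:
    # Assumption: PyTorch with CUDA is the backend in use.
    # Imported lazily so the wrapper can be defined without PyTorch installed.
    import torch

    if torch.cuda.is_available():
        torch.cuda.synchronize()


def with_sync(
    fn: Callable[..., Any],
    synchronize: Optional[Callable[[], None]] = None,
) -> Callable[..., Any]:
    """Return a wrapper (hypothetical helper) that synchronizes after each call to fn."""
    sync = synchronize if synchronize is not None else _default_synchronize

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        out = fn(*args, **kwargs)
        sync()  # block the host until queued device work has finished
        return out

    return wrapped
```

With this, I would pass something like `with_sync(lambda: forward(**get_inputs()))` to `do_bench()` in place of the bare callable. The extra synchronize stalls the host on every iteration, so this is a workaround on the caller's side and not a substitute for handling it inside the benchmarking loop.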