
Commit 9c4cedc

Keno authored and KristofferC committed
Use rr-safe nopl; rdtsc sequence (#50975)
When Julia is running under `rr`, `rr` needs to patch out `rdtsc` so that it can record the values returned. If patching is not possible, `rr` falls back to an expensive signal-based emulation. As of rr master, a specific `nopl; rdtsc` sequence may be used to guarantee that `rdtsc` patching is always possible. Use this sequence for uses of `rdtsc` in our runtime.

(cherry picked from commit ce3f97c)
1 parent 404750f commit 9c4cedc

1 file changed: +10 -1 lines changed


src/julia_internal.h

@@ -208,8 +208,17 @@ JL_DLLEXPORT void jl_unlock_profile_wr(void) JL_NOTSAFEPOINT JL_NOTSAFEPOINT_LEA
 static inline uint64_t cycleclock(void) JL_NOTSAFEPOINT
 {
 #if defined(_CPU_X86_64_)
+    // This is nopl 0(%rax, %rax, 1), but assemblers are inconsistent about whether
+    // they emit that as a 4 or 5 byte sequence and we need to be guaranteed to use
+    // the 5 byte one.
+#define NOP5_OVERRIDE_NOP ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\t"
     uint64_t low, high;
-    __asm__ volatile("rdtsc" : "=a"(low), "=d"(high));
+    // This instruction sequence is promised by rr to be patchable. rr can usually
+    // also patch `rdtsc` in regular code, but without the preceding nop, there could
+    // be an interfering branch into the middle of rr's patch region. Using this
+    // sequence prevents a massive rr-induced slowdown if the compiler happens to emit
+    // an unlucky pattern. See https://github.com/rr-debugger/rr/pull/3580.
+    __asm__ volatile(NOP5_OVERRIDE_NOP "rdtsc" : "=a"(low), "=d"(high));
     return (high << 32) | low;
 #elif defined(_CPU_X86_)
     int64_t ret;
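
For readers who want to try the pattern outside the Julia tree, the following is a minimal standalone sketch of the same idea, not the code from this commit. It assumes a GCC- or Clang-compatible compiler on x86-64. The name `cycleclock_sketch` and the `timespec_get` fallback for other targets are hypothetical and exist only so the example compiles everywhere; the byte sequence itself is copied from the diff above.

```c
#include <stdint.h>
#include <time.h>

/* Raw bytes of the 5-byte `nopl 0(%rax,%rax,1)` encoding, taken from the diff.
   Spelling it with .byte keeps the assembler from choosing a shorter form. */
#define NOP5_OVERRIDE_NOP ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\t"

/* Hypothetical standalone helper; not part of julia_internal.h. */
static inline uint64_t cycleclock_sketch(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    uint32_t low, high;
    /* The nop and rdtsc live in one asm statement, so the compiler cannot
       place anything (or any branch target) between them. */
    __asm__ volatile(NOP5_OVERRIDE_NOP "rdtsc" : "=a"(low), "=d"(high));
    return ((uint64_t)high << 32) | low;
#else
    /* Assumption for illustration only: on other targets, fall back to a
       C11 wall-clock reading in nanoseconds instead of a cycle counter. */
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}
```

Calling `cycleclock_sketch()` before and after a region of interest and subtracting the two readings gives a rough tick count for that region. Emitting the nop as explicit bytes costs one extra instruction per call and, per the commit message, lets `rr` patch the `rdtsc` rather than fall back to signal-based emulation.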
