Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclusive keystore locking #3907

Merged
merged 9 commits into from
Aug 7, 2022
Merged

Exclusive keystore locking #3907

merged 9 commits into from
Aug 7, 2022

Conversation

cheatfate
Copy link
Contributor

Address #3686 and #3080
This PR requires status-im/nim-stew#116 to be merged.

@github-actions
Copy link

github-actions bot commented Jul 25, 2022

Unit Test Results

       12 files  ±0       860 suites  ±0   1h 11m 4s ⏱️ - 3m 23s
  1 911 tests ±0    1 764 ✔️ ±0  147 💤 ±0  0 ±0 
10 341 runs  ±0  10 151 ✔️ ±0  190 💤 ±0  0 ±0 

Results for commit 73c1ab0. ± Comparison against base commit 52b32c1.

♻️ This comment has been updated with latest results.

@etan-status
Copy link
Contributor

Attempt to workaround macos issue, this is triggered by LTO.

         let cookedKey =
           block:
             let res = publicKey.load()
             if res.isNone():
               # Skip folders which has invalid ValidatorPubKey.
               continue
             res.get()

         yield cookedKey

With LTO, the publicKey.load() block is inlined, but only partially updates res. The final step that loads it into an Option
using some() only initializes the option with zeroes, but then forgets to actually copy the computed result into it. Likewise, cookedKey is not updated with the result, and it happens to end up as a copy of publicKey instead.

Slightly changing the syntax makes LTO no longer inline publicKey.load(), and works around this problem. It is unclear, though, where else the LTO introduces such bugs.

The problem occurs using both -flto=thin and -flto=full and goes away when not using LTO.

Reproducible on macOS 12.5 with Xcode 13.4.1 on Nim compiler 1.6 (5f61f1594d0c11caf0fb650e3d3bc32cf8f38890) on an Intel MacBook Pro (16-inch, 2019) when doing a clean re-build:

rm -rf build nimcache test_keymanager_api && ulimit -n 1024 && \
    NIM_COMMIT=version-1-6 make -j test 

Disabling LTO fixes it (ensure clean re-build)

rm -rf build nimcache test_keymanager_api && ulimit -n 1024 && \
    NIM_COMMIT=version-1-6 make NIMFLAGS="-d:disableLTO" -j test

At this time, I have not identified a minimal program to reproduce this problem.

@etan-status
Copy link
Contributor

In spec/crypto.nim, tagging func load*(v: ValidatorPubKey): Option[CookedPubKey] as {.inline.} makes this also fail with -d:disableLTO, confirming the problem with inlining when not applying a workaround.

Curiously, {.noinline.} does not fully fix this, though. As expected, the load function remains un-inlined and the Option is correctly constructed. However, the get() after the load is still forgotten, so cookedKey is still incorrect, making el incorrect when the iterator is inlined into listLocalValidators.

proc listLocalValidators(validatorsDir,
                         secretsDir: string): seq[ValidatorPubKey] {.
     raises: [Defect].} =
  var validators: seq[ValidatorPubKey]
  try:
    for el in listLoadableKeys(validatorsDir, secretsDir,
                               {KeystoreKind.Local}):
      validators.add el.toPubKey()
  except OSError as err:
    error "Failure to list the validator directories",
          validatorsDir, secretsDir, err = err.msg
  validators

What works, though, is to drop the lent keyword for proc get*[T](self: Option[T]): lent T {.inline.} in Nim/lib/pure/options.nim. Without the lent keyword, even if load is annotated with {.inline.} explicitly, the tests succeed with LTO enabled!

@etan-status
Copy link
Contributor

Regarding lent, not sure if it is just a coincidence, as even putting a single debugEcho into load also works around the bug. Or, as in the original workaround, using a slightly different syntax instead of the block syntax.

@etan-status
Copy link
Contributor

After going through the generated C code thoroughly once more, this issue is actually rooted in the Nim --> C stage, not in C --> ASM! It is surfaced by aggressive optimization levels (UB), and promoted by inlining (which LTO increases).

Reported upstream with minimal example: nim-lang/Nim#20107

@zah zah merged commit 250f7b4 into unstable Aug 7, 2022
@zah zah deleted the keystore-locks branch August 7, 2022 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants