Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(symcache): Explicitly map "holes" between functions #897

Merged
merged 7 commits into from
Feb 20, 2025

Conversation

loewenheim
Copy link
Contributor

Currently, if we look up an address that falls "between" two functions/symbols, we will always resolve it to the first of the two functions/symbols because each range implicitly extends to the start of the next one. This can lead to bad user experience when we map (invalid) addresses to entirely nonsensical functions.

Therefore, we now explicitly insert a mapping at the end of the function that maps to "nowhere" to denote the function's end.

Currently, if we look up an address that falls "between"
two functions/symbols, we will always resolve it to the
first of the two functions/symbols because each range
implicitly extends to the start of the next one. This can
lead to bad user experience when we map (invalid) addresses
to entirely nonsensical functions.

Therefore, we now explicitly insert a mapping at the end
of the function that maps to "nowhere" to denote the
function's end.
@loewenheim loewenheim self-assigned this Feb 10, 2025
@loewenheim loewenheim requested a review from Swatinem February 10, 2025 17:50
@loewenheim loewenheim added bug and removed bug labels Feb 10, 2025
@loewenheim loewenheim requested a review from a team February 11, 2025 09:17
Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test case would be nice!

if let btree_map::Entry::Vacant(vacant_entry) = self.ranges.entry(end_address) {
vacant_entry.insert(NO_SOURCE_LOCATION);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why do we need to add this logic in two different places?

Copy link
Contributor Author

@loewenheim loewenheim Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We insert both "symbols" and "functions" into the symcache. A symbol is extracted from the symbol table in the debug file and has no line records, just a name and an address (and maybe a size). So functions and symbols take two separate code paths.

@@ -10,6 +10,7 @@ use symbolic_debuginfo::{DebugSession, FileFormat, Function, ObjectLike, Symbol}
use watto::{Pod, StringTable, Writer};

use super::{raw, transform};
use crate::raw::NO_SOURCE_LOCATION;
Copy link
Member

@philipphofmann philipphofmann Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will the user see in Sentry with this approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They would see an unsymbolicated frame with a "missing_symbol" status, like this (in JSON):

{
  "status": "missing_symbol",
  "original_index": 2,
  "instruction_addr": "0x100821ae0",
  "package": "",
  "function": "<redacted>",
  "in_app": true
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's sufficient for the use case that lead to this PR in Slack. Thanks again @loewenheim 💯

@loewenheim loewenheim requested a review from jjbayer February 11, 2025 14:04
Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@Swatinem Swatinem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code and logic looks reasonable. you folks mentioned a slack conversation? can you distill that down to an explanation of what exactly is happening here?
why is a stack frame pointing at the end of a function?
maybe because its a diverging function call, and the "return address" being pushed to the stack is never expected to be actually returned to at runtime, so its pointing past the callers call instruction to some nop instructions?

#[test]
fn test_lookup_between_functions() {
let buffer = ByteView::open(fixture(
"macos/3CD3E3CC-281E-3AF3-84EB-CDE6D56F0559.dSYM/Contents/Resources/DWARF/CrashLibiOS",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you only commit the actual dwarf payload and remove all the other dSYM junk thats not useful?

also, that file is a ~1.5M. is it possible to slim that down somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to just reuse the Python fixture instead of committing a copy. Didn't pay attention to the file size before.

@loewenheim
Copy link
Contributor Author

the code and logic looks reasonable. you folks mentioned a slack conversation? can you distill that down to an explanation of what exactly is happening here?

The essence is that we have a user case where we are getting a frame with an address that falls "between" functions. Under the current logic, we map it to the last function before it and that creates an entirely nonsensical frame (the function we're finding has nothing to do with the error that happened). The user thinks that this is caused by stack corruption.

@Swatinem
Copy link
Member

Well yes, so much is obvious. I was rather wondering what legitimate reason there might be that a return addr on the stack is pointing past the end of the function?
I mentioned my hunch that the call instruction never assumes to be returned to. This should be verifyable by looking at a disassembler to see that the last instruction of the function is the call, which implicitly pushes the addr after that instruction on the stack.

@philipphofmann
Copy link
Member

you folks mentioned a slack conversation?

@Swatinem, here is a link to the Slack thread if you want more details.

@loewenheim loewenheim merged commit 67b533b into master Feb 20, 2025
14 checks passed
@loewenheim loewenheim deleted the fix/symcache-holes branch February 20, 2025 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants