Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More split dwarf work #529

Merged
merged 10 commits into from
Jul 14, 2020
Merged

More split dwarf work #529

merged 10 commits into from
Jul 14, 2020

Conversation

khuey
Copy link
Contributor

@khuey khuey commented Jul 12, 2020

The pre-DWARF5 GNU split dwarf extension differs from the standardized version in at least two ways:

  1. The various new list sections do not have the headers that they have in the standardized version.
  2. The .debug_loclists section can be present in a .dwo as .debug_loc.dwo, with a slightly different encoding format.

Additionally, the default values for the various base attributes are wrong in DWARF 5, where there is the section header (that is not present in the GNU split dwarf extension) that must be skipped.

To handle all of this, I fix the default base handling by keying off the DWARF version, then I refactor LocationLists and RangeLists because only one of the two sections is present at any given time. Which encoding to use for these lists is no longer keyed off the DWARF version but rather the section in use. That allows the DWARF version to be used to distinguish the pre-DWARF5 GNU split dwarf encoding for location lists.

@philipc
Copy link
Collaborator

philipc commented Jul 13, 2020

Additionally, the default values for the various base attributes are wrong in DWARF 5, where there is the section header (that is not present in the GNU split dwarf extension) that must be skipped.

Why does this matter? Shouldn't those values be set by the attributes? From memory, the reason we need defaults for the GNU extensions is because they don't use those attributes. Edit: and trying to set defaults doesn't accomplish anything because they will only be correct for the first unit.

Since these sections are mutually exclusive, we can use an enum to represent them.

They aren't mutually exlusive. An executable can contain both DWARF 4 and DWARF 5 debuginfo since each object file can be compiled with different settings. Edit: doesn't allowing multiple dwo section names fix the problem this was trying to solve?

@khuey
Copy link
Contributor Author

khuey commented Jul 13, 2020

Additionally, the default values for the various base attributes are wrong in DWARF 5, where there is the section header (that is not present in the GNU split dwarf extension) that must be skipped.

Why does this matter? Shouldn't those values be set by the attributes? From memory, the reason we need defaults for the GNU extensions is because they don't use those attributes.

The attributes are not always present. e.g. clang does not emit the attributes if the values are the defaults.

Since these sections are mutually exclusive, we can use an enum to represent them.

They aren't mutually exlusive. An executable can contain both DWARF 4 and DWARF 5 debuginfo since each object file can be compiled with different settings.

Ugh, that's obnoxious. Is that really something we care to support?

The alternative here is probably to add a new CompatGnuDebugLocDwo type and include it in LocationLists.

@philipc
Copy link
Collaborator

philipc commented Jul 13, 2020

The attributes are not always present. e.g. clang does not emit the attributes if the values are the defaults.

From what I can tell, that's only for split DWARF, and it works because these values should be inherited from the skeleton unit. So the correct fix is to inherit those values, instead of guessing where it will be. I can be convinced otherwise if you can show me code in a consumer (e.g. gdb or lldb) that calculates these values instead.

Is that really something we care to support?

Yes. It probably occurs more due to objects being produced by different tools rather than different settings. There's less of a case for it in split DWARF, but we definitely need to support it otherwise. In split DWARF, I think it would be fine to treat the .debug_loc.dwo section the same as .debug_loclists.dwo (so the version check shouldn't be done for DWO files). I haven't investigated this in depth though.

@philipc
Copy link
Collaborator

philipc commented Jul 13, 2020

So the bug is that RawLocListIter shouldn't be using encoding.version. Instead it should remember if this was LocationLists::debug_loc or LocationLists::debug_loclists.

@khuey
Copy link
Contributor Author

khuey commented Jul 13, 2020

The attributes are not always present. e.g. clang does not emit the attributes if the values are the defaults.

From what I can tell, that's only for split DWARF,

Right.

and it works because these values should be inherited from the skeleton unit.

No. There are four sections with relevant base attributes: .debug_addr, .debug_str_offsets, .debug_loclists, and .debug_rnglists. Of those two (.debug_str_offsets and .debug_loclists) exist in the DWO where there's no need for a explicit base attribute because there's only one unit's worth of data. The DW_AT_str_offsets_base on a skeleton unit is controlling only for attributes that appear on the skeleton unit (e.g. DW_AT_comp_dir) and not for string attributes that appear on the split unit in the DWO. This is why I didn't add it to copy_relocated_attributes.

I thought clang sometimes omitted the DW_AT_rnglists_base but now that I'm looking at it again I think it only omits it for units with no ranges.

So the correct fix is to inherit those values, instead of guessing where it will be. I can be convinced otherwise if you can show me code in a consumer (e.g. gdb or lldb) that calculates these values instead.

gdb doesn't actually support DWARF 5. lldb is doing similar calculations at https://github.com/llvm/llvm-project/blob/e808cab824488af137b62902e65dec3827b83b46/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp#L425. LLVM parses the section headers so it's adjusting the specified offsets down. We're not so we need to adjust the default offset up.

Is that really something we care to support?

Yes. It probably occurs more due to objects being produced by different tools rather than different settings. There's less of a case for it in split DWARF, but we definitely need to support it otherwise. In split DWARF, I think it would be fine to treat the .debug_loc.dwo section the same as .debug_loclists.dwo (so the version check shouldn't be done for DWO files). I haven't investigated this in depth though.

Yeah ok that's not super-terrible then.

@khuey
Copy link
Contributor Author

khuey commented Jul 13, 2020

Table F.1 in the DWARF 5 spec is pretty clear about how the base attributes work. DW_AT_rnglists_base and DW_AT_loclists_base are never used in a split DWARF setup.

@philipc
Copy link
Collaborator

philipc commented Jul 13, 2020

Ok I can see that clang does calculate the header size in some cases.

Still trying to understand how what you have said matches this text from the standard:

The following attributes are not part of a split full compilation unit entry but instead are
inherited (if present) from the corresponding skeleton compilation unit: DW_AT_low_pc,
DW_AT_high_pc, DW_AT_ranges, DW_AT_stmt_list, DW_AT_comp_dir,
DW_AT_str_offsets_base, DW_AT_addr_base and DW_AT_rnglists_base.

Table F.1 in the DWARF 5 spec is pretty clear about how the base attributes work. DW_AT_rnglists_base and DW_AT_loclists_base are never used in a split DWARF setup.

This recently was discussed on the dwarf mailing list, and the conclusion is this is an error in the standard. Edit: that was actually talking about skeleton units. I'll take some time to understand this properly.

@khuey
Copy link
Contributor Author

khuey commented Jul 13, 2020

Still trying to understand how what you have said matches this text from the standard:

The following attributes are not part of a split full compilation unit entry but instead are
inherited (if present) from the corresponding skeleton compilation unit: DW_AT_low_pc,
DW_AT_high_pc, DW_AT_ranges, DW_AT_stmt_list, DW_AT_comp_dir,
DW_AT_str_offsets_base, DW_AT_addr_base and DW_AT_rnglists_base.

The text is wrong. Having DW_AT_rnglists_base on a skeleton unit makes no sense because the range lists live in the DWO. And I discussed how the strings stuff works above.

Table F.1 in the DWARF 5 spec is pretty clear about how the base attributes work. DW_AT_rnglists_base and DW_AT_loclists_base are never used in a split DWARF setup.

This recently was discussed on the dwarf mailing list, and the conclusion is this is an error in the standard.

How, exactly?

@khuey
Copy link
Contributor Author

khuey commented Jul 13, 2020

So the bug is that RawLocListIter shouldn't be using encoding.version. Instead it should remember if this was LocationLists::debug_loc or LocationLists::debug_loclists.

That doesn't actually help because the same encoding is used to choose between debug_loc and debug_loclists.

Put another way, in the DWARF 4 GNU split dwarf case LocationsList::raw_locations needs to somehow know to choose the debug_loclists branch instead of the debug_loc branch.

@philipc
Copy link
Collaborator

philipc commented Jul 13, 2020

That doesn't actually help because the same encoding is used to choose between debug_loc and debug_loclists.

Right, so that check needs changing too. The sections aren't mutually exclusive, but their use within a unit is. So I think which format to use needs to be derived from properties of the unit (its version) and the file (is it a DWO?). Does that sound right?

@khuey
Copy link
Contributor Author

khuey commented Jul 13, 2020

That doesn't actually help because the same encoding is used to choose between debug_loc and debug_loclists.

Right, so that check needs changing too. The sections aren't mutually exclusive, but their use within a unit is. So I think which format to use needs to be derived from properties of the unit (its version) and the file (is it a DWO?). Does that sound right?

That's right, but currently nothing in gimli knows about DWARF v4 split dwarf files at all. I guess we could pass that information into Dwarf or something.

@philipc
Copy link
Collaborator

philipc commented Jul 13, 2020

I think your changes to the base attribute defaults are good, but I'll look at it a bit more later to make sure I understand.

@khuey
Copy link
Contributor Author

khuey commented Jul 13, 2020

I think actually I should push the branching on the version into default_from_encoding and call it for both the v4 and v5 cases.

khuey added 4 commits July 13, 2020 17:22
…utes to just after the section header in DWARF 5 .dwos.

Of the four sections with corresponding base attributes (.debug_addr, .debug_str_offsets, .debug_loclists, and .debug_rnglists), only .debug_addr must be linked into the final executable in DWARF 5. The other three sections can be left behind in a .dwo file. In this case, a compiler (e.g. clang 11) can omit the base attribute value for the sections left behind in the .dwo because, since there is only one split CU represented in the .dwo, there is no ambiguity. However, gimli relies on these offsets to skip past the DWARF 5 section headers, so we need to change the default values here to point beyond the section header.

Note that the DW_AT_str_offsets_base attribute value may be present on the corresponding skeleton unit in the main DWARF file. This is controlling *only* for string attributes on that single DIE (for e.g. its DW_AT_comp_dir attribute). The only base value present on the skeleton unit in DWARF 5 that affects the split unit in the .dwo file is the DW_AT_addr_base value.
…dard split DWARF setup.

The GNU split DWARF extension uses a single .debug_ranges section in the final binary and a DW_AT_GNU_ranges_base attribute, while the standardized DWARF 5 version uses a .debug_rnglists table that remains in the .dwo and does not require relocation, and thus no rnglists_base attribute is needed.
@khuey khuey force-pushed the downstream branch 2 times, most recently from 7dda08e to a3964b3 Compare July 14, 2020 02:42
@khuey
Copy link
Contributor Author

khuey commented Jul 14, 2020

Alright how about this version?

Copy link
Collaborator

@philipc philipc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Just a couple of small things.

@khuey
Copy link
Contributor Author

khuey commented Jul 14, 2020

Comments addressed.

Copy link
Collaborator

@philipc philipc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@philipc philipc merged commit e659d6e into gimli-rs:master Jul 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants