Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syntactic requirements of the hinting section #2

Closed
lars-t-hansen opened this issue Feb 16, 2021 · 5 comments
Closed

Syntactic requirements of the hinting section #2

lars-t-hansen opened this issue Feb 16, 2021 · 5 comments

Comments

@lars-t-hansen
Copy link

This was covered somewhat in #1, but it bears discussing again: Is it enforcable to create additional (essentially syntactic) requirements on the branch hinting section, given that the section is optional? For general engine simplicity and uniformity across implementations, my preference would be something like these:

  • function sections are sorted in increasing function index order
  • function indices are not repeated
  • hints are sorted in increasing offset order within each function section
  • hint offsets are not repeated
  • every offset in the hint section references a branch instruction

Failing validation would be desirable if the rules are violated, but is probably not possible. But by creating the rules, we make it more explicit when tools can discard the section and when engines can ignore it, and processing the section is simpler and faster.

(It's fine to do without these rules, but having them seems better than not having them. Put the complexity in the tool, not in the VM.)

@yuri91
Copy link
Collaborator

yuri91 commented Feb 16, 2021

The first 4 requirements seem straightforward to put in the spec. For example in the custom name section there are similar requirements for the name maps (https://webassembly.github.io/spec/core/appendix/custom.html#name-maps):

An indirect name map assigns names to a two-dimensional index space, where secondary indices are grouped by primary indices. It consists of a vector of primary index/name map pairs in order of increasing index value, where each name map in turn maps secondary indices to names. Each primary index must be unique, and likewise each secondary index per individual name map.

For the last one I wonder what the interaction with streaming compilation would be.
Currently we say that the custom branch hints section should appear before the code section, so the hints are available during the parsing of the code section.

But suppose that the first N hints are "correct", and then the N+1 hint has a wrong byte offset. We discover this only after some code has already compiled, and the hints section has already been validated. What should happen exactly?

My idea would be to allow any byte offset, but to ignore the ones that don't point to a valid branch instruction. This has also the side effect of being forward compatible with the expansion of valid hintable instructions.

Is this solution undesirable from an engine perspective?

@lars-t-hansen
Copy link
Author

The last condition is not too hard to validate, but it must happen a little later. Our streaming compiler at least has access to the full bytecode for a function before it starts compiling the function. If we wanted to, and if there is a hints section for a function, we could traverse the hints section in one pass at the start of the compilation for the function and for each offset look up the bytecode in the function and verify that it is a branch. This would happen on the compilation thread, so it would parallelize nicely.

(In general, we perform bytecode validation in the streaming compiler as we read bytecodes during compilation - there's no separate validation pass - and bailing out on error is not an issue. But I think the hints are best validated up front, if we are going to validate them. Which we won't be able to, if it's a custom section.)

I worry most about differences between engines and tools resulting from a too-flexible format, not about implementation efficiency, though that counts too. Rules, even ones that don't let us reject a module, will help us avoid that because they will make the producers produce unambiguous output. Tools like wabt could warn or even error out on malformed data.

@yuri91
Copy link
Collaborator

yuri91 commented Mar 15, 2021

@lars-t-hansen does #10 sufficiently address this issue?

@lars-t-hansen
Copy link
Author

Yeah, looks OK.

@yuri91
Copy link
Collaborator

yuri91 commented Mar 15, 2021

Thanks, I am closing this issue then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants