-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fragment rules should be inlined in ATN #3685
Comments
I would said in most cases inlining produces higher performing grammars (without prove). I thought fragment rules are only needed for grammar developers because it's convenient to extract common part to get rid of duplication and developers expect such rules are being inlined. There is no access to internal structure of lexer rule, fragment rules are applicable only for lexer rules.
What else syntax errors you mean except of |
Fragment rules also produce weird behavior when used in a parser. I don't remember many details, as I avoid that scenario by habit already for a long time. However it all depends on what a fragment rule is publicly know as. Is it considered being a normal rule (with some special conditions applied) or really meant as an externalized rule part, just for convenience? In the first case it should also stay a separate rule (and show up as such in an ATN graph) otherwise inlining could help. But keep in mind, what's displayed as a rule in the graph is in reality just another bunch of ATN states (rule start/end states, and two basic states with a set transition). |
@mike-lischke I don't think fragment rules are visible to the parser. They are only meant as helper rules for other Lexer rules. I'm not sure what inlining would do but it would likely expand the size of the ATN, like expanding macros. Given that we are creating DFAs, I also don't see what the performance improvement would be since there's no such thing as a fragment in the DFA. It automatically in lines |
Exactly. BTW, we have a feature request about fragment parser rules: #1963. Anyway I think they should work similar to fragments for parser rules if implemented.
Not significantly for most part of grammars if expands. Sometime inlining decreases number of states (the sample in the start message). And I think performance is more important than insignifical increasing of ATN.
It's not only about DFA but also about ATN matching and execution. Lexer makes a lot of calls to closure and getEpsilonTarget methods during recognition.
I can try to measure performance using benchmarks if abovementioned arguments are not enough. |
I'm not sure I can agree completely with this analysis. Once we have converted an ATN chunk to DFA for a given input sequence, we never do anymore LexerATNConfig for that. We only touch the ATN again for different char sequences. For a fixed sequence like the following:
we would visit an extra state or two initially but never again. |
fragment
rules similar toinline
functions in programming languages and they should be inlined during ATN generation (if possible, i.e. if they don't have loops). It should improve performance of generated lexers.Grammar
Actual ATN
Expected ATN
No ref to
DIGIT
rule:NUMBER
token should be treated asNUMBER: [0-9]+;
.The text was updated successfully, but these errors were encountered: