fragment rules should be inlined in ATN #3685

KvanTTT · 2022-04-29T14:07:26Z

fragment rules similar to inline functions in programming languages and they should be inlined during ATN generation (if possible, i.e. if they don't have loops). It should improve performance of generated lexers.

Grammar

lexer grammar G;
NUMBER: DIGIT+;
fragment DIGIT: [0-9];

Actual ATN

Expected ATN

No ref to DIGIT rule:

NUMBER token should be treated as NUMBER: [0-9]+;.

The text was updated successfully, but these errors were encountered:

KvanTTT · 2022-04-29T17:57:37Z

@sharwell wrote:

This sometimes produces a higher performing grammar, but sometimes does not.

I would said in most cases inlining produces higher performing grammars (without prove).

I thought fragment rules are only needed for grammar developers because it's convenient to extract common part to get rid of duplication and developers expect such rules are being inlined. There is no access to internal structure of lexer rule, fragment rules are applicable only for lexer rules.

It can also have weird/unexpected side effects like changing the lookahead distance and recovery characteristics for certain syntax errors.

What else syntax errors you mean except of token recognition error at? I haven't seen anything else for lexer. Also, I don't completely understand what weird/unexpected side effect it can have compare to manual inlining. Inlining is only applicable for fragment rules without loops.

mike-lischke · 2022-04-30T10:55:00Z

Fragment rules also produce weird behavior when used in a parser. I don't remember many details, as I avoid that scenario by habit already for a long time.

However it all depends on what a fragment rule is publicly know as. Is it considered being a normal rule (with some special conditions applied) or really meant as an externalized rule part, just for convenience? In the first case it should also stay a separate rule (and show up as such in an ATN graph) otherwise inlining could help. But keep in mind, what's displayed as a rule in the graph is in reality just another bunch of ATN states (rule start/end states, and two basic states with a set transition).

parrt · 2022-05-01T17:03:02Z

@mike-lischke I don't think fragment rules are visible to the parser. They are only meant as helper rules for other Lexer rules. I'm not sure what inlining would do but it would likely expand the size of the ATN, like expanding macros. Given that we are creating DFAs, I also don't see what the performance improvement would be since there's no such thing as a fragment in the DFA. It automatically in lines

KvanTTT · 2022-05-02T12:16:55Z

@mike-lischke I don't think fragment rules are visible to the parser. They are only meant as helper rules for other Lexer rules.

Exactly. BTW, we have a feature request about fragment parser rules: #1963. Anyway I think they should work similar to fragments for parser rules if implemented.

I'm not sure what inlining would do but it would likely expand the size of the ATN, like expanding macros.

Not significantly for most part of grammars if expands. Sometime inlining decreases number of states (the sample in the start message). And I think performance is more important than insignifical increasing of ATN.

Given that we are creating DFAs, I also don't see what the performance improvement would be since there's no such thing as a fragment in the DFA. It automatically in lines

It's not only about DFA but also about ATN matching and execution. Lexer makes a lot of calls to closure and getEpsilonTarget methods during recognition. execATN is called every time even if there is an initialized dfa state in decisionToDFA. The more states ATN have the more calls lexer executes, the more object is allocated (LexerATNConfig). For example, if reference to fragment rule exists, the following code is executed in execATN as well:

case Transition.RULE:
	RuleTransition ruleTransition = (RuleTransition)t;
	PredictionContext newContext =
		SingletonPredictionContext.create(config.context, ruleTransition.followState.stateNumber);
	c = new LexerATNConfig(config, t.target, newContext);
	break;

I can try to measure performance using benchmarks if abovementioned arguments are not enough.

parrt · 2022-06-18T22:34:10Z

I'm not sure I can agree completely with this analysis. Once we have converted an ATN chunk to DFA for a given input sequence, we never do anymore LexerATNConfig for that. We only touch the ATN again for different char sequences. For a fixed sequence like the following:

fragment A : [ab] ;
X : A 'x' ;

we would visit an extra state or two initially but never again.

KvanTTT mentioned this issue Apr 29, 2022

ATN optimization: removing of redundant epsilon transitions and related basic states #3676

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fragment rules should be inlined in ATN #3685

fragment rules should be inlined in ATN #3685

KvanTTT commented Apr 29, 2022 •

edited

Loading

KvanTTT commented Apr 29, 2022 •

edited

Loading

mike-lischke commented Apr 30, 2022

parrt commented May 1, 2022

KvanTTT commented May 2, 2022 •

edited

Loading

parrt commented Jun 18, 2022

fragment rules should be inlined in ATN #3685

fragment rules should be inlined in ATN #3685

Comments

KvanTTT commented Apr 29, 2022 • edited Loading

Grammar

Actual ATN

Expected ATN

KvanTTT commented Apr 29, 2022 • edited Loading

mike-lischke commented Apr 30, 2022

parrt commented May 1, 2022

KvanTTT commented May 2, 2022 • edited Loading

parrt commented Jun 18, 2022

KvanTTT commented Apr 29, 2022 •

edited

Loading

KvanTTT commented Apr 29, 2022 •

edited

Loading

KvanTTT commented May 2, 2022 •

edited

Loading