
ATN optimization: removing of redundant epsilon transitions and related basic states #3676

Open · wants to merge 2 commits into base: dev

Conversation

@KvanTTT (Member) commented Apr 23, 2022

  • Introduced RedundantEpsilonRemover that performs the following ATN optimizations for BasicState:
    • Removal of a single incoming epsilon transition
    • Removal of a single outgoing epsilon transition with several incoming transitions
  • Removed all optimizations from ParserATNFactory since they are redundant, not fully correct, and are now performed in a separate optimization step
  • Introduced ATNOptimizerHelper that does the following:
    • Calculates the incoming transitions that are used by the ATN optimizers
    • Tracks the replacement of states that are removed during optimization. Old states are replaced with new ones in the final step (updateAstNodes)
    • Compresses the array of ATN states (removes null items left over from previous optimization steps, compressStates)
  • Fixed the previous ATNOptimizer and renamed it to SetMerger. It now considers incoming transitions, and ATNOptimizerHelper accurately tracks replacements.

The implemented optimizations shrink the ATN, especially for lexers, and should improve the performance of generated parsers because of the decreased number of method calls. They also don't affect runtime code except for the interpreter part (which is buggy anyway).
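The first of these rewrites can be sketched on a toy graph model (the classes and fields below are illustrative stand-ins, not ANTLR's real ATNState and Transition types): a state whose only incoming edge is an unlabeled epsilon can be bypassed by retargeting that edge at the state's successor.

```java
import java.util.*;

// Toy sketch of the "single incoming epsilon" rewrite; illustrative only.
class ToyAtn {
    // out.get(s) = outgoing edges of state s; each edge is {target, isEpsilon ? 1 : 0}
    final Map<Integer, List<int[]>> out = new HashMap<>();

    void add(int from, int to, boolean epsilon) {
        out.computeIfAbsent(from, k -> new ArrayList<>()).add(new int[] { to, epsilon ? 1 : 0 });
    }

    // If `state` has exactly one incoming edge, that edge is an epsilon, and
    // `state` has a single successor, bypass `state` by retargeting the edge.
    boolean removeIfSingleIncomingEpsilon(int state) {
        List<int[]> incoming = new ArrayList<>();
        for (List<int[]> edges : out.values())
            for (int[] e : edges)
                if (e[0] == state) incoming.add(e);
        if (incoming.size() != 1 || incoming.get(0)[1] != 1) return false;
        List<int[]> successors = out.getOrDefault(state, Collections.emptyList());
        if (successors.size() != 1) return false; // keep the sketch simple
        incoming.get(0)[0] = successors.get(0)[0]; // relink past `state`
        incoming.get(0)[1] = successors.get(0)[1]; // keep the successor's label kind
        out.remove(state);
        return true;
    }
}
```

After the rewrite, `s0 --eps--> s1 --match--> s2` becomes `s0 --match--> s2`, which is exactly the shape of the Input/Output diagrams below.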

Removal of a single incoming epsilon transition

Input

single_incoming_input

Output

single_incoming_output

Removal of a single outgoing epsilon transition with several incoming transitions

Input

single_outgoing_input

Output

single_outgoing_output

Example

lexer grammar L;
A: 'A' 'B'+ (C | D) 'E';
fragment C: 'C';
fragment D: 'D';

Old ATN

Number of states: 23

old_atn

New ATN

Number of states: 15

new_atn

Plans

It looks like BlockEndState (8, 14) and LoopEndState (10) can also be safely removed in the same way, since they are not used in decisions. But it might require small changes in the runtimes.

@KvanTTT (Member Author) commented Apr 23, 2022

It looks like the behavior of the JavaScript runtime differs from the other ones, because some tests are failing. I'll investigate.

@KvanTTT force-pushed the atn-states-optimization branch from 07b1071 to 9e7d74e on April 23, 2022 14:27
@KvanTTT (Member Author) commented Apr 23, 2022

It definitely was a bug in the JavaScript runtime. I've fixed it in a separate commit.

@KvanTTT force-pushed the atn-states-optimization branch 3 times, most recently from 23c9591 to af06ddf on April 23, 2022 19:54
@KvanTTT changed the title from "ATN optimization: removing of redundant epsilon transitions and related states" to "ATN optimization: removing of redundant epsilon transitions and related basic states" on Apr 23, 2022
@KvanTTT (Member Author) commented Apr 23, 2022

Now all tests are green. @parrt, it's ready for review :) @mike-lischke, it may be interesting for you as well.

@parrt (Member) commented Apr 23, 2022

Hiya. Thanks for looking into this but I'm just not sure we should be messing around with an alg/data structure that took me forever to understand and now I have to remember everything to see if this is OK. Shouldn't I be working on fixing bugs in the IDE? I only have a certain amount of time so I'm just trying to optimize.

@mike-lischke (Member) commented Apr 24, 2022

Actually, I was thinking about removal of useless states since I had the first graphical visualization in my VS Code extension years ago. Also I have seen a number of unused ATN states, but that might be caused by some optimization in the antlr4ts runtime. Just check the ATN for states that have no outgoing transitions.

@parrt I don't think there's a significant risk by removing useless intermediate states. We have our test suites in place which confirm everything is still working as it should.

@KvanTTT Regarding loop start and end states: I use these special states to handle loops in my sentence generator, so I'd prefer them to stay.

transitions.remove(i);
recalculateEpsilonOnlyTransitions();
if (epsilonOnlyTransitions != null && epsilonOnlyTransitions != e.isEpsilon()) {
System.err.format(Locale.getDefault(), "ATN state %d has both epsilon and non-epsilon transitions.\n", stateNumber);
A Member commented:

Why is it useful to print this information? It's a pretty common situation, I'd say, no?

@KvanTTT (Member Author) replied:

Should the set method be consistent with the add method, which has such a check? https://github.com/antlr/antlr4/blob/master/runtime/Java/src/org/antlr/v4/runtime/atn/ATNState.java#L166 I think yes; that's why I added the check to the set method.

@mike-lischke (Member) left a review comment:

I checked the patch to see if I can find any obvious problems, but did not find any. Reviewing the code functionality is beyond me, however. I guess Ter has to take the burden to do that.

@KvanTTT (Member Author) commented Apr 24, 2022

I only have a certain amount of time so I'm just trying to optimize.

I consider this quite an important optimization since it affects all generated parsers. Also, I hope I wrote understandable code that is easy to review. Is it possible to review the PR step by step? There is no need to merge it ASAP.

@KvanTTT (Member Author) commented Apr 24, 2022

Also I have seen a number of unused ATN states, but that might be caused by some optimization in the antlr4ts runtime. Just check the ATN for states that have no outgoing transitions.

Interesting. I haven't looked into this.

I don't think there's a significant risk by removing useless intermediate states.

Yes, it looks like removing basic states is safe because they aren't used in parsing decisions. Moreover, we already have similar epsilon-removing optimizations in ParserATNFactory and TailEpsilonRemover. But:

  • They don't cover all transitions and states
  • They don't consider incoming transitions, which is why they could remove the wrong nodes, and they don't relink some transitions
  • They don't perform state replacement, which could leave incorrect states in AST nodes (see GrammarAST)
  • They are performed together with ATN construction, which is not very clean

My implementation resolves all these problems. Also, I didn't change the runtime code except for the Java interpreter and one bug in JS.

We have our test suites in place which confirm everything is still working as it should.

Yes, it should be enough.

Regarding loop start and end states: I use these special states to handle loops in my sentence generator, so I'd prefer them to stay.

I didn't mean start states; they shouldn't be removed. Only BlockEndState and LoopEndState. They only hold references to their start states, and that is redundant info that can be restored from the respective start states. But it looks like they affect left-recursive rules, and I'm trying to investigate how. Anyway, such an optimization is not implemented in the current PR.

@jcking (Collaborator) commented Apr 26, 2022

Overall this seems fine and a good idea. However I am not sure if the closure algorithm relies on this implicitly in some way. At first glance it doesn't seem like it does.

Getting this in for 4.11 seems appropriate, which gives us a while.

@KvanTTT (Member Author) commented Apr 26, 2022

Overall this seems fine and a good idea. However I am not sure if the closure algorithm relies on this implicitly in some way. At first glance it doesn't seem like it does.

Do you mean basic states or end states? Basic states are used only for helper purposes, for merging or branching transitions. End states are used by closure algorithms; at least it's taken into account during deserialization: https://github.com/antlr/antlr4/blob/dev/runtime/Java/src/org/antlr/v4/runtime/atn/ATNDeserializer.java#L402 I think it can be fixed because start states contain all the necessary info, but it requires changes in the runtimes. I've decided to postpone it at least until the current PR is merged.

@jcking (Collaborator) commented Apr 26, 2022

From a theory perspective, this seems fine and should be safe.

Do you mean basic states or end states?

No.

Out of curiosity, how did you generate the diagrams? The -atn option on the tool?

@KvanTTT (Member Author) commented Apr 26, 2022

No.

I don't understand: is only the first safe, or should the second also be safe to remove?

Out of curiosity, how did you generate the diagrams? The -atn option on the tool?

Yes. It's also possible to just use the following code in an arbitrary place:

DOTGenerator gen = new DOTGenerator(g);
String dot = gen.getDOT(atn.states.get(2), g.getRuleNames(), false);

The string can be pasted into VS Code and rendered instantly using a Graphviz plugin.
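For reference, a minimal hand-rolled DOT emitter over a toy edge list (this is NOT ANTLR's DOTGenerator, just an illustration) shows what kind of string such a Graphviz plugin renders:

```java
// Minimal DOT emitter for a toy edge list; not ANTLR's DOTGenerator.
class ToyDot {
    // Each edge is {from, to, isEpsilon ? 1 : 0}.
    static String toDot(int[][] edges) {
        StringBuilder sb = new StringBuilder("digraph ATN {\n  rankdir=LR;\n");
        for (int[] e : edges) {
            String label = e[2] == 1 ? "eps" : "match";
            sb.append("  s").append(e[0]).append(" -> s").append(e[1])
              .append(" [label=\"").append(label).append("\"];\n");
        }
        return sb.append("}\n").toString();
    }
}
```

The output is a plain `digraph` that any Graphviz viewer accepts.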

@parrt (Member) commented Apr 26, 2022

Hiya! What problem are we trying to solve here? The risk/reward here is unclear and I have 1 billion other open source projects that require my time. (I'm waiting for a video chat to start regarding my decision tree visualization tool with a colleague in Romania.) I can't possibly accept any change to ATN structure without a thorough investigation. There could be other tools that rely on that structure etc. And you are asking me to risk screwing up one of the most complicated algorithms I've ever had to work on (ALL(*)). If this were 2013, I would have all of that in my memory and could quickly evaluate whether this is worth the risk. (Actually I'd rely on @sharwell to do the thinking there ha ha.) Seems like I should be working on the plugin, which is what I'm focusing on at the moment. I'm not sure this is the best way to spend my limited time capital.

If you guys see this as solving a real problem, then I'm happy to spend the time investigating. I think though at this point we are chasing a metric; remember that chasing low blood pressure metric causes death after a while haha

@sharwell (Member):

I get really scared when I think it could even possibly impact things like #1398

@sharwell (Member) commented Apr 26, 2022

My fork does optimization of the ATN at runtime instead of at generation time (applies to antlr4ts as well):
https://github.com/tunnelvisionlabs/antlr4/blob/1de7b24e31bd72ff2da7361465ce8a9d0154acd7/runtime/Java/src/org/antlr/v4/runtime/atn/ATNState.java#L123-L127

This allows the serialized ATN to retain its accurate reflection of the behavior described by the grammar, but the runtime to still avoid unnecessary transitions during closure.

Graphs produced by ATNState.transitions and ATNState.optimizedTransitions may be slightly different.
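The idea can be sketched as follows (illustrative names, not the fork's actual API): the serialized transition list stays intact for tools, while closure walks a second list in which chains of single-epsilon states have been collapsed.

```java
import java.util.*;

// Sketch of keeping two views of a state's edges: `transitions` mirrors the
// serialized ATN, while `optimizedTransitions` skips chains of states that
// have exactly one outgoing epsilon. Illustrative only.
class DualState {
    final int number;
    final List<DualState> transitions = new ArrayList<>();
    final List<Boolean> epsilon = new ArrayList<>();
    List<DualState> optimizedTransitions;

    DualState(int number) { this.number = number; }

    void link(DualState target, boolean isEpsilon) {
        transitions.add(target);
        epsilon.add(isEpsilon);
    }

    void computeOptimized() {
        optimizedTransitions = new ArrayList<>();
        for (DualState t : transitions) {
            // Follow pure single-epsilon chains to their end.
            while (t.transitions.size() == 1 && t.epsilon.get(0))
                t = t.transitions.get(0);
            optimizedTransitions.add(t);
        }
    }
}
```

This is why the two graphs can differ: the optimized view jumps straight to the end of an epsilon chain while the serialized view still records every hop.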

@parrt (Member) commented Apr 26, 2022

Thanks for your input @sharwell, as always. Guys, if it makes Sam nervous, it definitely makes me nervous! haha.

@KvanTTT (Member Author) commented Apr 26, 2022

Hiya! What problem are we trying to solve here?

Improving the quality and performance of ANTLR, especially considering that the current PR also fixes some potential bugs that I've described above. For instance, for the LargeLexer test I get a ~30% smaller .class file for the lexer (618 vs 847 KB). Maybe it makes sense to write some performance tests, but that's more complicated.

There could be other tools that rely on that structure etc.

@mike-lischke is one of the developers of such tools. He said he didn't find any obvious problems.

I think though at this point we are chasing a metric

Not all changes have a big effect. Quality comes from a lot of small improvement steps. Also, a 30% size reduction is not small (in the new Kotlin compiler we fight for every second on big projects).

I get really scared when I think it could even possibly impact things like #1398

We have several tests that cover this case and run fast: https://github.com/antlr/antlr4/tree/dev/runtime-testsuite/resources/org/antlr/v4/test/runtime/descriptors/Performance. I can also write more benchmark tests if necessary, and we can try them on our big collection of grammars.

My fork does optimization of the ATN at runtime instead of at generation time (applies to antlr4ts as well):

But it's not a complete solution; there is no need to encode useless states and transitions into the ATN.

Thanks for your input @sharwell, as always. Guys, if it makes Sam nervous, it definitely makes me nervous! haha.

Maybe it makes sense to publish RC versions for testing by real users.

BTW, I can help with maintaining the repository. I'm working on the Kotlin compiler, and that project is much more complicated than ANTLR, at least for me. I don't completely understand all of ANTLR's algorithms and details, but it took me a week to study the ATN structure and write the new optimizer. As the next step, I'm going to investigate ALL(*) more deeply. I just wanted to consolidate our efforts, but it looks like you have very limited time, so it may make sense to develop my own ANTLR fork.

@jcking (Collaborator) commented Apr 26, 2022

Why don't we just throw the optimization implemented in this pull request behind a command line flag? Something like -remove-redundant-epsilon-transitions?

@KvanTTT (Member Author) commented Apr 26, 2022

Why don't we just throw the optimization implemented in this pull request behind a command line flag? Something like -remove-redundant-epsilon-transitions?

  • It complicates testing (with and without the option)
  • If implemented but turned off, the existing optimizations (in ParserATNFactory and TailEpsilonRemover) should presumably also be deactivated
  • I don't clearly see what problem it solves; it would be useful only for intermediate versions for testing.

@mike-lischke (Member) commented Apr 27, 2022

There could be other tools that rely on that structure etc.

@mike-lischke is one of the developers of such tools. He said he didn't find any obvious problems.

Note though, I have only checked the structure of the patch, not the functionality.

We have several tests that cover this case and run fast: https://github.com/antlr/antlr4/tree/dev/runtime-testsuite/resources/org/antlr/v4/test/runtime/descriptors/Performance. I can also write more benchmark tests if necessary, and we can try them on our big collection of grammars.

I agree. When the tests indicate that the behaviour is still as expected (and that's their entire purpose), then why not use the patch? If all fails, it can be reverted easily.

Maybe it makes sense to publish RC versions for testing by real users.

Good idea, if we can make the release automatic. Currently there's still too much manual work involved.

@mike-lischke (Member) commented Apr 27, 2022

@sharwell While we are talking: do you plan to update antlr4ts to use the changed ATN serialization introduced in 4.10? Would you consider getting antlr4ts into the main ANTLR4 repo as just another target, so that we can use just one tool to generate code for all targets?

@KvanTTT (Member Author) commented Apr 27, 2022

Sam, how can an ATN state which has exactly one incoming and one outgoing transition, with no labels attached to them, impact anything surrounding it?

Actually, the optimizer also relinks multiple incoming transitions if the outgoing one is epsilon (the old TailEpsilonRemover just cut all incoming transitions except one; that's not correct, but it doesn't lead to an error because those are actually redundant links from the end state, and there is no info about incoming transitions in the old implementation).

Epsilon transitions may also have labels (outermostPrecedenceReturn), but I've introduced isSafeForRemovingEpsilon, which carefully considers that flag.
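The guard described here could look roughly like this (a hypothetical standalone signature for illustration; the PR's actual isSafeForRemovingEpsilon lives on the optimizer/transition types):

```java
// Hypothetical sketch of the safety check: an epsilon transition that carries
// an outermostPrecedenceReturn value (ANTLR uses -1 for "none") must be kept.
class EpsilonGuard {
    static boolean isSafeForRemovingEpsilon(boolean isEpsilon, int outermostPrecedenceReturn) {
        return isEpsilon && outermostPrecedenceReturn == -1;
    }
}
```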

@mike-lischke (Member):

Yes, after a second thought I realized this and hence removed that comment.

@parrt (Member) commented Apr 27, 2022

hi guys, thanks for the comments and all of your continued efforts! I appreciate that this reduces the size of the lexer but I don't think I've ever heard a complaint about .class file size. I would be much more interested in knowing how it affects lexing performance across a variety of problems and across targets. If I'm not mistaken, we've gotten massive improvements in performance just by optimizing the runtime libraries and a little bit of the code generation; e.g., work done by @mike-lischke and @jcking on C++ (soon Go). Seems like that's easier and safer than messing with the critical secret sauce of ANTLR.

What I'm hearing is that you want me to accept a change to the secret sauce written by someone who admits they do not understand the algorithm fully. On the other side, you have the two authors recommending caution. We should all know our limitations and I currently do not have the algorithm in my head and all of the subtle details that we obsessed over for years. I just don't see how the risk reward equation suggest integrating this. I don't know about you, but I don't have confidence that we've covered everything in our unit test.

I'm not saying this PR is incorrect, but it would take me a lot of work to evaluate and I don't know if it's worth the effort given other things I'd like to work on. I'd rather focus on getting a typescript target, improving Go target, improving the plug-in, working on maybe plugins for other tools. I don't understand this obsession with the ATN.

Regarding potentially wasted effort, @KvanTTT, I believe you will recall me suggesting that you coordinate with me before you spend a week on some thing I might not accept.

I hope you will agree that my leadership over literally 35 years on this project has objectively been good. I don't always make the right decisions but I hope you will admit that I exercise excellent judgment, given my plurality of successful open source projects across languages and machine learning.

Let's leave this potentially valuable optimization open for when the algorithm has been reloaded in my mind from cold storage. I've only had a few months to restart my ANTLR efforts after ignoring it for six or seven years.

@sharwell (Member) commented Apr 27, 2022

But it's not a complete solution; there is no need to encode useless states and transitions into the ATN.

It intentionally preserves extra information, which is valuable in cases where tools either manipulate the ATN at runtime for various reasons or use the ATN information to produce graphs representing the rule structure/relations.

Would you consider getting antlr4ts into the main ANTLR4 repo as just another target, so that we can use just one tool to generate code for all targets?

The antlr4ts target is based on my optimized Java fork, so it's not compatible with the shared code generation approach. The antlr4ts-tool package contains the matching code generator for this target.

do you plan to update antl4ts to use the changed ATN serialization introduced in 4.10?

My optimized forks (Java, C#, and TypeScript) have a stronger backwards-compatibility requirement than the main project. They will only be upgraded if the changes are meaningful in the context of that specific target and can be performed with strict adherence to backwards compatibility. Changes which impact compatibility (e.g. any change which would require an author to regenerate their grammar in order to execute against the new runtime) will be reverted during the update.

@KvanTTT (Member Author) commented Apr 27, 2022

I would be much more interested in knowing how it affects lexing performance across a variety of problems and across targets.

Ok, it makes sense to write some benchmark tests.

Seems like that's easier and safer than messing with the critical secret sauce of ANTLR.

Maybe, but optimizing the ATN also has an advantage: it applies to all targets, not just one.

I don't understand this obsession with the ATN.

Not an obsession, just step-by-step improvement of the tool.

Regarding potentially wasted effort, @KvanTTT, I believe you will recall me suggesting that you coordinate with me before you spend a week on some thing I might not accept.

I don't regret the spent time; I'd have implemented it anyway out of curiosity. By mentioning the time period, I just wanted to emphasize that the structure of the ATN is not so complicated (ok, maybe I don't know all the details). Also, it's not easy or meaningful to describe the optimization without touching the code, and after I had touched it, it was too late to start with an issue.

Let's leave this potentially valuable optimization open for when the algorithm has been reloaded in my mind from cold storage.

So be it.

It intentionally preserves extra information, which is valuable in cases where tools either manipulate the ATN at runtime for various reasons or use the ATN information to produce graphs representing the rule structure/relations.

I started implementing the optimization because of the bulky graphs in VS Code. We can assess the runtime behavior with performance tests (using JMH, for instance?).

@parrt (Member) commented Apr 27, 2022

@KvanTTT sounds great. Thanks for your understanding. (BTW, "bulky graphs in VSCode" actually does sound like a good motivation.)

@kaby76 (Contributor) commented Apr 29, 2022

@KvanTTT I was wondering, regarding the new 15-state ATN in your opening comment: why are states 11 and 12 not removed and replaced with a single transition for the non-terminals C and D? Overall, I think tightening up the ATN might help both performance and readability. But the biggest factor in performance, at least in the parser and in what I've seen in grammars-v4, is whether the grammar has ambiguities or doesn't use Kleene operators.

@KvanTTT (Member Author) commented Apr 29, 2022

@KvanTTT I was wondering, regarding the new 15-state ATN (in #3676 (comment)): why are states 11 and 12 not removed and replaced with a single transition for the non-terminals C and D?

Because SetTransition is only applicable to terminal symbols (chars for a lexer and tokens for a parser). BTW, there is another optimization that should be implemented: fragment rules should be inlined, i.e., instead of refs to the C and D rules there should be a SetTransition with the ['C', 'D'] literals (an analogy with inline functions).
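For single-character fragments, the proposed inlining amounts to a set union; a toy sketch (the helper names are invented, and ANTLR's real SetTransition stores an IntervalSet rather than a character set):

```java
import java.util.*;

// Toy sketch of fragment inlining: references to single-character fragment
// rules collapse into one character set, as one SetTransition would hold.
// Helper names are invented; ANTLR's SetTransition uses an IntervalSet.
class FragmentInliner {
    static Set<Character> inlineAlternatives(Map<String, Character> fragments, List<String> refs) {
        Set<Character> set = new TreeSet<>();
        for (String ref : refs)
            set.add(fragments.get(ref));
        return set;
    }
}
```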

But the biggest factor in performance, at least in the parser and in what I've seen in grammars-v4, is whether the grammar has ambiguities or doesn't use Kleene operators.

Generally, it looks like the ATN could be converted to a DFA within one rule (no epsilon and no duplicate transitions). That should improve performance, but it might require a lot of changes in the runtimes. It might also significantly increase the number of states (though only in a small number of cases, I think).
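Such a conversion would start from the textbook epsilon-closure step; a toy version over plain adjacency lists (not ANTLR's ATN types):

```java
import java.util.*;

// Textbook epsilon-closure over toy adjacency lists: the set of states
// reachable from `start` through epsilon edges alone. A per-rule ATN-to-DFA
// conversion would apply this inside the usual subset construction.
class EpsilonClosure {
    static Set<Integer> of(Map<Integer, List<Integer>> epsEdges, int start) {
        Set<Integer> seen = new LinkedHashSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            int s = work.pop();
            if (!seen.add(s)) continue;             // already visited
            for (int t : epsEdges.getOrDefault(s, Collections.emptyList()))
                work.push(t);
        }
        return seen;
    }
}
```

Collapsing each closure into one DFA state is what would eliminate the epsilon transitions entirely, at the cost of potentially more states.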

@sharwell (Member) commented Apr 29, 2022

fragment rules should be inlined

This sometimes produces a higher performing grammar, but sometimes does not. It can also have weird/unexpected side effects like changing the lookahead distance and recovery characteristics for certain syntax errors.

@KvanTTT (Member Author) commented Apr 29, 2022

Let's move the discussion about inlining to a separate issue: #3685

@KvanTTT (Member Author) commented May 3, 2022

I've completed some benchmarks on several grammars (from grammars-v4) and inputs (from the examples subdirs).

I used the JMH library for JVM benchmarking (with warmup):

# JMH version: 1.35 
# VM version: JDK 11.0.11, Java HotSpot(TM) 64-Bit Server VM, 11.0.11+9-LTS-194
# VM invoker: C:\Program Files\Java\jdk-11.0.11\bin\java.exe
# VM options: <none>
# Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable) 
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Benchmark mode: Average time, time/op
# Benchmark: benchmark.GrammarBenchmarks.test
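JMH handles forking and statistics; the measurement shape it implies (warmup iterations first, then timed ones) can be hand-rolled for illustration. This is a sketch only, not the actual GrammarBenchmarks.java code:

```java
// Hand-rolled warmup-then-measure loop mirroring the JMH configuration above
// (warmup iterations fill caches such as the DFA before timing starts).
class MiniBench {
    static double averageMillis(Runnable op, int warmupIters, int measuredIters) {
        for (int i = 0; i < warmupIters; i++)
            op.run(); // untimed: lets the JIT and caches warm up
        long totalNanos = 0;
        for (int i = 0; i < measuredIters; i++) {
            long start = System.nanoTime();
            op.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e6 / measuredIters;
    }
}
```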

Old ATN

Grammar    Score      Error      Units
Java       65.399     ± 0.475    ms/op
TSql       3093.503   ± 52.942   ms/op
TSqlLexer  3.547      ± 0.109    ms/op
HTML       21498.609  ± 713.526  ms/op

New ATN

Grammar    Score      Error      Units  Diff
Java       61.143     ± 0.974    ms/op  -6.5%
TSql       2913.740   ± 35.263   ms/op  -5.8%
TSqlLexer  3.591      ± 0.082    ms/op  +1.24%
HTML       20739.729  ± 296.810  ms/op  -3.53%
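The Diff column follows directly from the two Score columns; recomputing it:

```java
// Recomputing the Diff column from the Old/New Score columns above.
class DiffCheck {
    static double diffPercent(double oldScore, double newScore) {
        return (newScore - oldScore) / oldScore * 100.0;
    }
}
```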

The result for the lexer-only grammar is not very significant because it's very fast anyway, and the difference looks like measurement error.

Here is the full testing code: GrammarBenchmarks.java

I've concluded that the optimized ATN works up to 6.5% faster. I think it's quite a significant improvement, especially considering the decreased size of the generated code and the fact that it affects all runtimes.

@kaby76 (Contributor) commented May 3, 2022

Nice! It's a win; minor, but everything counts. I still think you can minimize more states by eliminating all epsilon closures for an even simpler NFA.

@KvanTTT (Member Author) commented May 3, 2022

Nice! It's a win; minor, but everything counts.

It's not so minor. In the Kotlin compiler, which I currently work on, 1% is already a win :)

I still think you can minimize more states by eliminating all epsilon closures for an even simpler NFA.

I was thinking about it, but it could lead to a lot of changes in the runtimes (I doubt @parrt would even review it, especially considering the situation with this much safer request). Currently there are a lot of states of different kinds; they could be unified. In theory, it looks like every rule could be transformed to a DFA, or almost a DFA.

@parrt (Member) commented May 3, 2022

heh, great! Nice work, @KvanTTT. Can you tell me how much input you ran through the various grammars because I couldn't see anything obvious in the code looking on my phone before I race off to work. It's very possible that a reduction in the size of the ATN gives such a speed improvement for small input, but remember that the speed comes from the DFA cache which has to warm up. (The reason we are interested in preloading it.) The ATN size literally does not matter anymore once we have the DFA warmed up so I'd rather focus on warm up or whatever.

On the other hand, as you pointed out there is motivation to reduce the size of ATNs for incorporation into development environments. I would have to relearn how all of this stuff works in great detail however before I could accept the change as we have discussed.

@KvanTTT (Member Author) commented May 3, 2022

Can you tell me how much input you ran through the various grammars because I couldn't see anything obvious in the code looking on my phone before I race off to work

All samples from the examples directories: Java, TSql, HTML. But for HTML I removed some files because the benchmarks were too slow (BTW, the files are quite big). You can test it yourself, but then you have to change the location of grammarsRepositoryDirectory. Also, after switching branches, the command mvn clean install -DskipTests should be run for the entire project.

It's very possible that a reduction in the size of the ATN gives such a speed improvement for small input, but remember that the speed comes from the DFA cache which has to warm up. (The reason we are interested in preloading it.)

I included a warmup iteration (for DFA cache building); this time is not included in the result. By the way, the warmup time is higher, but not by much (+10%). Moreover, as I understand it, warmup time is also important, especially for C++.

The ATN size literally does not matter anymore once we have the DFA warmed up so I'd rather focus on warm up or whatever.

I'm not sure that's completely true. At least the lexer calls execATN even when the DFA state is initialized, as I've written in the issue about fragment inlining. In this case, ATN size and states also matter. I haven't yet dug into the parser, but it likely works in a similar way.

@KvanTTT force-pushed the atn-states-optimization branch from af06ddf to 63b9867 on May 3, 2022 18:48
@KvanTTT (Member Author) commented May 4, 2022

Here is a comparison of the number of ATN states for the Java grammar:

        Standard  Optimized  Diff
Lexer   1112      824        -26%
Parser  1764      1441       -18%

@kaby76 (Contributor) commented May 4, 2022

Excellent!

@KvanTTT (Member Author) commented Jun 27, 2022

Also, it should decrease memory usage, since the deserialized ATN takes less memory as well; see #3765

KvanTTT added 2 commits July 5, 2022 15:23
Signed-off-by: Ivan Kochurkin <[email protected]>