
Save DFA for faster parser start up #3682

Open
parrt opened this issue Apr 27, 2022 · 20 comments
@parrt
Member

parrt commented Apr 27, 2022

ANTLR performs grammar analysis on the fly and records partial parsing/lexing lookahead decisions in a DFA representation of the ATN created from the grammar statically. Every time the parser starts up it must re-create this DFA based upon the incoming sentences. Just as we serialize the ATN, it might be interesting to save the current DFA for reloading like a cached Version. I bet @mike-lischke and @jcking and @KvanTTT would be interested in this.

The main difficulty I see is reestablishing the exact state of the combined ATN/DFA frozen from the previous parser. There might be state variables and such to worry about. Anyway I thought I would get this logged as a potential feature.
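A minimal sketch of the idea, assuming a hypothetical generated `MyLexer`/`MyParser` pair (the class names, start rule, and file names are illustrative): today the DFA cache already lives in static fields of the generated class, so a "training" parse warms it for every later parse in the same process — what this issue asks for is a way to persist that warmed state across processes.

```java
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

// MyLexer/MyParser are hypothetical generated classes. The DFA cache lives
// in static fields of the generated class, so it survives across instances
// within one process -- but not across process restarts.
public class WarmUp {
    public static void main(String[] args) throws Exception {
        // First ("training") parse: adaptive prediction fills the DFA cache.
        MyParser warm = new MyParser(new CommonTokenStream(
                new MyLexer(CharStreams.fromFileName("training-input.txt"))));
        warm.startRule();

        // Later parses in the same process reuse the warmed DFA. The
        // requested feature is, in effect, a way to save and restore this
        // state across process boundaries.
        MyParser next = new MyParser(new CommonTokenStream(
                new MyLexer(CharStreams.fromFileName("real-input.txt"))));
        next.startRule();
    }
}
```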

@mike-lischke
Member

Interesting indeed. I mentioned this as a possible optimisation a few times in the past, when people asked for faster startup times.

@KvanTTT
Member

KvanTTT commented Apr 27, 2022

I haven't touched DFA yet, but it's interesting to look into.

@KvanTTT
Member

KvanTTT commented Apr 27, 2022

Is this only relevant when a lexer/parser is not being started for the first time? And where should the cache be loaded from? Disk?

@parrt
Member Author

parrt commented Apr 27, 2022

Yes, the idea would be to train a parser to know about the kinds of sentences it will run into, which can be freeze-dried from the DFA after such training. The next time the parser starts, it could restart from that old state; come to think of it, it's kind of like an incremental parser concept.

This would have to be investigated and designed very carefully, but it could be useful, particularly for command-line tools that have to start up and do the same grammar analysis each time.

@mike-lischke
Member

Someone also once mentioned web services, where a parser is instantiated for each web request, which means it currently always starts in a cold state.

@jcking
Collaborator

jcking commented Apr 28, 2022

As of 4.10, C++, Java, and Go all share the DFA between multiple threads/goroutines. C++ and Go lazily deserialize the ATN, so instantiating a parser/lexer is cheap after the first time, making start times faster. Java deserializes the ATN when the class is loaded.

@KvanTTT
Member

KvanTTT commented Apr 29, 2022

Someone also once mentioned web services, where a parser is instantiated for each web request, which means it currently always starts in a cold state.

Why not just create a static instance that stays active during the web server's uptime, to get rid of the cold state?

@jcking
Collaborator

jcking commented Apr 29, 2022

Someone also once mentioned web services, where a parser is instantiated for each web request, which means it currently always starts in a cold state.

Why not just create a static instance that stays active during the web server's uptime, to get rid of the cold state?

For Java, C++, and Go, the lexers and parsers are not themselves thread-safe. You cannot use the same lexer or parser instance across threads. However, see my comment above for why this is not an issue for those languages: constructing the lexer or parser is cheap.
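A sketch of the per-request pattern this implies, again assuming hypothetical generated `MyLexer`/`MyParser` classes: each request gets fresh lexer/parser instances (cheap to construct), while the ATN and DFA cache are static and shared, so every request after the first starts warm within the same process.

```java
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

// One call per request/thread; MyLexer/MyParser are hypothetical generated
// classes. Instances are NOT thread-safe and must not be shared between
// threads, but the static ATN/DFA they reference is shared safely.
public final class ParseHandler {
    public static ParseTree handle(String requestBody) {
        MyLexer lexer = new MyLexer(CharStreams.fromString(requestBody));
        MyParser parser = new MyParser(new CommonTokenStream(lexer));
        return parser.startRule(); // cold only for the very first request
    }
}
```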

@mike-lischke
Member

The DFA sharing has existed for a long time already, which is why modifying the DFA is the only part that needs protection. Lazy initialization doesn't help with warming up the DFA, because it has to be built first, independent of how many parser instances exist.

Taking the training idea further: what if we take the ATN and produce all relevant configurations by running prediction on "each" possible path? Memory and time limits play a role here, of course, but maybe hot paths can be identified that provide a base set of configurations, requiring only small additions for other paths when running a real parser.

@sharwell
Member

sharwell commented Apr 29, 2022

It may be beneficial during this to pre-populate missing edges of the DFA states appearing at the k=1 (and maybe even k=2) positions of decisions.

The main reason this never made it into the optimized fork is that it doesn't tend to be the part that slows down startup. The slowest operations tend to be very long lookahead sequences, in which small changes to the input are unlikely to produce DFA cache hits; so even though it would be much faster for the training input, it would not be significantly different for fresh input.

@parrt
Member Author

parrt commented May 1, 2022

A number of interesting questions/comments here!

  1. Re: Services: I'm not sure that a parsing service is as useful as a service that runs antlr on grammars; the idea would be that people can avoid installing Java to run antlr tool.
  2. Re: finding and prepping "hot paths"... Lookahead languages are definitely not regular, so no DFA will handle all possible lookahead strings, which is why we had to go to dynamic analysis rather than static analysis. I think that's more or less what you are suggesting. On a related note I found this in the paper at "7.3 Effect of lookahead DFA on performance": "To demonstrate the cache’s effect on parsing speed, we disabled the DFA and repeated our Java experiments. Consider the 3.73s parse time from Figure 9 to reparse the Java corpus with pure cache hits. With the lookahead DFA cache disabled completely, the parser took 12 minutes (717.6s). Figure 10 shows that disabling the cache increases parse time from 203ms to 42.5s on the 3.2M file"
  3. Re LL(1): I believe that we do static LL(1) analysis so that there is no dynamic analysis for a decision if it is LL(1). E.g., a : A{;} | B ; generates a switch on lookahead(1) and no call to the dynamic analysis: switch (_input.LA(1)) {...}.

This will have to be designed very carefully, but from a user's point of view, I can imagine a function call that saves the current DFA state (thread-safely) and a function that creates the DFA by loading it from a file. The DFA cannot be created by the ANTLR tool and saved during code generation, because it takes dynamic analysis to create the DFA. On the other hand, once we have a DFA filled up, the ANTLR tool could reuse it and convert it to source code, like we do with our ATN serialization as integer arrays in target code. OR, what if the save() function actually generated an int array or whatever in the target language, so that it could be linked in rather than loaded with file operations? Hmm... that might be less applicable to many target languages, so it might be easier to just generate a serialization that could be read back in by any target. One might want to warm up the DFA in C++ and then use it for parsers derived from the same grammar in Java and Python.

Without looking, I am actually not 100% positive there is a convenient way to save the DFA state such that it is easy to reload. Recall that the ATN and DFA state machines are completely intertwined in memory. I think the ATN points into the DFA and vice versa; or maybe just the DFA points into the ATN, so that it knows how to get more lookahead if necessary.

Anyway, I recommend that nobody jump into an implementation, as this will require significant design work, and we should get that right first. Once all of the issues are known, I suspect it's not a hard implementation.
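From the user's point of view, the save/load functions described above might look like the following. To be clear, `saveDFA`/`loadDFA` do not exist in any runtime; this is purely an illustration of the proposed design, using hypothetical generated `MyParser` classes.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Purely hypothetical API sketch for the proposal; none of the saveDFA/
// loadDFA methods exist in the ANTLR runtime today.
public class DfaSnapshotSketch {
    public static void main(String[] args) throws Exception {
        // Train: parse representative input so adaptive prediction fills
        // the DFA, then snapshot the warmed state (thread-safely) to a
        // target-neutral serialization.
        MyParser trainer = newParserFor("training-input.txt");
        trainer.startRule();
        byte[] snapshot = trainer.getInterpreter().saveDFA(); // hypothetical
        Files.write(Path.of("grammar.dfa"), snapshot);

        // In a later process: restore the DFA before the first parse, so
        // the parser starts warm instead of re-deriving everything.
        MyParser fresh = newParserFor("real-input.txt");
        fresh.getInterpreter().loadDFA(                        // hypothetical
                Files.readAllBytes(Path.of("grammar.dfa")));
        fresh.startRule();
    }
}
```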

@KvanTTT
Member

KvanTTT commented May 2, 2022

To demonstrate the cache’s effect on parsing speed, we disabled the DFA and repeated our Java experiments. Consider the 3.73s parse time from Figure 9 to reparse the Java corpus with pure cache hits. With the lookahead DFA cache disabled completely, the parser took 12 minutes (717.6s). Figure 10 shows that disabling the cache increases parse time from 203ms to 42.5s on the 3.2M file

Now I'm wondering if my epsilon-removal optimization can decrease parse time with the cache disabled. It should be more noticeable when the parse time is big.

Recall that the ATN and DFA state machines are completely intertwined in memory. Or maybe just the DFA points into the ATN, so that it knows how to get more lookahead if necessary.

Yes, the DFA depends on the ATN, but the ATN does not depend on the DFA.
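Cold-versus-warm timings like the paper numbers quoted above can be measured today, since the Java runtime's ATN simulators expose `clearDFA()` to drop the shared cache. A sketch, again assuming hypothetical generated `MyLexer`/`MyParser` classes:

```java
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

// Compare cold vs. warm parse times; MyLexer/MyParser are hypothetical
// generated classes.
public class DfaTiming {
    static long timeParse(String path) throws Exception {
        MyParser parser = new MyParser(new CommonTokenStream(
                new MyLexer(CharStreams.fromFileName(path))));
        long start = System.nanoTime();
        parser.startRule();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        long cold = timeParse("input.java"); // first parse builds the DFA
        long warm = timeParse("input.java"); // second parse hits the cache

        // Dropping the shared DFA returns the parser class to a cold state
        // (getInterpreter() is an instance method, hence the throwaway).
        new MyParser(null).getInterpreter().clearDFA();
        long coldAgain = timeParse("input.java");

        System.out.printf("cold=%dns warm=%dns coldAgain=%dns%n",
                cold, warm, coldAgain);
    }
}
```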

@marcospassos
Contributor

I'm interested in this. Any updates?

@marcospassos
Contributor

The DFA sharing has existed for a long time already

@mike-lischke, could you please point out where it does happen on the Java target?

@KvanTTT
Member

KvanTTT commented Oct 26, 2022

It isn't implemented at all, including in the Java target.

@marcospassos
Contributor

@KvanTTT, I was referring to "The DFA sharing has existed for a long time already".

@parrt
Member Author

parrt commented Oct 26, 2022

No work has been done on this, I don't think. It would be very useful, though.

@KvanTTT
Member

KvanTTT commented Oct 26, 2022

@KvanTTT, I was referring to "The DFA sharing has existed for a long time already".

You can explore the classes in the dfa directory and their usages. The cache is located in ParserInterpreter.

@ericvergnaud
Contributor

ericvergnaud commented Oct 26, 2022 via email

@parrt
Member Author

parrt commented Oct 28, 2022

Yes, I think either a static or a dynamic version would be of use.
