Poor performance of expression parsing #192
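Nearly all of the time spent parsing the expression is due to the full-context prediction portion of ANTLR 4. In the current reference release, this code does not use a dynamic DFA to improve parsing performance. However, the 4.0-opt release uses a highly optimized implementation of this code, which allows the expression from the original post to parse in a few milliseconds. The suggestions below assume you are using the reference release. You can save a great deal of time on correct inputs by using a two-stage parsing strategy:

1. Attempt to parse the input using `BailErrorStrategy` and `PredictionMode.SLL`. If no exception is thrown, you know the answer is correct.
2. If a `ParseCancellationException` is thrown, retry the parse using the default settings (`DefaultErrorStrategy` and `PredictionMode.LL`).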
Unlike using
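The control flow of the two-stage strategy can be sketched without any ANTLR dependency. This is a minimal model that uses hypothetical names (`TwoStageParse`, `BailOut`). It is not ANTLR's API. `BailOut` stands in for `ParseCancellationException`. The cheap first stage either returns a result that is final or throws. Only in the second case is the input rewound and the expensive stage run.

```java
import java.util.function.Supplier;

/** Hypothetical, dependency-free model of the two-stage parsing strategy. */
final class TwoStageParse {

    /** Stand-in for ANTLR's ParseCancellationException, thrown by BailErrorStrategy. */
    static final class BailOut extends RuntimeException {
        BailOut(String message) {
            super(message);
        }
    }

    private TwoStageParse() {}

    /**
     * @param fastStage in ANTLR: the start rule run with PredictionMode.SLL and BailErrorStrategy
     * @param rewind    in ANTLR: parser.reset(), which seeks the token stream back to index 0
     * @param fullStage in ANTLR: the start rule run with PredictionMode.LL and DefaultErrorStrategy
     */
    static <T> T parse(Supplier<T> fastStage, Runnable rewind, Supplier<T> fullStage) {
        try {
            // Stage 1: if this returns normally, its result is the final answer.
            return fastStage.get();
        } catch (BailOut e) {
            // Stage 2: the input either has a real syntax error or needs full-context
            // (LL) prediction; the full parse gives the definitive result and reports errors.
            rewind.run();
            return fullStage.get();
        }
    }
}
```

In the ANTLR 4 Java runtime, stage 1 corresponds to `parser.getInterpreter().setPredictionMode(PredictionMode.SLL)` plus `parser.setErrorHandler(new BailErrorStrategy())`. It is usually paired with `parser.removeErrorListeners()` so that the abandoned attempt prints nothing. Stage 2 restores `DefaultErrorStrategy`, `PredictionMode.LL`, and a listener such as `ConsoleErrorListener.INSTANCE` before calling the start rule again. Correct inputs that SLL can handle never pay for the second stage.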
Thanks for the quick follow-up, Sam. The two-stage parsing strategy seems to work fine for me. Is the 4.0-opt release going to be part of the next public ANTLR release? If so, would you recommend going back to a one-stage parse?
I always use the two-stage strategy. In the optimized builds speed isn't so much of a problem, but the strategy still tends to reduce overall memory requirements. There are no plans right now for the optimized build to replace the reference version (or ship alongside it), but bits and pieces of it are being incorporated into the reference version over time.
That is described in the book as well. (Reply to Sam Harwell's comment of Thu, Mar 21, 2013 at 6:33 AM.)
I'm having the same problem, but I'm using C#.
@ryanlin86 The C# target is based on the optimized fork of the project. Hopefully, as part of the 4.2 release, I can write an article describing the steps available for tracking down performance problems, along with some of the available solutions.
So you are confirming that two-stage is still very slow for that grammar and input? Have you tried the Java target? It's likely the same, but that's the only target I can try this on.
I haven't tried the Java target (I'm not familiar with Java). Here is my grammar file:

`grammar Grammar;` with `@header`, `@parser::members`, and `@lexer::members` action blocks, followed by these parser rules:

- `compileUnit returns [LogicalExpression value]`
- `statementList returns [LogicalExpression value]`
- `statement returns [LogicalExpression value]`
- `block returns [LogicalExpression value]`
- `blockStatement returns [LogicalExpression value]`
- `forInit returns [LogicalExpression[] value]`
- `parExpression returns [LogicalExpression value]`
- `forUpdate returns [LogicalExpression[] value]`
- `statementExpression returns [LogicalExpression value]`
- `expressionList returns [LogicalExpression[] value]`
- `expression returns [LogicalExpression value]`
- `range returns [LogicalExpression value]`
- `array returns [LogicalExpression value]`
- `qualified_identifier returns [LogicalExpression value]`
- `literal returns [ValueExpression value]`

It also has these lexer rules: `NULL`, `TRUE`, `FALSE`, `INTEGER`, `FLOAT`, `STRING`, `DATETIME`, `E : ('E'|'e') ('+'|'-')? DIGIT+`, `ID`, the fragments `IDStart`, `IDPart`, `IDUNICODE`, `LETTER`, `DIGIT`, `EscapeSequence`, `HexDigit`, and `UnicodeEscape`, `LINE_COMMENT`, and a rule commented `/* Ignore white spaces */`.

And here is my input for testing performance:

`1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1`

Parsing it took practically forever. If I change it to

`(1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1) and (1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1 and 1==1)`

then it parses in milliseconds.
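To reproduce this comparison at other sizes, the test inputs can be generated instead of typed by hand. For example, this helps when checking that inputs many times larger than the original parse quickly. The helper below is a hypothetical sketch that only builds the input strings. Feeding them to the generated parser and timing the result is left to your own harness, because the grammar's rule bodies are not shown here.

```java
import java.util.Collections;

/** Hypothetical helper that builds benchmark inputs shaped like the ones in this thread. */
final class ExprInputs {

    private ExprInputs() {}

    /** n copies of "1==1" joined by " and ", with no grouping (the slow form above). */
    static String flat(int n) {
        return String.join(" and ", Collections.nCopies(n, "1==1"));
    }

    /** `groups` parenthesized groups of `perGroup` terms each, joined by " and " (the fast form above). */
    static String grouped(int groups, int perGroup) {
        return String.join(" and ", Collections.nCopies(groups, "(" + flat(perGroup) + ")"));
    }
}
```

Holding the total number of `1==1` terms fixed while switching between `flat` and `grouped` isolates the effect of the grouping on parse time.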
Wait, so both the small one you originally posted and this one don't work? If the small one exhibits the problem, why post the big one? I'm just trying to figure out which grammar to focus on.
This is fixed by #400. It now takes a fraction of a second to parse expressions 10x the size of the one in your original example.
Add regression test for expression performance (closes #192)
See: antlr/antlr4#192 (comment)
Co-authored-by: Evgeniy
…rser

### What changes were proposed in this pull request?

This PR follows antlr/antlr4#192 (comment) to correct the current implementation of the **two-stage parsing strategy** in `AbstractSqlParser`.

### Why are the changes needed?

This is likely a long-standing issue. Before [SPARK-38385](https://issues.apache.org/jira/browse/SPARK-38385), Spark used `DefaultErrorStrategy`. After [SPARK-38385](https://issues.apache.org/jira/browse/SPARK-38385), Spark uses `class SparkParserErrorStrategy() extends DefaultErrorStrategy`. This is not a correct implementation of the "two-stage parsing strategy", as mentioned in antlr/antlr4#192 (comment):

> You can save a great deal of time on correct inputs by using a two-stage parsing strategy.
>
> 1. Attempt to parse the input using BailErrorStrategy and PredictionMode.SLL.
>    If no exception is thrown, you know the answer is correct.
> 2. If a ParseCancellationException is thrown, retry the parse using the default
>    settings (DefaultErrorStrategy and PredictionMode.LL).

### Does this PR introduce _any_ user-facing change?

Yes. The Spark SQL parser becomes more powerful: SQL such as `SELECT 1 UNION SELECT 2` parses successfully after this change.

### How was this patch tested?

A new UT is added.

Closes #40835 from pan3793/SPARK-42552.

Authored-by: Cheng Pan
Signed-off-by: Wenchen Fan
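The key requirement in the quoted steps is that stage 1 must signal failure with the exception the retry is keyed on. The sketch below is a dependency-free model under that assumption. It is not Spark's code, and every name in it is hypothetical. `Cancelled` stands in for `ParseCancellationException`. `SyntaxError` stands in for an error raised by some other reporting path, such as a listener that throws. The model shows that only the bailing variant reaches the full-LL retry, so an input that SLL rejects but LL accepts is otherwise reported as an error.

```java
import java.util.function.Supplier;

/** Hypothetical model: the LL retry only runs if stage 1 fails with the expected signal. */
final class FallbackDemo {

    /** Stand-in for ParseCancellationException, the exception the retry catches. */
    static final class Cancelled extends RuntimeException {}

    /** Stand-in for an error raised through a different path (e.g. a throwing listener). */
    static final class SyntaxError extends RuntimeException {
        SyntaxError(String message) {
            super(message);
        }
    }

    private FallbackDemo() {}

    /** Retry with stage 2 only when stage 1 throws Cancelled; anything else propagates. */
    static String parse(Supplier<String> stage1, Supplier<String> stage2) {
        try {
            return stage1.get();
        } catch (Cancelled e) {
            return stage2.get();
        }
    }

    /** Models SLL + BailErrorStrategy failing on input that needs full LL. */
    static String bailingStage1() {
        throw new Cancelled();
    }

    /** Models a stage 1 that reports the same failure some other way. */
    static String reportingStage1() {
        throw new SyntaxError("error reported during stage 1");
    }

    /** Models full LL prediction succeeding on the same input. */
    static String llStage2() {
        return "parsed by LL";
    }
}
```

With `bailingStage1` the call returns the LL result. With `reportingStage1` the `SyntaxError` escapes and stage 2 never runs.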
Using ANTLR 4.0, I've run into a performance problem with parsing a simple expression language. Here is the grammar:
And here is the sample file I'm trying to parse:
And just in case, here is how I am calling everything:
On my machine it takes just over 14 minutes to parse the given expression.