Using a "skip" token in parser rule causes major performance problems in the Go target parser #3875
Comments
Just in case I'm doing something wrong, I tested 4.11.1 on the SimpleBoolean issue that was reported. Indeed, 4.11.1 does fix that problem (fast!), but asm8080 now takes 376s for cpm22.asm. |
I did a side-by-side comparison of the debug output from ParserATNSimulator between CSharp and Go targets for asm8080/cpm22.asm. The outputs are essentially identical which means they are computing the same results. But, clearly, the Go implementation is much much slower. |
I will take a look. There is still an issue with way too many allocations in parse_context. I know what it is (failure to 'emulate' Java objects), but I felt it was too much to try and cram in as a fix at the last minute: a "programmer checks in code and then gets on a plane" type of situation ;) There was one fix that made it into the non-v4 version but did not get into the v4 version (I thought they were supposed to be identical). I have submitted a PR for that, and it is awaiting Ter's approval. But I doubt it is that, because that fixes something that just did not work before. Tomorrow (public holiday here), I will take a look at those grammars and work out what is going on. I suspect it is the allocation problem, but it could be a hidden bug that I have not found yet. I am surprised that they do not terminate (or did they just need more time to run?). Thanks for the heads up. This code is taking some time to wrangle into shape. |
BTW - I get this when generating, which is a correct warning:
ANTLR4 will deal with this I think, but that could be refactored to eliminate that warning. This may have something to do with it of course. I will not correct it until I understand why the go runtime seems to hate this grammar ;) |
@jimidle Sorry, the asm8080/cpm22.asm parse does terminate. It takes about 380 seconds. The asm8080 grammar "prog" rule should be changed to something like this:
(A "prog" can be an empty file, or a file with a bunch of EOLs, or a file with a bunch of "line" that can end with EOL or just EOF. This rule is a royal PITA. The Z80 grammar probably has a similar problem.) But, even with the change, it still takes 380 seconds. I noticed that my driver still contained the old "case folding buffer stream" code for Go with a reference to "github.com/antlr/antlr4/runtime/Go/antlr", not "github.com/antlr/antlr4/runtime/Go/antlr/v4"--didn't do a find/sed. So, it had both "github.com/antlr/antlr4/runtime/Go/antlr" and "github.com/antlr/antlr4/runtime/Go/antlr/v4" in the last build on grammars-v4. But, I changed that locally, and the parse still takes 380s. I'll look at the grammar itself and see if there's a workaround. |
Fixed these rules, now works fast.
|
I did suspect the grammar, but one of the central tenets of ANTLR4 was that
it should more or less deal with anything. So I will use the original
grammar to find out what is balking the go runtime. Whatever it is, will
likely be a benefit to all grammars when I fix it.
…On Fri, Sep 9, 2022 at 20:49 Ken Domino ***@***.***> wrote:
Fixed these rules, now works fast.
prog : EOL* ((line EOL)* line EOL*)? EOF ;
line : lbl? (instruction | directive) comment? | lbl comment? | comment ;
COMMENT : ';' ~ [\r\n]* ;
|
@jimidle Thanks, sounds good. I'll work on fixing up the grammars over in grammars-v4/asm. We should also add to the "to do" list a check in the Antlr tool that flags a grammar if a parser rule uses a "skip" or off-channel symbol. |
As the grammar was written, it causes a full-context parse for every line in the file, and this exposes some problems in the Go runtime. While this grammar needs quite a bit of work to be "real", it has proven useful as an indicator of problems. Sometimes (no offence meant to the contributor) poorly put together grammars can be useful. We might want to keep a copy of that grammar as it is, somewhere other than where it is about to be corrected, so that it can be used as a regression test. If there is no general place to do this, then I will add it as regression test data in the go directory (Go will not download that when the module is imported; well, at least it won't once we get the tags correct and stop adding v4.x.x tags). |
For what it is worth, the "corrected" grammar, which I would write as shown below, takes 0.31 seconds on my system vs 0.37 for the unchanged original grammar. Actually, I would probably take out the embedded literals in the parser grammar as well.
grammar asm8080;
prog
: line* EOF
;
line
: lbl? (instruction | directive)? EOL
;
instruction
: opcode expressionlist?
;
opcode
: OPCODE
;
register_
: REGISTER
;
directive
: argument? assemblerdirective expressionlist
;
assemblerdirective
: ASSEMBLER_DIRECTIVE
;
lbl
: label ':'?
;
expressionlist
: expression (',' expression)*
;
label
: name
;
expression
: multiplyingExpression (('+' | '-') multiplyingExpression)*
;
multiplyingExpression
: argument (('*' | '/') argument)*
;
argument
: number
| register_
| dollar
| name
| string_
| ('(' expression ')')
;
dollar
: '$'
;
string_
: STRING
;
name
: NAME
;
number
: NUMBER
;
ASSEMBLER_DIRECTIVE
: (O R G) | (E N D) | (E Q U) | (D B) | (D W) | (D S) | (I F) | (E N D I F) | (S E T)
;
REGISTER
: 'A' | 'B' | 'C' | 'D' | 'E' | 'H' | 'L' | 'PC' | 'SP'
;
OPCODE
: (M O V) | (M V I) | (L D A) | (S T A) | (L D A X) | (S T A X) | (L H L D) | (S H L D) | (L X I) | (P U S H) | (P O P) | (X T H L) | (S P H L) | (P C H L) | (X C H G) | (A D D) | (S U B) | (I N R) | (D C R) | (C M P) | (A N A) | (O R A) | (X R A) | (A D I) | (S U I) | (C P I) | (A N I) | (O R I) | (X R I) | (D A A) | (A D C) | (A C I) | (S B B) | (S B I) | (D A D) | (I N X) | (D C X) | (J M P) | (C A L L) | (R E T) | (R A L) | (R A R) | (R L C) | (R R C) | (I N) | (O U T) | (C M C) | (S T C) | (C M A) | (H L T) | (N O P) | (D I) | (E I) | (R S T) | (J N Z) | (J Z) | (J N C) | (J C) | (J P O) | (J P E) | (J P) | (J M) | (C N Z) | (C Z) | (C N C) | (C C) | (C P O) | (C P E) | (C P) | (C M) | (R N Z) | (R Z) | (R N C) | (R C) | (R P O) | (R P E) | (R P) | (R M)
;
fragment A
: ('a' | 'A')
;
fragment B
: ('b' | 'B')
;
fragment C
: ('c' | 'C')
;
fragment D
: ('d' | 'D')
;
fragment E
: ('e' | 'E')
;
fragment F
: ('f' | 'F')
;
fragment G
: ('g' | 'G')
;
fragment H
: ('h' | 'H')
;
fragment I
: ('i' | 'I')
;
fragment J
: ('j' | 'J')
;
fragment K
: ('k' | 'K')
;
fragment L
: ('l' | 'L')
;
fragment M
: ('m' | 'M')
;
fragment N
: ('n' | 'N')
;
fragment O
: ('o' | 'O')
;
fragment P
: ('p' | 'P')
;
fragment Q
: ('q' | 'Q')
;
fragment R
: ('r' | 'R')
;
fragment S
: ('s' | 'S')
;
fragment T
: ('t' | 'T')
;
fragment U
: ('u' | 'U')
;
fragment V
: ('v' | 'V')
;
fragment W
: ('w' | 'W')
;
fragment X
: ('x' | 'X')
;
fragment Y
: ('y' | 'Y')
;
fragment Z
: ('z' | 'Z')
;
NAME
: [a-zA-Z] [a-zA-Z0-9."]*
;
NUMBER
: '$'? [0-9a-fA-F] + ('H' | 'h')?
;
COMMENT
: ';' ~ [\r\n]* -> skip
;
STRING
: '\u0027' ~'\u0027'* '\u0027'
;
EOL
: [\r\n]
;
WS
: [ \t] -> skip
; |
Excellent. I'll add your note to the "to do list". |
I think it would be good to have a test that involves full-context parsing all the time, @jimidle. Probably good for the other targets to check this as well. |
Agreed. I will put something together in the next few days.
|
awesome! |
I had not realized we had not merged that one - I will still come back to
this and check it out again though.
|
Sorry, it's actually still there in the current "dev" branch; I tested the wrong thing. The problem is that the grammar references COMMENT on the RHS of a parser rule, and COMMENT is sent to "skip". That's a bad grammar. I applied a workaround to the grammar so that it does not reference "skipped" tokens. (Note: I wrote an XPath grep expression to find grammars that do this. CSharp and Java terminate fine with the bad ol' grammar; Go doesn't work well.) The grammar to test is:
|
hi @kaby76 What's the warning you get? Seems like the tool not the target should give the warning maybe. |
@parrt Sorry, I guess I wasn't clear. The problem is that the asm8080 grammar used a lexer symbol that was marked "skip" on the right-hand side of a parser rule.
The symbol won't appear on the token stream because "skip" tokens are never placed on the token stream. Even "channel(HIDDEN)" tokens seem dubious to use on the RHS of a parser rule. I don't think the Antlr4 Tool .jar checks for this, right? |
I can see adding a warning for this since we can detect it. I don't think we warn about it now. It seems like there should be a syntax error or something from the parser, though, if the input is not matching because somebody's trying to match one of those hidden tokens. |
Yes, but the parser works if the reference is optional, at least for Java or CSharp.
|
Correct. I consider these errors very useful, since I've already met several users who encountered the problem with nonexistent tokens.
It can be resolved at the grammar level. |
In any case, I'm not sure why Go would have such a fit with a grammar like this. |
If there is a simple Go binary that's taking too long, there are a number of Go performance geeks that might enjoy taking a crack at it (if you think the issue is specific to the Go side of things and not the antlr side of things). |
I am gradually knocking off the performance issues. It’s not as simple as
just pure performance. One has to analyze the behavior in totality.
|
As a side note, the arena proposal to be implemented in Go v1.20 may help to improve the runtime performance. |
I'd like to see a consistent set of profiles indicating that allocations / garbage collection are the biggest issue before migrating to something like the proposed arena package. |
At the moment, I have some prime examples of over allocation because of
inferior code, and will soon get rid of them. But it really is that I just
need to improve the old code, which includes some reorg of the simulated
inheritance (which is causing a lot of unnecessary allocations).
Then it’s profiling iteration. However, the runtime is way faster than it
was at the start of this old thread. It seems that perhaps it’s time to
close this thread and start another with the state of things as they are,
which is not too bad. I will then use the existing grammars we have to
identify any other areas that can be improved. Bit of a process here.
|
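This bug still exists for the current dev branch of the Go runtime (the version of the grammar is this: https://github.com/antlr/grammars-v4/tree/1ad14f75ee75f375b58c38c7b2cfeb62f26a4960/asm/asm8080). (I just checked this.) Are there PRs for the Go-target outstanding that I should be testing? I also tested this grammar with the current dev branch of the tool. There is no warning message outputted for using a skip lexer symbol in a parser rule. (It does produce warning(154): asm8080.g4:38:0: rule prog contains an optional block with at least one alternative that can match an empty string, but it is unrelated to the problem.) So, please do not close this Issue. I'm changing the name of the Issue to reflect the actual problem: using a "skip" token in a parser rule causes the Go-target parser to parse extremely slowly. |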
I’m on vacation at the moment, Ken. I’ve only got an iPad. But when I get
back I’ll open some individual tickets (or you can if you have time). Does
that bug belong on this ticket? I won’t close anything until I have a
computer where I can see the wood for the trees. I just feel that we have
one “performance” ticket but there’s more than one issue.
…On Thu, Dec 8, 2022 at 17:47 Ken Domino ***@***.***> wrote:
It seems that perhaps it’s time to
close this thread and start another with the state of things as they are,
which is not too bad.
This bug still exists for the current dev branch of the Go runtime (the
version of the grammar is this
<https://github.com/antlr/grammars-v4/tree/1ad14f75ee75f375b58c38c7b2cfeb62f26a4960/asm/asm8080>).
(I just checked this.) Are there PRs for the Go-target outstanding that I
should be testing?
I also tested this grammar with the current dev branch of the tool. There
is no warning message outputted for using a skip lexer symbol in a parser
rule. (It does produce warning(154): asm8080.g4:38:0: rule prog contains
an optional block with at least one alternative that can match an empty
string, but it is unrelated to the problem.)
So, please do not close this Issue.
|
@kaby76 Hi Ken - I am reviewing all the open tickets with the Go runtime. Using the grammar at the hash in your link above, with the dev runtime for Go and the dev tool build, I get this:
~/tmp/asm8080 (develop ✘)✹✚✭ ᐅ time ./asm8080 ./CPM22.ASM
./asm8080 ./CPM22.ASM 0.13s user 0.02s system 163% cpu 0.092 total
So, I think we can definitely say that this is fixed. Please verify for yourself if you wish to and let us know here, then mark it as closable, or we can take my word for it and ask @parrt to close it. I am fine either way. |
As an extra bonus, if you switch the driver into SLL mode, then the results from hyperfine are:
And from time:
~/tmp/asm8080 (develop ✘)✹✚✭ ᐅ time ./asm8080 ./CPM22.ASM
./asm8080 ./CPM22.ASM 0.04s user 0.01s system 107% cpu 0.044 total |
Closing. Thanks. |
I am updating grammars-v4 for 4.11.1. Unfortunately, the asm8080 and asmZ80 grammars now do not terminate (actually, they are killed after 5 minutes by the watchdog program trwdog). With 4.10.1, they terminated. Neither of these grammars contains target-specific code, and both are really, really simple.