Using a "skip" token in parser rule causes major performance problems in the Go target parser #3875

kaby76 · 2022-09-08T10:47:15Z

I am updating grammars-v4 for 4.11.1. Unfortunately, the asm8080 and asmZ80 grammars now do not terminate (actually killed after 5 minutes by the watchdog program trwdog). For 4.10.1, they terminated. Neither of these grammars contain target-specific code, and are really really simple.

kaby76 · 2022-09-09T00:19:34Z

Just in case I'm doing something wrong, I tested 4.11.1 on the SimpleBoolean issue that was reported. Indeed, 4.11.1 does fix that problem (fast!), but asm8080 now takes 376s for cpm22.asm.

kaby76 · 2022-09-09T00:41:59Z

I did a side-by-side comparison of the debug output from ParserATNSimulator between CSharp and Go targets for asm8080/cpm22.asm. The outputs are essentially identical which means they are computing the same results. But, clearly, the Go implementation is much much slower.

jimidle · 2022-09-09T09:25:52Z

I will take a look. There is still an issue with way too many allocations in parse_context. I know what it is (failure to 'emulate' Java objects), but I felt it was too much to try and cram in as a fix at the last minute- programmer checks in code and then gets on a plane type situation ;)

There was one fix that made it to what is the non-v4 version that did not get in to the v4 version (thought they were supposed to be identical). I have submitted a PR for that and is awaiting Ter's approval. But I doubt it is that because that fixes something that just did not work before.

Tomorrow (public holiday here), I will take a look at those grammars and work out what is going on - I suspect it is the allocation problem, but could be a hidden bug that I did not find yet.

I am surprised that they do not terminate (or did they just need more time to run?).

Thanks for the heads up. This code is taking some time to wrangle in to shape.

jimidle · 2022-09-09T09:33:37Z

BTW - I get this when generating, which is a correct warning:

warning(154): asm8080.g4:38:0: rule prog contains an optional block with at least one alternative that can match an empty string

ANTLR4 will deal with this I think, but that could be refactored to eliminate that warning. This may have something to do with it of course. I will not correct it until I understand why the go runtime seems to hate this grammar ;)

kaby76 · 2022-09-09T11:52:47Z

@jimidle Sorry, the asm8080/cpm22.asm parse does terminate. It takes about 380 seconds.

The asm8080 grammar "prog" rule should be changed to something like this:

prog : ( EOL* | (line EOL)* line EOL*) EOF ;

(A "prog" can be an empty file, or a file with a bunch of EOLs, or a file with a bunch of "line" that can end with EOL or just EOF. This rule is a royal PITA. The Z80 grammar probably has a similar problem.)

But, even with the change, it still takes 380 seconds.

I noticed that my driver still contained the old "case folding buffer stream" code for Go with a reference to "github.com/antlr/antlr4/runtime/Go/antlr", not "github.com/antlr/antlr4/runtime/Go/antlr/v4"--didn't do a find/sed. So, it had both "github.com/antlr/antlr4/runtime/Go/antlr" and "github.com/antlr/antlr4/runtime/Go/antlr/v4" in the last build on grammars-v4. But, I changed that locally, and the parse still takes 380s.

I'll look at the grammar itself and see if there's a workaround.

kaby76 · 2022-09-09T12:16:48Z

@jimidle My God this is such a bad grammar. It's almost wonderful in its "badness".

comment can't derive a default channel string.

line : lbl? (instruction | directive)? comment?;
comment : COMMENT;
COMMENT : ';' ~ [\r\n]* -> skip;

Antlr should flag this. @parrt

kaby76 · 2022-09-09T12:49:33Z

Fixed these rules, now works fast.

prog  : EOL* ((line EOL)* line EOL*)? EOF ;
line : lbl? (instruction | directive) comment? | lbl comment? | comment ;
COMMENT : ';' ~ [\r\n]* ;

jimidle · 2022-09-09T13:58:34Z

I did suspect the grammar, but one of the central tenets of ANTLR4 was that it should more or less deal with anything. So I will use the original grammar to find out what is balking the go runtime. Whatever it is, will likely be a benefit to all grammars when I fix it.

…

On Fri, Sep 9, 2022 at 20:49 Ken Domino ***@***.***> wrote: Fixed these rules, now works fast. prog : EOL* ((line EOL)* line EOL*)? EOF ; line : lbl? (instruction | directive) comment? | lbl comment? | comment ; COMMENT : ';' ~ [\r\n]* ; — Reply to this email directly, view it on GitHub <#3875 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAJ7TMFT73I32T6JQBQOEW3V5MW6RANCNFSM6AAAAAAQHTMF7U> . You are receiving this because you were mentioned.Message ID: ***@***.***>

kaby76 · 2022-09-09T14:02:59Z

@jimidle Thanks sounds good. I'll work on fixing up the grammars over in grammars-v4/asm.

The should also add to the "to do" a check that Antlr tool does to flag a grammar if a parser rule uses a "skip" or off-channel symbol.

KvanTTT · 2022-09-09T18:18:46Z

@kaby76 related issue for skip tokens: #2886 (but with using of FRAGMENT instead of skip tokens). Unfortunately, unlikely it will be fixed because my fixing request was closed: #3018

jimidle · 2022-09-10T04:03:15Z

As the grammar was written, it causes an full context parse for every line in the file, and this exposes some problems in the go runtime there. While this grammar needs quite a bit of work to be "real", it has proven useful as an indicator of problems. Sometimes (no offence meant to the contributor) poorly put together grammars can be useful. We might want to keep that grammar as it is somewhere outside where it will currently be corrected, in order that it can be used as a regression test. If there is no general place to do this, then I will add it as regression test data in the go directory (Go will not download that when the module is imported - well, at least it won't when we get the tags correct and stop adding v4.x.x tags).

jimidle · 2022-09-10T08:30:28Z

For what it is worth, the "corrected" grammar, which I would write like this below takes 0.31 seconds on my system vs 0.37 for the unchanged original grammar. Actually I would probably take out the embedded literals in the parser grammar as well.

grammar asm8080;

prog
   : line* EOF
   ;

line
   : lbl? (instruction | directive)? EOL
   ;

instruction
   : opcode expressionlist?
   ;

opcode
   : OPCODE
   ;

register_
   : REGISTER
   ;

directive
   : argument? assemblerdirective expressionlist
   ;

assemblerdirective
   : ASSEMBLER_DIRECTIVE
   ;

lbl
   : label ':'?
   ;

expressionlist
   : expression (',' expression)*
   ;

label
   : name
   ;

expression
   : multiplyingExpression (('+' | '-') multiplyingExpression)*
   ;

multiplyingExpression
   : argument (('*' | '/') argument)*
   ;

argument
   : number
   | register_
   | dollar
   | name
   | string_
   | ('(' expression ')')
   ;

dollar
   : '$'
   ;

string_
   : STRING
   ;

name
   : NAME
   ;

number
   : NUMBER
   ;

ASSEMBLER_DIRECTIVE
   : (O R G) | (E N D) | (E Q U) | (D B) | (D W) | (D S) | (I F) | (E N D I F) | (S E T)
   ;


REGISTER
   : 'A' | 'B' | 'C' | 'D' | 'E' | 'H' | 'L' | 'PC' | 'SP'
   ;


OPCODE
   : (M O V) | (M V I) | (L D A) | (S T A) | (L D A X) | (S T A X) | (L H L D) | (S H L D) | (L X I) | (P U S H) | (P O P) | (X T H L) | (S P H L) | (P C H L) | (X C H G) | (A D D) | (S U B) | (I N R) | (D C R) | (C M P) | (A N A) | (O R A) | (X R A) | (A D I) | (S U I) | (C P I) | (A N I) | (O R I) | (X R I) | (D A A) | (A D C) | (A C I) | (S B B) | (S B I) | (D A D) | (I N X) | (D C X) | (J M P) | (C A L L) | (R E T) | (R A L) | (R A R) | (R L C) | (R R C) | (I N) | (O U T) | (C M C) | (S T C) | (C M A) | (H L T) | (N O P) | (D I) | (E I) | (R S T) | (J N Z) | (J Z) | (J N C) | (J C) | (J P O) | (J P E) | (J P) | (J M) | (C N Z) | (C Z) | (C N C) | (C C) | (C P O) | (C P E) | (C P) | (C M) | (R N Z) | (R Z) | (R N C) | (R C) | (R P O) | (R P E) | (R P) | (R M)
   ;


fragment A
   : ('a' | 'A')
   ;


fragment B
   : ('b' | 'B')
   ;


fragment C
   : ('c' | 'C')
   ;


fragment D
   : ('d' | 'D')
   ;


fragment E
   : ('e' | 'E')
   ;


fragment F
   : ('f' | 'F')
   ;


fragment G
   : ('g' | 'G')
   ;


fragment H
   : ('h' | 'H')
   ;


fragment I
   : ('i' | 'I')
   ;


fragment J
   : ('j' | 'J')
   ;


fragment K
   : ('k' | 'K')
   ;


fragment L
   : ('l' | 'L')
   ;


fragment M
   : ('m' | 'M')
   ;


fragment N
   : ('n' | 'N')
   ;


fragment O
   : ('o' | 'O')
   ;


fragment P
   : ('p' | 'P')
   ;


fragment Q
   : ('q' | 'Q')
   ;


fragment R
   : ('r' | 'R')
   ;


fragment S
   : ('s' | 'S')
   ;


fragment T
   : ('t' | 'T')
   ;


fragment U
   : ('u' | 'U')
   ;


fragment V
   : ('v' | 'V')
   ;


fragment W
   : ('w' | 'W')
   ;


fragment X
   : ('x' | 'X')
   ;


fragment Y
   : ('y' | 'Y')
   ;


fragment Z
   : ('z' | 'Z')
   ;


NAME
   : [a-zA-Z] [a-zA-Z0-9."]*
   ;


NUMBER
   : '$'? [0-9a-fA-F] + ('H' | 'h')?
   ;


COMMENT
   : ';' ~ [\r\n]* -> skip
   ;


STRING
   : '\u0027' ~'\u0027'* '\u0027'
   ;


EOL
   : [\r\n]
   ;


WS
   : [ \t] -> skip
   ;

kaby76 · 2022-09-10T11:12:24Z

Excellent. I'll add your note to the "to do list".

parrt · 2022-09-15T23:56:26Z

I think it would be good to have a test that involved for context parsing all the time, @jimidle. Probably good for the other targets to check this as well.

jimidle · 2022-09-16T04:43:47Z

Agreed. I will put something together in the next few days.

…

On Fri, Sep 16, 2022 at 07:56 Terence Parr ***@***.***> wrote: I think it would be good to have a test that involved for context parsing all the time, @jimidle <https://github.com/jimidle>. Probably good for the other targets to check this as well. — Reply to this email directly, view it on GitHub <#3875 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAJ7TMETPGB6PVG22Q2YMSDV6OZTJANCNFSM6AAAAAAQHTMF7U> . You are receiving this because you were mentioned.Message ID: ***@***.***>

parrt · 2022-11-20T01:18:34Z

@kaby76 I merged #3880 so maybe try the latest dev branch?

kaby76 · 2022-11-20T15:28:45Z

@kaby76 I merged #3880 so maybe try the latest dev branch?

This works great. The Go target is roughly 2x the CSharp parse time for asm8080 and asmZ80 examples.

parrt · 2022-11-20T18:29:39Z

awesome!

jimidle · 2022-11-21T04:42:27Z

I had not realized we had not merged that one - I will still come back to this and check it out again though.

…

On Nov 21, 2022 at 2:29:50 AM, Terence Parr ***@***.***> wrote: awesome! — Reply to this email directly, view it on GitHub <#3875 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAJ7TMEWBUSFB3KSUUSWPNDWJJUZ5ANCNFSM6AAAAAAQHTMF7U> . You are receiving this because you were mentioned.Message ID: ***@***.***>

kaby76 · 2022-11-21T11:10:28Z

Sorry, it's actually still there in the current "dev" branch--tested the wrong thing.

The problem is that the grammar references COMMENT on the RHS of a parser rule, and COMMENT is sent to "skip". That's a bad grammar. I applied a workaround to the grammar to not reference "skipped" tokens. (Note, I wrote an xpath grep expression to find grammars that do this, for $i in ( //parserRuleSpec//TOKEN_REF[text()=doc("*")//lexerRuleSpec[./lexerRuleBlock/lexerAltList/lexerAlt/lexerCommands/lexerCommand/lexerCommandName/identifier/RULE_REF/text()="skip"]/TOKEN_REF/text()]) return concat("line ", $i/@Line, " col ", $i/@Column, " """, $i/@Text,""""), and fixed it throughout the grammars-v4 repo.) But, the problem with the Go runtime still remains. But, I also haven't checked whether the "dev" branch of the Antlr Tool outputs a warning for this bad grammar.

CSharp and Java terminate fine with the bad ol' grammar, Go doesn't work well. The grammar to test is:

git clone https://github.com/antlr/grammars-v4.git
cd grammars-v4/asm/asm8080
git checkout b5bb1059db7534ff5a7d18f40b900d2023370ce9

parrt · 2022-11-21T21:16:43Z

hi @kaby76 What's the warning you get? Seems like the tool not the target should give the warning maybe.

kaby76 · 2022-11-21T21:43:32Z

@parrt Sorry, I guess I'm not clear. The problem is that asm8080 grammar used a lexer symbol that was "skip" on the right-hand side of a parser rule.

foo : BAR EOF;
BAR : 'foo' -> skip;
WS: [ \n\r\t]+ -> skip;

The symbol won't appear on the token stream because "skip" are never on the token stream. Even "channel(HIDDEN)" tokens seem dubious to use on the RHS of a parser rule. I don't think the Antlr4 Tool .jar checks for this, right?

parrt · 2022-11-21T21:51:43Z

I can see adding a warning for this since we can detect it. I don't think we weren't about it now.

Seems like there should be a syntax error or something from the parser though if the input is not matching because somebody's trying to match one of those hidden tokens.

kaby76 · 2022-11-21T22:03:35Z

Yes, but the parser works if optional--at least for Java or CSharp.

foo: BAR? EOF;

KvanTTT · 2022-11-21T22:10:28Z

I don't think the Antlr4 Tool .jar checks for this, right?

Correct. I consider these errors are very usefull since I've already met several users that encoutered the problem with not existing tokens.

Seems like there should be a syntax error or something from the parser though if the input is not matching because somebody's trying to match one of those hidden tokens.

It can be resolved on the grammar level.

kaby76 · 2022-11-21T22:26:59Z

In any case, I'm not sure why Go would have such a fit with a grammar like this.

dgryski · 2022-11-24T22:13:14Z

If there is a simple Go binary that's taking too long, there are a number of Go performance geeks that might enjoy takng a crack at it (if you think the issue is specific to the Go side of things and not the antlr side of things).

jimidle · 2022-11-25T03:52:18Z

I am gradually knocking off the performance issues. It’s not as simple as just pure performance. One has to analyze the behavior in totality.

…

On Fri, Nov 25, 2022 at 06:13 Damian Gryski ***@***.***> wrote: If there is a simple Go binary that's taking too long, there are a number of Go performance geeks that might enjoy takng a crack at it (if you think the issue is specific to the Go side of things and not the antlr side of things). — Reply to this email directly, view it on GitHub <#3875 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAJ7TMBA66MTULOD3ZRYZJ3WJ7SANANCNFSM6AAAAAAQHTMF7U> . You are receiving this because you were mentioned.Message ID: ***@***.***>

wdscxsj · 2022-12-08T04:36:35Z

As a side note, the arena proposal to be implemented in Go v1.20 may help to improve the runtime performance.

dgryski · 2022-12-08T05:08:12Z

I'd like to see a consistent set of profiles indicating that allocations / garbage collection are the biggest issue before migrating to something like the proposed arena package.

jimidle · 2022-12-08T05:13:46Z

At the moment, I have some prime examples of over allocation because of inferior code, and will soon get rid of them. But it really is that I just need to improve the old code, which includes some reorg of the simulated inheritance (which is causing a lot of unnecessary allocations). Then it’s profiling iteration. However, the runtime is way faster than it was at the start of this old thread. It seems that perhaps it’s time to close this thread and start another with the state of things as they are, which is not too bad. I will then use the existing grammars we have to identify any other areas that can be improved. Bit of a process here.

…

On Thu, Dec 8, 2022 at 09:08 Damian Gryski ***@***.***> wrote: I'd like to see a consistent set of profiles indicating that allocations / garbage collection are the biggest issue before migrating to something like the proposed arena package. — Reply to this email directly, view it on GitHub <#3875 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAJ7TMHC5QSEOAB4DZARL6TWMFUMPANCNFSM6AAAAAAQHTMF7U> . You are receiving this because you were mentioned.Message ID: ***@***.***>

kaby76 · 2022-12-08T13:47:31Z

It seems that perhaps it’s time to
close this thread and start another with the state of things as they are,
which is not too bad.

This bug still exists for the current dev branch of the Go runtime (the version of the grammar is this). (I just checked this.) Are there PRs for the Go-target outstanding that I should be testing?

I also tested this grammar with the current dev branch of the tool. There is no warning message outputted for using a skip lexer symbol in a parser rule. (It does produce warning(154): asm8080.g4:38:0: rule prog contains an optional block with at least one alternative that can match an empty string, but it is unrelated to the problem.)

So, please do not close this Issue. I'm changing the name of the Issue to reflect the actual problem: using a "skip" token in a parser rule causes the Go-target parser to parse extremely slowly.

jimidle · 2022-12-09T02:45:44Z

I’m on vacation at the moment ken. I’ve only got an iPad. But when I get back I’ll open some individual tickets (or you can if you have time). Does that bug belong on this ticket? I won’t close anything until I have a computer where I can see the wood for the trees. I just feel that we have one “performance “ ticket but there’s more than one issue.

…

On Thu, Dec 8, 2022 at 17:47 Ken Domino ***@***.***> wrote: It seems that perhaps it’s time to close this thread and start another with the state of things as they are, which is not too bad. This bug still exists for the current dev branch of the Go runtime (the version of the grammar is this <https://github.com/antlr/grammars-v4/tree/1ad14f75ee75f375b58c38c7b2cfeb62f26a4960/asm/asm8080>). (I just checked this.) Are there PRs for the Go-target outstanding that I should be testing? I also tested this grammar with the current dev branch of the tool. There is no warning message outputted for using a skip lexer symbol in a parser rule. (It does produce warning(154): asm8080.g4:38:0: rule prog contains an optional block with at least one alternative that can match an empty string, but it is unrelated to the problem.) So, please do not close this Issue. — Reply to this email directly, view it on GitHub <#3875 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAJ7TMDK4WTTVPWHQFV6RZTWMHRH5ANCNFSM6AAAAAAQHTMF7U> . You are receiving this because you were mentioned.Message ID: ***@***.***>

jimidle · 2023-05-04T03:37:59Z

@kaby76 Hi Ken - I am reviewing all the open tickets with the Go runtime.

Using the grammar at the hash in your link above, with the dev runtime for go and the dev tool build, I get this:

~/tmp/asm8080 (develop ✘)✹✚✭ ᐅ time ./asm8080 ./CPM22.ASM                      
./asm8080 ./CPM22.ASM  0.13s user 0.02s system 163% cpu 0.092 total

So, I think we can definitely say that this is fixed. Please verify for yourself if you wish to and let us know here, then mark it as closable, or we can take my word for it and ask @parrt to close it. I am fine either way.

jimidle · 2023-05-04T07:15:58Z

As an extra bonus, if you switch the driver in to SLL mode, then the results from hyperfine are:

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./asm8080 ./CPM22.ASM`	35.0 ± 0.4	34.2	36.5	1.00

And from time:

~/tmp/asm8080 (develop ✘)✹✚✭ ᐅ time ./asm8080 ./CPM22.ASM                    
./asm8080 ./CPM22.ASM  0.04s user 0.01s system 107% cpu 0.044 total

kaby76 · 2023-05-04T08:30:43Z

Closing. Thanks.

kaby76 changed the title ~~Go performance for 4.11.1 now a problem for some grammars~~ Yes...Terrible Golang performance for 4.11.1 Sep 9, 2022

kaby76 mentioned this issue Sep 9, 2022

XPath expressions should go across trees kaby76/Trash#175

Open

jimidle mentioned this issue Sep 10, 2022

fix: #3758 Extrememly poor performance on poorly written grammars #3880

Merged

kaby76 mentioned this issue Sep 10, 2022

Fix asm grammars antlr/grammars-v4#2808

Open

kaby76 closed this as completed Nov 20, 2022

kaby76 reopened this Nov 21, 2022

kaby76 changed the title ~~Yes...Terrible Golang performance for 4.11.1~~ Using a "skip" token in parser rule causes major performance problems in the Go target parser Dec 8, 2022

kaby76 closed this as completed May 4, 2023

Using a "skip" token in parser rule causes major performance problems in the Go target parser #3875

Using a "skip" token in parser rule causes major performance problems in the Go target parser #3875

Comments

kaby76 commented Sep 8, 2022 • edited Loading

kaby76 commented Sep 9, 2022

kaby76 commented Sep 9, 2022

jimidle commented Sep 9, 2022

jimidle commented Sep 9, 2022

kaby76 commented Sep 9, 2022

kaby76 commented Sep 9, 2022

kaby76 commented Sep 9, 2022

jimidle commented Sep 9, 2022 via email • edited Loading

kaby76 commented Sep 9, 2022

KvanTTT commented Sep 9, 2022 • edited Loading

jimidle commented Sep 10, 2022

jimidle commented Sep 10, 2022

kaby76 commented Sep 10, 2022

parrt commented Sep 15, 2022

jimidle commented Sep 16, 2022 via email

parrt commented Nov 20, 2022

kaby76 commented Nov 20, 2022

parrt commented Nov 20, 2022

jimidle commented Nov 21, 2022 via email

kaby76 commented Nov 21, 2022

parrt commented Nov 21, 2022

kaby76 commented Nov 21, 2022

parrt commented Nov 21, 2022

kaby76 commented Nov 21, 2022

KvanTTT commented Nov 21, 2022 • edited Loading

kaby76 commented Nov 21, 2022

dgryski commented Nov 24, 2022

jimidle commented Nov 25, 2022 via email

wdscxsj commented Dec 8, 2022

dgryski commented Dec 8, 2022

jimidle commented Dec 8, 2022 via email

kaby76 commented Dec 8, 2022 • edited Loading

jimidle commented Dec 9, 2022 via email

jimidle commented May 4, 2023 • edited Loading

jimidle commented May 4, 2023 • edited Loading

kaby76 commented May 4, 2023

kaby76 commented Sep 8, 2022 •

edited

Loading

jimidle commented Sep 9, 2022 via email •

edited

Loading

KvanTTT commented Sep 9, 2022 •

edited

Loading

KvanTTT commented Nov 21, 2022 •

edited

Loading

kaby76 commented Dec 8, 2022 •

edited

Loading

jimidle commented May 4, 2023 •

edited

Loading

jimidle commented May 4, 2023 •

edited

Loading