Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using a "skip" token in parser rule causes major performance problems in the Go target parser #3875

Closed
kaby76 opened this issue Sep 8, 2022 · 36 comments

Comments

@kaby76
Copy link
Contributor

kaby76 commented Sep 8, 2022

I am updating grammars-v4 for 4.11.1. Unfortunately, the asm8080 and asmZ80 grammars now do not terminate (actually killed after 5 minutes by the watchdog program trwdog). For 4.10.1, they terminated. Neither of these grammars contain target-specific code, and are really really simple.

@kaby76
Copy link
Contributor Author

kaby76 commented Sep 9, 2022

Just in case I'm doing something wrong, I tested 4.11.1 on the SimpleBoolean issue that was reported. Indeed, 4.11.1 does fix that problem (fast!), but asm8080 now takes 376s for cpm22.asm.

@kaby76 kaby76 changed the title Go performance for 4.11.1 now a problem for some grammars Yes...Terrible Golang performance for 4.11.1 Sep 9, 2022
@kaby76
Copy link
Contributor Author

kaby76 commented Sep 9, 2022

I did a side-by-side comparison of the debug output from ParserATNSimulator between CSharp and Go targets for asm8080/cpm22.asm. The outputs are essentially identical which means they are computing the same results. But, clearly, the Go implementation is much much slower.

@jimidle
Copy link
Collaborator

jimidle commented Sep 9, 2022

I will take a look. There is still an issue with way too many allocations in parse_context. I know what it is (failure to 'emulate' Java objects), but I felt it was too much to try and cram in as a fix at the last minute- programmer checks in code and then gets on a plane type situation ;)

There was one fix that made it to what is the non-v4 version that did not get in to the v4 version (thought they were supposed to be identical). I have submitted a PR for that and is awaiting Ter's approval. But I doubt it is that because that fixes something that just did not work before.

Tomorrow (public holiday here), I will take a look at those grammars and work out what is going on - I suspect it is the allocation problem, but could be a hidden bug that I did not find yet.

I am surprised that they do not terminate (or did they just need more time to run?).

Thanks for the heads up. This code is taking some time to wrangle in to shape.

@jimidle
Copy link
Collaborator

jimidle commented Sep 9, 2022

BTW - I get this when generating, which is a correct warning:

warning(154): asm8080.g4:38:0: rule prog contains an optional block with at least one alternative that can match an empty string

ANTLR4 will deal with this I think, but that could be refactored to eliminate that warning. This may have something to do with it of course. I will not correct it until I understand why the go runtime seems to hate this grammar ;)

@kaby76
Copy link
Contributor Author

kaby76 commented Sep 9, 2022

@jimidle Sorry, the asm8080/cpm22.asm parse does terminate. It takes about 380 seconds.

The asm8080 grammar "prog" rule should be changed to something like this:

prog : ( EOL* | (line EOL)* line EOL*) EOF ;

(A "prog" can be an empty file, or a file with a bunch of EOLs, or a file with a bunch of "line" that can end with EOL or just EOF. This rule is a royal PITA. The Z80 grammar probably has a similar problem.)

But, even with the change, it still takes 380 seconds.

I noticed that my driver still contained the old "case folding buffer stream" code for Go with a reference to "github.com/antlr/antlr4/runtime/Go/antlr", not "github.com/antlr/antlr4/runtime/Go/antlr/v4"--didn't do a find/sed. So, it had both "github.com/antlr/antlr4/runtime/Go/antlr" and "github.com/antlr/antlr4/runtime/Go/antlr/v4" in the last build on grammars-v4. But, I changed that locally, and the parse still takes 380s.

I'll look at the grammar itself and see if there's a workaround.

@kaby76
Copy link
Contributor Author

kaby76 commented Sep 9, 2022

@jimidle My God this is such a bad grammar. It's almost wonderful in its "badness".

  • comment can't derive a default channel string.
line : lbl? (instruction | directive)? comment?;
comment : COMMENT;
COMMENT : ';' ~ [\r\n]* -> skip;

Antlr should flag this. @parrt

@kaby76
Copy link
Contributor Author

kaby76 commented Sep 9, 2022

Fixed these rules, now works fast.

prog  : EOL* ((line EOL)* line EOL*)? EOF ;
line : lbl? (instruction | directive) comment? | lbl comment? | comment ;
COMMENT : ';' ~ [\r\n]* ;

@jimidle
Copy link
Collaborator

jimidle commented Sep 9, 2022 via email

@kaby76
Copy link
Contributor Author

kaby76 commented Sep 9, 2022

@jimidle Thanks sounds good. I'll work on fixing up the grammars over in grammars-v4/asm.

The should also add to the "to do" a check that Antlr tool does to flag a grammar if a parser rule uses a "skip" or off-channel symbol.

@KvanTTT
Copy link
Member

KvanTTT commented Sep 9, 2022

@kaby76 related issue for skip tokens: #2886 (but with using of FRAGMENT instead of skip tokens). Unfortunately, unlikely it will be fixed because my fixing request was closed: #3018

@jimidle
Copy link
Collaborator

jimidle commented Sep 10, 2022

As the grammar was written, it causes an full context parse for every line in the file, and this exposes some problems in the go runtime there. While this grammar needs quite a bit of work to be "real", it has proven useful as an indicator of problems. Sometimes (no offence meant to the contributor) poorly put together grammars can be useful. We might want to keep that grammar as it is somewhere outside where it will currently be corrected, in order that it can be used as a regression test. If there is no general place to do this, then I will add it as regression test data in the go directory (Go will not download that when the module is imported - well, at least it won't when we get the tags correct and stop adding v4.x.x tags).

@jimidle
Copy link
Collaborator

jimidle commented Sep 10, 2022

For what it is worth, the "corrected" grammar, which I would write like this below takes 0.31 seconds on my system vs 0.37 for the unchanged original grammar. Actually I would probably take out the embedded literals in the parser grammar as well.

grammar asm8080;

prog
   : line* EOF
   ;

line
   : lbl? (instruction | directive)? EOL
   ;

instruction
   : opcode expressionlist?
   ;

opcode
   : OPCODE
   ;

register_
   : REGISTER
   ;

directive
   : argument? assemblerdirective expressionlist
   ;

assemblerdirective
   : ASSEMBLER_DIRECTIVE
   ;

lbl
   : label ':'?
   ;

expressionlist
   : expression (',' expression)*
   ;

label
   : name
   ;

expression
   : multiplyingExpression (('+' | '-') multiplyingExpression)*
   ;

multiplyingExpression
   : argument (('*' | '/') argument)*
   ;

argument
   : number
   | register_
   | dollar
   | name
   | string_
   | ('(' expression ')')
   ;

dollar
   : '$'
   ;

string_
   : STRING
   ;

name
   : NAME
   ;

number
   : NUMBER
   ;

ASSEMBLER_DIRECTIVE
   : (O R G) | (E N D) | (E Q U) | (D B) | (D W) | (D S) | (I F) | (E N D I F) | (S E T)
   ;


REGISTER
   : 'A' | 'B' | 'C' | 'D' | 'E' | 'H' | 'L' | 'PC' | 'SP'
   ;


OPCODE
   : (M O V) | (M V I) | (L D A) | (S T A) | (L D A X) | (S T A X) | (L H L D) | (S H L D) | (L X I) | (P U S H) | (P O P) | (X T H L) | (S P H L) | (P C H L) | (X C H G) | (A D D) | (S U B) | (I N R) | (D C R) | (C M P) | (A N A) | (O R A) | (X R A) | (A D I) | (S U I) | (C P I) | (A N I) | (O R I) | (X R I) | (D A A) | (A D C) | (A C I) | (S B B) | (S B I) | (D A D) | (I N X) | (D C X) | (J M P) | (C A L L) | (R E T) | (R A L) | (R A R) | (R L C) | (R R C) | (I N) | (O U T) | (C M C) | (S T C) | (C M A) | (H L T) | (N O P) | (D I) | (E I) | (R S T) | (J N Z) | (J Z) | (J N C) | (J C) | (J P O) | (J P E) | (J P) | (J M) | (C N Z) | (C Z) | (C N C) | (C C) | (C P O) | (C P E) | (C P) | (C M) | (R N Z) | (R Z) | (R N C) | (R C) | (R P O) | (R P E) | (R P) | (R M)
   ;


fragment A
   : ('a' | 'A')
   ;


fragment B
   : ('b' | 'B')
   ;


fragment C
   : ('c' | 'C')
   ;


fragment D
   : ('d' | 'D')
   ;


fragment E
   : ('e' | 'E')
   ;


fragment F
   : ('f' | 'F')
   ;


fragment G
   : ('g' | 'G')
   ;


fragment H
   : ('h' | 'H')
   ;


fragment I
   : ('i' | 'I')
   ;


fragment J
   : ('j' | 'J')
   ;


fragment K
   : ('k' | 'K')
   ;


fragment L
   : ('l' | 'L')
   ;


fragment M
   : ('m' | 'M')
   ;


fragment N
   : ('n' | 'N')
   ;


fragment O
   : ('o' | 'O')
   ;


fragment P
   : ('p' | 'P')
   ;


fragment Q
   : ('q' | 'Q')
   ;


fragment R
   : ('r' | 'R')
   ;


fragment S
   : ('s' | 'S')
   ;


fragment T
   : ('t' | 'T')
   ;


fragment U
   : ('u' | 'U')
   ;


fragment V
   : ('v' | 'V')
   ;


fragment W
   : ('w' | 'W')
   ;


fragment X
   : ('x' | 'X')
   ;


fragment Y
   : ('y' | 'Y')
   ;


fragment Z
   : ('z' | 'Z')
   ;


NAME
   : [a-zA-Z] [a-zA-Z0-9."]*
   ;


NUMBER
   : '$'? [0-9a-fA-F] + ('H' | 'h')?
   ;


COMMENT
   : ';' ~ [\r\n]* -> skip
   ;


STRING
   : '\u0027' ~'\u0027'* '\u0027'
   ;


EOL
   : [\r\n]
   ;


WS
   : [ \t] -> skip
   ;

@kaby76
Copy link
Contributor Author

kaby76 commented Sep 10, 2022

Excellent. I'll add your note to the "to do list".

@parrt
Copy link
Member

parrt commented Sep 15, 2022

I think it would be good to have a test that involved for context parsing all the time, @jimidle. Probably good for the other targets to check this as well.

@jimidle
Copy link
Collaborator

jimidle commented Sep 16, 2022 via email

@parrt
Copy link
Member

parrt commented Nov 20, 2022

@kaby76 I merged #3880 so maybe try the latest dev branch?

@kaby76
Copy link
Contributor Author

kaby76 commented Nov 20, 2022

@kaby76 I merged #3880 so maybe try the latest dev branch?

This works great. The Go target is roughly 2x the CSharp parse time for asm8080 and asmZ80 examples.

@kaby76 kaby76 closed this as completed Nov 20, 2022
@parrt
Copy link
Member

parrt commented Nov 20, 2022

awesome!

@jimidle
Copy link
Collaborator

jimidle commented Nov 21, 2022 via email

@kaby76
Copy link
Contributor Author

kaby76 commented Nov 21, 2022

Sorry, it's actually still there in the current "dev" branch--tested the wrong thing.

The problem is that the grammar references COMMENT on the RHS of a parser rule, and COMMENT is sent to "skip". That's a bad grammar. I applied a workaround to the grammar to not reference "skipped" tokens. (Note, I wrote an xpath grep expression to find grammars that do this, for $i in ( //parserRuleSpec//TOKEN_REF[text()=doc("*")//lexerRuleSpec[./lexerRuleBlock/lexerAltList/lexerAlt/lexerCommands/lexerCommand/lexerCommandName/identifier/RULE_REF/text()="skip"]/TOKEN_REF/text()]) return concat("line ", $i/@Line, " col ", $i/@Column, " """, $i/@Text,""""), and fixed it throughout the grammars-v4 repo.) But, the problem with the Go runtime still remains. But, I also haven't checked whether the "dev" branch of the Antlr Tool outputs a warning for this bad grammar.

CSharp and Java terminate fine with the bad ol' grammar, Go doesn't work well. The grammar to test is:

git clone https://github.com/antlr/grammars-v4.git
cd grammars-v4/asm/asm8080
git checkout b5bb1059db7534ff5a7d18f40b900d2023370ce9

@kaby76 kaby76 reopened this Nov 21, 2022
@parrt
Copy link
Member

parrt commented Nov 21, 2022

hi @kaby76 What's the warning you get? Seems like the tool not the target should give the warning maybe.

@kaby76
Copy link
Contributor Author

kaby76 commented Nov 21, 2022

@parrt Sorry, I guess I'm not clear. The problem is that asm8080 grammar used a lexer symbol that was "skip" on the right-hand side of a parser rule.

foo : BAR EOF;
BAR : 'foo' -> skip;
WS: [ \n\r\t]+ -> skip;

The symbol won't appear on the token stream because "skip" are never on the token stream. Even "channel(HIDDEN)" tokens seem dubious to use on the RHS of a parser rule. I don't think the Antlr4 Tool .jar checks for this, right?

@parrt
Copy link
Member

parrt commented Nov 21, 2022

I can see adding a warning for this since we can detect it. I don't think we weren't about it now.

Seems like there should be a syntax error or something from the parser though if the input is not matching because somebody's trying to match one of those hidden tokens.

@kaby76
Copy link
Contributor Author

kaby76 commented Nov 21, 2022

Yes, but the parser works if optional--at least for Java or CSharp.

foo: BAR? EOF;

@KvanTTT
Copy link
Member

KvanTTT commented Nov 21, 2022

I don't think the Antlr4 Tool .jar checks for this, right?

Correct. I consider these errors are very usefull since I've already met several users that encoutered the problem with not existing tokens.

Seems like there should be a syntax error or something from the parser though if the input is not matching because somebody's trying to match one of those hidden tokens.

It can be resolved on the grammar level.

@kaby76
Copy link
Contributor Author

kaby76 commented Nov 21, 2022

In any case, I'm not sure why Go would have such a fit with a grammar like this.

@dgryski
Copy link

dgryski commented Nov 24, 2022

If there is a simple Go binary that's taking too long, there are a number of Go performance geeks that might enjoy takng a crack at it (if you think the issue is specific to the Go side of things and not the antlr side of things).

@jimidle
Copy link
Collaborator

jimidle commented Nov 25, 2022 via email

@wdscxsj
Copy link

wdscxsj commented Dec 8, 2022

As a side note, the arena proposal to be implemented in Go v1.20 may help to improve the runtime performance.

@dgryski
Copy link

dgryski commented Dec 8, 2022

I'd like to see a consistent set of profiles indicating that allocations / garbage collection are the biggest issue before migrating to something like the proposed arena package.

@jimidle
Copy link
Collaborator

jimidle commented Dec 8, 2022 via email

@kaby76
Copy link
Contributor Author

kaby76 commented Dec 8, 2022

It seems that perhaps it’s time to
close this thread and start another with the state of things as they are,
which is not too bad.

This bug still exists for the current dev branch of the Go runtime (the version of the grammar is this). (I just checked this.) Are there PRs for the Go-target outstanding that I should be testing?

I also tested this grammar with the current dev branch of the tool. There is no warning message outputted for using a skip lexer symbol in a parser rule. (It does produce warning(154): asm8080.g4:38:0: rule prog contains an optional block with at least one alternative that can match an empty string, but it is unrelated to the problem.)

So, please do not close this Issue. I'm changing the name of the Issue to reflect the actual problem: using a "skip" token in a parser rule causes the Go-target parser to parse extremely slowly.

@kaby76 kaby76 changed the title Yes...Terrible Golang performance for 4.11.1 Using a "skip" token in parser rule causes major performance problems in the Go target parser Dec 8, 2022
@jimidle
Copy link
Collaborator

jimidle commented Dec 9, 2022 via email

@jimidle
Copy link
Collaborator

jimidle commented May 4, 2023

@kaby76 Hi Ken - I am reviewing all the open tickets with the Go runtime.

Using the grammar at the hash in your link above, with the dev runtime for go and the dev tool build, I get this:

~/tmp/asm8080 (develop ✘)✹✚✭ ᐅ time ./asm8080 ./CPM22.ASM                      
./asm8080 ./CPM22.ASM  0.13s user 0.02s system 163% cpu 0.092 total

So, I think we can definitely say that this is fixed. Please verify for yourself if you wish to and let us know here, then mark it as closable, or we can take my word for it and ask @parrt to close it. I am fine either way.

@jimidle
Copy link
Collaborator

jimidle commented May 4, 2023

As an extra bonus, if you switch the driver in to SLL mode, then the results from hyperfine are:

Command Mean [ms] Min [ms] Max [ms] Relative
./asm8080 ./CPM22.ASM 35.0 ± 0.4 34.2 36.5 1.00

And from time:

~/tmp/asm8080 (develop ✘)✹✚✭ ᐅ time ./asm8080 ./CPM22.ASM                    
./asm8080 ./CPM22.ASM  0.04s user 0.01s system 107% cpu 0.044 total

@kaby76
Copy link
Contributor Author

kaby76 commented May 4, 2023

Closing. Thanks.

@kaby76 kaby76 closed this as completed May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants