XML benchmark #334
Initial result:
The fs2-data parser crashes when I generate CDATA: rossabaker@6851774. I have not minimized this yet.
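```scala
// xmlStream is the benchmark's byte stream of sample XML
xmlStream
  .through(fs2.text.utf8.decode)       // UTF-8 bytes -> String chunks
  .through(fs2.data.xml.events())      // String chunks -> XML events
  .through(fs2.data.xml.dom.documents) // XML events -> DOM documents
  .compile
  .lastOrError
  .unsafeRunSync()
```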
I just discovered fs2.data.text, but this does not make it faster:
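```scala
// Lets the XML parser consume UTF-8 bytes directly,
// in place of the separate fs2.text.utf8.decode step
import fs2.data.text.utf8._

xmlStream
  .through(fs2.data.xml.events())
  .through(fs2.data.xml.dom.documents)
  .compile
  .lastOrError
  .unsafeRunSync()
```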
Thanks for the contribution. I will have a look at the crash; the XML parser is complex and still not widely used, so some corner cases may not be handled correctly yet. I'll try to fix this. Regarding parser performance, I have not focused on it yet, and the document builder is also naive. It would be interesting to compare event parsing with document building to identify where the bottleneck is.
It looks like most of the time is spent in the event parser:
Thanks for the measurements. Once the crashes are fixed, I will try to identify the bottleneck in the parser and attempt to address it.
I tried to reproduce it locally, but I could parse it successfully (both with and without DOM building). Do you have a stack trace to share?
Oh, interesting. From that linked commit,
I also sketched a parser based on aalto-xml, whose incremental parsing makes it a great fit for fs2. That parser is not production-grade yet, doesn't emit events, and won't work on Scala.js, but it currently comes in at about 4500 us/op. If it stays that fast once productionized, I think it will be strictly better than scala-xml SAX parsing, while fs2-data remains compelling for its other features.
Do you run the benchmark on the JVM or on Scala.js?
JVM; JMH doesn't run on Scala.js.
I think I found the bug:

```diff
- accept(ctx, s, chunkAcc)
+ loop(ctx, sidx, chunkAcc)
```

A CDATA section at a chunk boundary currently fails. Does this fix it for you? If so, I'll open a proper bugfix PR.
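The crash came from a CDATA section sitting on a chunk boundary. As a hedged illustration of the general requirement (this is a hypothetical, self-contained sketch, not fs2-data's parser or its `accept`/`loop` functions), a chunked scanner has to carry its state from one chunk to the next so that a `<![CDATA[` opener or `]]>` terminator split across two chunks is still recognized. `CdataScanner`, `State`, `feed` and `scan` are illustrative names.

```scala
// Hypothetical sketch (not fs2-data's parser): collect the bodies of CDATA
// sections from text that arrives in arbitrary chunks. State carried between
// chunks lets an opener or terminator that straddles two chunks still match.
object CdataScanner {
  private val Open  = "<![CDATA["
  private val Close = "]]>"

  /** State carried from one chunk to the next.
    * inCdata: whether we are inside a CDATA section
    * pending: unconsumed text (inside CDATA, the body seen so far, possibly
    *          ending in a partial "]]"; outside, a tail that may begin an opener)
    * found:   bodies of the CDATA sections completed so far
    */
  final case class State(inCdata: Boolean, pending: String, found: Vector[String])

  val initial: State = State(inCdata = false, pending = "", found = Vector.empty)

  /** Consume one chunk, resuming from the state left by the previous chunk
    * instead of giving up when a delimiter is cut off at the chunk's end. */
  def feed(state: State, chunk: String): State = {
    var s       = state.pending + chunk
    var inCdata = state.inCdata
    var found   = state.found
    var more    = true
    while (more) {
      if (!inCdata) {
        val i = s.indexOf(Open)
        if (i >= 0) {
          inCdata = true
          s = s.substring(i + Open.length)
        } else {
          // Keep just enough of the tail to complete a split "<![CDATA[".
          s = s.takeRight(Open.length - 1)
          more = false
        }
      } else {
        val j = s.indexOf(Close)
        if (j >= 0) {
          found = found :+ s.substring(0, j)
          inCdata = false
          s = s.substring(j + Close.length)
        } else {
          // No terminator yet: carry the whole body over to the next chunk.
          more = false
        }
      }
    }
    State(inCdata, s, found)
  }

  /** Run the scanner over all chunks and return the CDATA bodies. */
  def scan(chunks: Seq[String]): Vector[String] =
    chunks.foldLeft(initial)(feed).found
}
```

For example, `CdataScanner.scan(Seq("<r><![CDATA[a<b]", "]>tail</r>"))` returns `Vector("a<b")` even though `]]>` is split across the two chunks. Rescanning `pending` from its start keeps the sketch short but is quadratic for very long CDATA bodies; a production parser would instead resume at the index where it stopped.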
Crash should be fixed by #335. |
Nice work. The CDATA case now passes. The absolute numbers are all worse, but the XML is bigger; the relative numbers are what's interesting:
Thanks for this awesome work. I think we can merge it and use it as a baseline for improving performance.
An initial benchmark comparing stream conversion with fs2-data-scala-xml against raw scala-xml with SAX. The sample XML was generated with onlinerandomtools.com.
The first commit fixes some bitrot on the CSV benchmark and could be its own PR, if you prefer.
Fixes #333.
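For reference, a raw SAX pass needs only the JDK's `javax.xml.parsers` API; scala-xml's `XML.load` drives a JDK SAX parser in a similar way, with its own handler that builds nodes. The following is a hedged sketch, not the PR's benchmark code and not using scala-xml: `SaxBaseline` and `CountingHandler` are hypothetical names, and counting elements and characters stands in for real work.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler

// Hypothetical baseline: a raw SAX pass over an XML string using only the JDK.
// It does no DOM building, so it approximates the cost of parsing alone.
object SaxBaseline {

  /** Counts start-element events and the total length of character data
    * (CDATA content is reported through `characters` as well). */
  final class CountingHandler extends DefaultHandler {
    var elements: Int = 0
    var textChars: Long = 0L

    override def startElement(uri: String, localName: String, qName: String, attributes: Attributes): Unit =
      elements += 1

    override def characters(ch: Array[Char], start: Int, length: Int): Unit =
      textChars += length
  }

  /** Parses `xml` once and returns (element count, character-data length). */
  def count(xml: String): (Int, Long) = {
    val parser  = SAXParserFactory.newInstance().newSAXParser()
    val handler = new CountingHandler
    parser.parse(new InputSource(new StringReader(xml)), handler)
    (handler.elements, handler.textChars)
  }
}
```

Because the handler only counts, it gives a rough floor for parsing with no DOM allocation, which mirrors the event-parsing versus document-building split discussed earlier in the thread.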