Improve JSON Parser Performances #453

Merged
merged 17 commits into from
Jun 26, 2023

@satabin (Member) commented Feb 26, 2023

Follow-up from #489

This PR implements several optimizations, based on profiler observations:

  • Smarter CharLikeChunks implementations
    • They avoid appending characters to the result one by one and offer a way to batch copy a range into a new buffer
    • They access characters in an array instead of a string, as String.charAt performs more checks than an array access
    • Since the new implementations are not binary compatible, the parser specializes for them under the hood
    • Inheriting from the trait is now deprecated, and the trait will be sealed in the future
    • Other text-based parsers will be able to benefit from these improvements
  • A faster, stateless version of string parsing is used as long as a string contains no escape; the parser falls back to the slower version at the first escape encountered in the string
  • Various buffer sizes are now configurable
    • Depending on the data they handle, users might want to allocate bigger or smaller buffers when building strings for keys, string values, or numbers
    • The right initial buffer size avoids buffer resizing (and hence data copying)
    • No documentation is added for these settings yet, as they are experimental and might change in the future
  • Removal of various boxing and Option allocations throughout the parser
  • The improvements in this PR do not change the way the AST is built. This work will be done in a follow-up PR.
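
To illustrate the first two points, here is a minimal sketch of a fast path over a character array that batch copies the scanned range and only switches to a slower, buffer-based loop at the first escape. It is not the code from this PR: `StringFastPath`, `Parsed`, `parseString` and `slowPath` are hypothetical names, and only a few escapes are handled to keep it short.

```scala
object StringFastPath {
  /** Decoded string and the index just past the closing quote. */
  final case class Parsed(value: String, next: Int)

  /** Parses a string body starting at `start` (just after the opening quote).
    * Hypothetical helper for illustration only, not the fs2-data API.
    */
  def parseString(chars: Array[Char], start: Int): Parsed = {
    var i = start
    // Fast path: plain array reads, nothing appended character by character.
    while (i < chars.length && chars(i) != '"' && chars(i) != '\\') i += 1
    if (i >= chars.length) throw new IllegalArgumentException("unterminated string")
    if (chars(i) == '"') {
      // No escape found: copy the whole range in one batch.
      Parsed(new String(chars, start, i - start), i + 1)
    } else {
      // First escape found: batch copy the clean prefix, then use the slow path.
      val sb = new java.lang.StringBuilder(math.max(16, 2 * (i - start)))
      sb.append(chars, start, i - start)
      slowPath(chars, i, sb)
    }
  }

  // Slow path: handles escapes and appends characters one by one.
  private def slowPath(chars: Array[Char], from: Int, sb: java.lang.StringBuilder): Parsed = {
    var i = from
    while (i < chars.length && chars(i) != '"') {
      if (chars(i) == '\\') {
        if (i + 1 >= chars.length) throw new IllegalArgumentException("unterminated escape")
        chars(i + 1) match {
          case 'n' => sb.append('\n')
          case 't' => sb.append('\t')
          case c @ ('"' | '\\' | '/') => sb.append(c)
          case other => throw new IllegalArgumentException(s"unsupported escape character: $other")
        }
        i += 2
      } else {
        sb.append(chars(i))
        i += 1
      }
    }
    if (i >= chars.length) throw new IllegalArgumentException("unterminated string")
    Parsed(sb.toString, i + 1)
  }
}
```

In this sketch a string without escapes costs one scan and one range copy, while the per-character appends are only paid for strings that actually contain an escape.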

With these improvements in place, we get these results on my machine (OpenJDK 19, NixOS, Intel Core i5-6500, 24 GiB RAM):

Benchmark                                    Mode  Cnt     Score    Error  Units
JsonParserBenchmarks.parseCirceFs2           avgt   10  5541.715 ± 24.544  us/op
JsonParserBenchmarks.parseJsonFs2DataTokens  avgt   10  3800.039 ± 46.111  us/op
JsonParserBenchmarks.parseJsonFs2DataValues  avgt   10  6716.085 ± 60.833  us/op
JsonParserBenchmarks.parseJsonJawn           avgt   10  2019.646 ± 41.275  us/op
JsonValueBenchmarks.parseEscapedString       avgt   10     2.996 ±  0.069  us/op
JsonValueBenchmarks.parseSimpleString        avgt   10     2.712 ±  0.063  us/op

Compared to jawn, the token parser now takes ~1.88x as long (previously ~2.57x) and the value parser takes ~3.33x as long (previously ~3.85x).
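
For reference, the current ratios can be recomputed from the scores listed above. This is only a small sketch of the arithmetic; `BenchmarkRatios` and `slowdown` are names made up for the example, and the earlier ratios cannot be recomputed here since the previous scores are not part of this comment.

```scala
object BenchmarkRatios {
  // Average time per operation (us/op), copied from the results above.
  val jawn = 2019.646
  val fs2DataTokens = 3800.039
  val fs2DataValues = 6716.085

  /** Ratio of `score` to `baseline`, rounded to two decimals. */
  def slowdown(score: Double, baseline: Double): Double =
    math.round(score / baseline * 100) / 100.0

  def main(args: Array[String]): Unit = {
    println(s"tokens vs jawn: ${slowdown(fs2DataTokens, jawn)}x") // tokens vs jawn: 1.88x
    println(s"values vs jawn: ${slowdown(fs2DataValues, jawn)}x") // values vs jawn: 3.33x
  }
}
```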

Further gains can probably be made in the AST construction; this will be addressed later, with the goal of bringing fs2-data on par with circe-fs2.

@satabin added the enhancement and json labels Feb 26, 2023
@satabin added this to the 1.7.0 milestone Feb 26, 2023
@satabin modified the milestones: 1.7.0, 1.8.0 Mar 21, 2023
@satabin force-pushed the benchmarks/json-parser branch from 4768da8 to 0bc8912 June 6, 2023 21:25
@satabin changed the title from "Benchmark and improve JSON parser performances" to "Improve JSON Parser Performances" Jun 21, 2023
satabin added 13 commits June 21, 2023 19:42
Testing showed that a buffer of 128 bytes yields good results.
Making it configurable through JVM parameters allows users to tune the
sizes for their local needs if the default values do not fit their data.

This parser includes all performance improvements that do not require a
change in `CharLikeChunks`, which keeps it compatible with potential
custom implementations of the trait in user code.
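
As an illustration of the buffer-size commit message above, here is a minimal sketch of reading a size from a JVM system property with a default of 128. The actual settings are undocumented and experimental, so the property name `example.json.key-buffer-size` and the `BufferSizes` object are assumptions made for this example, not the names used by fs2-data. It relies on `toIntOption`, which needs Scala 2.13 or later.

```scala
object BufferSizes {
  // Default taken from the commit message: 128 reportedly yields good results.
  val DefaultSize = 128

  /** Reads a positive integer from `props(key)`, falling back to
    * `DefaultSize` when the entry is absent, malformed, or not positive.
    */
  def bufferSize(props: collection.Map[String, String], key: String): Int =
    props.get(key).flatMap(_.trim.toIntOption).filter(_ > 0).getOrElse(DefaultSize)

  // Hypothetical property name, for illustration only.
  val KeyBufferProperty = "example.json.key-buffer-size"

  // Resolved once from the JVM system properties,
  // e.g. when started with -Dexample.json.key-buffer-size=256
  lazy val keyBufferSize: Int = bufferSize(sys.props, KeyBufferProperty)
}
```

Falling back to the default on invalid input means a mistyped setting cannot produce a zero or negative initial capacity.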
@satabin force-pushed the benchmarks/json-parser branch from b7dcc96 to 3b62258 June 21, 2023 17:49
@satabin force-pushed the benchmarks/json-parser branch 3 times, most recently from 66626d5 to 23ddee5 June 21, 2023 18:35
@satabin force-pushed the benchmarks/json-parser branch from 23ddee5 to f8fbb11 June 21, 2023 18:40
@satabin marked this pull request as ready for review June 21, 2023 18:56
@satabin requested a review from a team as a code owner June 21, 2023 18:56

@ybasket (Collaborator) left a comment

Left one comment, but looks good, nice work!

@satabin merged commit 5f94665 into main Jun 26, 2023
@satabin deleted the benchmarks/json-parser branch June 26, 2023 07:34
Labels
enhancement, json
2 participants