Improve JSON Parser Performances #453

satabin · 2023-02-26T21:02:23Z

Follow-up from #489

This PR implements several optimizations, resulting from observations from profilers:

- Smarter `CharLikeChunks` implementations
  - They avoid adding characters one by one to the result and offer a way to batch-copy a range into a new buffer
  - They access characters in an array instead of a string, as `String.charAt` performs more checks than array access
  - Since the new implementations are not binary compatible, the parser specializes for them under the hood
  - Inheritance is marked as deprecated and the trait will be sealed in the future
  - Other text-based parsers will be able to benefit from the improvements
- Implementation of a faster stateless version of string parsing, used when there is no escape, which falls back to the slower one on the first escape encountered in the string
- Make various buffer sizes configurable
  - Depending on the specific data users are handling, they might want to allocate bigger or smaller buffers when building strings for keys, string values, or numbers.
  - The right initial buffer size can avoid buffer resizing (and hence data copying)
  - No documentation is added for those settings yet as they are experimental and might change in the future
- Remove various boxing and `Option` allocations throughout the parser.

The improvements made in this PR do not change the way the AST is built. This work will be done in a follow-up PR.
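To illustrate the fast path for strings described above, here is a minimal sketch, not the code from this PR. It assumes the whole input sits in a single `Array[Char]` (the real parser is streaming and works across chunks), and the names `StringFastPath`, `parseString` and `slowPath` are made up for the example. Unicode escapes (`\u`) are left out to keep it short.

```scala
object StringFastPath {

  /** Parses a JSON string body starting right after the opening quote.
    * Returns the decoded string and the index just past the closing quote.
    */
  def parseString(chars: Array[Char], start: Int): (String, Int) = {
    var i = start
    // Fast path: plain array accesses until the closing quote or the first escape.
    while (i < chars.length && chars(i) != '"' && chars(i) != '\\') i += 1
    if (i >= chars.length)
      throw new IllegalArgumentException("unterminated string")
    if (chars(i) == '"')
      // No escape was seen: copy the whole range in one batch.
      (new String(chars, start, i - start), i + 1)
    else
      // First escape encountered: hand over to the slower, accumulating version.
      slowPath(chars, start, i)
  }

  private def slowPath(chars: Array[Char], start: Int, firstEscape: Int): (String, Int) = {
    val sb = new java.lang.StringBuilder(chars.length - start)
    // Keep what the fast path already scanned, again as a batch copy.
    sb.append(chars, start, firstEscape - start)
    var i = firstEscape
    while (i < chars.length && chars(i) != '"') {
      if (chars(i) == '\\') {
        if (i + 1 >= chars.length)
          throw new IllegalArgumentException("unterminated escape")
        chars(i + 1) match {
          case 'n'                      => sb.append('\n')
          case 't'                      => sb.append('\t')
          case 'r'                      => sb.append('\r')
          case 'b'                      => sb.append('\b')
          case 'f'                      => sb.append('\f')
          case c @ ('"' | '\\' | '/')   => sb.append(c)
          case _ =>
            // \uXXXX handling is intentionally omitted from this sketch.
            throw new IllegalArgumentException("unsupported escape in this sketch")
        }
        i += 2
      } else {
        sb.append(chars(i))
        i += 1
      }
    }
    if (i >= chars.length)
      throw new IllegalArgumentException("unterminated string")
    (sb.toString, i + 1)
  }
}
```

For a string without escapes, the only allocation is the resulting `String`; the `StringBuilder` is created only once an escape shows up.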

With these improvements in place, we get these results on my machine (OpenJDK 19, NixOS, Intel Core i5-6500, 24 GiB RAM):

JsonParserBenchmarks.parseCirceFs2           avgt   10  5541.715 ± 24.544  us/op
JsonParserBenchmarks.parseJsonFs2DataTokens  avgt   10  3800.039 ± 46.111  us/op
JsonParserBenchmarks.parseJsonFs2DataValues  avgt   10  6716.085 ± 60.833  us/op
JsonParserBenchmarks.parseJsonJawn           avgt   10  2019.646 ± 41.275  us/op
JsonValueBenchmarks.parseEscapedString       avgt   10     2.996 ±  0.069  us/op
JsonValueBenchmarks.parseSimpleString        avgt   10     2.712 ±  0.063  us/op

Compared to jawn, the token parser is now only ~1.88x slower (previously ~2.57x slower) and the value parser is ~3.33x slower (previously ~3.85x slower).
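The current ratios can be re-derived from the scores in the table above. The snippet below is only a convenience for checking the arithmetic; the object and value names are made up for the example, and the "previously" figures cannot be checked this way since the earlier scores are not listed here.

```scala
object Slowdown {
  // Average scores in us/op, copied from the benchmark table above.
  val jawn: Double   = 2019.646
  val tokens: Double = 3800.039
  val values: Double = 6716.085

  /** How many times slower `score` is than `baseline`. */
  def ratio(score: Double, baseline: Double): Double = score / baseline

  val tokensVsJawn: Double = ratio(tokens, jawn) // roughly 1.88
  val valuesVsJawn: Double = ratio(values, jawn) // roughly 3.33
}
```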

Some progress can probably be achieved in the AST construction and will be addressed later, to bring fs2-data on par with circe-fs2.

ybasket

Left one comment, but looks good, nice work!

text/shared/src/main/scala/fs2/data/text/CharLikeChunks.scala

satabin added the enhancement and json labels Feb 26, 2023

satabin added this to the 1.7.0 milestone Feb 26, 2023

satabin modified the milestones: ~~1.7.0~~, 1.8.0 Mar 21, 2023

satabin force-pushed the benchmarks/json-parser branch from 4768da8 to 0bc8912 June 6, 2023 21:25

satabin changed the title ~~Benchmark and improve JSON parser performances~~ Improve JSON Parser Performances Jun 21, 2023

satabin added 13 commits June 21, 2023 19:42

Make JSON StringBuilder capacity configurable

bf4f48a

Testing showed that a buffer of 128 bytes yields good results. Making it configurable through JVM parameters allows users to tune the sizes for their local needs if the default values do not fit their data.
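As a rough sketch of what such a setting can look like, the snippet below reads an initial capacity from a JVM system property and falls back to 128 when the property is missing or invalid. The property name `example.json.stringBufferCapacity` and the `BufferSettings` object are hypothetical and chosen for illustration only; the actual setting names in fs2-data are not documented in this PR.

```scala
import scala.util.Try

object BufferSettings {
  // Hypothetical property name, used for illustration only.
  val PropertyName: String = "example.json.stringBufferCapacity"

  // Mirrors the 128 figure mentioned in the commit message.
  val DefaultCapacity: Int = 128

  /** Capacity from the system property, or the default if unset, non-numeric or non-positive. */
  def capacity(): Int =
    Option(System.getProperty(PropertyName))
      .flatMap(v => Try(v.trim.toInt).toOption)
      .filter(_ > 0)
      .getOrElse(DefaultCapacity)

  /** A builder pre-sized so that typical values do not trigger a resize. */
  def newBuffer(): java.lang.StringBuilder =
    new java.lang.StringBuilder(capacity())
}
```

With this sketch, starting the JVM with `-Dexample.json.stringBufferCapacity=256` would make `BufferSettings.newBuffer()` allocate 256 characters up front.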

Remove Option allocations

b41fb8a
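One common way to avoid `Option` allocations in a hot parsing loop is to return a sentinel value instead. The sketch below shows that general technique under made-up names (`PeekNoAlloc`, `peek`, `EndOfInput`); it is not taken from this commit.

```scala
object PeekNoAlloc {
  // Sentinel meaning "no more characters"; no valid char code is negative.
  final val EndOfInput = -1

  /** Allocating variant: every successful call creates a Some and boxes the char. */
  def peekOption(chars: Array[Char], idx: Int): Option[Char] =
    if (idx < chars.length) Some(chars(idx)) else None

  /** Allocation-free variant: returns the char code as an Int, or EndOfInput. */
  def peek(chars: Array[Char], idx: Int): Int =
    if (idx < chars.length) chars(idx).toInt else EndOfInput
}
```

Callers compare the result against `EndOfInput` and convert back with `.toChar` when a character is present.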

Tiny refactoring for better readability

c6deccc

Share key accumulator

48232bd

Make array char access instead of string accesses

b2958b3

[Breaking] Share buffer from chunk to avoid too many copies

7a71a16

Specialize parser when char chunk can be marked

bf32480

Add legacy parser implementation with tests

5cc16c5

This parser includes all performance improvements that do not require a change in `CharLikeChunks`, so it remains compatible with any custom implementations of the trait in user code.

Make it cross compile with scala 2.12

1f29c54

Add baseline benchmark for comparison

9b9f818

Remove warning in scala 3

325234d

Make CI happy

c913da4

Remove useless legacy code duplication

3b62258

satabin force-pushed the benchmarks/json-parser branch from b7dcc96 to 3b62258 June 21, 2023 17:49

satabin added 3 commits June 21, 2023 19:56

Delete unused class in benchmarks

c55f7b6

Fix deprecated version

5e4ab10

Fix deprecation version

517d19e

satabin force-pushed the benchmarks/json-parser branch 3 times, most recently from 66626d5 to 23ddee5 June 21, 2023 18:35

Keep old implicits for binary compatibility and deprecate them

f8fbb11

satabin force-pushed the benchmarks/json-parser branch from 23ddee5 to f8fbb11 June 21, 2023 18:40

satabin marked this pull request as ready for review June 21, 2023 18:56

satabin requested a review from a team as a code owner June 21, 2023 18:56

ybasket approved these changes Jun 23, 2023

satabin merged commit 5f94665 into main Jun 26, 2023

satabin deleted the benchmarks/json-parser branch June 26, 2023 07:34
