doc: update roadmap, performance, readme document #16

Merged on Oct 20, 2023 (2 commits)
99 changes: 84 additions & 15 deletions README.md
@@ -9,27 +9,67 @@
[actions-badge]: https://github.com/cloudwego/sonic-rs/actions/workflows/ci.yaml/badge.svg
[actions-url]: https://github.com/cloudwego/sonic-rs/actions

A fast Rust JSON library based on SIMD. It has some references to other open-source libraries like [sonic_cpp](https://github.com/bytedance/sonic-cpp), [serde_json](https://github.com/serde-rs/json), [sonic](https://github.com/bytedance/sonic), [simdjson](https://github.com/simdjson/simdjson) and [rust-lang](https://github.com/rust-lang/rust).
English | [中文](README_ZH.md)

A fast Rust JSON library based on SIMD. It has some references to other open-source libraries like [sonic_cpp](https://github.com/bytedance/sonic-cpp), [serde_json](https://github.com/serde-rs/json), [sonic](https://github.com/bytedance/sonic), [simdjson](https://github.com/simdjson/simdjson), [rust-std](https://github.com/rust-lang/rust/tree/master/library/core/src/num) and more.

The main optimization in sonic-rs is the use of SIMD. However, we do not use the two-stage SIMD algorithms from `simd-json`. We primarily use SIMD in the following scenarios:
1. Parsing/serializing long JSON strings
2. Parsing the fractional part of floating-point numbers
3. Getting a specific element or field from JSON
4. Skipping whitespace when parsing JSON

More details about optimization can be found in [performance.md](docs/performance.md).
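
As a rough illustration of scenario 4, the sketch below skips whitespace one 64-byte chunk at a time. It builds a bitmask of non-whitespace bytes and uses `trailing_zeros` to locate the first one. This is a portable scalar stand-in, not sonic-rs's actual code: a SIMD version would typically build the same mask with a few vector compares, and the function name `skip_whitespace` is hypothetical.

```rust
/// Returns the index of the first non-whitespace byte at or after `start`,
/// or `input.len()` if only whitespace remains.
/// Hypothetical helper for illustration; not part of the sonic-rs API.
fn skip_whitespace(input: &[u8], start: usize) -> usize {
    let mut pos = start;
    while pos < input.len() {
        // Look at up to 64 bytes at once, so the result fits in a u64 mask.
        let chunk = &input[pos..input.len().min(pos + 64)];
        // Bit i is set when chunk[i] is NOT JSON whitespace.
        let mut mask: u64 = 0;
        for (i, &b) in chunk.iter().enumerate() {
            if !matches!(b, b' ' | b'\t' | b'\n' | b'\r') {
                mask |= 1u64 << i;
            }
        }
        if mask != 0 {
            // The lowest set bit marks the first non-whitespace byte.
            return pos + mask.trailing_zeros() as usize;
        }
        pos += chunk.len();
    }
    input.len()
}

fn main() {
    let json = b"   \n\t {\"a\": 1}";
    let idx = skip_whitespace(json, 0);
    assert_eq!(json[idx], b'{');
    println!("first token at byte {idx}");
}
```

The mask-then-count-zeros shape is the part that carries over to SIMD; the inner loop is what vector instructions would replace.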

## Requirements/Notes
1. Support x86_64 or aarch64. Note that the performance in aarch64 is low and it need to optimize.
2. Rust nightly version. Because we use the `packed_simd` crate.
3. Not validating the UTF-8 when parsing from slice by default. You can add the `utf8` feature to enable the validation. The performance loss is about 3% ~ 10%.

1. Supports x86_64 and aarch64. Note that performance on aarch64 is currently lower and still needs optimization.
2. Requires Rust nightly, because we use the `packed_simd` crate.
3. Does NOT validate UTF-8 when parsing from a slice by default. You can enable the `utf8` feature to turn on validation. The performance loss is about 3% ~ 10%.
4. ***Warning:*** when using `get_from`, `get_many`, `JsonIter` or `RawValue`, the JSON must be well-formed and valid.
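
To see why note 4 matters, consider how a lazy getter might skip over a value it does not need. The sketch below tracks only bracket depth and string boundaries, so it never notices invalid content inside the skipped span. It is a simplified scalar illustration under that assumption, not the sonic-rs implementation, and `skip_container` is a hypothetical name.

```rust
/// Skips one JSON array or object that begins at `start` (which should point
/// at `[` or `{`) by tracking nesting depth only. Returns the index just past
/// the closing bracket, or `None` if no matching close is found.
/// The skipped contents are never validated.
fn skip_container(input: &[u8], start: usize) -> Option<usize> {
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, &b) in input.iter().enumerate().skip(start) {
        if in_string {
            // Inside a string only escapes and the closing quote matter.
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'[' | b'{' => depth += 1,
            b']' | b'}' => {
                depth = depth.checked_sub(1)?;
                if depth == 0 {
                    return Some(i + 1);
                }
            }
            _ => {}
        }
    }
    None
}

fn main() {
    // Well-formed input: the bracket inside the string is ignored.
    let good = r#"{"a": [1, "]"], "b": 2} tail"#;
    let end = skip_container(good.as_bytes(), 0).unwrap();
    assert_eq!(&good[..end], r#"{"a": [1, "]"], "b": 2}"#);

    // Malformed input is skipped just as happily, with no error reported.
    let bad = r#"[1, ??, {"x": ]] rest"#;
    let end = skip_container(bad.as_bytes(), 0).unwrap();
    assert_eq!(&bad[..end], r#"[1, ??, {"x": ]]"#);
}
```

A skipper of this kind is fast because it does so little, which is also why the result on invalid JSON is unspecified.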

## Features
1. Serde into Rust struct as `serde_json` and `serde`
2. Parse/Serialize JSON for untyped document, and document can be mutable
3. Get specific fields from a JSON with blazing performance
4. Use JSON as a lazied array or object iterator
1. Serialize/deserialize Rust structs through `serde`, in the same way as `serde_json`.

2. Parse/serialize JSON for an untyped document, which can be mutable.

3. Get specific fields from a JSON with blazing performance.

4. Use JSON as a lazy array or object iterator with blazing performance.

5. Support `RawValue`, `Number` and `RawNumber` (just like Go's `json.Number`) by default.

## Quick to use sonic-rs

To ensure that SIMD instructions are used in sonic-rs, you need to add the rustflags `-C target-cpu=native` and compile on the host machine. For example, Rust flags can be configured in the Cargo [config](.cargo/config).
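
One way to check whether the flag took effect is to ask the compiler which SIMD target features were enabled for the build. The sketch below uses the standard `cfg!(target_feature = "...")` macro. The three features listed are only common examples chosen for illustration; this README does not state which ones sonic-rs depends on, and `enabled_simd_features` is a hypothetical helper.

```rust
/// Lists which of a few common SIMD target features were enabled at compile
/// time. The selection (sse2, avx2, neon) is an assumption for illustration.
fn enabled_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    if cfg!(target_feature = "neon") {
        features.push("neon");
    }
    features
}

fn main() {
    println!(
        "SIMD features enabled at compile time: {:?}",
        enabled_simd_features()
    );
}
```

Building this once with and once without `-C target-cpu=native` shows whether the flag changes the feature set on your machine.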

Which features should you choose?

- `default`: the fast version, which skips UTF-8 validation when parsing for performance.
- `utf8`: provides UTF-8 validation when parsing JSON from a slice.

## Benchmark

Benchmark environment:

```
Architecture: x86_64
Model name: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
```
Benchmarks:

- Deserialize Struct: Deserialize the JSON into a Rust struct. The struct definitions and test data are from [json-benchmark](https://github.com/serde-rs/json-benchmark).

- Deserialize Untyped: Deserialize the JSON into a document.

The serialize benchmarks work in the opposite way.


### Deserialize Struct (UTF-8 validation enabled)

The benchmark parses JSON into a Rust struct. The JSON text contains no unknown fields, so every field in the JSON is parsed into a struct field.

Sonic-rs is faster than simd-json because simd-json (Rust) first parses the JSON into a `tape`, then parses the `tape` into a Rust struct. Sonic-rs parses the JSON directly into a Rust struct, with no temporary data structures. The [flamegraph](assets/pngs/) was profiled on the citm_catalog case.

### Deserialize Struct
`cargo bench --bench deserialize_struct --features utf8 -- --quiet`

```
@@ -61,7 +101,14 @@ canada/serde_json::from_str
time: [6.6534 ms 6.8373 ms 7.0402 ms]
```

### Deserialize Untyped

### Deserialize Untyped (UTF-8 validation enabled)

The benchmark will parse JSON into a document. Sonic-rs seems faster for several reasons:
- There are also no temporary data structures in sonic-rs, as detailed above.
- Sonic-rs uses a memory arena for the whole document, resulting in fewer memory allocations, better cache-friendliness, and mutability.
- The JSON object in sonic-rs's document is actually a vector. Sonic-rs does not build a hashmap.
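
The last two points can be pictured with a small sketch: every node of a document lives in one backing vector, and an object is a vector of key/value pairs that is searched linearly. This is an index-based stand-in written for illustration only. The types `Node` and `Document` are hypothetical and do not reflect sonic-rs's real layout, and a real bump allocator would also place the pair lists in the same memory region.

```rust
/// Index of a node inside the document's single backing vector.
type NodeId = usize;

#[derive(Debug, PartialEq)]
enum Node {
    Number(f64),
    String(String),
    Array(Vec<NodeId>),
    /// Key/value pairs in insertion order; no hash map is built.
    Object(Vec<(String, NodeId)>),
}

/// All nodes of one document live in `nodes`, so the whole tree is
/// released together when the document is dropped.
#[derive(Default)]
struct Document {
    nodes: Vec<Node>,
}

impl Document {
    /// Appends a node and returns its index.
    fn alloc(&mut self, node: Node) -> NodeId {
        self.nodes.push(node);
        self.nodes.len() - 1
    }

    /// Looks up `key` with a linear scan over the object's pairs.
    fn get(&self, object: NodeId, key: &str) -> Option<&Node> {
        match &self.nodes[object] {
            Node::Object(pairs) => pairs
                .iter()
                .find(|(k, _)| k.as_str() == key)
                .map(|(_, id)| &self.nodes[*id]),
            _ => None,
        }
    }
}

fn main() {
    let mut doc = Document::default();
    let name = doc.alloc(Node::String("sonic".to_string()));
    let one = doc.alloc(Node::Number(1.0));
    let list = doc.alloc(Node::Array(vec![one]));
    let root = doc.alloc(Node::Object(vec![
        ("name".to_string(), name),
        ("list".to_string(), list),
    ]));

    assert_eq!(doc.get(root, "name"), Some(&Node::String("sonic".to_string())));
    assert_eq!(doc.get(root, "missing"), None);
}
```

Skipping the hash map avoids hashing every key during parsing, at the cost of a linear scan on lookup.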

`cargo bench --bench deserialize_value --features utf8 -- --quiet`

```
@@ -101,8 +148,11 @@ canada/simd_json::slice_to_owned_value
```


### Serialize Untyped

`cargo bench --bench serialize_value -- --quiet`

We serialize the document into a string. In the following benchmarks, sonic-rs appears faster for the `twitter` JSON. The `twitter` JSON contains many long JSON strings, which fit well with sonic-rs's SIMD optimization.

```
twitter/sonic_rs::to_string
time: [380.90 µs 390.00 µs 400.38 µs]
@@ -128,6 +178,9 @@ canada/simd_json::to_string
```

### Serialize Struct
`cargo bench --bench serialize_struct -- --quiet`

The explanation is the same as for Serialize Untyped above.

```
twitter/sonic_rs::to_string
time: [434.03 µs 448.25 µs 463.97 µs]
@@ -155,6 +208,8 @@ citm_catalog/serde_json::to_string
```

`cargo bench --bench get_from -- --quiet`

The benchmark gets a specific field from the twitter JSON. In both sonic-rs and gjson, the JSON should be well-formed and valid when using `get` or `get_from`. Sonic-rs utilizes SIMD to quickly skip unnecessary fields, thus enhancing the performance.

```
twitter/sonic-rs::get_from_str
time: [79.432 µs 80.008 µs 80.738 µs]
@@ -163,13 +218,13 @@ twitter/gjson::get time: [344.41 µs 351.36 µs 362.03 µs]
```

## Usage


### Serde into Rust Type

Directly use the `Deserialize` or `Serialize` trait, recommended use `sonic_rs::{Deserialize, Serialize}`.
Directly use the `Deserialize` or `Serialize` trait.

```rs
use sonic_rs::{Deserialize, Serialize};
// sonic-rs re-exported them from serde
// or use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
@@ -197,7 +252,7 @@ fn main() {
```

### Get a field from JSON

Get a specific field from a JSON with the `pointer` path. The return value is a `LazyValue`, which is a wrapper around a raw JSON slice. Note that the JSON must be valid and well-formed; otherwise it may return unexpected results.

```rs
use sonic_rs::{get_from_str, pointer, JsonValue, PointerNode};
@@ -223,7 +278,7 @@ fn main() {
```

### Parse and Serialize into untyped Value

Parse a JSON as a document, and the document is mutable.
Parse a JSON into a document, which is mutable. Be aware that the document is managed by a `bump` allocator. It is recommended to convert documents into `Object/ObjectMut` or `Array/ArrayMut` to make them typed and easier to use.

```rs
use sonic_rs::value::{dom_from_slice, Value};
@@ -294,5 +349,19 @@ fn main() {
}
```

### JSON RawValue & Number & RawNumber

If we need to parse a JSON value as a raw string, we can use `RawValue`.
If we need to parse a JSON number into an untyped type, we can use `Number`.
If we need to parse a JSON number ***without loss of precision***, we can use `RawNumber`. It is similar to `json.Number` in Go, and can also be parsed from a JSON string.

Detailed examples can be found in [raw_value.rs](examples/raw_value.rs) and [json_number.rs](examples/json_number.rs).
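
To illustrate what "without loss of precision" means, the sketch below keeps a number as its original text and converts it to `f64` only on request. It uses only the standard library. The type `RawNum` and the helper `is_json_number` are hypothetical names for this example and are not the sonic-rs `RawNumber` API.

```rust
/// Checks `s` against the JSON number grammar:
/// -? (0 | [1-9][0-9]*) (. [0-9]+)? ([eE] [+-]? [0-9]+)?
fn is_json_number(s: &str) -> bool {
    let b = s.as_bytes();
    let mut i = 0;
    if b.get(i).copied() == Some(b'-') {
        i += 1;
    }
    // Integer part: a lone zero, or a non-zero digit followed by digits.
    match b.get(i).copied() {
        Some(b'0') => i += 1,
        Some(b'1'..=b'9') => {
            while i < b.len() && b[i].is_ascii_digit() {
                i += 1;
            }
        }
        _ => return false,
    }
    // Optional fraction: at least one digit after the dot.
    if b.get(i).copied() == Some(b'.') {
        i += 1;
        let start = i;
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        if i == start {
            return false;
        }
    }
    // Optional exponent: at least one digit after the optional sign.
    if matches!(b.get(i).copied(), Some(b'e') | Some(b'E')) {
        i += 1;
        if matches!(b.get(i).copied(), Some(b'+') | Some(b'-')) {
            i += 1;
        }
        let start = i;
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        if i == start {
            return false;
        }
    }
    i == b.len()
}

/// A number stored as its original JSON text (hypothetical type).
struct RawNum(String);

impl RawNum {
    fn parse(text: &str) -> Option<RawNum> {
        if is_json_number(text) {
            Some(RawNum(text.to_string()))
        } else {
            None
        }
    }

    /// The digits exactly as they appeared in the input.
    fn as_str(&self) -> &str {
        &self.0
    }

    /// Lossy conversion, performed only when the caller asks for it.
    fn as_f64(&self) -> Option<f64> {
        self.0.parse().ok()
    }
}

fn main() {
    let text = "0.1234567890123456789012345";
    let raw = RawNum::parse(text).expect("valid JSON number");
    // The raw form round-trips exactly.
    assert_eq!(raw.as_str(), text);
    // An f64 cannot hold all 25 digits, so its printed form differs.
    let lossy = raw.as_f64().unwrap().to_string();
    assert_ne!(lossy, text);
    println!("raw: {}, as f64: {}", raw.as_str(), lossy);
}
```

Keeping the text and deferring conversion lets the caller choose the numeric type later, for example a big-decimal type, without having already rounded through `f64`.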

## Acknowledgement

Thanks to the following open-source libraries. sonic-rs draws on other open-source libraries such as [sonic_cpp](https://github.com/bytedance/sonic-cpp), [serde_json](https://github.com/serde-rs/json), [sonic](https://github.com/bytedance/sonic), [simdjson](https://github.com/simdjson/simdjson), [yyjson](https://github.com/ibireme/yyjson), [rust-std](https://github.com/rust-lang/rust/tree/master/library/core/src/num) and so on.

We rewrote many SIMD algorithms from sonic-cpp/sonic/simdjson/yyjson for performance. We reused the de/ser code from serde_json and modified the necessary parts for high compatibility with `serde`. We reused parts of the floating-point parsing code from rust-std to make it more accurate.

## Contributing
Please read `CONTRIBUTING.md` for information on contributing to sonic-rs.