doc: update get_from bench and readme
liuq19 committed Oct 27, 2023
1 parent 9ffbae3 commit c3f6265
Showing 9 changed files with 91 additions and 31 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -32,6 +32,7 @@ jobs:
- name: Run tests
run: |
cargo check
cargo run --examples
cargo test
cargo install cargo-fuzz
cargo +nightly fuzz run fuzz_value -- -max_total_time=5m
@@ -50,6 +51,7 @@ jobs:
run: |
cargo check
cargo test
cargo run --examples
lint:
runs-on: [self-hosted, X64]
29 changes: 25 additions & 4 deletions Cargo.lock

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion Cargo.toml
@@ -17,6 +17,7 @@ categories = ["encoding", "parser-implementations"]
cfg-if = "1.0"
arrayref = "0.3"
packed_simd = { version = "0.3", package = "packed_simd" }

serde = { version = "1.0", default-features = false }
itoa = "1.0"
ryu = "1.0"
@@ -31,7 +32,7 @@ simdutf8 = "0.1"
jemallocator = "0.5"
serde = { version = "1.0", features = ["derive"] }
serde_json = { version = "1.0", features = ["float_roundtrip", "raw_value"] }
simd-json = "0.12"
simd-json = "0.13"
core_affinity = "0.8"
criterion = { version = "0.5", features = ["html_reports"] }
gjson = "0.8"
31 changes: 22 additions & 9 deletions README.md
@@ -60,7 +60,7 @@ Model name: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
```
Benchmarks:

- Deserialize Struct: Deserialize the JSON into Rust struct. The defined struct and testdata is from [json-benchmark][https://github.com/serde-rs/json-benchmark]
- Deserialize Struct: Deserialize the JSON into a Rust struct. The struct definitions and test data are from [json-benchmark](https://github.com/serde-rs/json-benchmark)

- Deserialize Untyped: Deserialize the JSON into a document

@@ -223,12 +223,21 @@ citm_catalog/serde_json::to_string

`cargo bench --bench get_from -- --quiet`

The benchmark is getting a specific field from the twitter JSON. In both sonic-rs and gjson, the JSON should be well-formed and valid when using get or get_from. Sonic-rs utilize SIMD to quickly skip unnecessary fields, thus enhancing the performance.
The benchmark gets a specific field from the twitter JSON.

- sonic-rs::get_unchecked_from_str: without validation
- sonic-rs::get_from_str: with validation
- gjson::get_from_str: without validation

sonic-rs uses SIMD to skip unneeded fields quickly in the unchecked case, which improves performance.

```
twitter/sonic-rs::get_unchecked_from_str
time: [67.390 µs 68.121 µs 69.028 µs]
twitter/sonic-rs::get_from_str
time: [79.432 µs 80.008 µs 80.738 µs]
twitter/gjson::get time: [344.41 µs 351.36 µs 362.03 µs]
time: [428.33 µs 437.55 µs 448.50 µs]
twitter/gjson::get_from_str
time: [348.30 µs 355.34 µs 364.13 µs]
```

## Usage
@@ -267,7 +276,10 @@ fn main() {

### Get a field from JSON

Get a specific field from a JSON with the `pointer` path. The return is a `LazyValue`, which is a wrapper of a raw JSON slice. Note that the JSON must be valid and well-formed, otherwise it may return unexpected result.
Get a specific field from JSON with a `pointer` path. The return value is a `LazyValue`, which wraps a raw, valid JSON slice.

We provide the `get` and `get_unchecked` APIs. The `get_unchecked` APIs should only be used on valid JSON; otherwise they may return unexpected results.


```rs
use sonic_rs::{get, get_from_str, pointer, JsonValue, PointerNode};
@@ -277,7 +289,8 @@ fn main() {
let json = r#"
{"u": 123, "a": {"b" : {"c": [null, "found"]}}}
"#;
let target = unsafe { get_from_str(json, &path).unwrap() };
let target = get(json, &path).unwrap();
// or let target = unsafe { get_unchecked(json, &path).unwrap() };
assert_eq!(target.as_raw_str(), r#""found""#);
assert_eq!(target.as_str().unwrap(), "found");

@@ -286,7 +299,7 @@ fn main() {
{"u": 123, "a": {"b" : {"c": [null, "found"]}}}
"#;
// not found from json
let target = unsafe { get_from_str(json, &path) };
let target = get(json, &path);
assert!(target.is_err());
}
```
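To make the difference between `as_raw_str` and `as_str` in the example above concrete, here is a minimal std-only sketch of a lazy value as a borrowed raw slice. `RawValue` and its methods are hypothetical stand-ins written for illustration; this is not how sonic-rs implements `LazyValue`, and the sketch assumes string literals without escape sequences.

```rust
/// Hypothetical stand-in for a lazy value: it only borrows the raw JSON text
/// and parses nothing until asked.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RawValue<'a> {
    raw: &'a str,
}

impl<'a> RawValue<'a> {
    /// The untouched JSON text, quotes included for strings.
    fn as_raw_str(&self) -> &'a str {
        self.raw
    }

    /// The string content without the surrounding quotes.
    /// Returns `None` for non-strings and (a simplification of this sketch)
    /// for strings that contain escape sequences.
    fn as_str(&self) -> Option<&'a str> {
        let inner = self.raw.strip_prefix('"')?.strip_suffix('"')?;
        if inner.contains('\\') {
            None
        } else {
            Some(inner)
        }
    }
}

fn main() {
    let found = RawValue { raw: r#""found""# };
    assert_eq!(found.as_raw_str(), r#""found""#);
    assert_eq!(found.as_str(), Some("found"));

    // A number is still available as raw text, but is not a string.
    let number = RawValue { raw: "123" };
    assert_eq!(number.as_raw_str(), "123");
    assert_eq!(number.as_str(), None);
}
```

Keeping only a borrowed slice means a lookup copies nothing; decoding is deferred until a typed accessor is called.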
@@ -347,13 +360,13 @@ use sonic_rs::{to_array_iter, JsonValue};

fn main() {
let json = Bytes::from(r#"[1, 2, 3, 4, 5, 6]"#);
let iter = unsafe { to_array_iter(&json) };
let iter = to_array_iter(&json);
for (i, v) in iter.enumerate() {
assert_eq!(i + 1, v.as_u64().unwrap() as usize);
}

let json = Bytes::from(r#"[1, 2, 3, 4, 5, 6"#);
let iter = unsafe { to_array_iter(&json) };
let iter = to_array_iter(&json);
for elem in iter {
// deal with errors when invalid json
if elem.is_err() {
28 changes: 20 additions & 8 deletions README_ZH.md
@@ -58,7 +58,7 @@ Model name: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz

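The benchmarks cover two main aspects:

- Deserialize into a struct: the struct definitions and test data come from [json-benchmark][https://github.com/serde-rs/json-benchmark]
- Deserialize into a struct: the struct definitions and test data come from [json-benchmark](https://github.com/serde-rs/json-benchmark)

- Deserialize into a document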

@@ -223,12 +223,21 @@ citm_catalog/serde_json::to_string

`cargo bench --bench get_from -- --quiet`

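The benchmark gets a specific field from the twitter JSON. In both sonic-rs and gjson, the JSON should be well-formed and valid when using get or get_from. sonic-rs uses SIMD to skip unneeded fields quickly, which improves performance.
The benchmark gets a specific field from the twitter JSON.

- sonic-rs::get_unchecked_from_str: does not validate the JSON
- sonic-rs::get_from_str: validates the JSON
- gjson::get_from_str: does not validate the JSON

In get_unchecked_from_str, sonic-rs uses SIMD to skip unneeded fields quickly, which improves performance.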

```
twitter/sonic-rs::get_unchecked_from_str
time: [67.390 µs 68.121 µs 69.028 µs]
twitter/sonic-rs::get_from_str
time: [79.432 µs 80.008 µs 80.738 µs]
twitter/gjson::get time: [344.41 µs 351.36 µs 362.03 µs]
time: [428.33 µs 437.55 µs 448.50 µs]
twitter/gjson::get_from_str
time: [348.30 µs 355.34 µs 364.13 µs]
```
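The relative speeds implied by these timings can be worked out with a little arithmetic. The sketch below takes the middle value of each Criterion triple (the point estimate between the lower and upper bounds) from the output above; the `speedup` helper is a hypothetical name introduced here, not part of the benchmark code.

```rust
/// Ratio of two timings; values above 1.0 mean `candidate_us` is faster.
fn speedup(baseline_us: f64, candidate_us: f64) -> f64 {
    baseline_us / candidate_us
}

fn main() {
    // Point estimates copied from the benchmark output, in microseconds.
    let sonic_unchecked = 68.121;
    let sonic_checked = 80.008;
    let gjson = 355.34;

    println!("unchecked vs gjson: {:.2}x", speedup(gjson, sonic_unchecked));
    println!("checked vs gjson:   {:.2}x", speedup(gjson, sonic_checked));

    // Extra time spent on validation, relative to the unchecked lookup.
    let overhead = (speedup(sonic_checked, sonic_unchecked) - 1.0) * 100.0;
    println!("validation cost:    {:.1}%", overhead);
}
```

On these numbers the unchecked lookup is roughly 5.2 times faster than gjson, the checked lookup roughly 4.4 times faster, and validation adds about 17% to the unchecked time. These figures are from a single machine and a single run, so they are only indicative.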

## Usage
@@ -267,7 +276,9 @@ fn main() {

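### Get a field from JSON

Use a `pointer` path to get a specific field from JSON. The return value is a `LazyValue`, which is essentially an unparsed JSON slice. Note that this API requires the JSON to be well-formed and valid; otherwise it may return unexpected results.
Use a `pointer` path to get a specific field from JSON. The return value is a `LazyValue`, which is essentially an unparsed JSON slice.

sonic-rs provides two kinds of APIs, `get` and `get_unchecked`. Note that when using the `unchecked` APIs, the input JSON must be well-formed and valid; otherwise they may return unexpected results.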
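
A toy string skipper can illustrate why the unchecked variants need valid input. The sketch below is std-only and hypothetical: `skip_string_unchecked` and `skip_string_checked` are names invented here, not sonic-rs functions, and the real library uses SIMD rather than this byte loop. It also does not check UTF-8 validity.

```rust
/// Returns the index just past the closing quote of the string literal that
/// starts at `s[0]`, without validating its contents.
fn skip_string_unchecked(s: &[u8]) -> Option<usize> {
    if s.first() != Some(&b'"') {
        return None;
    }
    let mut i = 1;
    while i < s.len() {
        match s[i] {
            b'\\' => i += 2, // skip the escaped byte, whatever it is
            b'"' => return Some(i + 1),
            _ => i += 1,
        }
    }
    None
}

/// Same scan, but rejects unknown escapes and raw control characters.
fn skip_string_checked(s: &[u8]) -> Result<usize, &'static str> {
    if s.first() != Some(&b'"') {
        return Err("expected opening quote");
    }
    let mut i = 1;
    while i < s.len() {
        match s[i] {
            b'"' => return Ok(i + 1),
            b'\\' => match s.get(i + 1) {
                Some(b'"' | b'\\' | b'/' | b'b' | b'f' | b'n' | b'r' | b't') => i += 2,
                Some(b'u') => {
                    // \u must be followed by exactly four hex digits.
                    let hex = s.get(i + 2..i + 6).ok_or("truncated \\u escape")?;
                    if !hex.iter().all(u8::is_ascii_hexdigit) {
                        return Err("invalid \\u escape");
                    }
                    i += 6;
                }
                _ => return Err("invalid escape"),
            },
            c if c < 0x20 => return Err("raw control character"),
            _ => i += 1,
        }
    }
    Err("unterminated string")
}

fn main() {
    let valid = br#""a\nb", 1"#;
    let invalid = br#""a\qb", 1"#; // `\q` is not a JSON escape

    assert_eq!(skip_string_unchecked(valid), Some(6));
    assert_eq!(skip_string_checked(valid), Ok(6));

    // The unchecked scan accepts the malformed literal; the checked one does not.
    assert_eq!(skip_string_unchecked(invalid), Some(6));
    assert_eq!(skip_string_checked(invalid), Err("invalid escape"));
}
```

On the malformed literal the unchecked scan still reports an end position while the checked scan returns an error, which is presumably the kind of unexpected result the warning refers to.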

```rs
use sonic_rs::{get, get_from_str, pointer, JsonValue, PointerNode};
@@ -277,7 +288,8 @@ fn main() {
let json = r#"
{"u": 123, "a": {"b" : {"c": [null, "found"]}}}
"#;
let target = unsafe { get_from_str(json, &path).unwrap() };
let target = get(json, &path).unwrap();
// or let target = unsafe { get_unchecked(json, &path).unwrap() };
assert_eq!(target.as_raw_str(), r#""found""#);
assert_eq!(target.as_str().unwrap(), "found");

@@ -348,13 +360,13 @@ use sonic_rs::{to_array_iter, JsonValue};

fn main() {
let json = Bytes::from(r#"[1, 2, 3, 4, 5, 6]"#);
let iter = unsafe { to_array_iter(&json) };
let iter = to_array_iter(&json);
for (i, v) in iter.enumerate() {
assert_eq!(i + 1, v.as_u64().unwrap() as usize);
}

let json = Bytes::from(r#"[1, 2, 3, 4, 5, 6"#);
let iter = unsafe { to_array_iter(&json) };
let iter = to_array_iter(&json);
for elem in iter {
// deal with errors when invalid json
if elem.is_err() {
12 changes: 10 additions & 2 deletions benches/get_from.rs
@@ -24,15 +24,23 @@ fn bench_get(c: &mut Criterion) {

let mut group = c.benchmark_group("twitter");

group.bench_with_input("sonic-rs::get_from_str", data, |b, data| {
group.bench_with_input("sonic-rs::get_unchecked_from_str", data, |b, data| {
b.iter_batched(
|| data,
|json| unsafe { sonic_rs::get_unchecked(json, &rpath) },
BatchSize::SmallInput,
)
});

group.bench_with_input("gjson::get", data, |b, data| {
group.bench_with_input("sonic-rs::get_from_str", data, |b, data| {
b.iter_batched(
|| data,
|json| sonic_rs::get(json, &rpath),
BatchSize::SmallInput,
)
});

group.bench_with_input("gjson::get_from_str", data, |b, data| {
b.iter_batched(
|| data,
|json| gjson::get(json, gpath),
10 changes: 7 additions & 3 deletions examples/get_from.rs
@@ -1,19 +1,23 @@
use sonic_rs::{get_from_str_unchecked, pointer, JsonValue, PointerNode};
use sonic_rs::{get, get_unchecked, pointer, JsonValue, PointerNode};

fn main() {
let path = pointer!["a", "b", "c", 1];
let json = r#"
{"u": 123, "a": {"b" : {"c": [null, "found"]}}}
"#;
let target = unsafe { get_from_str_unchecked(json, &path).unwrap() };
let target = unsafe { get_unchecked(json, &path).unwrap() };
assert_eq!(target.as_raw_str(), r#""found""#);
assert_eq!(target.as_str().unwrap(), "found");

// `get` returns a Result, so unwrap it once before calling the accessors.
let target = get(json, &path).unwrap();
assert_eq!(target.as_str().unwrap(), "found");
assert_eq!(target.as_raw_str(), r#""found""#);

let path = pointer!["a", "b", "c", "d"];
let json = r#"
{"u": 123, "a": {"b" : {"c": [null, "found"]}}}
"#;
// not found from json
let target = unsafe { get_from_str_unchecked(json, &path) };
let target = get(json, &path);
assert!(target.is_err());
}
2 changes: 0 additions & 2 deletions src/lazyvalue/iterator.rs
@@ -7,7 +7,6 @@ use crate::reader::SliceRead;
use faststr::FastStr;

/// A lazy iterator over a JSON object.
/// ObjectIterator can be used as `into_iter` directly.
pub struct ObjectIntoIter<'de> {
json: JsonSlice<'de>,
parser: Option<Parser<SliceRead<'static>>>,
@@ -17,7 +16,6 @@ pub struct ObjectIntoIter<'de> {
}

/// A lazy iterator over a JSON array.
/// ArrayIterator can be used as `into_iter` directly.
pub struct ArrayIntoIter<'de> {
json: JsonSlice<'de>,
parser: Option<Parser<SliceRead<'static>>>,
5 changes: 3 additions & 2 deletions src/lazyvalue/mod.rs
@@ -3,8 +3,9 @@ mod iterator;
mod value;

pub use get::{
get_from_bytes_unchecked, get_from_faststr_unchecked, get_from_slice_unchecked,
get_from_str_unchecked, get_many_unchecked, get_unchecked,
get, get_from_bytes, get_from_bytes_unchecked, get_from_faststr, get_from_faststr_unchecked,
get_from_slice, get_from_slice_unchecked, get_from_str, get_from_str_unchecked, get_many,
get_many_unchecked, get_unchecked,
};
pub use iterator::{to_array_iter, to_object_iter, ArrayIntoIter, ObjectIntoIter};
pub use value::LazyValue;
