Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Span tracking #1246

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Span tracking #1246

wants to merge 1 commit into from

Conversation

robsdedude
Copy link

@robsdedude robsdedude commented Mar 11, 2025

Introducing optional span tracking (line, column, byte offset).

The feature is behind a new feature flag called spanned. It introduces

  • Spanned<T>, with wich you can get serde_json (when calling the appropriate deserializer) to inject span information into custom Deserialize types.
  • SpannedValue a copy of Value that uses Spanned<T> to provide span information.

This PR still needs a bit of fixing up:

  • clean out quick testing code
  • Fix relying on Read::position as its column isn't what we want and its line is slow to compute
    • While doing so, make sure to fix spans including leading white space

And then a good chunk of polishing:

  • DRY up the code (much of SpannedValue functionality is 90% copy-pasta from Value)
  • Write tests
  • Update/write docs
  • Maybe add the feature to the default features for the playground?

However, before I proceed with any of that, and since I've already put a good amount hours into this, I'd first like to get initial feedback on/review of the general design (API, outlined provided functionality).

Here's the gist of how it'd work

Sample code with output
#[cfg(test)]
mod test {
    use crate::spanned::*;
    use crate::*;

    fn show_spanned<T: core::fmt::Debug>(s: &Spanned<T>, source: &str) {
        use codespan_reporting::diagnostic::{Diagnostic, Label};
        use codespan_reporting::files::SimpleFiles;
        use codespan_reporting::term;
        use codespan_reporting::term::termcolor::{ColorChoice, StandardStream};

        let mut files = SimpleFiles::new();
        let file_id = files.add("input", source);
        let diagnostic = Diagnostic::note()
            .with_message(std::format!("Look, it's a {}", std::any::type_name::<T>()))
            .with_notes(std::vec![
                std::format!("{:?}", s.as_ref()),
                std::format!(
                    "From {}:{} ({}) to {}:{} ({})",
                    s.span().start.line,
                    s.span().start.column,
                    s.span().start.byte_offset,
                    s.span().end.line,
                    s.span().end.column,
                    s.span().end.byte_offset,
                )
            ])
            .with_labels(std::vec![Label::primary(file_id, s.byte_span())]);

        let writer = StandardStream::stderr(ColorChoice::Always);
        let config = codespan_reporting::term::Config::default();
        let mut writer_lock = writer.lock();

        term::emit(&mut writer_lock, &config, &files, &diagnostic).unwrap()
    }

    fn show_recursive(v: &Spanned<SpannedValue>, source: &str) {
        show_spanned(v, source);
        match v.get_ref() {
            SpannedValue::Array(values) => values.iter().for_each(|v| show_recursive(v, source)),
            SpannedValue::Object(map) => map.iter().for_each(|(k, v)| {
                show_spanned(k, source);
                show_recursive(v, source)
            }),
            _ => {}
        }
    }

    #[test]
    fn test_spanned_value() {
        let data = r#"
        {
            "name": "John Doe",
            "age": 42,
            "phones":    [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;

        let v: Spanned<SpannedValue> = from_str_spanned(data).unwrap();
        show_recursive(&v, data);
    }
}

Produces

note: Look, it's a serde_json::spanned::value::SpannedValue
  ┌─ input:1:1
  │  
1 │ ╭ 
2 │ │         {
3 │ │             "name": "John Doe",
4 │ │             "age": 42,
  · │
8 │ │             ]
9 │ │         }
  │ ╰─────────^
  │  
  = Object {Spanned { span: Span { start: SpanPosition { line: 3, column: 13, byte_offset: 23 }, end: SpanPosition { line: 3, column: 18, byte_offset: 29 } }, value: "name" }: Spanned { span: Span { start: SpanPosition { line: 3, column: 20, byte_offset: 30 }, end: SpanPosition { line: 3, column: 30, byte_offset: 41 } }, value: String("John Doe") }, Spanned { span: Span { start: SpanPosition { line: 4, column: 13, byte_offset: 55 }, end: SpanPosition { line: 4, column: 17, byte_offset: 60 } }, value: "age" }: Spanned { span: Span { start: SpanPosition { line: 4, column: 19, byte_offset: 61 }, end: SpanPosition { line: 4, column: 21, byte_offset: 64 } }, value: Number(42) }, Spanned { span: Span { start: SpanPosition { line: 5, column: 13, byte_offset: 78 }, end: SpanPosition { line: 5, column: 20, byte_offset: 86 } }, value: "phones" }: Spanned { span: Span { start: SpanPosition { line: 5, column: 22, byte_offset: 87 }, end: SpanPosition { line: 8, column: 13, byte_offset: 167 } }, value: Array [Spanned { span: Span { start: SpanPosition { line: 6, column: 17, byte_offset: 109 }, end: SpanPosition { line: 6, column: 29, byte_offset: 122 } }, value: String("+44 1234567") }, Spanned { span: Span { start: SpanPosition { line: 7, column: 17, byte_offset: 140 }, end: SpanPosition { line: 7, column: 29, byte_offset: 153 } }, value: String("+44 2345678") }] }}
  = From 1:1 (0) to 9:9 (177)

note: Look, it's a alloc::string::String
  ┌─ input:3:13
  │
3 │             "name": "John Doe",
  │             ^^^^^^
  │
  = "name"
  = From 3:13 (23) to 3:18 (29)

note: Look, it's a serde_json::spanned::value::SpannedValue
  ┌─ input:3:20
  │
3 │             "name": "John Doe",
  │                    ^^^^^^^^^^^
  │
  = String("John Doe")
  = From 3:20 (30) to 3:30 (41)

note: Look, it's a alloc::string::String
  ┌─ input:4:13
  │
4 │             "age": 42,
  │             ^^^^^
  │
  = "age"
  = From 4:13 (55) to 4:17 (60)

note: Look, it's a serde_json::spanned::value::SpannedValue
  ┌─ input:4:19
  │
4 │             "age": 42,
  │                   ^^^
  │
  = Number(42)
  = From 4:19 (61) to 4:21 (64)

note: Look, it's a alloc::string::String
  ┌─ input:5:13
  │
5 │             "phones":    [
  │             ^^^^^^^^
  │
  = "phones"
  = From 5:13 (78) to 5:20 (86)

note: Look, it's a serde_json::spanned::value::SpannedValue
  ┌─ input:5:22
  │  
5 │               "phones":    [
  │ ╭──────────────────────^
6 │ │                 "+44 1234567",
7 │ │                 "+44 2345678"
8 │ │             ]
  │ ╰─────────────^
  │  
  = Array [Spanned { span: Span { start: SpanPosition { line: 6, column: 17, byte_offset: 109 }, end: SpanPosition { line: 6, column: 29, byte_offset: 122 } }, value: String("+44 1234567") }, Spanned { span: Span { start: SpanPosition { line: 7, column: 17, byte_offset: 140 }, end: SpanPosition { line: 7, column: 29, byte_offset: 153 } }, value: String("+44 2345678") }]
  = From 5:22 (87) to 8:13 (167)

note: Look, it's a serde_json::spanned::value::SpannedValue
  ┌─ input:6:17
  │
6 │                 "+44 1234567",
  │                 ^^^^^^^^^^^^^
  │
  = String("+44 1234567")
  = From 6:17 (109) to 6:29 (122)

note: Look, it's a serde_json::spanned::value::SpannedValue
  ┌─ input:7:17
  │
7 │                 "+44 2345678"
  │                 ^^^^^^^^^^^^^
  │
  = String("+44 2345678")
  = From 7:17 (140) to 7:29 (153)

Closes: #637

@robsdedude robsdedude marked this pull request as draft March 11, 2025 08:29
@@ -36,6 +35,8 @@ pub struct Deserializer<R> {
single_precision: bool,
#[cfg(feature = "unbounded_depth")]
disable_recursion_limit: bool,
#[cfg(feature = "spanned")]
spanned_enabled: bool,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design note on why I decided to only add support for spans when explicitly enabled (e.g., by calling from_str_spanned instead of from_str):

Otherwise, this would be a breaking change for anyone passing a struct whose Deserialize implementation calls deserialize_struct on serde_json's Deserializer with arguments satisfying is_spanned. Admittedly, this is highly unlikely, but I'd rather stick to hard SemVer rules. If you disagree, I'm happy to put in this functionality unconditionally.

Another benefit of this design is that when fleshing out the implementation, this allows me to move the position tracking into the Deserializer (as Read's tracking seems to be insufficient) and only engage in the counting/tracking if the user indicated that they're interested in span information in the first place (by calling one of the new APIs).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Get line and column number on valid JSON?
1 participant