Span tracking #1246

Draft
wants to merge 1 commit into
base: master
7 changes: 5 additions & 2 deletions Cargo.toml
@@ -30,7 +30,7 @@ serde_stacker = "0.1.8"
trybuild = { version = "1.0.81", features = ["diff"] }

[package.metadata.docs.rs]
features = ["preserve_order", "raw_value", "unbounded_depth"]
features = ["preserve_order", "raw_value", "unbounded_depth", "spanned"]
targets = ["x86_64-unknown-linux-gnu"]
rustdoc-args = [
"--generate-link-to-definition",
@@ -84,8 +84,11 @@ raw_value = []
# structures without any consideration for overflowing the stack. When using
# this feature, you will want to provide some other way to protect against stack
# overflows, such as by wrapping your Deserializer in the dynamically growing
# stack adapter provided by the serde_stacker crate. Additionally you will need
# stack adapter provided by the serde_stacker crate. Additionally, you will need
# to be careful around other recursive operations on the parsed result which may
# overflow the stack after deserialization has completed, including, but not
# limited to, Display and Debug and Drop impls.
unbounded_depth = []

# TODO: document
spanned = []
97 changes: 92 additions & 5 deletions src/de.rs
@@ -24,7 +24,6 @@ pub use crate::read::{Read, SliceRead, StrRead};
#[cfg(feature = "std")]
#[cfg_attr(docsrs, doc(cfg(feature = "std")))]
pub use crate::read::IoRead;

//////////////////////////////////////////////////////////////////////////////

/// A structure that deserializes JSON into Rust values.
@@ -36,6 +35,8 @@ pub struct Deserializer<R> {
single_precision: bool,
#[cfg(feature = "unbounded_depth")]
disable_recursion_limit: bool,
#[cfg(feature = "spanned")]
spanned_enabled: bool,
Comment from the PR author:

Design note on why span support is only active when explicitly enabled (e.g., by calling from_str_spanned instead of from_str):

Enabling it unconditionally would be a breaking change for anyone whose Deserialize implementation calls deserialize_struct on serde_json's Deserializer with a name and fields that satisfy is_spanned. Admittedly, this is highly unlikely, but I'd rather stick to hard SemVer rules. If you disagree, I'm happy to enable this functionality unconditionally.

Another benefit of this design: when fleshing out the implementation, I can move the position tracking into the Deserializer (Read's tracking seems to be insufficient) and only do the counting/tracking if the user has indicated interest in span information in the first place, by calling one of the new APIs.
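
To make the opt-in argument concrete, here is a minimal, std-only sketch of the gating logic described in the note. It is not the PR's code: the sentinel constants, `ToyDeserializer`, and `Path` are hypothetical stand-ins, and the real `is_spanned` lives in the `spanned` module, which this diff does not show. The only thing taken from the diff is the shape of the check, `spanned_enabled && is_spanned(name, fields)`.

```rust
// Hypothetical sentinel names, chosen for illustration only. The PR's real
// constants are defined in its `spanned` module and may look different.
const SPANNED_NAME: &str = "$__hypothetical_Spanned";
const SPANNED_FIELDS: &[&str] = &[
    "$__hypothetical_start",
    "$__hypothetical_end",
    "$__hypothetical_value",
];

/// Returns true only for the exact sentinel struct name and field list.
fn is_spanned(name: &str, fields: &[&str]) -> bool {
    name == SPANNED_NAME && fields == SPANNED_FIELDS
}

/// Which code path a `deserialize_struct` call would take.
#[derive(Debug, PartialEq)]
enum Path {
    Spanned,
    Regular,
}

/// Toy stand-in for the deserializer; only the opt-in flag is modeled.
struct ToyDeserializer {
    spanned_enabled: bool,
}

impl ToyDeserializer {
    /// Mirrors the existing constructor: span tracking stays off.
    fn new() -> Self {
        ToyDeserializer { spanned_enabled: false }
    }

    /// Mirrors the new constructor: same setup, then flip the flag.
    fn new_spanned() -> Self {
        let mut de = Self::new();
        de.spanned_enabled = true;
        de
    }

    /// Takes the span path only when the caller opted in AND the
    /// requested struct matches the sentinel.
    fn deserialize_struct(&self, name: &str, fields: &[&str]) -> Path {
        if self.spanned_enabled && is_spanned(name, fields) {
            Path::Spanned
        } else {
            Path::Regular
        }
    }
}

fn main() {
    let plain = ToyDeserializer::new();
    let spanned = ToyDeserializer::new_spanned();

    // Existing callers keep the old behavior, even for the sentinel struct.
    assert_eq!(plain.deserialize_struct(SPANNED_NAME, SPANNED_FIELDS), Path::Regular);
    // Only the opted-in deserializer intercepts the sentinel struct.
    assert_eq!(spanned.deserialize_struct(SPANNED_NAME, SPANNED_FIELDS), Path::Spanned);
    // Ordinary structs are unaffected either way.
    assert_eq!(spanned.deserialize_struct("Point", &["x", "y"]), Path::Regular);
    println!("ok");
}
```

With the flag off, a struct that happens to match the sentinel is deserialized exactly as before, which is the SemVer guarantee the note is after.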

}

impl<'de, R> Deserializer<R>
@@ -49,7 +50,7 @@ where
/// as a [`File`], you will want to apply your own buffering because serde_json
/// will not buffer the input. See [`std::io::BufReader`].
///
/// Typically it is more convenient to use one of these methods instead:
/// Typically, it is more convenient to use one of these methods instead:
///
/// - Deserializer::from_str
/// - Deserializer::from_slice
@@ -65,8 +66,30 @@ where
single_precision: false,
#[cfg(feature = "unbounded_depth")]
disable_recursion_limit: false,
#[cfg(feature = "spanned")]
spanned_enabled: false,
}
}

#[cfg(feature = "spanned")]
pub(crate) fn new_spanned(read: R) -> Self {
let mut de = Self::new(read);
de.spanned_enabled = true;
de
}

#[cfg(feature = "spanned")]
pub(crate) fn position(&self) -> read::Position {
// TODO: consider tracking line-breaks to avoid potentially expensive counting in some
// implementations of `Read::position`.
// TODO: `Read::position().column` tracks byte offset, not character offset.
self.read.position()
}

#[cfg(feature = "spanned")]
pub(crate) fn byte_offset(&self) -> usize {
self.read.byte_offset()
}
}

#[cfg(feature = "std")]
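
The second TODO in `position` above notes that the column is a byte offset rather than a character offset. The following std-only sketch shows how the two can differ on a line containing a multi-byte character. `locate` is a hypothetical helper written for this illustration and is not part of serde_json; its line and column conventions are stated in the comments and are not claimed to match `Read::position` exactly.

```rust
/// Locates a byte offset within `input`.
///
/// Returns `(line, byte_column, char_column)`, where `line` is 1-based and
/// both columns count units (bytes or `char`s) since the last newline.
/// Panics if `offset` is out of range or not on a UTF-8 character boundary.
fn locate(input: &str, offset: usize) -> (usize, usize, usize) {
    let prefix = &input[..offset];
    // One line per newline seen so far, starting from line 1.
    let line = prefix.bytes().filter(|&b| b == b'\n').count() + 1;
    // Start of the current line: just past the last newline, or 0.
    let line_start = prefix.rfind('\n').map_or(0, |i| i + 1);
    let current = &prefix[line_start..];
    (line, current.len(), current.chars().count())
}

fn main() {
    // The key contains 'é', which occupies two bytes in UTF-8.
    let input = "{\n  \"é\": 1\n}";
    let offset = input.find(':').unwrap();
    let (line, byte_col, char_col) = locate(input, offset);
    // The byte column is one larger than the character column here.
    assert_eq!((line, byte_col, char_col), (2, 6, 5));
    println!("line {line}, byte column {byte_col}, char column {char_col}");
}
```

Counting from the start of the input on every call is linear in the offset, which is the cost the first TODO alludes to; tracking line breaks incrementally would avoid it.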
@@ -82,20 +105,41 @@ where
pub fn from_reader(reader: R) -> Self {
Deserializer::new(read::IoRead::new(reader))
}

/// TODO: document
#[cfg(feature = "spanned")]
#[cfg_attr(docsrs, doc(cfg(feature = "spanned")))]
pub fn from_reader_spanned(reader: R) -> Self {
Deserializer::new_spanned(read::IoRead::new(reader))
}
}

impl<'a> Deserializer<read::SliceRead<'a>> {
/// Creates a JSON deserializer from a `&[u8]`.
pub fn from_slice(bytes: &'a [u8]) -> Self {
Deserializer::new(read::SliceRead::new(bytes))
}

/// TODO: document
#[cfg(feature = "spanned")]
#[cfg_attr(docsrs, doc(cfg(feature = "spanned")))]
pub fn from_slice_spanned(bytes: &'a [u8]) -> Self {
Deserializer::new_spanned(read::SliceRead::new(bytes))
}
}

impl<'a> Deserializer<read::StrRead<'a>> {
/// Creates a JSON deserializer from a `&str`.
pub fn from_str(s: &'a str) -> Self {
Deserializer::new(read::StrRead::new(s))
}

/// TODO: document
#[cfg(feature = "spanned")]
#[cfg_attr(docsrs, doc(cfg(feature = "spanned")))]
pub fn from_str_spanned(s: &'a str) -> Self {
Deserializer::new_spanned(read::StrRead::new(s))
}
}

macro_rules! overflow {
@@ -1824,6 +1868,13 @@ impl<'de, R: Read<'de>> de::Deserializer<'de> for &mut Deserializer<R> {
where
V: de::Visitor<'de>,
{
#[cfg(feature = "spanned")]
{
if self.spanned_enabled && crate::spanned::is_spanned(_name, _fields) {
return visitor.visit_map(crate::spanned::SpannedDeserializer::new(self));
}
}

let peek = match tri!(self.parse_whitespace()) {
Some(b) => b,
None => {
@@ -2146,7 +2197,7 @@ impl<'de, 'a, R: Read<'de> + 'a> de::VariantAccess<'de> for UnitVariantAccess<'a
}
}

/// Only deserialize from this after peeking a '"' byte! Otherwise it may
/// Only deserialize from this after peeking a '"' byte! Otherwise, it may
/// deserialize invalid JSON successfully.
struct MapKey<'a, R: 'a> {
de: &'a mut Deserializer<R>,
@@ -2317,8 +2368,28 @@ where
self.de.deserialize_bytes(visitor)
}

#[inline]
#[cfg(feature = "spanned")]
fn deserialize_struct<V>(
self,
name: &'static str,
fields: &'static [&'static str],
visitor: V,
) -> result::Result<V::Value, Self::Error>
where
V: de::Visitor<'de>,
{
if crate::spanned::is_spanned(name, fields) {
return visitor.visit_map(crate::spanned::SpannedDeserializer::new(self.de));
}
self.deserialize_any(visitor)
}

#[cfg(not(feature = "spanned"))]
forward_to_deserialize_any! { struct }

forward_to_deserialize_any! {
char str string unit unit_struct seq tuple tuple_struct map struct
char str string unit unit_struct seq tuple tuple_struct map
identifier ignored_any
}
}
@@ -2362,7 +2433,7 @@ where
/// Create a JSON stream deserializer from one of the possible serde_json
/// input sources.
///
/// Typically it is more convenient to use one of these methods instead:
/// Typically, it is more convenient to use one of these methods instead:
///
/// - Deserializer::from_str(...).into_iter()
/// - Deserializer::from_slice(...).into_iter()
@@ -2378,6 +2449,22 @@ where
}
}

/// Create a JSON span tracking stream deserializer from one of the possible serde_json
/// input sources.
///
/// Typically, it is more convenient to use one of these methods instead:
///
/// - Deserializer::from_str_spanned(...).into_iter()
/// - Deserializer::from_slice_spanned(...).into_iter()
/// - Deserializer::from_reader_spanned(...).into_iter()
#[cfg(feature = "spanned")]
#[cfg_attr(docsrs, doc(cfg(feature = "spanned")))]
pub fn new_spanned(read: R) -> Self {
let mut de = Self::new(read);
de.de.spanned_enabled = true;
de
}

/// Returns the number of bytes so far deserialized into a successful `T`.
///
/// If a stream deserializer returns an EOF error, new data can be joined to
13 changes: 12 additions & 1 deletion src/lib.rs
@@ -50,7 +50,7 @@
//! Number(Number),
//! String(String),
//! Array(Vec<Value>),
//! Object(Map<String, Value>),
//! Object(Map),
//! }
//! ```
//!
@@ -398,6 +398,14 @@ pub use crate::ser::{to_string, to_string_pretty, to_vec, to_vec_pretty};
#[cfg_attr(docsrs, doc(cfg(feature = "std")))]
#[doc(inline)]
pub use crate::ser::{to_writer, to_writer_pretty, Serializer};
#[cfg(all(feature = "std", feature = "spanned"))]
#[cfg_attr(docsrs, doc(cfg(all(feature = "std", feature = "spanned"))))]
#[doc(inline)]
pub use crate::spanned::from_reader_spanned;
#[cfg(feature = "spanned")]
#[cfg_attr(docsrs, doc(cfg(feature = "spanned")))]
#[doc(inline)]
pub use crate::spanned::{from_slice_spanned, from_str_spanned};
#[doc(inline)]
pub use crate::value::{from_value, to_value, Map, Number, Value};

@@ -423,6 +431,9 @@ pub mod map;
pub mod ser;
#[cfg(not(feature = "std"))]
mod ser;
#[cfg(feature = "spanned")]
#[cfg_attr(docsrs, doc(cfg(feature = "spanned")))]
pub mod spanned;
pub mod value;

mod io;
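
The `spanned` module exported above is not part of the visible diff, so its public types are unknown here. As a rough illustration of what byte-offset spans make possible, the sketch below uses a hypothetical `Spanned<T>` wrapper (all names and fields are assumptions, not the PR's API) to slice the original input back out of a recorded span.

```rust
use std::ops::Range;

/// Hypothetical wrapper pairing a value with the byte range it was parsed
/// from. The PR's actual type may be shaped differently.
#[derive(Debug, Clone, PartialEq)]
struct Spanned<T> {
    start: usize,
    end: usize,
    value: T,
}

impl<T> Spanned<T> {
    /// Half-open byte range of the value in the original input.
    fn span(&self) -> Range<usize> {
        self.start..self.end
    }

    /// Returns the source text the value was parsed from.
    /// Panics if the span does not lie on character boundaries of `input`.
    fn source<'a>(&self, input: &'a str) -> &'a str {
        &input[self.span()]
    }
}

fn main() {
    let input = r#"{"port": 8080}"#;
    // Offsets are written by hand for this sketch; a span-tracking
    // deserializer would be expected to fill them in.
    let port = Spanned { start: 9, end: 13, value: 8080u16 };
    assert_eq!(port.source(input), "8080");
    println!("{} came from bytes {:?}", port.value, port.span());
}
```

Byte offsets slice directly into the input, which is one reason a span API might prefer them over line and column pairs for anything other than display.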