Please address suitability for untrusted inputs in docs #345

marc-casperlabs · 2020-08-09T13:38:53Z

A question that I have run into multiple times on different projects is: How suitable is bincode for network communication?

Example scenario

A typical scenario is a networking application that sends messages between two participants (peer-to-peer or client-server). By default, one would reach for JSON here, since its libraries are built against a spec and are safe to use with hostile inputs (provided those inputs are size-limited).

bincode strikes me as something that (at least initially) was intended for trusted, possibly in-memory, communication, potentially sacrificing robustness against input from untrusted sources for extra speed.

This question seems to have been asked directly or in a roundabout way in multiple other issues: #216, #136, #221, #240, #255, #266.

Alternatives

There is a real hole in the Rust ecosystem that yearns to be filled: a compact binary format with serde support that can be used to send unversioned, untrusted data. Certainly protobuf, Cap'n Proto and FlatBuffers exist, but they all tackle a grander scope and come with extra hoops to jump through. I, at least, just want to slap a #[derive(Serialize)] onto my struct and send it across the aether more efficiently than JSON.

Of the available, mature alternatives, msgpack and bson are mentioned, but these are self-describing and thus "waste" space by serializing struct field names as well.
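To make the space cost of field names concrete, here is a minimal std-only sketch comparing two layouts for the same hypothetical `Msg` struct. The positional layout writes fields in declaration order with no names (a little-endian `u32`, then a `u64` length prefix and the string's bytes), intended to mirror what bincode 1.x's top-level `bincode::serialize` produces. The self-describing layout is a toy format that also writes each field name; it is not msgpack or BSON. All type and function names are made up for illustration.

```rust
// Hypothetical two-field message, used only for illustration.
struct Msg {
    id: u32,
    name: String,
}

// Positional encoding: fields in declaration order, no names.
// Integers are fixed-width little-endian; the string is a u64
// little-endian length prefix followed by its UTF-8 bytes.
fn encode_positional(m: &Msg) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&m.id.to_le_bytes());
    out.extend_from_slice(&(m.name.len() as u64).to_le_bytes());
    out.extend_from_slice(m.name.as_bytes());
    out
}

// Toy self-describing encoding: every field value is preceded by the
// field's name, itself written with a u64 length prefix.
fn encode_self_describing(m: &Msg) -> Vec<u8> {
    fn put_str(out: &mut Vec<u8>, s: &str) {
        out.extend_from_slice(&(s.len() as u64).to_le_bytes());
        out.extend_from_slice(s.as_bytes());
    }
    let mut out = Vec::new();
    put_str(&mut out, "id");
    out.extend_from_slice(&m.id.to_le_bytes());
    put_str(&mut out, "name");
    put_str(&mut out, &m.name);
    out
}

fn main() {
    let m = Msg { id: 7, name: "alice".to_string() };
    println!("positional:      {} bytes", encode_positional(&m).len());
    println!("self-describing: {} bytes", encode_self_describing(&m).len());
}
```

For `Msg { id: 7, name: "alice" }` the positional form is 17 bytes and the toy self-describing form is 39, and the gap grows with every field. Real formats such as msgpack use much more compact string headers than this toy, but the field names themselves still have to be sent.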

There are other non-established contenders listed in the serde docs, but many fail to address this issue in a precise manner in their docs as well.

The list

To judge suitability for using bincode for a specific project, at least the following questions need to be answered:

1. Is the format stable and well-defined, i.e. does it not change across minor versions? This is typically important when storing data using bincode long-term.
2. Is the format invariant over differences in byte order, that is, will a big-endian machine produce the same output as a little-endian machine?
3. Is it space-efficient for binary data, i.e. will large byte buffers be encoded without significant overhead?
4. Is it written with "hostile" data in mind? Rust's memory safety goes a long way, of course, but things like rejecting invalid inputs and not pre-allocating large amounts of memory when receiving a message are still required, and these usually involve a performance/security trade-off.
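The byte-order question comes down to whether multi-byte integers are written in a fixed order or in the CPU's native order. A minimal std-only sketch (the helper names are hypothetical): `to_le_bytes` yields the same bytes on every machine, while `to_ne_bytes` depends on the CPU running the code.

```rust
// Fixed (little-endian) byte order: identical output on every machine.
fn encode_u32_le(x: u32) -> [u8; 4] {
    x.to_le_bytes()
}

// Native byte order: the bytes differ between little- and big-endian CPUs.
fn encode_u32_native(x: u32) -> [u8; 4] {
    x.to_ne_bytes()
}

// Decoding must use the same fixed order that was used for encoding.
fn decode_u32_le(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}

fn main() {
    let x = 0x0102_0304u32;
    println!("little-endian: {:02x?}", encode_u32_le(x));
    println!("native:        {:02x?}", encode_u32_native(x));
    assert_eq!(decode_u32_le(encode_u32_le(x)), x);
}
```

A format that always uses one fixed order (either one) is byte-order invariant; opting into native order trades that portability for skipping a byte swap on hosts whose native order differs from the fixed one.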

I may have missed some points, but these came to mind immediately.

It would be great if these could be answered in the docs. I will happily volunteer to write a note and add it to the README.md or core module docs, but each of these needs an answer from someone who knows the actual source code well.

Each question will actually need two answers, one addressing the status quo and another whether or not this is an accidental or intended property. Any project that relies on bincode certainly needs to know the guarantees it can expect from future versions.

Note that answering "don't know and won't ever guarantee it" to each of these is perfectly fine. I would just like to see it written down: I find myself discussing this with my colleagues fairly often, and after sending them to various GitHub issues, I think there is a real need for this information =).

If someone could just let me know informally about the state and goals, I will do my best to turn this into a PR updating the docs.

In the meantime, thanks for the six years of work that have already gone into this crate :)


ZoeyR · 2020-08-12T18:01:20Z

These are the answers to the questions outlined above:

1. a. The library is stable across minor revisions in that the same configuration will produce the same output. New options have been added to configure the way data is encoded.
   b. The intention is to eventually codify the format into a specification. There will always be configuration options, but these will be documented as part of the spec.
2. The default configuration will be invariant over byte order. A user can set up bincode to use native endianness, in which case it will not be invariant.
3. bincode should be space-efficient for binary data by design. As an example, a slice of u8 will be encoded as a length followed by the raw u8s.
4. bincode attempts to protect against hostile data. There is a maximum size configuration (although it is not on by default), and the pre-allocation size is limited based on that configuration. The structures produced by bincode should all be valid structures, and deserializing will not cause UB (assuming, of course, that the deserialization code for the struct is safe).
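The answers on binary data and hostile input fit together in one hand-written sketch: a byte slice is written as a length followed by the raw bytes, and the decoder rejects any claimed length above a configured maximum before allocating anything for the payload. This is not bincode's code; it assumes a fixed 8-byte little-endian length prefix, and `encode_bytes`, `decode_bytes` and `DecodeError` are hypothetical names. In bincode 1.3 itself, the maximum size is configured through its `Options` trait, e.g. `bincode::DefaultOptions::new().with_limit(...)`.

```rust
/// Errors the sketch decoder can report.
#[derive(Debug, PartialEq)]
enum DecodeError {
    /// The input ended before the header or payload was complete.
    UnexpectedEof,
    /// The header claimed more bytes than the configured maximum.
    LimitExceeded { claimed: u64, limit: u64 },
}

/// Writes `data` as an 8-byte little-endian length followed by the raw bytes.
fn encode_bytes(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + data.len());
    out.extend_from_slice(&(data.len() as u64).to_le_bytes());
    out.extend_from_slice(data);
    out
}

/// Reads a length-prefixed byte string, checking the claimed length against
/// `limit` before allocating anything for the payload.
fn decode_bytes(input: &[u8], limit: u64) -> Result<Vec<u8>, DecodeError> {
    let mut header = [0u8; 8];
    header.copy_from_slice(input.get(..8).ok_or(DecodeError::UnexpectedEof)?);
    let claimed = u64::from_le_bytes(header);
    if claimed > limit {
        return Err(DecodeError::LimitExceeded { claimed, limit });
    }
    let payload = &input[8..];
    if (payload.len() as u64) < claimed {
        return Err(DecodeError::UnexpectedEof);
    }
    // The cast is safe: claimed <= payload.len(), which is a usize.
    Ok(payload[..claimed as usize].to_vec())
}

fn main() {
    let encoded = encode_bytes(b"hello");
    println!("{:?}", decode_bytes(&encoded, 1024));

    // A hostile header claiming u64::MAX bytes is rejected up front.
    let hostile = u64::MAX.to_le_bytes();
    println!("{:?}", decode_bytes(&hostile, 1024));
}
```

The limit check runs before `to_vec`, so a header claiming `u64::MAX` bytes costs nothing. The sketch silently ignores any bytes after the payload; a stricter decoder might reject them.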

Puts some prominent text into the `readme.md` regarding some use cases that are likely to be common, along with a few hopefully helpful pointers to avoid footguns. Closes bincode-org#345, closes bincode-org#216, addresses bincode-org#240, bincode-org#266.

…uts (#346)
* Address questions regarding suitability for storage and untrusted inputs
  Puts some prominent text into the `readme.md` regarding some use cases that are likely to be common, along with a few hopefully helpful pointers to avoid footguns. Closes #345, closes #216, addresses #240, #266.
* Fix typos in `readme.md`
* Remove confusing sentence post 1.0, as requested

tv42 · 2021-06-07T19:48:10Z

#240 sounds relevant for protecting against inputs claiming large field sizes, maliciously or due to corruption / version mismatch / cat on keyboard using netcat.
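The problem described in #240 is that a length read from the input can claim far more data than will ever arrive. One way to avoid trusting it for allocation, sketched here with only `std::io`, is to let the buffer grow only as bytes are actually read. The `read_frame` helper and its 8-byte little-endian length prefix are assumptions for illustration, not bincode's reader.

```rust
use std::io::{self, Cursor, Read};

/// Reads one frame: an 8-byte little-endian length, then that many bytes.
/// The length is NOT used to pre-allocate; the buffer grows only as data
/// actually arrives from `r`.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 8];
    r.read_exact(&mut header)?;
    let claimed = u64::from_le_bytes(header);

    let mut buf = Vec::new(); // deliberately not Vec::with_capacity(claimed)
    r.by_ref().take(claimed).read_to_end(&mut buf)?;

    if (buf.len() as u64) != claimed {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "stream ended before the claimed frame length",
        ));
    }
    Ok(buf)
}

fn main() {
    // Well-formed frame: length 3, then 3 bytes.
    let mut good = 3u64.to_le_bytes().to_vec();
    good.extend_from_slice(&[9, 8, 7]);
    println!("{:?}", read_frame(&mut Cursor::new(good)));

    // Corrupted frame: claims u64::MAX bytes but only 2 follow.
    let mut bad = u64::MAX.to_le_bytes().to_vec();
    bad.extend_from_slice(&[1, 2]);
    println!("{:?}", read_frame(&mut Cursor::new(bad)).map_err(|e| e.kind()));
}
```

With this approach, a corrupted prefix claiming `u64::MAX` bytes followed by two bytes only ever buffers those two bytes before failing with `UnexpectedEof`. A size limit is still worth combining with it, since a peer that really does keep sending data can otherwise grow the buffer without bound.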

marc-casperlabs mentioned this issue on Aug 9, 2020 in casper-network/casper-node#142, "NDRS-206: replace bincode with rmp-serde" (merged).

mbr mentioned this issue on Aug 13, 2020 in #346, "Address questions regarding suitability for storage and untrusted inputs" (merged).

ZoeyR closed this as completed in #346 Feb 23, 2021
