Please address suitability for untrusted inputs in docs #345

Closed
marc-casperlabs opened this issue Aug 9, 2020 · 2 comments · Fixed by #346

@marc-casperlabs

marc-casperlabs commented Aug 9, 2020

A question that I have run into multiple times on different projects is: how suitable is bincode for network communication?

Example scenario

A typical scenario is a networking application that sends messages between two participants (peer-to-peer or client-server). The default choice at this point is JSON, since its libraries are built against a spec and are safe when dealing with hostile inputs (provided those inputs are size-limited).
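One simple way to enforce that size limit before any parsing happens is to cap the read itself. A std-only sketch; `read_limited` is a hypothetical helper, not part of serde_json or any other JSON library:

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads an entire message from `reader`, rejecting it
/// if it is longer than `limit` bytes. At most `limit + 1` bytes are ever
/// read, so a hostile peer cannot make us buffer an unbounded amount of data.
fn read_limited<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Read one byte past the limit so that oversized input can be detected.
    reader.take(limit.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "message exceeds size limit",
        ));
    }
    Ok(buf)
}
```

Only after this check would the buffer be handed to a parser (for example `serde_json::from_slice`).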

bincode strikes me as something that (at least initially) was intended for trusted, possibly in-memory, communication, potentially sacrificing robustness against input from untrusted sources for extra speed.

This question seems to have been asked, directly or indirectly, in several other issues: #216, #136, #221, #240, #255, #266.

Alternatives

There is a real hole in the Rust ecosystem that yearns to be filled with a compact binary format with serde support that can be used to send unversioned, untrusted data. Certainly protobuf, Cap'n Proto and FlatBuffers exist, but they all tackle a grander scope and come with extra hoops to jump through. I, for one, just want to slap a `#[derive(Serialize)]` onto my struct and send it across the aether more efficiently than JSON.

Of the mature alternatives available, msgpack and BSON are the ones usually mentioned, but these are self-describing and thus "waste" space by also serializing struct field names.
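To make that overhead concrete, here is a toy comparison between writing fields positionally (the schema is known to both sides, as with bincode) and prefixing each field with its name. The `Reading` struct and both encoders are hypothetical and hand-rolled; neither is bincode's, msgpack's or BSON's real wire format:

```rust
/// Toy struct used only to compare encodings.
struct Reading {
    sensor_id: u32,
    value: u32,
}

/// Positional toy encoding: fields are written in declaration order as
/// little-endian u32s, and no field names are sent.
fn encode_positional(r: &Reading) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&r.sensor_id.to_le_bytes());
    out.extend_from_slice(&r.value.to_le_bytes());
    out
}

/// Self-describing toy encoding: every field is preceded by its name
/// (a 1-byte length followed by the UTF-8 bytes of the name).
fn encode_with_names(r: &Reading) -> Vec<u8> {
    let mut out = Vec::new();
    for (name, value) in [("sensor_id", r.sensor_id), ("value", r.value)] {
        out.push(name.len() as u8);
        out.extend_from_slice(name.as_bytes());
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}
```

For this two-field struct the named form takes 24 bytes versus 8; the relative overhead is largest for small structs with long field names.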

The serde docs list other, less established contenders, but many of them do not address this issue precisely in their docs either.

The list

To judge whether bincode is suitable for a specific project, at least the following questions need to be answered:

  1. Is the format stable/well-defined, i.e. does it stay unchanged across minor versions? This is typically important when storing data with bincode long-term.

  2. Is the format invariant over byte order, i.e. will a big-endian machine produce the same output as a little-endian machine?

  3. Is it space-efficient for binary data, i.e. will large binary blobs be encoded without significant overhead?

  4. Is it written with "hostile" data in mind? Rust's memory safety goes a long way, of course, but some things, like handling invalid inputs gracefully or not pre-allocating large amounts of memory when receiving a message, are required, and they typically involve a performance/security trade-off.
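Question 2 comes down to whether integers are written in a fixed byte order or in the host's native one. A std-only sketch of the difference (the function names are hypothetical):

```rust
/// Writes `n` in a fixed (little-endian) byte order: the same four bytes on
/// every machine, whatever its native endianness.
fn encode_fixed(n: u32) -> [u8; 4] {
    n.to_le_bytes()
}

/// Reads back a value written by `encode_fixed`, again independent of the host.
fn decode_fixed(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}

/// Writes `n` in the host's native byte order: a big-endian and a
/// little-endian machine produce different bytes for the same value.
fn encode_native(n: u32) -> [u8; 4] {
    n.to_ne_bytes()
}
```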

I may have missed some points, but these came to mind immediately.

It would be great if these could be answered in the docs. I will happily volunteer to write a note and add it to the README.md or the core module docs, but each question needs an answer from someone who knows the actual source code well.

Each question actually needs two answers: one describing the status quo, and one stating whether that is an accidental or an intended property. Any project that relies on bincode needs to know which guarantees it can expect from future versions.

Note that answering "don't know and won't ever guarantee it" to any of these is perfectly fine; I would just like to see it written down. I find myself discussing this with my colleagues fairly often, and having sent them to various GitHub issues, it seems that there is a real need for this information =).

If someone could just let me know informally about the state and goals, I will do my best to turn this into a PR updating the docs.

In the meantime, thanks for the six years of work that have already gone into this crate :)

@ZoeyR
Collaborator

ZoeyR commented Aug 12, 2020

These are the answers to the questions outlined above:

  1. a. The library is stable across minor revisions in the sense that the same configuration will produce the same output. New options have been added to configure the way data is encoded.
    b. The intention is to eventually codify the format into a specification. There will always be configuration options, but these will be documented as part of the spec.

  2. The default configuration will be invariant over byte order. A user can set up bincode to use native endianness, in which case the output will not be invariant.

  3. bincode should be space-efficient for binary data by design. For example, a slice of u8 is encoded as a length followed by the raw u8s.

  4. bincode attempts to protect against hostile data. There is a maximum size configuration option (although it is not on by default), and pre-allocation size is limited based on that configuration. The structures produced by bincode should all be valid structures, and deserializing will not cause UB (assuming, of course, that the deserialization code for the struct is safe).
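Answers 2 to 4 can be illustrated together with a std-only sketch of a length-prefixed byte slice. It assumes a fixed-width 8-byte little-endian length prefix purely for illustration (bincode's actual integer encoding depends on its configuration); `encode_bytes`, `decode_bytes` and `DecodeError` are hypothetical names, and this is not bincode's implementation:

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum DecodeError {
    UnexpectedEof,
    LimitExceeded,
}

/// Encodes a byte slice as an 8-byte little-endian length (fixed byte order,
/// so the output is the same on every host) followed by the raw bytes.
fn encode_bytes(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + data.len());
    out.extend_from_slice(&(data.len() as u64).to_le_bytes());
    out.extend_from_slice(data);
    out
}

/// Decodes a length-prefixed byte slice. The claimed length is checked
/// against `limit` and against the bytes actually present *before* any
/// allocation, so a forged length cannot trigger a huge allocation.
fn decode_bytes(input: &[u8], limit: u64) -> Result<Vec<u8>, DecodeError> {
    if input.len() < 8 {
        return Err(DecodeError::UnexpectedEof);
    }
    let (len_bytes, rest) = input.split_at(8);
    let mut len_arr = [0u8; 8];
    len_arr.copy_from_slice(len_bytes);
    let len = u64::from_le_bytes(len_arr);
    if len > limit {
        return Err(DecodeError::LimitExceeded);
    }
    if (rest.len() as u64) < len {
        return Err(DecodeError::UnexpectedEof);
    }
    // `len` fits in usize here because it is at most `rest.len()`.
    Ok(rest[..len as usize].to_vec())
}
```

In bincode itself, the limit is a configuration option rather than a function argument, and per the answer above it is not enabled by default.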

mbr added a commit to mbr/bincode that referenced this issue Aug 13, 2020
Puts some prominent text into the `readme.md` regarding some use cases
that are likely to be common, along with a few hopefully helpful
pointers to avoid footguns.

Closes bincode-org#345, closes bincode-org#216, addresses bincode-org#240, bincode-org#266.
ZoeyR pushed a commit that referenced this issue Feb 23, 2021
…uts (#346)

* Address questions regarding suitability for storage and untrusted inputs

Puts some prominent text into the `readme.md` regarding some use cases
that are likely to be common, along with a few hopefully helpful
pointers to avoid footguns.

Closes #345, closes #216, addresses #240, #266.

* Fix typos in `readme.md`

* Remove confusing sentence post 1.0, as requested
@tv42

tv42 commented Jun 7, 2021

#240 sounds relevant for protecting against inputs claiming large field sizes, maliciously or due to corruption / version mismatch / cat on keyboard using netcat.
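One common mitigation for such claimed sizes is to never preallocate more than a fixed cap and to let the buffer grow only as real data arrives. A std-only sketch; `MAX_PREALLOC` and `decode_u32s` are hypothetical, and bincode's own approach (per the answer above, preallocation limited by the size-limit configuration) may differ:

```rust
/// Hypothetical cap on how many elements a length prefix is trusted for up front.
const MAX_PREALLOC: usize = 1024;

/// Decodes `count` little-endian u32 values from `input`. The claimed count
/// is untrusted (malicious, corrupted, or from a mismatched version), so the
/// initial capacity is capped and the vector only grows as data is read.
fn decode_u32s(count: u64, input: &[u8]) -> Option<Vec<u32>> {
    let capped = std::cmp::min(count, MAX_PREALLOC as u64) as usize;
    let mut out = Vec::with_capacity(capped);
    let mut chunks = input.chunks_exact(4);
    for _ in 0..count {
        // Running out of input ends decoding instead of trusting the claimed count.
        let chunk = chunks.next()?;
        out.push(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
    }
    Some(out)
}
```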
