A specification missing #221

Closed
realcr opened this issue Nov 16, 2017 · 10 comments

@realcr

realcr commented Nov 16, 2017

Hi TyOverby, thank you for the great work on this crate!
I was wondering if bincode has any written specification. I searched the repository and googled a bit but I couldn't find any written spec.

I wanted to use this crate for serializing/deserializing basic network protocol messages, but then realized that I might have difficulty writing a client in another language (like JavaScript or Python) to communicate with my Rust server. A spec could really help with the task of writing a minimal serializer on the client side.

It is possible that bincode was written with the intent of only being used internally by Rust programs, and in that case my question is probably not relevant. What do you think?

@TyOverby
Collaborator

Hi @realcr; great question!

I do have plans for releasing documentation on how bincode works internally, but not for people to use to implement decoders/encoders in other languages.

This is because bincode depends precisely on how your structs are laid out and how the serde serialize/deserialize impls are generated.

I think you put it well in that this is mainly supposed to be used internally by Rust programs; I might go further and say that it should only be used by the same program, because you must be sure that serialization and deserialization are implemented exactly the same.
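
To make the dependence on struct layout concrete, here is a minimal hand-rolled sketch. It does not use the bincode crate. The `Login` type and both functions are hypothetical, and the byte layout (fields in declaration order, fixed-width little-endian integers, a u64 length prefix before string bytes) is an assumption about what bincode 1.x emits with its default options, not a specification.

```rust
use std::convert::TryInto;

// Hypothetical message type, used only for illustration.
#[derive(Debug, PartialEq)]
struct Login {
    user_id: u32,
    name: String,
}

// Writes the fields in declaration order with no field names or type tags.
// Assumed layout: u32 little-endian, then a u64 little-endian byte length,
// then the UTF-8 bytes of the string.
fn encode_login(msg: &Login) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&msg.user_id.to_le_bytes());
    out.extend_from_slice(&(msg.name.len() as u64).to_le_bytes());
    out.extend_from_slice(msg.name.as_bytes());
    out
}

// Works only because it hard-codes the same field order and widths as the
// encoder; nothing in the bytes themselves describes the layout.
fn decode_login(bytes: &[u8]) -> Option<Login> {
    let user_id = u32::from_le_bytes(bytes.get(0..4)?.try_into().ok()?);
    let len = u64::from_le_bytes(bytes.get(4..12)?.try_into().ok()?);
    let len: usize = len.try_into().ok()?;
    let end = 12usize.checked_add(len)?;
    // Reject truncated input and trailing bytes.
    if end != bytes.len() {
        return None;
    }
    let name = String::from_utf8(bytes[12..end].to_vec()).ok()?;
    Some(Login { user_id, name })
}
```

Under these assumptions, `Login { user_id: 7, name: "ab" }` becomes 14 bytes: 4 for the id, 8 for the length, 2 for the text. A decoder in another language would have to repeat the same field order and widths by hand.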

Hope that answers your question! If I were you, I'd take a look at bson; it also uses serde and has libraries in many other languages.

TyOverby added this to the post-1.0 milestone Nov 16, 2017

@realcr
Author

realcr commented Nov 17, 2017

Thank you for your quick reply!
bson is probably not a good solution for me, as I need to have really small message size. I can't afford having field names in my messages.

I have not decided yet what to do. The bincode approach was very compelling, as it works nicely and results in very small messages. I think that I will use a parser like https://github.com/Geal/nom to parse the messages I'm receiving.

Regarding the spec, even if bincode is not going to be used by other languages, I agree with you that it is probably a good idea to have a spec.

@TyOverby
Collaborator

@realcr: the only other recommendation I would have for you is to check out protobuf or Cap'n Proto. Both offer very fast serialization/deserialization, and I believe that the messages are quite small (they definitely don't include the field names).

@realcr
Author

realcr commented Nov 18, 2017

@TyOverby: Thank you! I will be sure to check those out.

@matklad

matklad commented Feb 16, 2018

Hi! Congratulations on bincode 1.0.0's official zeroth birthday 🎉 🍰!

👍 to the idea of providing a language-independent specification. I believe there is a particular niche for cross-language serialization, which, surprisingly, is better served by bincode than by other existing formats. This use case came up in exonum.

Specifically, exonum serializes various user-defined messages to binary, and then cryptographically signs the binary data. For this use case, it is very desirable that the serialization format is canonical: that is, that each message has exactly one valid encoding. Or, more formally: forall m1: Vec<u8>, m2: Vec<u8>. (m1 != m2) => decode(m1) != decode(m2). This is needed to make sure that signatures on various ends of the system actually match, and to protect from various replay-based attacks (you may fail to detect a replay if hashes of two semantically equivalent messages differ).

So, the self-describing and forwards-compatible properties of formats like protobuf, cap'n'proto, JSON or CBOR are actively harmful for this use case: if you can reorder or add fields, it's hard to get the canonical property. Note that cap'n'proto supports canonicalization, but it is not an intrinsic property of the format.

In contrast, there's by construction only one way to serialize the data in bincode, and this is a real unique competitive advantage of this format :)
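
The canonical property described above can be illustrated with a small sketch. Both formats here are made up for illustration and neither is bincode's or protobuf's actual wire format: `decode_fixed` reads a positional two-byte layout, while `decode_tagged` reads hypothetical (tag, value) pairs that may appear in either order.

```rust
// Hypothetical message type, used only for illustration.
#[derive(Debug, PartialEq)]
struct Transfer {
    from: u8,
    to: u8,
}

// Positional layout: exactly two bytes, `from` then `to`.
// Every valid message has exactly one encoding.
fn decode_fixed(bytes: &[u8]) -> Option<Transfer> {
    match bytes {
        [from, to] => Some(Transfer { from: *from, to: *to }),
        _ => None,
    }
}

// Made-up tagged layout: two (tag, value) pairs in any order,
// where tag 1 means `from` and tag 2 means `to`.
fn decode_tagged(bytes: &[u8]) -> Option<Transfer> {
    if bytes.len() != 4 {
        return None;
    }
    let (mut from, mut to) = (None, None);
    for pair in bytes.chunks(2) {
        match pair[0] {
            1 if from.is_none() => from = Some(pair[1]),
            2 if to.is_none() => to = Some(pair[1]),
            // Unknown or repeated tag.
            _ => return None,
        }
    }
    Some(Transfer { from: from?, to: to? })
}
```

With the tagged layout, `[1, 5, 2, 9]` and `[2, 9, 1, 5]` decode to the same `Transfer`, so a hash or signature over the raw bytes differs for semantically equal messages. With the positional layout, two different valid inputs always decode to different messages.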

@kuviman

kuviman commented May 12, 2019

"should only be used by the same program, because you must be sure that serialization and deserialization are implemented exactly the same"

Is this documented somewhere?

@kevincox

Something important to document is compatibility guarantees. This doesn't necessarily require documenting the protocol, but it would likely be part of a specification.

  • Compatibility between different platforms.
    • Endianness
    • Integer sizes
  • Compatibility between different versions:
    • Rust compiler/LLVM version.
    • bincode crate version.
    • Application version:
      • Different struct definitions.
      • Different Serialize/Deserialize definitions.
      • Same struct definitions but other changes in the codebase (AKA optimizer dependent).

It would be nice to know which of these (if any) are guaranteed to be compatible, if they have any well-defined degradation or if it is undefined.
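
As a sketch of why the endianness and integer-size items matter, the following uses only the Rust standard library. The function names are hypothetical and this is not a statement of what bincode guarantees; it only shows how an encoder can make its output independent of the host platform.

```rust
use std::convert::TryFrom;

// An explicit byte order makes the output independent of the host CPU.
fn encode_u32_le(v: u32) -> [u8; 4] {
    v.to_le_bytes()
}

// The same value in big-endian order produces different bytes, so both
// sides must agree on one order.
fn encode_u32_be(v: u32) -> [u8; 4] {
    v.to_be_bytes()
}

// `usize` is 4 bytes wide on 32-bit targets and 8 bytes on 64-bit targets.
// Widening to u64 before writing yields the same 8 bytes everywhere.
fn encode_len(len: usize) -> [u8; 8] {
    (len as u64).to_le_bytes()
}

// A length written on a 64-bit machine may not fit a 32-bit `usize`,
// so the conversion back is checked rather than truncating.
fn decode_len(bytes: [u8; 8]) -> Option<usize> {
    usize::try_from(u64::from_le_bytes(bytes)).ok()
}
```

Here `encode_u32_le(1)` is `[1, 0, 0, 0]` while `encode_u32_be(1)` is `[0, 0, 0, 1]`. A reader that assumes the wrong order decodes a valid but different number and gets no error.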

@cheako

cheako commented Jun 1, 2020

I'm still confused on two points worth addressing:

  • Backward compatibility: Can I read bytes saved years or decades ago?
  • Maintainability: Can I easily manage different versions of my structs as they change?

I'm looking into other options, given these two items are not well explained in this issue.

@ZoeyR
Collaborator

ZoeyR commented Jun 1, 2020

@cheako since 1.0 the bitstream should be stable (pending an answer on serde-rs/serde#1756). I haven't evaluated struct changes in depth, but they are very fragile at the least.
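
One way this fragility can show up is sketched below with a hand-rolled positional encoding, not the bincode crate. The `V1` and `V2` structs, the function names, and the leading version byte are all hypothetical; the version byte is one possible mitigation an application could add on its own, not something bincode is known to provide.

```rust
use std::convert::TryInto;

// Old definition of a stored record (hypothetical).
struct V1 {
    a: u32,
    b: u32,
}

// New definition: same total size, but `b` was split into two fields.
#[derive(Debug, PartialEq)]
struct V2 {
    a: u32,
    b: u16,
    c: u16,
}

// Positional little-endian encoding with no field names or tags.
fn encode_v1(v: &V1) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&v.a.to_le_bytes());
    out.extend_from_slice(&v.b.to_le_bytes());
    out
}

// Accepts any 8 bytes, so bytes written as V1 decode "successfully"
// with a different meaning.
fn decode_v2(bytes: &[u8]) -> Option<V2> {
    if bytes.len() != 8 {
        return None;
    }
    Some(V2 {
        a: u32::from_le_bytes(bytes[0..4].try_into().ok()?),
        b: u16::from_le_bytes(bytes[4..6].try_into().ok()?),
        c: u16::from_le_bytes(bytes[6..8].try_into().ok()?),
    })
}

// Application-level mitigation: prefix every record with a version byte.
fn encode_v1_tagged(v: &V1) -> Vec<u8> {
    let mut out = vec![1u8];
    out.extend(encode_v1(v));
    out
}

// Refuses records whose version byte does not match the V2 layout.
fn decode_v2_tagged(bytes: &[u8]) -> Result<V2, String> {
    match bytes.split_first() {
        Some((&2, rest)) => decode_v2(rest).ok_or_else(|| "bad payload length".to_string()),
        Some((&v, _)) => Err(format!("unsupported version {}", v)),
        None => Err("empty input".to_string()),
    }
}
```

Without the version byte, old data is silently misread: a `V1` with `b = 0x0002_0003` comes back as a `V2` with `b = 3` and `c = 2`. With it, the mismatch becomes an explicit error that the application can handle, for example by keeping the old decoder around and migrating.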

@stale

stale bot commented Jun 13, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

