Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: introduce borsh::io with either items of std:io or private borsh::nostd_io module reexported (std or no_std) #212

Merged
merged 2 commits into from
Sep 9, 2023

Conversation

mina86
Copy link
Contributor

@mina86 mina86 commented Sep 7, 2023

When built with std feature, borsh uses a handful of std::io types
in its public API (e.g. std::io::Write is used in BorshSerialize
trait). When std feature is disabled, borsh switches to its own
types which mimics behaviour of standard types offered through
borsh::nostd_io.

Problem is that this approach results in no consistent way to name
Write, Read etc. symbols used in the public API. This creates
a problem for authors and users of no_std libraries. A no_std library
might have code such as:

impl borsh::BorshSerialize for Foo {
    fn serialize<W: borsh::nostd_io::Write>(
        &self, writer: &mut W,
    ) -> borsh::maybestd::io::Result<()> {
        todo!()
    }
}

So long as borsh is built with default features disabled it will work
correctly. However, if author of a std application enables borsh’s
std feature, the aforementioned example crate will fail to build since
borsh::nostd_io will no longer be offered.

Address this by introducing borsh::io module which exports Error,
ErrorKind, Read, Result and Write symbols, i.e. the types which are
used in borsh’s public API.

With this change, author of a no_std library may write:

impl borsh::BorshSerialize for Foo {
    fn serialize<W: borsh::io::Write>(
        &self, writer: &mut W,
    ) -> borsh::maybestd::io::Result<()> {
        todo!()
    }
}

and their code will work without problems when used in std
applications.

@mina86 mina86 requested review from frol and dj8yfo as code owners September 7, 2023 12:45
@mina86
Copy link
Contributor Author

mina86 commented Sep 7, 2023

The crux of the change is in borsh/src/lib.rs. The rest is just s/__private::maybestd/__private/.

Copy link
Collaborator

@frol frol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not support this change. We don't want to turn borsh-rs into maybestd provider. If you absolutely don't want to define your own maybestd, or implement some library that extends borsh capabilities, consider using use borsh::__private::maybestd; - it is still public, but from a borsh crate point of view, it is a private implementation that we don't want to provide backward compatibility guaranties.

@mina86
Copy link
Contributor Author

mina86 commented Sep 7, 2023

We don't want to turn borsh-rs into maybestd provider.

I mean, borsh devs should have thought about that when they decided to use io::Read, io::Write and io::Error as part of borsh’s public API. For comparison, serde does offer StdError which switches between std::io::Error and its own trait depending on Cargo features. If you don’t want to offer maybestd you have an option of switching the interface to not use std.

Not offering maybestd is not only hostile to users of borsh but also to all users of no_std-compatible libraries which use borsh.

consider using use borsh::__private::maybestd; - it is still public, but from a borsh crate point of view, it is a private implementation that we don't want to provide backward compatibility guaranties.

But you are providing compatibility guarantees to borsh::nostd_io, correct? So you’re already paying 99% of the cost of supporting borsh::maybestd. Adding borsh::maybestd::io as re-export of borsh::nostd_io or parts of std::io is a rather tame requirement.

@frol
Copy link
Collaborator

frol commented Sep 7, 2023

But you are providing compatibility guarantees to borsh::nostd_io, correct?

That scope is small and self-contained in a single file, and that is indeed part of the public interface, as you stated. maybestd does not need to be part of the public interface, hence the difference.

Not offering maybestd is not only hostile to users of borsh but also to all users of no_std-compatible libraries which use borsh.

no_std-compatible libraries have two valid options:

  1. Treat themselves as "friend" modules (in C++ terms), and use borsh::__private::maybestd wherever applicable (borsh-derive does that), knowing that they need to pay attention to __private changes
  2. Write a minimal maybestd implementation themselves, as you said, it will be just a re-export

We optimize borsh for application developers, and we don't want them to abuse borsh::maybestd as they will later come and request more re-exports there.

@mina86
Copy link
Contributor Author

mina86 commented Sep 7, 2023

That scope is small and self-contained in a single file,

Just like what this PR proposes:

pub mod maybestd {
    pub mod io {
        #[cfg(feature = "std")]
        pub use std::io::{Error, ErrorKind, Read, Result, Write};

        #[cfg(not(feature = "std"))]
        pub use crate::nostd_io::*;
    }
}

Literally five symbols which are part of borsh’s public API. If you don’t like the name maybestd I’d also be happy with:

#[cfg(feature = "std")]
pub mod io {
    pub use std::io::{Error, ErrorKind, Read, Result, Write};
}

#[cfg(not(feature = "std"))]
pub use crate::nostd_io as io;

That would offer types borsh::io::Read etc.

Treat themselves as "friend" modules (in C++ terms), and use borsh::__private::maybestd wherever applicable (borsh-derive does that), knowing that they need to pay attention to __private changes

What’s the point of borsh::nostd_io then? At the moment it’s a foot-gun breaking people’s builds. Consider the following pascal-str crate:

[package]
name = "pascal-str"
version = "0.1.0"
edition = "2021"

[dependencies]
borsh = { version = "1.0.0-alpha.4", default-features = false }
#![no_std]
extern crate alloc;

struct PascalString(alloc::vec::Vec<u8>);

impl borsh::BorshSerialize for PascalString {
    fn serialize<W: borsh::nostd_io::Write>(&self, writer: &mut W) -> borsh::nostd_io::Result<()> {
        writer.write_all(&[self.0.len() as u8])?;
        writer.write_all(&self.0)
    }
}

It compiles and works fine. But let’s now say someone uses it in std application like so:

[package]
name = "foo"
version = "0.1.0"
edition = "2021"

[dependencies]
borsh = "1.0.0-alpha.4"
pascal-str = { path = "../pascal-str" }

And suddenly the code breaks:

error[E0433]: failed to resolve: could not find `nostd_io` in `borsh`
 --> /tmp/b/pascal-str/src/lib.rs:7:28
  |
7 |     fn serialize<W: borsh::nostd_io::Write>(&self, writer: &mut W) -> borsh::nostd_io::Result<()> {
  |                            ^^^^^^^^ could not find `nostd_io` in `borsh`

error[E0433]: failed to resolve: could not find `nostd_io` in `borsh`
 --> /tmp/b/pascal-str/src/lib.rs:7:78
  |
7 |     fn serialize<W: borsh::nostd_io::Write>(&self, writer: &mut W) -> borsh::nostd_io::Result<()> {
  |                                                                              ^^^^^^^^ could not find `nostd_io` in `borsh`

Enabling a cargo feature shouldn’t break builds.

We optimize borsh for application developer

Except you don’t. Because now author of pascal-str has to introduce a std feature to their crate and application developer has to enable it. This is an objectively worse experience for application developer than what they had with borsh 0.10.

Write a minimal maybestd implementation themselves, as you said, it will be just a re-export

But as shown, it’s not just a matter of a simple maybestd module with re-exports. I have no way to refer to Write, Read, Error and ErrorKind symbols used by borsh. I have to have user of my crate tell me how borsh was compiled.

and we don't want them to abuse borsh::maybestd as they will later come and request more re-exports there.

If they do, tell them to GTFO. However, if something is part of borsh’s public API that should be provided by borsh in a way which doesn’t depend on whether std feature is enabled or not.

@frol
Copy link
Collaborator

frol commented Sep 9, 2023

Because now author of pascal-str has to introduce a std feature to their crate and application developer has to enable it. This is an objectively worse experience for application developer than what they had with borsh 0.10.

Well, pascal-str might want to have std feature anyway to enable it in borsh, but I see your point now.

But as shown, it’s not just a matter of a simple maybestd module with re-exports. I have no way to refer to Write, Read, Error and ErrorKind symbols used by borsh. I have to have user of my crate tell me how borsh was compiled.

This is a fair point. @mina86 @dj8yfo @matklad I would like to consider having borsh::maybestd_io or borsh::io module to be part of the public API. Still, I don't want to expose the whole maybestd. What do you think?

@dj8yfo
Copy link
Collaborator

dj8yfo commented Sep 9, 2023

Well, pascal-str might want to have std feature anyway to enable it in borsh...

Agree.

I have no way to refer to Write, Read, Error and ErrorKind symbols used by borsh.

Using this recipe

#[cfg(feature = "std")]
use std::io;
#[cfg(not(feature = "std"))]
use borsh::nostd_io as io;
should help.
Doing the dispatch on the client lib side is more explicit about what's happening and i like it more.

like to consider having borsh::maybestd_io or borsh::io module to be part of the public API

This might result in people mindlessly using borsh::io instead of std::io even in the spots completely unrelated to borsh (IDE completion somehow adversely affects minds of folks).

Here's an example https://github.com/near/nearcore/pull/9432/files#diff-5610719ce9d54675a9153a04627f8988705312f7dec97375fa702283ed796c1aL4 and there're a few more.

This pr has most likely protected from cases when items unrelated to borsh::io might be used by only reexporting the relevant ones https://github.com/mina86/borsh-rs/blob/2e942c8c88611804db699237d6b0d976da40d3c4/borsh/src/lib.rs#L127-L130 .
But it doesn't protect from cases when borsh::io is used in wrong context, where std::io was meant instead.

Supposedly, the recipe above addresses this too.
We've already adjusted the plan of #51 with making schema -> unstable__schema due to additions by @mina86.
Hopefully, he can sustain some backpressure in client libraries too.

dj8yfo
dj8yfo previously requested changes Sep 9, 2023
Copy link
Collaborator

@dj8yfo dj8yfo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking this pr as well, unless there's some new convincing argument

@mina86
Copy link
Contributor Author

mina86 commented Sep 9, 2023

  1. Build breaks when std feature is toggled. This is a bug.
  2. There’s no way to name Read, Write, Result, Error and ErrorKind types used by borsh thus making life of library developers harder. This is bad API design and user-hostile.
  3. Library developers are forced to introduce std feature which makes life of application developers harder since they now have to enable that feature on libraries which otherwise wouldn’t have it. This is user-hostile and contradicts frol’s statement that borsh is optimised for application developers.

The fact that pascal-str might want a std feature anyway is not an argument. First of all, there are many possible crates which won’t have std feature. Specifically the hypothetical pascal-str crate would probably not have that feature since it would be a wrapper around alloc::string::String. Second of all, there’s a bug in borsh. I’ve presented steps to reproduce it. And your answer to it: well, just don’t do it. That’s not a valid response to a bug report.

Unless there’s some new convincing argument, I must conclude that borsh 1.x is in state of perpetual brokenness poorly suited for no_std environments and that I should stick to borsh 0.10.

@dj8yfo
Copy link
Collaborator

dj8yfo commented Sep 9, 2023

Well, i've found an example pascal_string crate/repo. It doesn't have no_std at all.

Can you post a link to the problematic repo?

Your example

#![no_std]
extern crate alloc;

struct PascalString(alloc::vec::Vec<u8>);

impl borsh::BorshSerialize for PascalString {
    fn serialize<W: borsh::nostd_io::Write>(&self, writer: &mut W) -> borsh::nostd_io::Result<()> {
        writer.write_all(&[self.0.len() as u8])?;
        writer.write_all(&self.0)
    }
}

attempts to assert, that the pascal-str doesn't use anything out of std , that's why it can work in both std and no_std.

But let’s now say someone uses it in std application like so

Is it supposed to still be using borsh::nostd_io in an std application?

If so, it can be helped by removing just this line

#[cfg(not(feature = "std"))]
, then borsh will still use either std::io or self::nostd_io depending on flag, but provide borsh::nostd_io outside in both situations.
But then, what is the reason to support this, if pascal-str won't be compatible with other types from borsh in an std application?

@matklad
Copy link
Contributor

matklad commented Sep 9, 2023

Imo, this is what we need (haven't checked this specific code, might need some adjustments):

#[cfg(feature = "std")]      use std::io as io_impl;
#[cfg(not(feature = "std"))] mod nostd_io;
#[cfg(not(feature = "std"))] use nostd_io as io_impl;

pub mod io {
    pub use crate::io_impl::{Error, ErrorKind, Read, Result, Write};
}

That is:

  • given that borsh is io-polymorphic, clearly it should provide users with the ability to write io-polymorphic code as well.
  • borsh::io feels like the right name here. This is not the general "maybestd", io::Error is the only thing we need here (and we should commit to not needing anything else from the std)
  • we should enforce that the interface between two io modules is compatible, using a single pub use feels like a good way to do that.
  • it's unclear whether borsh::io should be absolutely minimal and strictly restricted to what's needed by borsh, or whether it should be a duck-type replacement for std::io. Eg, it unclear whether borsh::io should contain a version of std::io::copy -- some user code might want to call borsh::io::copy and have that work in std and non-std. I think it makes sense to expose what's already exposed. That is, neither to be absolutely minimal, nor to try to expose everything.

core::io feels like a common problem which should have solutions in the ecosystem, but quick googling didn't reveal any existing patterns. I think it is plausible that in the distant future, std::io::Error is moved to core, or some canonical core_io crate emerges. We should be able to take advantage of those by adding a new feature flag, so there's no compatibility hazard here.

In terms of diff for the current PR, rename&flatten borsh::maybestd::io to just borsh::io.

@dj8yfo dj8yfo dismissed their stale review September 9, 2023 14:15

argument settled

@mina86 mina86 changed the title Bring maybestd module back Introduce borsh::io with parts of std::io used in borsh API Sep 9, 2023
@mina86 mina86 force-pushed the a branch 2 times, most recently from 3e04ae9 to ffa6c3f Compare September 9, 2023 14:46
Copy link
Collaborator

@frol frol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think borsh::io is a good compromise, so I approve this change.

@mina86 please, make CI checks all green

@dj8yfo dj8yfo marked this pull request as draft September 9, 2023 17:07
When built with `std` feature, borsh uses a handful of std::io types
in its public API (e.g. std::io::Write is used in BorshSerialize
trait).  When `std` feature is disabled, borsh switches to its own
types which mimics behaviour of standard types offered through
`borsh::nostd_io`.

Problem is that this approach results in no consistent way to name
`Write`, `Read` etc. symbols used in the public API.  This creates
a problem for authors and users of no_std libraries.  A no_std library
might have code such as:

    impl borsh::BorshSerialize for Foo {
        fn serialize<W: borsh::nostd_io::Write>(
            &self, writer: &mut W,
        ) -> borsh::maybestd::io::Result<()> {
            todo!()
        }
    }

So long as borsh is built with default features disabled it will work
correctly.  However, if author of a std application enables borsh’s
std feature, the aforementioned example crate will fail to build since
borsh::nostd_io will no longer be offered.

Address this by introducing `borsh::io` module which exports Error,
ErrorKind, Read, Result and Write symbols, i.e. the types which are
used in borsh’s public API.

With this change, author of a no_std library may write:

    impl borsh::BorshSerialize for Foo {
        fn serialize<W: borsh::io::Write>(
            &self, writer: &mut W,
        ) -> borsh::maybestd::io::Result<()> {
            todo!()
        }
    }

and their code will work without problems when used in std
applications.
@mina86 mina86 marked this pull request as ready for review September 9, 2023 17:21
/// the exported types are custom borsh types which try to mimic behaviour of
/// corresponding standard types usually offering subset of features.
pub mod io {
pub use super::io_impl::{Error, ErrorKind, Read, Result, Write};
Copy link
Collaborator

@dj8yfo dj8yfo Sep 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rust allows to do this without much confusion )))
tmp

Haircut with fine nozzle
@dj8yfo dj8yfo changed the title Introduce borsh::io with parts of std::io used in borsh API feat!: introduce borsh::io with either items of std:io or private borsh::nostd_io module reexported (std or no_std) Sep 9, 2023
@dj8yfo dj8yfo merged commit f6f6268 into near:master Sep 9, 2023
@frol frol mentioned this pull request Sep 9, 2023
@mina86 mina86 deleted the a branch September 9, 2023 19:01
@dj8yfo
Copy link
Collaborator

dj8yfo commented Sep 9, 2023

The only downside of using borsh::io is that inattentive folks can randomly use items from this completion instead of std::io.

@mina86 has so far ignored the question to the pascal-str example, which was the cornerstone reason, backing the push for the change done by this pr:

tmp

@mina86
Copy link
Contributor Author

mina86 commented Sep 9, 2023

@mina86 has so far ignored the #212 (comment) to the pascal-str example, which was the cornerstone reason, backing the push for the change done by this pr:

Because it wasn’t a cornerstone reason. It was a quick example I came up with on the spot. The cornerstone reason was that there was no consistent way to name types used in the API which in turn resulted in build breaking when std feature was enabled.

I don’t care if it’s borsh::io, borsh::maybestd::io or even borsh::byćmożestd::io. So long as the same path works regardless of what features are enabled. And once that name is provided, pascal-str is supposed to use that.

@dj8yfo
Copy link
Collaborator

dj8yfo commented Sep 9, 2023

@mina86 well, alternatively, u can use 4 lines to dispatch on an std feature, or byćmożestd feature or whatever.

 #[cfg(feature = "std")] 
 use std::io; 
  
 #[cfg(not(feature = "std"))] 
 use borsh::nostd_io as io; 

It's explicit and prevents you from importing borsh::io all over the place, whether you need it or not.
std is important enough to be worth a line in Cargo.toml, and, as far as i understood, the pascal-str did intend to use two different providers of io, while using borsh, depending on whether it's std or not.

5 lines in client with explicit stating of what's used when and what's not vs 500 lines change in the subj repo is a good tradeoff imo.

@mina86
Copy link
Contributor Author

mina86 commented Sep 9, 2023

std is important enough to be worth a line in Cargo.toml

No, it’s not. I’m writing a no_std library. I don’t care about std. I wrote code which works with borsh with default features disabled. And then, when someone uses my library, the code breaks because borsh’s std feature was enabled. That a bug in borsh.

as far as i understood, the pascal-str did intend to use two different providers of io, while using borsh, depending on whether it's std or not.

Yes, depending on whether borsh is std or not. And since borsh changes types depending on whether it’s std or not, it’s borsh’ responsibility to provide a stable API for types it uses.

Imagine borsh had a uk feature and when it’s enabled rather than BorshSerialise and BorshDeserialise it would define BorshSerialize and BorshDeserialize. That would be extremely bad API. Situation with Read, Write etc. is the same.

5 lines in client with explicit stating of what's used when and what's not vs 500 lines change in the subj repo is a good tradeoff imo.

I can make this a ‘5 line’ change in borsh if you prefer:

+#[cfg(feature = "std")]
+pub mod io {
+    pub use std::io::{Read, Write, Error, ErrorKind, Result};
+}
+
+#[cfg(not(feature = "std"))]
+pub use nostd_io as io;

I made it 500 to simplify other things in the code. This isn’t a valid argument.

Not to mention that even if this is a 5 lines in client library, it’s 5 lines in every library and additional changes in every crate that transitively depends on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants