Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix read_line panic on non-empty buffer #126

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

notgull
Copy link
Member

@notgull notgull commented Feb 16, 2025

The behavior of the canonical read_line is to append to whatever buffer
is passed into it. At the moment it assumes the buffer is empty and
panics otherwise. This commit adds a check for the buffer being empty,
and appends the string otherwise. This is very inefficient but should be
relatively uncommon.

This commit also adds a unit test to prevent regressions.

Fixes #124

The behavior of the canonical read_line is to append to whatever buffer
is passed into it. At the moment it assumes the buffer is empty and
panics otherwise. This commit adds a check for the buffer being empty,
and appends the string otherwise. This is very inefficient but should be
relatively uncommon.

This commit also adds a unit test to prevent regressions.

Fixes #124

Signed-off-by: John Nunley <[email protected]>
@notgull notgull requested a review from fogti February 16, 2025 18:27
@@ -1906,9 +1906,12 @@ fn read_line_internal<R: AsyncBufRead + ?Sized>(

match String::from_utf8(mem::take(bytes)) {
Copy link
Member

@fogti fogti Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently allocates twice in case that !buf.is_empty(), but that should be relatively easy to avoid:

  • If buf.is_empty(), basically do the same as currently
  • If !buf.is_empty(), check the string via core::str:from_utf8 instead, and when that is successful, append it to buf. It would be possible to also do that for buf.is_empty(), and String::from_utf8 also does that internally, but doing that require usage of String::from_utf8_unchecked after the check, which we might want to avoid.
  • One perhaps needs a bit of work to factor out the common error handling part (perhaps creating a new enum for that or using Result<Result<_,_>,_> could be of use)

It might be possible (but is way more complicated) to reduce memory allocations further (keep in mind that we are currently:

  • copying data out of the buffer of an AsyncBufRead object (that buffer usually has fixed size),
  • appending it to bytes (possibly allocating)
  • moving or copying data to buf later

), but that would necessitate a refactoring of the API and different arguments, so is out of scope here (idk if there is already a proper type for "String with a possibly-invalid-UTF-8 unverified tail", backed by a single Vec<u8>).

Besides that, there is something that irritates me about the original code of read_line_internal, because it basically throws away the bytes already buffered when bytes doesn't contain valid UTF-8, which feels unexpected (we could at least append the part to buf that is valid UTF-8).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

read_line panics on non-empty buffer
2 participants