Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output iteration + buffering #33

Closed
ehmicky opened this issue Aug 23, 2024 · 3 comments · Fixed by #53
Closed

Output iteration + buffering #33

ehmicky opened this issue Aug 23, 2024 · 3 comments · Fixed by #53

Comments

@ehmicky
Copy link
Collaborator

ehmicky commented Aug 23, 2024

We buffer and return the subprocess stdout/stderr as result.stdout and result.stderr.

Users can also iterate over stdout/stderr. It is helpful for getting progressive output (e.g. a progress bar) or transforming it incrementally.

Also, one of the main use cases of iteration is to reduce memory consumption. However, in our case, the memory is unfortunately not reduced since we buffer result.stdout/result.stderr anyway.

The problem is: with the current API, we only know whether the subprocess output will be iterated or not after the subprocess has already started and its output is already buffering. We have to start buffering right away, else we might miss some data.

Some possible solutions:

  1. When iteration starts, stop any ongoing buffering, and return undefined for result.stdout/result.stderr.
  2. Add a boolean option.
  3. Remove the iteration feature altogether.
  4. Expose iteration as a separate top-level method.

I am leaning towards 1. because it results in the simplest API, while still keeping the iteration feature. It's somewhat quirky though.

What do you think?

@sindresorhus
Copy link
Owner

When iteration starts, stop any ongoing buffering, and return undefined for result.stdout/result.stderr.

👍 But maybe keep them empty strings? The main use-case is not iteration, and it would be annoying having to guard them for each usage when they are almost always filled in.

@ehmicky
Copy link
Collaborator Author

ehmicky commented Aug 24, 2024

Yes, you're right. Also, making the type change from string to undefined depending on whether buffer.stdout is iterated is not typable in TypeScript, so it's better to turn it into an empty string. 👍

@ehmicky
Copy link
Collaborator Author

ehmicky commented Aug 28, 2024

I've been trying to implement this but this is more difficult than expected.

The problem is being able to first remove the data listener, then start the for await (const chunk of stream) loop, while being 100% sure no chunk will be dropped in-between. I've been digging through the Node.js stream codebase, and it's a little tricky. The problem is that chunks are internally buffered by Node.js and some of the data/readable event emission is done asynchronously (typically, after 1 or multiple ticks). So it's not straightforward to switch from one consumer to another without dropping any chunk.

I am trying to think of implementation solutions right now. 🤔

One possible solution is outlined at #43

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants