
invalid seek in writeStreamItem #3370

Closed

dckc opened this issue Jun 20, 2021 · 14 comments

Assignees: warner
Labels: bug (Something isn't working), SwingSet (package: SwingSet)

Comments


dckc commented Jun 20, 2021

Describe the bug

I ran across this while testing snapshots:

2021-06-20T14:49:44.128Z SwingSet: kernel: error in kernel.deliver: (Error#7)
Error#7: ESPIPE: invalid seek, write

  at Proxy.writeSync (fs.js:711:3)
  at meteredConstructor.writeStreamItem (packages/swing-store-lmdb/src/lmdbSwingStore.js:351:8)
  at meteredConstructor.addToTranscript (packages/SwingSet/src/kernel/state/vatKeeper.js:414:32)
  at meteredConstructor.finishDispatch (packages/SwingSet/src/kernel/vatManager/transcript.js:43:17)
  at meteredConstructor.deliver (packages/SwingSet/src/kernel/vatManager/manager-helper.js:157:25)
  at async deliverAndLogToVat (packages/SwingSet/src/kernel/kernel.js:393:30)
  at async deliverToVat (packages/SwingSet/src/kernel/kernel.js:423:7)
  at async deliverToTarget (packages/SwingSet/src/kernel/kernel.js:452:9)
  at async processQueueMessage (packages/SwingSet/src/kernel/kernel.js:649:9)
  at async meteredConstructor.run (packages/SwingSet/src/kernel/kernel.js:980:7)

To Reproduce

IOU

Expected behavior

The transcript write in writeStreamItem should succeed; kernel.deliver should complete without throwing ESPIPE.

Platform Environment

I ran into this on a branch for #2848 (601ef07); I haven't verified that it happens on master, though all the problematic code appears to be on master.

cc @warner

dckc added the bug and SwingSet labels Jun 20, 2021

warner commented Jun 21, 2021

I experimented with node's fs.writeSync, and wasn't able to provoke an error with:

  • a bad position (writing beyond the current end just makes a sparse file; writing to a negative position is either ignored or treated as zero, not sure which)
  • a bad length (a negative length causes an abort and a coredump from Node, but no ESPIPE)
  • deleting the file while holding an open fd (no error, as expected)

Not sure what could cause it; a sketch of these probes is below.
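For the record, a minimal sketch of those probes (hypothetical filename; behavior observed on Linux, other platforms and Node versions may vary):

```js
const fs = require('fs');

const fd = fs.openSync('/tmp/espipe-probe.txt', 'w');
const buf = Buffer.from('hello');

// Bad position: writing far beyond the current end just produces a
// sparse file, no error.
fs.writeSync(fd, buf, 0, buf.length, 1024 * 1024);

// Deleting the file while holding an open fd: writes keep landing in
// the now-unlinked inode, no error.
fs.unlinkSync('/tmp/espipe-probe.txt');
fs.writeSync(fd, buf, 0, buf.length, 0);

fs.closeSync(fd);
// (The negative-length case is omitted: per the above it aborts the
// Node process outright rather than raising ESPIPE.)
```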

I think the next step, if we can't reproduce it on demand, is to look at the Linux kernel sources to see what could possibly cause ESPIPE, and then backtrack from there to figure out how Node's fs builtin could be provoked into doing that.


dckc commented Jun 21, 2021

Linux kernel sources... hm... I'd suggest that trying sqlite (#3087), even just for transcripts initially, is getting more and more cost-effective.


warner commented Jun 21, 2021

It looks like Node's writeSync calls libuv's uv_fs_write, which (when given an offset) uses the kernel's pwrite() call. (Today I learned that pwrite() exists; it's like write() except it writes at a given offset and leaves the file pointer unchanged.) The man page for pwrite() suggests that ESPIPE is signaled when "the file descriptor is associated with a pipe, socket, or FIFO" (probably because you can't seek on a pipe).

In some experiments, it also looks like ESPIPE is returned when pwrite()ing to a closed FD. (That's annoying, because a normal write() to a closed FD gives you EBADF, which makes a lot more sense.)
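A quick way to see the pipe case for yourself (hypothetical filename; run it as `node espipe-demo.js | cat` so that fd 1 is a pipe):

```js
const fs = require('fs');
const buf = Buffer.from('hello\n');

// No position given: libuv issues a plain write(), fine on a pipe.
fs.writeSync(1, buf, 0, buf.length, null);

// Explicit position: libuv issues pwrite(), which fails with ESPIPE on
// a pipe (observed on Linux; Darwin may behave differently, see below).
try {
  fs.writeSync(1, buf, 0, buf.length, 0);
} catch (err) {
  console.error(err.code); // 'ESPIPE'
}
```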

@FUDCo could you eyeball the streamStore and see if there's any way we might be writing to an FD after it's been closed? I'm worried that could turn into a corruption bug, if we aren't forgetting the FD at the right time and might wind up writing to a stale FD that's been re-opened for some other purpose.

warner self-assigned this Jun 21, 2021
warner added this to the Testnet: Stress Test Phase milestone Jun 21, 2021

warner commented Jun 21, 2021

adding this to the stress-test milestone because corruption would be pretty bad


FUDCo commented Jun 21, 2021

The error in question happens when you attempt to perform a seek on a pipe, which obviously doesn't work. The mystery is how the stream store could end up holding an fd to a pipe, since it only acquires fds by calling fs.openSync. However, the fact that the stack trace references something called Proxy.writeSync strikes me as suspicious.


FUDCo commented Jun 21, 2021

Addendum: I wrote the above before I saw @warner's prior comment.


warner commented Jun 21, 2021

(side note, if we could hold onto stream objects instead of raw FD integers, this wouldn't be a problem, but IIRC we didn't find any good seek-like behavior from the object-ish API)


FUDCo commented Jun 21, 2021

Reading the docs more closely: it seems pwrite is like a seek followed by a write, except that it "works" with unseekable things by writing to the current file position and then leaving the file position thereafter undefined (which is terrible). Apparently you CAN use it to write to a pipe if somebody else is actually reading from the pipe. This is an API designed either by a genius who understands something really deep and subtle that is not explained, or by somebody who is either an idiot or on drugs (or both; I suppose those are not mutually exclusive and may indeed be correlated).

In any event, I don't see how the error in question could possibly happen, because I don't see how we could end up wired to a pipe. There may be a point of difference between Darwin and Linux: contra Brian's experience, if I try an fs.writeSync to a closed fd I get EBADF, not ESPIPE.

rowgraus commented

@warner to take a 5-minute pass and kick this out of the stress-test milestone


warner commented Jun 22, 2021

@FUDCo it seems unlikely, but what if reader() hit the last line (thus calling closefd(fd) in the else clause), but then something had happened to readStatus or streamStatus.get(streamName) such that the assert failed and the following closeStream() wasn't called? I dunno, some weird variable-binding thing, or two parties reading the same stream. Or someone calling close() while there was still an active reader... that sounds like a plausible error situation.
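To illustrate the shape of that suspected failure (purely hypothetical code, not the real lmdbSwingStore; only the identifier names are borrowed from it):

```js
function* reader(streamName, fd) {
  let line;
  while ((line = readNextLine(fd)) !== undefined) { // hypothetical helper
    yield line;
  }
  closefd(fd); // last line reached: the OS-level fd is closed here...
  // ...but if this assert throws (say a second reader of the same
  // stream, or an early close(), already changed the status), the
  // closeStream() below never runs and the dead fd number is remembered:
  assert(streamStatus.get(streamName) === 'reading');
  closeStream(streamName);
}
// Once the OS recycles that fd number -- possibly for a pipe -- a later
// positioned write through it would fail with exactly this ESPIPE.
```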

In any case, I'm not seeing any obvious ways to write to a closed file either, so I'm ok kicking this out of the stress-test milestone. Hopefully nobody will find a way to trigger it in the next week.

warner removed this from the Testnet: Stress Test Phase milestone Jun 22, 2021

dckc commented Jun 22, 2021

Given that I can't reproduce it, go ahead and close it until/unless it shows up again?

dckc added the invalid label Jul 21, 2021
dckc closed this as completed Jul 21, 2021

FUDCo commented Jul 21, 2021

This won't reproduce because you replaced the stream file with SQLite.

dckc removed the invalid label Jul 22, 2021

dckc commented Jul 22, 2021

ok, so we actually fixed it, then.
