-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve DataBlock memory consumption and validation #1089
Improve DataBlock memory consumption and validation #1089
Conversation
Hello @cltnschlosser 👋. Thank you for opening PR - at first glance it looks OK and it indeed makes sense to control the size of a buffer beforehand. Given that we already have similar logic controlling the "write size" we need some time within the team to discuss best strategy of incorporating this change. Stay tuned, we'll get back to you. |
var bytes = [UInt8](repeating: 0, count: length) | ||
let count = stream.read(&bytes, maxLength: length) | ||
guard length > 0 else { | ||
return Data() | ||
} | ||
|
||
// Load from stream to data without unnecessary copies | ||
var data = Data(repeating: 0, count: length) | ||
let count = try data.withUnsafeMutableBytes { (bytes: UnsafeMutableRawBufferPointer) in | ||
guard let buffer = bytes.assumingMemoryBound(to: UInt8.self).baseAddress else { | ||
throw DataBlockError.dataAllocationFailure | ||
} | ||
return stream.read(buffer, maxLength: length) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cltnschlosser We wonder what's the reasoning behind this change? The comment refers to unnecessary copy, but I don't see directly how this was addressed with changing from [UInt8](repeating: 0, count: length)
to Data(repeating: 0, count: length)
and accessing raw pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So before there was 3 copies of the data:
file.read()
read the entire contents into aData
.bytes
the[UInt8]
buffer- The
Data
that was initialized withbytes
. This is a copy.
So 2 was very short lived and it's possible that the swift optimizer removes that copy, but I don't know for sure.
1 was solved by using the InputStream
api that loads directly from a file, instead of from Data
. 2 and 3 were then merged using Data(repeating: 0, count: length)
and then reading directly to that Data
.
There is also probably an optimization where you use an uninitialized Data
, but I was having some trouble and didn't want to spend too much time on it. I just noticed this api which is probably better than using repeating: 0
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're uncomfortable with this, I can revert it. I don't think these extra copies were the cause of my memory issue since they are temporary, but it probably improves performance slightly since it doesn't have to copy the data twice (to [UInt8]
, then Data
. Just directly to Data
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explanation 👍 - now it makes sense given this broader picture. I was wrong by thinking that "copies" relate to local scope in this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @cltnschlosser, this is a very valuable contribution 🙏
It's ok for me to merge it, I will change the target branch so we can replace the MAX_BLOCK_SIZE
with the performance preset configuration before merging it to develop
. Thanks again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again @cltnschlosser ! This PR is in a good shape to merge it - we will then continue this work on our side 🙌.
@ncreated Hey, just checking in on your timeline for the updates? |
Hey @cltnschlosser 👋 I'm currently working on replacing the |
What and why?
Details in #1091
This leads me to 3 possibilities:
NOTE: Just realized we didn't start seeing this until after 1.12.0 upgrade.
Yeah, after digging into the 1.12.0 changes, I'm realizing that a bug causing the first possibility is the most likely. The data is already in memory at this point, and it's just being copied into a buffer. The crash is occurring during that buffer creation. It's likely that the length being provided is incorrect.
How?
I address these:
Int(exactly:)
which could still be huge. BlockSize is aUInt32
which is 4GB.Review checklist
Custom CI job configuration (optional)