
Crash on deserialize_from() with malformed data (attempting to allocate about 6.6 exabytes) #239

Closed
NoraCodes opened this issue Jun 29, 2018 · 7 comments


@NoraCodes

Hi! The attached input (gzipped, because GitHub won't let me upload it raw) causes deserialize_from() to attempt to allocate 6.6 EB of memory.
test.bincode.gz
[screenshot: exabytes1]

@TyOverby (Collaborator)

Can you post the code that you are using to deserialize?

@NoraCodes (Author) commented Jul 9, 2018

I have a struct definition:

#[derive(Debug, PartialEq, Deserialize)]
struct GeneVariant {
    gene: String,
    p_dot_name: Option<String>,
    c_dot_name: Option<String>,
}

This is deserialized in a loop:

loop {
    match bincode::deserialize_from(&mut rdr) {
        Ok(variant) => {
            // Do things with the variant
        }
        Err(error) => match *error {
            bincode::ErrorKind::Io(ioerror) => match ioerror.kind() {
                io::ErrorKind::UnexpectedEof => break,
                _ => panic!("Error ingesting variants from bincode: {}", ioerror),
            },
            error => error!("Unable to parse variant: {}", error),
        },
    }
}

@TyOverby (Collaborator) commented Jul 23, 2018

Try using the limit method:

bincode::config().limit(max number of bytes).deserialize_from(....)
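For context, here is a minimal std-only sketch of the kind of guard limit provides. It is not bincode's actual implementation; it just illustrates the idea of a reader wrapper that returns an error once more than N bytes have been read, so a malicious length prefix can't force unbounded reads:

```rust
use std::io::{self, Read};

/// Hypothetical sketch: a reader that errors out once `remaining` bytes
/// have been consumed, rather than letting the decoder read forever.
struct LimitedReader<R> {
    inner: R,
    remaining: u64,
}

impl<R: Read> Read for LimitedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // A non-empty read past the limit is a hard error, not a quiet EOF.
        if self.remaining == 0 && !buf.is_empty() {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "byte limit exceeded",
            ));
        }
        // Never hand the inner reader a buffer larger than what's left.
        let cap = buf.len().min(self.remaining as usize);
        let n = self.inner.read(&mut buf[..cap])?;
        self.remaining -= n as u64;
        Ok(n)
    }
}

fn main() {
    let data = vec![0u8; 32];
    let mut r = LimitedReader { inner: &data[..], remaining: 16 };
    let mut buf = Vec::new();
    // Reading all 32 bytes trips the 16-byte limit.
    let err = r.read_to_end(&mut buf).unwrap_err();
    println!("limit enforced: {}", err);
}
```

Note that std's Read::take does something similar, but it reports a clean EOF at the limit instead of an error, which in the loop above would look like a normal end-of-stream rather than malformed input.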

@NoraCodes (Author)

This fixes the issue, but I don't really understand why. Where is it getting that extra data from?

@TyOverby (Collaborator) commented Jul 24, 2018

It's not getting extra data, it's preallocating a vector with way too much memory in expectation of a huge payload.

The first few bytes decoded tell the String how long it's going to be, and bincode tries to pre-allocate all that memory for performance reasons. This is why the limit API exists: to abort the entire deserialization if too many bytes are requested early in the process.

After reading through the code that does the pre-allocation, I think bincode can be much smarter about pre-allocating in the non-limited case. I'll look into making this better. In the meantime, I highly recommend using the limit API.
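The "smarter pre-allocation" idea can be sketched in std-only Rust. This is a hypothetical illustration, not bincode's code: it assumes a little-endian u64 length prefix (which is what bincode's default format uses for strings and sequences), trusts that prefix only up to a small cap when reserving capacity, and lets the buffer grow as bytes actually arrive, so a bogus multi-exabyte prefix fails with a decode error instead of an enormous allocation:

```rust
use std::io::{self, Read};

/// Hypothetical sketch of defensive length-prefixed decoding:
/// cap the up-front allocation regardless of what the prefix claims.
fn read_length_prefixed(r: &mut impl Read) -> io::Result<Vec<u8>> {
    let mut len_bytes = [0u8; 8];
    r.read_exact(&mut len_bytes)?;
    let claimed = u64::from_le_bytes(len_bytes) as usize;

    // Never pre-allocate more than 64 KiB up front, no matter what the
    // (possibly attacker-controlled) prefix says. Vec will grow on demand.
    const PREALLOC_CAP: usize = 64 * 1024;
    let mut buf = Vec::with_capacity(claimed.min(PREALLOC_CAP));

    // take() bounds the read; if the stream is shorter than the claimed
    // length we hit EOF long before allocating anything huge.
    let read = r.take(claimed as u64).read_to_end(&mut buf)?;
    if read != claimed {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "truncated payload",
        ));
    }
    Ok(buf)
}

fn main() {
    // A malicious prefix claiming ~6.6 EB, followed by almost no data:
    let mut evil = 6_600_000_000_000_000_000u64.to_le_bytes().to_vec();
    evil.extend_from_slice(b"oops");
    let err = read_length_prefixed(&mut &evil[..]).unwrap_err();
    println!("rejected: {}", err);
}
```

With this shape, honest payloads still get a capacity hint (up to the cap), while the malformed file from this issue fails fast with an UnexpectedEof-style error instead of a 6.6 EB allocation attempt.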

@NoraCodes (Author)

Understood, thank you!

@TyOverby (Collaborator)

I've filed a new bug for the feature if you want to follow that one!
