
Crash on deserialize_from() with malformed data (attempting to allocate about 6.6 exabytes) #239

Closed
NoraCodes opened this issue Jun 29, 2018 · 7 comments


@NoraCodes

Hi! The attached input (gzipped, because GitHub won't let me upload it raw) causes deserialize_from() to attempt to allocate 6.6 EB of memory.
test.bincode.gz
[screenshot: exabytes1]

@TyOverby (Collaborator)

Can you post the code that you are using to deserialize?

@NoraCodes (Author) commented Jul 9, 2018

I have a struct definition:

#[derive(Debug, PartialEq, Deserialize)]
struct GeneVariant {
    gene: String,
    p_dot_name: Option<String>,
    c_dot_name: Option<String>,
}

This is deserialized in a loop:

loop {
    match bincode::deserialize_from(&mut rdr) {
        Ok(variant) => {
            // Do things with the variant
        }
        Err(error) => match *error {
            bincode::ErrorKind::Io(ioerror) => match ioerror.kind() {
                io::ErrorKind::UnexpectedEof => break,
                _ => panic!("Error ingesting variants from bincode: {}", ioerror),
            },
            error => error!("Unable to parse variant: {}", error),
        },
    }
}

@TyOverby (Collaborator) commented Jul 23, 2018

Try using the limit method:

bincode::config().limit(max number of bytes).deserialize_from(....)
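For context, here is a minimal std-only sketch of the kind of guard limit provides. It is not bincode's actual implementation; it just illustrates the idea of a reader wrapper that returns an error once more than N bytes have been read, so a malicious length prefix can't force unbounded reads:

```rust
use std::io::{self, Read};

/// Hypothetical sketch: a reader that errors out once `remaining` bytes
/// have been consumed, rather than letting the decoder read forever.
struct LimitedReader<R> {
    inner: R,
    remaining: u64,
}

impl<R: Read> Read for LimitedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // A non-empty read past the limit is a hard error, not a quiet EOF.
        if self.remaining == 0 && !buf.is_empty() {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "byte limit exceeded",
            ));
        }
        // Never hand the inner reader a buffer larger than what's left.
        let cap = buf.len().min(self.remaining as usize);
        let n = self.inner.read(&mut buf[..cap])?;
        self.remaining -= n as u64;
        Ok(n)
    }
}

fn main() {
    let data = vec![0u8; 32];
    let mut r = LimitedReader { inner: &data[..], remaining: 16 };
    let mut buf = Vec::new();
    // Reading all 32 bytes trips the 16-byte limit.
    let err = r.read_to_end(&mut buf).unwrap_err();
    println!("limit enforced: {}", err);
}
```

Note that std's Read::take does something similar, but it reports a clean EOF at the limit instead of an error, which in the loop above would look like a normal end-of-stream rather than malformed input.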

@NoraCodes (Author)

This fixes the issue, but I don't really understand why. Where is it getting that extra data from?

@TyOverby (Collaborator) commented Jul 24, 2018

It's not getting extra data, it's preallocating a vector with way too much memory in expectation of a huge payload.

The first few bytes decoded tell the String how long it's going to be, and bincode tries to pre-allocate all that memory for performance reasons. This is why the limit API exists: to abort the entire deserialization if too many bytes are requested early in the process.

After reading through the code that does the pre-allocation, I think bincode can be much smarter about pre-allocating in the non-limited case. I'll look into making this better. In the meantime, I highly recommend using the limit API.
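The "smarter pre-allocation" idea can be sketched in std-only Rust. This is a hypothetical illustration, not bincode's code: it assumes a little-endian u64 length prefix (which is what bincode's default format uses for strings and sequences), trusts that prefix only up to a small cap when reserving capacity, and lets the buffer grow as bytes actually arrive, so a bogus multi-exabyte prefix fails with a decode error instead of an enormous allocation:

```rust
use std::io::{self, Read};

/// Hypothetical sketch of defensive length-prefixed decoding:
/// cap the up-front allocation regardless of what the prefix claims.
fn read_length_prefixed(r: &mut impl Read) -> io::Result<Vec<u8>> {
    let mut len_bytes = [0u8; 8];
    r.read_exact(&mut len_bytes)?;
    let claimed = u64::from_le_bytes(len_bytes) as usize;

    // Never pre-allocate more than 64 KiB up front, no matter what the
    // (possibly attacker-controlled) prefix says. Vec will grow on demand.
    const PREALLOC_CAP: usize = 64 * 1024;
    let mut buf = Vec::with_capacity(claimed.min(PREALLOC_CAP));

    // take() bounds the read; if the stream is shorter than the claimed
    // length we hit EOF long before allocating anything huge.
    let read = r.take(claimed as u64).read_to_end(&mut buf)?;
    if read != claimed {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "truncated payload",
        ));
    }
    Ok(buf)
}

fn main() {
    // A malicious prefix claiming ~6.6 EB, followed by almost no data:
    let mut evil = 6_600_000_000_000_000_000u64.to_le_bytes().to_vec();
    evil.extend_from_slice(b"oops");
    let err = read_length_prefixed(&mut &evil[..]).unwrap_err();
    println!("rejected: {}", err);
}
```

With this shape, honest payloads still get a capacity hint (up to the cap), while the malformed file from this issue fails fast with an UnexpectedEof-style error instead of a 6.6 EB allocation attempt.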

@NoraCodes (Author)

Understood, thank you!

@TyOverby (Collaborator)

I've filed a new bug for the feature if you want to follow that one!
