If you’re in information security, you’ve probably heard a lot about serialization bugs. They are becoming increasingly common, so I wanted to give a basic overview of how they work and why they’re an issue.
The parsing problem
So much of security comes down to parsing. It’s the primary reason we need input validation, and the reason that software like antivirus and network protocol analyzers can have so many security issues.
The job of a parser is to take input from somewhere else and run it through your own software. That should frighten you. It’s like a CDC employee using the ‘open and lick’ method to test petri dish samples.
Bottom line: If you’re going to parse something, you have to get intimate with it.
And that brings us to serialization.
Serialization is the process of capturing a data structure or an object’s state into a (serial) format that can be efficiently stored or transmitted for later consumption.
So you can take an object, capture its state, and then put it in memory, write it to disk, or send it over the network. Then at some point the object can be retrieved and consumed, restoring the object’s state.
A basic example of serialization might be to take the following array:
$array = array("a" => 1, "b" => 2, "c" => array("a" => 1, "b" => 2));
And to serialize it (here with PHP’s serialize()) into this:
a:3:{s:1:"a";i:1;s:1:"b";i:2;s:1:"c";a:2:{s:1:"a";i:1;s:1:"b";i:2;}}
At its core, serialization is a type of encoding.
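To make the round trip concrete, here’s a minimal sketch using PHP’s built-in serialize() and unserialize() on the same array. The variable names are just for illustration.

```php
<?php
// Round trip: array -> serialized string -> array again.
$array = array("a" => 1, "b" => 2, "c" => array("a" => 1, "b" => 2));

// serialize() flattens the whole structure into a single string.
$serialized = serialize($array);
echo $serialized, PHP_EOL;

// unserialize() parses that string and rebuilds the structure.
$restored = unserialize($serialized);

// The restored copy has the same keys, values, and types as the original.
var_dump($restored === $array);  // bool(true)
```

In the printed string, every value carries a type tag and a size: a:3 is an array with three elements, s:1:"a" is a one-byte string, and i:1 is an integer. Those tags and sizes are exactly what the receiving side has to read and believe.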
So this brings us to the core issue: deserialization requires parsing.
In order to go from that serialized format to usable data, some software package needs to unpack that content, figure it out, and then consume it.
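To give a feel for what that unpacking involves, here’s a toy sketch that parses a single string token of the form s:<length>:"<bytes>";. The function parse_string_token() is a hypothetical helper I made up for illustration. It is not how PHP’s real parser is written.

```php
<?php
// Toy parser for one string token: s:<length>:"<bytes>";
// Hypothetical helper for illustration only, not PHP's actual implementation.
function parse_string_token(string $input): string
{
    // The header comes from the sender, so check its shape before using it.
    if (!preg_match('/^s:(\d+):"/', $input, $m)) {
        throw new InvalidArgumentException('not a string token');
    }
    $length = (int) $m[1];
    $start  = strlen($m[0]);

    // Never trust the declared length: compare it with the bytes actually present.
    if ($start + $length + 2 > strlen($input)) {
        throw new InvalidArgumentException('declared length exceeds input');
    }
    // The payload must be followed by the closing quote and semicolon.
    if (substr($input, $start + $length, 2) !== '";') {
        throw new InvalidArgumentException('missing terminator');
    }
    return substr($input, $start, $length);
}

echo parse_string_token('s:5:"hello";'), PHP_EOL;  // hello

try {
    parse_string_token('s:500:"hello";');  // the length lies about the data
} catch (InvalidArgumentException $e) {
    echo 'rejected: ', $e->getMessage(), PHP_EOL;
}
```

Even this one token needs three separate checks, and a real format has arrays, objects, and references nested inside each other. Every one of those is a place where the parser can end up believing something the sender made up.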
Unfortunately, this is precisely what parsers are so bad at. And doing it wrong can lead to all manner of security flaws, up to and including arbitrary code execution.
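Here’s a deliberately harmless sketch of how parsing turns into running code. PHP calls the magic method __wakeup() on any object that unserialize() rebuilds. The class TempFile and its $path property are hypothetical, standing in for whatever classes an application already has loaded.

```php
<?php
// Hypothetical class, standing in for any class already loaded in the application.
class TempFile
{
    public $path = '/tmp/app-cache.tmp';

    // PHP calls __wakeup() automatically on each object unserialize() rebuilds.
    public function __wakeup()
    {
        // A real class might open, delete, or include $this->path here.
        // This sketch only prints, to show that code runs during deserialization.
        echo "__wakeup ran with path = {$this->path}", PHP_EOL;
    }
}

// Untrusted input: a TempFile object whose $path was chosen by the sender.
$untrusted = 'O:8:"TempFile":1:{s:4:"path";s:15:"attacker-chosen";}';

// Nothing calls a method explicitly, yet TempFile::__wakeup() executes
// with a property value taken straight from the input.
$obj = unserialize($untrusted);
```

The sender never supplies code here, only data. The damage depends on what the application’s own classes do with that data once the deserializer instantiates them.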
Summary
- Parsing untrusted input is hard
- Serialization takes data and encodes it into opaque formats for transfer and storage
- To make use of that content, parsers must unpack and consume it
- It’s extremely hard to do this correctly, and if you do it wrong it could mean code execution
- Don’t deserialize untrusted data if you can avoid it
- If you can’t avoid it, just realize you’re asking your parsing software to lick some petri dishes labeled “SAMPLE UNKNOWN”, and explore your options for getting out of that position
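As a rough sketch of what those options can look like in PHP, here are two: switch to a data-only format such as JSON, or, if unserialize() has to stay, pass the allowed_classes option (available since PHP 7.0) so it won’t instantiate application classes. The payloads and the TempFile class name are made up for illustration.

```php
<?php
// Option 1: use a data-only format. With the second argument set to true,
// json_decode() returns arrays and scalars and never creates application objects.
$json = '{"a":1,"b":2,"c":{"a":1,"b":2}}';
$data = json_decode($json, true);
if (!is_array($data)) {
    throw new UnexpectedValueException('invalid JSON payload');
}

// Option 2: if unserialize() is unavoidable, forbid object creation.
// With allowed_classes => false, any object in the input is rebuilt as
// __PHP_Incomplete_Class rather than the class it names, so that class's
// magic methods do not run.
$untrusted = 'O:8:"TempFile":1:{s:4:"path";s:15:"attacker-chosen";}';
$value = unserialize($untrusted, ['allowed_classes' => false]);
var_dump($value instanceof __PHP_Incomplete_Class);  // bool(true)
```

Neither option makes the input trustworthy. The parser still has to chew through bytes someone else wrote, so the decoded values need the same validation as any other input.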
This overall concept applies to almost any language that uses serialization, but some languages (like Java) are in worse shape than others.