BinData - Parsing Binary Data in Crystal Lang
BinData provides a declarative way to read and write structured binary data.
This means the programmer specifies what the format of the binary data is, and BinData works out how to read and write data in this format. It is an easier, more readable alternative to hand-written parsing code.
Usage
First, it is recommended that you specify the data's endianness.
class Header < BinData
  endian big
end
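To make the effect of the endian setting concrete, here is a small sketch that uses only Crystal's standard library, not BinData itself. It writes the same 16-bit value in both byte orders so the two layouts can be compared side by side.

```crystal
# Standard library only: the same UInt16 written in both byte orders.
io = IO::Memory.new
io.write_bytes(0x0102_u16, IO::ByteFormat::BigEndian)    # most significant byte first
io.write_bytes(0x0102_u16, IO::ByteFormat::LittleEndian) # least significant byte first

bytes = io.to_slice
puts bytes.hexstring # prints 01020201
```

The first two bytes show the layout that endian big selects for multi-byte fields, the last two the layout for endian little.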
Then you can specify the structure's fields. There are a few different field types:
- Core types
  - UInt8 to UInt128, Int8 to Int128, Float32 and Float64
  - mix endianness per field with the option endian: IO::ByteFormat::LittleEndian
- Custom types
  - anything that is IO serializable (implements to_io / from_io)
- Bit Fields
  - a group of fields whose values are defined by the number of bits used to represent them
  - the total number of bits in a bit field must be divisible by 8
  - they follow the class endian (little byte-swaps the bitfield); override per field with bit_field endian: :little / :big
- Groups
  - an embedded BinData class with access to the parent fields
  - useful when a group of fields are related or optional
- Enums
- Bools
- Arrays and Sets (fixed size and dynamic)
- Strings (null-terminated, or fixed-size with an optional encoding:)
- Raw Bytes
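As an illustration of the "Custom types" entry, the sketch below shows a type made IO serializable with to_io and from_io, using only the standard library. The Version struct and its two-byte layout are hypothetical and exist only for this example.

```crystal
# A hypothetical two-byte version number; the name and layout are illustrative only.
struct Version
  getter major : UInt8
  getter minor : UInt8

  def initialize(@major : UInt8, @minor : UInt8)
  end

  # Called by IO#read_bytes(Version, format)
  def self.from_io(io : IO, format : IO::ByteFormat) : self
    new(io.read_bytes(UInt8, format), io.read_bytes(UInt8, format))
  end

  # Called by IO#write_bytes(version, format)
  def to_io(io : IO, format : IO::ByteFormat) : Nil
    io.write_bytes(major, format)
    io.write_bytes(minor, format)
  end
end

io = IO::Memory.new
io.write_bytes(Version.new(1_u8, 4_u8), IO::ByteFormat::BigEndian)
io.rewind

version = io.read_bytes(Version, IO::ByteFormat::BigEndian)
puts "#{version.major}.#{version.minor}" # prints 1.4
```

A type like this could then presumably be declared as a field with an explicit default, for example field version : Version = Version.new(1_u8, 0_u8). That declaration is an assumption based on the field syntax, not something tested here.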
Examples
See the spec helper for all possible manipulations.
enum Inputs
  VGA
  HDMI
  HDMI2
end

class Packet < BinData
  endian big

  # Default sets the value at initialisation.
  field start : UInt8 = 0xFF_u8

  # Value procs assign these values before writing to an IO, overwriting any
  # existing value
  field size : UInt16, value: ->{ text.bytesize + 1 }

  # String fields without a length use `\0` null byte termination
  # Length is being calculated by the size field above
  field text : String, length: ->{ size - 1 }

  # Bit fields should only be used when one or more fields are not byte aligned
  # The sum of the bits in a bit field must be divisible by 8
  bit_field do
    # a bits value can be between 1 and 128 bits long
    bits 5, reserved

    # Bool values are a single bit
    bool set_input = false

    # This enum is represented by 2 bits
    bits 2, input : Inputs = Inputs::HDMI2
  end

  # isolated namespace
  group :extended, onlyif: ->{ start == 0xFF } do
    field start : UInt8 = 0xFF_u8

    # Supports custom objects as long as they implement `from_io`
    field header : ExtHeader = ExtHeader.new
  end

  # optionally read the remaining bytes out of io
  remaining_bytes :rest
end
The object above can then be accessed like any other object
pack = io.read_bytes(Packet)
pack.size # => 12
pack.text # => "hello world"
pack.input # => Inputs::HDMI
pack.set_input # => true
pack.extended.start # => 255
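The bit field in Packet occupies a single byte (5 + 1 + 2 bits). The sketch below unpacks such a byte with plain bit operations to show how the values read above could map onto it. It assumes the fields are packed in declaration order starting from the most significant bit. That ordering is an assumption made for illustration, and the code does not use BinData.

```crystal
# Assumption (not confirmed by the text above): fields are packed in declaration
# order, starting from the most significant bit of the byte.
enum Inputs
  VGA
  HDMI
  HDMI2
end

byte = 0b0000_0101_u8

reserved  = byte >> 3                      # top 5 bits
set_input = ((byte >> 2) & 1) == 1         # next single bit
input     = Inputs.from_value(byte & 0b11) # lowest 2 bits

puts reserved  # prints 0
puts set_input # prints true
puts input     # prints HDMI
```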
Additionally, BinData fields support a verify proc, which allows data to be verified while reading and writing io.
class VerifyData < BinData
  endian big

  field size : UInt8
  field bytes : Bytes, length: ->{ size }
  field checksum : UInt8, verify: ->{ checksum == bytes.reduce(0) { |acc, i| acc + i } }
end
If the verify proc returns false, a BinData::VerificationException is raised with a message matching the following format.
Failed to verify reading basic at VerifyData.checksum
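To show what a frame that satisfies the VerifyData check looks like on the wire, the sketch below assembles one by hand with the standard library. The checksum is computed with the same sum as the verify proc. The payload values are arbitrary.

```crystal
payload = Bytes[0x01, 0x02, 0x03]

# Same sum the verify proc computes: an Int32 accumulator, no wrap-around.
checksum = payload.reduce(0) { |acc, i| acc + i }

frame = IO::Memory.new
frame.write_bytes(payload.size.to_u8, IO::ByteFormat::BigEndian) # size
frame.write(payload)                                              # bytes
frame.write_bytes(checksum.to_u8, IO::ByteFormat::BigEndian)      # checksum

puts frame.to_slice.hexstring # prints 0301020306
```

Because the proc compares the stored UInt8 against an untruncated sum, only payloads whose bytes add up to 255 or less can pass this particular check. Rewinding the IO and reading it as VerifyData should succeed, while changing the final byte should raise the exception described above.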
Inheritance is also supported
Callbacks
Callbacks can be helpful for providing accessors for simplified representations of the data.
class CallbackTest < BinData
  endian little

  field integer : UInt8
  property external_representation : UInt16 = 0

  before_serialize { self.integer = (external_representation // 2).to_u8 }
  after_deserialize { self.external_representation = integer.to_u16 * 2_u16 }
end
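A round trip makes the two callbacks easier to follow. The sketch below restates the class under the hypothetical name CallbackDemo, with an explicit default on the field, and assumes the shard is required as "bindata". It writes an instance and reads it back using the IO#write_bytes and IO#read_bytes calls shown elsewhere in this document.

```crystal
require "bindata"

# Mirrors the callback class above under a different name, with an explicit default.
class CallbackDemo < BinData
  endian little

  field integer : UInt8 = 0_u8
  property external_representation : UInt16 = 0

  before_serialize { self.integer = (external_representation // 2).to_u8 }
  after_deserialize { self.external_representation = integer.to_u16 * 2_u16 }
end

demo = CallbackDemo.new
demo.external_representation = 10_u16

io = IO::Memory.new
io.write_bytes(demo) # before_serialize stores 10 // 2 in the integer field
io.rewind

copy = io.read_bytes(CallbackDemo) # after_deserialize doubles the stored value
puts copy.integer
puts copy.external_representation
```

With these callbacks the stored byte should be 5 and the restored external value 10. Odd external values lose their lowest bit in the round trip, since the division is an integer division.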
ASN.1 Helpers
Included in this library are helpers for decoding and writing ASN.1 BER data, such as those used in SNMP, LDAP and X.509.
require "bindata/asn1"
# Build an element with one of the typed setters and write it to an IO
ber = ASN1::BER.new
ber.set_integer(42)
io.write_bytes(ber)
# Read it back, then decode with the matching getter
ber = io.read_bytes(ASN1::BER)
ber.tag_class # => ASN1::BER::TagClass::Universal
ber.get_integer # => 42
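For orientation, BER encodes each element as a tag, a length and a value. The integer 42 from the example above is encoded as the three bytes 02 01 2A. The sketch below decodes those bytes by hand with the standard library. It handles only a single-byte tag and a short-form length, and it is not how the library itself is implemented.

```crystal
# Hand-rolled decode of one short-form BER element; standard library only.
data = Bytes[0x02, 0x01, 0x2A] # INTEGER 42

identifier  = data[0]
tag_class   = identifier >> 6        # 0 = universal
constructed = identifier.bit(5) == 1 # false for primitive types
tag_number  = identifier & 0x1F      # 2 = INTEGER

length = data[1].to_i     # short form: high bit clear, the byte is the length
value  = data[2, length]  # content octets

# Interpret the content as a big-endian two's-complement integer.
integer = value.reduce(0_i64) { |acc, b| (acc << 8) | b }
integer -= 1_i64 << (8 * length) if value[0] >= 0x80

puts tag_number # prints 2
puts integer    # prints 42
```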
Typed payload accessors cover the common universal types:
ber.set_integer(42); ber.get_integer # => 42
ber.set_string("hi"); ber.get_string # => "hi"
ber.set_boolean(true); ber.get_boolean # => true
ber.set_object_id("1.2.840.113549.1.1.1"); ber.get_object_id # round-trips
ber.set_hexstring("00ff"); ber.get_hexstring # => "00ff"
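The object identifier accessor hides a compact encoding: the first two arcs share one byte, and every later arc is written in base 128 with a continuation bit. The sketch below is a standalone encoder for the content octets, written with the standard library only. The helper name encode_oid is hypothetical and not part of the library.

```crystal
# Encodes the content octets of an OBJECT IDENTIFIER; standard library only.
def encode_oid(dotted : String) : Bytes
  arcs = dotted.split('.').map(&.to_u64)
  io = IO::Memory.new

  # The first two arcs share one sub-identifier: 40 * first + second.
  subids = [arcs[0] * 40 + arcs[1]] + arcs[2..]

  subids.each do |subid|
    # Split into 7-bit groups, most significant first.
    groups = [] of UInt8
    loop do
      groups.unshift((subid & 0x7F).to_u8)
      subid >>= 7
      break if subid == 0
    end

    # Every group except the last has its high bit set.
    groups.each_with_index do |group, i|
      io.write_byte(i == groups.size - 1 ? group : group | 0x80_u8)
    end
  end

  io.to_slice
end

puts encode_oid("1.2.840.113549.1.1.1").hexstring # prints 2a864886f70d010101
```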
A constructed element (a Sequence or Set) can be split into / built from its children:
seq = io.read_bytes(ASN1::BER)
seq.children # => [ASN1::BER, ASN1::BER, ...]
out = ASN1::BER.new
out.tag_number = ASN1::BER::UniversalTags::Sequence
out.children = [child1, child2]
When parsing untrusted input, set a max_content_length before reading so a hostile
length field cannot force a huge allocation. The cap propagates to children.
ber = ASN1::BER.new
ber.max_content_length = 64 * 1024
ber.read(io) # raises ASN1::BER::ContentTooLarge if any element exceeds the cap
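The reason for the cap is that a BER length field can claim far more content than the input actually holds. The sketch below parses a length with the standard library and rejects an oversized claim before anything is allocated. MAX_CONTENT and read_length are hypothetical names used only here, and this is not the library's implementation.

```crystal
# Illustrative only: reads a BER length and rejects it before allocating.
# MAX_CONTENT and read_length are hypothetical names, not part of the library.
MAX_CONTENT = 64 * 1024

def read_length(io : IO) : Int32
  first = io.read_byte || raise IO::EOFError.new
  return first.to_i if first < 0x80 # short form: the byte is the length

  count = first & 0x7F # long form: number of length bytes that follow
  raise "indefinite or oversized length" if count == 0 || count > 4

  length = 0_u64
  count.times do
    byte = io.read_byte || raise IO::EOFError.new
    length = (length << 8) | byte
  end

  raise "content too large: #{length} bytes" if length > MAX_CONTENT
  length.to_i
end

hostile = IO::Memory.new(Bytes[0x84, 0xFF, 0xFF, 0xFF, 0xFF]) # claims about 4 GiB
begin
  read_length(hostile)
rescue ex
  puts ex.message # prints content too large: 4294967295 bytes
end
```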
Errors
Every (de)serialization error derives from BinData::CustomException, which carries the
failing type and field:
- BinData::ParseError / BinData::WriteError wrap any error hit while reading / writing a field
- BinData::VerificationException is raised when a verify: callback returns false
ASN.1 helpers raise ASN1::BER::InvalidTag, ASN1::BER::InvalidObjectId and
ASN1::BER::ContentTooLarge for malformed input.
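Since every error shares the BinData::CustomException base class, one rescue clause can cover all of them. The sketch below reads a truncated input into a hypothetical two-field type named Pair. It assumes the shard is required as "bindata" and that the end-of-file error is wrapped as described above.

```crystal
require "bindata"

# Hypothetical two-field type used only to trigger a read failure.
class Pair < BinData
  endian big

  field kind : UInt8 = 0_u8
  field count : UInt16 = 0_u16
end

truncated = IO::Memory.new(Bytes[0x01]) # the UInt16 is missing

begin
  truncated.read_bytes(Pair)
rescue error : BinData::CustomException
  # The message is expected to name the failing type and field.
  puts "#{error.class}: #{error.message}"
end
```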
Thread safety
A single BinData instance should not be shared between fibers. Reading and writing different
instances of the same type concurrently is safe, because the generated (de)serialization keeps no
shared mutable state.
Real World Examples
- ASN.1
  - https://github.com/crystal-community/jwt/blob/master/src/jwt.cr#L251
  - https://github.com/spider-gazelle/crystal-ldap
- enums and bit fields
  - https://github.com/spider-gazelle/knx/blob/master/src/knx/cemi.cr#L195
- variable sized arrays
  - https://github.com/spider-gazelle/crystal-bacnet/blob/master/src/bacnet/virtual_link_control/secure_bvlci.cr#L54