BinData - Parsing Binary Data in Crystal Lang

BinData provides a declarative way to read and write structured binary data.

This means the programmer specifies what the format of the binary data is, and BinData works out how to read and write data in this format. It is an easier (and more readable) alternative to reading and writing each value by hand with IO#read_bytes and IO#write_bytes.

Usage

Firstly, it's recommended that you specify the data's endianness.

class Header < BinData
  endian big
end
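
To make the effect of the endian setting concrete, here is a small sketch that uses only Crystal's standard library (IO::Memory and IO::ByteFormat), not BinData itself. It writes the same 16-bit value in both byte orders so the difference in layout is visible.

```crystal
# Standard library only: the same UInt16 written under each byte order.
io = IO::Memory.new
io.write_bytes(0x1234_u16, IO::ByteFormat::BigEndian)    # most significant byte first
io.write_bytes(0x1234_u16, IO::ByteFormat::LittleEndian) # least significant byte first

layout = io.to_slice.hexstring
puts layout # => 12343412
```

A class-level endian declaration saves repeating this choice on every read and write.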

Then you can specify the structure's fields. There are a few different field types:

  1. Core types
    • UInt8 to UInt128, Int8 to Int128, Float32 and Float64
    • endianness can be overridden per field by adding the option endian: IO::ByteFormat::LittleEndian
  2. Custom types
  3. Bit Fields
    • a group of fields whose values are defined by the number of bits used to represent them
    • the total number of bits in a bit field must be divisible by 8
    • they follow the class endian (little byte-swaps the bitfield); override per field with bit_field endian: :little/:big
  4. Groups
    • an embedded BinData class with access to the parent fields
    • useful when a group of fields are related or optional
  5. Enums
  6. Bools
  7. Arrays and Sets (fixed size and dynamic)
  8. Strings (null-terminated, or fixed-size with an optional encoding:)
  9. Raw Bytes
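
To illustrate what a bit field represents, the following standard-library sketch unpacks and repacks a single byte by hand. It is not how BinData implements bit fields. The VideoInput enum and the variable names are hypothetical, and it assumes the first declared field occupies the most significant bits.

```crystal
# Hypothetical enum used only for this illustration.
enum VideoInput
  VGA
  HDMI
  HDMI2
end

# One byte holding 5 reserved bits, 1 flag bit and 2 enum bits (5 + 1 + 2 = 8).
# Assumption: the first field sits in the most significant bits.
byte = 0b0000_0110_u8

reserved = byte >> 3                                   # top 5 bits
set_input = ((byte >> 2) & 0b1) == 1                   # next single bit
input = VideoInput.from_value((byte & 0b11).to_i)      # low 2 bits

# Packing the three values back must reproduce the original byte.
packed = (reserved << 3) | ((set_input ? 1_u8 : 0_u8) << 2) | input.value.to_u8

puts reserved  # => 0
puts set_input # => true
puts input     # => HDMI2
puts packed    # => 6
```

A bit field declaration replaces this shifting and masking with named, typed accessors.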

Examples

See the spec helper for all possible manipulations.

  enum Inputs
    VGA
    HDMI
    HDMI2
  end

  class Packet < BinData
    endian big

    # Default sets the value at initialisation.
    field start : UInt8 = 0xFF_u8

    # Value procs assign these values before writing to an IO, overwriting any
    # existing value
    field size : UInt16, value: ->{ text.bytesize + 1 }

    # String fields without a length are terminated by a `\0` null byte.
    # Here the length is given explicitly, derived from the size field above.
    field text : String, length: ->{ size - 1 }

    # Bit fields should only be used when one or more fields are not byte aligned
    # The sum of the bits in a bit field must be divisible by 8
    bit_field do
      # a bits value can be between 1 and 128 bits long
      bits 5, reserved

      # Bool values are a single bit
      bool set_input = false

      # This enum is represented by 2 bits
      bits 2, input : Inputs = Inputs::HDMI2
    end

    # isolated namespace
    group :extended, onlyif: ->{ start == 0xFF } do
      field start : UInt8 = 0xFF_u8

      # Supports custom objects as long as they implement `from_io`
      field header : ExtHeader = ExtHeader.new
    end

    # optionally read the remaining bytes out of io
    remaining_bytes :rest
  end

The object above can then be accessed like any other object:

  pack = io.read_bytes(Packet)
  pack.size # => 12
  pack.text # => "hello world"
  pack.input # => Inputs::HDMI
  pack.set_input # => true
  pack.extended.start # => 255
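
For a complete round trip, here is a minimal sketch that assumes the bindata shard is installed. Greeting is a hypothetical, cut-down class that reuses only the start, size and text fields from the packet shown earlier, and an IO::Memory stands in for a socket or file.

```crystal
require "bindata"

# Hypothetical, reduced version of the packet: a start byte, a size and some text.
class Greeting < BinData
  endian big

  field start : UInt8 = 0xFF_u8
  field size : UInt16, value: ->{ text.bytesize + 1 }
  field text : String, length: ->{ size - 1 }
end

greeting = Greeting.new
greeting.text = "hello world"

# Serialize into an in-memory IO, then rewind so it can be read back.
io = IO::Memory.new
io.write_bytes(greeting)
io.rewind

parsed = io.read_bytes(Greeting)
puts parsed.size # => 12
puts parsed.text # => hello world
```

The size field is never assigned by hand. Its value proc fills it in just before writing.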

Additionally, BinData fields support a verify proc, which allows data to be verified while reading from and writing to an IO.

class VerifyData < BinData
  endian big

  field size : UInt8
  field bytes : Bytes, length: ->{ size }
  field checksum : UInt8, verify: ->{ checksum == bytes.reduce(0) { |acc, i| acc + i } }
end

If the verify proc returns false, a BinData::VerificationException is raised with a message matching the following format.

Failed to verify reading basic at VerifyData.checksum
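
A sketch of the failure path, assuming the bindata shard is installed. The class repeats the VerifyData definition so the snippet runs on its own, and the input bytes are made up for the illustration: they declare a checksum of 4 while the payload sums to 3.

```crystal
require "bindata"

class VerifyData < BinData
  endian big

  field size : UInt8
  field bytes : Bytes, length: ->{ size }
  field checksum : UInt8, verify: ->{ checksum == bytes.reduce(0) { |acc, i| acc + i } }
end

# size = 2, payload = [1, 2], checksum = 4 (the payload actually sums to 3)
corrupt = IO::Memory.new(Bytes[0x02, 0x01, 0x02, 0x04])

rejected = begin
  corrupt.read_bytes(VerifyData)
  false
rescue error : BinData::VerificationException
  puts error.message
  true
end
```

Rescuing the exception at the point where the IO is read keeps corrupt data from reaching the rest of the program.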

Inheritance is also supported.

Callbacks

Callbacks can be helpful for providing accessors for simplified representations of the data.

class CallbackTest < BinData
  endian little

  field integer : UInt8

  property external_representation : UInt16 = 0

  before_serialize { self.integer = (external_representation // 2).to_u8 }
  after_deserialize { self.external_representation = integer.to_u16 * 2_u16 }
end
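
A usage sketch for the callbacks, assuming the bindata shard is installed. The class definition is repeated so the snippet runs on its own, and the value 10 is arbitrary. The caller only touches external_representation, while the callbacks keep the integer field on the wire in sync.

```crystal
require "bindata"

class CallbackTest < BinData
  endian little

  field integer : UInt8

  property external_representation : UInt16 = 0

  before_serialize { self.integer = (external_representation // 2).to_u8 }
  after_deserialize { self.external_representation = integer.to_u16 * 2_u16 }
end

outgoing = CallbackTest.new
outgoing.external_representation = 10_u16

io = IO::Memory.new
io.write_bytes(outgoing) # before_serialize stores 10 // 2 in the integer field
io.rewind

incoming = io.read_bytes(CallbackTest) # after_deserialize doubles the stored value
puts incoming.integer                 # => 5
puts incoming.external_representation # => 10
```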

ASN.1 Helpers

Included in this library are helpers for decoding and writing ASN.1 BER data, such as those used in SNMP, LDAP and X.509.

require "bindata/asn1"

# Build an element with one of the typed setters and write it to an IO
ber = ASN1::BER.new
ber.set_integer(42)
io.write_bytes(ber)

# Read it back, then decode with the matching getter
ber = io.read_bytes(ASN1::BER)
ber.tag_class  # => ASN1::BER::TagClass::Universal
ber.get_integer # => 42

Typed payload accessors cover the common universal types:

ber.set_integer(42);                  ber.get_integer    # => 42
ber.set_string("hi");                 ber.get_string     # => "hi"
ber.set_boolean(true);                ber.get_boolean    # => true
ber.set_object_id("1.2.840.113549.1.1.1"); ber.get_object_id # round-trips
ber.set_hexstring("00ff");            ber.get_hexstring  # => "00ff"

A constructed element (a Sequence or Set) can be split into / built from its children:

seq = io.read_bytes(ASN1::BER)
seq.children # => [ASN1::BER, ASN1::BER, ...]

out = ASN1::BER.new
out.tag_number = ASN1::BER::UniversalTags::Sequence
out.children = [child1, child2]

When parsing untrusted input, set a max_content_length before reading so a hostile length field cannot force a huge allocation. The cap propagates to children.

ber = ASN1::BER.new
ber.max_content_length = 64 * 1024
ber.read(io) # raises ASN1::BER::ContentTooLarge if any element exceeds the cap
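
To show why the cap matters, here is a standard-library sketch of BER definite-length decoding with a limit. It is not the library's implementation: read_ber_length and the local ContentTooLarge class are hypothetical stand-ins. The point is that the declared length is checked before any buffer of that size would be allocated.

```crystal
# Hypothetical stand-in for ASN1::BER::ContentTooLarge so the sketch runs without the shard.
class ContentTooLarge < Exception
end

# Decodes a BER definite-length field and rejects it if it exceeds `max`.
def read_ber_length(io : IO, max : Int32) : Int32
  first = io.read_byte || raise IO::EOFError.new

  length = if first < 0x80
             first.to_u64 # short form: the byte itself is the length
           else
             count = first & 0x7F # long form: low 7 bits give the number of length bytes
             raise ArgumentError.new("unsupported length encoding") if count == 0 || count > 8
             value = 0_u64
             count.times do
               byte = io.read_byte || raise IO::EOFError.new
               value = (value << 8) | byte
             end
             value
           end

  # Check the declared length before anything is allocated for the content.
  raise ContentTooLarge.new("declared length #{length} exceeds cap #{max}") if length > max
  length.to_i32
end

cap = 64 * 1024
small = read_ber_length(IO::Memory.new(Bytes[0x05]), cap)            # => 5
long = read_ber_length(IO::Memory.new(Bytes[0x82, 0x01, 0x00]), cap) # => 256

# Five bytes of input that claim roughly 4 GiB of content.
hostile = IO::Memory.new(Bytes[0x84, 0xFF, 0xFF, 0xFF, 0xFF])
blocked = begin
  read_ber_length(hostile, cap)
  false
rescue ContentTooLarge
  true
end
```

Without the check, a five-byte message could request a multi-gigabyte allocation.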

Errors

Every (de)serialization error derives from BinData::CustomException, which carries the failing type and field.

ASN.1 helpers raise ASN1::BER::InvalidTag, ASN1::BER::InvalidObjectId and ASN1::BER::ContentTooLarge for malformed input.

Thread safety

A single BinData instance should not be shared between fibers, but reading and writing different instances of the same type concurrently is safe, because the generated (de)serialization code keeps no shared mutable state.

Real World Examples