Comparison of data serialization formats

From Infogalactic: the planetary knowledge core
Jump to: navigation, search


This is a comparison of data serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.

Overview

Name Creator-maintainer Based on Standardized? Specification Binary? Human-readable? Supports references?e Schema-IDL? Standard APIs Supports Zero-copy operations
Apache Avro Apache Software Foundation N/A Yes Apache Avro™ 1.7.5 Specification Yes No N/A Yes (built-in) N/A N/A
ASN.1 ISO, IEC, ITU-T N/A Yes ISO/IEC 8824; X.680 series of ITU-T Recommendations Yes
(BER, DER, PER, OER, or custom via ECN)
Yes
(XER, GSER, or custom via ECN)
Partialf Yes (built-in) N/A N/A
Bencode Bram Cohen (creator)
BitTorrent, Inc. (maintainer)
N/A Yes Part of BitTorrent protocol specification Partially
(numbers and delimiters are ASCII)
No No No No N/A
Binn Bernardo Ramos N/A Yes Binn Specification Yes No No No No Yes
Bond Microsoft N/A No Bond IDL Specification Yes Yes
(JSON,
XML)
No Yes No N/A
BSON MongoDB JSON Yes BSON Specification Yes No No No No N/A
Candle Markup Henry Luo XML, JSON, JavaFX Yes Candle Markup Reference No Yes Yes
(XPointer, XPath)
Yes
(Candle Pattern Reference)
Yes
(XQuery, XPath)
N/A
Cap’n Proto Kenton Varda N/A No Cap'n Proto Encoding Spec Yes No Yes Yes No Yes
Comma-separated values (CSV) RFC author:
Yakov Shafranovich
N/A Partial
(myriad informal variants used)
RFC 4180
(among others)
No Yes No No No No
D-Bus Message Protocol freedesktop.org N/A Yes D-Bus Specification Yes No No Partial
(Signature strings)
Yes
(see D-Bus)
N/A
Flat Buffers Google N/A N/A flatbuffers github page Specification Yes No N/A Supports references?e Yes [1] C++, with support for Java, C# and Go Yes
GVariant GLib D-Bus MP Yes GVariant Serialization Yes No No Yes
(Type strings)
No N/A
Fast Infoset ISO, IEC, ITU-T XML Yes ITU-T X.891 and ISO/IEC 24824-1:2007 Yes Yes
(XML)
Yes
(XPointer, XPath)
Yes
(XML schema)
Yes
(DOM, SAX, XQuery, XPath)
N/A
HOCON Typesafe Inc. JSON No "HOCON (Human-Optimized Config Object Notation)" No Yes Yes ? Yes
(native Java API for all JVM languages)
No
JSON Douglas Crockford JavaScript syntax Yes RFC 7159
(ancillary:
RFC 6901,
RFC 6902)
No, but see BSON, Smile, UBJSON Yes Yes
(JSON Pointer (RFC 6901);
alternately:
JSONPath, JPath, JSPON, json:select())
Partial
(JSON Schema Proposal, Kwalify, Rx, Itemscript Schema)
Partial
(Clarinet, JSONQuery, JSONPath)
No
KMIP Oasis n/a Yes Oasis Yes (Tag, Type, Length, Value) Yes No No No N/A
MessagePack Sadayuki Furuhashi JSON (loosely) Yes MessagePack format specification Yes No No No No Yes
Netstrings Dan Bernstein N/A Yes netstrings.txt Yes Yes No No No Yes
OGDL Rolf Veen ? Yes Specification Yes
(Binary Specification)
Yes Yes
(Path Specification)
Yes
(Schema WD)
N/A
OpenDDL Eric Lengyel C, PHP Yes OpenDDL.org No Yes Yes No Yes
(OpenDDL Library)
N/A
PHP's serialize() & unserialize() PHP Group N/A Yes No Yes Yes Yes No Yes N/A
Data::Dumper format (Core Perl Module) Gurusamy Sarathy (ActiveState developer) Perl data types Yes No ? Yes No ? Yes N/A
Property list NeXT (creator)
Apple (maintainer)
? Partial Public DTD for XML format Yesa Yesb No ? Cocoa, CoreFoundation, OpenStep, GnuStep No
Protocol Buffers (protobuf) Google N/A Yes Developer Guide: Encoding Yes Partiald No Yes (built-in) C++, Java, Python No
ROOT CERN & FNAL N/A No N/A Yes Yes
(optional XML output for debugging)
Yes Yes
(C++ object persistency framework)
Yes
(Native C++ API, bindings for Python, Ruby, and others)
N/A
S-expressions Internet Draft author:
Ron Rivest
Lisp, Netstrings Partial
(largely de facto)
"S-Expressions" Internet Draft Yes
("Canonical representation")
Yes
("Advanced transport representation")
No No N/A
SCaViS jWork.ORG N/A Yes N/A Yes Yes
(XML, Java Serialization, ProtocolBuffers)
Yes Yes
(Java object persistency, XML, ProtocolBuffers)
Yes
(Native Java API, bindings for Jython, JRuby, Groovy and others)
N/A
Smile Tatu Saloranta JSON Yes Smile Format Specification Yes No No Partial
(JSON Schema Proposal, other JSON schemas/IDLs)
Partial
(via JSON APIs implemented with Smile backend, on Jackson, Python)
N/A
Structured Data eXchange Formats Max Wildgrube N/A Yes RFC 3072 Yes No No No N/A
Thrift Facebook (creator)
Apache (maintainer)
N/A No Original whitepaper Yes Partialc No Yes (built-in) N/A
UBJSON The Buzz Media, LLC JSON, BSON No [2] Yes No No No No N/A
eXternal Data Representation (XDR) Sun Microsystems (creator)
IETF (maintainer)
N/A Yes RFC 4506 Yes No Yes Yes Yes N/A
XML W3C SGML Yes W3C Recommendations:
1.0 (Fifth Edition)
1.1 (Second Edition)
Partial
(Binary XML)
Yes Yes
(XPointer, XPath)
Yes
(XML schema, RELAX_NG)
Yes
(DOM, SAX, XQuery, XPath)
No
XML-RPC Dave Winer[1] XML, SOAP[1] Yes XML-RPC Specification No Yes No No No No
YAML Clark Evans,
Ingy döt Net,
and Oren Ben-Kiki
C, Java, Perl, Python, Ruby, Email, HTML, MIME, URI, XML, SAX, SOAP, JSON[2] Yes Version 1.2 No Yes Yes Partial
(Kwalify, Rx, built-in language type-defs)
No No
  • a. ^ The current default format is binary.
  • b. ^ The "classic" format is plain text, and an XML format is also supported.
  • c. ^ Theoretically possible due to abstraction, but no implementation is included.
  • d. ^ The primary format is binary, but a text format is available.[3]
  • e. ^ Means that generic tools/libraries know how to encode, decode, and dereference a reference to another piece of data in the same document. A tool may require the IDL file, but no more. Excludes custom, non-standardized referencing techniques.
  • f. ^ ASN.1 does offer OIDs, a standard format for globally unique identifiers, as well as a standard notation ("absolute reference") for referencing a component of a value. Thus it would be possible to reference a component of an encoded value present in a document by combining an OID (assigned to the document) and an "absolute reference" to the component of the value. However, there is no standard way to indicate that a field contains such an absolute reference. Therefore, a generic ASN.1 tool/library cannot automatically encode/decode/resolve references within a document without help from custom-written program code.

Syntax comparison of human-readable formats

Format Null Boolean true Boolean false Integer Floating-point String Array Associative array/Object
ASN.1
(XML Encoding Rules)
<foo /> <foo>true</foo> <foo>false</foo> <foo>685230</foo> <foo>6.8523015e+5</foo> <foo>A to Z</foo>
<SeqOfUnrelatedDatatypes>
    <isMarried>true</isMarried>
    <hobby />
    <velocity>-42.1e7</velocity>
    <bookname>A to Z</bookname>
    <bookname>We said, "no".</bookname>
</SeqOfUnrelatedDatatypes>
An object (the key is a field name):
<person>
    <isMarried>true</isMarried>
    <hobby />
    <height>1.85</height>
    <name>Bob Peterson</name>
</person>

A data mapping (the key is a data value):

<competition>
    <measurement>
        <name>John</name>
        <height>3.14</height>
    </measurement>
    <measurement>
        <name>Jane</name>
        <height>2.718</height>
    </measurement>
</competition>

a

Candle Markup (), "" true false 685230
-685230
6.8523015e+5 "A to Z"
"""
A
to
Z
"""
(true, (), -42.1e7, "A to Z")
_{%342=true A%20to%20Z=(1, 2, 3)}
or
_{
  _{key=42 value=true}
  _{key="A to Z" value=(1, 2, 3)}
}
CSVb nulla
(or an empty element in the row)a
1a
truea
0a
falsea
685230
-685230a
6.8523015e+5a A to Z
"We said, ""no""."
true,,-42.1e7,"A to Z"
42,1
A to Z,1,2,3
Netstringsc 0:,a
4:null,a
1:1,a
4:true,a
1:0,a
5:false,a
6:685230,a 9:6.8523e+5,a 6:A to Z, 29:4:true,0:,7:-42.1e7,6:A to Z,, 41:9:2:42,1:1,,25:6:A to Z,12:1:1,1:2,1:3,,,,a
JSON null true false 685230
-685230
6.8523015e+5 "A to Z" [true, null, -42.1e7, "A to Z"] {"42": true, "A to Z": [1, 2, 3]}
OGDL[verification needed] nulla truea falsea 685230a 6.8523015e+5a "A to Z"
'A to Z'
NoSpaces
true
null
-42.1e7
"A to Z"

(true, null, -42.1e7, "A to Z")

42
  true
"A to Z"
  1
  2
  3
42
  true
"A to Z", (1, 2, 3)
OpenDDL ref {null} bool {true} bool {false} int32 {685230}
int32 {0x74AE}
int32 {0b111010010101110}
float {6.8523015e+5} string {"A to Z"} Homogeneous array:
int32 {1, 2, 3, 4, 5}

Heterogeneous array:

array
{
    bool {true}
    ref {null}
    float {-42.1e7}
    string {"A to Z"}
}
dict
{
    value (key = "42") {bool {true}}
    value (key = "A to Z") {int32 {1, 2, 3}}
}
PHP's serialize() & unserialize() N; b:1; b:0; i:685230;
i:-685230;
d:685230.150000000023283064365386962890625;
d:INF;
d:-INF;
d:NAN;
s:6:"A to Z"; a:4:{i:0;b:1;i:1;N;i:2;d:-421000000;i:3;s:6:"A to Z";} Associative array:
a:2:{i:42;b:1;s:6:"A to Z";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}
Object:
O:8:"stdClass":2:{s:4:"John";d:3.140000000000000124344978758017532527446746826171875;s:4:"Jane";d:2.717999999999999971578290569595992565155029296875;}
Property list
(plain text format)[4]
N/A <*BY> <*BN> <*I685230> <*R6.8523015e+5> "A to Z" ( <*BY>, <*R-42.1e7>, "A to Z" )
{
    "42" = <*BY>;
    "A to Z" = ( <*I1>, <*I2>, <*I3> );
}
Property list
(XML format)[5][6]
N/A <true /> <false /> <integer>685230</integer> <real>6.8523015e+5</real> <string>A to Z</string>
<array>
    <true />
    <real>-42.1e7</real>
    <string>A to Z</string>
</array>
<dict>
    <key>42</key>
    <true />
    <key>A to Z</key>
    <array>
        <integer>1</integer>
        <integer>2</integer>
        <integer>3</integer>
    </array>
</dict>
S-expressions NIL
nil
T
#te
true
NIL
#fe
false
685230 6.8523015e+5 abc
"abc"
#616263#
3:abc
{MzphYmM=}
|YWJj|
(T NIL -42.1e7 "A to Z") ((42 T) ("A to Z" (1 2 3)))
YAML ~
null
Null
NULL[7]
y
Y
yes
Yes
YES
on
On
ON
true
True
TRUE[8]
n
N
no
No
NO
off
Off
OFF
false
False
FALSE[8]
685230
+685_230
-685230
02472256
0x_0A_74_AE
0b1010_0111_0100_1010_1110
190:20:30[9]
6.8523015e+5
685.230_15e+03
685_230.15
190:20:30.15
.inf
-.inf
.Inf
.INF
.NaN
.nan
.NAN[10]
A to Z
"A to Z"
'A to Z'
[y, ~, -42.1e7, "A to Z"]
- y
-
- -42.1e7
- A to Z
{"John":3.14, "Jane":2.718}
42: y
A to Z: [1, 2, 3]
XMLd <null />a <boolean val="true"/>a

<true />a

<boolean val="false"/>a

<false />a

<integer>685230</integer>a <float>6.8523015e+5</float>a A to Z a
<array>
  <element type="boolean">true</element>
  <element type="null"/>
  <element type="float">-42.1e7</element>
  <element type="string">A to Z</element>
</array>
a
<associative-array>
  <entry>
    <key type="integer">42</key>
    <value type="boolean">true</value>
  </entry>
  <entry>
    <key type="string">A to Z</key>
    <value>
      <array>
        <element type="integer" val="1"/>
        <element type="integer" val="2"/>
        <element type="integer" val="3"/>
      </array>
    </value>
  </entry>
</associative-array>
XML-RPC <value><boolean>1</boolean></value> <value><boolean>0</boolean></value> <value><int>685230</int></value> <value><double>6.8523015e+5</double></value> <value><string>A to Z</string></value>
<value><array>
  <data>
  <value><boolean>1</boolean></value>
  <value><double>-42.1e7</double></value>
  <value><string>A to Z</string></value>
  </data>
  </array></value>
<value><struct>
  <member>
    <name>42</name>
    <value><boolean>1</boolean></value>
    </member>
  <member>
    <name>A to Z</name>
    <value>
      <array>
        <data>
          <value><int>1</int></value>
          <value><int>2</int></value>
          <value><int>3</int></value>
          </data>
        </array>
      </value>
    </member>
</struct>
  • a. ^ One possible encoding; the specification document does not specifically give an encoding for this datatype.
  • b. ^ The RFC CSV specification only deals with delimiters, newlines, and quote characters; it does not directly deal with serializing programming data structures.
  • c. ^ The netstrings specification only deals with nested byte strings; anything else is outside the scope of the specification.
  • d. ^ XML in and of itself is not a data serialization language, but many data serialization formats have been derived from it; as such, there are many different ways, in addition to those shown, to serialize programming data structures into XML.
  • e. ^ This syntax is not compatible with the Internet-Draft, but is used by some dialects of Lisp.

Comparison of binary formats

Format Null Booleans Integer Floating-point String Array Associative array/Object
ASN.1
(BER, PER or OER encoding)
NULL type BOOLEAN:
  • BER: as 1 byte in binary form;
  • PER: as 1 bit;
  • OER: as 1 byte
INTEGER:
  • BER: variable-length big-endian binary representation (up to 2^(2^1024) bits);
  • PER Unaligned: a fixed number of bits if the integer type has a finite range; a variable number of bits otherwise;
  • PER Aligned: a fixed number of bits if the integer type has a finite range and the size of the range is less than 65536; a variable number of octets otherwise;
  • OER: one, two, or four octets (either signed or unsigned) if the integer type has a finite range that fits in that number of octets; a variable number of octets otherwise
REAL:

base-10 real values are represented as character strings in ISO 6093 format;

binary real values are represented in a binary format that includes the mantissa, the base (2, 8, or 16), and the exponent;

the special values NaN, -INF, +INF, and negative zero are also supported

Multiple valid types (VisibleString, PrintableString, GeneralString, UniversalString, UTF8String) data specifications SET OF (unordered) and SEQUENCE OF (guaranteed order) user definable type
Binn[11] \x00 True: \x01
False: \x02
big-endian 2's complement signed and unsigned 8/16/32/64 bits single: big-endian binary32
double: big-endian binary64
UTF-8 encoded, null terminated, preceded by int8 or int32 string length in bytes Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + list items Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + key/value pairs
BSON[12] Null type - 0 bytes for value True: one byte \x01
False: \x00
int32: 32-bit little-endian 2's complement or int64: 64-bit little-endian 2's complement double: little-endian binary64 UTF-8 encoded, preceded by int32 encoded string length in bytes BSON embedded document with numeric keys BSON embedded document
Concise Binary Object Representation (CBOR)[13] \xf6 True: \xf5
False: \xf4
Small positive number \x00-\x17, small negative number \x20-\x37 (abs(N) <= 23)

8bit: positive \x18\xhh, negative \x38\xhh
16bit: positive \x19<uint16_t>, negative \x39<uint16_t>
32bit: positive \x1A<uint32_t>, negative \x3A<uint32_t>
64bit: positive \x1B<uint64_t>, negative \x3B<uint64_t>
Negative number x encoded as ~x (binary inversion) or as (-x-1)
Byte order - Big-endian

Typecode (one byte) + IEEE half/single/double Typecode with length (like integer coding) and content.

Bytestring and UTF-8 have different typecode

Typecode with count (like integer coding) and items Typecode with pairs count (like integer coding) and pairs
MessagePack \xc0 True: \xc3
False: \xc2
Single byte "fixnum" (values -32..127)

or typecode (one byte) + big-endian (u)int8/16/32/64

Typecode (one byte) + IEEE single/double Typecode + up to 15 bytes
or
typecode + length as uint8/16/32 + bytes;
encoding is unspecified[14]
As "fixarray" (single-byte prefix + up to 15 array items)

or typecode (one byte) + 2-4 bytes length + array items

As "fixmap" (single-byte prefix + up to 15 key-value pairs)

or typecode (one byte) + 2-4 bytes length + key-value pairs

Netstrings 0:, True: 1:1,

False: 1:0,

OGDL Binary
Property list
(binary format)
Protocol Buffers[15] Variable encoding length signed 32-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 31)

Variable encoding length signed 64-bit: varint encoding of "ZigZag"-encoded (n << 1) XOR (n >> 63)
Constant encoding length 32-bit: 32 bits in little-endian 2's complement
Constant encoding length 64-bit: 64 bits in little-endian 2's complement

floats: little-endian binary32

doubles: little-endian binary64

UTF-8 encoded, preceded by varint-encoded integer length of string in bytes Repeated value with the same tag N/A
Sereal 0x25 True: 0x3b
False: 0x3a
Single byte POS/NEG (values -16..15)

or typecode (one byte) + "varint" encoded variable length integer or typecode (one byte) + "zigzag" encoded variable length integer

Typecode (one byte) + IEEE single/double/quad As "SHORT_BINARY" (single-byte prefix + up to 31 raw bytes)

or typecode (one byte, including boolean UTF8-encoding flag) + "varint" encoded length + raw bytes

As "ARRAYREF" (single-byte prefix + up to 15 array items)

or typecode (one byte) + "varint" encoded length + array items

As "HASHREF" (single-byte prefix + up to 15 key-value pairs)

or typecode (one byte) + "varint" encoded length + key-value pairs. Distinguishes hashmaps from objects / class instances.

Smile \x21 True: \x23
False: \x22
Single byte "small" (values -16..15 encoded using \xc0 - \xdf),

zigzag-encoded varints (1 - 11 databytes), or BigInteger

IEEE single/double, BigDecimal Length-prefixed "short" Strings (up to 64 bytes), marker-terminated "long" Strings and (optional) back-references Arbitrary-length heterogenous arrays with end-marker Arbitrary-length key/value pairs with end-marker
Structured Data eXchange Formats (SDXF) big-endian signed 24bit or 32bit integer big-endian IEEE double either UTF-8 or ISO 8859-1 encoded list of elements with identical ID and size, preceded by array header with int16 length chunks can contain other chunks to arbitrary depth
Thrift
Transenc 0x82 True: 0x81
False: 0x80
Single byte integers in the range [-32;127]

Fixed length integers for 8-bits, 16-bits, 32-bits, and 64-bits integers.

Encoded as two's complement little-endian values.

Little-endian IEEE single/double precision numbers. UTF-8 encoded type-length-value string. Balanced brackets with an optional array count. Arrays can be nested. Balanced brackets with an optional object count. Objects can be nested.

It should be noted that any XML based representation can be compressed, or generated as, using EXI - Efficient XML Interchange, which is a "Schema Informed" (as opposed to schema-required, or schema-less) binary compression standard for XML.

See also

References

External links