CSV Serialization

CSV serialization is the format in which records are serialized in OrientDB versions 0.* and 1.*.

Documents are serialized in a proprietary string format derived from JSON, but more compact. The string retrieved from the storage may be padded with spaces. This happens when the oversize feature is set. Just ignore the trailing spaces.
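A minimal sketch of ignoring that padding in Python. The function name strip_oversize_padding and the sample record are hypothetical, not part of OrientDB:

```python
def strip_oversize_padding(raw: str) -> str:
    """Drop the trailing spaces that may follow a stored record."""
    # Only trailing spaces are removed; spaces inside quoted strings are kept
    # because a non-null string value always ends with a closing quote.
    return raw.rstrip(" ")


padded = 'Customer@name:"Barack",surname:"Obama"      '
print(strip_oversize_padding(padded))
```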

To know more about types look at Supported types.

These are the rules:

  • Any string content must escape the following characters:
      • " -> \"
      • \ -> \\
  • The class, if present, is at the beginning and must end with @. E.g. Customer@
  • Each field must be present with its name and value separated by :. E.g. name:"Barack"
  • Fields must be separated by ,. E.g. name:"Barack",surname:"Obama"
  • All strings must be enclosed in " characters. E.g. city:"Rome"
  • All binary content (like byte[]) must be encoded in Base64 and enclosed by underscore _ characters. E.g. buffer:_AAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGx_. Since v1.0rc7
  • Numbers (integer, long, short, byte, float, double) are formatted as strings, as output by the Java toString() method. No thousands separator may be used. The decimal separator is always a period (.). Starting from version 0.9.25, if the type is not integer, a suffix distinguishes the type when unmarshalled: b=byte, s=short, l=long, f=float, d=double, c=BigDecimal (since 1.0rc8). E.g. salary:120.3f or code:124b. For the exact Java output, see:
      • Output of Floats (Float.toString())
      • Output of Doubles (Double.toString())
      • Output of BigDecimal (BigDecimal.toString())
  • Booleans are expressed as true and false, always in lower case. They are recognized as booleans because the text is not enclosed in double quotes, as strings are
  • Dates must be in the POSIX format (also called UNIX time: http://en.wikipedia.org/wiki/Unix_time), expressed in milliseconds. They are always stored as longs and end with:
      • the 't' character when the type is DATETIME (the default in schema-less mode when a Date object is used). Datetime handles precision up to milliseconds. E.g. lastUpdate:1296279468000t is read as 2011-01-29 05:37:48
      • the 'a' character when the type is DATE. Date handles precision up to the day. E.g. lastUpdate:1306281600000a is read as 2011-05-25 00:00:00 (available since 1.0rc2)
  • RecordID (link) must be prefixed by #. A Record Id always has the format <cluster-id>:<cluster-position>. E.g. location:#3:2
  • Embedded documents are enclosed by parentheses, the ( and ) characters. E.g. (name:"rules"). Note: before SVN revision 2007 (0.9.24-snapshot) only * characters were used to begin and end the embedded document.
  • Lists (array and list) must be enclosed by [ and ] characters. E.g. [1,2,3], [#10:3,#10:4] and [(name:"Luca")]. Before rel.15 the SET type was stored as a list, but now it uses its own format (see below)
  • Sets (collections without duplicates) must be enclosed by < and > characters. E.g. <1,2,3>, <#10:3,#10:4> and <(name:"Luca")>. There is a special case when the LINKSET type is used, reported in detail in the Special use of LINKSET types section. Before rel.15 the SET type was stored as a list (see above).
  • Maps (collections of key/value entries) must be enclosed in { and } characters. E.g. rules:{"database":2,"database.cluster.internal":2} (NB. to set the value part of a key/value pair to null, set it to the text null, without quotation marks. E.g. rules:{"database_name":"fred","database_alias":null})
  • RidBags are a special collection for link management. They are represented as %(content:binary); where the content is binary data encoded in Base64. Take a look at the main page for more details.
  • Null fields have an empty value part of the field. E.g. salary_cloned:,salary:
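The rules above can be sketched as a small Python serializer. This is only an illustration under stated assumptions, not OrientDB's implementation: the Link and Document classes and both function names are hypothetical, Python ints are mapped to the unsuffixed integer type and Python floats to double, datetimes must be timezone-aware, and Python's repr() of a float can differ from Java's toString() for some values (for example in exponent notation). RidBags are left out.

```python
import base64
from datetime import date, datetime, timedelta, timezone
from decimal import Decimal

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class Link:
    """Hypothetical holder for a RecordID (<cluster-id>:<cluster-position>)."""

    def __init__(self, cluster_id, cluster_position):
        self.cluster_id = cluster_id
        self.cluster_position = cluster_position


class Document:
    """Hypothetical document: ordered fields plus an optional class name."""

    def __init__(self, fields, class_name=None):
        self.fields = fields
        self.class_name = class_name


def serialize_document(doc):
    """Render 'Class@name:value,name:value' for a record."""
    prefix = doc.class_name + "@" if doc.class_name else ""
    body = ",".join(n + ":" + serialize_value(v) for n, v in doc.fields.items())
    return prefix + body


def serialize_value(value):
    """Render a single field value according to the rules above."""
    if value is None:
        return ""  # null fields have an empty value part
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, bytes):
        return "_" + base64.b64encode(value).decode("ascii") + "_"
    if isinstance(value, int):
        return str(value)  # assumption: Python int -> unsuffixed integer
    if isinstance(value, float):
        return repr(value) + "d"  # assumption: Python float -> double
    if isinstance(value, Decimal):
        return str(value) + "c"
    if isinstance(value, datetime):  # must precede date: datetime subclasses date
        return str((value - EPOCH) // timedelta(milliseconds=1)) + "t"
    if isinstance(value, date):
        return str((value - EPOCH.date()).days * 86_400_000) + "a"
    if isinstance(value, Link):
        return "#%d:%d" % (value.cluster_id, value.cluster_position)
    if isinstance(value, Document):
        return "(" + serialize_document(value) + ")"
    if isinstance(value, list):
        return "[" + ",".join(serialize_value(v) for v in value) + "]"
    if isinstance(value, (set, frozenset)):
        # Sorted only to make the output of this sketch deterministic.
        return "<" + ",".join(sorted(serialize_value(v) for v in value)) + ">"
    if isinstance(value, dict):
        # Inside a map, a null value is written as the bare text null.
        entries = (
            serialize_value(str(k)) + ":" + ("null" if v is None else serialize_value(v))
            for k, v in value.items()
        )
        return "{" + ",".join(entries) + "}"
    raise TypeError("unsupported type: " + type(value).__name__)


customer = Document(
    {
        "name": "Barack",
        "salary": 120.5,
        "active": True,
        "lastUpdate": datetime(2011, 1, 29, 5, 37, 48, tzinfo=timezone.utc),
        "location": Link(3, 2),
        "tags": [1, 2, 3],
        "nickname": None,
    },
    class_name="Customer",
)
print(serialize_document(customer))
```

The type checks are ordered so that subclasses are tested first (bool before int, datetime before date); otherwise booleans would be written as numbers and datetimes as dates.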

Simple example (line breaks introduced so it's visible on this page):


Complex example used in schema (line breaks introduced so it's visible on this page):


Other example of ORole that uses a map (line breaks introduced so it's visible on this page):



Below is the serialization of types in JSON and Binary format (always refers to the latest version of the protocol).

| Type | JSON format | Binary descriptor |
|------|-------------|-------------------|
| Byte | 0 | Value ends with 'b'. Example: 23b |
| Short | 10000 | Value ends with 's'. Example: 23s |
| Integer | 1000000 | Just the value. Example: 234392 |
| Long | 1000000000 | Value ends with 'l'. Example: 23439223l |
| Float | 100000.33333 | Value ends with 'f'. Example: 234392.23f |
| Double | 100.33 | Value ends with 'd'. Example: 10020.2302d |
| Decimal | 1000.3333 | Value ends with 'c'. Example: 234.923c |
| Boolean | true | 'true' or 'false'. Example: true |
| Date | 1436983328000 | Value in milliseconds ends with 'a'. Example: 1436983328000a |
| Datetime | 1436983328000 | Value in milliseconds ends with 't'. Example: 1436983328000t |
| Binary | Base64-encoded binary, like: "A3ERjRFdc0023Kc" | Bytes surrounded with _ characters. Example: _2332322_ |
| Link | #10:3 | Just the RID. Example: #10:232 |
| Link list | [#10:3, #10:4] | Collection values separated by commas and surrounded by brackets "[ ]". Example: [#10:3, #10:6] |
| Link set | Example: [#10:3, #10:6] | Example: <#10:3, #10:4> |
| Link map | Example: { "name" : "#10:3" } | Map entries separated by commas and surrounded by curly braces "{ }". Example: {"Jay":#10:3,"Mike":#10:6} |
| Embedded | {"Jay":"#10:3","Mike":"#10:6"} | Embedded document serialized surrounded by parentheses "( )". Example: ({"Jay":#10:3,"Mike":#10:6}) |
| Embedded list | Example: [20, 30] | Collection of values separated by commas and surrounded by brackets "[ ]". Example: [20, 30] |
| Embedded set | ['is', 'a', 'test'] | Collection of values separated by commas and surrounded by angle brackets "< >". Example: <20, 30> |
| Embedded map | { "name" : "Luca" } | Map of values separated by commas and surrounded by curly braces "{ }". Example: {"key1":23,"key2":2332} |
| Custom | Base64-encoded binary, like: "A3ERjRFdc0023Kc" | - |
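Reading the scalar rows of this table back can be sketched in Python as follows. This is a hedged illustration, not OrientDB's parser: the function parse_scalar and the type names it returns are hypothetical, dates are interpreted as UTC, and collections, maps and embedded documents are not handled.

```python
import base64
import re
from datetime import datetime, timedelta, timezone
from decimal import Decimal

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Numeric suffix -> (type name, Python converter); names are illustrative.
NUMERIC_SUFFIXES = {
    "b": ("byte", int),
    "s": ("short", int),
    "l": ("long", int),
    "f": ("float", float),
    "d": ("double", float),
    "c": ("decimal", Decimal),
}
NUMBER = re.compile(r"^(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)([bslfdcta]?)$")
LINK = re.compile(r"^#(-?\d+):(-?\d+)$")


def parse_scalar(text):
    """Return (type_name, value) for one serialized scalar value."""
    if text == "":
        return ("null", None)
    if text in ("true", "false"):
        return ("boolean", text == "true")
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        # Undo the escaping of backslashes and double quotes.
        return ("string", re.sub(r"\\(.)", r"\1", text[1:-1]))
    if len(text) >= 2 and text.startswith("_") and text.endswith("_"):
        return ("binary", base64.b64decode(text[1:-1]))
    link = LINK.match(text)
    if link:
        return ("link", (int(link.group(1)), int(link.group(2))))
    number = NUMBER.match(text)
    if number:
        digits, suffix = number.groups()
        if suffix == "t":
            return ("datetime", EPOCH + timedelta(milliseconds=int(digits)))
        if suffix == "a":
            return ("date", (EPOCH + timedelta(milliseconds=int(digits))).date())
        if suffix:
            name, convert = NUMERIC_SUFFIXES[suffix]
            return (name, convert(digits))
        return ("integer", int(digits))  # no suffix means integer
    raise ValueError("unrecognized value: " + text)


for sample in ["23b", "234392", "234.923c", "1296279468000t", "#10:232", "true"]:
    print(sample, "->", parse_scalar(sample))
```

Booleans, strings and binary values are tested before numbers because they are identified by their delimiters (or the lack of them), while numbers are identified only by their trailing suffix.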