版本：1.3.0

Xlang Serialization Format

Cross-language Serialization Specification

Apache Fory™ xlang serialization enables automatic cross-language object serialization with support for shared references, circular references, and polymorphism. Unlike traditional serialization frameworks that require IDL definitions and schema compilation, Fory serializes objects directly without any intermediate steps.

Key characteristics:

Automatic: No IDL definition, no schema compilation, no manual object-to-protocol conversion
Cross-language: Same binary format works across Java, Python, C++, Go, Rust, JavaScript/TypeScript, C#, Swift, Dart, Scala, and Kotlin
Reference-aware: Handles shared references and circular references without duplication or infinite recursion
Polymorphic: Supports object polymorphism with concrete type resolution

This specification defines the Fory xlang binary format. The format is dynamic rather than static, which enables flexibility and ease of use at the cost of additional complexity in the wire format.

Type Systems

Data Types

bool: a boolean value (true or false).
int8: a 8-bit signed integer.
int16: a 16-bit signed integer.
int32: a 32-bit signed integer. Scalar type expressions may use fixed or varint encoding.
int64: a 64-bit signed integer. Scalar type expressions may use fixed, varint/PVL, or tagged encoding.
uint8: an 8-bit unsigned integer.
uint16: a 16-bit unsigned integer.
uint32: a 32-bit unsigned integer. Scalar type expressions may use fixed or varint encoding.
uint64: a 64-bit unsigned integer. Scalar type expressions may use fixed, varint/PVL, or tagged encoding.
float8: an 8-bit floating point number.
float16: a 16-bit floating point number.
bfloat16: a 16-bit brain floating point number.
float32: a 32-bit floating point number.
float64: a 64-bit floating point number including NaN and Infinity.
string: a text string encoded using Latin1/UTF16/UTF-8 encoding.
enum: a data type consisting of a set of named values. Rust enum with non-predefined field values are not supported as an enum.
named_enum: an enum whose value will be serialized as the registered name.
struct: a dynamic(final) type serialized by Fory Struct serializer. i.e. it doesn't have subclasses. Suppose we're deserializing List<SomeClass>, we can save dynamic serializer dispatch since SomeClass is dynamic(final).
compatible_struct: a dynamic(final) type serialized by Fory compatible Struct serializer.
named_struct: a struct whose type mapping will be encoded as a name.
named_compatible_struct: a compatible_struct whose type mapping will be encoded as a name.
ext: a type which will be serialized by a customized serializer.
named_ext: an ext type whose type mapping will be encoded as a name.
list: a sequence of objects.
set: an unordered set of unique elements.
map: a map of key-value pairs. Map keys do not allow binary values, floating-point values, decimal values, or collection-shaped values such as list, map, set, and array.
duration: an absolute length of time, independent of any calendar/timezone, as a count of nanoseconds.
timestamp: a point in time, independent of any calendar/timezone, encoded as seconds (int64) and nanoseconds (uint32) since the epoch at UTC midnight on January 1, 1970.
date: a naive date without timezone, encoded as a signed varint64 count of days since the Unix epoch.
decimal: an exact decimal value encoded as a signed scale and an exact unscaled integer.
binary: an variable-length array of bytes.
array: array<T> is dense one-dimensional bool or numeric data. Current xlang emits the canonical specialized *_ARRAY type IDs for each supported element domain. ARRAY (42) is reserved for a future generic array encoding and is not emitted by the current xlang format. list<T> remains a separate schema kind.
- bool_array: canonical wire tag for array<bool>.
- int8_array: canonical wire tag for array<int8>.
- int16_array: canonical wire tag for array<int16>.
- int32_array: canonical wire tag for array<int32>.
- int64_array: canonical wire tag for array<int64>.
- uint8_array: canonical wire tag for array<uint8>.
- uint16_array: canonical wire tag for array<uint16>.
- uint32_array: canonical wire tag for array<uint32>.
- uint64_array: canonical wire tag for array<uint64>.
- float8_array: reserved canonical wire tag for array<float8>.
- float16_array: canonical wire tag for array<float16>.
- bfloat16_array: canonical wire tag for array<bfloat16>.
- float32_array: canonical wire tag for array<float32>.
- float64_array: canonical wire tag for array<float64>.
union: a tagged union type that can hold one of several alternative types. The active alternative is identified by a non-negative case ID.
typed_union: a union value with registered numeric union type ID.
named_union: a union value with embedded union type name or shared TypeDef.
none: represents an empty/unit value with no data (e.g., for empty union alternatives).

Note:

Unsigned integer types use the same byte sizes as their signed counterparts; the difference is in value interpretation. See Type mapping for language-specific type mappings.

Polymorphisms

For polymorphism, if one non-final class is registered, and only one subclass is registered, then we can take all elements in List/Map have same type, thus reduce per-element type checks.

Collection/Array polymorphism are not fully supported, since some languages such as golang have only one collection type. If users want to get exactly the type he passed, he must pass that type when deserializing or annotate that type to the field of struct.

Type disambiguation

Due to differences between type systems of languages, those types can't be mapped one-to-one between languages. When deserializing, Fory use the target data structure type and the data type in the data jointly to determine how to deserialize and populate the target data structure. For example:

class Foo {
  int[] intArray;
  Object[] objects;
  List<Object> objectList;
}

class Foo2 {
  int[] intArray;
  List<Object> objects;
  List<Object> objectList;
}

intArray has array<int32> schema and uses the int32_array wire tag. Both objects and objectList have list schema. These schema kinds are distinct; implementations must not treat general object arrays as dense numeric arrays.

List and Array Semantics

list<T> and array<T> are different schema kinds.

Use list<T> for ordinary ordered collections whose elements may need collection semantics, nullable element handling, reference handling, or object/string/bytes payloads. A primitive list<T> may still use an optimized homogeneous element segment internally, but the payload is owned by the list protocol and carries list metadata.

Use array<T> for dynamic-length dense one-dimensional bool or numeric data. array<T> elements are always non-null, non-reference-tracked, and fixed-width by the array contract. array<bool> uses one byte per value. Integer arrays use fixed-width little-endian element payloads even when the scalar int32, int64, uint32, or uint64 default encoding is varint/PVL in scalar or list positions.

Valid array<T> element domains are:

bool
int8, int16, int32, int64
uint8, uint16, uint32, uint64
float16, bfloat16, float32, float64

Invalid array schemas include array<fixed int32>, array<optional int32>, array<ref T>, array<string>, array<bytes>, array<map<...>>, and arrays of structs, unions, enums, temporal values, decimals, or dynamic any values.

The current wire format keeps specialized primitive-array type IDs as the canonical dynamic tags for array<T>:

Schema	Dynamic wire tag
`array<bool>`	`BOOL_ARRAY`
`array<int8>`	`INT8_ARRAY`
`array<int16>`	`INT16_ARRAY`
`array<int32>`	`INT32_ARRAY`
`array<int64>`	`INT64_ARRAY`
`array<uint8>`	`UINT8_ARRAY`
`array<uint16>`	`UINT16_ARRAY`
`array<uint32>`	`UINT32_ARRAY`
`array<uint64>`	`UINT64_ARRAY`
`array<float16>`	`FLOAT16_ARRAY`
`array<bfloat16>`	`BFLOAT16_ARRAY`
`array<float32>`	`FLOAT32_ARRAY`
`array<float64>`	`FLOAT64_ARRAY`

ARRAY (42) is reserved for a future generic or shaped-array descriptor and is not emitted for dense primitive arrays.

In schema-compatible mode only, a matched struct/class field may read between direct top-level list<T> and direct top-level array<T> schemas when T belongs to the valid dense array element domains above. Integer list element encodings in the same signedness and width domain match the corresponding dense array element domain. This is a read adaptation, not a schema-kind merge: writers keep emitting their local canonical list<T> or array<T> payload, and TypeDef/ClassDef encodings, fingerprints, dynamic root serialization, same-schema mode, and unknown-field skipping continue to treat list<T> and array<T> as distinct kinds.

The adaptation is limited to the immediate schema of the matched compatible field. It does not apply when list<T> or array<T> appears inside another field type, including collection elements, map keys or values, array elements, union alternatives, or other generic/container positions. A peer list<T?> TypeDef element schema is not immediate schema incompatibility for a local matched array<T> field. Classification must accept the matched field when the element domains match and the only element-schema difference is nullable metadata. The reader must decide from the collection payload: if the payload actually carries a null element, the local array<T> field must raise a compatible-read error. Null list elements must not be coerced to dense-array default values. Reference-tracked list-element framing is separate from nullable element schema. A runtime that cannot materialize ref-tracked list elements into a dense array without generic/reference paths may reject that field during compatible classification; if it accepts the field, reference payloads that cannot be represented as dense array element values must fail during read.

The dense-array error rule applies to dense-array targets. A matched list<T?> field read into a local list<T?> target must keep using list semantics and preserve actual null elements; implementations must not route that payload through a dense primitive-array materialization path that rejects nulls.

In schema-compatible mode only, a matched struct/class field may read between direct top-level binary and direct top-level array<uint8> schemas. This is a byte-sequence adaptation only: it does not merge TypeDef/ClassDef type IDs, schema fingerprints, dynamic root serialization, same-schema mode, or nested collection/map/array/union/generic positions. array<int8> is not part of this adapter.

In schema-compatible mode only, a matched struct/class field may also read between direct top-level scalar schemas when the remote value can be represented by the local scalar schema without changing the logical value. This is a compatible read adaptation only: writers keep emitting their local canonical schema and payload, and TypeDef/ClassDef encodings, fingerprints, dynamic root serialization, same-schema mode, unknown-field skipping, and container element schemas continue to treat the original scalar types as distinct.

The scalar conversion rule applies only to the immediate schema of the matched compatible field. It does not apply to dynamic root values, any, map keys, map values, list elements, set elements, array elements, union alternatives, enum values, time/date/duration values, binary values, structs, ext values, or nested generic/container positions. It also applies only when both the remote and local top-level field schemas have trackingRef = false; if either matched field schema has trackingRef = true, scalar conversion is outside the compatible layout matrix and scalar type changes remain schema/type incompatible. Same scalar type IDs with matching top-level trackingRef and null/optional framing are exact same-schema direct reads, not compatible scalar conversion. Same scalar type IDs with different top-level trackingRef framing are schema/type incompatible because the wire framing differs. Same scalar type IDs with different top-level null/optional framing may still use the nullable/optional composition rule below when both fields have trackingRef = false.

The convertible scalar domains are bool, string, and numeric scalars. Numeric scalars are signed integers (int8, int16, int32, int64), unsigned integers (uint8, uint16, uint32, uint64), floating point (float16, bfloat16, float32, float64), and decimal. Integer encoding variants are the same semantic domain as their base width: fixed, variable, and tagged integer encodings do not create additional conversion domains.

Compatible scalar conversion MUST follow these rules:

string to bool accepts exactly "0", "1", "false", and "true". The match is byte-for-byte ASCII; readers MUST NOT trim whitespace, accept a leading sign, accept other letter case, or use locale-specific text.
bool to string produces canonical lower-case "false" or "true".
numeric to bool accepts only exact numeric zero and exact numeric one. NaN and infinities fail. Negative floating zero is zero. Decimal scale does not affect the zero/one check.
bool to numeric produces exact zero or one in the local numeric domain.
numeric to numeric succeeds only when the local numeric domain represents the same mathematical value. Integer conversions check target range and signedness; integer-to-floating conversions check exact representability in the target floating domain; floating-to-integer conversions require a finite integral value within range; floating-to-floating conversions require exact preservation after converting to the target and back to the source, including the sign of zero. Floating infinities may convert only when the target floating domain preserves the same infinity. NaN is not convertible across different floating type IDs.
decimal is an exact numeric scalar. Integer-to-decimal conversion uses scale 0; decimal-to-integer conversion requires an integral value in range; floating-to-decimal conversion requires a finite value and converts the exact binary floating value to canonical decimal form; decimal-to-floating conversion requires exact representability in the target floating domain. Same-type decimal reads preserve the ordinary decimal payload. Decimal values produced by conversion use the canonical converted decimal form below.
string to numeric accepts only the compatible numeric literal grammar below and then applies the same lossless target-domain checks. "NaN", "Infinity", "-Infinity", and spelling variants fail because numeric strings are finite-only.
numeric to string emits canonical finite numeric text. Integer sources emit decimal text with no leading zeros except "0". Floating sources emit exact plain decimal text that equals the source value and parses back to the same source floating type; it includes a decimal point and at least one fractional digit, preserves negative zero as "-0.0", never uses exponent notation, and fails for NaN and infinities. Decimal sources emit exact plain decimal text with no exponent and no insignificant trailing fractional zeros; decimal zero is "0".

The compatible numeric literal grammar is deliberately stricter than host language parsers:

no leading or trailing whitespace;
no leading plus sign;
ASCII grammar only: signs, digits, decimal points, and exponent markers are the ASCII bytes -, 0 through 9, ., e, and E;
no Unicode decimal digits, underscores, grouping separators, locale-specific digits, hexadecimal, octal, binary, or type suffixes;
integer literal: -?(0|[1-9][0-9]*);
decimal floating literal: -?(0|[1-9][0-9]*)\.[0-9]+([eE]-?(0|[1-9][0-9]*))? or -?(0|[1-9][0-9]*)[eE]-?(0|[1-9][0-9]*).

Readers MUST parse numeric strings with exact decimal, rational, or equivalent checked algorithms. Parsing through a host floating type and then casting is not valid unless the implementation also proves exactness against the original literal.

Canonical converted decimal form is:

zero: unscaled = 0, scale = 0;
non-zero integers: scale = 0 and the integer as unscaled;
finite fractional values: the smallest non-negative scale whose unscaled * 10^-scale equals the value and whose unscaled is not divisible by 10.

Compatible scalar conversion MUST reject a numeric string before arbitrary precision parsing when the raw string length is greater than 320. It MUST also reject a converted decimal before constructing large powers of ten or formatting plain decimal text when its canonical converted form would require an exponent or scale outside [-256, 256], a positive scale greater than 256, an unscaled decimal magnitude with more than 256 significant digits, or a negative scale whose formatted integer digit count would exceed 256. These bounds apply only to values produced by compatible scalar conversion, including string-to-decimal, decimal-to-string, and floating-to-decimal conversion. Same-type decimal reads preserve the ordinary decimal payload. A bounded public decimal carrier may reject smaller values when it cannot represent the value exactly.

Nullable, boxed, optional, and nullable-field composition is supported for matched scalar pairs whose top-level field schemas have trackingRef = false. Readers first consume the remote null/optional framing described by the remote field metadata. If a value is present, the reader converts the unwrapped scalar value and then assigns or wraps it into the local carrier. If the remote value is null or absent, the reader uses the same missing/null compatible-field rule it already applies for that local field; this feature does not introduce a second null policy. Reference-tracked scalar conversion is not supported.

Conversion failures are data errors, not schema misses. A schema pair outside the conversion matrix remains a schema/type compatibility error when building the compatible layout. Once a matched field is accepted as a scalar conversion action, an invalid payload value MUST be reported through the implementation's data-error path with enough context to identify the remote type, local type, and field when that path has the information.

Unknown-field skipping applies only when the remote field has no matching local field identity. If a local field matches by tag ID or name but its schema is outside the exact-read and compatible-adaptation rules, the reader MUST reject the compatible layout instead of treating the field as missing, remote-only, or skippable.

Users can also provide meta hints for fields of a type, or the type whole. Here is an example in java which use annotation to provide such information.

@ForyStruct
class Foo {
  @ArrayType
  @ForyField(id = 0)
  int[] intArray;

  @ForyField(id = 1, dynamic = ForyField.Dynamic.TRUE)
  Object object;

  @Nullable
  @ForyField(id = 2)
  List<Object> objectList;
}

Such information can be provided in other languages too:

cpp: use macro and template.
golang: use struct tag.
python: use typehint.
rust: use macro.

Type ID

All internal data types use an 8-bit internal ID (0~255, with 0~56 defined here). Users can register types by numeric ID (0~0xFFFFFFFE in current implementations). User IDs are encoded separately from the internal type ID; there is no bit shifting/packing.

Named types (NAMED_*) do not embed a user ID; their names are carried in metadata instead.

Internal Type ID Table

Type ID	Name	Description
0	UNKNOWN	Unknown type, used for dynamic typing
1	BOOL	Boolean value
2	INT8	8-bit signed integer
3	INT16	16-bit signed integer
4	INT32	32-bit signed integer
5	VARINT32	Variable-length encoded 32-bit signed integer
6	INT64	64-bit signed integer
7	VARINT64	Variable-length encoded 64-bit signed integer
8	TAGGED_INT64	Hybrid encoded 64-bit signed integer
9	UINT8	8-bit unsigned integer
10	UINT16	16-bit unsigned integer
11	UINT32	32-bit unsigned integer
12	VAR_UINT32	Variable-length encoded 32-bit unsigned integer
13	UINT64	64-bit unsigned integer
14	VAR_UINT64	Variable-length encoded 64-bit unsigned integer
15	TAGGED_UINT64	Hybrid encoded 64-bit unsigned integer
16	FLOAT8	8-bit floating point (float8)
17	FLOAT16	16-bit floating point (half precision)
18	BFLOAT16	16-bit brain floating point
19	FLOAT32	32-bit floating point (single precision)
20	FLOAT64	64-bit floating point (double precision)
21	STRING	UTF-8/UTF-16/Latin1 encoded string
22	LIST	Ordered collection (List, Array, Vector)
23	SET	Unordered collection of unique elements
24	MAP	Key-value mapping
25	ENUM	Enum registered by numeric ID
26	NAMED_ENUM	Enum registered by namespace + type name
27	STRUCT	Struct registered by numeric ID (same-schema)
28	COMPATIBLE_STRUCT	Struct with schema evolution support (by ID)
29	NAMED_STRUCT	Struct registered by namespace + type name
30	NAMED_COMPATIBLE_STRUCT	Struct with schema evolution (by name)
31	EXT	Extension type registered by numeric ID
32	NAMED_EXT	Extension type registered by namespace + type name
33	UNION	Union value, schema identity not embedded
34	TYPED_UNION	Union value with registered numeric type ID
35	NAMED_UNION	Union value with embedded type name/TypeDef
36	NONE	Empty/unit type (no data)
37	DURATION	Time duration (seconds + nanoseconds)
38	TIMESTAMP	Point in time (seconds + nanoseconds since epoch)
39	DATE	Date without timezone (signed varint64 days)
40	DECIMAL	Arbitrary precision decimal (scale + unscaled)
41	BINARY	Raw binary data
42	ARRAY	Reserved for future dedicated multi-dimensional arrays
43	BOOL_ARRAY	1D boolean array
44	INT8_ARRAY	1D int8 array
45	INT16_ARRAY	1D int16 array
46	INT32_ARRAY	1D int32 array
47	INT64_ARRAY	1D int64 array
48	UINT8_ARRAY	1D uint8 array
49	UINT16_ARRAY	1D uint16 array
50	UINT32_ARRAY	1D uint32 array
51	UINT64_ARRAY	1D uint64 array
52	FLOAT8_ARRAY	1D float8 array
53	FLOAT16_ARRAY	1D float16 array
54	BFLOAT16_ARRAY	1D bfloat16 array
55	FLOAT32_ARRAY	1D float32 array
56	FLOAT64_ARRAY	1D float64 array

Type ID Encoding for User Types

When registering user types (struct/ext/enum/union), the internal type ID is written as the 8-bit kind. The user type ID is written separately as an unsigned varint32 (small7); there is no bit shift or packing.

Examples:

User ID	Type	Internal ID	Encoded User ID	Decimal
0	STRUCT	27	0	0
0	ENUM	25	0	0
1	STRUCT	27	1	1
1	COMPATIBLE_STRUCT	28	1	1
2	NAMED_STRUCT	29	2	2

When reading type IDs:

Read internal type ID from the type ID field.
If the internal type is a user-registered kind, read user_type_id as varuint32.

Type mapping

See Type mapping

Spec overview

Here is the overall format:

| fory header | object ref meta | object type meta | object value data |

The data are serialized using little endian byte order for all types.

Fory header

Fory header format for xlang serialization:

|        1 byte bitmap           |
+--------------------------------+
|            flags               |

Detailed byte layout:

Byte 0:   Bitmap flags
          - Bit 0: xlang flag (0x01)
          - Bit 1: oob flag (0x02)
          - Bits 2-7: reserved

xlang flag (bit 0): 1 when serialization uses Fory xlang format, 0 when serialization uses a Fory native-mode format.
oob flag (bit 1): 1 when out-of-band serialization is enabled (BufferCallback is not null), 0 otherwise.
reserved bits (bits 2-7): must be zero.

All data is encoded in little-endian format.

Reference Meta

Reference tracking handles whether the object is null, and whether to track reference for the object by writing corresponding flags and maintaining internal state.

Reference Flags

Flag	Byte Value (int8)	Hex	Description
NULL FLAG	`-3`	`0xFD`	Object is null. No further bytes are written for this object.
REF FLAG	`-2`	`0xFE`	Object was already serialized. Followed by unsigned varint32 reference ID.
NOT_NULL VALUE FLAG	`-1`	`0xFF`	Object is non-null but reference tracking is disabled for this type. Object data follows immediately.
REF VALUE FLAG	`0`	`0x00`	Object is referencable and this is its first occurrence. Object data follows. Assigns next reference ID.

Reference Tracking Algorithm

Writing:

function write_ref_or_null(buffer, obj):
    if obj is null:
        buffer.write_int8(NULL_FLAG)      // -3
        return true  // done, no more data to write

    if reference_tracking_enabled:
        ref_id = lookup_written_objects(obj)
        if ref_id exists:
            buffer.write_int8(REF_FLAG)   // -2
            buffer.write_varuint32(ref_id)
            return true  // done, reference written
        else:
            buffer.write_int8(REF_VALUE_FLAG)  // 0
            add_to_written_objects(obj, next_ref_id++)
            return false  // continue to serialize object data
    else:
        buffer.write_int8(NOT_NULL_VALUE_FLAG)  // -1
        return false  // continue to serialize object data

Reading:

function read_ref_or_null(buffer):
    flag = buffer.read_int8()
    switch flag:
        case NULL_FLAG (-3):
            return (null, true)  // null object, done
        case REF_FLAG (-2):
            ref_id = buffer.read_varuint32()
            obj = get_from_read_objects(ref_id)
            return (obj, true)  // referenced object, done
        case NOT_NULL_VALUE_FLAG (-1):
            return (null, false)  // non-null, continue reading
        case REF_VALUE_FLAG (0):
            reserve_ref_slot()  // will be filled after reading
            return (null, false)  // non-null, continue reading

Reference ID Assignment

Reference IDs are assigned sequentially starting from 0
The ID is assigned when REF_VALUE_FLAG is written (first occurrence)
Objects are stored in a list/map indexed by their reference ID
For reading, a reference slot is reserved before deserializing the object, then filled after

When Reference Tracking is Disabled

When reference tracking is disabled globally or for specific types, only the NULL and NOT_NULL VALUE flags will be used for reference meta. This reduces overhead for types that are known not to have references.

Language-Specific Considerations

Languages with nullable and reference types by default (Java, Python, JavaScript):

In xlang mode, for cross-language compatibility:

All fields are treated as not-null by default
Reference tracking is disabled by default
Users can explicitly mark fields as nullable or enable reference tracking via annotations
Optional types (e.g., java.util.Optional, typing.Optional) are treated as nullable

Annotation examples:

// Java: use @Ref for reference tracking
public class MyClass {
    @Nullable
    @Ref
    private Object refField;

    private String requiredField;
}

# Python: use typing with fory field descriptors
from pyfory import ForyField, Ref

class MyClass:
    ref_field: ForyField(Ref[SomeType], nullable=True)
    required_field: ForyField(str, nullable=False)

Languages with non-nullable types by default:

Language	Null Representation	Reference Tracking Support
Rust	`Option::None`	Via `Rc<T>`, `Arc<T>`, `Weak<T>`
C++	`std::nullopt`, `nullptr`	Via `std::shared_ptr<T>`, `weak_ptr<T>`
Go	`nil` interface/pointer	Via pointer/interface types

Important: For languages like Rust that don't have implicit reference semantics, reference tracking must use explicit smart pointers (Rc, Arc).

Type Meta

Every non-primitive value begins with a type ID that identifies its concrete type. The type ID is followed by optional type-specific metadata.

Type ID encoding

The type ID is written as an unsigned varint32 (small7).
Internal types use their internal type ID directly (low 8 bits).
User-registered types write the internal type ID, then write user_type_id as varuint32.
- user_type_id is a numeric ID (0~0xFFFFFFFE in current implementations).
- internal_type_id is one of ENUM, STRUCT, COMPATIBLE_STRUCT, EXT, or TYPED_UNION.
Named types do not embed a user ID. They use NAMED_* internal type IDs and carry a namespace and type name (or shared TypeDef) instead.

Type meta payload

After the type ID:

ENUM / STRUCT / EXT / TYPED_UNION: no extra bytes beyond the user_type_id (registration by ID required on both sides).
COMPATIBLE_STRUCT:
- If meta share is enabled, write a shared TypeDef entry (see below).
- If meta share is disabled, no extra bytes.
NAMED_ENUM / NAMED_STRUCT / NAMED_COMPATIBLE_STRUCT / NAMED_EXT / NAMED_UNION:
- If meta share is disabled, write namespace and type_name as meta strings.
- If meta share is enabled, write a shared TypeDef entry (see below).
UNION: no extra bytes at this layer.
LIST / SET / MAP / primitives: no extra bytes at this layer.

Pure id-based enum, ext, and typed-union values therefore do not carry a TypeDef body. Receive-side TypeDef resource limits apply only when the stream actually carries shared TypeDef metadata.

ARRAY (42) is reserved for a future xlang extension for dedicated multi-dimensional arrays and is not used in current xlang streams.

Unregistered types are serialized as named types:

Enums -> NAMED_ENUM
Struct-like classes -> NAMED_STRUCT (or NAMED_COMPATIBLE_STRUCT when meta share is enabled)
Custom extension types -> NAMED_EXT
Unions -> NAMED_UNION

The namespace is the package/module name and the type name is the simple class name.

Shared Type Meta (streaming)

When meta share is enabled, TypeDef metadata is written inline the first time a type is encountered, and subsequent occurrences only reference it.

Encoding:

marker = (index << 1) | flag
flag = 0: new type definition follows
flag = 1: reference to a previously written type definition
index is the sequential index assigned to this type (starting from 0).

Write algorithm:

Look up the class in the per-stream meta context map.
If found, write (index << 1) | 1.
If not found:
- assign index = next_id
- write (index << 1)
- write the encoded TypeDef bytes immediately after

Read algorithm:

Read marker as varuint32.
flag = marker & 1, index = marker >>> 1.
If flag == 1, use the cached TypeDef at index.
If flag == 0, read a TypeDef, cache it at index, and use it.

TypeDef bytes include the 8-byte global header and optional size extension.

TypeDef (schema evolution metadata)

TypeDef describes a struct-like type (or a named enum/ext) for schema evolution and name resolution. It is encoded as:

|    8-byte global header   | [optional size varuint] | TypeDef body |

The 8-byte header is a little-endian uint64:

Low 8 bits: meta size (number of bytes in the TypeDef body).
- If meta size >= 0xFF, the low 8 bits are set to 0xFF and an extra varuint32(meta_size - 0xFF) follows immediately after the header.
Bit 8: COMPRESS_META is reserved for a future xlang metadata-compression extension. Current xlang writers MUST leave this bit unset and current xlang readers MUST treat a set bit as unsupported.
Bits 9-11: reserved for future extension (must be zero).
High 52 bits: stored hash bits derived from MurmurHash3 x64_128 seed 47 over TypeDef body || header_low12_le. header_low12_le is two little-endian bytes containing the low 12 header bits (size, compress, and reserved bits); the upper four bits of the second byte are zero. Take lane 0 of the 128-bit MurmurHash3 result as a signed int64, left-shift it by 12 with two's-complement 64-bit wraparound, apply signed absolute value (leaving INT64_MIN unchanged), then mask with 0xfffffffffffff000. The final header is the masked hash bits OR-ed with the low 12 header bits.

TypeDef body

TypeDef body has a single layer (fields are flattened in class hierarchy order):

| meta header (1 byte) | type spec | field info ... |

Meta header byte for struct TypeDefs:

Bit 7: IS_STRUCT (1).
Bit 6: COMPATIBLE.
Bit 5: REGISTER_BY_NAME (1 = namespace + type name, 0 = numeric user type ID).
Bits 0-4: num_fields (0-30).
- If num_fields == 31, read an extra varuint32 and add it.

Meta header byte for non-struct TypeDefs:

Bit 7: IS_STRUCT (0).
Bits 4-6: reserved (must be zero).
Bits 0-3: kind code.

Readers may reject a received TypeDef that exceeds runtime resource limits such as maximum metadata body bytes or maximum fields in one struct TypeDef. These limits are receive-side resource controls and do not change TypeDef wire encoding, type identity, dynamic loading, unknown-type handling, registration policy, or schema-evolution semantics.

Non-struct kind codes:

0: ENUM
1: NAMED_ENUM
2: EXT
3: NAMED_EXT
4: TYPED_UNION
5: NAMED_UNION
6-14: reserved
15: extended-kind escape, rejected until defined

Type spec:

If REGISTER_BY_NAME is set:
- namespace meta string
- type_name meta string
Otherwise:
- user type ID as varuint32

Field info list:

Each field is encoded as:

| field header (1 byte) | field type info | [field name bytes] |

Field header layout:

Bits 6-7: field name encoding (UTF8, ALL_TO_LOWER_SPECIAL, LOWER_UPPER_DIGIT_SPECIAL, or TAG_ID)
Bits 2-5: size
- For name encoding: size = (name_bytes_length - 1)
- For tag ID: size = tag_id
- If size == 0b1111, read varuint32(size - 15) and add it
Bit 1: nullable flag
Bit 0: reference tracking flag

Field type info:

The top-level field type is written as varuint32(type_id) (small7) without flags.
For LIST / SET, an element type follows, encoded as (nested_type_id << 2) | (nullable << 1) | tracking_ref.
For MAP, key type and value type follow, both encoded the same way.
One-dimensional primitive arrays use *_ARRAY type IDs; other arrays are encoded as LIST.

Field names:

If TAG_ID encoding is used, no name bytes are written.
Otherwise, write the encoded field name bytes as a meta string.
For xlang, field names are converted to snake_case before encoding for cross-language compatibility.

Field order:

TypeDef field lists use the same ordering defined in Field order. Compatible decoders must still match fields by name or tag ID rather than relying only on position.

Meta String

Meta string is a compressed encoding for metadata strings such as field names, type names, and namespaces. This compression significantly reduces the size of type metadata in serialized data.

Encoding Type IDs

ID	Name	Bits/Char	Character Set
0	UTF8	8	Any UTF-8 character
1	LOWER_SPECIAL	5	`a-z . _ $ \|`
2	LOWER_UPPER_DIGIT_SPECIAL	6	`a-z A-Z 0-9 . _`
3	FIRST_TO_LOWER_SPECIAL	5	First char uppercase, rest `a-z . _`
4	ALL_TO_LOWER_SPECIAL	5	`a-z A-Z . _` (uppercase escaped)

Character Mapping Tables

LOWER_SPECIAL (5 bits per character)

Character	Code (binary)	Code (decimal)
a-z	00000-11001	0-25
.	11010	26
_	11011	27
$	11100	28
\|	11101	29

Note: The | character is used as an escape sequence in ALL_TO_LOWER_SPECIAL encoding.

LOWER_UPPER_DIGIT_SPECIAL (6 bits per character)

Character	Code (binary)	Code (decimal)
a-z	000000-011001	0-25
A-Z	011010-110011	26-51
0-9	110100-111101	52-61
.	111110	62
_	111111	63

Encoding Algorithms

LOWER_SPECIAL Encoding

For strings containing only a-z, ., _, $, |:

function encode_lower_special(str):
    bits = []
    for char in str:
        bits.append(lookup_lower_special[char])  // 5 bits each

    // Pad to byte boundary
    total_bits = len(str) * 5
    padding_bits = (8 - (total_bits % 8)) % 8

    // First bit indicates if last char should be stripped (due to padding)
    strip_last = (padding_bits >= 5)
    if strip_last:
        prepend bit 1
    else:
        prepend bit 0

    return pack_bits_to_bytes(bits)

FIRST_TO_LOWER_SPECIAL Encoding

For strings like MyFieldName where only the first character is uppercase:

function encode_first_to_lower_special(str):
    // Convert first char to lowercase
    modified = str[0].lower() + str[1:]
    // Then use LOWER_SPECIAL encoding
    return encode_lower_special(modified)

ALL_TO_LOWER_SPECIAL Encoding

For strings with multiple uppercase characters like MyTypeName:

function encode_all_to_lower_special(str):
    result = ""
    for char in str:
        if char.is_upper():
            result += "|" + char.lower()  // Escape uppercase with |
        else:
            result += char
    return encode_lower_special(result)

Example: MyType → |my|type → encoded with LOWER_SPECIAL

Encoding Selection Algorithm

function choose_encoding(str):
    if all chars in str are in [a-z . _ $ |]:
        return LOWER_SPECIAL

    if first char is uppercase AND rest are in [a-z . _]:
        return FIRST_TO_LOWER_SPECIAL

    if all chars are in [a-z A-Z . _]:
        lower_special_size = encode_all_to_lower_special(str).size
        luds_size = encode_lower_upper_digit_special(str).size
        if lower_special_size <= luds_size:
            return ALL_TO_LOWER_SPECIAL
        else:
            return LOWER_UPPER_DIGIT_SPECIAL

    if all chars are in [a-z A-Z 0-9 . _]:
        return LOWER_UPPER_DIGIT_SPECIAL

    return UTF8

Meta String Header Format

Meta strings are written with a header that includes the encoding type:

| 3 bits encoding | 5+ bits length | encoded bytes |

Or for larger strings:

| varuint: (length << 3) | encoding | encoded bytes |

Special Character Sets by Context

Different contexts use different special characters:

Context	Special Chars	Notes
Field Name	. _ $ \|	$ for inner classes, \| for escape
Namespace	. _	Package/module separators
Type Name	$ _	$ for inner classes in Java

Deduplication

Meta strings are deduplicated within a serialization session:

First occurrence:  | (length << 1) | [hash if large] | encoding | bytes |
Reference:         | ((id + 1) << 1) | 1 |

Bit 0 of the header indicates: 0 = new string, 1 = reference to previous
Large strings (> 16 bytes) include 64-bit hash for content-based deduplication
Small strings use exact byte comparison

Value Format

Basic types

bool

size: 1 byte
format: 0 for false, 1 for true

int8

size: 1 byte
format: write as pure byte.

int16

size: 2 byte
byte order: raw bytes of little endian order

unsigned int32

size: 4 byte
byte order: raw bytes of little endian order

unsigned varint32

size: 1~5 bytes
Format: The most significant bit (MSB) in every byte indicates whether to have the next byte. If the continuation bit is set (i.e. b & 0x80 == 0x80), then the next byte should be read until a byte with unset continuation bit.

Encoding Algorithm:

function write_varuint32(value):
    while value >= 0x80:
        buffer.write_byte((value & 0x7F) | 0x80)  // 7 bits of data + continuation bit
        value = value >> 7
    buffer.write_byte(value)  // final byte without continuation bit

Decoding Algorithm:

function read_varuint32():
    result = 0
    shift = 0
    while true:
        byte = buffer.read_byte()
        result = result | ((byte & 0x7F) << shift)
        if (byte & 0x80) == 0:
            break
        shift = shift + 7
    return result

Byte sizes by value range:

Value Range	Bytes
0 ~ 127	1
128 ~ 16383	2
16384 ~ 2097151	3
2097152 ~ 268435455	4
268435456 ~ 4294967295	5

signed int32

size: 4 bytes
byte order: raw bytes of little endian order

signed varint32

size: 1~5 bytes
Format: First convert the number into positive unsigned int using ZigZag encoding, then encode as unsigned varint.

ZigZag Encoding:

// Encode: convert signed to unsigned
zigzag_value = (value << 1) ^ (value >> 31)

// Decode: convert unsigned back to signed
original = (zigzag_value >> 1) ^ (-(zigzag_value & 1))
// Or equivalently:
original = (zigzag_value >> 1) ^ (~(zigzag_value & 1) + 1)

ZigZag encoding maps signed integers to unsigned integers so that small absolute values (positive or negative) have small encoded values:

Original	ZigZag Encoded
0	0
-1	1
1	2
-2	3
2	4
...	...

unsigned int64

size: 8 bytes
byte order: raw bytes of little endian order

unsigned varint64

size: 1~9 bytes

Uses PVL (Progressive Variable-Length) encoding:

function write_varuint64(value):
    while value >= 0x80:
        buffer.write_byte((value & 0x7F) | 0x80)
        value = value >> 7
    buffer.write_byte(value)

Value Range	Bytes
0 ~ 127	1
128 ~ 16383	2
...	...
2^56 ~ 2^63-1	9

unsigned hybrid int64 (TAGGED_UINT64)

size: 4 or 9 bytes

Optimized for unsigned values that fit in 31 bits (common case for IDs, sizes, counts, etc.):

if value in [0, 2147483647]:  // fits in 31 bits (2^31 - 1), full unsigned range
    write 4 bytes: ((int32) value) << 1  // bit 0 is 0, indicating 4-byte encoding
else:
    write 1 byte:  0x01                  // bit 0 is 1, indicating 9-byte encoding
    write 8 bytes: value as little-endian uint64

Reading:

first_int32 = read_int32_le()
if (first_int32 & 1) == 0:
    return (uint64)(first_int32 >> 1)  // 4-byte encoding, unsigned
else:
    return read_uint64_le()            // read remaining 8 bytes

Note: TAGGED_UINT64 uses the full 31 bits for positive values [0, 2^31-1], compared to TAGGED_INT64 which splits the range for signed values [-2^30, 2^30-1].

VarUint36Small

A specialized encoding used for string headers that combines size (up to 36 bits) with encoding flags:

// Write: encodes (size << 2) | encoding_flags
function write_varuint36_small(value):
    if value < 0x80:
        buffer.write_byte(value)
    else:
        // Standard varint encoding for values >= 128
        write_varuint64(value)

This encoding is optimized for the common case where string length fits in 7 bits (strings < 32 characters).

signed int64

size: 8 bytes
byte order: raw bytes of little endian order

signed varint64

size: 1~9 bytes

Uses ZigZag encoding first, then PVL varint:

// Encode
zigzag_value = (value << 1) ^ (value >> 63)
write_varuint64(zigzag_value)

// Decode
zigzag_value = read_varuint64()
value = (zigzag_value >> 1) ^ (-(zigzag_value & 1))

signed hybrid int64 (TAGGED_INT64)

size: 4 or 9 bytes

Optimized for small signed values:

if value in [-1073741824, 1073741823]:  // fits in 30 bits + sign ([-2^30, 2^30-1])
    write 4 bytes: ((int32) value) << 1  // bit 0 is 0, indicating 4-byte encoding
else:
    write 1 byte:  0x01                  // bit 0 is 1, indicating 9-byte encoding
    write 8 bytes: value as little-endian int64

Reading:

first_int32 = read_int32_le()
if (first_int32 & 1) == 0:
    return (int64)(first_int32 >> 1)  // 4-byte encoding, sign-extended
else:
    return read_int64_le()            // read remaining 8 bytes

Note: TAGGED_INT64 uses 30 bits + sign for values [-2^30, 2^30-1], while TAGGED_UINT64 uses full 31 bits for unsigned values [0, 2^31-1].

float8

size: 1 byte
format:
- float8 has 4 kinds: float8 kind enum: float8_e4m3fn, float8_e4m3fnuz, float8_e5m2, float8_e5m2fnuz
- when serialize as field, write raw 8 bits as one byte directly
- when serialize as an object: write type kind as a byte, then write value byte

float16

size: 2 bytes
format: encode the specified floating-point value according to the IEEE 754 standard binary16 format, preserving NaN values, then write as binary by little endian order.

bfloat16

size: 2 bytes
format: encode the specified floating-point value according to the IEEE 754 standard bfloat16 format, preserving NaN values, then write as binary by little endian order.

float32

size: 4 byte
format: encode the specified floating-point value according to the IEEE 754 floating-point "single format" bit layout, preserving Not-a-Number (NaN) values, then write as binary by little endian order.

float64

size: 8 byte
format: encode the specified floating-point value according to the IEEE 754 floating-point "double format" bit layout, preserving Not-a-Number (NaN) values. then write as binary by little endian order.

string

Format:

| varuint36_small: (size << 2) | encoding | binary data |

String Header

The header is encoded using varuint36_small format, which combines the byte length and encoding type:

header = (byte_length << 2) | encoding_type

Encoding Type	Value	Description
LATIN1	0	ISO-8859-1 single-byte encoding
UTF16	1	UTF-16 encoding (2 bytes per code unit)
UTF8	2	UTF-8 variable-length encoding
Reserved	3	Reserved for future use

Encoding Algorithm

Writing:

function write_string(str):
    bytes = encode_to_bytes(str, chosen_encoding)
    header = (bytes.length << 2) | encoding_type
    buffer.write_varuint36_small(header)
    buffer.write_bytes(bytes)

Reading:

function read_string():
    header = buffer.read_varuint36_small()
    encoding = header & 0x03
    byte_length = header >> 2
    bytes = buffer.read_bytes(byte_length)
    return decode_bytes(bytes, encoding)

Encoding Selection by Language

Writing:

Language	Encoding Strategy
Java (JDK8)	Detect at runtime: LATIN1 if all chars < 256, else UTF16
Java (JDK9+)	Use String's internal coder: LATIN1 or UTF16
Python	Can write LATIN1, UTF16, or UTF8 based on string content
C++	UTF8 (`std::string`) or UTF16 (`std::u16string`)
Rust	UTF8 (`String`)
Go	UTF8 (`string`)
JavaScript	UTF8

Reading: All languages support decoding all three encodings (LATIN1, UTF16, UTF8).

Recommendation: Select encoding based on maximum performance - use the encoding that matches the language's native string representation to avoid conversion overhead.

Empty String

Empty strings are encoded with header 0 (length 0, any encoding) followed by no data bytes.

duration

Duration is an absolute length of time, independent of any calendar/timezone, as a count of seconds and nanoseconds.

Format:

| signed varint64: seconds | signed int32: nanoseconds |

seconds: Number of seconds in the duration, encoded as a signed varint64. Can be positive or negative.
nanoseconds: Nanosecond adjustment to the duration, encoded as a signed int32.

Notes:

The duration is stored as two separate fields to maintain precision and avoid overflow issues.
Seconds are encoded using varint64 for compact representation of common duration values.
Nanoseconds are stored as a fixed int32 since the range is limited.

Canonical Rules

Writers MUST normalize durations so nanoseconds is always in [0, 1_000_000_000).
Zero MUST be encoded as seconds = 0 and nanoseconds = 0.
Negative sub-second durations MUST borrow one second and use a positive nanosecond adjustment. Example: -0.5s is encoded as seconds = -1, nanoseconds = 500_000_000.
More generally, the encoded pair MUST satisfy:
- duration = seconds + nanoseconds / 1_000_000_000
- 0 <= nanoseconds < 1_000_000_000

Final Value

After decoding seconds and nanoseconds, the duration value is reconstructed as the exact duration represented by:

seconds + nanoseconds / 1_000_000_000

collection/list

Format:

| varuint32: length | 1 byte elements header | [optional type info] | elements data |

Elements Header

The elements header is a single byte that encodes metadata about the collection elements to optimize serialization:

| bit 7-4 (reserved) |    bit 3    |      bit 2       |   bit 1  |   bit 0   |
+--------------------+-------------+------------------+----------+-----------+
|      reserved      | is_same_type| is_decl_elem_type| has_null | track_ref |

Bit	Name	Value	Meaning when SET (1)	Meaning when UNSET (0)
0	track_ref	0x01	Track references for elements	Don't track element references
1	has_null	0x02	Payload contains null element markers	No null elements (skip null checks)
2	is_decl_elem_type	0x04	Elements are the declared generic type	Element types differ from declared type
3	is_same_type	0x08	All elements have the same concrete type	Elements have different concrete types

Common header values:

Header	Hex	Meaning
0x0C	12	Declared type + same type, non-null, no ref tracking (optimal)
0x0D	13	Declared type + same type, non-null, with ref tracking
0x0E	14	Declared type + same type, may have nulls, no ref tracking
0x08	8	Same type but not declared type (type info written once)
0x00	0	Different types, non-null, no ref tracking (type per element)

Type Info After Header

When is_decl_elem_type (bit 2) is NOT set, the element type info is written once after the header if is_same_type (bit 3) is set:

| header (0x08) | type_id (varuint32) | elements... |

When both is_decl_elem_type and is_same_type are NOT set, type info is written per element.

Element Serialization Based on Header

The header determines how each element is serialized:

elements data

Based on the elements header, the serialization of elements data may skip ref flag/null flag/element type info.

fory = ...
buffer = ...
elems = ...
if element_type_is_same:
    if not is_declared_type:
        fory.write_type(buffer, elem_type)
    elem_serializer = get_serializer(...)
    if track_ref:
        for elem in elems:
            if not ref_resolver.write_ref_or_null(buffer, elem):
                elem_serializer.write(buffer, elem)
    elif has_null:
        for elem in elems:
            if elem is None:
                buffer.write_byte(null_flag)
            else:
                buffer.write_byte(not_null_flag)
                elem_serializer.write(buffer, elem)
    else:
        for elem in elems:
            elem_serializer.write(buffer, elem)
else:
    if track_ref:
        for elem in elems:
            fory.write_ref(buffer, elem)
    elif has_null:
        for elem in elems:
            fory.write_nullable(buffer, elem)
    else:
        for elem in elems:
            fory.write_value(buffer, elem)

CollectionSerializer#writeElements can be taken as an example.

array

primitive array

Primitive array are taken as a binary buffer, serialization will just write the length of array size as an unsigned int, then copy the whole buffer into the stream. Multi-byte element arrays are always encoded in little-endian element order; implementations whose native typed-array storage uses another byte order must swap or write elements explicitly instead of copying native storage bytes unchanged.

Such serialization won't compress the array. If users want to compress primitive array, users need to register custom serializers for such types or mark it as list type.

Float array specifics:

float16/bfloat16 array: write varuint length, then raw bytes in little endian order.
float8 array: write element type kind as a byte, then varuint length, then raw bytes in little endian order.

Multi-dimensional arrays

Current xlang does not define a dedicated multi-dimensional array/tensor encoding. Multi-dimensional arrays are serialized as nested lists, while one-dimensional primitive arrays use the *_ARRAY type IDs. Internal type ID ARRAY (42) is reserved for a future dedicated multi-dimensional array encoding and is not used in current xlang streams.

object array

Object array is serialized using the list format. Object component type will be taken as list element generic type.

map

Map uses a chunk-based format to handle heterogeneous key-value pairs efficiently:

| varuint32: total_size | chunk_1 | chunk_2 | ... | chunk_n |

Map Chunk Format

Each chunk contains up to 255 key-value pairs with the same metadata characteristics:

|    1 byte    |     1 byte     |        variable bytes        |
+--------------+----------------+------------------------------+
|  KV header   |  chunk size N  |  N key-value pairs (N*2 obj) |

KV Header Bits

The KV header is a single byte encoding metadata for both keys and values:

|  bit 7-6   |     bit 5     |     bit 4    |     bit 3     |     bit 2     |     bit 1    |     bit 0     |
+------------+---------------+--------------+---------------+---------------+--------------+---------------+
|  reserved  | val_decl_type | val_has_null | val_track_ref | key_decl_type | key_has_null | key_track_ref |

Bit	Name	Value	Meaning when SET (1)
0	key_track_ref	0x01	Track references for keys
1	key_has_null	0x02	Keys may be null (rare, usually invalid)
2	key_decl_type	0x04	Key is the declared generic type
3	val_track_ref	0x08	Track references for values
4	val_has_null	0x10	Values may be null
5	val_decl_type	0x20	Value is the declared generic type

Common KV header values:

Header	Hex	Meaning
0x24	36	Key + value are declared types, non-null, no ref tracking (optimal)
0x2C	44	Key + value declared types, value tracks refs
0x34	52	Key + value declared types, value may be null
0x00	0	Key + value not declared types, non-null, no ref tracking

Chunk Size

Maximum chunk size: 255 pairs (fits in 1 byte)
When key or value is null, that entry is serialized as a separate chunk with implicit size 1 (chunk size byte is skipped)
Reader tracks accumulated count against total map size to know when to stop reading chunks

Why Chunk-Based Format?

Map iteration is expensive. Computing a single header for all pairs would require two passes. The chunk-based approach allows:

Optimistic prediction: Use first key-value pair to predict header
Adaptive chunking: Start new chunk if prediction fails for a pair
Efficient reading: Most maps fit in single chunk (< 255 pairs)
Memory efficiency: Minimal overhead for common homogeneous maps

Why serialize chunk by chunk?

When fory will use first key-value pair to predict header optimistically, it can't know how many pairs have same meta(tracking kef ref, key has null and so on). If we don't write chunk by chunk with max chunk size, we must write at least X bytes to take up a place for later to update the number which has same elements, X is the num_bytes for encoding varint encoding of map size.

And most map size are smaller than 255, if all pairs have same data, the chunk will be 1. This is common in golang/rust, which object are not reference by default.

Also, if only one or two keys have different meta, we can make it into a different chunk, so that most pairs can share meta.

The implementation can accumulate read count with map size to decide whether to read more chunks.

enum

Enums are serialized as an unsigned varint enum ID.

If the enum definition provides an explicit enum ID / variant ID / stable numeric tag for a value, that ID MUST be used.
If no explicit enum ID is specified, the declaration ordinal is used as the enum ID by default.

This means the wire contract is always an enum ID. When the enum ID comes from declaration order, reordering enum values changes the wire IDs and can change the deserialized result. For cross-language or long-lived schemas, users should prefer explicit stable enum IDs.

timestamp

Timestamp represents a point in time independent of any calendar/timezone. It is encoded as:

seconds (int64): seconds since Unix epoch (1970-01-01T00:00:00Z)
nanos (uint32): nanosecond adjustment within the second

On write, implementations must normalize negative timestamps so that nanos is always in [0, 1_000_000_000). This is a fixed-size 12-byte payload (8 bytes seconds + 4 bytes nanos).

date

Date represents a date without timezone. It is encoded as:

days (varint64): signed count of days since the Unix epoch (1970-01-01)

The value is reconstructed as LocalDate.ofEpochDay(days) or the equivalent calendar-date constructor in the target language implementation.

This varint64 encoding applies to xlang serialization only. Native, language-specific local-date encodings are unchanged.

decimal

A decimal value is encoded as:

scale: signed varint32
unscaledHeader: unsigned varint64
optional payload: present only for large unscaled values

The mathematical value is:

value = unscaled × 10^-scale

Scale

scale is encoded as signed varint32.
scale carries no extra flags or mode bits.

Unscaled Header

unscaledHeader selects the encoding of unscaled:

If (unscaledHeader & 1) == 0, the value uses the small encoding.
If (unscaledHeader & 1) == 1, the value uses the big encoding.

Small Encoding

For small values, unscaled must fit in signed 64-bit range and the zigzag-encoded value must fit in 63 bits.

Encoding:

unscaledHeader = zigzag(unscaled) << 1
no payload is written

Decoding:

unscaled = zigzagDecode(unscaledHeader >>> 1)

Big Encoding

For big values, unscaled is encoded as sign plus magnitude bytes.

Encoding:

sign = 0 if unscaled >= 0, otherwise 1
magnitude = abs(unscaled)
len = byte length of magnitude in canonical minimal little-endian form
meta = (len << 1) | sign
unscaledHeader = (meta << 1) | 1
payload = magnitude as canonical minimal little-endian bytes

Decoding:

meta = unscaledHeader >>> 1
sign = meta & 1
len = meta >>> 1
read len bytes as little-endian unsigned magnitude
unscaled = magnitude if sign == 0, otherwise -magnitude

Canonical Rules

Zero must use the small encoding.
Big encoding must not be used for zero.
In big encoding, payload must be the minimal little-endian representation.
Therefore, for big encoding, len > 0 and payload[len - 1] != 0.

Final Value

After decoding scale and unscaled, the decimal value is reconstructed as:

value = unscaled × 10^-scale

struct

Struct means object of class/pojo/struct/bean/record type. Struct values are serialized by writing fields in Fory order. The type meta before the value is written according to the rules in Type Meta.

Field order

Field order must be deterministic and identical across languages. This section defines the language-neutral ordering algorithm; implementations must follow the rules here rather than any language-specific helper classes.

Step 1: Field identifier

For every field, compute a stable identifier used for ordering:

If a non-negative tag ID is configured (e.g., @ForyField(id=...)), use the tag ID.
Otherwise, use the field name converted to snake_case.

Configured tag IDs must be non-negative. A negative configured tag ID is invalid; languages may use a negative value only as a default or internal sentinel for "no tag ID configured", which falls back to the snake_case field name and is not a tag ID. Tag IDs must be unique within a type; duplicate tag IDs are invalid.

Field identifiers compare as follows:

If both fields have tag IDs, compare the IDs numerically.
If only one field has a tag ID, the tagged field sorts first.
If neither field has a tag ID, compare the snake_case names lexicographically.
If fields still compare equal, use deterministic language-local tie-breakers such as declaring class name, original field name, or original field index.

Step 2: Group assignment

Assign each field to exactly one group in the following order:

Primitive (non-nullable): primitive or boxed numeric/boolean types without nullable metadata.
Primitive (nullable): primitive or boxed numeric/boolean types with nullable metadata.
Non-primitive: every other field, including strings, time/date/duration/decimal/binary values, unions, primitive arrays, collections, maps, enums, structs, ext/user-defined types, UNKNOWN fields, object arrays, and all other non-primitive schemas.

Step 3: Intra-group ordering

Within each group, apply the following sort keys in order until a difference is found:

Primitive groups (1 and 2):

Compression category: fixed-size numeric and boolean types first, then compressed numeric types (VARINT32, VAR_UINT32, VARINT64, VAR_UINT64, TAGGED_INT64, TAGGED_UINT64).
Primitive size (descending): 8-byte > 4-byte > 2-byte > 1-byte.
Internal type ID (ascending) as a tie-breaker for equal sizes.
Field identifier using the comparator from Step 1.

Non-primitive group (3):

Field identifier using the comparator from Step 1.

If two fields still compare equal after the rules above, preserve a deterministic order by comparing declaring class name and then the original field name. This tie-breaker should be reachable only in invalid schemas (e.g., duplicate tag IDs).

Notes

The ordering above is used for serialization order and TypeDef field lists. Schema hashes use the field identifier ordering described in the schema hash section.
Non-primitive type IDs and codec categories must not affect field order. Implementations may keep internal categories to preserve optimized serializers and generated code paths, but the categories are not ordering keys.
The compressed numeric rule is critical for cross-language consistency: compressed integer fields are always placed after all fixed-width integer fields.

Object value layout:

| [optional 4-byte schema hash] | field values |

The schema hash is written only when class-version checking is enabled. It is the low 32 bits of a MurmurHash3 x64_128 of the struct fingerprint string:

For each field, build <field_id_or_name>,<field_type_fingerprint>;.
Field identifier is the tag ID if present, otherwise the snake_case field name.
Sort by the field identifier comparator from Field order before concatenation.
field_type_fingerprint is recursive:
- Leaf: <type_id>,<ref>,<nullable>
- LIST / SET: <type_id>,<ref>,<nullable>[<element_fingerprint>]
- MAP: <type_id>,<ref>,<nullable>[<key_fingerprint>|<value_fingerprint>]
Nested container element/key/value fingerprints include nested type ID, container shape, and effective integer encoding, but nested nullable and ref policy are always hashed as 0. Only the root field nullable and ref bits participate in schema hash, because nested reads honor the wire null/ref flags directly.
This schema-hash rule is only for same-schema mode without TypeDef metadata. It does not permit compatible-mode matched-field classification to accept nested nullability or reference-tracking mismatches.

Field values are serialized in Fory order. Primitive fields are written as raw values (nullable primitives include a null flag). Non-primitive fields write ref/null flags as needed and then the value; polymorphic fields include type meta.

The field value layout is the same as same-schema mode, but the type meta for COMPATIBLE_STRUCT and NAMED_COMPATIBLE_STRUCT uses shared TypeDef entries. Deserializers use TypeDef to map fields by name or tag ID and to honor nullable/ref flags from metadata; unknown fields are skipped.

Union

Union values are encoded using three union type IDs so the union schema identity lives in type meta (like STRUCT/ENUM/EXT) and is easy to carry inside Any.

IDL syntax

union Contact [id=0] {
  string email = 0;
  int32  phone = 1;
}

Rules:

A union schema MUST declare at least one schema-defined alternative. The unknown-case carrier used by some language bindings is implementation-provided and is omitted from the schema's alternative table.
Each union alternative MUST have a stable non-negative tag number (= 0, = 1, ...).
Tag numbers MUST be unique within the union and MUST NOT be reused.
Unknown-case carriers exposed by language bindings have no local schema tag of their own; they replay the original peer schema tag when reserialized.

Type IDs and type meta

Type ID	Name	Meaning
33	UNION	Union value, schema identity not embedded
34	TYPED_UNION	Union value with registered numeric type ID
35	NAMED_UNION	Union value with embedded type name / shared TypeDef

Type meta encoding:

UNION (33): no additional type meta payload.
TYPED_UNION (34): write user_type_id as varuint32 after the type ID.
NAMED_UNION (35): followed by named type meta (namespace + type name, or shared TypeDef marker/body).

Field TypeDef metadata MUST use UNION for statically typed union fields, including generated typed ADT union fields. It MUST NOT use TYPED_UNION or NAMED_UNION there, because the field owner already supplies the union schema.

Union value payload

A union payload is:

| case_id (varuint32) | case_value (Any-style value) |

case_id is the union alternative tag number. Runtime APIs MAY expose zero-based ordinal indexes for generic union carriers; those ordinals are valid wire case_id values when they are the schema's alternative IDs.

case_value MUST be encoded as a full xlang value:

| field_ref_meta | field_value_type_meta | field_value_bytes |

This is required even for primitives so unknown alternatives can be skipped safely.

If a reader sees a case_id that is not present in its local union schema, it SHOULD preserve the unknown case when the target language has a language-neutral carrier for it. Such a carrier MUST expose the original case ID and decoded value, and it MUST retain only implementation-internal wire type ID state needed for reserialization. It MUST NOT store resolver-owned type metadata or other context-owned state. Writers MUST use the stored original case ID for the union envelope, not any generated carrier marker. Unknown-case payload writers MUST emit the Any-style payload body in wire order: ref metadata first, then full value type metadata, then value bytes. For internal numeric type IDs, the type ID byte is the complete value type metadata and the payload writer MAY use the stored wire type ID to preserve fixed, variable, or tagged integer encodings when the decoded value has the expected concrete value type. These scalar numeric payloads are not reference-tracked, so their ref metadata is NotNullValue. Otherwise it MUST fall back to the language implementation's ordinary polymorphic Any-value writer. Unknown carriers are implementation-provided forward-compatibility containers, not entries in the local schema case table; schema-defined union cases MAY use 0..N. When an unknown carrier is written back, the union envelope MUST use the carrier's original peer schema case ID unchanged, including 0 if that was the original peer schema case ID.

Wire layouts

UNION (schema known from context)

| ... outer ref meta ... | type_id=UNION(33) | case_id | case_value |

TYPED_UNION (schema identified by numeric id)

| ... outer ref meta ... | type_id=TYPED_UNION(34) | user_type_id | case_id | case_value |

user_type_id: varuint32 numeric registration ID for the union schema.

NAMED_UNION (schema embedded by name/typedef)

| ... outer ref meta ... | type_id=NAMED_UNION(35) | name_or_typedef | case_id | case_value |

Decoding rules

Read outer ref meta and type_id.
If TYPED_UNION, read user_type_id and resolve the union schema by ID.
If NAMED_UNION, read named type meta and resolve the union schema.
Read case_id.
Read case_value as Any-style value (ref meta + type meta + value).

If case_id is unknown, the decoder MUST still consume the case value using field_value_type_meta and standard skipValue(type_id).

When to use each type ID

Use UNION when the union schema is known from context. This includes statically typed generated union fields: the owning field metadata supplies the union schema, so the field type ID remains UNION even if the root or dynamic value form would identify the union as TYPED_UNION or NAMED_UNION.
Use TYPED_UNION for dynamic containers when numeric registration is available.
Use NAMED_UNION when name-based resolution is preferred or required.

Compatibility notes

case_id is a stable identifier; added alternatives are forward compatible and unknown cases can be skipped.

Type

Type will be serialized using type meta format.

Common Pitfalls

Byte Order: Always use little-endian for multi-byte values
Varint Sign Extension: Ensure proper handling of signed vs unsigned varints
Reference ID Ordering: IDs must be assigned in serialization order
Field Order Consistency: Must match exactly across languages in same-schema mode; in compatible mode, match by TypeDef field names or tag IDs
String Encoding: Use best encoding for current language
Null Handling: Different languages represent null differently
Empty Collections: Still write length (0) and header byte
Schema Hash Calculation: Must use the same fingerprint and MurmurHash3 algorithm across languages when enabled

Language Implementation Guidelines

See Xlang Implementation Guide documentation.

Cross-language Serialization Specification​

Type Systems​

Data Types​

Polymorphisms​

Type disambiguation​

List and Array Semantics​

Type ID​

Internal Type ID Table​

Type ID Encoding for User Types​

Type mapping​

Spec overview​

Fory header​

Reference Meta​

Reference Flags​

Reference Tracking Algorithm​

Reference ID Assignment​

When Reference Tracking is Disabled​

Language-Specific Considerations​

Type Meta​

Type ID encoding​

Type meta payload​

Shared Type Meta (streaming)​

TypeDef (schema evolution metadata)​

Global header​

TypeDef body​

Meta String​

Encoding Type IDs​

Character Mapping Tables​

LOWER_SPECIAL (5 bits per character)​

LOWER_UPPER_DIGIT_SPECIAL (6 bits per character)​

Encoding Algorithms​

LOWER_SPECIAL Encoding​

FIRST_TO_LOWER_SPECIAL Encoding​

ALL_TO_LOWER_SPECIAL Encoding​

Encoding Selection Algorithm​

Meta String Header Format​

Special Character Sets by Context​

Deduplication​

Value Format​

Basic types​

bool​

int8​

int16​

unsigned int32​

unsigned varint32​

signed int32​

signed varint32​

unsigned int64​

unsigned varint64​

unsigned hybrid int64 (TAGGED_UINT64)​

VarUint36Small​

signed int64​

signed varint64​

signed hybrid int64 (TAGGED_INT64)​

float8​

float16​

bfloat16​

float32​

float64​

string​

String Header​

Encoding Algorithm​

Encoding Selection by Language​

Empty String​

duration​

Canonical Rules​

Final Value​

collection/list​

Elements Header​

Type Info After Header​

Element Serialization Based on Header​

elements data​

array​

primitive array​

Multi-dimensional arrays​

object array​

map​

Map Chunk Format​

KV Header Bits​

Chunk Size​

Cross-language Serialization Specification

Type Systems

Data Types

Polymorphisms

Type disambiguation

List and Array Semantics

Type ID

Internal Type ID Table

Type ID Encoding for User Types

Type mapping

Spec overview

Fory header

Reference Meta

Reference Flags

Reference Tracking Algorithm

Reference ID Assignment

When Reference Tracking is Disabled

Language-Specific Considerations

Type Meta

Type ID encoding

Type meta payload

Shared Type Meta (streaming)

TypeDef (schema evolution metadata)

Global header

TypeDef body

Meta String

Encoding Type IDs

Character Mapping Tables

LOWER_SPECIAL (5 bits per character)

LOWER_UPPER_DIGIT_SPECIAL (6 bits per character)

Encoding Algorithms

LOWER_SPECIAL Encoding

FIRST_TO_LOWER_SPECIAL Encoding

ALL_TO_LOWER_SPECIAL Encoding

Encoding Selection Algorithm

Meta String Header Format

Special Character Sets by Context

Deduplication

Value Format

Basic types

bool

int8

int16

unsigned int32

unsigned varint32

signed int32

signed varint32

unsigned int64

unsigned varint64

unsigned hybrid int64 (TAGGED_UINT64)

VarUint36Small

signed int64

signed varint64

signed hybrid int64 (TAGGED_INT64)

float8

float16

bfloat16

float32

float64

string

String Header

Encoding Algorithm

Encoding Selection by Language

Empty String

duration

Canonical Rules

Final Value

collection/list

Elements Header

Type Info After Header

Element Serialization Based on Header

elements data

array

primitive array

Multi-dimensional arrays

object array

map

Map Chunk Format

KV Header Bits

Chunk Size