The `encoding` block defines how Tangent serializes and compresses records before writing them to a sink (e.g., S3, File). It controls the file format, schema handling (for Avro/Parquet), and data compression.
## `encoding.type`

Output format for serialized records. Defaults to `ndjson`.

Options:

- `ndjson` — newline-delimited JSON; one record per line
- `json` — standard JSON array of records
- `avro` — Apache Avro binary encoding (requires a schema)
- `parquet` — Apache Parquet columnar format (requires a schema)
## `encoding.schema`

Path to a schema file, required by the `avro` and `parquet` formats. Can point to a local `.avsc` (Avro) or `.json` (Arrow/Parquet) schema.

## `encoding.compression`

Configures compression for the encoded data. Defaults to `zstd` with level 3.

Options:

- `none` — no compression
- `gzip` — standard Gzip (`.gz` extension, level 6 default)
- `zstd` — Zstandard (`.zst` extension, level 3 default)
- `snappy` — Snappy block compression (Avro-only)
- `deflate` — Deflate stream compression (Avro-only)
Notes:

- For Avro and Parquet, compression applies to data blocks inside the file, not the file as a whole.
- Tangent automatically appends compression extensions when applicable (e.g. `.gz`, `.zst`).
## Examples

### NDJSON (default)
tangent.yaml
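A minimal sketch of the `encoding` block for NDJSON output. Only the block itself is shown; the enclosing sink configuration (S3, File, etc.) is elided, and field names follow the defaults table below.

```yaml
encoding:
  type: ndjson        # one JSON record per line
  compression:
    type: zstd        # default compression; output gets a .zst extension
    level: 3          # default level for zstd
```

Since `ndjson` and `zstd` at level 3 are the defaults, omitting these fields entirely yields the same behavior.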
### Parquet with Arrow schema
tangent.yaml
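A sketch of Parquet output with an Arrow schema; the schema path is hypothetical, and the enclosing sink configuration is elided.

```yaml
encoding:
  type: parquet
  schema: schemas/events.json   # hypothetical path to a local Arrow schema
  compression:
    type: zstd                  # applied to data blocks inside the file
    level: 3
```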
### Avro with explicit schema
tangent.yaml
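A sketch of Avro output with an explicit schema; the schema path is hypothetical, and the enclosing sink configuration is elided.

```yaml
encoding:
  type: avro
  schema: schemas/events.avsc   # hypothetical path to a local .avsc schema
  compression:
    type: snappy                # Avro-only block compression
```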
## Defaults Summary
| Setting | Default | Description |
|---|---|---|
| `encoding.type` | `ndjson` | Newline-delimited JSON |
| `encoding.schema` | — | Required for Avro/Parquet |
| `compression.type` | `zstd` | Data compression method |
| `compression.level` | 3 (zstd) / 6 (gzip) | Compression strength |