The encoding block defines how Tangent serializes and compresses records before writing them to a sink (e.g., S3, File).
It controls file format, schema handling (for Avro/Parquet), and data compression.

Schema

encoding.type
string
required
Output format for serialized records.
Defaults to ndjson.
Options
  • ndjson — newline-delimited JSON; one record per line
  • json — standard JSON array of records
  • avro — Apache Avro binary encoding (requires schema)
  • parquet — Apache Parquet columnar format (requires schema)
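To make the difference between the two JSON formats concrete, a stdlib-only Python sketch (illustrative, not Tangent code):

```python
import json

records = [{"id": 1, "event": "login"}, {"id": 2, "event": "logout"}]

# ndjson: one JSON object per line; each line is independently parseable
ndjson = "\n".join(json.dumps(r) for r in records) + "\n"

# json: a single JSON array holding every record
json_array = json.dumps(records)
```

Because every ndjson line stands alone, sinks can append and consumers can stream line by line; the json format requires parsing the whole array before any record is available.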
encoding.schema
string
Path to a schema file; required by the avro and parquet formats.
Can point to a local .avsc (Avro) or .json (Arrow/Parquet) schema.
compression
string | object
Configures compression for the encoded data.
Defaults to zstd with level 3.
Options
  • none — no compression
  • gzip — standard Gzip (.gz extension, level 6 default)
  • zstd — Zstandard (.zst extension, level 3 default)
  • snappy — Snappy block compression (Avro-only)
  • deflate — Deflate stream compression (Avro-only)
Examples
compression:
  type: gzip
  level: 6

compression:
  type: zstd
  level: 3
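As a rough illustration of what the gzip option does to NDJSON output, a stdlib-only sketch (not Tangent's actual implementation):

```python
import gzip
import json

# 1,000 small JSON records, newline-delimited, as a compressible payload
data = ("\n".join(json.dumps({"id": i, "event": "ping"}) for i in range(1000)) + "\n").encode()

compressed = gzip.compress(data, compresslevel=6)  # level 6 matches the gzip default above
restored = gzip.decompress(compressed)
```

Repetitive record shapes like this compress well; the level trades CPU time for output size, which is why gzip and zstd expose it as a tunable.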
Notes
  • For Avro and Parquet, compression applies to data blocks inside the file, not the file as a whole.
  • Tangent automatically appends compression extensions when applicable (e.g. .gz, .zst).
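The extension-appending behavior in the second note can be pictured as a simple mapping (a hypothetical sketch; the names below are illustrative, not Tangent's API):

```python
# illustrative mapping from compression type to the appended file extension
EXTENSIONS = {"none": "", "gzip": ".gz", "zstd": ".zst"}

def output_name(base: str, compression: str) -> str:
    """Append the compression extension, mirroring the note above."""
    return base + EXTENSIONS.get(compression, "")
```

For example, `output_name("events.ndjson", "zstd")` yields `events.ndjson.zst`.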

Examples

NDJSON (default)

tangent.yaml
encoding:
  type: ndjson
  compression:
    type: zstd
    level: 3

Parquet with Arrow schema

tangent.yaml
encoding:
  type: parquet
  schema: ./schemas/arrow_schema.json
  compression:
    type: gzip

Avro with explicit schema

tangent.yaml
encoding:
  type: avro
  schema: ./schemas/events.avsc
  compression:
    type: deflate
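The .avsc file referenced above uses the standard Apache Avro schema format; a minimal record schema of the kind ./schemas/events.avsc might contain (the record and field names are illustrative):

```python
import json

# a minimal Avro record schema; names are illustrative, not Tangent-specific
events_avsc = json.loads("""
{
  "type": "record",
  "name": "Event",
  "namespace": "example.events",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "event", "type": "string"}
  ]
}
""")
```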

Defaults Summary

Setting            Default               Description
encoding.type      ndjson                Newline-delimited JSON
encoding.schema    —                     Required for Avro/Parquet
compression.type   zstd                  Data compression method
compression.level  3 (zstd) / 6 (gzip)   Compression strength
