This doc explains, in concrete terms, how Tangent’s runtime achieves near‑native throughput while running user plugins in WebAssembly. The audience is experienced, performance‑sensitive engineers who want to see the real mechanics: data movement, copying behavior, SIMD, borrowing across the host/guest boundary, batching/scheduling, and fan‑out.

tl;dr
Data is not copied into Wasm guest memory. Guests receive handles to host‑owned views and ask the host for just the scalars they need. Scalars (except strings) avoid heap allocations.
Plugins subscribe to logs with JSON field selectors. This subscription is handled on the host.
Output data is copied only once when emitting NDJSON, and twice for other formats.
In practice, end‑to‑end throughput is dominated by user logic, not Wasm overhead. When user logic is written with these constraints in mind, we routinely see performance in the same class as native services, and sometimes faster.
Incoming frames are accumulated as BytesMut buffers and parsed with simd-json using borrowed parsing. The parser returns a BorrowedValue that references the original byte buffer; we do not materialize a separate heap-allocated tree of owned JSON objects.
We pin the original bytes and the parsed view together, so the borrowed tree remains valid without copying. Concretely, the host wraps the raw bytes and the BorrowedValue in a small Arc so lifetimes are tied and access is safe from the guest side via a resource handle.
Key effect: parsing is SIMD‑accelerated and allocation‑light, and the parsed structure is a zero‑copy view into the original data.
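A minimal sketch of that step, assuming simd-json's to_borrowed_value entry point (the real pipeline adds batching and error handling around it):

```rust
use simd_json::BorrowedValue;

/// Parse one record in place. The returned value's keys and strings point
/// directly into `buf`; no owned JSON tree is allocated. simd-json parses
/// in place, which is why the buffer must be mutable.
fn parse_record(buf: &mut [u8]) -> Result<BorrowedValue<'_>, simd_json::Error> {
    simd_json::to_borrowed_value(buf)
}
```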
The guest does not receive the JSON bytes. Instead, the guest receives resource handles (via the Wasm Component Model) to host‑managed LogViews.
All field access (has/get/keys/…) is implemented by host functions. The guest calls into the host to read scalars by path. Only small, immediate values cross the boundary.
Strings do involve allocation when crossing the boundary (they are returned as owned strings to the guest by design), but this is far cheaper than copying entire JSON documents. Numeric and boolean scalars cross as immediate values.
This pattern avoids the classic Wasm bottleneck of copying large payloads into guest memory.
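For illustration, the host-side accessors behind those calls could look roughly like this. The function names are hypothetical, and the lookup is shown directly against a BorrowedValue via simd-json's prelude traits, whereas the real host first resolves a Component Model resource handle:

```rust
use simd_json::prelude::*;
use simd_json::BorrowedValue;

/// Numbers and booleans are returned as immediates; nothing is allocated.
fn get_u64(view: &BorrowedValue<'_>, key: &str) -> Option<u64> {
    view.get(key).and_then(|v| v.as_u64())
}

/// Strings are the one case that allocates: the borrowed slice is copied
/// into an owned String at the boundary.
fn get_string(view: &BorrowedValue<'_>, key: &str) -> Option<String> {
    view.get(key).and_then(|v| v.as_str()).map(str::to_owned)
}
```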
Workers batch frames by size and age. On flush, each record is parsed once and routed to the set of mappers whose selectors match. This pushes filtering into the host side, further reducing guest work and cross‑boundary calls.
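A sketch of the size/age flush policy (struct and field names are illustrative, not Tangent's actual types):

```rust
use std::time::{Duration, Instant};
use bytes::BytesMut;

/// Accumulates frames until either a byte budget or a maximum age is hit.
struct Batch {
    frames: Vec<BytesMut>,
    total_bytes: usize,
    opened_at: Instant,
}

impl Batch {
    fn should_flush(&self, max_bytes: usize, max_age: Duration) -> bool {
        self.total_bytes >= max_bytes || self.opened_at.elapsed() >= max_age
    }
}
```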
Mapper plugins return output frames as contiguous byte vectors. We convert these to Bytes/BytesMut for downstream routing. This is a single materialization per output frame, by design.
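That materialization is an ownership transfer rather than a copy, as in this sketch: converting a Vec<u8> into Bytes reuses the existing allocation.

```rust
use bytes::Bytes;

/// The mapper's contiguous output frame becomes a ref-counted Bytes without
/// an additional copy: Bytes::from(Vec<u8>) takes over the allocation.
fn into_frame(output: Vec<u8>) -> Bytes {
    Bytes::from(output)
}
```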
Parse JSON directly against the mutable input buffer to build a BorrowedValue tree that points at the input’s bytes.
Freeze the buffer into an immutable, ref‑counted Bytes and store it alongside the borrowed tree inside an Arc.
Expose that pair to the guest as a resource handle. Host methods use the handle to look up fields and return scalars.
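Put together, the pairing looks roughly like the sketch below (field names are illustrative). The 'static lifetime is an erasure for illustration only; expressing this pairing in safe Rust needs a self-referential helper (for example the ouroboros or yoke crates) or a small, carefully audited unsafe block.

```rust
use std::sync::Arc;
use bytes::Bytes;
use simd_json::BorrowedValue;

/// Host-owned view exposed to guests as a Component Model resource handle.
struct LogView {
    /// Frozen, ref-counted input bytes; kept alive as long as any handle exists.
    raw: Bytes,
    /// Zero-copy parse whose keys and strings point into `raw`.
    /// The 'static here stands in for "lives exactly as long as `raw`".
    parsed: BorrowedValue<'static>,
}

/// Guests only ever see an opaque handle; the host resolves it to this Arc.
type SharedLogView = Arc<LogView>;
```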
Why it’s safe:
The input bytes are reference‑counted and pinned for at least as long as the guest holds the handle.
The guest cannot mutate the underlying bytes; it only holds an opaque handle and can request reads via host functions.
Resource lifetimes are explicit; when the guest drops the handle, the host decrements and eventually frees the underlying view/buffer.
Practical implication: we get zero‑copy access to JSON fields, and the cost of crossing the boundary is proportional to the number of scalars actually read, not the size of the input.
For each record we evaluate compiled selectors against the borrowed view (has/eq/prefix/in/gt/regex). Only matching records are included for a given mapper.
This keeps guest work focused and reduces the number of handles passed into guest code.
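A sketch of what compiled selector predicates might look like when evaluated against the borrowed view (the enum shape is illustrative, and integer/float coercion is glossed over):

```rust
use regex::Regex;
use simd_json::prelude::*;
use simd_json::BorrowedValue;

/// One compiled predicate per selector clause (illustrative shape).
enum Pred {
    Has(String),
    Eq(String, String),
    Prefix(String, String),
    In(String, Vec<String>),
    Gt(String, f64),
    Matches(String, Regex),
}

impl Pred {
    fn eval(&self, rec: &BorrowedValue<'_>) -> bool {
        match self {
            Pred::Has(k) => rec.get(k.as_str()).is_some(),
            Pred::Eq(k, want) => {
                rec.get(k.as_str()).and_then(|v| v.as_str()) == Some(want.as_str())
            }
            Pred::Prefix(k, p) => rec
                .get(k.as_str())
                .and_then(|v| v.as_str())
                .map_or(false, |s| s.starts_with(p)),
            Pred::In(k, set) => rec
                .get(k.as_str())
                .and_then(|v| v.as_str())
                .map_or(false, |s| set.iter().any(|item| item == s)),
            // Numeric coercion is simplified in this sketch.
            Pred::Gt(k, n) => rec
                .get(k.as_str())
                .and_then(|v| v.as_f64())
                .map_or(false, |x| x > *n),
            // Regexes are compiled once, up front; matching still costs CPU.
            Pred::Matches(k, re) => rec
                .get(k.as_str())
                .and_then(|v| v.as_str())
                .map_or(false, |s| re.is_match(s)),
        }
    }
}
```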
The router maps a NodeRef (plugin) to one or more downstream nodes (plugins or sinks).
If there is a single downstream, the frame is forwarded directly.
If there are multiple downstreams, frames are duplicated per downstream delivery. Today, duplication happens at the frame boundary; this is a conscious trade‑off for simplicity and isolation of downstream stages.
Note on duplication: where possible, we use reference‑counted buffers to avoid deep copies. For some buffer types and paths, duplication may materialize new buffers. We keep fan‑out widths reasonable and recommend designing topologies to minimize unnecessary wide duplication of large frames.
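For the ref-counted paths, duplication is just a handle clone, as in this sketch: Bytes::clone bumps a reference count and shares the payload rather than deep-copying it.

```rust
use bytes::Bytes;

/// Produce one delivery per downstream. Each clone shares the same payload;
/// only the reference count changes, so fan-out of Bytes frames does not
/// multiply the bytes themselves.
fn fan_out(frame: Bytes, downstreams: usize) -> Vec<Bytes> {
    (0..downstreams).map(|_| frame.clone()).collect()
}
```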
Upstream acks are reference‑counted: each downstream delivery counts as one. When all deliveries complete, the shared ack triggers and the source is acknowledged exactly once.
This provides backpressure and prevents unbounded buffering.
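A sketch of the shared-ack mechanism, assuming the ack ultimately fires a callback (SharedAck and new_shared_ack are illustrative names): every delivery holds one clone of an Arc, and the callback fires exactly once when the last clone drops.

```rust
use std::sync::Arc;

/// Fires its callback exactly once, when the last Arc clone is dropped.
struct SharedAck {
    on_done: Option<Box<dyn FnOnce() + Send>>,
}

impl Drop for SharedAck {
    fn drop(&mut self) {
        if let Some(on_done) = self.on_done.take() {
            on_done();
        }
    }
}

/// Each downstream delivery gets one clone; dropping it counts as completion.
fn new_shared_ack(on_done: impl FnOnce() + Send + 'static) -> Arc<SharedAck> {
    Arc::new(SharedAck { on_done: Some(Box::new(on_done)) })
}
```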
Strings cross the boundary as owned values: reading large strings repeatedly into the guest allocates on every read.
Fan‑out duplication: Wide fan‑outs can multiply bytes in memory. Prefer routing trees that avoid unnecessary broadcast of large frames, or convert frames to compact encodings before fan‑out.
Regex costs: Regex predicates are compiled once, but matching is still non‑trivial. Use them judiciously and combine with cheaper predicates (has, prefix, numeric comparisons) to pre‑filter.
Huge objects: Extremely large JSON objects may reduce the effectiveness of borrowing. Consider pre‑normalizing at the source or switching to a more compact input encoding if you control the producer.
If you want to validate performance on your hardware:
Use a realistic dataset (JSON Lines). There’s sample input data provided in plugins tests/.
Disable debug logging and compile in release mode.
Vary batch size and age; observe throughput and tail latencies.
Compare against a native baseline that uses simd-json and equivalent logic. Expect differences to be driven by guest logic and fan‑out patterns rather than boundary overhead.
The boundary overhead in this design is small and mostly constant per scalar accessed; you should see line‑rate scaling until CPU saturates on parsing or guest logic.
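As a starting point for such a baseline, here is a rough sketch that reads JSON Lines from stdin with simd-json borrowed parsing and touches one field. The field name "message" and the printed metric are placeholders; adapt them to your dataset and guest logic.

```rust
use std::io::{BufRead, BufReader};
use std::time::Instant;

use simd_json::prelude::*;

fn main() -> std::io::Result<()> {
    let stdin = std::io::stdin();
    let mut reader = BufReader::new(stdin.lock());
    let (mut records, mut hits) = (0u64, 0u64);
    let start = Instant::now();

    let mut line = Vec::new();
    while reader.read_until(b'\n', &mut line)? > 0 {
        // Strip the trailing newline, then parse the record in place.
        if line.last() == Some(&b'\n') {
            line.pop();
        }
        if !line.is_empty() {
            if let Ok(v) = simd_json::to_borrowed_value(&mut line) {
                records += 1;
                // Touch one scalar, mirroring the per-field access pattern.
                if v.get("message").and_then(|m| m.as_str()).is_some() {
                    hits += 1;
                }
            }
        }
        line.clear();
    }

    let secs = start.elapsed().as_secs_f64();
    println!(
        "{records} records ({hits} with message) in {secs:.2}s, {:.0} records/s",
        records as f64 / secs
    );
    Ok(())
}
```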