Core Concepts
Wallaroo Core Concepts
A Wallaroo application consists of a pipeline with one or more sources. A pipeline takes in data from external systems, performs a series of computations based on that data, and optionally produces outputs which are sent to an external system.
Here is a diagram illustrating a single, linear pipeline:
External | Wallaroo Wallaroo| External
System ──>| Source ─> C1 ─> C2 ─> C3 ─> Sink |──> System
A | | B
An external data source sends data (say, over TCP) to an internal Wallaroo source. The Wallaroo source decodes that stream of bytes, transforming it into a stream of internal data types that are sent to a series of computations (C1, C2, and C3). Each computation takes an input and produces an output. Finally, C3 sends its output to a Wallaroo sink, which encodes that output as a series of bytes and sends it over TCP to an external system.
A Wallaroo application can have multiple sources. Each source might send data down a chain of computations. These chains can then be merged, say at a downstream state computation. For example:
External | Wallaroo
System ──>| Source ─> C1 ─> C2 -\
A | \
\ Wallaroo| External
-> C5 -> Sink |--> System
/ | C
External | Wallaroo /
System ──>| Source ─> C3 ─> C4 ─/
B |
The outputs of both C2 and C4 are sent along to C5, where they will arrive in a non-deterministic interleaving. The outputs of C5 are then sent to a sink and ultimately sent out over TCP to an external system.
Concepts
- State – Accumulated result of data stored over the course of time
- Stateless Computation – Code that transforms an input of some type
In
to an output of some typeOut
(or optionallyNone
if the input should be filtered out). - State Computation – Code that takes an input type
In
and a state object of some typeState
, operates on that input and state (possibly making state updates), and optionally producing an output of some typeOut
. - Source – Ingress (input) point for data from external systems into a Wallaroo application.
- Sink – Egress (output) point from a Wallaroo application to external systems.
- Decoder – Code that transforms a stream of bytes from an external system into a series of application input types.
- Encoder – Code that transforms an application output type into bytes for sending to an external system.
- Pipeline – A sequence of computations and/or state computations originating from one or more sources and terminating in a sink.
- Stage – A source, stateless computation, state computation, and sink each represents a single logical stage in a Wallaroo pipeline. The linear pipeline above–starting from a source, going through computations C1, C2, and C3, and terminating at a sink–has 5 logical stages (1 source, 3 computations, and 1 sink).
- Topology – A graph of how all sources, sinks, and computations are connected within an application.
- API – Wallaroo provides APIs for implementing all of the above concepts.