Architecture and data exchange
In Windmill, a workflow is a JSON-serializable value in the OpenFlow format that consists of an input spec (similar to Scripts) and a linear sequence of steps, also referred to as modules. Each step is one of the following:
- A reference to a Script from the Hub.
- A reference to a Script in your workspace.
- An inline Script in TypeScript (Deno), Python, Go, Bash, SQL or non-supported languages.
- A trigger script: a kind of Script meant to be the first step of a scheduled Flow, which watches for external events and exits the Flow early if there are no new events.
- A for loop that iterates over elements and triggers the execution of an embedded flow for each element. The list is computed dynamically as an input transform.
- A branch to the first subflow whose predicate is truthy (predicates are evaluated in order).
- Branches to all subflows, collecting the result of each branch into an array.
- Approval/suspend steps, which suspend the flow at no cost until it is resumed by an approval/resume signal.
- Inner flows.
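The for-loop and branching steps above can be pictured with a small plain-Python model. It is a sketch under stated assumptions, not Windmill's executor. Each subflow is modeled as an ordinary function, the loop runs sequentially, and the names `for_loop`, `branch_one` and `branch_all` are hypothetical. The `default` fallback in `branch_one`, used when no predicate is truthy, is also an assumption of this sketch.

```python
from typing import Any, Callable, Iterable

# Hypothetical: a subflow modeled as a function from its input to its result.
Subflow = Callable[[Any], Any]


def for_loop(items: Iterable[Any], body: Subflow) -> list[Any]:
    """Run the embedded flow once per element (sequentially) and collect results in order."""
    return [body(item) for item in items]


def branch_one(
    branches: list[tuple[Callable[[Any], bool], Subflow]],
    default: Subflow,
    value: Any,
) -> Any:
    """Run only the first branch whose predicate is truthy, checking predicates in order.

    Falling back to `default` when nothing matches is an assumption of this sketch.
    """
    for predicate, subflow in branches:
        if predicate(value):
            return subflow(value)
    return default(value)


def branch_all(branches: list[Subflow], value: Any) -> list[Any]:
    """Run every branch and collect each branch's result into an array."""
    return [subflow(value) for subflow in branches]


if __name__ == "__main__":
    print(for_loop([1, 2, 3], lambda x: x * 10))  # [10, 20, 30]
    print(branch_one(
        [(lambda x: x < 0, lambda x: "negative"), (lambda x: x < 10, lambda x: "small")],
        default=lambda x: "large",
        value=5,
    ))  # small
    print(branch_all([lambda x: x + 1, lambda x: x * 2], 5))  # [6, 10]
```

The key difference is in the return shape. A branch-one step yields a single branch's result, while a branch-all step always yields an array with one entry per branch.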
Input transform
With the mechanism of input transforms, the input of any step can be the output of any previous step, hence every Flow is actually a Directed Acyclic Graph (DAG) rather than simple sequences. You can refer to the result of any step using its ID.
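Every input transform can reference `results.{id}`, so a flow's dependency graph can be recovered by scanning those references. The sketch below makes several assumptions. It uses a simplified, OpenFlow-like dict in which each module has an `id` and an `input_transforms` map from parameter name to a JavaScript expression string. The real OpenFlow schema nests these fields differently, so treat the shape, the variable path, and the `dependencies` / `refers_only_to_previous` helpers as hypothetical. The code extracts the DAG edges and checks that each step only refers to earlier steps, which is what keeps the graph acyclic.

```python
import re

# Hypothetical, simplified OpenFlow-like flow. The real schema nests input
# transforms under each module's value and wraps expressions in objects.
flow = {
    "modules": [
        {"id": "a", "input_transforms": {"url": "flow_input.url"}},
        {"id": "b", "input_transforms": {"body": "results.a.body"}},
        {"id": "c", "input_transforms": {
            "raw": "results.a.body",
            "parsed": "results.b",
            "token": "variable('u/admin/token')",  # hypothetical Variable path
        }},
    ]
}

# Matches `results.<id>` and captures the step ID.
RESULT_REF = re.compile(r"\bresults\.([A-Za-z_][A-Za-z0-9_]*)")


def dependencies(flow: dict) -> dict[str, set[str]]:
    """Map each step ID to the set of step IDs its input transforms read from."""
    deps: dict[str, set[str]] = {}
    for module in flow["modules"]:
        refs: set[str] = set()
        for expr in module["input_transforms"].values():
            refs.update(RESULT_REF.findall(expr))
        deps[module["id"]] = refs
    return deps


def refers_only_to_previous(flow: dict) -> bool:
    """True if every step only references steps earlier in the sequence (hence a DAG)."""
    deps = dependencies(flow)
    seen: set[str] = set()
    for module in flow["modules"]:
        if not deps[module["id"]] <= seen:
            return False
        seen.add(module["id"])
    return True


print(dependencies(flow))  # step c reads from both a and b: a DAG, not a simple chain
print(refers_only_to_previous(flow))
```

Step `c` depends on both `a` and `b`. That dependency is exactly what turns the linear sequence of modules into a graph.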
Every step has an input transform that maps from the Flow input, the results of previous steps, Resources and Variables to the different parameters of this specific step.
It does so using a JavaScript expression evaluated in a restricted setting: only a subset of the standard library is available, plus the following:
- flow_input: the dict/object containing the different parameters of the Flow itself.
- results.{id}: the result of the step with the given ID.
- resource(path): the Resource at path.
- variable(path): the Variable at path.
Using a JavaScript expression for every parameter is extremely flexible and lets Windmill stay generic about the kinds of modules it runs.
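The evaluation context of an input transform can be pictured with the Python model below. Windmill evaluates these expressions as JavaScript. Here each expression is stood in for by a Python lambda, with the equivalent JavaScript shown in a comment. The workspace contents (`RESOURCES`, `VARIABLES`), their paths, and the helpers `make_context` and `apply_transforms` are hypothetical and exist only for illustration. The model shows how each parameter of a step is computed independently from `flow_input`, `results`, `resource(path)` and `variable(path)`.

```python
from types import SimpleNamespace
from typing import Any, Callable

# Hypothetical workspace contents, keyed by path (paths are illustrative only).
RESOURCES = {"u/admin/my_postgres": {"host": "db.example.com", "port": 5432}}
VARIABLES = {"u/admin/api_token": "s3cr3t"}


def make_context(flow_input: dict, results: dict) -> SimpleNamespace:
    """Model of the names an input transform can use (the real context is JavaScript)."""
    return SimpleNamespace(
        flow_input=flow_input,
        results=results,
        resource=lambda path: RESOURCES[path],
        variable=lambda path: VARIABLES[path],
    )


# One transform per parameter of the step; each lambda stands in for a JS expression.
Transform = Callable[[SimpleNamespace], Any]
transforms: dict[str, Transform] = {
    "user_id": lambda ctx: ctx.flow_input["user_id"],       # flow_input.user_id
    "rows": lambda ctx: ctx.results["fetch"]["rows"],       # results.fetch.rows
    "db": lambda ctx: ctx.resource("u/admin/my_postgres"),  # resource('u/admin/my_postgres')
    "token": lambda ctx: ctx.variable("u/admin/api_token"), # variable('u/admin/api_token')
}


def apply_transforms(transforms: dict[str, Transform], ctx: SimpleNamespace) -> dict[str, Any]:
    """Compute the step's arguments by evaluating each parameter's transform."""
    return {name: fn(ctx) for name, fn in transforms.items()}


ctx = make_context({"user_id": 7}, {"fetch": {"rows": [1, 2]}})
print(apply_transforms(transforms, ctx))
```

The step itself never sees where its arguments came from. This separation lets any kind of module be wired to any earlier data source.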
Connecting flow steps
For each field, you can either write the JavaScript directly or, if the field maps one-to-one to a field of flow_input, of previous_result or of any step's result, use the quick connect button.
From the editor, you can directly get:
- Static inputs: found at the top of the side menu. This tab centralizes the static inputs of every step, much like a file containing all constants. Modifying a value here modifies it directly in the step input.
- Dynamic inputs:
  - by using the ID associated with the step;
  - by clicking the plug icon, which lets you pick flow inputs or previous steps' results (after testing the flow or step).
You can connect step inputs automatically using Windmill AI.
Custom flow states
A state is an object stored as a resource of the resource type state, meant to persist across distinct executions of the same Script. This is what enables Flows to watch for changes in most event-watching scenarios.
Custom flow states are a way to store data across steps in a flow. You can set and retrieve a value by key from any step of the flow, and it will be available globally within the flow. That state is stored in the flow state itself and thus has the same lifetime as the flow job.
It is a powerful escape hatch when passing data as output/input is not feasible and using getResource/setResource would clutter the workspace and make for an inconvenient UX.
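TypeScript:

```typescript
import * as wmill from "npm:windmill-client@1.297.0"

export async function main(x: string) {
  // Store a value under the key "FOO" in the flow's custom state...
  await wmill.setFlowUserState("FOO", 42)
  // ...and read it back; any later step of the same flow could do the same.
  return await wmill.getFlowUserState("FOO")
}
```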
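Python:

```python
import wmill
#extra_requirements:
#wmill==1.297.0

def main(x: str):
    # Store 43 under the key "foobar" in the flow's custom state, then read it back.
    wmill.set_flow_user_state("foobar", 43)
    return wmill.get_flow_user_state("foobar")
```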
Shared directory
By default, Windmill flows pass data on a result basis (see above): a step takes the results of previous steps as input. This works fine for lightweight automation.
For heavier ETLs and any output that is not suitable for JSON, you might want to use the Shared Directory to share data between steps. Steps share a folder at ./shared in which they can store heavier data and pass it to the next step.
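The sketch below shows two steps exchanging a file through ./shared. For brevity both steps sit in one file. In a flow, each would be the `main` of its own step, and the second step would receive the file name through an input transform such as `results.<id>` of the first step. The function names, the `orders.csv` file and its columns are hypothetical. The design point is that only a small, JSON-friendly file name travels as a result, while the bulky data stays on disk.

```python
import csv
from pathlib import Path

SHARED = Path("./shared")  # folder shared by the steps of the flow


def step_extract(shared: Path = SHARED) -> str:
    """First step: write a (potentially large) CSV into the shared folder and
    return only its file name, which is small and JSON-serializable."""
    shared.mkdir(parents=True, exist_ok=True)
    path = shared / "orders.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "amount"])
        writer.writerows([[1, 9.5], [2, 20.0], [3, 4.25]])
    return path.name


def step_aggregate(filename: str, shared: Path = SHARED) -> float:
    """Next step: read the file written by the previous step and aggregate it."""
    with (shared / filename).open(newline="") as f:
        return sum(float(row["amount"]) for row in csv.DictReader(f))
```

Calling `step_aggregate(step_extract())` mimics the two steps running in sequence, and the call returns the total of the `amount` column.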
For more details, see the dedicated Persistent storage & databases page.