Agent workers
Agent workers are a Self-Hosted Enterprise Feature.
Agent workers are a 4th mode of execution of the Windmill binary, but instead of using MODE=worker
, we use here MODE=agent
.
It is most useful when workers are deployed remotely and they are in an unsecure environments and/or have a high latency to the Postgresql database.
Note that agent workers can even be running without docker on Linux, Windows and Apple targets using the cross-compiled binaries of Windmill attached to each release.
The differences is that the worker:
- will NOT emit intempestive warnings logs about high latency to the database
- will NOT start the embedded server api like normal workers do (the embedded server api is a performance optimization)
- (optional) will NOT create an ephemeral token per job and will instead use a static token that is pre-defined and that must only have access to the workspaces and folders this remote workers is meant to have access to
In this settings, workers will still require a connection to be passed with DATABASE_URL
as this is how they will:
- fetch jobs from the queue,
- emit audit logs in the database
- insert the finished job in the completed_job table.
All other operations are done by the intermerdiary of the api servers that the remote workers will connect to using http and the BASE_URL defined in the instance settings (overridable by passing BASE_INTERNAL_URL=<https://private_instance_url:8000>
to the worker`).
(Optional) Restrict the role used by the worker to the strict minimum
Not having access to the other tables allow to restrict the postgresql role of the database_url used by the worker to the strict minimum, mitigating any risks for the rest of the system even in the event of a compromise of the remote worker.
One such role that can be defined could be as follows:
CREATE USER agent with PASSWORD 'changeme';
GRANT ALL ON queue to agent;
GRANT SELECT ON job to agent;
GRANT INSERT on audit to agent;
GRANT ALL on sequence audit_id_seq to agent;
GRANT SELECT, INSERT, UPDATE ON completed_job to agent;
GRANT SELECT on config to agent;
GRANT SELECT on global_settings to agent;
GRANT SELECT, INSERT, UPDATE on worker_ping to agent;
GRANT ALL on concurrency_counter to agent;
GRANT SELECT, INSERT, UPDATE on pip_resolution_cache to agent;
GRANT SELECT, INSERT, UPDATE on job_logs to agent;
GRANT SELECT, INSERT, UPDATE on outstanding_wait_time to agent;
GRANT SELECT, UPDATE on script to agent;
GRANT SELECT, UPDATE, INSERT on deployment_metadata to agent;
GRANT SELECT, UPDATE, INSERT on dependency_map to agent;
GRANT SELECT, UPDATE on flow to agent;
GRANT SELECT, UPDATE on flow_version to agent;
GRANT SELECT, UPDATE, INSERT on flow_version_lite to agent;
GRANT SELECT, UPDATE, INSERT on flow_node to agent;
GRANT SELECT, UPDATE on schedule to agent;
GRANT SELECT (hostname, log_ts) ON log_file TO agent;
CREATE POLICY agent ON queue TO agent USING (tag = '<x>' OR tag = '<y>');
CREATE POLICY agent ON script TO agent USING (tag = '<x>' OR tag = '<y>');
CREATE POLICY agent ON flow TO agent USING (tag = '<x>' OR tag = '<y>');
CREATE POLICY agent ON job TO agent USING (tag = '<x>' OR tag = '<y>');
-- adapt below (and above) policies to your needs (e.g. based on workspace/path etc...)
CREATE POLICY agent ON flow_version TO agent USING (true);
CREATE POLICY agent ON flow_version_lite TO agent USING (true);
CREATE POLICY agent ON flow_node TO agent USING (true);
CREATE POLICY agent ON schedule TO agent USING (true);
CREATE POLICY agent ON completed_job TO agent USING (true);
CREATE POLICY agent
ON audit
FOR INSERT
TO agent
WITH CHECK (true);
Note that <x>
and <y>
need to be replaced with the exact job tags this agent is meant to be using.
Option 1. Using a static job token to avoid having to write complex RLS policies
-
Create a user with all the possible permissions ever needed by jobs on the remote worker:
- Instance settings -> Global users -> Add user to instance
- Login with what user, create a token
-
Run the binary or docker on the target with:
DATABASE_URL=postgres://agent:changeme@localhost/windmill JOB_TOKEN=<STATIC_TOKEN> BASE_INTERNAL_URL=http://private_instance_url:8000 MODE=agent ./windmill
Option 2: Define an RLS policy to the token table
The token table is the primary mechanism by which it is possible to do privilege escalation. Which is why Option 1. is recommended. But if writing RLS policy is your cup of tea, then you can avoid having to create a job token and do as follows:
Note that the token table has the following schema:
token | label | expiration | workspace_id | owner | email | super_admin | created_at | last_used_at | scopes
GRANT ALL on token to agent;
CREATE POLICY agent ON token TO agent USING <your custom defined policy>
You will most definitely want to restrict the ability to set super_admin to True and limit the owner
to be permissioned with only a few restricted possible owners.