
Workers and worker groups

Workers are autonomous processes that run one script at a time using the entire CPU and memory available to them. They are the basis of Windmill's architecture, as they run the jobs. The number of workers can be horizontally scaled up or down depending on needs without any overhead. Each worker on Windmill can run up to 26 million jobs a month, where each job lasts approximately 100ms.

Workers pull jobs from the job queue in the order of their scheduled_for datetime, as long as it is in the past. As soon as a worker pulls a job, it atomically sets its state to "running", runs it, and streams its logs. Once the job is finished, the final result and logs are stored for as long as the retention period allows. Logs are optionally stored to S3.
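The pull rule described above can be sketched in a few lines of Python. This is only an in-memory illustration, not Windmill's implementation: the Job class and the pull_next_job function are hypothetical names, and the state change here is a plain assignment rather than an atomic database update.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Job:
    id: str
    scheduled_for: datetime
    state: str = "queued"


def pull_next_job(queue: list[Job], now: datetime) -> Optional[Job]:
    """Return the due job with the earliest scheduled_for, marked as running."""
    due = [j for j in queue if j.state == "queued" and j.scheduled_for <= now]
    if not due:
        return None  # nothing is due yet: jobs scheduled in the future stay queued
    job = min(due, key=lambda j: j.scheduled_for)
    job.state = "running"  # assumption: a real queue would do this atomically
    return job


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
queue = [
    Job("future", now + timedelta(minutes=5)),
    Job("recent", now - timedelta(seconds=1)),
    Job("oldest", now - timedelta(minutes=10)),
]
first = pull_next_job(queue, now)
second = pull_next_job(queue, now)
third = pull_next_job(queue, now)
```

In this sketch the job scheduled furthest in the past is pulled first, and the job scheduled for the future is never returned while its scheduled_for is still ahead of the current time.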

By default, every worker is the same and interchangeable. However, there are often needs to assign jobs to a specific worker pool, and to configure this worker pool to behave specifically or have different pre-installed binaries. To that end, we introduce the concept of "worker groups".

You can assign groups to flows and flow steps so that they are executed on specific queues. Those queues are identified by names called tags. Worker groups listen to those tags.


In the Community Edition, worker management is done using tags that can be assigned both to workers (through the env variable WORKER_TAGS) and to scripts or flows, so that the workers listen to specific job queues.


In the Cloud plans & Self-Hosted Enterprise Edition, workers can be managed collectively from the UI, based on the group they are in. Specifically, you can organize workers into worker groups and, for each group, manage the tags it listens to, its assignment to a single script, or its worker init scripts.


Examples of configurations include:

  1. Assign different jobs to specific worker groups by giving them tags.
  2. Set an init script that will run at the start of the workers (e.g. to pre-install binaries).
  3. Dedicate your worker to a specific script or flow for high throughput.

Assign custom worker groups

Assign custom worker groups to scripts and flows in Windmill for efficient execution on different machines with varying specifications.

This feature is useful if you want to run some scripts on a GPU machine, or if you want to run some scripts on a high-memory machine.

How to have a worker join a worker group


Create a worker group in your docker-compose.yml and simply pass the worker group as the env variable WORKER_GROUP=<name_of_worker_group> for it to automatically join its corresponding worker group.

Windmill's responsibility is not to spawn the workers itself but to play well with existing service orchestrators such as Kubernetes, ECS, Nomad or Docker Compose, and any IaC. In those, you define the number of replicas (which can be auto-scaled up or down), the resources to allocate to those workers and the WORKER_GROUP passed as env.

Upon start, those workers will automatically join their worker group and fetch their configurations (including init scripts). They will also listen for changes on the worker group configuration for hot reloading.

Here is an example of a worker group specification in docker-compose:

windmill_worker_highmem:
  image: ghcr.io/windmill-labs/windmill-ee:main
  pull_policy: always
  deploy:
    replicas: 2
    resources:
      limits:
        cpus: '1'
        memory: 4096M
  restart: unless-stopped
  environment:
    - DATABASE_URL=${DATABASE_URL}
    - MODE=worker
    - WORKER_GROUP=highmem

Assign replicas, resource constraints, and that's it, the worker will automatically join the worker group on start and be displayed on the Workers page in the Windmill app!

Workers only require a database URL and can thus be spawned in separate VPCs if needed (as long as there is a tunnel to the database). There is also an agent mode for situations where workers are running in an untrusted environment.

Set tags to assign specific queues

You can assign groups to flows and flow steps so that they are executed on specific queues. Those queues are identified by names called tags. Worker groups listen to those tags.


There are 2 worker groups by default: default and native.

Default worker group

The tags of the default worker group are:

  • deno: The default worker group for Deno scripts.
  • python3: The default worker group for Python scripts.
  • go: The default worker group for Go scripts.
  • bash: The default worker group for Bash scripts.
  • powershell: The default worker group for PowerShell scripts.
  • dependency: Where dependency jobs are run.
  • flow: The default worker group for executing flow modules outside of the script steps.
  • hub: The default worker group for executing Hub scripts.
  • bun: The default worker group for Bun scripts.
  • php: The default worker group for PHP scripts.
  • rust: The default worker group for Rust scripts.
  • ansible: The default worker group for Ansible scripts.
  • csharp: The default worker group for C# scripts.
  • other: Everything else (other than the native tags).

Native worker group

Native workers are workers within the native worker group. This group is pre-configured to listen to native job tags. Those jobs are executed under a special mode with subworkers for increased throughput.

You can set the number of native workers to 0. Just make sure that you assign the native tags to other worker groups. Otherwise, the jobs with those tags will never be executed.

The tags of the native worker group are:

  • nativets: The default worker group for REST scripts.
  • postgresql: The default worker group for PostgreSQL scripts.
  • mysql: The default worker group for MySQL scripts.
  • mssql: The default worker group for MS SQL scripts.
  • graphql: The default worker group for GraphQL scripts.
  • snowflake: The default worker group for Snowflake scripts.
  • bigquery: The default worker group for BigQuery scripts.


If you assign custom worker groups to all your workers, make sure that they cover all tags above, otherwise those jobs will never be executed.
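The coverage requirement above can be checked mechanically. The sketch below is illustrative only: the tag names are copied from the two lists above, while uncovered_tags and the example group names are hypothetical and not part of Windmill.

```python
# Tags of the default and native worker groups, as listed above.
DEFAULT_TAGS = {
    "deno", "python3", "go", "bash", "powershell", "dependency", "flow",
    "hub", "bun", "php", "rust", "ansible", "csharp", "other",
}
NATIVE_TAGS = {
    "nativets", "postgresql", "mysql", "mssql", "graphql", "snowflake", "bigquery",
}


def uncovered_tags(worker_groups: dict[str, set[str]]) -> set[str]:
    """Return the built-in tags that no worker group listens to."""
    listened = set().union(*worker_groups.values())
    return (DEFAULT_TAGS | NATIVE_TAGS) - listened


# Hypothetical setup: one custom group takes most default tags, one takes a custom tag.
groups = {
    "highmem": DEFAULT_TAGS - {"other"},
    "gpu": {"gpu"},
}
missing = uncovered_tags(groups)
```

With this hypothetical setup, missing contains every native tag plus other, which are exactly the queues whose jobs would never be picked up.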

The "Reset to native tags" button sets the tags of a given worker group to those of the native worker group.

The "Reset to all tags" button sets the tags of a given worker group to those of both the default and native worker groups.

The "Reset to all tags minus native ones" button sets the tags of a given worker group to those of the default worker group.


To make custom tags available from the UI, go to the dedicated "Workers" tab on the workspace and click on the "Assignable Tags" button.

It is possible to restrict some tags to specific workspaces using the following syntax:

gpu(workspace+workspace2)

Only 'workspace' and 'workspace2' will be able to use the gpu tag.
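To make the restriction syntax concrete, here is a small parser sketch. It only illustrates the tag(workspace1+workspace2) notation shown above; parse_assignable_tag and workspace_can_use are hypothetical helpers, and the real parsing rules in Windmill may differ (for instance regarding whitespace or unusual characters).

```python
import re
from typing import Optional

# tag name, optionally followed by "(ws1+ws2+...)"
_TAG_RE = re.compile(r"^(?P<tag>[^()]+?)(?:\((?P<workspaces>[^()]*)\))?$")


def parse_assignable_tag(spec: str) -> tuple[str, Optional[list[str]]]:
    """Split 'gpu(a+b)' into ('gpu', ['a', 'b']); unrestricted tags yield None."""
    m = _TAG_RE.match(spec.strip())
    if m is None:
        raise ValueError(f"unrecognized tag specification: {spec!r}")
    workspaces = m.group("workspaces")
    allowed = workspaces.split("+") if workspaces is not None else None
    return m.group("tag"), allowed


def workspace_can_use(spec: str, workspace: str) -> bool:
    """True if the tag is unrestricted or the workspace is in its allow list."""
    _, allowed = parse_assignable_tag(spec)
    return allowed is None or workspace in allowed


tag, allowed = parse_assignable_tag("gpu(workspace+workspace2)")
```

Here tag is gpu and allowed lists the two workspaces; a plain gpu specification would leave allowed as None, meaning any workspace may use the tag.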

Jobs within the same job queue can be given a priority between 1 and 100. Jobs with a higher priority value take precedence over jobs with a lower priority value in the job queue.
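As a rough illustration of priority ordering within one queue, the sketch below sorts jobs by descending priority. Breaking ties by the earliest scheduled_for is an assumption made for this example, and QueuedJob and execution_order are hypothetical names; how Windmill treats jobs without a priority is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedJob:
    id: str
    priority: int       # between 1 and 100, higher values run first
    scheduled_for: int  # hypothetical: seconds since an arbitrary reference point


def execution_order(jobs: list[QueuedJob]) -> list[QueuedJob]:
    """Higher priority first; ties broken by earliest scheduled_for (assumption)."""
    return sorted(jobs, key=lambda j: (-j.priority, j.scheduled_for))


jobs = [
    QueuedJob("report", priority=10, scheduled_for=100),
    QueuedJob("urgent", priority=100, scheduled_for=300),
    QueuedJob("cleanup", priority=10, scheduled_for=50),
]
order = [j.id for j in execution_order(jobs)]
```

In this example the urgent job runs first even though it was scheduled last, followed by cleanup and then report.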

How to assign worker tags to a worker group

Use "edit/create config" next to the worker group name in the Windmill UI.

Note: The worker group management UI is a Cloud plans & Self-Hosted Enterprise Edition feature. It is still possible to use worker groups with the Community Edition by passing the env variable WORKER_TAGS to each worker:

WORKER_TAGS=tag1,tag2
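The comma-separated format above is simple to reason about. The following sketch shows one way such a value could be interpreted; parse_worker_tags and worker_listens_to are hypothetical helpers, and trimming whitespace and skipping empty entries are assumptions rather than documented Windmill behavior.

```python
def parse_worker_tags(raw: str) -> list[str]:
    """Split a WORKER_TAGS-style value into tags, ignoring blanks (assumption)."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def worker_listens_to(raw_worker_tags: str, job_tag: str) -> bool:
    """True if a worker configured with raw_worker_tags would pick up job_tag."""
    return job_tag in parse_worker_tags(raw_worker_tags)


tags = parse_worker_tags("tag1,tag2")
```

A worker started with WORKER_TAGS=tag1,tag2 would, under this reading, pick up jobs tagged tag1 or tag2 and ignore a job tagged gpu.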

How to assign a custom worker group to a script or flow

For scripts deployed on the script editor, select the corresponding worker group tag in the settings section.


For scripts inlined in the flow editor, select it in the module header.

If no worker group is assigned to a script, it will be assigned the default worker group for its language.

You can assign a worker group to an entire flow in the flow's settings.

Dynamic tag

If a workspace tag contains the substring $workspace, it will be replaced by the id of the workspace the job belongs to. This is especially useful to have the same script deployed to different workspaces and have each deployment run on different workers.

For example, take the workspaces dev, staging and prod, the worker groups normal-dev, normal-staging and normal-prod, and the following assignable tag:

normal-$workspace

The same script with the tag normal-$workspace will run on the corresponding worker group depending on the workspace it is deployed to. This makes it possible to share the same control plane while using workers with different network restrictions for tighter security.

Last, if the tag contains $args[argName] (e.g. foo-$args[foobar]), that placeholder is replaced by the string value of the argument argName, so the tag can be fully dynamic.
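The two substitutions described in this section can be sketched as follows. This is a simplified model, not Windmill's code: resolve_tag is a hypothetical function, argument names are assumed to be plain identifiers, and raising KeyError for a missing argument is a choice made for this example only.

```python
import re


def resolve_tag(tag: str, workspace_id: str, args: dict) -> str:
    """Expand $workspace and $args[name] placeholders in a tag."""
    resolved = tag.replace("$workspace", workspace_id)
    # Replace each $args[name] with the string value of that job argument.
    return re.sub(
        r"\$args\[(\w+)\]",
        lambda m: str(args[m.group(1)]),  # KeyError if the argument is absent
        resolved,
    )


staging_tag = resolve_tag("normal-$workspace", "staging", {})
dynamic_tag = resolve_tag("foo-$args[foobar]", "dev", {"foobar": "gpu"})
```

Under these assumptions, staging_tag is normal-staging and dynamic_tag is foo-gpu, so the same script lands on a different queue depending on its workspace or its arguments.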

See "Deploy to staging prod" for a full UI flow to deploy to staging and prod.

Alerts

You can set an alert to receive a notification via Email or Slack when the number of running workers in a group falls below a given number. It's available in the worker group config.

Enable 'Send an alert when the number of alive workers falls below a given threshold', and enter the number of workers below which the notification will be sent.

You need to configure Critical alert channels to receive notifications.


Create worker group config


In the Cloud plans & Self-Hosted Enterprise Edition, workers can be managed collectively from the UI, based on the group they are in. Specifically, you can organize workers into worker groups and, for each group, manage the tags it listens to (queue), its assignment to a single script, or its worker init scripts.

In the Community Edition, workers can still have their WORKER_TAGS passed as env.


Pick "New worker group config" and just write the name of your worker group.


You can then configure it directly from the UI.



Examples of configurations include:

  1. Assign different jobs to specific worker groups by giving them tags.
  2. Set an init script that will run at the start of the workers (e.g. to pre-install binaries).
  3. Dedicate your worker to a specific script or flow for high throughput.


Python runtime settings

Add Python runtime specific settings like additional Python paths and PIP local dependencies.


Environment variables passed to jobs

Add static and dynamic environment variables that will be passed to jobs handled by this worker group. Dynamic environment variable values will be loaded from the worker host environment variables while static environment variables will be set directly from their values below.
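The difference between static and dynamic variables can be illustrated with a short sketch. The function build_job_env and its parameters are hypothetical, and two behaviors are assumptions of this example rather than documented facts: dynamic names missing from the host are skipped, and a dynamic value overrides a static one with the same name.

```python
import os
from typing import Mapping


def build_job_env(
    static_vars: Mapping[str, str],
    dynamic_names: list[str],
    host_env: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Combine static values with values read from the worker host environment."""
    env = dict(static_vars)  # static: values come from the configuration itself
    for name in dynamic_names:
        if name in host_env:  # dynamic: value is looked up on the worker host
            env[name] = host_env[name]
    return env


# Hypothetical host environment, passed explicitly to keep the example reproducible.
host = {"API_TOKEN": "secret", "UNRELATED": "x"}
job_env = build_job_env({"REGION": "eu"}, ["API_TOKEN", "MISSING"], host)
```

Only the names listed as dynamic are read from the host, so UNRELATED is not passed to jobs even though it exists on the worker.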


Autoscaling

Autoscaling automatically adjusts the number of workers based on your workload demands.

Autoscaling is available in the Enterprise plan.

Init scripts

Init scripts provide a way to pre-install binaries or set initial configurations without modifying the base image. They are executed when the worker starts, ensuring that any necessary binaries or configurations are set up before the worker takes on any other job.

Under the Cloud plans & Self-Hosted Enterprise Edition, they can be set from Windmill UI.


Dedicated workers / High throughput

Dedicated workers are workers that are dedicated to a particular script. They are able to execute any job that targets this script much faster than normal workers, at the expense of being able to execute only that one script. They are as fast as running the same logic in a for loop, but keep the benefit of showing separate jobs per execution.

Dedicated workers / High throughput is a Cloud plans & Self-Hosted Enterprise Edition feature.


Service logs

View logs from any workers or servers directly within the service logs section of the search modal.


Queue metrics

You can visualize metrics for delayed jobs per tag and queue delay per tag.

Queue metrics is an Enterprise Edition feature.

Metrics are available under the "Queue metrics" button on the Workers page.

Only tags for jobs that have been delayed by more than 3 seconds in the last 14 days are included in the graph.


Queue metric alerts

Enterprise Edition users can set up Critical alerts on the Queue Metrics page, and be notified when the number of delayed jobs in a queue is above a certain threshold for more than a configured amount of time. The "cooldown" parameter determines the minimum duration between two consecutive alerts if the number of waiting jobs is fluctuating around the configured threshold.
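The interplay between the threshold, the duration and the cooldown is easier to see in code. The sketch below is one possible reading of the description above, not Windmill's alerting logic: QueueAlert, its field names and the use of timestamps in seconds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueAlert:
    threshold: int         # number of delayed jobs that counts as "too many"
    min_duration_s: float  # how long the count must stay above the threshold
    cooldown_s: float      # minimum time between two consecutive alerts
    _above_since: Optional[float] = None
    _last_alert: Optional[float] = None

    def observe(self, now: float, delayed_jobs: int) -> bool:
        """Record one measurement; return True if an alert should be sent."""
        if delayed_jobs <= self.threshold:
            self._above_since = None  # back under the threshold: reset the timer
            return False
        if self._above_since is None:
            self._above_since = now
        if now - self._above_since <= self.min_duration_s:
            return False  # above the threshold, but not for long enough yet
        if self._last_alert is not None and now - self._last_alert < self.cooldown_s:
            return False  # still cooling down from the previous alert
        self._last_alert = now
        return True


alert = QueueAlert(threshold=10, min_duration_s=60, cooldown_s=300)
samples = [(0, 12), (30, 15), (90, 11), (120, 5), (130, 20), (200, 20), (400, 20)]
fired = [alert.observe(t, n) for t, n in samples]
```

In this run the first alert fires at t=90, the sustained breach at t=200 is suppressed by the cooldown, and the next alert goes out at t=400 once 300 seconds have passed since the previous one.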


Workers and compute units

Even though Windmill's architecture relies on workers, Windmill's pricing is based on compute units. A compute unit corresponds to 2 worker-gb-month. For example, a worker with a 2GB memory limit (standard worker) counts as 1 compute unit, and a worker with 4GB of memory (large worker) counts as 2 compute units. Any worker with more than 2GB of memory counts as 2 compute units, however large it is (a 16GB worker still counts as 2 compute units). Each worker can run up to ~26M jobs per month (at 100ms per job).

The number of compute units will depend on the workload and the jobs Windmill will need to run. By design, each worker executes only one job at a time so that the job can use the full resources of the worker. Workers come in different sizes based on memory: small (1GB), standard (2GB), and large (> 2GB). Each worker executes jobs very efficiently: you can run up to 26 million jobs per month per worker if each one lasts 100ms. However, the actual figure depends entirely on the nature of the jobs, their number and duration.
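The 26 million figure follows from simple arithmetic: a worker runs one job at a time, so its monthly capacity is the length of the month divided by the job duration. The sketch below assumes a 30-day month and back-to-back jobs with no idle time or overhead; max_jobs_per_month is a hypothetical helper.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def max_jobs_per_month(job_duration_ms: int, days: int = 30) -> int:
    """Upper bound on sequential jobs for one worker, ignoring any overhead."""
    return days * MS_PER_DAY // job_duration_ms


fast_jobs = max_jobs_per_month(100)    # 100ms jobs
slow_jobs = max_jobs_per_month(1000)   # 1s jobs
```

With 100ms jobs this gives 25,920,000 jobs, which rounds to the ~26 million quoted above; jobs lasting one second bring the ceiling down to 2,592,000.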

Keep in mind that the compute units counted are those of your production workers, not those of development or staging instances, if you have separate instances. You can set staging instances as 'Non-prod' in the Instance settings. The compute units are calculated based on the memory limits set in docker-compose or in Kubernetes: a small worker counts as 0.5 compute unit, a standard worker with 2GB of memory counts as 1 compute unit, and a large worker with more than 2GB of memory counts as 2 compute units.
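The counting rules above can be summarized in a short sketch. The function names are hypothetical, and the treatment of memory limits that fall between the named sizes (for example 1.5GB counted as standard) is an assumption of this example, not a statement of Windmill's billing rules.

```python
def compute_units(memory_gb: float) -> float:
    """Map a worker's memory limit to compute units, per the sizes listed above."""
    if memory_gb <= 1:
        return 0.5  # small worker
    if memory_gb <= 2:
        return 1.0  # standard worker (assumption: anything in (1GB, 2GB])
    return 2.0      # large worker: capped at 2 units regardless of memory


def billable_compute_units(workers: list[tuple[float, bool]]) -> float:
    """Sum compute units over (memory_gb, is_production) pairs, production only."""
    return sum(compute_units(mem) for mem, is_prod in workers if is_prod)


# Hypothetical fleet: two standard, two large and one small production worker,
# plus one large worker on an instance marked as non-prod.
fleet = [(2, True), (2, True), (4, True), (16, True), (1, True), (4, False)]
total = billable_compute_units(fleet)
```

For this hypothetical fleet the total is 6.5 compute units: the 16GB worker counts the same as the 4GB one, and the non-prod worker is not counted.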

Also, for the Enterprise Edition, the free trial of one month is meant to help you evaluate your needs in practice.