Jobs
A job represents a past, present or future "task" or "work" to be executed by a
worker. Future jobs or jobs waiting for a worker are called "queued
jobs", and are ordered by the time at which they were scheduled for
(scheduled_for
). Jobs that are created without being given a future
scheduled_for
are scheduled for the time at which they were created.
Workers fetch jobs from the queue, start executing them, atomically set their state in the queue to "running", stream the logs while executing them, then once completed remove them from the queue and create a new "completed job".
Every job has a unique UUID attached to it and as long as you have visibility over the execution of the script, you are able to inspect the execution logs, output and metadata in the dedicated details page of the job.
Job kinds
There are 5 main kinds of jobs, that each have a dedicated tab in the runs page:
-
Script Jobs: Run a script as defined by the hash of the script (that uniquely and immutably defines a specific version of a script), its input arguments (args) and the
permissioned_as
user or group of whom it is going to act on behalf of and inherit the visibility to other items such as resources and variables from. An user can NEVER escalates his privileges but only de-escalates it by launching a script with either the same permissions as himself or a subset of it (by giving the permissions of a group that he is member of). -
Preview Jobs: similar to script jobs but instead of hash, they contain the whole raw code they will run. Those are the jobs that are launched from the script editors. Even when code is executed as a preview, you keep a trace of its execution.
-
Dependencies Jobs: Scripts written in Python generate a lockfile when they are saved/created. This lockfile ensures that an execution of the same hash will always use the same versions. The process of generating this lockfile is also a job in itself so you can easily inspect the issues generating the lockfile if any. See Dependency Management for more information.
-
Flow Jobs: A flow job is the "meta" job that orchestrates the execution of every step. The execution of the steps are in-themselves jobs. It is defined similarly to a script job but instead of being defined by a path to a script, it is defined by a path to the flow.
-
Preview Flow Jobs: A preview flow is a job that contains the raw json definition of the flow instead of merely a path to it. It is the underlying job for the preview of flows in the flow editor interface.
Run jobs on behalf of
The permissioned_as
value from script and preview jobs is the most important
concept to grasp to understand what makes Windmill's security and permission
model consistent, predictable and safe. permissioned_as
is distinct from the
created_by
value, even though in the vast majority of jobs, they will be the
same. It represents the level of permissions this job will execute with. As a
direct consequence, the variables (including secrets) that are accessible to the
scripts are only those whom the user or group has visibility on, given his
permissions.
Similarly for the Contextual Variable WM_TOKEN
which
contains an ephemeral token (ephemeral to the script execution), which has the
same privilege and permissions as the owner in permissioned_as
. The
Python client inside the script implicitly uses that same
token to be granted privilege to do Windmill operations (like running other
scripts or getting resources), meaning that the same script ran by 2 different
users, will run differently and may be unauthorized to do partially or all
operations of the script. This is what enables anyone to share scripts doing
sensitive operations safely as long as the resources and secrets that the script
relies on are permissioned correctly.
A user can only run a script permissioned as either himself, one of the group that he is a member of.
Job inputs and script parameters
Jobs take a JSON object as input which can be empty. That input is passed as the payload of the POST request that triggers the Script. The different key-value pairs of the objects are passed as the different parameters of the main function, with just a few language-specific transformations to more adequate types in the target language, if necessary (e.g base64/datetime encoding). Values can be nested JSON objects themselves, but we recommend trying to keep the input flat when possible.
If the payload contains keys that are not defined as parameters in the main function, they will be ignored. This allows you to handle arbitrary JSON payloads, as you can choose which keys to define as parameters in your script and process the data accordingly.
Retention policy
The retention policy for jobs runs details varies depending on your team's plan:
- Community plan (cloud): Jobs runs details are retained for 60 days.
- Team plan (cloud): Jobs runs details are retained for 60 days.
- Enterprise plan (cloud): Unlimited retention period.
- Open Source (self-host): Jobs runs details are retained for maximum 30 days.
- Enterprise plan (self-host): Unlimited retention period.
You can set a custom retention period for the jobs runs details. The retention period can be configured in the instance settings, in the "Core" tab.
High priority jobs
High priority jobs are jobs that are given a priority
value between 1 and 100. Jobs with a higher priority value will be given precedence over jobs with a lower priority value in the job queue.
Large job logs management
To optimize log storage and performance, Windmill leverages S3 for log management. This approach minimizes database load by treating the database as a temporary buffer for up to 5000 characters of logs per job.
For jobs with extensive logging needs, Windmill Enterprise Edition users benefit from seamless log streaming to S3. This ensures logs, regardless of size, are stored efficiently without overwhelming local resources.
This allows the handling of large-scale logs with minimal database impact, supporting more efficient and scalable workflows.
For large logs storage (and display) and cache for distributed Python jobs, you can connect your instance to a bucket from the instance settings.
This feature has no overlap with the Workspace object storage.
You can choose to use either S3 or Azure Blob Storage. For each you will find a button to test settings from a server or from a worker.
S3
Name | Type | Description |
---|---|---|
Bucket | string | Name of your S3 bucket. |
Region | string | If left empty, will be derived automatically from $AWS_REGION. |
Access key ID | string | If left empty, will be derived automatically from $AWS_ACCESS_KEY_ID, pod or ec2 profile. |
Secret key | string | If left empty, will be derived automatically from $AWS_SECRET_KEY, pod or ec2 profile. |
Endpoint | string | Only needed for non AWS S3 providers like R2 or MinIo. |
Allow http | boolean | Disable if using https only policy. |
Azure Blob
Name | Type | Description |
---|---|---|
Account name | string | The name of your Azure Storage account. It uniquely identifies your Azure Storage account within Azure and is required to authenticate with Azure Blob Storage. |
Container name | string | The name of the specific blob container within your storage account. Blob containers are used to organize blobs, similar to a directory structure. |
Access key | string | The primary or secondary access key for the storage account. This key is used to authenticate and provide access to Azure Blob Storage. |
Tenant ID | string | (optional) The unique identifier (GUID) for your Azure Active Directory (AAD) tenant. Required if using Azure Active Directory for authentication. |
Client ID | string | (optional) The unique identifier (GUID) for your application registered in Azure AD. Required if using service principal authentication via Azure AD. |
Endpoint | string | (optional) The specific endpoint for Azure Blob Storage, typically used when interacting with non-Azure Blob providers like Azurite or other emulators. For Azure Blob Storage, this is auto-generated and not usually needed. |