2026-05-10

Argo Workflows: Kubernetes-native pipelines

Argo Workflows is a Kubernetes-native workflow engine. You describe a pipeline as a Workflow CR — a directed acyclic graph of steps — and the controller schedules each step as a pod. Outputs flow through artifacts in an object store. Common uses: CI/CD pipelines, ML training, data processing, batch jobs, anything you’d previously have done with Jenkins, Airflow, or a hand-rolled shell script over kubectl.

It’s a CNCF graduated project, part of the Argo family (CD for GitOps, Workflows for pipelines, Events for triggering, Rollouts for progressive delivery). This post focuses on Workflows specifically, with comparisons to its peers at the end.

The position

Argo Workflows occupies a specific niche: container-per-step pipelines on Kubernetes. Three properties define it:

  1. Each step is a pod. Not a “stage in a JVM,” not “a function in a daemon” — an actual Kubernetes pod, with its own container image, resources, lifecycle. Want a step that runs python train.py? Use a Python image. Want a step that runs terraform apply? Use HashiCorp’s image. No plugin ecosystem to glue together.
  2. The pipeline is a CRD. The Workflow is just a Kubernetes object. Versioned in Git, applied with kubectl, viewable in the cluster, scoped by namespace, secured by RBAC.
  3. DAG and step semantics, declared in YAML. Steps can run sequentially, in parallel, conditionally, with retries, with template reuse via WorkflowTemplate CRs.

If your pipeline can be described as “run this image, then this image, then this image, passing files between them,” Argo Workflows is the cleanest expression of that in Kubernetes.
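
A minimal sketch of that pattern — a two-step Workflow where each step is a different image. The image tags and step bodies are illustrative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: two-step-        # Argo appends a random suffix
spec:
  entrypoint: main
  templates:
    - name: main
      steps:                     # outer list = sequential stages
        - - name: train          # inner list = parallel within a stage
            template: python-step
        - - name: provision
            template: terraform-step
    - name: python-step
      container:
        image: python:3.12       # any image works; no plugin ecosystem
        command: [python, -c, "print('training')"]
    - name: terraform-step
      container:
        image: hashicorp/terraform:1.8
        args: [version]
```

Submitted with argo submit, each of the two templates becomes its own pod, scheduled in order by the controller.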

Architecture

[Diagram: mini map of the architecture — user / events → argo-server + workflow-controller → step pods → artifact store]

Reading the diagram:

  • Workflow CR — your pipeline definition. Either created on demand or stamped from a WorkflowTemplate. Triggered by users (argo submit) or events (via Argo Events).
  • argo-server + workflow-controller — the two components that make up the Argo Workflows deployment. The server provides the API and UI; the controller watches Workflow CRs and schedules step pods.
  • Step pods — one per step that’s currently running. Each pod has a main container (your code), an init container that downloads input artifacts, and a wait sidecar that handles artifact upload, log capture, and signaling.
  • Artifact store — S3, GCS, MinIO, Azure Blob. Workflow steps declare artifact outputs (/output/model.pkl) and inputs; the sidecar uploads/downloads them automatically. This is how data flows between pods that may run on different nodes.

The green dashed edges are the controller scheduling step pods. Solid edges are user input and artifact data. No persistent process holds workflow state in memory — everything is in the Workflow CR’s status field, so the controller can crash and resume.

Workflow templates and reuse

The thing that makes Argo Workflows usable at scale is template composition.

A WorkflowTemplate is a reusable, parameterized workflow definition. A team writes templates once (“the standard model training pipeline,” “the standard release pipeline”) and downstream users instantiate them by name with parameters. Combined with ClusterWorkflowTemplate (cluster-scoped), this gives you something like a function library — but for pipelines.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: train-model-
spec:
  workflowTemplateRef:
    name: train-and-register-model
  arguments:
    parameters:
      - name: dataset
        value: s3://bucket/datasets/2026-q2/
      - name: epochs
        value: "20"

This pattern — central platform team owns templates, application teams instantiate them — is what differentiates “we use Argo Workflows” from “we use Argo Workflows well.”
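
For concreteness, the template side of that pattern might look like the sketch below. The image and training script are hypothetical; only the template name and parameter names need to match what callers reference:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: train-and-register-model
spec:
  entrypoint: train
  arguments:
    parameters:
      - name: dataset            # no default: callers must supply it
      - name: epochs
        value: "10"              # default, overridable per submission
  templates:
    - name: train
      container:
        image: registry.example.com/trainer:latest   # hypothetical image
        command: [python, train.py]
        args:
          - --data
          - "{{workflow.parameters.dataset}}"
          - --epochs
          - "{{workflow.parameters.epochs}}"
```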

The execution model in detail

Each step’s pod is created on demand, runs to completion, and is reaped. The controller watches pod status and records in the Workflow CR’s status which steps have completed.

Steps interact via three mechanisms:

  • Parameters — small string values passed in command-line args or environment variables.
  • Artifacts — files uploaded to / downloaded from object storage by the wait sidecar. The unit of data interchange.
  • Result outputs — small string outputs read from a file in the pod, used to drive DAG conditionals.
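
A sketch of the artifact mechanism — two templates (a fragment of a Workflow's templates list) connected by a file, with illustrative paths and images:

```yaml
templates:
  - name: produce
    container:
      image: alpine:3.20
      command: [sh, -c]
      args: ["mkdir -p /output && echo weights > /output/model.pkl"]
    outputs:
      artifacts:
        - name: model
          path: /output/model.pkl   # wait sidecar uploads this to the store
  - name: consume
    inputs:
      artifacts:
        - name: model
          path: /tmp/model.pkl      # downloaded before the main container starts
    container:
      image: alpine:3.20
      command: [sh, -c]
      args: ["cat /tmp/model.pkl"]
```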

Three execution constructs cover most real pipelines:

Construct | What it gives you
steps: | Sequential pipeline with parallel sub-steps within each stage
dag: | Explicit dependency graph; step B runs when step A completes
withItems: / withParam: | Fan-out — run the same step against a list of items, in parallel

The combination is what makes complex pipelines expressible: a DAG of stages, each with parallel fan-out for grid search or batch processing, with conditional next-steps based on result outputs.
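
A sketch of such a combination — a DAG stage gated on a result output, fanning out over a list. The referenced templates (check-data, process-shard, summarize) are hypothetical container templates like the ones above:

```yaml
templates:
  - name: main
    dag:
      tasks:
        - name: check
          template: check-data
        - name: process
          dependencies: [check]
          template: process-shard
          when: "{{tasks.check.outputs.result}} == ok"  # result output drives the conditional
          withItems: [shard-a, shard-b, shard-c]        # fan-out: one pod per item
          arguments:
            parameters:
              - name: shard
                value: "{{item}}"
        - name: report
          dependencies: [process]
          template: summarize
```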

Where it sits in the workflow landscape

Workflow engines overlap in unhelpful ways. The clearest framing:

Tool | Programming model | Best at
Argo Workflows | YAML DAG, pod-per-step | Container-native batch and ML pipelines on K8s
Tekton (OpenShift Pipelines) | YAML, pod-per-step | CI/CD specifically; similar shape, more CI-focused features
Apache Airflow | Python DAGs | Data engineering pipelines; mature operator ecosystem, scheduler-centric
Kubeflow Pipelines | Python SDK → Argo Workflows YAML | ML pipelines specifically; uses Argo under the hood
Flyte | Python decorators → K8s | Type-safe ML pipelines with strong lineage / caching
Temporal | Code-first durable workflows | Long-running business processes (months / years), not batch jobs
Prefect / Dagster | Python | Data engineering with developer ergonomics
Step Functions / Durable Functions | JSON state machines / code | Managed serverless workflows on AWS / Azure

Argo’s natural lane: container-native pipelines on Kubernetes, where each step is heterogeneous (different images, different languages, different teams). If everything in your pipeline is Python and the data fits in memory between steps, Prefect or Dagster will give you better dev ergonomics. If everything is one team’s CI pipeline, Tekton fits the use case more directly.

Argo Events: the missing trigger story

Argo Workflows by itself starts workflows on user submission or cron schedule. Argo Events is the separate Argo project that fills the rest:

  • EventSource — listens to external events (GitHub webhooks, Kafka topics, S3 bucket changes, calendar schedules, AWS SNS, etc.)
  • Sensor — defines a trigger that fires when conditions on the event bus match
  • The trigger can create a Workflow, post to Slack, call a webhook, etc.

The Workflows + Events pair turns Argo into a full event-driven automation platform. Most production deployments use both.
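
A sketch of the pair — a webhook EventSource feeding a Sensor that submits a Workflow. Names, the endpoint, and the referenced template are illustrative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: github-webhook
spec:
  webhook:
    push:                         # event name, referenced by the Sensor
      port: "12000"
      endpoint: /push
      method: POST
---
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: run-pipeline-on-push
spec:
  dependencies:
    - name: push-dep
      eventSourceName: github-webhook
      eventName: push
  triggers:
    - template:
        name: submit-pipeline
        argoWorkflow:
          operation: submit       # create a Workflow when the dependency fires
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: ci-
              spec:
                workflowTemplateRef:
                  name: build-and-test   # hypothetical WorkflowTemplate
```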

Limitations and pitfalls

  • YAML at scale. Workflows past ~20 steps in pure YAML become unmaintainable. Lean on WorkflowTemplate decomposition or use the Hera Python SDK (or Argo’s own Python SDK) to generate workflows programmatically.
  • Artifact size. Pushing large artifacts (GBs) through the wait sidecar is slow and uses a lot of pod-local disk. For very large data, mount a shared volume or write directly to your store from main, bypassing the artifact mechanism.
  • Pod startup latency. Each step is a pod cold start — typically 5–15 seconds before your code runs. For very fast steps this overhead dominates; merge them into a single step so the cost is paid once.
  • Quota explosions. A fan-out with withItems against a 10,000-item list creates 10,000 pods. Set parallelism: to cap concurrent step pods or you’ll exhaust the cluster.
  • State outside the cluster. Completed workflows accumulate. The archive (workflow-archive) stores them in a database; configure retention or the database grows without bound.
  • Sensitive parameters. Step parameters are visible in the Workflow CR — visible to anyone with RBAC read on the resource. Use Kubernetes secrets via env vars / volumeMounts for anything sensitive.
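
Two of the mitigations above in YAML form — a spec-level parallelism cap and a secret delivered via env rather than a parameter. Values and the Secret name are illustrative:

```yaml
spec:
  parallelism: 50                 # cap on concurrently running step pods
  templates:
    - name: work
      container:
        image: alpine:3.20
        command: [sh, -c, "echo working"]
        env:
          - name: API_TOKEN       # sensitive value: pulled from a Secret,
            valueFrom:            # never written into the Workflow CR
              secretKeyRef:
                name: pipeline-secrets   # hypothetical Secret
                key: api-token
```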

Where to start

  1. Install Argo Workflows in a namespace via the official manifests or the Helm chart. Configure an artifact store (MinIO or your existing S3 bucket).
  2. Submit the hello-world workflow from the docs. Follow it with argo watch. Open the UI. Confirm the basic plumbing works end to end.
  3. Move your simplest existing pipeline — a build, a daily report, a model training run — to Argo Workflows. Don’t pick the hardest one first.
  4. Introduce WorkflowTemplate the second time you find yourself copying a pipeline. Owning a template is cheaper than owning two near-duplicate pipelines.
  5. Add Argo Events when “we manually re-run this pipeline” stops being acceptable. Hook it to your Git provider, your queue, or whatever produces the trigger event.
  6. Adopt Hera (the Python SDK) when YAML becomes the bottleneck. Hera generates Argo Workflows YAML from Python; you keep all the Argo execution semantics with much better authoring ergonomics.

The mistake to avoid: writing every workflow inline as standalone YAML. Argo’s value compounds with reuse. Templates are the unit of return-on-investment; pipelines without templates feel like Bash scripts in disguise. Pipelines built from templates feel like a real platform.