2026-05-10
OpenShift AI: a comprehensive mind map
OpenShift AI’s scope isn’t obvious from the marketing. It’s not a single product; it’s a platform of ~15 components plus a curated stack of GPU operators, integration points, and use-case-specific layers. The component list reads like a Kubeflow + OpenShift + GenAI bingo card, and figuring out what relates to what — and what you actually need — takes effort.
This post is a comprehensive mind map of the platform, organized into the eight branches that I find capture the scope most cleanly. Each branch radiates from the center to its key sub-topics. Pan by dragging, zoom by scrolling, and click the ⛶ button for a full-window view.
The trunk edges (thick green) connect the central platform to each branch; the branch edges (thin gray) connect each branch to its sub-topics. Eight directions, eight slices of the platform.
The eight branches
Components. The actual products inside OpenShift AI: Workbenches (Jupyter / RStudio / Code Server), Data Science Pipelines (Kubeflow Pipelines v2 on Argo Workflows), Distributed Training (Ray + Kubeflow Training Operator), KServe with vLLM as the LLM runtime, the Model Registry (added 2024), TrustyAI for fairness / bias / explainability, the OpenShift AI Dashboard as a console plugin, and the curated workbench images that ship pre-baked with CUDA, ROCm, PyTorch, and TensorFlow stacks. If you’ve used OpenShift AI, you’ve used some subset of these eight.
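Because Data Science Pipelines is Kubeflow Pipelines v2 under the hood, the day-to-day interface is the kfp SDK: define components, wire them into a pipeline, compile to IR YAML, and the Argo-backed runtime executes it. A minimal sketch, with the component logic and names invented for illustration:

```python
# Minimal Kubeflow Pipelines v2 sketch. The steps are stand-ins; a real
# pipeline would pull data, train, and register the result.
from kfp import compiler, dsl


@dsl.component(base_image="python:3.11")
def train(epochs: int) -> str:
    # Placeholder training step; returns a reference to the produced model.
    print(f"training for {epochs} epochs")
    return "model-v1"


@dsl.component(base_image="python:3.11")
def evaluate(model_ref: str) -> float:
    # Placeholder evaluation step.
    print(f"evaluating {model_ref}")
    return 0.92


@dsl.pipeline(name="demo-train-eval")
def demo_pipeline(epochs: int = 5):
    trained = train(epochs=epochs)
    evaluate(model_ref=trained.output)


if __name__ == "__main__":
    # Compile to the IR YAML that the Data Science Pipelines runtime executes.
    compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")
```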
LLM Stack. The GenAI-specific tooling layered on top of the base platform: InstructLab as the fine-tuning workflow, the LAB methodology (Large-scale Alignment for chatBots) for synthetic-data generation, Granite as IBM’s open-source model family, the RHEL AI Inference Server for standalone serving, vLLM’s multi-LoRA support for serving many adapters on one base model, reference RAG patterns with pgvector or Milvus, and NeMo Guardrails for safety layering. This branch has been the platform’s center of gravity since 2023; it’s covered in depth in the main OpenShift AI post.
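To make the multi-LoRA point concrete: vLLM’s OpenAI-compatible server selects a LoRA adapter by the `model` field of the request, so many fine-tunes share one base model in GPU memory. A hedged sketch; the route URL and adapter name are made up, and the server is assumed to have been started with the adapters registered via `--lora-modules`:

```python
# Query a vLLM OpenAI-compatible endpoint, picking a LoRA adapter per request.
import requests

BASE_URL = "http://vllm.example.com/v1"  # hypothetical route to the model server


def complete(prompt: str, adapter: str) -> str:
    # The `model` field names the adapter; the base model stays loaded once.
    resp = requests.post(
        f"{BASE_URL}/completions",
        json={"model": adapter, "prompt": prompt, "max_tokens": 64},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]


print(complete("Summarize this support ticket: ...", adapter="support-adapter"))
```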
Architecture. The OpenShift-level foundations that the AI platform rides on. The OpenShift AI Operator orchestrates everything; Knative Serverless gives KServe its autoscale-to-zero behavior; Service Mesh handles inference traffic routing (including canary deployments); Tekton powers OpenShift Pipelines for CI/CD; Argo Workflows backs the Data Science Pipelines runtime; Authorino handles OAuth / OIDC for the dashboard and inference endpoints. Plus the Logging and Monitoring stacks every OCP cluster gets.
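The CR-driven design is easiest to see in code: deploying a model means creating a KServe InferenceService object, and `minReplicas: 0` is the field Knative turns into scale-to-zero. A sketch using the Kubernetes Python client; the namespace, model name, and storage URI are hypothetical:

```python
# Create a KServe InferenceService as a custom resource.
from kubernetes import client, config

config.load_kube_config()

isvc = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "churn-model", "namespace": "demo-project"},
    "spec": {
        "predictor": {
            "minReplicas": 0,  # let Knative scale the predictor to zero when idle
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "s3://models/churn/v1",  # placeholder location
            },
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="demo-project",
    plural="inferenceservices",
    body=isvc,
)
```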
GPUs & Accelerators. Hardware enablement is operationally non-trivial; it gets its own branch. NVIDIA GPU Operator is the dominant case (H100, B200, L40S, A100, T4); AMD ROCm Operator covers MI300X / MI250; Intel Gaudi Operator handles Habana Gaudi 2/3. Node Feature Discovery labels nodes so workloads schedule correctly. MIG (Multi-Instance GPU) and time-slicing let you share an H100 across multiple smaller jobs — important for the economics of inference workloads that don’t saturate a full GPU.
MLOps Lifecycle. The end-to-end model lifecycle: experiment tracking, model registry, pipelines for training, KServe for deployment, then the day-2 concerns — drift detection, bias detection (TrustyAI), explainability (LIME / SHAP via TrustyAI), A/B testing via Service Mesh traffic splitting. The lifecycle-as-CRs framing is what differentiates OpenShift AI from a Notebook-as-a-service product: every step is a Kubernetes object you can version, audit, and operate.
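One concrete mechanism for the traffic-splitting step is KServe’s canaryTrafficPercent field: bump the model version and route a slice of inference traffic to the new revision, with the split enforced by the mesh / Knative layer underneath. A sketch that patches the hypothetical InferenceService from the Architecture branch:

```python
# Shift 10% of traffic to a candidate model revision via a CR patch.
from kubernetes import client, config

config.load_kube_config()

patch = {
    "spec": {
        "predictor": {
            "canaryTrafficPercent": 10,  # 10% to the new revision, 90% to the old
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "s3://models/churn/v2",  # the candidate model
            },
        }
    }
}

client.CustomObjectsApi().patch_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="demo-project",
    plural="inferenceservices",
    name="churn-model",
    body=patch,
)
```

Promote by raising canaryTrafficPercent (or removing it); roll back by setting it to 0. Every step is a versionable Kubernetes object, which is the lifecycle-as-CRs point.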
Integrations. What OpenShift AI connects to in the broader Red Hat ecosystem. OpenShift GitOps deploys model serving across fleets via ApplicationSet. OpenShift Pipelines (Tekton) drives image builds for workbench images and serving containers. RHACM handles multi-cluster model deployment with the pull model. RHACS scans the resulting containers. Service Mesh / Serverless are the runtime substrate for inference. RHEL AI is the standalone single-node spinoff (Granite plus InstructLab on RHEL) for hosts that don’t run OpenShift. Red Hat Connectivity Link is the API gateway story.
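For the GitOps piece, the usual pattern is an Argo CD ApplicationSet whose cluster generator stamps out one Application per cluster in the fleet, each syncing the same model-serving manifests. A hedged sketch; the repo URL, path, and namespaces are placeholders:

```python
# Argo CD ApplicationSet (as a Python dict) for fleet-wide model serving.
# Apply it with CustomObjectsApi, as in the InferenceService sketch above
# (group="argoproj.io", version="v1alpha1", plural="applicationsets").
applicationset = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "ApplicationSet",
    "metadata": {"name": "model-serving-fleet", "namespace": "openshift-gitops"},
    "spec": {
        # The cluster generator yields {{name}} / {{server}} for every
        # cluster registered with Argo CD.
        "generators": [{"clusters": {}}],
        "template": {
            "metadata": {"name": "serving-{{name}}"},
            "spec": {
                "project": "default",
                "source": {
                    "repoURL": "https://git.example.com/ml/serving-config.git",
                    "targetRevision": "main",
                    "path": "inference-services",
                },
                "destination": {"server": "{{server}}", "namespace": "demo-project"},
                "syncPolicy": {"automated": {"prune": True}},
            },
        },
    },
}
```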
Deployment Models. Where OpenShift AI actually runs. Self-managed OCP on-prem is still the most common deployment model. ROSA (AWS), ARO (Azure), and OpenShift Dedicated (GCP) are the cloud-managed variants. Red Hat OpenShift on IBM Cloud gets a specific mention given IBM’s ownership of Red Hat. Disconnected / air-gapped is supported and used in regulated industries. Hosted Control Planes (HyperShift) is the multi-tenant economics fix for dense fleets. Edge / far-edge is the telco-flavored scenario.
Use Cases. What people actually build. Classical predictive ML (regression, classification, fraud, churn) is still the largest category by deployment count. LLM fine-tuning and LLM serving are the fastest-growing. RAG apps and AI agents are the application layer on top of those. Computer vision (industrial inspection, medical imaging) is a stable specialty. Traditional NLP, time series, and recommendation systems round out the catalog. The platform supports all nine, with varying levels of opinionated tooling per use case.
How to use this map
The map is a vocabulary tour and gap analysis tool, not a recommendation:
- Orientation. When a colleague says “we use TrustyAI for bias monitoring,” the map shows you that’s a Components-branch tool with MLOps-branch relevance.
- Gap analysis. Compare your current OpenShift AI deployment against the map. Branches where you have nothing running are either gaps or deliberate scope reductions.
- Adoption planning. New OpenShift AI adopters typically start in Components (Workbench + DS Pipelines + KServe), add GPUs for serious training, add LLM Stack when GenAI use cases show up, and grow outward through MLOps and Integrations as the practice matures.
What the map deliberately omits
This is the 8-branch view of OpenShift AI specifically. The map doesn’t show:
- The broader Kubeflow ecosystem — Notebooks, Pipelines, and the Training Operator are inside OpenShift AI; other Kubeflow components (Katib, for example) are not, and aren’t shown.
- Third-party tools that integrate — Weights & Biases, MLflow, LangChain, Hugging Face Hub all work with OpenShift AI but aren’t part of the platform. The AI/ML landscape map covers those.
- Internal sub-architecture — each leaf could itself be expanded into a sub-mind-map. KServe alone has 8-10 distinct sub-components.
- Pricing / licensing tiers — orthogonal to the technical mind map.
If you want depth on any branch, the main OpenShift AI post walks through the lifecycle in narrative form. This mind map is the visual index.
The trap
The hardest thing about a platform with this many components is the temptation to enable all of them. A team that turns on Workbench + DS Pipelines + Ray + KServe + TrustyAI + InstructLab + Multi-LoRA + Service Mesh routing + A/B testing on day one ends up operating nine things instead of shipping one model. The platform is designed so that each branch can be adopted independently — start with the Components branch’s first three (Workbench → DS Pipeline → KServe), get one end-to-end model running, then expand outward only when the absence of a specific capability is what’s blocking you.
The map is comprehensive on purpose. Your initial adoption shouldn’t be.