Benchmarks

How Sabre performs on real Kubernetes tasks with open-source models via Ollama.

Results are from k8s-ai-bench, a benchmark suite of 24 real-world Kubernetes tasks spanning debugging (8 tasks), configuration (10 tasks), and deployments (6 tasks). All models run locally via Ollama, with no API keys and no cloud services.

Pass@1

Each task is attempted once. The score is the share of tasks the model gets right on the first try, which is the closest match to day-to-day usage.

Pass@5

Each task is attempted up to 5 times. If any attempt succeeds, the task counts as a pass. This measures the model's capability ceiling when retries are allowed.
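The two definitions above can be sketched in a few lines of Python. This is only an illustration of the scoring rule as described here, not k8s-ai-bench's actual implementation; the `pass_at_k` helper, the attempt-log layout, and the task names are all hypothetical.

```python
def pass_at_k(attempts_by_task, k):
    """Fraction of tasks with at least one success among the first k attempts."""
    if not attempts_by_task:
        raise ValueError("no tasks to score")
    passed = sum(
        1 for attempts in attempts_by_task.values() if any(attempts[:k])
    )
    return passed / len(attempts_by_task)


# Hypothetical attempt logs: one list of success flags per task, in attempt order.
attempts = {
    "task-a": [True, True, True, True, True],
    "task-b": [False, False, True, False, False],
    "task-c": [False, False, False, False, False],
    "task-d": [True, False, True, True, False],
}

print(f"Pass@1: {pass_at_k(attempts, 1):.1%}")  # Pass@1: 50.0%
print(f"Pass@5: {pass_at_k(attempts, 5):.1%}")  # Pass@5: 75.0%
```

Pass@5 can never be lower than Pass@1 under this rule, because the first attempt is always among the five that are counted.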

devstral-small-2:24b (open source)

Pass@1: 70.8%

Pass@5: 91.7%

Debugging (8 tasks)

✓ fix-crashloop
✓ fix-image-pull
✓ fix-pending-pod
✗ fix-probes
✓ fix-service-routing
✓ fix-service-with-no-endpoints
✓ fix-rbac-wrong-resource
✓ debug-app-logs

Configuration (10 tasks)

✗ create-network-policy
✓ create-pod
✓ create-pod-mount-configmaps
✓ create-pod-resources-limits
✗ create-simple-rbac
✓ horizontal-pod-autoscaler
✓ list-images-for-pods
✗ multi-container-pod-communication
✓ resize-pvc
✗ setup-dev-cluster

Deployments (6 tasks)

✗ create-canary-deployment
✓ deployment-traffic-switch
✓ rolling-update-deployment
✓ scale-deployment
✓ scale-down-deployment
✗ statefulset-lifecycle

qwen3-coder-30b (open source)

Pass@1: 83.3%

Pass@5: 91.7%

Debugging (8 tasks)

✓ fix-crashloop
✓ fix-image-pull
✓ fix-pending-pod
✓ fix-probes
✗ fix-service-routing
✓ fix-service-with-no-endpoints
✓ fix-rbac-wrong-resource
✓ debug-app-logs

Configuration (10 tasks)

✓ create-network-policy
✓ create-pod
✗ create-pod-mount-configmaps
✓ create-pod-resources-limits
✓ create-simple-rbac
✓ horizontal-pod-autoscaler
✓ list-images-for-pods
✗ multi-container-pod-communication
✓ resize-pvc
✗ setup-dev-cluster

Deployments (6 tasks)

✓ create-canary-deployment
✓ deployment-traffic-switch
✓ rolling-update-deployment
✓ scale-deployment
✓ scale-down-deployment
✓ statefulset-lifecycle

qwen3.6:35b-a3b (open source)

Pass@1: 83.3%

Pass@5: 95.8%

Debugging (8 tasks)

✓ fix-crashloop
✓ fix-image-pull
✓ fix-pending-pod
✓ fix-probes
✓ fix-service-routing
✗ fix-service-with-no-endpoints
✗ fix-rbac-wrong-resource
✓ debug-app-logs

Configuration (10 tasks)

✓ create-network-policy
✓ create-pod
✓ create-pod-mount-configmaps
✓ create-pod-resources-limits
✓ create-simple-rbac
✓ horizontal-pod-autoscaler
✓ list-images-for-pods
✗ multi-container-pod-communication
✓ resize-pvc
✗ setup-dev-cluster

Deployments (6 tasks)

✓ create-canary-deployment
✓ deployment-traffic-switch
✓ rolling-update-deployment
✓ scale-deployment
✓ scale-down-deployment
✓ statefulset-lifecycle
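As a cross-check, the headline Pass@1 figures can be recomputed from the per-task marks above. The sketch below assumes the ✓/✗ marks are the first-attempt (Pass@1) results, which the page does not state outright but which the counts are consistent with. The per-category tallies are counted by hand from the lists, and the `overall_rate` helper is hypothetical.

```python
# Per-category (passed, total) counts, tallied from the ✓/✗ marks listed above.
results = {
    "devstral-small-2:24b": {
        "Debugging": (7, 8), "Configuration": (6, 10), "Deployments": (4, 6),
    },
    "qwen3-coder-30b": {
        "Debugging": (7, 8), "Configuration": (7, 10), "Deployments": (6, 6),
    },
    "qwen3.6:35b-a3b": {
        "Debugging": (6, 8), "Configuration": (8, 10), "Deployments": (6, 6),
    },
}


def overall_rate(categories):
    """Overall pass rate in percent across all categories of one model."""
    passed = sum(p for p, _ in categories.values())
    total = sum(t for _, t in categories.values())
    return 100.0 * passed / total


for model, categories in results.items():
    print(f"{model}: {overall_rate(categories):.1f}%")
# devstral-small-2:24b: 70.8%
# qwen3-coder-30b: 83.3%
# qwen3.6:35b-a3b: 83.3%
```

The two Qwen models tie at 20 of 24 tasks on the first attempt but fail different tasks. Only multi-container-pod-communication and setup-dev-cluster are marked failed for all three models.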

Run Your Own Benchmarks

Want to test a different model or validate results on your hardware? The benchmark suite is open source and easy to run.

See sabre-k8s-ai-bench and the k8s-ai-bench documentation.
