Sabre

Benchmarks

How Sabre performs on real Kubernetes tasks with open-source models via Ollama.

Results are from k8s-ai-bench, a benchmark suite of 24 real-world Kubernetes tasks spanning debugging, configuration, and deployments. All models run locally via Ollama -- no API keys, no cloud services.

Pass@1

Each task is attempted once. The score is the share of tasks the model gets right on the first try -- the most realistic measure of day-to-day usage.

Pass@5

Each task is attempted up to 5 times. If any attempt succeeds, the task counts as a pass. This measures the model's capability ceiling with retries.
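As a rough illustration of the two metrics, the sketch below computes Pass@1 and Pass@5 from per-task attempt outcomes. It is not the k8s-ai-bench implementation: the `pass_at_k` helper, the data layout, and the outcomes in `example` are all hypothetical and made up for the example. Only the task names come from the list on this page.

```python
from typing import Dict, List


def pass_at_k(attempts: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks with at least one success among the first k attempts.

    `attempts` maps a task name to its attempt outcomes in order
    (True = success). Hypothetical layout, assumed for this sketch.
    """
    if not attempts:
        raise ValueError("no tasks to score")
    passed = sum(1 for outcomes in attempts.values() if any(outcomes[:k]))
    return passed / len(attempts)


# Made-up outcomes for illustration only; these are NOT real benchmark results.
# Retries stop at the first success, so the lists have different lengths.
example = {
    "fix-crashloop": [True],
    "fix-probes": [False, False, True],
    "resize-pvc": [False, True],
    "setup-dev-cluster": [False, False, False, False, False],
}

print(f"Pass@1: {pass_at_k(example, 1):.1%}")
print(f"Pass@5: {pass_at_k(example, 5):.1%}")
```

If the reported scores are plain pass counts over the 24 tasks, which is an assumption, then a Pass@1 of 70.8% would correspond to 17 of 24 tasks and 83.3% to 20 of 24.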

devstral-small-2:24b (Open Source)
Pass@1: 70.8%
Pass@5: 91.7%
Debugging (8 tasks)
fix-crashloop
fix-image-pull
fix-pending-pod
fix-probes
fix-service-routing
fix-service-with-no-endpoints
fix-rbac-wrong-resource
debug-app-logs
Configuration (10 tasks)
create-network-policy
create-pod
create-pod-mount-configmaps
create-pod-resources-limits
create-simple-rbac
horizontal-pod-autoscaler
list-images-for-pods
multi-container-pod-communication
resize-pvc
setup-dev-cluster
Deployments (6 tasks)
create-canary-deployment
deployment-traffic-switch
rolling-update-deployment
scale-deployment
scale-down-deployment
statefulset-lifecycle
qwen3-coder-30b (Open Source)
Pass@1: 83.3%
Pass@5: 91.7%
Debugging (8 tasks)
fix-crashloop
fix-image-pull
fix-pending-pod
fix-probes
fix-service-routing
fix-service-with-no-endpoints
fix-rbac-wrong-resource
debug-app-logs
Configuration (10 tasks)
create-network-policy
create-pod
create-pod-mount-configmaps
create-pod-resources-limits
create-simple-rbac
horizontal-pod-autoscaler
list-images-for-pods
multi-container-pod-communication
resize-pvc
setup-dev-cluster
Deployments (6 tasks)
create-canary-deployment
deployment-traffic-switch
rolling-update-deployment
scale-deployment
scale-down-deployment
statefulset-lifecycle
qwen3.6:35b-a3b (Open Source)
Pass@1: 83.3%
Pass@5: 95.8%
Debugging (8 tasks)
fix-crashloop
fix-image-pull
fix-pending-pod
fix-probes
fix-service-routing
fix-service-with-no-endpoints
fix-rbac-wrong-resource
debug-app-logs
Configuration (10 tasks)
create-network-policy
create-pod
create-pod-mount-configmaps
create-pod-resources-limits
create-simple-rbac
horizontal-pod-autoscaler
list-images-for-pods
multi-container-pod-communication
resize-pvc
setup-dev-cluster
Deployments (6 tasks)
create-canary-deployment
deployment-traffic-switch
rolling-update-deployment
scale-deployment
scale-down-deployment
statefulset-lifecycle

Run Your Own Benchmarks

Want to test a different model or validate results on your hardware? The benchmark suite is open source and easy to run.
