Kubesimplify Blog

Deep dives on Kubernetes, AI infrastructure, GitOps, and the cloud-native stack, written by practitioners.

Latest

kubernetes · gpu · Jul 23, 2026

How to Share GPUs in Kubernetes at Scale with HAMi (Software vGPU Slicing)

Share NVIDIA GPUs in Kubernetes with HAMi software vGPU slicing: memory and compute limits, Helm configuration, a verified PyTorch manifest, a real RTX PRO 6000 OOM test, and Prometheus monitoring.

Shubham Katara & Saiyam Pathak · 34 min

kubernetes · gpu · Jul 20, 2026

Slicing GPUs in Kubernetes with NVIDIA Multi-Instance GPU (MIG)

GPU sharing in Kubernetes explained: time-slicing vs MPS vs MIG, every nvidia-smi command to enable and disable MIG on one GPU or eight, GPU Operator automation, pitfalls, and DCGM monitoring.

Shubham Katara & Saiyam Pathak · 45 min

nvidia · dgxspark · Jul 17, 2026

Day 5: Local LLM Inference Engines, Wrappers, and What to Pick

A beginner-friendly guide to local LLM inference, with the same Qwen model tested through Ollama, llama.cpp, Docker Model Runner, vLLM, SGLang, and TensorRT-LLM on NVIDIA DGX Spark.

Saiyam Pathak · 56 min

Series

7 parts

7 Days of Docker

A week-long journey from your first docker run to production-ready containers.

by Saloni Narang

5 parts

7 Days of Local LLM

Running serious local LLMs hands-on on NVIDIA DGX Spark, from unboxing to 120B-parameter models.

by Saiyam Pathak

2 parts

agentgateway on Kubernetes

Put your AI agents behind a gateway: lock down their tools, then see what they cost.

by Shubham Katara

Topic hubs

Kubernetes

Deep dives on the world's most-deployed orchestrator.

Docker & Containers

Container internals, image building, and developer workflows.

AI & ML on Cloud Native

Running AI/ML workloads on Kubernetes and modern infrastructure.

DevOps & Platform

CI/CD, GitOps, IaC, and the platform-engineering playbook.

Cloud Native Security

Hardening containers, Kubernetes, and the supply chain.

Linux Fundamentals

The OS underneath every container, cluster, and cloud.

Browse by tag

kubernetes (97) · devops (71) · docker (31) · k8s (27) · linux (19) · containers (17) · cloud (16) · aws (12) · llm (11) · cloud-native (11) · security (11) · nvidia (10)

Newsletter

Kubesimplify Diaries

Subscribe to get our latest Kubernetes, AI infra, and cloud-native articles delivered to your inbox.

Subscribe on Substack

All posts

192 posts · page 1 of 13

ai · July 16, 2026

Bonsai 27B on RTX PRO 6000 vs DGX Spark: what actually works

Real Bonsai 27B benchmarks on an RTX PRO 6000 and a DGX Spark, including the supported llama.cpp setup, ternary vs 1-bit results, and speculative decoding.

Saiyam Pathak

kubernetes · July 6, 2026

Introducing kiac: Real Kubernetes Nodes on Your Mac, Each Its Own Lightweight VM

kiac runs local Kubernetes on macOS where every node is its own lightweight VM via apple/container: kubeadm or k3s flavors, Cilium on a custom kernel, built-in LoadBalancer, Grafana, Gateway API, and clusters that survive reboots.

Saiyam Pathak

kubernetes · June 30, 2026

LLM Costs and Observability with agentgateway on Kubernetes (Part 2)

Part 2: scrape agentgateway with Prometheus, build a Grafana dashboard of token cost and per-tool usage, see blocked tool calls, and alert on spend.

Shubham Katara

kubernetes · June 29, 2026

Controlling MCP Tools with agentgateway on Kubernetes (Part 1)

Run AI agents behind agentgateway on Kubernetes: route their LLM and MCP tool calls through one proxy, keep secrets out of the agent, and block tools by policy.

Shubham Katara

nvidia · June 10, 2026

Day 4: Quantization Demystified. BF16, FP8, NVFP4, MXFP4, INT4, GGUF, and Why It All Matters

A practical, beginner-friendly guide to BF16, FP8, NVFP4, MXFP4, INT4, and GGUF Q4_K_M on NVIDIA DGX Spark. Bytes per parameter, quality vs size, and which format to pick when.

Saiyam Pathak

nvidia · June 5, 2026

Day 3: The DGX Spark Unpacked. GB10, Unified Memory, sm_121, and the One Reason This Hardware Exists

A practical teardown of NVIDIA DGX Spark's GB10 Grace Blackwell Superchip, unified memory, sm_121, NVFP4 tensor cores, memory reporting, and decode limits.

Saiyam Pathak

wandler · June 3, 2026

Wandler: Local OpenAI-Compatible Inference With Transformers.js and WebGPU

A practical Wandler deep dive with a local M1 Max WebGPU demo, real latency numbers, architecture diagrams, and getting-started commands.

Saiyam Pathak

mlx · May 29, 2026

mlxcel: A Rust-Native Inference Engine for Apple Silicon, Tested on My M1 Max

Day-one deep dive into mlxcel v0.1.0, a Rust-native MLX inference engine. Real M1 Max benchmarks vs mlx-lm and Ollama on Llama 3.2 3B and Qwen 2.5 7B, with architecture diagrams and an honest take on TurboQuant.

Saiyam Pathak

nvidia · May 27, 2026

Day 2: Anatomy of an LLM Inference Request. From Prompt to Answer, Step by Step

A beginner-friendly walkthrough of tokenization, prefill, KV cache, decode, batching, TTFT, and why memory bandwidth shapes local LLM performance on NVIDIA DGX Spark.

Saiyam Pathak

kubernetes · May 25, 2026

How kube-proxy Actually Works: iptables, IPVS, and nftables Inside Out

How kube-proxy turns Kubernetes Services into kernel rules. iptables, IPVS, nftables packet paths and which to pick in 2026. Verified against k/k 1.36 source.

Saiyam Pathak

nvidia · May 25, 2026

Day 1: The Local LLM Revolution. Why Your Desk Just Became the New Datacenter

Why local LLMs are becoming practical in 2026, what changed across open weights, hardware, and inference software, and why DGX Spark makes the desk feel like a small AI lab.

Saiyam Pathak

kubernetes · May 11, 2026

How Kubernetes EndpointSlices Actually Work (and Why Endpoints Had to Die)

A Service has no pod IPs in it. We covered that in the last post. So somewhere, something is keeping a list of every pod IP that matches the Service's…

Saiyam Pathak