Controlling MCP Tools with agentgateway on Kubernetes (Part 1)
Run AI agents behind agentgateway on Kubernetes: route their LLM and MCP tool calls through one proxy, keep secrets out of the agent, and block tools by policy.
Deep dives on Kubernetes, AI infrastructure, GitOps, and the cloud-native stack, written by practitioners.

Run AI agents behind agentgateway on Kubernetes: route their LLM and MCP tool calls through one proxy, keep secrets out of the agent, and block tools by policy.
Shubham Katara · 22 min
A practical, beginner-friendly guide to BF16, FP8, NVFP4, MXFP4, INT4, and GGUF Q4_K_M on NVIDIA DGX Spark. Bytes per parameter, quality vs size, and which format to pick when.
Saiyam Pathak · 28 min
A practical teardown of NVIDIA DGX Spark's GB10 Grace Blackwell Superchip, unified memory, sm_121, NVFP4 tensor cores, memory reporting, and decode limits.
Saiyam Pathak · 19 min
Deep dives on the world's most-deployed orchestrator.
Container internals, image building, and developer workflows.
Running AI/ML workloads on Kubernetes and modern infrastructure.
CI/CD, GitOps, IaC, and the platform-engineering playbook.
Hardening containers, Kubernetes, and the supply chain.
The OS underneath every container, cluster, and cloud.
A practical Wandler deep dive with a local M1 Max WebGPU demo, real latency numbers, architecture diagrams, and getting-started commands.
Day-one deep dive into mlxcel v0.1.0, a Rust-native MLX inference engine. Real M1 Max benchmarks vs mlx-lm and Ollama on Llama 3.2 3B and Qwen 2.5 7B, with architecture diagrams and an honest take on TurboQuant.
A beginner-friendly walkthrough of tokenization, prefill, KV cache, decode, batching, TTFT, and why memory bandwidth shapes local LLM performance on NVIDIA DGX Spark.
How kube-proxy turns Kubernetes Services into kernel rules. iptables, IPVS, nftables packet paths and which to pick in 2026. Verified against k/k 1.36 source.
Why local LLMs are becoming practical in 2026, what changed across open weights, hardware, and inference software, and why DGX Spark makes the desk feel like a small AI lab.
A Service has no pod IPs in it. We covered that in the last post. So somewhere, something is keeping a list of every pod IP that matches the Service's…
NVIDIA just open-sourced the full NVCF platform under Apache 2.0. Not a thin SDK, not a client library. The actual control plane, invocation plane,…
A pod gets created. It gets an IP. Then it dies. A new pod replaces it. New IP. Now imagine you have ten pods of the same app, and they restart all the…
Your container runs as root and has 18 CVEs. A Docker Captain's guide to hardening, Scout policies, DHI, Sandboxes, and what comes after Docker.
Pull AI models from Docker Hub, run them locally with GPU acceleration, and build an AI-powered app…
When you run gcloud container clusters get-credentials, the kubeconfig it writes looks innocent, until you hand it to a teammate and they hit: …or the…
7 Days of Docker in 2026 - From docker run Chaos to Declarative Stacks
Nobody types docker run with 15 flags in real life. I’ve been learning and working…