Day 2: Anatomy of an LLM Inference Request. From Prompt to Answer, Step by Step
A beginner-friendly walkthrough of tokenization, prefill, KV cache, decode, batching, TTFT, and why memory bandwidth shapes local LLM performance on NVIDIA DGX Spark.
Deep dives on Kubernetes, AI infrastructure, GitOps, and the cloud-native stack, written by practitioners.

A beginner-friendly walkthrough of tokenization, prefill, KV cache, decode, batching, TTFT, and why memory bandwidth shapes local LLM performance on NVIDIA DGX Spark.
Saiyam Pathak · 17 min
How kube-proxy turns Kubernetes Services into kernel rules. iptables, IPVS, nftables packet paths and which to pick in 2026. Verified against k/k 1.36 source.
Saiyam Pathak · 9 min
Why local LLMs are becoming practical in 2026, what changed across open weights, hardware, and inference software, and why DGX Spark makes the desk feel like a small AI lab.
Saiyam Pathak · 13 min
Deep dives on the world's most-deployed orchestrator.
Container internals, image building, and developer workflows.
Running AI/ML workloads on Kubernetes and modern infrastructure.
CI/CD, GitOps, IaC, and the platform-engineering playbook.
Hardening containers, Kubernetes, and the supply chain.
The OS underneath every container, cluster, and cloud.
181 posts · page 1 of 13
A Service has no pod IPs in it. We covered that in the last post. So somewhere, something is keeping a list of every pod IP that matches the Service's…
NVIDIA just open-sourced the full NVCF platform under Apache 2.0. Not a thin SDK, not a client library. The actual control plane, invocation plane,…
A pod gets created. It gets an IP. Then it dies. A new pod replaces it. New IP. Now imagine you have ten pods of the same app, and they restart all the…
Your container runs as root and has 18 CVEs. A Docker Captain's guide to hardening, Scout policies, DHI, Sandboxes, and what comes after Docker.
Pull AI models from Docker Hub, run them locally with GPU acceleration, and build an AI-powered app…
When you run gcloud container clusters get-credentials, the kubeconfig it writes looks innocent, until you hand it to a teammate and they hit: …or the…
7 Days of Docker in 2026: From docker run Chaos to Declarative Stacks. Nobody types docker run with 15 flags in real life. I’ve been learning and working…
How kube-scheduler picks a node: 13 framework stages, 14 Filter plugins, 9 Score plugins, live preemption demo.
7 Days of Docker in 2026: When Containers Need to Talk and Remember. On Day 3, you built production-ready images with Dockerfiles, optimized layers, and…
Stop writing Dockerfiles from scratch. A Docker Captain walks through docker init, layer caching, multi-stage builds, and docker debug for 2026.
Every step of what happens when you run kubectl run nginx on Kubernetes. From argv to etcd Raft, scheduler, CRI, CNI, runc, and PLEG.
7 Days of Docker (2026) - by Saloni Narang, Docker Captain & CNCF Ambassador. I'm a Docker Captain. I've seen hundreds of Docker tutorials explain images…