Wandler: Local OpenAI-Compatible Inference With Transformers.js and WebGPU
A practical Wandler deep dive with a local M1 Max WebGPU demo, real latency numbers, architecture diagrams, and getting-started commands.
Deep dives on Kubernetes, AI infrastructure, GitOps, and the cloud-native stack, written by practitioners.

A practical Wandler deep dive with a local M1 Max WebGPU demo, real latency numbers, architecture diagrams, and getting-started commands.
Saiyam Pathak · 20 min
Day-one deep dive into mlxcel v0.1.0, a Rust-native MLX inference engine. Real M1 Max benchmarks vs mlx-lm and Ollama on Llama 3.2 3B and Qwen 2.5 7B, with architecture diagrams and an honest take on TurboQuant.
Saiyam Pathak · 28 min
A beginner-friendly walkthrough of tokenization, prefill, KV cache, decode, batching, TTFT, and why memory bandwidth shapes local LLM performance on NVIDIA DGX Spark.
Saiyam Pathak · 26 min
Deep dives on the world's most-deployed orchestrator.
Container internals, image building, and developer workflows.
Running AI/ML workloads on Kubernetes and modern infrastructure.
CI/CD, GitOps, IaC, and the platform-engineering playbook.
Hardening containers, Kubernetes, and the supply chain.
The OS underneath every container, cluster, and cloud.
183 posts · page 1 of 13
How kube-proxy turns Kubernetes Services into kernel rules. iptables, IPVS, nftables packet paths and which to pick in 2026. Verified against k/k 1.36 source.
Why local LLMs are becoming practical in 2026, what changed across open weights, hardware, and inference software, and why DGX Spark makes the desk feel like a small AI lab.
A Service has no pod IPs in it. We covered that in the last post. So somewhere, something is keeping a list of every pod IP that matches the Service's…
NVIDIA just open-sourced the full NVCF platform under Apache 2.0. Not a thin SDK, not a client library. The actual control plane, invocation plane,…
A pod gets created. It gets an IP. Then it dies. A new pod replaces it. New IP. Now imagine you have ten pods of the same app, and they restart all the…
Your container runs as root and has 18 CVEs. A Docker Captain's guide to hardening, Scout policies, DHI, Sandboxes, and what comes after Docker.
"Pull AI models from Docker Hub, run them locally with GPU acceleration, and build an AI-powered app
When you run gcloud container clusters get-credentials, the kubeconfig it writes looks innocent until you hand it to a teammate and they hit: …or the…
7 Days of Docker in 2026: From docker run Chaos to Declarative Stacks. Nobody types docker run with 15 flags in real life. I’ve been learning and working…
How kube-scheduler picks a node: 13 framework stages, 14 Filter plugins, 9 Score plugins, live preemption demo.
7 Days of Docker in 2026: When Containers Need to Talk and Remember. On Day 3, you built production-ready images with Dockerfiles, optimized layers, and…
Stop writing Dockerfiles from scratch. A Docker Captain walks through docker init, layer caching, multi-stage builds, and docker debug for 2026.