Series · 5 parts

7 Days of Local LLM

Hands-on with serious local LLMs on NVIDIA DGX Spark, from unboxing to 120B-parameter models.

A seven-day series taking the NVIDIA DGX Spark from unboxing to production AI workloads. It covers SSH, networking, model serving, fine-tuning, and the practical infrastructure decisions you face when you actually own one.

Written by

Saiyam Pathak

01
Part 1 · May 25, 2026 · 13 min
Day 1: The Local LLM Revolution. Why Your Desk Just Became the New Datacenter
Why local LLMs are becoming practical in 2026, what changed across open weights, hardware, and inference software, and why DGX Spark makes the desk feel like a small AI lab.
02
Part 2 · May 27, 2026 · 26 min
Day 2: Anatomy of an LLM Inference Request. From Prompt to Answer, Step by Step
A beginner-friendly walkthrough of tokenization, prefill, KV cache, decode, batching, TTFT, and why memory bandwidth shapes local LLM performance on NVIDIA DGX Spark.
03
Part 3 · Jun 5, 2026 · 19 min
Day 3: The DGX Spark Unpacked. GB10, Unified Memory, sm_121, and the One Reason This Hardware Exists
A practical teardown of NVIDIA DGX Spark's GB10 Grace Blackwell Superchip, unified memory, sm_121, NVFP4 tensor cores, memory reporting, and decode limits.
04
Part 4 · Jun 10, 2026 · 28 min
Day 4: Quantization Demystified. BF16, FP8, NVFP4, MXFP4, INT4, GGUF, and Why It All Matters
A practical, beginner-friendly guide to BF16, FP8, NVFP4, MXFP4, INT4, and GGUF Q4_K_M on NVIDIA DGX Spark. Bytes per parameter, quality vs size, and which format to pick when.
05
Part 5 · Jul 17, 2026 · 56 min
Day 5: Local LLM Inference Engines, Wrappers, and What to Pick
A beginner-friendly guide to local LLM inference, with the same Qwen model tested through Ollama, llama.cpp, Docker Model Runner, vLLM, SGLang, and TensorRT-LLM on NVIDIA DGX Spark.
