A project by LMF
5 Topics

LLM Inference

Understand how large language models generate text efficiently — from KV caching to batching strategies and serving infrastructure.

Help Make This Better

This guide is open source. Got an idea for a new topic? Found a bug? Want to improve an explanation? Every contribution helps.

Request a Topic · Report a Bug · Star on GitHub
01. KV Cache: Store computed keys and values to avoid redundant work (Feb 9, 2026) [E]
02. Prompt Caching: Reuse computed KV caches across API requests to save cost and latency (Feb 19, 2026) [I]
03. Batching & Throughput: Process multiple requests simultaneously for higher throughput (Feb 9, 2026) [I]
04. Running Models Locally: Run LLMs on your own hardware for privacy, speed, and zero API costs (Feb 9, 2026) [B]
05. VRAM Calculator: Estimate VRAM requirements and inference speed for running local LLMs (Feb 27, 2026) [B]
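The VRAM estimates covered in the topics above come down to simple arithmetic: weight memory is parameter count times bytes per parameter, and the per-sequence KV cache is 2 (keys and values) × layers × KV heads × head dimension × sequence length × bytes per element. A back-of-envelope sketch, where the example model shape (7B parameters, 32 layers, 32 KV heads, head dim 128) is an illustrative assumption, not the guide's own calculator:

```python
def weights_vram_gib(params_billions: float, bytes_per_param: int = 2) -> float:
    """VRAM for model weights alone: parameter count x bytes per parameter.

    bytes_per_param: 2 for fp16/bf16, 1 for int8, 0.5 for 4-bit quantization.
    """
    return params_billions * 1e9 * bytes_per_param / 1024**3


def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 seq_len: int, bytes_per_elem: int = 2) -> float:
    """Per-sequence KV cache: 2 (K and V) x layers x kv_heads x head_dim x seq_len."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1024**3


# Illustrative 7B model in fp16 with a 4096-token context:
print(f"weights:  {weights_vram_gib(7):.1f} GiB")        # ≈ 13.0 GiB
print(f"KV cache: {kv_cache_gib(32, 32, 128, 4096):.1f} GiB per sequence")  # 2.0 GiB
```

Note that the KV cache grows linearly with both context length and the number of concurrent sequences, which is why batching and long contexts, not just model weights, dominate memory planning on a serving GPU.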