Temperature

Understanding how a single parameter controls the balance between predictable logic and creative randomness in AI outputs.

What is Temperature?

In LLMs, Temperature is a hyperparameter that scales the "logits" (raw scores) of the next token predictions before they are converted into probabilities. It essentially controls how much the model favors the most likely options versus exploring less likely ones.
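A minimal sketch of that scaling step in Python (the logit values below are made up purely for illustration):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Divide raw scores by T, then apply softmax to turn them into probabilities."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()            # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

# Hypothetical raw scores for three candidate next tokens
logits = [4.0, 2.5, 1.0]

print(softmax_with_temperature(logits, 0.2))   # sharp: the top token dominates
print(softmax_with_temperature(logits, 1.0))   # plain softmax, no rescaling
print(softmax_with_temperature(logits, 2.0))   # flatter: probabilities move closer together
```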

Low Temperature

Concentrates probability on the most likely tokens. Reliable, consistent, and factual. Great for code, math, and structured data.

High Temperature

Spreads probability to more tokens. Diverse, creative, and surprising. Great for stories, brainstorming, and poetry.


Interactive Distribution


Adjust the temperature slider to see how it reshapes the probability distribution for the next token. Watch how "the" (the most likely choice) dominates at low temperatures and loses its lead as the temperature rises.

Control Panel

Adjust the temperature slider, which runs from Deterministic through Balanced and Creative to Chaotic. It is shown here at 0.7 (Balanced).

Sample Prompt

"Once upon a time, there was..."

Probability Distribution

Next token probabilities

"the"
72.1%
"a"
20.6%
"one"
5.6%
"every"
1.5%
"some"
0.2%

Focused Sampling: Distribution is sharpened. The model samples randomly, but high-probability tokens are much more likely to be chosen.
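If you don't have the interactive demo in front of you, the same reshaping can be reproduced offline. The logits below are assumptions chosen only so the distribution loosely resembles the bars above; the point is how the shape changes as T moves:

```python
import numpy as np

tokens = ['"the"', '"a"', '"one"', '"every"', '"some"']
# Assumed illustrative logits, not real model output
logits = np.array([3.0, 2.0, 0.9, 0.0, -1.0])

for temperature in (0.2, 0.7, 1.5):
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    row = ", ".join(f"{t}: {p:.1%}" for t, p in zip(tokens, probs))
    print(f"T = {temperature}: {row}")
```

At T = 0.2 the top token takes nearly all of the mass, while at T = 1.5 the probabilities spread out noticeably.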

Live Completion

"Once upon a time, there was" "

Balanced: Natural mix of predictability and variety.

Why different outputs? The bars show probability, but the model doesn't always pick the tallest bar. It samples randomly—like rolling a weighted die. Higher temperature = more equal weights = more unpredictable rolls. Click "Regenerate" to sample again!
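A sketch of that weighted-die draw, using the probabilities from the bars above:

```python
import random

tokens = ['"the"', '"a"', '"one"', '"every"', '"some"']
probs = [0.721, 0.206, 0.056, 0.015, 0.002]   # probabilities from the demo above

# Each "Regenerate" is one draw from this weighted distribution,
# so repeated runs can land on different tokens.
for _ in range(5):
    print(random.choices(tokens, weights=probs, k=1)[0])
```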

How it Works Mathematically

The model generates a score for every possible token. To get probabilities, we use the Softmax function, modified by temperature:

P(x_i) = \frac{e^{z_i / T}}{\sum_{j} e^{z_j / T}}
When T → 0 (Low)

Dividing by a small T amplifies differences between scores. The highest logit dominates exponentially.

\lim_{T \to 0} P(x_{\text{max}}) = 1
When T → ∞ (High)

Dividing by a large T compresses all scores toward zero, making them nearly equal after exponentiation.

\lim_{T \to \infty} P(x_i) = \frac{1}{N}
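A quick numerical check of both limits, reusing the illustrative logits from earlier:

```python
import numpy as np

logits = np.array([3.0, 2.0, 0.9, 0.0, -1.0])

def probs(T):
    p = np.exp(logits / T)
    return p / p.sum()

print(probs(0.01))   # ~[1, 0, 0, 0, 0]: the top token takes essentially all the mass
print(probs(100.0))  # ~[0.2, 0.2, 0.2, 0.2, 0.2]: close to uniform 1/N for N = 5 tokens
```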

Practical Guidelines

Use Case | Temperature | Why?
Coding & Math | 0.0 - 0.2 | Errors in logic are costly; you want the most likely correct path.
Fact Retrieval | 0.1 - 0.4 | Reduces "hallucinations" by sticking to the most probable data points.
General Chat | 0.7 - 0.8 | The "sweet spot" for most models to sound natural and helpful.
Creative Writing | 1.0 - 1.2 | Encourages the model to use more interesting, varied vocabulary.
Brainstorming | 1.2 - 1.5 | Generates wild, unconventional ideas that might spark inspiration.
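In most libraries, temperature is simply a parameter on the generation call. As a rough sketch, here is how it might look with the Hugging Face transformers generate API (the model choice and prompt are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time, there was", return_tensors="pt")

# Low temperature: favor the most likely continuation (factual / structured tasks)
precise = model.generate(**inputs, do_sample=True, temperature=0.2, max_new_tokens=20)

# High temperature: sample more freely (creative writing / brainstorming)
creative = model.generate(**inputs, do_sample=True, temperature=1.2, max_new_tokens=20)

print(tokenizer.decode(precise[0], skip_special_tokens=True))
print(tokenizer.decode(creative[0], skip_special_tokens=True))
```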

Key Takeaways

  1. Temperature 0 is deterministic ("Greedy Search"): the model always picks the top token.
  2. Higher temperature increases variety and creativity but decreases coherence.
  3. Too high a temperature (> 1.5) often results in gibberish.
  4. Always match your temperature to the task's requirement for precision vs. creativity.