
LLM — Understanding the Randomness

The Generation

During training, an LLM learns patterns of which words tend to appear together and which do not. So at its core, LLM text generation is just assigning probabilities to the next token based on the previous context. But what are these probabilities?

Generation works by predicting one token at a time

When an LLM generates text, there are usually several candidate tokens from the vocabulary at each step, and each candidate has a certain likelihood of being chosen.
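As a rough sketch of that idea, the toy Python snippet below samples one next token from a handful of hypothetical candidates. The tokens and probabilities are made up for illustration; a real model scores every token in its vocabulary.

```python
import random

# Hypothetical candidates for the next token after "The sky is", with made-up
# probabilities; a real model assigns a probability to every token it knows.
candidates = {"blue": 0.70, "clear": 0.18, "falling": 0.07, "green": 0.05}

# Pick one token at random, weighted by its likelihood of being chosen.
tokens = list(candidates)
weights = list(candidates.values())
next_token = random.choices(tokens, weights=weights, k=1)[0]
print(next_token)  # usually "blue", but occasionally a less likely candidate
```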

Neural network internals

The Softmax

Mathematically, LLMs use the softmax function to transform logits into a probability distribution over the token vocabulary.

$$\sigma(z_i) = \frac{e^{z_i / T}}{\sum_{j=1}^{n} e^{z_j / T}}$$

Where $z_i$ is the logit for token $i$, $T$ is the temperature, and $n$ is the size of the vocabulary.

The softmax function ensures that each $\sigma(z_i)$ value lies in $(0, 1)$ and that the values across the vocabulary sum to 1.
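Here is a minimal Python sketch of the formula above, with temperature included. The logit values are arbitrary examples.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution, scaled by temperature."""
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.5, -1.0])
print(probs)       # each value is in (0, 1)
print(sum(probs))  # ~1.0
```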

The Temperature

Temperature controls the unpredictability of an LLM's output.

With a higher temperature, the model's output gets more creative and less predictable, as it amplifies the chances of selecting less probable tokens while reducing the chances of the more likely ones.

On the other hand, a lower temperature yields more cautious and predictable outputs: dividing the logits by a small $T$ magnifies the differences between them, so the softmax produces a sharper probability distribution.

How temperature affects probability distributions
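To see this effect numerically, the sketch below applies the same temperature softmax to one set of arbitrary example logits at a few temperatures.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    return [e / sum(exps) for e in exps]

# Arbitrary example logits; only their relative sizes matter.
logits = [2.0, 1.0, 0.5, -1.0]

for T in (0.5, 1.0, 2.0):
    probs = [round(p, 3) for p in softmax(logits, temperature=T)]
    print(f"T={T}: {probs}")

# Low T (0.5): the distribution sharpens and the top token dominates.
# High T (2.0): the distribution flattens and unlikely tokens gain probability.
```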

The Top-P

Top-P also helps control the randomness of an LLM's output. It sets a probability cutoff and keeps only the highest-ranked tokens whose cumulative probability adds up to or exceeds this threshold.

How Top-P selects the candidate tokens

Let's look at an example where an LLM predicts the next word in the sentence "The quick brown..."

The LLM might rank the top token choices by probability, with words like "fox", "dog", and "bear" among the most likely candidates.

If Top-P is set to 0.9, the LLM will only consider the highest-ranked tokens whose combined probabilities add up to at least 90%.

At this point, the LLM stops adding more tokens. It then randomly picks one among fox, dog, or bear for the next word.
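Below is a small Python sketch of this nucleus-filtering step for the "The quick brown..." example. The probabilities are made up for illustration, and the final pick is weighted by the surviving tokens' probabilities, which is the usual way this sampling is done.

```python
import random

# Hypothetical probabilities for the next word after "The quick brown...";
# the exact numbers are invented for this example.
candidates = [("fox", 0.60), ("dog", 0.20), ("bear", 0.12),
              ("cat", 0.05), ("car", 0.03)]

def top_p_filter(ranked, p=0.9):
    """Keep the highest-ranked tokens until their cumulative probability reaches p."""
    kept, total = [], 0.0
    for token, prob in ranked:  # ranked must be sorted by probability, high to low
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    return kept

nucleus = top_p_filter(candidates, p=0.9)
print([t for t, _ in nucleus])  # ['fox', 'dog', 'bear'] with these example numbers

# Sample the next word from the surviving tokens, weighted by their probabilities.
tokens, weights = zip(*nucleus)
print(random.choices(tokens, weights=weights, k=1)[0])
```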
