The Generation
During training, an LLM learns patterns of which words tend to appear together and which do not. So at its core, LLM text generation is just picking the next token based on probabilities computed from the previous context. But what are these probabilities?
When an LLM generates text, there are usually several candidate tokens in the vocabulary at each step, and each candidate has a certain likelihood of being chosen.
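To make this concrete, here is a minimal Python sketch of "picking the next token from a probability distribution"; the tokens and probabilities are made up for illustration:

```python
import random

# Hypothetical candidate tokens and their probabilities for the next position
candidates = {"fox": 0.5, "dog": 0.25, "bear": 0.15, "fish": 0.07, "cat": 0.03}

# Sample one token, weighted by its probability
tokens = list(candidates.keys())
probs = list(candidates.values())
next_token = random.choices(tokens, weights=probs, k=1)[0]
print(next_token)  # usually "fox", but occasionally a less likely token
```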
The Softmax
Mathematically, LLMs use the softmax function to transform logits into a probability distribution over the token vocabulary.
$$\text{softmax}(z_i) = \frac{e^{z_i / T}}{\sum_{j=1}^{V} e^{z_j / T}}$$

Where:
- $z$ is the given vector of logits
- $V$ is the vocabulary size
- $e$ is Euler's number
- $T$ is the temperature parameter

The softmax function ensures that each resulting value is between 0 and 1 and that all values sum to 1, forming a valid probability distribution.
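Here is a small sketch of this formula in Python, using NumPy and toy logits (the values are made up, not from a real model):

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Turn raw logits into a probability distribution over the vocabulary."""
    scaled = logits / temperature   # divide by T before exponentiating
    scaled -= scaled.max()          # subtract the max for numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])  # toy logits for a 4-token vocabulary
probs = softmax(logits)
print(probs, probs.sum())                 # each value in (0, 1), summing to 1.0
```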
The Temperature
Temperature controls the unpredictability of an LLM's output.
With a higher temperature, the model's output gets more creative and less predictable, because it amplifies the chances of selecting less probable tokens while reducing the chances of the most likely ones.
On the other hand, a lower temperature yields more cautious and predictable output: dividing the logits by a smaller temperature magnifies the differences between them, so the softmax produces a sharper probability distribution. You can see this with the sketch below.
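Reusing the `softmax` function (and the same toy logits) from the sketch above, the effect of temperature looks like this:

```python
# Assumes the softmax() function and numpy import defined earlier
logits = np.array([2.0, 1.0, 0.5, -1.0])

print(softmax(logits, temperature=0.5))  # sharper: mass concentrates on the top token
print(softmax(logits, temperature=1.0))  # the unmodified distribution
print(softmax(logits, temperature=2.0))  # flatter: unlikely tokens get a bigger share
```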
The Top-P
Top-P (also called nucleus sampling) is another way to control the randomness of an LLM's output. It sets a probability cutoff and keeps only the smallest set of top tokens whose cumulative probability reaches this threshold; the next token is then sampled from that set.
Let's look at an example where an LLM predicts the next word in the sentence "The quick brown..."
The LLM might rank the top token candidates like this:
- fox with a 0.5 probability
- dog with a 0.25 probability
- bear with a 0.15 probability
- fish with a 0.07 probability
- and other tokens with probability below 0.05
If Top-P is set to 0.9, the LLM will only consider tokens whose combined probabilities add up to at least 90%:
- Starting with fox: 50% so far
- Adding dog: now 75%
- Adding bear: total now reaches 90%
At this point, the LLM stops adding tokens. It then randomly picks the next word from among fox, dog, and bear, weighted by their probabilities.
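Here is a minimal sketch of this procedure; the probabilities are the toy values from the example above, and the function name is just for illustration:

```python
import numpy as np

def top_p_sample(probs: dict[str, float], p: float = 0.9) -> str:
    """Sample from the smallest set of tokens whose cumulative probability >= p."""
    # Sort candidates from most to least likely
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    # Keep adding tokens until the cumulative probability reaches the threshold
    nucleus, total = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        if total >= p:
            break

    # Renormalize within the nucleus and sample
    tokens = [t for t, _ in nucleus]
    weights = np.array([w for _, w in nucleus])
    weights /= weights.sum()
    return np.random.choice(tokens, p=weights)

probs = {"fox": 0.5, "dog": 0.25, "bear": 0.15, "fish": 0.07, "cat": 0.03}
print(top_p_sample(probs, p=0.9))  # samples only among fox, dog, and bear
```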