Entropy and uncertainty
Let $X$ be a random variable and $H(X)$ its entropy.
The larger the entropy, the more uncertain the random variable.
Example
Guessing the color of a ball drawn from an urn
Consider an urn containing 8 balls with the following distribution of colors. A single ball is drawn, and its color, modeled as a random variable $X$, must be guessed with the smallest number of yes/no questions.
| Color ($X$) | Probability |
|---|---|
| Red | $1/2$ |
| Green | $1/4$ |
| Blue | $1/8$ |
| Yellow | $1/8$ |
Technical detail
Strictly speaking, the mapping used in this example (where outcomes are “Red”, “Green”, etc.) is a categorical variable rather than a formal random variable.
In probability theory, a random variable is defined as a function that maps elements of the sample space $\Omega$ to the real numbers $\mathbb{R}$. Since “colors” are symbolic labels, treating $X$ as a formal r.v. requires a relabeling, where each color is assigned a real number (e.g., Red = 1, Green = 2, and so on).
The entropy of the random variable $X$ is by definition:
$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$
Substituting the probabilities from the table:
$$H(X) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{8}\log_2\tfrac{1}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8}\right) = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{3}{8} + \tfrac{3}{8} = 1.75 \text{ bits}$$
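As a quick numerical check, here is a minimal Python sketch (assuming the dyadic probabilities reconstructed above) that evaluates the entropy formula directly:

```python
import math

# Urn distribution: 8 balls -> 4 red, 2 green, 1 blue, 1 yellow
p = {"Red": 1/2, "Green": 1/4, "Blue": 1/8, "Yellow": 1/8}

# H(X) = -sum_x p(x) * log2(p(x))
H = -sum(px * math.log2(px) for px in p.values())
print(H)  # 1.75 (bits)
```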
Strategy
To minimize the average number of yes/no questions, the prior information regarding the probabilities of drawing a ball of a certain color should be exploited.
- Question 1: “Is the ball Red?” (Matches 50% of outcomes).
- Question 2: If not Red, “Is it Green?” (Matches 25% of outcomes).
- Question 3: If neither Red nor Green, “Is it Blue?” (Resolves the remaining 25%).
By tailoring the sequence of questions to the distribution, outcomes with higher probabilities are prioritized, thereby shortening the average decision path and minimizing the expected number of questions required.
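A minimal sketch of this strategy as a decision path; the helper `questions_asked` is a hypothetical name and simply counts how many questions the order listed above needs for each color:

```python
# Hypothetical helper: number of yes/no questions the strategy above asks for each color
def questions_asked(color):
    for n, guess in enumerate(["Red", "Green", "Blue"], start=1):
        if color == guess:
            return n          # a "yes" answer identifies the color after n questions
    return 3                  # three "no" answers leave only "Yellow"

for color in ["Red", "Green", "Blue", "Yellow"]:
    print(color, questions_asked(color))  # Red 1, Green 2, Blue 3, Yellow 3
```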

Let $N$ denote the random variable representing the number of yes/no questions asked.
Why is $N$ a r.v.?
$N$ is a random variable because it is a function of the r.v. $X$, the color of the ball drawn from the urn (i.e., the specific number of questions is uniquely determined by the outcome of the draw). Therefore $N$ represents a deterministic mapping from the sample space to the set of real numbers, and so it is a random variable.
| # of asked questions ($N$) | Probability |
|---|---|
| 1 (if red) | $1/2$ |
| 2 (if green) | $1/4$ |
| 3 (if blue or yellow) | $1/4$ |
The average number of questions required to identify the color is calculated by applying the definition of expectation of the discrete random variable $N$:
$$E[N] = \sum_{n} n \, P(N = n) = 1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{4} + 3 \cdot \tfrac{1}{4} = 1.75$$
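The same expectation, computed numerically under the distribution of $N$ given in the table:

```python
# Distribution of N (number of questions) from the table above
p_N = {1: 1/2, 2: 1/4, 3: 1/4}

# E[N] = sum_n n * P(N = n)
E_N = sum(n * p for n, p in p_N.items())
print(E_N)  # 1.75, equal to H(X) for this dyadic distribution
```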
Important
In general, the entropy of a r.v. $X$ is approximately equal to the average number of binary questions (yes/no questions; that is why the entropy is measured in bits) necessary to guess it. Therefore:
$$H(X) \approx \text{average number of yes/no questions needed to guess } X$$
Note
It should be noted that, while in this specific example $H(X)$ is exactly equal to the average number of questions needed to guess $X$, in the general case it can be proven that the entropy is the theoretical lower bound for this value.
Entropy and information
Let $X$ be a random variable and $H(X)$ its entropy.
The larger the entropy, the more informative the random variable.
Example
Storing the daily weather report
The daily weather report on a mountain must be stored on a device; only sunny, cloudy, rainy, and snowy are of interest. From previous measurements the weather is sunny $1/2$ of the time, cloudy $1/4$, rainy $1/8$ and snowy $1/8$. The goal is to use, on average, the smallest number of bits to store this information.
Let $X$ be the daily weather situation.
Technical detail
Strictly speaking, $X$ is not a random variable, since outcomes like “Sunny” or “Cloudy” are not numerical. However, it can be treated as such through a simple relabeling into real numbers (e.g., Sunny = 1, Cloudy = 2, and so on).
| Daily weather ($X$) | Probability |
|---|---|
| Sunny | $1/2$ |
| Cloudy | $1/4$ |
| Rainy | $1/8$ |
| Snowy | $1/8$ |
The entropy of the random variable $X$ is:
$$H(X) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{8}\log_2\tfrac{1}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8}\right) = 1.75 \text{ bits}$$
It can be proven that the best binary encoding is:
| Value ($X$) | Codeword |
|---|---|
| sunny | 0 |
| cloudy | 10 |
| rainy | 110 |
| snowy | 111 |
Why is this encoding strategy optimal?
The strategy employs Huffman coding, which is proven to be optimal: it solves a constrained optimization problem, minimizing the average codeword length (equivalent to the expected number of questions), subject to the constraint that the code is prefix-free, i.e., no codeword is the prefix of another. In this framework, the length of each binary codeword corresponds exactly to the number of yes/no questions required to identify the outcome along the decision path.
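A minimal Huffman construction sketch using Python's `heapq` (the function name `huffman` and the bookkeeping by symbol lists are illustrative choices, not a library API); for the assumed weather probabilities it reproduces the codeword lengths above, possibly with a different but equally optimal 0/1 labeling:

```python
import heapq
from itertools import count

def huffman(probs):
    """Return a prefix-free code (symbol -> codeword) built by Huffman's algorithm."""
    tiebreak = count()  # unique counter so equal probabilities never compare the symbol lists
    heap = [(p, next(tiebreak), [sym]) for sym, p in probs.items()]
    codes = {sym: "" for sym in probs}
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, syms0 = heapq.heappop(heap)  # merge the two least probable groups
        p1, _, syms1 = heapq.heappop(heap)
        for s in syms0:
            codes[s] = "0" + codes[s]
        for s in syms1:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (p0 + p1, next(tiebreak), syms0 + syms1))
    return codes

weather = {"sunny": 1/2, "cloudy": 1/4, "rainy": 1/8, "snowy": 1/8}
print(huffman(weather))  # e.g. {'sunny': '0', 'cloudy': '10', 'rainy': '110', 'snowy': '111'}
```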
Let $L$ denote the random variable representing the number of bits used to encode the daily weather situation.
| # of used bits ($L$) | Probability |
|---|---|
| 1 (if it’s sunny) | $1/2$ |
| 2 (if it’s cloudy) | $1/4$ |
| 3 (if it’s rainy or snowy) | $1/4$ |
The average number of bits used to store this information is calculated by applying the definition of expectation of the discrete random variable $L$:
$$E[L] = \sum_{\ell} \ell \, P(L = \ell) = 1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{4} + 3 \cdot \tfrac{1}{4} = 1.75 \text{ bits}$$
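The same average, computed from the codeword lengths and probabilities in the table:

```python
# Codeword length and probability for each weather outcome (sunny, cloudy, rainy, snowy)
length_and_prob = [(1, 1/2), (2, 1/4), (3, 1/8), (3, 1/8)]

# Average number of bits per stored report
avg_bits = sum(length * p for length, p in length_and_prob)
print(avg_bits)  # 1.75 bits, equal to H(X) for this dyadic distribution
```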
Important
In general, the entropy of a r.v. $X$ is approximately equal to the average number of bits (that is why it is measured in bits) necessary to describe/represent it. Therefore:
$$H(X) \approx \text{average number of bits needed to represent } X$$
Note
It should be noted that, while in this specific example $H(X)$ is exactly equal to the average number of bits needed to represent $X$, in the general case it can be proven that the entropy is the theoretical lower bound for this value.
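To see the gap, here is a small hypothetical example with non-dyadic probabilities $0.5, 0.3, 0.2$, for which Huffman coding yields codeword lengths $1, 2, 2$:

```python
import math

p = [0.5, 0.3, 0.2]     # non-dyadic probabilities (hypothetical example)
lengths = [1, 2, 2]     # optimal (Huffman) codeword lengths for this distribution

H = -sum(px * math.log2(px) for px in p)            # ~1.485 bits
avg_len = sum(l * px for l, px in zip(lengths, p))  # 1.5 bits

print(H, avg_len)  # H(X) <= average length, with equality only for dyadic probabilities
```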
Entropy-uncertainty-information
Let $X$ be a random variable and $H(X)$ its entropy.
It follows that:
$$\text{uncertainty about } X \;\approx\; \text{information carried by } X \;\approx\; H(X)$$
Important