Entropy and uncertainty

Let $X$ be a random variable and $H(X)$ its entropy.

The larger the entropy, the more uncertain the random variable.

Example

Guessing the color of a ball drawn from an urn

Consider an urn containing 8 balls with the following distribution of colors. A single ball is drawn, and its color, modeled as a random variable $X$, must be guessed with the smallest number of yes/no questions.

| Color ($X$) | Probability |
| --- | --- |
| Red | $1/2$ |
| Green | $1/4$ |
| Blue | $1/8$ |
| Yellow | $1/8$ |

The entropy of the random variable $X$ is by definition:

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$

Substituting the probabilities from the table:

$$H(X) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{8}\log_2\tfrac{1}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8}\right) = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{3}{8} + \tfrac{3}{8} = 1.75 \text{ bits}$$
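
As a sanity check, here is a short Python snippet (not part of the original notes) that evaluates this sum for the distribution assumed in the table above:

```python
from math import log2

# Assumed urn distribution: P(red) = 1/2, P(green) = 1/4, P(blue) = P(yellow) = 1/8
probs = {"red": 1/2, "green": 1/4, "blue": 1/8, "yellow": 1/8}

# Entropy in bits: H(X) = -sum_x p(x) * log2 p(x)
H = -sum(p * log2(p) for p in probs.values())
print(H)  # 1.75
```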

Strategy

To minimize the average number of yes/no questions, the prior information regarding the probabilities of drawing a ball of a certain color should be exploited.

  • Question 1: “Is the ball Red?” (Matches 50% of outcomes).
  • Question 2: If not Red, “Is it Green?” (Matches 25% of outcomes).
  • Question 3: If neither Red nor Green, “Is it Blue?” (Resolves the remaining 25%).

By tailoring the sequence of questions to the distribution, the most probable outcomes are resolved with the fewest questions, which minimizes the expected number of questions required.
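
The strategy can also be checked empirically. The following sketch (hypothetical code, assuming the distribution in the table above) simulates many draws and averages the number of questions the strategy asks:

```python
import random

# Assumed distribution, matching the table above
colors = ["red", "green", "blue", "yellow"]
weights = [1/2, 1/4, 1/8, 1/8]

def questions_needed(color):
    """Number of yes/no questions asked by the strategy above."""
    if color == "red":    # Q1: "Is the ball Red?" -> yes
        return 1
    if color == "green":  # Q2: "Is it Green?" -> yes
        return 2
    return 3              # Q3: "Is it Blue?" -> yes (blue) or no (yellow)

draws = random.choices(colors, weights=weights, k=100_000)
avg = sum(questions_needed(c) for c in draws) / len(draws)
print(avg)  # close to 1.75, i.e. close to H(X)
```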

Let $Y$ denote the random variable representing the number of yes/no questions asked.

| # of asked questions ($Y$) | Probability |
| --- | --- |
| 1 (if red) | $1/2$ |
| 2 (if green) | $1/4$ |
| 3 (if blue or yellow) | $1/4$ |

The average number of questions required to identify the color is calculated by applying the definition of expectation of the discrete random variable $Y$:

$$E[Y] = \sum_{y} y \, p(y) = 1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{4} + 3 \cdot \tfrac{1}{4} = 1.75$$
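
The same expectation can be computed directly from the table with a minimal sketch (the lists below simply transcribe the table):

```python
# Values of Y and their probabilities, taken from the table above
values = [1, 2, 3]
probs = [1/2, 1/4, 1/4]

# Expectation of a discrete random variable: E[Y] = sum_y y * p(y)
expected_questions = sum(y * p for y, p in zip(values, probs))
print(expected_questions)  # 1.75, equal to H(X) in this example
```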

Important

In general, the entropy of a r.v. is approximately equal to the average number of binary questions (yes/no questions; that is why the entropy is measured in bits) necessary to guess it. Therefore:

$$H(X) \approx E[\text{number of yes/no questions needed to guess } X]$$

Note

It should be noted that while in this specific example $H(X)$ is exactly equal to the average number of questions needed to guess $X$, in the general case, it can be proven that the entropy is the theoretical lower bound for this value, i.e. $H(X) \le E[Y]$.


Entropy and information

Let $X$ be a random variable and $H(X)$ its entropy.

The larger the entropy, the more informative the random variable.

Example

Storing a daily weather report

The daily weather report on a mountain must be stored on a device; only sunny, cloudy, rainy, and snowy are of interest. From previous measurements the weather is sunny $1/2$ of the time, cloudy $1/4$, rainy $1/8$ and snowy $1/8$. The goal is to use, on average, the smallest number of bits to store this information.

Let $X$ be the daily weather situation.

| Daily weather ($X$) | Probability |
| --- | --- |
| Sunny | $1/2$ |
| Cloudy | $1/4$ |
| Rainy | $1/8$ |
| Snowy | $1/8$ |

The entropy of the random variable $X$ is:

$$H(X) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{8}\log_2\tfrac{1}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8}\right) = 1.75 \text{ bits}$$

It can be proven that the best binary encoding is:

| value | codeword |
| --- | --- |
| sunny | 0 |
| cloudy | 10 |
| rainy | 110 |
| snowy | 111 |
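
As an illustration (not part of the original notes), here is a sketch of how this codebook could be used; the `encode`/`decode` helpers are hypothetical names, and the codewords are transcribed from the table. Because no codeword is a prefix of another, a bit stream can be decoded unambiguously:

```python
# Codebook transcribed from the table above
code = {"sunny": "0", "cloudy": "10", "rainy": "110", "snowy": "111"}

def encode(days):
    return "".join(code[d] for d in days)

def decode(bits):
    inverse = {v: k for k, v in code.items()}
    days, buffer = [], ""
    for b in bits:
        buffer += b
        if buffer in inverse:  # a complete codeword has been read
            days.append(inverse[buffer])
            buffer = ""
    return days

report = ["sunny", "sunny", "cloudy", "rainy", "sunny", "snowy"]
bits = encode(report)
print(bits)                    # 00101100111 (11 bits for 6 days)
print(decode(bits) == report)  # True
```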

Let $L$ denote the random variable representing the number of bits used to encode the daily weather situation.

| # of used bits ($L$) | Probability |
| --- | --- |
| 1 (if it’s sunny) | $1/2$ |
| 2 (if it’s cloudy) | $1/4$ |
| 3 (if it’s rainy or snowy) | $1/4$ |

The average number of bits used to store this information is calculated by applying the definition of expectation of the discrete random variable $L$:

$$E[L] = 1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{4} + 3 \cdot \tfrac{1}{4} = 1.75 \text{ bits}$$
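
As a minimal check, assuming the probabilities and codeword lengths from the tables above, the average codeword length can be compared with the entropy:

```python
from math import log2

# Assumed probabilities and codeword lengths from the tables above
p = {"sunny": 1/2, "cloudy": 1/4, "rainy": 1/8, "snowy": 1/8}
length = {"sunny": 1, "cloudy": 2, "rainy": 3, "snowy": 3}

avg_bits = sum(p[w] * length[w] for w in p)        # E[L]
entropy = -sum(q * log2(q) for q in p.values())    # H(X)
print(avg_bits, entropy)  # 1.75 1.75 -> E[L] = H(X) in this example
```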

Important

In general, the entropy of a r.v. is approximately equal to the average number of bits (that is why it is measured in bits) necessary to describe/represent it. Therefore:

$$H(X) \approx E[\text{number of bits needed to describe } X]$$

Note

It should be noted that while in this specific example $H(X)$ is exactly equal to the average number of bits needed to represent $X$, in the general case, it can be proven that the entropy is the theoretical lower bound for this value, i.e. $H(X) \le E[L]$.


Entropy-uncertainty-information

Let $X$ be a random variable and $H(X)$ its entropy.

It follows that:

Important

The larger the entropy $H(X)$, the more uncertain and the more informative the random variable $X$ is.