Definition
Let $X$ be a discrete random variable defined over the alphabet $\mathcal{X}$ and distributed according to the probability mass function $p(x) = \Pr\{X = x\}$, $x \in \mathcal{X}$.
The entropy $H(X)$ is defined as:

$$H(X) = -\sum_{\substack{x \in \mathcal{X} \\ p(x) > 0}} p(x) \log p(x) = \mathbb{E}\left[\log \frac{1}{p(X)}\right]$$
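The definition translates directly into a short computation. Below is a minimal Python sketch; the function name `entropy` and the sequence-of-probabilities interface are illustrative choices, not notation from the text.

```python
import math

def entropy(pmf, base=2):
    """Shannon entropy of a PMF given as a sequence of probabilities.
    Terms with p = 0 are skipped, matching the convention discussed below."""
    return sum(-p * math.log(p, base) for p in pmf if p > 0)

# A fair four-sided die carries 2 bits of entropy.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
```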
Mathematical Nuances and Notation
To ensure rigor while maintaining readability, this definition relies on specific conventions:
The Support and Singularity: The condition $p(x) > 0$ implicitly restricts the summation to the support of the distribution, defined as $\operatorname{supp}(p_X) = \{x \in \mathcal{X} : p(x) > 0\}$. This restriction circumvents the mathematical singularity of $\log p(x)$ for impossible events. Alternatively, the summation may cover the entire alphabet $\mathcal{X}$ by adopting the standard information-theoretic convention that $0 \log 0 = 0$ (justified by continuity, as $t \log t \to 0$ when $t \to 0^{+}$).
Dependency on the Distribution: Writing $H(X)$ is technically a convenient abuse of notation. Entropy is not determined by the specific values (labels) of $X$, but exclusively by its probability mass function (PMF). Strictly speaking, entropy is a functional of the distribution itself, making $H(p)$ or $H(p_X)$ the more rigorous notation (this label-independence is illustrated in the sketch following these remarks).
Explicit PMF Notation: The term $p(x)$ is used for brevity. Formally, this should be denoted as $p_X(x)$ to explicitly attribute the probability to the random variable $X$. This distinction becomes crucial when analyzing multiple random variables simultaneously (e.g., in joint entropy).
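These conventions can be checked concretely. The sketch below (with a hypothetical helper `entropy_bits` that accepts an iterable of probabilities) restricts the sum to the support, so an impossible outcome contributes nothing, and shows that relabeling the outcomes leaves the entropy unchanged.

```python
import math

def entropy_bits(probs):
    # Sum restricted to the support (p > 0): terms with p = 0 contribute
    # nothing, which is exactly the 0 log 0 = 0 convention.
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Two variables with different labels but the same PMF have the same
# entropy: H depends on the probabilities alone, never on the outcomes.
p_X = {"heads": 0.5, "tails": 0.5}
p_Y = {-1: 0.5, +1: 0.5}
print(entropy_bits(p_X.values()), entropy_bits(p_Y.values()))  # 1.0 1.0

# An impossible outcome in the alphabet does not change the entropy.
p_Z = {"a": 0.5, "b": 0.5, "c": 0.0}
print(entropy_bits(p_Z.values()))  # 1.0
```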
Change of Base and Units
The choice of the logarithmic base $b$ establishes the unit of information (e.g., bits for $b = 2$, nats for $b = e$, bans for $b = 10$). Conversion between these distinct units is facilitated by the logarithmic change-of-base identity:

$$\log_b x = \frac{\log_a x}{\log_a b} = (\log_b a)\,\log_a x$$

By substituting this identity into the definition of entropy and invoking the linearity of the expectation operator, the conversion formula is derived as follows:

$$H_b(X) = \mathbb{E}\left[\log_b \frac{1}{p(X)}\right] = (\log_b a)\,\mathbb{E}\left[\log_a \frac{1}{p(X)}\right] = (\log_b a)\,H_a(X)$$

Thus, entropy expressed in one unit is strictly a scalar multiple of entropy expressed in another (e.g., $H_e(X) = (\ln 2)\,H_2(X)$).
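The scalar relationship between units can be verified numerically. The sketch below, using illustrative helpers `entropy_nats` and `entropy_bits`, computes the same entropy in both units and confirms that they differ only by the factor $\ln 2$.

```python
import math

def entropy_nats(pmf):
    return sum(-p * math.log(p) for p in pmf if p > 0)

def entropy_bits(pmf):
    return sum(-p * math.log2(p) for p in pmf if p > 0)

pmf = [0.5, 0.25, 0.25]
h_nats = entropy_nats(pmf)
h_bits = entropy_bits(pmf)

# H in bits and H in nats differ only by the constant factor ln 2.
print(h_bits)                # 1.5
print(h_nats / math.log(2))  # 1.5 (up to floating-point rounding)
```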
Convention: In accordance with standard information-theoretic practice, the binary logarithm ($\log \equiv \log_2$) is adopted as the default for all subsequent analysis. Therefore, unless explicitly stated otherwise, entropy is quantified in bits.
Non-negativity of Entropy
The entropy of a discrete random variable is always non-negative:

$$H(X) \ge 0$$

This lower bound is established by analyzing the components of the definition:
- Bounded Probabilities: By definition, the probability mass function satisfies $0 < p(x) \le 1$ for all $x$ in the support of $X$.
- Positivity of Self-Information: Consequently, the argument of the logarithm, $\frac{1}{p(x)}$, lies within the interval $[1, \infty)$. Since the logarithm is a monotonically increasing function with $\log 1 = 0$, the variable representing self-information is non-negative: $\log \frac{1}{p(X)} \ge 0$.
- Monotonicity of Expectation: Finally, invoking the property that the expectation of a non-negative random variable is itself non-negative (i.e., if $Y \ge 0$, then $\mathbb{E}[Y] \ge 0$), the result is derived: $H(X) = \mathbb{E}\left[\log \frac{1}{p(X)}\right] \ge 0$.
Note
Equality (i.e., $H(X) = 0$) holds if and only if the random variable is deterministic (i.e., there exists an outcome $x_0 \in \mathcal{X}$ such that $p(x_0) = 1$).
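A quick numerical sanity check of both the bound and the equality condition, again using an illustrative `entropy_bits` helper:

```python
import math

def entropy_bits(pmf):
    return sum(-p * math.log2(p) for p in pmf if p > 0)

# Non-negativity: every PMF yields H(X) >= 0.
for pmf in ([0.5, 0.5], [0.9, 0.1], [0.7, 0.2, 0.1]):
    assert entropy_bits(pmf) >= 0

# Equality holds exactly for a deterministic variable.
print(entropy_bits([1.0, 0.0, 0.0]))  # 0.0
print(entropy_bits([0.99, 0.01]))     # ~0.0808 > 0
```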
Example: Entropy of a Bernoulli r.v.
Let $X \sim \mathrm{Bern}(p)$, i.e., $X$ is a Bernoulli random variable with $\Pr\{X = 1\} = p$ and $\Pr\{X = 0\} = 1 - p$. Its entropy is

$$H(X) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$

which, viewed as a function of $p$, is the binary entropy function.
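The resulting binary entropy function is easy to tabulate; the sketch below, with an illustrative helper `binary_entropy`, evaluates it at a few values of $p$.

```python
import math

def binary_entropy(p):
    """H(p) = -p log2(p) - (1 - p) log2(1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"p = {p:.2f}  H = {binary_entropy(p):.4f} bits")
# Entropy peaks at 1 bit for the fair coin (p = 0.5) and vanishes for
# the deterministic cases p = 0 and p = 1.
```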
