Recap: chain rule for probabilities
Let $X = (X_1, X_2, \dots, X_n)$ be a discrete random vector (or a sequence of discrete random variables). The Chain Rule for probabilities provides a method to factorize the joint probability mass function into a product of conditional probabilities.
By recursively applying the definition of conditional probability, $p(x, y) = p(x)\, p(y \mid x)$, the following expansion is obtained:
$$p(x_1, x_2, \dots, x_n) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2) \cdots p(x_n \mid x_1, \dots, x_{n-1})$$
This recursive decomposition can be expressed concisely using product notation:
$$p(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_1, \dots, x_{i-1})$$
Notation Convention
In the product formula above there is an abuse of notation, since the term for $i = 1$ reads $p(x_1 \mid x_1, \dots, x_0)$. By convention, the conditioning set for the first element is considered empty, reducing the term simply to the marginal probability $p(x_1)$.
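A quick numerical sanity check of this factorization, sketched in Python with a hypothetical 2×2×2 joint pmf (the numbers and the chosen outcome are illustrative, not from these notes):

```python
import numpy as np

# Hypothetical joint pmf of three binary variables X1, X2, X3 (values 0/1).
# The numbers are illustrative only.
p = np.array([[[0.10, 0.05],
               [0.15, 0.10]],
              [[0.20, 0.05],
               [0.05, 0.30]]])
assert np.isclose(p.sum(), 1.0)

x1, x2, x3 = 1, 0, 1          # an arbitrary outcome at which to check the identity

p_x1 = p.sum(axis=(1, 2))     # marginal p(x1), shape (2,)
p_x1x2 = p.sum(axis=2)        # marginal p(x1, x2), shape (2, 2)

term1 = p_x1[x1]                          # p(x1)
term2 = p_x1x2[x1, x2] / p_x1[x1]         # p(x2 | x1)
term3 = p[x1, x2, x3] / p_x1x2[x1, x2]    # p(x3 | x1, x2)

# Chain rule: p(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x1, x2)
assert np.isclose(p[x1, x2, x3], term1 * term2 * term3)
```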
Chain Rule for Entropy
The Chain Rule for Entropy allows for the decomposition of the joint entropy of a random vector into a sum of conditional entropies.
Let $X = (X_1, X_2, \dots, X_n)$ be a discrete random vector. The Chain Rule for Entropy states that:
$$H(X_1, X_2, \dots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_1, \dots, X_{i-1})$$
Proof
The derivation follows by applying the chain rule for probabilities inside the definition of joint entropy, together with the properties of the logarithm and the linearity of expectation:
$$\begin{aligned}
H(X_1, \dots, X_n) &= -\mathbb{E}\left[\log p(X_1, \dots, X_n)\right] \\
&= -\mathbb{E}\left[\log \prod_{i=1}^{n} p(X_i \mid X_1, \dots, X_{i-1})\right] \\
&= -\sum_{i=1}^{n} \mathbb{E}\left[\log p(X_i \mid X_1, \dots, X_{i-1})\right] \\
&= \sum_{i=1}^{n} H(X_i \mid X_1, \dots, X_{i-1})
\end{aligned}$$
Notation Convention
In the summation formula above, there is a slight abuse of notation for the case $i = 1$, which would strictly be expressed as $H(X_1 \mid X_1, \dots, X_0)$. By convention, the conditioning set for the first element is considered empty, reducing the term to the marginal entropy $H(X_1)$.
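As a numerical illustration of the identity (and of the proof above), the following Python sketch computes both sides of the chain rule for a hypothetical joint pmf of three binary variables; the `entropy` helper is defined here just for the example:

```python
import numpy as np

def entropy(pmf):
    """Shannon entropy in bits of a probability vector (zero entries contribute 0)."""
    pmf = pmf[pmf > 0]
    return -np.sum(pmf * np.log2(pmf))

# Hypothetical joint pmf of three binary variables X1, X2, X3 (illustrative numbers only).
p = np.array([[[0.10, 0.05],
               [0.15, 0.10]],
              [[0.20, 0.05],
               [0.05, 0.30]]])

p1 = p.sum(axis=(1, 2))      # marginal of X1
p12 = p.sum(axis=2)          # marginal of (X1, X2)

# H(X1)
H1 = entropy(p1)
# H(X2 | X1) = sum_{x1} p(x1) * H( p(X2 | X1 = x1) )
H2_given_1 = sum(p1[a] * entropy(p12[a] / p1[a]) for a in range(2))
# H(X3 | X1, X2) = sum_{x1, x2} p(x1, x2) * H( p(X3 | X1 = x1, X2 = x2) )
H3_given_12 = sum(p12[a, b] * entropy(p[a, b] / p12[a, b])
                  for a in range(2) for b in range(2))

# Chain rule: H(X1, X2, X3) = H(X1) + H(X2 | X1) + H(X3 | X1, X2)
assert np.isclose(entropy(p.ravel()), H1 + H2_given_1 + H3_given_12)
```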
2 r.v.s
For the case of two random variables $X$ and $Y$, the rule simplifies to:
$$H(X, Y) = H(X) + H(Y \mid X)$$
Important
This identity shows that the total uncertainty of the pair is the uncertainty of the first component plus the uncertainty remaining in the second component after the first has been observed.
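For instance (a hypothetical toy case for illustration): if $X$ is a fair bit and $Y$ is an exact copy of $X$, then $H(X) = 1$ bit and $H(Y \mid X) = 0$, so $H(X, Y) = 1$ bit, since observing $X$ leaves no uncertainty about $Y$. If instead $Y$ is a fair bit independent of $X$, then $H(Y \mid X) = H(Y) = 1$ bit and $H(X, Y) = 2$ bits.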
Independent r.v.s
If the random variables $X_1, \dots, X_n$ are independent, the joint entropy simplifies significantly. Under the condition of independence, the probability of any variable does not depend on the outcomes of the preceding variables, implying that $p(x_i \mid x_1, \dots, x_{i-1}) = p(x_i)$.
Consequently, the conditional entropy reduces to the marginal entropy:
$$H(X_i \mid X_1, \dots, X_{i-1}) = H(X_i)$$
In this specific case, the Chain Rule for Entropy collapses into the sum of the individual entropies:
$$H(X_1, X_2, \dots, X_n) = \sum_{i=1}^{n} H(X_i)$$
Important
This result indicates that for independent systems, the total uncertainty is simply the aggregate of the uncertainties of each individual component, as no information is shared between them.
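A minimal sketch of this special case, assuming two arbitrarily chosen marginals (the numbers are illustrative only) and building the joint pmf as their outer product:

```python
import numpy as np

def entropy(pmf):
    """Shannon entropy in bits of a probability vector (zero entries contribute 0)."""
    pmf = pmf[pmf > 0]
    return -np.sum(pmf * np.log2(pmf))

# Independent X and Y: the joint pmf is the outer product of the marginals.
px = np.array([0.2, 0.3, 0.5])
py = np.array([0.6, 0.4])
p_joint = np.outer(px, py)

# For independent variables the joint entropy equals the sum of the marginal entropies.
assert np.isclose(entropy(p_joint.ravel()), entropy(px) + entropy(py))
```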