Statement
Let:
be a random vector in with alphabet (i.e. the set of values that can take)
be a convex set such that
be a convex function
Assuming that exists and that is well defined (possibly equal to ), Jensen’s Inequality states that:
If, moreover, is strictly convex, then equality holds if and only if is almost surely constant, equivalently if and only if almost surely.
In the remainder, the scalar case is considered, so the random vector is denoted by the random variable .
Note
Although the statement holds for both discrete and continuous random variables, the proof below treats only the discrete case.
Proof (Discrete case)
Let denote the number of distinct values that the random variable can assume. The proof proceeds by mathematical induction on .
1. Base Case ()
Assume that takes only two values, and in , with probabilities and respectively.
By the definition of a convex function, the image of a convex combination is always dominated by the convex combination of the images. Therefore, it trivially follows that:
The base case is verified.
2. Induction Hypothesis ()
Assume the inequality holds true for any discrete random variable taking exactly distinct values.
3. Inductive Step ()
It must be shown that if the proposition holds for , it necessarily holds for . Consider a random variable defined by the following probability distribution:
| Point in | Probability |
|---|---|
The expected value is given by the sum over all states:
Isolate the -th term from the rest of the summation:
To manipulate the summation into a form where the induction hypothesis can be applied, multiply and divide the first term by , bringing the denominator inside the summation:
Proof of legitimate convex combination coefficients
Before proceeding, it must be verified that the newly formed terms constitute a valid probability distribution. They must be strictly positive and sum to exactly to act as legitimate convex combination coefficients.
- Positivity: Since and (assuming a non-degenerate distribution), the ratio is intrinsically positive.
- Summation to 1: Summing these coefficients yields:
Knowing that the total probability must equal 1 (), it follows that the partial sum is . Substituting this back provides:
Thus, these coefficients represent a valid probability distribution over elements.
Applying the Induction and Concluding
Since the terms inside the summation represent a valid convex combination of elements, the induction hypothesis () can now be applied to the sum:
Substituting this inequality back into the main equation yields:
The resulting expression is a convex combination of exactly two images, weighted by and . This reduces the problem back to the base case (). Applying the definition of convexity once more:
Simplifying the term inside the function’s argument gives:
Recombining the terms inside the argument restores the full expected value :
This completes the proof by induction.