Statement

Let:

  • $X$ be a random vector in $\mathbb{R}^d$ with alphabet $\mathcal{X}$ (i.e. the set of values that $X$ can take)

  • $C \subseteq \mathbb{R}^d$ be a convex set such that $\mathcal{X} \subseteq C$

  • $f : C \to \mathbb{R}$ be a convex function

Assuming that $\mathbb{E}[X]$ exists and that $\mathbb{E}[f(X)]$ is well defined (possibly equal to $+\infty$), Jensen's Inequality states that:

$$f(\mathbb{E}[X]) \le \mathbb{E}[f(X)]$$

If, moreover, $f$ is strictly convex, then equality holds if and only if $X$ is almost surely constant, equivalently if and only if $X = \mathbb{E}[X]$ almost surely.

In the remainder, the scalar case $d = 1$ is considered, so the random vector is denoted by the random variable $X$.
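Before the proof, a quick numerical sanity check of the statement may help; the discrete distribution and the choice of the strictly convex function $f(x) = e^x$ below are illustrative assumptions, not part of the theorem:

```python
import math

# Illustrative (assumed) discrete random variable: values and probabilities.
values = [0.0, 1.0, 2.0]
probs = [0.2, 0.5, 0.3]

f = math.exp  # a strictly convex function

e_x = sum(p * x for p, x in zip(probs, values))          # E[X]
f_of_mean = f(e_x)                                       # f(E[X])
mean_of_f = sum(p * f(x) for p, x in zip(probs, values)) # E[f(X)]

print(f_of_mean, mean_of_f)
# Since X is not almost surely constant and f is strictly convex,
# the inequality is strict: f(E[X]) < E[f(X)].
```

This also illustrates the equality condition: replacing `values` with three copies of the same number makes the two quantities coincide.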

Note

Although the statement holds for both discrete and continuous random variables, the proof below treats only the discrete case.

Proof (Discrete case)

Let $n$ denote the number of distinct values that the random variable $X$ can assume. The proof proceeds by mathematical induction on $n$.

1. Base Case ($n = 2$)

Assume that $X$ takes only two values, $x_1$ and $x_2$ in $\mathcal{X}$, with probabilities $p_1$ and $p_2 = 1 - p_1$ respectively.

By the definition of a convex function, the image of a convex combination is always dominated by the convex combination of the images. Therefore, it trivially follows that:

$$f(\mathbb{E}[X]) = f(p_1 x_1 + p_2 x_2) \le p_1 f(x_1) + p_2 f(x_2) = \mathbb{E}[f(X)]$$

The base case is verified.
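The two-point inequality can be spot-checked numerically over a grid of weights; the convex function $f(x) = x^2$ and the two points below are illustrative assumptions:

```python
# Spot-check the base case: f(p1*x1 + p2*x2) <= p1*f(x1) + p2*f(x2)
# for the convex function f(x) = x**2 and a grid of weights p1 in [0, 1].
f = lambda x: x * x
x1, x2 = -1.0, 3.0  # two arbitrary (assumed) points

for i in range(101):
    p1 = i / 100
    p2 = 1 - p1
    lhs = f(p1 * x1 + p2 * x2)
    rhs = p1 * f(x1) + p2 * f(x2)
    assert lhs <= rhs + 1e-12  # convexity, up to float rounding
```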

2. Induction Hypothesis ($n = k$)

Assume the inequality holds true for any discrete random variable taking exactly $k$ distinct values.

3. Inductive Step ($n = k + 1$)

It must be shown that if the proposition holds for $n = k$, it necessarily holds for $n = k + 1$. Consider a random variable $X$ defined by the following probability distribution:

| Point in $\mathcal{X}$ | Probability |
| --- | --- |
| $x_1$ | $p_1$ |
| $\vdots$ | $\vdots$ |
| $x_{k+1}$ | $p_{k+1}$ |

Since $X$ takes $k + 1$ distinct values, every $p_i$ is strictly positive and $\sum_{i=1}^{k+1} p_i = 1$; in particular, $p_{k+1} < 1$.

The expected value $\mathbb{E}[f(X)]$ is given by the sum over all states:

$$\mathbb{E}[f(X)] = \sum_{i=1}^{k+1} p_i\, f(x_i)$$

Isolate the $(k+1)$-th term from the rest of the summation:

$$\mathbb{E}[f(X)] = \sum_{i=1}^{k} p_i\, f(x_i) + p_{k+1}\, f(x_{k+1})$$

To manipulate the summation into a form where the induction hypothesis can be applied, multiply and divide the first term by $1 - p_{k+1}$ (which is strictly positive, since every $p_i > 0$), bringing the denominator inside the summation:

$$\mathbb{E}[f(X)] = (1 - p_{k+1}) \sum_{i=1}^{k} \frac{p_i}{1 - p_{k+1}}\, f(x_i) + p_{k+1}\, f(x_{k+1})$$

Applying the Induction and Concluding

Since the weights $\frac{p_i}{1 - p_{k+1}}$ are non-negative and sum to $1$, the terms inside the summation represent a valid convex combination of $k$ elements, so the induction hypothesis ($n = k$) can now be applied to the sum:

$$\sum_{i=1}^{k} \frac{p_i}{1 - p_{k+1}}\, f(x_i) \ge f\left(\sum_{i=1}^{k} \frac{p_i}{1 - p_{k+1}}\, x_i\right)$$
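That the renormalized weights $p_i / (1 - p_{k+1})$ form a valid probability vector is easy to verify; the probability vector below is made up for illustration:

```python
# For any probability vector (p_1, ..., p_{k+1}) with p_{k+1} < 1,
# the renormalized weights p_i / (1 - p_{k+1}), i = 1..k, sum to 1.
probs = [0.1, 0.25, 0.15, 0.3, 0.2]  # assumed example; sums to 1

p_last = probs[-1]
weights = [p / (1 - p_last) for p in probs[:-1]]

print(sum(weights))  # 1, up to floating-point rounding
```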

Substituting this inequality back into the main equation yields:

$$\mathbb{E}[f(X)] \ge (1 - p_{k+1})\, f\left(\sum_{i=1}^{k} \frac{p_i}{1 - p_{k+1}}\, x_i\right) + p_{k+1}\, f(x_{k+1})$$

The resulting expression is a convex combination of exactly two images, weighted by $1 - p_{k+1}$ and $p_{k+1}$. This reduces the problem back to the base case ($n = 2$). Applying the definition of convexity once more:

$$\mathbb{E}[f(X)] \ge f\left((1 - p_{k+1}) \sum_{i=1}^{k} \frac{p_i}{1 - p_{k+1}}\, x_i + p_{k+1}\, x_{k+1}\right)$$

Simplifying the term inside the function's argument gives:

$$\mathbb{E}[f(X)] \ge f\left(\sum_{i=1}^{k} p_i\, x_i + p_{k+1}\, x_{k+1}\right)$$

Recombining the terms inside the argument restores the full expected value $\mathbb{E}[X]$:

$$\mathbb{E}[f(X)] \ge f\left(\sum_{i=1}^{k+1} p_i\, x_i\right) = f(\mathbb{E}[X])$$

This completes the proof by induction.
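The induction translates directly into a recursive check: peel off the last point, renormalize the remaining weights, and recurse down to the two-point base case. The sketch below mirrors that structure; the four-point distribution and $f(x) = x^2$ are assumptions for illustration:

```python
def jensen_by_induction(values, probs, f):
    """Follow the proof's recursion: return (f(E[X]), E[f(X)]) and check
    the inequality at every level. Assumes len(values) >= 2, probs > 0,
    and sum(probs) == 1 (so each renormalization divides by a positive number)."""
    if len(values) == 2:
        # Base case (n = 2): plain two-point convexity.
        lhs = f(probs[0] * values[0] + probs[1] * values[1])
        rhs = probs[0] * f(values[0]) + probs[1] * f(values[1])
    else:
        # Inductive step: isolate the last point, renormalize the rest.
        p_last = probs[-1]
        w = [p / (1 - p_last) for p in probs[:-1]]
        _, inner_rhs = jensen_by_induction(values[:-1], w, f)
        mean = (1 - p_last) * sum(wi * x for wi, x in zip(w, values[:-1])) \
               + p_last * values[-1]
        lhs = f(mean)  # f(E[X]) after recombining the argument
        rhs = (1 - p_last) * inner_rhs + p_last * f(values[-1])  # = E[f(X)]
    assert lhs <= rhs + 1e-12  # Jensen's inequality at this level
    return lhs, rhs

# Example usage (assumed data): f(x) = x**2 on four points.
lhs, rhs = jensen_by_induction([0.0, 1.0, 2.0, 4.0],
                               [0.4, 0.3, 0.2, 0.1],
                               lambda x: x * x)
print(lhs, rhs)  # f(E[X]) and E[f(X)]
```

Note how the recursion's two branches correspond exactly to steps 1 and 3 of the proof: the base case applies convexity to two points, and the inductive step reuses the bound obtained for one fewer point.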