Let $X$ and $Y$ be two discrete random variables defined on a joint sample space $\mathcal{X} \times \mathcal{Y}$. Consider two joint probability distributions $P(x, y)$ and $Q(x, y)$. The joint Kullback-Leibler (KL) divergence quantifies the statistical discrepancy between these two joint probability mass functions.
Definition
The joint KL divergence is defined as the expected value of the logarithmic ratio between the joint probabilities, calculated with respect to distribution $P$:

$$D_{\mathrm{KL}}\big(P(X, Y)\,\|\,Q(X, Y)\big) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P(x, y) \log \frac{P(x, y)}{Q(x, y)}$$
In operator notation, this is expressed as:

$$D_{\mathrm{KL}}\big(P(X, Y)\,\|\,Q(X, Y)\big) = \mathbb{E}_{(x, y) \sim P}\left[\log \frac{P(x, y)}{Q(x, y)}\right]$$
The divergence is well-defined if and only if $P \ll Q$ (joint absolute continuity), meaning that for every pair $(x, y)$, $Q(x, y) = 0$ implies $P(x, y) = 0$.
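The definition above can be evaluated directly for small finite alphabets. The sketch below uses only the standard library; `joint_kl` is an illustrative helper (not a standard library function), and the two pmfs are hypothetical values chosen so that absolute continuity holds.

```python
import math

def joint_kl(p, q):
    """Joint KL divergence D(P || Q) in nats, for joint pmfs given as
    dicts mapping (x, y) pairs to probabilities."""
    total = 0.0
    for xy, pxy in p.items():
        if pxy == 0.0:
            continue  # by convention, 0 * log(0 / q) = 0
        qxy = q.get(xy, 0.0)
        if qxy == 0.0:
            # absolute continuity violated: Q(x, y) = 0 but P(x, y) > 0
            raise ValueError(f"P is not absolutely continuous w.r.t. Q at {xy}")
        total += pxy * math.log(pxy / qxy)
    return total

# Hypothetical joint pmfs over {0, 1} x {0, 1}
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
Q = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

print(joint_kl(P, Q))  # strictly positive, since P != Q
print(joint_kl(P, P))  # 0.0: the divergence of a distribution from itself vanishes
```

Note that the result is in nats because natural logarithms are used; replacing `math.log` with `math.log2` gives the divergence in bits.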
Key Properties
1. Additivity under Independence
If the variables $X$ and $Y$ are independent under both distributions ($P(x, y) = P(x)P(y)$ and $Q(x, y) = Q(x)Q(y)$), the joint divergence simplifies to the sum of the marginal divergences:

$$D_{\mathrm{KL}}\big(P(X, Y)\,\|\,Q(X, Y)\big) = D_{\mathrm{KL}}\big(P(X)\,\|\,Q(X)\big) + D_{\mathrm{KL}}\big(P(Y)\,\|\,Q(Y)\big)$$
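Additivity can be checked numerically by building product-form joints from hypothetical marginals and comparing the joint divergence against the sum of the marginal divergences; `kl` here is an illustrative helper, not a library function.

```python
import math
from itertools import product

def kl(p, q):
    """KL divergence between two pmfs given as dicts outcome -> probability."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

# Hypothetical marginals for X and Y under P and Q
Px, Qx = {0: 0.7, 1: 0.3}, {0: 0.5, 1: 0.5}
Py, Qy = {0: 0.6, 1: 0.4}, {0: 0.2, 1: 0.8}

# Product-form joints: independence means P(x, y) = P(x) P(y), likewise for Q
P_joint = {(x, y): Px[x] * Py[y] for x, y in product(Px, Py)}
Q_joint = {(x, y): Qx[x] * Qy[y] for x, y in product(Qx, Qy)}

joint = kl(P_joint, Q_joint)
marginal_sum = kl(Px, Qx) + kl(Py, Qy)
print(abs(joint - marginal_sum) < 1e-12)  # True: the divergences add up
```

The equality follows because the log of the product ratio splits into a sum, and each cross-term sums out to 1.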
2. Conditioning and Convexity
Unlike entropy, which conditioning can only reduce, KL divergence cannot decrease under conditioning:

$$D_{\mathrm{KL}}\big(P(Y \mid X)\,\|\,Q(Y \mid X)\big) \ge D_{\mathrm{KL}}\big(P(Y)\,\|\,Q(Y)\big),$$

where the conditional divergence is averaged over $P(x)$ and the marginals over $Y$ are formed by mixing both families of conditionals with that same distribution $P(x)$. This behavior is a direct consequence of the log-sum inequality and the joint convexity of the KL divergence with respect to the pair $(P, Q)$.
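The inequality can be observed numerically: fix a mixing distribution for $X$, pick two families of conditionals for $Y$ given $X$, and compare the averaged conditional divergence with the divergence of the mixed marginals. All pmfs below are hypothetical, and `kl` is an illustrative helper.

```python
import math

def kl(p, q):
    """KL divergence between two pmfs given as dicts outcome -> probability."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

# Shared mixing distribution for X, and two families of conditionals for Y | X
Px = {0: 0.6, 1: 0.4}
P_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
Q_y_given_x = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}

# Conditional divergence: E_{P(x)}[ D(P(Y|x) || Q(Y|x)) ]
d_cond = sum(Px[x] * kl(P_y_given_x[x], Q_y_given_x[x]) for x in Px)

# Marginals over Y, both mixed with the same weights P(x)
Py = {y: sum(Px[x] * P_y_given_x[x][y] for x in Px) for y in (0, 1)}
Qy = {y: sum(Px[x] * Q_y_given_x[x][y] for x in Px) for y in (0, 1)}
d_marg = kl(Py, Qy)

print(d_cond >= d_marg)  # True: conditioning cannot decrease the divergence
```

Mixing both conditionals with the same weights is essential here; with different mixing weights for $P$ and $Q$ the inequality can fail.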