KL Divergence
An intuitive explanation
Kumar Shantanu | 2024-05-26

Definition

The KL divergence between two probability distributions P and Q is defined as:

D_{\text{KL}}(P||Q) = \sum_{x} P(x) \log \left(\frac{P(x)}{Q(x)}\right)

where:

  1. Probability Distributions P and Q: P(x) and Q(x) represent the probability of event x occurring in distributions P and Q respectively.

  2. Relative Likelihood:

    • The term P(x)/Q(x) is the relative likelihood of event x according to distribution P compared to distribution Q.
    • If this ratio is close to 1, the two distributions assign similar probability to event x.
  3. Summation:

    • The formula sums over all possible events x in the distributions P and Q.
    • For each event, the log-ratio is weighted by P(x), so the sum is the expected difference in information content between the two distributions, taken with respect to P.
  4. Interpretation:

    • When P(x) is much larger than Q(x) for an event x, the logarithm term amplifies this difference.
    • A higher value in the sum indicates that the event x is significantly more likely in distribution P compared to Q.
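
To make the definition concrete, here is a minimal sketch in Python (assuming NumPy is available; the function name kl_divergence and the handling of zero probabilities are choices made here, not part of any particular library) that computes the sum above for two discrete distributions defined over the same set of events:

```python
import numpy as np

def kl_divergence(p, q):
    """Compute D_KL(P || Q) for two discrete distributions given as
    arrays of probabilities over the same events (each summing to 1)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # By convention, terms with P(x) = 0 contribute nothing to the sum;
    # if Q(x) = 0 while P(x) > 0, the divergence is infinite.
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))
```

Using the natural logarithm gives the result in nats; replacing np.log with np.log2 would give the result in bits.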

Intuition

  • KL divergence measures how much information is lost when distribution Q is used to approximate distribution P.
  • It quantifies the difference between the two distributions, highlighting areas where they diverge significantly.
  • A KL divergence of 0 indicates that the two distributions are identical, while a higher value signifies greater divergence.

In essence, KL divergence quantifies the discrepancy between two probability distributions, and is used across machine learning, statistics, and information theory to measure the difference in information content between them.
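
As a quick sanity check of these properties, the hypothetical kl_divergence sketch above returns zero when a distribution is compared with itself, and a positive value when Q only approximates P:

```python
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

print(kl_divergence(p, p))  # 0.0: identical distributions, no information lost
print(kl_divergence(p, q))  # approx 0.025 nats: approximating P with Q loses a little information
```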

Also read: Maximising Likelihood is equivalent to Minimising KL Divergence