Kullback's inequality


In information theory and statistics, Kullback's inequality is a lower bound on the Kullback–Leibler divergence expressed in terms of the large deviations rate function.[1] If P and Q are probability distributions on the real line whose first moments exist, and P is absolutely continuous with respect to Q, i.e. P << Q, then

[math]\displaystyle{ D_{KL}(P\parallel Q) \ge \Psi_Q^*(\mu'_1(P)), }[/math]

where [math]\displaystyle{ \Psi_Q^* }[/math] is the rate function of [math]\displaystyle{ Q }[/math], i.e. the convex conjugate of its cumulant-generating function, and [math]\displaystyle{ \mu'_1(P) }[/math] is the first moment of [math]\displaystyle{ P. }[/math]

The Cramér–Rao bound is a corollary of this result.
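
As a concrete illustration (not part of the proof), both sides can be computed in closed form when Q is the standard normal distribution and P = N(m, s²): the cumulant-generating function of Q is [math]\displaystyle{ \Psi_Q(\theta) = \theta^2/2 }[/math], so [math]\displaystyle{ \Psi_Q^*(\mu) = \mu^2/2 }[/math], while [math]\displaystyle{ D_{KL}(P\parallel Q) = \tfrac12\left(s^2 + m^2 - 1 - \ln s^2\right). }[/math] A minimal numerical sketch of this check, assuming only NumPy is available (function names and the parameter grid are illustrative choices):

import numpy as np

# Closed-form D_KL( N(m, s^2) || N(0, 1) ); a standard identity.
def kl_gaussian_vs_std_normal(m, s):
    return 0.5 * (s**2 + m**2 - 1.0 - np.log(s**2))

# Rate function of Q = N(0,1): Psi_Q(t) = t^2/2, so its convex conjugate is mu^2/2.
def rate_function_std_normal(mu):
    return mu**2 / 2.0

# Compare both sides of Kullback's inequality over an arbitrary illustrative grid of P = N(m, s^2).
for m in (-2.0, 0.0, 0.5, 3.0):
    for s in (0.5, 1.0, 2.0):
        lhs = kl_gaussian_vs_std_normal(m, s)
        rhs = rate_function_std_normal(m)  # the first moment of P is m
        assert lhs >= rhs - 1e-12
        print(f"m={m:+.1f}, s={s:.1f}:  D_KL = {lhs:.4f}  >=  Psi_Q*(m) = {rhs:.4f}")

Equality holds exactly when s = 1, i.e. when P itself belongs to the natural exponential family generated by Q.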

Proof

Let P and Q be probability distributions (measures) on the real line, whose first moments exist, and such that P << Q. Consider the natural exponential family of Q given by

[math]\displaystyle{ Q_\theta(A) = \frac{\int_A e^{\theta x}Q(dx)}{\int_{-\infty}^\infty e^{\theta x}Q(dx)} = \frac{1}{M_Q(\theta)} \int_A e^{\theta x}Q(dx) }[/math]

for every measurable set A, where [math]\displaystyle{ M_Q }[/math] is the moment-generating function of Q. (Note that [math]\displaystyle{ Q_0 = Q }[/math].) Then

[math]\displaystyle{ D_{KL}(P\parallel Q) = D_{KL}(P\parallel Q_\theta) + \int_{\operatorname{supp}P}\left(\log\frac{\mathrm dQ_\theta}{\mathrm dQ}\right)\mathrm dP. }[/math]

By Gibbs' inequality we have [math]\displaystyle{ D_{KL}(P\parallel Q_\theta) \ge 0 }[/math], so that

[math]\displaystyle{ D_{KL}(P\parallel Q) \ge \int_{\operatorname{supp}P}\left(\log\frac{\mathrm dQ_\theta}{\mathrm dQ}\right)\mathrm dP = \int_{\operatorname{supp}P}\left(\log\frac{e^{\theta x}}{M_Q(\theta)}\right) P(dx). }[/math]

Simplifying the right side, we have, for every real θ where [math]\displaystyle{ M_Q(\theta) \lt \infty }[/math]:

[math]\displaystyle{ D_{KL}(P\parallel Q) \ge \mu'_1(P) \theta - \Psi_Q(\theta), }[/math]

where [math]\displaystyle{ \mu'_1(P) }[/math] is the first moment, or mean, of P, and [math]\displaystyle{ \Psi_Q = \log M_Q }[/math] is the cumulant-generating function of Q. Taking the supremum completes the process of convex conjugation and yields the rate function:

[math]\displaystyle{ D_{KL}(P\parallel Q) \ge \sup_\theta \left\{ \mu'_1(P) \theta - \Psi_Q(\theta) \right\} = \Psi_Q^*(\mu'_1(P)). }[/math]
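
The two steps above, the exact decomposition of the divergence under exponential tilting and the linear lower bound [math]\displaystyle{ \mu'_1(P)\theta - \Psi_Q(\theta) }[/math] obtained from Gibbs' inequality, can be verified numerically for distributions on a finite common support. The sketch below uses an arbitrary illustrative choice of support points and weights, assuming only NumPy:

import numpy as np

# Arbitrary illustrative choice: P and Q supported on the same four points, with P << Q.
x = np.array([0.0, 1.0, 2.0, 3.0])   # common support
q = np.array([0.4, 0.3, 0.2, 0.1])   # Q
p = np.array([0.1, 0.2, 0.3, 0.4])   # P

def kl(a, b):
    """Kullback-Leibler divergence between two probability vectors on the same support."""
    return float(np.sum(a * np.log(a / b)))

def tilt(theta):
    """Member Q_theta of the natural exponential family generated by Q."""
    w = q * np.exp(theta * x)
    return w / w.sum()

mean_p = float(np.dot(p, x))          # first moment of P

def psi_q(t):
    """Cumulant-generating function of Q."""
    return float(np.log(np.dot(q, np.exp(t * x))))

for theta in (-1.0, 0.0, 0.5, 1.0, 2.0):
    q_theta = tilt(theta)
    d_pq = kl(p, q)
    # Exact decomposition used in the proof: D(P||Q) = D(P||Q_theta) + E_P[log dQ_theta/dQ].
    decomposition = kl(p, q_theta) + float(np.dot(p, np.log(q_theta / q)))
    # Lower bound before taking the supremum: mu'_1(P) * theta - Psi_Q(theta).
    bound = mean_p * theta - psi_q(theta)
    assert abs(d_pq - decomposition) < 1e-12 and d_pq >= bound - 1e-12
    print(f"theta={theta:+.1f}:  D_KL(P||Q) = {d_pq:.4f},  linear bound = {bound:.4f}")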

Corollary: the Cramér–Rao bound

Main page: Cramér–Rao bound

Start with Kullback's inequality:

Let [math]\displaystyle{ X_\theta }[/math] be a family of probability distributions on the real line indexed by the real parameter θ, and satisfying certain regularity conditions. Then

[math]\displaystyle{ \lim_{h\to 0} \frac {D_{KL}(X_{\theta+h} \parallel X_\theta)} {h^2} \ge \lim_{h\to 0} \frac {\Psi^*_\theta (\mu_{\theta+h})}{h^2}, }[/math]

where [math]\displaystyle{ \Psi^*_\theta }[/math] is the convex conjugate of the cumulant-generating function of [math]\displaystyle{ X_\theta }[/math] and [math]\displaystyle{ \mu_{\theta+h} }[/math] is the first moment of [math]\displaystyle{ X_{\theta+h}. }[/math]
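
For example, for the Gaussian location family [math]\displaystyle{ X_\theta = N(\theta, \sigma^2) }[/math] both sides can be computed exactly: [math]\displaystyle{ D_{KL}(X_{\theta+h}\parallel X_\theta) = h^2/(2\sigma^2) }[/math], and since [math]\displaystyle{ \Psi_\theta(t) = \theta t + \sigma^2 t^2/2 }[/math], the conjugate is [math]\displaystyle{ \Psi^*_\theta(\mu) = (\mu-\theta)^2/(2\sigma^2) }[/math], giving [math]\displaystyle{ \Psi^*_\theta(\mu_{\theta+h}) = h^2/(2\sigma^2) }[/math]. Both limits therefore equal [math]\displaystyle{ 1/(2\sigma^2) }[/math], and the inequality holds with equality for this family.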

Left side

The left side of this inequality can be simplified as follows:

[math]\displaystyle{ \begin{align} \lim_{h\to 0} \frac {D_{KL}(X_{\theta+h}\parallel X_\theta)} {h^2} &=\lim_{h\to 0} \frac 1 {h^2} \int_{-\infty}^\infty \log \left( \frac{\mathrm dX_{\theta+h}}{\mathrm dX_\theta} \right) \mathrm dX_{\theta+h} \\ &=-\lim_{h\to 0} \frac 1 {h^2} \int_{-\infty}^\infty \log \left( \frac{\mathrm dX_{\theta}}{\mathrm dX_{\theta+h}} \right) \mathrm dX_{\theta+h} \\ &=-\lim_{h\to 0} \frac 1 {h^2} \int_{-\infty}^\infty \log\left( 1- \left (1-\frac{\mathrm dX_{\theta}}{\mathrm dX_{\theta+h}} \right ) \right) \mathrm dX_{\theta+h} \\ &= \lim_{h\to 0} \frac 1 {h^2} \int_{-\infty}^\infty \left[ \left( 1 - \frac{\mathrm dX_\theta}{\mathrm dX_{\theta+h}} \right) +\frac 1 2 \left( 1 - \frac{\mathrm dX_\theta}{\mathrm dX_{\theta+h}} \right) ^ 2 + o \left( \left( 1 - \frac{\mathrm dX_\theta}{\mathrm dX_{\theta+h}} \right) ^ 2 \right) \right]\mathrm dX_{\theta+h} && \text{Taylor series for } \log(1-t) \\ &= \lim_{h\to 0} \frac 1 {h^2} \int_{-\infty}^\infty \left[ \frac 1 2 \left( 1 - \frac{\mathrm dX_\theta}{\mathrm dX_{\theta+h}} \right)^2 \right]\mathrm dX_{\theta+h} && \text{the first-order term vanishes, since } \int \mathrm dX_\theta = \int \mathrm dX_{\theta+h} = 1 \\ &= \lim_{h\to 0} \frac 1 {h^2} \int_{-\infty}^\infty \left[ \frac 1 2 \left( \frac{\mathrm dX_{\theta+h} - \mathrm dX_\theta}{\mathrm dX_{\theta+h}} \right)^2 \right]\mathrm dX_{\theta+h} \\ &= \frac 1 2 \mathcal I_X(\theta) \end{align} }[/math]

which is half the Fisher information of the parameter θ.
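
This limit can be checked numerically on a family where the divergence has a closed form. For the exponential distribution with rate θ (an arbitrary illustrative choice), [math]\displaystyle{ D_{KL}(X_{\theta+h}\parallel X_\theta) = \ln\frac{\theta+h}{\theta} + \frac{\theta}{\theta+h} - 1 }[/math] and [math]\displaystyle{ \mathcal I_X(\theta) = 1/\theta^2 }[/math]; a minimal sketch assuming only NumPy:

import numpy as np

def kl_exponential(a, b):
    """Closed-form D_KL( Exp(a) || Exp(b) ) for rate parameters a and b."""
    return np.log(a / b) + b / a - 1.0

theta = 2.0                    # illustrative value of the rate parameter
fisher_info = 1.0 / theta**2   # Fisher information of the rate parameter of Exp(theta)

# D_KL(X_{theta+h} || X_theta) / h^2 should approach I(theta)/2 as h -> 0.
for h in (0.1, 0.01, 0.001):
    ratio = kl_exponential(theta + h, theta) / h**2
    print(f"h = {h:<6}  D_KL/h^2 = {ratio:.6f}   (I(theta)/2 = {fisher_info / 2:.6f})")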

Right side

The right side of the inequality can be developed as follows:

[math]\displaystyle{ \lim_{h\to 0} \frac {\Psi^*_\theta (\mu_{\theta+h})}{h^2} = \lim_{h\to 0} \frac 1 {h^2} {\sup_t \{\mu_{\theta+h}t - \Psi_\theta(t)\} }. }[/math]

This supremum is attained at a value t = τ where the first derivative of the cumulant-generating function satisfies [math]\displaystyle{ \Psi'_\theta(\tau) = \mu_{\theta+h}, }[/math] but we have [math]\displaystyle{ \Psi'_\theta(0) = \mu_\theta, }[/math] so that, taking the difference quotient of [math]\displaystyle{ \Psi'_\theta }[/math] between 0 and τ (both τ and h tend to 0 together),

[math]\displaystyle{ \Psi''_\theta(0) = \frac{d\mu_\theta}{d\theta} \lim_{h \to 0} \frac h \tau. }[/math]

Moreover, expanding [math]\displaystyle{ \Psi^*_\theta(\mu_{\theta+h}) = \mu_{\theta+h}\tau - \Psi_\theta(\tau) }[/math] to second order in τ, and noting that [math]\displaystyle{ \Psi''_\theta(0) }[/math] is the second cumulant, i.e. the variance, of [math]\displaystyle{ X_\theta }[/math],

[math]\displaystyle{ \lim_{h\to 0} \frac {\Psi^*_\theta (\mu_{\theta+h})}{h^2} = \frac 1 {2\Psi''_\theta(0)}\left(\frac {d\mu_\theta}{d\theta}\right)^2 = \frac 1 {2\operatorname{Var}(X_\theta)}\left(\frac {d\mu_\theta}{d\theta}\right)^2. }[/math]
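
For the same exponential (rate θ) family used in the sketch above, the right-hand limit can be checked by computing the convex conjugate numerically: here [math]\displaystyle{ \mu_\theta = 1/\theta }[/math], [math]\displaystyle{ \operatorname{Var}(X_\theta) = 1/\theta^2 }[/math] and [math]\displaystyle{ \Psi_\theta(t) = \ln\frac{\theta}{\theta - t} }[/math] for [math]\displaystyle{ t \lt \theta }[/math], so the limit should equal [math]\displaystyle{ \frac{1}{2\operatorname{Var}(X_\theta)}\left(\frac{d\mu_\theta}{d\theta}\right)^2 = \frac{1}{2\theta^2}. }[/math] A sketch assuming NumPy and SciPy are available:

import numpy as np
from scipy.optimize import minimize_scalar

theta = 2.0                     # same illustrative rate parameter as above
var_theta = 1.0 / theta**2      # Var(X_theta) for Exp(theta)
dmu_dtheta = -1.0 / theta**2    # derivative of mu_theta = 1/theta

def psi(t):
    """Cumulant-generating function of Exp(theta); finite for t < theta."""
    return np.log(theta / (theta - t))

def psi_star(mu):
    """Convex conjugate Psi_theta^*(mu), computed by numerical maximisation."""
    res = minimize_scalar(lambda t: -(mu * t - psi(t)),
                          bounds=(-50.0, theta - 1e-9), method="bounded",
                          options={"xatol": 1e-12})
    return -res.fun

target = dmu_dtheta**2 / (2.0 * var_theta)   # = 1 / (2 theta^2)
for h in (0.1, 0.01, 0.001):
    mu_h = 1.0 / (theta + h)                 # first moment of X_{theta+h}
    print(f"h = {h:<6}  Psi*/h^2 = {psi_star(mu_h) / h**2:.6f}   (target = {target:.6f})")

Since [math]\displaystyle{ X_{\theta+h} }[/math] lies in the natural exponential family generated by [math]\displaystyle{ X_\theta }[/math], Kullback's inequality is tight here, and the values agree with those of the left-hand sketch above.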

Putting both sides back together

We have:

[math]\displaystyle{ \frac 1 2 \mathcal I_X(\theta) \ge \frac 1 {2\operatorname{Var}(X_\theta)}\left(\frac {d\mu_\theta}{d\theta}\right)^2, }[/math]

which can be rearranged as:

[math]\displaystyle{ \operatorname{Var}(X_\theta) \ge \frac{(d\mu_\theta / d\theta)^2} {\mathcal I_X(\theta)}. }[/math]
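
For instance, for the exponential family with rate θ used in the sketches above, [math]\displaystyle{ \operatorname{Var}(X_\theta) = 1/\theta^2 }[/math], [math]\displaystyle{ d\mu_\theta/d\theta = -1/\theta^2 }[/math] and [math]\displaystyle{ \mathcal I_X(\theta) = 1/\theta^2 }[/math], so the bound reads [math]\displaystyle{ 1/\theta^2 \ge (1/\theta^4)/(1/\theta^2) = 1/\theta^2 }[/math] and is attained with equality.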

Notes and references

  1. Fuchs, Aimé; Letta, Giorgio (1970). "L'inégalité de Kullback. Application à la théorie de l'estimation". Séminaire de Probabilités de Strasbourg 4: 108–131. http://www.numdam.org/item?id=SPS_1970__4__108_0.