A Short Proof of the Hausdorff Moment Problem

The Hausdorff moment problem asks for necessary and sufficient conditions that a given sequence \((m_n)\) with \(m_0=1\) be the sequence of moments of a random variable \(X\) supported on \([0,1]\), i.e., \(\operatorname{E}X^n=m_n\) for all \(n\).

In 1921, Hausdorff showed that \((m_n)\) is such a moment sequence if and only if the sequence is completely monotonic, i.e., its difference sequences satisfy \((D^r m)_s \ge 0\) for all \(r, s \ge 0\). Here \(D\) is the difference operator on the space of real sequences \((a_n)\), given by \(Da = (a_{n} - a_{n+1})\).
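To make the definitions concrete, here is a small numerical sketch (my own illustration, not part of the post or the proof): it applies \(D\) repeatedly to the example sequence \(m_n = 1/(n+1)\), the moments of the Uniform\([0,1]\) distribution, and checks that the iterated differences stay nonnegative.

```python
# Illustration only: the difference operator D and a complete-monotonicity
# check for the example m_n = 1/(n+1), the moments of Uniform[0,1].

def diff(a):
    """One application of D: (Da)_n = a_n - a_{n+1}."""
    return [a[n] - a[n + 1] for n in range(len(a) - 1)]

m = [1.0 / (n + 1) for n in range(12)]   # m_0, ..., m_11

seq = m
for r in range(8):
    # (D^r m)_s should be nonnegative for every s we can still compute
    assert all(x >= -1e-12 for x in seq), f"D^{r} m has a negative entry"
    seq = diff(seq)
print("all computed iterated differences are nonnegative")
```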

The proof below the fold follows the outline given in Exercises E18.5–E18.6 of Probability with Martingales by David Williams.

Proof of Necessity Suppose \((m_n)\) is the moment sequence of a random variable \(X\) supported on \([0,1]\). By induction on \(r\), one can show that \((D^r m)_s = \operatorname{E}(1-X)^rX^s\): the case \(r=0\) is the definition of \(m_s\), and \((D^{r+1}m)_s = (D^r m)_s - (D^r m)_{s+1} = \operatorname{E}(1-X)^rX^s(1-X) = \operatorname{E}(1-X)^{r+1}X^s\). Since \(X\) is supported on \([0,1]\), each \((1-X)^rX^s\) is nonnegative, and so the moment sequence is completely monotonic.
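As a sanity check of this step (again just an illustration with the uniform example, where \(\operatorname{E}(1-X)^rX^s = \int_0^1 (1-x)^r x^s\,dx = \frac{r!\,s!}{(r+s+1)!}\)), one can compare the iterated differences of \(m_n = 1/(n+1)\) with this closed form:

```python
# Illustration of the necessity step for X ~ Uniform[0,1]: the iterated
# differences of m_n = 1/(n+1) should equal E[(1-X)^r X^s] = r! s! / (r+s+1)!.
from math import factorial

def iterated_diff(a, r):
    """Return the sequence D^r a."""
    for _ in range(r):
        a = [a[n] - a[n + 1] for n in range(len(a) - 1)]
    return a

m = [1.0 / (n + 1) for n in range(15)]

for r in range(5):
    for s in range(5):
        lhs = iterated_diff(m, r)[s]
        rhs = factorial(r) * factorial(s) / factorial(r + s + 1)
        assert abs(lhs - rhs) < 1e-12, (r, s, lhs, rhs)
print("(D^r m)_s agrees with E[(1-X)^r X^s] for 0 <= r, s < 5")
```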

Proof of Sufficiency Suppose \((m_n)\) is a completely monotonic sequence with \(m_0 = 1\).

Define \(F_n(x) := \sum_{i \le nx}{n\choose i}(D^{n-i}m)_i\). Since each summand is nonnegative by complete monotonicity, \(F_n\) is right-continuous and non-decreasing, and \(F_n(0^-) = 0\). To prove \(F_n(1) = 1\), one has to prove the identity $$\sum_{i}{n\choose i}(D^{n-i}m)_i = m_0.$$

Classical Trick Expanding the iterated differences, the left-hand side is a linear combination of \(m_0, \dots, m_n\) whose coefficients do not depend on the sequence \((m_n)\). It therefore suffices to verify the claimed identity of coefficients, and for that it is enough to check the identity for the sequences \(\theta_n := \theta^n\) for all real \(\theta\), since a polynomial identity in \(\theta\) determines its coefficients. That is, we must show $$\sum_{i}{n\choose i}(D^{n-i}\theta)_i = \theta^0.$$ Now we take advantage of the ring structure of \(\mathbb{R}[\theta]\): \((D^{r}\theta)_s = (1-\theta)^r\theta^s\), so by the binomial theorem $$\sum_{i}{n\choose i}(D^{n-i}\theta)_i = \sum_{i}{n\choose i}(1-\theta)^{n-i}\theta^i = (1-\theta + \theta)^n = 1 = \theta^0.$$
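As a quick check of the construction (a sketch with the same uniform-moment example in exact rational arithmetic; none of this is needed for the proof), the jumps of \(F_n\), namely \({n\choose i}(D^{n-i}m)_i\), are nonnegative and sum to \(m_0\):

```python
# Illustration only: for m_n = 1/(n+1), the jumps C(n,i) (D^{n-i} m)_i of F_n
# are nonnegative and sum to m_0 = 1 (checked exactly with Fractions).
from fractions import Fraction
from math import comb

def iterated_diff(a, r):
    """Apply the difference operator D r times."""
    for _ in range(r):
        a = [a[j] - a[j + 1] for j in range(len(a) - 1)]
    return a

m = [Fraction(1, j + 1) for j in range(40)]   # moments of Uniform[0,1]

for n in (1, 5, 10, 20):
    w = [comb(n, i) * iterated_diff(m, n - i)[i] for i in range(n + 1)]
    assert all(x >= 0 for x in w) and sum(w) == m[0]
print("jumps of F_n are nonnegative and sum to m_0 for n = 1, 5, 10, 20")
```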

Therefore \(F_n\) is a bona fide distribution function. Define \(m_{n, k} := \int_{[0,1]} x^k\,dF_n(x)\), i.e., \(m_{n,k}\) is the \(k\)th moment of \(F_n\). We now find an explicit formula for \(m_{n,k}\).

\(F_n\) is a step function that jumps by \({n\choose i}(D^{n-i}m)_i\) at \(\frac{i}{n}\), for \(i = 0, \dots, n\), so $$m_{n,k} = \sum_{i=0}^n \left(\frac{i}{n}\right)^k{n\choose i}(D^{n-i}m)_i.$$ Since each \((D^{n-i}m)_i\) is a linear combination of \(m_0, \dots, m_n\) with coefficients that do not depend on the sequence \((m_n)\), we know that \(m_{n,k} = \sum_{i=0}^n a_{n,k,i}m_i\) for constants \(a_{n,k,i}\) depending only on \(n, k, i\).
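Continuing the hedged uniform-moment example, the formula above can be evaluated directly; for instance \(m_{20,2}\) already lands near \(m_2 = 1/3\):

```python
# Illustration only: m_{n,k} computed from the jumps of F_n,
# m_{n,k} = sum_i (i/n)^k C(n,i) (D^{n-i} m)_i, for the uniform example.
from fractions import Fraction
from math import comb

def iterated_diff(a, r):
    """Apply the difference operator D r times."""
    for _ in range(r):
        a = [a[j] - a[j + 1] for j in range(len(a) - 1)]
    return a

def moment_of_Fn(m, n, k):
    """k-th moment of F_n built from the moment sequence m (needs len(m) > n)."""
    return sum(Fraction(i, n) ** k * comb(n, i) * iterated_diff(m, n - i)[i]
               for i in range(n + 1))

m = [Fraction(1, j + 1) for j in range(25)]   # moments of Uniform[0,1]
print(moment_of_Fn(m, 20, 2), "vs m_2 =", m[2])   # 41/120 vs 1/3
```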

Just as in the proof of the identity above, we use the special case \(m_n = \theta^n\), where \(0 \le \theta \le 1\), to compute the coefficients \(a_i = a_{n,k,i}\). In this case, $$F_n(x) = \sum_{i \le nx}{n\choose i}(D^{n-i}\theta)_i = \sum_{i\le nx}{n\choose i}(1-\theta)^{n-i}\theta^i, \qquad m_{n,k} = \sum_{i=0}^n a_{i}\theta^i.$$

Now consider the situation in which a coin with probability \(\theta\) of heads is tossed at times \(1,2,\dots\). The random variable \(H_k\) is \(1\) if the \(k\)th toss produces heads, and \(0\) otherwise. Define \(A_n := (H_1 + \dots + H_n)/n\). It is immediate from the formula for \(F_n\) that \(F_n\) is the distribution function of \(A_n\) (indeed \(nA_n\) has the Binomial\((n,\theta)\) distribution), and so \(m_{n,k}\) is the \(k\)th moment of \(A_n\). On the other hand, one can calculate the \(k\)th moment of \(A_n\) directly. Let \(f\colon [k] \to [n]\) be chosen uniformly at random, independently of the tosses, let \(Im_f\) be the cardinality of the image of \(f\), and denote \(p_i = p_{n,k,i} := \operatorname{Pr}(Im_f = i)\). Expanding the \(k\)th power and averaging over the \(n^k\) equally likely choices of \(f\), and then noting that, given \(f\), the product \(H_{f(1)}\dots H_{f(k)}\) is a product of \(Im_f\) distinct independent Bernoulli\((\theta)\) variables, we obtain $$\operatorname{E}A_n^k = \operatorname{E}\left(\frac{H_1 + \dots + H_n}{n}\right)^k = \operatorname{E}H_{f(1)}\dots H_{f(k)} = \operatorname{E}\operatorname{E}[H_{f(1)}\dots H_{f(k)}\mid Im_f] = \operatorname{E}\theta^{Im_f} = \sum_{i=0}^n p_{i}\theta^i.$$ Therefore, for all \(\theta\in [0,1]\), \(\sum_{i=0}^n a_i\theta^i = \sum_{i=0}^n p_i\theta^i\), and so \(a_i = p_i\) for all \( i=0,\dots, n\).
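For small \(n\) and \(k\) one can check this computation by brute force (illustration only; the choice \(n=6\), \(k=4\) is arbitrary): enumerate all \(n^k\) functions \(f\colon [k]\to[n]\), compute the \(p_i\), and compare \(\sum_i p_i\theta^i\) with the \(k\)th moment of \(A_n\) computed from the Binomial\((n,\theta)\) distribution.

```python
# Illustration only: enumerate all functions f: [k] -> [n], compute
# p_i = Pr(|image of f| = i), and compare sum_i p_i θ^i with E[A_n^k],
# where n*A_n ~ Binomial(n, θ). Exact rational arithmetic throughout.
from fractions import Fraction
from itertools import product
from math import comb

n, k = 6, 4
functions = list(product(range(n), repeat=k))        # all n^k functions
p = [Fraction(sum(1 for f in functions if len(set(f)) == i), len(functions))
     for i in range(n + 1)]

for theta in (Fraction(1, 3), Fraction(3, 4)):
    lhs = sum(p[i] * theta ** i for i in range(n + 1))
    rhs = sum(comb(n, j) * theta ** j * (1 - theta) ** (n - j) * Fraction(j, n) ** k
              for j in range(n + 1))
    assert lhs == rhs
print("sum_i p_i θ^i equals E[A_n^k] for the tested θ")
```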

Since neither \((a_i)\) nor \((p_i)\) depends on the sequence \((m_n)\), the equality \(a_i = p_i\) holds in general, i.e., \(m_{n,k} = \sum_{i=0}^n p_{n,k,i}\,m_i\). Now \(p_k = p_{n, k, k} = \prod_{i=0}^{k-1}(1-i/n)\to 1\) as \(n\to\infty\), \(p_i = 0\) for all \(i > k\), and the remaining \(p_i\) sum to \(1 - p_k \to 0\); since \(0 \le m_i \le m_0 = 1\) (the sequence is nonnegative and non-increasing by complete monotonicity), we conclude that \(\lim_n m_{n,k}= m_k\).
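To see the convergence numerically (the uniform example once more; the occupancy probabilities are computed here via Stirling numbers of the second kind, \(p_{n,k,i} = S(k,i)\,n(n-1)\cdots(n-i+1)/n^k\), a standard formula not used in the post):

```python
# Illustration only: m_{n,k} = sum_i p_{n,k,i} m_i, with p_{n,k,k} -> 1 as
# n grows, so m_{n,k} -> m_k. Uniform example: m_i = 1/(i+1), k = 3, m_3 = 1/4.
from fractions import Fraction

def stirling2(k, i):
    """Stirling number of the second kind S(k, i)."""
    if k == 0:
        return 1 if i == 0 else 0
    if i == 0:
        return 0
    return i * stirling2(k - 1, i) + stirling2(k - 1, i - 1)

def p(n, k, i):
    """Pr(|image| = i) for a uniformly random function [k] -> [n]."""
    falling = 1
    for j in range(i):
        falling *= n - j
    return Fraction(stirling2(k, i) * falling, n ** k)

k = 3
m = [Fraction(1, i + 1) for i in range(k + 1)]
for n in (4, 16, 64, 256):
    m_nk = sum(p(n, k, i) * m[i] for i in range(k + 1))
    print(n, float(p(n, k, k)), float(m_nk))   # p_{n,k,k} -> 1, m_{n,k} -> 0.25
```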

Since every \(F_n\) is supported on \([0,1]\), the sequence \((F_n)\) is tight, so by the Helly–Bray theorem there exist a distribution function \(F\) and a subsequence \((F_{n_j})\) such that \(F_{n_j}\) converges weakly to \(F\). As \(x \mapsto x^k\) is bounded and continuous on \([0,1]\), weak convergence implies $$\int_{[0,1]} x^k\,dF(x) = \lim_j \int_{[0,1]}x^k\,dF_{n_j}(x) = \lim_j m_{n_j,k} = m_k.$$ Therefore, the random variable \(X\) with distribution function \(F\) is supported on \([0,1]\) and its \(k\)th moment is \(m_k\). QED

There are two other classical moment problems: the Hamburger moment problem (on \(\mathbb{R}\)) and the Stieltjes moment problem (on \([0,\infty)\)).

(this post is made with love)