Minimal Distance to Pi

Here is a problem from Week of Code 29 hosted by Hackerrank.

Problem Given two integers \(q_1\) and \(q_2\) (\(1\le q_1 \le q_2 \le 10^{15}\)), find and print a common fraction \(p/q\) such that \(q_1 \le q \le q_2\) and \(\left|p/q-\pi\right|\) is minimal. If there are several fractions having minimal distance to \(\pi\), choose the one with the smallest denominator.

Note that checking all possible denominators does not work as iterating for \(10^{15}\) times would exceed the time limit (2 seconds for C or 10 seconds for Ruby).

The problem setter suggested the following algorithm in the editorial of the problem:

  1. Given \(q\), it is easy to compute \(p\) such that \(r(q) := p/q\) is the closest rational to \(\pi\) among all rationals with denominator \(q\).
  2. Find the semiconvergents of the continued fraction of \(\pi\) with denominators \(\le 10^{15}\).
  3. Start from \(q = q_1\), and at each step increase \(q\) by the smallest denominator \(d\) of a semiconvergent such that \(r(q+d)\) is closer to \(\pi\) than \(r(q)\). Repeat until \(q\) exceeds \(q_2\).

Given \(q\), let \(d = d(q)\) be the smallest increment to the denominator \(q\) such that \(r(q+d)\) is closer to \(\pi\) than \(r(q)\). To justify the algorithm, one needs to prove the \(d\) is the denominator of one of the semiconvergents. The problem setter admits that he does not have a formal proof.

Inspired by the problem setter’s approach, here is a complete solution to the problem. Note that \(\pi\) should not be special in this problem, and can be replaced by any other irrational number \(\theta\). Without loss of generality, we may assume that \(\theta\in(0,1)\).

Let me first introduce the Farey intervals of \(\theta\).

  1. Start with the interval \((0/1, 1/1)\).
  2. Suppose the last interval is \((a/b, c/d)\). Cut it by the mediant of \(a/b\) and \(c/d\) and choose one of the intervals \((a/b, (a+c)/(b+d)), ((a+c)/(b+d), c/d)\) that contains \(\theta\) as the next interval.

We call the intervals appeared in the above process Farey intervals of \(\theta\). For example, take \(\theta = \pi – 3 = 0.1415926…\). The Farey intervals are:

$$(0/1, 1/1), (0/1, 1/2), (0/1, 1/3), (0/1, 1/4), (0/1, 1/5), (0/1, 1/6), (0/1, 1/7), (1/8, 1/7), (2/15, 1/7), \cdots$$

The Farey sequence of order \(n\), denoted by \(F_n\), is the sequence of completely reduced fractions between 0 and 1 which when in lowest terms have denominators less than or equal to \(n\), arranged in order of increasing size. Fractions which are neighbouring terms in any Farey sequence are known as a Farey pair. For example, Farey sequence of order 5 is

$$F_5 = (0/1, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1/1).$$

Using the Stern–Brocot tree, one can prove that

Lemma 1 For every Farey interval \((a/b, c/d)\) of \(\theta\), the pair \((a/b, c/d)\) is a Farey pair. Conversely, for every Farey pair \((a/b, c/d)\), if \(\theta \in (a/b, c/d)\), then \((a/b, c/d)\) is a Farey interval.

We say \(p/q\) is a good rational approximation of \(\theta\) if every rational between \(p/q\) and \(\theta\) (exclusive) has a denominator greater than \(q\).

By the definition of Farey sequence, incorporating with Lemma 1, we know that

Lemma 2 A rational is an endpoint of a Farey interval of \(\theta\) if and only if it is a good rational approximation of \(\theta\).

In fact, one can show that the endpoints of Farey intervals and semiconvergents of continued fraction are the same thing! Thereof, the problem setter’s claim follows immediately from:

Proposition Given \(q\), let \(r(q) = p / q\) be the rational closest to \(\theta\) with denominator \(q\). If \(d = d(q)\) is the minimal increment to \(q\) such that \(r(q + d) = (p + c) / (q + d)\) is closer to \(\theta\) than \(r(q)\), then \(c/d\) is a good rational approximation.

Remark The proposition states that the increments to \(p/q\) always come from a good rational approximation. It is stronger than the problem setter’s statement, which only asserts that the increment to the \(q\) comes from a good rational approximation.

Proof In \((x, y)\)-plane, plot the trapezoid defined by

$$\left| y/x – \theta \right| < \left|p/q – \theta\right|, q < x < q + d.$$

Geometric interpretation

Also we interpret rational numbers \(p/q, (p+c)/(q+d)\) as points \(A = (q, p), B = (q+d, p+c)\). Let the line through \((q, p)\) parallel to \(y=\theta x\) intersect the vertical line \(x = q+d\) at \(C = (q+d, p+\theta d)\). By the definition of \(d\), we know that the trapezoid does not contain lattice points. In particular, there is no lattice point in the interior of the triangle \(ABC\). In the coordinate system with origin at \(A\), \(B\) has coordinate \((d, c)\) and the line through \(A, C\) is \(y = \theta x\). Since triangle \(ABC\) contains no lattice points, there is no \((b, a)\) with \(b < d\) such that \(a/b\) is between \(\theta\) and \(c/d\). In other words, \(c/d\) is a good rational approximation. QED.

Here is a fine print of the algorithm. Because floats may not have enough precision for the purpose of computation, we will instead use a convergent of the continuous fraction of \(\pi\) instead. All the computations will then happen in \(\mathbb{Q}\). Finally, we present the algorithm.

P = Rational(5706674932067741, 1816491048114374) - 3

# find endpoints of Farey intervals
a, b, c, d = 0, 1, 1, 1
farey = [[a,b],[c,d]]
while (f = b + d) <= max - min
  farey << [e = a + c, f]
  if P < Rational(e, f)
    c, d = e, f
    a, b = e, f

min, max =
p_min = (P * q).round

# increase p_min/min by frations in farey
while min <= max
  c, d = nil, nil
  farey.each do |a, b|
    break if min + b > max
    if (Rational(p_min + a, min + b) - P).abs < (Rational(p_min, min) - P).abs
      c, d = a, b
  break if d == nil
  p_min += c; min += d

puts "#{p_min + 3 * min}/#{min}"
0 (be the first to like this)

On Extension of Rationals

This note is based on my talk An Expedition to the World of \(p\)-adic Numbers at Carnegie Mellon University on January 15, 2014.

Construction from Cauchy Sequences

A standard approach to construct the real numbers from the rational numbers is a procedure called completion, which forces all Cauchy sequences in a metric space by adding new points to the metric space.

Definition A norm, denoted by \(|\cdot|\), on the field \(F\) is a function from \(F\) to the set of nonnegative numbers in an ordered field \(R\) such that (1) \(|x|=0\) if and only if \(x=0\); (2) \(|xy|=|x||y|\); (3) \(|x+y|\leq|x|+|y|\).

Remark The ordered field is usually chosen to be \(\mathbb{R}\). However, to construct of \(\mathbb{R}\), the ordered field is \(\mathbb{Q}\).

A norm on the field \(F\) naturally gives rise to a metric \(d(x,y)=|x-y|\) on \(F\). For example, the standard metric on the rationals is defined by the absolute value, namely, \(d(x, y) = |x-y|\), where \(x,y\in\mathbb{Q}\), and \(|\cdot|\) is the standard norm, i.e., the absolute value, on \(\mathbb{Q}\).

Given a metric \(d\) on the field \(F\), the completion procedure considers the set of all Cauchy sequences on \(F\) and an equivalence relation

$$(a_n)\sim (b_n)\text{ if and only if }d(a_n, b_n)\to 0\text{ as }n\to\infty.$$

Definition Two norms on \(F\), \(|\cdot|_1, |\cdot|_2\), are equivalent if and only if for every sequence \((a_n)\), it is a Cauchy sequence with respect to \(d_1\) if and only if it is so with respect to \(d_2\), where \(d_1, d_2\) are the metrics determined by \(|\cdot|_1, |\cdot|_2\) respectively.

Remark It is reasonable to worry about the situation in which two norms \(|\cdot|_1\) and \(|\cdot|_2\) are equivalent, but they introduce different equivalent relationships on the set of Cauchy sequences. However, given two equivalent norms, \(|a_n|_1\) converges to 0 if and only if it does so with respect to \(d_2\). (Hint: prove by contradiction and consider the sequence \((1/a_n)\).)

Definition The trivial norm on \(F\) is a norm \(|\cdot|\) such that \(|0|=0\) and \(|x|=1\) for \(x\neq 0\).

Since we are interested in norms that generate different completions of the field, it would be great if we can classify all nontrivial norms modulo the norm equivalence.

Alternative Norms on Rationals

Definition Let \(p\) be any prime number. For any non-zero integer \(a\), let \(\mathrm{ord}_pa\) be the highest power of \(p\) which divides \(a\). For any rational \(x=a/b\), define \(\mathrm{ord}_px=\mathrm{ord}_pa-\mathrm{ord}_pb\). Further define a map \(|\cdot|_p\) on \(\mathbb{Q}\) as follows: $$|x|_p = \begin{cases}p^{-\mathrm{ord}_px} & \text{if }x\neq 0 \\ 0 & \text{if }x=0\end{cases}.$$

Proposition \(|\cdot|_p\) is a norm on \(\mathbb{Q}\). We call it the \(p\)-adic norm on \(\mathbb{Q}\).

Proof (sketch) We only check the triangle inequality. Notice that $$\begin{eqnarray*}\mathrm{ord}_p\left(\frac{a}{b}-\frac{c}{d}\right) &=& \mathrm{ord}_p\left(\frac{ad-bc}{bd}\right) \\ &=& \mathrm{ord}_p(ad-bc)-\mathrm{ord}_p(bd) \\ &\geq& \min(\mathrm{ord}_p(ad), \mathrm{ord}_p(bc)) – \mathrm{ord}_p(bd) \\ &=& \min(\mathrm{ord}_p(ad/bd), \mathrm{ord}_p(bc/bd)).\end{eqnarray*}$$ Therefore, we obtain \(|a/b-c/d|_p \leq \max(|a/b|_p, |c/d|_p) \leq |a/b|_p+|c/d|_p\). QED

Remark Some counterintuitive fact about the \(p\)-adic norm are

The following theorem due to Ostrowski classifies all possible norms on the rationals up to norm equivalence. We denote the standard absolute value by \(|\cdot|_\infty\).

Theorem [OSTROWSKI 1916] Every nontrivial norm \(|\cdot|\) on \(\mathbb{Q}\) is equivalent to \(|\cdot|_p\) for some prime \(p\) or for \(p=\infty\).

Proof We consider two cases (i) There is \(n\in\{1,2,\ldots\}\) such that \(|n|>1\); (ii) For all \(n\in\{1,2,\ldots\}\), \(|n|\leq 1\). As we shall see, in the 1st case, the norm is equivalent to \(|\cdot|_\infty\), whereas, in the 2nd case, the norm is equivalent to \(|\cdot|_p\) for some prime \(p\).

Case (i) Let \(n_0\) be the least such \(n\in\{1,2,\ldots\}\) such that \(|n|>1\). Let \(\alpha > 0\) be such that \(|n_0|=n_0^\alpha\).

For every positive integer \(n\), if \(n_0^s \leq n < n_0^{s+1}\), then we can write it in \(n_0\)-base: $$n = a_0 + a_1n_0 + \ldots + a_sn_0^s.$$

By the choice of \(n_0\), we know that \(|a_i|\leq 1\) for all \(i\). Therefore, we obtain $$\begin{eqnarray*}|n| & \leq & |a_0| + |a_1||n_0| + \ldots + |a_s||n_0|^s \\ & \leq & 1 + |n_0| + \ldots |n_0|^s \\ & \leq & n_0^{s\alpha}\left(1+n_0^{-\alpha}+n_0^{-2\alpha}+\ldots\right) \\ & \leq & Cn^\alpha,\end{eqnarray*}$$ where \(C\) does not depend on \(n\). Replace \(n\) by \(n^N\) and get \(|n|^N = |n^N| \leq Cn^{N\alpha}\), and so \(|n|\leq \sqrt[N]{C}n^\alpha\). As we can choose \(N\) to be arbitrarily large, we obtain \(|n| \leq n^\alpha\).

On the other hand, we have $$\begin{eqnarray*}|n| & \geq & |n_0^{s+1}| – |n_0^{s+1}-n| \\ & \geq & n_0^{(s+1)\alpha} – (n_0^{s+1}-n_0^s)^\alpha\\ & = & n_0^{(s+1)\alpha}\left[1-(1-1/n_0)^\alpha\right] \\ & > & C’n^\alpha.\end{eqnarray*}$$ Using the same trick, we can actually take \(C’=1\).

Therefore \(|n| = n^\alpha\). It is easy to see it is equivalent to \(|\cdot|_\infty\).

Case (ii) Since the norm is nontrivial, let \(n_0\) be the least \(n\) such that \(|n|<1\).

Claim 1 \(n_0=p\) is a prime.

Claim 2 \(|q|=1\) if \(q\) is a prime other than \(p\).

Suppose \(|q| < 1\). Find \(M\) large enough so that both \(|p^M|\) and \(|q^M|\) are less than \(1/2\). By Bézout’s lemma, there exists \(a,b\in\mathbb{Z}\) such that \(ap^M + bq^M = 1\). However, $$1 = |1| \leq |a||p^M| + |b||q^M| < 1/2 + 1/2 = 1,$$ a contradiction.

Therefore, we know \(|n|=|p|^{ord_p n}\). It is easy to see it is equivalent to \(|\cdot|_p\). QED

 Non-Archimedean Norm

As one might have noticed, the \(p\)-adic norm satisfies an inequality stronger than the triangle inequality, namely \(|a\pm b|_p\leq \max(|x|_p, |y|_p)\).

Definition A norm is non-Archimedean provided \(|x\pm y|\leq \max(|x|, |y|)\).

The world of non-Archimedean norm is good and weird. Here are two testimonies.

Proposition (No more scalene triangles!) If \(|x|\neq |y|\), then \(|x\pm y| = \max(|x|, |y|)\).

Proof Suppose \(|x| < |y|\). On one hand, we have \(|x\pm y| \leq |y|\). On the other hand, \(|y| \leq \max(|x\pm y|, |x|)\). Since \(|x| < |y|\), we must have \(|y| \leq |x\pm y|\). QED.

Proposition (All points are centers!) \(D(a, r^-) = D(b, r^-)\) for all \(b\in D(a, r^-)\) and \(D(a, r) = D(b,r)\) for all \(b\in D(a,r)\), where \(D(c, r^-) = \{x : |x-c|<r\}\) and \(D(c,r)=\{x:|x-c|\leq r\}\).

Construction of \(p\)-adic Numbers

The \(p\)-adic numbers are the completion of \(\mathbb{Q}\) via the \(p\)-adic norm.

Definition The set of \(p\)-adic numbers is defined as $$\mathbb{Q}_p = \{\text{Cauchy sequences with respect to }|\cdot|_p\} / \sim_p,$$ where \((a_n)\sim_p(b_n)\) iff \(|a_n-b_n|_p\to 0\) as \(n\to\infty\).

We would like to extend \(|\cdot|_p\) from \(\mathbb{Q}\) to \(\mathbb{Q}_p\). When extending \(|\cdot|_\infty\) from \(\mathbb{Q}\) to \(\mathbb{R}\), we set \(|[(a_n)]|_\infty\) to be \([(|a_n|)]\), an element in \(\mathbb{R}\). However, the values that \(|\cdot|_p\) can take, after the extension, are still in \(\mathbb{Q}\).

Definition The extension of \(|\cdot|_p\) on \(\mathbb{Q}_p\) is defined by \(|[(a_n)]|_p = \lim_{n\to\infty}|a_n|_p\).

Remark Suppose \((a_n)\sim_p (a_n’)\). Then \(\lim_{n\to\infty}|a_n-a_n’|_p=0\), and so \(\lim_{n\to\infty}|a_n|_p=\lim_{n\to\infty}|a_n’|_p\). Moreover, one can prove that \(\lim_{n\to\infty}|a_n|_p\) always exists provided that \((a_n)\) is a Cauchy sequence. (Hint: Suppose \(\lim_{n\to\infty}|a_n|_p > 0\). There exists \(\epsilon > 0\) such that \(|a_n|_p > \epsilon\) infinitely often. Choose \(N\) enough so that \(|a_m – a_n|_p < \epsilon\) for all \(m,n>N\). Use ‘no more scalene triangles!’ property to deduce a contradiction.)

Representation of \(p\)-adic Numbers

Even though each real number is an equivalence class of Cauchy sequences, each equivalence class has a canonical representative. For instance, the canonical representative of \(\pi\) is \((3, 3.1, 3.14, 3.141, 3.1415, \ldots)\). The analog for \(\mathbb{Q}_p\) is the following.

Theorem Every equivalence class \(a\) in \(\mathbb{Q}_p\) for which \(|a|_p\leq 1\) has exactly one representative Cauchy sequence of the form \((a_i)\) for which (1) \(0\leq a_i < p^i\) for \(i=1,2,3,\ldots\); (2) \(a_i = a_{i+1} (\pmod p^i)\) for \(i=1,2,3,\ldots\).

Proof of Uniqueness Prove by definition chasing.

Proof of Existence We shall repeatedly apply the following lemma.

Lemma For every \(b\in\mathbb{Q}\) for which \(|b|_p \leq 1\) and \(i\in\mathbb{N}\), there exists \(a\in\{0, \ldots, p^i-1\}\) such that \(|a-b|_p \leq p^{-i}\).

Proof of Lemma Suppose \(b=m/n\) is in the lowest form. As \(|b|_p\leq 1\), we know that \((n, p^i)=1\). By Bézout’s lemma, \(an+a’p^i=m\) for some integers \(a,a’\). We may assume \(a\in\{0,\ldots,p^i-1\}\). Note that \(a-b=a’p^i/n\), and so \(|a-b|_p \leq p^{-i}\). qed

Suppose \((c_i)\) is a representative of \(a\). As \((c_i)\) is a Cauchy sequence, we can extract a subsequence \((b_i)\) such that \(|b_i – b_j| \leq p^{-i}\) for all \(i < j\) which is still a representative of \(a\). Using the lemma above, for each \(b_i\), we can find \(0 \leq a_i < q^i\) such that \(|a_i-b_i|_p \leq q^{-i}\). Therefore \((a_i)\) is a representative of \(a\) as well. For all \(i<j\), we have \(|a_i-a_j|_p \leq \max(|a_i – b_i|_p, |b_i-b_j|_p, |b_j-a_j|_p) \leq q^{-i}\), Therefore \(q^i\) divides \(a_i – a_j\).


For \(|a|_p\leq 1\), we write \(a=b_0 + b_1p + b_2p^2 + \ldots\), where \((b_{n-1}b_{n-2}\ldots b_0)_p = a_n\).

What if \(|a|_p > 1\)? As \(|ap^m|=|a|/p^m\), \(|ap^m|\leq 1\) for some natural number \(m\). By the representation theorem, we can write \(ap^m = b_0 + b_1p + b_2p^2 + \ldots\), and \(a = b_0p^{-m} + b_1p^{-m+1} + b_2p^{-m+2} + \ldots\).

Using the representation of \(p\)-adic numbers, one can perform arithmetic operations such as addition, subtraction, multiplication and division.

Like \(\mathbb{R}\), \(\mathbb{Q}_p\) is not algebraically closed. Though \(\mathbb{C}\), the algebraic closure of \(\mathbb{R}\), has degree \(2\) over \(\mathbb{R}\), and it is complete with respect to the absolute value, it is not so for \(\overline{\mathbb{Q}_p}\), the algebraic closure of \(\mathbb{Q}_p\). In fact, \(\overline{\mathbb{Q}_p}\) has infinite degree over \(\mathbb{Q}_p\) and is, unfortunately, not complete with respect to proper extension of \(|\cdot|_p\). The good news is that the completion of \(\overline{\mathbb{Q}_p}\), denoted by \(\Omega\) is algebraically closed.

3 (this post is made with love)

A Note on Thue’s Method

This note is based on my talk Introduction to Diophantine Approximation at Carnegie Mellon University on November 4, 2014. Due to limit of time, details of Thue’s method were not fully presented in the talk. The note is supposed to serve as a complement to that talk.

Diophantine Approximation

Diophantine approximation deals with the approximation of real numbers by rational numbers. In other words, given a real number \(\alpha\), we are interested in finding a rational approximation \(p/q\) such that the error \(|\alpha – p/q|\) is “small”. The smaller the error, the better the approximation. Since the set of rationals is dense, there is no best rational approximation of an irrational number (apparently, the best rational approximation of a rational number is itself). However, if we compare the error with the complexity of the rational, i.e., the denominator of the rational, it is natural to expect that the more complicated the rational approximation, the smaller the error. In particular, we can make sense of this in the following way.

Theorem (DIRICHLET) Given an irrational \(\alpha\), we have \(|\alpha – p/q| < 1/q^2\) for infinitely many rationals \(p/q\).

Proof Think of \(C = \mathbb{R}/\mathbb{Z}\) as a circle with circumference 1. Fix a natural number \(N\) and mark \(nq\) on \(C\) for \(n=1,\ldots,N\). By the pigeonhole principle, there are \(1\leq n_1 \leq n_2\leq N\) such that \(n_1\alpha\) and \(n_2\alpha\) is \(\leq 1/N\) apart from each other on \(C\). Let \(q = n_2 – n_1 < N\). Then there is \(p\) such that \(|q\alpha – p| \leq 1/N\), and so \(|\alpha – p/q| \leq 1/(Nq) < 1/q^2\). Taking \(N\) large enough yields better approximation \(p/q\) with the desired property.

By means of continued fraction, one can push the result a bit further by a constant factor.

Theorem (HURWITZ 1891) Given an irrational \(\alpha\), we have \(|\alpha – p/q| < 1/\sqrt{5}q^2\) for infinitely many rationals \(p/q\). Moreover, the constant \(\sqrt{5}\) is the best possible for \(\alpha = (1+\sqrt{5})/2\).

These show us how “well” we can approximate reals by rationals. On the other hand, we would also like to say that we cannot approximate “too well”, i.e., given a real number \(\alpha\), we have \(|\alpha – p/q| > 1/q^n\) for all but finite rationals \(p/q\), where \(n\) may depend on \(\alpha\). Unfortunately, this is not so because of the existence of Liouville numbers.

Definition A Liouville number is an irrational number \(\alpha\) with the property that, for every natural number \(n\), there exists a rational \(p/q\) and such that \(|\alpha – p/q| < q^{-n}\).

For example, \(\alpha = \sum_{k=1}^\infty 2^{-k!}\) is a Liouville number.

However, we indeed can say something along those lines for algebraic numbers.

Diophantine Approximation for Algebraic Numbers

Definition An algebraic number is a number that is a root of a non-zero polynomial in one variable with rational coefficients (or equivalently integer coefficients). Given an algebraic number, there is a monic polynomial (with rational coefficients) of least degree that has the number as a root. This polynomial is called its minimal polynomial. If its minimal polynomial has degree \(d\), then the algebraic number is said to be of degree \(d\).

Theorem (LIOUVILLE 1844) Given an algebraic irrational number \(\alpha\) of degree \(d\), there exists \(A>0\) such that for all rationals \(p/q\), \(|\alpha – p/q|>Aq^{-d}\).

A immediate consequence of the theorem is that Liouville numbers are not algebraic.

Proof Let \(f(x)\) be a polynomial with integer coefficients of degree \(d\) that has \(\alpha\) as a root. Note that \(f'(\alpha)\neq 0\) otherwise \(f'(x)\) is a polynomial of degree \(<d\) that has \(\alpha\) as a root. As \(f'(x)\) is continuous, there is \(r>0\) such that \(|f'(x)| < 1 / A\) for all \(|x-\alpha|<r\). Without loss of generality, we may assume \(p/q\) is pretty close, i.e., within distance \(r\), to \(\alpha\) by taking \(A < r\). Note that also \(f(p/q)\neq 0\) otherwise \(f(x) / (x-p/q)\) is a polynomial of degree \(<d\) that has \(\alpha\) as a root. Since \(q^df(p/q)\) is an integer, \(|f(p/q)| \geq q^{-d}\). On the other hand, by mean value theorem, there is \(c\) between \(p/q\) and \(\alpha\) such that \(|f(\alpha)-f(p/q)| = |f'(c)||\alpha – p/q|\). Therefore \(|\alpha – p/q| = |f(p/q)|/f'(c) > Aq^{-n}\) since \(|c-\alpha| < r\).

Liouville’s theorem (on Diophantine approximation) can be phrased in an alternative way.

Corollary Given an irrational algebraic number \(\alpha\) of degree \(d\). If the equation \(|\alpha – p/q| < 1/q^n\) has infinitely many rational solutions \(p/q\), then \(n\leq d\).

The continuing working was established by Axel Thue (1909), Carl Ludwig Siegle (1921), Freeman Dyson (1947), and culminated at Klaus Roth’s result (1955):

Theorem (ROTH 1955) Given an irrational algebraic number \(\alpha\). If the equation \(|\alpha-p/q|<1/q^n\) has infinitely many rational solutions \(p/q\), then \(n\leq 2\).

The constant \(2\) cannot be improved due to Dirichlet’s theorem. Another way to interpret Roth’s theorem is as follows.

Corollary Any irrational algebraic number \(\alpha\) has approximation exponent equal to 2, i.e., for any \(\epsilon>0\), the inequality \(|\alpha-p/q|<1/q^{2+\epsilon}\) can only have finitely many solutions \(p/q\).

This amounts to say that we cannot approximate irrational algebraic numbers “too well” in the sense that if we slightly restrict the error from \(q^{-2}\) to \(q^{-(2+\epsilon)}\), the number of “good” approximation changes from infinite to finite.

Thue’s Method

Although Roth’s proof has merit in itself, part of his idea dates back to Thue’s work. Indeed, Thue used a method, now known as the polynomial method, in a special way to prove the following theorem.

Theorem (THUE 1909) Given an irrational algebraic number \(\alpha\). If the equation \(|\alpha-p/q|<1/q^n\) has infinitely many rational solutions \(p/q\), then \(n\leq d/2+1\), where \(d\) is the degree of \(\alpha\).

Proof Suppose that the irrational algebraic number \(\alpha\) has two “good” rational approximations \(r_1 = p_1/q_1, r_2 = p_2/q_2\), i.e., \(|\alpha – p_i/q_i| < q_i^{-n}\) for \(i = 1,2\). Since there are infinitely many “good” rational approximations, we get to choose later how large we want \(q_i\) to be.

The rest of the proof can be broken into three parts. Let \(m\in\mathbb{N}\) and \(0 < \epsilon \ll 1\) be two constants to be determined. The constant \(\epsilon\) is fixed before we send it to \(0\).

We are going to use \(C_0, \ldots, C_6\) to indicate positive constants that may depend on \(\epsilon, d\) and \(c\), a constant determined by \(\alpha\) (defined later). We may always assume \(\alpha, d, c, C_0,\ldots, C_6 \ll m\).

Part I Find a non-zero polynomial \(f(x,y) = P(x) + Q(x)y\) with integer coefficients that satisfies the following properties:

  1. it vanishes at \((\alpha, \alpha)\) to order \((m, 1)\), i.e., \(\partial_x^k f(\alpha, \alpha) = 0\) for all \(k<m\);
  2. its degree in \(x\) is \(\leq D := \frac{1}{2}(1+\epsilon)dm\).
  3. its size, i.e., the maximum of all its coefficients’ absolute values, denoted by \(|f|\), is \(\leq C_1^m\) for some constant \(C_1\).

Proof of Part I The proof is a classical “parameter counting” argument. Consider a polynomial \(f(x,y)=P(x)+Q(x)y = \sum_{i<D}a_ix^i + \sum_{i<D}b_ix^iy\) with degree \(\leq D\) in \(x\) and think of its \(2D\) coefficients as undetermined. We can interpret the first property as a homogeneous system of \(m\) linear equations with coefficients in \(\mathbb{Z}[\alpha]\) because $$\frac{1}{k!}\partial_x^k f(\alpha, \alpha) = 0 \text{ if and only if } \sum_{i < D}{i\choose k}\alpha^{i-k}a_i + \sum_{i < D}{i\choose k}\alpha^{i-k+1}b_i = 0.$$

Fix a (non-zero) polynomial \(c_dx^d + \ldots + c_0\) with integer coefficients that has \(\alpha\) as a root and \(c\) the size of this polynomial. Notice that \(c_d\alpha^d = -(c_{d-1}\alpha^{d-1} + \ldots + c_0)\). For each of the linear equation above of the form $$L_0 + L_1\alpha + \ldots + L_t\alpha^t = 0,$$ for some \(t \geq d\) where \(L_k\) are linear combination of \(a_i, b_i\) with integer coefficients, we can replace it by $$c_d(L_0 + L_1\alpha + \ldots + L_{t-1}\alpha^{t-1}) – L_t(c_0\alpha^{t-d} + \ldots + c_{d-1}\alpha^{t-1}) = 0,$$ i.e., $$L’_0 + L_1’\alpha + \ldots + L_{t-1}\alpha^{t-1} = 0$$ with $$L’_k = \begin{cases} c_dL_k & 0\leq k < t-d \\ c_dL_k – c_{k-t+d}L_t & t-d \leq k < t\end{cases}$$ and we can repeat this until its degree in \(\alpha\) becomes smaller than \(d\). Let \(|L_k|_1\) be the sum of the absolute values of all coefficients in \(L_k\). Note that each replacement increases the \(\max |L_k|_1\) to \(\max |L_k’|_1 \leq 2c \max|L_k|\). Note that the initially \(\max |L_k| < 2^D\). After at most \(D\) replacements, the equation satisfies \(\max |L_k| < 2^D(2c)^D = C_0^m\) for some constant \(C_0\).

Finally, for each of the linear equation, after the replacements, of the form \(L_0 + L_1\alpha + \ldots +  L_{d-1}\alpha^{d-1} = 0\), it is equivalent to have \( L_0 = L_1 = \ldots = L_{d-1}\) because \(1, \alpha, \ldots, \alpha^{d-1}\) are linearly independent over \(\mathbb{Q}\). Therefore each linear equation with coefficient in \(\mathbb{Z}[\alpha]\) corresponding to \(\partial_x^k f(\alpha, \alpha) = 0\) is equivalent to \(d\) linear equations whose \(|\cdot|_1\) are all bounded by \(C_0^m\). In total, we have \(dm\) linear equations with \(2D\) undetermined.

One can view this homogeneous system of linear equations as a linear transformation \(L = (L_1, \ldots, L_{dm}): \mathbb{Z}^{2D} \to \mathbb{Z}^{dm}\). Consider all integral points in \([-M, M]^{2D}\) and their image under \(L\). Because \(|L_k|_1 \leq C_0^m\), their image is contained in \([-C_0^mM, C_0^mM]^{dm}\). By the pigeonhole principle, as long as $$(2M+1)^{2D} > (2C_0^mM+1)^{dm},$$ there are two integral points \(v’, v”\) in \([-M,M]^{2D}\) such that \(Lv’=Lv”\), hence there is an integral point \(v=v’-v”\in [-2M, 2M]^{2D}\) such that \(Lv=0\) and \(v\neq 0\), hence a non-zero solution to the linear system. To guarantee the above inequality involving \(M\), it is enough to have \(2M > C_0^{m/\epsilon}\), and so \(2M = C_1^m\), for some constant \(C_1\), works.

Part II Find \(l < L := \epsilon m\) such that \(g(x,y) = \frac{1}{l!}\partial_x^l f(x,y)\) satisfies the following properties:

  1. it vanishes at \((\alpha, \alpha)\) to order \((m-L, 1)\);
  2. its degree in \(x\) is \(\leq D\) (\(=\frac{1}{2}(1+\epsilon)dm\));
  3. its size is \(|g| \leq {D\choose l}C_1^m < C_2^m\) for some constant \(C_2\);
  4. and \(g(r_1, r_2)\neq 0\).

Proof of Part II The first three properties follow immediately from the properties of \(f\). Henceforth, we only have to show there exists \(l < L\) such that the last property holds. For the sake of contradiction, assume that \(\partial_x^l f(r_1, r_2) = 0\) for all \(l < L\). Define $$D(x) := \begin{vmatrix}P(x) & Q(x) \\ P'(x) & Q'(x)\end{vmatrix} = P(x)Q'(x) – P'(x)Q(x).$$ Because \(f(x,y)\) vanishes at \((r_1, r_2)\) to order \((L,1)\), by Leibniz rule, \(D(x)\) vanishes at \(r_1\) to order \(L-1\). Note that \(D(x)\) is in \(\mathbb{Z}[x]\). By Gauss’ lemma, \((q_1x-p_1)^{L-1}\) divides \(D(x)\). Therefore either (1) \(|D(x)| \geq q_1^{L-1}\); or (2) \(D(x) = 0\) as a polynomial.

If it is the first case, because \(|D(x)| = |P(x)Q'(x) – P'(x)Q(x)| < 2D|f|^2\), we have \(2D|f|^2 > q_1^{L-1}\). Because \(|f| < C_1^m\), we get a contradiction as long as $$q_1 > C_3,$$ for some constant \(C_3\).

If it is the second case, \(P(x)\) and \(Q(x)\) are linearly dependent over \(\mathbb{Q}\). Therefore either (1) \(P(x) = -r_2’Q(x)\) and \(Q(x)\) vanishes at \(r_1\) to order \(< L\) or (2) \(Q(x)\) vanishes at \(r_1\) to order \(L\). If it is the first case, because \(f(x,y) = (y-r_2′)Q(x)\) vanishes at \((r_1, r_2)\) to order \((L, 1)\) and \(Q(x)\) vanish at \(r_1\) to order \(<L\), we know that \(r_2′ = r_2\), hence by Gauss’ lemma, \(f(x,y)\) is a multiple of \(q_2x – p_2\), and so \(|f| \geq q_2\). If it is the second case, because \(Q\) vanishes at \(r_1\) to order \(L\), by Gauss’ lemma, \(Q(x)\) is a multiple of \((q_1x – p_1)^L\), and so \(|f| \geq |Q| \geq q_1^L\). To get a contradiction in both cases, we need $$q_1 > C_4\text{ and }q_2 > C_1^m.$$

Therefore, it is enough to have $$q_1 > C_5\text{ and }q_2 > q_1^m$$ for some constant \(C_5\).

Part III We claim that \(|g(r_1, r_2)| < C_6^m\left(q_1^{-(1-\epsilon)m}+q_2^{-1}\right)\).

Proof of Part III We know that $$|g(r_1, r_2)-g(r_1, \alpha)| = |Q(r_1)||\alpha – r_2| \leq D|g|(|\alpha|+1)^D|\alpha-r_2|.$$ Because \(g\) vanishes at \((\alpha_1, \alpha_1)\) to order \((m-L, 1)\), using Taylor expansion, we have $$|g(r_1, \alpha) – g(\alpha, \alpha)| \leq \sup_{|x| < |\alpha|+1}\frac{1}{(m-L)!}\partial_x^{m-L}g(x,\alpha) |\alpha_1-r_1|^{m-L} < D2^D|g|(|\alpha|+1)^D||\alpha_1-r_1|^{m-L}.$$ As \(r_1, r_2\) are two “good” approximations of \(\alpha\), we obtain $$\begin{eqnarray}|g(r_1, r_2)| & \leq & D|g|(|\alpha|+1)^D|\alpha-r_2| + D2^D|g|(|\alpha|+1)^D||\alpha_1-r_1|^{m-L}\\ & < & C_6^m\left(q_2^{-n} + q_1^{(-m+L)n}\right),\end{eqnarray}$$ for some constant \(C_6\)

From Part II, we know that \(g(r_1, r_2)\neq 0\), and so \(|g(r_1, r_2)| \geq q_1^{-D}q_2^{-1}\). From Part III, we know that \(|g(r_1, r_2)| < C_6^m(q_1^{(-m+L)n}+q_2^{-n})\). Therefore, we obtain $$q_1^{-D}q_2^{-1} < C_6^m\left(q_1^{(-m+L)n}+q_2^{-n}\right).$$ Set \(m\) to be the biggest integer less than \(\log{q_2} / \log{q_1}\), and so \(q_1^m < q_2 \leq q_1^{m+1}\). To ensure Part II to work, we only need to take \(q_1 > C_5\). In addition, we require \(q_1^\epsilon > C_6\) and so \(C_6^m < q_1^{\epsilon m} < q_2^\epsilon\). The choice of \(m\) implies that $$q_2^{-\frac{1}{2}(1+\epsilon)d – 1} = q_2^{-D/m}q_2^{-1} < q_2^\epsilon\left(q_2^{-\frac{m-L}{m+1}n}+q_2^{-n}\right) < 2q_2^{-(1-2\epsilon)n+\epsilon}.$$ In order to have this inequality hold for arbitrarily large \(q_2\), we need \(\frac{1}{2}(1+\epsilon)d + 1 > (1-2\epsilon)n-\epsilon\). Let \(\epsilon\) go to \(0\) and obtain \(n\leq \frac{1}{2}d + 1\). Q.E.D.


The proof largely relies on Larry Guth‘s notes on polynomial method.

0 (be the first to like this)