Sphere packing bounds via spherical codes

Henry Cohn and I just uploaded to the arXiv our paper “Sphere packing bounds via spherical codes.” [Update: Henry gave a wonderful talk about our paper at the IAS. A video of the talk is available here.]

What’s the most space-efficient way to arrange a collection of identical balls? This is known as the sphere packing problem. It is a very difficult problem with a long and interesting history. In three dimensions, this problem is known as the Kepler conjecture, which says that the face-centered cubic formation does best. This is basically how grocers stack oranges. The seemingly innocent Kepler conjecture turned out to be extremely difficult, and it was only solved in the late 1990s by Thomas Hales, who gave a very complex proof involving massive computer-aided calculations. I’ve included a few article links at the end of this blog post on the history and background of the sphere packing problem.

Now, what about sphere packing in higher dimensions? Unfortunately, we know very little about what happens beyond three dimensions. No proof of optimality is known in any higher dimension, and there are only a few dozen dimensions in which there are even plausible conjectures for the densest packing. In dimensions 8 and 24 there are upper bounds that are extremely close to the conjectured optima, thanks to the works of Cohn, Elkies, and Kumar [1,2,3] (dimensions 8 and 24 are somehow special because of the existence of highly symmetric lattices known as the {E_8} lattice in dimension 8 and the Leech lattice in dimension 24). However, in most dimensions we must be content with much cruder bounds.

The current state of the art for sphere packing density upper bounds is more or less as follows:

  • In dimensions 1, 2, 3, the exact upper bound is known. The result for dimension 3 is due to Hales.
  • In low dimensions, namely 4 to 42, Cohn and Elkies improved the previous record by Rogers, although there were some recent improvements in dimensions 4, 5, 6, 7, and 9 by de Laat, Filho, and Vallentin using techniques of semidefinite programming.
  • In all high dimensions, namely 43 and above, the best bounds are due to Kabatiansky and Levenshtein in 1978 and have not been improved since then.

The purpose of our paper is fourfold:

  1. We give a small improvement over the 1978 bounds of Kabatiansky and Levenshtein in high dimensions by giving a simple modification of their geometric argument relating spherical codes to sphere packings.
  2. Kabatiansky and Levenshtein derived their bounds by first formulating a linear program for proving upper bounds on spherical codes. Cohn and Elkies found a more direct approach to bounding sphere packing densities, with no need to consider spherical codes. However, despite the excellent performance in low dimensions, the asymptotic behavior of the Cohn-Elkies bound is far from obvious and it has been unclear whether it improves on, or even matches, the Kabatiansky-Levenshtein bound asymptotically. In our paper, we show that in every dimension {n}, the Cohn-Elkies linear program can always match the Kabatiansky-Levenshtein approach. This further demonstrates the power of the linear programming bound for sphere packing.
  3. We prove an analogue of the Kabatiansky-Levenshtein bound in hyperbolic space. The resulting bound is exponentially better than the best bound previously known in hyperbolic space.
  4. We develop the theory of hyperbolic linear programming bounds and prove that they too subsume the Kabatiansky-Levenshtein approach. Packing in hyperbolic space is much more difficult to handle than in Euclidean space, primarily because the volume of an expanding ball grows exponentially with its radius (instead of polynomially as in the case of Euclidean space), so that we cannot neglect boundary fluctuations. In fact, it is a non-trivial matter to even define the density of a packing (there are examples of packings for which any reasonable definition of density should yield two different answers).

Further reading

Some articles on the history and background of the sphere packing problem:

Popular audience:

General mathematical audience:

Energy-minimizing error-correcting codes

Henry Cohn and I just uploaded to the arXiv our paper “Energy-minimizing error-correcting codes.”

Geometry has long played a key role in coding theory, starting with the work of Hamming: binary codes can be viewed as packings of Hamming balls in a discrete cube. This framework provides a powerful analogy between discrete and continuous packing problems, which has been extensively developed and remains an active research topic. In our paper, we extend the analogy to a much broader relationship between coding theory and discrete models of physics. Of course, physics is related to coding theory in many ways, ranging from connections between spin glasses and codes to the statistical physics of belief propagation and other applications of graphical models to coding theory. Applications of physics to coding theory typically focus on the limit as the block length tends to infinity. Instead, in this paper we show that certain classical codes are exact ground states of natural physics models.

In addition to extending the analogy with continuous packing problems, our results can be thought of as addressing a philosophical problem. Many classical codes—such as Hamming, Golay, or Reed-Solomon codes—remain very popular, despite the many other good codes that have been found. Why should this be? One obvious answer is that these codes are particularly beautiful and useful, especially given the simplicity of their constructions. Another is that they were discovered early in the development of coding theory and had a chance to cement their place in the canon. We propose a third explanation: a code is most useful if it is robust, in the sense that it optimizes not just one specific measure of quality, but rather a wide range of them simultaneously. We will prove that these classical codes have a rare form of robustness that we call universal optimality, based on an analogy with continuous optimization problems.

To see this analogy, recall the following classic problem: given {n} particles on a sphere interacting via some mutually repelling force (the Thomson model for electrons is a good example of this), in what configurations would the particles arrange themselves? The configuration of greatest interest is the ground state, which is the one possessing the least potential energy.

This then leads to the following fundamental problem in extremal geometry: given some metric space (e.g. sphere) and an arbitrary potential function based on the distance between pairs of points, for any positive integer {N}, how should {N} points arrange themselves to minimize the total potential energy of the system?

One might expect that different choices of potential energies might lead to different ground state configurations, and this is usually the case. However, there are some highly symmetric configurations which are highly robust in the sense that they minimize not just a single potential function, but a broad class of potential functions.

What kind of potential functions should we be looking at? For example, the electric potential function in {{\mathbb R}^3} is {f(r) = 1/r}. The natural generalization to {{\mathbb R}^n} is {f(r) = 1/r^{n-2}}. However, we are also interested in more general potentials. Ideally, the potential function should possess certain properties. First, it should be decreasing with distance (since we’re interested in repelling forces; the ground state with an attractive force isn’t so interesting, as all particles would collapse into one point). The potential function should also be convex, since the effect of the force should diminish with distance. We can express these two conditions as {f' \leq 0} and {f'' \geq 0} respectively. Let us extrapolate from these two conditions and impose similar conditions on higher order derivatives, namely, {f''' \leq 0}, {f^{(4)} \geq 0}, etc. Note that all power laws {f(r) = r^{\alpha}} for {\alpha < 0} satisfy these conditions.
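The alternating sign conditions above can be written as a single inequality. The display below is a short worked check, using only the definitions in the preceding paragraph, that power laws with negative exponent satisfy it: each of the {k} factors {\alpha - j} is negative, so their product has sign {(-1)^k}.

```latex
% Sign pattern from the paragraph above, for every k >= 1 and r > 0:
(-1)^k f^{(k)}(r) \ge 0 .
% Check for a power law f(r) = r^{\alpha} with \alpha < 0:
f^{(k)}(r) = \alpha(\alpha-1)\cdots(\alpha-k+1)\, r^{\alpha-k},
\qquad
(-1)^k f^{(k)}(r) = |\alpha|\,|\alpha-1|\cdots|\alpha-k+1|\, r^{\alpha-k} > 0 .
```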

The work of Cohn and Kumar studied precisely this class of potential functions, which they called completely monotonic, for points on a sphere in {{\mathbb R}^n} (actually, for technical reasons, they consider functions {f} of the square of the distance between pairs of points). They studied configurations of points on a sphere which minimize all completely monotonic potential functions. Examples of such configurations include the regular simplex, the cross polytope, the icosahedron, the 600-cell, roots of the {E_8}-lattice, and the minimal vectors of the Leech lattice. These beautiful and highly symmetric configurations all have very robust energy minimization properties.

In our current work, we analyze error-correcting codes using this perspective of energy minimization. Binary error-correcting codes can be thought of as sets of particles on a high-dimensional cube. We are interested in finding out which error-correcting codes minimize a broad class of potential energies. The class of potential functions is the discrete analogue of the completely monotonic functions described above, with the conditions on derivatives replaced by conditions on successive finite differences.
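To make the discrete setup concrete, here is a minimal Python sketch. It assumes the energy of a code is the sum of a potential {f} over the Hamming distances of all unordered pairs of codewords, and it reads "conditions on successive finite differences" as alternating signs for the forward differences of {f} on the distances {1, \dots, n}. The function names are hypothetical, and the exact conventions in the paper may differ.

```python
from fractions import Fraction
from itertools import combinations


def hamming_distance(x, y):
    """Number of coordinates in which the words x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def energy(code, f):
    """Sum of f(distance) over all unordered pairs of distinct codewords."""
    return sum(f(hamming_distance(x, y)) for x, y in combinations(code, 2))


def alternating_differences_ok(f, n):
    """Check that the k-th forward difference of f on 1..n has sign (-1)^k, k >= 1.

    This is one possible discrete reading of the conditions f' <= 0, f'' >= 0, ...
    Pass a potential with exact values (e.g. Fractions) to avoid rounding issues.
    """
    values = [f(a) for a in range(1, n + 1)]
    k = 0
    while len(values) > 1:
        # Replace the list by its forward differences: values[i+1] - values[i].
        values = [values[i + 1] - values[i] for i in range(len(values) - 1)]
        k += 1
        sign = -1 if k % 2 else 1
        if any(sign * v < 0 for v in values):
            return False
    return True


# Example potential: the discrete analogue of the power law 1/r.
inverse_distance = lambda d: Fraction(1, d)
repetition_code = [(0, 0, 0), (1, 1, 1)]
print(energy(repetition_code, inverse_distance))
print(alternating_differences_ok(inverse_distance, 7))
```

An increasing potential such as f(d) = d fails the check at the first difference, matching the requirement that the force be repelling.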

As mentioned at the beginning, we show that many classical codes have robust minimization properties, which translate into good performance according to a broad range of measures. For example, such codes minimize the probability of an undetected error under the symmetric channel, and this robustness also has interesting consequences for other decoding error probabilities.

Our main technical tool for bounding energy is the linear program developed by Delsarte, which was originally used to bound the size of codes given their minimum distance. We will call a code LP universally optimal if its universal optimality follows from these bounds. One of our key results is that LP universal optimality behaves well under duality, thereby allowing us to apply our criteria to many classes of codes.

One result we found particularly surprising is that LP universally optimal codes continue to minimize energy even after we remove a single codeword. We know of no analogue of this property in the continuous setting. This property also implies structural properties, namely that every LP universally optimal code is distance regular, i.e., for each distance, every codeword has the same number of codewords at that distance.
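The distance regularity condition stated above is easy to test on small examples. The following sketch checks it directly from the definition given in this post (every codeword sees the same number of codewords at each distance). The function names are hypothetical and the codewords are assumed to be distinct.

```python
from collections import Counter


def hamming_distance(x, y):
    """Number of coordinates in which the words x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def distance_profile(code, c):
    """For each distance, count the other codewords at that distance from c."""
    return Counter(hamming_distance(c, x) for x in code if x != c)


def is_distance_regular(code):
    """True if every codeword has the same distance profile."""
    profiles = [distance_profile(code, c) for c in code]
    return all(p == profiles[0] for p in profiles)


# The even-weight code of length 3: every codeword has three neighbors at distance 2.
even_weight = ["000", "011", "101", "110"]
print(is_distance_regular(even_weight))

# A lopsided code: 000 sees distances {1, 3}, while 001 sees distances {1, 2}.
lopsided = ["000", "001", "111"]
print(is_distance_regular(lopsided))
```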

On replica symmetry of large deviations in random graphs

Eyal Lubetzky and I just uploaded to the arXiv our new paper “On replica symmetry of large deviations in random graphs.” In this paper we answer the following question of Chatterjee and Varadhan:

Question. Fix {0<p<r<1}. Let {n} be a large integer and let {G} be an instance of the Erdős-Rényi random graph {\mathcal G(n,p)} conditioned on the rare event that {G} has at least as many triangles as the typical {\mathcal G(n,r)}. Does {G} look like a typical {\mathcal G(n,r)}?

Here the Erdős-Rényi random graph {\mathcal G(n,p)} is formed by taking {n} vertices and adding every possible edge independently with probability {p}. Here “look like” means close in cut-distance (but we won’t give a precise definition in this blog post). In this case, saying that {G} is close to {\mathcal G(n,r)} is roughly equivalent to saying that every not-too-small subset of vertices (at least a constant fraction in size) of {G} induces a subgraph with edge density close to {r}.
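For readers who like to experiment, here is a small Python sketch of the objects in this paragraph: sampling {\mathcal G(n,p)}, counting triangles, and measuring the edge density induced on a subset of vertices. It uses only the standard library. The function names are hypothetical, and this is only an illustration of the definitions, not of the conditioning on a rare event, which naive sampling cannot reach for large {n}.

```python
import random
from itertools import combinations


def sample_gnp(n, p, seed=None):
    """Sample G(n, p): include each of the C(n, 2) possible edges independently."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def triangle_count(adj):
    """Count triangles, each once, by ordering the three vertices u < v < w."""
    return sum(
        1
        for u in adj
        for v in adj[u] if u < v
        for w in adj[u] & adj[v] if v < w
    )


def induced_edge_density(adj, subset):
    """Fraction of pairs inside `subset` that are edges of the graph."""
    vertices = list(subset)
    if len(vertices) < 2:
        return 0.0
    pairs = len(vertices) * (len(vertices) - 1) // 2
    edges = sum(1 for u, v in combinations(vertices, 2) if v in adj[u])
    return edges / pairs


graph = sample_gnp(200, 0.3, seed=1)
print(triangle_count(graph))
print(induced_edge_density(graph, range(100)))  # typically close to 0.3
```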

Another way to phrase the question is: what is the reason for {G} having too many triangles? Is it because it has an overwhelming number of edges uniformly distributed, or fewer edges arranged in a special structure, e.g., a clique?

Via a beautiful new framework by Chatterjee and Varadhan for large deviation principles in {\mathcal G(n,p)}, we give a complete answer to the above question.

The answer, as it turns out, depends on {(p,r)}. See the plot below. For {(p,r)} in the blue region, the answer is yes, and for {(p,r)} in the red region, the answer is no.

Does {G} look like a {\mathcal G(n,r)}?

The phase transition behavior was already observed by Chatterjee and Varadhan, but the determination of the precise phase boundary is new. Borrowing language from statistical physics, the blue region where the conditional random graph is close to {\mathcal G(n,r)} is called the replica symmetric phase, and the red region is called the symmetry breaking phase. Note that in the left part of the plot, as we fix a small value of {p}, the model experiences a double phase transition as {r} increases from {p} to {1}: it starts in the blue phase, then switches to the red phase, and then switches back to the blue phase.

More generally, our result works for any {d}-regular graph in place of triangles. The boundary curves depend only on {d}, and they are plotted below for the first few values of {d}. In particular, this means that large deviations for triangles and 4-cycles share the same phase boundary. A pretty surprising fact! We also consider the model where we condition on the largest eigenvalue of the graph being too large, and the phase boundary also turns out to be the same as that of triangles.

We also derive similar results for large deviations in the number of linear hypergraphs in a random hypergraph.

The phase boundary for {d}-regular graphs

Exponential random graphs

We also studied the exponential random graph model. This is a widely studied graph model, motivated in part by applications in social networks. The idea is to bias the distribution of the random graph to favor those with, say, more triangles. This model has a similar flavor to the model considered above where we condition on the random graph having lots of triangles.

We consider a random graph {G} on {n} vertices drawn from the distribution

\displaystyle p_\beta(G) \propto \exp\left(\tbinom{n}{2}(\beta_1 t(K_2, G) + \beta_2 t(K_3, G))\right)\,,

where {t(K_2, G)} and {t(K_3, G)} are the edge density and the triangle density of {G}, respectively. When {\beta_2 = 0}, this model coincides with the Erdős-Rényi model {\mathcal G(n,p)} with some {p} depending on {\beta_1}. We only consider the case {\beta_2 > 0}, which represents a positive bias in the triangle count.

As shown by Bhamidi, Bresler and Sly and Chatterjee and Diaconis, when {n} is large, a typical random graph drawn from the distribution has a trivial structure — essentially the same one as an Erdős-Rényi random graph with a suitable edge density. This somewhat disappointing conclusion accounts for some of the practical difficulties with statistical parameter estimation for such models. In our work, we propose a natural generalization that will enable the exponential model to exhibit a nontrivial structure instead of the previously observed Erdős-Rényi behavior.

Here is our generalization. Consider the exponential random graph model which includes an additional exponent {\alpha>0} in the triangle density term:

\displaystyle p_{\alpha,\beta}(G) \propto \exp\left(\tbinom{n}{2}(\beta_1 t(K_2, G) + \beta_2 t(K_3, G)^\alpha)\right) \, .
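As a rough illustration, the exponent in this formula can be computed for a small graph as follows. This sketch assumes that {t(K_2, G)} and {t(K_3, G)} are homomorphism densities, that is, {2e(G)/n^2} and {6\,\#\text{triangles}/n^3}; that normalization is an assumption here, as are the function and parameter names. Setting `alpha=1` recovers the exponent of the original model.

```python
from itertools import combinations
from math import comb


def log_weight(adj, beta1, beta2, alpha=1.0):
    """Unnormalized log-probability of a graph in the generalized exponential model.

    `adj` maps each vertex to the set of its neighbors.
    Densities are taken to be homomorphism densities (an assumption of this sketch).
    """
    n = len(adj)
    edges = sum(len(neighbors) for neighbors in adj.values()) // 2
    triangles = sum(
        1
        for u, v, w in combinations(sorted(adj), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )
    t_edge = 2 * edges / n**2
    t_triangle = 6 * triangles / n**3
    return comb(n, 2) * (beta1 * t_edge + beta2 * t_triangle**alpha)


# A single triangle on three vertices.
k3 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(log_weight(k3, beta1=1.0, beta2=0.5, alpha=0.5))
```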

When {\alpha \geq 2/3}, the generalized model features the Erdős-Rényi behavior, similar to the previously observed case of {\alpha = 1}. However, for {0< \alpha < 2/3}, there exist regions of values of {(\beta_1, \beta_2)} for which a typical random graph drawn from this distribution has symmetry breaking, which was rather unexpected given earlier results. For example, we know that there is symmetry breaking in the shaded regions in the plots below. (The blue curve indicates a discontinuity in the model which we won’t discuss in this blog post.)

Symmetry breaking in the new exponential graph model


Graph regularity

In this blog post I will give a brief introduction to Szemerédi’s Regularity Lemma, a powerful tool in graph theory. The post is based on a talk I gave earlier today at a graduate student lunch seminar.

Consider the following problem. Suppose you’re given a very large graph. The graph has so many vertices that you won’t be able to access all of them. But nevertheless you want to find out certain things about the graph. These situations come up in real world applications. Perhaps we would like to know something about a social network, e.g., Facebook, but we don’t have the resources to go through every single node, as there are simply too many of them. For the purpose of this blog post though, we won’t talk about applications and instead stick to the mathematics.

Suppose we are interested in answering the following question about the very large graph:

Is the graph triangle-free?

Think of the given graph as a black box. We have the following access to the graph: we are allowed to randomly sample some number of vertices and be told of all the edges between these vertices.

Can we achieve the desired goal? Well, if the graph contains, say, only a single triangle, then it’s pretty much a hopeless task, since we are almost certainly never going to find the single needle in this giant haystack through random sampling. So we have to be content with a more modest objective.

Can we distinguish a graph that’s triangle-free from a graph that is {\epsilon}-far from triangle-free?

Being {\epsilon}-far from a property means that we would have to add/delete at least {\epsilon n^2} edges from the graph to make it satisfy that property. Here {n} is the number of vertices in the very large graph. Note that this model puts us in the setting of dense graphs, i.e., graphs with {\Omega(n^2)} edges.

This problem we know how to solve. The algorithm is very straightforward: sample some constant number of vertices, and check to see if you see any triangles.

Algorithm: Sample {C_\epsilon} (some constant depending on {\epsilon}) vertices

  • If a triangle is detected, then output that the graph is not triangle-free.
  • If no triangle is detected, then output that the graph is triangle-free.
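The algorithm above can be sketched in a few lines of Python. Here `has_edge` is a hypothetical oracle standing in for the black-box access to the graph, and `sample_size` plays the role of {C_\epsilon}, whose actual value is not specified in this post.

```python
import random
from itertools import combinations


def looks_triangle_free(has_edge, n, sample_size, seed=None):
    """Sample vertices at random and report whether no triangle was found.

    has_edge(u, v) -> bool is the only access we have to the graph.
    Returns False as soon as a triangle is detected among the sampled vertices.
    """
    rng = random.Random(seed)
    sample = rng.sample(range(n), min(sample_size, n))
    for u, v, w in combinations(sample, 3):
        if has_edge(u, v) and has_edge(v, w) and has_edge(u, w):
            return False  # found a triangle: the graph is not triangle-free
    return True  # no triangle seen: declare the graph triangle-free


# A complete bipartite graph (even vs. odd vertices) is triangle-free.
bipartite = lambda u, v: u % 2 != v % 2
print(looks_triangle_free(bipartite, n=10**6, sample_size=50))

# A complete graph is as far from triangle-free as possible.
complete = lambda u, v: u != v
print(looks_triangle_free(complete, n=10**6, sample_size=50))
```

Note that the sketch never reads more than `sample_size` choose 2 pairs of the graph, regardless of {n}.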

If the given graph is triangle-free, then clearly we won’t ever detect any triangles, so the algorithm always outputs the correct answer. But what if the given graph is not triangle-free? We said earlier that in this case we’ll assume the graph is {\epsilon}-far from triangle-free. We want the algorithm to detect at least one triangle so that it can give the correct answer. However, the randomized nature of the algorithm means that there will be some probability that the output will be erroneous. We are claiming that this error probability is small.

This claim seems very innocent. Essentially we need to show that if a graph cannot be made triangle-free by deleting a small number of edges, then it must contain very many triangles. If you haven’t seen this claim before, you might think that it’s something that would follow from some easy deductions, and you might be tempted to work it out yourself. However, be warned that you will almost certainly not succeed. The claim is indeed correct, but it is far from trivial.


The critical window for the classical Ramsey-Turán problem

[Update: Po-Shen Loh gave a wonderful talk in Banff about our paper. Here is the video.]

Jacob Fox, Po-Shen Loh and I recently uploaded to the arXiv our new paper “The critical window for the classical Ramsey-Turán problem.” This paper revisits the following celebrated Ramsey-Turán result first proved by Szemerédi in 1972.

Theorem 1 (Szemerédi 1972) For every {\epsilon > 0}, there exists a {\delta > 0} such that any graph on {n} vertices containing no 4-clique and no independent set of size greater than {\delta n} has at most {(1/8 + \epsilon)n^2} edges.

Turán’s theorem tells us that if we just want to construct a {K_4}-free graph on {n} vertices with the largest possible number of edges, then we should divide the {n} vertices into three nearly-equal parts and put in all possible edges across parts. This yields roughly {n^2/3} edges. However, this graph contains very large independent sets with {n/3} vertices. So the independent set restriction (which gives the “Ramsey” part of Ramsey-Turán) rules out this construction and forces the optimal graphs to take on a very different structure.
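As a quick sanity check on the count above, write {T(n,3)} for the complete tripartite graph with three parts of size about {n/3} (this notation is introduced here only for the computation). Each of the {\binom{3}{2}} pairs of parts contributes about {(n/3)^2} edges:

```latex
e\bigl(T(n,3)\bigr) \approx \binom{3}{2}\left(\frac{n}{3}\right)^2 = \frac{n^2}{3},
\qquad \text{compared with at most } \left(\frac{1}{8}+\epsilon\right)n^2 \text{ in Theorem 1.}
```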

At the time of Szemerédi’s 1972 result, most people believed that a {K_4}-free graph with largest independent set of size {o(n)} has to be much more sparse, perhaps with only {o(n^2)} edges. So it was a surprise when, four years later, Bollobás and Erdős gave a clever geometric construction, based on the isoperimetric inequality for the high dimensional sphere, that showed that the bound given by Szemerédi is essentially optimal.

Theorem 2 (Bollobás and Erdős 1976) There exists a {K_4}-free graph on {n} vertices with largest independent set of size {o(n)} and having {(1/8 - o(1))n^2} edges.

Bollobás and Erdős asked what happens in the critical window, when the number of edges is about {n^2/8}. This problem has received considerable attention (e.g., it was featured in a paper by Erdős from 1990 titled “Some of my favourite unsolved problems”). One of the difficulties here is that Szemerédi’s original proof used the regularity lemma, which is a powerful tool but one that unfortunately gives very poor parameter dependencies. To obtain results for the Ramsey-Turán problem at the desired precision, a regularity-free proof is needed. We give a new proof of the classical Ramsey-Turán result that avoids the use of regularity. This allows us to obtain much better and in fact nearly-optimal dependencies for the classical Ramsey-Turán problem in the critical window near {n^2/8} edges, and solve several longstanding open problems in this area.


Extremal results for sparse pseudorandom graphs

David Conlon, Jacob Fox and I have just uploaded to the arXiv our paper “Extremal results in sparse pseudorandom graphs.” The main advance of this paper is a sparse extension of the counting lemma associated to Szemerédi’s regularity lemma, allowing us to extend a wide range of classical extremal and Ramsey results to sparse pseudorandom graphs.

An important trend in modern combinatorics research is in extending classical results to the sparse setting. For instance, Szemerédi’s theorem says that every subset of the integers with positive density contains arbitrarily long arithmetic progressions. The celebrated result of Green and Tao says that the primes also contain arbitrarily long arithmetic progressions. While the primes have zero density in the integers, they may be placed inside a pseudorandom set of “almost primes” with positive relative density. Green and Tao established a transference principle, allowing them to apply Szemerédi’s theorem as a black box to the sparse setting. Our work has a similar theme. We establish a transference principle extending many classical extremal graph theoretic results to sparse pseudorandom graphs.

One of the most powerful tools in extremal graph theory is Szemerédi’s regularity lemma. Roughly speaking, it says that every large graph can be partitioned into a bounded number of roughly equally-sized parts so that the graph is random-like between pairs of parts. With this tool in hand, many important results in extremal graph theory can be proven using a three-step recipe, known as the regularity method:

  1. Starting with any graph {G}, apply Szemerédi’s regularity lemma to obtain a regular partition;
  2. Clean up the graph and create an associated reduced graph. Solve an easier problem in the reduced graph;
  3. Apply the counting lemma. Profit.

The counting lemma is a result that says that the number of embeddings of a fixed graph (e.g., a triangle) into the regular partition is roughly what you would expect if the large graph were actually random. The original version of Szemerédi’s regularity lemma is useful only for dense graphs. Kohayakawa and Rödl later independently developed regularity lemmas for sparse graphs. However, for sparse extensions of the applications, the counting lemma remained a key missing ingredient and an important open problem in the field. Our main advance lies in a counting lemma that complements the sparse regularity lemma.
