It has been truly a pleasure working with Ashwin, Mehtaab, and David on this project. I have learned so much from them.

Here are some highlights of our new results.

**1. Graph colorings.** Among *d*-regular graphs *G*, which has the greatest number of proper *q*-colorings, exponentially normalized by the number of vertices of *G*?

We show that the extremal graph is the complete bipartite graph *K_{d,d}* (or disjoint unions thereof, as taking disjoint copies does not change the quantity due to the normalization).

**2. Graph homomorphisms.** Among *d*-regular graphs, which triangle-free graph *G* has the greatest number of homomorphisms into a fixed *H*, again exponentially normalized by the number of vertices of *G*?

We show that the extremal graph *G* is the complete bipartite graph *K_{d,d}*.
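To make the normalized count concrete, here is a brute-force sanity check for the smallest case *d* = 2, where the triangle-free 2-regular graphs are disjoint unions of cycles of length at least 4 (a sketch, not the method of the paper; graphs are encoded as edge lists, and *H* = *K_3*, so homomorphisms into *H* are exactly proper 3-colorings):

```python
from itertools import product

def hom_count(G_edges, n, H_edges, m):
    """Count graph homomorphisms from G (n vertices) to H (m vertices)
    by trying every vertex map. H_edges are symmetrized internally."""
    H_adj = set(H_edges) | {(v, u) for (u, v) in H_edges}
    return sum(
        1 for f in product(range(m), repeat=n)
        if all((f[u], f[v]) in H_adj for (u, v) in G_edges)
    )

def cycle(n):
    return [(i, (i + 1) % n) for i in range(n)]

K3 = [(0, 1), (1, 2), (0, 2)]  # H = K_3: homs = proper 3-colorings

# Among triangle-free 2-regular graphs (cycles of length >= 4),
# C_4 = K_{2,2} should maximize hom(G, K_3)^(1/|V(G)|).
scores = {n: hom_count(cycle(n), n, K3, 3) ** (1 / n) for n in range(4, 8)}
best = max(scores, key=scores.get)
print(scores, best)
```

Since the normalized count is multiplicative over disjoint unions, comparing single cycles suffices for *d* = 2.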

The triangle-free hypothesis on *G* is best possible in general. For certain specific *H*, such as *K_q*, corresponding to proper *q*-colorings, the triangle-free hypothesis on *G* can be dropped.

**3. Partition functions of ferromagnetic spin models.** Among *d*-regular graphs, which graph maximizes the log-partition function per vertex of a given ferromagnetic spin model (e.g., the Ising model)?

We show that the extremal graph is the clique *K_{d+1}*.
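As a quick sanity check of the ferromagnetic statement (a brute-force sketch, not the paper's argument): for *d* = 2 the *d*-regular clique is *K_3*, and we can compare the Ising partition function per vertex, *Z(G)^{1/n}* with *Z(G) = Σ_σ exp(β Σ_{uv∈E} σ_u σ_v)*, against other 2-regular graphs:

```python
from itertools import product
from math import exp

def ising_Z(edges, n, beta):
    """Ferromagnetic Ising partition function: sum over all spin
    assignments sigma in {-1,+1}^n of exp(beta * sum_{uv} sigma_u sigma_v)."""
    return sum(
        exp(beta * sum(s[u] * s[v] for (u, v) in edges))
        for s in product((-1, 1), repeat=n)
    )

beta = 1.0  # ferromagnetic means beta > 0
K3 = [(0, 1), (1, 2), (0, 2)]           # the 2-regular clique
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]   # another 2-regular graph
C5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

per_vertex = {
    "K3": ising_Z(K3, 3, beta) ** (1 / 3),
    "C4": ising_Z(C4, 4, beta) ** (1 / 4),
    "C5": ising_Z(C5, 5, beta) ** (1 / 5),
}
print(per_vertex)  # K3 should come out on top
```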

For each setting, we establish our results more generally for irregular graphs as well, similar to our earlier work on independent sets.

Our results can be interpreted as a reverse form of the inequality in **Sidorenko’s conjecture**, an important open problem in graph theory asserting a certain positive correlation inequality for two-variable functions.

One can also view our results as a graphical analog of the **Brascamp-Lieb inequalities**, a central result in analysis.

This paper resolves one of my favorite open problems on this topic (the number of graph colorings). It also points us to many other open problems. Let me conclude by highlighting one of them (mentioned in #2 earlier). I’ll state it in a simpler form for *d*-regular graphs, but it can be stated more generally as well.

**Open problem.** Classify all *H* with the following property: among *d*-regular graphs *G*, the number of homomorphisms from *G* to *H*, exponentially normalized by the number of vertices of *G*, is maximized by *G* = *K_{d,d}*.

Our results show that *H* = *K_q* works, even when some of its vertices are looped. Generalizing this case, we conjecture that all antiferromagnetic models *H* have this property.

**Ashwin** is a freshman at MIT, **Mehtaab** is a sophomore at MIT, and **David** is a junior at Harvard.

The paper solves a conjecture made by Jeff Kahn in 2001 concerning the number of independent sets in a graph.

An **independent set** of a graph is a subset of vertices with no two adjacent. For example, here is the complete list of independent sets of a cycle of length 4: the empty set, the four single vertices, and the two pairs of opposite vertices, for a total of seven.
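This small enumeration is easy to reproduce by brute force (a sketch, with the cycle encoded as an edge list):

```python
from itertools import combinations

def independent_sets(n, edges):
    """All subsets of {0,...,n-1} containing no edge."""
    result = []
    for k in range(n + 1):
        for S in combinations(range(n), k):
            if not any(u in S and v in S for (u, v) in edges):
                result.append(S)
    return result

C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]  # cycle of length 4
sets = independent_sets(4, C4)
print(sets)       # empty set, 4 singletons, the 2 diagonal pairs
print(len(sets))  # 7
```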

Since many problems in combinatorics can be naturally phrased in terms of independent sets in a graph or hypergraph, getting good bounds on the number of independent sets is a problem of central importance.

Instead of giving the exact statement of the conjecture (now theorem), which you can find in the abstract of our paper, let me highlight a specific instance of a problem addressed by the theorem:

Consider the family of graphs with no isolated vertices in which

- exactly a third of the edges have both endpoints of degree 3,
- a third of the edges have both endpoints of degree 4, and
- the remaining third of the edges have one endpoint of degree 3 and one endpoint of degree 4.

What is the smallest constant *c* such that every *n*-vertex graph satisfying the above properties has at most *c^n* independent sets?

In other words, letting *i(G)* denote the number of independent sets and *n* the number of vertices of *G*, maximize the quantity *i(G)^{1/n}* over all graphs *G* in the above family.

(See the end of this post for the answer)

In summer 2009, as an undergraduate participating in the wonderful Research Experience for Undergraduates (REU) run by Joe Gallian in Duluth, MN, I learned about Kahn’s conjecture (somewhat accidentally actually, as I was working on a different problem that eventually needed some bounds on the number of independent sets, and so I looked up the literature and learned, slightly frustratingly at the time, that it was an open problem). It was on this problem that I had written one of my very first math research papers.

I had been thinking about this problem on and off since then. A couple years ago, Joe Gallian invited me to write an article for a special issue of *the American Mathematical Monthly* dedicated to undergraduate research (see my previous blog post on this article), where I described old and new developments on the problem and collected a long list of open problems and conjectures on the topic (one of them being Kahn’s conjecture).

This spring, I had the privilege of working with Ashwin, Mehtaab, and David, three energetic and fearless undergraduate students, finally turning Kahn’s conjecture into a theorem. *Fearless* indeed, as our proof ended up involving quite a formidable amount of computation (especially for such an innocent-looking problem), and my three coauthors deserve credit for doing the heavy lifting. I’ve been told that there had been many late-night marathon sessions in an MIT Building 2 classroom where they tore apart one inequality after another.

This is certainly not the end of the story, as many more interesting problems on the subject remain unsolved—my favorite one being the analogous problem for colorings, e.g., instead of counting the number of independent sets, how about we count the number of ways to color the vertices of the graph with 3 colors so that no two adjacent vertices receive the same color? (See my previous blog post for some more discussion.)

Anyway, here is the answer to the question above. In general, the theorem says that the maximizer is always a disjoint union of complete bipartite graphs; for the family above, the maximum is attained by a disjoint union of copies of *K_{3,3}*, *K_{3,4}*, and *K_{4,4}*, taken in proportions that make each of the three edge types exactly a third of the total.
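Complete bipartite graphs are pleasant here because their independent-set counts have a closed form: an independent set of *K_{a,b}* must lie entirely within one side, so *i(K_{a,b}) = 2^a + 2^b − 1* (the empty set is counted once). A quick brute-force check of this formula (a sketch):

```python
from itertools import product

def count_independent_sets(n, edges):
    """Count subsets of {0,...,n-1} with no two adjacent vertices."""
    total = 0
    for bits in product((0, 1), repeat=n):
        if not any(bits[u] and bits[v] for (u, v) in edges):
            total += 1
    return total

def K(a, b):
    """Edge list of the complete bipartite graph K_{a,b}."""
    return [(i, a + j) for i in range(a) for j in range(b)]

# An independent set of K_{a,b} lies entirely in one side,
# so i(K_{a,b}) = 2^a + 2^b - 1 (the empty set counted once).
for a, b in [(3, 3), (3, 4), (4, 4)]:
    assert count_independent_sets(a + b, K(a, b)) == 2**a + 2**b - 1
print("formula checks out for K_{3,3}, K_{3,4}, K_{4,4}")
```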

The earliest result in **extremal graph theory** is usually credited to Mantel, who proved, in 1907, that a graph on *n* vertices with no triangles contains at most *n^2/4* edges, where the maximum is achieved by the complete bipartite graph with half of the vertices on one side and half on the other. Much more is now known about the subject. While I initially encountered Mantel’s theorem as a high school student preparing for math contests, my first in-depth exposure to extremal graph theory came from taking a course by David Conlon during my year reading Part III at Cambridge (one can find excellent course notes on his website).
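Mantel’s bound is easy to verify exhaustively for very small *n* (a brute-force sketch; this is only a sanity check, not a proof):

```python
from itertools import combinations

def max_triangle_free_edges(n):
    """Brute-force the maximum edge count of a triangle-free graph
    on n vertices by trying every graph on that vertex set."""
    pairs = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(pairs)):
        edges = {pairs[i] for i in range(len(pairs)) if mask >> i & 1}
        has_triangle = any(
            (a, b) in edges and (b, c) in edges and (a, c) in edges
            for a, b, c in combinations(range(n), 3)
        )
        if not has_triangle:
            best = max(best, len(edges))
    return best

for n in range(2, 6):
    print(n, max_triangle_free_edges(n), n * n // 4)  # columns 2 and 3 agree
```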

We seem to know less about sparse graphs (a general mantra in combinatorics, it seems). Let us focus on *d-regular* graphs, which are graphs where every vertex has degree *d*.

An *independent set* in a graph is a subset of vertices with no two adjacent. Many combinatorial problems can be reformulated in terms of independent sets by setting up a graph where edges represent forbidden relations.

**Question.** *In the family of d-regular graphs of the same size, which graph has the greatest number of independent sets?*

This question was raised by Andrew Granville at the 1988 Number Theory Conference in Banff, in an effort to resolve a problem in combinatorial number theory, namely the Cameron–Erdős conjecture on the number of sum-free sets. The question first appeared in print in a paper by Noga Alon, who proved an asymptotic upper bound and speculated that, at least when the number of vertices is divisible by 2*d*, the maximum should be attained by a disjoint union of copies of the complete bipartite graph *K_{d,d}*.

Some ten years later, Jeff Kahn arrived at the same conjecture while studying a problem arising from statistical physics. Using a beautiful entropy argument, Kahn proved the conjecture under the additional assumption that the graph is already bipartite.

Fast forward another nearly ten years. In the summer of 2009, during my last summer as an undergraduate, I spent a fun and productive summer attending Joe Gallian’s REU in Duluth, MN (a fantastic program, by the way!), and there I showed that Kahn’s theorem can be extended to all regular graphs, not just bipartite ones.

Here is the theorem statement. We write *I(G)* to denote the set of independent sets in *G*, and *i(G)* to denote the number of independent sets in *G*.

**Theorem (Kahn, Z.).** *If G is an n-vertex d-regular graph, then i(G) ≤ i(K_{d,d})^{n/(2d)} = (2^{d+1} − 1)^{n/(2d)}.*
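The bound *i(G) ≤ (2^{d+1} − 1)^{n/(2d)}* can be sanity-checked by brute force on small cubic graphs (a sketch; *K_4*, *K_{3,3}*, and the cube graph *Q_3* are arbitrary small examples, encoded as edge lists):

```python
from itertools import product

def i_count(n, edges):
    """Number of independent sets, the empty set included."""
    return sum(
        1 for bits in product((0, 1), repeat=n)
        if not any(bits[u] and bits[v] for (u, v) in edges)
    )

K4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
K33 = [(i, 3 + j) for i in range(3) for j in range(3)]
cube = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4),
        (0, 4), (1, 5), (2, 6), (3, 7)]  # Q_3

d = 3
for name, n, edges in [("K4", 4, K4), ("K_{3,3}", 6, K33), ("cube", 8, cube)]:
    bound = (2 ** (d + 1) - 1) ** (n / (2 * d))
    print(name, i_count(n, edges), round(bound, 2))
    assert i_count(n, edges) <= bound + 1e-9
# Equality holds for K_{3,3}: i(K_{3,3}) = 15 = 2^4 - 1.
```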

In the survey, I provide an exposition of the proofs of these two theorems as well as a discussion of subsequent developments. Notably, Davies, Jenssen, Perkins, and Roberts recently gave a brand new proof of the above theorems by introducing a powerful new technique, called the *occupancy method*, inspired by ideas in statistical physics, and it already has a number of surprising new consequences.

On a personal level, I am pleased to see this topic gaining renewed interest. (Also, somewhat to my chagrin, my undergraduate paper, according to Google Scholar, still remains my most cited paper to date.)

I shall close this blog post with one of my favorite open problems in this area.

Let *c_q(G)* denote the number of proper *q*-colorings of *G*, i.e., colorings of the vertices of *G* with *q* colors so that no two adjacent vertices receive the same color.

**Conjecture.** *If G is an n-vertex d-regular graph and q ≥ 3, then c_q(G) ≤ c_q(K_{d,d})^{n/(2d)}.*
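For small cases, the conjectured extremality of *K_{d,d}* can be checked by brute force (a sketch; here *q* = 3 and *d* = 3, comparing *K_{3,3}* against two other cubic graphs chosen arbitrarily):

```python
from itertools import product

def colorings(n, edges, q):
    """Number of proper q-colorings, by trying every coloring."""
    return sum(
        1 for c in product(range(q), repeat=n)
        if all(c[u] != c[v] for (u, v) in edges)
    )

q = 3
K33 = [(i, 3 + j) for i in range(3) for j in range(3)]
K4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
cube = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4),
        (0, 4), (1, 5), (2, 6), (3, 7)]

target = colorings(6, K33, q) ** (1 / 6)  # normalized count for K_{3,3}
for name, n, edges in [("K4", 4, K4), ("cube", 8, cube)]:
    assert colorings(n, edges, q) ** (1 / n) <= target
print("K_{3,3} beats K_4 and the cube graph for q = 3")
```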

I was pleased to learn from Will Perkins, who gave a talk at Oxford last week, that he, along with Davies, Jenssen, and Roberts, recently proved the conjecture for 3-regular graphs. The proof uses the occupancy method that they developed earlier. The method is reminiscent of the flag algebra machinery developed by Razborov some ten years ago for solving extremal problems in dense graphs. The new method can be seen as some kind of “flag algebra for sparse graphs”. I find this development quite exciting, and I expect that more news will come.

**The upper tail problem for triangles.** What is the probability that the number of triangles in an Erdős–Rényi random graph *G(n, p)* is at least twice its mean, or more generally, larger than the mean by a factor of *1 + δ*?

This problem has a fairly rich history, and it is considered a difficult problem in probabilistic combinatorics. In a 2002 paper titled *The infamous upper tail*, Janson and Ruciński surveyed the progress on the problem and described its challenges (they also considered the upper tail problem more generally for any subgraph *H*, but we focus on triangles here). Research towards the solution of this problem has led to new techniques for bounding probabilities of rare events. Note that in contrast, the lower tail problem — finding the probability of getting too *few* triangles — is comparatively easier.

Here we consider the case when *p* tends to zero as *n* grows (the dense case, where *p* is fixed, leads to an interesting variational problem involving graph limits, as shown by a seminal work of Chatterjee and Varadhan — see my previous blog post). It is not too hard to describe what the answer *should* be in the sparse setting. Suppose we select a set of roughly *δ^{1/3}np* vertices and force it to be a clique, that is, we force all edges between these vertices to be present in the random graph; the probability “cost” of this constraint is *p* raised to the number of forced edges, roughly *δ^{2/3}n^2p^2/2*. The clique gives us roughly an additional *δn^3p^3/6* triangles, thereby boosting the total number of triangles in the graph by a factor of *1 + δ* over the mean, which is about *n^3p^3/6*.

Let *t(G)* denote the triangle density in a graph *G*, i.e., the number of triangles in *G* as a fraction of the maximum possible number of triangles. The above argument shows that the upper tail probability satisfies *P(t(G(n,p)) ≥ (1+δ)p^3) ≥ p^{O_δ(n^2p^2)}*.

Is this lower bound tight? That is, can we find a matching upper bound for the upper tail probability? This is precisely the *infamous upper tail* problem mentioned earlier. For a long time there was no satisfactory upper bound anywhere close to the lower bound. A breakthrough by Kim and Vu in 2004 gave an upper bound of the form *exp(−cn^2p^2)*. However, this still leaves a missing factor of *log(1/p)* in the exponent. This problem was finally settled in 2012 by Chatterjee and independently by DeMarco and Kahn. They showed that the exponent in the probability has order *n^2p^2 log(1/p)*, thereby filling in the “missing log.”

Now that we know the correct order of the exponent, what remains unknown is the constant in front of the *n^2p^2 log(1/p)*. In a very recent breakthrough by Chatterjee and Dembo (the preprint was released just this January), they provided a framework that reduces this large deviation problem to a certain variational problem, at least in the range *p ≥ n^{−α}* for some explicit (though small) *α > 0*. The variational problem is the natural one associated to this problem, and the reduction is expected to hold for much sparser random graphs as well.

Eyal and I have been working on this variational problem for quite some time, starting with our previous paper where we studied the problem in the dense setting (constant *p*) and determined the phase diagram for replica symmetry. In our new paper, we solve this variational problem in the sparse regime. Combining the solution of this variational problem with the new results of Chatterjee and Dembo, this determines the correct asymptotic in the exponent of the upper tail probabilities, at least when *n^{−α} ≤ p = o(1)*. For fixed *δ > 0*, we can now prove that the upper tail probability is *exp(−(c(δ) + o(1)) n^2p^2 log(1/p))*,

where we determine the constant in the exponent to be *c(δ) = min(δ^{2/3}/2, δ/3)*.

We described earlier the story behind the *δ^{2/3}/2* term (by forcing a clique of size roughly *δ^{1/3}np*). Where does the other term, *δ/3*, come from? It turns out there is another way to get lots of triangles relatively cheaply, at least when *δ* is small: forcing a set of roughly *δnp^2/3* vertices to be connected to all other vertices. This has probability roughly *p^{δn^2p^2/3}*. The first construction (forcing a clique) competes with the second construction (forcing a complete bipartite graph): the former is preferable when *δ > 27/8* and the latter is preferable when *δ < 27/8*. It turns out, as we show in our paper, that these are essentially the only constructions.
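The back-of-envelope computation behind the two constructions, and the crossover point between them, can be summarized as follows (a sketch consistent with the constants above; the rigorous argument is of course far more delicate):

```latex
\begin{align*}
&\text{mean number of triangles:} && \mathbb{E}[\#\triangle] \approx \binom{n}{3}p^3 \approx \tfrac{1}{6}n^3p^3,\\
&\text{clique on } k=\delta^{1/3}np \text{ vertices:} && \binom{k}{3}\approx \tfrac{\delta}{6}n^3p^3 \text{ extra triangles, at cost } p^{\binom{k}{2}} = e^{-(\frac{1}{2}\delta^{2/3}+o(1))\,n^2p^2\log(1/p)},\\
&\text{hub of } b=\tfrac{\delta}{3}np^2 \text{ vertices:} && b\binom{n}{2}p \approx \tfrac{\delta}{6}n^3p^3 \text{ extra triangles, at cost } p^{bn} = e^{-\frac{\delta}{3}\,n^2p^2\log(1/p)},\\
&\text{crossover:} && \tfrac{1}{2}\delta^{2/3} = \tfrac{\delta}{3} \iff \delta = \tfrac{27}{8}.
\end{align*}
```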

**Relative Szemerédi theorem**

- D. Conlon, J. Fox and Y. Zhao, A relative Szemerédi theorem.
- Y. Zhao, An arithmetic transference proof of a relative Szemerédi theorem.

**Multidimensional Szemerédi theorem in the primes**

- T. Tao and T. Ziegler, A multi-dimensional Szemerédi theorem for the primes via a correspondence principle.
- B. Cook, Á. Magyar and T. Titichetrakun, A multidimensional Szemerédi theorem in the primes.
- J. Fox and Y. Zhao, A short proof of the multidimensional Szemerédi theorem in the primes.

——–

We start the story with Szemerédi’s famous result.

**Szemerédi’s theorem.** Every subset of integers with positive density contains arbitrarily long arithmetic progressions.

Green and Tao proved their famous theorem extending Szemerédi’s theorem to the primes.

**Green-Tao theorem.** The primes contain arbitrarily long arithmetic progressions. In fact, any subset of the primes with positive relative density contains arbitrarily long arithmetic progressions.

An important idea in Green and Tao’s work is the **transference principle**. They transfer Szemerédi’s theorem as a black box to the sparse setting, so that it can be applied to subsets of a sparse pseudorandom set of integers (in this case, some carefully designed enveloping set for the primes).

In my recent paper with David Conlon and Jacob Fox, we gave a new simplified approach to proving the Green-Tao theorem. In particular, we established a new **relative Szemerédi theorem**, which required simpler pseudorandomness hypotheses compared to Green and Tao’s original proof. Roughly speaking, a relative Szemerédi theorem is a result of the following form.

**Relative Szemerédi theorem (roughly speaking).** Let *S* be a pseudorandom subset of integers. Then any subset of *S* with positive relative density contains long arithmetic progressions.

The original proof in our paper followed the hypergraph approach, and in particular used the hypergraph removal lemma, a deep combinatorial result, as a black box. Subsequently, I wrote up a short six-page note showing how to prove the same result by directly transferring Szemerédi’s theorem, without going through hypergraph removal lemma. The former approach is more powerful and more general, while the latter approach is more direct and gives better quantitative bounds.

Next we shift our attention to higher dimensions. Furstenberg and Katznelson proved, using ergodic-theoretic techniques, a multidimensional generalization of Szemerédi’s theorem.

**Multidimensional Szemerédi theorem (Furstenberg and Katznelson).** Any subset of **Z**^*d* with positive density contains arbitrary constellations.

Here a *constellation* means some finite set *R* of lattice points, and the theorem says that the subset of **Z**^*d* contains some translation of a dilation of *R*.

(This result always reminds me of a lovely scene in the movie *A Beautiful Mind* where the John Nash character traces out shapes among the stars.)

Tao proved a beautiful extension for the Gaussian primes.

**Theorem (Tao).** The Gaussian primes contain arbitrary constellations.

In that paper, Tao also made the following interesting conjecture: let **P** denote the primes. Then any subset of **P** × **P** of positive relative density contains arbitrary constellations. This statement can be viewed as a hybrid of the Green-Tao theorem and the multidimensional Szemerédi theorem. Recently this conjecture was resolved by Tao and Ziegler, and independently by Cook, Magyar, and Titichetrakun.

**Multidimensional Szemerédi theorem in the primes.** Let **P** denote the primes. Then any subset of **P**^*d* with positive relative density contains arbitrary constellations.

Terry Tao has written a blog post about this new result where he describes the ideas in the proofs (so I won’t repeat too much). Tao and Ziegler proved their result via ergodic theory. Their proof is about 19 pages long, and they assume as black boxes two powerful results: (1) Furstenberg and Katznelson’s multidimensional Szemerédi theorem (more precisely, its equivalent ergodic-theoretic formulation) and (2) the landmark results of Green, Tao, and Ziegler on linear equations in primes (and related work on the Möbius-nilsequences conjecture and the inverse conjecture on the Gowers norm). Both these results are deep and powerful hammers.

In contrast, Cook, Magyar, and Titichetrakun proceeded differently: they develop a sparse extension of the hypergraph regularity method from scratch, without assuming previous deep results, though their paper is much longer, at 44 pages.

In a recent paper, Jacob Fox and I give a new and very short proof of the same result, assuming the same tools as the Tao-Ziegler proof. Our paper is only 4 pages long, and it uses a very simple sampling argument described in two paragraphs on the first page (all the ideas are on the first page; the rest of the paper just spells out the technical details). I invite you to read the first page of our paper to learn the very short proof of the theorem.

Before describing our work, let me take a detour to reflect on some recent news in number theory.

The past couple of weeks were filled with exciting developments on prime numbers. Yitang Zhang, in a breakthrough that caught the mathematical community by complete surprise, proved that there exist infinitely many pairs of primes with bounded gaps (specifically, gaps less than 70 million). This is a giant leap of progress towards the famous twin primes conjecture, which claims that there exist infinitely many pairs of primes differing by two.

By an amazing coincidence, on the same day that Zhang’s news became public, Harald Helfgott posted on the arXiv his paper claiming to prove the odd Goldbach conjecture: that every odd integer greater than 5 can be written as a sum of three primes. Previously this was known to be true for sufficiently large odd numbers, specifically those with at least about a thousand digits. It is possible to check the conjecture by computer for “small” cases, but 10^{1000} is far from small. Helfgott’s work brought the “sufficiently large” threshold down to a reasonable size, so that all the remaining small cases can indeed be verified by computer search.
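The “small cases” verification is easy to imitate in miniature (a sketch; checking odd numbers only up to 1000 here, whereas the actual verifications went vastly further):

```python
def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, n + 1, p):
                sieve[m] = False
    return [i for i, ok in enumerate(sieve) if ok]

def is_sum_of_three_primes(n, primes, prime_set):
    """Search for primes p, q with n - p - q also prime."""
    for p in primes:
        if p > n:
            break
        for q in primes:
            if p + q > n:
                break
            if n - p - q in prime_set:
                return True
    return False

LIMIT = 1000
ps = primes_up_to(LIMIT)
pset = set(ps)
assert all(is_sum_of_three_primes(n, ps, pset) for n in range(7, LIMIT + 1, 2))
print(f"every odd number from 7 to {LIMIT} is a sum of three primes")
```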

Both results are exciting developments. However, we still seem far from resolving two of the oldest problems about prime numbers. Zhang’s bound of 70 million could probably be brought down by more careful analysis, although getting it all the way down to 2 will probably require some genuinely new ideas. As for Goldbach’s conjecture, there is a significant obstacle to getting down to two primes (the best result in this direction is due to Jingrun Chen), as all existing methods have severe limitations. But then again, the same could have been said about bounded gaps between primes before Zhang’s surprising work. Perhaps one day we’ll be shocked again by a breakthrough on Goldbach, maybe even using techniques right in front of our eyes.

~~~~~~

Back in 2004, the mathematical community received a surprise on another long-standing open problem on prime numbers: Ben Green and Terry Tao proved that the primes contain arbitrarily long arithmetic progressions.

An arithmetic progression is a sequence of equally spaced numbers. For example, 3, 5, 7 is a 3-term arithmetic progression of prime numbers. To find one with four terms, you need to look a bit further: 5, 11, 17, 23. At the time of Green and Tao’s proof, the longest such progression known was a sequence of 22 numbers, starting with 11410337850553 and with step size 4609098694200. This record has since been improved to 26. Here’s the description from Wikipedia:

As of April 2010, the longest known AP-*k* is an AP-26, found on April 12, 2010 by Benoît Perichon on a PlayStation 3 with software by Jaroslaw Wroblewski and Geoff Reynolds, ported to the PlayStation 3 by Bryan Little, in a distributed PrimeGrid project.

There is a webpage with current records on primes in arithmetic progressions.
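The small progressions mentioned above are easy to rediscover by brute force (a sketch; the search bounds are arbitrary, and this approach is hopeless for anything like an AP-22):

```python
def is_prime(n):
    """Trial-division primality test, fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def smallest_prime_ap(k, search_limit=1000):
    """First k-term arithmetic progression of primes, ordered by
    starting term and then by step, found by brute-force search."""
    for a in range(2, search_limit):
        if not is_prime(a):
            continue
        for step in range(1, search_limit):
            if all(is_prime(a + i * step) for i in range(k)):
                return [a + i * step for i in range(k)]
    return None

print(smallest_prime_ap(3))  # [3, 5, 7]
print(smallest_prime_ap(4))  # [5, 11, 17, 23]
```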

It was a long-standing folklore conjecture that the primes should contain arithmetic progressions of every length, with the first recorded instance in history dating back to Lagrange and Waring in 1770. Green and Tao proved this conjecture, and their breakthrough is considered one of the greatest mathematical achievements of the twenty-first century. This work was an important part of Tao’s 2006 Fields Medal award.

~~~~~~

I was still in high school when the Green-Tao theorem was announced. The news spread quickly online, and I soon heard of it through my math competition online community. I remember being quite excited about the news. It was a quintessential mathematical breakthrough. The statement of the problem could be easily understood, yet the solution involved very deep mathematics. I fell in love with the statement of the Green-Tao theorem. It became one of my favorite mathematical results, and I aspired to one day understand the mathematics behind it.

Now, nine years later, as a graduate student, I find it deeply personally fulfilling to be working on a problem central to the Green-Tao theorem. Our latest paper (joint with Conlon and Fox) strengthens the main technical result in the work of Green and Tao.

To describe this result, I need to first tell you about another important mathematical breakthrough by Endre Szemerédi dating back to the early 1970’s. Last year, Szemerédi received the Abel Prize, one of the highest lifetime achievement awards in mathematics. Szemerédi is perhaps most famous for his result, commonly referred to as **Szemerédi’s theorem**, which says that *every subset of integers with positive density contains arbitrarily long arithmetic progressions*. So, if *S* is a sufficiently large subset of the integers, “large” meaning that it contains, say, at least 0.1% of the first *N* positive integers for sufficiently large *N*, then *S* necessarily contains an arithmetic progression of every length.

Szemerédi’s theorem is a very deep result, albeit again with a seemingly very innocent statement. It was conjectured by Erdős and Turán in the 1930’s, and remained open for decades. Since Szemerédi’s breakthrough, several other proofs have been discovered. None of the proofs are easy but all of them have deep insights and have spawned into very rich and active areas of mathematical research.

I was personally exposed to Szemerédi’s theorem while at Cambridge University. I spent a year there right after finishing undergrad and took part in what is colloquially known as Part III of the Maths Tripos. One of the components of the program was writing an extended expository essay, and I wrote mine about Szemerédi’s theorem under the supervision of Ben Green. Ben gave me a very high score for the essay. I’ve put the essay on my website, and several people have told me that they found it to be helpful (including one who was preparing a popular article on Szemerédi after his Abel Prize award). Though, to be honest, looking back, perhaps now I would be a bit embarrassed to read that essay again myself, since I wonder how much I actually understood while I was writing it up at the time.

While Szemerédi’s theorem is a powerful result, it is not enough to draw any conclusions about prime numbers. Szemerédi’s theorem only works for sets of integers with positive density. The primes, on the other hand, have density diminishing to zero. Indeed, the Prime Number Theorem tells us that among the numbers from 1 to *N*, approximately a 1/ln *N* fraction are prime. This ratio, 1/ln *N*, diminishes to zero as *N* grows to infinity. So Szemerédi’s theorem does not apply here.
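The 1/ln *N* density is easy to observe numerically (a sketch; the prime-counting function is computed by a sieve):

```python
from math import log

def prime_count(n):
    """pi(n): the number of primes up to n, via a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(sieve)

N = 10 ** 5
print(prime_count(N), N / log(N))  # the ratio of these two tends to 1
```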

The primary innovation of Green and Tao is that they came up with a relative version of Szemerédi’s theorem. To overcome the problem that the primes have zero density, they find a slightly larger set of integers where the primes can sit inside as a subset of positive relative density. This larger host set, roughly speaking, is the set of “almost primes,” which consists of numbers with few prime divisors. They found a way to transfer Szemerédi’s theorem to this relative setting, showing that if we start with a set with some random-like characteristics (in this case, the almost primes), then any subset of it with positive density relative to the host set must necessarily contain long arithmetic progressions.

The Green-Tao paper splits into two parts. In the first part, which is their main technical contribution, they establish a relative Szemerédi theorem for subsets of pseudorandom sets of integers. Their method was heavily influenced by the work of Fields Medalist Timothy Gowers, who revolutionized the field of additive combinatorics by inventing a far-reaching extension of Fourier analysis (also known in this context as the Hardy-Littlewood circle method) to give a novel proof of Szemerédi’s theorem. The second part of Green and Tao’s paper shows that the almost primes act as a suitable host set by verifying the required pseudorandomness conditions. Most of the number-theoretic input to their work was credited to the works of Goldston and Yıldırım, which later led to a spectacular breakthrough on the problem of small gaps between primes by Goldston, Pintz, and Yıldırım (which relates back to Zhang’s twin primes breakthrough mentioned at the beginning of this blog post).

The pseudorandomness conditions required in the Green-Tao method are rather involved. It is assumed that the host pseudorandom set satisfies a “linear forms condition” as well as a “correlations condition.” Both conditions are essential to Green and Tao’s method of establishing their relative Szemerédi theorem. They show that the almost primes satisfy these pseudorandomness conditions, thereby proving the old conjecture about progressions in primes. However, the complicated pseudorandomness hypotheses, while adequate for this application, seem rather contrived and somewhat unsatisfying. This leads to the next question, which has been repeatedly asked since the Green-Tao work: does a relative Szemerédi theorem hold under more natural pseudorandomness hypotheses?

In our new paper, we show that the answer is yes! We prove a relative Szemerédi theorem under a very simple and natural linear forms condition. What we need to assume is a small subset of what Green and Tao assume in their work. This new result provides not only a brand new (and simpler, as we believe) alternative approach to the proof of the Green-Tao theorem, but also, more broadly speaking, a new method for understanding sparse pseudorandom structures. The Green-Tao method has had a large number of applications since its inception almost a decade ago. With our new perspective, perhaps one can go even further.

What’s the most space-efficient way to arrange a collection of identical balls? This is known as the sphere packing problem. It is a very difficult problem with a long and interesting history. The problem in three dimensions is known as the Kepler conjecture, which says that the face-centered cubic formation does best. This is basically how grocers stack oranges. The seemingly innocent Kepler conjecture turned out to be extremely difficult, and it was only solved in the late 1990’s by Thomas Hales, who gave a very complex proof involving massive computer-aided calculations. I’ve included a few article links at the end of this blog post on the history and background of the sphere packing problem.
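For concreteness, the density of the face-centered cubic packing is a standard computation: spheres of radius *r* touch along a face diagonal of the cubic unit cell of side *a*, and each cell contains 4 spheres (8 corner eighths and 6 face halves), so

```latex
a\sqrt{2} = 4r \quad\Longrightarrow\quad a = 2\sqrt{2}\,r,
\qquad
\delta_{\mathrm{fcc}} = \frac{4\cdot\frac{4}{3}\pi r^3}{a^3}
= \frac{\pi}{3\sqrt{2}} \approx 0.7405.
```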

Now, what about sphere packing in higher dimensions? Unfortunately, we know very little about what happens beyond three dimensions. No proof of optimality is known in any higher dimension, and there are only a few dozen dimensions in which there are even plausible conjectures for the densest packing. In dimensions 8 and 24 there are upper bounds that are extremely close to the conjectured optima, thanks to the works of Cohn, Elkies, and Kumar [1,2,3] (dimensions 8 and 24 are somehow special because of the existence of highly symmetric lattices, namely the *E_8* lattice in dimension 8 and the Leech lattice in dimension 24). However, in most dimensions we must be content with much cruder bounds.

The current state of art for sphere packing density upper bounds is more or less as follows:

- In dimensions 1, 2, and 3, the exact upper bound is known. The result for dimension 3 is due to Hales.
- In low dimensions, namely 4 to 42, Cohn and Elkies improved the previous records of Rogers, although there have been some recent improvements in dimensions 4, 5, 6, 7, and 9 by de Laat, Filho, and Vallentin using techniques of semidefinite programming.
- In all high dimensions, namely 43 and above, the best bounds are due to Kabatiansky and Levenshtein in 1978 and have not been improved since then.

The purpose of our paper is fourfold:

- We give a small improvement over the 1978 bounds of Kabatiansky and Levenshtein in high dimensions by giving a simple modification of their geometric argument relating spherical codes to sphere packings.
- Kabatiansky and Levenshtein derived their bounds by first formulating a linear program for proving upper bounds on spherical codes. Cohn and Elkies found a more direct approach to bounding sphere packing densities, with no need to consider spherical codes. However, despite its excellent performance in low dimensions, the asymptotic behavior of the Cohn-Elkies bound is far from obvious, and it has been unclear whether it improves on, or even matches, the Kabatiansky-Levenshtein bound asymptotically. In our paper, we show that in every dimension, the Cohn-Elkies linear program can always match the Kabatiansky-Levenshtein approach. This further demonstrates the power of the linear programming bound for sphere packing.
- We prove an analogue of the Kabatiansky-Levenshtein bound in hyperbolic space. The resulting bound is exponentially better than the best bound previously known in hyperbolic space.
- We develop the theory of hyperbolic linear programming bounds and prove that they too subsume the Kabatiansky-Levenshtein approach. Packing in hyperbolic space is much more difficult to handle than in Euclidean space, primarily because the volume of an expanding ball grows exponentially with its radius (instead of polynomially, as in the Euclidean case), so we cannot neglect boundary fluctuations. In fact, it is a non-trivial matter to even define the density of a packing (there are examples of packings for which any reasonable definition of density should yield two different answers).

**Further reading**

Some articles on the history and background of the sphere packing problem:

Popular audience:

- Oddballs: it’s easier to pack spheres in some dimensions than in others, by E. Klarreich in *Science News*.
- The 24-dimensional greengrocer, by I. Stewart in *Nature*.
- A fine mess, by D. Mackenzie in *New Scientist*.
- Kepler’s Conjecture and Hales’ Proof, a book review by F. Morgan in the *Notices of the AMS*.

General mathematical audience:

- Cannonballs and Honeycombs, by T. Hales in the *Notices of the AMS*.
- Kissing Numbers, Sphere Packings, and Some Unexpected Proofs, by F. Pfender and G. M. Ziegler in the *Notices of the AMS*.

Geometry has long played a key role in coding theory, starting with the work of Hamming: binary codes can be viewed as packings of Hamming balls in a discrete cube. This framework provides a powerful analogy between discrete and continuous packing problems, which has been extensively developed and remains an active research topic. In our paper, we extend the analogy to a much broader relationship between coding theory and discrete models of physics. Of course, physics is related to coding theory in many ways, ranging from connections between spin glasses and codes to the statistical physics of belief propagation and other applications of graphical models to coding theory. Applications of physics to coding theory typically focus on the limit as the block length tends to infinity. Instead, in this paper we show that certain classical codes are exact ground states of natural physics models.

In addition to extending the analogy with continuous packing problems, our results can be thought of as addressing a philosophical problem. Many classical codes—such as Hamming, Golay, or Reed-Solomon codes—remain very popular, despite the many other good codes that have been found. Why should this be? One obvious answer is that these codes are particularly beautiful and useful, especially given the simplicity of their constructions. Another is that they were discovered early in the development of coding theory and had a chance to cement their place in the canon. We propose a third explanation: a code is most useful if it is robust, in the sense that it optimizes not just one specific measure of quality, but rather a wide range of them simultaneously. We will prove that these classical codes have a rare form of robustness that we call *universal optimality*, based on an analogy with continuous optimization problems.

To see this analogy, recall the following classic problem: given particles on a sphere interacting via some mutually repelling force (the Thomson model for electrons is a good example), in what configurations will the particles arrange themselves? The configuration of greatest interest is the *ground state*, the one possessing the least potential energy.

This then leads to the following fundamental problem in extremal geometry: given some metric space (e.g., a sphere) and an arbitrary potential function based on the distance between pairs of points, how should a given number $n$ of points arrange themselves to minimize the total potential energy of the system?

One might expect that different choices of potential function lead to different ground state configurations, and this is usually the case. However, there are some highly symmetric configurations that are robust, in the sense that they minimize not just a single potential function, but a broad class of potential functions.

What kind of potential functions should we be looking at? For example, the electric potential function in $\mathbb{R}^3$ is $f(r) = 1/r$, and its natural generalization to $\mathbb{R}^n$ is $f(r) = 1/r^{n-2}$. However, we are also interested in more general potentials. Ideally, the potential function should possess certain properties. First, it should be decreasing with distance (since we’re interested in repelling forces; the ground state with an attractive force isn’t so interesting, as all particles would collapse into one point). The potential function should also be convex, since the effect of the force should diminish with distance. We can express these two conditions as $f' \le 0$ and $f'' \ge 0$, respectively. Let us extrapolate from these two conditions and impose similar conditions on higher order derivatives, namely $f''' \le 0$, $f'''' \ge 0$, etc., so that $(-1)^k f^{(k)} \ge 0$ for all $k$. Note that all inverse power laws $f(r) = r^{-s}$ with $s > 0$ satisfy these conditions.
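These derivative conditions are easy to verify numerically for inverse power laws; here is a small sketch (the exponent and evaluation point below are arbitrary choices):

```python
# Check the alternating sign conditions (-1)^k f^(k)(r) >= 0 for the inverse
# power law f(r) = r^(-s), whose k-th derivative has the closed form
# f^(k)(r) = (-1)^k * s*(s+1)*...*(s+k-1) * r^(-s-k).

def kth_derivative(s, r, k):
    coeff = 1.0
    for j in range(k):
        coeff *= s + j
    return (-1) ** k * coeff * r ** (-s - k)

s, r = 2.5, 1.7  # arbitrary exponent s > 0 and radius r > 0
signs_ok = all((-1) ** k * kth_derivative(s, r, k) >= 0 for k in range(1, 8))
```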

The work of Cohn and Kumar studied precisely this class of potential functions, which they called *completely monotonic*, for points on a sphere in $\mathbb{R}^n$ (actually, for technical reasons, they consider functions of the square of the distance between pairs of points). They studied configurations of points on a sphere which minimize every completely monotonic potential function. Examples of such configurations include the regular simplex, the cross polytope, the icosahedron, the 600-cell, the roots of the $E_8$ lattice, and the minimal vectors of the Leech lattice. These beautiful and highly symmetric configurations all have very robust energy minimization properties.

In our current work, we analyze error-correcting codes using this perspective of energy minimization. Binary error-correcting codes can be thought of as sets of particles on a high-dimensional cube, and we ask which error-correcting codes minimize a broad class of potential energies. The class of potential functions is the discrete analogue of the completely monotonic functions described above, with the conditions on derivatives replaced by conditions on successive finite differences.
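As a toy illustration of this energy viewpoint (not taken from the paper: the potential $f(d) = 2^{-d}$ and the comparison code below are my own choices for the sketch), one can compare the [7,4] Hamming code against a naive code of the same size:

```python
from itertools import product

# Energy of a binary code under the pair potential f(d) = 2^(-d), a discrete
# analogue of a completely monotonic potential: its successive finite
# differences alternate in sign.

def energy(code):
    return sum(2.0 ** -sum(a != b for a, b in zip(u, v))
               for u in code for v in code if u != v)

# [7,4] Hamming code, generated by the rows of a standard generator matrix.
G = [(1,0,0,0,1,1,0), (0,1,0,0,1,0,1), (0,0,1,0,0,1,1), (0,0,0,1,1,1,1)]
hamming = [tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))
           for msg in product([0, 1], repeat=4)]

# A naive 16-word code: all 4-bit strings padded with three zeros.
naive = [tuple(b) + (0, 0, 0) for b in product([0, 1], repeat=4)]
```

The Hamming code, whose codewords are better spread out, has strictly smaller energy under this potential.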

As mentioned at the beginning, we show that many classical codes have robust energy minimization properties, which translate into good performance according to a broad range of measures. For example, such codes minimize the probability of an undetected error over the binary symmetric channel, and universal optimality also has interesting consequences for other decoding error probabilities.

Our main technical tool for bounding energy is the linear program developed by Delsarte, which was originally used to bound the size of codes given their minimum distance. We will call a code *LP universally optimal* if its universal optimality follows from these bounds. One of our key results is that LP universal optimality behaves well under duality, thereby allowing us to apply our criteria to many classes of codes.

One result we found particularly surprising is that LP universally optimal codes continue to minimize energy even after we remove a single codeword. We know of no analogue of this property in the continuous setting. It also has structural consequences, namely that every LP universally optimal code is distance regular, i.e., for each distance, every codeword has the same number of codewords at that distance.

**Question.** Fix $0 < p < r < 1$. Let $n$ be a large integer and let $G$ be an instance of the Erdős-Rényi random graph $G(n, p)$, conditioned on the rare event that $G$ has at least as many triangles as the typical $G(n, r)$. Does $G$ look like a typical $G(n, r)$?

Here the Erdős-Rényi random graph $G(n, p)$ is formed by taking $n$ vertices and adding every possible edge independently with probability $p$, and “look like” means close in cut-distance (but we won’t give a precise definition in this blog post). In this case, saying that $G$ is close to $G(n, r)$ is roughly equivalent to saying that every not-too-small subset of vertices (at least a constant fraction in size) of $G$ induces a subgraph with edge density close to $r$.
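As a quick numerical illustration of the two densities in the question (the sizes and probabilities below are arbitrary), one can sample $G(n, p)$ directly and count triangles:

```python
import random

# Sample G(n, p): n vertices, each edge present independently with prob. p.
def sample_gnp(n, p, rng):
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = rng.random() < p
    return adj

def count_triangles(adj):
    n = len(adj)
    return sum(adj[i][j] and adj[j][k] and adj[i][k]
               for i in range(n) for j in range(i + 1, n)
               for k in range(j + 1, n))

rng = random.Random(0)
n = 60
t_low = count_triangles(sample_gnp(n, 0.2, rng))   # typical ~ C(n,3) * 0.2^3
t_high = count_triangles(sample_gnp(n, 0.4, rng))  # typical ~ C(n,3) * 0.4^3
```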

Another way to phrase the question is: what is the *reason* for $G$ having too many triangles? Is it an overwhelming number of edges distributed uniformly, or fewer edges arranged in a special structure, e.g., a clique?

Via a beautiful new framework by Chatterjee and Varadhan for large deviation principles in random graphs, we give a complete answer to the above question.

The answer, as it turns out, depends on $(p, r)$. See the plot below. For $(p, r)$ in the blue region, the answer is yes, and for $(p, r)$ in the red region, the answer is no.

Does $G$ look like a $G(n, r)$?

The phase transition behavior had already been observed previously by Chatterjee and Varadhan, but the determination of the exact phase boundary is new. Borrowing language from statistical physics, the blue region, where the conditioned random graph is close to $G(n, r)$, is called the *replica symmetric* phase, and the red region is called the *symmetry breaking* phase. Note that in the left part of the plot, if we fix a small value of $p$, the model experiences a double phase transition as $r$ increases from $p$ to $1$: starting first in the blue phase, then switching to the red phase, and then switching back to the blue phase.

More generally, our result works for any $d$-regular graph in place of triangles. The boundary curve depends only on $d$, and the curves are plotted below for the first few values of $d$. In particular, this means that large deviations for triangles and for 4-cycles share the same phase boundary. A pretty surprising fact! We also consider the model where we condition on the largest eigenvalue of the graph being too large, and the phase boundary again turns out to be the same as that of triangles.

We also derive similar results for large deviations in the number of linear hypergraphs in a random hypergraph.

The phase boundary for $d$-regular graphs

We also studied the exponential random graph model. This is a widely studied graph model, motivated in part by applications in social networks. The idea is to bias the distribution of the random graph to favor those with, say, more triangles. This model has a similar flavor to the model considered above where we condition on the random graph having lots of triangles.

We consider a random graph $G$ on $n$ vertices drawn from the distribution

$$p(G) \propto \exp\big(n^2 (\beta_1 e(G) + \beta_2 t(G))\big),$$

where $e(G)$ and $t(G)$ are the edge density and the triangle density of $G$, respectively. When $\beta_2 = 0$, this model coincides with the Erdős-Rényi model $G(n, p)$ with some $p$ depending on $\beta_1$. We only consider the case $\beta_2 > 0$, which represents a positive bias in the triangle count.

As shown by Bhamidi, Bresler, and Sly and by Chatterjee and Diaconis, when $n$ is large, a typical random graph drawn from the distribution has a trivial structure: essentially the same one as an Erdős-Rényi random graph with a suitable edge density. This somewhat disappointing conclusion accounts for some of the practical difficulties with statistical parameter estimation for such models. In our work, we propose a natural generalization that enables the exponential model to exhibit a nontrivial structure instead of the previously observed Erdős-Rényi behavior.

Here is our generalization. Consider the exponential random graph model which includes an additional exponent $\alpha > 0$ in the triangle density term:

$$p(G) \propto \exp\big(n^2 (\beta_1 e(G) + \beta_2 t(G)^{\alpha})\big).$$

When $\alpha \ge 1$, the generalized model features the Erdős-Rényi behavior, similar to the previously observed case of $\alpha = 1$. However, for $0 < \alpha < 1$, there exist regions of values of $(\beta_1, \beta_2)$ for which a typical random graph drawn from this distribution has symmetry breaking, which was rather unexpected given earlier results. For example, we know that there is symmetry breaking in the shaded regions in the plots below. (The blue curve indicates a discontinuity in the model which we won’t discuss in this blog post.)

Symmetry breaking in the new exponential graph model

Our proof of the phase boundary result for large deviations in triangle counts is quite short, so I’ll present it here.

The framework developed by Chatterjee and Varadhan reduces the problem of large deviations in random graphs to solving a variational problem in graph limits. *Graph limits*, also called *graphons*, are symmetric measurable functions $W \colon [0,1]^2 \to [0,1]$ which are, in some sense, scaling limits of adjacency matrices of graphs. The values of $W$ represent local edge densities (there are several caveats in this description but we will not go into the details here). For example, the constant function $p$ is the limit of $G(n, p)$ as $n \to \infty$. These objects were developed by Lovász and collaborators to understand the structure of very large graphs.

The relative entropy function defined by

$$I_p(x) = x \log\frac{x}{p} + (1-x)\log\frac{1-x}{1-p}$$

for $x \in [0,1]$ and $p \in (0,1)$ is the rate function associated to the binomial distribution with probability $p$. We can extend $I_p$ to graphons $W$ by defining

$$I_p(W) = \int_0^1\!\!\int_0^1 I_p(W(x,y))\,dx\,dy.$$

We’ll omit the limits of integrals from now on.
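In code, the rate function is a direct transcription (with the convention $0 \log 0 = 0$ handled explicitly):

```python
import math

# I_p(x) = x*log(x/p) + (1-x)*log((1-x)/(1-p)), the binomial rate function,
# extended by continuity to x = 0 and x = 1 via the convention 0*log(0) = 0.

def rate(p, x):
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(x, p) + term(1 - x, 1 - p)
```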

The triangle density in a graphon $W$ is given by the expression

$$t(W) = \iiint W(x,y)\,W(y,z)\,W(x,z)\,dx\,dy\,dz.$$
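A graphon given as a function can have its triangle density estimated by Monte Carlo; for a constant graphon the estimate recovers $p^3$. A small sketch:

```python
import random

# Monte Carlo estimate of t(W) for a graphon W given as a function;
# each sample draws (x, y, z) uniformly from [0,1]^3.

def triangle_density(W, samples, rng):
    total = 0.0
    for _ in range(samples):
        x, y, z = rng.random(), rng.random(), rng.random()
        total += W(x, y) * W(y, z) * W(x, z)
    return total / samples

rng = random.Random(3)
p = 0.4
est = triangle_density(lambda x, y: p, 10_000, rng)  # est is close to p**3
```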

The Chatterjee-Varadhan framework reduces the problem of large deviations of triangle counts (the question stated in the beginning) to the following variational problem.

**Variational problem:** Minimize $I_p(W)$ subject to $t(W) \ge r^3$.

It is known that the random graph drawn from $G(n, p)$ and conditioned on the event of having triangle density at least $r^3$ converges, in the limit as $n \to \infty$, to the set of graph limits which are the minimizers of the variational problem.

In particular, we would like to know whether the constant function $W \equiv r$ minimizes $I_p(W)$. If it does (and if it’s the unique minimizer), then we are in the replica symmetric phase, and the conditioned random graph converges to $G(n, r)$ in cut-distance. On the other hand, if there exists some function $W$ with a strictly smaller $I_p(W)$ satisfying the constraint, then we are in the symmetry breaking phase.

Thus it remains to consider the variational problem. The replica symmetric phase is determined by the following two lemmas.

**Lemma 1** *For any symmetric measurable $W \colon [0,1]^2 \to [0,1]$, we have*

$$t(W) \le \iint W(x,y)^3\,dx\,dy.$$

*Proof:* This is an exercise in the Cauchy-Schwarz inequality.

**Lemma 2** *Let $(p, r)$ be such that the point $(r^3, I_p(r))$ lies on the convex minorant of the function $\ell(y) = I_p(y^{1/3})$. Then any symmetric measurable $W$ with $t(W) \ge r^3$ satisfies $I_p(W) \ge I_p(r)$.*

*Proof:* Lemma 1 implies

$$\iint W^3 \ge t(W) \ge r^3.$$

Define $\ell(y) = I_p(y^{1/3})$ and let $\hat\ell$ be the convex minorant of $\ell$. Since $\hat\ell$ is convex and nondecreasing on $[p^3, 1]$, by Jensen’s inequality we have

$$I_p(W) = \iint \ell(W^3) \ge \iint \hat\ell(W^3) \ge \hat\ell\Big(\iint W^3\Big) \ge \hat\ell(r^3) = \ell(r^3) = I_p(r).$$

It can be shown that the hypothesis on $(p, r)$ in Lemma 2 is equivalent to an explicit inequality relating $p$ and $r$.
It turns out that this region is exactly the replica symmetric phase. It is the blue region in the first plot.
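The minorant condition can be probed numerically. The sketch below (the grid size, tolerance, and helper rate function are my own choices, and the finite grid makes this only a heuristic check) tests whether the point $(r^3, I_p(r))$ lies below every chord of $\ell$:

```python
import math

# Heuristic test of whether (r^3, I_p(r)) lies on the convex minorant of
# ell(y) = I_p(y^(1/3)): the point is on the minorant iff no chord of the
# graph of ell passes strictly below it.

def rate(p, x):
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(x, p) + term(1 - x, 1 - p)

def on_convex_minorant(p, r, grid=100):
    ell = lambda y: rate(p, y ** (1 / 3))
    y0, v0 = r ** 3, rate(p, r)
    ys = [k / grid for k in range(grid + 1)]
    for a in ys:
        for b in ys:
            if a < y0 < b:
                chord = ell(a) + (ell(b) - ell(a)) * (y0 - a) / (b - a)
                if chord < v0 - 1e-9:
                    return False  # some chord dips below the point
    return True
```

This is only a finite-grid approximation; the exact phase boundary is derived analytically in the paper.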

To show that everything else is in the symmetry breaking phase, we need to show that the constant function $W \equiv r$ is not a minimizer of the variational problem. The following lemma captures this fact.

**Lemma 3** *Let $(p, r)$ be such that $(r^3, I_p(r))$ does not lie on the convex minorant of $\ell(y) = I_p(y^{1/3})$. Then there exists a symmetric measurable $W$ with $t(W) \ge r^3$ satisfying $I_p(W) < I_p(r)$.*

*Proof:* (Sketch) The idea is to construct $W$ by slightly perturbing the constant graphon $r$ in a way that roughly preserves the triangle count while decreasing the rate function, with the help of the nonconvexity of $\ell$.

See the figure below for an illustration of this construction. The plot of $\ell$ is drawn not to scale, to highlight its convexity features.

Since $(r^3, I_p(r))$ does not lie on the convex minorant of $\ell$, there necessarily exist $y_1 < r^3 < y_2$ such that the point $(r^3, \ell(r^3))$ lies strictly above the line segment joining $(y_1, \ell(y_1))$ and $(y_2, \ell(y_2))$. One then perturbs the constant graphon $r$ on a small set, using values determined by $y_1$ and $y_2$ in proportions chosen so that the triangle density does not drop below $r^3$. It remains to check that $t(W) \ge r^3$ and $I_p(W) < I_p(r)$ for a sufficiently small perturbation. This is a straightforward calculation whose details we omit here.

This completes the proof of the phase boundary for triangles. The proof for $d$-regular graphs is similar, except that we need to replace Lemma 1 by a generalized Hölder inequality. We refer the reader to our paper for details.

Unfortunately, in the symmetry breaking phase, where the constant function does not solve the variational problem, we do not know the extremal solutions. In fact, we do not know how to solve the variational problem for any single $(p, r)$ in the symmetry breaking phase. This remains a tantalizing open problem.

Consider the following problem. Suppose you’re given a very large graph. The graph has so many vertices that you won’t be able to access all of them, but nevertheless you want to find out certain things about the graph. Such situations come up in real world applications. Perhaps we would like to know something about a social network, e.g., Facebook, but we don’t have the resources to go through every single node, as there are simply too many of them. For the purpose of this blog post, though, we won’t talk about applications and instead stick to the mathematics.

Suppose we are interested in answering the following question about the very large graph:

*Is the graph triangle-free?*

Think of the given graph as a black box. We have the following access to the graph: we are allowed to randomly sample some number of vertices and be told of all the edges between these vertices.

Can we achieve the desired goal? Well, if the graph contains, say, only a single triangle, then it’s pretty much a hopeless task, since we are almost certainly never going to find the single needle in this giant haystack through random sampling. So we have to be content with a more modest objective.

*Can we distinguish a graph that’s triangle-free from a graph that is $\epsilon$-far from triangle-free?*

Being $\epsilon$-far from a property means that we would have to add/delete at least $\epsilon n^2$ edges from the graph to make it satisfy that property. Here $n$ is the number of vertices in the very large graph. Note that this model puts us in the setting of dense graphs, i.e., graphs with $\Theta(n^2)$ edges.

This problem we know how to solve. The algorithm is very straightforward: sample some constant number of vertices, and check whether you see any triangles.

**Algorithm:** Sample $C$ vertices, where $C$ is some constant depending on $\epsilon$.

- If a triangle is detected, then output that the graph is not triangle-free.
- If no triangle is detected, then output that the graph is triangle-free.

If the given graph is triangle-free, then clearly we won’t ever detect any triangles, so the algorithm always outputs the correct answer. But what if the given graph is not triangle-free? We said earlier that in this case we’ll assume the graph is $\epsilon$-far from triangle-free. We want the algorithm to detect at least one triangle so that it can give the correct answer. However, the randomized nature of the algorithm means that there will be some probability that the output is erroneous. We are claiming that this error probability is small.
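The tester can be sketched in a few lines (the sample size and the two toy inputs below are arbitrary choices):

```python
import random
from itertools import combinations

# One-sided tester: sample `s` vertices and accept "triangle-free" iff the
# induced subgraph contains no triangle. Triangle-free graphs are always
# accepted; graphs far from triangle-free are rejected with high probability.

def looks_triangle_free(adj, s, rng):
    verts = rng.sample(range(len(adj)), s)
    return not any(adj[u][v] and adj[v][w] and adj[u][w]
                   for u, v, w in combinations(verts, 3))

n = 20
rng = random.Random(1)

# Complete bipartite graph: triangle-free, so it is always accepted.
bip = [[(i < n // 2) != (j < n // 2) for j in range(n)] for i in range(n)]
accepted_bip = looks_triangle_free(bip, 10, rng)

# Complete graph: any 3 sampled vertices form a triangle, so it is rejected.
comp = [[i != j for j in range(n)] for i in range(n)]
accepted_comp = looks_triangle_free(comp, 10, rng)
```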

This claim seems very innocent. Essentially we need to show that if a graph cannot be made triangle-free by deleting a small number of edges, then it must not contain very many triangles. If you haven’t seen this claim before, you might think that it’s something that would follow from some easy deductions, and you might be tempted to work it out yourself. However, be warned that you will almost certainly not succeed. The claim is indeed correct, but it is far from trivial.

The proof of the correctness of the algorithm essentially boils down to the following result due to Ruzsa and Szemerédi from 1978.

**Theorem 1 (Triangle Removal Lemma)** *For every $\epsilon > 0$ there exists a $\delta > 0$ such that if a graph on $n$ vertices is $\epsilon$-far from triangle-free, then it contains at least $\delta n^3$ triangles.*

The Triangle Removal Lemma implies that the algorithm outputs the correct answer with high probability. Indeed, if there are at least $\delta n^3$ triangles, then by sampling enough triples of vertices (depending on $\delta$), we are likely to encounter at least one triangle.

The result is called the Triangle Removal Lemma because it is usually stated in the contrapositive: if a graph has fewer than $\delta n^3$ triangles, then it can be made triangle-free by deleting fewer than $\epsilon n^2$ edges.

The Triangle Removal Lemma is far from trivial, and you probably won’t be able to prove it with just bare hands. In fact, until some recent work of Jacob Fox, the only way anyone knew how to prove the triangle removal lemma was via Szemerédi’s Regularity Lemma.

So what is Szemerédi’s Regularity Lemma?

Roughly speaking, Szemerédi’s Regularity Lemma says that **the vertex set of every large graph can be decomposed into a bounded number of roughly equally-sized parts so that the graph is random-like between most pairs of parts**. It gives a structural classification of all large dense graphs, and it has proven to be a very powerful tool in combinatorics.

Let me now explain the regularity lemma in more detail. What do we mean by “random-like?” This notion is captured in the following definition, which basically says that the edge density between two fairly large vertex subsets should be approximately the same as the overall edge density.

For (disjoint) vertex sets $X$ and $Y$, let $e(X, Y)$ denote the number of edges with one endpoint in $X$ and the other endpoint in $Y$, and let $d(X, Y) = e(X, Y)/(|X||Y|)$ denote the edge density between $X$ and $Y$.

**Definition 2** *A bipartite graph between $X$ and $Y$ is said to be $\epsilon$-regular if for all $A \subseteq X$ and $B \subseteq Y$ with $|A| \ge \epsilon |X|$ and $|B| \ge \epsilon |Y|$ we have $|d(A, B) - d(X, Y)| \le \epsilon$.*
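This definition can be checked by brute force on tiny examples; here is a sketch (the two toy bipartite graphs are my own choices, and the check is exponential in the part sizes, so it only works for very small graphs):

```python
import math
from itertools import combinations

# Brute-force epsilon-regularity check for a tiny bipartite graph, where
# adj[x][y] == 1 iff left vertex x is adjacent to right vertex y.

def density(adj, X, Y):
    return sum(adj[x][y] for x in X for y in Y) / (len(X) * len(Y))

def is_eps_regular(adj, A, B, eps):
    d = density(adj, A, B)
    def big_subsets(S):
        m = max(1, math.ceil(eps * len(S)))
        for size in range(m, len(S) + 1):
            yield from combinations(S, size)
    return all(abs(density(adj, X, Y) - d) <= eps
               for X in big_subsets(A) for Y in big_subsets(B))

A = B = tuple(range(5))
complete = [[1] * 5 for _ in range(5)]          # all densities 1: regular
star = [[1] * 5] + [[0] * 5 for _ in range(4)]  # very unbalanced: not regular
```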

Now that we know what it means for a bipartite graph to be $\epsilon$-regular, we can say what an $\epsilon$-regular partition means.

**Definition 3** *A partition of the vertices of a graph into $k$ equal parts (the difference in size of any two parts is at most 1) is said to be $\epsilon$-regular if all but at most $\epsilon k^2$ pairs of parts induce $\epsilon$-regular bipartite graphs.*

Now we are ready to state Szemerédi’s Regularity Lemma.

**Szemerédi’s Regularity Lemma** *For every $\epsilon > 0$, there is some $M = M(\epsilon)$ such that every graph has an $\epsilon$-regular partition into at most $M$ parts.*

A key point here is that the number of parts does not depend on the size of the graph, so this is indeed a result that can be applied to very large graphs. The bound on the number of parts depends only on $\epsilon$ (what is this dependence? We’ll come back to this point later). Also note that the edge densities do not have to be the same for all the bipartite graphs.

So why is Szemerédi’s Regularity Lemma a useful result? What can a regular partition do for you?

Intuitively, the regularity lemma partitions the graph into some number of parts so that the graph, in some sense, behaves as if it were a random graph with prescribed densities between the parts. For example, in a random graph with edge density $d$, we would expect the triangle density to be roughly $d^3$. The same turns out to be true for a regular partition.

One benefit of a regular partition is that we have a **counting lemma**, which roughly says that the number of embeddings of small graphs into a regular partition is roughly the same as if the underlying graph were generated randomly. For example, without going into the details, the **triangle counting lemma** says that if we have three parts $X$, $Y$, $Z$, each of size $n$, and the graph is $\epsilon$-regular between each pair, with edge densities $d_{XY}$, $d_{YZ}$, $d_{XZ}$, then the number of triangles $xyz$ with $x \in X$, $y \in Y$, $z \in Z$ is roughly $d_{XY} d_{YZ} d_{XZ} n^3$, the same as if the underlying graph were random.
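The counting lemma's prediction is easy to sanity-check on a random tripartite graph, which is about as regular as a graph gets (the part size and density below are arbitrary):

```python
import random

# In a random tripartite graph with edge density d between each pair of parts
# of size n, the number of triangles (one vertex per part) should be close to
# d^3 * n^3.

def random_bipartite(n, d, rng):
    return [[rng.random() < d for _ in range(n)] for _ in range(n)]

rng = random.Random(7)
n, d = 40, 0.5
xy = random_bipartite(n, d, rng)
yz = random_bipartite(n, d, rng)
xz = random_bipartite(n, d, rng)
count = sum(xy[x][y] and yz[y][z] and xz[x][z]
            for x in range(n) for y in range(n) for z in range(n))
# count should be close to d**3 * n**3 = 8000
```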

With the powerful tool of Szemerédi’s Regularity Lemma at hand, we can now sketch a short proof of the Triangle Removal Lemma.

*Proof sketch of Triangle Removal Lemma.* Suppose $G$ is $\epsilon$-far from triangle-free. Apply Szemerédi’s regularity lemma (with a suitable regularity parameter) and obtain a regular partition of the vertices of $G$ into at most $M$ parts. “Clean up” the graph by deleting all edges that are (1) internal to any part, or (2) between any irregular pair of parts, or (3) between any two parts with edge density less than $\epsilon/3$, say. In each step we delete at most $\epsilon n^2/3$ edges, so we’ve deleted fewer than $\epsilon n^2$ edges. Since $G$ was $\epsilon$-far from triangle-free, even after deleting these edges some triangle must have remained, say between parts $X$, $Y$, $Z$. We have not deleted any edges between the parts $X$, $Y$, $Z$, the bipartite graphs between them are regular, and their densities are bounded from below, so it follows by the triangle counting lemma that there are many triangles (to make this rigorous we need a lower bound on the number of triangles). By choosing the parameters appropriately, we see that $G$ contains at least $\delta n^3$ triangles for some $\delta > 0$ depending only on $\epsilon$.

The example I just gave about testing for triangles in a very large graph is known as property testing. It is a well-studied topic in computer science.

Next I want to show how to deduce Roth’s Theorem from the Triangle Removal Lemma.

**Roth’s Theorem** *Every subset of the integers with positive density contains a 3-term arithmetic progression. *

Roth’s Theorem is a special case of Szemerédi’s Theorem, which states that every subset of the integers with positive density contains arbitrarily long arithmetic progressions. Szemerédi proved his theorem in 1975 using his regularity lemma. Several new proofs have been discovered since then.

To prove Roth’s Theorem using the Triangle Removal Lemma, we need to construct an auxiliary graph so that the 3-term arithmetic progressions are encoded by triangles in the graph.

*Proof sketch of Roth’s Theorem.* Suppose $A$ has (upper) density at least $\delta$, and that $A$ does not contain any 3-term arithmetic progression. There exist arbitrarily large $N$ so that at least a $\delta$ fraction of the elements of $\{1, \dots, N\}$ are in $A$. Let $M = 2N + 1$. Construct a tripartite graph $G$ with vertex sets $X$, $Y$, $Z$, each with $M$ nodes labeled by $\mathbb{Z}/M\mathbb{Z}$. Connect $x \in X$ with $y \in Y$ if and only if $y - x \in A$. Connect $y \in Y$ with $z \in Z$ if and only if $z - y \in A$. Finally, connect $x \in X$ with $z \in Z$ if and only if $(z - x)/2 \in A$, where the division by 2 is modulo $M$ (valid since $M$ is odd).

Then $x$, $y$, $z$ form a triangle if and only if $y - x$, $(z - x)/2$, and $z - y$ all lie in $A$. These three numbers form an arithmetic progression, and since $A$ contains no 3-term arithmetic progressions, they must all be equal. It follows that all triangles in $G$ have the form $(x, x + a, x + 2a)$, where $a \in A$, and these triangles are all edge-disjoint. Since there are at least $\delta N M$ such edge-disjoint triangles, the graph must be $c\delta$-far from triangle-free for some constant $c > 0$ (recall that there are $3M$ vertices in our graph), and hence by the Triangle Removal Lemma there are at least $\delta' M^3$ triangles, for some $\delta' > 0$ depending on $\delta$ only. But there are only at most $MN$ triangles (all of the form $(x, x + a, x + 2a)$), and $MN < \delta' M^3$ for sufficiently large $N$. This is a contradiction.
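The construction is easy to carry out on a small example. Below, the set $A = \{1, 2, 4\}$ (my own toy choice) has no nontrivial 3-term AP modulo $M = 13$, so every triangle is trivial:

```python
# Tripartite graph encoding 3-APs: x ~ y iff y - x in A, y ~ z iff z - y in A,
# x ~ z iff (z - x)/2 in A, all mod M. Triangles correspond to a + b = 2c with
# a, b, c in A. Since A = {1, 2, 4} has no nontrivial 3-AP mod M = 13, every
# triangle is trivial (a = b = c), giving exactly M * |A| edge-disjoint ones.

M, A = 13, {1, 2, 4}
inv2 = pow(2, -1, M)  # inverse of 2 mod M (M odd); requires Python 3.8+
triangles = [(x, y, z)
             for x in range(M) for y in range(M) for z in range(M)
             if (y - x) % M in A and (z - y) % M in A
             and ((z - x) * inv2) % M in A]
```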

Graph regularity is an active research area. Let me conclude by outlining some of the directions and goals in this area.

**Understanding the limitations of the regularity lemma.** In the regularity partition, how does the number of parts depend on the regularity parameter $\epsilon$? As it turns out, the dependence is very bad. The proof of Szemerédi’s Regularity Lemma gives a bound on the number of parts that is a tower of exponentials $2^{2^{\cdot^{\cdot^{2}}}}$ of height polynomial in $1/\epsilon$. This bound is simply astronomical. Could it be possible that another proof of Szemerédi’s Regularity Lemma could give us a more reasonable bound? It turns out that the answer is no. As shown by Gowers and by Conlon and Fox, in general an $\epsilon$-regular partition necessarily requires essentially this number of parts (up to the exponent of $1/\epsilon$ in the height of the tower). So the tower-type bound is needed. As a consequence, any application of the regularity lemma necessarily yields terrible bounds and quantitative dependencies. This is a major drawback of the regularity lemma.
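To get a sense of scale for tower-type bounds, here is the tower function:

```python
# tower(k) is a tower of k twos: tower(0) = 1, tower(k) = 2 ** tower(k - 1).
# The regularity lemma's bound on the number of parts is a tower whose height
# is polynomial in 1/eps.

def tower(k):
    return 1 if k == 0 else 2 ** tower(k - 1)
```

Already `tower(4)` is 65536, and `tower(5)` is a number with nearly twenty thousand digits.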

**Extending the regularity lemma.** Szemerédi’s regularity lemma gives a terrible bound, but in some applications we do not need the full strength of its conclusion. Weaker versions of the regularity lemma have been formulated that give much better bounds on the number of parts (e.g., polynomial or singly exponential in $1/\epsilon$) at the cost of weaker conclusions. Examples of these weaker regularity lemmas include those of Frieze-Kannan and Duke-Lefmann-Rödl. In the other direction, there is a stronger version of the regularity lemma, given by Alon, Fischer, Krivelevich, and Szegedy, where a stronger conclusion is obtained at the cost of even worse bounds. The bound required for the strong regularity lemma is of wowzer type, one level higher than the tower of exponentials in the Ackermann hierarchy. At the extreme end of this direction, if we really don’t care about bounds on the number of parts, there is an analytic version of Szemerédi’s Regularity Lemma by Lovász and Szegedy, which can be stated beautifully as the compactness of a space of graph limits.

**Hypergraph regularity.** As we saw, Roth’s Theorem follows rather quickly from the triangle removal lemma. There is a similarly short deduction of the more general Szemerédi’s Theorem on $k$-term arithmetic progressions if we have removal lemmas for hypergraphs. Hypergraph regularity turns out to be a much more difficult matter, and it was developed independently by Gowers and by Rödl-Schacht. It is not easy to even state the conclusion of the hypergraph regularity lemma.

**Sparse graphs.** The original version of Szemerédi’s Regularity Lemma is only useful for dense graphs, and it is natural to extend it to sparse graphs. An extension of the regularity lemma for sparse graphs was developed independently by Kohayakawa and by Rödl in the 90’s, but it was only recently, in a work by Conlon, Fox, and myself, that a general counting lemma for the sparse extension was developed.

**Finding new applications of regularity.** The regularity lemma has proven to be a very powerful tool in combinatorics as well as other areas. In this post I talked about a couple of the applications, but there are many more, old and new, in a variety of settings, too numerous to list here. We are still very much interested in discovering new and clever applications of graph regularity.

**Avoiding regularity.** How can we get rid of regularity in proofs that use regularity? This might initially strike you as a strange goal, given how much we’ve been promoting the power of regularity. However, as mentioned earlier, one of the main drawbacks of regularity is that it gives terrible bounds. A theorem originally proven with regularity might not actually need regularity for its proof, and an alternate proof might give much better and perhaps more useful bounds. For example, Gowers’ Fourier analytic proof of Szemerédi’s Theorem gave us much better bounds than those from Szemerédi’s original proof. Fox’s new proof of the Triangle Removal Lemma gives a better bound than the one from regularity, showing that Szemerédi’s Regularity Lemma is fundamentally not necessary for the proof. Another example is the recent work by Fox, Loh, and myself on a result of Szemerédi on the classical Ramsey-Turán problem.

**Further reading**

- For more background on the regularity lemma, see the excellent surveys by Komlós and Simonovits and by Rödl and Schacht.
- I formally learned Szemerédi’s Regularity Lemma through David Conlon’s Cambridge Part III course on Extremal Graph Theory. His notes are available here.