In this blog post I will give a brief introduction to Szemerédi’s Regularity Lemma, a powerful tool in graph theory. The post is based on a talk I gave earlier today at a graduate student lunch seminar.
Consider the following problem. Suppose you’re given a very large graph. The graph has so many vertices that you won’t be able to access all of them. But nevertheless you want to find out certain things about the graph. These situations come up in real world applications. Perhaps we would like to know something about a social network, e.g., Facebook, but we don’t have the resource to go through every single node, as there are simply too many of them. For the purpose of this blog post though, we won’t talk about applications and instead stick to the mathematics.
Suppose we are interested answering the following question about the very large graph:
Is the graph triangle-free?
Think of the given graph as a black box. We have the following access to the graph: we are allowed to randomly sample some number of vertices and be told of all the edges between these vertices.
Can we achieve the desired goal? Well, if the graph contains, say, only a single triangle, then it’s pretty much a hopeless task, since we are almost certainly never going to find the single needle in this giant haystack through random sampling. So we have to be content with a more modest objective.
Can we distinguish a graph that’s triangle-free from a graph that is -far from triangle-free?
Being -far from a property means that we would have to add/delete at least edges from the graph to make it satisfy that property. Here is the number of vertices in the very large graph. Note that this model puts us in the setting of dense graphs, i.e., graphs with edges.
This problem we know how to solve. The algorithm is very straightforward: sample some constant number of vertices, and check to see if you see any triangles.
Algorithm: Sample (some constant depending on ) vertices
- If a triangle is detected, then output that the graph is not triangle-free.
- If no triangle is detected, then output that the graph is triangle-free
If the given graph is triangle-free, then clearly we won’t ever detect any triangles, so the algorithm always outputs the correct answer. But what if the given graph is not triangle-free? We said earlier that in this case we’ll assume the graph is -far from triangle free. We want the algorithm to detect at least one triangle so that it can give the correct. However, the randomized nature of the algorithm means that there will be some probability that the output will be erroneous. We are claiming that this error probability is small.
This claim seems very innocent. Essentially we need to show that if a graph cannot be made triangle-free by deleting a small number of edges, then it must not contain very many triangles. If you haven’t seen this claim before, you might think that it’s something that would follow from some easy deductions, and you might be tempted to work it out yourself. However, be warned that you will almost certainly not succeed. The claim is indeed correct, but it is far from trivial.