Fixed-Parameter Tractability of Multicut Parameterized by the Size of the Cutset Dániel Marx ∗ Institut für Informatik Humboldt-Universität zu Berlin, Germany dmarx@cs.bme.hu Igor Razgon † Department of Computer Science University of Leicester, UK ir45@mcs.le.ac.uk ABSTRACT Given an undirected graph G, a collection {(s1,t1),...,(sk,tk)} of pairs of vertices, and an integer p, the EDGE MULTICUT problem ask if there is a set S of at most p edges such that the removal of S disconnects every si from the corresponding ti. VERTEX MULTICUT is the analogous problem where S is a set of at most p vertices. Our main result is that both problems can be solved in time 2O(p3 ) · nO(1), i.e., fixed-parameter tractable parameterized by the size p of the cutset in the solution. By contrast, it is unlikely that an algorithm with running time of the form f(p) · nO(1) exists for the directed version of the problem, as we show it to be W[1]-hard parameterized by the size of the cutset. Categories and Subject Descriptors F.2 [Theory of Computing]: Analysis of Algorithms and Problem Complexity General Terms Algorithms Keywords multicut, fixed-parameter tractability 1. INTRODUCTION From the classical results of Ford and Fulkerson on minimum s −t cuts [16] to the more recent O( √ logn)-approximation algorithms for sparsest cut problems [35, 1, 14], the study of cut and separation problems have a deep and rich theory. One well-studied problem in this area is the EDGE MULTICUT problem: given a ∗Research supported in part by ERC Advanced grant DMMCA, the Alexander von Humboldt Foundation, and the Hungarian National Research Fund (Grant Number OTKA 67651). †Research supported in part by Science Foundation Ireland grant 05/IN/I886 Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. STOC’11, June 6–8, 2011, San Jose, California, USA. Copyright 2011 ACM 978-1-4503-0691-1/11/06 ...$10.00. graph G and pairs of vertices (s1,t1), ..., (sk,tk), remove a minimum set of edges such that every si is disconnected from its corresponding ti for every 1 ≤ i ≤ k. For k = 1, EDGE MULTICUT is the classical s − t cut problem and can be solved in polynomial time. For k = 2, EDGE MULTICUT remains polynomial-time solvable [37], but it becomes NP-hard for every fixed k ≥ 3 [11]. EDGE MULTICUT can be approximated within a factor of O(logk) in polynomial time [17] (even in the weighted case where the goal is to minimize the total weight of the removed edges). However, under the Unique Games Conjecture of Khot [24], no constant factor approximation is possible [7]. One can analogously define the VERTEX MULTICUT problem, where the task is to remove a minimum set of vertices. An easy reduction shows that the vertex version is more general than the edge version. Using brute force, one can decide in time nO(p) if a solution of size at most p exists. Our main result is a more efficient exact algorithm for small values of p (the O∗ notation hides factors that are polynomial in the input size): THEOREM 1.1. Given an instance of VERTEX MULTICUT or EDGE MULTICUT and an integer p, one can find in time O∗(2O(p3 )) a solution of size p, if such a solution exists. That is, we prove that VERTEX MULTICUT and EDGE MULTICUT are fixed-parameter tractable parameterized by the size p of the solution, resolving a very challenging open question in the area of parameterized complexity1. (Recall that a problem is fixed-parameter tractable (FPT) with a particular parameter p if it can be solved in time f(p) · nO(1), where f is an arbitrary function depending only on p; see [13, 15, 31] for more background). The question was first asked explicitly perhaps in [26]; it has been restated more recently as an open problem in e.g., [20, 8]. Our result shows in particular that multicut is polynomial-time solvable if the size of the optimum solution is O( 3 √ logn) (where n is the input size). One reason why multicut is a fundamental problem is that it is able to express several other problems. It has been observed that a correlation clustering problem called FUZZY CLUSTER EDITING can be reduced to (and in fact, equivalent with) EDGE MULTICUT [3, 12, 2]. Our results show that FUZZY CLUSTER EDITING is FPT parameterized by the editing cost, settling this open problem discussed e.g., in [3]. Related results. The fixed-parameter tractability of multicut and related problems has been thoroughly investigated in the literature. EDGE MULTICUT is NP-hard on trees, but it is known to be FPT, parameterized by the maximum number p of edges that can be deleted, and admits a polynomial kernel [5, 21]. Multi- 1Independently of our work and using very different techniques Bousquet et al. [4] (see also their paper in this volume) also proved the fixed-parameter tractability of multicut. 469 cut problems were studied in [20] for certain restricted classes of graphs. For general graphs, VERTEX MULTICUT is FPT if both p and and the number of terminal pairs k are chosen as parameters (i.e, the problem can be solved in time f(p,k) · nO(1) [27, 36, 19] for some function f). The algorithm of Theorem 1.1 is superior to these result in the sense that the running time depends polynomially on the number of terminals, and the exponential dependence is restricted to the parameter p. For the special case of MULTIWAY CUT (where terminals in a set T have to be pairwise separated form each other), algorithms with running time of the form f(p) · nO(1) were already known [27, 8, 19], but apparently these algorithms do not generalize in an easy way to multicut. An FPT 2-approximation algorithm was given in [28] for EDGE MULTICUT: in time O∗(2O(plog p)), one can find a solution of size 2p if a solution of size p exists. There is no obvious FPT algorithm for the problem even on bounded-treewidth graphs, although one can obtain linear-time algorithms if the treewidth remains bounded after adding an edge siti for each terminal pair [18, 32]. A PTAS is known for bounded-degree graphs of bounded treewidth [6]. Our techniques. The first two steps of our algorithm follows [28]. We start by an opening step that is standard in the design of FPT algorithms. Instead of solving the original VERTEX MULTICUT problem, we solve the compression version of the problem, where the input contains a solution W of size p+1, and the task is to find a solution of size p (if exists). A standard argument called iterative compression [34, 23] shows that if the compression problem is FPT, then the original problem is FPT. Alternatively, we can use the polynomial-time approximation algorithm of Gupta [22], which produces a solution W of size p2 if a solution of size p exists. In this case, O(p2) iterations of the compression algorithm gives a solution of size p. Next, as in [28], we try to reduce the compression problem to ALMOST 2SAT (delete k clauses to make a 2-CNF formula satisfiable; also known as 2CNF DELETION), which is known to be FPT [33]. However, our 2SAT formulation is very different from the one in [28]: we introduce a single variable xv only for each vertex of G, while in [28] there is a variable xv,w for every v ∈ V(G) and w ∈ W. This simpler reduction to ALMOST 2SAT is correct only if the instance satisfies two quite special properties: (1) every component of G\W is adjacent to at most two vertices of W (“has at most two legs”), and (2) there is a solution S such that every component of G\S contains a vertex of W (“no vertex is isolated from W after removing the solution”). The main part of the paper is devoted to achieving these properties. In order to achieve property (1), we show by an analysis of cuts and performing appropriate branchings that the set W can be extended in such a way that every component has at most two legs (Section 5). To achieve property (2), we describe a nontrivial way of sampling random subset of vertices such that if we remove this subset by a certain contraction operation (taking the torso of the graph), then without changing the solution, we get rid of the parts not reachable from W with some positive probability (Section 4). This random sampling uses the concept of “important separators,” which was introduced in [27], and has been implicitly used in [9, 33, 8] in the design of parameterized algorithms. We consider the random selection of important separators the main new technical idea of the paper. Subsequently to the current paper, the technique was applied in the very different context of clustering [25]. Directed graphs. Having resolved the fixed-parameter tractability of VERTEX MULTICUT, the next obvious question is what happens on directed graphs. Note that for directed graphs, the edge and vertex versions are equivalent. In directed graphs, multicut becomes much harder to approximate: there is no polynomial-time 2log1−ε n-approximation for any ε > 0, unless NP ⊆ ZPP [10]. From the fixed-parameter tractability point of view, the directed version of the problem received particular attention because DIRECTED FEEDBACK VERTEX SET or DFVS (delete p vertices to make the graph acyclic) can be reduced to DIRECTED MULTICUT. The fixedparameter tractability of DFVS had been a longstanding open question in the area of parameterized complexity until it was solved by Chen et al. [9] recently. The main idea that led to the solution is that DFVS can be reduced to a variant (in fact, special case) of DIRECTED MULTICUT called SKEW MULTICUT, where the task is to break every path from si to tj for every i > j. By showing that SKEW MULTICUT if FPT parameterized by the size of the solution, Chen et al. [9] proved the fixed-parameter tractability of DFVS. We show that, unlike SKEW MULTICUT, the general DIRECTED MULTICUT problem is unlikely to be FPT (see the full version [29]). THEOREM 1.2. DIRECTED MULTICUT is W[1]-hard parameterized by the size p of the solution. Theorem 1.2 leaves open a number of interesting questions. Is DIRECTED MULTICUT FPT for a fixed (say k = 2 or k = 3) number of terminal pairs? Is it perhaps FPT parameterized by both the solution size and the number terminals (i.e., is there an f(p,k) · nO(1) time algorithm)? Is the problem easier on acyclic graphs? Is the special case DIRECTED MULTIWAY CUT easier? The fixedparameter tractability of SKEW MULTICUT suggests that it is not unreasonable to expect a positive answer to at least some of these questions. The study of approximation algorithms for cut problems uncovered deep mathematical connections. It is possible that the study of these problems from these problems from the viewpoint of parameterized complexity and understanding the extremal combinatorics of small cuts can will lead to further surprising con- nections. 2. PRELIMINARIES Let G be an undirected graph and let T = {(s1,t1),...,(sk,tk)} be a set of terminal pairs. We say that a set S ⊆ V(G) of vertices is a multicut of (G,T) if there is no component of G\S that contains both si and ti for some 1 ≤ i ≤ k (note that it is allowed that S contains si or ti). The central problem of the paper is the following: VERTEX MULTICUT Input: A graph G, an integer p, and a set T of pairs of vertices of G Output: A multicut of (G,T) of size at most p or “NO” if no such multicut exists. 2.1 Compression The first step in the proof of Theorem 1.1 is a standard technique in the design of parameterized algorithm: we define and solve the compression problem, where it is assumed that the input contains a feasible solution of size larger than p. As this technique is standard (and in particular, we follow the approach of [28] for EDGE MULTICUT), we keep this section short and informal. MULTICUT COMPRESSION Input: A graph G, an integer p, a set T of pairs of vertices of G, and a multicut W of (G,T) Output: A multicut of (G,T) of size at most p, or “NO” if no such set S exists. Our main technical contribution is showing that MULTICUT COMPRESSION is FPT parameterized by p and |W|. 470 LEMMA 2.1. MULTICUT COMPRESSION can be solved in time O∗(2O((p+log|W|)3 +|W|log|W|)). Intuitively, it is clear that proving Lemma 2.1 could be easier than solving VERTEX MULTICUT: the extra input W can give us useful structural information about the graph (and as |W| appears in a running time, a large W is also helpful). What’s not obvious is how solving MULTICUT COMPRESSION gives us any help in the solution of the original VERTEX MULTICUT problem. We sketch two methods. Method 1. Let us use the polynomial-time approximation algorithm of Gupta [22] to find a multicut W of size at most c · OPT2, where c is a universal constant and OPT is the minimum size of a multicut. If |W| ≥ c · p2, then we can safely answer “NO”, as there is no multicut of size at most p. Otherwise, we run the algorithm of Lemma 2.1 for this set W to obtain a solution in time O∗(2O((p+log|W|)3 ) = O∗(2O(p3 )). Method 2. The standard technique of iterative compression [34, 23] allows us to reduce VERTEX MULTICUT to at most |V(G)| instances of MULTICUT COMPRESSION with |W| = p+1. This technique was used for the 2-approximation of EDGE MULTICUT in [28] and its application is analogous in our case. Let (G,T, p) be an instance of VERTEX MULTICUT. Suppose thatV(G) = {v1,...,vn} and let Gi = G[{v1,...,vi}]. One by one, we consider the instances (Gi,T, p) in ascending order of i, and for each instance we find a solution Si of size at most p. We start with S0 = /0. For some i > 0, we compute Si provided that Si−1 is already known. Observe that Si−1 ∪{vi} is a multicut of size p+1 for (Gi,T). Thus we can use the algorithm for MULTICUT COMPRESSION, which either returns a multicut Si of (Gi,T) having size at most p or returns “NO”. In the first case, we can continue the iteration with i+1. In the second case, there is no multicut of size p for (G,T) (as there is no such multicut even for (Gi,T)), and hence we can return “NO”. Both methods result in O∗(2O(p3 )) time algorithms. However, we feel it important to mention both, as improvements in Lemma 2.1 might have different effects on the two methods. It will be convenient to work with a slightly modified version of the compression problem. We say that a set S ⊆V(G) is a multiway cut of W ⊆ V(G) if every component of G\S contains at most one vertex of W. MULTICUT COMPRESSION∗ Input: A graph G, an integer p, a set T of pairs of vertices of G, and a multicut W of (G,T) Output: A set S of size at most p such that S ∩W = /0, S is multicut of (G,T) and a multiway cut of W or “NO” if no such set S exists. In Sections 3–5, we prove the this problem is FPT: LEMMA 2.2. MULTICUT COMPRESSION∗ can be solved in time O∗(2O((p+log|W|)3 )). It is not difficult to reduce MULTICUT COMPRESSION to MULTICUT COMPRESSION∗ (an analogous reduction was done in [28] for the the edge case). We briefly sketch such a reduction. In order to solve an instance (G,T,W, p) of MULTICUT COMPRESSION, we first guess the intersection X of the multicut W given in the input and the solution S we are looking for. This guess results in at most ∑ p i=1 |W| i branches; in each branch, we remove the vertices of X from G and decrease p by |X|. Thus in the following, we can restrict our attention to solutions disjoint from W. Next, we branch on all possible partitions (W1,...,Wt) of W, contract each Wi into a single vertex, and solve MULTICUT COMPRESSION∗ on the resulting instance (G ,T ,W , p ). One of the partitions (W1,...,Wt) 3 4 1 1 2 2 2 Figure 1: An instance with 7 components. The strong circles are the vertices of W, the numbers show the number of legs for each component. corresponds to the way the solution S partitions W into connected components, and in this case S is a multiway cut of W in G . Thus if the original MULTICUT COMPRESSION instance has a solution S, then it is a solution of one of the constructed MULTICUT COMPRESSION∗ instances. Conversely, any solution of the constructed instances is a solution of the original instance. As the number of partitions of W can be bounded by |W|O(|W|), the running time claimed in Lemma 2.1 follows from Lemma 2.2. Thus proving Lemma 2.2 implies the main result Theorem 1.1. 2.2 Components and legs Given an instance (G,T,W, p) of MULTICUT COMPRESSION∗, we say that a component C of G\W has -legs if C is adjacent with vertices of W. We say that a component is bipedal if it has two legs. In Sections 3–4, we solve MULTICUT COMPRESSION∗ in the special case where every component has only one or two legs (we will call this special case BIPEDAL MULTICUT COMPRESSION∗). LEMMA 2.3. The BIPEDAL MULTICUT COMPRESSION∗ problem can be solved in time O∗(2O((p+log|W|)3 )). Let I = (G,T,W, p) be an instance of the BIPEDAL MULTICUT COMPRESSION∗problem, and let S be a solution for I. The isolated part of the solution is the set of vertices not reachable from any vertex of W in G \ S. We say that the solution S is nonisolating if the isolated part is empty, i.e., G \ S has exactly |W| components. In Section 3, we show that if the BIPEDAL MULTICUT COMPRESSION∗ instance has a nonisolating solution, then it can be found by a quite intuitive reduction to an FPT prolem ALMOST 2SAT. Next in Section 4, we present a randomized algorithm that modifies the instance such that if a solution exists, then it makes the solution nonisolating with positive probability. The algorithm is based on a randomized contraction of sets defined by “important separators”; we review this concept in Section 2.3. We complete the proof of Lemma 2.3 by derandomizing this algorithm. Finally, in Section 5, we show how the general problem can be reduced to the bipedal case. The reduction is achieved by choosing an appropriate set B in a component with more than two legs and guessing, for each vertex v ∈ B, which vertex of W is reachable from v after removing the solution . Based on these guesses, we can identify each vertex of B with a vertex of W. We prove that if the set B is chosen appropriately, then after a bounded number of branchings, every component has one or two legs. 2.3 Important separators The concept of important separators were introduced in [27] to deal with the multiway cut problem. If X is a set of vertices in 471 graph G, then we denote by NG(X) the neighborhood of X in G and define γG(X) := |NG(X)|. We drop the subscript G if it is clear from the context. Let G be an undirected graph and let X,Y ⊆V(G) be two disjoint sets. A set S ⊆V(G) of vertices is an X −Y separator if S is disjoint from X ∪Y and there is no component2 K of G\S with K ∩X = /0 and K ∩Y = /0. To improve readability, we write s −Y separator instead of {s}−Y separator if s is a single vertex. DEFINITION 2.4. Let X,Y ⊂ V(G) be disjoint sets of vertices, S ⊆ V(G) be an X −Y separator, and let K be the union of every component of G \ S intersecting X. We say that S is an important X −Y separator if it is inclusionwise minimal and there is no X −Y separator S with |S | ≤ |S| such that K ⊃ K, where K is the union of every component of G\S intersecting X. Note that the order of X and Y matters: an important X −Y separator is not necessarily an important Y −X separator. We state without proof some properties of Definition 2.4 that are easy to see: PROPOSITION 2.5. Let G be a graph, X,Y ⊆ V(G) be two disjoint sets of vertices, and S be an important X −Y separator. 1. For every v ∈ S, the set S\{v} is an important X −Y separator in G\v. 2. If S is an X −Y separator for some X ⊃ X, then S is an important X −Y separator. 3. If G[X] is connected, then S is an important X −Y separator for every /0 = X ⊂ X. 4. If S is an X −Y separator in G for some supergraph G of G, then S is an important X −Y separator in G . The number of important separators were bound in [27] (although the notation there is slightly different). A better bound is implicit in [8]. LEMMA 2.6. Let X,Y ⊆ V(G) be disjoint sets of vertices in graph G. For every p ≥ 0, there are at most 4p important X −Y separators of size at most p. Furthermore, we can enumerate all these separators in time O∗(4p). 3. FINDING A NONISOLATING SOLUTION BY REDUCTION TO ALMOST 2SAT The goal of this section is to show that we can solve BIPEDAL MULTICUT COMPRESSION∗ if there is at least one nonisolating solution. In Section 4, we show that it is sufficient to solve the problem under this assumption. Let x1, ..., xn be a set of variables; a literal is either a variable xi or its negation ¯xi. Recall that a 2CNF formula is a conjunction of clauses with at most two literals in each clause, e.g., (¯x1 ∨ x2) ∧ (¯x3) ∧ (x1 ∨ ¯x4). It is well-known that a satisfying assignment for a 2CNF formula can be found in linear time (if exists). However, it is NP-hard to find an assignment that maximizes the number of satisfied clauses, or equivalently, to find a minimum set of clauses whose removal makes the formula satisfiable. Razgon and O’Sullivan [33] gave an O∗(15k) time algorithm for the problem of deciding if a 2CNF formula can be made satisfiable by the deletion of at most k clauses; they call this problem ALMOST 2Throughout this paper, when we refer to a component K of a graph, we consider the set of vertices of this component. We omit saying “the set of vertices of” for the sake of brevity. 2SAT. We need a variant of the result here, where instead of deleting at most k clauses, we are allowed to delete at most k variables. An easy reduction gives an algorithm for this variant. If φ is a 2CNF formula and X is a set of variables, then we denote by φ \X the formula obtained by removing every clause containing a literal of a variable in X. THEOREM 3.1. Given a 2CNF formula φ and an integer k, in time O∗(15k) we can either find a set X of at most k variables such that φ \X is satisfiable, or correctly state that no such set X exists. It is not difficult to reduce finding a nonisolating solution to the problem solved by Theorem 3.1. For each vertex v of G \W, we introduce a variable whose value expresses which leg of the component containing v is reachable from v. This formulation cannot express that a vertex is separated from both legs. However, as we assume that there is a nonisolating solution, we do not have to worry about such vertices. LEMMA 3.2. Let I = (G,T,W, p) be an instance of BIPEDAL MULTICUT COMPRESSION∗ that has a nonisolating solution of size at most p. In time O∗(15p), we can find a (not necessarily nonisolating) solution. PROOF. We encode the BIPEDAL MULTICUT COMPRESSION∗ instance I = (G,T,W, p) as a 2CNF formula φ the following way. For each component C of G \W having two legs, let 0(C) and 1(C) be the two legs. If component C has only one leg, then let 0(C) be this leg, and let 1(C) be undefined. For every vertex v ∈ C, let 0(v) = 0(C) and 1(v) = 1(C). We construct a formula φ whose variables correspond to V(G)\W. The intended meaning of the variables is that v has value b ∈ {0,1} if v is in the same component as b(v) after removing the solution. To enforce this interpretation, φ contains the following clauses: • Group 1: (u → v), (v → u) for every adjacent u,v ∈V(G)\W. • Group 2: If u is a neighbor of b(u) for some b ∈ {0,1}, then there is a clause (u = b). • Group 3: If (u,v) ∈ T, u,v ∈W, and bu (u) = bv (v) for some bu,bv ∈ {0,1}, then there is a clause (u = bu ∨v = bv) (e.g., if 0(u) = 1(v), then the clause is (u∨ ¯v)). • Group 4: If (u,v) ∈ T, u ∈ W, v ∈ W, and b(v) = u for some b ∈ {0,1}, then there is a clause (v = b). This completes the description of φ. Note that no clause is introduced for pairs (u,v) ∈ T with u,v ∈W, but these pairs are automatically separated by a solution that is a multiway cut of W. Furthermore, we can assume that W induces a independent set, otherwise there is no solution. We show first that if I has a nonisolating solution S, then removing the corresponding variables of φ makes it satisfiable. As S is nonisolating and it is a multiway cut of W, every vertex of G \ S is in the same component as exactly one of 0(v) and 1(v); let the value of variable v be b if vertex v is in the same component as b(v). It is clear that this assignment satisfies the clauses in the first two groups. Consider a clause (u = bu ∨v = bv) from the third group. This means that (u,v) ∈ T and bu (u) = bv (v) = w ∈ W. If this clause is not satisfied, then u = bu and v = bv. By the way the assignment was defined, this is only possible if u is in the same component of G\S as bu (u) = w and v is in the same component of G\S as bv (v) = w. Therefore, u and v are in the same component of G\S, contradicting the assumption that S is a solution of I. Clauses in Group 4 can be checked similarly. 472 We have shown that φ can be made satisfiable by the deletion of p variables. By Theorem 3.1, we can find such a set S of variables time O∗(15p). To complete the proof, we show that such a set S corresponds to a (not necessarily nonisolating) solution of I. Let us show first that S is a multiway cut of W. Suppose that there is a path P connecting w0,w1 ∈ W in G \ S . We can assume that the internal vertices of P are disjoint from W, i.e., they are in one component C of G\W with two legs. Thus there is a path P from a neighbor v0 of w0 to a neighbor v1 of w1 in C\S . Suppose without loss of generality that 0(C) = w0 and 1(C) = w1. As the clauses in Group 1 are satisfied, every variable of P has the same value. However, because of the clauses in Group 2, we have xv0 = 0 and xv1 = 1, a contradiction. Therefore, we can assume that S is a multiway cut of W. Suppose now that there is some (u,v) ∈ T such that u,v ∈ W are in the same component of G\S ; let P be a u−v path in G\S . AsW is a multicut of T, it is clear that P goes through at least one vertex of W. We have seen that S is a multiway cut of W, thus P goes through exactly one vertex of W. Let P = P1wP2 for some path P1 that is fully contained in the component of G\W containing u and path P2 fully contained in the component containing v. Let bu,bv ∈ {0,1} be such that bu (u) = bv (v) = w. Group 1 ensures that every variable of P1 has the same value and Group 2 ensures that the last variable of P1 has value bu, thus u = bu. A similar argument shows that v = bv. However, this means that clause (u = bu ∨ v = vu) of Group 3 is not satisfied, a contradiction. Finally, a similar argument shows that the clauses in Group 4 ensure that pairs (u,v) ∈ T with u ∈ W, v ∈ W are separated. 4. MAKING THE SOLUTION NONISOLAT- ING In this section, we present a randomized transformation that, given an instances of BIPEDAL MULTICUT COMPRESSION∗ having a solution, it modifies the instance in such a way that the new instance has a nonisolating solution with probability 2−O(p3 ). The main result of this section is a derandomized version of this trans- formation: LEMMA 4.1. Given an instance I of the BIPEDAL MULTICUT COMPRESSION∗ problem, we can construct in time O∗(2O(p3 )) a set of t = 2O(p3 ) logn instances I1, ..., It such that 1. If I has no solution, then Ii has no solution for any 1 ≤ i ≤ t. 2. If I has a solution, then Ii has a nonisolating solution for at least one 1 ≤ i ≤ t. Thus we can solve BIPEDAL MULTICUT COMPRESSION∗ by constructing the instances I1, ..., It of Lemma 4.1 and applying the algorithm of Lemma 3.2 to each instance. This will prove Lemma 2.3. 4.1 Torsos and nonisolating solutions The randomized transformation can be conveniently described using the operation of taking the torso of a graph. DEFINITION 4.2. Let G be a graph and C ⊆ V(G). The graph torso(G,C) has vertex set C and two vertices a,b ∈ C are adjacent if {a,b} ∈ E(G) or there is a path P in G connecting a and b whose internal vertices are not in C. It is easy to show that this operation preserves separation inside C: PROPOSITION 4.3. Let C ⊆ V(G) be a set of vertices in G and let a,b ∈ C two vertices. A set S ⊆ C separates vertices a and b in torso(G,C) if and only if S separates these vertices in G. PROOF. Let P be a path connecting a and b in G and suppose that P is disjoint from the set S. The path P contains vertices from C and from V(G) \C. If u,v ∈ C are two vertices such that every vertex of P between u and v is from V(G) \C, then by definition there is an edge uv in torso(G,C). Using these edges, we can modify P to obtain a path P that connects a and b in torso(G,C) and avoids S. Conversely, suppose that P is a path connecting a and b in the graph torso(G,C) and it avoids S ⊆ C. If P uses an edge uv that is not present in G, then this means that there is a path connecting u and v whose internal vertices are not in C. Using these paths, we can modify P to obtain a path P that uses only the edges of G. Since S ⊆C, the new vertices on the path are not in S, i.e., P avoids S as well. Let I = (G,W,T, p) be an arbitrary instance of BIPEDAL MULTICUT COMPRESSION∗. Given a set Z ⊆ V(G)\W of vertices, the reduced instance I/Z = (G ,W,T , p) is defined the following way: 1. The graph G is torso(G,V(G)\Z). 2. For every v ∈ V(G), let φ(v) = NG(C) if v belongs to component C of G[Z], and let φ(v) = {v} if v ∈ Z. The set T is obtained by by replacing every pair (x,y) ∈ T with the set of pairs {(x ,y ) | x ∈ φ(x),y ∈ φ(y)}. The main observation is that if we perform this torso operation for a Z that is sufficiently large to cover the isolated part of a hypothetical solution S and sufficiently small to be disjoint from S, then S becomes an nonisolating solution of I/Z. Furthermore, the torso operation is “safe” in the sense that it does not create new solutions. LEMMA 4.4. Let I = (G,T,W, p) be an instance of BIPEDAL MULTICUT COMPRESSION∗ and let Z ⊆ V(G)\W be a set of vertices. If I has no solution, then I/Z has no solution either. Furthermore, if I has a solution S such that Z covers the isolated part and Z ∩S = /0, then S is a nonisolating solution of I/Z. PROOF. Let G and G be the graphs in instances I and I/Z, respectively. To prove the first statement, we show that if S ⊆ V(G ) is a solution of I/Z, then S is a solution of I as well. Suppose that some pair (x,y) of I is not separated by S . Let P be a path in G\S going from x to y. Let x and y be the first and last vertex of P not in Z, respectively, and let P be the subpath of P from x to y . (Note that P cannot be fully contained in Z, as it contains at least one vertex of W.) By the way I/Z is defined, (x ,y ) is a pair in I/Z, hence S separates x and y in G = torso(G,C). Using Prop. 4.3 with C = V(G) \ Z, we get that S separates x and y in G, which is in contradiction with the existence of the path P. A similar argument shows that there is no path in G\S that connects two vertices of W. For the second statement, suppose that S is a solution of I with S∩Z = /0. Let us show that S is a solution of I/Z as well. Suppose that S does not separate x and y in G for some pair (x ,y ) of I/Z. Using Prop. 4.3 with C =V(G)\Z, we get that S does not separate x and y in G, i.e., there is a x − y path P in G \ S. By the way the pairs in I/Z were defined, there is a pair (x,y) of I and there is an x − x path P1 such that x is the only vertex of P1 not in Z, and there is a y − y path P2 such that y is the only vertex of P2 not in Z. Clearly, these paths are disjoint form S. Therefore, the concatenation of P1, P, P2 is an x − y path in G \ S, contradicting that S is a solution of I. To see that S is nonisolating in G , consider a vertex v of G \S. As v ∈ Z is not in the isolated part of the solution S of I, there is a path P in G\S going from v to a vertex w ∈ W. Again by Prop. 4.3, this means that there is a v−w path in G \S as well, which means that v is not in the isolated part of the solution S of I . 473 XS G\X C9 C2 C3 C4 C5 C6 C7 C8 C1 Figure 2: A solution S where the isolated part X consists of 9 important components (the components of G\X and the set W are not shown in the figure). The isolated part is the disjoint union of 5 important clusters: C1 ∪C2, C3 ∪C4, C5, C6 ∪C7 ∪C8, and C9. 4.2 Important components and clusters In light of Lemma 4.4, what we need to do is to guess a set Z that covers the isolated part of a solution. Lemma 4.7 below allows us to restrict our attention to very special sets Z whose boundary is formed from important separators. DEFINITION 4.5. A set C ⊆ V(G) is an important component if G[C] is connected and NG(C) is an important C −W separator of size at most p. For every S ⊆ V(G)\W, the important cluster LS is the (disjoint) union of every important component with N(C) = S. Observe that every important component is contained in exactly one important cluster, i.e, the important clusters form a partition of the important components. Every important C −W separator is an important v −W separator for every v ∈ C (Prop. 2.5(3)). Thus Lemma 2.6 gives a bound on the number of important components that can contain a vertex v. PROPOSITION 4.6. Every vertex v ∈ V(G) \W is contained in at most 4p important components and important clusters. Furthermore, all the important components and clusters can be enumerated in time O∗(4p). Note that the total number of important components and clusters cannot be bounded by a function of p: for example, it may happen that almost every vertex of V(G) \W forms an important component of size 1. In the following lemma, we show that there is a solution where every component of the isolated part is an important component. We cannot bound the number of these important components by a function of p, but they can be partitioned into at most 2p isolated clusters (see Figure 2). This is the reason why we are selecting important clusters instead of important components in Section 4.3. LEMMA 4.7. If instance I = (G,W,T, p) of BIPEDAL MULTICUT COMPRESSION∗ has a solution, then it has an inclusionwise minimal solution where the isolated part is the disjoint union of at most 2p important clusters. PROOF. Let S be a solution of I with isolated part R. Recall that γ(R) is defined as |N(R)|. Let us choose S such that γ(R) is minimum possible, and among such solutions, |R| (or equivalently, |R|+γ(R)) is maximum possible. LetC1, ..., Cs be the components of G[R]. We show that every Ci is an important component. If this is true, then these components can be classified into at most 2|S| ≤ 2p disjoint groups according to the neighborhood N(Ci) ⊆ S. Therefore, each group can be covered by an important cluster. Observe that if C and C are two important components with N(C) = N(C ) and C is in the isolated part, then C has to be in the isolated part as well (in particular, the minimality of S implies that C cannot contain a vertex of S). Therefore, the union of the at most 2p important clusters covering the groups is exactly the isolated part, and we are done. Suppose that some Ci is not important: in this case, there is an important component Ci ⊃ Ci such that γ(Ci) ≤ γ(Ci). Let S := (S \ N(Ci)) ∪ N(Ci); it is clear that |S | ≤ |S|. We claim that S is also a solution of instance I. For this purpose, we first show that every path P connecting a vertex v ∈ R ∪ S with a vertex of W has to go through S . Indeed, path P has to go through a vertex of S by the definition of R. Thus P can be disjoint from S and go through S only if it contains a vertex from N(Ci) \ N(Ci) ⊆ Ci. However, N(Ci) ⊆ S separates every vertex of Ci from W, contradicting the assumption that P is disjoint from S . To show that S is a solution of I , suppose that some x,y ∈ W or some pair (x,y) is not separated by S ; let P be a x − y path in G \ S . Path P has to go through S and a vertex of W, thus by the previous claim, P goes through S . This means that S is a solution of I with |S | ≤ |S| and therefore by the minimality of the solution S, we have |S | = |S|. Note that N(Ci) = N(Ci), hence |S | = |S| is only possible if S = S . Let R be the isolated part in solution S . Again by the previous claim, every vertex of R ∪ S is either in S or separated from W in G\S , thus R∪S ⊆ R ∪S . Suppose first that R∪S = R ∪S . The set S contains exactly those vertices of R ∪ S that have a neighbor outside R ∪ S (by the minimalty of S). The set S has to contain every such vertex (to separate R and W), thus S ⊆ S . Howerver, we have seen that |S| = |S | and S = S , a contradiction. Therefore, R ∪ S ⊂ R ∪ S , and |S | = |S| implies |R | > |R|. This contradicts the maximality of S with respect to the size of the isolated part. Important separators that induce cliques are nested, hence we can get a bound of p instead of 4p for the number of such separators. Lemma 4.10 uses this result to improve the probabilities in the randomized selection. LEMMA 4.8. Every vertex v ∈V(G)\W is contained in at most p important clusters X where N(X) is a clique. PROOF. Assume the opposite. We first show that if X1 and X2 are important components containing v such that N(X1) and N(X2) are cliques, then either X1 ⊆ X2 or X2 ⊆ X1. If X1 \ X2 = /0 and X1 is connected, then there is a vertex x1 ∈ X1 ∩ N(X2). As N(X2) is a clique, every vertex of N(X2) is adjacent with x1, implying that N(X2) ⊆ X1 ∪ N(X1). If X2 \ X1 = /0, then a symmetrical argument shows that N(X1) ⊆ X2 ∪ N(X2). We claim that N(X1 ∪X2) ⊆ N(X1)∩N(X2) and hence γ(X1 ∪X2) ≤ γ(X1),γ(X2); as X1 ∪ X2 ⊃ X1,X2, this would contradict the assumption that X1 and X2 are important components. Consider a vertex u ∈ N(X1 ∪ X2), which must have a neighbor w ∈ X1 ∪X2. If w ∈ X1 ∩X2, then u ∈ N(X1)∩N(X2) and we are done. Suppose without loss of generality that w ∈ X1 \X2. Then u ∈ N(X1) ⊆ X2 ∪N(X2), but u ∈ X2 by definition, hence u has to be in N(X2) as well. We have shown that the important components containing v whose boundaries are cliques form a chain. This means that there are at most p of them, as the boundary sizes must be different. Using that every important component is contained in exactly one important cluster, we get the bound on the number of important clusters. 474 4.3 Randomized selection of sets By Lemmas 4.4 and 4.7, we need to construct a set Z that is the union of at most 2p important clusters. As the number of important clusters cannot be bounded by a function of p, we cannot try all possibilities. Instead of complete enumeration, we randomly select important clusters (possibly much more than 2p) and let Z be their union. What we need is that Z is sufficiently large to cover the at most 2p important clusters of the isolated part given by Lemma 4.7, but Z is sufficiently small to be disjoint from S. We want to set the probabilities such that the probability of this event can be bounded from below by a function of p. First we present a simpler version of the proof, where the probability of success is double exponentially small in p (Lemma 4.9). This simpler proof highlights the main idea of the randomized reduction. In Lemma 4.10, we improve the probability to 2−O(p3 ). LEMMA 4.9. There is a randomized algorithm that, given an instance I = (G,W,T, p) of BIPEDAL MULTICUT COMPRESSION∗, in time O∗(4p) constructs a set Z ⊆ V(G) \W such that if I has a solution, then I/Z has a nonisolating solution with probability at least 2−2O(p) . PROOF. By Lemma 4.7, there is a solution such that the isolated part is the disjoint union of at most 2p important clusters. Let us fix such a hypothetical solution S, and suppose that the isolated part is the disjoint union of important clusters L1, ..., Ls. Let X be the set of all important clusters in G. Let X be a subset of X where every X ∈ X appears with probability 4−p independently at random. Let Z be the union of the sets in X . If the events (E1) Z ∩S = /0, and (E2) Lj ⊆ Z for every 1 ≤ j ≤ s hold, then by Lemma 4.4, I/Z has a nonisolating solution, and we are done. Let us estimate the probability that both (E1) and (E2) hold. Let A = {L1,...,Ls} and let B = {X ∈ X | X ∩S = /0}; we have |A| ≤ 2p and |B| ≤ p·4p (by Lemma 4.8, every vertex of S is contained in at most 4p important clusters). If no member of B is selected, then no set of X contains a vertex of S, and hence Z ∩S = /0. If every member of A is selected, then every Lj is a subset of Z. Therefore, the probability that (E1) and (E2) hold can be bounded from below by the probability of the event that every member of A is selected and no member of B is selected. As A and B are disjoint, this probability is at least (4−p )2p ·(1−4−p )p·4p ≥ 4−p·2p ·e−2p = 2−2O(p) (in the inequality, we use that 1+x ≥ exp(x/(1+x)) for every x > −1 and 1−4−p ≥ 1/2). In order to optimize the success probability, we do the randomized selection of important components in two phases: first we select some important components and add new edges to the graph and in the second phase we restrict our attention to important clusters whose boundaries are cliques. LEMMA 4.10. There is a randomized algorithm that, given an instance I = (G,W,T, p) of BIPEDAL MULTICUT COMPRESSION∗, in time O∗(4p) constructs a set Z ⊆ V(G) \W such that if I has a solution, then I/Z has a nonisolating solution with probability at least 2−O(p3 ). PROOF. The randomized algorithm consists of two phases. By Lemma 4.7, there is a solution S such that the isolated part is the disjoint union of at most 2p important clusters; let us fix such a hypothetical solution S and let R be the isolated part. Let us consider those pairs x,y ∈ S for which there is an x − y path with internal vertices in the isolated part R. For every such pair x,y ∈ S, select a component C of G[R] with x,y ∈ NG(C); let C1, ..., Cq with q ≤ |S| 2 ≤ p 2 be the selected components. Note that every Cj is an important component. Phase 1. Let C be the set of all important components; Prop. 4.6 states that these sets can be enumerated. In the first phase, we select a subset C ⊆ C by putting every C ∈ C into C with probability p1 = 4−p independently at random. Then for every component C ∈ C , we make NG(C) a clique; let G be the graph obtained this way. Let us estimate the probability that the events (E1) every C ∈ C is disjoint from S, (E2) G[R] and G [R] have the same connected components (as vertex sets), and (E3) G [NG (Ci)] is a clique for every 1 ≤ i ≤ q hold. Let A1 = {C1,...,Cq} and let B1 = {C ∈ C | S ∩C = /0}. Clearly, |A1| ≤ p2 and |B1| ≤ |S| · 4p ≤ p · 4p (by Prop. 4.6, every vertex of S can be contained in at most 4p members of C). If no set of B1 is selected, then (E1) holds. If (E1) holds, then we have NG(K) = NG (K) for every component K of G[R]. Indeed, the boundary of K can change only if some x ∈ K and y ∈ K ∪ NG[K] both appear in NG(C) for some C ∈ C , but such a connected C would have to contain a vertex of NG(K) ⊆ S as well, contradicting (E1). Therefore, if (E1) holds, then (E2) holds as well and in particular NG (Ci) = NG(Ci) for every i. Therefore, assuming that (E1) holds and every member of A1 is in C , NG(Ci) = NG (Ci) becomes a clique in G for every 1 ≤ i ≤ s. Thus the probability that (E1–E3) hold can be bounded from below by the probability of the event that every set in A1 is selected and no set from B1 is selected. As the sets A1 and B1 are disjoint, this probability is at least (1−4−p)p·4p ·(4−p)p2 ≥ e−2p ·4−p3 . In the following, we assume that (E1–E3) hold. By assumption, the isolated part of S is the disjoint union of important clusters L1, ..., Ls of G with s ≤ 2p. We claim that every Lj is an important cluster of G . Consider a component K of G[Lj]. By (E2), K is a component of G [R]. Furthermore, K is an important component of G : by Prop. 2.5(4), NG(K) = NG (K) remains an important K − W separator in G . Thus every component of Lj is an important component of G , which means that G has an important cluster Lj ⊇ Lj. This cluster Lj is fully contained in the isolated part R: NG(Lj) ⊆ S and the minimality of S implies that Lj cannot contain a vertex of S. By (E2), G[R] and G [R ] have the same components, thus Lj ⊆ R contains the same components as Lj. Phase 2. Let X be the set of all important clusters X in G for which G [NG (X)] is a clique. We have seen that (E1–E3) implies that X contains every Lj. Let X be a subset of X where every X ∈ X appears with probability p2 = 1−2−p independently at random. Let Z be the union of the sets in X . If the events (E4) Z ∩S = /0, and (E5) Lj ⊆ Z for every 1 ≤ j ≤ s hold, then by Lemma 4.4, I/Z has a nonisolating solution, and we are done. Let us estimate the probability that both (E4) and (E5) hold. Let A2 = {L1,...,Ls} and let B2 = {X ∈ X | X ∩S = /0}; we have |A2| ≤ 2p and |B2| ≤ p2 (by Lemma 4.8, every vertex of S is contained in at most p important clusters whose boundary is a clique). If no 475 member of B2 is selected, then no set of X contains a vertex of S, and hence Z ∩S = /0. If every member of A2 is selected, then every Lj is a subset of Z. Therefore, the probability that (E4) and (E5) hold can be bounded from below by the probability of the event that every member of A2 is selected and no member of B2 is selected, which is at least (2−p)p2 ·(1−2−p)2p ≥ 2−p3 ·e−2. Taking into account the probability of success in both phases, we get that I/Z has a nonisolating solution with probability 2−O(p3 ). As the number of important components is at most 4p|V(G)|, the running time is O∗(4p). 4.4 Derandomization By running 2O(p3 ) times the algorithm of Lemma 4.10, we get a collection of instances that satisfy the requirements of Lemma 4.1 with arbitrary large constant probability. We can derandomize the algorithm of Lemma 4.10 using the standard technique of splitters. Recall that an (n,r,r2)-splitter is a family of functions from [n] to [r2] such that for any subset X ⊆ [n] with |X| = r, one of the functions in the family is injective on X. Naor, Schulman, and Srinivasan [30] gave an explicit construction of an (n,r,r2)-splitter of size O(r6 logrlogn). Observe that in the first phase of the algorithm of Lemma 4.10, a random subset of a universe C of size n1 = |C| ≤ 4p ·n is selected. There is a collection A1 ⊆ C of a1 ≤ p2 sets and a collection B1 ⊆ C of b1 ≤ p·4p sets such that if every set in A1 is selected and no set in B1 is selected, then (E1–E3) hold. Instead of the selecting a random subset, we try every function f in an (n1,a1 + b1,(a1 + b1)2)-splitter family and every subset F ⊆ [(a1 + b1)2] of size a1 (there are (a1+b1)2 a1 = 2O(p3 )) such sets F). For a particular choice of f and F, we select those sets C ∈ C for which f(C) ∈ F. By the definition of the splitter, there will be a function f that is injective on A1 ∪B1, and there is a subset F such that f(C) ∈ F for every A1 and f(C) ∈ F for every B1. For such an f and F, the selection will ensure that (E1–E3) hold. In the second phase, we select a random subset of universe X of size n2 ≤ pn, and there is a collection A2 ⊆ X of size a2 ≤ 2p and a collection B2 ⊆ X of size b2 ≤ p2 such that if every set in A2 is selected and no set in B2 is selected, then (E4) and (E5) hold. As in the first phase, we can replace this random choice by enumerating the functions of an (n2,a2 +b2,(a2 +b2)2)-splitter and every subset ¯F ⊆ [(a2 + b2)2] of size b2 (there are (a2+b2)2 b2 = 2O(p3 ) such sets ¯F). This time, we select a set X ∈ X if f(X) is not in ¯F and it is clear that there is an f and ¯F for which (E4) and (E5) hold. Let us bound the number of branches of the algorithm. In both phases, the size of the splitter family is 2O(p) ·logn and the there are 2O(p3 ) possible F. (Note that the splitter family can be constructed in time polynomial in the size of the family.) Thus the algorithm produces 2O(p3 ) ·logn instances, proving Lemma 4.1. 5. REDUCTION TO THE BIPEDAL CASE Let (G,T,W, p) be an instance of the MULTICUT COMPRESSION∗ problem. Let us call a component of G\W having at least two legs a non-trivial component of G w.r.t. W (when the context is clear, we will just refer to a non-trivial component). As the solution of MULTICUT COMPRESSION∗ has to be a set S that is disjoint from W and a multiway cut of W, the number of non-trivial components is a lower bound on the size of the solution. We present an algorithm that solves the given instance of the MULTICUT COMPRESSION∗ problem either by applying the algorithm for the BIPEDAL MULTICUT COMPRESSION∗ problem (in case every component has at most two legs) or by recursive application to a set of instances whose number is bounded by a function of p and such that in each instance either the parameter is decreased or the number of non-trivial components is increased. The main idea for the branching is the following. Let B be a set of vertices in G\W and let S be a hypothetical solution for MULTICUT COMPRESSION∗. We try to guess what happens to each vertex of B in the solution S. It is possible that a vertex v ∈ B is in S; in this case, we delete v from the instance and reduce the parameter. If v is separated away from W, we argue that it can be assumed that the set separating v from W is an important separator. We guess this important separator, remove it from the graph, and decrease the parameter appropriately. Finally, if v is not in S and is not separated away from W, then it is in the same component as precisely one w ∈ W (as S is a multiway cut of W). In this case, identifying v and w does not change the solution. The following lemma formalizes these observations. Given a set B of vertices in G \W and a function f : B → W, we denote by Gf the graph obtained by replacing each set {w} ∪ f−1(w) with a single vertex (with removal of loops and multiple occurrences of edges). To simplify the presentation, we will assume that this new vertex is also named w. We denote by Tf the set of terminal pairs where each vertex v ∈ B is replaced by f(v), and we denote by T\B the set where every pair involving a vertex in B is removed. LEMMA 5.1. Let K be a non-trivial component of G \W with set of legs W and let B ⊆ K. Then (G,T,W, p) has a solution if and only if one of the following statements is true. • There is a v ∈ B such that the instance (G\v,T\v,W, p−1) has a solution. • There is a v ∈ B and an important v −W separator S of size at most p in G[K ∪W] such that the instance (G\S ,T\ S ,W, p−|S |) has a solution. • There is a function f : B →W such that instance (Gf ,Tf ,W, p) has a solution. Lemma 5.1 determines a set of recursive calls to be applied in order to solve the given instance (G,T,W, p) of the MULTICUT COMPRESSION∗ problem. It is clear that in each step, the number of direction we branch into is bounded by a function of p, |B|, and |W| (recall that the number of v−W important separators of size at most p is at most 4p and the number of functions f : B → W can be bounded by |W||B|). However, in order to ensure that the size of the search tree is bounded, we need to ensure that the height of the search tree is bounded as well. This is obvious for the first two type of branches, as p decreases. The following property ensures that in every branch of the third type, either the number of nontrivial components increases or we get an instance that trivially has no solution. DEFINITION 5.2. Let K be a non-trivial component and letW ⊆ W be its set of legs. Let B be a subset of K. We say that B is a shattering set if for any function f : B → W one of the following statements is true regarding the instance (Gf ,Tf ,W, p) of the MULTICUT COMPRESSION∗. • There is a w ∈ W such that there is no w − (W \ {w}) separator of size at most p in in Gf [K ∪W]. • The number of non-trivial components is strictly greater in Gf \W than in G\W. 476 Note that the first possibility includes the case when Gf [W] is not an independent set. In Section 5.1, we present a polynomial-time algorithm for finding a shattering set. Together with Lemma 5.1, it is sufficient to prove the fixed-parameter tractability of the MULTICUT COMPRESSION∗ problem, i.e. to prove Lemma 2.2. LEMMA 5.3. Given an instance (G,T,W, p) of the MULTICUT COMPRESSION∗ problem and a component K of G \W with more than two legs, we can find a shattering set B ⊆ K of size at most 3p in polynomial time. 5.1 Finding a shattering set We start with two simple lemmas. LEMMA 5.4. Let K be a non-trivial component with a set W of at least 3 legs. If G[M1] and G[M2] are both connected for two disjoint sets M1,M2 ⊆ K, then at most one of M1 and M2 can be a multiway cut of (G[K ∪W],W). PROOF. Assume the opposite. Since no two vertices of W belong to the same component of G[K ∪W]\M1 and |W| ≥ 3, we can specify two vertices w and w of W whose respective components C and C in G[K ∪W]\M1 are disjoint from the connected set M2. Then there is a w −w path in G[K ∪W] that uses vertices from C , then vertices from (the connected set) M1, then vertices from C . This path is disjoint from M2, contradicting the assumption that M2 is a multiway cut. LEMMA 5.5. Let K be a non-trivial component with a set W of at least 3 legs. Let B ⊆ K be a non-shattering set. Then there is exactly one connected component of G[K \ B] which is a multiway cut of (G[K ∪W],W). PROOF. Let f : B → W be the mapping witnessing that B is not a shattering set. Let K ⊆ K \ B be the unique non-trivial component of Gf \W which is a subset of K. It is easy to see that K a component of G[K \ B] as well. Furthermore, K is a multiway cut of (G[K ∪W],W). Otherwise, a path between vertices of W in G[K ∪W]\K would correspond to a walk of Gf between the same vertices which belong to a non-trivial component that is a subset of K but different from K , in contradiction to the definition of f. Finally, Lemma 5.4 implies that K is the unique connected component of G[K \B] being a multiway cut of G[K ∪W]. Let K be a non-trivial component with a set of legsW. Let M ⊆ K be a multiway cut of (G[K ∪W],W). We call N(M) (i.e the open neighborhood of M) the boundary of M. For each w ∈W, the image I(w) of w is the set of vertices of N(M) reachable from w in G[K ∪ W] \ M (the image may include vertex w itself). Note that I(w) is nonempty for any w ∈ W: consider the first vertex of N(M) on a path from w to some other leg in W. For X ⊆ W, we let I(X) = w∈X I(w). Let us select a distinguished leg w∗ ∈ W. We say that M is good if all of the following conditions are true. • G[M] is connected. • N(M) = I(W) or, in other words, each vertex of N(M) is reachable in G[K ∪W]\M from some vertex of W. • |I(w∗)| ≤ p and |I(W \{w∗})| ≤ p holds (and hence we have |N(M)| ≤ 2p). LEMMA 5.6. Let K be a non-trivial component with a set W of at least 3 legs and a distinguished leg w∗. Let M be a good multiway cut of (G[K ∪W],W). Then there is a polynomial-time algorithm that either returns a shattering set of size at most 3p or a good multiway cut M ⊂ M. PROOF. The desired algorithm first computes a smallest I(w∗)− I(W \{w∗}) separator S of G[N(M)∪M] (recall that the images are nonempty). Observe that S is an inclusionwise minimal w∗ −W \ {w∗} separator in G[K ∪W] (and hence nonempty). We consider three cases: 1. If |S| > p, then the algorithm returns B := N(M)\W reporting it as a shattering set. 2. If |S| ≤ p and there is a unique connected component M of G[K \(N(M)∪S)] which is a multiway cut of (G[K ∪W],W), then the algorithm returns M reporting it as a good multiway cut. 3. If |S| ≤ p and there is no such M , then the algorithm returns B := (N(M)∪S)\W reporting it as a shattering set. This algorithm clearly takes polynomial time. The remaining proof establishes correctness of the algorithm in each of these three cases. Case 1. Suppose that B := N(M)\W is not a shattering set and let f : B →W be a function witnessing this. It is not hard to see that M is a connected component in Gf \W whose set of legs is a subset of W. We consider three subcases and arrive to a contradiction in each of them. Case 1a. M is a trivial component of Gf \W. Let w be the only leg of M. Let w1 and w2 be other two distinct legs of K in G that are different from w. It follows that f maps every vertex of I(w1)∪I(w2) to w implying that there is a w−w1 and w−w2 path in Gf whose internal vertices belong to two different components on w1 and w2 in G[K ∪W]\M. Thus Gf has at least two non-trivial components that are subsets of K, in contradiction to the choice of f. Case 1b. M is a nontrivial component of Gf \W and f(v) = w for every v ∈ I(w) and w ∈ W (i.e., each vertex on the boundary is mapped to its preimage). As the smallest I(w∗) − I(W \ {w∗}) separator in G[N(M)∪M] is larger than p, G[M ∪W] does not have a w∗ −W \ {w∗} separator of size at most p in contradiction to f being a witnessing function. Case 1c. M is a nontrivial component of Gf \W and f(v) = w2 for some v ∈ I(w1), w1,w2 ∈ W, w1 = w2. By definition of I(w1), there is a v−w path in G whose internal vertices are fully contained in K \ M. Therefore, there is a w1 − w2 path in Gf whose internal vertices are disjoint from M, implying that Gf has a nontrivial component which is a subset of K, but distinct from the nontrivial component M. Thus the number of nontrivial components increases, a contradiction. Case 2. We show first that M ⊂ M in this case. Clearly, M = M, as M is disjoint from (the nonempty) S ⊆ M. Thus M ⊂ M is only possible if M is disjoint from M, but Lemma 5.4 implies that the two disjoint connected sets M and M cannot be both multiway cuts. For clarity, from now on we use IM(w) and IM (w) for the image of w on the boundary of M and M , respectively. Observe that IM(w) ∩ N(M ) ⊆ IM (w) for every w ∈ W: for every v ∈ IM(w) ∩ N(M ), there is a w − v path disjoint from M, which is obviously disjoint from M ⊆ M as well. This immediately implies that either IM(w∗) or IM(W \{w∗}) is disjoint from N(M ): otherwise, a vertex v1 ∈ IM(w∗)∩N(M ) and a vertex v2 ∈ IM(w∗)∩N(M ) can be connected by a path P whose internal vertices are in M (hence disjoint from S), and path P can be extended to a w∗ −W \ {w∗} path disjoint from S, contradicting the definition of S. Therefore, either N(M ) ⊆ IM(w∗)∪S or N(M ) ⊆ IM(W \{w∗})∪S holds. To show that |IM (w∗)| and |IM (W \ {w∗})| are both at most p, we argue as follows. Suppose first that N(M ) ⊆ IM(w∗) ∪ S. 477 We show that IM (W \{w∗}) = S∩N(M ) and therefore IM (w∗) = IM(w∗)∩N(M ), proving the bound on both |IM (w∗)| and |IM (W \ {w∗})|. Note that this implies furthermore that N(M ) = IM (W), i.e., every vertex of N(M ) is the image of some leg. From IM(w∗)∩ N(M ) ⊆ IM (w∗) we already know that IM (W \{w∗}) ⊆ S∩N(M ). To show that S ∩ N(M ) ⊆ IM (W \ {w∗}) holds, consider a vertex s ∈ S ∩ N(M ). By the minimality of S, for some w ∈ W \ {w∗} there is a w∗ − w path P in G[K ∪W] that intersects S exactly in s. As M is a multiway cut, path P has to go through M . Furthermore, we can assume that P contains exactly 2 vertices of N(M ). One of these two vertices is s, and the other has to be a vertex v ∈ IM(w∗) (as P contains only one vertex of S). It follows that P has a subpath P1 connecting W and v, and a subpath P2 connecting W and s, both of them disjoint from M . As v ∈ IM(w∗), path P1 connects w∗ and v, thus path P2 connects s and w. Therefore, s is in IM (W \{w∗}), what we had to show. A symmetrical argument (exchanging the role of w∗ and W \ {w∗}) shows that if N(M ) ⊆ IM(W \ {w∗}) ∪ S, then IM (w∗) = S∩N(M ), implying the bounds on |IM (w∗)| and |IM (W \{w∗})|. Thus in both cases, we proved that M ⊂ M is a good multiway cut. Case 3. Assume now that the algorithm returns B := (S∪N(M))\ W as a shattering set. This happens because there is no unique component of G[K \ (N(M) ∪ S)] which is a multiway cut of (G[K ∪ W],W). According to Lemma 5.5, N(M)∪S is indeed a shattering set in this case. Clearly, its size is at most 3p. Lemma 5.3 follows by iterative application of Lemma 5.6. PROOF (OF LEMMA 5.3). It is not hard to see that K is a good multiway cut of (G[K ∪W],W). Let M0 = K. Apply the algorithm of Lemma 5.6 to M0. The algorithm either returns a shattering set of size at most 3p or a good multiway cut M1 ⊂ M0. In the former case just return the shattering set, in the latter case, apply the algorithm of Lemma 5.6 to M1. Continuing this way, we obtain a sequence M0 ⊃ M1 ⊃ ... of good multiway cuts of decreasing size. It follows that after at most |V(G)| iterative applications of the algorithm of Lemma 5.6, a shattering set of size at most 3p will be returned. 6. REFERENCES [1] S. Arora, S. Rao, and U. Vazirani. Expander flows, geometric embeddings and graph partitioning. J. ACM, 56(2):1–37, 2009. [2] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. Machine Learning, 56(1-3):89–113, 2004. [3] H. L. Bodlaender, M. R. Fellows, P. Heggernes, F. Mancini, C. Papadopoulos, and F. A. Rosamond. Clustering with partial information. Theor. Comput. Sci., 411(7-9):1202–1211, 2010. [4] N. Bousquet, J. Daligault, and S. Thomassé. Multicut is FPT. Manuscript, arXiv:1010.5197, 2010. [5] N. Bousquet, J. Daligault, S. Thomasse, and A. Yeo. A polynomial kernel for multicut in trees. In STACS, pages 183–194, 2009. [6] G. Calinescu, C. G. Fernandes, and B. Reed. Multicuts in unweighted graphs and digraphs with bounded degree and bounded tree-width. Journal of Algorithms, 48(2):333 – 359, 2003. [7] S. Chawla, R. Krauthgamer, R. Kumar, Y. Rabani, and D. Sivakumar. On the hardness of approximating multicut and sparsest-cut. Comput. Complexity, 15(2):94–114, 2006. [8] J. Chen, Y. Liu, and S. Lu. An improved parameterized algorithm for the minimum node multiway cut problem. Algorithmica, 55(1):1–13, 2009. [9] J. Chen, Y. Liu, S. Lu, B. O’Sullivan, and I. Razgon. A fixed-parameter algorithm for the directed feedback vertex set problem. J. ACM, 55(5), 2008. [10] J. Chuzhoy and S. Khanna. Polynomial flow-cut gaps and hardness of directed cut problems. J. ACM, 56(2):1–28, 2009. [11] E. Dahlhaus, D. S. Johnson, C. H. Papadimitriou, P. D. Seymour, and M. Yannakakis. The complexity of multiterminal cuts. SIAM J. Comput., 23(4):864–894, 1994. [12] E. D. Demaine, D. Emanuel, A. Fiat, and N. Immorlica. Correlation clustering in general weighted graphs. Theor. Comput. Sci., 361(2-3):172–187, 2006. [13] R. G. Downey and M. R. Fellows. Parameterized Complexity. Springer, New York, 1999. [14] U. Feige, M. Hajiaghayi, and J. R. Lee. Improved approximation algorithms for minimum weight vertex separators. SIAM J. Comput., 38(2):629–657, 2008. [15] J. Flum and M. Grohe. Parameterized Complexity Theory. Springer, Berlin, 2006. [16] L. R. Ford, Jr. and D. R. Fulkerson. Maximal flow through a network. Canad. J. Math., 8:399–404, 1956. [17] N. Garg, V. V. Vazirani, and M. Yannakakis. Approximate max-flow min-(multi)cut theorems and their applications. SIAM J. Comput., 25(2):235–251, 1996. [18] G. Gottlob and S. T. Lee. A logical approach to multicut problems. Inf. Process. Lett., 103(4):136–141, 2007. [19] S. Guillemot. Fpt algorithms for path-transversal and cycle-transversal problems. Discrete Optimization, 8(1):61 – 71, 2011. Parameterized Complexity of Discrete Optimization. [20] J. Guo, F. Hüffner, E. Kenar, R. Niedermeier, and J. Uhlmann. Complexity and exact algorithms for vertex multicut in interval and bounded treewidth graphs. European J. Oper. Res., 186(2):542–553, 2008. [21] J. Guo and R. Niedermeier. Fixed-parameter tractability and data reduction for multicut in trees. Networks, 46(3):124–135, 2005. [22] A. Gupta. Improved results for directed multicut. In SODA, pages 454–455, 2003. [23] F. Hüffner, R. Niedermeier, and S. Wernicke. Techniques for practical fixed-parameter algorithms. The Computer Journal, 51(1):7–25, 2008. [24] S. Khot. On the power of unique 2-prover 1-round games. In STOC, pages 767–775, 2002. [25] D. Lokshtanov and D. Marx. Clustering with local restrictions, 2010. [26] D. Marx. Parameterized graph separation problems. In IWPEC, pages 71–82. 2004. [27] D. Marx. Parameterized graph separation problems. Theoretical Computer Science, 351(3):394–406, 2006. [28] D. Marx and I. Razgon. Constant ratio fixed-parameter approximation of the edge multicut problem. Inf. Process. Lett., 109(20):1161–1166, 2009. [29] D. Marx and I. Razgon. Fixed-parameter tractability of multicut parameterized by the size of the cutset. CoRR, abs/1010.3633, 2010. [30] M. Naor, L. J. Schulman, and A. Srinivasan. Splitters and near-optimal derandomization. In FOCS, pages 182–191, 1995. [31] R. Niedermeier. Invitation to Fixed-Parameter Algorithms. Oxford University Press, 2006. [32] R. Pichler, S. Rümmele, and S. Woltran. Multicut algorithms via tree decompositions. In CIAC, pages 167–179. 2010. [33] I. Razgon and B. O’Sullivan. Almost 2-SAT is fixed-parameter tractable. J. Comput. Syst. Sci., 75(8):435–450, 2009. [34] B. Reed, K. Smith, and A. Vetta. Finding odd cycle transversals. Operations Research Letters, 32(4):299–301, 2004. [35] J. Sherman. Breaking the multicommodity flow barrier for O( √ logn)-approximations to sparsest cut. In FOCS, pages 363–372, 2009. [36] M. Xiao. Algorithms for multiterminal cuts. In CSR, pages 314–325, 2008. [37] M. Yannakakis, P. C. Kanellakis, S. S. Cosmadakis, and C. H. Papadimitriou. Cutting and partitioning a graph after a fixed pattern. In ICALP, pages 712–722, 1983. 478