Kernel Bounds for Path and Cycle Problems* Hans L. Bodlaender^ Bart M. P. Jansen"'" Stefan Kratsch^ December 14, 2011 Abstract Connectivity problems like /j-Path and /j-Disjoint Paths relate to many important milestones in parameterized complexity namely the Graph Minors Project, color coding, and the recent development of techniques for obtaining kernelization lower bounds. This work explores the existence of polynomial kernels for various path and cycle problems, by considering nonstandard parameterizations. We show polynomial kernels when the parameters are a given vertex cover, a modulator to a cluster graph, or a (promised) max leaf number. We obtain lower bounds via cross-composition, e.g., for Hamiltonian Cycle and related problems when parameterized by a modulator to an outerplanar graph. keywords: parameterized complexity, kernelization, upper and lower bounds, graphs, path and cycle problems 1 Introduction Connectivity problems such as fc-path and A:-DlSJOlNT paths play important theoretical and practical roles in the field of parameterized complexity. On the practical side, fc-path [22, ND29] has applications in computational biology [29] where the involved parameter is fairly small, thus giving an excellent opportunity to apply parameterized algorithms to find optimal solutions. On the theoretical side, these problems have triggered the development of very powerful algorithmic techniques. The A:-DlSJOlNT paths problem [28] lies at the heart of the Graph Minors Algorithm, and is the source of the irrelevant-vert ex technique. The color coding technique of Alon et al. [1] to solve fc-path has found a wide range of applications and extensions [27, 9], and new methods of solving fc-path are still developing [3]. Despite the success stories of parameterized algorithms for these problems, the quest for polynomial kernels has resulted in mostly negative results. Indeed, the failure to find a polynomial kernel for fc-path was one of the main motivations for the development of the kernelization lower-bound framework of Bodlaender et al. [5]. Using the framework it was shown that fc-path does not admit a polynomial kernel unless NP c coNP/poly, even when restricted to very specific graph classes such as planar cubic graphs. It did not take long before related connectivity problems such as fc-Disjoint Paths [7], fc-Disjoint Cycles [7], ^-Connected Vertex Cover [17], and restricted variants of ^-Connected Dominating Set [14] were also shown not to admit polynomial kernels unless NP c coNP/poly. "This work was supported by the Netherlands Organization for Scientific Research (NWO), project "KERNELS: Combinatorial Analysis of Data Reduction". tTJtrecht University, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands, {h.1.bodlaender,b.m.p.jansen,s. kratsch}@"iru.nl 1 Thus it seems that connectivity requirements in a problem form a barrier to polynomial ker-nelizability when it comes to the natural parameterization by solution size k. Driven by the desire to obtain useful preprocessing procedures for such problems, we may therefore investigate the ker-nelization complexity for non-standard parameters. Early work by Fellows et al. [20] shows that such a different perspective can yield polynomial kernels: they proved that hamiltonian Cycle parameterized by the max leaf number of the input graph G, i.e. the maximum number of leaves in a spanning tree for G, admits a linear-vertex kernel. In this work we study the existence of polynomial kernels for various structural parameters such as the max leaf number, the size of a vertex cover, and the vertex-deletion distance to simple graph classes such as cluster graphs and outerplanar graphs. Our results are as follows1: 1. We introduce a widely applicable technique based on matchings in bipartite graphs to show that the problems LONG Cycle, its directed and path variants, disjoint paths, and disjoint Cycles, admit kernels with Ö(|X|2) vertices when parameterized by a vertex cover X. 2. We generalize these results to the stronger (i.e. smaller) parameter "vertex-deletion distance to a cluster graph". For Long Cycle and Long Path this requires the trick of encoding clique sizes in binary (in a weighted variant) and applying a Karp reduction to get "back" to the original problem. On the one hand, this has the drawback of not giving an explicit polynomial size bound. On the other hand, the binary encoding of clique sizes seems to be more efficient for subsequent computations. For disjoint Cycles and disjoint paths the Karp reduction can be avoided, as the length of cycles or paths is not important. 3. Using the same binary encoding trick we get polynomial kernels for long Cycle and long Path parameterized by the max leaf number, generalizing the result of Fellows et al. [20] for Hamiltonian Cycle. For Disjoint Cycles and Disjoint Paths the encoding trick is again not necessary. 4. We give contrasting kernelization lower bounds using the recently introduced technique of cross-composition [6]: (a) Directed Hamiltonian Cycle parameterized by a modulator to Bl-paths does not admit a polynomial kernel unless NP c coNP/poly, where the parameter measures the vertex-deletion distance to a digraph whose underlying undirected graph is a path, and (b) we modify the construction to prove that hamiltonian Cycle parameterized by a modulator to Outerplanar graphs does not admit a polynomial kernel; both results assuming NP $z coNP/poly. 5. We initiate the parameterized complexity study of finding paths respecting forbidden pairs [22, GT54] under various parameterizations. We obtain W[l]-hardness proofs, FPT algorithms, kernel lower bounds, and para-NP-completeness results. Related work. Chen and Flum showed that maximal fc-path is in FPT, and that maximal Induced fc-path is W[2]-complete [10]. Joined by Müller they studied various forms of kernelization lower bounds, and showed amongst others that fc-PoiNTED Path does not admit a parameter non-in creasing polynomial kernelization unless P = NP, and that fc-path does not have a polynomial kernel on connected planar graphs unless NP c coNP/poly [11]. Hi is easy to see that Hamiltonian Cycle and Hamiltonian Path are special cases of Long Cycle and Long Path respectively. Hence, upper and lower bounds transfer in the obvious way between them. 2 Organization. Section 2 introduces the necessary notation regarding graphs and parameterized complexity, and introduces the main problems studied in this work. In Section 3 we show a useful property of bipartite matchings, which helps to obtain kernelization results. Section 4 contains the positive results, i.e., polynomial kernels, for the mentioned path and cycle problems when parameterized by a vertex cover (Section 4.1), the max leaf number (Section 4.2), or a modulator to a cluster graph (Section 4.3). In Section 5 we show the mentioned lower bound results for directed and undirected hamiltonian Cycle. Section 6 contains our results for path problems with forbidden pairs. We conclude in Section 7. 2 Preliminaries Graphs. All graphs are finite and simple, unless indicated otherwise. An undirected graph G has a vertex set V(G) and an edge set E(G) C (V^). A directed graph D has a vertex set V(D) and a set of directed arcs A(D) C V{D)2. All paths are assumed to be simple. We say that a matching M in a graph covers a set of vertices U, if each vertex in U is endpoint of an edge in M. We use [n] as a shorthand for the set {1, 2,..., n}. If X is a finite set then (^) denotes the set of all size-n subsets of X. For a directed graph D and vertex v we write N^(v) := {u | (v,u) G A(D)} for the out-neighbors, Np(v) := {u | (u,v) G A(D)} for the set of in-neighbors, and Nd(v) := Np(v) U Np(v) for the set of all neighbors. If (u, v) is an arc of D then u is the head of the arc and v is its tail. If C C A{D) is a Hamiltonian cycle in a digraph containing the arcs (xj,Xj+i) for i £ [fc] then we say that the vertices ..., rEfc+i appear consecutively on C, in that order. We also use the undirected variant of this notion which is defined analogously. For a set of vertices X in a digraph D we use D — X to denote the digraph which results after deleting all vertices of X and their incident arcs from D; the concept is defined analogously for undirected graphs. The underlying undirected graph of a digraph D is the result of disregarding the orientation of the arcs and eliminating parallel edges. Let Bl-paths (for bi-orientations of paths) be the class of digraphs whose underlying undirected graph is a path. Outerplanar graphs are those graphs which can be drawn in the plane without crossings such that all the vertices lie on the outer face; such graphs have treewidth at most two. Cluster graphs are disjoint unions of cliques. For a graph class Q and a vertex set X C V{G) of a graph G such that G — X G Q, we say that X is a modulator to the class Q. Parameterized complexity and kernelization. A parameterized problem Q is a subset of S* x N, the second component being the parameter which expresses some structural measure of the input. A parameterized problem is (strongly uniformly) fixed-parameter tractable if there exists an algorithm to decide whether (x, k) G Q in time f{k)\x\°^ where / is a computable function [18]. A kernelization algorithm (or kernel) for a parameterized problem Q is a polynomial-time algorithm which transforms an instance (x, k) into an equivalent instance (x', k') such that \x'\, k! < f{k) for some computable function /, which is the size of the kernel. If / is a polynomial then this is a polynomial kernel [23]. Cross-composition. To prove our lower bounds we use the framework of cross-composition [6], which builds on earlier work by Bodlaender et al. [5], and Fortnow and Santhanam [21]. 3 Definition 1 (Polynomial equivalence relation [6]). An equivalence relation 1Z on S* is called a polynomial equivalence relation if the following two conditions hold: 1. There is an algorithm that given two strings x, y G S* decides whether x and y belong to the same equivalence class in (\x\ + lyl)^1-* time. 2. For any finite set S C S* the equivalence relation 1Z partitions the elements of S into at most (max-Egs ja?!)^1) classes. Definition 2 (Cross-composition [6]). Let L C S* be a set and let Q C S* x N be a parameterized problem. We say that L cross-composes into Q if there is a polynomial equivalence relation 1Z and an algorithm which, given r strings a?2,..., xr belonging to the same equivalence class of 1Z, computes an instance (x*, k*) G S* x N in time polynomial in Y^i=i \xi\ such that: 1. (x*, k*) G Q ^ Xi G L for some 1 < i < r, 2. k* is bounded by a polynomial in max-=1 \xi\ + logr. Theorem 1 ([6]). If some set L C S* is NP-hard under Karp reductions and L cross-composes into the parameterized problem Q then there is no polynomial kernel for Q unless NP C coNP/poly. Problems considered in this work. The main focus of this work lies on nonstandard parame-terizations of Long Path, Long Cycle, Hamiltonian Path, Hamiltonian Cycle, Disjoint Paths, and Disjoint Cycles. We briefly introduce the classical unparameterized versions; the first four a quite well known: Given a graph G and an integer k, Long Path and Long Cycle ask for the existence of a path or cycle respectively containing at least k vertices. Given a graph G, Hamiltonian Path and Hamiltonian Cycle ask for the existence of a path or cycle respectively which contains all vertices of G. Disjoint Paths Input: A graph G, an integer k, and a set of k vertex pairs {(si,ti), • • •, (sfc,tfc)}. Question: Is there a set of k vertex-disjoint paths, such that each pair (sj,tj) is connected by exactly one of the paths? Disjoint Cycles Input: A graph G and an integer k. Question: Does G contain a set of k vertex-disjoint cycles? Most of the variants with nonstandard parameters considered in this work are given by requiring an extra set X (a modulator) to be given in the input such that G — X has some special property (like being an independent set). The parameter is then chosen as £ := \X\. For formal reasons this would require an explicit inclusion of £ in the input, which we omit for succinctness. Further variants of path problems with forbidden pairs are introduced in Section 6. 3 A property of maximum matchings in bipartite graphs The following theorem simplifies the argumentation needed for reduction rules that are based on assigning private choices to some entities, e.g., vertices or edges, using an auxiliary bipartite graph, useful for example for various problems parameterized by a vertex cover. 4 Theorem 2. Let G = {X U Y,E) be a bipartite graph. Let M C E{G) be a maximum matching in G. Let Xm C X be the set of vertices in X that are endpoint of an edge in M. Then, for each Y' C Y, if there exists a matching M' in G that covers Y', then there exists a matching M" in G[Xm U Y] that covers Y'. Proof. Let G, M, and Xm be as stated in the theorem. Suppose the theorem does not hold for Y' C Y, and let M' be a matching in G that covers Y'. Over all such matchings M', take one that covers the largest number of vertices in Xm- By assumption M' is not a matching in UY], so there is a vertex yo £ Y' that is matched in M' to a vertex in X \ Xm , say xq . We use an iterative process to derive a contradiction, maintaining the following invariants: • x0 0 XM- • {xj,yj} g M' for 0 < j < i. • {yj,Xj+i} £ M for 0 < j < i. • The vertices Xj for 0 < j < i are distinct members of X, and vertices yj for 0 < j < i are distinct members of Y. It is easy to verify that given our choice of xo,yo these invariants are initially satisfied for i = 0. We now continue the process based on a case distinction: 1. If yi is not matched under M, then the sequence (xq, yo,..., Xi, yi) is an M-augmenting path in G since xq and yi are not matched under M, and all edges {yj,Xj+\} for 0 < j < i are contained in M. Hence M" := M \ {{yj,Xj+\} \ 0 < j < i} U {{xj,yj} \ 0 < j < i} is a matching in G larger than M, contradicting that M is maximum. 2. In the remaining cases we may assume yi is matched under M, say {yi,Xi+\} £ M. If there is an index 0 < j < i such that Xi+\ = Xj then j > 0 (since xq 0 Jy) and the edges {yi,Xi+\} and are both contained in M and are distinct edges since yj-i ^ yi, contradicting the fact that M is a matching. Hence Xi+\ is distinct from Xj for 0 < j < i. 3. If Xi+i is not covered by M' then the matching M" := M'\{{xj,yj} \ 0 < j < i}L){{yj,Xj+i} \ 0 < j < contains as many edges as M' but covers more vertices of Xm, contradicting the choice of M'. Hence Xi+\ is covered by M', say {xi+±,yi+i} g M'. If there is an index 0 < j < i such that = yj then {xi+±,yi+i} and {xj,yj} are two distinct edges in M' incident on j/j+i, contradicting that M' is a matching. Hence is distinct from yj for 0 < j < i. Now observe that the invariant holds for i + 1, and we may proceed with the next step of the process. By the last property of the invariant, the process must end. Hence the assumption that there is no matching in G[Xm U Y] which covers Y' leads to a contradiction, which concludes the proof. □ 4 Polynomial kernels for path and cycle problems This section provides polynomial kernels for various path and cycle problems when parameterized by a vertex cover (Section 4.1), the max leaf number (Section 4.2), or a modulator to a cluster graph (Section 4.3). 5 4.1 Parameterization by a vertex cover In this section we consider path and cycle problems when parameterized by the size £ of a given vertex cover, focusing mainly on the long Cycle problem, for which we present a kernel with 0(£2) vertices. Long Cycle parameterized by a vertex cover Input: A graph G, an integer k, and a set X C V{G) such that X is a vertex cover. Parameter: £ := \X\. Question: Does G contain a cycle of length at least kl We need only one reduction rule to get a kernelization, it uses a bipartite connection graph H = H(G, k,X): One color class consists of the vertices in the independent set / = V{G) \ X, and the other consists of all (unordered) pairs of distinct vertices in X. We take an edge from a vertex v G / to a vertex representing the pair {p, q} C X, if and only if v is adjacent to p and to q. Reduction Rule 1. Given (G, k, X), if k < 4 then solve the problem (e.g. by the trivial C(n4) algorithm) and return an equivalent dummy instance. Otherwise, construct the connection graph H = H(G, k,X). Let M be a maximum matching in H. Let J C / be the vertices covered by an edge in M. Remove all vertices in / \ J and their incident edges from G. Let G' be the resulting graph, and return the instance (G", k,X). Observation 1. In Rule 1, \ J\ is at most the number of pairs of distinct vertices in X, and hence Correctness of the rule follows from the following lemma. Lemma 1. Let (G,k,X) be an instance of Long Cycle parameterized by a vertex cover, and let (G", k, X) be the instance returned by Rule 1. Then G has a cycle of length at least k if and only if G' has a cycle of length at least k. Proof. If k < 4 then the lemma holds trivially. Otherwise, we have that G' is an induced subgraph of G so cycles (in particular those of length at least k) in G' exist also in G. It remains to look at the converse. Let C be a cycle of length at least k > 5 in G. Clearly, as / = V \ X is an independent set, any vertices of / which are in C must be neighbored by vertices of X on C. Let v±,..., vr be all vertices of / contained in C and let pi and qi be the predecessor and successor of Vi on C, respectively (clearly r < £ but there might be far fewer vertices of / on C). Since C has length at least 5, it follows that {pi,qi} ^ {Pj,Qj} for all hj £ [r] with i ^ j (else it would have length 4). To show that G' contains a cycle of length at least k, it suffices to find replacements for all vertices Vi which are not in J (and hence not in G'); for this we will use the matching. Clearly, in H = H(G,k,X) there is a matching M covering W := {{pi, qi}, ■ ■ ■, {pr, Qr}}, namely matching each pair to the corresponding vertex V{. Further, by Rule 1, J is the set of endpoints in / of some maximum matching of H. Hence, by Theorem 2, there is a matching M' covering W in H[J U W]. Let v'i denote the vertex matched to {pi, qi} by M', for i £ {1,..., r}. It is easy to see that we may replace each Vi on C by since is adjacent to pi and qi in G, obtaining a cycle C' which intersects / only in vertices of J. Also, as all pairs {pi,qi} are different, no vertex is required twice. Hence, C' is also a cycle of G', and of length at least k. □ 6 The kernelization result now follows from Lemma 1 and Observation 1, and noting that Rule 1 can be easily performed in polynomial time. Theorem 3. Long Cycle parameterized by a vertex cover has a kernel with 0(£2) vertices. We note that in the obtained kernel the number of edges may still be cubic in £, giving an overall size bound of 0(£s) by using an adjacency matrix encoding. It would be interesting to know whether the number of edges can be reduced to 0{£2) and whether a size bound of 0{£2) could be showed to be tight, using recent results on polynomial lower bounds for kernelization [16, 15, 25]. Further problems parameterized by vertex cover. The same technique can be used for a number of additional problems, all parameterized by the size of a vertex cover. The basic argument is the same; the matching approach allows us to reroute any paths or cycles such that they use only matched vertices. Corollary 1. Long Path, Disjoint Paths, and Disjoint Cycles admit polynomial kernels with 0(£2) vertices. Proof. We briefly sketch how to modify the proof given for LONG Cycle (Theorem 3). For an instance (G, k, X) of the long Path problem, it is most convenient to introduce a universal vertex v (adjacent to all vertices of G) to obtain an equivalent instance (G", k + l,XL){v}) of long Cycle parameterized by a vertex cover. Each fc-path in G then corresponds to a k + 1-cycle in G' by adding v, and each fc + l-cycle in G' contains at least one fc-path in G. We then apply the kernelization as for long Cycle, and finish up by removing v from the obtained instance. For Disjoint Paths it is clear that each path fulfilling a request (sj,tj) must contain at least one vertex of X, hence yes-instances have at most £ = \X\ request pairs. Thus we may assume that all vertices of request pairs are contained in X (else adding them will at most triple the size of X). Since the paths have to be vertex-disjoint, no two request pairs share any vertices (though the proof could be adapted to internally vertex-disjoint paths, and also to asking for multiple paths between certain pairs). It is hence clear that edges between vertices of different pairs cannot be part of any path. Therefore, the easiest way to argue correctness is to add all those edges to G, and observe that the kernelization for long Cycle also preserves the existence of a cycle which traverses all request pairs in order, i.e., (..., s±,..., t±, S2, ■ ■ ■, t2> s3i ■ ■ •)• Such a cycle exists if and only if the k requested disjoint paths exist and the instance is yes. For the Disjoint Cycles problem we may proceed essentially as for Long Cycle, since each single cycle will be preserved (as the argument only comes down to providing the private shared neighbors). The only difference is that we also have to preserve cycles of length four, which may require two vertices of the independent set to be assigned (matched) to one pair of vertices of the vertex cover X. (We also must preserve cycles of length three of course, but only length four cycles may have the particular mentioned layout.) To do so create the auxiliary bipartite graph H = H(G, k, X) but duplicate all vertices corresponding to (unordered) pairs of vertices from X. This way, each pair can receive two private shared neighbors. □ For Hamiltonian Path and Hamiltonian Cycle it is easy to see that any vertex cover of a yes-instance must have size at least least |_^J> since vertices of the remaining independent set cannot be adjacent on a Hamiltonian path or cycle. Thus all nontrivial instances (G, X) have \ V{G)\ < 2\X\ + 1 = 0{£). 7 Furthermore, the matching argument can also be applied to the directed versions of these problems. For this, match to each ordered pair (u, v) with u and v being vertices of the provided vertex cover a vertex p such that there are (directed) edges (u, p) and (p, v) (note that there will be a vertex q for (v, u) as well, with edges (v, q) and (q, u)). This way, any directed paths or cycles can be rerouted to use only matched vertices from V{G) \ X, providing kernels of the same size (up to constant factors). Again, for Hamiltonian Path and Hamiltonian Cycle the simpler argument from the previous paragraph applies. 4.2 Parameterization by max leaf number In this section we consider path and cycle problems parameterized by the max leaf number, i.e., the maximum number of leaves in any spanning tree of the graph. Deviating slightly from the standard use, we will take the max leaf number of a disconnected graph to be the sum of max leaf numbers taken over all connected components (alternatively one may restrict the question to connected graphs). We will use LONG Cycle as a running example, but as in Section 4.1 it is easy to generalize the arguments to further problems. As the max leaf number of a graph cannot be verified in polynomial time, we consider the parameterization in the sense of a promise problem, e.g.: Long Cycle parameterized by max leaf number (LCML) Input: A graph G and two integers k and £. Parameter: £. Question: If G has max leaf number at most £, then decide whether G contains a cycle of length at least k. Else the output may be arbitrary. It is well known that a large graph having small max leaf number must contain long paths of degree two vertices and few vertices of degree at least three. The following bound was obtained by Fellows et al. [20] based on work by Kleitman and West [26]; it can be easily seen to hold for each connected component. Lemma 2 ([20]). // a graph G has max leaf number at most £ then it is a subdivision of some graph H of at most A£ — 2 vertices. In particular, G has at most A£ — 2 vertices of degree at least three. It is not hard to devise an FPT-algorithm for LCML. Lemma 3. Long Cycle parameterized by max leaf number can be solved in time 2°^nc. Proof. Let (G, k, £) be an instance of LCML. Let B denote those vertices that have degree at least three in G. If \B\ > A£ — 2 then by Lemma 2, the promise is not fulfilled as the max leaf number of G is greater than £, we return no. Otherwise, we have \B\ < A£ — 2. Now, replacing each path connecting two B vertices (without visiting other B vertices) by an edge with weight equal to the length of the path we obtain a multigraph with the same maximum cycle length. Clearly, a longest cycle will either consist of just two vertices, or use at most one of the edges between any two vertices. Hence, after checking whether there is a sufficiently long 2-vertex cycle, we may discard all but the longest edge connecting any two vertices. The remaining problem can be solved by Held-Karp style dynamic programming [24] on the remaining weighted graph with \B\ = 0{£) vertices; hence in time 2°^nc. □ 8 The main idea for the kernelization is that one of two good cases must hold: Either all the path lengths are small enough such that a binary encoding of their length has size polynomial in £, or the total number n of vertices is large enough such that the 2°^ nc is in fact polynomial in n. Theorem 4. Long Cycle parameterized by max leaf number admits a polynomial kernel. Proof. Given an instance (G, k, £) of LCML, we first check that k does not exceed the number of vertices and that there are at most A£ vertices of degree at least three, or else return no. If G has more than 2°W vertices (using the concrete bound resulting from an implementation of Lemma 3), then we solve the instance in time 2°^-nc = C(nc+1), and answer yes or no accordingly. Otherwise let B denote the set of vertices of degree at least three. If there are more than £ disjoint paths connecting any two vertices of B, then the max leaf number of G exceeds £, and we return no. We replace each path connecting two vertices b,b' £ B, with internal vertices from V{G) \ B, by a single edge with an integer label denoting the number of internal vertices of the replaced path. We obtain a multigraph G' in which some edges have an integer label. It is easy to see that cycles in G correspond to cycles in G' of the same length, when taking the integer labels into account (i.e. labeled edges are simply worth as much as that many internal vertices). Clearly, each label can be encoded in binary by at most log 2°^ = 0(£) bits. Furthermore, we delete all paths that start in a vertex of B, have internal vertices from V{G) \ B, and end in a vertex of degree one; clearly those cannot be used by cycles in G. We obtain a multigraph G' with at most A£ vertices in B and with at most £ edges between any two B-vertices. Thus we have 0(£) vertices and 0(£s) integer labels of size 0(£), for a total size of 0(£4) (this could be easily tightened, but it would not affect the result); clearly k can also be encoded in 0{£) bits. We obtain an equivalent instance of a slightly different long Cycle problem on multigraphs in which some edges may be labeled, but which is in NP. By the implied Karp reduction to LCML we obtain the claimed polynomial kernel (cf. [7]). Deviating from Bodlaender et al. [7] we do not use the versions with parameter encoded in unary, but observe the following: All instances of long Cycle with k exceeding the number of vertices are trivially no and may be replaced by smaller dummy no-instances, so the parameter value of the remaining instances is indeed polynomial in £ (as is the instance size, due to the Karp reduction). □ Further problems parameterized by max leaf number. A polynomial kernel for Hamil-tonian Cycle was already found by Fellows et al. [20]. Kernels for Hamiltonian Path as well as Disjoint Cycles can be obtained in a similar way, by observing that the paths of degree-2 vertices can be reduced to having only one internal vertex. For long Path it is again necessary to use the binary encoding trick for the path lengths. Corollary 2. Long Path, Hamiltonian Path, and Disjoint Cycles parameterized by max leaf number admit polynomial kernels. For Disjoint Paths, i.e., finding k disjoint paths connecting k terminal pairs (si, ti),..., (s^, £&), some more work is necessary on the paths between B vertices, and on paths between B vertices and the at most £ leaves. Theorem 5. disjoint paths parameterized by max leaf number admits a polynomial kernel. 9 Proof. Let (G, k, {(si,ii),..., (sk,tk)},£) be an instance of disjoint paths parameterized by max leaf number (DPML). Let L be the set of leaves of G and let B contain all vertices of degree at least three. If \L\ > £ or if \B\ > A£ we return no as, by Lemma 2, the max leaf number of G must exceed k. Similarly, we return no if any vertex of B has degree greater than £. We will now reduce the number of terminals on paths with internal vertices from 0 := V{G) \ (LUB), i.e., the set of vertices with degree exactly two. Let P be such a path and assume that it contains at least five terminals. If follows that any terminal except the first and the last one on the path must reach its partner (e.g., Sj must reach tj) on the path P. This can be tested in polynomial time, and the instance be rejected if it is impossible. Afterwards the subpath spanned by those terminals can be deleted and be replaced by a subpath consisting just of the vertices of one new terminal pair, say (s',t'). By the same argumentation s' and t' must be connected on the path, hence the path cannot be used for other terminals. To see that the max leaf number does not increase, observe that the above modification is basically a contraction of some edges (terminals have no influence), and that it does not increase the number of connected components. Otherwise, we replace all non-terminal vertices in O by deleting them and adding an edge between their neighbors; it is clear that any path must pass through them, so this is equivalent. Afterwards, each path with internal vertices from O has at most four internal vertices, allowing us to bound the total number of vertices by \L\ + \B\ + A£\B\ £ 0{£2) (the latter term accounts for the at most £ paths starting in any vertex of B). □ 4.3 Parameterization by a modulator to cluster graphs In this section, we consider path and cycle problems parameterized by vertex-deletion distance from cluster graphs. To this end, alongside the input graph (and possibly further inputs) a modulator X is provided such that G — X is a cluster graph. Again we consider the long Cycle problem. Long Cycle parameterized by a modulator to cluster graphs Input: A graph G, an integer k, and a set X C V{G) such that G — X is a disjoint union of cliques. Parameter: £ := \X\. Question: Does G contain a cycle of length at least kl Observation 2. Cycles in G contain vertices of at most \X\ cliques of G — X. If a cycle uses at least one edge between vertices of a clique, then it can be easily extended to include all so far unused vertices of the clique. Hence, it can be seen that there is a preference for including the largest possible cliques and as many cliques as possible. This is used in the following reduction rule. Reduction Rule 2. Given (G, k,X), if there is a clique with at least k vertices, or a vertex in X with two neighbors in a clique of size k — 1 then delete all other cliques and return the obtained instance (G',k,X) (which is trivially yes). Otherwise, for each vertex pair {u, v} from X, mark £+1 cliques that contain a shared neighbor of u and v. Additionally, mark the £ + 1 largest cliques containing vertices p and q with u adjacent to p and v adjacent to q. Delete all unmarked cliques, obtaining a graph G', and return (G", k,X). Clearly G' — X is still a cluster graph, and if G' contains a cycle of length at least k, then that cycle can also be found in its supergraph G. The following lemma completes the proof for safeness of Rule 2. 10 Lemma 4. Let (G',k,X) be obtained from an application of Rule 2 on (G,k,X). If G has a cycle of length at least k, then G' has a cycle of length at least k. Proof. Assume that G contains a cycle of length at least k. If the first part of Rule 2 applies, then G' trivially contains a cycle of length k and we are done. Otherwise, any cycle of length at least k must contain at least two vertices of X. Let C be a cycle of length at least k in G which contains a minimum number of subpaths that are not contained in G' — X. Assume that C contains vertices from at least one clique which is not in G' — X (i.e. which was not marked) and let K be such a clique of G — X. Let P = (pi,... ,pr) denote one of the subpaths obtained by intersecting C with the vertex set of K. Further, let Puv denote the extended subpath obtained by adding the vertices u, v G X which are adjacent to P on C, w.l.o.g., Puv = (u,pi,... ,pr,v). Note that «/d, else C = (u,p±,... ,pr) and the first case of Rule 2 would have applied. If r = 1 and Puv = (u,pi,v), then by the marking process of Rule 2 and as K was not marked, there must be £ + 1 cliques which contain a shared neighbor of u and v (which were marked and) which are present in G'. Hence, by Observation 2, at least one of those cliques, K' say, was not used by C, and we may replace p\ by any shared neighbor of u and v in K'. We obtain a cycle C' that has one fewer subpath which is not in G' — X contradicting our choice of C. Now, if r > 1, then Puv = (u,pi,... ,pr,v) with p\ ^ pr. Hence, K contains a neighbor of u and a different neighbor of v. As K is not marked, the reduction rule must have marked £ + 1 other cliques which are at least as large as K, and which contain such neighbors. Again, by Observation 2, there must be a clique K' among them which is not used by C. The clique K' is at least as large as K and it contains vertices p and q with p adjacent to u and q adjacent to v. Hence, we may replace the subpath Puv of C by a path from u to p, followed by a path from ptoq using all vertices of K', and back to v; we call this P'uv. Clearly, P'uv is at least as long as Puv, since Puv could at most use all vertices of K (there might be other subpaths of C using K) and K' has at least that many vertices. Thus, replacing Puv in C by P'uv we obtain a cycle C' of at least the same length, which uses one fewer subpath that is not contained in G' — X, contradicting the choice of C. It follows that C contains only vertices of X and of marked cliques. Hence, it exists also in G', completing the proof. □ For the remaining reduction we proceed similarly to Section 4.2: First, we show that we can without harm restrict to allowing only a small number of vertices in each clique for connecting to X, the remaining vertices are only needed to possibly extend the length of the cycle (i.e., they are visited after entering the clique and before leaving it). This allows for a straightforward FPT-algorithm of runtime 0{£1W ■ nc). Consequently, we may assume the number of vertices to be bounded by O(£10i), else solving the whole instance in time 0{nc+1). Thus, the number of additional vertices in each clique may be encoded in binary, with coding length O(^log^), giving a polynomial sized equivalent instance of a special version of our problem. Finally, we use a Karp reduction from this special version back to the basic problem, to obtain a polynomial kernel. Lemma 5. Given an instance (G,k,X) one can in polynomial time identify {£, + l)3 vertices in each clique ofG — X, such that if (G,k,X) is yes, then G contains a cycle of length at least k that enters and leaves cliques of G — X only through the identified vertices. Proof. Given (G, k, X) mark for each pair of vertices u, v G X up to 2£ + 1 shared neighbors in each clique. Furthermore, we mark for each vertex v G X up to 2£ + 1 neighbors in each clique. 11 Now, let us assume that G contains a cycle of length at least k, and let C be a cycle of length at least k that enters and leaves cliques of G — X through a minimum number of unmarked vertices (equivalently, C uses a minimum number of edges between the modulator and unmarked vertices). Assume that C = (... ,u,p±,... ,pr,v,...) where u,v G X, all vertices pi are from the same clique K of G — X, and at least p\ is unmarked. If r = 1, then C = (... ,u,pi,v,...). As p\ is unmarked, the clique K must contain 2£ + 1 marked vertices that are shared neighbors of u and v, say q±, . . . ,q2£+i- If at least one of them is not used by C, say then we may replace p\ by and thereby obtain a cycle of the same length as C that uses one less unmarked vertex to enter and leave cliques; a contradiction to the choice of C. So assume that C uses all vertices q\, . . . ,q2£+i- By £ = \X\ we know that at least one vertex, say qj, is not adjacent to vertices of X on C, but must be adjacent to other vertices of K. Therefore, we may swap the positions of p\ and qj in C: Indeed, qj is adjacent to u and v, and pi is adjacent to all other vertices of K (and clearly it is not adjacent to qj on C). We obtain a cycle of the same length, which uses one less unmarked vertex to enter or leave cliques of G — X, contradicting our choice of C: The new cycle now enters a clique at qj instead of p\. Now, we consider the case that r > 1, so C = (..., u,p±,... ,pr,v,...) with p\ ^ pr. Since p\ is unmarked, there must be 2£ + 1 marked vertices in K which are adjacent to u, say q±,..., q2£+i- It follows that at least one of those vertices, say qj, is not adjacent to X on C. We can now apply the same switching arguments as in the previous paragraph to obtain a cycle C' of the same length, which use one less unmarked vertex to enter or leave cliques of G — X, contradicting our choice of C. It follows that C enters and leaves cliques only through marked vertices. The marked vertices constitute the sets of vertices in the cliques as claimed by the lemma. There are at most £ ■ (2£ + 1) + (g) • {2£ + 1) < {£ + l)3 marked vertices per clique of G — X, as claimed. □ Lemma 6. Given (G,k,X), with £ := \X\, a cycle of length at least k in G can be found in time O{£loe ■ nc) = 2°^^n°^ if it exists. Proof. We use Reduction Rule 2 to reduce the number of cliques in G — X to 0(£s). Then we mark vertices according to Lemma 5. Assuming that (G,k,X) is a yes-instance, it follows that there must be a cycle of length at least k in G which enters and leaves cliques only in marked vertices. The following brute force algorithm finds such a cycle if it exists: 1. If any clique has size at least k then return yes. Else at least one vertex of X must be used. 2. Try all ordered subsets X' = {x\,..., Xf} of X for the intersection of C with X. 3. For all pairs (xj, Xi+i) (including (xt, x±)) try all cliques that contain neighbors of Xj and Xi+\, or, if possible, try the edge to connect Xj and Xj+i on C. 4. If a clique K has been chosen to connect Xj and Xj+i on C, then try all marked vertices of the clique for entering and leaving K on C, i.e., to find either vertices p and q with C = (..., Xi,p,..., q, Xj+i), or a single vertex p for C = (..., Xj,p, Xj+i,...) (and check adjacency of p and possibly q to Xj and Xj+i). 5. Greedily connect the chosen vertices inside the cliques: For each clique where we have chosen the option C = (..., Xj,p,..., q, Xj+i,...) we can include all remaining vertices of the clique into our cycle. If only shared neighbors were chosen, then this is not possible. 12 It is easy to see that given the existence of any cycle C of length at least k which enters and leaves cliques only through marked vertices, our simple algorithm must find a cycle of at least the same length: It will eventually try the same choice and order of vertices from X, and pick the same marked vertices to connect them to cliques. At that point, it is easy to see, that the greedy connection of the chosen marked vertices must give a cycle of at least the same length. Clearly, if our algorithm finds a cycle of length at least k, then the instance is yes. We close by giving an upper bound on the dependence of £ in the runtime. It is easy to see that the dependence on the input size is (a low degree) polynomial. 1. This step can be performed in polynomial time. 2. There are less than {£ + if = Oif) ordered subsets X' of X. 3. We pick up to £ cliques (possibly with repetition) out of the 0{£^) cliques of G — X, for a total of 0{£u) choices. 4. Up to £ times, we pick two marked vertices among the {£ + l)3 vertices, i.e., we have 0{£^) choices. 5. This step can be performed in time polynomial in the input size. This gives a total runtime of O(£loe ■ nc) = 2°(nos^n0^. □ Theorem 6. Long Cycle parameterized by a modulator to cluster graphs admits a polynomial kernel. Proof. Let (G,k,X) be an input instance. W.l.o.g. we let the instance be reduced according to Reduction Rule 2, i.e., G has at most 0(£3) cliques. If G has at least 2c'^log^ vertices, then the algorithm of Lemma 6 can be used to solve the instance in time n ■ rP^ = n°^l\ Otherwise, using n < 2o(eioge) we encode our instance in the following way, as an instance of a different problem: • We mark vertices according to Lemma 5. • In each clique, we replace all unmarked vertices by a single unmarked vertex with an integer label stating the number of unmarked vertices. In binary encoding, that label needs at most logn = log(2c,("°g£)) = 0(£log£) bits. For the so-encoded instance we ask for a path of length at least k which enters and leaves cliques only through marked vertices, but we account the labeled vertices as subpaths with a number of vertices equal to their label. Clearly, this is equivalent to the original question. In addition to the vertices of X, the instance has at most 0(£s) cliques each with at most {£, + l)3 + 1 vertices plus one integer label of bit size at most O(^log^) per clique. Clearly, this gives a total size which is polynomial in £. Finally, we observe that the question which we ask about this instance can be answered in NP. As long Cycle is NP-complete, it remains so even when we append the distance to a cluster graph in unary as well as a reasonable encoding of a modulator X as both are bounded by n. Thus there is a Karp reduction from our alternative problem to long Cycle parameterized by a modulator to cluster graphs. By standard arguments (e.g. see [7]) this implies that the latter problem admits a polynomial kernel. □ 13 Further problems parameterized by a modulator to cluster graphs. For long Path, Hamiltonian Cycle, and Hamiltonian Path polynomial kernels follow directly from Theorem 6 via straightforward reductions to the long Cycle problem. However, it can be easily observed that for Hamiltonian Cycle and Hamiltonian Path the encoding trick is not necessary. Indeed all the unmarked vertices can always be assumed to occur in a single subpath of the final Hamiltonian cycle or path. Hence, it suffices to keep only a single unmarked vertex per clique. This saves the argument via Karp reductions. In fact, the marking argument can be tightened too, by using the matching approach from Section 4.1. Corollary 3. Long Path, Hamiltonian Cycle, and Hamiltonian Path parameterized by a modulator to cluster graphs admit polynomial kernels. Proof. For an instance (G, k, X) of long Path parameterized by a modulator from cluster graphs, simply add a universal vertex v adjacent to all vertices of G and ask for a cycle of length k + 1. Furthermore, the vertex v is added to the modulator X, increasing the parameter value by one. Theorem 6 now creates an intermediate instance of size C(|X|6) which reduces back to long Path by a Karp reduction. (Note that we may also remove the universal vertex to get an instance of long Path with labeled vertices. Of course this is a compression, or generalized kernelization.) It is straightforward to get similar reductions for hamiltonian Cycle and hamiltonian Path by asking for long cycles or path respectively of length (at least) equal to the total number of vertices. However, it can be easily seen that following the proof of Theorem 6 for an obtained instance of long Cycle, one may replace each labeled vertex by a single unlabeled one and update the target length (such that it always matches the number of vertices). Indeed, any cycle using all vertices can be modified to contain all unmarked vertices as a single subpath. The number of those vertices (i.e. the single label in each clique) is then immaterial. Thus one obtains graphs with C(|X|6) vertices. It is straightforward to make the modifications to get back to instances of Hamiltonian Cycle and Hamiltonian Path respectively. □ We now turn our attention to disjoint paths and disjoint Cycles. In both problems the length of paths or cycles is not important, but there maybe large numbers of paths and cycles inside each clique. Observation 3. In the disjoint paths problem parameterized by a modulator X to a cluster graph we may assume that all requested pairs lie entirely in X: First, it is clear that request pairs (sj,tj) contained in the same clique of G — X, have the uniquely optimal path consisting only of Si and tj. Thus, all such vertex pairs may be deleted (both from the graph and the list of requests). To satisfy any other request pair the corresponding path needs to intersect X, bounding the maximum feasible number of requests by \X\; else we reject the instance. Thus, including the vertices of all requests at most triples the size of X. From this observation it is clear that a polynomial kernelization for disjoint paths can be achieved by marking (or matching) enough vertices to allow for the necessary paths. The case for Disjoint Cycles is similar: There are at most \X\ cycles which include vertices of X. All further cycles can be chosen as triangles inside single cliques. Again a marking procedure suffices, adding an additional step of removing triples of unmarked vertices from any single clique and each time reducing the target number of cycles by one (accounting for those trivial cycles not intersecting X). Corollary 4. disjoint paths and disjoint Cycles parameterized by a modulator to a cluster-graph admit polynomial kernels. 14 5 Lower bounds for path and cycle problems In this section we present kernelization lower bounds for the directed- and undirected variants of hamiltonian Cycle with structural parameters. The parameterizations we use are at least as large as the treewidth of the input graphs (or the underlying undirected graph, in the directed case) which shows that the parameterized problems for which we prove a kernel lower bound are indeed contained in FPT. We first develop a lower bound for Directed Hamiltonian Cycle parameterized by a modulator to Bl-paths, and the show how to alter this construction to give lower bound for Hamiltonian Cycle parameterized by a modulator to Outerplanar graphs. Our proofs use the technique of cross-composition [6], in which a kernel lower bound is obtained by showing that the logical OR of a series of instances of an NP-hard problem, can be embedded in a single instance of the parameterized target problem at a small parameter cost. We therefore start each subsection by defining an appropriate NP-hard problem to compose, and then give a cross-composition algorithm. 5.1 Directed Hamiltonian Cycle with a modulator to bi-paths We start by defining the NP-hard problem which we will use in the cross-composition. Hamiltonian s — t Path in Directed Bipartite Graphs Input: A bipartite digraph D with color classes A = {a±,..., anA} and B = {b±,..., bng} with ub = tia + 1 such that N^fti) = 0 and N^(bnB) = 0. Question: Does D contain a directed Hamiltonian path which starts in b\ and ends in bnB ? It is not difficult to show that this problem is NP-complete. Proposition 1. Hamiltonian s — t Path in Directed Bipartite Graphs is NP-complete. Proof. Membership in NP is trivial. We prove hardness by a reduction from hamiltonian s — t Path which is a classical NP-complete problem [22, GT39]. On input an undirected graph G with distinguished vertices s,t construct a graph G' on vertex set V{G) x {1,2,3,4} with edges {{vi,v2}, {v2,v3}, {v3,v4} | v g V(G)} u {{«i,n4}, {v4,ui} | {u,v} g E(G)}. It is easy to verify that G has a Hamiltonian s — t path if and only if G' has a Hamiltonian s± — £4 path. The graph G' is bipartite. To obtain an instance of hamiltonian s — t Path in Directed Bipartite Graphs we build a digraph D' by replacing the edges in G' with bidirectional arcs, adding a new starting vertex s* with unique out-neighbor s±, adding a vertex w with in-neighbor £4 and a new ending vertex t* with in-neighbor w. Taking the appropriate partite sets of the bipartition, labeling s* as b\ and t* as bng we obtain an equivalent instance of hamiltonian s — t Path in Directed Bipartite Graphs. □ Now we formally define the parameterized problem for which we will prove a kernel lower bound. Directed Hamiltonian Cycle parameterized by a modulator to Bi-paths Input: A digraph D and a modulator X c V(D) such that D — X g Bl-paths. Parameter: The size \X\ of the modulator. Question: Does D have a directed Hamiltonian cycle? 15 The following propositions will be useful for verifying the correctness of the composition. Proposition 2. Let D be a digraph with directed Hamiltonian cycle C. Ifv G V{D) is a vertex such that Nd{v) = {u,w} then C contains either the path (u,v), (v,w) or (w,v), (v,u). Proposition 3. Let D be a bipartite digraph with color classes A = {a±,... ,anA} and B = {b±,..., bng} with ub = ua + 1. If C c A{D) is a set of arcs such that: 1. D[C] does not contain a directed cycle, 2. all vertices a G A are the head of one arc in C and the tail of one arc in C, 3. no vertex b G B is the head of two arcs in C, or the tail of two arcs in C, 4- the vertex b\ is the head of one arc in C and the tail of no arc, 5. the vertex bng is the tail of one arc in C and the head of no arc, then C is a Hamiltonian path from b\ to bng in D. We can now give the cross-composition. Theorem 7. Directed Hamiltonian Cycle parameterized by a modulator to Bi-paths does not admit a polynomial kernel unless NP c coNP/poly. Proof. By Theorem 1 and Proposition 1 it is sufficient to show that hamiltonian s — t Path in Directed Bipartite Graphs cross-composes into Directed Hamiltonian Cycle parameterized by a modulator to Bl-paths. We first give the polynomial equivalence relationship TZ to be used for the cross-composition. Fix some reasonable encoding of instances of hamiltonian s — t Path in Directed Bipartite Graphs into an alphabet £ such that well-formed instances can be recognized in polynomial time. Now say that two strings in S* are equivalent under TZ if (a) they are both malformed instances, or (b) they encode instances {D\, Ai,B\) and {D2, A2, B2) of Hamiltonian s — t Path in Directed Bipartite Graphs such that \Ai\ = \A2\ and \Bi\ = \B2\. It is not hard to verify that a set of strings which encode instances of up to n vertices each, is partitioned into 0{n) equivalence classes by TZ, which is therefore a polynomial equivalence relationship. It now suffices to give an algorithm which composes a sequence of instances of hamiltonian s — t Path in Directed Bipartite Graphs which are equivalent under TZ into one instance of Directed Hamiltonian Cycle parameterized by a modulator to Bi-paths. If the input consists of malformed instances then we can simply output a constant-size no-instance. Hence in the remainder we may assume that the input contains r well-formed instances {D\, Ai,B\),..., {Dr, Ar,Br), and that \Ai\ = tia and \Bi\ = ub for i G [r] with ub = ua + 1- Label the vertices in each set Ai as a^i,..., a,i)riA and the vertices of a set Bi as 6^1,..., 6j,nB f°r i G [r]. Recall that instance i asks whether Di has a Hamiltonian path from 6^1 to 6j,nB- We construct a digraph D* as follows. 1. For i G [r], for j G [ua] add vertices •, a" •, a"'j to D*, and add arcs (aij' aij)' a«,j)' (a«,j' a«,j)' (a«,j' a«,j)" 2. As the next step we add one-directional arcs to connect adjacent triples. For i G [r], for j G [riA - 1] add the arc (af'-, 16 (a) Input instance (D\, A\, B\) of Hamiltonian s—t Path in Directed Bipartite Graphs. (b) Output instance of Directed Hamiltonian Cycle parameterized by a modulator to Bl-paths. Figure 1: An example of the lower-bound construction of Theorem 7 when composing r = 2 inputs with riA = 4 and = 5. (a) The first input instance, (b) Resulting output instance. The arcs between {&*,..., 65} and {a'2 j, a1^ j, a1^j \ j 6 [4]} which encode the second input (D2, A2, B2) have been omitted for readability. 3. For each instance i 6 [r] add two special vertices Xj and j/j, together with arcs (x^a^) and (a'"nA,yi). For i G [r - 1] add the arcs (yi,xi+1). 4. Observe that at this stage, D* £ Bl-paths. All vertices we add from this point on will go into the modulator X* such that D* — X* will be a member of Bl-paths. 5. We add a special vertex z with arcs (xi,z) and (z,yi) for i £ [r]. 6. For j 6 [n^] add a vertex b* to the graph Z)*, and let B* be the set of these vertices. Add arcs (yr,6*) and 7. As the last step of the construction we re-encode the behavior of the input graphs Di into the instance. For i £ [r], for all arcs (ajj,6j,/i) in A{Di) add the arc {a^j^b^) to D*. For all arcs {bij^ai^) 6 A(Z)j) add (b*,a'''h) to D*. This concludes the description of D*, which is illustrated in Fig. 1. Now define X* := {z} U -B*. The output of the cross-composition is the instance (D*,X*) of Directed Hamiltonian Cycle parameterized by a modulator to Bi-paths. It is easy to verify that D* — X* £ Bl-paths, and that the construction can be carried out in polynomial time. The parameter \X*\ is bounded by 1 + ub which is sufficiently small. It remains to prove that D* 17 is YES if and only if one of the input instances is YES. Before proving this equivalence we establish some properties of D*. Claim 1. Let C C A{D*) be a directed Hamiltonian cycle in D*. The following must hold. 1. If (a'-'j, a'-j) is an arc on C for some i £ [r],j 6 [na\ then there are distinct indices f, f such that vertices b^, a'l'j, a'l j, j, b^, appear consecutively on C. 2. If (x^a^) is not an arc on C for some i £ [r], then none of the arcs (a^-, for j £ [riA — 1] are contained in C, nor is the arc [a'l'n ,yl). Proof. We prove the different components consecutively. 1. Assume that (a'-'j,a'-j) is an arc on C. By construction of D* the in-neighbors of a'l'- which are not a'l ■ are of the form b*^ for / 6 [ns], and since a'l ■ is used as the successor of a'l'- this shows that the predecessor of af- must be some b*^. Since Nij*(a'lj) = {a^ ■, a'-'j} we find by Proposition 2 that the vertices a'l' -^a'l ■^a'i ■ must be consecutive on C. Similarly as before, the construction of D* shows that the only out-neighbors of a!i j different than a" • are of the form by for /' £ [n^]. Since a'-j is used as predecessor of a\ j on C, it cannot be the successor and hence some b*^, must be the successor of a!i j on C which shows that 6j, a'-'j, a'l j,a'rl j, b*^, are consecutive on C. 2. Assume that (xj, a!i 1) is not on C, but at least one of the arcs in the set Zj := {(a'-'j, a!i | j G [riA — 1]}U{{o.'l'nA,Hi)} is used on C. Let j* be smallest index such that the vertex af-, is the head of an arc in Cfl Zj. Then a',- -t is not the successor of a!"-* on C and by Proposition 2 it must therefore be its predecessor, showing that , a"-,, a'l'-, must be consecutive on C. If j* > 2 then the only in-neighbors of •» in D* are {a^-»_1, a" •»}, and if j* = 1 then the only in-neighbors are {xi,a"j*}. By choice of j* as the smallest index from Z{ which is the head of an arc in C n Zj, the first in-neighbor of •» cannot be its predecessor on C. But since a"is its successor on C, that vertex cannot be its predecessor either. So does not have a predecessor on C which contradicts the assumption that C is a Hamiltonian cycle, which concludes the proof of this part. □ We are now ready to prove that (D*,X*) is YES if and only if one of the input instances is YES. For the first direction, assume that D* has a directed Hamiltonian cycle C. Since Npt{z) = {xi \ i £. [r]} there is an index i* such that is the predecessor of z on C, which shows that the arc (xi*,a'it 1) is not used on C. By (2) this implies that there are no arcs (a"' •, a'it for j £ [riA — 1] contained in C, nor is the arc (a'" n ,yi*). By construction of D* this implies that each vertex a'l', ■ for j 6 [tia] has a" • as its successor on C (there is only one other option for the successor, which is explicitly excluded). Hence all the arcs (a"/ -,a"» •) are contained in C for j £ \ua\- By (1) this shows that for each j 6 [ng] there are b*^, b*^, such that b*^, a'" a" a'it b*^, are consecutive on C. Now consider the set of arcs Cj* in Di* defined as follows: • If b*f,a'll j,a'l, j,a'it j,b*f, are consecutive on C then add the arcs (&«*,/, CLi*,j) and (aj*j, ^«*,/') to Cj.. Recall that by construction of D*, the arc (6j, a"' j) is only present in D* when (&«*,/, a«* j and the arc [a'it j,b*j) is only present in D* when (ai*j,bi*j) G A(Di*), and hence all arcs added to Ci* by this definition are indeed present in Di*. We will show that the set Cj* C A(Di*) satisfies 18 all requirements of Proposition 3 for graph Di*, thereby showing that Di* has a Hamiltonian path from to bi*jTlB. If Ci* contains a directed cycle, then by construction of Ci* it follows that C must contain a closed cycle on a vertex subset of {b* \ j G [n^]} U {a'it •, a" •, a"' • | j G [^a]} showing that C is not a Hamiltonian cycle in D*; hence the set Ci* satisfies (1) of Proposition 3. The definition of Ci* directly shows that C satisfies (2). If some vertex bi*j is the head or tail of two arcs in Ci* then the corresponding vertex b* is head or tail of two arcs in the Hamiltonian cycle C, which is not possible; hence (3) is satisfied. Since the definition of Hamiltonian s — t Path in Directed Bipartite Graphs guarantees that bi*A does not have in-arcs in Di*, and that bi*)TlB has no out-arcs in Di*, the vertex b\ cannot occur as successor to a vertex a'it • and vertex 6* g cannot occur as predecessor of a vertex aHJ, ■. Therefore Ci* contains no arcs leading into bi*x and no arcs leading out of bi*jTlB, proving that the last condition is also satisfied. By the proposition this proves that Ci* is a Hamiltonian bi*A — bi*)TlB path in Di* which proves this direction of the equivalence. The other direction is straight-forward. Assume that Ci* is a Hamiltonian path from bi*A to bi*)TlB in Di*. We construct a Hamiltonian cycle C in D* as follows. • For i ^ i* add all arcs between consecutive vertices of Xi, 1, a"-^, a'"-^, a'i2, ■ ■ ■, ai'nA-i, ai nA, ai,nAiai'nAiyi to C. • Add all arcs (a'Jlj, a'(t j), (<* j, a-*,j) for j G [nA] to C. • Add all arcs (yi,Xi+i) for i G [r — 1] to C. • Add the arcs (b^ ,x±) and (yr,b*) to C. • For each arc (bi*j,cii*j) G Ci* add (b*^,a'-l j) to C. • For each arc (cii*j,bi*j) G Ci* add [a'it j,b*j) to C. Using the construction of D* is it straight-forward to verify that C is a Hamiltonian cycle in D*, which proves the reverse direction of the equivalence and concludes the proof. □ It is not difficult to see that the proof of Theorem 7 can be adapted to give a kernel lower bound for the variant where we are looking for a Hamiltonian path instead of a Hamiltonian cycle; these bounds in turn imply that the versions where we are looking for a long path or cycle (instead of one which is Hamiltonian) are at least as hard to kernelize, as is the case when we want to find a long s — t path or a long cycle through a given vertex. 5.2 Hamiltonian Cycle with a modulator to outerplanar graphs For the cross-composition of this section we will use the following variant of hamiltonian Path on undirected bipartite graphs. Hamiltonian s — t Path in Bipartite Graphs Input: A bipartite graph G with color classes A = {a±,..., anA} and B = {b±,..., bng} with nB = nA + 1 such that |iVG(&i)| = \NG(bnB)\ = 1. Question: Does G contain a Hamiltonian path which starts in b\ and ends in bng? NP-completeness of this problem follows from the construction of Proposition 1. 19 Ö Ö Ô (a) Domino. (b) a-traversal. Ô Ö Ö-Ö G-Ô-Ô (c) ä-traversal. Figure 2: (a) The domino gadget with four terminal vertices a~ ,a+ ,ä~ ,a+. If graph G contains the domino as a subgraph such that only these terminals are connected to the remainder of the graph, then any Hamiltonian cycle for G must use the a-traversal (b) or a-traversal (c) of the domino. Proposition 4. The Hamiltonian s - t Path in Bipartite Graphs problem is NP-complete. The problem for which we prove a lower bound is formally defined as follows. Hamiltonian Cycle parameterized by a modulator to Outerplanar graphs Input: A graph G and a modulator X c V(G) such that G — X e outerplanar. Parameter: The size \X\ of the modulator. Question: Does G have a Hamiltonian cycle? We can modify the construction of Theorem 7 to give a lower bound for hamiltonian Cycle parameterized by a modulator to outerplanar graphs. Our main tool is the "domino" gadget of Fig. 2, which a Hamiltonian cycle must visit in one of two specific ways. This domino will be used to simulate directed edges. Proposition 5. Let G be an undirected graph containing the domino as an induced subgraph, such that only the vertices labeled a+,a~,a+,a~ have neighbors outside the domino. Then any Hamiltonian cycle in G must either (a) contain an a-traversal of the domino, which is path between a+ and a~, visiting all other vertices of the domino in between or (b) contain an a-traversal of the domino, which is a path between a+ and a~, visiting all other vertices in between. Theorem 8. Hamiltonian Cycle parameterized by a modulator to Outerplanar graphs admits no polynomial kernel unless NP c coNP/poly. Proof. By Theorem 1 and Proposition 4 it is sufficient to show that hamiltonian s — t Path in Bipartite Graphs cross-composes into Hamiltonian Cycle parameterized by a modulator to Outerplanar graphs. By arguments similar to that in the proof of Theorem 7 we can define a suitable polynomial equivalence relationship 1Z such that it is sufficient to give an algorithm which, given r well-formed instances (G±, A±, B±),..., (Gr, Ar, Br) of hamiltonian s — t Path in Bipartite Graphs such that \Ai\ = ua, \Bi\ = hb with hb = ua + 1 for i e [r] outputs in polynomial time an instance (G*,X*) of Hamiltonian Cycle parameterized by a modulator to Outerplanar graphs which is yes if and only if one of the input instances is yes, and such that \X*\ is polynomial in the size of the largest input instance plus logr. The construction is similar to that of the graph D* in Theorem 7, with the difference that we are now using undirected graphs and that we use an outerplanar gadget to simulate directions of arcs. Assume Ai = {a^i,..., ajjnA} and Bi = ..., &«,nB} f°r i £ [r]. We build a graph G* as follows. 20 1. For i £ [r], for j G [n^] add a copy Ojj of the domino graph (Fig. 2) to G* and label its terminals by ai:j,ai:j,%j and a+.. 2. As the next step we add edges to connect adjacent dominos. For i £ [r], for j £ [n^ — 1] add the edge {a+.,ar.+1}. 3. For each instance i 6 [r] add three special vertices Xi,yi, and together with edges {wi, x{\, {xi,ar1},{a+nA,yi}. For i G [r - 1] add the edges {y^^+i}. 4. Observe that at this stage, G* G O uterplan ar. All vertices we add from this point on will go into the modulator X* such that G* — X* will be a member of outerplanar. 5. We add three special vertices z~, z, z+ with edges with {z~, and {z, z+}. Furthermore, we add edges {xi,z~} and {z+,yi} for i £ [r]. 6. For j G [riß] add a vertex 6* to the graph G*, and let i3* be the set of these vertices. Add edges {yr,b\} and {b*lB,wi}. 7. As the last step of the construction we re-encode the behavior of the input graphs Di into the instance. For i £ [r], for each edge {a,ij,bi:f} in E(Gi) add the edges {a7j'^/} an<^ {^j'^/} to G*. This concludes the description of G*. Now define X* := {z~, z, z+} u -B*. The output of the cross-composition is the instance (G*,X*) of Hamiltonian Cycle parameterized by a modulator to Outerplanar graphs. It is easy to verify that G* — X* £ outerplanar, and that the construction can be carried out in polynomial time. The parameter \X*\ is bounded by 1 + riß which is sufficiently small. It remains to prove that G* is yes if and only if one of the input instances is yes. We first prove some relevant properties of G*. Claim 2. Let C C E{G*) be a Hamiltonian cycle in G*. The following must hold. 1. If C uses an a-traversal of some domino Oij for i 6 [r] and j £ [n^] then there are distinct indices f,f such that bf, the vertices of the domino, and bp appear consecutively on C. 2. If {xjjö^} is not an edge on C for some i 6 [r], then none of the edges {atj>a7j+i\ for 3 £ \tla — 1] are contained in C, nor is the edge {afnA,yi}- 3. There is an index i* £ [r] such that {x^a^} is not used on C. Proof. We prove the different components consecutively. 1. Suppose C uses an a-traversal of some domino Oij. By construction of G* we know that the only neighbors of a^- and ■ which are not contained in the domino are of the form 6j for some / 6 [riß]- Since all vertices of the domino are used on the cycle C in an a-traversal (see Fig. 2) and the vertices a^- and ■ each have only one neighbor on C within the domino, each of these vertices must have a neighbor outside the domino on C and hence these must be of the form b*^ and b*^. We must have f ^ f otherwise we would have a closed cycle containing the domino Oij and a single vertex 6^, and this cycle would not be Hamiltonian since it would not visit the vertex z. 21 2. Assume that {xj, a^} is not on C, but at least one of the edges in the set Zi := {{afj, a^j'+i) I j £ [riA — 1]} U {{a^nA,yi}} is used on C. Let j* be smallest index such that the vertex af-t is the endpoint of an edge in C n Zi. Since a^~-, is endpoint of an edge in C and the other endpoint of this edge does not lie in the domino Oij*, it follows that C contains at most one edge in the domino Ojj* which is incident on af-,. This rules out the possibility that C makes an a-traversal of 0« j*, since that requires two edges within the domino incident on afjt. Because the construction of G* together with Proposition 5 ensures there are only two possible traversals of the domino, we know that C must use an a-traversal of Oij*. This traversal uses exactly one edge of the domino incident on af-,. Since a Hamiltonian cycle must contain exactly two edges incident on every vertex, this shows that C must contain some edge incident on af-, which is not in the domino Oij*. But by construction of G* there is only one such edge: if j* = 1 then this edge is {x^a^} and otherwise this edge is {a^"-,_1, a^~-,}. But by the assumption at the start of the proof and our choice of j*, this edge is not contained in C. Hence there is only one edge incident on af-t contained in C, contradicting the Hamiltonicity of C. 3. The Hamiltonian cycle C must contain two edges incident on every vertex. By construction of G* the vertex z~ is incident on the edge {z~, z} and on edges {xj, z~} for i £ [r]. Hence if C contains two edges incident on z~ then at least one is of the form {xi*,z~} for i* £ [r]. Now consider the vertex Wi*, which has degree two by construction. Therefore both its incident edges must be contained in C, and in particular {wi*,Xi*} is contained in C. So C contains two edges incident on Xj* which are unequal to the edge {xj*, af, 1} and since only two edges incident to each vertex are contained in C, it follows that C does not contain {xj», af, 1}. □ Armed with these claims we prove that G* has a Hamiltonian cycle if and only if one of the input instances Gi* has a Hamiltonian path starting in b\ and ending in bng. First assume G* has a Hamiltonian cycle C C E{G*). By (3) there is an index i* 6 [r] such that {xj, af, 1} 0 C. By (2) this implies there are no edges {af •, for j £ [ua — 1] contained in C, nor is the edge {af nA,yi}- This shows that for each vertex aft • the only edges incident on aft • which can be contained in C, are those of the domino Oj*j which implies by Proposition 5 that C must do an a-traversal of all dominos Oj* j for j 6 \tia\- By (1) each traversal of a domino is preceded and succeeded by distinct vertices b*^,b*^, and by construction of G* this can only occur if {aj*j,6j*j} and {aj*j,&j*j/} are edges of Gi*. Since vertices bi*^ and bi*)TlB have degree one in Gj* by definition of hamiltonian s — t Path in Bipartite Graphs, and since ub = ua + 1, it now follows by reasoning similar to that of Theorem 7 that the set Cj* := {{aj»j, b,t* j} \ b*^ precedes or succeeds domino Oj*j on C} is a Hamiltonian path in Gi* between vertices ftj*^ and bi*jTlB. For the reverse direction, assume that Gi* has a Hamiltonian path Gi* starting in bi*^ and ending in bi*jTlB. Construct a Hamiltonian cycle in G* as follows. • For i ^ i* add the edges {wi,Xi}, {xj, a^}, {a+nA, and {a+., aQ+1} for j G [n^ - 1] to C. • For i ^ i* add the edges on the a-traversals of all dominos Oij with j £ [n^] to C. • Add the edges {i«j*,Xj*}, {xi*,z~}, {z~,z}, {z,z+}, and to C. Also add all edges on a-traversals of the dominos Oi*j for j £ [n^] to C. • Add all edges {j/j,Xj+i} for i 6 [r — 1] to C. 22 • Add the edges {&nB,^i} and {yr,b*} to C. • For j g [ua], if the edges of Gi* incident on a^j in the Hamiltonian path Cj* are {aj* j, 6«*,/} and {cii*:j,bi*jt} then add the edges {a^^,6j} and {a^, j,b*f,} to C. (The ordering of b*^ and 6^, is immaterial since the domino can be traversed in either direction.) Using the construction of G* is it straight-forward to verify that C is a Hamiltonian cycle in G*, which proves the reverse direction of the equivalence and concludes the proof. □ 6 Finding paths with respect to forbidden pairs In this section we study multiple parameterizations of several variants of path problems involving forbidden pairs. The first version we consider is defined as follows. s — t Path with Forbidden Pairs Parameterized by a Vertex Cover of G Input: A graph G, distinct vertices s,t £ V(G), a set H c (V^) of forbidden pairs, and a vertex cover X of G. Parameter: £ := \X\. Question: Is there an s — t path in G which contains at most one vertex of each pair {u, v} 6 HI We can give evidence that this problem is not fixed-parameter tractable. Theorem 9. s — t Path with Forbidden Pairs Parameterized by a Vertex Cover of G is hard for W[l]. Proof. We give a parameterized reduction from the W[l]-hard ^-multicolored Clique problem [19]. Let (G, k, 4>) be an instance of ^-multicolored Clique, where 4> : V(G) —> {1,... , k} assigns each vertex of G a color from 1 to A: and the task is to decide whether G contains a clique with one vertex of each of the k colors. We construct a graph G' as follows. We first make an independent set on the vertices V(G), and add a forbidden pair for each pair of vertices from V{G) that are either not adjacent in G or which have the same color according to 0. Then we add k + 1 vertices vq,vi, ... ,Vk which will form the vertex cover. We connect vertex vq to all vertices of the independent set that have color 1 according to 0. For all i £ {1,..., k — 1}, we connect Vi to all vertices that have colors i or i +1. Finally, we connect to all vertices of color k. We set s := vq and t := v^, and return the instance (G , vq,Vk,X' := {vq, ... ,«&}). We note that this can be easily performed in polynomial time, and that the parameter value, i.e., k + 1, is bounded by a function of k. It is easy to see that every s — t path has 2k + 1 vertices, and uses all vertices vq, ... ,Vk as well as one vertex of each color. Note that the colors are not part of the produced instance, but the forbidden pairs are used to encode this property. The latter vertices can also not be non-adjacent in G, and hence they correspond to a multicolored clique of size k. Similarly, given such a multicolored clique it is straightforward to find an s — t path of length 2k + 1 in G' that respects the forbidden pairs. Thus W[l]-hardness of s — t Path with Forbidden Pairs Parameterized by a Vertex Cover of G follows. □ 23 Let us now consider some variations of the problem. The problems Shortest s — t Path With Forbidden Pairs and Longest s — t Path With Forbidden Pairs are defined similarly as s — t Path with Forbidden Pairs, with the difference that there is an extra integer k in the input and we are asking for an s — t path containing at most or at least k vertices. In longest Path With Forbidden Pairs we omit the inputs s and t, and are looking for any sufficiently long path, regardless of its endpoints. The related problem Shortest Path With Forbidden Pairs is not interesting, since its solution always consist of a path containing a single vertex. It can be easily verified that asking for a path on at least 2k + 1 vertices, regardless of its endpoints, in the construction for the proof of Theorem 9 leads also to a path whose vertices from the independent set made of V{G) correspond to a multicolored clique in G. Also, since asking for a long or short s — t path is as least as hard as asking for the existence of an s — t path, we obtain the following results as a corollary. Corollary 5. The problems Shortest s — t Path With Forbidden Pairs, Longest s — t Path With Forbidden Pairs and Longest Path With Forbidden Pairs, are hard for W[l] when parameterized by a vertex cover of G. Clearly, the hardness of the path problems with forbidden pairs stems from the extra structure of the forbidden pairs H, which is not taken into account when considering structural parameters of G. In the following we consider the effect of parameterizing by the structure of the graph GU H (i.e., G with an added edge for every forbidden pair). Using the optimization version of Courcelle's Theorem applied to structures of bounded tree-width [2, 4, 8, 12, 13] (instead of graphs, as is more common) it is not difficult to obtain an FPT result parameterized by the treewidth of GUH. If we take a structure on the base set V(G)L)E(G)U H which encodes an instance through unary predicates which say whether an object is a vertex of G, edge of G, or forbidden pair in H, and through binary predicates which give the incidence between vertices and edges or pairs, then the treewidth of the structure equals the treewidth of GU H. For such a representation it is well-known how to devise an MSOL formula which asks whether there exists a set of edges which forms a path between s and t, and such that no two vertices incident on such an edge form a forbidden pair. Using standard extensions of MSOL we may also maximize or minimize the size of a set of edges which forms an s — t path respecting forbidden pairs, obtaining the following result. Proposition 6. The problems Shortest s — t Path With Forbidden Pairs, Longest s — t Path With Forbidden Pairs, and Longest Path With Forbidden Pairs, are fixed-parameter tractable parameterized by the treewidth ofGUH. For the case of Shortest s — t Path With Forbidden Pairs the structure of G is actually not so important for the complexity of the problem: it is sufficient to parameterize by a vertex cover of the graph on the edge set H to obtain fixed-parameter tractability, as demonstrated by the following theorem. Theorem 10. Shortest s — t Path With Forbidden Pairs Parameterized by a Vertex Cover of H is fixed-parameter tractable. Proof. Given a graph G with endpoints s, t and forbidden pairs H such that X is a vertex cover of H, we describe how to find the shortest s — t path which avoids forbidden pairs. For all subsets I'CI which do not contain a forbidden pair, we compute the shortest s — t path which intersects X only 24 in X' as follows. Let Y be the vertices which form a forbidden pair with a member of X': then the desired path is a shortest s — t path in the graph G — {X \X') — Y. Since a shortest path in this graph can be found in linear time using breadth-first search, we solve the problem in C*(2'x') time by returning the shortest s — t path found over all choices of X' C X. □ It is easy to see that the positive news of Theorem 10, membership in FPT parameterized by a vertex cover of H, cannot be extended to Longest s — t Path With Forbidden Pairs since the latter problem is already NP-complete when there are no forbidden pairs. We mention without proof that s — t Path with Forbidden Pairs is NP-complete when the graph induced by H is a matching, showing that we cannot improve the parameterization by a vertex cover of H to the treewidth of H. As the final topic of this section we will consider the kernelization complexity of forbidden path problems, obtaining a super-polynomial lower bound on the kernel size when parameterizing by a vertex cover of G U H. Theorem 11. s — t Path with Forbidden Pairs Parameterized by a Vertex Cover of GUff admits no polynomial kernel unless NP C coNP/poly. Proof. We consider the classical problem s — t Path with Forbidden Pairs where we simply drop the set X from the problem description. We show that s — t Path with Forbidden Pairs cross-composes into s — t Path with Forbidden Pairs Parameterized by a Vertex Cover of G U H, which suffices to prove the claim by Theorem 1 since the construction of Theorem 9 shows that s — t Path with Forbidden Pairs is NP-complete. Let 1Z be a polynomial equivalence relation under which two well-formed instances (G±, s±, t±, Hi) and (G2, S2,t2, H2) are equivalent if |V(Gri)| = jV^G^)!- We show how to compose a sequence of inputs (G±, si,ti, Hi),..., (Gr, sr,tr, Hr) on n vertices each. Assume the vertices of V{Gi) are labeled vi,..., vn such that v\ = Sj and vn = tj. We build a graph G* with a vertex cover X*, and a set of forbidden pairs H* C (V{fy) such that all forbidden pairs have at least one member in X*. 1. For j £ [n] add a vertex v* to G*. 2. Add a special starting vertex w to G*. 3. For i 6 [r] add a vertex Z{ to G* and make it adjacent to w and v\. 4. For 1 < j < h < n, add a ladder gadget Lj^ with n spokes to G*: • Add vertices t* h and f%- h to G* for i £ [n]. . Add the set of edges {{t)h, t^1}, {t)h, f^1}, {f^, f^1}, {f^, t^1} | i G [„ - 1]} to form the inside of the ladder. • Make v* adjacent to t^h and fjh, and make adjacent to ir-h and fr^h. 5. Repeat the following for each i £ [r]: • For 1 < j < h < n, if {vj,Vh} 0 E{Gi) then add | x 6 to the set of forbidden pairs. Here Lj^ denotes the set of 2n vertices on the ladder for {j, h}. • For 1 < j < h < n, if {vj,Vh} £ E{Gi) then do as follows for all q 6 [n]: - If {uj,^} £ ffj or G Hi then add {zi, fjh} to 25 Figure 3: An example of the lower-bound construction of Theorem 11, cross-composing three instances with n = 4 into one. For clarity, only the vertices of the ladder are drawn; the other ladders are visualized by dotted paths. The forbidden pairs are drawn as zigzagged lines. Forbidden pairs with one element in {^1,^2} are n°t drawn. For this example, the third input (G3, S3, £3, H$) has a forbidden pair {^1,^3} £ H3 causing the forbidden pair {z3,ff2} £ H*. The vertices below the horizontal dashed line form the vertex cover X*. - Else, if {vj,vq} Hi and {vh,vq} Ht, then add {zi,tqjh}. • For 1 < j < h < n, for q G [n], add {tq h,v*} to H*. 6. This concludes the description of G* and H*, which is illustrated in Fig. 3. Define X* : = V(G*) \{zt\ie [r]}. It is easy to see that this construction can be carried out in polynomial time. The set X* is a vertex cover for G* U H* since all edges and forbidden pairs of G* have at least one element in X*. The size of X* is 1 + n + 2n(^), which is polynomial in n and therefore the parameter £ := \X*\ of the constructed instance is suitably small. We set s* := w and t* := u*. Let us establish some properties of the constructed instance. Claim 3. Consider an s* — t* path P in G* which respects the forbidden pairs H*, and orient the path such that it starts at s*. The following must hold. 1. There is exactly one index i* £ [r] such that Zi* 6 P, and the path P has the form {s* = w,Zi*,vl,...,v* =£*). 2. If P contains v* and vjt then {vj,Vh} 0 Hi*, where i* is as defined above. 3. If v* and vjt are vertices on P, and no vertices of the form v*^ are visited by P between visiting v* and vjt, then {vj,Vh} £ E{Gi*), where i* is as defined above. Proof. (1) Since all vertices Z{ for i £ [r] have the same neighborhood consisting of w and v*, if a path contains at least two such vertices then at least one of them is an endpoint of the path, which is not possible since P is an s* — t* path. Any s* — t* path must use at least one vertex Z{* since all neighbors of s* = w have this form. (2) Suppose P contains v* and v^, and assume for a contradiction that {vj,Vh} 6 Hi*. Assume without loss of generality that j < h and therefore v* 7^ u*. Since v* is not an endpoint of P, 26 vertex v* must have two neighbors on P. By construction of G* it easily follows that at least one of these neighbors must lie on a ladder. Assume that v* has a neighbor on a ladder LjjP; the case that the neighbor lies on a ladder Lpj is symmetric. The forbidden pairs involving Zi* and the vertices of the ladder Lj)P ensure that for each spoke of the ladder on vertices {t'jpifjp} there is a forbidden pair containing Zi* and exactly one vertex of the pair. Since P contains Zi* it must avoid exactly one vertex of each spoke of the ladder, which implies by construction of G* that all vertices on the ladder have only one valid option along which to continue the path, implying that P must traverse the entire ladder to the other endpoint v*. Since {vj,Vh} £ Hi* the definition of G* shows that {z,i*, fj'p} £ H*, and therefore path P must visit the other vertex tjp of the spoke when traversing the ladder. But by the last step of the construction we know that {tjp,v^} £ H* is a forbidden pair of which P contains both vertices; a contradiction. (3) Suppose that P successively visits the u*-vertices v* and vjt with j < h. By construction of G* it is easy to see that P must traverse the vertices of the ladder Lj^. Since P contains z^, and forms a forbidden pair with every vertex on the ladder Lj^ if {vj,Vh} £" E(Gi*), it follows that if P can traverse the ladder Lj^ without hitting a forbidden pair then the corresponding edge must be contained in Gi*. □ With these structural properties of the constructed instance we can prove the correctness of the cross-composition. Assume that G* is a yes-instance, and let P be an s* — t* path in G*. By (1) path P has the form (w, Zi*,u*,..., u*). Orient P from s* to £*, and consider the vertices (v* ,... ,v* ) in the set {v* \ i £ [n]} which are successively visited on this path. By (2) it follows that there are no indices ij,if such that {v^,Vif} £ Hi*, and by (3) the set E{Gi*) contains all edges {vij,Vij+1} for j £ [m — 1]. Hence the vertices (v^,... ,Vim) form a path in Gi* which does not contain a forbidden pair. By the form of P described in (1) it follows that v* = v*, and since P ends with it follows that v* = u*. Hence this suggested path is a v\ — vn path in Gi* which avoids forbidden pairs, which proves that instance number i* is yes. For the reverse direction, suppose that Gi* contains a path (v^,..., Vim) with i\ = 1 and im = n which does not contain a forbidden pair. Now construct a path in G* which starts at s* = w and successively visits Zi* and v\. It then traverses the ladder to v* , and by construction of G* there is exactly one vertex on each spoke of the ladder which is not in a forbidden pair with Z{*\ the path uses these vertices. We can continue this, alternatingly traversing a vertex v* and a ladder, until reaching v*m = t*. It is not difficult to verify that the constructed path does not contain any forbidden pairs of H*, which concludes the proof. □ It is easy to modify the construction of Theorem 11 to prove a kernel lower bound for longest Path With Forbidden Pairs Parameterized by a Vertex Cover of GuH. The definition of the latter problem does not allow us to specify the endpoints s and t of the path, but by creating two suitably long paths and connecting these only to s and t we can ensure that a sufficiently long path in the resulting graph is actually an s — t path. Also, similarly as before the hardness results for the existence problem of s — t paths immediately carry over to long and short s — t paths. We obtain the following results as a corollary. Corollary 6. The problems Shortest s — t Path With Forbidden Pairs, Longest s — t Path With Forbidden Pairs, and Longest Path With Forbidden Pairs, do not admit a polynomial kernel parameterized by a vertex cover of GU H unless NP C coNP/poly. Table 1 contains a summary of the results of this section. 27 vc(G) vc(H) TW(ff) tw(G U H) vc(G U H) s - t Path F.P. W[l]-hard FPT Para-NP-c FPT No poly Shortest s — t Path F.P. W[l]-hard FPT Para-NP-c FPT No poly Longest s - t Path F.P. W[l]-hard Para-NP-c Para-NP-c FPT No poly Longest Path F.P. W[l]-hard Para-NP-c Para-NP-c FPT No poly Table 1: Complexity overview of path problems with forbidden pairs. Each column represents a different parameterization; vc(G) denotes the minimum vertex cover size of G, and tw(G) denotes its treewidth. F.P. abbreviates "with Forbidden Pairs". The classification "No poly" means "no polynomial kernel unless NP c coNP/poly", and "Para-NP-c" means "NP-complete for a constant value of the parameter". For a parameterization in FPT, we either list "FPT" or "No poly", depending on which of the two is more relevant: all problems listed as "No poly" are in FPT, and no problem listed as "FPT" admits a polynomial kernel. shortest Path F.P. is trivially in P. 7 Conclusion In this work we have shown that for sufficiently strong structural parameterizations, many path and cycle problems admit polynomial kernels even though their natural parameterizations do not. The marking technique using bipartite matching yields quadratic-vertex kernels for many problems parameterized by the size of a vertex cover. We introduced a binary encoding trick which gives polynomial kernels for problems parameterized by the max leaf number or by a modulator to cographs. On the negative side, we also exhibited smaller structural parameters which provably do not lead to polynomial kernels for Hamiltonian Cycle unless NP c coNP/poly. Let us reflect briefly on the parameters used for the upper- and lower bounds. Recall that the vertex cover number of a graph can also be interpreted as the number of vertex-deletions needed to reduce the graph to an independent set, i.e. the vertex-deletion distance to a graph of treewidth 0. Hence Theorem 3 shows that long Cycle admits a polynomial kernel parameterized by vertex-deletion distance to treewidth 0. On the other hand, Theorem 8 shows that if NP ^ coNP/poly then hamiltonian Cycle does not have a polynomial kernel parameterized by the deletion distance to treewidth two (since outerplanar graphs have treewidth at most two), and of course this carries over to the harder problem long Cycle. It is interesting to settle what happens for treewidth one, i.e., forests: does hamiltonian Cycle parameterized by a feedback vertex set admit a polynomial kernel? To generalize the result of Theorem 6 by distance to a cluster graph, one could consider the distance to cographs. The kernelization complexity of compound parameterizations remains largely unexplored: for example, how does the long Cycle problem behave when parameterized by the solution size plus the vertex-deletion distance to an outerplanar graph? It follows from the work of Bodlaender, Thomasse, and Yeo [7] that Disjoint Paths and Disjoint Cycles do not admit polynomial kernels parameterized by the target value k plus the deletion distance to a path. We hope that a search for polynomial kernels of structural parameterizations leads to reduction rules which are useful in practice. References [1] N. Alon, R. Yuster, and U. Zwick. Color-coding. Journal of the ACM, 42(4):844-856, 1995. 28 [2] S. Arnborg, J. Lagergren, and D. Seese. Easy problems for tree-decomposable graphs. Journal of Algorithms, 12(2):308-340, 1991. [3] A. Björklund, T. Husfeldt, P. Kaski, and M. Koivisto. Narrow sieves for parameterized paths and packings. CoRR, abs/1007.1161, 2010. [4] H. L. Bodlaender. A linear-time algorithm for finding tree-decompositions of small treewidth. SI AM Journal on Computing , 25(6):1305-1317, 1996. [5] H. L. Bodlaender, R. G. Downey, M. R. Fellows, and D. Hermelin. On problems without polynomial kernels. Journal of Computer and System Sciences, 75(8):423-434, 2009. [6] H. L. Bodlaender, B. M. P. Jansen, and S. Kratsch. Cross-composition: A new technique for kernelization lower bounds. In Proc. 28th STAGS, pages 165-176, 2011. [7] H. L. Bodlaender, S. Thomasse, and A. Yeo. Kernel bounds for disjoint cycles and disjoint paths. Theoretical Computer Science, 412(35):4570-4578, 2011. [8] R. B. Borie, R. G. Parker, and C. A. Tovey. Automatic generation of linear-time algorithms from predicate calculus descriptions of problems on recursively constructed graph families. Algorithmica, 7(5&6):555-581, 1992. [9] J. Chen, S. Lu, S.-H. Sze, and F. Zhang. Improved algorithms for path, matching, and packing problems. In Proc. 18th SODA, pages 298-307, 2007. [10] Y. Chen and J. Flum. On parameterized path and chordless path problems. In Proc. 22nd CCC, pages 250-263, 2007. [11] Y. Chen, J. Flum, and M. Müller. Lower bounds for kernelizations and other preprocessing procedures. Theory of Computing Systems, 48(4):803-839, 2011. [12] B. Courcelle. The monadic second-order logic of graphs I: Recognizable sets of finite graphs. Information and Computation, 85(1):12—75, 1990. [13] B. Courcelle and M. Mosbah. Monadic second-order evaluations on tree-decomposable graphs. Theoretical Computer Science, 109(l&2):49-82, 1993. [14] M. Cygan, M. Pilipczuk, M. Pilipczuk, and J. O. Wojtaszczyk. Kernelization hardness of connectivity problems in 2-degenerate graphs. In Proc. 36th WG, pages 147-158, 2010. [15] H. Dell and D. Marx. Kernelization of packing problems. In Proc. of 23rd SODA, 2012. To appear. [16] H. Dell and D. van Melkebeek. Satisfiability allows no nontrivial sparsification unless the polynomial-time hierarchy collapses. In Proc. 42nd STOC, pages 251-260, 2010. [17] M. Dom, D. Lokshtanov, and S. Saurabh. Incompressibility through colors and IDs. In Proc. 36th ICALP, pages 378-389, 2009. [18] R. Downey and M. R. Fellows. Parameterized Complexity. Monographs in Computer Science. Springer, New York, 1999. 29 [19] M. R. Fellows, D. Hermelin, F. A. Rosamond, and S. Vialette. On the parameterized complexity of multiple-interval graph problems. Theoretical Computer Science, 410(1) :53—61, 2009. [20] M. R. Fellows, D. Lokshtanov, N. Misra, M. Mnich, F. A. Rosamond, and S. Saurabh. The complexity ecology of parameters: An illustration using bounded max leaf number. Theory of Computing Systems, 45(4):822-848, 2009. [21] L. Fortnow and R. Santhanam. Infeasibility of instance compression and succinct PCPs for NP. Journal of Computer and System Sciences, 77(1):91—106, 2011. [22] M. R. Garey and D. S. Johnson. Computers and Intractability, A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, New York, 1979. [23] J. Guo and R. Niedermeier. Invitation to data reduction and problem kernelization. SIGACT News, 38(l):31-45, 2007. [24] M. Held and R. M. Karp. A dynamic programming approach to sequencing problems. Journal of the Society for Industrial and Applied Mathematics, 10(1):196-210, 1962. [25] D. Hermelin and X. Wu. Weak compositions and their applications to polynomial lower-bounds for kernelization. Electronic Colloquium on Computational Complexity (ECCC), 18:72, 2011. To appear in Proc. of SODA 2012. [26] D. J. Kleitman and D. B. West. Spanning trees with many leaves. SI AM Journal on Discrete Mathematics, 4(1):99-106, 1991. [27] J. Kneis, D. Molle, S. Richter, and P. Rossmanith. Divide-and-color. In Proc. 32nd WG, pages 58-67, 2006. [28] N. Robertson and P. D. Seymour. Graph minors. XIII. The disjoint paths problem. Journal of Combinatorial Theory, Series B, 63(1):65-110, 1995. [29] J. Scott, T. Ideker, R. M. Karp, and R. Sharan. Efficient algorithms for detecting signaling pathways in protein interaction networks. Journal of Computational Biology, 13(2) :133—144, 2006. 30