Bidirectional Search in a String with Wavelet Trees Thomas Schnattinger Institute of Theoretical Computer Science University of Ulm, Germany June 21, 2010 Thomas Schnattinger, University of Ulm, Germany Bidirectional Search in a String with Wavelet Trees Problem description Given: string S el_anele_lepanelen$ Find pattern: l e l _ane l e_ l epane l en$ “Forward search” for e el_ane le _ le pane le n$ “Backward search” for e el_an ele _lepan ele n$ Goal: “full text index” supporting efficient bidirectional search Thomas Schnattinger, University of Ulm, Germany Bidirectional Search in a String with Wavelet Trees Published approaches Forward search Suffix tree (Weiner P., 1973) Suffix array (Manber et al., 1990) and several variants... Backward search “FM-Index” (Ferragina et al., 2000) variant, using a “Wavelet Tree” (Grossi et al., 2003) Bidirectional search Affix tree (Stoye J., 1995) Affix array (Strothmann D., 2007) Thomas Schnattinger, University of Ulm, Germany Bidirectional Search in a String with Wavelet Trees Motivation Search for palindromic patterns in large strings e.g.: finding “RNA secondary structure” on the human genome (possible base pairs are a–t, c–g and g–t) . . . ggtgtatcgccgttacgtc . . . ggtgtatcgccgtaacgtc . . . Thomas Schnattinger, University of Ulm, Germany Bidirectional Search in a String with Wavelet Trees Motivation Search for palindromic patterns in large strings e.g.: finding “RNA secondary structure” on the human genome (possible base pairs are a–t, c–g and g–t) . . . ggtgtatcgccgttacgtc . . . ggtgtatcgccgtaacgtc . . . . . . ggtgtatcgccgttacgtc . . . ggtgtatcgccgtaacgtc . . . Thomas Schnattinger, University of Ulm, Germany Bidirectional Search in a String with Wavelet Trees Motivation Search for palindromic patterns in large strings e.g.: finding “RNA secondary structure” on the human genome (possible base pairs are a–t, c–g and g–t) . . . ggtgtatcgccgttacgtc . . . ggtgtatcgccgtaacgtc . . . . . . ggtgtatcgccgttacgtc . . . ggtgtatcgccgtaacgtc . . . . . . ggtgtatcgccgttacgtc . . . ggtgtatcgccgtaacgtc . . . Thomas Schnattinger, University of Ulm, Germany Bidirectional Search in a String with Wavelet Trees Motivation Search for palindromic patterns in large strings e.g.: finding “RNA secondary structure” on the human genome (possible base pairs are a–t, c–g and g–t) . . . ggtgtatcgccgttacgtc . . . ggtgtatcgccgtaacgtc . . . . . . ggtgtatcgccgttacgtc . . . ggtgtatcgccgtaacgtc . . . . . . ggtgtatcgccgttacgtc . . . ggtgtatcgccgtaacgtc . . . . . . ggtgtatcgccgttacgtc . . . ggtgtatcgccgtaacgtc . . . Thomas Schnattinger, University of Ulm, Germany Bidirectional Search in a String with Wavelet Trees Some basics Given: string S = el_anele_lepanelen$ S1 = el_anele_lepanelen$ S2 = l_anele_lepanelen$ S3 = _anele_lepanelen$ ... S17 = en$ S18 = n$ S19 = $ Definition (Suffix Array) The suffix array SA of a string S is an array containg a permutation of the numbers in the interval [1..n] so that SSA[1]