
Instead of starting the search at the beginning of needle, you start at the end. If your needle character doesn't match the character you're looking at in haystack, you can move needle forwards in haystack until haystack's mismatched character lines up with the same character in needle. If haystack's mismatch isn't in needle at all, then you can skip ahead a whole needle's length.įor example, if you're searching for a string of 100 'a's ( needle), you look at the 100th character in haystack. If it's an 'x', well, 'x' doesn't appear anywhere in needle, so you can skip ahead all of needle and look at the 200th character in haystack. If (last_char_in_needle != haystack_char) So the first part of my Boyer-Moore string searching algorithm looked like this:Ĭhar haystack_char = haystack A single mismatch allowed us to skip 100 characters!įor performance, the number of characters you can skip on a mismatch is usually stored in an array indexed by the character value.


So we look at the character in haystack and if it's not what we're looking for, we jump ahead by the right distance for that character, which is in jump_table. " There," I sigh, finishing and sitting back. It may not be faster than grep, but it should be at least as fast, because this is the fastest algorithm known. So I confidently ran my benchmark, for a 1 gigabyte file.

Ultimate Unix Geek chuckles into his xterms.
