why GNU grep is fast
摘要
Here's a quick summary of where GNU grep gets its speed. Hopefully you can carry these ideas over to BSD grep.
#1 trick: GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE.
#2 trick: GNU grep is fast because it EXECUTES VERY FEW INSTRUCTIONS FOR EACH BYTE that it *does* look at.
GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.
GNU grep also unrolls the inner loop of Boyer-Moore, and sets up the Boyer-Moore delta table entries in such a way that it doesn't need to do the loop exit test at every unrolled step. The result of this is that, in the limit, GNU grep averages fewer than 3 x86 instructions executed for each input byte it actually looks at (and it skips many bytes entirely).
欢迎在评论区写下你对这篇文章的看法。