为什么GNU grep速度快
Sat Aug 21 03:00:30 UTC 2010
世界标准时间2010年8月21日03:00:30
Hi Gabor,
嗨,Gabor。
I am the original author of GNU grep. I am also a FreeBSD user, although I live on -stable (and older) and rarely pay attention to -current.
我是GNU grep的原作者。 我也是一个FreeBSD用户,尽管我生活在-stable(和更老的)上,很少关注-current。
However, while searching the -current mailing list for an unrelated reason, I stumbled across some flamage regarding BSD grep vs GNU grep performance. You may have noticed that discussion too...
然而,在为一个不相关的原因搜索-current邮件列表时,我偶然发现了一些关于BSD grep与GNU grep性能的争吵。 你可能也注意到了这个讨论...
Anyway, just FYI, here's a quick summary of where GNU grep gets its speed. Hopefully you can carry these ideas over to BSD grep.
总之,仅供参考,这里简要介绍了GNU grep的速度来源。 希望你能把这些想法带到BSD的grep上。
#1 trick: GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE.
#第一招。GNU grep之所以快,是因为它避免了对每一个输入字节的观察。
#2 trick: GNU grep is fast because it EXECUTES VERY FEW INSTRUCTIONS FOR EACH BYTE that it *does* look at.
#2号技巧。GNU grep之所以快,是因为它对每一个字节执行的指令都非常少,而它*看的是这些指令。
GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.
GNU grep使用著名的Boyer-Moore算法,它首先寻找目标字符串的最后一个字母,并使用一个查找表来告诉它每当发现一个不匹配的字符时可以在输入中跳过多远。
GNU grep also unrolls the inner loop of Boyer-Moore, and sets up the Boyer-Moore delta table entries in such a way that it doesn't need to do the loop exit test at every unrolled step. The result of this is that, in the limit, GNU grep averages fewer than 3 x86 instructions executed for each input byte it actually looks at (and it skips many bytes entirely).
GNU grep也解开了Boyer-Moore的内循环,并设置了Boyer-Moore delta表项,这样它就不需要在每个解开的步骤中进行循环退出测试。 这样做的结果是,在极限情况下,GNU grep对于它实际查看的每个输入字节平均执行不到3条x86指令(而且它完全跳过许多字节)。
See "Fast String Searching", by Andrew Hume and Daniel Sunday, in the November 1991 issue of Software Practice & Experience, for a good discussion of Boyer-Moore implementation tricks. It's available as a free PDF online.
参见...