Build your own finite state transducer
Have you always wanted your very own Lucene finite state transducer (FST) but you couldn't figure out how to use Lucene's crazy APIs?
Then today is your lucky day! I just built a simple web application that creates an FST from the input/output strings that you enter.
If you just want a finite state automaton (no outputs) then enter only inputs, such as this example:
If all of your outputs are non-negative integers then the FST will use numeric outputs, where you sum up the outputs as you traverse a path to get the final output:
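To make the summing concrete, here is a toy sketch in Python. It is not Lucene's implementation: it builds a plain trie (no suffix sharing, no byte-level encoding), and the function names and the sample inputs are made up for illustration. The one idea it shares with a numeric FST is that each arc carries part of the output, pushed as close to the root as possible, and a lookup adds up what it sees along the path.

```python
def build_numeric_fst(pairs):
    """Toy trie with numeric arc outputs; states are identified by prefix."""
    # min_out[prefix] = smallest output among all inputs starting with prefix
    min_out = {}
    for word, out in pairs.items():
        for i in range(1, len(word) + 1):
            prefix = word[:i]
            min_out[prefix] = min(out, min_out.get(prefix, out))

    # Each arc emits whatever part of the shared minimum its parent
    # has not emitted yet. The root has emitted nothing, hence 0.
    arcs = {}
    for prefix, smallest in min_out.items():
        parent = prefix[:-1]
        arcs[(parent, prefix[-1])] = smallest - min_out.get(parent, 0)

    # Whatever is left over becomes the final output of the accepting node.
    final = {word: out - min_out.get(word, 0) for word, out in pairs.items()}
    return arcs, final


def lookup(fst, word):
    """Sum the arc outputs along the path; None if the word is not accepted."""
    arcs, final = fst
    total = 0
    for i, label in enumerate(word):
        key = (word[:i], label)
        if key not in arcs:
            return None
        total += arcs[key]
    if word not in final:
        return None
    return total + final[word]


# Illustrative inputs, not taken from the web app
fst = build_numeric_fst(
    {"mop": 0, "moth": 1, "pop": 2, "star": 3, "stop": 4, "top": 5}
)
print(lookup(fst, "stop"))  # 3 on the "s" arc + 1 on the "o" arc = 4
```

Because the minimum is pushed toward the root, the "s" arc already carries 3 and only the "o" arc of "stop" adds the remaining 1.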
Finally, if the outputs are non-numeric then they are treated as strings, in which case you concatenate them as you traverse the path:
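The same idea can be sketched for string outputs. Again this is a toy trie in Python, not Lucene's code, with hypothetical function names and sample data: instead of the minimum, each arc carries the part of the longest common output prefix that has not been emitted yet, and a lookup concatenates the pieces.

```python
import os


def build_string_fst(pairs):
    """Toy trie with string arc outputs; states are identified by prefix."""
    # Collect the outputs of every input that passes through each prefix.
    outputs_under = {}
    for word, out in pairs.items():
        for i in range(1, len(word) + 1):
            outputs_under.setdefault(word[:i], []).append(out)

    # shared[prefix] = longest common prefix of those outputs
    shared = {p: os.path.commonprefix(outs) for p, outs in outputs_under.items()}

    # Each arc emits the part of the shared output its parent has not emitted.
    arcs = {}
    for prefix, common in shared.items():
        already = shared.get(prefix[:-1], "")  # the root has emitted nothing
        arcs[(prefix[:-1], prefix[-1])] = common[len(already):]

    # The remainder of each output is the final output of its accepting node.
    final = {w: out[len(shared.get(w, "")):] for w, out in pairs.items()}
    return arcs, final


def lookup(fst, word):
    """Concatenate arc outputs along the path; None if not accepted."""
    arcs, final = fst
    pieces = []
    for i, label in enumerate(word):
        key = (word[:i], label)
        if key not in arcs:
            return None
        pieces.append(arcs[key])
    if word not in final:
        return None
    return "".join(pieces) + final[word]


# Illustrative inputs, not taken from the web app
fst = build_string_fst({"mop": "cat", "moth": "car", "top": "dog"})
print(lookup(fst, "moth"))  # "ca" on the "m" arc + "r" on the "t" arc = "car"
```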
The red arcs are the ones with the NEXT optimization: these arcs do not store a pointer to a node because their to-node is the very next node in the FST. This is a good optimization: it generally results in a large reduction of the FST size. The bolded arcs tell you the next node is final; this is most interesting when an input that is itself a prefix of another input is accepted, such as this example:
Here the "r" arc is bolded, telling you that "star" is accepted. Furthermore, the node following the "r" arc has a final output, telling you the overall output for "star" is "abc".
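A hand-built toy version of such an FST shows why the final output is needed. Only "star" and its output "abc" come from the example above; the second input "stars" with output "abd", the node numbering, and the way the output is split across arcs are assumptions made for this sketch. Since both inputs share the path through "r", that path can only carry the common "ab", and the node after "r" has to supply the rest when the input stops there.

```python
# node id -> {label: (arc output, target node)}
ARCS = {
    0: {"s": ("ab", 1)},  # output shared by both inputs
    1: {"t": ("", 2)},
    2: {"a": ("", 3)},
    3: {"r": ("", 4)},    # the "bolded" arc: node 4 is final
    4: {"s": ("d", 5)},   # only taken by the longer input
    5: {},
}

# Final nodes, and the output each one appends if the input ends there.
FINAL_OUTPUT = {4: "c", 5: ""}


def lookup(word):
    """Walk the arcs, concatenating outputs; None if the word is rejected."""
    node, out = 0, ""
    for label in word:
        arc = ARCS[node].get(label)
        if arc is None:
            return None
        arc_output, node = arc
        out += arc_output
    if node not in FINAL_OUTPUT:
        return None  # ended on a non-final node, e.g. after "sta"
    return out + FINAL_OUTPUT[node]


print(lookup("star"))   # "ab" + final output "c" = "abc"
print(lookup("stars"))  # "ab" + "d" = "abd"
print(lookup("sta"))    # None: node 3 is not final
```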
The web app is a simple Python WSGI app; source code is here. It invokes a simple Java tool as a subprocess; source code (including generics violations!) is here.
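For readers who have not seen that pattern, a WSGI app that hands its input to a subprocess can be sketched roughly as follows. This is not the app's actual source: the `pairs` query parameter and `BUILD_COMMAND` are hypothetical, and a small Python one-liner stands in for the Java tool so the sketch runs without a JVM or Lucene on the classpath.

```python
import subprocess
import sys
from urllib.parse import parse_qs

# Hypothetical stand-in for the Java tool: it just echoes what it reads
# on stdin. The real command line is not shown in this post.
BUILD_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write('received: ' + sys.stdin.read())",
]


def application(environ, start_response):
    """Minimal WSGI entry point: pass the 'pairs' parameter to a subprocess."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    pairs = query.get("pairs", [""])[0]

    # Feed the input on stdin and capture whatever the tool prints.
    result = subprocess.run(
        BUILD_COMMAND, input=pairs, capture_output=True, text=True, timeout=30
    )

    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if result.returncode != 0:
        start_response("500 Internal Server Error", headers)
        return [result.stderr.encode("utf-8")]
    start_response("200 OK", headers)
    return [result.stdout.encode("utf-8")]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` from the standard library is enough.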