Did You Mean: Lucene?

 

by Tom White
08/09/2005

Contents
Techniques of Spell Checking
A Simple Search Application
the Simple Search
   Generating a Spell Index
   The "Did You Mean" Search Engine
   The "Did You Mean" Parser
How It Works
Supporting Composite Queries
Ensuring High-Quality Suggestions
   Zeitgeist
Conclusion
References

All modern search engines attempt to detect and correct spelling errors in users' search queries. Google, for example, was one of the first to offer such a facility, and today we barely notice when we are asked "Did you mean x?" after a slip on the keyboard. This article shows you one way of adding a "did you mean" suggestion facility to your own search applications using the Lucene Spell Checker, an extension written by Nicolas Maisonneuve and David Spencer.

Techniques of Spell Checking

Automatic spell checking has a long history. One important early paper was F. Damerau's "A Technique for Computer Detection and Correction of Spelling Errors," published in 1964, which introduced the idea of minimum edit distance. Briefly, the concept of edit distance quantifies the idea of one string being "close" to another, by counting the number of character edit operations (such as insertions, deletions, and substitutions) that are needed to transform one string into the other. Using this metric, the best suggestions for a misspelling are those with the minimum edit distance.
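
To make the idea concrete, here is a minimal sketch of an edit distance calculation. It is not taken from the Lucene Spell Checker; it assumes the common Levenshtein variant, which counts only insertions, deletions, and substitutions, and the class and method names are made up for illustration.

```java
// Illustrative sketch only: EditDistance is a hypothetical class name,
// not part of Lucene or the Lucene Spell Checker.
final class EditDistance {
    private EditDistance() {}

    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions, and substitutions that turn a into b.
    static int levenshtein(String a, String b) {
        // prev holds the distances for the previous row of the table,
        // curr the row being filled in.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With this sketch, `EditDistance.levenshtein("lucene", "lucine")` is 1 (one substitution), so "lucene" would rank ahead of any dictionary word that needs two or more edits.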

Another approach is the similarity key technique, in which words are transformed into some sort of key so that similarly spelled and, hopefully, misspelled words have the same key. Correcting a misspelling then simply involves creating the key for the misspelling and looking up dictionary words with the same key to get a list of suggestions. Soundex is the best-known similarity key, and is often used for phonetic applications.
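
As a rough illustration of a similarity key, the sketch below encodes a word with the usual American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent duplicates, and pad to four characters. The class name is hypothetical, only the letters A to Z are handled, and a production system would more likely use an existing library implementation.

```java
// Illustrative sketch only: Soundex here is a hypothetical helper class,
// restricted to the ASCII letters A-Z.
final class Soundex {
    // Digit for each letter A..Z; '0' marks letters that carry no code
    // (vowels plus H, W, and Y).
    private static final String CODES = "01230120022455012623010202";

    private Soundex() {}

    static String encode(String word) {
        StringBuilder key = new StringBuilder();
        char lastCode = '0';
        for (int i = 0; i < word.length() && key.length() < 4; i++) {
            char c = Character.toUpperCase(word.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a plain letter
            }
            char code = CODES.charAt(c - 'A');
            if (key.length() == 0) {
                key.append(c); // the first letter is kept as-is
                lastCode = code;
                continue;
            }
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate equal codes
            }
            if (code != '0' && code != lastCode) {
                key.append(code);
            }
            lastCode = code; // a vowel resets the duplicate check
        }
        if (key.length() == 0) {
            return ""; // no letters at all
        }
        while (key.length() < 4) {
            key.append('0'); // pad short keys with zeros
        }
        return key.toString();
    }
}
```

Under these rules, "lucene" and the misspelling "lusene" both encode to `L250`, so a dictionary indexed by key would return "lucene" as a suggestion for "lusene".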

A combination of minimum edit distance and similarity keys (metaphone) is at the heart of the successful strategy used by Aspell, the leading open source spell checker. However, it is a third approach that underlies the implementation of the "did you mean" technique described in this article: letter n-grams.

A letter n-gram is a sequence of n letters of a word. For instance, the word "lucene" can be divided into four 3-grams, also known as trigrams: "luc", "uce", "cen", and "ene". Why is it useful to break words up like this? The intuition is that misspellings typically only affect a few of the constituent n-grams, so we can recognize the intended word just by looking through correctly spelled words for those that share a high proportion of n-grams with the misspelled word. There are various ways of computing this similarity measure, but one powerful way is to treat it as a classic search engine problem with an inverted index of n-grams into words. This is precisely the approach taken by the Lucene Spell Checker. Let's see how to use it.
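
Before turning to the spell checker itself, the n-gram intuition can be sketched in a few lines of plain Java. This is not how the Lucene Spell Checker computes similarity (it relies on an inverted index and Lucene's scoring); the sketch below is a brute-force stand-in that compares two words directly using the Jaccard overlap of their n-gram sets, and the class and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: NGrams is a hypothetical helper class,
// not part of the Lucene Spell Checker API.
final class NGrams {
    private NGrams() {}

    // Returns every run of n consecutive characters in word, in order.
    // Words shorter than n produce an empty list.
    static List<String> of(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Jaccard overlap of the two n-gram sets: shared grams divided by
    // distinct grams overall. Returns 0.0 when neither word has any grams.
    static double similarity(String a, String b, int n) {
        Set<String> gramsA = new HashSet<>(of(a, n));
        Set<String> gramsB = new HashSet<>(of(b, n));
        Set<String> union = new HashSet<>(gramsA);
        union.addAll(gramsB);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(gramsA);
        shared.retainAll(gramsB);
        return (double) shared.size() / union.size();
    }
}
```

Here `NGrams.of("lucene", 3)` yields the four trigrams listed above, and the truncated "lucen" shares three of them with "lucene", giving a similarity of 0.75, while an unrelated word such as "python" scores 0. Comparing a misspelling against every dictionary word this way would be slow for a large dictionary, which is why indexing the n-grams is attractive.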
