Algorithms
From Dedupe
Approximate String Matching
Unlike a phonetic algorithm, an approximate string matching function normally accepts two strings, str1 and str2, and returns a numeric result.
- (Damerau) Levenshtein distance (aka Edit Distance)
- Trigram / N-gram
- Longest Common Subsequence
- Monge-Elkan distance
- Smith-Waterman distance
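To illustrate the str1, str2 → number interface, here is a minimal Python sketch of three of the measures listed above. The function names (ngrams, ngram_similarity, lcs_length, monge_elkan) are hypothetical. Two further choices are assumptions. N-gram similarity is computed as the Jaccard overlap of space-padded character trigrams. Monge-Elkan uses that trigram score as its inner token similarity. Other padding schemes and inner measures are equally common.

```python
def ngrams(s: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of s (lower-cased, space-padded at both ends)."""
    padded = " " * (n - 1) + s.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two n-gram sets: 1.0 if identical, 0.0 if disjoint."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0  # only reachable for n == 1 with two empty strings
    return len(ga & gb) / len(ga | gb)


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)            # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # best of skipping either char
        prev = curr
    return prev[-1]


def monge_elkan(a: str, b: str, inner=ngram_similarity) -> float:
    """Average, over the tokens of a, of the best inner similarity to any token of b.

    Note: this measure is asymmetric; monge_elkan(a, b) may differ from monge_elkan(b, a).
    """
    ta, tb = a.split(), b.split()
    if not ta or not tb:
        return 0.0
    return sum(max(inner(x, y) for y in tb) for x in ta) / len(ta)


print(lcs_length("Computational", "Computxrbonel"))  # 9
print(monge_elkan("john smith", "smith john"))       # 1.0 (word order ignored)
```

The LCS length is a raw count. Like the edit distance discussed below, it still needs dividing by a string length before it can be read as a 0-to-1 score.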
Note: It is normally not enough merely to pass str1 and str2 to the above functions; we also need to give the numeric result a meaning. One option is to have every function return a value between 0 and 1 (0 meaning no match and 1 meaning a definite match). Consider the following examples, where the result of the Levenshtein distance function represents the minimum total cost of the edits required to transform str1 into str2:
- str1 = Computational, str2 = Computxrbonel: levenshtein distance(str1, str2) = 4
- str1 = hello, str2 = tests: levenshtein distance(str1, str2) = 4
Both examples produce the same result, yet the first pair is clearly more likely to be a match. This is an area for discussion within each article. One suggestion is to use a normalized Levenshtein distance, in which the computed value is divided by the length of the longer of the two strings.
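A minimal Python sketch of that suggestion follows. The names levenshtein and normalized_similarity are hypothetical. It assumes unit cost for insertions, deletions and substitutions, with no Damerau transpositions. Dividing by the longer length gives a distance in [0, 1] where 0 means identical. Subtracting that from 1 turns it into the "1 = definite match" convention proposed in the note above.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn s1 into s2 (each edit costs 1)."""
    # prev[j] holds the distance between the current prefix of s1 and s2[:j]
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,               # delete c1
                curr[j - 1] + 1,           # insert c2
                prev[j - 1] + (c1 != c2),  # substitute (free if the chars are equal)
            ))
        prev = curr
    return prev[-1]


def normalized_similarity(s1: str, s2: str) -> float:
    """1 - distance / longer length: 1.0 = identical, 0.0 = every position edited."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as a definite match
    return 1.0 - levenshtein(s1, s2) / longest


print(levenshtein("Computational", "Computxrbonel"))                      # 4
print(levenshtein("hello", "tests"))                                      # 4
print(round(normalized_similarity("Computational", "Computxrbonel"), 2))  # 0.69
print(round(normalized_similarity("hello", "tests"), 2))                  # 0.2
```

With normalization the two examples separate clearly (about 0.69 versus 0.2), because the same four edits are spread over 13 characters in the first pair but only 5 in the second.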
Improving Results
OpnSeason sent me a very interesting paper, IEEESoundexV5.pdf, by David Holmes (david.holmes@ncr.com) and M. Catherine McCabe (mary.catherine.mccabe@home.com).

