Algorithms

From Dedupe

Algorithm

From Wikipedia, the free encyclopedia (full article: Wikipedia)

In mathematics and computing, an algorithm is a procedure (a finite set of well-defined instructions) for accomplishing some task which, given an initial state, will terminate in a defined end-state. The computational complexity and efficient implementation of the algorithm are important in computing, and this depends on suitable data structures.

Phonetic

A partial list of algorithms that generate a phonetic key from a string or phrase. Phonetic algorithms provide a basis for data comparison: strings that produce the same key are candidates for a match.
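As an illustration of what a phonetic key looks like, here is a minimal sketch of American Soundex, the algorithm named in the paper mentioned under Improving Results. The function name soundex and the choice to return an empty string for input without letters are assumptions made for this example, and other Soundex variants differ in detail.

```python
def soundex(name: str) -> str:
    """Return a 4-character American Soundex key (sketch, not a reference implementation)."""
    # Consonant groups that share a digit; vowels, Y, H and W have no code.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    chars = [c for c in name.upper() if c.isalpha()]
    if not chars:
        return ""  # assumption: no letters means no key

    key = chars[0]                  # the first letter is kept as-is
    prev = codes.get(chars[0], "")  # its code still suppresses a repeat
    for ch in chars[1:]:
        if ch in "HW":
            continue                # H and W do not separate equal codes
        digit = codes.get(ch, "")
        if digit and digit != prev:
            key += digit
        prev = digit                # a vowel resets prev, so repeats across vowels count
    return (key + "000")[:4]        # pad with zeros or truncate to 4 characters


print(soundex("Robert"))    # R163
print(soundex("Rupert"))    # R163
print(soundex("Ashcraft"))  # A261
```

"Robert" and "Rupert" receive the same key, which is what makes the key usable for comparison.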

Approximate String Matching

Unlike a phonetic algorithm, which produces a key from a single string, an approximate string matching function normally accepts two strings, str1 and str2, and returns a numeric result.
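One such function is Levenshtein distance, which this page uses in its examples. The following is a minimal sketch of the standard dynamic-programming formulation with unit costs. The function name levenshtein is chosen for this example and is not taken from any particular library.

```python
def levenshtein(str1: str, str2: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn str1 into str2 (each edit costs 1)."""
    if len(str1) < len(str2):
        str1, str2 = str2, str1  # the distance is symmetric; keep the rows short

    # previous[j] holds the distance between the processed prefix of str1
    # and the first j characters of str2.
    previous = list(range(len(str2) + 1))
    for i, c1 in enumerate(str1, start=1):
        current = [i]
        for j, c2 in enumerate(str2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(previous[j] + 1,          # delete c1
                               current[j - 1] + 1,       # insert c2
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


print(levenshtein("Computational", "Computxrbonel"))  # 4
print(levenshtein("hello", "tests"))                  # 4
```

Only two rows of the table are kept at a time, so memory use grows with the shorter string rather than with the product of both lengths.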

Note: It is normally not enough to merely pass str1 and str2 to such functions; the numeric result also needs an interpretation. We may decide that every function returns a value between 0 and 1 (0 being no match and 1 being a definite match). Consider the following examples, in which the numeric result of the Levenshtein distance function represents the total cost of the edits required to transform str1 into str2:

str1 = Computational
str2 = Computxrbonel
levenshtein distance (str1, str2) = 4

str1 = hello
str2 = tests
levenshtein distance (str1, str2) = 4

Both examples produce the same result, yet the first is clearly more likely to be a match. This is an area for discussion within each article. One suggestion is to use a normalized Levenshtein distance, in which the computed value is divided by the maximum of the two string lengths: 4/13 ≈ 0.31 for the first example and 4/5 = 0.8 for the second.
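The normalization can be combined with the 0-to-1 convention suggested in the note above. The sketch below takes an already computed edit distance and converts it to a similarity score. The name normalized_similarity, the choice of 1 minus the normalized distance, and the treatment of two empty strings as a definite match are assumptions made for this example.

```python
def normalized_similarity(distance: int, str1: str, str2: str) -> float:
    """Map an edit distance onto 0..1, where 1 is a definite match.

    Assumes distance is the Levenshtein distance between str1 and str2,
    so it never exceeds the length of the longer string.
    """
    longest = max(len(str1), len(str2))
    if longest == 0:
        return 1.0  # assumption: two empty strings are identical
    return 1.0 - distance / longest


# Both example pairs have a distance of 4, but the scores now differ.
print(round(normalized_similarity(4, "Computational", "Computxrbonel"), 2))  # 0.69
print(round(normalized_similarity(4, "hello", "tests"), 2))                  # 0.2
```

A threshold on this score, chosen per data set, could then decide which pairs are treated as matches.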

Improving Results

OpnSeason sent me a very interesting paper (IEEESoundexV5.pdf) by David Holmes (david.holmes@ncr.com) and M. Catherine McCabe (mary.catherine.mccabe@home.com).
