Talk:Dirty data
On Name Normalisation: A more traditional approach might be to "explode" the name into its separate elements*, for example:
Surname (or equivalent international term meaning family name)
First name
Other name(s) or initials
Title (e.g. Mr)
Suffix (e.g. OBE, MA, etc.)
This approach requires that the surname (the "key" name element) be identifiable, i.e. distinguishable from the other elements. Depending on the source and quality of the input data, some guesswork could be involved.
(* This may be extended to an entire set of contact details, populating as many as possible of a vast number of exhaustively "atomic" standard fields.)
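As a rough illustration of the "explode" idea, here is a minimal Python sketch. It is only one possible reading of the approach: the lookup sets TITLES and SUFFIXES, the ExplodedName record, and the explode_name function are all hypothetical names made up for this example, and it assumes Western "given names first, surname last" ordering, which is exactly the kind of guesswork mentioned above.

```python
from dataclasses import dataclass

# Hypothetical, deliberately tiny lookup sets; real data would need far more.
TITLES = {"mr", "mrs", "ms", "miss", "dr", "prof", "sir"}
SUFFIXES = {"obe", "mbe", "ma", "ba", "phd", "jr", "sr"}


@dataclass
class ExplodedName:
    title: str = ""
    first_name: str = ""
    other_names: str = ""
    surname: str = ""
    suffix: str = ""


def _key(token: str) -> str:
    """Normalise a token for lookup: drop trailing punctuation, lower-case."""
    return token.strip(".,").lower()


def explode_name(raw: str) -> ExplodedName:
    """Split a free-text name into elements.

    Assumption: the surname is the last token left once titles and
    suffixes have been peeled off. This is a guess, not a rule.
    """
    tokens = raw.replace(",", " ").split()

    # Peel known titles off the front.
    title = []
    while tokens and _key(tokens[0]) in TITLES:
        title.append(tokens.pop(0))

    # Peel known suffixes off the end, always leaving at least one token.
    suffix = []
    while len(tokens) > 1 and _key(tokens[-1]) in SUFFIXES:
        suffix.insert(0, tokens.pop())

    surname = tokens.pop() if tokens else ""
    first_name = tokens.pop(0) if tokens else ""
    other_names = " ".join(tokens)  # whatever remains in the middle

    return ExplodedName(" ".join(title), first_name, other_names,
                        surname, " ".join(suffix))


if __name__ == "__main__":
    print(explode_name("Mr John A. Smith OBE"))
```

Note that input written surname-first (e.g. "Smith, John") would be mis-assigned by this sketch, because it treats the last remaining token as the surname. That is the identifiability problem in practice: without knowing the source's conventions, the surname cannot be picked out reliably.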
--Neil T.

