PART 1: Finding your duplicate customer records
What issues do you need to be aware of when examining your database for
duplicate records?
Duplicate Data Cleaning – Step 1: Finding Duplicates
The problem with customer and contact data, or any data that represents
people or organizations (such as customer databases, address databases,
contact lists, and student, patient or staff lists), is that it is very
'fuzzy' in nature. By this we mean that there are many different ways to
represent the same person or address in a database, and the relationships
within that information are often complex. For example, different people can
have the same name, and the same person can have multiple addresses.
To complicate things further, many names and addresses can be abbreviated or
formatted in different ways (e.g. "234 W. 2nd St." versus "234 West Second
Street"). Information can also be stored in different fields: a person's
initials may be stored in the initials field or in the first name field, and
a street name may be stored in a field called 'Address' or one called
'Street'. All of this adds to the complications involved in storing and
managing such data.
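One common first step before comparing records is to normalize formatting differences such as the address example above. The sketch below assumes a tiny, hypothetical abbreviation and ordinal table (real tables are much larger and country-specific) and is not the method used by any particular product; it simply shows how "234 W. 2nd St." and "234 West Second Street" can be reduced to the same form.

```python
import re

# Hypothetical abbreviation table (illustrative assumption, far from complete).
ABBREVIATIONS = {
    "w": "west",
    "e": "east",
    "n": "north",
    "s": "south",
    "st": "street",
    "rd": "road",
    "ave": "avenue",
}

# Hypothetical table mapping written-out ordinals to their numeric form.
ORDINALS = {
    "first": "1st",
    "second": "2nd",
    "third": "3rd",
    "fourth": "4th",
}


def normalize_address(address: str) -> str:
    """Lower-case, strip punctuation and expand a few common abbreviations."""
    # Replace anything that is not a letter, digit or whitespace with a space.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", address.lower())
    tokens = []
    for token in cleaned.split():
        token = ABBREVIATIONS.get(token, token)  # "st" -> "street", "w" -> "west"
        token = ORDINALS.get(token, token)       # "second" -> "2nd"
        tokens.append(token)
    return " ".join(tokens)


print(normalize_address("234 W. 2nd St."))          # 234 west 2nd street
print(normalize_address("234 West Second Street"))  # 234 west 2nd street
```

A simple lookup like this cannot resolve ambiguous abbreviations (for instance "St." can mean "Street" or "Saint"), which is one reason normalization alone is not enough and a similarity score is still needed.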
Data Matching Techniques
This loosely defined, 'fuzzy' data makes it almost impossible for a computer
system to recognize duplicate records in your database through simple exact
comparisons. This is where a tool like the Duplicate Record Remover becomes
necessary: it examines the data using an advanced fuzzy-logic data matching
algorithm to gauge the 'likeness' of one record to another, regardless of
misspellings, abbreviations, typos or information stored in the wrong fields.
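The Duplicate Record Remover's own fuzzy-logic algorithm is not described here, so the following is only an illustration of the general idea using a different, well-known technique: Python's standard `difflib.SequenceMatcher`. The field names and the `likeness` function are hypothetical. All field values are joined into one string and the words sorted, so a value stored in the "wrong" field still contributes to the score, and a misspelling such as "Jon" for "John" lowers the score only slightly instead of causing a mismatch.

```python
from difflib import SequenceMatcher


def likeness(record_a: dict, record_b: dict) -> float:
    """Return a 0-100 likeness percentage for two records (illustrative only)."""
    # Join every field into one string, so data held in a different field
    # (e.g. initials in first_name) is still compared.
    text_a = " ".join(str(v).lower() for v in record_a.values())
    text_b = " ".join(str(v).lower() for v in record_b.values())
    # Sort the words so the score does not depend on field order.
    text_a = " ".join(sorted(text_a.split()))
    text_b = " ".join(sorted(text_b.split()))
    # ratio() returns a similarity between 0.0 and 1.0.
    return round(SequenceMatcher(None, text_a, text_b).ratio() * 100, 1)


a = {"first_name": "Jon", "last_name": "Smith", "street": "12 Oak Road"}
b = {"first_name": "John", "last_name": "Smith", "street": "12 Oak Road"}
c = {"first_name": "Mary", "last_name": "Jones", "street": "9 Elm Street"}
print(likeness(a, b), likeness(a, c))  # the first score is much higher
```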
The Duplicate Record Remover examines each record in your database against
every other record and, using the fuzzy-logic algorithm, calculates a
Likeness Rating Percentage for each pair of similar records, presenting them
for automatic or manual merging.
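The "every record against every other record" comparison can be sketched as a loop over all unordered pairs. In this minimal sketch the `likeness` score is a stand-in based on `difflib.SequenceMatcher` (not the product's algorithm), and `find_candidate_pairs` and its default threshold of 85 are hypothetical choices. Pairs scoring at or above the threshold are returned, highest first, as merge candidates.

```python
from difflib import SequenceMatcher
from itertools import combinations


def likeness(a: str, b: str) -> float:
    # Stand-in similarity score from 0 to 100; not the product's algorithm.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def find_candidate_pairs(records, threshold=85.0):
    """Compare each record with every other record exactly once and return
    (index_a, index_b, score) for pairs at or above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = likeness(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 1)))
    # Most likely duplicates first.
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


records = [
    "John Smith, 12 Oak Road",
    "Jon Smith, 12 Oak Rd",
    "Mary Jones, 9 Elm Street",
]
print(find_candidate_pairs(records, threshold=80))  # one pair: records 0 and 1
```

With n records this makes n(n-1)/2 comparisons, so the work grows quadratically with database size. One possible design, assumed here for illustration, is to merge pairs above a very high score automatically and queue the rest for manual review.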
However, finding your duplicate customer records is only half the job. Once
you have identified your possible duplicates, you then have to merge them
without losing any data in the process. For more information on this next
step, see Merging your duplicate customer records.