PART 1: Finding your duplicate customer records

What issues do you need to be aware of when examining your database for duplicate records?

Duplicate Data Cleaning – Step 1: Finding Duplicates

The problem with customer and contact data, or any data that represents people or organizations (such as customer databases, address databases, contact lists, and student, patient or staff lists), is that it is very ‘fuzzy’ in nature. By this we mean that there are many different ways to represent the same person or address in a database, and the information itself is often complex. For example, different people can have the same name, and the same person can have multiple addresses.

To complicate things further, names and addresses can be abbreviated or formatted in many different ways (e.g. "234 W. 2nd St." versus "234 West Second Street"). In addition, the same information can be stored in different fields: a person's initials may be stored in the initials field or the first name field, and a street name may be stored in a field called 'Address' or a field called 'Street'. This adds to the complications of storing and managing such data.
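As a rough illustration of the formatting problem, the sketch below reduces both spellings of the example address to one canonical form before comparing them. The abbreviation table and the function name normalize_address are assumptions made for this example only; a real table would be far larger and would have to deal with ambiguous cases (for instance "St" meaning "Saint" rather than "Street").

```python
import re

# Hypothetical, deliberately tiny abbreviation table (an assumption for
# illustration; it is not taken from any particular product).
ABBREVIATIONS = {
    "n": "north", "s": "south", "e": "east", "w": "west",
    "st": "street", "ave": "avenue", "rd": "road",
    "1st": "first", "2nd": "second", "3rd": "third",
}

def normalize_address(text):
    """Lower-case the text, drop punctuation and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)

if __name__ == "__main__":
    print(normalize_address("234 W. 2nd St."))
    print(normalize_address("234 West Second Street"))
```

Both calls produce the same string, so a plain equality check on the normalized values treats the two addresses as identical. Normalization like this only covers variations that were anticipated in the table; misspellings and typos still need fuzzy comparison.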

Data Matching Techniques

This loosely defined, 'fuzzy' data makes it almost impossible for computer systems that rely on exact comparisons to recognize when you have a duplicate record within your database. This is where a tool like the Duplicate Record Remover becomes necessary: it examines the data using an advanced fuzzy-logic data matching algorithm to gauge the 'likeness' of one record to another, regardless of misspellings, abbreviations, typos or information in incorrect fields.

The Duplicate Record Remover examines each record in your database against every other record and, using the fuzzy-logic algorithm, calculates a Likeness Rating Percentage for each pair of similar records, then presents those pairs for automatic or manual merging.
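To make the idea of comparing every record against every other record more concrete, here is a minimal sketch in Python. It is not the Duplicate Record Remover's algorithm, which is not described here; it substitutes the standard library's difflib.SequenceMatcher as a simple similarity measure. The function names, the field names in the sample records and the 80% threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def record_text(record):
    """Join all field values into one sorted token string, so that a value
    stored in the 'wrong' field still lines up with its counterpart."""
    tokens = " ".join(str(value) for value in record.values()).lower().split()
    return " ".join(sorted(tokens))

def likeness(a, b):
    """Return a 0-100 similarity score for two records (dicts)."""
    ratio = SequenceMatcher(None, record_text(a), record_text(b)).ratio()
    return round(100 * ratio, 1)

def find_candidates(records, threshold=80.0):
    """Compare every pair of records and return (index_a, index_b, score)
    for pairs scoring at or above the threshold, highest score first."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = likeness(a, b)
        if score >= threshold:
            pairs.append((i, j, score))
    return sorted(pairs, key=lambda pair: -pair[2])

if __name__ == "__main__":
    # Hypothetical sample data: the first two records describe the same
    # person, with the initial stored in different fields.
    customers = [
        {"first_name": "J", "initials": "", "last_name": "Smith"},
        {"first_name": "", "initials": "J", "last_name": "Smith"},
        {"first_name": "Maria", "initials": "", "last_name": "Lopez"},
    ]
    for i, j, score in find_candidates(customers):
        print(f"records {i} and {j}: {score}% alike")
```

Comparing all pairs means the work grows with the square of the number of records, which is fine for a small list but slow for a large database. The threshold controls the trade-off between missed duplicates and false matches, which is why pairs just above it are better reviewed manually than merged automatically.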

However, finding your duplicate customer records is only half the job. Once you have identified your possible duplicates, you have to merge them together without losing any data in the process. For more information on this next step, see Merging your duplicate customer records.