Why does the Examination Process take so long?
An important point to remember is that every record in the database is compared with every other record in the database. This means the processing grunt-work required by the fuzzy-matching engine grows roughly with the square of the dataset size, rather than in step with it. For example: if you have 100,000 records and add 1 more record to your dataset, you need to do 100,000 more comparisons (that record is compared with every other record in the database). And if you add 5 more records you need to do about 500,000 more comparisons! So you can see how the number of records plays a crucial role in how long the examination process takes.

The tool uses SQL Server 2005 Express Edition as the database engine that does most of this grunt work. This technology was selected because of the highly efficient query optimization already built into it, and because the SQL Server query engine can make use of parallelism across multiple processors and processor cores and manages memory as efficiently as possible. So even though the number of calculations being executed is sometimes extremely high, the selected technology is well suited to making these calculations as quickly and efficiently as possible.

Does it matter what machine it's run on?
Most certainly yes!
When you have a high number of records, the faster the machine and the more memory it has, the quicker the examination process will run. We recommend that large datasets (50,000+ records) be left to run overnight or even over the weekend.

Does it run faster on a multi-processor machine?
The basic install of the tool installs SQL Server 2005
Express Edition. The limitations of the Express Edition of SQL Server mean the examination process will only take advantage of the first 1 GB of memory and the first physical processor (although it will use multiple cores in a single processor). This can prove to be a major limitation when examining large datasets (100,000+ records), as more processors and more memory can significantly reduce the time the examination process takes to run.

Therefore, for those who are on a Gold-Level support contract, we provide the ability to connect to an external SQL Server, which would typically be a full version running on a dedicated multi-processor, multi-gigabyte database server. The examination process will run significantly faster on such a machine. Please contact us if you want to discuss this option.
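
As a rough illustration of why the record count matters so much, the short Python sketch below counts pairwise comparisons for a dataset of a given size. It assumes each pair of records is compared exactly once, which this topic does not confirm about the tool itself; the function names `total_comparisons` and `extra_comparisons` are made up for this example and are not part of the product.

```python
def total_comparisons(n: int) -> int:
    """Number of distinct record pairs among n records (assumes each pair is compared once)."""
    return n * (n - 1) // 2


def extra_comparisons(n: int, k: int) -> int:
    """Additional pairs created by adding k new records to an existing set of n records."""
    return total_comparisons(n + k) - total_comparisons(n)


if __name__ == "__main__":
    # Ten times the records means roughly one hundred times the comparisons.
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7,} records -> {total_comparisons(n):,} comparisons")

    # Adding records to an existing 100,000-record dataset.
    print(f"+1 record : {extra_comparisons(100_000, 1):,} more comparisons")
    print(f"+5 records: {extra_comparisons(100_000, 5):,} more comparisons")
```

Under that assumption, adding 5 records to 100,000 gives 500,010 extra comparisons: 500,000 against the existing records, plus 10 among the new records themselves.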
Duplicate Record Remover
Copyright (c) 2009 Precision Data, All Rights Reserved.