New Generation Computing, 22(2004)239-252
Ohmsha, Ltd. and Springer-Verlag
Received 27 August 2002
Revised manuscript received 2 May 2003
This work proposes the use of re-identification algorithms to select
records that are interesting because they supply new information.
Instead of focusing on re-identified elements, we focus on non-re-identified
records (non-linked records), as they are the ones that potentially supply new
and relevant information. Moreover, these relevant characteristics can correspond
to chances for improving the knowledge of a system.
To evaluate our approach, we have applied it to an example using publicly available
data from the UCI repository. We have used the data of the Ionosphere
database to build a re-identification problem for 35 non-common variables.
We show that the use of a simple heuristic rule base can effectively select
potentially interesting records.
Keywords: Chance Discovery, Knowledge Discovery in Databases, Data Mining, Multi-database Mining, Re-identification Algorithms, Record Selection, Record Linkage.