New Generation Computing, 22(2004)239-252
Ohmsha, Ltd. and Springer-Verlag

Selecting Potentially Relevant Records Using Re-identification Methods

Josep DOMINGO-FERRER
Dept. Comput. Eng. and Maths - ETSE
Universitat Rovira i Virgili, Av Països Catalans 26
43007 Tarragona, Catalonia, Spain

jdomingo@etse.urv.es

Vicenç TORRA
Institut d'Investigació en Intel·ligència Artificial - CSIC
Campus UAB s/n, 08193 Bellaterra, Catalonia, Spain

vtorra@iiia.csic.es

Received 27 August 2002
Revised manuscript received 2 May 2003

Abstract

This work proposes the use of re-identification algorithms to select records that are interesting because they provide new information. Instead of focusing on re-identified records, we focus on non-re-identified (non-linked) records, as these are the ones that potentially supply new and relevant information. Moreover, the distinctive characteristics of such records can correspond to chances for improving the knowledge of a system.
To evaluate our approach, we have applied it to an example using publicly available data from the UCI repository. We have used the data of the Ionosphere database to build a re-identification problem involving 35 non-common variables.
We show that the use of a simple heuristic rule base can effectively select potentially interesting records.

Keywords: Chance Discovery, Knowledge Discovery in Databases, Data Mining, Multi-database Mining, Re-identification Algorithms, Record Selection, Record Linkage.
