New Generation Computing, 23(2005)315-337
Ohmsha, Ltd. and
Springer
Received 17 February 2004
Revised manuscript
received 27 October 2004
Mining frequent patterns with a frequent pattern tree (FP-tree in short) avoids costly candidate generation and repeatedly occurrence frequency checking against the support threshold. It therefore achieves much better performance and efficiency than Apriori-like algorithms. However, the database still needs to be scanned twice to get the FP-tree. This can be very time-consuming when new data is added to an existing database because two scans may be needed for not only the new data but also the existing data. In this research we propose a new data structure, the pattern tree (P-tree in short), and a new technique, which can get the P-tree through only one scan of the database and can obtain the corresponding FP-tree with a specified support threshold. Updating a P-tree with new data needs one scan of the new data only, and the existing data does not need to be re-scanned. Our experiments show that the P-tree method outperforms the FP-tree method by a factor up to an order of magnitude in large datasets.
Keywords:Data Mining, Association Rules, Frequent Patterns.
Footnote:* A preliminary version of this paper has been published in the Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM '02), 629-632.