Discovering Frequent Structures using Summaries.
Shayan Ghazizadeh and Sudarshan S. Chawathe.
Technical Report CS-TR-4364. Computer Science Department, University of Maryland. College Park, Maryland. November 2001.
We study the problem of finding frequent structures in semistructured data (represented as a directed labeled graph). Frequent structures are graphs that are isomorphic to a large number of subgraphs in the data graph. Frequent structures form building blocks for visual exploration and data mining of semistructured data. We overcome the inherent computational complexity of the problem by using a summary data structure to prune the search space and to provide interactive feedback. We present an experimental study of our methods operating on real datasets. The implementation of our methods (which is freely available) is capable of operating on datasets that are two to three orders of magnitude larger than those described in prior work.
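As a rough illustration of the definitions in the abstract, the following Python sketch counts how often a small pattern occurs in a toy directed labeled graph and uses a simple summary to skip searches that cannot succeed. It is not the method of the report. The toy data, the restriction to path-shaped patterns, the placement of labels on nodes, the label-pair summary, and all function names are assumptions made only for this example.

```python
from collections import Counter

# Hypothetical toy data graph: node id -> label, plus directed edges.
labels = {1: "book", 2: "author", 3: "title",
          4: "book", 5: "author", 6: "title", 7: "journal"}
edges = [(1, 2), (1, 3), (4, 5), (4, 6), (7, 3)]


def build_summary(labels, edges):
    """Count data edges per (source label, target label) pair.

    This is a stand-in summary chosen for the example, not the summary
    structure used in the report.
    """
    return Counter((labels[u], labels[v]) for u, v in edges)


def count_path_matches(pattern, labels, edges):
    """Count simple directed paths whose node labels spell out `pattern`."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)

    def extend(node, i, seen):
        # `node` already matches pattern[i]; try to match the rest.
        if i == len(pattern) - 1:
            return 1
        return sum(
            extend(nxt, i + 1, seen | {nxt})
            for nxt in successors.get(node, [])
            if nxt not in seen and labels[nxt] == pattern[i + 1]
        )

    return sum(extend(n, 0, {n}) for n in labels if labels[n] == pattern[0])


def is_frequent(pattern, labels, edges, summary, min_support):
    """Return True if `pattern` has at least `min_support` matches."""
    # Pruning: if some adjacent label pair never occurs in the data,
    # the pattern has no match at all, so the search can be skipped.
    if any(summary[pair] == 0 for pair in zip(pattern, pattern[1:])):
        return False
    return count_path_matches(pattern, labels, edges) >= min_support


summary = build_summary(labels, edges)
frequent_pair = is_frequent(("book", "author"), labels, edges, summary, 2)
pruned_path = is_frequent(("book", "author", "title"), labels, edges, summary, 1)
```

The pruning test here only rules out patterns that have no match at all, which keeps it sound for any support threshold of at least 1. A stronger summary could prune more candidates, but its bounds would have to be justified for the chosen definition of support.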