Progress in Mining maximal frequent patterns

Given a database d, a finite language L of patterns, and a predicate Q for evaluating whether a pattern φ ∈ L∗ is “interesting” in d, the discovery task is to find the theory of d with respect to L and Q, i.e. the set Th(L, d, Q) = {φ ∈ L∗ |Q(d, φ) holds}. The framework of Mannila and Toivonen, classifies pattern mining problems that are (isomorphically) equivalent to frequent itemset mining (FIM). Moreover, the notion of dualization turns out to be crucial to characterize the complexity of maximal patterns mining problems. Nevertheless, many pattern mining problems cannot benefit from this framework since the isomorphism requirement is too restrictive for many patterns such as sequences, episodes or graphs to mention a few.
  • We extend their framework to take into account such complex patterns and studying their complexity.
  • We identify and classify pattern structures, which are not boolean lattices, for which the dualization problem remains quasi-polynomial.
  • We introduce convex embedding and poset reduction as key notions to obtain a new classification of pattern mining problems.
  • We generalize the seminal Dualize & Advance algorithm to deal with mining problem in these classes.
  • We give some examples of known instances of complex pattern mining problems, including sequences and conjunctive queries, fit into our framework.