Formal Concept Analysis (FCA) seeks to decompose an object-attribute matrix into a set of sparse matrices that capture the underlying structure of a formal context. We propose a Rank Reduction (RR) method, RRFCA, to prime approximate FCAs. While many existing FCA algorithms are complete, lectic ordering of the lattice may not minimize search or decomposition time. Initially, RRFCA decompositions are neither unique nor complete; however, a set of good closures with high support is learned quickly and then made complete. The novelty of RRFCA lies in a new multiplicative two-stage method. First, we describe the theoretical foundations underpinning our RR approach. Second, we provide a representative exemplar showing how RRFCA can be implemented. Further experiments demonstrate that RRFCA methods are efficient and scalable and yield time savings. We also demonstrate that the resulting methods lend themselves to parallelization.

Key words: Formal Concept Analysis, Rank Reduction, Factorization.
Introduction

Formal Concept Analysis (FCA) leverages the notion of a concept, an object-attribute building block of a binary relational dataset, and its ranking in a concept hierarchy to mine datasets [25]. One shortcoming is that concepts are mined according to lectic ordering rather than concept importance or support in the formal context. Lectic ordering recommends itself on account of its thoroughness [10] [24,13]. The theoretical and empirical complexity of various approaches is compared by Kuznetsov in [16]. Computational complexity is the main measure for comparing algorithms: Kuznetsov and Obiedkov focus on the properties of the data ensemble, namely sparsity, the primary complexity-inducing characteristic of the decomposition. Aside from sparsity, the main bottlenecks are memory and processing constraints. Ganter's algorithm computes concepts iteratively based on the previous concept, without incurring exponential memory requirements, by exploiting lectic ordering. CloseByOne produces many concepts in each iteration. Bordat's algorithm, described in [3], introduces a data structure to store previously found concepts, which results in considerable time savings. This approach is made more efficient in [2] by removing the need for a structure of exponential size.

A significant shortcoming of batch approaches is that the entire lattice must be reconstructed if the database changes. Here, we address the memory and computation challenge by using a rank reduction method and disjointness to select good starting-intents for FCA.

The justification is as follows: not all concepts are equal in a binary relational dataset. Formal concepts (FCs) may differ in their support (the extent to which an FC overlaps with the formal context) and in their expressiveness (the disjointness of an FC given a set of FCs). NextClosure's lectic ordering does not take these concerns into account.
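
To make the notions of closure, support and disjointness concrete, the following minimal Python sketch computes them on a toy formal context. The data, the function names, and the specific measures are illustrative assumptions and are not taken from the RRFCA method itself: support is taken here to be the fraction of filled cells of the context covered by a concept, and overlap is the number of cells two concepts share.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Toy formal context (hypothetical data): object id -> attributes it has.
CONTEXT: Dict[int, Set[str]] = {
    0: {"a", "b"},
    1: {"a", "b", "c"},
    2: {"c", "d"},
    3: {"c", "d"},
}
ATTRIBUTES: Set[str] = set().union(*CONTEXT.values())

Concept = Tuple[FrozenSet[int], FrozenSet[str]]


def common_objects(attrs: Iterable[str]) -> FrozenSet[int]:
    """Objects that have every attribute in attrs."""
    wanted = set(attrs)
    return frozenset(g for g, row in CONTEXT.items() if wanted <= row)


def common_attributes(objs: Iterable[int]) -> FrozenSet[str]:
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set(ATTRIBUTES)
    for g in objs:
        shared &= CONTEXT[g]
    return frozenset(shared)


def concept_from_intent(seed: Iterable[str]) -> Concept:
    """Close a starting intent: the result (extent, intent) is a formal concept."""
    extent = common_objects(seed)
    intent = common_attributes(extent)
    return extent, intent


def support(concept: Concept) -> float:
    """Assumed measure: share of the context's filled cells covered by the concept."""
    extent, intent = concept
    total_ones = sum(len(row) for row in CONTEXT.values())
    return len(extent) * len(intent) / total_ones


def overlap(c1: Concept, c2: Concept) -> int:
    """Assumed measure: number of cells shared by two concepts (0 means disjoint)."""
    return len(c1[0] & c2[0]) * len(c1[1] & c2[1])


if __name__ == "__main__":
    for seed in ({"a"}, {"c"}, {"d"}):
        concept = concept_from_intent(seed)
        print(sorted(seed), sorted(concept[0]), sorted(concept[1]), support(concept))
```

In this sketch, two concepts with equal support can still differ in how much they overlap with concepts already found, which is the kind of distinction a purely lectic enumeration does not use when choosing what to compute next.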
To address this, we prime NextClosure with multiple starting-intents by taking reduced rank approximations of the binary relation using Nonnegative Matrix Factorization (NMF) [18]. Lectic order...