This paper presents a novel adaptive resonance theory (ART)-based modular architecture for unsupervised learning, namely the distributed dual vigilance fuzzy ART (DDVFA). DDVFA consists of a global ART system whose nodes are local fuzzy ART modules. It is equipped with the distinctive features of distributed higher-order activation and match functions, using dual vigilance parameters responsible for cluster similarity and data quantization. Together, these allow DDVFA to perform unsupervised modularization, create multi-prototype clustering representations, retrieve arbitrarily shaped clusters, and control its compactness. Another important contribution is the reduction of order-dependence, an issue that affects any agglomerative clustering method. This paper demonstrates two approaches for mitigating order-dependence: preprocessing using visual assessment of cluster tendency (VAT) or postprocessing using a novel Merge ART module. The former is suitable for batch processing, whereas the latter can be used in online learning. Experimental results in the online learning mode, carried out on 30 benchmark data sets, show that DDVFA cascaded with Merge ART statistically outperformed the best of the other ART-based systems when samples were randomly presented. Conversely, they were found to be statistically equivalent in the offline mode when samples were preprocessed using VAT. Remarkably, performance comparisons to non-ART-based clustering algorithms show that DDVFA (which learns incrementally) was also statistically equivalent to the non-incremental (offline) methods of DBSCAN, single linkage hierarchical agglomerative clustering (HAC), and the offline version of k-means, while retaining the appealing properties of ART. Links to the source code and data are provided.
Considering the algorithm's simplicity, online learning capability, and performance, it is an ideal choice for many agglomerative clustering applications.

Representation, Topology, Visual Assessment of Cluster Tendency.

methods) and top-down (divisive or splitting methods) [2].

Hierarchical ART architectures generally follow two main designs [20]: (a) a series/cascade of ART modules where the output of one ART (i.e., a prototype) is the input of the next [21]–[31] or (b) parallel ART modules sharing the same inputs and using different vigilance values [6], [32]–[39]. Generally, the hierarchical relationships between ART modules are defined implicitly by the input signal flow, explicitly by enforcing constraints or connections, and/or by the setting of multiple vigilance parameters to define hierarchies. Alternatively, hierarchies within the same ART can be created by designing custom ART activation functions [40,41] or by analyzing its distributed activation patterns [42]. ART-based hierarchical approaches have been successfully applied, for instance, in text mining [20,43] and robotics [30,39].

Another branch of clustering includes multi-prototype-based methods. These allow multiple prototypes to represent a si...