A Local Scalable Distributed Expectation Maximization Algorithm for Large Peer-to-Peer Networks

Bhaduri, Kanishka; Srivastava, Ashok N.

doi:10.1109/icdm.2009.45

Cited by 13 publications

(7 citation statements)

References 29 publications


“…Datta et al [7] presents an overview of this topic. Examples of scalable distributed P2P data mining algorithms include the association rule mining algorithm [26], k-Means clustering [8], top-l inner product identification [6], decision tree induction [3], expectation maximization [2] and more.…”

Section: P2P Data Mining (mentioning)

confidence: 99%

Multi-objective optimization based privacy preserving distributed data mining in Peer-to-Peer networks

Das

Bhaduri

Kargupta

2010

Peer-to-Peer Netw. Appl.

Self Cite


This paper proposes a scalable, local privacy-preserving algorithm for distributed Peer-to-Peer (P2P) data aggregation useful for many advanced data mining/analysis tasks such as average/sum computation, decision tree induction, feature selection, and more. Unlike most multi-party privacy-preserving data mining algorithms, this approach works in an asynchronous manner through local interactions and it is highly scalable. It particularly deals with the distributed computation of the sum of a set of numbers stored at different peers in a P2P network in the context of a P2P web mining application. The proposed optimization-based privacy-preserving technique for computing the sum allows different peers to specify different privacy requirements without having to adhere to a global set of parameters for the chosen privacy model. Since distributed sum computation is a frequently used primitive, the proposed approach is likely to have significant impact on many data mining tasks such as multi-party privacy-preserving clustering, frequent itemset mining, and statistical aggregate computation. A shorter version of this paper was published in the IEEE P2P'09 conference.



“…The authors of [4] presented an algorithm for learning parameters of Gaussian mixture models (GMM) in large P2P environments that can be used for a variety of well-known data mining tasks in distributed environments such as clustering, anomaly detection, target tracking, and density estimation, which are necessary for many emerging P2P applications in bio-informatics, web-mining and sensor networks.…”

Section: Learning and Mining in Peer-to-Peer Network (mentioning)

confidence: 99%

Learning Structure and Schemas from Heterogeneous Domains in Networked Systems: A Survey

Biba

Xhafa

2010

2010 International Conference on Intelligent Networking and Collaborative Systems


The rapidly growing number of available digital documents in various formats, and the possibility of accessing them through internet-based technologies in distributed environments, have made it necessary to develop solid methods to properly organize and structure documents in large digital libraries and repositories. Specifically, the extremely large size of document collections makes it impossible to organize such documents manually. Additionally, most of the documents exist in an unstructured form and do not follow any schemas. Therefore, research efforts in this direction are being dedicated to automatically inferring structure and schemas. This is essential in order to better organize huge collections as well as to effectively and efficiently retrieve documents in heterogeneous domains in networked systems. This paper presents a survey of the state-of-the-art methods for inferring structure from documents and schemas in networked environments. The survey is organized around the most important application domains, namely bio-informatics, sensor networks, social networks, P2P systems, automation and control, transportation, and privacy preservation, for which we analyze the recent developments in dealing with unstructured data in such domains.


“…While these works are often formulated as generic optimization problems, rather than designed for a specific learning task, they tend to be motivated by applications where data is distributed by examples (horizontally partitioned), as is made clear, for example, in Forero et al (2010). Closer to our work, multiple fully decentralized algorithms use EM to fit GMMs in horizontally partitioned setups, such as Nowak (2003); Kowalczyk and Vlassis (2005); Gu (2008); Forero et al (2008); Bhaduri and Srivastava (2009); Safarinejadian et al (2010); Weng et al (2011); Altilio et al (2019). Related density estimation tasks have also been considered (Hu et al, 2007; Hua and Li, 2015; Dedecius and Djurić, 2017).…”

Section: Introduction (mentioning)

confidence: 97%

Decentralized EM to Learn Gaussian Mixtures from Datasets Distributed by Features

Valdeira

Soares

2022

Preprint


Expectation Maximization (EM) is the standard method to learn Gaussian mixtures. Yet its classic, centralized form is often infeasible, due to privacy concerns and computational and communication bottlenecks. Prior work dealt with data distributed by examples (horizontal partitioning), but we lack a counterpart for data scattered by features, an increasingly common scheme (e.g. user profiling with data from multiple entities). To fill this gap, we provide an EM-based algorithm to fit Gaussian mixtures to Vertically Partitioned data (VP-EM). In federated learning setups, our algorithm matches the centralized EM fitting of Gaussian mixtures constrained to a subspace. In arbitrary communication graphs, consensus averaging allows VP-EM to run on large peer-to-peer networks as an EM approximation. This mismatch comes from consensus error only, which vanishes exponentially fast with the number of consensus rounds. We demonstrate VP-EM on various topologies for both synthetic and real data, evaluating its approximation of centralized EM and seeing that it outperforms the available benchmark. However, data is often too large, privacy sensitive, or both, preventing the use of a data center where conventional, centralized methods can be employed. Such setups call for distributed approaches where data is
