Recently, a new research area named Privacy-preserving Distributed Data Mining (PPDDM) has emerged. It addresses the following problem: a number of participants want to jointly conduct a data mining task over the private data sets that each of them holds. This problem setting has attracted the attention and interest of researchers, practitioners and developers from both the data mining and information security communities, who have made great progress in designing and developing solutions for it. However, the large number of techniques developed so far has made them hard to compare, and researchers and practitioners now face the challenge of devising a standard for synthesizing and evaluating PPDDM protocols. In this paper, we put forward a framework to synthesize and characterize existing PPDDM protocols so as to provide a standard and systematic approach to understanding PPDDM-related problems, analyzing PPDDM requirements and designing effective and efficient PPDDM protocols.
Community Question Answering (CQA) forums, such as Stack Overflow, Stack Exchange and Massive Open Online Course (MOOC) forums, spend considerable manpower and time managing duplicate questions. When duplicates are not matched to existing questions, users keep asking "new" questions, and the continuous accumulation of duplicates may in turn interfere with information searching and reduce user satisfaction. Neural network (NN) models for semantic parsing make end-to-end duplicate question detection possible. However, owing to the lack of domain data and expertise, such models are rarely applied directly to CQA duplicate question detection. This paper proposes a Semantic Matching Model (SMM) integrated with a multi-task transfer learning framework for multi-domain forum duplicate question detection. By building a word-to-sentence interaction mechanism on top of word-to-word interaction, SMM can automatically choose to ignore or attend to potentially similar words according to sentence-level semantics. Experiments on the benchmark data set and the MOOC forum data set show that SMM outperforms the baselines, that its interaction mechanism is effective, and that it has an advantage in cross-domain duplicate question detection.
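The abstract does not specify how SMM's interaction mechanism is computed, so the following is only an illustrative sketch of the general idea, not the authors' model. It assumes each question is already given as a list of word vectors, uses cosine similarity for the word-to-word interaction, and uses a hypothetical gate (each word's similarity to the mean vector of the other question) to stand in for the word-to-sentence interaction. All function names are invented for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Element-wise mean; a crude sentence representation (assumption)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def word_to_word(q1, q2):
    """Similarity matrix: one row per word of q1, one column per word of q2."""
    return [[cosine(u, v) for v in q2] for u in q1]


def word_to_sentence_gates(q1, q2):
    """Hypothetical gate: how relevant each word of q1 is to q2 as a whole.

    Negative similarities are clipped to 0 so that such words are ignored.
    """
    sentence = mean_vector(q2)
    return [max(0.0, cosine(u, sentence)) for u in q1]


def match_score(q1, q2):
    """Gate-weighted average of each q1 word's best word-level match in q2."""
    best_matches = [max(row) for row in word_to_word(q1, q2)]
    gates = word_to_sentence_gates(q1, q2)
    total = sum(gates)
    if total == 0:
        return 0.0
    return sum(g * b for g, b in zip(gates, best_matches)) / total
```

In this sketch, a word whose gate is zero contributes nothing to the score even if it has a strong word-level match, which mirrors the "ignore or attend" behaviour described above. A trained model would learn such gates rather than derive them from a fixed mean vector.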