Many robot exploration algorithms used to explore office, home, or outdoor environments rely on the concept of frontier cells. Frontier cells define the border between known and unknown space. Frontier-based exploration is the process of repeatedly detecting frontiers and moving towards them until no frontiers, and therefore no unknown regions, remain. The faster frontier cells can be detected, the more efficient exploration becomes. This paper proposes three algorithms for detecting frontiers. The first, Naïve Active Area (NaïveAA) frontier detection, achieves frontier detection in constant time by evaluating only the cells in the active area defined by the scans taken. The second, Expanding-Wavefront Frontier Detection (EWFD), uses frontiers from the previous timestep as a starting point for searching for frontiers in newly discovered space. The third, Frontier-Tracing Frontier Detection (FTFD), uses the frontiers from the previous timestep together with the endpoints of the scan to determine the frontiers at the current timestep. The proposed algorithms are compared to state-of-the-art algorithms such as Naïve, WFD, and WFD-INC. NaïveAA is shown to operate in constant time and is therefore suitable as a basic benchmark for frontier detection algorithms. EWFD and FTFD are found to be significantly faster than the other algorithms.
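To make the notion of a frontier cell concrete, the following is a minimal Python sketch of frontier detection on an occupancy grid. It is not the implementation of NaïveAA, EWFD, or FTFD from the paper. It assumes a common working definition (a free cell with at least one 4-connected unknown neighbour), and the cell codes, the function name `find_frontiers`, and the optional `region` bounding box are all hypothetical. Passing a bounding box only illustrates the idea of restricting evaluation to an active area rather than scanning the whole map.

```python
from typing import List, Optional, Tuple

# Hypothetical cell codes for this sketch (not taken from the paper).
UNKNOWN, FREE, OCCUPIED = -1, 0, 1

Cell = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def find_frontiers(grid: List[List[int]], region: Optional[Box] = None) -> List[Cell]:
    """Return free cells that touch at least one unknown cell.

    If `region` is given, only cells inside that bounding box are evaluated;
    otherwise the whole grid is scanned.
    """
    rows, cols = len(grid), len(grid[0])
    r0, c0, r1, c1 = region if region is not None else (0, 0, rows - 1, cols - 1)
    frontiers: List[Cell] = []
    for r in range(max(r0, 0), min(r1, rows - 1) + 1):
        for c in range(max(c0, 0), min(c1, cols - 1) + 1):
            if grid[r][c] != FREE:
                continue  # only known, free cells can be frontiers here
            # Check the four direct neighbours for unknown space.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


if __name__ == "__main__":
    example = [
        [FREE, FREE, UNKNOWN],
        [FREE, OCCUPIED, UNKNOWN],
        [FREE, FREE, FREE],
    ]
    print(find_frontiers(example))                # whole map
    print(find_frontiers(example, (2, 0, 2, 2)))  # bottom row only
```

In this sketch the cost of a call is proportional to the number of cells in the evaluated box, so a box whose size is bounded by the sensor range does not grow with the overall map size.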
With the rapid growth of users' data in SaaS (Software-as-a-Service) platforms using micro-services, detecting duplicated entities has become essential for ensuring the integrity and consistency of data in many companies and businesses (primarily multinational corporations). Given the large volume of today's databases, duplicate detection algorithms need to be not only accurate but also practical, meaning that they return detection results as quickly as possible for a given request. Among existing approaches to the duplicate detection problem, Siamese neural networks trained with the triplet loss have become one of the robust ways to measure the similarity of two entities (texts, paragraphs, or documents) and so identify all possible duplicated items. In this paper, we first propose a practical framework for building a duplicate detection system in a SaaS platform. Second, we present a new active learning schema for training and updating duplicate detection algorithms. In this schema, we not only allow the crowd to provide more annotated data to enhance the chosen learning model but also use Siamese neural networks and the triplet loss to construct an efficient model for the problem. Finally, we design a user interface for our proposed duplicate detection system, which can easily be applied in practice at different companies.
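As a rough illustration of the triplet loss mentioned above, the sketch below computes the standard hinge form, max(d(a, p) - d(a, n) + margin, 0), for one triplet of embeddings. It assumes the embeddings have already been produced by some encoder and that Euclidean distance is used. The function names, the margin value, and the example vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 1.0) -> float:
    """Hinge-style triplet loss for a single (anchor, positive, negative) triplet.

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


if __name__ == "__main__":
    a = [0.0, 0.0]
    dup = [0.0, 1.0]      # embedding of a duplicate of the anchor (assumed)
    non_dup = [3.0, 4.0]  # embedding of an unrelated entity (assumed)
    print(triplet_loss(a, dup, non_dup))  # well separated triplet
    print(triplet_loss(a, non_dup, dup))  # violated triplet, positive loss
```

In training, this quantity would be averaged over many triplets and minimised with respect to the encoder's parameters; at query time, only the distance between two embeddings is needed to flag a candidate duplicate.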
Duplicate records are one of the most common issues in many Software-as-a-Service (SaaS) platforms. In this paper, we study the duplicate identification problem in one specific SaaS platform related to quality and compliance management by using address information. We examine the typical user mistakes that can produce duplicated organizations in a given dataset collected from the SaaS platform. We also create a second dataset by crawling location data from Open Address (US Zone). We compare different methods, including Bag-of-words (using Cosine Distance), Record Linkage Toolkits, and Siamese Neural Networks using the triplet loss, in terms of precision, recall, and F1-score. The experimental results show that Siamese Neural Networks achieve better performance than the other techniques. We plan to publish our Open Address dataset and all implementation code to facilitate further research in related fields. CCS CONCEPTS • Computing methodologies → Machine learning algorithms; Neural networks; Classification and regression trees.
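The bag-of-words baseline and the evaluation metrics named above can be sketched as follows. This is only an illustrative outline under stated assumptions: tokens are lowercase whitespace-separated words, cosine distance is taken as one minus cosine similarity, and predicted duplicates are represented as sets of record-ID pairs. The function names and the example addresses are hypothetical and do not come from the paper's datasets.

```python
import math
from collections import Counter
from typing import Set, Tuple


def cosine_distance(text_a: str, text_b: str) -> float:
    """Cosine distance between word-count vectors of two strings."""
    bag_a = Counter(text_a.lower().split())
    bag_b = Counter(text_b.lower().split())
    dot = sum(count * bag_b[word] for word, count in bag_a.items())
    sq_a = sum(count * count for count in bag_a.values())
    sq_b = sum(count * count for count in bag_b.values())
    if sq_a == 0 or sq_b == 0:
        return 1.0  # treat an empty string as maximally distant
    return 1.0 - dot / math.sqrt(sq_a * sq_b)


Pair = Tuple[int, int]


def precision_recall_f1(predicted: Set[Pair], actual: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall, and F1-score for predicted duplicate pairs."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up address strings, for illustration only.
    print(cosine_distance("10 Main Street Springfield", "10 Main St Springfield"))
    print(precision_recall_f1({(1, 2), (3, 4)}, {(1, 2), (5, 6)}))
```

A pair of records would be flagged as a duplicate when its distance falls below a chosen threshold. An abbreviation such as "St" for "Street" shares no token with the full word, which is one reason a purely lexical baseline can miss duplicates that a learned similarity might catch.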