The dearth of clean textual data often acts as a bottleneck in several natural language processing applications. The data available often lacks proper case (uppercase or lowercase) information. This often comes up when text is obtained from social media, messaging applications and other online platforms. This paper attempts to solve this problem by restoring the correct case of characters, commonly known as Truecasing. Doing so improves the accuracy of several processing tasks further down in the NLP pipeline. Our proposed architecture uses a combination of convolutional neural networks (CNN), bi-directional long short-term memory networks (LSTM) and conditional random fields (CRF), which work at a character level without any explicit feature engineering. In this study we compare our approach to previous statistical and deep learning based approaches. Our method shows an increment of 0.83 in F1 score over the current state of the art. Since truecasing acts as a preprocessing step in several applications, every increment in the F1 score leads to a significant improvement in the language processing tasks.
Cluttering of SMS inbox is one of the serious problems that users today face in the digital world where every online login, transaction, along with promotions generate multiple SMS. This problem not only prevents users from searching and navigating messages efficiently but often results in users missing out the relevant information associated with the corresponding SMS like offer codes, payment reminders etc. In this paper, we propose a unique architecture to organize and extract the appropriate information from SMS and further display it in an intuitive template. In the proposed architecture, we use a Hybrid Hierarchical Long Short Term Memory (LSTM)-Convolutional Neural Network (CNN) to categorize SMS into multiple classes followed by a set of entity parsers used to extract the relevant information from the classified message. The architecture using its preprocessing techniques not only takes into account the enormous variations observed in SMS data but also makes it efficient for its on-device (mobile phone) functionalities in terms of inference timing and size.
Artificially crafted images such as memes, seasonal greetings, etc are flooding the social media platforms today. These eventually start occupying a lot of internal memory of smartphones and it gets cumbersome for the user to go through hundreds of images and delete these synthetic images. To address this, we propose a novel method based on Convolutional Neural Networks (CNNs) for the on-device filtering of social media images by classifying these synthetic images and allowing the user to delete them in one go. The custom model uses depthwise separable convolution layers to achieve low inference time on smartphones. We have done an extensive evaluation of our model on various camera image datasets to cover most aspects of images captured by a camera. Various sorts of synthetic social media images have also been tested. The proposed solution achieves an accuracy of 98.25% on the Places-365 dataset and 95.81% on the Synthetic image dataset that we have prepared containing 30K instances.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.