Cluttering of SMS inbox is one of the serious problems that users today face in the digital world where every online login, transaction, along with promotions generate multiple SMS. This problem not only prevents users from searching and navigating messages efficiently but often results in users missing out the relevant information associated with the corresponding SMS like offer codes, payment reminders etc. In this paper, we propose a unique architecture to organize and extract the appropriate information from SMS and further display it in an intuitive template. In the proposed architecture, we use a Hybrid Hierarchical Long Short Term Memory (LSTM)-Convolutional Neural Network (CNN) to categorize SMS into multiple classes followed by a set of entity parsers used to extract the relevant information from the classified message. The architecture using its preprocessing techniques not only takes into account the enormous variations observed in SMS data but also makes it efficient for its on-device (mobile phone) functionalities in terms of inference timing and size.
Diacritic characters can be considered as a unique set of characters providing us with adequate and significant clue in identifying a given language with considerably high accuracy. Diacritics, though associated with phonetics often serve as a distinguishing feature for many languages especially the ones with a Latin script. In this proposed work, we aim to identify language of text in images using the presence of diacritic characters in order to improve Optical Character Recognition (OCR) performance in any given automated environment. We showcase our work across 13 Latin languages encompassing 85 diacritic characters. We use an architecture similar to Squeezedet for object detection of diacritic characters followed by a shallow network to finally identify the language. OCR systems when accompanied with identified language parameter tends to produce better results than sole deployment of OCR systems. The discussed work apart from guaranteeing an improvement in OCR results also takes on-device (mobile phone) constraints into consideration in terms of model size and inference time.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.