Traditional scene text spotters aim to detect and recognize entire words or sentences in natural scene images; however, detecting and recognizing every single character is as important as spotting whole words or sentences in an image. Few specialized methods exist for spotting single characters in scene text, and some word-based methods cannot recognize a sequence of characters in an image if the characters do not spell a valid word. In addition, some early models can only detect or recognize text that is horizontal and distinctive. We find it necessary to improve existing models to achieve character spotting; therefore, we propose a novel method based on an improved YOLOv5 model to accomplish character-level spotting. Notably, this method can spot characters not only in regular text but also in irregular text (curved and oriented text).
In recent years, machine learning algorithms have achieved strong performance in many fields. On the one hand, their predictive ability has greatly improved; on the other hand, as model complexity increases, interpretability worsens. In this paper, we propose a novel method for improving tree ensemble models by balancing predictive performance and interpretability. Rule extraction turns tree models into "if-then" rules, rule pruning removes redundant constraints, and rule selection chooses the optimal rule subset using a genetic algorithm. We evaluate the proposed method on regression problems; experiments on acute toxicity datasets demonstrate its effectiveness.
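The rule-extraction step described above can be illustrated with a minimal sketch: walking a fitted decision tree and emitting one "if-then" rule per leaf. This is a hypothetical illustration using scikit-learn's tree internals, not the paper's actual implementation (which operates on tree ensembles and adds pruning and genetic-algorithm-based selection); the dataset and function names here are assumptions for demonstration.

```python
# Sketch of "if-then" rule extraction from a single fitted decision tree.
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

def extract_rules(tree, feature_names):
    """Walk the tree and emit one (conditions, prediction) rule per leaf."""
    t = tree.tree_
    rules = []

    def recurse(node, conditions):
        if t.children_left[node] == -1:  # leaf node: emit a rule
            rules.append((list(conditions), float(t.value[node][0][0])))
            return
        name = feature_names[t.feature[node]]
        thr = t.threshold[node]
        recurse(t.children_left[node], conditions + [f"{name} <= {thr:.3f}"])
        recurse(t.children_right[node], conditions + [f"{name} > {thr:.3f}"])

    recurse(0, [])
    return rules

rules = extract_rules(model, [f"x{i}" for i in range(4)])
for conds, pred in rules[:3]:
    print("IF " + " AND ".join(conds) + f" THEN predict {pred:.2f}")
```

Each extracted rule is a conjunction of threshold constraints, which is exactly the form the pruning step would simplify (dropping redundant constraints on the same feature) and the genetic algorithm would select among.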
Lyric transcription is similar to speech recognition: both identify content from audio clips. Speech recognition technology is mature and related application systems are widely used in the software industry, but research on singing content has received far less attention; there is still little work on identifying words and sentences from the singing voice. Moreover, compared with lyric transcription for English, there are almost no related academic papers for Mandarin. On the one hand, speech recognition benefits from high-quality datasets in multiple languages that are large enough to train large-scale models, whereas the singing field lacks such data resources. On the other hand, compared with speech, singing has distinctive pronunciation characteristics, embodied in musical features such as pitch and rhythm. Motivated by these problems, this paper provides a dataset for Mandarin lyric transcription and builds a transcription model on it. Our model addresses some deficiencies of existing models and achieves promising results on our dataset.