We investigate a new modification of multi-CNN ensemble training that combines multiloss functions from state-of-the-art deep CNN architectures for leaf image recognition. We first apply the U-Net model to segment leaf images from the background to improve the performance of the recognition system. Then, we introduce a multimodel approach, called multimodel CNN (MMCNN), that combines the loss functions of EfficientNet and MobileNet into a generalized multiloss function. The joint learning multiloss model designed for leaf recognition allows each network to perform its own task while cooperating with the others, so that knowledge from the various trained deep networks is shared. The proposed cooperative multimodel is forced to deal with a more complicated problem than simple classification. Therefore, the network can learn much richer information and improve its generalization capability. Furthermore, a multiloss trade-off strategy between the two deep learning models can reduce the effect of redundancy problems in ensemble classifiers. We evaluate our approach on our custom Vietnamese herbal leaf species dataset and build test cases from public datasets such as Flavia, Leafsnap, and Folio. The results confirm that our approach enhances leaf recognition performance and outperforms the current standard single networks while keeping the computation cost low.
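The multiloss trade-off between two networks can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the convex combination below, the weight `alpha`, and all function names are assumptions chosen for illustration; the two logit vectors stand in for the outputs of the two backbones on the same leaf image.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss of one sample against an integer class label."""
    return -math.log(softmax(logits)[label])


def multiloss(logits_a, logits_b, label, alpha=0.5):
    """Hypothetical trade-off loss: alpha * loss_a + (1 - alpha) * loss_b.

    logits_a and logits_b are assumed to come from two different networks
    that classify the same image; alpha is an assumed trade-off weight.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    loss_a = cross_entropy(logits_a, label)
    loss_b = cross_entropy(logits_b, label)
    return alpha * loss_a + (1.0 - alpha) * loss_b


if __name__ == "__main__":
    # Network A is confident in class 2, network B is less certain.
    print(multiloss([0.1, 0.2, 3.0], [0.5, 0.4, 1.0], label=2, alpha=0.6))
```

With `alpha=1.0` the combined loss reduces to the first network's loss alone, which makes the weight a simple knob for balancing the two models during joint training.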
This paper proposes an enhancement of an automatic text recognition system for extracting information from the front side of the Vietnamese citizen identity (CID) card. First, we apply Mask-RCNN to segment and align the CID card from the background. Next, we present two approaches to detecting the CID card's text lines, comparing traditional image processing techniques with the EAST detector. Finally, we introduce a new end-to-end Convolutional Recurrent Neural Network (CRNN) model for Vietnamese text recognition that combines Connectionist Temporal Classification (CTC) with an attention mechanism by jointly training the CTC and attention objective functions. The length of the CTC's output label sequence is applied to the attention-based decoder prediction to make the final label sequence. This process helps to decrease irregular alignments and speed up label sequence estimation during training and inference, instead of relying only on a data-driven attention-based encoder-decoder to estimate the label sequence in long sentences. The proposed model can be learned directly from a sequence of words without detailed annotations. We evaluate the proposed system on a real-world Vietnamese CID card dataset and find that our method achieves a WER of 4.28% and outperforms the common techniques.
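The interplay between the CTC output and the attention decoder can be sketched as follows. This is only an illustration under stated assumptions: the interpolation weight `lam`, the use of greedy CTC decoding, and the truncation rule in `limit_attention_output` are hypothetical choices, since the abstract does not specify how the CTC length is applied to the decoder prediction.

```python
def ctc_greedy_collapse(frame_labels, blank=0):
    """Collapse a per-frame label path: merge repeats, then drop blanks."""
    collapsed = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            collapsed.append(label)
        previous = label
    return collapsed


def joint_loss(ctc_loss, attention_loss, lam=0.5):
    """Assumed joint objective: lam * L_ctc + (1 - lam) * L_attention."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ctc_loss + (1.0 - lam) * attention_loss


def limit_attention_output(attention_labels, ctc_labels):
    """Hypothetical rule: cap the attention output at the CTC label length."""
    return attention_labels[: len(ctc_labels)]


if __name__ == "__main__":
    ctc_path = [0, 1, 1, 0, 1, 2, 2, 0]          # 0 is the blank symbol
    ctc_labels = ctc_greedy_collapse(ctc_path)   # [1, 1, 2]
    attention_labels = [1, 1, 2, 2, 2]           # decoder ran on too long
    print(limit_attention_output(attention_labels, ctc_labels))
    print(joint_loss(ctc_loss=2.0, attention_loss=1.0, lam=0.25))
```

Capping the decoder at the CTC length is one simple way to stop an attention decoder from repeating or dropping labels on long text lines; other coupling schemes are equally compatible with the description above.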
This paper presents an effective Vietnamese handwritten text recognition model that applies an improved convolutional recurrent neural network (CRNN) model to high school enrollment forms in Tay Ninh province, Vietnam. First, the proposed model extracts the data areas containing text characters from the forms. Then, we connect text boxes on the same row and divide the fields containing text into three specific regions. Finally, we detect the areas containing text characters for handwritten text recognition. We use the word error rate (WER) to evaluate the recognition process and obtain a WER of 0.3602. This result places the model among the best solutions to the Vietnamese handwritten text recognition problem.
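For reference, WER is conventionally computed as the word-level edit distance between the recognized text and the ground truth, divided by the number of ground-truth words. The sketch below follows that standard definition; the function name and the whitespace tokenization are assumptions, and the evaluation code actually used for the reported figures is not described here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


if __name__ == "__main__":
    # One substitution and one deletion over four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.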