A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle design choices. In this paper, we present Tacotron, an end-to-end generative text-to-speech model that synthesizes speech directly from characters. Given <text, audio> pairs, the model can be trained completely from scratch with random initialization. We present several key techniques to make the sequence-to-sequence framework perform well for this challenging task. Tacotron achieves a 3.82 subjective 5-scale mean opinion score on US English, outperforming a production parametric system in terms of naturalness. In addition, since Tacotron generates speech at the frame level, it is substantially faster than sample-level autoregressive methods.
We propose an end-to-end trainable network that can simultaneously detect and recognize text of arbitrary shape, making substantial progress on the open problem of reading scene text of irregular shape. We formulate arbitrary shape text detection as an instance segmentation problem; an attention model is then used to decode the textual content of each irregularly shaped text region without rectification. To extract useful irregularly shaped text instance features from image scale features, we propose a simple yet effective RoI masking step. Additionally, we show that predictions from an existing multi-step OCR engine can be leveraged as partially labeled training data, which leads to significant improvements in both the detection and recognition accuracy of our model. Our method surpasses the state-of-the-art for end-to-end recognition tasks on the ICDAR15 (straight) benchmark by 4.6%, and on the Total-Text (curved) benchmark by more than 16%.
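The RoI masking step mentioned above can be illustrated with a minimal sketch. It assumes that masking means cropping the feature map to a detected box and multiplying it element-wise by the instance segmentation mask, so that features from neighboring text or background are zeroed out. The function name, argument layout and array shapes are hypothetical choices for illustration and are not taken from the paper's implementation.

```python
import numpy as np


def roi_mask_features(features, box, instance_mask):
    """Crop a feature map to a box and zero out positions outside the instance.

    features:      array of shape (H, W, C), image-scale feature map (assumed layout)
    box:           (y0, x0, y1, x1) in feature-map coordinates, end-exclusive
    instance_mask: array of shape (H, W), nonzero where the text instance lies
    """
    y0, x0, y1, x1 = box
    crop = features[y0:y1, x0:x1, :]
    mask = (instance_mask[y0:y1, x0:x1] != 0).astype(features.dtype)
    # Broadcast the 2-D mask over the channel axis.
    return crop * mask[..., None]


if __name__ == "__main__":
    feats = np.ones((4, 4, 2), dtype=np.float32)
    inst = np.zeros((4, 4), dtype=np.uint8)
    inst[1, 1] = 1  # a one-cell "text instance"
    masked = roi_mask_features(feats, (1, 1, 3, 3), inst)
    print(masked.shape, masked[..., 0])
```

Under this assumption, the recognizer only sees features that belong to the segmented instance, which is one plausible reason such a step helps with curved text whose bounding boxes overlap neighboring words.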
The motion trajectory of sea cucumbers reflects their behavior, which in turn reflects their feeding status and individual health, and thus provides important information for culture, status detection and early disease warning. Unlike traditional manual observation and sensor-based automatic detection methods, this paper proposes an approach for detecting, locating and analyzing the behavior trajectories of sea cucumbers based on Faster R-CNN under the deep learning framework. The designed detection system consists of an RGB camera that collects images of the sea cucumbers and corresponding sea cucumber identification software. The experimental results show that the proposed approach can accurately detect and locate sea cucumbers. According to the experimental results, the following conclusions are drawn: (1) Sea cucumbers need time to adapt to a new environment. When they enter a new environment, the adaptation time is about 30 minutes: they hardly move during this period and begin to move afterward. (2) Sea cucumbers exhibit negative phototaxis and prefer to move in the shadows. (3) Sea cucumbers tend toward edges. They like to move along the edge of the aquarium, and a sea cucumber in the middle of the aquarium will look for the edge. (4) Sea cucumbers exhibit unidirectional topotaxis. They keep moving in the same direction as their initial motion. The proposed approach will be extended to the detection and behavioral analysis of other marine organisms in marine ranching.
INDEX TERMS Artificial intelligence (AI), animal behavior, deep learning, object detection, Faster R-CNN, marine ranching, sea cucumber.
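The trajectory analysis described above can be sketched as follows, assuming that a detector such as Faster R-CNN has already produced one bounding box per frame for a single animal. The box format, the function names and the displacement threshold are hypothetical and chosen only for illustration; the paper's own software is not reproduced here.

```python
import math


def box_center(box):
    """Center of an axis-aligned box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def trajectory(boxes):
    """Turn per-frame detection boxes into a list of center points."""
    return [box_center(b) for b in boxes]


def path_length(points):
    """Total distance travelled along consecutive trajectory points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def onset_index(points, threshold):
    """Index of the first frame whose distance from the starting point
    exceeds `threshold`, or None if the animal never moves that far."""
    start = points[0]
    for i, p in enumerate(points):
        if math.dist(start, p) > threshold:
            return i
    return None


if __name__ == "__main__":
    boxes = [(0, 0, 2, 2), (0, 0, 2, 2), (3, 4, 5, 6)]
    pts = trajectory(boxes)
    print(pts, path_length(pts), onset_index(pts, threshold=1.0))
```

Multiplying the onset index by the frame interval would give an estimate of the adaptation time, and comparing successive displacement directions would be one way to check whether motion stays unidirectional.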