Automatic speech recognition (ASR) systems automatically convert human speech into the corresponding transcription. They have a wide range of applications, such as robot control, call center analytics, and voice chatbots. Recent studies on English ASR have reported performance that surpasses human ability. Those systems were trained on large amounts of data and perform well in many environments. For Vietnamese, there have been many studies on improving existing ASR systems; however, many of them were conducted on small-scale data that does not reflect realistic scenarios. Although the corpora used to train these systems were carefully designed to maintain phonetic balance, efforts to collect them at large scale are still limited. Specifically, only one particular Vietnamese accent was evaluated in existing works. In this paper, we first describe our efforts in collecting a large data set that covers all three major accents of Vietnam, from the Northern, Central, and Southern regions. We then detail our ASR system development procedure, which uses the collected data set and evaluates different model architectures to find the best structure for Vietnamese. In the VLSP 2018 challenge, our system achieved the best performance with a 6.5% word error rate (WER); on our internal test set of more than 10 hours of speech collected in real environments, the system also performs well, with 11% WER.
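The WER figures quoted above follow the usual definition: the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the system output, divided by the number of reference words. The sketch below is a minimal illustration of that definition and not the scoring tool used in the challenge; the function name and the sample sentences are hypothetical, and it assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()   # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution and one deletion over four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.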
The basic safety messages (BSMs) exchanged periodically in vehicular networks have become a new target for jamming attacks, which are easy to conduct in a vehicular environment. Detection and mitigation must be closely integrated to provide acceptable communication latency under attack. This paper considers a comprehensive defense system that both detects and mitigates jamming attacks. We analyze the impact of jamming attacks on BSMs under our initially proposed random channel surfing scheme coupled with a detection method. The detection method can hardly achieve 100% accuracy, which consequently delays the reaction. We study the defense system using a mathematical model that is validated by simulations in NS-3. The results show how the performance of the channel surfing scheme depends on its preinstalled detection method.
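To make the link between imperfect detection and delayed reaction concrete, the sketch below uses a deliberately simple assumption that is not taken from the paper: in each BSM period the detector flags an ongoing jamming attack independently with probability p_detect. Under that assumption the number of periods until the first detection is geometric with mean 1 / p_detect, and the simulation only checks that closed form. All names and parameter values are hypothetical.

```python
import random


def expected_detection_delay(p_detect: float) -> float:
    """Mean number of BSM periods until first detection (geometric model)."""
    if not 0.0 < p_detect <= 1.0:
        raise ValueError("p_detect must be in (0, 1]")
    return 1.0 / p_detect


def simulate_detection_delay(p_detect: float, trials: int = 100_000,
                             seed: int = 0) -> float:
    """Monte Carlo estimate of the same mean, using independent periods."""
    rng = random.Random(seed)
    total_periods = 0
    for _ in range(trials):
        periods = 1
        # Each missed detection costs one more BSM period before reacting.
        while rng.random() >= p_detect:
            periods += 1
        total_periods += periods
    return total_periods / trials


for p in (0.5, 0.8, 0.95):
    print(p, expected_detection_delay(p), simulate_detection_delay(p))
```

A detector that succeeds half the time would, in this toy model, add on average one extra BSM period before channel surfing is triggered, compared with a perfect detector.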
Object segmentation is an important task that is widely employed in computer vision applications such as object detection, tracking, recognition, and retrieval. It can be seen as a two-phase process: object detection and segmentation. Object segmentation becomes more challenging when there is no prior knowledge about the object in the scene. In such conditions, visual attention analysis via saliency mapping may offer a means to predict the object location by using visual contrast, local or global, to identify regions that draw strong attention in the image. However, in situations such as cluttered backgrounds, highly varied object surfaces, or shadows, regular and salient object segmentation approaches based on a single image feature such as color or brightness have been shown to be insufficient for the task. This work proposes a new salient object segmentation method that uses a depth map obtained from the input image to enhance the accuracy of saliency mapping. A deep learning-based method is employed for depth map estimation. Our experiments showed that the proposed method outperforms other state-of-the-art object segmentation algorithms in terms of recall and precision.
Keywords: Saliency map, depth map, deep learning, object segmentation
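The segmentation abstract above reports results in terms of precision and recall. For a binary segmentation mask these are commonly computed per pixel: precision is the fraction of predicted object pixels that are truly object, and recall is the fraction of true object pixels that were predicted. The sketch below illustrates that common definition only; the paper's exact evaluation protocol is not given here, and the function name and the tiny masks are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # rows of 0/1 values; 1 marks an object pixel


def precision_recall(predicted: Mask, ground_truth: Mask) -> Tuple[float, float]:
    """Pixel-wise precision and recall of a binary mask against ground truth."""
    true_pos = false_pos = false_neg = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for pred, truth in zip(pred_row, gt_row):
            if pred and truth:
                true_pos += 1
            elif pred and not truth:
                false_pos += 1
            elif truth and not pred:
                false_neg += 1
    # Convention chosen here: report 0.0 when a denominator is empty.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Made-up 2x3 masks: two correct pixels, one false alarm, one miss.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
print(precision_recall(predicted, ground_truth))
```

A saliency map with continuous values would first need to be thresholded into such a mask; the choice of threshold is outside this sketch.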