Computational visual perception, also known as computer vision, is a field of artificial intelligence that enables computers to process digital images and videos in a similar way as biological vision does. It involves methods to be developed to replicate the capabilities of biological vision. The computer vision’s goal is to surpass the capabilities of biological vision in extracting useful information from visual data. The massive data generated today is one of the driving factors for the tremendous growth of computer vision. This survey incorporates an overview of existing applications of deep learning in computational visual perception. The survey explores various deep learning techniques adapted to solve computer vision problems using deep convolutional neural networks and deep generative adversarial networks. The pitfalls of deep learning and their solutions are briefly discussed. The solutions discussed were dropout and augmentation. The results show that there is a significant improvement in the accuracy using dropout and data augmentation. Deep convolutional neural networks’ applications, namely, image classification, localization and detection, document analysis, and speech recognition, are discussed in detail. In-depth analysis of deep generative adversarial network applications, namely, image-to-image translation, image denoising, face aging, and facial attribute editing, is done. The deep generative adversarial network is unsupervised learning, but adding a certain number of labels in practical applications can improve its generating ability. However, it is challenging to acquire many data labels, but a small number of data labels can be acquired. Therefore, combining semisupervised learning and generative adversarial networks is one of the future directions. This article surveys the recent developments in this direction and provides a critical review of the related significant aspects, investigates the current opportunities and future challenges in all the emerging domains, and discusses the current opportunities in many emerging fields such as handwriting recognition, semantic mapping, webcam-based eye trackers, lumen center detection, query-by-string word, intermittently closed and open lakes and lagoons, and landslides.
Multimodal medical image fusion is a current technique applied in the applications related to medical field to combine images from the same modality or different modalities to improve the visual content of the image to perform further operations like image segmentation. Biomedical research and medical image analysis highly demand medical image fusion to perform higher level of medical analysis. Multimodal medical fusion assists medical practitioners to visualize the internal organs and tissues. Multimodal medical fusion of brain image helps to medical practitioners to simultaneously visualize hard portion like skull and soft portion like tissue. Brain tumor segmentation can be accurately performed by utilizing the image obtained after multimodal medical image fusion. The area of the tumor can be accurately located with the information obtained from both Positron Emission Tomography and Magnetic Resonance Image in a single fused image. This approach increases the accuracy in diagnosing the tumor and reduces the time consumed in diagnosing and locating the tumor. The functional information of the brain is available in the Positron Emission Tomography while the anatomy of the brain tissue is available in the Magnetic Resonance Image. Thus, the spatial characteristics and functional information can be obtained from a single image using a robust multimodal medical image fusion model. The proposed approach uses a generative adversarial network to fuse Positron Emission Tomography and Magnetic Resonance Image into a single image. The results obtained from the proposed approach can be used for further medical analysis to locate the tumor and plan for further surgical procedures. The performance of the GAN based model is evaluated using two metrics, namely, structural similarity index and mutual information. The proposed approach achieved a structural similarity index of 0.8551 and a mutual information of 2.8059.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.