Diabetic retinopathy (DR) is a highly prevalent complication of diabetes mellitus that causes lesions on the retina which impair vision and may lead to blindness if not detected and diagnosed early. Convolutional neural networks (CNNs) have become the state-of-the-art approach for automatic detection of DR from fundus images. The high-level features extracted by a CNN are typically used to detect and classify retinal lesions. While this high-level representation can separate the different DR classes, more effective features are needed for detecting the damage. This paper proposes the multi-scale attention network (MSA-Net) for DR classification. The proposed approach applies an encoder network to embed the retina image in a high-level representational space, where a combination of mid- and high-level features is used to enrich the representation. A multi-scale feature pyramid is then included to describe the retinal structure at different localities. Furthermore, to enhance the discriminative power of the feature representation, a multi-scale attention mechanism is applied on top of the high-level representation. The model is trained in the standard way using a cross-entropy loss to classify the DR severity level. In parallel, as an auxiliary task, the model is trained on weakly annotated data to distinguish healthy from non-healthy retina images; this surrogate task helps the model strengthen its discriminative power for non-healthy retina images. The proposed method achieves outstanding results on two public datasets: EyePACS and APTOS.
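To make the multi-scale attention idea concrete, the sketch below reweights a high-level feature map with spatial attention computed at several pooling scales. It is a minimal illustration in PyTorch; the encoder, channel count, and pooling scales are assumptions for the example, not the authors' exact MSA-Net configuration.

```python
# Minimal sketch of multi-scale spatial attention over encoder features.
# Scales and channel sizes are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttention(nn.Module):
    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # One 1x1 conv per scale produces a single-channel attention map.
        self.attn = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=1) for _ in scales]
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        h, w = feats.shape[-2:]
        out = feats
        for scale, conv in zip(self.scales, self.attn):
            # Pool to a coarser grid to capture structure at this locality.
            pooled = F.adaptive_avg_pool2d(
                feats, (max(h // scale, 1), max(w // scale, 1))
            )
            attn = torch.sigmoid(conv(pooled))
            attn = F.interpolate(attn, size=(h, w),
                                 mode="bilinear", align_corners=False)
            out = out + feats * attn  # reweight features at this scale
        return out

# Example: attend over a 512-channel high-level feature map.
x = torch.randn(2, 512, 16, 16)
print(MultiScaleAttention(512)(x).shape)  # torch.Size([2, 512, 16, 16])
```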
Embodied agents present an ongoing, challenging agenda for research in multi-modal user interfaces and human-computer interaction. Such agent metaphors will only be widely applicable to online applications once there is a standardised way to map underlying engines to the visual presentation of the agents. This paper delineates the functions and specification of a mark-up language for scripting the animation of virtual characters. The language, Character Mark-up Language (CML), is an XML-based character attribute definition and animation scripting language designed to aid the rapid incorporation of lifelike characters/agents into online applications or virtual reality worlds. This multi-modal scripting language is designed to be easily understood by human animators and easily generated by a software process such as a software agent. CML is constructed jointly around the motion and multi-modal capabilities of virtual life-like figures. The paper further illustrates the constructs of the language and describes a real-time execution architecture that demonstrates the use of such a language as a fourth-generation (4G) language for easily utilising and integrating MPEG-4 media objects in online interfaces and virtual environments.
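Since the published CML schema is not reproduced here, the sketch below only illustrates the general shape of an XML-based character-scripting language generated by a software process. The tag and attribute names ("character", "gesture", "speak", "mood") are hypothetical, not CML's actual vocabulary.

```python
# Build a hypothetical CML-like fragment programmatically, as a software
# agent might. Element and attribute names are illustrative guesses.
import xml.etree.ElementTree as ET

cml = ET.Element("cml")
character = ET.SubElement(cml, "character", name="guide", mood="friendly")
ET.SubElement(character, "gesture", type="wave", duration="1.5s")
ET.SubElement(character, "speak").text = "Welcome to the virtual gallery."

print(ET.tostring(cml, encoding="unicode"))
```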
Diabetic macular edema (DME) is the most common cause of visual impairment among patients with diabetes mellitus. Anti-vascular endothelial growth factor agents (anti-VEGFs) are considered the first line in its management. The aim of this research was to develop a deep learning (DL) model for predicting response to intravitreal anti-VEGF injections among DME patients. The research included treatment-naive DME patients who were treated with anti-VEGF. Patients' pre-treatment and post-treatment clinical data and macular optical coherence tomography (OCT) scans were assessed by retina specialists, who annotated the pre-treatment images for five prognostic features. Patients were also classified, based on their post-treatment OCT, as either good responders, defined by a reduction in thickness of >25% or 50 µm by 3 months, or poor responders. A novel modified U-Net DL model for image segmentation and a second EfficientNet-B3 DL model for response classification were developed and implemented to predict response to anti-VEGF injections among patients with DME. Finally, the classification DL model was compared with ophthalmology residents and specialists at different levels of training in terms of response-classification accuracy. The segmentation model achieved a segmentation accuracy of 95.9%, with a specificity of 98.9% and a sensitivity of 87.9%. The accuracy of classifying patients' images into good and poor responders reached 75%. When the model's performance was compared with practicing ophthalmology residents, ophthalmologists, and retina specialists, its accuracy was comparable to that of general ophthalmologists. The developed DL models can segment and predict response to anti-VEGF treatment among DME patients with accuracy comparable to general ophthalmologists. Further training on a larger dataset is nonetheless needed to yield more accurate response predictions.
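The classification stage described above pairs an EfficientNet-B3 backbone with a binary good/poor-responder head. The sketch below shows one minimal way to set that up in TensorFlow/Keras; the input resolution, dropout rate, and optimizer are illustrative assumptions, not the study's reported configuration.

```python
# Minimal sketch of a binary response classifier on an EfficientNet-B3
# backbone. Hyperparameters here are assumptions for illustration.
import tensorflow as tf

backbone = tf.keras.applications.EfficientNetB3(
    include_top=False, weights="imagenet",
    input_shape=(300, 300, 3), pooling="avg",
)
x = tf.keras.layers.Dropout(0.3)(backbone.output)
output = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # good vs. poor responder
model = tf.keras.Model(backbone.input, output)
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```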
Over the last few years there has been a growing consensus that new-generation interfaces should focus on the human element by enriching human-computer communication with an affective dimension. Affective generation of autonomous agent behaviour aspires to give computer interfaces emotional states that relate to and take into account both user and system-environment considerations: internally, through computational models of artificial hearts (emotion and personality), and externally, through believable multi-modal expression augmented with quasi-human characteristics. Computational models of affect address how agents arrive at a given affective state and how these states are expressed through natural multi-modal communicative interaction. Much of this work targets the entertainment domain and generally does not address the real-time requirements of multi-agent systems, where behaviour changes dynamically based on agent goals as well as shared data and knowledge. This paper discusses one of the requirements for real-time realisation of Personal Service Assistant interface characters. We describe an operational approach to enabling the computational perception required for the automated generation of affective behaviour through inter-agent communication in multi-agent real-time environments. The research investigates the potential of extending current agent communication languages so that they not only convey the semantic content of knowledge exchange but also communicate affective attitudes about the shared knowledge. This provides a necessary component of the framework required for real-time autonomous agent development, with which we may bridge the gap between current research in psychological theory and the practical implementation of social multi-agent systems.
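As a toy illustration of the extension the abstract describes, the sketch below attaches an affective-attitude slot to a FIPA-ACL-style message structure. The field names and emotion labels are hypothetical; this is not the published language extension, only the general idea of carrying affect alongside semantic content.

```python
# Toy sketch: an agent message extended with an affective attitude about
# the shared content. Field names and labels are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class AffectiveMessage:
    performative: str           # e.g. "inform", "request" (FIPA-ACL style)
    sender: str
    receiver: str
    content: str
    affect: dict = field(default_factory=dict)  # attitude about the content

msg = AffectiveMessage(
    performative="inform",
    sender="assistant-1",
    receiver="user-proxy",
    content="meeting moved to 15:00",
    affect={"emotion": "regret", "intensity": 0.6},
)
print(msg)
```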