Deep networks are increasingly being applied to problems involving image synthesis, e.g., generating images from textual descriptions and reconstructing an input image from a compact representation. Supervised training of image-synthesis networks typically uses a pixel-wise loss (PL) to indicate the mismatch between a generated image and its corresponding target image. We propose instead to use a loss function that is better calibrated to human perceptual judgments of image quality: the multiscale structural-similarity score (MS-SSIM) [34]. Because MS-SSIM is differentiable, it is easily incorporated into gradient-descent learning. We compare the consequences of using MS-SSIM versus PL loss on training deterministic and stochastic autoencoders. For three different architectures, we collected human judgments of the quality of image reconstructions. Observers reliably prefer images synthesized by MS-SSIM-optimized models over those synthesized by PL-optimized models, for two distinct PL measures (L1 and L2 distances). We also explore the effect of training objective on image encoding and analyze conditions under which perceptually-optimized representations yield better performance on image classification. Finally, we demonstrate the superiority of perceptually-optimized networks for super-resolution imaging. Just as computer vision has advanced through the use of convolutional architectures that mimic the structure of the mammalian visual system, we argue that significant additional advances can be made in modeling images through the use of training objectives that are well aligned to characteristics of human perception.
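The central idea of the abstract, replacing a pixel-wise loss with a differentiable perceptual loss inside an ordinary gradient-descent loop, can be illustrated with a minimal single-scale SSIM loss in PyTorch. This is a sketch, not the authors' implementation: the paper optimizes the full multiscale MS-SSIM, whereas the version below uses one scale, a uniform (rather than Gaussian) local window, and placeholder constants and training code.

```python
import torch
import torch.nn.functional as F

def ssim_loss(x, y, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Differentiable 1 - SSIM for image batches (N, C, H, W) scaled to [0, 1].
    Single-scale simplification of the MS-SSIM objective described in the abstract."""
    pad = window // 2
    # Local statistics from a sliding uniform window (Gaussian windows are standard for SSIM).
    mu_x = F.avg_pool2d(x, window, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, window, stride=1, padding=pad)
    var_x = F.avg_pool2d(x * x, window, stride=1, padding=pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, stride=1, padding=pad) - mu_y ** 2
    cov_xy = F.avg_pool2d(x * y, window, stride=1, padding=pad) - mu_x * mu_y
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
               ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim_map.mean()

# Hypothetical training step: identical to pixel-loss training except for the loss call.
# recon = autoencoder(batch)
# loss = ssim_loss(recon, batch)   # instead of F.mse_loss (L2) or F.l1_loss (L1)
# loss.backward(); optimizer.step()
```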
With the resurgence of interest in neural networks, representation learning has re-emerged as a central focus in artificial intelligence. Representation learning refers to the discovery of useful encodings of data that make domain-relevant information explicit. Factorial representations identify underlying independent causal factors of variation in data. A factorial representation is compact and faithful, makes the causal factors explicit, and facilitates human interpretation of data. Factorial representations support a variety of applications, including the generation of novel examples, indexing and search, novelty detection, and transfer learning. This article surveys various constraints that encourage a learning algorithm to discover factorial representations. I dichotomize the constraints in terms of unsupervised and supervised inductive bias. Unsupervised inductive biases exploit assumptions about the environment, such as the statistical distribution of factor coefficients, assumptions about the perturbations a factor should be invariant to (e.g., a representation of an object can be invariant to rotation, translation, or scaling), and assumptions about how factors are combined to synthesize an observation. Supervised inductive biases are constraints on the representations based on additional information connected to observations. Supervisory labels come in a variety of types, which vary in how strongly they constrain the representation, how many factors are labeled, how many observations are labeled, and whether or not we know the associations between the constraints and the factors they are related to. This survey brings together a wide variety of models that all touch on the problem of learning factorial representations and lays out a framework for comparing these models based on the strengths of the underlying supervised and unsupervised inductive biases.
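One of the unsupervised inductive biases the survey mentions, an assumed statistical distribution over factor coefficients, is commonly realized as a KL penalty pulling the encoder's posterior toward an independent (factorial) prior, as in a beta-VAE. The sketch below is a generic illustration of that bias, not a model taken from the survey; the Gaussian prior, the `beta` weight, and the surrounding training step are assumptions.

```python
import torch

def factorial_prior_penalty(mu, logvar, beta=4.0):
    """KL divergence between a diagonal-Gaussian posterior q(z|x) and an
    isotropic N(0, I) prior. Because the prior factorizes across latent
    dimensions, upweighting this term (beta > 1) pressures the encoder
    toward statistically independent, i.e. factorial, latent factors."""
    kl_per_dim = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp())
    return beta * kl_per_dim.sum(dim=1).mean()

# Typical use inside a VAE training step (encoder/decoder are placeholders):
# mu, logvar = encoder(x)
# z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
# loss = reconstruction_loss(decoder(z), x) + factorial_prior_penalty(mu, logvar)
```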
Objective. Algorithms to detect changes in cognitive load using non-invasive biosensors (e.g., electroencephalography (EEG)) have the potential to improve human–computer interactions by adapting systems to an individual's current information processing capacity, which may enhance performance and mitigate costly errors. However, for algorithms to provide maximal utility, they must be able to detect load across a variety of tasks and contexts. The current study aimed to build models that capture task-general EEG correlates of cognitive load, which would allow for load detection across variable task contexts. Approach. Sliding-window support vector machines (SVMs) were trained to predict periods of high versus low cognitive load across three cognitively and perceptually distinct tasks: n-back, mental arithmetic, and multi-object tracking. To determine how well these SVMs could generalize to novel tasks, they were trained on data from two of the three tasks and evaluated on the held-out task. Additionally, to better understand task-general and task-specific correlates of cognitive load, a set of models was trained on subsets of EEG frequency features. Main results. Models achieved reliable performance in classifying periods of high versus low cognitive load both within and across tasks, demonstrating their generalizability. Furthermore, continuous model outputs correlated with subtle differences in self-reported mental effort, and they captured predicted changes in load within individual trials of each task. Additionally, models trained on only alpha or beta frequency features achieved reliable within- and cross-task performance, suggesting that activity in these frequency bands captures task-general signatures of cognitive load. In contrast, models trained on delta and theta frequency features performed considerably worse than the full cross-task models, suggesting that delta and theta activity may be reflective of task-specific differences across cognitive load conditions. Significance. EEG data contains task-general signatures of cognitive load. Sliding-window SVMs can capture these signatures and continuously detect load across multiple task contexts.
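The cross-task evaluation scheme described here, training the load classifier on two tasks and testing on the held-out task, maps directly onto a leave-one-group-out split over windowed band-power features. The sketch below uses scikit-learn with synthetic placeholder data; the feature layout, number of windows, and SVM hyperparameters are illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Placeholder data: one row per sliding EEG window, columns are band-power
# features (e.g. delta/theta/alpha/beta per channel), labels mark high vs.
# low load, and `task` records which of the three tasks the window came from.
rng = np.random.default_rng(0)
X = rng.standard_normal((900, 64))       # 900 windows x 64 band-power features
y = rng.integers(0, 2, size=900)         # 0 = low load, 1 = high load
task = np.repeat([0, 1, 2], 300)         # n-back, mental arithmetic, object tracking

# Train on two tasks, evaluate on the held-out task (one fold per task).
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scores = cross_val_score(clf, X, y, groups=task, cv=LeaveOneGroupOut())
print("held-out-task accuracies:", scores)
```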
We explore the nature of forgetting in a corpus of 125,000 students learning Spanish using the Rosetta Stone foreign-language instruction software across 48 lessons. Students are tested on a lesson after its initial study and are then retested after a variable time lag. We observe forgetting consistent with power-function decay at a rate that varies across lessons but not across students. We find that lessons that are better learned initially are forgotten more slowly, a correlation that likely reflects a latent cause such as the quality or difficulty of the lesson. We obtain improved predictive accuracy of the forgetting model by augmenting it with features that encode characteristics of a student's initial study of the lesson and the activities the student engaged in between the initial and delayed tests. The augmented model can predict 23.9% of the variance in an individual's score on the delayed test. We analyze which features best explain individual performance.
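The reported power-function decay can be illustrated by fitting a curve of the form score(t) = a·(1 + t)^(−b) to delayed-test scores as a function of retention lag. The sketch below uses SciPy with fabricated example data; the exact functional form, the +1 offset, and the parameterization are assumptions for illustration, not the paper's fitted model.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_decay(t, a, b):
    """Power-law forgetting: predicted retention after a lag of t days.
    The +1 keeps the function finite at t = 0 (test right after study)."""
    return a * (1.0 + t) ** (-b)

# Fabricated example: lag in days between initial study and retest,
# and mean proportion correct on the delayed test.
lag = np.array([0, 1, 3, 7, 14, 30, 60], dtype=float)
score = np.array([0.92, 0.85, 0.78, 0.71, 0.66, 0.60, 0.55])

(a, b), _ = curve_fit(power_decay, lag, score, p0=[0.9, 0.2])
print(f"initial degree of learning a = {a:.2f}, decay rate b = {b:.2f}")
print("predicted score after 90 days:", power_decay(90, a, b))
```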