We present an interactive application that enables users to improve the visual aesthetics of their digital photographs using spatial recomposition. Unlike earlier work that focuses either on photo quality assessment or on interactive tools for photo editing, we enable the user to make informed decisions about improving the composition of a photograph and to implement them in a single framework. Specifically, the user interactively selects a foreground object and the system presents recommendations for where it can be moved in a manner that optimizes a learned aesthetic metric while obeying semantic constraints. For photographic compositions that lack a distinct foreground object, our tool provides the user with cropping or expanding recommendations that improve their aesthetic quality. We learn a support vector regression model that captures image aesthetics from user data and seek to optimize this metric during recomposition. Rather than prescribing a fully automated solution, we allow user-guided object segmentation and inpainting to ensure that the final photograph matches the user's criteria. Our approach achieves 86% accuracy in predicting the attractiveness of unrated images when compared to their respective human rankings. Additionally, 73% of the images recomposed using our tool are ranked as more attractive than their original counterparts by human raters.
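A minimal sketch of the kind of learned aesthetics regressor described above, assuming images have already been summarized as fixed-length feature vectors and using scikit-learn's SVR; the features, kernel, hyperparameters, and candidate-placement scoring loop are placeholders, not the paper's actual choices.

```python
# Sketch of an SVR-based aesthetics model and how it could score recompositions.
# Assumptions: features are generic fixed-length vectors and ratings are scalar
# aesthetic scores in [0, 1]; real systems would use composition/color features.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 32))   # placeholder feature vectors, one per rated image
y = rng.random(500)         # placeholder human aesthetic ratings

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X_train, y_train)

# Score hypothetical object placements and keep the one with the highest
# predicted aesthetic rating.
candidate_features = rng.random((10, 32))
best_placement = int(np.argmax(model.predict(candidate_features)))
print("best candidate placement:", best_placement)
```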
The goal of high-level event recognition is to automatically detect complex high-level events in a given video sequence. This is a difficult task, especially when videos are captured under unconstrained conditions by nonprofessionals. Such videos depicting complex events have limited quality control and, therefore, may include severe camera motion, poor lighting, heavy background clutter, and occlusion. However, due to the fast-growing popularity of such videos, especially on the Web, solutions to this problem are in high demand and have attracted great interest from researchers. In this paper, we review current technologies for complex event recognition in unconstrained videos. While the existing solutions vary, we identify common key modules and provide detailed descriptions along with some insights for each of them, including extraction and representation of low-level features across different modalities, classification strategies, fusion techniques, etc. Publicly available benchmark datasets, performance metrics, and related research forums are also described. Finally, we discuss promising directions for future research.
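To make the fusion module mentioned above concrete, here is a minimal late-fusion sketch assuming per-modality classifiers already output event scores; the modalities and weights are illustrative and not taken from any particular surveyed system.

```python
# Illustrative weighted late fusion of per-modality event scores.
# Assumption: visual and audio classifiers each output a probability per event
# class; the weights below are hypothetical, not learned values from a paper.
import numpy as np

visual_scores = np.array([0.7, 0.1, 0.2])   # P(event | visual features)
audio_scores = np.array([0.4, 0.4, 0.2])    # P(event | audio features)

weights = {"visual": 0.6, "audio": 0.4}     # hypothetical modality weights
fused = weights["visual"] * visual_scores + weights["audio"] * audio_scores

predicted_event = int(np.argmax(fused))
print("fused scores:", fused, "-> predicted event index:", predicted_event)
```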
Advances in control engineering and material science have made it possible to develop small-scale unmanned aerial vehicles (UAVs) equipped with cameras and sensors. These UAVs enable us to obtain a bird's-eye view of the environment. Having access to an aerial view over large areas is helpful in disaster situations, where often only incomplete and inconsistent information is available to the rescue team. In such situations, airborne cameras and sensors are valuable sources of information that help us build an "overview" of the environment and assess the current situation. This paper reports on our ongoing research on deploying small-scale, battery-powered, and wirelessly connected UAVs carrying cameras for disaster management applications. In this "aerial sensor network," several UAVs fly in formation and cooperate to achieve a given mission. The ultimate goal is an aerial imaging system in which UAVs build a flight formation, fly over a disaster area such as a forest fire or a large traffic accident, and deliver high-quality sensor data such as images or videos. These images and videos are communicated to the ground, fused, analyzed in real time, and finally delivered to the user. In this paper we introduce our aerial sensor network and its application in disaster situations. We discuss the challenges of such aerial sensor networks and focus on the optimal placement of sensors. We formulate the coverage problem as an integer linear program (ILP) and present initial evaluation results.
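As an illustration, one standard way to write such a coverage problem as an ILP is a set-cover formulation over a discretized observation area; the notation below (candidate positions P, grid cells C, coverage indicators a_cp) is assumed for the sketch and is not necessarily the exact model evaluated in the paper.

```latex
% Sketch of a set-cover-style coverage ILP (assumed notation):
% x_p = 1 iff a camera-equipped UAV is placed at candidate position p;
% a_{cp} = 1 iff a sensor at position p covers grid cell c of the area.
\begin{align}
  \min_{x}\quad & \sum_{p \in P} x_p
      && \text{(use as few UAVs as possible)} \\
  \text{s.t.}\quad & \sum_{p \in P} a_{cp}\, x_p \;\ge\; 1
      && \forall c \in C \quad \text{(every cell is covered)} \\
  & x_p \in \{0, 1\}
      && \forall p \in P.
\end{align}
```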
While approaches based on bags of features excel at low-level action classification, they are ill-suited for recognizing complex events in video, where concept-based temporal representations currently dominate. This paper proposes a novel representation that captures the temporal dynamics of windowed mid-level concept detectors in order to improve complex event recognition. We first express each video as an ordered vector time series, where each time step consists of the vector formed by concatenating the confidences of the pre-trained concept detectors. We hypothesize that the dynamics of the time series for different instances of the same event class, as captured by simple linear dynamical system (LDS) models, are likely to be similar even if the instances differ in terms of low-level visual features. We propose a two-part representation formed by fusing: (1) a singular value decomposition of block Hankel matrices (SSID-S) and (2) a harmonic signature (H-S) computed from the corresponding eigen-dynamics matrix. The proposed method offers several benefits over alternative approaches: it is straightforward to implement, directly employs existing concept detectors, and can be plugged into linear classification frameworks. Results on standard datasets such as NIST's TRECVID Multimedia Event Detection task demonstrate the improved accuracy of the proposed method.
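To illustrate the Hankel-based part of such a representation, the following sketch builds a block Hankel matrix from a video's concept-confidence time series and keeps its top singular values as a fixed-length descriptor. The block size, number of singular values, and lack of normalization are assumptions for the sketch, and the harmonic signature (H-S) component is not shown.

```python
# Sketch of an SSID-S-style descriptor: SVD of a block Hankel matrix built
# from per-window concept-detector confidences.
# Assumption: `series` is a T x d array (T time steps, d concept detectors).
import numpy as np

def block_hankel(series: np.ndarray, k: int) -> np.ndarray:
    """Stack k consecutive d-dimensional observations into each column."""
    T, d = series.shape
    cols = T - k + 1
    H = np.empty((k * d, cols))
    for j in range(cols):
        H[:, j] = series[j:j + k].reshape(-1)
    return H

def ssid_descriptor(series: np.ndarray, k: int = 5, n_sv: int = 10) -> np.ndarray:
    """Top singular values of the block Hankel matrix, used as a fixed-length
    summary of the temporal dynamics of the concept-confidence time series."""
    sv = np.linalg.svd(block_hankel(series, k), compute_uv=False)
    out = np.zeros(n_sv)
    out[:min(n_sv, sv.size)] = sv[:n_sv]
    return out

# Example: 120 time steps of confidences from 30 hypothetical concept detectors.
rng = np.random.default_rng(0)
series = rng.random((120, 30))
print(ssid_descriptor(series).shape)   # (10,) descriptor per video
```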
Animated GIFs are everywhere on the Web. Our work focuses on the computational prediction of emotions perceived by viewers after they are shown animated GIF images. We evaluate our results on a dataset of over 3,800 animated GIFs gathered from MIT's GIFGIF platform, each with scores for 17 discrete emotions aggregated from over 2.5M user annotations, which to our knowledge is the first computational evaluation of its kind for content-based prediction on animated GIFs. In addition, we advocate a conceptual paradigm for emotion prediction in which distinct types of emotion are delineated, arguing that it is important to be concrete about which emotion target is being predicted. One of our objectives is to systematically compare different types of content features for emotion prediction, including low-level, aesthetic, semantic, and face features. We also formulate a multi-task regression problem to evaluate whether viewer-perceived emotion prediction can benefit from jointly learning across emotion classes, compared to disjoint, independent learning.
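A minimal sketch contrasting independent per-emotion regressors with a jointly learned multi-task regressor, using scikit-learn's Lasso and MultiTaskLasso as stand-ins; the features, targets, and regularization strengths below are placeholders rather than the paper's actual setup.

```python
# Independent vs. joint ("multi-task") regression over emotion dimensions.
# Assumptions: X holds content features per GIF and Y holds one perceived-emotion
# score per column; MultiTaskLasso is one off-the-shelf joint learner, not
# necessarily the formulation used in the paper.
import numpy as np
from sklearn.linear_model import Lasso, MultiTaskLasso

rng = np.random.default_rng(0)
X = rng.random((400, 64))   # placeholder content features (aesthetic, semantic, ...)
Y = rng.random((400, 17))   # placeholder scores for 17 emotion dimensions

# Independent learning: one regressor per emotion, no sharing across emotions.
independent = [Lasso(alpha=0.01).fit(X, Y[:, e]) for e in range(Y.shape[1])]

# Joint learning: a shared sparsity pattern couples the emotion dimensions.
joint = MultiTaskLasso(alpha=0.01).fit(X, Y)

print(len(independent), "independent models vs one joint model with coef shape",
      joint.coef_.shape)   # (17, 64)
```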