Convolutional Neural Networks (CNNs) have been used extensively for computer vision tasks and produce rich feature representations for objects or parts of an image. Reasoning about scenes, however, requires integrating these low-level feature representations with high-level semantic information. We propose a deep network architecture that models the semantic context of scenes by capturing object-level information. We use Long Short-Term Memory (LSTM) units in conjunction with object proposals to incorporate object-object and object-scene relationships in an end-to-end trainable manner. We evaluate our model on the LSUN dataset and achieve results comparable to the state of the art. We further visualize the learned features and analyze the model through experiments that verify its ability to capture context.
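To make the object-level context idea concrete, below is a minimal sketch in PyTorch of an LSTM that consumes CNN features of object proposals and predicts a scene category; the module names, feature dimensions, and use of the final hidden state are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch (PyTorch); all names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SceneContextLSTM(nn.Module):
    def __init__(self, feat_dim=4096, hidden_dim=512, num_classes=10):
        super().__init__()
        # The LSTM consumes a sequence of object-proposal features for one image
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, proposal_feats):
        # proposal_feats: (batch, num_proposals, feat_dim)
        _, (h_n, _) = self.lstm(proposal_feats)
        return self.classifier(h_n[-1])   # scene-category logits

# Example: 8 images, 16 object proposals each, 4096-d CNN features per proposal
logits = SceneContextLSTM()(torch.randn(8, 16, 4096))
```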
Exploiting visual cues to control robotic systems is a promising idea, but such controllers are practical only if their outputs respect the restrictions of the specific environment, such as the limited field of view of the camera and the physical constraints of the robot's workspace. Hence, there is a need for a general framework that can be adapted across various environments. We develop such an algorithmic framework that is flexible enough to accommodate various kinds of constraints and generates a solution that is optimal with respect to the chosen error measure. We perform constrained optimization of the error over a convex domain, incorporating all the necessary constraints using convex optimization techniques, and further extend the approach to nonconvex domains. We use a branch-and-bound algorithm to divide the problem of optimizing over a range of rotations into simpler subproblems and solve for the optimal rotation. We demonstrate the performance of the algorithm by generating control signals in a simulated visual servoing framework and in a real-world robot navigation setting.
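As an illustration of the rotation search, here is a generic branch-and-bound skeleton over a one-dimensional rotation angle; the bounding functions and the toy error surface are placeholders, not the paper's actual error measure or bounds.

```python
# Illustrative branch-and-bound skeleton over a 1-D rotation angle.
import heapq
import numpy as np

def branch_and_bound(lower_bound, upper_bound, lo=0.0, hi=2 * np.pi, tol=1e-3):
    # Priority queue of intervals keyed by a lower bound on the error
    queue = [(lower_bound(lo, hi), lo, hi)]
    best_val, best_angle = np.inf, None
    while queue:
        lb, a, b = heapq.heappop(queue)
        if lb >= best_val:           # interval cannot contain a better solution
            continue
        mid = 0.5 * (a + b)
        ub = upper_bound(mid)        # feasible error at the interval midpoint
        if ub < best_val:
            best_val, best_angle = ub, mid
        if b - a > tol:              # split the interval and keep refining
            for c, d in ((a, mid), (mid, b)):
                heapq.heappush(queue, (lower_bound(c, d), c, d))
    return best_angle, best_val

# Toy usage: minimize a simple 1-D error surface with a trivially loose lower bound
err = lambda a: (np.cos(a) - 0.3) ** 2
lb_fn = lambda lo_, hi_: 0.0
print(branch_and_bound(lb_fn, err))
```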
This paper explores the possibility of using convex optimization to address a class of problems in visual servoing. This work is motivated by the recent success of convex optimization methods in solving geometric inference problems in computer vision. We formulate the visual servoing problem with feature visibility constraints as a convex optimization over a function of the camera position, i.e., the translation of the camera. First, the path is planned using a potential field method, which produces an unconstrained, straight-line path from the initial to the desired camera position. The problem is then converted into a constrained convex optimization problem by adding the visibility constraints to the minimization. The objective of the minimization is to find, for each camera position, the closest alternate position from which all features are visible. This formulation guarantees an optimal solution and allows the introduction of further constraints, such as the joint limits of the arm, into the visual servoing process. The results are illustrated in a simulation framework.
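A minimal sketch of the core convex step, written with cvxpy and assuming a fixed camera rotation with a simple field-of-view cone as the visibility model (both assumptions for illustration): find the translation closest to the planned position from which all features remain visible.

```python
# Sketch: closest camera translation keeping all points inside a FOV cone (cvxpy).
import cvxpy as cp
import numpy as np

points = np.random.rand(20, 3) * 2 + np.array([0, 0, 4])  # scene features (world frame)
planned = np.array([0.5, 0.0, 0.0])                        # planned, unconstrained position
z_axis = np.array([0.0, 0.0, 1.0])                         # optical axis (rotation fixed)
half_angle = np.deg2rad(30)

t = cp.Variable(3)
# Each point must lie within the half-angle cone around the optical axis
constraints = [z_axis @ (p - t) >= np.cos(half_angle) * cp.norm(p - t) for p in points]
prob = cp.Problem(cp.Minimize(cp.norm(t - planned)), constraints)
prob.solve()
print("closest visible camera position:", t.value)
```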
We present a method to control the emotional prosody of Text-to-Speech (TTS) systems by using phoneme-level intermediate features (pitch, energy, and duration) as levers. As a key idea, we propose Differential Scaling (DS) to disentangle features relating to affective prosody from those arising due to acoustic conditions and speaker identity. Through thorough experimental studies, we show that the proposed method improves over the prior art in accurately emulating the desired emotions while retaining the naturalness of speech. We go beyond the traditional evaluation on individual sentences to a more complete evaluation of HCI systems, presenting a novel experimental setup in which a TTS system replaces an actor in offline and live conversations. The emotion to be rendered is either predicted or manually assigned. The results show that the proposed method is strongly preferred over the state-of-the-art TTS system and adds the much-coveted "human touch" to machine dialogue. Audio samples for our experiments and the code are available at https://emtts.github.io/tts-demo/.
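For intuition, here is a rough sketch of scaling phoneme-level prosody features toward an emotion target; the multiplicative rule and the factor values are assumptions for illustration, not the paper's exact Differential Scaling formulation.

```python
# Illustrative per-feature scaling of phoneme-level prosody (values are assumed).
import numpy as np

def scale_prosody(pitch, energy, duration, factors):
    """Apply per-feature multiplicative offsets relative to the neutral prediction."""
    return (pitch * factors["pitch"],
            energy * factors["energy"],
            duration * factors["duration"])

# Phoneme-level features as predicted by a neutral acoustic model
pitch    = np.array([180.0, 210.0, 170.0, 200.0])   # Hz
energy   = np.array([0.6, 0.8, 0.5, 0.7])
duration = np.array([5, 7, 4, 6])                   # frames

angry = {"pitch": 1.15, "energy": 1.3, "duration": 0.9}  # hypothetical emotion factors
print(scale_prosody(pitch, energy, duration, angry))
```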
In this paper, we propose a convex optimization based approach for piecewise planar reconstruction. We show that the task of reconstructing a piecewise planar environment can be cast in an L∞-based homography framework that iteratively computes scene plane and camera pose parameters. Instead of image points, the algorithm optimizes over inter-image homographies. The resulting objective functions are minimized using Second-Order Cone Programming algorithms. Apart from showing the convergence of the algorithm, we also empirically verify its robustness to errors in initialization through various experiments on synthetic and real data. We intend this algorithm to sit between initialization approaches such as decomposition methods and iterative nonlinear minimization methods such as bundle adjustment.
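To illustrate the optimization machinery, below is a generic L∞ residual minimization posed as an SOCP with cvxpy; the linear residual model standing in for the homography parameterization is an assumption, not the paper's exact formulation.

```python
# Generic L-infinity (minimax) residual minimization as an SOCP (cvxpy).
import cvxpy as cp
import numpy as np

A = np.random.randn(30, 2, 4)   # per-point linear maps (assumed known geometry)
b = np.random.randn(30, 2)      # per-point measured targets

x = cp.Variable(4)              # e.g., plane parameters for a fixed camera pose
gamma = cp.Variable()           # bound on the largest residual
constraints = [cp.norm(A[i] @ x - b[i]) <= gamma for i in range(len(b))]
prob = cp.Problem(cp.Minimize(gamma), constraints)
prob.solve()
print("optimal L-infinity error:", gamma.value)
```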