Learning semantic attributes for person re-identification and description-based person search has gained increasing interest due to attributes' great potential as a pose- and view-invariant representation. However, existing attribute-centric approaches have thus far underperformed state-of-the-art conventional approaches. This is due to their non-scalable need for extensive domain (camera) specific annotation. In this paper we present a new semantic attribute learning approach for person re-identification and search. Our model is trained on existing fashion photography datasets, either weakly or strongly labelled. It can then be transferred and adapted to provide a powerful semantic description of surveillance person detections, without requiring any surveillance domain supervision. The resulting representation is useful for both unsupervised and supervised person re-identification, achieving state-of-the-art and near state-of-the-art performance respectively. Furthermore, as a semantic representation it allows description-based person search to be integrated within the same framework.
Most existing approaches to training object detectors rely on fully supervised learning, which requires the tedious manual annotation of object location in a training set. Recently there has been increasing interest in developing weakly supervised approaches to detector training, where the object location is not manually annotated but automatically determined based on binary (weak) labels indicating whether a training image contains the object. This is a challenging problem because each image can contain many candidate object locations which partially overlap the object of interest. Existing approaches focus on how best to utilise the binary labels for object location annotation. In this paper we propose to solve this problem from a very different perspective by casting it as a transfer learning problem. Specifically, we formulate a novel transfer learning method based on learning to rank, which effectively transfers a model for automatic annotation of object location from an auxiliary dataset to a target dataset with completely unrelated object categories. We show that our approach outperforms existing state-of-the-art weakly supervised approaches to annotating objects in the challenging VOC dataset.
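The notion of candidate locations that "partially overlap" the object can be made concrete with an overlap score. The sketch below computes intersection-over-union (IoU) between two boxes and sorts candidates by it. The abstract does not say which overlap measure or ranking target the authors use, so IoU, the `(x1, y1, x2, y2)` box layout, and the helper names `iou` and `rank_candidates` are all assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2).

    Assumed measure: the abstract does not name the overlap score it uses.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def rank_candidates(candidates, reference):
    """Sort candidate boxes by decreasing overlap with a reference box.

    Hypothetical helper: on an auxiliary dataset with annotated boxes, such an
    ordering could serve as the target that a ranking model learns to reproduce.
    """
    return sorted(candidates, key=lambda box: iou(box, reference), reverse=True)
```

On an auxiliary dataset the reference box is known, so an ordering like this is available as supervision; on the target dataset only the learned ranking function would be applied, since no box annotation exists there.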
We address the problem of localising objects as bounding boxes in images with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. We propose a novel framework based on Bayesian joint topic modelling. Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly in a single generative model so that "explaining away" inference can resolve ambiguity and lead to better learning and localisation. (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Extensive experiments on the challenging VOC dataset demonstrate that our approach outperforms the state-of-the-art competitors.
Existing RNN-based approaches for action recognition from depth sequences require either skeleton joints or handcrafted depth features as inputs. An end-to-end model that maps raw depth maps to action classes is nontrivial to design because: 1) a single-channel depth map lacks texture, which weakens its discriminative power; 2) the available depth training data is relatively scarce. To address these challenges, we propose to learn an RNN driven by privileged information (PI) in three steps. First, an encoder is pretrained to learn a joint embedding of depth appearance and PI (i.e. skeleton joints). The learned embedding layers are then tuned in the learning step, aiming to optimize the network by exploiting PI in the form of a multi-task loss. However, exploiting PI as a secondary task provides little help in improving the performance of the primary task (i.e. classification) due to the gap between them. Finally, a bridging matrix is defined to connect the two tasks by discovering latent PI in the refining step. Our PI-based classification loss maintains consistency between the latent PI and the predicted distribution. The latent PI and the network are iteratively estimated and updated in an expectation-maximization procedure. The proposed learning process provides greater discriminative power to model subtle depth differences, while helping to avoid overfitting the scarce training data. Our experiments show significant performance gains over state-of-the-art methods on three public benchmark datasets and our newly collected Blanket dataset.
Safety is the top priority in Air Traffic Management (ATM). Accurate trajectory prediction can help ATM to forecast potential dangers and provide effective instructions for safe travel. Most trajectory prediction algorithms are designed for land traffic; they rely on points of interest (POIs) and are only suitable for stationary road conditions. Compared with land traffic prediction, flight trajectory prediction is very difficult because way-points are sparse and flight envelopes are heavily affected by external factors. In this paper, we propose a flight trajectory prediction model based on a Long Short-Term Memory (LSTM) network. The four interacting layers of a repeating module in an LSTM enable it to connect long-term dependencies to the present prediction task. Applying sliding windows in the LSTM maintains continuity and avoids compromising the dynamic dependencies of adjacent states in long-term sequences, which helps to improve the accuracy of trajectory prediction. Taking the time dimension into consideration, both 3-D (time stamp, latitude and longitude) and 4-D (time stamp, latitude, longitude and altitude) trajectories are predicted to demonstrate the efficiency of our approach. The dataset we use was collected by ADS-B ground stations. We evaluate our model with widely used measurements, such as the mean absolute error (MAE), the mean relative error (MRE), the root mean square error (RMSE) and dynamic time warping (DTW). As the Markov Model is the most popular in time series processing, comparisons among the Markov Model (MM), the weighted Markov Model (wMM) and our model are presented. Our model outperforms the existing models (MM and wMM) and provides a strong basis for anomaly detection and decision-making.
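The sliding-window input scheme and three of the error measures named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the window length, the one-step prediction horizon, and the function names are hypothetical, and the abstract does not give the exact definition of MRE, so the common per-sample form |pred - true| / |true| is assumed.

```python
import math


def sliding_windows(track, window, horizon=1):
    """Split a trajectory into (input window, target) training pairs.

    `window` and `horizon` are assumed hyperparameters; each pair uses
    `window` consecutive states to predict the state `horizon` steps later.
    """
    pairs = []
    for start in range(len(track) - window - horizon + 1):
        inputs = track[start:start + window]
        target = track[start + window + horizon - 1]
        pairs.append((inputs, target))
    return pairs


def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mre(pred, true):
    """Mean relative error, assuming the form |pred - true| / |true|.

    Undefined when a true value is zero; callers must handle that case.
    """
    return sum(abs(p - t) / abs(t) for p, t in zip(pred, true)) / len(true)
```

Because consecutive windows overlap in all but one state, adjacent training samples share most of their history, which is one way to read the abstract's remark about preserving the dependencies of adjacent states. For a 3-D or 4-D trajectory each element of `track` would be a tuple of coordinates, and the metrics would be applied per coordinate.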