A cloud infrastructure for target detection and tracking using audio and video fusion

Liu, Kui; Liu, Bingwei; Blasch, Erik; Shen, Dan; Wang, Zhonghai; Ling, Haibin; Chen, Genshe

doi:10.1109/cvprw.2015.7301299

Cited by 11 publications

(7 citation statements)

References 49 publications

(42 reference statements)


“…Generally, application-specific visual features are extracted from the eye and mouth regions. Some of the visual features available in the literature include model-based, motion-based, image-based, and geometry-based features [5,6,8]. In most instances, the modality information is merged only after the feature extraction.…”

Section: Features; citation type: mentioning

confidence: 99%


Survey on Fusion of Audiovisual Information for Multimedia Event Recognition

Jayalakshmi, Jothilakshmi, Ranjith, et al. 2021

Artificial Intelligence and Technologies


Recognition of video activities remains a critical problem in both vision and machine learning. They have numerous prospective applications such as autonomous surveillance [1, 2], video tagging, and multimedia information retrieval [3]. Most of the literature on machine learning is focusing on the visual modality and the fusion of other modalities, such as audio. In reality, event awareness and recognition are exhaustive, as they are based on simultaneous sensations occurring generally at the same time through the human sensory organs. Using a fusion of audio and video data improves the efficiency of multiple applications, such as automatic video event detection, video retrieval, or emotional recognition [4]. In comparison, the current analysis has found that audio information provides complementary information to capture information contained in the videos. For the two separate modalities to be fused, we must integrate the techniques used in the modalities. In this paper, we focus on a brief survey of features and classifiers used for fusion of audiovisual event recognition. The main focus of audio event recognition is to identify different types of audio events present in the real-time environment. It plays a vital role in many applica-


“…There is a range of application areas using multimodal data convergence and fusion. Some of the applications include the following: (i) biomedical systems for emergency care, (ii) health monitoring system, (iii) smart outdoor environment monitoring [6], (iv) multimodal video retrieval, and (v) emotion recognition [7].…”

mentioning

confidence: 99%

Survey on Fusion of Audiovisual Information for Multimedia Event Recognition

Jayalakshmi, Jothilakshmi, Ranjith, et al. 2021

Artificial Intelligence and Technologies


“…The comparative technique includes the NSA [23], Exponential Weighted Moving Average (EWMA) [29], NSA + EWMA, NSA + NARX. The results of the proposed SSDM + ENN are compared with the other existing techniques to highlight the dominance of the techniques.…”

Section: Comparative Techniques; citation type: mentioning

confidence: 99%

Support vector regression and extended nearest neighbor for video object retrieval

Ghuge, Ruikar, Prakash 2018

Evol. Intel.


Video retrieval is one of the emerging areas in video capturing that gained various technical advances, increasing the availability of a huge mass of videos. For the text or the image query given, retrieving the relevant videos and the objects from the videos is not always an easy task. A hybrid model was developed in the previous work using the Nearest Search Algorithm (NSA) and exponential weighted moving average (EWMA), for the video object retrieval. In NSA + EWMA, the object trajectories are retrieved based on the query specific distance. This work extends the previous work by developing a novel path equalization scheme for equalizing the path length of the query and the tracked object. Initially, a hybrid model based on Support Vector Regression and NSA tracks the position of the object in the video. The proposed density measure scheme equalizes the path length of the query and the object. Then, the identified path length related to the query is given to extended nearest neighbor classifier for retrieving the video. From the simulation results, it is evident that the proposed video retrieval scheme achieved high values of 0.901, 0.860, 0.849, and 0.922 for precision, recall, F-measure, and multiple object tracking precision, respectively.


“…[21][22][23] Through the use of image quality, various image processing methods have been developed for cloud architectures [24,25]. An open research question is the alignment of machine-level image interpretability with that of human observers [26,27], although initial comparisons suggest the human perception and machine-level processing are sensitive to different image characteristics [28,29]. Many examples to compute the NIIRS have been reported [11] and updates are included in the Motion Imagery Standards Board.…”

Section: Introduction; citation type: mentioning

confidence: 99%

Prediction of compression-induced image interpretability degradation

et al. 2018

Self Cite


Image compression is an important component in modern imaging systems as the volume of the raw data collected is increasing. To reduce the volume of data while collecting imagery useful for analysis, choosing the appropriate image compression method is desired. Lossless compression is able to preserve all the information, but it has limited reduction power. On the other hand, lossy compression, which may result in very high compression ratios, suffers from information loss. We model the compression-induced information loss in terms of the National Imagery Interpretability Rating Scale or NIIRS. NIIRS is a user-based quantification of image interpretability widely adopted by the Geographic Information System community. Specifically, we present the Compression Degradation Image Function Index (CoDIFI) framework that predicts the NIIRS degradation (i.e., a decrease of NIIRS level) for a given compression setting. The CoDIFI-NIIRS framework enables a user to broker the maximum compression setting while maintaining a specified NIIRS rating.
