Tracking moving objects in video sequences is an important research problem with many industrial, biomedical, and security applications. Significant progress has been made on this topic in recent decades. However, accurately tracking objects in video sequences under challenging conditions and unexpected events, such as background motion and shadows, objects of different sizes and contrasts, sudden changes in illumination, partial object camouflage, and low signal-to-noise ratio, remains an open problem. To address these difficulties, the authors developed a robust multiscale visual tracker that represents each captured video frame as a set of subbands in the wavelet domain. It then applies N independent particle filters to a small subset of these subbands, where the choice of subset changes with each captured frame. Finally, it fuses the outputs of these N independent particle filters to obtain the final position tracks of multiple moving objects in the video sequence. To demonstrate the robustness of their multiscale visual tracker, the authors applied it to four example videos exhibiting different challenges. Compared to a standard full-resolution particle-filter-based tracker and a tracker based on the single LL2 wavelet subband, the multiscale tracker delivers significantly better tracking performance.
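The fusion idea — N independent particle filters whose outputs are combined into one track — can be illustrated with a minimal 1-D toy sketch. Here the wavelet decomposition is replaced by three hypothetical noisy measurement streams, and the random-walk motion model and simple-average fusion are illustrative assumptions, not the authors' exact design:

```python
import math
import random

def gauss_pdf(x, mu, sigma):
    """Gaussian likelihood of observing x given mean mu and std sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def particle_filter_step(particles, measurement, motion_std=0.5, meas_std=1.0):
    """One predict/update/resample cycle of a 1-D bootstrap particle filter."""
    # Predict: random-walk motion model (an assumption; the paper's model is not given)
    particles = [p + random.gauss(0.0, motion_std) for p in particles]
    # Update: weight each particle by the likelihood of the measurement
    weights = [gauss_pdf(measurement, p, meas_std) for p in particles]
    total = sum(weights) or 1e-300
    weights = [w / total for w in weights]
    # Point estimate: weighted mean of the particle positions
    estimate = sum(p * w for p, w in zip(particles, weights))
    # Resample (multinomial) to avoid weight degeneracy
    particles = random.choices(particles, weights=weights, k=len(particles))
    return particles, estimate

def fused_track(measurement_streams, n_particles=300, steps=40):
    """Run one independent filter per stream and fuse estimates by averaging."""
    filters = [[0.0] * n_particles for _ in measurement_streams]
    fused = []
    for t in range(steps):
        estimates = []
        for i, stream in enumerate(measurement_streams):
            filters[i], est = particle_filter_step(filters[i], stream[t])
            estimates.append(est)
        fused.append(sum(estimates) / len(estimates))  # simple-average fusion
    return fused

random.seed(0)
true_path = [0.2 * t for t in range(40)]  # target drifting slowly to the right
streams = [[x + random.gauss(0.0, 1.0) for x in true_path] for _ in range(3)]
track = fused_track(streams)
print(abs(track[-1] - true_path[-1]))  # fused tracking error at the last frame
```

Averaging the three estimates reduces the variance of the final track relative to any single filter, which is the intuition behind fusing subband-level trackers.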
Technological advances in computational systems have enabled very complex computer vision and machine learning approaches to run efficiently and accurately, providing a new set of tools for reshaping visual SLAM solutions. We present an investigation of recent neuroscientific research that explains how the human brain can accurately navigate and map unknown environments. This accuracy suggests that human navigation is not affected by the drift that traditional visual odometry accumulates when tracking visual features; instead, the brain utilises the geometrical structure of the objects within the navigated space. The identified objects and the geometrical shape of the space anchor the estimated spatial representation and mitigate overall drift. Inspired by the human brain's navigation strategies, this paper presents our efforts to incorporate two machine learning techniques, semantic segmentation and layout estimation, into a VSLAM solution to imitate the human ability to map new environments. The proposed system exploits the geometrical relations between the corner points of cuboid environments to improve the accuracy of trajectory estimation. Moreover, the implemented SLAM solution semantically groups the map points and then tracks each group independently to limit system drift. The implemented solution yields higher trajectory accuracy and greater robustness to large pure rotations.
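The semantic grouping step can be sketched minimally: partition map points by their semantic label and estimate each group's motion independently, so a poorly tracked group cannot drag the whole map. The point representation, the labels, and the per-group mean-displacement drift estimate below are illustrative assumptions, not the paper's implementation:

```python
from collections import defaultdict

def group_map_points(labelled_points):
    """Group 3-D map points by semantic label (hypothetical labels, e.g. 'wall')."""
    groups = defaultdict(list)
    for label, xyz in labelled_points:
        groups[label].append(xyz)
    return dict(groups)

def group_drift(prev_pts, curr_pts):
    """Per-group drift as the mean displacement of matched points.
    A pure translation is assumed here for simplicity."""
    n = len(prev_pts)
    return tuple(sum(c[i] - p[i] for p, c in zip(prev_pts, curr_pts)) / n
                 for i in range(3))

# Two toy frames: the 'wall' group is stable, the 'chair' group has moved.
frame_a = [("wall", (0.0, 0.0, 1.0)), ("wall", (1.0, 0.0, 1.0)),
           ("chair", (2.0, 1.0, 0.5))]
frame_b = [("wall", (0.1, 0.0, 1.0)), ("wall", (1.1, 0.0, 1.0)),
           ("chair", (2.5, 1.5, 0.5))]
ga, gb = group_map_points(frame_a), group_map_points(frame_b)
for label in ga:
    print(label, group_drift(ga[label], gb[label]))
```

Because each group's drift is computed separately, an outlier group (the moved chair) does not contaminate the drift estimate of the stable structural group (the wall), which is the motivation for tracking semantic groups independently.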