Two approaches are commonly used for indoor autonomous navigation and obstacle avoidance with monocular vision. Traditional simultaneous localization and mapping (SLAM) requires complex hardware and heavy computation and is prone to errors in low-texture or dynamic environments. Deep-learning algorithms use fully connected layers for classification or regression, which increases the number of model parameters and makes over-fitting more likely. Among the latter, the most advanced indoor navigation algorithm divides a single image frame into multiple parts for prediction, which doubles the inference time. To solve these problems, we propose a multi-task deep network based on feature map region division for monocular indoor autonomous navigation. We divide the feature map instead of the original image to avoid processing the same information repeatedly. To reduce model parameters, we use convolution instead of a fully connected layer to predict the navigable probability of the left, middle, and right parts. The linear velocity is determined by combining the three predicted probabilities, which reduces collision risk. Experimental evaluation shows that the proposed model is nine times smaller than the previous state-of-the-art methods; further, its processing speed and navigation capability improve by factors of more than five and 1.6, respectively.
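The region-division idea can be illustrated with a minimal, dependency-free sketch. It is not the authors' network: the equal-thirds column split, the single-output 1x1 convolution expressed as per-channel `weights`, the global average pooling, and in particular the `linear_velocity` rule are all hypothetical choices made for illustration, since the abstract does not state how the three probabilities are combined.

```python
import math


def split_columns(width):
    """Return (start, end) column ranges for the left, middle, and right regions."""
    third = width // 3
    return [(0, third), (third, width - third), (width - third, width)]


def region_probabilities(feature_map, weights, bias=0.0):
    """Predict a navigable probability for each of the three regions.

    feature_map: nested lists shaped C x H x W (channels, rows, columns).
    weights: one weight per channel, i.e. a 1x1 convolution with one output
             channel (an assumption made for this sketch).
    Returns [p_left, p_middle, p_right].
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    probs = []
    for start, end in split_columns(width):
        total = 0.0
        for y in range(height):
            for x in range(start, end):
                # 1x1 convolution: weighted sum over channels at one position.
                total += sum(weights[c] * feature_map[c][y][x] for c in range(channels))
        # Global average over the region, then a sigmoid to get a probability.
        mean_score = total / (height * (end - start)) + bias
        probs.append(1.0 / (1.0 + math.exp(-mean_score)))
    return probs


def linear_velocity(probs, v_max=1.0):
    """Hypothetical combination rule, not taken from the paper:
    scale v_max by the middle probability, damped by the weaker side."""
    p_left, p_mid, p_right = probs
    return v_max * p_mid * (0.5 + 0.5 * min(p_left, p_right))


# Toy 1-channel, 2x6 feature map: blocked on the left, clear on the right.
toy_map = [[[-10.0, -10.0, 0.0, 0.0, 10.0, 10.0],
            [-10.0, -10.0, 0.0, 0.0, 10.0, 10.0]]]
toy_probs = region_probabilities(toy_map, weights=[1.0])
toy_speed = linear_velocity(toy_probs)
```

Because the split is applied to a feature map computed once, the backbone runs a single time per frame; only the lightweight per-region head is evaluated three times.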
At present, the main approaches to monocular depth estimation for indoor drones are simultaneous localization and mapping (SLAM) algorithms and deep learning algorithms. SLAM requires constructing a depth map of the unknown environment, which is slow to compute and generally requires expensive sensors, whereas current deep learning algorithms are mostly based on binary classification or regression. The output of a binary classification model gives the decision algorithm only coarse control over the unmanned aerial vehicle. A regression model addresses this limitation, but it processes long and short distances in the same way, which degrades short-range prediction performance. To solve these problems, we exploit the strong ordinal correlation of distance values and propose a non-uniform spacing-increasing discretization-based ordinal regression algorithm (NSIDORA) for monocular depth estimation in indoor drone tasks. According to the safety requirements of this task, the distance labels of the data set are discretized into three major areas: the dangerous area, the decision area, and the safety area. The decision area is then discretized using spacing-increasing discretization. To address the inconsistency of ordinal regression, a new distance decoder is introduced. Experimental evaluation shows that the root-mean-square error (RMSE) of NSIDORA in the decision area is 33.5% lower than that of non-uniform discretization (NUD)-based ordinal regression methods. Although its overall RMSE is higher than that of the state-of-the-art two-stream regression algorithm, the RMSE of NSIDORA in the top 10 categories of the decision area is 21.8% lower than that of the two-stream regression algorithm. The inference speed of NSIDORA is 3.4 times faster than that of two-stream ordinal regression. Furthermore, ablation experiments demonstrate the effectiveness of the decoder.
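The three-area labeling can be sketched as follows, using spacing-increasing discretization as it is commonly defined in the depth-estimation literature: bin edges t_i = exp(log(alpha) + i * log(beta / alpha) / k). The names `alpha`, `beta`, and `k`, the label numbering, and the example values are assumptions for illustration; the abstract does not give the actual area boundaries or bin count, and the paper's distance decoder is not reproduced here.

```python
import math


def sid_thresholds(alpha, beta, k):
    """Return k + 1 bin edges between alpha and beta, uniform in log space,
    so that bin widths grow with distance."""
    return [math.exp(math.log(alpha) + i * math.log(beta / alpha) / k)
            for i in range(k + 1)]


def discretize(distance, alpha, beta, k):
    """Map a distance to an ordinal label (numbering is an assumption):
    0          -> dangerous area (closer than alpha)
    1 .. k     -> decision area, spacing-increasing bins
    k + 1      -> safety area (beta or farther)
    """
    if distance < alpha:
        return 0
    if distance >= beta:
        return k + 1
    edges = sid_thresholds(alpha, beta, k)
    for i in range(k):
        if distance < edges[i + 1]:
            return i + 1
    return k  # guard against floating-point rounding at the last edge


# Hypothetical boundaries: decision area from 1 m to 8 m, split into 3 bins.
example_edges = sid_thresholds(1.0, 8.0, 3)   # approximately [1, 2, 4, 8]
example_labels = [discretize(d, 1.0, 8.0, 3) for d in (0.5, 1.5, 3.0, 5.0, 9.0)]
```

Narrow bins at short range give finer resolution where a collision is most likely, which matches the motivation stated above for not treating long and short distances identically.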