Two common methods exist for solving indoor autonomous navigation and obstacle-avoidance problems using monocular vision: the traditional simultaneous localization and mapping (SLAM) method, which requires complex hardware, heavy calculations, and is prone to errors in low texture or dynamic environments; and deep-learning algorithms, which use the fully connected layer for classification or regression, resulting in more model parameters and easy over-fitting. Among the latter ones, the most advanced indoor navigation algorithm divides a single image frame into multiple parts for prediction, resulting in doubled reasoning time. To solve these problems, we propose a multi-task deep network based on feature map region division for monocular indoor autonomous navigation. We divide the feature map instead of the original image to avoid repeated information processing. To reduce model parameters, we use convolution instead of the fully connected layer to predict the navigable probability of the left, middle, and right parts. We propose that the linear velocity is determined by combining three prediction probabilities to reduce collision risk. Experimental evaluation shows that the proposed method is nine times smaller than the previous state-of-the-art methods; further, its processing speed and navigation capability increase more than five and 1.6 times, respectively.