Deep learning has become a hot topic in artificial intelligence due to its ability to model complex concepts from simple ones. In this regard, the convolutional neural network (CNN) is one of the most popular kinds of neural networks currently used in computer vision and related areas. In general, the following factors contributed to its popularity. (i) With enough data, most CNNs can be trained from scratch and learn powerful representations that solve the task at stake. (ii) On the other hand, with a limited volume of data, it is possible to also learn powerful representations by adapting the knowledge of a pre-trained CNN model via a transfer learning strategy. As a result, CNNs have advanced the state-of-the-art in many visual recognition tasks, leading to numerous applications in various fields outside of computer science, such as medicine and biology. Nevertheless, many of the best research efforts are focused on improving the state-of-the-art on a few datasets, such as ImageNet for image classification and COCO for object detection. On the other hand, research progress in many other domains is reduced to blindly applying existing approaches or re-inventing everything from scratch, resulting in the development of flawed methods in both cases. Therefore, this thesis focuses on understanding through systematic experiments why and when a pre-trained CNN model underperforms on a given task, to propose suitable solutions. In the first part of our study, we examined the task of texture recognition and discovered that all previous studies tended to focus exclusively on category-based texture datasets, leading to the misconception that only the deepest layers had the texture information needed to solve that task. We then show, by proposing multilayer transfer learning strategies, that the contribution of shallow layers is not trivial and should be used in certain applications. In the second part of our study, we focus on challenging object detection tasks (pollen grain detection and stomata localization), where we observe a situation similar to that of texture recognition. Therefore, in both cases, we also applied multilayer analysis to propose fast single-stage detectors that can handle large images accurately and efficiently.