Fruit detection in orchards is one of the most crucial tasks for designing the visual system of an automated harvesting robot. It is the first and foremost tool employed for tasks such as sorting, grading, harvesting, disease control, and yield estimation, etc. Efficient visual systems are crucial for designing an automated robot. However, conventional fruit detection methods always a trade-off with accuracy, real-time response, and extensibility. Therefore, an improved method is proposed based on coarse-to-fine multitask cascaded convolutional networks (MTCNN) with three aspects to enable the practical application. First, the architecture of Fruit-MTCNN was improved to increase its power to discriminate between objects and their backgrounds. Then, with a few manual labels and operations, synthetic images and labels were generated to increase the diversity and the number of image samples. Further, through the online hard example mining (OHEM) strategy during training, the detector retrained hard examples. Finally, the improved detector was tested for its performance that proved superior in predicted accuracy and retaining good performances on portability with the low time cost. Based on performance, it was concluded that the detector could be applied practically in the actual orchard environment.