The rising number of technologically advanced devices makes network coverage planning a very challenging task for network operators. The transmission quality between the transmitter and the end users must be optimal for any device to perform at its best. In addition, coverage holes are an ongoing issue that operators cannot ignore at any point in the operational stage. Any coverage hole in an operator's coverage region hampers communication applications and damages the reputation of the operator's services. Existing techniques for detecting coverage holes include drive tests and minimisation of drive tests. However, these approaches have many limitations: their high cost, outdated information about the radio environment and high time consumption prevent them from meeting the requirement effectively. To overcome these problems, we use an unmanned aerial vehicle (UAV) and Q-learning to autonomously detect coverage holes in a given area and then deploy a UAV-based base station (UAV-BS), taking into account the wireless backhaul to the core network and user demand. This machine learning mechanism removes the need for a human-in-the-loop (HiTL) model. We then formulate an optimisation problem for 3D UAV-BS placement at various angular positions to maximise the number of users associated with the UAV-BS. In summary, this article presents a cost-effective and time-saving approach to detecting coverage holes and providing on-demand coverage.
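
To make the Q-learning-based detection idea concrete, the following is a minimal illustrative sketch and not the method used in this article. It assumes a hypothetical 4x4 grid of signal-strength values, an assumed threshold of -110 dBm for declaring a coverage hole, and a simple reward of +1 for entering a hole cell with a small step cost otherwise. The names `RSRP`, `THRESHOLD_DBM`, `step`, `explore` and `q_update` are all introduced here for illustration only.

```python
import random


def q_update(q_sa, reward, max_q_next, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update for one (state, action) pair."""
    return q_sa + alpha * (reward + gamma * max_q_next - q_sa)


# Hypothetical signal-strength map in dBm (illustrative values, not measured data).
RSRP = [
    [-80, -85, -90, -95],
    [-82, -88, -120, -97],
    [-84, -118, -122, -99],
    [-86, -90, -96, -100],
]
THRESHOLD_DBM = -110  # assumed cutoff: weaker cells count as coverage holes
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move the UAV one cell, clamping the position to the grid boundary."""
    rows, cols = len(RSRP), len(RSRP[0])
    r = min(max(state[0] + action[0], 0), rows - 1)
    c = min(max(state[1] + action[1], 0), cols - 1)
    return (r, c)


def explore(episodes=200, steps=30, epsilon=0.2, seed=0):
    """Epsilon-greedy Q-learning over the grid; returns detected holes and the Q-table."""
    rng = random.Random(seed)
    q = {}         # maps (state, action_index) -> estimated value
    holes = set()  # cells visited whose signal fell below the threshold
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
            nxt = step(state, ACTIONS[a])
            is_hole = RSRP[nxt[0]][nxt[1]] < THRESHOLD_DBM
            reward = 1.0 if is_hole else -0.01  # assumed reward shaping
            if is_hole:
                holes.add(nxt)
            max_next = max(q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
            q[(state, a)] = q_update(q.get((state, a), 0.0), reward, max_next)
            state = nxt
    return holes, q


if __name__ == "__main__":
    detected, _ = explore()
    print(sorted(detected))
```

In this sketch the agent only records cells it has actually visited, so the detected set is always a subset of the true holes in the assumed map. A real deployment would replace the static grid with live radio measurements and a richer state and reward design.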