Autonomy remains a central challenge in robotic systems, and the robotics research community is currently focused on autonomous decision making. Pre-programmed humanoid robots perform well in known scenarios, but in an unfamiliar environment, without awareness of its surroundings, a humanoid robot fails to perform arbitrary tasks. In such cases, the pre-programmed robot must be reprogrammed before it can operate in the changed environment, and to date only limited success has been achieved in automating the decision-making process. This work addresses the problem of autonomous decision making during real-time interaction. Reinforcement learning (RL) enables the robot to act in such unstructured and unknown environments. RL problems are typically formulated as Markov Decision Processes (MDPs), and the goal of the RL agent is to minimize its immediate and expected future costs. As the agent interacts with the MDP, it passes through a sequence of states linked by transition probabilities, and each action it takes incurs an immediate cost. A reinforcement learning approach, Q-Learning, is implemented for humanoid robot navigation and exploration. The Q-function expresses the expected cost-to-go of a state-action pair, i.e., the expected cost incurred after taking an action in a state and thereafter following the policy. Based on the optimal policy of the reinforcement learning algorithm, a reinforcement controller was implemented; the controller's transition probabilities depend on its randomness, and its random values determine the chosen action. Simulations were carried out for different positions of the proposed model, and an interesting result was observed during the transition from the sitting position to the goal position.
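To make the cost-minimizing Q-Learning formulation above concrete, the following is a minimal sketch of tabular Q-Learning on a hypothetical one-dimensional corridor (this toy environment, its costs, and all parameter values are illustrative assumptions, not the paper's humanoid simulation). Each step incurs a cost of 1 and reaching the goal incurs cost 0, so the agent learns the expected cost-to-go of each state-action pair and the greedy policy minimizes it.

```python
import random

# States 0..4 on a corridor; the agent starts at state 0, the goal is state 4.
# All values below (ALPHA, GAMMA, EPSILON, costs) are illustrative assumptions.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = [-1, +1]                 # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

# Q[(s, a)] approximates the expected cost-to-go of taking a in s.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic transition; returns (next_state, immediate_cost)."""
    s2 = min(max(s + a, 0), GOAL)
    cost = 0.0 if s2 == GOAL else 1.0
    return s2, cost

random.seed(0)
for _ in range(500):               # training episodes
    s = 0
    while s != GOAL:
        # Epsilon-greedy over costs: usually pick the cheapest action.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = min(ACTIONS, key=lambda b: Q[(s, b)])
        s2, c = step(s, a)
        # Q-Learning update for cost minimization:
        # Q(s,a) <- Q(s,a) + alpha * (c + gamma * min_b Q(s',b) - Q(s,a))
        target = c + GAMMA * min(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Greedy (cost-minimizing) policy after training.
policy = {s: min(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(GOAL)}
print(policy)
```

After training, the greedy policy moves right in every non-goal state, i.e., directly toward the goal; the same update rule carries over unchanged when the transitions are stochastic, which is the setting the controller above operates in.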