Legged robots are a popular research topic in robotics because they can walk on difficult terrain. The aim of the current study was to make a bipedal robot, a type of legged robot, walk. For this purpose, the system was examined and an artificial neural network was designed. The neural network was then trained using the Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) algorithms. The PPO algorithm showed better training performance than the DDPG algorithm. The optimal noise standard deviation for the PPO algorithm was also investigated, and the best results were obtained with a value of 0.50. The system was tested using the artificial neural networks trained with the PPO algorithm with a noise standard deviation of 0.50. The total reward in the test was 274.334, and the walking task was achieved by the proposed structure. As a result, the current study provides a basis for controlling a bipedal robot and for selecting the PPO noise standard deviation.