The bipedal walking robot is an advanced anthropomorphic robot that can mimic the human ability to walk. Controlling a bipedal walking robot is difficult due to its nonlinearity and complexity. To address this problem, recent studies have applied various machine learning algorithms based on reinforcement learning, but most of them rely on deterministic-policy-based strategies. This research proposes Soft Actor Critic (SAC), a stochastic-policy strategy, for controlling the bipedal walking robot. The choice between a deterministic and a stochastic policy affects the exploration behavior of a Deep Reinforcement Learning (DRL) algorithm. SAC is a DRL-based algorithm whose entropy-augmented expected return allows it to learn faster, gain exploration ability, and still ensure convergence. The SAC algorithm's performance is validated by having a bipedal robot walk along a straight-line trajectory. The reward per episode and the cumulative reward during training are used to evaluate the algorithm's performance. The SAC algorithm controls the bipedal walking robot well, with a total reward of 384,752.8.
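
For reference, the entropy-augmented expected return mentioned above is commonly written as the maximum-entropy reinforcement learning objective below. This is a standard formulation rather than one stated in this abstract; here $r(s_t, a_t)$ is the reward, $\rho_\pi$ is the state-action distribution induced by the policy $\pi$, $\mathcal{H}$ is the policy entropy, and $\alpha$ is the temperature coefficient that trades off reward against exploration:

$$
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
$$

Maximizing the entropy term alongside the reward encourages the stochastic policy to keep exploring diverse actions, which is the mechanism behind the faster learning and improved exploration claimed for SAC.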