In distributed machine learning training, bulk synchronous parallel (BSP) and asynchronous parallel (ASP) are the two main synchronization methods for gradient aggregation. However, BSP requires longer training time due to the "straggler" problem, while ASP sacrifices accuracy due to the "gradient staleness" problem. In this article, we propose a distributed training paradigm on the parameter server framework, called the adaptive synchronous strategy (A2S), which improves on the BSP and ASP paradigms by adaptively applying different parallel training schemes to workers with different training speeds. Based on the staleness between the fastest and slowest workers, A2S adaptively adds a relaxed synchronous barrier for fast workers to alleviate gradient staleness, where a differentiated gradient-weighting aggregation method reduces the impact of stale gradients. Simultaneously, A2S adopts ASP training for slow workers to eliminate stragglers. Hence, A2S not only mitigates the gradient staleness and straggler problems, but also obtains convergence stability from synchronous training and speed gains from asynchronous training. In particular, we theoretically prove the convergence of A2S by deriving its regret bound. Moreover, experimental results show that A2S improves accuracy by up to 2.64% and accelerates training by up to 41% compared with the state‐of‐the‐art synchronization methods BSP, ASP, stale synchronous parallel (SSP), dynamic SSP, and Sync‐switch.
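To make the adaptive scheduling idea concrete, the following is a minimal simulated sketch of such a policy in Python, assuming a fixed staleness threshold and an exponential staleness weighting; the class and parameter names (`A2SServer`, `staleness_limit`, `push_gradient`) and the specific weighting function are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


class A2SServer:
    """Toy parameter server illustrating an A2S-style adaptive strategy.

    Workers whose progress stays within `staleness_limit` steps of the
    fastest worker form the "fast" group and are aggregated under a
    relaxed synchronous barrier with staleness-dependent weights; the
    remaining "slow" workers update the model asynchronously on arrival.
    The threshold and the exponential weighting are assumed stand-ins
    for the paper's exact formulation.
    """

    def __init__(self, dim, num_workers, staleness_limit=3, lr=0.01):
        self.params = np.zeros(dim)
        self.iters = np.zeros(num_workers, dtype=int)  # per-worker step counts
        self.staleness_limit = staleness_limit
        self.lr = lr
        self.sync_buffer = []                          # (worker_id, grad, staleness)

    def push_gradient(self, worker_id, grad):
        """Called when a worker submits a gradient computed on its local model copy."""
        staleness = int(self.iters.max() - self.iters[worker_id])
        self.iters[worker_id] += 1

        if staleness <= self.staleness_limit:
            # Fast worker: hold the gradient at a relaxed synchronous barrier.
            self.sync_buffer.append((worker_id, grad, staleness))
            fast_count = int((self.iters.max() - self.iters
                              <= self.staleness_limit).sum())
            if len({w for w, _, _ in self.sync_buffer}) >= fast_count:
                self._sync_update()
        else:
            # Slow worker: apply its gradient asynchronously (ASP-style)
            # so it never stalls the fast group.
            self.params -= self.lr * grad
        return self.params

    def _sync_update(self):
        # Differentiated weighting: staler gradients receive smaller weights.
        weights = np.exp(-np.array([s for _, _, s in self.sync_buffer], dtype=float))
        weights /= weights.sum()
        aggregated = sum(w * g for w, (_, g, _) in zip(weights, self.sync_buffer))
        self.params -= self.lr * aggregated
        self.sync_buffer.clear()


if __name__ == "__main__":
    # Two fast workers and one straggler pushing random gradients:
    # the fast pair repeatedly meets the relaxed barrier, while the
    # straggler's late gradient takes the asynchronous path.
    rng = np.random.default_rng(0)
    server = A2SServer(dim=4, num_workers=3, staleness_limit=1)
    for step in range(5):
        for worker in (0, 1):
            server.push_gradient(worker, rng.normal(size=4))
    server.push_gradient(2, rng.normal(size=4))
    print(server.params)
```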