In the field-programmable gate array (FPGA) design flow, one of the most time-consuming steps is the routing of nets, so there is a need to accelerate it. In recent work, Hoo et al. developed a linear programming (LP)-based framework that parallelizes this routing process to achieve significant speed-ups (the resulting algorithm is termed ParaLaR). However, this approach has two weaknesses: the solution can violate the constraints, and a standard routing metric could be improved. We address both issues here. In this paper, we use the LP framework of ParaLaR and solve it using the primal-dual sub-gradient method, which better exploits the problem properties. We also propose a better way to update the step size of this iterative algorithm. We call our algorithm ParaLarPD. We perform experiments on a set of standard benchmarks with two different configurations and show that our algorithm outperforms not just ParaLaR but also the standard existing algorithm VPR. Compared with ParaLaR, we achieve a 20% average improvement in the constraints violation and in the standard metric of minimum channel width (the two are related). Compared with VPR, we obtain an average improvement of 28% in the minimum channel width (there is no constraints violation in VPR). We obtain the same total wire length as ParaLaR, which is on average 49% better than that obtained by VPR; this is the original metric that ParaLaR was proposed to minimize. Finally, we look at a third, easily measurable metric, the critical path delay. On average, ParaLarPD gives a critical path delay that is 2% larger than that of ParaLaR and 3% better than that of VPR. We achieve relative speed-ups of up to seven times when running a parallel version of our algorithm using eight threads as compared to the sequential implementation.
These speed-ups are similar to those obtained by ParaLaR.
... CAD (computer-aided design) tools. This can be achieved in two ways. The first is to parallelize the routing algorithms for hardware with multiple cores. However, the PathFinder algorithm [3], which is one of the most commonly used FPGA routing algorithms, is intrinsically sequential. Hence, this approach seems inappropriate for parallelizing all types of FPGA routing algorithms.
The second is that, instead of compiling the entire design together, users can partition their design, compile the partitions progressively, and then assemble all the partitions to form the entire design. Some existing works have proposed this approach [4,5]. However, the routing resources required by one partition may be held by another partition, i.e., there is no guarantee of balanced partitions. In other words, this approach must tackle the difficulties that arise in sharing routing resources.
The authors of ParaLaR [6] overcome the limitations of existing approaches by formulating the FPGA routing problem as an optimization problem [7]. Here, the objective function is linear an...