Newton-Raphson DC analysis of large-scale nonlinear circuits can be extremely time-consuming, even when sparse matrix techniques and bypassing of nonlinear model evaluation are used. On multi-core, multithreaded computers, a slight reduction in solution time can be obtained by running the evaluation of the nonlinear element models and the stamping of the sparse matrix entries as concurrent processes. This paper shows how the numerical complexity of the problem, and thus its solution time, can be reduced further by decomposing the circuit and solving the resulting blocks in parallel, taking the Bordered-Block Diagonal (BBD) matrix structure as the starting point. The BBD-parallel approach can yield a considerable speedup, although the gain depends strongly on the circuit topology. The paper presents the theoretical foundation of the algorithm, its implementation, and an analysis of its numerical complexity based on practical measurements of matrix operations.
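
For orientation, the BBD structure mentioned above can be illustrated with the generic textbook form of one linearized Newton-Raphson step for a circuit split into $k$ subcircuits. The symbols $A_i$, $B_i$, $C_i$, $D$, $x_i$, $x_c$, $b_i$ and $b_c$ are notation assumed for this sketch only; they are not taken from the paper, whose formulation may differ.

```latex
% Assumed notation: A_i = internal block of subcircuit i,
% B_i, C_i = coupling (border) blocks, D = interconnection block,
% x_i = internal unknowns of subcircuit i, x_c = border unknowns.
\[
\begin{bmatrix}
A_1 &        &     & B_1    \\
    & \ddots &     & \vdots \\
    &        & A_k & B_k    \\
C_1 & \cdots & C_k & D
\end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_k \\ x_c \end{bmatrix}
=
\begin{bmatrix} b_1 \\ \vdots \\ b_k \\ b_c \end{bmatrix}
\]
% Eliminating the internal unknowns x_i gives a reduced system
% for the border unknowns x_c:
\[
\Bigl( D - \sum_{i=1}^{k} C_i A_i^{-1} B_i \Bigr)\, x_c
  = b_c - \sum_{i=1}^{k} C_i A_i^{-1} b_i
\]
% Back-substitution, one block at a time:
\[
x_i = A_i^{-1} \bigl( b_i - B_i x_c \bigr), \qquad i = 1, \dots, k
\]
```

In this form, the terms $C_i A_i^{-1} B_i$ and $C_i A_i^{-1} b_i$ involve only block $i$, so they can be computed independently for each subcircuit, as can the back-substitutions. The reduced system for $x_c$ couples all blocks and its size equals the number of border unknowns, which is one way to see why the achievable gain depends on the circuit topology.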