In this work we propose an efficient branch-and-bound (B&B) algorithm for the permutation flowshop problem (PFSP) with makespan objective. We present a new node decomposition scheme that combines dynamic branching and lower bound refinement strategies in a computationally efficient way. To alleviate the computational burden of the two-machine bound used in the refinement stage, we propose an online learning-inspired mechanism to predict promising pairs of bottleneck machines. The algorithm offers multiple choices for branching and bounding operators and can explore the search tree either sequentially or in parallel on multi-core CPUs. To determine empirically the most efficient combination of these components, we perform a series of computational experiments with 600 benchmark instances. A main insight is that the problem size, as well as interactions between branching and bounding operators, substantially modifies the trade-off between the computational requirements of a lower bound and the achieved tree size reduction. Moreover, we demonstrate that parallel tree search is a key ingredient for the resolution of large problem instances, as strong super-linear speedups can be observed. An overall evaluation using two well-known benchmarks indicates that the proposed approach is superior to previously published B&B algorithms. For the first benchmark we report the exact resolution, in less than 20 minutes, of two instances defined by 500 jobs and 20 machines that remained open for more than 25 years, and for the second a total of 89 improved best-known upper bounds, including proofs of optimality for 74 of them.

In contrast, exact methods find optimal solutions together with a proof of optimality, but their execution time is unpredictable and exponential in the worst case. Branch-and-Bound (B&B) is the most frequently used exact method to solve combinatorial optimization problems like the PFSP.
The algorithm recursively decomposes the initial problem by dynamically constructing and exploring a search tree, whose root node represents the initial problem, whose leaf nodes are candidate solutions and whose internal nodes are subproblems of the initial problem. This is done using four operators: branching, bounding, selection and pruning. The branching operator divides a problem into smaller disjoint subproblems, and a bounding function computes lower bounds on the optimal cost of a subproblem. The pruning operator eliminates subproblems whose lower bound exceeds the cost of the best solution found so far (an upper bound on the optimal makespan). The tree traversal is guided by the selection operator, which returns the next subproblem to be processed according to a search strategy (e.g. depth-first search).

This paper focuses on three performance-critical components of the algorithm: the lower bound (LB), the branching rule and the use of parallel tree exploration. Although they can be separated on a conceptual level, the main objective of this article is to reveal interactions between these compone...
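
The interplay of the four operators described above (branching, bounding, selection and pruning) can be illustrated with a minimal sequential sketch. It is not the algorithm proposed in this paper. It assumes forward branching (jobs are appended to a partial schedule), depth-first selection, and a simple one-machine lower bound instead of the two-machine bound. All function names and the small instance in the usage note are hypothetical and chosen only for illustration.

```python
def completion_times(prefix, p, m):
    """Completion time of the partial schedule `prefix` on each machine."""
    c = [0] * m
    for j in prefix:
        c[0] += p[j][0]
        for k in range(1, m):
            # Job j starts on machine k once it has left machine k-1
            # and machine k has finished the previous job.
            c[k] = max(c[k], c[k - 1]) + p[j][k]
    return c


def lower_bound(front, remaining, p, m):
    """Simple one-machine bound (an assumption of this sketch).

    For machine k: time at which k becomes free, plus the work of all
    unscheduled jobs on k, plus the shortest remaining route through
    machines k+1..m-1 among the unscheduled jobs.
    """
    best = 0
    for k in range(m):
        load = sum(p[j][k] for j in remaining)
        tail = min(sum(p[j][k + 1:]) for j in remaining)
        best = max(best, front[k] + load + tail)
    return best


def branch_and_bound(p):
    """Return (optimal makespan, one optimal permutation) for p[job][machine]."""
    n, m = len(p), len(p[0])
    best_cost, best_perm = float("inf"), []
    stack = [((), tuple(range(n)))]          # root node: empty schedule
    while stack:
        prefix, remaining = stack.pop()      # selection: depth-first (LIFO)
        front = completion_times(prefix, p, m)
        if not remaining:                    # leaf: complete permutation
            if front[-1] < best_cost:
                best_cost, best_perm = front[-1], list(prefix)
            continue
        # Bounding and pruning. Using >= also discards nodes that can only
        # tie the incumbent, which is enough to find one optimal solution.
        if lower_bound(front, remaining, p, m) >= best_cost:
            continue
        for j in remaining:                  # branching: one child per job
            rest = tuple(r for r in remaining if r != j)
            stack.append((prefix + (j,), rest))
    return best_cost, best_perm
```

For example, `branch_and_bound([[3, 2, 4], [1, 5, 2], [4, 1, 3], [2, 3, 1]])` returns the optimal makespan of a hypothetical instance with 4 jobs and 3 machines together with one permutation that attains it. The sketch recomputes completion times from scratch at every node for brevity; an efficient implementation would update them incrementally.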