Current Mixed Integer Linear Programming (MILP)-based search against symmetric-key primitives with 8-bit S-boxes can only build word-wise model to search for truncated differential characteristics. In such a model, the properties of the Differential Distribution Table (DDT) are not considered. To take these properties into account, a bit-wise model is necessary, which can be generated by the H-representation of the convex hull or the logical condition modeling. However, the complexity of both approaches becomes impractical when the size of the S-box exceeds 5 bits. In this paper, we propose a new modeling for large (8-bit or more) S-boxes. In particular, we first propose an algorithm to generate a bit-wise model of the DDT for large S-boxes. We observe that the problem of generating constraints in logical condition modeling can be converted into the problem of minimizing the product-of-sum of Boolean functions, which is a well-studied problem. Hence, classical off-the-shelf solutions such as the Quine-McCluskey algorithm or the Espresso algorithm can be utilized, which makes building a bit-wise model, for 8-bit or larger S-boxes, practical. Then this model is further extended to search for the best differential characteristic by considering the probabilities of each propagation in the DDT, which is a much harder problem than searching for the lower bound on the number of active S-boxes. Our idea is to separate the DDT into multiple tables for each probability and add conditional constraints to control the behavior of these multiple tables. The proposed modeling is first applied to SKINNY-128 to find that there is no differential characteristic having probability higher than 2−128 for 14 rounds, while the designers originally expected that 15 rounds were required. We also applied the proposed modeling to two, arbitrarily selected, constructions of the seven AES round function based constructions proposed in FSE 2016 and managed to improve the lower bound on the number of the active S-boxes in one construction and the upper bound on the differential characteristic for the other.