A desirable local feature descriptor is distinctive, compact, and fast to compute and match. For this reason, many computer vision applications use binary keypoint descriptors instead of richer floating-point ones. This paper proposes an optimisation approach to the design of a binary descriptor in which each detected keypoint is described using several scale-dependent patches. Each patch is divided into disjoint blocks of pixels, and binary tests between the blocks' intensities, as well as between their gradients, produce the binary string. Since the number of image patches and their relative sizes influence the descriptor creation pipeline, a simulated annealing algorithm is used to determine them, optimising the recall and precision of keypoint matching. Simulated annealing is also used to reduce the dimensionality of long binary strings. The proposed approach is extensively evaluated and compared with SIFT, SURF and BRIEF on public benchmarks. The results show that the binary descriptor created with the resulting pipeline is faster to compute and performs comparably to or better than the state-of-the-art descriptors under different image transformations.
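To make the block-based binary tests concrete, the following is a minimal sketch for a single patch, not the pipeline proposed in the paper. It rests on several assumptions that the abstract does not state: the patch is split into a regular 3 x 3 grid, every unordered pair of blocks is compared, gradients are taken with `np.gradient`, and a test yields 1 when the first block mean is strictly greater. The function names `block_means`, `binary_tests`, `describe` and `hamming` are hypothetical.

```python
import itertools

import numpy as np


def block_means(channel, grid):
    """Mean value of each block when a 2-D array is split into grid x grid disjoint blocks."""
    h, w = channel.shape
    bh, bw = h // grid, w // grid
    # Crop so the array divides evenly, then average inside each block.
    cropped = channel[:bh * grid, :bw * grid].astype(np.float64)
    return cropped.reshape(grid, bh, grid, bw).mean(axis=(1, 3)).ravel()


def binary_tests(values):
    """One bit per unordered pair (i, j) with i < j: 1 if values[i] > values[j], else 0."""
    pairs = itertools.combinations(range(len(values)), 2)
    return np.array([values[i] > values[j] for i, j in pairs], dtype=np.uint8)


def describe(patch, grid=3):
    """Concatenate binary tests on block intensities and on block gradients (assumed layout)."""
    # np.gradient returns derivatives along axis 0 (rows) and axis 1 (columns).
    gy, gx = np.gradient(patch.astype(np.float64))
    parts = [binary_tests(block_means(c, grid)) for c in (patch, gx, gy)]
    return np.concatenate(parts)


def hamming(a, b):
    """Number of differing bits between two equal-length binary descriptors."""
    return int(np.count_nonzero(a != b))


# Usage on a synthetic 6 x 6 patch and its 180-degree rotation.
patch = np.arange(36).reshape(6, 6)
d1 = describe(patch)
d2 = describe(patch[::-1, ::-1])
distance = hamming(d1, d2)
```

With a 3 x 3 grid there are 9 blocks and therefore 36 pairwise tests per channel, so this sketch produces a 108-bit string per patch. In a multi-patch design such as the one described above, the strings from several scale-dependent patches would be concatenated, which is why the number of patches and their relative sizes directly affect descriptor length and matching behaviour.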