Binary neural networks (BNNs) are adequate for energy-constrained embedded systems thanks to binarized parameters. Several researchers have proposed the compute-in-memory (CiM) SRAMs for XNOR-and-accumulation computations (XACs) in BNNs by adding additional transistors to the conventional 6T SRAM, which reduce the latency and energy of the data movements. However, due to the additional transistors, the CiM SRAMs suffer from larger area and longer wires than the conventional 6T SRAMs. Meanwhile, monolithic 3D (M3D) integration enables fine-grained 3D integration, reducing the 2D wire length in small functional units. In this paper, we propose a BNN accelerator (BNN_Accel), composed of a 9T CiM SRAM (CiM_SRAM), input buffer, and global periphery logic, to execute the computations in the binarized convolution layers of BNNs. We also propose CiM_SRAM with the subarray-level M3D integration (as well as the transistor-level M3D integration), which reduces the wire latency and energy compared to the 2D planar CiM_SRAM. Across the binarized convolution layers, our simulation results show that BNN_Accel with the 4-layer CiM_SRAM reduces the average execution time and energy by 39.9% and 23.2%, respectively, compared to BNN_Accel with the 2D planar CiM_SRAM.