A phoxonic crystal is a periodic artificial structure that can manipulate optical and acoustic waves at the same time and in the same spatial region. Such crystals have broad application prospects in optical communication, optomechanical sensing, quantum computation, and integrated phoxonic crystal devices. In this paper, we adopt a silicon-based two-dimensional square-lattice structure that exhibits wide phononic and photonic band gaps simultaneously. A periodic rectangular structure is then introduced on the surface, and the effects of the rectangle height and width on the optical and acoustic surface-wave modes are analyzed. Based on the mode-gap effect, a surface heterostructure composed of rectangles with different heights and widths is constructed. Two identical surface heterostructures are then placed face to face with an air slot between them and connected by the silicon substrate on both sides, forming an air-slot heterostructure cavity. Five phononic cavity modes and three photonic cavity modes are obtained, and the acousto-optic coupling rates between the phononic and photonic cavity modes are calculated. The results show that the coupling rate is largest between a phononic and a photonic cavity mode that share the same symmetry and have the maximum overlap, and that the coupling rates between a combination of phononic cavity modes α and β and the photonic cavity modes can be tuned by changing the phase difference φ between modes α and β. All simulations in this paper are performed with the finite element method.