We propose a phoxonic cavity with structural hierarchy to enhance acousto-optic interaction in acoustically dissipative media. In a conventional phoxonic cavity, the interaction between infrared light and hypersound of the same wavelength scale becomes weak because of large acoustic attenuation, whose coefficient is proportional to the square of the frequency. Alleviating this attenuation requires lower-frequency sound with a wavelength much longer than that of the infrared light, but a conventional phoxonic cavity is not suited to confining such hypersound and infrared light simultaneously. In this study, we introduce the concept of structural hierarchy into the phoxonic cavity to control infrared light and hypersound at different wavelength scales. A phoxonic cavity with two different scales achieves an acousto-optic interaction approximately 1.6 times that of the conventional one. To further enhance the interaction, we adjust the geometrical configuration and material properties of the two-scale phoxonic cavity using quasi-static homogenization theory, which yields an interaction about 2.1 times that of the conventional cavity.
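
The trade-off behind the two-scale design can be illustrated with a short scaling sketch. It assumes only the frequency-squared attenuation law stated above. The prefactor $\alpha_0$, the sound speed $c$ (taken as non-dispersive), and the frequency-reduction factor $N$ are illustrative symbols introduced here, not quantities from the study.

```latex
% Assumed attenuation law: coefficient proportional to frequency squared
\alpha(f) = \alpha_0 f^{2}

% Lowering the acoustic frequency by a factor N > 1:
f_2 = \frac{f_1}{N}
\quad\Longrightarrow\quad
\frac{\alpha(f_2)}{\alpha(f_1)} = \frac{1}{N^{2}}

% The acoustic wavelength grows by the same factor N (non-dispersive c assumed):
\lambda_2 = \frac{c}{f_2} = N\,\lambda_1
```

Under these assumptions, a tenfold reduction in acoustic frequency lowers the attenuation coefficient a hundredfold but makes the acoustic wavelength ten times longer than the optical one, which is the scale mismatch the hierarchical cavity is meant to accommodate.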