In Chinese sentence matching, the key semantics within each sentence and the deep interactions between sentences strongly influence matching performance. However, previous studies have mainly relied on shallow interactions at a single semantic granularity, leaving them vulnerable to interference from overlapping terms and making it particularly difficult to distinguish positive from negative examples in datasets drawn from the same thematic domain. This paper proposes a sentence-matching model that incorporates multi-granularity contextual key semantic interaction. The model combines multi-scale and multi-level convolutions to extract different levels of contextual semantic information at the word, phrase, and sentence granularities, and it employs multi-head self-attention and cross-attention mechanisms to align the key semantics between sentences. Furthermore, the model fuses the original, similarity, and dissimilarity information of the two sentences to establish deep semantic interaction. Experimental results on both open- and closed-domain datasets demonstrate that the proposed model outperforms existing baseline models in matching performance, while achieving effectiveness comparable to large-scale pre-trained language models with only a lightweight encoder.
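
To make the described pipeline concrete, the following is a minimal PyTorch sketch of the two core ideas in the abstract: multi-scale convolution over several granularities, followed by self-/cross-attention alignment and fusion of original, similarity, and dissimilarity features. All module names, dimensions, kernel sizes, and the specific fusion scheme (concatenating the aligned representations with their element-wise product and difference) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a multi-granularity key-semantic interaction block (assumed design).
import torch
import torch.nn as nn


class MultiGranularityEncoder(nn.Module):
    """Extracts word-, phrase-, and sentence-span features with multi-scale 1-D convolutions."""

    def __init__(self, dim: int = 128, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # One convolution per granularity: kernel 1 ~ word, 3 ~ phrase, 5 ~ longer span.
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):  # x: (batch, seq_len, dim)
        h = x.transpose(1, 2)  # Conv1d expects (batch, dim, seq_len)
        feats = [torch.relu(conv(h)).transpose(1, 2) for conv in self.convs]
        return torch.cat(feats, dim=-1)  # (batch, seq_len, dim * len(kernel_sizes))


class KeySemanticInteraction(nn.Module):
    """Aligns two sentences with self- and cross-attention, then fuses the original,
    similarity (element-wise product), and dissimilarity (difference) information."""

    def __init__(self, dim: int = 384, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(4 * dim, dim)

    def forward(self, a, b):  # a, b: (batch, seq_len, dim)
        a_self, _ = self.self_attn(a, a, a)           # key semantics within sentence A
        b_aligned, _ = self.cross_attn(a_self, b, b)  # semantics of B aligned to A
        # Deep interaction: original, similarity, and dissimilarity features.
        fused = torch.cat(
            [a_self, b_aligned, a_self * b_aligned, a_self - b_aligned], dim=-1
        )
        return torch.relu(self.fuse(fused))


if __name__ == "__main__":
    enc = MultiGranularityEncoder(dim=128)
    inter = KeySemanticInteraction(dim=128 * 3)
    a = torch.randn(2, 20, 128)  # stand-in for an embedded sentence A
    b = torch.randn(2, 24, 128)  # stand-in for an embedded sentence B
    print(inter(enc(a), enc(b)).shape)  # torch.Size([2, 20, 384])
```

In practice, the fused representation of each sentence pair would be pooled and passed to a classifier that predicts whether the pair matches; that final head is omitted here for brevity.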