Colorectal cancer (CRC) has been one of the top three disease in the world in terms of incidence for many years. Therefore, how to prevent and treat CRC has become a topic of concern for an increasing number of people, and colonoscopy is the most effective detection method in polyp examination. According to studies, 90% of CRC is caused by adenomatous polyps of the large intestine. In clinical practice, the diversity of polyps' size, number, and shape and the unclear boundary between polyps and colon folds can reduce the operator's accuracy of polyps segmentation and lead to a higher rate of missed diagnosis. To better address the inaccurate segmentation or high miss rate due to the above factors, we propose a text attention‐based CNN‐Transformer network for polyp segmentation (TACT) network to process the images in a way that minimizes operator subjectivity and miss rate. The network is based on the CNN‐Transformer structure, and on this basis, a fully attention progressive sampling module is added to more accurately divide the polyp boundary. Moreover, an auxiliary text classification task was added to focus on polyp size and number features in the form of text attention, which more effectively copes with the segmentation tasks of different sizes and different numbers of polyps. After comparing with multiple state‐of‐the‐art segmentation methods in four challenging datasets, our proposed TACT improves segmentation accuracy for polyps of different sizes in different datasets.