Deep learning methods are frequently used in segmenting histopathology images with high-quality annotations nowadays. Compared with well-annotated data, coarse, scribbling-like labelling is more cost-effective and easier to obtain in clinical practice. The coarse annotations provide limited supervision, so employing them directly for segmentation network training remains challenging. We present a sketch-supervised method, called DCTGN-CAM, based on a dual CNN-Transformer network and a modified global normalised class activation map. By modelling global and local tumour features simultaneously, the dual CNN-Transformer network produces accurate patchbased tumour classification probabilities by training only on lightly annotated data. With the global normalised class activation map, more descriptive gradient-based representations of the histopathology images can be obtained, and inference of tumour segmentation can be performed with high accuracy. Additionally, we collect a private skin cancer dataset named BSS, which contains fine and coarse annotations for three types of cancer. To facilitate reproducible performance comparison, experts are also invited to label coarse annotations on the public liver cancer dataset PAIP2019. On the BSS dataset, our DCTGN-CAM segmentation outperforms the state-of-the-art methods and achieves 76.68 % IOU and 86.69 % Dice scores on the sketch-based tumour segmentation task. On the PAIP2019 dataset, our method achieves a Dice gain of 8.37 % compared with U-Net as the baseline network. The dataset, annotation and code will be published at https://github.com/skdarkless/DCTGN-CAM.