Sketching is a simple and efficient way for humans to express their perceptions of the world. Sketch semantic segmentation plays a key role in sketch understanding and is widely used in sketch recognition, sketch-based image retrieval, and sketch editing. Due to the modality difference between images and sketches, existing image segmentation methods may not perform well on sketches, as they overlook the sparse nature and stroke-based representation of sketches. Meanwhile, existing sketch semantic segmentation methods are mainly designed for single-instance sketches. In this paper, we present a new Stroke-based Sequential-Spatial Neural Network (S3NN) for scene-level free-hand sketch semantic segmentation, which leverages a bidirectional LSTM and a graph convolutional network to capture the sequential and spatial features of sketches. To address the lack of suitable data, we propose the first Scene-level Free-hand Sketch Dataset (SFSD). SFSD comprises 12K sketch-photo pairs over 40 object categories; the sketches are completely hand-drawn, and each contains 7 objects on average. We conduct comparative and ablation experiments on SFSD to evaluate the effectiveness of our method. The experimental results demonstrate that our method outperforms state-of-the-art (SOTA) methods. The code, models, and dataset will be made public after acceptance.
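
The abstract names the two core components of S3NN: a bidirectional LSTM over each stroke's point sequence and a graph convolutional network over inter-stroke relations. The following minimal PyTorch sketch illustrates that sequential-spatial pattern under stated assumptions; the class name `StrokeSequentialSpatialNet`, the layer sizes, and the proximity-based stroke adjacency are hypothetical choices for illustration, not the authors' published implementation.

```python
import torch
import torch.nn as nn

class StrokeSequentialSpatialNet(nn.Module):
    """Illustrative sketch of the sequential-spatial idea: a BiLSTM encodes
    each stroke's point sequence, a simple GCN propagates information over a
    stroke-adjacency graph, and a linear head predicts per-stroke labels.
    All hyperparameters here are assumptions."""

    def __init__(self, point_dim=3, hidden=128, num_classes=40):
        super().__init__()
        # Sequential branch: encode each stroke's (x, y, pen-state) points.
        self.lstm = nn.LSTM(point_dim, hidden, batch_first=True,
                            bidirectional=True)
        # Spatial branch: two graph-convolution weight matrices.
        self.gc1 = nn.Linear(2 * hidden, hidden)
        self.gc2 = nn.Linear(hidden, hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, strokes, adj):
        # strokes: (num_strokes, max_points, point_dim), zero-padded
        # adj:     (num_strokes, num_strokes) binary adjacency, e.g. built
        #          from spatial proximity of stroke bounding boxes (assumed)
        _, (h, _) = self.lstm(strokes)        # h: (2, S, hidden)
        x = torch.cat([h[0], h[1]], dim=-1)   # (S, 2*hidden) stroke embeddings

        # Symmetrically normalized adjacency with self-loops (Kipf & Welling).
        a = adj + torch.eye(adj.size(0))
        d = a.sum(-1).rsqrt()
        a_hat = d.unsqueeze(1) * a * d.unsqueeze(0)

        x = torch.relu(a_hat @ self.gc1(x))   # mix features of nearby strokes
        x = torch.relu(a_hat @ self.gc2(x))
        return self.classifier(x)             # per-stroke class logits


# Toy usage: 5 strokes of up to 20 points each, chain-shaped adjacency.
net = StrokeSequentialSpatialNet()
strokes = torch.randn(5, 20, 3)
adj = torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
logits = net(strokes, adj)                    # shape: (5, 40)
```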