Abstract. In this paper, we investigate how discourse context in the form of short-term memory can be exploited to automatically group consecutive strokes in digital freehand sketching. With this machine learning approach, no database of explicit object representations is used for template matching on a complete scene; instead, grouping decisions are based on limited spatio-temporal context. We employ two different classifier formalisms for this time series analysis task, namely Echo State Networks (ESNs) and Support Vector Machines (SVMs). ESNs are internal-state classifiers with inherent memory capabilities. For the conventional static SVM, short-term memory is supplied externally by expanding each input into a fixed-length feature vector that also covers preceding strokes. We compare the respective setup heuristics and conduct experiments on two example problems. Promising results are achieved with both formalisms. Yet, our experiments indicate that using ESNs for variable-length memory tasks reduces the risk of overfitting caused by non-expressive features or improperly determined temporal embedding dimensions.
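
To make the two ways of supplying short-term memory concrete, the following is a minimal sketch and not the setup used in the experiments. It assumes each stroke is already described by a small numeric feature vector. The names `expand_with_history` and `esn_states`, the reservoir size, and the weight ranges are all hypothetical choices made for illustration. The first function builds the fixed-length expanded input that a static classifier such as an SVM would receive; the second runs a small random reservoir whose state carries information about earlier strokes without a fixed window.

```python
import math
import random


def expand_with_history(features, depth):
    """Fixed-length expansion: append the features of the `depth` previous
    strokes to each stroke's own features (zero-padded at the start)."""
    dim = len(features[0])
    expanded = []
    for t in range(len(features)):
        row = []
        for lag in range(depth + 1):
            idx = t - lag
            row.extend(features[idx] if idx >= 0 else [0.0] * dim)
        expanded.append(row)
    return expanded


def esn_states(features, n_reservoir=20, input_scale=0.5,
               recurrent_scale=0.04, seed=0):
    """Reservoir update x(t) = tanh(W_in u(t) + W x(t-1)) with random,
    untrained weights. With the default values, every row of W has an
    absolute sum of at most 20 * 0.04 = 0.8, so the recurrence is
    contractive and the influence of old inputs fades over time.
    These weight ranges are illustrative assumptions only."""
    rng = random.Random(seed)
    dim = len(features[0])
    w_in = [[rng.uniform(-input_scale, input_scale) for _ in range(dim)]
            for _ in range(n_reservoir)]
    w = [[rng.uniform(-recurrent_scale, recurrent_scale)
          for _ in range(n_reservoir)]
         for _ in range(n_reservoir)]
    state = [0.0] * n_reservoir
    states = []
    for u in features:
        state = [
            math.tanh(
                sum(w_in[i][j] * u[j] for j in range(dim))
                + sum(w[i][k] * state[k] for k in range(n_reservoir))
            )
            for i in range(n_reservoir)
        ]
        states.append(state)
    return states


# Three strokes with two hypothetical features each.
strokes = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
svm_inputs = expand_with_history(strokes, depth=1)  # 4 values per stroke
esn_inputs = esn_states(strokes)                    # 20 values per stroke
```

In this sketch the memory depth of the expanded vectors must be chosen in advance through `depth`, whereas the reservoir states depend on the entire preceding sequence; in either case a trained readout or classifier would still be needed to turn these representations into grouping decisions.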