Detecting and understanding temporal expressions are key tasks in natural language processing (NLP), and are important for event detection and information retrieval. In existing approaches, temporal semantics are typically represented as discrete ranges or specific dates, and the task is restricted to text that conforms to this representation. We propose an alternative paradigm, distributed temporal semantics, in which a probability density function models the relative probabilities of the various interpretations. We extend SUTime, a state-of-the-art NLP system, to incorporate our approach, and build definitions of new and existing temporal expressions. We demonstrate the approach with a worked example, the estimation of the creation time of photos in online social networks (OSNs), and briefly discuss how the proposed paradigm relates to the point- and interval-based systems of time. An interactive demonstration, along with source code and datasets, is available online.
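As a minimal sketch of the idea in the abstract above, an expression such as "around Christmas" can be given a density over candidate dates instead of a single date or fixed range. The function names, the choice of a normal distribution, and the three-day standard deviation are illustrative assumptions; they are not taken from the paper or from SUTime's API.

```python
import math
from datetime import date


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def around_density(day, centre, std_days=3.0):
    """Relative likelihood that 'around <centre>' refers to `day`.

    std_days is a hypothetical spread chosen for illustration only.
    """
    offset = (day - centre).days  # signed distance from the centre, in days
    return gaussian_pdf(offset, 0.0, std_days)


centre = date(2013, 12, 25)
on_day = around_density(date(2013, 12, 25), centre)
near = around_density(date(2013, 12, 27), centre)
far = around_density(date(2014, 1, 10), centre)

# Interpretations closer to the centre receive a higher density.
print(on_day > near > far)
```

Under these assumptions every date receives a graded score, so an interpretation two days from the centre is less likely than the centre itself but is not ruled out, as it would be by a hard interval boundary.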
Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, where text is known to be rich in so-called unnatural language. This is the first such documented study, and it also introduces a novel corpus of Facebook photo comments. Although we hoped that the flexibility of the character n-gram approach would be well suited to unnatural language phenomena, we find little improvement over the baseline algorithms employing the word n-gram language model.
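The difference between the two feature types contrasted in the abstract above can be sketched as follows. This shows feature extraction only, not the classifiers evaluated in the study; the helper names and the example strings are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """All overlapping character n-grams of text, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """All overlapping n-grams over whitespace-separated tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def overlap(a, b):
    """Number of n-gram occurrences shared by two feature bags."""
    return sum((Counter(a) & Counter(b)).values())


noisy = "so goood"  # informal spelling of the kind found in OSN text
clean = "so good"

word_shared = overlap(word_ngrams(noisy, 1), word_ngrams(clean, 1))
char_shared = overlap(char_ngrams(noisy, 3), char_ngrams(clean, 3))
print(word_shared, char_shared)
```

With word unigrams the misspelt token shares nothing with its standard form, whereas the character trigrams of the two strings largely coincide. This is the flexibility the abstract refers to, although the study reports that it yields little improvement in practice.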
Computing learners may not master basic concepts, or may forget them between courses or through infrequent use. Learners also often struggle with advanced computing courses, perhaps because of weakness in prerequisite concepts. One underlying challenge for researchers and instructors is determining why a learner gets an advanced question wrong. Was the answer wrong because the learner lacked prerequisite skills, had not mastered the advanced skill, or some combination of the two? We contribute a design investigation into how to create differentiated questions that diagnose prerequisite and advanced skills at the same time. We focused on tracing and related skills as prerequisites, and on advanced object-oriented programming, concurrency, and algorithms and data structures as the advanced skills. We conducted an inductive qualitative analysis of existing assessment questions from instructors and from a concept inventory with a validity argument (the Basic Data Structures Inventory). We found dependencies on a variety of prerequisite knowledge and mixed potential for diagnosing difficulties with prerequisites. Inspired by this analysis, we developed examples