An agent pursuing a task may work with a corpus of documents as a reference library. Subjective content descriptions (SCDs) provide additional data that add value in the context of the agent’s task. In the pursuit of documents to add to the corpus, an agent may come across new documents where content text and SCDs from another agent are interleaved and no distinction can be made unless the agent knows the content from somewhere else. Therefore, this paper presents a hidden Markov model-based approach to identify SCDs in a new document where SCDs occur inline among content text. Additionally, we present a dictionary selection approach to identify suitable translations for content text and SCDs based on n-grams. We end with a case study evaluating both approaches based on simulated and real-world data.
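To illustrate how a hidden Markov model can separate inline SCDs from content text, the following minimal sketch runs Viterbi decoding over a two-state model. It is not the authors' implementation. The state names, the discretized observation symbols ("high" or "low" similarity of a text window to the known corpus), and all probabilities are assumptions chosen for illustration only.

```python
import math

# Hypothetical hidden states: a text window is either content or an inline SCD.
STATES = ("content", "scd")

# Illustrative parameters (assumed, not taken from the paper).
START_P = {"content": 0.8, "scd": 0.2}
TRANS_P = {
    "content": {"content": 0.8, "scd": 0.2},
    "scd": {"content": 0.3, "scd": 0.7},
}
# Observation symbol: similarity of a window to the known corpus content.
EMIT_P = {
    "content": {"high": 0.9, "low": 0.1},
    "scd": {"high": 0.2, "low": 0.8},
}


def viterbi(observations, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence (log-space Viterbi)."""
    if not observations:
        return []
    # Best log-probability of any path ending in each state at the first step.
    scores = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(STATES, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


windows = ["high", "high", "low", "low", "high"]
labels = viterbi(windows, START_P, TRANS_P, EMIT_P)
print(list(zip(windows, labels)))
```

In this toy setting, windows that resemble known content are labeled as content and dissimilar runs are labeled as SCDs. In practice the transition and emission probabilities would have to be estimated from documents whose SCD positions are known.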
An agent in pursuit of a task may work with a reference library containing documents associated with additional data that provide location-specific explanations of the content. Faced with a new document, an agent has to decide whether to include it in its reference library. Basing the decision on words, topics, or entities has been shown not to lead to a balanced performance for varying documents. In this paper, we present an approach for automatically enriching new documents with data associated with documents in a reference library. Additionally, we analyze these data to classify new documents into categories to help an agent decide whether to include the new document in its reference library.