Digital storage now acts as an archive of the memories of users worldwide, keeping record of data as well as the context in which the data was acquired. The massive amount of data available and the fact that it is fragmented across many services (e.g., Facebook) and devices (e.g., laptop) make it very difficult for users to find specific pieces of information that they remember having stored or accessed. Unifying this fragmented data into a single data set that includes contextual information would allow for much better indexing and searching of personal information. Thus, we have developed a personal data extraction tool as a first step toward this vision. In this paper, we present this extraction tool, along with some preliminary statistics about personal data gathered by the tool for several users. The goal of the data analysis is to give a glimpse of what the digital life of a person may look like, and how it is currently partitioned across many different services; moreover, it reinforces the fact that it is not possible for users to manually retrieve, store and access their extensive digital data without the support of a personalized information management tool.
Digital traces of our lives are now constantly produced by various connected devices, internet services and interactions. Our actions result in a multitude of heterogeneous data objects, or traces, kept in various locations in the cloud or on local devices. Users have very few tools to organize, understand, and search the digital traces they produce. We propose a simple but flexible data model to aggregate, organize, and find personal information within a collection of a user's personal digital traces. Our model uses as basic dimensions the six questions: what, when, where, who, why, and how. These natural questions model universal aspects of a personal data collection and serve as unifying features of each personal data object, regardless of its source. We propose indexing and search techniques to aid users in searching for their past information in their unified personal digital data sets using our model. Experiments performed over real user data from a variety of data sources such as Facebook, Dropbox, and Gmail show that our approach significantly improves search accuracy when compared with traditional search tools.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.