A major problem that often occurs in Digital Forensics (DF) is the huge volumes of data that has to be searched, filtered, and indexed to discover patterns that could lead to forensic evidence. The nature of, and the process by which the data gets collected, implies that the data also contain information about persons that are not implicated, or only incidentally involved in the crime under investigation. Privacy is therefore an important issue that needs to be managed in a DF investigation. This paper shows that techniques used in the Team Formation (TF) task can be successfully applied to address both the problems of data volume and privacy. The TF task can be re-formulated to fit the DF arena: to commit a crime, the culprit(s) may require the assistance of several other individuals, which implies that a team of some sort gets established. During a post-mortem DF analysis, an investigator may only have one, or a few names to start with. One of the key challenges is finding possible co-conspirators. From a TF point of view, the culprit is trying to find the best team to commit the crime, given some constraints. The TF task in DF requires the recording of skill-sets, and the generation and/or discovery of a graph depicting interaction between candidates. If the data consist of an email corpus and peoples' roles in an organisation (such as in the Enron data), both of these are readily available. In this paper we consider the TF problem in general and extend it to the DF arena by considering the information that an investigator may have access to during the investigation. We also show that simple information retrieval and keyword extraction techniques (such as RAKE) can be used to automatically discover potential teams from the data, while preserving privacy; results from a series of experiments (using the new definitions of TF and the proposed information retrieval techniques) on the Enron data is then presented.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.