The challenge in executing a data science project is more than just identifying the best algorithm and tool set to use. Additional sociotechnical challenges include items such as how to define the project goals and how to ensure the project is effectively managed. This paper reports on a set of case studies where researchers were embedded within data science teams and where the researcher observations and analysis was focused on the attributes that can help describe data science projects and the challenges faced by the teams executing these projects, as opposed to the algorithms and technologies that were used to perform the analytics. Based on our case studies, we identified 14 characteristics that can help describe a data science project. We then used these characteristics to create a model that defines two key dimensions of the project. Finally, by clustering the projects within these two dimensions, we identified four types of data science projects, and based on the type of project, we identified some of the sociotechnical challenges that project teams should expect to encounter when executing data science projects.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.