We address key questions related to the explosion of interest in the emerging fields of big data, analytics, and data science. We discuss the novelty of the fields and whether the underlying questions are fundamentally different, the strengths that the information systems (IS) community brings to this discourse, interesting research questions for IS scholars, the role of predictive and explanatory modeling, and how research in this emerging area should be evaluated for contribution and significance.
The use of the term "Data Science" is becoming increasingly common along with "Big Data." What does Data Science mean? Is there something unique about it? What skills should a "data scientist" possess to be productive in the emerging digital age characterized by a deluge of data? What are the implications for business and for scientific inquiry? In this brief monograph I address these questions from a predictive modeling perspective.
Abstract: The Internet has enabled the era of user-generated content, potentially breaking the hegemony of traditional content generators as the primary sources of "legitimate" information. Prime examples of user-generated content are blogs and social networking sites, which allow easy publishing of and access to information. In this study, we examine the usefulness of such content, drawn from blogs and social networking sites, in predicting sales in the music industry. We track the changes in online chatter for a sample of 108 albums for four weeks before and after their release dates. We use linear and nonlinear regression to assess how significant each online variable, measured on its observation date, is in predicting album unit sales two weeks later. Our findings are as follows: (a) the volume of blog posts about an album is positively correlated with future sales, (b) greater increases in an artist's Myspace friends week over week have a weaker correlation to higher future sales, and (c) traditional factors are still relevant: albums released by major labels and albums with a number of reviews from mainstream sources like Rolling Stone also tended to have higher future sales. More generally, the study provides some preliminary answers for marketing managers interested in assessing the relative importance of the burgeoning number of "Web 2.0" information metrics that are becoming available on the Internet, and in how looking at interactions among them could provide predictive value beyond viewing them in isolation. The study also provides a framework for thinking about when user-generated content influences decision making.
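The abstract above describes regressing future unit sales on online-chatter variables observed two weeks earlier. The following is a minimal sketch of only the linear part of that setup, assuming ordinary least squares and a hypothetical per-week feature layout (blog post count, week-over-week change in Myspace friends). The helper names `lagged_pairs`, `solve`, `fit_ols`, and `predict` are illustrative, and the numbers are made up to show the two-week lag construction; they are not data from the study, and the nonlinear models it mentions are not shown.

```python
from typing import List, Sequence, Tuple


def lagged_pairs(features: Sequence[Sequence[float]],
                 sales: Sequence[float],
                 horizon: int = 2) -> Tuple[List[List[float]], List[float]]:
    """Pair the predictors observed in week t with unit sales in week t + horizon."""
    n = len(sales) - horizon
    X = [list(features[t]) for t in range(n)]
    y = [sales[t + horizon] for t in range(n)]
    return X, y


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """OLS via the normal equations (Z^T Z) beta = Z^T y, where Z = [1 | X].

    Returns [intercept, coef_1, ..., coef_k]."""
    Z = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(ZtZ, Zty)


def predict(beta: List[float], row: Sequence[float]) -> float:
    """Apply fitted coefficients to one week's predictors."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Hypothetical weekly data for one album (NOT from the study):
# each feature row is (blog_posts, myspace_friend_delta).
blog_posts = [10, 20, 15, 30, 25, 40, 35]
friend_delta = [3, 1, 4, 1, 5, 9, 2]
features = list(zip(blog_posts, friend_delta))
# Toy sales built so that sales[t + 2] = 100 + 5*blog[t] + 2*friends[t].
sales = [120, 140, 156, 202, 183, 252, 235]

X, y = lagged_pairs(features, sales, horizon=2)
beta = fit_ols(X, y)
print(beta)  # approximately [100, 5, 2], since the toy data fit exactly
```

In practice one would pool the lagged pairs from many albums before fitting, and a library routine such as NumPy's least-squares solver would replace the hand-written elimination; the pure-Python version is used here only to keep the sketch dependency-free.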