In this article we describe our experiences with computational text analysis involving rich social and cultural concepts. We hope to achieve three primary goals. First, we aim to shed light on thorny issues not always at the forefront of discussions about computational text analysis methods. Second, we hope to provide a set of key questions that can guide work in this area. Our guidance is based on our own experiences and is therefore inherently imperfect. Still, given our diversity of disciplinary backgrounds and research practices, we hope to capture a range of ideas and identify commonalities that resonate for many. This leads to our final goal: to help promote interdisciplinary collaborations. Interdisciplinary insights and partnerships are essential for realizing the full potential of any computational text analysis involving social and cultural concepts, and the more we bridge these divides, the more fruitful we believe our work will be.
In 2010, Wikileaks sprang to international prominence when it released two tranches of war logs documenting American military action in Afghanistan and Iraq. This was the first time data had been made available which documented the scale of American military action in those countries and the numbers of civilian casualties. The Afghan war logs comprised 91,000 military records, while the Iraqi files were even larger, containing 391,000 records.
The huge number of documents posed great problems for the journalists from The Guardian and New York Times who worked on them. One journalist said that the experience was 'like panning for tiny grains of gold in a mountain of data'. 1 The Afghan logs were initially loaded into Microsoft Excel. One of the Guardian journalists recalled that:
When I first got access to the database, it felt like being a kid in a candy shop. My first impulse was to search for 'Osama bin Laden', the man who had started the war. Several of us furiously inputted the name to see what it would produce (not much as it turned out). 2
However, Excel had serious limitations. After a while, it was realised that the spreadsheet had automatically truncated the import of the Afghan war logs after 66,000 records, so that a third of the records were missing from the journalists' initial searches. 3 A different approach was needed. Alastair Dant, the Guardian's data visualiser, explained that he could create a bespoke interactive visual display of the statistics. He used as a template an interactive map of the Glastonbury music festival previously produced by The Guardian. 4 This visualisation enabled journalists to follow day by day and year by year the struggle of the US Army to deal with thousands of improvised explosive devices in Afghanistan. It showed how ordinary civilians were the principal victims of these devices and vividly illustrated the ebb and flow of these incidents in response to political developments.
For the first time, accurate statistics of the death toll in Iraq could be produced. In addition to 3,771 dead US and allied soldiers, the war logs recorded 109,032 deaths of civilians, members of the Iraqi security forces and people classed as 'enemy'. 5
The way in which these journalists worked with this first tranche of Wikileaks material anticipated the methods that historians will in future need to adopt as they deal increasingly with born-digital historical records. The whole episode also served to illustrate the importance of who has access to what data. These logs were only available to journalists because they were leaked, an action which circumvented existing legal and national security frameworks. The stakes are not always so high, but barriers to and inequalities of access affect researchers working with all kinds of born-digital materials, and shape both the kinds of analysis that can be undertaken and the types of people whose voices and stories may be represented. This is apparent from the over-representation of Twitter in social media studies, for example. Many more people use Facebook ...