Abstract. In this study we present a new cyclone identification and tracking algorithm, cycloTRACK. The algorithm is iterative: at each time step it identifies all potential cyclone centers, defined as relative vorticity maxima embedded in enclosed contours of smoothed relative vorticity of at least 3 × 10⁻⁵ s⁻¹ at the 850 hPa level. Next, the algorithm finds all potential cyclone paths by linking cyclone centers at consecutive time steps and selects the most probable track by minimizing a cost function based on the average difference in relative vorticity between consecutive track points, weighted by their distance. Last, for each cyclone, the algorithm identifies an "effective area" over which different physical diagnostics are computed, such as the minimum sea level pressure and the maximum wind speed. The algorithm was applied to the ERA-Interim reanalyses to track Northern Hemisphere extratropical cyclones during the winters from 1989 to 2009, and we assessed its sensitivity to the several free parameters used to perform the tracking.
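To make the track-selection step concrete, the sketch below scores candidate tracks with a cost function of the kind described in the abstract: the average vorticity difference between consecutive track points, weighted by their distance. This is not the authors' implementation; the function names, the exact form of the weighting (difference multiplied by distance), and the great-circle helper are illustrative assumptions.

```python
# Minimal sketch, not the cycloTRACK code: select the candidate track that
# minimizes a distance-weighted average vorticity-difference cost.
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def track_cost(track):
    """Average relative-vorticity difference between consecutive points, weighted by distance.

    `track` is a list of dicts with keys 'lat', 'lon', 'vort' (s^-1), one per time step.
    """
    costs = []
    for a, b in zip(track, track[1:]):
        d_vort = abs(b["vort"] - a["vort"])
        dist = great_circle_km(a["lat"], a["lon"], b["lat"], b["lon"])
        costs.append(d_vort * dist)  # assumed weighting: vorticity difference x distance
    return sum(costs) / len(costs)

def most_probable_track(candidate_tracks):
    """Return the candidate path with the minimum cost."""
    return min(candidate_tracks, key=track_cost)
```

In this reading, a smooth track (small vorticity changes between nearby consecutive centers) receives a low cost and is preferred over erratic linkings of the same centers.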
News content analysis is usually preceded by a labour-intensive coding phase, in which experts extract key information from news items. The cost of this phase limits the sample sizes that can be processed, and therefore the kinds of questions that can be addressed. In this paper we describe an approach that incorporates text-analysis technologies to automate some of these tasks, enabling us to analyse data sets many orders of magnitude larger than those normally used. The patterns detected by our method include: (1) similarities in writing style among several outlets, which reflect reader demographics; (2) gender imbalance in media content and its relation to topic; and (3) the relationship between topic and the popularity of articles.
Background: A trend towards automation of scientific research has recently resulted in what has been termed “data-driven inquiry” in various disciplines, including physics and biology. The automation of many tasks has been identified as a possible future also for the humanities and the social sciences, particularly in those disciplines concerned with the analysis of text, owing to the recent availability of millions of books and news articles in digital format. In the social sciences, the analysis of news media is done largely by hand and in a hypothesis-driven fashion: the scholar must formulate a very specific assumption about the patterns that might be in the data, and then set out to verify whether they are present.

Methodology/Principal Findings: In this study, we report what we believe is the first large-scale content analysis of cross-linguistic text in the social sciences, using various artificial intelligence techniques. We analyse 1.3 million news articles in 22 languages and detect a clear structure in the choice of stories covered by the various outlets. This structure is significantly affected by objective national, geographic, economic and cultural relations among outlets and countries; for example, outlets from countries sharing strong economic ties are more likely to cover the same stories. We also show that the deviation from average content is significantly correlated with membership in the eurozone, as well as with the year of accession to the EU.

Conclusions/Significance: While independently making a multitude of small editorial decisions, the leading media of the 27 EU countries, over a period of six months, shaped the contents of the EU mediasphere in a way that reflects its deep geographic, economic and cultural relations. Detecting these subtle signals in a statistically rigorous way would be out of the reach of traditional methods. This analysis demonstrates the power of the available methods for significant automation of media content analysis.
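The kind of relationship reported here, that outlets from closely tied countries tend to cover the same stories, can be illustrated with a small sketch: compare outlets by the similarity of their story-coverage vectors and correlate that similarity with an external country-level variable. This is not the study's pipeline; the outlet names, the toy coverage counts, and the ties matrix are all illustrative assumptions.

```python
# Minimal sketch, assuming toy data: relate coverage similarity between outlet
# pairs to the strength of economic ties between their countries.
import math

def cosine(u, v):
    """Cosine similarity between two story-coverage count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# coverage[outlet] = how many articles the outlet devoted to each story (assumed toy counts)
coverage = {
    "outlet_a": [12, 0, 3, 7],
    "outlet_b": [10, 1, 2, 9],
    "outlet_c": [0, 8, 6, 1],
}
# economic_ties[(a, b)] = assumed strength of trade relations between the outlets' countries
economic_ties = {("outlet_a", "outlet_b"): 0.9,
                 ("outlet_a", "outlet_c"): 0.2,
                 ("outlet_b", "outlet_c"): 0.3}

pairs = list(economic_ties)
similarity = [cosine(coverage[a], coverage[b]) for a, b in pairs]
ties = [economic_ties[p] for p in pairs]
print("correlation of coverage similarity with economic ties:", pearson(similarity, ties))
```

At the scale of the study (hundreds of outlets and hundreds of thousands of story pairs), the same idea would of course require proper significance testing rather than a raw correlation on a handful of pairs.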
We present a method for the classification of multi-labelled text documents, explicitly designed for data stream applications that must process a virtually infinite sequence of data using constant memory and constant processing time. Our method is composed of an online procedure that efficiently maps text into a low-dimensional feature space and a partition of this space into a set of regions for which the system extracts and keeps statistics used to predict multi-label text annotations. Documents are fed into the system as a sequence of words, mapped to a region of the partition, and annotated using the statistics computed from the labelled instances colliding in the same region; we refer to this approach as clashing. We illustrate the method on real-world text data, comparing the results with those obtained using other text classifiers, and we analyse the effect of the dimensionality of the representation space on the predictive performance of the system. Our results show that the online embedding closely approximates the geometry of the full corpus-wise TF and TF-IDF space. The model obtains competitive F measures with respect to the most accurate methods while using significantly fewer computational resources, and it achieves a higher macro-averaged F measure than methods with similar running time. Furthermore, the system learns faster than the other methods from partially labelled streams.
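The sketch below illustrates the general idea in the spirit of the description above: an online, constant-memory annotator that embeds each document into a fixed low-dimensional space, assigns it to a region of a partition of that space, and predicts labels from the statistics of labelled documents that "collided" in the same region. It is not the paper's implementation; feature hashing standing in for the online embedding, the sign-based region key, and the thresholded prediction rule are illustrative assumptions.

```python
# Minimal sketch, not the paper's method: per-region label statistics for
# multi-label annotation of a text stream under constant memory.
from collections import Counter, defaultdict
import hashlib

DIM = 64          # dimensionality of the low-dimensional embedding space
SIGN_BITS = 16    # number of coordinates used to form a region key

def embed(words):
    """Hash a bag of words into a fixed low-dimensional vector (feature hashing)."""
    v = [0.0] * DIM
    for w in words:
        h = int(hashlib.md5(w.encode("utf-8")).hexdigest(), 16)
        v[h % DIM] += -1.0 if (h >> 31) & 1 else 1.0
    return v

def region_key(vec):
    """Map an embedded vector to a region of the partition (sign pattern of the first coordinates)."""
    return tuple(1 if x > 0 else 0 for x in vec[:SIGN_BITS])

class ClashingAnnotator:
    """Keeps per-region label counts; documents colliding in a region share statistics."""

    def __init__(self):
        self.label_counts = defaultdict(Counter)
        self.doc_counts = defaultdict(int)

    def learn(self, words, labels):
        """Update the statistics of the region this labelled document falls into."""
        key = region_key(embed(words))
        self.doc_counts[key] += 1
        self.label_counts[key].update(labels)

    def predict(self, words, threshold=0.5):
        """Return the labels seen in this document's region above a frequency threshold."""
        key = region_key(embed(words))
        n = self.doc_counts[key]
        if n == 0:
            return set()
        return {lab for lab, c in self.label_counts[key].items() if c / n >= threshold}
```

Memory grows only with the number of occupied regions and observed labels, not with the number of documents, which is what makes this style of approach suitable for unbounded streams.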