The performance of standard text analytics algorithms is known to degrade substantially on consumer-generated data, which are often very noisy. These algorithms also do not work well on enterprise data, which differ markedly in nature from news repositories, storybooks or Wikipedia data. Text cleaning is a mandatory step that aims at noise removal and correction to improve performance. However, enterprise data need special cleaning methods, since they contain many domain terms that appear to be noise when checked against a standard dictionary but in reality are not. In this work we present a detailed analysis of the characteristics of enterprise data and suggest unsupervised methods for cleaning these repositories after domain terms have been automatically segregated from true noise terms. Noise terms are thereafter corrected in a contextual fashion. The effectiveness of the method is established through careful manual evaluation of error corrections over several standard data sets, including those available for hate speech detection, where text is deliberately distorted to avoid detection. We also share results showing improved classification accuracy after noise correction.
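The two-stage idea described above (first separate domain terms from true noise, then correct the noise terms using context) can be illustrated with a minimal sketch. It is not the authors' method: the frequency threshold `min_count`, the use of `difflib` string similarity to propose candidates, and the bigram count used as a stand-in for context are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter
import difflib


def split_domain_and_noise(tokens, dictionary, min_count=3):
    """Split out-of-dictionary tokens into likely domain terms and likely noise.

    Assumption (not from the source): an unknown token that recurs at least
    `min_count` times in the corpus is treated as a domain term, not a typo.
    """
    counts = Counter(tokens)
    unknown = {t for t in counts if t not in dictionary}
    domain = {t for t in unknown if counts[t] >= min_count}
    noise = unknown - domain
    return domain, noise


def correct_noise(tokens, dictionary, domain, noise, cutoff=0.7):
    """Replace each noise token with a similar known word, preferring the
    candidate that most often follows the previous word in the corpus."""
    vocab = sorted(dictionary | domain)
    bigrams = Counter(zip(tokens, tokens[1:]))
    corrected = []
    for tok in tokens:
        if tok not in noise:
            corrected.append(tok)
            continue
        # Candidates are returned most-similar first.
        candidates = difflib.get_close_matches(tok, vocab, n=5, cutoff=cutoff)
        if not candidates:
            corrected.append(tok)  # nothing close enough: leave unchanged
            continue
        prev = corrected[-1] if corrected else None
        # Ties on the context score fall back to the most similar candidate.
        best = max(candidates, key=lambda c: bigrams[(prev, c)])
        corrected.append(best)
    return corrected


# Toy corpus: "vpn" is a recurring domain term, "tokn" is a one-off typo.
corpus = ("reset the vpn token then reset the vpn tokn "
          "and restart the vpn client vpn").split()
standard_dictionary = {"reset", "the", "token", "then", "and", "restart", "client"}

domain_terms, noise_terms = split_domain_and_noise(corpus, standard_dictionary)
cleaned = correct_noise(corpus, standard_dictionary, domain_terms, noise_terms)
```

In this toy corpus the recurring term "vpn" is kept as a domain term even though it is absent from the dictionary, while the one-off "tokn" is treated as noise and mapped to a nearby known word. A real system would need a far stronger notion of context than adjacent-word counts.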
Richard Swedberg, well known for his work in economic sociology [2] and one of the doyens of the rising movement in analytical sociology [3], offers the reader a pessimistic view of the current state of the art in sociological theorizing. Indeed, compared to empirical methods or to other sciences such as cognitive science, sociological theory has advanced little over the last decades. The prognosis is simple: students and researchers are taught theories but cannot theorize. If this is the case, then what is to be done? What does 'theorizing' mean exactly? In an earlier publication, The Art of Social Theory [4], Swedberg attempted to provide an overview of the state of theorizing in the social sciences and offered practical tips and techniques for initiating theorizing. Here he strikes a second time and deepens the investigation of the topic with the ambitious Theorizing in Social Science: The Context of Discovery. The book, as the subtitle explicitly indicates, focuses on the context in which creativity is what primarily matters when a theory is devised. Grounding his argument upon the work of the forefathers of sociology such as Weber, Popper, and Durkheim, Swedberg punctures the myth of logical and rational thinking, arguing that the process of theorizing is imperfect and one in which creativity, inspiration and intuition play a
[1] The book reviewers are currently enrolled in the Ph.D. program at the