In this paper, we propose a novel semi-supervised approach for detecting profanity-related offensive content in Twitter. Our approach exploits linguistic regularities in profane language via statistical topic modeling on a huge Twitter corpus, and detects offensive tweets using these automatically generated features. Our approach performs competitively with a variety of machine learning (ML) algorithms. For instance, our approach achieves a true positive rate (TP) of 75.1% over 4029 testing tweets using Logistic Regression, significantly outperforming the popular keyword matching baseline, which has a TP of 69.7%, while keeping the false positive rate (FP) at the same level as the baseline at about 3.77%. Our approach provides an alternative to large scale hand annotation efforts required by fully supervised learning approaches.
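The comparison above rests on two metrics: the true positive rate and the false positive rate. As a minimal sketch of how they might be computed for a keyword matching baseline, consider the following. The lexicon, the tokenization, and the function names `keyword_baseline` and `tp_fp_rates` are illustrative assumptions, not the authors' implementation or data.

```python
def keyword_baseline(tweet, lexicon):
    """Flag a tweet as offensive if any token appears in the lexicon.

    Assumption: whitespace tokenization with basic punctuation stripping;
    the paper's actual baseline may tokenize differently.
    """
    tokens = tweet.lower().split()
    return any(tok.strip(".,!?") in lexicon for tok in tokens)


def tp_fp_rates(predictions, labels):
    """Return (true positive rate, false positive rate).

    TP rate = flagged offensive tweets / all offensive tweets.
    FP rate = flagged clean tweets / all clean tweets.
    """
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


if __name__ == "__main__":
    # Toy data with a placeholder lexicon; purely illustrative.
    lexicon = {"darn", "heck"}
    tweets = ["What the heck is this", "Lovely day", "this is h3ck"]
    labels = [True, False, True]
    preds = [keyword_baseline(t, lexicon) for t in tweets]
    print(tp_fp_rates(preds, labels))
```

The obfuscated spelling in the last toy tweet is missed by exact matching, which is the kind of gap that learned features are meant to close.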
The BitTorrent (BT) file sharing protocol is popular due to its scalability and its incentive mechanism, which reduces free-riding. However, in designing such P2P file sharing protocols, there is a fundamental "tussle" between keeping peers, especially the more resourceful ones, in the system for as long as possible to help the system achieve better performance, and allowing peers to finish their downloads as quickly as possible. The current BT protocol represents only "one" possible implementation in this whole design spectrum. In this paper, we characterize the "complete" design space of BT-like protocols. We use a fairness index that incorporates the contributions peers make to measure fairness. We show that there is a wide range of design choices, ranging from optimizing the performance of file download to optimizing the fairness measure. More importantly, we show that there is a simple and easily implementable design knob that can be used to choose a particular operating point in the design space. We then discuss different algorithms (centralized versus distributed) for realizing the design knob. We also carry out a performance evaluation to quantify the merits and properties of the BT-like file sharing protocols.
Several emerging network trends and new architectural ideas are placing increasing demand on forwarding table sizes. From massive-scale datacenter networks running millions of virtual machines to flow-based software-defined networking, many intriguing design options require FIBs that can scale well beyond the thousands or tens of thousands possible using today's commodity switching chips. This paper presents CUCKOOSWITCH, a software-based Ethernet switch design built around a memory-efficient, high-performance, and highly concurrent hash table for compact and fast FIB lookup. We show that CUCKOOSWITCH can process 92.22 million minimum-sized packets per second on a commodity server equipped with eight 10 Gbps Ethernet interfaces while maintaining a forwarding table of one billion forwarding entries. This rate is the maximum packets per second achievable across the underlying hardware's PCI buses.
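To illustrate the general idea behind a cuckoo-hashed FIB, the following is a minimal textbook sketch of two-choice cuckoo hashing that maps MAC address strings to output ports. It is not the CUCKOOSWITCH design: the class name `CuckooTable`, the single-slot buckets, the BLAKE2b-derived hash positions, and the grow-on-failure policy are all assumptions made for illustration, and the sketch makes no attempt at concurrency or memory efficiency.

```python
import hashlib


class CuckooTable:
    """Two-choice cuckoo hash table with one entry per slot (illustrative)."""

    def __init__(self, capacity=16, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.slots = [None] * capacity  # each slot holds (key, value) or None

    def _positions(self, key):
        # Derive two candidate slots from one 128-bit digest.
        digest = hashlib.blake2b(key.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big") % self.capacity
        h2 = int.from_bytes(digest[8:], "big") % self.capacity
        return h1, h2

    def lookup(self, key):
        # A key can only live in one of its two candidate slots,
        # so a lookup inspects at most two locations.
        for pos in self._positions(key):
            entry = self.slots[pos]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value):
        h1, h2 = self._positions(key)
        for pos in (h1, h2):  # overwrite an existing entry for this key
            entry = self.slots[pos]
            if entry is not None and entry[0] == key:
                self.slots[pos] = (key, value)
                return
        for pos in (h1, h2):  # otherwise take a free candidate slot
            if self.slots[pos] is None:
                self.slots[pos] = (key, value)
                return
        # Both slots are occupied: displace entries along a cuckoo path.
        pos, item = h1, (key, value)
        for _ in range(self.max_kicks):
            item, self.slots[pos] = self.slots[pos], item
            a, b = self._positions(item[0])
            pos = b if pos == a else a  # the displaced entry's other slot
            if self.slots[pos] is None:
                self.slots[pos] = item
                return
        self._grow(item)  # path too long: enlarge and reinsert everything

    def _grow(self, pending):
        entries = [e for e in self.slots if e is not None] + [pending]
        self.capacity *= 2
        self.slots = [None] * self.capacity
        for k, v in entries:
            self.insert(k, v)


if __name__ == "__main__":
    fib = CuckooTable(capacity=8)
    fib.insert("00:1b:44:11:3a:b7", 3)  # hypothetical MAC-to-port entry
    print(fib.lookup("00:1b:44:11:3a:b7"))
```

The property that matters for forwarding is that a lookup touches a small, fixed number of slots regardless of table size; the cost of collisions is paid at insertion time instead.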