Detection of some types of toxic language is hampered by extreme scarcity of labeled training data. Data augmentation, that is, generating new synthetic data from a labeled seed dataset, can help. The efficacy of data augmentation on toxic language classification has not been fully explored. We present the first systematic study on how data augmentation techniques impact performance across toxic language classifiers, ranging from shallow logistic regression architectures to BERT, a state-of-the-art pretrained Transformer network. We compare the performance of eight techniques on very scarce seed datasets. We show that while BERT performed the best, shallow classifiers performed comparably when trained on data augmented with a combination of three techniques, including GPT-2-generated sentences. We discuss the interplay of performance and computational overhead, which can inform the choice of techniques under different constraints.
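To make the idea of label-preserving augmentation concrete, here is a minimal sketch in Python. It is not one of the eight techniques evaluated in the study, which the abstract does not enumerate. It assumes a generic word-level perturbation (random deletion plus one swap), and the names `augment`, `augment_dataset`, `p_delete`, and `n_copies`, as well as the two seed sentences, are hypothetical.

```python
import random

def augment(text, rng, p_delete=0.1):
    """Return a perturbed copy of `text`: random word deletion plus one swap."""
    words = text.split()
    # Drop each word with probability p_delete; if every word was dropped,
    # fall back to keeping the first word.
    kept = [w for w in words if rng.random() >= p_delete] or words[:1]
    # Swap two randomly chosen positions (skipped when fewer than two words remain).
    if len(kept) >= 2:
        i, j = rng.sample(range(len(kept)), 2)
        kept[i], kept[j] = kept[j], kept[i]
    return " ".join(kept)

def augment_dataset(seed_data, n_copies=3, seed=0):
    """Expand (text, label) pairs; each synthetic example inherits its seed's label."""
    rng = random.Random(seed)
    out = list(seed_data)  # keep the original seed examples first
    for text, label in seed_data:
        for _ in range(n_copies):
            out.append((augment(text, rng), label))
    return out

# Hypothetical two-example seed dataset: 1 = toxic, 0 = non-toxic.
seed_data = [("you are a terrible person", 1), ("have a nice day everyone", 0)]
augmented = augment_dataset(seed_data)
```

The augmented list can then be passed to any classifier in place of the seed set. Whether such simple perturbations help depends on the classifier and the data, which is the question the study addresses.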
WCDMA networks have many parameters which determine their performance. Ensuring a desired quality of service means a proper choice of parameters. One approach to choosing these parameters automatically is to minimize a cost function with respect to these parameters. The minimum of the cost function corresponds to an optimal network performance. The cost function, in the simplest case, can be some combination of Key Performance Indicators (KPIs) of the network. Here a cost function is described which can be used to minimize the value of one of these KPIs, the call blocking rate in the network. A second-order gradient method is derived which is used to minimize the cost function. The cost function and gradient algorithm are implemented on an advanced WCDMA radio network simulator to optimize the value of the soft handover parameter window add.
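The general shape of such an optimization can be sketched in Python. The abstract does not give the derived algorithm, so this sketch assumes a plain one-dimensional Newton iteration with finite-difference derivatives as a stand-in second-order method. The function `toy_cost` is a hypothetical quadratic, not the paper's cost function or simulator output, and all parameter names and values are illustrative.

```python
def newton_minimize(cost, x0, h=1e-3, tol=1e-6, max_iter=50):
    """Minimize a scalar cost with Newton steps built from finite differences."""
    x = x0
    for _ in range(max_iter):
        # Central finite differences for the first and second derivatives.
        g = (cost(x + h) - cost(x - h)) / (2 * h)
        curv = (cost(x + h) - 2 * cost(x) + cost(x - h)) / (h * h)
        if curv <= 0:
            # Curvature is not positive: fall back to a plain gradient step.
            step = -g
        else:
            step = -g / curv
        x += step
        if abs(step) < tol:
            break
    return x

def toy_cost(window_add_db):
    # Hypothetical stand-in for a simulated call blocking rate (assumption,
    # not taken from the paper): a bowl with its minimum at 3.0.
    return 0.02 + 0.005 * (window_add_db - 3.0) ** 2

best = newton_minimize(toy_cost, x0=1.0)
```

In a real setting each call to the cost function would be a simulator run, so the number of evaluations per iteration (three here) matters more than the arithmetic of the update itself.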