Pastebin is an online notepad service for sharing text anonymously. However, it can be misused to propagate suspicious or even illegal activities, such as leaking sensitive information or sharing hyperlinks to child sexual abuse material. Because of the high volume of pastes uploaded daily, manual inspection of this material is not feasible. An automatic classifier, in contrast, could identify such activities with little or no human intervention. However, a supervised model may require a significant number of training samples and must handle the distinct text typologies present on Pastebin. This paper presents a classification approach composed of three cascading supervised classifiers that use Active Learning to select and label the most informative samples from Pastebin. The modularity of the proposed design allows each classifier to adapt to a specific text typology. The first classifier determines whether the text is a code snippet, and the second determines whether it is readable. The third classification level is twofold: (i) a binary classifier that decides whether the text is suspicious and (ii) a multiclass classifier with seven predefined categories of possibly illegal activities. The average class recall of the binary and multiclass classifiers is 95.24% and 80.33%, respectively. Additionally, this paper presents a dataset of 3.8 million Pastebin samples, called onlIne Notepad Services PastEbin aCtiviTies (INSPECT-3.8M), along with the labels assigned by our classification framework. Our classifier recognised that 7.54% of the collected samples are correlated with presumably criminal activities. Law enforcement agencies may benefit from the insights shared in our research when aiming to investigate or automate the monitoring of Pastebin or other Online Notepad Services. This would allow responsible authorities to block illegal content before it spreads to the public.
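The three-level cascade, the sample selection step, and the reported metric can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the function names, the keyword-based stand-in classifiers, the choice to stop at a level once a paste is flagged as code or as unreadable, the choice to run the multiclass step only on suspicious texts, and the use of uncertainty sampling as the Active Learning strategy (the abstract does not name one). Average class recall is computed as the unweighted mean of per-class recall.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence


def cascade(text: str,
            is_code: Callable[[str], bool],
            is_readable: Callable[[str], bool],
            is_suspicious: Callable[[str], bool],
            categorize: Callable[[str], str]) -> Dict[str, Optional[object]]:
    """Route one paste through three levels, stopping early when filtered out."""
    if is_code(text):                  # level 1 (assumption: code is not passed on)
        return {"stage": "code", "suspicious": None, "category": None}
    if not is_readable(text):          # level 2 (assumption: unreadable text is dropped)
        return {"stage": "unreadable", "suspicious": None, "category": None}
    suspicious = is_suspicious(text)   # level 3 (i): binary decision
    # level 3 (ii): multiclass step, run here only for suspicious texts (assumption)
    category = categorize(text) if suspicious else None
    return {"stage": "classified", "suspicious": suspicious, "category": category}


def most_uncertain(probabilities: Sequence[float], k: int) -> List[int]:
    """Uncertainty sampling: indices of the k positive-class probabilities nearest 0.5."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


def average_class_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class recall over the classes present in y_true."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical stand-ins for trained models, for demonstration only.
def toy_is_code(t: str) -> bool:
    return "def " in t or "{" in t


def toy_is_readable(t: str) -> bool:
    ok = sum(ch.isalpha() or ch.isspace() for ch in t)
    return ok / max(len(t), 1) > 0.8


def toy_is_suspicious(t: str) -> bool:
    return "leaked" in t.lower()


def toy_categorize(t: str) -> str:
    return "leak"  # placeholder label, not one of the paper's seven categories


def classify(text: str) -> Dict[str, Optional[object]]:
    return cascade(text, toy_is_code, toy_is_readable,
                   toy_is_suspicious, toy_categorize)
```

In an Active Learning loop of this kind, the indices returned by `most_uncertain` would be sent to a human annotator, and the affected classifier would be retrained on the enlarged labelled set before the next round.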
Effective and precise techniques for mosquito species identification are required, as mosquito-borne illnesses continue to pose serious threats to public health across the world. In this work, we present a new hybrid machine-learning technique that classifies mosquito species by analyzing their wingbeats. The method makes use of deep learning techniques. The hybrid technique aims to provide robust and dependable classification performance by combining a range of machine learning methods: k-Nearest Neighbors (KNN), Random Forest, Multi-layer Perceptron (MLP), Support Vector Machines (SVM), and Gradient Boosting. To improve feature extraction and normalization, we apply a rigorous set of preprocessing techniques to a large dataset of wingbeat recordings from many mosquito species. Through comprehensive testing and analysis, we show that our method detects mosquito species correctly and achieves better results than the individual machine learning algorithms used separately. Our findings demonstrate how deep learning methods can support more conventional machine learning strategies in mosquito species classification. We also address the implications of our results for ecological research and disease management initiatives, highlighting the significance of precise species identification in vector monitoring and epidemiological investigations.
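One common way to combine several classifiers into a hybrid model is hard majority voting. The abstract does not state how the five algorithms are combined, so the Python sketch below is only an assumed illustration: the `majority_vote` helper, the threshold-based stand-in models, the frequency values, and the labels `species_A` and `species_B` are all hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(models: Sequence[Callable[[float], str]], sample: float) -> str:
    """Return the label predicted by the largest number of models."""
    votes = [model(sample) for model in models]
    return Counter(votes).most_common(1)[0][0]


def make_threshold_model(threshold_hz: float) -> Callable[[float], str]:
    """Hypothetical stand-in for a trained classifier (KNN, SVM, and so on).

    It labels a wingbeat fundamental frequency, given in Hz, by comparing it
    against a fixed threshold. Real models would use richer features.
    """
    def predict(frequency_hz: float) -> str:
        return "species_B" if frequency_hz >= threshold_hz else "species_A"
    return predict


# Three toy models that disagree near their thresholds (values are made up).
toy_models = [make_threshold_model(t) for t in (450.0, 500.0, 550.0)]
```

With these toy models, `majority_vote(toy_models, 520.0)` collects two votes for `species_B` and one for `species_A`, so the ensemble follows the majority even though one member disagrees.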