Using meta-learning for automated algorithms selection and configuration: an experimental framework for industrial big data

Garouani, Moncef; Ahmad, Adeel; Bouneffa, Mourad; Hamlich, Mohamed; Bourguin, Grégory; Lewandowski, Arnaud

doi:10.1186/s40537-022-00612-4

Cited by 27 publications

(8 citation statements)

References 45 publications


“…Following preprocessing, the default text serves as input for feature extraction, a pivotal step in transforming data into meaningful features [39]. Our method employs metavectorization as a feature extraction concept [40] integrating TF-IDF [21] and Word2Vec feature extraction [41].…”

Section: Meta-vectorization Based On Text Feature Extraction (mentioning)

confidence: 99%

Automated Text Annotation Using a Semi-Supervised Approach with Meta Vectorizer and Machine Learning Algorithms for Hate Speech Detection

Saifullah, Dreżewski, Dwiyanto et al. 2024

Applied Sciences


Text annotation is an essential element of the natural language processing approaches. The manual annotation process performed by humans has various drawbacks, such as subjectivity, slowness, fatigue, and possibly carelessness. In addition, annotators may annotate ambiguous data. Therefore, we have developed the concept of automated annotation to get the best annotations using several machine-learning approaches. The proposed approach is based on an ensemble algorithm of meta-learners and meta-vectorizer techniques. The approach employs a semi-supervised learning technique for automated annotation to detect hate speech. This involves leveraging various machine learning algorithms, including Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbors (KNN), and Naive Bayes (NB), in conjunction with Word2Vec and TF-IDF text extraction methods. The annotation process is performed using 13,169 Indonesian YouTube comments data. The proposed model used a Stemming approach using data from Sastrawi and new data of 2245 words. Semi-supervised learning uses 5%, 10%, and 20% of labeled data compared to performing labeling based on 80% of the datasets. In semi-supervised learning, the model learns from the labeled data, which provides explicit information, and the unlabeled data, which offers implicit insights. This hybrid approach enables the model to generalize and make informed predictions even when limited labeled data is available (based on self-learning). Ultimately, this enhances its ability to handle real-world scenarios with scarce annotated information. In addition, the proposed method uses a variety of thresholds for matching words labeled with hate speech ranging from 0.6, 0.7, 0.8, to 0.9. The experiments indicated that the DT-TF-IDF model has the best accuracy value of 97.1% with a scenario of 5%:80%:0.9. However, several other methods have accuracy above 90%, such as SVM (TF-IDF and Word2Vec) and KNN (Word2Vec), based on both text extraction methods in several test scenarios.
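
The statement and abstract above name TF-IDF as one of the two feature extractors. As a rough illustration only, the sketch below computes one common TF-IDF variant (relative term frequency multiplied by log(N/df)) in plain Python. The function name `tf_idf`, the toy documents, and the choice of weighting variant are assumptions made here, not details taken from the cited paper, and the Word2Vec half of the meta-vectorizer is not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper: uses tf = count / doc length and
    idf = log(N / df), which is only one of several common variants.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy, made-up comments (already tokenized).
docs = [["good", "video"], ["bad", "video"], ["bad", "bad", "comment"]]
weights = tf_idf(docs)
```

With this variant, a term that occurs in every document receives weight zero, while terms concentrated in few documents are weighted more heavily.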


“…Meta-vectorization was used in this study to obtain information from the extracted text that met specific requirements. This technique involves applying text feature extraction (characteristics of the dataset) [38]. This study applies Term Frequency-Inverse Document Frequency (TF-IDF) and Word Embedding (Word2Vec) feature extraction [39].…”

Section: Meta-vectorization Based On Text Feature Extraction (mentioning)

confidence: 99%

Automated Text Annotation Using Semi-Supervised Approach with Meta Vectorizer and Machine Learning Algorithms for Hate Speech Detection

Saifullah, Dreżewski, Dwiyanto et al. 2023

Preprint


Text annotation is an essential element of the natural language processing approaches. The manual annotation process performed by humans has several drawbacks, such as subjectivity, slowness, fatigue, and possibly carelessness. In addition, annotators may annotate ambiguous data. So, we developed the concept of automated annotation to get the best annotations using several machine-learning approaches. The proposed approach is based on an ensemble algorithm of meta-learners and meta-vectorizer techniques. The approach employs a semi-supervised learning technique for automated annotation, aimed at detecting hate speech. This involves leveraging various machine learning algorithms, including Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbors (KNN), and Naive Bayes (NB), in conjunction with Word2Vec and TF-IDF text extraction methods. The annotation process is performed using 13,169 Indonesian YouTube comments data. The proposed model used a Stemming approach using data from Sastrawi and also new data of 2,245 words. Semi-supervised learning uses 5%, 10%, and 20% of labeled data as compared to performing labeling based on 80% of the datasets. In semi-supervised learning, the model learns from the labeled data, which provides explicit information, and the unlabeled data, which offers implicit insights. This hybrid approach enables the model to generalize and make informed predictions even when limited labeled data is available, ultimately enhancing its ability to handle real-world scenarios with scarce annotated information. In addition, the proposed method uses a variety of thresholds for matching words labeled with hate speech: 0.6, 0.7, 0.8, and 0.9. The experiment showed that the KNN-Word2Vec model has the best accuracy value of 96.9% with a scenario of 5%:80%:0.9. However, several other methods also have accuracy above 90%, such as SVM and DT based on both text extraction methods in several test scenarios.


“…The spherical boundary is characterized by a center a and a radius R, hence during test, points that fall outside the boundary are considered as abnormal as illustrated in Figure 3. The parameters R and a are defined by (9) […] $\xi_i \geq 0 \ \forall i$ (11), where $\xi_i$ are slack variables that allow some points in training data to be outside the sphere and C represents a penalty constant that controls the trade-off between the volume of the hypersphere and rejected points.…”

Section: Defect Detection (mentioning)

confidence: 99%
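
The Defect Detection statement above describes a spherical boundary with center a, radius R, slack variables, and a penalty constant C, but the equations themselves did not survive extraction. As a hedged illustration, the sketch below uses the standard textbook SVDD primal (minimize R² + C Σ ξ_i subject to ‖x_i − a‖² ≤ R² + ξ_i and ξ_i ≥ 0), which may differ in detail from the cited paper's equations. The function names `is_abnormal` and `svdd_objective` and the sample points are hypothetical; the code only evaluates the rule and the objective for a given a and R, it does not solve the optimization.

```python
def squared_distance(x, center):
    """Squared Euclidean distance between a point and the sphere center."""
    return sum((xi - ai) ** 2 for xi, ai in zip(x, center))


def is_abnormal(x, center, radius):
    """Test-time rule: a point strictly outside the sphere is abnormal."""
    return squared_distance(x, center) > radius ** 2


def svdd_objective(points, center, radius, penalty):
    """Evaluate R^2 + C * sum(xi_i) for a fixed center and radius.

    Each slack xi_i is the smallest non-negative value satisfying
    ||x_i - a||^2 <= R^2 + xi_i (standard SVDD form, assumed here).
    """
    slacks = [max(0.0, squared_distance(p, center) - radius ** 2)
              for p in points]
    return radius ** 2 + penalty * sum(slacks)


# Made-up 2-D training points; only the last one lies outside the sphere.
train = [(0, 0), (3, 4), (6, 8)]
objective = svdd_objective(train, center=(0, 0), radius=5, penalty=0.5)
```

A larger C makes points outside the sphere more costly, which pushes the optimum toward a larger radius; a smaller C tolerates more rejected training points in exchange for a tighter sphere.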

“…The term OCC was first introduced by [9] to denote a category of classification algorithms that address cases where few to no defect samples are available for training; the normal class is well-defined while the abnormal one is under-sampled [10], which is quite common in industrial areas [11], and with that, defects are seen as a deviation from the defect-free class. The OCC concept encompasses several approaches, such as methods based on density [12], distance [13], neural networks [14], [15], and boundary approaches [16] that aim to encircle normal samples by a decision boundary.…”

Section: Introduction (mentioning)

confidence: 99%

Product defect detection based on convolutional autoencoder and one-class classification

Chaabi, Hamlich, Garouani 2023

IJ-AI


To meet customer expectations and remain competitive, industrials try constantly to improve their quality control systems. There is hence increasing demand for adopting automatic defect detection solutions. However, the biggest issue in addressing such systems is the imbalanced aspect of industrial datasets. Often, defect-free samples far exceed the defected ones, due to continuous improvement approaches adopted by manufacturing companies. In this sense, we propose an automatic defect detection system based on one-class classification (OCC) since it involves only normal samples during training. It consists of three sub-models, first, a convolutional autoencoder serves as latent features extractor, the extracted features vectors are subsequently fed into the dimensionality reduction process by performing principal component analysis (PCA), then the reduced-dimensional data are used to train the one-class classifier support vector data description (SVDD). During the test phase, both normal and defected images are used. The first two stages of the trained model generate a low-dimensional features vector, whereas the SVDD classifies the new input, whether it is defect-free or defected. This approach is evaluated on the carpet images from the industrial inspection dataset MVTec anomaly detection (MVTec AD). During training, only normal images were used. The results showed that the proposed method outperforms the state-of-the-art methods.
