A sparse version of the ridge logistic regression for large-scale text categorization

Aseervatham, Sujeevan; Antoniadis, Anestis; Gaussier, Éric; Burlet, Michel; Denneulin, Yves

doi:10.1016/j.patrec.2010.09.023

Cited by 40 publications

(20 citation statements)

References 17 publications


“…It has been proven to outperform traditional back-off smoothing, because the former has the ability to process unknown terms and also avoids over-evaluating the conditional probability which is originally zero. In the future, these kinds of works could be extended to evaluate relationships between sentences rather than words (Aseervatham, Antoniadis, Gaussier, Burlet, & Denneulin, 2011). The automatic text categorization is the process of assigning one or more textual documents to predefined categories based on their contents. However, it encounters a problem when the number of features exceeds the number of observations.…”

Section: Logistic Regression (mentioning, confidence: 99%)

“…However, it encounters a problem when the number of features exceeds the number of observations. Also, ML techniques tend to perform poorly because of overfitting, in which case the model memorizes the training set instead of learning from it (Aseervatham et al., 2011). To prevent this, the complexity of the model has to be controlled during the training process using model selection techniques.…”

Section: Logistic Regression (mentioning, confidence: 99%)
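The statements above tie overfitting (more features than observations) to the need to control model complexity, and the indexed paper builds on ridge logistic regression. As a rough illustration only, here is a minimal sketch of plain ridge (L2-penalized) logistic regression trained by batch gradient descent, with the penalty chosen by held-out accuracy as a simple form of model selection. This is not the sparse variant proposed by Aseervatham et al.; the function names `train_ridge_logreg`, `predict`, `select_lambda` and the learning-rate and epoch defaults are hypothetical choices for this sketch.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_ridge_logreg(X, y, lam, lr=0.1, epochs=500):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    X: list of feature vectors (lists of floats); y: labels in {0, 1}.
    The bias b is not penalized (a common convention, assumed here).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # The lam * w[j] term shrinks weights toward zero (ridge penalty).
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def select_lambda(X_tr, y_tr, X_val, y_val, grid):
    """Model selection: return (lam, accuracy) with the best held-out accuracy."""
    best = None
    for lam in grid:
        w, b = train_ridge_logreg(X_tr, y_tr, lam)
        acc = sum(predict(w, b, x) == t for x, t in zip(X_val, y_val)) / len(y_val)
        if best is None or acc > best[1]:
            best = (lam, acc)
    return best
```

On a toy set with six features and only four documents, a larger `lam` yields a smaller weight norm. That shrinkage is the complexity control the quoted passage refers to.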


Text Classification Techniques: A Literature Review

Thangaraj; Sivakami. IJIKM, 2018.


Aim/Purpose: The aim of this paper is to analyze various text classification techniques employed in practice, along with their strengths and weaknesses, to provide improved awareness of the knowledge extraction possibilities in the field of data mining.

Background: Artificial Intelligence is reshaping text classification techniques to better acquire knowledge. However, in spite of the growth and spread of AI in all fields of research, its role with respect to text mining is not yet well understood.

Methodology: For this study, various articles written between 2010 and 2017 on “text classification techniques in AI”, selected from leading computer science journals, were analyzed. Each article was read in full. The research problems related to text classification techniques in the field of AI were identified, and the techniques were grouped according to the algorithms involved. These algorithms were divided based on the learning procedure used. Finally, the findings were plotted as a tree structure to visualize the relationship between learning procedures and algorithms.

Contribution: This paper identifies the strengths, limitations, and current research trends in text classification in an advanced field like AI. This knowledge is crucial for data scientists, who could use the findings of this study to devise customized data models. It also helps industry understand the operational efficiency of text mining techniques. It further contributes to reducing project costs and supports effective decision making.

Findings: It was found to be important to study and understand the nature of the data before proceeding to mining. Automating the text classification process is necessary given the increasing amount of data and the need for accuracy. Another interesting research opportunity lies in building intricate text data models with deep learning systems, which can execute complex Natural Language Processing (NLP) tasks with semantic requirements.

Recommendations for Practitioners: Frame analysis, deception detection, narrative science where data expresses a story, healthcare applications to diagnose illnesses, and conversation analysis are some of the recommendations suggested for practitioners.

Recommendations for Researchers: Developing algorithms that are simpler to code and implement, better approaches to knowledge distillation, multilingual text refining, domain knowledge integration, subjectivity detection, and contrastive viewpoint summarization are some of the areas that could be explored by researchers.

Impact on Society: Text classification forms the base of data analytics and acts as the engine behind knowledge discovery. It supports state-of-the-art decision making, for example, predicting an event before it actually occurs or classifying a transaction as ‘Fraudulent’. The results of this study could be used to develop applications dedicated to assisting decision-making processes. These informed decisions will help to optimize resources and maximize benefits to humankind.

Future Research: In the future, better methods for parameter optimization will be identified by selecting better parameters that reflect effective knowledge discovery. The role of streaming data processing in text classification is still rarely explored.


“…Since feature selection can be regarded as a binary regression task about each dimension of the original feature, a logistic regression function is used to denote a conditional probability model with the form defined by

$$P(w \mid l, f(x), b) = \left(1 + \exp\left(-l\,\left(w^{T} f(x) + b\right)\right)\right)^{-1}$$

where $f(x)$ represents the original feature signature of voxel $x$, and $w$ is the binary coefficient with 1 indicating that the corresponding features are relevant to the anatomical classification, and 0 denoting that the nonrelevant features are eliminated during the classifier learning process. $l(\cdot)$ is an anatomical binary labeling function.…”

Section: Methods (mentioning, confidence: 99%)
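The quoted model is an ordinary logistic function of the signed margin $l\,(w^{T} f(x) + b)$. A minimal sketch of evaluating it follows, assuming the label $l$ takes values in $\{-1, +1\}$ (the excerpt only says it is binary) and that $w$ and $f(x)$ are plain numeric sequences of equal length; the function name `relevance_probability` is hypothetical.

```python
import math

def relevance_probability(w, f_x, b, l):
    """P(w | l, f(x), b) = 1 / (1 + exp(-l * (w^T f(x) + b))).

    Assumes l is -1 or +1 and that w and f_x have the same length.
    """
    if l not in (-1, 1):
        raise ValueError("l is assumed to be -1 or +1")
    margin = l * (sum(wi * fi for wi, fi in zip(w, f_x)) + b)
    # Numerically stable evaluation of the logistic function of the margin.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)
```

With a binary mask such as `w = [1, 0, 1]`, only the first and third features contribute to the margin, and the probabilities for `l = 1` and `l = -1` sum to one.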

Learning‐based CBCT correction using alternating random forest based on auto‐context model

et al. 2018


Purpose: Quantitative cone‐beam CT (CBCT) imaging is in increasing demand for precise image‐guided radiotherapy because it provides a foundation for advanced image‐guided techniques, including accurate treatment setup, online tumor delineation, and patient dose calculation. However, CBCT is currently limited to patient setup in the clinic because of severe issues with its image quality. In this study, we develop a learning‐based approach to improve CBCT image quality for extended clinical applications.

Materials and methods: An auto‐context model is integrated into a machine learning framework to iteratively generate corrected CBCT (CCBCT) with high image quality. The first step is data preprocessing of the training dataset, in which uninformative image regions are removed, noise is reduced, and CT and CBCT images are aligned. After a CBCT image is divided into a set of patches, the most informative and salient anatomical features are extracted to train random forests (RFs). Within each patch, an alternating RF is applied to create a CCBCT patch as the output. Moreover, an iterative refinement strategy is applied to enhance the image quality of CCBCT. Then, all the CCBCT patches are integrated to reconstruct the final CCBCT images.

Results: The learning‐based CBCT correction algorithm was evaluated using leave‐one‐out cross‐validation on a cohort of 12 patients’ brain data and 14 patients’ pelvis data. The mean absolute error (MAE), peak signal‐to‐noise ratio (PSNR), normalized cross‐correlation (NCC), and spatial nonuniformity (SNU) in the selected regions of interest (ROIs) were used to quantify the proposed algorithm's correction accuracy, with the following results for brain and pelvis data, respectively: mean MAE = 12.81 ± 2.04 and 19.94 ± 5.44 HU, mean PSNR = 40.22 ± 3.70 and 31.31 ± 2.85 dB, mean NCC = 0.98 ± 0.02 and 0.95 ± 0.01, and SNU = 2.07 ± 3.36% and 2.07 ± 3.36%.

Conclusion: Preliminary results demonstrated that the novel learning‐based correction method can significantly improve CBCT image quality. Hence, the proposed algorithm has great potential for improving CBCT image quality to support its clinical utility in CBCT‐guided adaptive radiotherapy.
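The Results paragraph reports MAE, PSNR, and NCC. The following is a minimal sketch of common textbook definitions over flattened images, not the paper's exact evaluation code. It assumes PSNR is computed against a caller-supplied peak intensity `peak` and that NCC is the zero-mean (Pearson-style) form. ROI masking and SNU are not covered, and the function names are hypothetical.

```python
import math

def mae(ref, est):
    """Mean absolute error, in the images' own units (e.g. HU)."""
    return sum(abs(r - e) for r, e in zip(ref, est)) / len(ref)

def psnr(ref, est, peak):
    """PSNR in dB: 10 * log10(peak^2 / MSE); `peak` is an assumed maximum intensity."""
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

def ncc(ref, est):
    """Zero-mean normalized cross-correlation, in [-1, 1]."""
    n = len(ref)
    mr, me = sum(ref) / n, sum(est) / n
    num = sum((r - mr) * (e - me) for r, e in zip(ref, est))
    den = math.sqrt(sum((r - mr) ** 2 for r in ref) * sum((e - me) ** 2 for e in est))
    return num / den
```

Because PSNR depends on the chosen `peak`, dB values are only comparable between studies that use the same intensity convention.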


“…Logistic Regression (LR) is a well-known statistical algorithm that has been widely used in information retrieval [17][18][19][20][21][22]. LR has also been investigated for English TC by several researchers [23][24][25][26][27][28][29][30][31][32][33][34][35][36].…”

Section: Introduction (mentioning, confidence: 99%)

Arabic Text Categorization Using Logistic Regression

Al-Tahrawi. IJISA, 2015.


Several Text Categorization (TC) techniques and algorithms have been investigated in the limited research literature on Arabic TC. In this research, Logistic Regression (LR) is investigated for Arabic TC. To the best of our knowledge, LR had never been used for Arabic TC before. Experiments are conducted on the Aljazeera Arabic News (Alj-News) dataset, which undergoes Arabic text preprocessing to handle the special nature of Arabic text. Experimental results show that the LR classifier is competitive with state-of-the-art Arabic TC algorithms: it recorded a precision of 96.5% on one category and above 90% on 3 of the 5 categories of the Alj-News dataset. Regarding overall performance, LR recorded a macro-averaged precision of 87%, recall of 86.33%, and F-measure of 86.5%.
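Macro-averaging, as in the overall figures above, computes each metric per category and then takes the unweighted mean over categories. Below is a minimal sketch under that assumption. It computes the macro F-measure as the mean of per-category F1 scores, which generally differs from the harmonic mean of macro precision and macro recall. The abstract does not state which convention the paper uses, and the function names are hypothetical.

```python
def per_class_prf(y_true, y_pred, label):
    """Precision, recall and F1 for one category (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_average(y_true, y_pred):
    """Unweighted mean of per-category (precision, recall, F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [per_class_prf(y_true, y_pred, c) for c in labels]
    k = len(scores)
    return tuple(sum(s[i] for s in scores) / k for i in range(3))
```

For example, with true labels `["a", "a", "b", "b"]` and predictions `["a", "b", "b", "b"]`, macro precision is 5/6, macro recall is 0.75, and the macro F-measure is about 0.733. The harmonic mean of 5/6 and 0.75 would instead be about 0.789.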

