In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extraction methods, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application in real-world problems are discussed.

Spelling Correction
Spelling correction is an optional pre-processing step. Typos (short for typographical errors) are commonly present in texts and documents, especially in social media text data sets (e.g., Twitter). Many algorithms, techniques, and methods have addressed this problem in NLP [49]. Techniques available to researchers include hashing-based and context-sensitive spelling correction [50], as well as spelling correction using a Trie and the Damerau-Levenshtein distance over bigrams [51].

Stemming
In NLP, one word can appear in different forms (i.e., singular and plural noun forms) while the semantic meaning of each form is the same [52]. One method for consolidating different forms of a word into the same feature space is stemming. Text stemming modifies words to obtain variant word forms using different linguistic processes such as affixation (addition of affixes) [53,54]. For example, the stem of the word "studying" is "study".

Lemmatization
Lemmatization is an NLP process that replaces the suffix of a word with a different one, or removes the suffix entirely, to obtain the basic word form (lemma) [54-56].

Syntactic Word Representation
Many researchers have worked on text feature extraction techniques that address the loss of syntactic and semantic relations between words. Several novel techniques have been proposed for this problem, but many of them still have limitations. In [57], a model was introduced that demonstrated the usefulness of including syntactic and semantic knowledge in the text representation for the selection of sentences from technical genomic texts. Another solution to the syntactic problem is the n-gram technique for feature extraction.

N-Gram
The n-gram technique uses sets of n words that occur "in that order" in a text. This is not a representation of a text by itself, but it can be used as a feature to represent a text. BOW (bag of words) is a representation of a text using its words (1-grams), which loses their order (syntax). This model is very easy to obtain, and the text can be represented through a vector of generally manageable size. On the ...
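To make the stemming, lemmatization, and n-gram ideas above concrete, the sketch below (a minimal illustration, not the survey's reference implementation) applies NLTK stemming and lemmatization and then builds unigram and bigram bag-of-words features with scikit-learn. The toy corpus and parameter values are assumptions for demonstration only.

```python
# Sketch: stemming, lemmatization, and n-gram bag-of-words features.
# Assumes NLTK (with the WordNet corpus downloaded) and scikit-learn >= 1.0;
# the corpus and settings below are illustrative only.
from nltk.stem import PorterStemmer, WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("studying"))                    # -> "studi" (stems need not be real words)
print(lemmatizer.lemmatize("studying", pos="v"))   # -> "study" (lemma is a valid word form)

# Toy corpus (hypothetical) turned into unigram + bigram count vectors (BOW).
corpus = [
    "students are studying text classification",
    "text classification studies machine learning",
]
vectorizer = CountVectorizer(ngram_range=(1, 2))   # use 1-grams and 2-grams as features
X = vectorizer.fit_transform(corpus)               # sparse document-term matrix

print(vectorizer.get_feature_names_out())          # the learned n-gram vocabulary
print(X.toarray())                                 # each row is one document's BOW vector
```

Note that the bigram features partially preserve local word order, which is exactly what the plain 1-gram BOW representation loses.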
Increasingly large document collections require improved information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document classification, which has become an important application for supervised learning. Recently, the performance of traditional supervised classifiers has degraded as the number of documents has increased, because growth in the number of documents has been accompanied by an increase in the number of categories. This paper approaches the problem differently from current document classification methods that view it as multi-class classification. Instead, we perform hierarchical classification using an approach we call Hierarchical Deep Learning for Text classification (HDLTex). HDLTex employs stacks of deep learning architectures to provide specialized understanding at each level of the document hierarchy.
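The hierarchical scheme can be illustrated with a small sketch: one classifier predicts the top-level category, and a separate, specialized classifier is then applied for each top-level category to predict the child label. The sketch below uses simple scikit-learn models as stand-ins for the paper's stacked deep learning architectures; the documents, labels, and model choices are illustrative assumptions, not the HDLTex implementation.

```python
# Sketch of two-level hierarchical text classification: a parent-level model,
# then one specialized child model per parent category. Linear models are
# stand-ins for HDLTex's deep architectures; the data here are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["neural networks for vision", "convex optimization methods",
        "gene expression profiling", "protein folding simulation"]
parents = ["CS", "CS", "Biology", "Biology"]                  # level-1 labels
children = ["ML", "Optimization", "Genomics", "Biophysics"]   # level-2 labels

vec = TfidfVectorizer()
X = vec.fit_transform(docs)

# Level 1: a single classifier over all documents predicts the parent category.
parent_clf = LogisticRegression(max_iter=1000).fit(X, parents)

# Level 2: one specialized classifier per parent, trained only on its documents.
child_clfs = {}
for p in set(parents):
    idx = [i for i, lab in enumerate(parents) if lab == p]
    child_clfs[p] = LogisticRegression(max_iter=1000).fit(X[idx], [children[i] for i in idx])

def predict(doc):
    x = vec.transform([doc])
    parent = parent_clf.predict(x)[0]          # route to the right specialist
    child = child_clfs[parent].predict(x)[0]   # specialist predicts the leaf label
    return parent, child

print(predict("deep learning for image classification"))
```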
Background: Research in psychology demonstrates a strong link between state affect (moment-to-moment experiences of positive or negative emotionality) and trait affect (eg, relatively enduring depression and social anxiety symptoms), and a tendency to withdraw (eg, spending time at home). However, existing work is based almost exclusively on static, self-reported descriptions of emotions and behavior that limit generalizability. Despite adoption of increasingly sophisticated research designs and technology (eg, mobile sensing using a global positioning system [GPS]), little research has integrated these seemingly disparate forms of data to improve understanding of how emotional experiences in everyday life are associated with time spent at home, and whether this is influenced by depression or social anxiety symptoms. Objective: We hypothesized that more time spent at home would be associated with more negative and less positive affect. Methods: We recruited 72 undergraduate participants from a southeast university in the United States. We assessed depression and social anxiety symptoms using self-report instruments at baseline. An app (Sensus) installed on participants’ personal mobile phones repeatedly collected in situ self-reported state affect and GPS location data for up to 2 weeks. Time spent at home was a proxy for social isolation. Results: We tested separate models examining the relations between state affect and time spent at home, with levels of depression and social anxiety as moderators. Models differed only in the temporal links examined. One model focused on associations between changes in affect and time spent at home within short, 4-hour time windows. The other 3 models focused on associations between mean-level affect within a day and time spent at home (1) the same day, (2) the following day, and (3) the previous day. Overall, we obtained many of the expected main effects (although there were some null effects), in which higher social anxiety was associated with more time or greater likelihood of spending time at home, and more negative or less positive affect was linked to longer homestay. Interactions indicated that, among individuals higher in social anxiety, higher negative affect and lower positive affect within a day were associated with greater likelihood of spending time at home the following day. Conclusions: Results demonstrate the feasibility and utility of modeling the relationship between affect and homestay using fine-grained GPS data. Although these findings must be replicated in a larger study and with clinical samples, they suggest that integrating repeated state affect assessments in situ with continuous GPS data can increase understanding of how actual homestay is related to affect in everyday life and to symptoms of anxiety and depression.
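As an illustration of how a homestay measure can be derived from raw GPS fixes, the sketch below computes the fraction of location samples falling within a fixed radius of a home coordinate using the haversine distance. The radius, the home location, and the sample fixes are assumptions for demonstration, not the parameters or pipeline used in the study.

```python
# Sketch: estimate homestay as the fraction of GPS fixes within a radius of a
# known home coordinate. Radius, home location, and fixes are hypothetical;
# the study's actual GPS processing may differ.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in meters."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def homestay_fraction(fixes, home, radius_m=100.0):
    """Fraction of GPS fixes that fall within radius_m of the home coordinate."""
    if not fixes:
        return 0.0
    at_home = sum(1 for lat, lon in fixes if haversine_m(lat, lon, *home) <= radius_m)
    return at_home / len(fixes)

# Hypothetical home location and a few GPS fixes (lat, lon).
home = (38.0336, -78.5080)
fixes = [(38.0336, -78.5081), (38.0340, -78.5075), (38.0500, -78.4800)]
print(homestay_fraction(fixes, home))  # -> 0.666..., i.e., 2 of 3 fixes near home
```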
Background: Critical illness is a leading cause of morbidity and mortality in sub-Saharan Africa (SSA). Identifying patients with the highest risk of death could help with resource allocation and clinical decision making. Accordingly, we derived and validated a universal vital assessment (UVA) score for use in SSA. Methods: We pooled data from hospital-based cohort studies conducted in six countries in SSA spanning the years 2009–2015. We derived and internally validated a UVA score using decision trees and linear regression and compared its performance with the modified early warning score (MEWS) and the quick sepsis-related organ failure assessment (qSOFA) score. Results: Of 5573 patients included in the analysis, 2829 (50.8%) were female, the median (IQR) age was 36 (27–49) years, 2122 (38.1%) were HIV-infected and 996 (17.3%) died in-hospital. The UVA score included points for temperature, heart and respiratory rates, systolic blood pressure, oxygen saturation, Glasgow Coma Scale score and HIV serostatus, and had an area under the receiver operating characteristic curve (AUC) of 0.77 (95% CI 0.75 to 0.79), which outperformed MEWS (AUC 0.70 (95% CI 0.67 to 0.71)) and qSOFA (AUC 0.69 (95% CI 0.67 to 0.72)). Conclusion: We identified predictors of in-hospital mortality irrespective of the underlying condition(s) in a large population of hospitalised patients in SSA and derived and internally validated a UVA score to assist clinicians in risk-stratifying patients for in-hospital mortality. The UVA score could help improve patient triage in resource-limited environments and serve as a standard for mortality risk in future studies.
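Score comparisons of the kind reported here (UVA vs. MEWS vs. qSOFA) reduce to computing the area under the ROC curve for each score against the in-hospital mortality outcome. The short sketch below shows that computation with scikit-learn on placeholder values, since the actual point assignments and patient data are not reproduced in this excerpt.

```python
# Sketch: compare risk scores by AUC against in-hospital mortality.
# The score values and outcomes below are hypothetical placeholders; the
# UVA/MEWS/qSOFA point assignments are not reproduced here.
from sklearn.metrics import roc_auc_score

died = [0, 0, 1, 0, 1, 1, 0, 1]   # 1 = died in hospital (hypothetical outcomes)
uva  = [1, 2, 6, 0, 5, 7, 3, 4]   # hypothetical UVA points per patient
mews = [2, 1, 4, 1, 3, 5, 4, 2]   # hypothetical MEWS points per patient

for name, score in [("UVA", uva), ("MEWS", mews)]:
    # AUC is the probability that a randomly chosen patient who died
    # received a higher score than a randomly chosen survivor.
    print(name, "AUC =", round(roc_auc_score(died, score), 2))
```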