Mining consumer product data via latent semantic indexing

Jiang, Jingqian; Berry, Michael W.; Donato, J. M.; Ostrouchov, George; Grady, Nancy W.

doi:10.1016/s1088-467x(99)00029-3

Cited by 23 publications

(4 citation statements)

References 13 publications

“…It uses singular value decomposition (SVD) as a mathematical technique from algebra to discover latent, underlying patterns within a collection of unstructured texts [17], [18]. The patterns consist of several terms that are semantically related.…”

Section: Latent Semantic Indexing (mentioning)

confidence: 99%

Linguistic Feature Classifying and Tracing

Moohebat, Raj, Thorleuchter, et al. (2017), MJCS

We investigate the identification and analysis of linguistic (lexico-grammatical) features that are characteristic of articles from a specific year of publication. Linguistic features differ from shallow features because they represent authors' lexico-grammatical writing styles and do not rely on the well-known bag-of-words model. Current literature focuses on shallow features rather than on linguistic features, and existing methods for identifying linguistic features use well-known knowledge-structure based approaches. In contrast, we advance these existing methods by applying semantic clustering instead of knowledge-structure based approaches. For evaluation purposes, a linguistic feature-based prediction model is built to enable automated assignment of articles to their years of publication. In a case study, the proposed methodology is applied to articles of the Springer book series 'Communications in Computer and Information Science' published from 2009 to 2013. The case study results show the feasibility of the proposed approach compared with a frequently used baseline.

Keywords: Scientific articles, Linguistic features, Latent semantic indexing, Text mining.

Introduction

We investigate the occurrence of linguistic (lexico-grammatical) features in articles to show that they can be used for assigning articles to their years of publication. The literature shows related approaches that can be used to assign articles to a pre-defined class. A domain-specific vocabulary (key words) is often used for this classification task. Different domains can be well distinguished by the distribution of specific key words, as shown by existing bag-of-words approaches [1]-[5]. Further, trend analysis and bibliometric research also show that key word distributions can be used to identify a time period [6]. They trace topic changes over time within a domain. Thus, these approaches can estimate an article's publication year based on the topics used.

The approaches mentioned above are based on shallow (bag-of-words) features. They contrast with linguistic features, such as specific word class distributions, that indicate authors' lexico-grammatical writing styles. The literature also shows the possibilities of using linguistic features for classification. The authors of [7] investigate the impact of linguistic features on different scientific disciplines and at different points in time. A further approach uses linguistic features for spam detection [8]. Both approaches are based on systemic functional linguistics, in which a knowledge-structure based classifier (e.g. support vector machine) is used.

We provide a new approach that identifies articles' linguistic features and investigates their usage at different points in time. In contrast to previous work, clustering is used instead of classification. Text classification assigns a text to given pre-defined classes. Classes are normally defined so that they cover all known linguistic features that are expected to occur within the given texts. Text clusteri...

“…It is based on eigenvector techniques from algebra. Dependencies among terms are calculated to group semantically related terms (Jiang, Berry, Donato, Ostrouchov, & Grady, 1999). These groups are named concepts and they represent semantic clusters.…”

Section: Semantic Clustering Of Ideas (mentioning)

confidence: 99%

Identification of interdisciplinary ideas

Thorleuchter, Poel (2016), Information Processing & Management

“…The calculation of these semantic relationships between terms based on computational eigenvector techniques from algebra (Jiang, Berry, Donato, Ostrouchov, & Grady, 1999; Luo, Chen, & Xiong, 2011).…”

Section: Latent Semantic Indexing For Weak Signals Identification (mentioning)

confidence: 99%

Weak signal identification with semantic web mining

Thorleuchter, Poel (2013), Expert Systems with Applications

We investigate the automated identification of weak signals, as defined by Ansoff, to improve strategic planning and technological forecasting. The literature shows that weak signals can be found in an organization's environment and that they appear in different contexts. We use internet information to represent the organization's environment, and we select those websites that are related to a given hypothesis. In contrast to related research, a methodology is provided that uses latent semantic indexing (LSI) for the identification of weak signals. This improves on existing knowledge-based approaches because LSI considers aspects of meaning and is thus able to identify similar textual patterns in different contexts. A new weak signal maximization approach is introduced that replaces the commonly used prediction modeling approach in LSI. It enables the calculation of the largest number of relevant weak signals represented by singular value decomposition (SVD) dimensions. A case study identifies and analyses weak signals to predict trends in the field of on-site medical oxygen production. This supports the planning of research and development (R&D) for a medical oxygen supplier. As a result, it is shown that the proposed methodology enables organizations to identify weak signals from the internet for a given hypothesis. This helps strategic planners to react ahead of time.
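
The citation statements on this page describe latent semantic indexing as applying SVD to a term-document matrix so that semantically related terms fall into shared latent dimensions ("concepts"). A minimal sketch of that idea follows. It assumes a small, made-up term-document count matrix and a rank of 2. The vocabulary, the counts, and the helper names `lsi_term_vectors` and `cosine` are illustrative assumptions and are not taken from any of the papers listed here.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are documents.
# Documents 0-1 use one vocabulary, documents 2-3 use another.
terms = ["oxygen", "medical", "supplier", "grammar", "lexical", "style"]
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)


def lsi_term_vectors(matrix, k):
    """Return rank-k term coordinates (U_k scaled by S_k) from an SVD."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


term_vecs = lsi_term_vectors(term_doc, k=2)

# Terms that co-occur in the same documents end up close in latent space.
sim_related = cosine(term_vecs[0], term_vecs[1])    # oxygen vs. medical
sim_unrelated = cosine(term_vecs[0], term_vecs[3])  # oxygen vs. grammar

# Assign each term to the latent dimension with the largest absolute loading.
concepts = np.argmax(np.abs(term_vecs), axis=1)
groups = {}
for term, dim in zip(terms, concepts):
    groups.setdefault(int(dim), []).append(term)
```

In this toy setup the two vocabularies never share a document, so each retained SVD dimension lines up with one vocabulary and `groups` separates the terms accordingly. Real corpora are far noisier, and the choice of rank `k` and of term weighting (for example tf-idf instead of raw counts) would need to be tuned.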
