Understanding large text corpora via sparse machine learning

Ghaoui, Laurent El; Pham, Vu Viet Hoang; Li, Guancheng; Duong, Viet-An; Srivastava, Ashok N.; Bhaduri, Kanishka

doi:10.1002/sam.11187

Cited by 17 publications

(7 citation statements)

References 40 publications

(45 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…While adding findings to the network is relatively easy, extracting and coding them requires more effort and time. Past research has been conducted on using machine learning to analyze accident reports [Abedin et al, 2010;Ghaoui et al, 2013;Robinson et al, 2015] so we may be able to use a similar method to teach a machine our coding scheme and easily add systems engineering failures to our solution aid.…”

Section: Discussionmentioning

confidence: 99%

Instant‐Expertise in Failure Causation: Developing and Presenting a Network of Causes and Recommendations Extracted from Past Failures

Aloisio¹,

Marais²

2018

INCOSE International Symp

View full text Add to dashboard Cite

Systems engineering failures like schedule slips, budget overruns, and other setbacks, occur often and can be costly, but the literature provides little guidance on why these failures occur or how to prevent them. In our previous work we argued that systems engineering failures are manifestations of similar underlying problems, and thus leveraged literature on the theory of accident causation, as well as the many accident investigation reports, to understand better how and why project failures occur and potential ways of preventing these failures. We developed a database of over 900 examples of failure causes and over 600 examples of remedial actions. In this paper, we detail our process for building this database into an interactive network‐based solution aid that anyone experiencing problems in their project or organization can access. The interactive solution aid provides the user with “instant expertise” in accident causation and remediation because it provides users with summaries and simplifications of the causes and remediation measures from the failures we studied.

show abstract

Section: Discussionmentioning

confidence: 99%

Instant‐Expertise in Failure Causation: Developing and Presenting a Network of Causes and Recommendations Extracted from Past Failures

Aloisio¹,

Marais²

2018

INCOSE International Symp

View full text Add to dashboard Cite

show abstract

“…We observe that IHT, in [17] is shown to solve a very similar problem to the one that we look to solve. In fact, their formulations are identical to (2), however the presentation of the algorithm's convergence and performance guarantees in [17] are restricted to a small set of matrices: those which are full column rank and meet the 3k-restricted isometry property [21].…”

Section: Algorithmsmentioning

confidence: 99%

“…Blumensath and Davies [17] present the iterative hard thresholding algorithm (IHT) as an efficient algorithm for finding a local minimum of the least-squares objective in (2). IHT has the advantage of explicitly working with the cardinality-constrained feasible set, instead of the 1 convex relaxation, controlled by λ as a proxy for the cardinality.…”

Section: Algorithmsmentioning

confidence: 99%

“…[2] uses a formulation based on the Lasso to extract keywords using a bag of words model to identify topic descriptions. [1] uses an 1 -penalized logistic regression formulation and demonstrates the quality of the keywords selected by investigating the human interpretability of the model in the experimental results.…”

Section: Algorithmsmentioning

confidence: 99%

“…We can also quickly summarize a topic, by representing it with a small set of keywords -which can be used both to understand the topic, and to efficiently query other relevant documents for that topic. Keyword extraction can also be used for investigative purposes, for example to suggest causal factors that influence plane safety by analysing ASRS flight reports [2], or to investigate public perception and bias in media reporting [1].…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Iterative Hard Thresholding for Keyword Extraction from Large Text Corpora

Yadlowsky

Nakkarin

Wang

et al. 2014

2014 13th International Conference on Machine Learning and Applications

View full text Add to dashboard Cite

To better understand and analyze text corpora, such as the news, it is often useful to extract keywords that are meaningfully associated with a given topic. A corpus of documents labeled by their topic can be used to approach this as a learning problem. We consider this problem through the lens of statistical text analysis, using bag-of-words frequencies as features for a sparse linear model. We demonstrate, through numerical experiments, that iterative hard thresholding (IHT) is a practical and effective algorithm for keyword-extraction from large text corpora. In fact, our implementation of IHT can quickly analyze more than 800,000 documents, returning keywords comparable to algorithms solving a Lasso problemformulation, with significantly less computation time. Further, we generalize the analysis of the IHT algorithm to show that it is stable for rank deficient matrices, as those arising from our bag-of-words model often are.

show abstract