Key insights
N-gram analysis is a simple but extremely useful method of extracting knowledge about an institution's culture and identity from its archived historical documents.
N-gram analysis can reveal surprising and long-hidden trends that show how an institution has evolved.
Knowledge gained from n-gram analyses can substantially improve managerial decision making.
Selecting a final machine learning (ML) model typically occurs after a process of hyperparameter optimization in which many candidate models with varying structural properties and algorithmic settings are evaluated and compared. Evaluating each candidate model commonly relies on k-fold cross validation, wherein the data are randomly subdivided into k folds, with each fold being iteratively used as a validation set for a model that has been trained using the remaining folds. While many research studies have sought to accelerate ML model selection by applying metaheuristic and other search methods to the hyperparameter space, no consideration has been given to the k-fold cross validation process itself as a means of rapidly identifying the best-performing model. The current study rectifies this oversight by introducing a greedy k-fold cross validation method and demonstrating that greedy k-fold cross validation can vastly reduce the average time required to identify the best-performing model when given a fixed computational budget and a set of candidate models. This improved search time is shown to hold across a variety of ML algorithms and real-world datasets. For scenarios without a computational budget, this paper also introduces an early stopping algorithm based on the greedy cross validation method. The greedy early stopping method is shown to outperform a competing, state-of-the-art early stopping method both in terms of search time and the quality of the ML models selected by the algorithm. Since hyperparameter optimization is among the most time-consuming, computationally intensive, and monetarily expensive tasks in the broader process of developing ML-based solutions, the ability to rapidly identify optimal machine learning models using greedy cross validation has obvious and substantial benefits to organizations and researchers alike.
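To make the contrast between standard and greedy cross validation concrete, the following is a minimal Python sketch. The abstract does not spell out the greedy selection rule, so the rule used here is an assumption: every candidate is first scored on one fold, and each remaining unit of the budget is then spent on the next unevaluated fold of whichever unfinished candidate currently has the highest mean score. The names `k_fold_indices`, `standard_cv_score`, `greedy_cv_search`, and the `evaluate_fold` callback are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Randomly partition range(n_samples) into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def standard_cv_score(evaluate_fold, model, k):
    """Standard k-fold CV: score the model on every fold and average."""
    return mean(evaluate_fold(model, fold) for fold in range(k))


def greedy_cv_search(evaluate_fold, models, k, budget):
    """Spend at most `budget` fold evaluations, favouring promising models.

    evaluate_fold(model, fold) is assumed to train on all folds except
    `fold`, validate on `fold`, and return a score (higher is better).
    """
    scores = {m: [] for m in models}
    spent = 0

    # Seed every candidate with one fold so each has an initial estimate.
    for m in models:
        if spent >= budget:
            break
        scores[m].append(evaluate_fold(m, 0))
        spent += 1

    # Assumed greedy rule: extend the best-looking unfinished candidate.
    while spent < budget:
        unfinished = [m for m in models if 0 < len(scores[m]) < k]
        if not unfinished:
            break
        best = max(unfinished, key=lambda m: mean(scores[m]))
        scores[best].append(evaluate_fold(best, len(scores[best])))
        spent += 1

    evaluated = [m for m in models if scores[m]]
    return max(evaluated, key=lambda m: mean(scores[m])), scores
```

In this sketch, a weak candidate may only ever be scored on a single fold, while the strongest candidates receive all k folds. That uneven allocation is how a fixed budget can reach a well-supported winner sooner than evaluating every candidate on every fold.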
The increasing adoption of Service Oriented Architecture (SOA) is allowing more and more companies to integrate themselves into interorganizational netchain environments wherein knowledge assets can be electronically shared with selected business partners. The dynamic nature of these environments implies a need for organizations to protect and monitor the flow of their valuable knowledge assets throughout the netchain if they hope to maintain their long-term competitive positions. In this paper, we propose an interorganizational knowledge-sharing security model that integrates the value chain reference model (VCOR), the federated enterprise reference architecture model (FERA), and multidimensional data warehouse technologies to allow for the proactive monitoring of shared knowledge assets across an SOA-based netchain. The proposed architecture is novel in that it supports dynamic policy revision through the automated detection of knowledge-sharing breaches within a netchain, a process whose viability is demonstrated using network flow theory and a series of simulations. Existing business intelligence infrastructures can be readily modified to support the proposed model, as multidimensional data warehousing has already been adopted in many organizations.