Finding the K best policies in a finite-horizon Markov decision process

Nielsen, Lars Relund; Kristensen, Anders Ringgaard

doi:10.1016/j.ejor.2005.06.011

Cited by 16 publications

(15 citation statements)

References 21 publications


“…A very similar problem has been explored by Nielsen et al [10][11][12]. Nielsen and Kristensen observed that the problem of finding optimal history-dependent policies (maps from the state space crossed with the time step to the action space) can be modeled as finding "a minimum weight hyperpath" in directed hypergraphs.…”

Section: Introduction (mentioning)

confidence: 93%

Ranking policies in discrete Markov decision processes

Dai

Goldsmith

2010

Ann Math Artif Intell


An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding an optimal policy. In this paper, we study the k best policies problem. The problem is to find the k best policies of a discrete Markov decision process. The k best policies, k > 1, cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponential in k. We show empirically that solving the k best policies problem by using this reduction requires unreasonable amounts of time even when k = 3. We then provide two new algorithms. The first is a complete algorithm, based on our theoretical contribution that the k-th best policy differs from the i-th policy, for some i < k, on exactly one state. The second is an approximate algorithm that skips many less useful policies. We show that both algorithms have good scalability. We also show that the approximate algorithm runs much faster and finds interesting, high-quality policies.
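To make the k best policies problem in the abstract above concrete, the following is a minimal brute-force sketch. It is not either of the algorithms the abstract describes. It assumes a discounted, infinite-horizon setting with stationary deterministic policies ranked by their value at a fixed start state, which the abstract does not specify. The toy MDP (`P`, `R`, `GAMMA`, `START`) and the helper names `evaluate` and `k_best_policies` are hypothetical.

```python
from itertools import product

# Hypothetical toy MDP (an assumption, not taken from the paper):
# P[s][a] is a list of (next_state, probability); R[s][a] is the immediate reward.
P = {
    0: {0: [(0, 0.5), (1, 0.5)], 1: [(2, 1.0)]},
    1: {0: [(1, 1.0)], 1: [(0, 0.3), (2, 0.7)]},
    2: {0: [(2, 1.0)], 1: [(0, 1.0)]},
}
R = {
    0: {0: 1.0, 1: 0.0},
    1: {0: 0.5, 1: 2.0},
    2: {0: 0.0, 1: 1.5},
}
GAMMA = 0.9   # assumed discount factor
START = 0     # assumed start state used for ranking


def evaluate(policy, sweeps=2000):
    """Iterative policy evaluation: apply the fixed-policy Bellman update repeatedly."""
    v = {s: 0.0 for s in P}
    for _ in range(sweeps):
        v = {
            s: R[s][policy[s]]
            + GAMMA * sum(prob * v[nxt] for nxt, prob in P[s][policy[s]])
            for s in P
        }
    return v


def k_best_policies(k):
    """Enumerate every deterministic stationary policy and keep the k best.

    Returns a list of (value_at_START, actions) pairs, best first, where
    actions[i] is the action chosen in the i-th state (states in sorted order).
    """
    states = sorted(P)
    scored = []
    for actions in product(*(sorted(P[s]) for s in states)):
        policy = dict(zip(states, actions))
        scored.append((evaluate(policy)[START], actions))
    # Highest value first; ties broken by the action tuple for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]


if __name__ == "__main__":
    for value, actions in k_best_policies(3):
        print(actions, round(value, 4))
```

Enumeration visits every one of the |A|^|S| policies, so it is only usable on toy instances. That cost is the motivation for structural results such as the one-state-difference property stated in the abstract.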


“…Nielsen and Kristensen [14] created a system that modeled Markov decision processes using directed hypergraphs in order to find the K best policies in a finite-horizon. The system ranked the first K deterministic Markov policies in non-decreasing order using an additive criterion of optimality.…”

Section: Symmetry 2019 11 X For Peer Review 3 Of 18 (mentioning)

confidence: 99%
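The ranking described in the citation statement above (the first K deterministic Markov policies, in non-decreasing order of an additive criterion, over a finite horizon) can be illustrated with a small enumeration sketch. It does not use the directed-hypergraph model of Nielsen and Kristensen; it evaluates each time-dependent policy by backward recursion and sorts by expected total cost. The toy problem (`TRANS`, `COST`, `TERMINAL_COST`, `HORIZON`, `START`) and the function names are hypothetical.

```python
from itertools import product

# Hypothetical toy problem (an assumption): 2 states, 2 actions, 2 decision epochs.
STATES = (0, 1)
ACTIONS = (0, 1)
HORIZON = 2
# TRANS[s][a] is a list of (next_state, probability); COST[s][a] is the immediate cost.
TRANS = {
    0: {0: [(0, 0.8), (1, 0.2)], 1: [(1, 1.0)]},
    1: {0: [(1, 1.0)], 1: [(0, 0.6), (1, 0.4)]},
}
COST = {0: {0: 2.0, 1: 1.0}, 1: {0: 3.0, 1: 4.0}}
TERMINAL_COST = {0: 0.0, 1: 5.0}
START = 0


def expected_cost(policy):
    """Expected total (additive) cost from START under a fixed policy.

    policy[(t, s)] is the action taken at epoch t in state s.
    """
    v = dict(TERMINAL_COST)
    for t in reversed(range(HORIZON)):
        v = {
            s: COST[s][policy[(t, s)]]
            + sum(prob * v[nxt] for nxt, prob in TRANS[s][policy[(t, s)]])
            for s in STATES
        }
    return v[START]


def k_best(k):
    """Return the k lowest-cost deterministic Markov policies, cheapest first."""
    keys = [(t, s) for t in range(HORIZON) for s in STATES]
    ranked = []
    for choice in product(ACTIONS, repeat=len(keys)):
        policy = dict(zip(keys, choice))
        ranked.append((expected_cost(policy), choice))
    ranked.sort()  # non-decreasing expected cost; ties broken by the action tuple
    return ranked[:k]


if __name__ == "__main__":
    for cost, choice in k_best(4):
        print(choice, round(cost, 4))
```

Policies that differ only in states that cannot be reached from `START` at a given epoch tie in cost, so a ranking produced this way can contain runs of equal values.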

The Development of a Fuzzy Logic System in a Stochastic Environment with Normal Distribution Variables for Cash Flow Deficit Detection in Corporate Loan Policy

2019


This paper develops a Mamdani fuzzy logic system (FLS) that has stochastic fuzzy input variables designed to identify cash-flow deficits in bank lending policies. These deficits do not cover the available cash-flow (CFA) resulting from the company's operating activity. Thus, due to these deficits, solutions must be identified to avoid companies' financial difficulties. The novelty of this paper lies in its using stochastic fuzzy variables, or those categories of variables that are defined by fuzzy sets, characterized by normally distributed density functions specific to random variables, and characterized by fuzzy membership functions. The variation intervals of the stochastic fuzzy variables allow identification of the probabilistic risk situations to which the company is exposed during the crediting period using the Mamdani-type fuzzy logic system. The mechanism of implementing the fuzzy logic system is based on two stages. The first is based on the determination of the cash-flow requirements resulting from loan reimbursement and interest rates. This stage has the role of determining the need for financial resources to cover the liabilities. The second stage is based on the identification of the stochastic fuzzy variables which have a role in influencing the cash flow deficits and the probability values estimation of these variables taking into account probability calculations. Based on these probabilistic values, using the Mamdani fuzzy logic system, estimations are computed for the available cash-flow (the output variable). The estimated values for CFA are then used to detect probability risk situations in which the company will not have enough resources to cover its liabilities to financial creditors. All the FLS calculations refer to future time periods. Testing and simulating the fuzzy controller confirms its functionality.


“…It can be seen that the higher the value of N_prev, the larger the number of integrals in equations (15), (17), and (21). This leads to difficulties in obtaining results.…”

Section: Expression Of F T F (T) With T 6 T (mentioning)

confidence: 99%

Maintenance policy on a finite time span for a gradually deteriorating system with imperfect improvements

Ponchet

Fouladirad

Grall

2011

Proceedings of the Institution of Mechanical Engineers, Part O:


The study deals with a gradually deteriorating system such as a large structure. This system is studied over a finite time span where the finite horizon can be seen, for example, as an insurance deadline which requires a specific maintenance policy. Maintenance actions are assumed to be imperfect in this work. An improvement function is used to model the impact of the maintenance on the degradation level of the system. The improvement function is based on the virtual age model ARA1. A maintenance policy is then proposed in which maintenance actions are systematically performed at given maintenance dates, if the system has not already failed. It is assumed that in the event of a failure the system is not repaired. The system is then unavailable until the finite horizon. The proposed maintenance policy is assessed on the finite time span, and both maintenance dates and the number of maintenance actions are optimized.

