Deep reinforcement learning for condition-based maintenance planning of multi-component systems under dependent competing risks

Zhang, Nailong; Si, Wujun

doi:10.1016/j.ress.2020.107094

Cited by 90 publications

(19 citation statements)

References 37 publications


“…The method corresponding to an off-policy configuration is superior in that it can utilize historical data of past maintenance by human experts to implement an optimized decision-making policy that is different from the policy in the past history immediately after offline training. DQN applications to maintenance include road pavement maintenance [29], bridge maintenance [30], and general multi-component condition-based maintenance [31]. In [31], stochastic and economic dependencies among multiple components are taken into account by DQN.…”

Section: (Deep) Reinforcement Learning For Maintenance Planning (mentioning)

confidence: 99%

“…DQN applications to maintenance include road pavement maintenance [29], bridge maintenance [30], and general multi-component condition-based maintenance [31]. In [31], stochastic and economic dependencies among multiple components are taken into account by DQN. DQN takes the same approach as ours in terms of Q-learning, and while its model is flexible enough to fully capture these dependencies, it is too complex to scale with respect to the number of components.…”

Section: (Deep) Reinforcement Learning For Maintenance Planning (mentioning)

confidence: 99%

“…DQN takes the same approach as ours in terms of Q-learning, and while its model is flexible enough to fully capture these dependencies, it is too complex to scale with respect to the number of components. The number of components assumed in [31] is around ten, while we assume up to thousands or more. DQN utilizes a multi-head neural network that outputs Q-values for each combination of actions (thus it has 2^n heads for the number of components n), while we have too many components (n = 1000 or more) to apply this approach in terms of statistical and computational complexity.…”

Section: (Deep) Reinforcement Learning For Maintenance Planning (mentioning)

confidence: 99%
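
The 2^n growth in output heads described in the statement above can be made concrete with a minimal sketch. It assumes each component has a binary choice per time step (0 = do nothing, 1 = maintain), which is what the 2^n figure implies; the function names `joint_actions` and `num_heads` are hypothetical and do not come from either paper.

```python
from itertools import product


def joint_actions(n):
    """Enumerate every joint action for n components.

    Assumption: each component has a binary choice
    (0 = do nothing, 1 = maintain).
    """
    return list(product((0, 1), repeat=n))


def num_heads(n):
    """Number of Q-value outputs needed if there is one head per joint action."""
    return 2 ** n


if __name__ == "__main__":
    # Three components: small enough to list all joint actions.
    for action in joint_actions(3):
        print(action)

    # Around ten components is still manageable; a thousand is not.
    print("n = 10  :", num_heads(10), "heads")
    print("n = 1000:", len(str(num_heads(1000))), "decimal digits in the head count")
```

With about ten components the head count stays in the low thousands, whereas for n = 1000 the count itself is a number with hundreds of digits, which is why enumerating joint actions does not scale.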


Combinatorial Q-Learning for Condition-Based Infrastructure Maintenance

Tanimoto

2021

IEEE Access


Infrastructure maintenance planning is a large-scale optimization problem of planning when and on which components to carry out maintenance so as to keep the whole infrastructure in good condition with minimal maintenance cost. Recent advances in condition monitoring techniques have enabled timely maintenance in response to the condition of each part regardless of age. In addition to the condition, the spatial structure is also important for cost-efficiency in infrastructure maintenance since traveling costs and/or setup costs can be saved by simultaneous maintenance of neighboring components, which is called economic dependency. This optimization problem naively has a high computational complexity of O(2^(nH)), where n is the number of components and H is the planning horizon, and the predictive modeling of degradation is also a big issue. To solve this problem efficiently at scale, our proposed method utilizes two kinds of dynamic programming for temporal and spatial scalability and consequently enjoys O(n) complexity at each time step. For temporal scalability, we utilize a direct modeling approach for the action value of maintenance instead of modeling degradation, namely, Q-learning. For spatial scalability, we exploit locality in economic dependency by means of a reasonable approximation of the Q-function. A typical baseline approach is to divide the whole infrastructure into fixed groups of neighboring components beforehand and determine if maintenance should be performed for all the components in each group at each time step. In contrast, our scalable method enables fully combinatorial optimization for each component at each time step. We demonstrate the advantage of our method in a simulated environment, and the resulting maintenance history intuitively illustrates the benefit of our dynamic grouping approach. We also show that our method has a kind of interpretability in the optimization at each time step.




“…Non‐periodic inspection optimisation has also been studied as a function of the current health of the system [6]. In some papers, optimal maintenance policy was found for large states, based on cost optimisation for a multi‐component system using RL with fixed intervals [7, 8].…”

Section: Literature Review (mentioning)

confidence: 99%

Reinforcement learning for optimal policy learning in condition‐based maintenance

Adsule

Kulkarni

Tewari

2020

IET Collaborative Intelligent Manufacturing


“…There are three types of dependencies: (i) economic, (ii) stochastic and (iii) structural. The dependency of type (i) exists when a high setup cost required to perform maintenance, and hence it will save cost if the maintenance action is done for a group of components ([31, 32]). The dependency of type (ii) occurs if the degradation of one component is affected by the degradation of another component ([33, 34]).…”

Section: Introduction (mentioning)

confidence: 99%
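
The economic dependency described in the statement above (a shared setup cost that makes grouped maintenance cheaper) can be illustrated with a minimal sketch. The cost model, the function name `maintenance_cost`, and the numbers are hypothetical assumptions chosen for illustration; they are not taken from any of the cited papers.

```python
def maintenance_cost(components, setup_cost, unit_cost):
    """Cost of one maintenance visit under an assumed simple model:
    one shared setup cost plus a fixed cost per maintained component.
    """
    if not components:
        return 0.0
    return setup_cost + unit_cost * len(components)


# Hypothetical figures, for illustration only.
SETUP_COST = 100.0
UNIT_COST = 20.0
components = ["A", "B", "C"]

# Maintaining each component in a separate visit pays the setup cost three times.
separate = sum(maintenance_cost([c], SETUP_COST, UNIT_COST) for c in components)

# Maintaining all three in a single visit pays the setup cost once.
grouped = maintenance_cost(components, SETUP_COST, UNIT_COST)

if __name__ == "__main__":
    print("separate visits:", separate)
    print("grouped visit  :", grouped)
    print("saving         :", separate - grouped)
```

Under this assumed model the saving equals the setup cost multiplied by the number of visits avoided, so the higher the setup cost, the stronger the incentive to group.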

Condition-based maintenance policy for a leased reman product

Husniah

Pasaribu

Wangsaputra

et al. 2021

Heliyon


Many firms prefer to lease rather than to buy a product as leasing does not require a large investment cost. Leased products can be brand new products or remanufactured products (henceforth referred to as reman products). The market for reman products has grown in the last two decades due to increasing concern about sustainability issues. This in turn has a positive impact on the demand for leased reman products. In general, the reliability of the reman product is close to the reliability level of a new product. To guarantee a high performance of a leased reman product, a more effective maintenance strategy is required. In this paper, we investigate a condition-based maintenance (CBM) policy to be used for maintaining a leased reman product. With the CBM policy, the condition of the reman product is monitored and controlled periodically, and hence it can avoid failure before it occurs and reduce unnecessary maintenance actions. This in turn improves the performance of the leased reman product and provides more value to the lessee. The lessor will incur a penalty cost if the performance is below a predefined threshold value. We obtain the optimal inspection interval minimizing the expected total cost and provide a numerical example illustrating the optimal solution.

