2019
DOI: 10.1007/s12555-019-0120-7

Model-free Adaptive Optimal Control of Episodic Fixed-horizon Manufacturing Processes Using Reinforcement Learning

Abstract: A self-learning optimal control algorithm for episodic fixed-horizon manufacturing processes with time-discrete control actions is proposed and evaluated on a simulated deep drawing process. The control model is built during consecutive process executions under optimal control via reinforcement learning, using the measured product quality as reward after each process execution. Prior model formulation, which is required by state-of-the-art algorithms from model predictive control and approximate dynamic programming, …
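The abstract describes learning control over consecutive process executions, with the measured product quality serving as a reward that arrives only at the end of each episode. The following is a minimal sketch of such an episodic, fixed-horizon learning loop under stated assumptions; the ToyProcess environment, the discretized action set, and all numeric constants are illustrative stand-ins, not the authors' implementation.

```python
import random
from collections import defaultdict

ACTIONS = [0.5, 1.0, 1.5, 2.0]   # assumed discretized control levels (illustrative only)
HORIZON = 5                      # assumed number of control steps per process execution
ALPHA, EPSILON = 0.1, 0.2        # learning rate and exploration rate (assumed values)

Q = defaultdict(float)           # Q[(step, state, action)] -> estimated return


class ToyProcess:
    """Stand-in for the simulated process; only the reset/step/quality_reward interface matters."""

    def reset(self):
        self.controls = []
        return 0                                 # discretized, hashable process state

    def step(self, action):
        self.controls.append(action)
        return len(self.controls)                # trivial state: current step index

    def quality_reward(self):
        # Toy quality measure: penalize deviation from a target control level of 1.0.
        return -sum(abs(a - 1.0) for a in self.controls)


def run_episode(env):
    """One process execution under epsilon-greedy control, learning from the terminal reward."""
    state = env.reset()
    visited = []
    for step in range(HORIZON):
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(step, state, a)])
        visited.append((step, state, action))
        state = env.step(action)
    reward = env.quality_reward()                # product quality, observed only after the episode
    # With reward available only at the end of the fixed horizon (and no discounting),
    # every visited state-action pair is updated toward that terminal reward.
    for step, s, a in visited:
        Q[(step, s, a)] += ALPHA * (reward - Q[(step, s, a)])
    return reward


if __name__ == "__main__":
    env = ToyProcess()
    for episode in range(200):
        run_episode(env)
    print("learned greedy first action:", max(ACTIONS, key=lambda a: Q[(0, 0, a)]))
```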


Cited by 38 publications (20 citation statements) · References 39 publications
“…The reward gained per episode is visualized as box plots grouped by the position of the task-instance in the task-sequence, aggregated over the 100 experiments and over 250 episodes per box. The baseline (red box) consists of 100 random task-instances, independently optimized by single-policy learning with single-objective manufacturing-process NFQ, as proposed in [8]. The plot shows a positive effect of the amount of prior knowledge on the convergence speed.…”
Section: Results
confidence: 99%
“…Furthermore, stochastic process behavior is induced by varying friction coefficients, randomly drawn per process execution from a beta distribution with α = 3, β = 15. A detailed description of the evaluation environment can be found in [8].…”
Section: Discussion
confidence: 99%
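The excerpt above specifies the stochasticity of the evaluation environment: one friction coefficient is drawn per process execution from a beta distribution with α = 3 and β = 15. A small sketch of that sampling step follows; the rescaling of the unit-interval sample to a physical friction range is an assumption for illustration and is not taken from the cited work.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def sample_friction(alpha=3.0, beta=15.0, low=0.05, high=0.25):
    """Draw one friction coefficient per process execution from a beta distribution.

    alpha and beta follow the cited excerpt; the [low, high] range used to rescale
    the unit-interval sample to a physical friction value is an assumption here.
    """
    u = rng.beta(alpha, beta)        # sample in [0, 1]
    return low + u * (high - low)    # rescale to an assumed friction range


# Example: one friction value per episode
frictions = [sample_friction() for _ in range(5)]
print(frictions)
```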
“…In the context of manufacturing, reinforcement learning methods have been proposed for model-free adaptive optimization on the device level and the operational level of various manufacturing processes. Recently published work includes the optimization of process control in sheet metal milling (Veeramani et al 2019), polymerization reaction systems (Ma et al 2019), laser welding (Günther et al 2016) and deep drawing (Dornheim et al 2019). Operational optimization objectives include, among others, material flow in industrial mining (Kumar et al 2020), preventive maintenance scheduling of flow line systems (Wang et al 2016) and job shop scheduling (Kuhnle et al 2020).…”
Section: Contribution
confidence: 99%
“…π*(s) = argmax_a Q(s, a) is the optimal policy. Since the Q-function makes the action explicit, the Q-values are estimated using the TD(0) method [11], [13], and the policy is determined from them (an action can be chosen by simply selecting the maximal Q-value for the current state). The policy value is computed using the TD(0) method, which is an instance of the more general class of TD(λ) methods [14].…”
Section: Problem Solution
unclassified
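The excerpt above describes estimating Q-values with one-step TD(0) updates and reading the policy off those values by taking the action with the maximal Q-value in the current state. Below is a minimal sketch of that idea under stated assumptions; the tabular representation, the action set, and the bootstrapped max-over-actions target are illustrative choices, not necessarily the exact variant used in the cited works.

```python
from collections import defaultdict

GAMMA, ALPHA = 0.95, 0.1        # discount factor and step size (assumed values)
ACTIONS = [0, 1, 2]             # assumed discrete action set
Q = defaultdict(float)          # Q[(state, action)] -> estimated value


def greedy_action(state):
    """Policy read off the Q-values: take the action with the maximal Q-value in this state."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def td0_update(state, action, reward, next_state, done):
    """One-step TD(0) update of Q(state, action) toward the bootstrapped target."""
    target = reward
    if not done:
        target += GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])


# Illustrative transition (s=0, a=1) -> s'=1 with reward 0.5:
td0_update(state=0, action=1, reward=0.5, next_state=1, done=False)
print(greedy_action(0))
```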