2020
DOI: 10.1002/jeab.587

The dynamics of behavior: Review of Sutton and Barto: Reinforcement Learning: An Introduction (2nd ed.)

Cited by 17 publications (4 citation statements)
References 18 publications
“…(2) Dispatching Rule 2: First, according to Equation (15), the job J_i with the largest estimated remaining processing time is selected from the uncompleted jobs and its operation O_{i(OP_i(t)+1)} is selected. A suitable machine for O_{i(OP_i(t)+1)} is then selected according to Equation (14).…”
Section: Action Set (mentioning)
confidence: 99%
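
The dispatching rule quoted above can be sketched roughly as follows. Equations (14) and (15) are not reproduced in the citation statement, so this sketch substitutes two assumptions: remaining processing time is estimated as the sum, over unfinished operations, of the mean processing time across eligible machines, and the "suitable machine" is the one that would finish the operation earliest. The names `Job`, `remaining_estimate`, `dispatching_rule_2`, and `machine_free_at` are hypothetical and do not come from the cited paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Job:
    name: str
    # proc_times[k] maps machine id -> processing time of the k-th operation
    proc_times: List[Dict[str, float]]
    done: int = 0  # number of completed operations, i.e. OP_i(t)

    def remaining_estimate(self) -> float:
        # ASSUMPTION (stand-in for Equation (15)): mean processing time over
        # eligible machines, summed over all unfinished operations.
        return sum(sum(op.values()) / len(op) for op in self.proc_times[self.done:])


def dispatching_rule_2(
    jobs: List[Job], machine_free_at: Dict[str, float]
) -> Optional[Tuple[str, int, str]]:
    """Return (job name, operation index, machine), or None if all jobs are done."""
    uncompleted = [j for j in jobs if j.done < len(j.proc_times)]
    if not uncompleted:
        return None
    # Select the uncompleted job with the largest estimated remaining time.
    job = max(uncompleted, key=Job.remaining_estimate)
    # Its next operation is the one with index OP_i(t) (0-based here).
    op = job.proc_times[job.done]
    # ASSUMPTION (stand-in for Equation (14)): earliest completion time wins.
    machine = min(op, key=lambda m: machine_free_at.get(m, 0.0) + op[m])
    return job.name, job.done, machine
```

A caller would apply the returned decision, advance `done` for the chosen job, update `machine_free_at`, and call the rule again at the next decision point.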
“…In 2018, some scholars applied deep reinforcement learning (DRL) to the scheduling field and then it was widely used, which attracted the attention and competitive research of scholars in China and abroad. The basic components of reinforcement learning are the environment, agents, the behavior policy, the reward and the value function, where the learning process is usually described by a Markov decision process (MDP) [14]. For large-scale problems, it is necessary to parameterize it through a policy network and to balance exploration and exploitation, which ensures that the scheduling agent converges to the optimal or near-optimal solution in a reasonable time, thus improving the adaptability and self-learning of production scheduling in intelligent manufacturing.…”
Section: Introduction (mentioning)
confidence: 99%
“…Reinforcement learning (RL) is a branch of machine learning, [1,2] which is an agent that interacts with an environment through a sequence of state observation, action (a_k) decision, reward (R_k) receive, and value (Q(S, A)) update. The aim is to obtain a policy consisting of state-action pairs to guide the agent to maximize reward, which is typically used to solve Markov decision process (MDP) or partially observable Markov decision process (POMDP) problems.…”
Section: Introduction (mentioning)
confidence: 99%
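
The observe, act, receive reward, update Q(S, A) cycle described in the statement above can be illustrated with a minimal tabular sketch. The statement does not name a specific update rule, so one-step Q-learning with epsilon-greedy action selection is assumed here; the function names and the parameters `alpha`, `gamma`, and `epsilon` are illustrative choices, not taken from the citing paper.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning update (ASSUMED rule, not stated in the source):
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q[(s, a)]


# Usage: a table of Q-values that defaults to 0.0 for unseen state-action pairs.
q_table = defaultdict(float)
rng = random.Random(0)
action = epsilon_greedy(q_table, "s0", ["left", "right"], epsilon=0.1, rng=rng)
q_update(q_table, "s0", action, r=1.0, s_next="s1", actions=["left", "right"])
```

In a full agent these two calls would sit inside a loop over environment steps, with `s_next` becoming the next state to act from.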
“…In [22, 25, 31–43], shallow and deep neural networks have been suggested to approximate the Q-value function to achieve better optimisation results with shorter convergence time. Low convergence time is also desirable for online applications.…”
Section: Introduction (mentioning)
confidence: 99%