A study on a Q-Learning algorithm application to a manufacturing assembly problem

Neves, Miguel; Vieira, Miguel; Neto, Pedro

doi:10.1016/j.jmsy.2021.02.014

Cited by 25 publications

(6 citation statements)

References 26 publications


“…Many authors explored the deep Q learning algorithm and exploited it in a wide variety of complex applications, like collaborative business processes with cloud services, manufacturing assembly programs, many robotic applications, automated trading in equity stock markets, and many more [24][25][26][27][28]. Chatterjee et al [29] discussed deep reinforcement learning for the application where the most phishing activities are taking place on websites and detecting malicious URLs.…”

Section: Review Of Related Work (mentioning)

confidence: 99%

RDQN: ensemble of deep neural network with reinforcement learning in classification based on rough set theory for digital transactional fraud detection

Tekkali; Natarajan. 2023. Complex Intell. Syst.

All financial sectors are facing the most common frauds, which are digital transactional frauds. Fraudsters have always engaged in illegal activities such as stealing personal information and logging in with unauthorised credentials. Many machine learning algorithms predict whether the transaction is factual or nonfactual but fail to decrease the processing time. Hybrid models are used in this case to identify the fraud in a quick and efficient manner. This article demarcates to construct a novel model, RDQN, i.e., deep reinforcement learning, that combines with the rough set theory. This article has three steps, including data pre-processing to determine the quality of the data, which affects the learning ability of the model, determining the structural relationship and gaining useful features from the data set using rough set theory, and doing a hybridization of DNN (deep neural network) and Q learning, which is called DQN. It uses the MISH activation function and the ReLU activation function in different layers for training dynamics in the neural network. The proposed model classifies and predicts that the transaction belongs to the category implemented by the agents by activating the reward function. The reinforcement-learning agent’s performance improves based on reward assessment. This reward function gives a more precise value for each transaction, and no fraudster can escape from the agent’s sight. This novel approach improves accuracy and reduces processing time by considering the best feature selection during the process.


“…The studied ASP problem was based on the assembly case study proposed by Neves et al [24] of an airplane toy from the Yale-CMU-Berkeley Object and Benchmark Dataset [25,26], Fig 2. This assembly was optimized through the usage of the deep reinforcement learning algorithms A2C [21], DQN [1], and Rainbow [23], provided in the RLlib python library [27], and the tabular Q-Learning algorithm [28].…”

Section: Case Study (mentioning)

confidence: 99%

“…The MDP formulation of the airplane assembly sequence planning problem was based on the approach proposed by Neves et al [24].…”

Section: MDP Formulation (mentioning)

confidence: 99%

“…Where $t_{b_i}$ is the mean base time of task $i$, without any other previous tasks done apart from those required by task dependencies (Table 3), $t_{c_{ki}}$ are the time correction elements due to already completed tasks (Table 4), $y_k \in \{0, 1\}$ encodes whether task $k$ has already been done, $t_t$ is the tool change time, which has a value of 2 time units, and $x_i \in \{0, 1\}$ is a parameter that encodes the necessity of a tool change. The values for the tasks' mean base times and time correction elements were obtained from the work done by Neves et al [24]. In a deterministic setting the accumulated time is calculated as the sum of all tasks' deterministic time durations, each of which is in turn the sum of the task base time $t_{b_i}$, the sum of all time correction elements due to already completed tasks $\sum_{k=0}^{n} y_k t_{c_{ki}}$, and the tool change time $t_t$ if a tool change is required.…”

Section: Reward (mentioning)

confidence: 99%
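
The deterministic time model described in the Reward statement above can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function names, the `tool_of` mapping used to decide when a tool change is needed, the choice that the first task incurs no tool change, and all numeric values in the usage note are assumptions. Only the 2-time-unit tool change time comes from the quoted text; the real base times and corrections are in Tables 3 and 4 of the citing paper and are not reproduced here.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

TOOL_CHANGE_TIME = 2.0  # t_t: "2 time units" per the quoted statement


def task_duration(
    task: Hashable,
    done: Iterable[Hashable],
    base_time: Dict[Hashable, float],
    corrections: Dict[Tuple[Hashable, Hashable], float],
    needs_tool_change: bool,
) -> float:
    """Duration of one task: t_b_i + sum_k y_k * t_c_ki + x_i * t_t.

    `done` plays the role of the y_k indicators (tasks already completed),
    and `needs_tool_change` plays the role of x_i.
    """
    # Sum the corrections contributed by every already-completed task k.
    correction = sum(corrections.get((k, task), 0.0) for k in done)
    tool_time = TOOL_CHANGE_TIME if needs_tool_change else 0.0
    return base_time[task] + correction + tool_time


def total_assembly_time(
    sequence: Sequence[Hashable],
    base_time: Dict[Hashable, float],
    corrections: Dict[Tuple[Hashable, Hashable], float],
    tool_of: Dict[Hashable, str],
) -> float:
    """Accumulated deterministic time of a full assembly sequence."""
    done = set()
    current_tool = None
    total = 0.0
    for task in sequence:
        # Assumption: a tool change is needed when the tool differs from the
        # previous task's tool; the first task does not incur a change.
        change = current_tool is not None and tool_of[task] != current_tool
        total += task_duration(task, done, base_time, corrections, change)
        done.add(task)
        current_tool = tool_of[task]
    return total
```

With hypothetical inputs `base_time = {"a": 3.0, "b": 4.0}`, `corrections = {("a", "b"): -1.0}` and `tool_of = {"a": "hand", "b": "screwdriver"}`, the order `["a", "b"]` benefits from the correction while `["b", "a"]` does not, which is the kind of order dependence an assembly sequence planner is meant to exploit.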


Deep reinforcement learning applied to an assembly sequence planning problem with user preferences

Neves; Neto. 2022. Int J Adv Manuf Technol. Self-citation.

Deep reinforcement learning (DRL) has demonstrated its potential in solving complex manufacturing decision-making problems, especially in a context where the system learns over time with actual operation in the absence of training data. One interesting and challenging application for such methods is the assembly sequence planning (ASP) problem. In this paper, we propose an approach to the implementation of DRL methods in ASP. The proposed approach introduces in the RL environment parametric actions to improve training time and sample efficiency and uses two different reward signals: (1) user's preferences and (2) total assembly time duration. The user's preferences signal addresses the difficulties and non-ergonomic properties of the assembly faced by the human and the total assembly time signal enforces the optimization of the assembly. Three of the most powerful deep RL methods were studied, Advantage Actor-Critic (A2C), Deep Q-Learning (DQN) and Rainbow, in two different scenarios: a stochastic and a deterministic one. Finally, the performance of the DRL algorithms was compared to tabular Q-Learning's performance. After 10000 episodes the system achieved near optimal behaviour for the algorithms tabular Q-Learning, A2C and Rainbow. Though, for more complex scenarios the algorithm tabular Q-Learning is expected to underperform in comparison to the other 2 algorithms. The results support the potential for the application of deep reinforcement learning in assembly sequence planning problems with human interaction.


“…As one of most commonly used RL algorithms, the Q-Learning algorithm, which is based on value, reference strategy learning and TD method [7], has been widely applied to route planning, manufacturing and assembly, and dynamic train scheduling [8][9][10]. Many researchers have been dedicated to improving the low exploration efficiency problem of the traditional Q-Learning algorithm.…”

Section: Introduction (mentioning)

confidence: 99%
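
For readers unfamiliar with the value-based temporal-difference (TD) update that the statement above refers to, the standard tabular Q-Learning step can be sketched as below. The function name, the `(state, action)` dictionary layout, and the default values of `alpha` and `gamma` are illustrative assumptions and are not taken from any of the cited papers.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    next_actions: Iterable[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one tabular Q-Learning update and return the new Q(s, a).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # Greedy bootstrap over the next state's actions; 0.0 if s' is terminal
    # (no available actions).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: a table whose unseen entries default to 0.0.
q_table: QTable = defaultdict(float)
q_update(q_table, "s0", "a", reward=1.0, next_state="s1", next_actions=["x", "y"])
```

The update bootstraps from the greedy action in the next state regardless of which action the behaviour policy actually takes next, which is what makes Q-Learning off-policy.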

A Self-Adaptive Reinforcement-Exploration Q-Learning Algorithm

et al. 2021


Directing at various problems of the traditional Q-Learning algorithm, such as heavy repetition and disequilibrium of explorations, the reinforcement-exploration strategy was used to replace the decayed ε-greedy strategy in the traditional Q-Learning algorithm, and thus a novel self-adaptive reinforcement-exploration Q-Learning (SARE-Q) algorithm was proposed. First, the concept of behavior utility trace was introduced in the proposed algorithm, and the probability for each action to be chosen was adjusted according to the behavior utility trace, so as to improve the efficiency of exploration. Second, the attenuation process of exploration factor ε was designed into two phases, where the first phase centered on the exploration and the second one transited the focus from the exploration into utilization, and the exploration rate was dynamically adjusted according to the success rate. Finally, by establishing a list of state access times, the exploration factor of the current state is adaptively adjusted according to the number of times the state is accessed. The symmetric grid map environment was established via OpenAI Gym platform to carry out the symmetrical simulation experiments on the Q-Learning algorithm, self-adaptive Q-Learning (SA-Q) algorithm and SARE-Q algorithm. The experimental results show that the proposed algorithm has obvious advantages over the first two algorithms in the average number of turning times, average inside success rate, and number of times with the shortest planned route.
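
The last idea in the abstract above, adjusting the exploration factor of a state according to how often that state has been visited, can be illustrated with a small sketch. This is not the SARE-Q algorithm: the abstract does not give its formulas, so the inverse-count schedule, the class name, and the parameters `eps_start` and `eps_min` below are all assumptions chosen for illustration, and the behaviour utility trace and two-phase decay are not modelled.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence, Tuple


class VisitCountEpsilonGreedy:
    """Epsilon-greedy selection with a per-state, visit-dependent epsilon."""

    def __init__(
        self,
        eps_start: float = 1.0,
        eps_min: float = 0.05,
        seed: Optional[int] = None,
    ) -> None:
        self.eps_start = eps_start
        self.eps_min = eps_min
        self.visits = defaultdict(int)  # state -> number of visits so far
        self.rng = random.Random(seed)

    def epsilon(self, state: Hashable) -> float:
        # Hypothetical schedule: epsilon shrinks as the state is visited more,
        # but never drops below eps_min.
        return max(self.eps_min, self.eps_start / (1 + self.visits[state]))

    def select(
        self,
        q: Dict[Tuple[Hashable, Hashable], float],
        state: Hashable,
        actions: Sequence[Hashable],
    ) -> Hashable:
        eps = self.epsilon(state)
        self.visits[state] += 1
        if self.rng.random() < eps:
            return self.rng.choice(actions)  # explore
        # Exploit: pick the action with the highest known Q-value.
        return max(actions, key=lambda a: q.get((state, a), 0.0))
```

Rarely visited states keep a high exploration rate while frequently visited states become mostly greedy, which is one simple way to reduce the repeated exploration the abstract criticises in a globally decayed ε-greedy strategy.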
