IEEE Proceedings. Intelligent Vehicles Symposium, 2005. 2005
DOI: 10.1109/ivs.2005.1505104

Double action Q-learning for obstacle avoidance in a dynamically changing environment

Abstract: In this paper, we propose a new method for solving the reinforcement learning problem in a dynamically changing environment, as in vehicle navigation, in which the Markov Decision Process used in traditional reinforcement learning is modified so that the response of the environment is taken into consideration for determining the agent's next state. This is achieved by changing the action-value function to handle three parameters at a time, namely, the current state, action taken by the agent, and action taken …
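
To make the three-parameter action-value function concrete, here is a minimal tabular sketch in Python. It is not the authors' implementation: the abstract is truncated, so the update rule below is an assumption (a standard temporal-difference update whose target takes the best agent action after averaging over an assumed distribution of environment actions). The class name `DoubleActionQ`, the parameter `env_probs`, and the default uniform distribution are all hypothetical.

```python
import numpy as np


class DoubleActionQ:
    """Tabular Q(s, a_agent, a_env): a hedged sketch, not the paper's code."""

    def __init__(self, n_states, n_agent_actions, n_env_actions,
                 alpha=0.1, gamma=0.9):
        # One value per (state, agent action, environment action) triple.
        self.q = np.zeros((n_states, n_agent_actions, n_env_actions))
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor

    def _env_probs(self, env_probs):
        # Assumption: with no prediction of the environment's action,
        # fall back to a uniform distribution over its actions.
        n_env = self.q.shape[2]
        if env_probs is None:
            return np.full(n_env, 1.0 / n_env)
        return np.asarray(env_probs, dtype=float)

    def expected_values(self, state, env_probs=None):
        # Average Q over environment actions; one entry per agent action.
        return self.q[state] @ self._env_probs(env_probs)

    def greedy_action(self, state, env_probs=None):
        # Agent action with the highest expected value in this state.
        return int(np.argmax(self.expected_values(state, env_probs)))

    def update(self, state, agent_action, env_action, reward, next_state,
               env_probs=None):
        # Assumed TD target: reward plus the discounted best expected value
        # of the next state.
        target = reward + self.gamma * self.expected_values(
            next_state, env_probs).max()
        td_error = target - self.q[state, agent_action, env_action]
        self.q[state, agent_action, env_action] += self.alpha * td_error
        return td_error
```

The only structural difference from ordinary tabular Q-learning is the extra table axis for the environment's action, which is the point the abstract makes; how the environment's next action is predicted is left open here.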

Cited by 6 publications (1 citation statement)
References 6 publications
“…1. In this paper, we assume that (a) the proposed method runs in a vehicle, which is to perform CA and other tasks in a DCE; (b) the method for determining the vehicle's action decision is outside the scope of this paper (Interested reader should refer to [12] for details); (c) an observable pedestrian instance is a distance value measured at time t by the vehicle's sensors; and (d) pedestrian behavior is defined by a series of pedestrian motions. To start with, when a group of pedestrian instances at time t are observed, they are first associated with existing trajectories that have been assembled through the previous t-1 time steps.…”
Section: Introduction (mentioning)
confidence: 99%