“…The actor-critic algorithm is much simpler than Q-learning in the computation, 19 can determine the optimal policy, and effectively applied to control the tasks. 20,21 Markov chain model for investment…”
How to obtain the maximal return within a given range of risk in the securities market is an interesting and widely studied problem. Because many complex factors affect the behavior of securities, such as risk and the uncertainty of returns, it is very difficult to establish an appropriate investment model. To address the curse of dimensionality and the modeling difficulties that the problem raises, we use approximate dynamic programming to set up a Markov decision model for the multi-period portfolio with transaction costs. A model-based actor-critic algorithm for uncertain environments is proposed, in which the optimal value function is obtained by iteration subject to the constrained risk range and a limited amount of funds, and the optimal investment for each period is found by dynamic programming over a limited number of fund ratios. The experiment indicated that the algorithm could produce a stable investment and that the income could grow steadily.
“…A good solution to this problem might be to exploit both mechanisms, in order to get the best out of each. Even if a few steps in this direction have already been made ( [5,26]), no final solution has been proposed yet.…”
Section: About Learning and Training
“…A second example of the application of the BAT methodology is HAMSTER, a mobile robot based on a commercial platform, whose task is to bring "food" to its "nest." 5 We shall not describe all steps in the development of this robot, but confine ourselves to the description of the main differences with respect to AM.…”
We propose Behavior Engineering as a new technological area whose aim is to provide methodologies and tools for developing autonomous robots. Building robots is a very complex engineering enterprise that requires the exact definition and scheduling of the activities which a designer, or a team of designers, should follow. Behavior Engineering is, within the autonomous robotics realm, the equivalent of more established disciplines like Software Engineering and Knowledge Engineering. In this article we first give a detailed presentation of a Behavior Engineering methodology, which we call Behavior Analysis and Training (BAT), where we stress the role of learning and training. Then we illustrate the application of the BAT methodology to three cases involving different robots: two mobile robots and a manipulator. Results show the feasibility of the proposed approach.
In this paper, we propose the Ant-Q learning algorithm [1], which mimics the habits of biological ants, as a new way to solve the Stable Marriage Problem (SMP) [3] presented by Gale and Shapley [2]. The goal of SMP is to find an optimal stable matching based on the participants' preference lists (PL). A limitation of the Gale-Shapley algorithm is that the stable matching it produces is optimal for only one side (male or female). We propose another approach that can satisfy various requirements for SMP. ACS (Ant Colony System) is a swarm intelligence method that finds an optimal solution by using ant pheromones. We try to improve the ACS technique by adding the Q-learning [9] concept. The resulting Ant-Q method can solve SMP under various requirements. The experimental results show that the proposed method works well for the problem.
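The one-sidedness of Gale-Shapley mentioned in the abstract above can be seen in a minimal sketch of the classic deferred-acceptance procedure. This is the standard textbook algorithm, not the Ant-Q method of the paper; the function name gale_shapley and the two-person preference lists are hypothetical and chosen only for illustration.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Classic deferred acceptance; returns a dict proposer -> receiver.

    Both arguments map each person to a complete preference list
    (most preferred first). Illustrative sketch, not the Ant-Q method.
    """
    # rank[r][p]: position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next receiver to try
    engaged_to = {}                               # receiver -> current proposer
    free = list(proposer_prefs)                   # proposers without a partner

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p                     # receiver was free: accept
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p                     # receiver trades up
            free.append(current)
        else:
            free.append(p)                        # rejected: p tries again later

    return {p: r for r, p in engaged_to.items()}


# Hypothetical preference lists (assumption, not data from the paper).
men = {"A": ["X", "Y"], "B": ["Y", "X"]}
women = {"X": ["B", "A"], "Y": ["A", "B"]}

men_propose = gale_shapley(men, women)    # matching when men propose
women_propose = gale_shapley(women, men)  # matching when women propose
```

In this example both runs return stable matchings, but they differ: when the men propose each man gets his first choice, and when the women propose each woman gets hers. That asymmetry is the limitation the abstract refers to.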