Markov decision theory framework for resource allocation in LEO satellite constellations

Usaha, Wipawee; Barria, Javier A.

doi:10.1049/ip-com:20020510

Cited by 5 publications

(13 citation statements)

References 3 publications


“…Under policy π, the process {x_k, ω_k} is an embedded finite state Markov chain evolving in continuous time. Note that in [19], it has been shown that the mean holding time between state transitions no longer needs to be Markovian. Note also that even though the chain evolves in continuous time, we only need to consider the system state at epochs where the events and decisions take place.…”

Section: Problem Formulation

mentioning

confidence: 99%

“…If a new call of class-j is accepted, the number of class-j new calls becomes x_N(j, s) + 1 on link s ∈ r. The net gain of admitting a class-j new/handover (N/HO) call to some route r is the gain obtained from admitting the call rather than rejecting it and is given by ζ j, + h_q(x′) − h_q(x) where x′ is the network state after admitting the call and = N, HO [19]. From the quadratic form of the feature vector φ(x) in (14), the following net-gain result can be obtained in terms of the link net gains [11]:…”

Section: Q(5k S)]

mentioning

confidence: 99%
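The admission rule implied by the net-gain expression quoted above can be sketched as follows. This is only an illustration under stated assumptions: `h` stands in for the relative value function written h_q in the quote, the tuple state encoding is hypothetical, and `h_quadratic` is a made-up quadratic penalty rather than the feature-based approximation of the cited paper.

```python
from typing import Callable, Tuple

# Hypothetical state encoding: number of calls in progress on each link.
State = Tuple[int, ...]


def net_gain(reward: float, h: Callable[[State], float],
             x: State, x_after: State) -> float:
    """Immediate reward plus the change in relative value caused by admission."""
    return reward + h(x_after) - h(x)


def admit(reward: float, h: Callable[[State], float],
          x: State, x_after: State) -> bool:
    """Admit the call only when admitting is strictly better than rejecting."""
    return net_gain(reward, h, x, x_after) > 0.0


def h_quadratic(x: State) -> float:
    """Illustrative (assumed) relative value: quadratic congestion penalty."""
    return -0.5 * sum(n * n for n in x)


x = (1, 2)        # state before the call arrives
x_after = (2, 3)  # state if the call is admitted on both links of its route
print(admit(5.0, h_quadratic, x, x_after))  # high-reward call
print(admit(3.0, h_quadratic, x, x_after))  # low-reward call
```

With this toy penalty, the same route accepts the higher-reward call and rejects the lower-reward one, because the loss in future value from the added congestion is the same in both cases.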

“…In this algorithm policy learning is performed at each (satellite) node and the resulting policy is a randomized policy, that is, given an event and network state, the policy maps to each state a distribution over the set of available actions. The embedded Markov chain {x_k, ω_k, a_k} in the satellite network evolves within state space (S × Ω) × A and the distribution of the holding times between state transitions could be nonexponential [19]. Let µ_θ be a randomized policy parameterized by a vector θ ∈ ℝ^M (M is the number of tunable parameters) for a satellite node in a given topology, say TP_n.…”

Section: A Actor-critic Semi-markov Decision Algorithm (Acsmdp)

mentioning

confidence: 99%
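A randomized policy parameterized by a vector θ, as described in the quote above, is often realized as a softmax over linear action scores. The sketch below assumes that form purely for illustration; the quote does not state which parameterization the cited paper uses, and the per-action feature vectors are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax_policy(theta: Sequence[float],
                   features: Sequence[Sequence[float]]) -> List[float]:
    """Return one probability per action; features[a] describes action a."""
    scores = [sum(t * f for t, f in zip(theta, feat)) for feat in features]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs: Sequence[float], rng: random.Random) -> int:
    """Draw an action index according to the policy's distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical example: M = 2 tunable parameters, 3 candidate actions
# (for instance two routes and "reject").
theta = [1.0, -0.5]
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
probs = softmax_policy(theta, features)
action = sample_action(probs, random.Random(0))
print(probs, action)
```

Because every action keeps a nonzero probability, a policy of this shape keeps exploring while θ is being tuned.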

“…The effect of rerouted traffic caused by the changing satellite topology is hence here considered so that forced termination of rerouted calls due to insufficient resources is minimized. In [19], it was shown that routing strategies determined from a semi-Markov decision process (SMDP) formulation can minimize the dropping of both new and rerouted traffic. However, the computational complexity of the algorithms proposed in [19] makes the solution impractical for even small size networks.…”

mentioning

confidence: 99%

“…In [19], it was shown that routing strategies determined from a semi-Markov decision process (SMDP) formulation can minimize the dropping of both new and rerouted traffic. However, the computational complexity of the algorithms proposed in [19] makes the solution impractical for even small size networks. Furthermore, it is well known that solving the SMDP formulation with conventional dynamic-programming (DP) methods can still be too complex to solve, even with simplifications and suitable approximations.…”

mentioning

confidence: 99%


Reinforcement Learning for Resource Allocation in LEO Satellite Networks

Usaha

Barria

2007

IEEE Trans. Syst., Man, Cybern. B


Abstract: In this paper, we develop and assess online decision-making algorithms for call admission and routing for low Earth orbit (LEO) satellite networks. It has been shown in a recent paper that, in a LEO satellite system, a semi-Markov decision process formulation of the call admission and routing problem can achieve better performance in terms of an average revenue function than existing routing methods. However, the conventional dynamic programming (DP) numerical solution becomes prohibitive as the problem size increases. In this paper, two solution methods based on reinforcement learning (RL) are proposed in order to circumvent the computational burden of DP. The first method is based on an actor-critic method with temporal-difference (TD) learning. The second method is based on a critic-only method, called optimistic TD learning. The algorithms enhance performance in terms of requirements in storage, computational complexity and computational time, and in terms of an overall long-term average revenue function that penalizes blocked calls. Numerical studies are carried out, and the results obtained show that the RL framework can achieve up to 56% higher average revenue over existing routing methods used in LEO satellite networks with reasonable storage and computational requirements.

Index Terms: Call admission control (CAC), low Earth orbit (LEO) satellite network, reinforcement learning (RL), routing, temporal-difference (TD) learning.
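To make the critic side of TD learning concrete, here is a minimal tabular sketch of an average-reward TD(0) update for a semi-Markov process. It is a generic textbook-style update, not the actor-critic or optimistic TD algorithm of the paper: the class name, the tabular value store, and the choice to estimate the average reward rate as total reward over total elapsed time are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Hashable


class AverageRewardTD:
    """Tabular TD(0) critic for an average-reward semi-Markov process."""

    def __init__(self, step_size: float = 0.1) -> None:
        self.h = defaultdict(float)  # relative value estimate per state
        self.step_size = step_size
        self.total_reward = 0.0
        self.total_time = 0.0

    @property
    def rho(self) -> float:
        """Estimated average reward per unit time."""
        if self.total_time <= 0.0:
            return 0.0
        return self.total_reward / self.total_time

    def update(self, x: Hashable, reward: float, tau: float,
               x_next: Hashable) -> float:
        """Apply one TD update after spending time tau in state x."""
        self.total_reward += reward
        self.total_time += tau
        # TD error: reward earned, minus what the average rate would have
        # earned over the same sojourn, plus the change in relative value.
        delta = reward - self.rho * tau + self.h[x_next] - self.h[x]
        self.h[x] += self.step_size * delta
        return delta


critic = AverageRewardTD(step_size=0.5)
d1 = critic.update("a", reward=2.0, tau=4.0, x_next="b")
d2 = critic.update("b", reward=3.0, tau=1.0, x_next="a")
print(d1, d2, critic.rho, dict(critic.h))
```

The sojourn time `tau` enters the TD error explicitly, which is what distinguishes the semi-Markov update from the fixed-step Markov case.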


Hardware architecture for high-speed real-time dynamic programming applications

Matthews

Elhanany

2008

IET Comput. Digit. Tech.


A novel hardware architecture for performing the core computations required by dynamic programming (DP) techniques is introduced. The latter pertain to a vast range of applications that necessitate an optimal sequence of decisions to be obtained. An underlying assumption is that a complete model of the environment is provided, whereby the dynamics are governed by a Markov decision process. Existing DP implementations have traditionally focused on software-based mechanisms. Here, the authors present a method for exploiting the inherent parallelism associated with computing both the value function and optimal policy. This allows for the optimal policy to be obtained several orders of magnitude faster than traditional software implementations, establishing the viability of the approach for demanding, real-time applications. The well-known rental car management problem has been studied as a benchmark for which a field-programmable gate array-based implementation was designed. The results highlight the advantages of the proposed approach with respect to the execution speed and the scalability properties.
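The parallelism the abstract refers to can be seen in a synchronous value-iteration sweep: each state's Bellman backup reads only the previous sweep's values, so the backups within a sweep are independent of one another. The sketch below runs them sequentially in plain Python on a toy two-state MDP; the transition table layout and the MDP itself are hypothetical, and nothing here reflects the actual hardware design.

```python
from typing import Dict, List, Optional, Tuple

# transitions[s][a] = list of (probability, next_state, reward)
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def backup(s: int, values: Dict[int, float], transitions: Transitions,
           gamma: float) -> Tuple[float, Optional[int]]:
    """Bellman backup for one state, reading only the previous sweep's values."""
    best_value, best_action = float("-inf"), None
    for a, outcomes in transitions[s].items():
        q = sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
        if q > best_value:
            best_value, best_action = q, a
    return best_value, best_action


def value_iteration(transitions: Transitions, gamma: float = 0.9,
                    tol: float = 1e-9, max_sweeps: int = 10_000):
    values = {s: 0.0 for s in transitions}
    policy: Dict[int, Optional[int]] = {s: None for s in transitions}
    for _ in range(max_sweeps):
        # These backups do not depend on each other, so a parallel
        # implementation could evaluate all of them at once.
        results = {s: backup(s, values, transitions, gamma) for s in transitions}
        new_values = {s: v for s, (v, _) in results.items()}
        policy = {s: a for s, (_, a) in results.items()}
        change = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if change < tol:
            break
    return values, policy


# Toy MDP (assumed): state 0 can stay (reward 0) or move to state 1 (reward 1);
# state 1 can only stay (reward 2).
toy: Transitions = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 2.0)]},
}
values, policy = value_iteration(toy, gamma=0.5)
print(values, policy)
```

The value function and the greedy policy fall out of the same backup, which matches the abstract's point that both computations share the same parallel structure.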


Handover Management Based on Fuzzy Logic Decision for Leo Satellite Networks

Wang

2005

Intelligent Automation & Soft Computing


Owing to their low propagation delay, LEO (Low Earth Orbit) satellite networks are an indispensable solution for providing global coverage and multimedia communication. However, since LEO satellites move very fast in orbit, offering a smooth handover is regarded as the most significant service feature. In this paper, an effective handover management scheme is proposed. The proposed scheme uses a fuzzy logic decision to determine handover efficiently and considers three important parameters: blocking rate, dropping rate, and bandwidth utilization. The simulation results show that this scheme can achieve higher bandwidth utilization, a lower new call blocking rate, and a lower handover call dropping rate.
