Kubernetes (k8s) has the potential to coordinate distributed edge resources and centralized cloud resources, but currently lacks a specialized scheduling framework for edge-cloud networks. In addition, the hierarchical distribution of heterogeneous resources makes the modeling and scheduling of k8s-oriented edge-cloud networks particularly challenging. In this paper, we introduce KaiS, a learning-based scheduling framework for such edge-cloud networks to improve the long-term throughput rate of request processing. First, we design a coordinated multi-agent actor-critic algorithm to cater to decentralized request dispatch and dynamic dispatch spaces within the edge cluster. Second, for diverse system scales and structures, we use graph neural networks to embed system state information, and combine the embedding results with multiple policy networks to reduce the orchestration dimensionality by stepwise scheduling. Finally, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration, and present an implementation design for deploying the above algorithms that is compatible with native k8s components. Experiments using real workload traces show that KaiS can successfully learn appropriate scheduling policies, irrespective of request arrival patterns and system scales. Moreover, KaiS can enhance the average system throughput rate by 15.9% while reducing scheduling cost by 38.4% compared to baselines.
We consider federated learning in tiered communication networks. Our network model consists of a set of silos, each holding a vertical partition of the data. Each silo contains a hub and a set of clients, with the silo’s vertical data shard partitioned horizontally across its clients. We propose Tiered Decentralized Coordinate Descent (TDCD), a communication-efficient decentralized training algorithm for such two-tiered networks. To reduce communication overhead, the clients in each silo perform multiple local gradient steps before sharing updates with their hub. Each hub adjusts its coordinates by averaging its clients’ updates, and then hubs exchange intermediate updates with one another. We present a theoretical analysis of our algorithm and show the dependence of the convergence rate on the number of vertical partitions and the number of local updates. We further validate our approach empirically via simulation-based experiments using a variety of datasets and objectives.
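As an illustration of the training loop described in the abstract above, the following single-process Python sketch simulates one possible reading of it on a least-squares objective. The function names, the linear model, the step size, the equal-weight averaging, and the reuse of the same row partition in every silo are all assumptions made for this sketch, not details taken from the paper.

```python
def tdcd_sketch(X, y, silo_features, client_rows,
                local_steps=5, rounds=100, lr=0.02):
    """Toy simulation of tiered coordinate descent (hypothetical helper).

    X: list of rows (lists of floats); y: list of targets.
    silo_features: one list of column indices per silo (vertical partition).
    client_rows: one list of row indices per client (horizontal partition,
    assumed identical in every silo for simplicity).
    """
    n = len(X)
    theta = [[0.0] * len(cols) for cols in silo_features]
    for _ in range(rounds):
        # Hubs exchange intermediate information: each silo's partial
        # prediction for every sample, computed from its own coordinates.
        partial = [[sum(X[i][c] * t for c, t in zip(cols, th)) for i in range(n)]
                   for cols, th in zip(silo_features, theta)]
        total = [sum(p[i] for p in partial) for i in range(n)]
        new_theta = []
        for k, cols in enumerate(silo_features):
            client_models = []
            for rows in client_rows:
                w = list(theta[k])
                # Multiple local gradient steps; the other silos'
                # contributions stay fixed (stale) until the next round.
                for _ in range(local_steps):
                    grad = [0.0] * len(cols)
                    for i in rows:
                        others = total[i] - partial[k][i]
                        own = sum(X[i][c] * wj for c, wj in zip(cols, w))
                        resid = own + others - y[i]
                        for j, c in enumerate(cols):
                            grad[j] += X[i][c] * resid / len(rows)
                    w = [wj - lr * gj for wj, gj in zip(w, grad)]
                client_models.append(w)
            # The hub averages its clients' models coordinate-wise.
            new_theta.append([sum(m[j] for m in client_models) / len(client_models)
                              for j in range(len(cols))])
        theta = new_theta
    return theta


def predict(X, silo_features, theta):
    """Sum every silo's partial prediction for each sample."""
    return [sum(X[i][c] * t
                for cols, th in zip(silo_features, theta)
                for c, t in zip(cols, th))
            for i in range(len(X))]
```

Raising `local_steps` reduces how often clients talk to their hub, at the price of working longer with stale information from the other silos. That is the trade-off the abstract's convergence analysis refers to.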
Gradient-based iterative algorithms have been widely used to solve optimization problems, including resource sharing and network management. When system parameters change, iterative methods must compute a new solution from scratch, independent of the previous parameter settings. Therefore, we propose a learning approach that can quickly produce optimal solutions over a range of system parameters for constrained optimization problems. Two Coupled Long Short-Term Memory networks (CLSTMs) are proposed to find the optimal solution. The advantages of this framework include: (1) a near-optimal solution for a given problem instance can be obtained in a few iterations during inference, and (2) enhanced robustness, as the CLSTMs can be trained using system parameters with distributions different from those used during inference to generate solutions. In this work, we analyze the relationship between minimizing the loss functions and solving the original constrained optimization problem for certain parameter settings. Extensive numerical experiments using datasets from Alibaba reveal that the solutions to a set of nonconvex optimization problems obtained by the CLSTMs reach within 90% or better of the corresponding optimum after 11 iterations, where the number of iterations and CPU time consumption are reduced by 81% and 33%, respectively, compared with gradient descent with momentum.
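For reference, the baseline named in the abstract above, gradient descent with momentum, can be sketched in a few lines of Python. This is the standard heavy-ball update applied to an unconstrained toy quadratic. The function names, the objective, and the hyperparameters are assumptions for illustration, and neither the constraints nor the CLSTM approach of the paper is modeled here.

```python
def momentum_gd(grad, x0, lr=0.1, beta=0.9, iters=300):
    """Heavy-ball gradient descent (hypothetical helper).

    grad: function returning the gradient as a list of floats.
    x0: starting point as a list of floats.
    """
    x = list(x0)
    v = [0.0] * len(x)  # velocity accumulates past gradients
    for _ in range(iters):
        g = grad(x)
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


def toy_grad(x):
    """Gradient of the toy objective f(x) = (x0 - 1)^2 + 2 * (x1 + 3)^2."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)]


# Every change to the objective's parameters requires a fresh run like this one.
solution = momentum_gd(toy_grad, [0.0, 0.0])
```

The loop restarts from `x0` whenever the problem parameters change. That per-instance cost is what the learning approach in the abstract aims to avoid.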