Policy gradient methods are among the most efficient approaches to on-policy, model-free reinforcement learning. However, they suffer from high variance in their gradient updates, which makes training unstable. Subtracting a baseline from the rewards is an effective strategy to reduce this variance, as is done in actor-critic models. This work presents a variation of the actor-critic model that uses a fuzzy system instead of a neural network to estimate the state value function. The fuzzy value approximation is inspired by earlier value-based methods such as fuzzy Q-learning. Experiments on the cart-pole benchmark show that fuzzy value approximation outperforms several reinforcement learning algorithms in terms of sample efficiency.
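To make the idea of a fuzzy critic concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional state, evenly spaced triangular membership functions, a zero-order Takagi-Sugeno rule base (one scalar consequent per rule), and a plain TD(0) update. The names `triangular_memberships`, `FuzzyCritic`, and `td_update` are hypothetical, and the cart-pole task itself has a four-dimensional state that this sketch does not cover.

```python
def triangular_memberships(x, centers):
    """Normalized firing strengths of evenly spaced triangular fuzzy sets.

    Assumption: `centers` is sorted, evenly spaced, and has at least two entries.
    """
    # Clamp the input so that at least one fuzzy set always fires.
    x = min(max(x, centers[0]), centers[-1])
    width = centers[1] - centers[0]
    raw = [max(0.0, 1.0 - abs(x - c) / width) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


class FuzzyCritic:
    """State value estimate V(s) = sum_i phi_i(s) * w_i (illustrative only)."""

    def __init__(self, centers, lr=0.1, gamma=0.99):
        self.centers = list(centers)
        self.weights = [0.0] * len(self.centers)  # one consequent per rule
        self.lr = lr
        self.gamma = gamma

    def value(self, x):
        phi = triangular_memberships(x, self.centers)
        return sum(p * w for p, w in zip(phi, self.weights))

    def td_update(self, x, reward, x_next, done):
        """Apply one TD(0) step and return the TD error.

        In an actor-critic setup the returned TD error would play the role
        of the baseline-corrected signal that scales the policy gradient.
        """
        bootstrap = 0.0 if done else self.gamma * self.value(x_next)
        delta = reward + bootstrap - self.value(x)
        phi = triangular_memberships(x, self.centers)
        for i, p in enumerate(phi):
            self.weights[i] += self.lr * delta * p
        return delta


critic = FuzzyCritic(centers=[0.0, 0.5, 1.0])
first_delta = critic.td_update(0.25, reward=1.0, x_next=0.75, done=True)
value_after_one_step = critic.value(0.25)
```

Because the consequents enter the value estimate linearly, the update only touches the rules that fire for the current state, which is one plausible reason a fuzzy critic could need fewer samples than a neural network on a small benchmark; the source does not state the mechanism, so this remains a conjecture.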
The worldwide growth of e-commerce has created new challenges for logistics companies. One of them is delivering products quickly and at low cost, which directly affects how packages are sorted and requires eliminating steps such as storage and batch creation. Our work presents a multi-agent system that uses trajectory data mining techniques to extract territorial patterns and applies them to the dynamic creation of last-mile routes. The problem can be modeled as a Dynamic Capacitated Vehicle Routing Problem (VRP) with Stochastic Customers and is therefore NP-hard, which makes solving it infeasible for large numbers of packages. The work's main contribution is to solve this problem in a way that depends only on the warehouse system configuration and not on the number of packages processed, which suits the Big Data scenarios common in the delivery of e-commerce products. Computational experiments were conducted for single-depot and multi-depot instances. Due to its probabilistic nature, the proposed approach performed slightly worse than the static VRP algorithm. However, the operational gains that our solution provides make it very attractive for situations in which