Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement learning algorithm called Worst-Case Soft Actor Critic, which extends the Soft Actor Critic algorithm with a safety critic to achieve risk control. More specifically, a certain level of conditional Value-at-Risk from the distribution is regarded as a safety measure to judge the constraint satisfaction, which guides the change of adaptive safety weights to achieve a trade-off between reward and safety. As a result, we can optimize policies under the premise that their worst-case performance satisfies the constraints. The empirical analysis shows that our algorithm attains better risk control compared to expectation-based methods.
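To make the contrast between an expectation constraint and a tail constraint concrete, here is a minimal sketch of an empirical conditional Value-at-Risk (CVaR) computed from sampled episode costs. The function name `empirical_cvar`, the convention that `alpha` is the fraction of worst outcomes averaged, the sample costs, and the budget of 25 are all illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Mean of the worst alpha-fraction of sampled costs (upper tail).

    Assumed convention: alpha in (0, 1]; alpha = 1 recovers the plain mean.
    """
    costs = np.sort(np.asarray(costs, dtype=float))
    k = max(1, int(np.ceil(alpha * len(costs))))  # number of tail samples
    return costs[-k:].mean()

# Hypothetical accumulated safety costs from five episodes.
costs = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
budget = 25.0  # hypothetical safety threshold

mean_cost = costs.mean()                      # expectation-based measure
tail_cost = empirical_cvar(costs, alpha=0.2)  # worst 20% of episodes

print(f"mean={mean_cost}, CVaR={tail_cost}, budget={budget}")
```

In this toy sample the expected cost stays under the budget while the tail cost exceeds it, which is the situation an expectation-only constraint would miss.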
Background Fractures are the most common type of unintentional injury in children, with traumatic upper limb fractures accounting for approximately 80% of all childhood fractures. Many epidemiological investigations of upper limb fractures in children have been conducted, but with the development of society, the patterns of childhood fractures may have changed. This study aimed to analyze the epidemiology and economic cost factors of upper limb fractures in Chinese children. Methods We retrospectively reviewed children with upper limb fractures or old upper limb fractures hospitalized between December 1, 2015, and December 31, 2019, in 22 tertiary children’s hospitals, under China’s Futang Research Center of Pediatric Development. We used the ICD-10 codes on the front sheet of their medical records to identify cases and extracted data on age, sex, injury cause, fracture site, treatment, the year of admission and discharge, visiting time, and various costs during hospitalization from the medical record. Results A total of 32,439 children (21,478 boys and 10,961 girls) were identified, of whom 32,080 had fresh fractures and 359 had old fractures. The peak age was 3–6 years in both sexes. A total of 4788 were infants, 14,320 were preschoolers, 10,499 were of primary school age, and 2832 were adolescents. Fractures were most frequent in autumn (August to October). Admissions peaked at midnight (0:00). Among the 32,080 children with fresh upper limb fractures, the most common fracture site was the distal humerus, with a total of 20,090 fracture events including 13,134 humeral supracondylar fractures and 4914 lateral humeral condyle fractures. The most common cause of injury was falling over. The most common joint dislocation accompanying upper limb fractures occurred in the elbow, involving 254 cases. Surgery was performed in 31,274 children, and 806 did not receive surgery.
Among those with clear operative records, 10,962 children were treated with open reduction and 18,066 with closed reduction. The number of cases was largest in the East China region (Anhui Province, Shandong Province, Jiangsu Province, Zhejiang Province, and Fujian Province), with 12,065 cases overall. Among the 359 children with old fractures, 118 were admitted with a diagnosis of “old humerus fracture,” accounting for the highest proportion; 244 underwent surgical open reduction, 16.16% of whom had osteotomy. For the children with fresh fractures, the average total hospital cost was 10,994 yuan, and the highest average total hospital cost was 14,053 yuan, for humeral shaft fractures. For the children with old fractures, the average total hospital cost was 15,151 yuan, and the highest average total hospital cost was 20,698 yuan, for old ulna fractures. Cost of materials was the principal factor affecting total hospital cost, followed by surgery and anesthesia costs, both in children with fresh fractures and those with old fractures. Between children with fresh fractures and those with old fractures, significant differences were observed in all hospital costs (P < 0.001) except treatment costs (P = 0.702). Among the 32,439 children, full self-payment accounted for the highest proportion of all payment methods, involving 17,088 cases, with an average cost of 11,111 yuan. Conclusion Information on the epidemiological characteristics of childhood fractures suggests that health and safety education and protective measures should be strengthened to prevent upper limb fractures in children. For both fresh and old fractures, the cost of materials was the principal factor affecting total hospital cost, followed by surgery and anesthesia costs. The overall average total hospital cost is higher in children with old fractures than in children with fresh fractures.
Among all children, full self-payment accounted for the highest proportion of all payment methods, covering 53% of cases. Hospital costs are a heavy burden for families who pay out of pocket, which can lead to delayed treatment and to unhealed fractures or malunion in some children. Therefore, the child trauma care system and training on fractures need to be improved, to reduce the late presentation of fractures. These combined measures will improve children’s quality of life, reduce the expenditure of families, and decrease the public health burden. To provide better medical services for children, authorities must improve the allocation of health resources, establish a comprehensive medical security system for children, and set up more child trauma centers.
Safety is critical to broadening the real-world use of reinforcement learning. Modeling the safety aspects using a safety-cost signal separate from the reward and bounding the expected safety-cost is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, it can be risky to set constraints only on the expectation, neglecting the tail of the distribution, which might have prohibitively large values. In this paper, we propose a method called Worst-Case Soft Actor Critic for safe RL that approximates the distribution of accumulated safety-costs to achieve risk control. More specifically, a certain level of conditional Value-at-Risk from the distribution is regarded as a safety constraint, which guides the change of adaptive safety weights to achieve a trade-off between reward and safety. As a result, we can compute policies whose worst-case performance satisfies the constraints. We investigate two ways to estimate the safety-cost distribution, namely a Gaussian approximation and a quantile regression algorithm. The Gaussian approximation is simple and easy to implement but may underestimate the safety cost, whereas the quantile regression leads to more conservative behavior. The empirical analysis shows that the quantile regression method achieves excellent results in complex safety-constrained environments, showing good risk control.
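As a rough illustration of the Gaussian approximation, the sketch below computes the upper-tail CVaR of a normal safety-cost distribution in closed form and uses it to nudge a safety weight. It assumes `alpha` denotes the fraction of worst outcomes averaged, and the dual-ascent-style rule in `update_safety_weight` (with its learning rate) is a hypothetical stand-in for the adaptive weight mechanism, which the abstract does not specify.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def gaussian_cvar(mean, std, alpha):
    """Upper-tail CVaR of N(mean, std^2) over the worst alpha-fraction.

    Closed form: mean + std * pdf(z) / alpha, with z the (1 - alpha) quantile.
    Assumed convention: alpha in (0, 1]; alpha = 1 is the risk-neutral mean.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if alpha == 1.0:
        return mean
    z = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return mean + std * _STD_NORMAL.pdf(z) / alpha

def update_safety_weight(weight, cvar, budget, lr=0.1):
    """Hypothetical update: raise the weight when CVaR exceeds the budget,
    lower it otherwise, and keep it non-negative."""
    return max(0.0, weight + lr * (cvar - budget))

# Illustrative numbers: estimated mean and std of accumulated safety cost.
cvar_10 = gaussian_cvar(mean=20.0, std=5.0, alpha=0.1)
new_weight = update_safety_weight(weight=1.0, cvar=cvar_10, budget=25.0)
print(f"CVaR(0.1)={cvar_10:.2f}, updated weight={new_weight:.3f}")
```

With these numbers the mean cost (20) is within the budget of 25, but the CVaR over the worst 10% is not, so the weight on safety increases. If the true cost distribution has a heavier tail than a Gaussian, this closed form will understate the tail, which is consistent with the underestimation the abstract mentions.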
In the absence of assigned tasks, a learning agent typically seeks to explore its environment efficiently. However, the pursuit of exploration brings additional safety risks. An under-explored aspect of reinforcement learning is how to achieve safe and efficient exploration when the task is unknown. In this paper, we propose a practical Constrained Entropy Maximization (CEM) algorithm to solve task-agnostic safe exploration problems, which naturally require a finite horizon and undiscounted constraints on safety costs. The CEM algorithm aims to learn a policy that maximizes state entropy under the premise of safety. To avoid approximating the state density in complex domains, CEM leverages a k-nearest neighbor entropy estimator to evaluate the efficiency of exploration. In terms of safety, CEM minimizes the safety costs, and adaptively trades off safety and exploration based on the current constraint satisfaction. The empirical analysis shows that CEM enables the acquisition of a safe exploration policy in complex environments, resulting in improved performance in both safety and sample efficiency for target tasks.
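A k-nearest neighbor entropy estimate can be sketched as follows. This uses the standard Kozachenko-Leonenko form with Euclidean distances; whether CEM uses this exact variant is an assumption, and the function names, the choice `k=3`, and the small distance floor are illustrative.

```python
import math
import numpy as np

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -np.euler_gamma + sum(1.0 / j for j in range(1, n))

def knn_entropy(states, k=3):
    """Kozachenko-Leonenko estimate (in nats) of the differential entropy
    of the distribution that generated `states`, an (n, d) array."""
    x = np.asarray(states, dtype=float)
    n, d = x.shape
    # Pairwise Euclidean distances, shape (n, n); fine for small n.
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    kth = np.sort(dists, axis=1)[:, k]
    kth = np.maximum(kth, 1e-12)  # floor to avoid log(0) on duplicate states
    # Log-volume of the d-dimensional unit ball.
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * np.mean(np.log(kth))

# Visited states that cover more of the space score a higher entropy.
rng = np.random.default_rng(0)
narrow = rng.normal(scale=0.1, size=(500, 2))  # states clustered together
wide = 10.0 * narrow                           # same states, spread 10x wider
print(knn_entropy(narrow), knn_entropy(wide))
```

Because the estimate depends only on distances between visited states, no explicit density model is needed, which is the property the abstract highlights.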
In the context of food security, the market-oriented allocation of factors under the collective ownership system has had a profound impact on agricultural production. As a much-debated issue under the Household Responsibility System (HRS), the mechanism by which farmland market transactions affect agricultural production efficiency deserves discussion. Based on the stochastic frontier production function model, this paper analyzes the impact of farmland transfer on farmers’ technical efficiency in production under external environmental factors, using moderating-effect and threshold-effect analyses. The study found that farmland transfer can improve farmers’ technical efficiency. The market price of agricultural products and farmland transfer subsidies have a positive moderating effect on the impact of farmland transfer on technical efficiency. Furthermore, the farmland transfer subsidy has a nonlinear effect on the impact of farmland transfer on technical efficiency.