Cytotoxic T cells and regulatory T cells play a crucial role in the outcome of cancer patients. Recent work has shown that the spatial distribution of these cells is as important as their density. Here, we analyzed the spatial distribution of these T cell subtypes at the epithelial-stromal interface in a rectal cancer cohort and its relevance for prognosis. We studied a cohort of 191 patients with advanced rectal cancer treated by radiochemotherapy (RCT). Tissue microarrays were immunohistochemically double-stained for FoxP3 and CD8. Cell densities were analyzed in the stromal and epithelial compartments. Additionally, image analysis software calculated the distances of lymphocytes to the epithelial-stromal interface (ESI). CD8+ and FoxP3+ cell counts decreased clearly after RCT, with the decrease in FoxP3+ cells being more pronounced than that in CD8+ cells. In the invasive front, short distances from the ESI to CD8+ and to FoxP3+ cells were associated with improved overall survival. Cell counts in the stromal compartment had no influence on prognosis. No correlation between stromal and epithelial lymphocyte densities was observed. In the stromal compartment of rectal cancer patients, the distance from the epithelial-stromal interface to CD8+ and FoxP3+ cells predicted prognosis more accurately than mere cell counts and could thereby be a means of better stratifying patients for therapy. This observation will have to be validated in future prospective studies with regard to other tumor entities and its implications for the responsiveness of tumors to new therapeutic modalities.
We propose a method to develop trustworthy reinforcement learning systems. To ensure safety, especially during exploration, we automatically synthesize a correct-by-construction runtime enforcer, called a shield, that blocks all actions of the agent that are unsafe with respect to a temporal logic specification. Our main contribution is a new synthesis algorithm for computing the shield online. Existing offline shielding approaches exhaustively compute the safety of all state-action combinations ahead of time, resulting in huge computation times, large memory consumption, and significant delays at runtime due to look-ups in huge databases. The intuition behind online shielding is to compute at runtime the set of all states that could be reached in the near future. For each of these states, the safety of all available actions is analysed and used for shielding as soon as one of the considered states is reached. Our proposed method is general and can be applied to a wide range of planning problems with stochastic behaviour. For our evaluation, we selected a 2-player version of the classical computer game Snake. The game requires fast decisions, and the multiplayer setting induces a large state space that is computationally expensive to analyse exhaustively. The safety objective of collision avoidance is easily transferable to a variety of planning tasks.
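The idea of computing, at runtime, the set of states reachable in the near future can be illustrated with a bounded breadth-first search. This is only a minimal sketch under assumptions, not the authors' synthesis algorithm: the function `reachable_within`, the callbacks `actions` and `successors`, and the toy line-world are all hypothetical names introduced for illustration.

```python
from collections import deque


def reachable_within(state, horizon, actions, successors):
    """Return every state reachable from `state` in at most `horizon` steps.

    `actions(s)` lists the actions available in s and `successors(s, a)`
    returns the possible next states (both are assumed interfaces).
    """
    seen = {state}
    frontier = deque([(state, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == horizon:
            continue  # do not expand beyond the look-ahead horizon
        for action in actions(current):
            for nxt in successors(current, action):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return seen


# Hypothetical toy world: positions 0..5 on a line; a move may also "slip"
# and leave the agent where it is, which models stochastic behaviour.
def toy_actions(s):
    return ["left", "right"]


def toy_successors(s, a):
    step = -1 if a == "left" else 1
    target = min(5, max(0, s + step))
    return {target, s}


near_future = reachable_within(2, 2, toy_actions, toy_successors)
```

An online shield in this spirit would analyse the safety of every action in each state of `near_future` while the agent is still deciding, so the result is ready as soon as one of those states is actually reached.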
Despite the recent impressive results in reinforcement learning (RL), safety is still one of the major research challenges in RL. RL is a machine-learning approach to determine near-optimal policies in Markov decision processes (MDPs). In this paper, we consider the setting where the safety-relevant fragment of the MDP together with a temporal logic safety specification is given, and many safety violations can be avoided by planning ahead a short time into the future. We propose an approach for online safety shielding of RL agents. During runtime, the shield analyses the safety of each available action. For any action, the shield computes the maximal probability of not violating the safety specification within the next k steps when executing this action. Based on this probability and a given threshold, the shield decides whether to block an action from the agent. Existing offline shielding approaches exhaustively compute the safety of all state-action combinations ahead of time, resulting in huge computation times and large memory consumption. The intuition behind online shielding is to compute at runtime the set of all states that could be reached in the near future. For each of these states, the safety of all available actions is analysed and used for shielding as soon as one of the considered states is reached. Our approach is well-suited for high-level planning problems where the time between decisions can be used for safety computations and it is acceptable for the agent to wait until these computations are finished. For our evaluation, we selected a 2-player version of the classical computer game Snake. The game represents a high-level planning problem that requires fast decisions, and the multiplayer setting induces a large state space, which is computationally expensive to analyse exhaustively.
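The k-step safety check described in this abstract can be sketched as a bounded-horizon recursion over an explicit MDP. The following is an illustrative sketch under assumptions, not the paper's implementation: the names `make_shield`, `action_safety` and `allowed_actions`, the dictionary layout of `transitions`, the toy MDP, and the choice to compare against an absolute threshold are all hypothetical.

```python
from functools import lru_cache


def make_shield(transitions, unsafe, k, threshold):
    """Build a k-step shield for a small explicit MDP (assumed layout).

    transitions[state][action] is a list of (probability, next_state) pairs;
    `unsafe` is the set of states that violate the safety specification.
    """

    @lru_cache(maxsize=None)
    def state_safety(state, steps):
        # Maximal probability of staying safe for `steps` more steps.
        if state in unsafe:
            return 0.0
        if steps == 0:
            return 1.0
        return max(action_safety(state, a, steps) for a in transitions[state])

    def action_safety(state, action, steps):
        # Take `action` now, then act as safely as possible afterwards.
        return sum(p * state_safety(nxt, steps - 1)
                   for p, nxt in transitions[state][action])

    def allowed_actions(state):
        # Assumption: an action is blocked when its k-step safety
        # probability falls below an absolute threshold.
        return [a for a in transitions[state]
                if action_safety(state, a, k) >= threshold]

    return action_safety, allowed_actions


# Hypothetical toy MDP with one unsafe state, "crash".
toy_transitions = {
    "start": {"risky": [(0.5, "crash"), (0.5, "safe")],
              "careful": [(1.0, "edge")]},
    "edge": {"stay": [(0.9, "edge"), (0.1, "crash")],
             "retreat": [(1.0, "safe")]},
    "safe": {"wait": [(1.0, "safe")]},
}

action_safety, allowed_actions = make_shield(
    toy_transitions, unsafe={"crash"}, k=2, threshold=0.9)

permitted = allowed_actions("start")
```

In this toy example, "risky" keeps the agent safe over two steps with probability 0.5, while "careful" reaches probability 1.0 because the agent can retreat afterwards, so only "careful" passes the 0.9 threshold.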