Reinforcement learning searches for the best policy model for decision making and has been shown to be powerful for sequential recommendation. Training a policy by reinforcement learning, however, takes place in an environment. In many real-world applications, training the policy in the real environment can incur an unbearable cost because of the exploration it requires. Reconstructing the environment from past data is thus an appealing way to release the power of reinforcement learning in these applications. Reconstructing the environment is, basically, extracting the causal effect model from the data. However, real-world applications are often too complex to offer fully observable environment information, so unobserved confounding variables quite possibly lie behind the data. A hidden confounder can obstruct effective reconstruction of the environment. In this paper, by treating the hidden confounder as a hidden policy, we propose a deconfounded multi-agent environment reconstruction (DEMER) approach that learns the environment together with the hidden confounder. DEMER adopts a multi-agent generative adversarial imitation learning framework. It introduces a confounder-embedded policy and uses a compatible discriminator to train the policies. We then apply DEMER to driver program recommendation. We first use an artificial driver program recommendation environment, abstracted from the real application, to verify and analyze the effectiveness of DEMER. We then test DEMER in the real application at Didi Chuxing. Experimental results show that DEMER can effectively reconstruct the hidden confounder and thus build a better environment. DEMER also derives a recommendation policy with
We report fully automated, self-calibrating formaldehyde analyzers relying on a hybrid flow format and include the operational scheme and design details. Long-term operation is made possible by the use of syringe pumps. Four identical analyzers were built; they showed low LODs of 120 pptv or better (S/N = 3) and good linearity over the 0-50 ppbv HCHO concentration range (r² > 0.9960). All concentrations refer to 10 min averaging times. The analyzer can resume normal operation after a short-term power failure with at most two cycles of data loss following restart. Good agreement between analyzers was observed for both indoor and outdoor measurements. An integrated HCHO calibration source and full control by the host computer via a graphical user interface program enable the instrument to switch between zero, calibration, and sampling modes in a programmed, automated manner. Detailed field data from deployment at three urban Texas locations in the summer of 2006 are presented. Features of the data are discussed in some detail, including an episode in which the HCHO concentration exceeded 50 ppbv, to our knowledge the highest ambient HCHO concentration reported in North America.