Reinforcement learning (RL) algorithms have been widely applied to traffic signal control problems. Traffic environments, however, are intrinsically nonstationary, which creates a convergence problem that RL algorithms struggle to overcome. The Markov decision process (MDP) that an RL algorithm targets can be solved only when neither the transition function nor the reward function varies. The environment for traffic signal control does not satisfy this condition, because the goal of signal control changes with the congestion level. Under unsaturated traffic conditions, the objective should be to minimize vehicle delay; under saturated conditions, the objective must instead be to maximize throughput. A multiregime analysis can handle such varying conditions, but classifying the traffic regime is itself a complex task. The present study proposes a meta-RL algorithm that embeds a latent vector to recognize the different contexts of an environment, so that traffic regimes are classified automatically and a customized reward is applied to each context. In simulation experiments, the proposed meta-RL algorithm succeeded in differentiating rewards according to the saturation level of traffic conditions.
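The regime-dependent objective described above can be illustrated with a minimal sketch. It shows the hand-crafted multiregime alternative, in which the regime is chosen by an explicit threshold; it is not the latent-context mechanism of the proposed meta-RL algorithm. The function name, its parameters, and the threshold value of 1.0 are all assumptions made for illustration.

```python
def regime_reward(total_delay_s, vehicles_discharged, saturation_degree,
                  threshold=1.0):
    """Hypothetical reward that switches objective with the traffic regime.

    total_delay_s       -- summed vehicle delay over the control step (seconds)
    vehicles_discharged -- vehicles that cleared the intersection in the step
    saturation_degree   -- demand-to-capacity ratio (assumed regime indicator)
    threshold           -- assumed boundary between the two regimes
    """
    if saturation_degree < threshold:
        # Unsaturated regime: minimizing delay means rewarding its negative.
        return -float(total_delay_s)
    # Saturated regime: reward throughput directly.
    return float(vehicles_discharged)


if __name__ == "__main__":
    print(regime_reward(120.0, 30, saturation_degree=0.6))
    print(regime_reward(120.0, 30, saturation_degree=1.1))
```

The difficulty noted in the abstract is visible here: the reward depends on a regime indicator and a threshold that must be chosen by hand, which is the classification task the latent vector is meant to take over.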
Cameras for traffic surveillance are usually pole-mounted and produce images with a bird's-eye view. Vehicles in such images generally take an elliptical form. A bounding box around a vehicle usually includes a large empty space when the vehicle's orientation is not parallel to the edges of the box. To circumvent this problem, the present study applied bounding ellipses to an anchor-free, single-shot detection model (CenterNet). Since this model does not depend on anchor boxes, non-maximum suppression (NMS), which requires computing the intersection over union (IoU) between predicted bounding boxes, is unnecessary for inference. SpotNet, which extends CenterNet by adding a segmentation head, was also tested with bounding ellipses. Two anchor-based, single-shot detection models (YOLO4 and SSD) were chosen as references for comparison. Model performance was compared on a local dataset that was annotated with both bounding boxes and ellipses. The two models with bounding ellipses outperformed the reference models with bounding boxes. Pretraining the backbone of the ellipse models on an open dataset (UA-DETRAC) further improved performance, as did several data augmentation schemes. The best mAP score of CenterNet exceeded 0.95 when heatmaps were augmented with bounding ellipses.
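The empty-space argument for bounding boxes can be made concrete with a short geometric sketch. It treats a vehicle as an ellipse with semi-axes a and b rotated by an angle theta, and computes the fraction of the tightest axis-aligned box that the ellipse fills. The function names and the example dimensions are assumptions for illustration and are not taken from the study.

```python
import math


def aabb_half_extents(a, b, theta):
    """Half-width and half-height of the tightest axis-aligned box
    around an ellipse with semi-axes a, b rotated by theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    half_w = math.hypot(a * c, b * s)
    half_h = math.hypot(a * s, b * c)
    return half_w, half_h


def box_fill_ratio(a, b, theta):
    """Fraction of the axis-aligned bounding box covered by the ellipse."""
    half_w, half_h = aabb_half_extents(a, b, theta)
    ellipse_area = math.pi * a * b
    box_area = 4.0 * half_w * half_h
    return ellipse_area / box_area


if __name__ == "__main__":
    # Assumed vehicle-like proportions: semi-axes 2.5 and 1.0 (arbitrary units).
    for deg in (0, 15, 30, 45):
        ratio = box_fill_ratio(2.5, 1.0, math.radians(deg))
        print(f"{deg:2d} deg: fill ratio = {ratio:.3f}")
```

When the ellipse is aligned with the box edges the ratio is pi/4, and it drops as the ellipse rotates toward 45 degrees, which is the empty space that a bounding ellipse avoids.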