Reinforcement learning (RL) is a popular approach for determining an optimal traffic signal control policy to alleviate congestion in a road network. However, the traffic signal control policy can also be optimized jointly with the design of vehicular flow directions to further improve traffic performance. The design of vehicular flow directions refers to the right-of-way or directional restrictions imposed on a road network. Here, a new RL-based technique is presented for co-optimizing the design of vehicular flow directions and the control policy for traffic signals. The technique consists of a two-step iterative process: a set of vehicular flow directions for a road network is generated, and then an RL-based approach is used to train the traffic signal control policy over the given set of vehicular flow directions. Under the proposed technique, vehicular flow directions with poor traffic performance are iteratively eliminated, while new vehicular flow directions are generated to achieve better traffic performance and to converge toward the maximum achievable expected traffic performance. The proposed RL-based technique is evaluated using two examples under rush-hour and non-rush-hour traffic conditions. Compared to an RL-based approach in which only the traffic signal control policy is considered, the proposed approach obtains better traffic performance in terms of vehicular queue length and throughput.

INDEX TERMS Co-optimization, reinforcement learning, vehicular flow direction design, traffic signal control, deep neural networks.
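
The two-step iterative process described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the abstract does not say how new flow-direction designs are generated, so a simple random mutation of surviving designs is used, and `toy_performance` is a hypothetical placeholder for the step that would train a signal control policy with RL and return its measured traffic performance.

```python
import random

# Hypothetical per-road direction choices (not taken from the paper).
DIRECTIONS = ("two_way", "forward_only", "reverse_only")


def random_design(rng, n_roads):
    """Generate one candidate set of vehicular flow directions."""
    return tuple(rng.choice(DIRECTIONS) for _ in range(n_roads))


def mutate(rng, design):
    """Create a new design by re-drawing the direction of one road (assumed scheme)."""
    i = rng.randrange(len(design))
    new = list(design)
    new[i] = rng.choice(DIRECTIONS)
    return tuple(new)


def toy_performance(design):
    """Placeholder for step 2: in the real method, an RL-trained signal control
    policy would be evaluated on this design. Here we only count matches
    against an arbitrary reference design so that the loop is runnable."""
    reference = ("forward_only", "two_way", "reverse_only", "two_way")
    return sum(a == b for a, b in zip(design, reference))


def co_optimize(evaluate, n_roads=4, pop_size=6, n_keep=3, n_iters=20, seed=0):
    rng = random.Random(seed)
    # Step 1: generate an initial set of flow-direction designs.
    population = [random_design(rng, n_roads) for _ in range(pop_size)]
    history = []
    for _ in range(n_iters):
        # Step 2: score each design (stand-in for RL policy training).
        ranked = sorted(population, key=evaluate, reverse=True)
        history.append(evaluate(ranked[0]))
        # Eliminate poorly performing designs, keep the best ones.
        survivors = ranked[:n_keep]
        # Generate new designs to replace the eliminated ones.
        children = [mutate(rng, rng.choice(survivors))
                    for _ in range(pop_size - n_keep)]
        population = survivors + children
    best = max(population, key=evaluate)
    return best, evaluate(best), history


if __name__ == "__main__":
    best, score, history = co_optimize(toy_performance)
    print("best design:", best)
    print("score:", score)
```

Because the best designs are carried over unchanged between iterations, the best score in this sketch never decreases. In the actual technique, each evaluation would involve training a signal control policy, so the cost per iteration would be dominated by that training step.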