Flow routing can achieve fine-grained network performance optimizations by routing distinct packet traffic flows over different network paths. While the centralized control of Software-Defined Networking (SDN) provides a framework for implementing centralized network optimizations, e.g., optimized flow routing, implementing flow routing that adapts to varying traffic loads requires complex models. This study pursues a model-free approach based on reinforcement learning. We design and evaluate QR-SDN, a classical tabular reinforcement learning approach that directly represents the routing paths of individual flows in its state-action space. Because flow routes are represented directly in the state-action space, QR-SDN is the first reinforcement learning SDN routing approach to enable multiple routing paths between a given pair of source (ingress) and destination (egress) switches while preserving flow integrity. That is, in QR-SDN, all packets of a given flow take the same routing path, while different flows with the same source-destination switch pair may take different routes. In contrast, the recent DRL-TE approach splits a given flow on a per-packet basis, incurring high complexity and out-of-order packets. We implemented QR-SDN in an SDN emulation testbed. Our evaluations demonstrate that the flow-preserving multi-path routing of QR-SDN achieves substantially lower flow latencies than prior routing approaches that determine only a single route per source-destination pair. A limitation of QR-SDN is that its state-action space grows exponentially with the number of network nodes. Addressing the scalability of direct flow routing, e.g., by routing only high-rate flows, is an important direction for future research. The QR-SDN code is made publicly available to support this future research.
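
To make the idea of a tabular state-action space over flow routes concrete, the following Python sketch applies standard Q-learning to a toy example. It is an illustration under stated assumptions, not the QR-SDN implementation: the two-flow, two-path setup, the congestion model in `total_latency`, and all names and hyperparameters are hypothetical. A state assigns one path index to each flow, and an action reassigns one entire flow to a path, so a flow is never split across paths.

```python
import itertools
import random

# Hypothetical toy setup: two flows share one ingress-egress switch pair,
# and each flow may use one of two candidate paths (index 0 or 1).
NUM_FLOWS = 2
NUM_PATHS = 2
BASE_LATENCY = [1.0, 1.5]   # assumed latency of each path when it carries one flow
CONGESTION_PENALTY = 2.0    # assumed extra latency per additional flow on a path


def total_latency(state):
    """Sum of per-flow latencies under the assumed toy congestion model."""
    total = 0.0
    for path in state:
        sharers = state.count(path)
        total += BASE_LATENCY[path] + CONGESTION_PENALTY * (sharers - 1)
    return total


# A state is a tuple holding the path index of every flow.
STATES = list(itertools.product(range(NUM_PATHS), repeat=NUM_FLOWS))
# An action is None (keep the routing) or (flow, path): move one whole flow.
ACTIONS = [None] + [(f, p) for f in range(NUM_FLOWS) for p in range(NUM_PATHS)]


def step(state, action):
    """Apply an action and return (next_state, reward = negative total latency)."""
    if action is None:
        next_state = state
    else:
        flow, path = action
        routes = list(state)
        routes[flow] = path
        next_state = tuple(routes)
    return next_state, -total_latency(next_state)


def train(episodes=2000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = rng.choice(STATES)
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_rollout(q, state, steps=5):
    """Follow the learned greedy policy for a few steps and return the final state."""
    for _ in range(steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _ = step(state, action)
    return state


if __name__ == "__main__":
    q_table = train()
    final = greedy_rollout(q_table, (0, 0))
    print("final routing:", final, "total latency:", total_latency(final))
```

In this toy model, placing the two flows on different paths gives a total latency of 2.5, compared with 6.0 or 7.0 when both flows share a path, so the learned greedy policy would be expected to separate the flows. The table has `NUM_PATHS ** NUM_FLOWS` states, which hints at the scalability limitation noted above.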