The Internet of Things (IoT) has recently witnessed a boom in AI deployment at the edge, driven by newly developed small-footprint Machine Learning (ML) models and integrated hardware accelerators. Although edge deployment brings major benefits such as privacy preservation and low-latency applications, it still suffers from the typical resource limitations of edge devices. A recent approach deploys multiple inference models of varying size and accuracy on board the edge device, which can alleviate some of these limitations. Such a dynamic system can be leveraged to provide real-time, energy-efficient applications by smartly allocating inference tasks to local models or offloading them to edge servers based on current constraints. In this work, we tackle the problem of efficiently assigning a given set of inference tasks to local inference models and edge-server models running in parallel, under given time and energy constraints. This problem is strongly NP-hard; we therefore propose LITOSS, a two-stage framework that combines a lightweight Genetic Algorithm-based scheduler for task scheduling with a Reinforcement Learning (RL) agent that improves edge-server selection. We perform experiments using a Raspberry Pi and a set of edge servers. Results show that our framework schedules tasks faster than other meta-heuristic schemes such as LGSTO, ACO, and PSO while providing higher average accuracy. We also show that using an RL agent to select the best subset of available edge servers increases, or in the worst case maintains, average accuracy while reducing average scheduling time.