The scalability of network controllers is important if network operators are to expand their communication networks with confidence. Today's networks can use the Software-Defined Networking (SDN) concept to separate the control and switching planes, and use autonomous self-management features to react quickly to any network event. Together these give the network the agility needed to support custom Service Level Agreements (SLAs) for its clients while lowering capital and operational expenditures. However, both SDN and autonomous management concepts assume a centralized architecture, in which a single entity manages every aspect of the network. This makes it difficult to manage an entire network in the massive IoT world, where services tend to be provided close to the users and edge-based approaches are key to supporting 5G and beyond services. Supporting such a dynamic and flexible network requires an approach in which the network management responsibilities are distributed across multiple autonomous OpenFlow controllers, which we denote as ArchSDN controllers. By assigning OpenFlow switches to different ArchSDN controllers, the network is divided into sectors, each controlled exclusively by one ArchSDN controller. These controllers coordinate their actions and use a reinforcement-learning-based decision mechanism to explore, learn, and implement near-optimal end-to-end communication paths. The evaluation results show that the proposed decision system is capable of finding near-optimal solutions, learning from the obtained results to improve future service activations, and quickly adapting the network to the loss of a communication link, responding in less than 100 ms.
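As an illustration of how a reinforcement-learning decision mechanism can explore and learn end-to-end paths across sectors, the following is a minimal sketch using tabular Q-learning with epsilon-greedy exploration. It is not the ArchSDN algorithm: the topology `LINKS`, the sector names, the link costs, the reward definition (negative link cost), and the functions `train`, `greedy_path`, and `path_cost` are all hypothetical and chosen only for this example. The sketch also assumes a loop-free topology and a single learner, whereas the proposed system distributes decisions across coordinating controllers.

```python
import random

# Hypothetical inter-sector topology: LINKS[u][v] is the cost of the link u -> v.
# Sector names and costs are illustrative assumptions, not taken from the paper.
LINKS = {
    "A": {"B": 1.0, "C": 1.0, "D": 4.0},
    "B": {"D": 1.0},
    "C": {"D": 5.0},
    "D": {},
}


def train(links, src, dst, episodes=500, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Learn Q[(sector, next_sector)] by tabular Q-learning; reward = -link cost."""
    rng = random.Random(seed)
    q = {(u, v): 0.0 for u, nbrs in links.items() for v in nbrs}
    for _ in range(episodes):
        node = src
        while node != dst and links[node]:
            nbrs = list(links[node])
            if rng.random() < epsilon:
                nxt = rng.choice(nbrs)  # explore a random neighbouring sector
            else:
                nxt = max(nbrs, key=lambda v: q[(node, v)])  # exploit best known
            # Best estimated value obtainable from the next sector onward.
            future = 0.0
            if nxt != dst and links[nxt]:
                future = max(q[(nxt, v)] for v in links[nxt])
            target = -links[node][nxt] + gamma * future
            q[(node, nxt)] += alpha * (target - q[(node, nxt)])
            node = nxt
    return q


def greedy_path(links, q, src, dst):
    """Follow the highest-valued next hop from src (assumes a loop-free topology)."""
    path = [src]
    while path[-1] != dst and links[path[-1]]:
        node = path[-1]
        path.append(max(links[node], key=lambda v: q[(node, v)]))
    return path


def path_cost(links, path):
    """Sum the link costs along a path given as a list of sectors."""
    return sum(links[u][v] for u, v in zip(path, path[1:]))


if __name__ == "__main__":
    q_table = train(LINKS, "A", "D")
    best = greedy_path(LINKS, q_table, "A", "D")
    print(best, path_cost(LINKS, best))
```

In this toy topology the direct link from A to D costs more than the two-hop route through B, so the learned values should steer traffic through B. Re-running `train` on a copy of `LINKS` with a link removed gives a rough picture of how learned values can be refreshed after a link loss, although the sub-100 ms reaction time reported above concerns the actual system and not this sketch.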