Single-chain blockchains are rapidly being replaced by sidechains (or sharded chains) because of their high Quality of Service (QoS) and low complexity. Existing sidechaining models rely on context-specific machine-learning optimization techniques, which limits their scalability in real-time use cases. These models are also highly complex and require constant reconfiguration in dynamic deployment scenarios. To overcome these issues, this text proposes the design of a novel low-complexity Q-Learning model based on Proof-of-Context (PoC) consensus for scalable sidechains. The proposed model first describes a Q-Learning method for sidechain formation, which helps maintain high scalability even under large-scale traffic. This method is cascaded with a novel Proof-of-Context consensus that represents input data in context-independent formats. These formats enable high-speed consensus, which uses the intent of the data instead of the data samples themselves. To estimate this intent, a set of context-based classification models is used to assign input data samples to distinct categories. These models combine feature representation via Long Short-Term Memory (LSTM) with classification via 1D Convolutional Neural Networks (CNNs), and can be used in heterogeneous application scenarios. Because input data samples are represented as context-based categories, the proposed model reduces mining delay by 8.3% and the energy needed for mining by 2.9%, while maintaining higher throughput and lower mining jitter than standard sidechaining techniques under similar use cases.
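To make the idea of Q-Learning-driven sidechain formation concrete, the following is a minimal illustrative sketch of tabular Q-learning deciding whether to extend the current sidechain or start a new one. It is not the model proposed here: the state (current sidechain length), the two actions, the toy reward (delay growing with chain length, fixed cost for a split), and all names and constants such as `MAX_LEN`, `step`, and `train` are assumptions chosen only for illustration.

```python
import random

ACTIONS = ("extend", "split")  # hypothetical: grow the current sidechain, or start a new one
MAX_LEN = 8                    # hypothetical cap on sidechain length (used as the state)

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: Q <- Q + alpha * (r + gamma * max Q' - Q)."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the two actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

def step(length, action):
    """Toy environment (an assumption): delay grows with length, a split has a fixed cost."""
    if action == "split":
        return 1, -2.0                      # new sidechain of length 1, fixed split cost
    return min(length + 1, MAX_LEN), -float(length)  # longer chain, larger delay penalty

def train(steps=5000, epsilon=0.2, seed=0):
    """Run Q-learning on the toy environment and return the learned Q-table."""
    rng = random.Random(seed)
    q, length = {}, 1
    for _ in range(steps):
        action = choose_action(q, length, epsilon, rng)
        next_length, reward = step(length, action)
        q_update(q, length, action, reward, next_length)
        length = next_length
    return q

if __name__ == "__main__":
    q = train()
    # Print the greedy action for each sidechain length.
    for length in range(1, MAX_LEN + 1):
        print(length, choose_action(q, length, 0.0, random.Random(0)))
```

In a sketch like this, the learned table acts as a lookup for when splitting becomes preferable to extending; a real deployment would replace the toy reward with measured delay, energy, or throughput.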