In an era of fully digitally interconnected people and machines, IoT devices have become a real target for attackers. Recent incidents, such as the well-known Mirai botnet, have shown that the risks incurred are huge, and a risk assessment is therefore mandatory. In this paper we present a novel approach to collecting relevant data about IoT attacks. We detail an SSH/Telnet honeypot system that leverages reinforcement learning algorithms to interact with attackers, and we present the results obtained with a view to defining optimal reward functions. A key issue for the performance of such algorithms is their direct dependence on the reward functions used. The main outcome of our study is a full implementation of an IoT honeypot system that leverages Apprenticeship Learning using Inverse Reinforcement Learning to generate the best-suited reward functions.