In the past few years, transformer-based models have become increasingly popular, as they have achieved new state-of-the-art performance in several natural language processing tasks. However, because these models are often extremely large, using them in applications on embedded devices may not be feasible. This thesis looks at two specific applications: Dialogue Systems and Sentiment Analysis. Both offer great potential to enhance user experience. When running on embedded devices, however, they cannot use the same models and algorithms designed for server-based execution, owing to factors such as reduced memory capacity and limited computational power. Novel solutions that are resource- and user-aware are therefore needed.

Dialogue Systems
Research on building dialogue systems able to engage in natural-sounding conversation with humans has attracted increasing attention in recent years. This has led to the rise of commercial conversational agents deployed on embedded devices, such as Google Home, Alexa and Siri, which enable users to interact with a wide range of underlying functionalities in a natural and seamless manner. However, partly because of memory and computational power constraints, these systems must either run on a server or communicate frequently with one in order to process users' queries. On embedded systems, this communication may act as a bottleneck, causing delays, and may halt the system altogether should the network connection be lost or unavailable.

Moreover, despite the rise of generative models such as ChatGPT, retrieval-based dialogue systems remain a promising approach: they can deliver syntactically rich and informative responses while allowing greater control over the responses the model can provide, which may be critical in some applications.
This thesis proposes a new framework for hardware-aware retrieval-based dialogue systems. The framework is based on the Dual-Encoder architecture and is coupled with a clustering method that groups candidate responses belonging to the same conversation. Together, these reduce both storage capacity and computational power requirements.
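To make the idea of combining a Dual-Encoder with candidate clustering more concrete, the following is a minimal sketch under stated assumptions. It is not the framework proposed in this thesis. It assumes that context and response embeddings have already been produced by the two encoders (here, plain lists of floats stand in for transformer outputs), and that candidates are pre-grouped into clusters by conversation. The centroid-based pruning and the names `score`, `centroid`, `build_index`, `retrieve` and `top_clusters` are hypothetical illustrations. The sketch shows how clustering can reduce computation at query time: only the candidates in the best-ranked clusters are scored, rather than the full pool.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical layout: cluster id -> list of (response text, response embedding).
Clusters = Dict[str, List[Tuple[str, Vector]]]


def score(context: Vector, response: Vector) -> float:
    """Dual-Encoder relevance: inner product of two independently computed embeddings."""
    return sum(c * r for c, r in zip(context, response))


def centroid(vectors: List[Vector]) -> List[float]:
    """Mean embedding of one cluster of candidate responses."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_index(clusters: Clusters) -> Dict[str, List[float]]:
    """Precompute one centroid per cluster (assumed to be done once, offline)."""
    return {cid: centroid([emb for _, emb in members]) for cid, members in clusters.items()}


def retrieve(
    context_emb: Vector,
    clusters: Clusters,
    centroids: Dict[str, List[float]],
    top_clusters: int = 1,
) -> Tuple[str, float, int]:
    """Two-stage retrieval: rank cluster centroids, then score only the candidates
    inside the top-ranked clusters. Returns (best response, its score, #candidates scored)."""
    ranked = sorted(centroids, key=lambda cid: score(context_emb, centroids[cid]), reverse=True)
    best_text, best_score, n_scored = "", float("-inf"), 0
    for cid in ranked[:top_clusters]:
        for text, emb in clusters[cid]:
            s = score(context_emb, emb)
            n_scored += 1
            if s > best_score:
                best_text, best_score = text, s
    return best_text, best_score, n_scored
```

Raising the hypothetical `top_clusters` parameter trades extra computation for a lower risk of pruning away the best response. The count returned by `retrieve` shows how many dot products were actually computed.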
Sentiment Analysis
The availability of new datasets and deep learning techniques has led to a surge of research effort in sentiment analysis. However, little attention has been given to developing models that are not only accurate but also suitable for user-specific use or geared towards resource-constrained devices. State-of-the-art models often have tens of millions of parameters, which makes it infeasible to deploy them on devices with limited memory and computational power. This work explores the concept of software-hardware co-design and proposes a methodical procedure for selecting the most suitable model given application constraints expressed in terms of memory and latency. In doing so, it shows how fully utilizing the feature extraction capabilities of large pre-trained language models can close the gap between the ...
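Selecting a model under memory and latency constraints can be illustrated as a simple constrained search. The following is a minimal sketch, not the procedure developed in this work. The `Candidate` fields, the `select_model` function, and the tie-breaking rule (prefer higher accuracy, then lower latency, then lower memory) are hypothetical assumptions. The sketch also assumes that each candidate's memory footprint and latency have already been measured on the target device. Candidates that violate either constraint are discarded, and the most accurate remaining one is returned.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical description of one candidate model."""
    name: str
    accuracy: float    # validation accuracy; higher is better
    memory_mb: float   # footprint on the target device (assumed measured)
    latency_ms: float  # inference latency on the target device (assumed measured)


def select_model(
    candidates: Iterable[Candidate],
    max_memory_mb: float,
    max_latency_ms: float,
) -> Optional[Candidate]:
    """Return the most accurate candidate that satisfies both constraints, or None."""
    feasible = [
        c for c in candidates
        if c.memory_mb <= max_memory_mb and c.latency_ms <= max_latency_ms
    ]
    if not feasible:
        return None  # no candidate fits the application constraints
    # Assumed tie-breaking: equal accuracy -> lower latency, then lower memory.
    return max(feasible, key=lambda c: (c.accuracy, -c.latency_ms, -c.memory_mb))
```

A filter-then-maximize rule like this is only one option. A procedure could instead report the whole Pareto front of accuracy against memory and latency and leave the final choice to the application designer.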