Predictive coding (PC) is a theory in cognitive and computational neuroscience that explains cortical function as a hierarchical process of minimising prediction errors. It provides a neuronal scheme for implementing Bayesian inference in the brain, both to recover the hidden states of the world from sensory input (passive inference) and to select actions that achieve the agent's goals (active inference). Since its introduction, predictive coding has developed into a unifying theory that accounts for a growing range of cognitive functions, including perception, attention, and action planning. In this literature thesis, I review and discuss how PC can also serve as a powerful tool for understanding working memory (WM), an essential function for executive control. After a brief introduction to working memory and current PC frameworks, I give an overview of how WM might fit within these frameworks. Specifically, I explore how PC frameworks help to answer the following questions: (1) How is WM maintained and updated? (2) What is the relationship between attention and WM, and how do they interact? (3) Why does WM have limited capacity? (4) Why is WM hierarchical? By treating WM coding as part of the state inference process, we can explain WM maintenance as the stage in which the state variables remain unchanged because no new evidence arrives. WM updating, in turn, corresponds to belief updating when new evidence arises. Since state inference involves a trade-off between the complexity and the accuracy of predictions, the limited capacity of WM may be an emergent property that ensures a certain level of accuracy. In active inference, attention helps the agent to select actions that reduce uncertainty about the world, and the selected actions give rise to observations that are used to update WM. This delineates the roles of WM and attention and clarifies how they interact.
Finally, hierarchical PC can account for the hierarchical representation of WM in the brain, with each level of WM corresponding to a level of inferred states. Based on the reviewed literature, I summarise three important ingredients for modelling WM: temporal depth, goals, and hierarchy. Future modelling work should clarify whether WM is a separable component in PC, which variable in PC actually represents WM, and where in the hierarchy WM is generated and maintained. In summary, through the lens of variational Bayesian inference, WM can be examined within the process of evidence accumulation simulated in a deep hierarchical predictive coding model. Once action selection is incorporated, this view naturally explains WM as an emergent property of goal-directed behaviour, manifested in the brain's hierarchical inference through the minimisation of expected free energy. Modelling WM in PC frameworks offers alternative answers to some long-standing questions about WM and may help to resolve conflicts between WM theories, for example between those that propose persistent neuronal activity during WM and those that propose sparse activity. It may also help in developing computational tools to improve treatments for brain disorders such as schizophrenia, and in enabling artificial intelligence to cope with a world full of uncertainty.