With the advent of digital technology, crimes and legal disputes increasingly involve some form of speech recording in which the identity of a speaker is questioned [1]. In response, the field of forensic speaker identification seeks to quantify how strongly a speech recording can be attributed to a particular person relative to a population. In this work, we explore the use of speech embeddings obtained by training a CNN with the triplet loss. In particular, we focus on the Spanish language, which has not been extensively studied. We propose extracting the embeddings from speech spectrogram samples, explore several configurations of such spectrograms, and finally quantify the quality of the embeddings. We also show some limitations of our data setting, which is predominantly composed of male speakers. Finally, we propose two approaches to calculate the Likelihood Ratio given our speech embeddings, and we show that triplet loss is a good alternative for creating speech embeddings for forensic speaker identification.

Keywords: Triplet Loss • Speaker Identification • Forensic • Spanish

Recently, triplet loss has been proposed for speech tasks such as speaker verification [6], speaker turn [7], and speech emotion classification [8], among others. However, its applicability to forensic speaker identification has not been explored, particularly for the Spanish language.

Forensic Speaker Identification (FSI) focuses on gathering and quantifying the evidence that will be presented in court. FSI addresses the question of whether a specific recording contains speech produced by a specific person [9,10]. The most basic scenario in FSI consists of two speech recordings: a reference sample and a questioned sample. For the reference sample, we always know the identity of