Image classification is an important task in the organization and analysis of visual information. A central concept in the literature is the visual word: a visual element that represents a set of visually similar regions. The Bag-of-Visual-Words (BoVW) is one of the most widely used representations in High Level Computer Vision (HLCV). The BoVW is a histogram of the occurrences of visual words in an image, inspired by the Bag-of-Words (BoW) representation used in Natural Language Processing (NLP). Despite its success, BoVW shares the limitations of BoW (e.g., it disregards spatial context). In this research proposal we build on the success of visual words in HLCV and take the visual-textual analogy a step further: by designing methods inspired by NLP, we aim to incorporate contextual (e.g., spatial, sequential) and high-level (e.g., semantic) information among visual words. However, bringing NLP-like approaches to images poses several nontrivial problems, for example: i) the definition of analogous attributes (visual vs. textual); ii) a suitable strategy to interpret images: documents are read in a single direction, whereas images lie on a 2D plane without a specific reading order; iii) the way to extract high-level (e.g., semantic) information. This paper presents the proposed research methodology and, through preliminary results, provides strong evidence of the feasibility of this research. To this end, a popular NLP technique is used to improve the BoVW: the Bag-of-Visual n-grams (BoVN). The idea is evaluated on the challenging task of histopathology image classification, outperforming both BoVW and a state-of-the-art approach based on language models.
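To make the BoVW and visual n-gram ideas concrete, the following is a minimal, illustrative sketch (not the paper's implementation): a naive k-means builds a toy visual vocabulary from 2-D "descriptors", each descriptor is quantized to its nearest visual word, and a histogram (BoVW) and bigram counts (a simplistic BoVN) are computed. All function names, parameters, and the row-major reading-order assumption for n-grams are our own illustrative choices; the reading-order question is precisely one of the open problems noted above.

```python
import random
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_vocabulary(descriptors, k=4, iters=15, seed=0):
    """Naive k-means: cluster local descriptors into k 'visual words'."""
    rng = random.Random(seed)
    centres = rng.sample(descriptors, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[min(range(k), key=lambda i: dist2(d, centres[i]))].append(d)
        for i, c in enumerate(clusters):
            if c:  # keep the old centre if a cluster empties out
                centres[i] = tuple(sum(dim) / len(c) for dim in zip(*c))
    return centres

def quantize(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest visual word."""
    return [min(range(len(vocabulary)), key=lambda i: dist2(d, vocabulary[i]))
            for d in descriptors]

def bovw_histogram(word_ids, vocab_size):
    """BoVW: histogram of visual-word occurrences in one image."""
    counts = Counter(word_ids)
    return [counts.get(i, 0) for i in range(vocab_size)]

def visual_ngrams(word_ids, n=2):
    """Toy visual n-grams: count runs of n consecutive visual words,
    assuming (as a simplification) a fixed, e.g. row-major, reading order."""
    return Counter(tuple(word_ids[i:i + n])
                   for i in range(len(word_ids) - n + 1))

# Toy demo: fake local descriptors drawn from two 2-D blobs.
rng = random.Random(1)
descs = ([(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(30)]
         + [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(30)])
vocab = build_vocabulary(descs, k=4)
ids = quantize(descs, vocab)
hist = bovw_histogram(ids, len(vocab))     # the BoVW image representation
bigrams = visual_ngrams(ids, n=2)          # a simplistic BoVN variant
assert sum(hist) == len(descs)             # every descriptor maps to one word
```

In a real pipeline the toy 2-D points would be replaced by local descriptors (e.g., SIFT) extracted from image patches, and the bigram counts would themselves form the image's feature vector; this sketch only shows the structure of the two representations.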