This paper discusses the assumptions of the Multi-Layer Transcription Model (hereinafter: MLTM). The solution presented is an advanced grapheme-to-phoneme (G2P) conversion method that can be implemented in technical applications such as automatic speech recognition and speech synthesis systems. The features of MLTM also facilitate the use of text-to-transcription conversion in linguistic research. The model provides the basis for multi-step processing in which the orthographic representation of a word is transcribed gradually. The consecutive stages of the procedure include, among other things, identification of multi-character phonemes, changes in voicing status, and consonant cluster simplification. The multi-layer model makes it possible to assign individual phonetic processes (for example, assimilation), as well as other types of transformation, to particular layers. As a result, the set of rules becomes more transparent. Moreover, the rules related to any one process can be modified independently of the rules for other transformations, provided that the latter have been assigned to a different layer. These properties give solutions based on the model two crucial advantages: flexibility and transparency. The model makes no assumptions about the number of layers, their functions, or the number of rules defined in each layer. A special mechanism used in the implementation of the MLTM concept enables the projection of individual characters onto either a phonemic or a phonetic transcript (obtained once processing in the final layer of the MLTM-based system has been completed). The solution presented here has been implemented for Polish; however, the same model could also be used for other languages.
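As an illustration only, the layered idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the MLTM implementation: the names `Segment`, `merge_digraphs`, `devoice_final`, `transcribe`, and `project` are hypothetical, and the two toy layers cover only a handful of Polish spellings. Each segment keeps the indices of the orthographic characters it came from, which is one possible way to project a character onto the final transcript.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    symbol: str                 # current transcription symbol
    sources: Tuple[int, ...]    # indices of the orthographic characters behind it


Layer = Callable[[List[Segment]], List[Segment]]

# Toy rule sets (assumptions for illustration, far from complete).
DIGRAPHS = {("c", "h"): "x", ("s", "z"): "ʂ", ("c", "z"): "tʂ"}
DEVOICE = {"b": "p", "d": "t", "g": "k", "z": "s"}


def merge_digraphs(segments: List[Segment]) -> List[Segment]:
    """Layer 1: replace two-character spellings with a single symbol."""
    out, i = [], 0
    while i < len(segments):
        pair = tuple(s.symbol for s in segments[i:i + 2])
        if pair in DIGRAPHS:
            merged = segments[i].sources + segments[i + 1].sources
            out.append(Segment(DIGRAPHS[pair], merged))
            i += 2
        else:
            out.append(segments[i])
            i += 1
    return out


def devoice_final(segments: List[Segment]) -> List[Segment]:
    """Layer 2: devoice a word-final voiced obstruent."""
    if segments and segments[-1].symbol in DEVOICE:
        last = segments[-1]
        return segments[:-1] + [Segment(DEVOICE[last.symbol], last.sources)]
    return segments


def transcribe(word: str, layers: List[Layer]) -> List[Segment]:
    """Run the word through each layer in order, keeping source indices."""
    segments = [Segment(ch, (i,)) for i, ch in enumerate(word)]
    for layer in layers:
        segments = layer(segments)
    return segments


def project(segments: List[Segment], char_index: int) -> Optional[str]:
    """Return the output symbol that the given input character maps onto."""
    for seg in segments:
        if char_index in seg.sources:
            return seg.symbol
    return None


segments = transcribe("chleb", [merge_digraphs, devoice_final])
print("".join(s.symbol for s in segments))  # xlep
print(project(segments, 1))                 # x (the letter "h" maps onto "x")
```

Because each layer is an independent function, the devoicing rules in this sketch can be changed or removed without touching the digraph rules, which mirrors the independence between layers described above.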