Retweet prediction is an important task related to problems such as the analysis of information spreading, the automatic detection of fake news, and social media monitoring. In this study, we explore the possibilities of retweet prediction based on heterogeneous data sources. To classify tweets according to the number of retweets, we combine features extracted from a multilayer network with features extracted from the text. More specifically, we introduce a framework that represents Twitter as a multilayer network. This formalism captures different user actions and complex relationships, as well as other key properties of communication on Twitter. We select a set of local network measures from each layer and construct a set of multilayer network features. In addition, we adopt a BERT-based language model, namely Cro-CoV-cseBERT, to capture the high-level semantics and structure of tweets as a set of text features. We then train six machine learning (ML) algorithms on the retweet prediction task: random forest, multilayer perceptron, light gradient boosting machine, category embedding model, neural oblivious decision ensembles, and attentive interpretable tabular learning model. We compare the performance of all six algorithms in three setups: (i) using only text features, (ii) using only multilayer network features, and (iii) using both sets of features. We evaluate all setups with standard evaluation measures, i.e., precision, recall, F1-score, and accuracy. For this task, we prepare and use an empirical dataset of 199,431 tweets in the Croatian language posted between January 1, 2020 and May 31, 2021. Our results indicate that integrating multilayer network features with text features yields a prediction model that performs better than one using just one set of features.
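The three feature setups and the four evaluation measures can be illustrated with a minimal sketch. It is not the implementation used in the study: the function names `build_setups` and `evaluate`, the toy feature values, and the binary labelling (1 for a tweet in the higher retweet class, 0 otherwise) are assumptions made for illustration only. The sketch concatenates per-tweet text and network feature vectors for setup (iii) and computes precision, recall, F1-score, and accuracy for one class.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def build_setups(
    text_feats: Sequence[Vector], network_feats: Sequence[Vector]
) -> Dict[str, List[List[float]]]:
    """Return the three feature setups: text only, network only, and both.

    Hypothetical helper: each tweet is assumed to have one text feature
    vector and one multilayer network feature vector.
    """
    if len(text_feats) != len(network_feats):
        raise ValueError("expected one text and one network vector per tweet")
    return {
        "text": [list(t) for t in text_feats],
        "network": [list(n) for n in network_feats],
        # Setup (iii): concatenate both feature sets tweet by tweet.
        "text+network": [
            list(t) + list(n) for t, n in zip(text_feats, network_feats)
        ],
    }


def evaluate(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute precision, recall, F1-score and accuracy for one class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": correct / len(pairs),
    }


# Toy data (made up): two tweets, two text features and one network feature.
setups = build_setups([[0.1, 0.3], [0.7, 0.2]], [[4.0], [12.0]])

# Toy labels and predictions (made up) for eight tweets.
scores = evaluate([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0])
print(setups["text+network"])
print(scores)
```

In a full comparison, each of the three feature matrices returned by `build_setups` would be passed to the same classifier, and `evaluate` would be applied to the held-out predictions of each run so that the setups are scored on equal terms.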