Abstract: Research in Textual Entailment (TE) has been widely conducted, mainly in natural-language-based systems, because TE can provide solutions to semantic problems. Researchers usually focus on improving methods and therefore rely on standard datasets, which are specific to a particular language, primarily English. For low-resource languages, datasets for testing TE systems are very difficult to find. In this paper, we therefore propose a model that extracts data from the web to serve as a dataset for TE systems. With simple modifications, the model can be applied to cross-language domains. Two datasets are created and used to evaluate the model: DS-100-R, which contains facts, and DS-100-W, which contains non-facts. The model produces a set of sentences that are expected to be relevant to the queries. Several algorithms were developed to address problems that arose during the experiments. In the evaluation, the model achieves an accuracy of 79.0% on DS-100-R and 70.0% on DS-100-W, giving an overall accuracy of 74.5%, the mean of the two.