Abstract. Text mining research is becoming an important topic in biology with the aim to extract biological entities from scientific papers in order to extend the biological knowledge. However, few thorough studies on text mining and applications are developed for plant molecular biology data, especially rice, thus resulting a lack of datasets available to train models able to detect entities such as genes, proteins and phenotypic traits. Since there is rare benchmarks for rice, we have to face various difficulties in exploiting advanced machine learning methods for accurate analysis of rice bibliography. In this article, we developed a new training datasets (Oryzabase) as the benchmark. Then, we evaluated the performance of several current approaches to find a methodology with the best results and assigned it as the state of the art method for our own technique in the future. We applied Name Entities Recognition (NER) tagger, which is built from a Long Short Term Memory (LSTM) model, and combined with Conditional Random Fields (CRFs) to extract information of rice genes and proteins. We analyzed the performance of LSTM-CRF when applying to the Oryzabase dataset and improved the results up to 86% in F1. We found that on average, the result from LSTM-CRF is more exploitable with the new benchmark.