The pursuit of catalyst discovery through machine learning has attracted substantial attention in recent years. The effectiveness of such an approach in identifying suitable catalysts depends heavily on the quality and quantity of the data used to train the machine learning models. In this article, we report our work in curating a water-gas shift reaction database from literature published between 2013 and 2021, focusing on the use of noble metal catalysts for fuel cell applications. Our investigation yielded 8908 individual records retrieved from a total of 82 publications. The database comprises 99 features: 10 base metals, 27 supports, 16 promoters, 32 preparation methods, 13 reaction conditions, and the carbon monoxide conversion percentage. A machine learning approach based on Shapley feature importance is used to evaluate the effects of catalyst composition and reaction conditions on carbon monoxide conversion. In our previous work, we proposed a theory-guided machine learning model designed to obey chemical reaction principles, including the thermodynamic constraint, while predicting the carbon monoxide conversion percentage. This work shows that the theory-guided model outperforms other state-of-the-art machine learning models and opens up promising possibilities for finding suitable catalysts.
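The abstract does not say how the Shapley importances were computed. As an illustration of the idea only, the sketch below computes exact Shapley values by enumerating every feature coalition. Features outside a coalition are set to a single baseline value. The surrogate `toy_co_conversion`, its three features, and its coefficients are hypothetical and are not the authors' model or data. Exact enumeration costs on the order of 2**n model evaluations, so it is infeasible for 99 features. In practice, libraries such as SHAP use model-specific algorithms or sampling approximations instead.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values of model(x) relative to a single baseline input.

    Features outside a coalition take their baseline value. The cost grows as
    2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Features in the coalition come from x; the rest come from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_co_conversion(z: Sequence[float]) -> float:
    """Hypothetical surrogate for CO conversion (%), for illustration only.

    z[0]: scaled temperature, z[1]: scaled noble-metal loading,
    z[2]: scaled steam-to-CO ratio (all invented features).
    """
    return 40.0 * z[0] + 20.0 * z[1] + 10.0 * z[0] * z[2]


x = [1.0, 0.5, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_co_conversion, x, baseline)
print([round(v, 6) for v in phi])  # [45.0, 10.0, 5.0]
```

The attributions sum to `toy_co_conversion(x) - toy_co_conversion(baseline)` (60 here), which is the efficiency property of Shapley values. The interaction term `10 * z[0] * z[2]` is split equally between the temperature and steam-to-CO features.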