Eating is central to human existence and is shaped by many factors, including nutrition, health, personal taste, cultural background, and flavor preferences. Devising personalized meal plans that account for all of these dimensions is difficult. A key shortcoming of many existing meal-planning systems is poor user adherence, which often stems from a mismatch between the plan and the user’s lifestyle, preferences, or unobserved eating patterns. We introduce CFRL, an algorithm that combines reinforcement learning (RL) with collaborative filtering (CF). CFRL addresses nutritional and health considerations while also uncovering and dynamically adapting to latent user eating habits, thereby improving user acceptance and adherence. It formulates interactive meal recommendation as a Markov decision process (MDP) and incorporates a CF-based MDP framework that maps broader user preferences into a shared latent vector space. Central to CFRL is a reward-shaping mechanism grounded in multi-criteria decision-making that combines user ratings, preferences, and nutritional data, yielding flexible, user-specific meal plans. In a comparative analysis against four baseline methods, CFRL performs better on key metrics such as user satisfaction and nutritional adequacy. These results indicate that combining RL and CF is effective for personalized meal planning and improves on traditional approaches.
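To make the multi-criteria reward idea concrete, the following is a minimal sketch, not the CFRL implementation. It assumes the three signals named in the abstract (user rating, preference match, nutritional score) have each been scaled to [0, 1] and are combined by a normalized weighted sum. The names `MealFeedback` and `shaped_reward`, the default weights, and the weighted-sum form itself are illustrative assumptions; the abstract does not specify how the criteria are aggregated.

```python
from dataclasses import dataclass


@dataclass
class MealFeedback:
    """Hypothetical per-meal signals, each assumed to be scaled to [0, 1]."""
    rating: float            # explicit user rating of the recommended meal
    preference_match: float  # agreement with the user's stated preferences
    nutrition_score: float   # how well the meal meets nutritional targets


def shaped_reward(fb: MealFeedback, weights=(0.4, 0.3, 0.3)) -> float:
    """Combine the three criteria into one scalar reward.

    The weighted sum and the default weights are assumptions made for
    illustration; they are not taken from the source.
    """
    w_rating, w_pref, w_nutri = weights
    total = w_rating + w_pref + w_nutri
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = (
        w_rating * fb.rating
        + w_pref * fb.preference_match
        + w_nutri * fb.nutrition_score
    )
    # Normalize so the reward stays in [0, 1] for any positive weights.
    return weighted / total


if __name__ == "__main__":
    liked_but_unbalanced = MealFeedback(rating=0.9, preference_match=0.8, nutrition_score=0.3)
    print(shaped_reward(liked_but_unbalanced))
```

In an RL loop, a scalar of this kind would be returned as the reward after each recommended meal; shifting weight toward `nutrition_score` trades user satisfaction against nutritional adequacy.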