This paper presents initial results of a research study on optimal scheduling (i.e., job sequencing) in Reentrant Manufacturing Lines (RMLs), motivated by applications in semiconductor manufacturing. In particular, a simple benchmark RML is utilized, and the optimal scheduling policy is analyzed for an infinite-horizon discounted cost problem formulation. The optimality equation and optimality condition are derived, and optimal policy results are obtained for general non-negative one-stage cost functions (in the buffer sizes). Computational experiments are also performed using the Modified Policy Iteration algorithm. Preliminary experiments on the application of a Neuro-Dynamic Programming (NDP) method (namely, Q-learning) to approximate the optimal scheduling policy are then presented for linear and quadratic one-stage cost functions. These experiments show that the Q-learning algorithm gradually approximates the optimal policy as the number of iterations increases and longer simulation lengths are utilized. However, the computational load required by the algorithm increases exponentially with the number of states. Results from this study represent initial, exploratory research into the application of NDP methods to large-scale RML systems. More extensive research on both exact optimality results and efficient NDP schemes is in progress.
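To make the experimental setup concrete, the following is a minimal sketch of tabular Q-learning applied to a discounted-cost scheduling problem on a small two-buffer reentrant line. The line model, the linear one-stage cost, and all parameter values (capacity cap, arrival and service probabilities, discount factor, step size, exploration rate) are illustrative assumptions for exposition, not the paper's exact benchmark RML or tuning.

```python
import random

# --- Illustrative two-buffer reentrant line (hypothetical parameters) ---
B = 10           # buffer capacity cap (truncation for a finite state space)
LAM = 0.3        # arrival probability per slot (assumed)
MU = (0.6, 0.6)  # service-completion probabilities for buffers 1 and 2 (assumed)
GAMMA = 0.95     # discount factor
ALPHA = 0.1      # learning-rate step size
EPS = 0.1        # epsilon-greedy exploration rate

def step(state, action):
    """Simulate one slot: serve the chosen buffer, then admit arrivals.
    Returns (next_state, one-stage cost); the cost is linear in buffer sizes."""
    x1, x2 = state
    if action == 0 and x1 > 0 and random.random() < MU[0]:
        x1, x2 = x1 - 1, min(x2 + 1, B)  # stage-1 job moves on to buffer 2
    elif action == 1 and x2 > 0 and random.random() < MU[1]:
        x2 -= 1                          # stage-2 job leaves the line
    if random.random() < LAM:
        x1 = min(x1 + 1, B)              # new job joins buffer 1
    return (x1, x2), x1 + x2             # linear one-stage cost c(x) = x1 + x2

# Q-factors for every (state, action) pair; action 0/1 = serve buffer 1/2
Q = {((x1, x2), a): 0.0
     for x1 in range(B + 1) for x2 in range(B + 1) for a in (0, 1)}

state = (0, 0)
for _ in range(200_000):
    # epsilon-greedy action selection (min over Q, since cost is minimized)
    if random.random() < EPS:
        a = random.choice((0, 1))
    else:
        a = min((0, 1), key=lambda u: Q[state, u])
    nxt, cost = step(state, a)
    # Standard Q-learning update for discounted cost
    target = cost + GAMMA * min(Q[nxt, 0], Q[nxt, 1])
    Q[state, a] += ALPHA * (target - Q[state, a])
    state = nxt

# Greedy scheduling policy read off from the learned Q-factors
policy = {s: min((0, 1), key=lambda u: Q[s, u]) for s in {k[0] for k in Q}}
```

A sketch of this kind also makes the abstract's scaling remark tangible: the Q-table above already holds one entry per state-action pair of the truncated state space, so enlarging the line (more buffers, larger caps) quickly inflates both the table and the simulation length needed for the iterates to settle.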