“…In the special case of an RL FNN controller C interacting with a deterministic, predictable environment, a separate FNN called M can learn to become C's world model through system identification, predicting C's inputs from previous actions and inputs (e.g., Werbos, 1981, 1987; Munro, 1987; Jordan, 1988; Werbos, 1989b,a; Robinson and Fallside, 1989; Jordan and Rumelhart, 1990; Schmidhuber, 1990d; Narendra and Parthasarathy, 1990; Werbos, 1992; Gomi and Kawato, 1993; Cochocki and Unbehauen, 1993; Levin and Narendra, 1995; Miller et al., 1995; Ljung, 1998; Prokhorov et al., 2001; Ge et al., 2010). Assume M has learned to produce accurate predictions.…”
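
The system-identification step described in the passage can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the quoted work: the environment `env_step` is a hypothetical deterministic scalar system, C's actions are replaced by random exploration, and a single linear layer stands in for the FNN M so that the example stays short and its result is easy to check. M is trained by stochastic gradient descent on squared error to predict the next input from the previous input and action.

```python
import random


def env_step(x, a):
    """Hypothetical deterministic environment (an assumption for this sketch):
    the next input depends only on the current input x and the action a."""
    return 0.8 * x + 0.5 * a


def collect(n_steps, seed=0):
    """Record (input, action, next input) triples from interaction.
    Random actions stand in for the controller C."""
    rng = random.Random(seed)
    x = 0.0
    data = []
    for _ in range(n_steps):
        a = rng.uniform(-1.0, 1.0)
        x_next = env_step(x, a)
        data.append((x, a, x_next))
        x = x_next
    return data


def train_model(data, lr=0.1, epochs=100):
    """Fit M(x, a) = w_x * x + w_a * a + b by SGD on squared prediction error.
    A linear layer is used here in place of a multi-layer FNN."""
    w_x, w_a, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for x, a, x_next in data:
            err = (w_x * x + w_a * a + b) - x_next  # prediction error
            w_x -= lr * err * x
            w_a -= lr * err * a
            b -= lr * err
    return w_x, w_a, b


def predict(params, x, a):
    """One-step prediction of C's next input by the learned model M."""
    w_x, w_a, b = params
    return w_x * x + w_a * a + b


if __name__ == "__main__":
    params = train_model(collect(200))
    print("learned parameters:", params)
    print("predicted:", predict(params, 1.0, -0.5), "actual:", env_step(1.0, -0.5))
```

Because the hypothetical environment is deterministic and exactly representable by the model, the prediction error can be driven close to zero, which corresponds to the passage's assumption that M has learned to produce accurate predictions. A nonlinear environment would call for a genuine multi-layer M.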