“…Therefore, a similar time structure (e.g., the same musical note) presented at the observation layer will drive the top-layer state to similar locations in state space, while differences in the input time structure will push the mean of the top-layer state to different points in that space. This creates invariant representations (clusters) for musical data, as we have illustrated in previous works [17], [18]. The multilayer state model is still linear and can be trained using recursive state estimators, since it is a special case of the system model defined in the Kalman filter [19]; this also keeps training computationally efficient.…”
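As a rough illustration of why a stacked linear model fits the Kalman filter framework, the sketch below concatenates a bottom-layer and a top-layer state into one vector and runs a standard predict/update recursion on it. Everything specific here is an assumption made for illustration and is not taken from the quoted work: the two-dimensional state, the matrices `F`, `H`, `Q`, `R`, and the helper names `kalman_step` and `run_filter` are all hypothetical.

```python
import numpy as np


def kalman_step(x, P, y, F, H, Q, R):
    """One predict/update cycle of a standard linear Kalman filter."""
    # Predict: propagate the state mean and covariance through the linear model.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the new observation.
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new


# Hypothetical two-layer model, stacked as x = [bottom, top]:
#   bottom_t = 0.5 * bottom_{t-1} + 0.5 * top_{t-1} + noise
#   top_t    = top_{t-1} + noise          (slowly varying top layer)
# Only the bottom layer is observed.
F = np.array([[0.5, 0.5],
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)       # assumed process noise covariance
R = np.array([[1e-2]])     # assumed observation noise covariance


def run_filter(observations):
    """Filter a scalar sequence; return the final state mean and covariance."""
    x = np.zeros(2)
    P = np.eye(2)
    for y in observations:
        x, P = kalman_step(x, P, np.array([y]), F, H, Q, R)
    return x, P


if __name__ == "__main__":
    # Two inputs with different structure settle at different top-layer means.
    x_a, _ = run_filter([2.0] * 200)
    x_b, _ = run_filter([-1.0] * 200)
    print("top-layer mean, input A:", x_a[1])
    print("top-layer mean, input B:", x_b[1])
```

Because the layers are stacked into a single state vector, the layer coupling appears only as off-diagonal entries of `F`, so no change to the filter recursion is needed. In this toy setting, inputs that differ drive the top-layer mean to different values, which loosely mirrors the clustering behaviour described in the passage.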