Abstract: While a physical environment interacts with a human individual through the brain's sensors and effectors, internal representations inside the skull-closed brain emerge and adapt autonomously throughout the lifetime. By "skull-closed", we mean that the brain inside the skull is off limits to all teachers in the external physical environment, except through the brain's sensory ends and motor ends. We present the Where-What Network 6 (WWN-6), which has realized our goal of fully autonomous development inside a closed network "skull". This means that the human programmer is not allowed to handcraft the internal representation for any fixed extra-body concept. For example, the meanings of specific values of the location or type concepts are not known at programming time. Such meanings emerge through associations embedded in "postnatal" experience. This capability is especially challenging because most elements at the sensory ends are irrelevant to the signals at the effector ends (e.g., many background pixels). How does each vector value at the effectors find its corresponding pattern in the correct patch of the sensory image? We outline this autonomous learning theory for the brain and present how the developmental program (DP) of WWN-6 enables the network to attend to and recognize objects in complex backgrounds using natural video. The inputs to the agent (i.e., the network) are not artificially synthesized images, as used by earlier WWNs, but are drawn from continuous video taken in natural settings where, in general, everything is moving.