Recent theories of mindreading explain how we recognize the actions, intentions, and beliefs of other agents in terms of generative architectures that model the causal relations between observables (e.g., observed movements) and their hidden causes (e.g., action goals and beliefs). Two kinds of probabilistic generative schemes have been proposed in cognitive science and robotics, linked to a "theory theory" and a "simulation theory" of mindreading, respectively. The former compares perceived actions to optimal plans derived from rationality principles and conceptual theories of others' minds. The latter reuses one's own internal (inverse and forward) models for action execution to perform a look-ahead mental simulation of perceived actions. Both theories, however, leave one question unanswered: how are the generative models, including their task structure and parameters, learned in the first place? We start from Dennett's "intentional stance" proposal and characterize it within generative theories of action and intention recognition. We propose that humans use the intentional stance as a learning bias that sidesteps the (hard) structure learning problem and bootstraps the acquisition of generative models of others' actions. The intentional stance corresponds to a candidate structure in the generative scheme, which encodes a simplified belief-desire folk psychology and a hierarchical intention-to-action organization of behavior. This simple structure can be used as a proxy for the "true" generative structure of others' actions and intentions, and it is continuously grown and refined, via state and parameter learning, during interactions. In turn, as our computational simulations show, this approach can help solve mindreading problems and bootstrap the acquisition of useful causal models of both one's own and others' goal-directed actions.
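
To make the idea of a fixed candidate structure with learned parameters concrete, the following is a minimal illustrative sketch, not the model used in the simulations. It assumes a two-level structure (hidden intention, observable actions), actions that are conditionally independent given the intention, and count-based parameter learning. The intention and action labels, the class name `IntentionalStanceModel`, and the use of labeled intentions during learning are all hypothetical choices made for illustration.

```python
# Hypothetical labels, chosen only for illustration.
INTENTIONS = ["drink", "tidy"]          # hidden causes
ACTIONS = ["reach", "grasp", "wipe"]    # observables


class IntentionalStanceModel:
    """Fixed structure (intention -> actions); only the parameters are learned."""

    def __init__(self, pseudo_count=1.0):
        # Uniform prior over intentions (an assumption of this sketch).
        self.prior = {i: 1.0 / len(INTENTIONS) for i in INTENTIONS}
        # Pseudo-counts for P(action | intention), initially uniform.
        self.counts = {i: {a: pseudo_count for a in ACTIONS} for i in INTENTIONS}

    def likelihood(self, intention, action):
        """Estimate P(action | intention) from the current counts."""
        row = self.counts[intention]
        return row[action] / sum(row.values())

    def posterior(self, observed_actions):
        """Invert the generative model: P(intention | observed actions)."""
        scores = {}
        for i in INTENTIONS:
            p = self.prior[i]
            for a in observed_actions:
                p *= self.likelihood(i, a)
            scores[i] = p
        total = sum(scores.values())
        return {i: s / total for i, s in scores.items()}

    def learn(self, observed_actions, intention_weights):
        """Parameter learning: add weighted counts; the structure never changes."""
        for i, w in intention_weights.items():
            for a in observed_actions:
                self.counts[i][a] += w


model = IntentionalStanceModel()
# Two interaction episodes with known intentions (a simplifying assumption).
model.learn(["reach", "grasp"], {"drink": 1.0})
model.learn(["reach", "wipe"], {"tidy": 1.0})
# Mindreading query: which intention best explains an observed grasp?
belief = model.posterior(["grasp"])
```

In this sketch the structure is never searched over, which is the sense in which a fixed candidate structure sidesteps structure learning; only the conditional probabilities are refined as episodes accumulate. When intentions are not observed, one could instead pass the model's own posterior as `intention_weights`, giving a soft, EM-style update, although that variant would need a non-uniform initialization to break the symmetry between intentions.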