Polypeptides with multiple enzyme domains, such as type I polyketide synthases, produce chemically complex compounds that are difficult to produce via conventional chemical synthesis and are often pharmaceutically or otherwise commercially valuable. Engineering polyketide synthases, via domain swapping and/or site directed mutagenesis, in order to generate novel polyketides, has tended to produce either low yields of product or no product at all. The success of such experiments may be limited by our inability to predict the key functional residues and boundaries of protein domains. Computational tools to identify the boundaries and the residues determining the substrate specificity of domains could reduce the trial and error involved in engineering multi-domain proteins. In this study we use statistical coupling analysis to identify networks of co-evolving residues in type I polyketide synthases, thereby predicting domain boundaries. We extend the method to predicting key residues for enzyme substrate specificity. We introduce bootstrapping calculations to test the relationship between sequence length and the number of sequences needed for a robust analysis. Our results show no simple predictor of the number of sequences needed for an analysis, which can be as few as a hundred and as many as a few thousand. We find that polyketide synthases contain multiple networks of co-substituting residues: some are intradomain but most multiple domains. Some networks of coupled residues correlate with specific functions such as the substrate specificity of the acyl transferase domain, the stereo chemistry of the ketoreductase domain, or domain boundaries that are consistent with experimental data. Our extension of the method provides a ranking of the likely importance of these residues to enzyme substrate specificity, allowing us to propose residues for further mutagenesis work. We conclude that analysis of co-evolving networks of residues is likely to be an important tool for re-engineering multi-domain proteins.