Compiler-in-the-Loop (CiL) architecture exploration is widely accepted as the right approach for the fast development of Application-Specific Instruction-set Processors (ASIPs). In this context, both automatic application-specific Instruction Set Extension (ISE) and compiler-based code generation have received considerable attention. Together, the two techniques enable processor designers to quickly adapt a processor's Instruction Set Architecture (ISA) to the needs of a given set of applications and to provide an appropriate high-level programming model. This manuscript presents a tool flow for the automated identification and utilization of Custom Instructions (CIs) during architecture exploration. By embedding this tool flow in an industry-proven architecture exploration framework, a methodology for simultaneous compiler/architecture co-exploration is derived. The advantage of the presented tool flow lies in its ability to develop a reusable ISA and an appropriate compiler for a set of applications, and thereby to support the design of programmable architectures. In addition, ASIP architecture exploration becomes more efficient because time-consuming application analysis and compiler retargeting are automated. Compiling and simulating several benchmarks for the extended ISAs provides reliable feedback on speedup, code size, and usability of the identified CIs. Furthermore, area results for the extended ISAs are presented in order to compare the obtained speedup with the hardware cost of the new CIs.

Extension of Conference Paper: An earlier version [66] of this paper appeared in the proceedings of the 5th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis. It introduces a code generator named CBurg, which is now used to implement a code-selector engine of the CoSy compiler system from ACE.
Additionally, a methodology for recurrence-aware identification of custom instructions is presented that builds on the data flow graphs of the compiler's intermediate representation. At the same time, this methodology produces a code-generator description that is used to retarget the compiler backend to a new instruction set.