Physical Unclonable Functions (PUFs) enable the generation of device-unique, on-chip, and digital identifiers by exploiting the manufacturing process variation. The past decade has seen an extensive effort in PUF design. Yet, most PUF constructions are regarded as stand-alone hardware building blocks. In contrast, we propose PUF constructions that are tightly integrated into the design of a micro-processor. The proposed PUFs are essentially a collection of time-to-digital converters that are integrated into the custom instruction or memory-mapped interface of a processor. Therefore, the processor can issue the PUF challenges and collect the associated responses using instruction executions. This integration enables practical, run-time physical authentication and it allows flexible post-processing mechanisms using software. In this article, we describe the design, implementation, and the performance analysis details of such hardware/software co-designed authentication mechanisms on FPGAs. We propose two variants of the PUF architecture: a synchronous module that requires minimal place and route constraints utilizing the common clock of the SoC, and an asynchronous alternative that is independent of the clock but realized with a controlled placement. We implemented the synchronous architecture on the Altera Cyclone-IV FPGAs and performed a large-scale characterization on 55 boards. The asynchronous design is realized on the Xilinx Virtex-5 FPGAs and tested on 100 boards. Measurements reveal that the proposed solutions can authenticate trillions of devices and provide better performance than the ring oscillator based alternative.[1], tamper-resistant key storage [2], bitstream protection [3], and IP binding [4].