With the proliferation of ultra-high-speed mobile networks and internet-connected devices, along with the rise of artificial intelligence, the world is generating exponentially increasing amounts of data, data that needs to be processed in a fast, efficient and 'smart' way. These developments are pushing the limits of existing computing paradigms, and highly parallelized, fast and scalable hardware concepts are becoming progressively more important. Here, we demonstrate a computationally specific integrated photonic tensor core, the optical analog of an ASIC, capable of operating at tera-multiply-accumulate per second (TMAC/s) speeds. The photonic core achieves parallelized photonic in-memory computing using phase-change memory arrays and photonic chip-based optical frequency combs (soliton microcombs). The computation is reduced to measuring the optical transmission of reconfigurable and non-resonant, i.e. broadband, passive components operating at a bandwidth exceeding 14 GHz, limited only by the speed of the modulators and photodetectors. Given recent advances in the hybrid integration of soliton microcombs at microwave line rates, ultra-low-loss silicon nitride waveguides, and high-speed on-chip detectors and modulators, our approach provides a path towards full CMOS wafer-scale integration of the photonic tensor core. While we focus on convolution processing, more generally our results indicate the major potential of integrated photonics for parallel, fast, efficient and wafer-scale manufacturable computational hardware in demanding AI applications such as autonomous driving, live video processing and next-generation cloud computing services.

The increased demand for machine learning on very large datasets [1] and the growing offering of artificial intelligence services on the cloud [2-4] have driven a resurgence in custom hardware designed to accelerate multiply-and-accumulate (MAC) computations, the fundamental mathematical element needed for matrix-vector multiplication (MVM) operations. Whilst various kinds of custom silicon computing hardware (i.e. FPGAs [5], ASICs [6] and GPUs [7]) have been developed to improve computational throughput and efficiency, they still depend on the same underlying electrical components, which are fundamentally limited in both speed and energy by Joule heating, RF crosstalk and capacitance [8]. The last of these (capacitance) dominates energy consumption and limits the maximum operating speeds in neural-network hardware accelerators [9], since the movement of data (e.g. trained network weights), rather than arithmetic operations, requires the charging and discharging of chip-level metal interconnects. Thus, improving the efficiency of logic gates at the device level yields diminishing returns in such applications if the flow of data during computation is not simultaneously addressed [10]. Even recent developments in the use of memristive crossbar arrays [11-13] to compute in the analog domain, whilst promising, do not have the potential for parallelizing the MVM operations (save for physically replicating the arrays).
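For concreteness, the decomposition underlying these throughput figures is the textbook identity below (stated here only to fix notation, not a result of this work):

```latex
% An n-by-m matrix-vector product decomposes into n*m multiply-accumulate
% operations: each term w_{ij} x_j folded into a running sum is one MAC.
\[
  y_i \;=\; \sum_{j=1}^{m} w_{ij}\, x_j , \qquad i = 1, \dots, n .
\]
% Hence one MVM costs n*m MACs, and hardware sustaining 10^{12} of these
% operations per second runs at 1 TMAC/s.
```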
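The operating principle outlined in the abstract can likewise be sketched numerically. The following minimal NumPy model is a simplified illustration under stated assumptions, not the authors' hardware or code; the array sizes and variable names are hypothetical. It treats each stored weight as the optical transmission of a phase-change memory cell, each input as a non-negative optical power, and each comb line as an independent wavelength channel carrying its own input vector:

```python
import numpy as np

# Simplified numerical model of photonic in-memory MVM: each matrix entry
# is the transmission of a phase-change cell, each input is the power of a
# modulated light pulse, and a photodetector summing the transmitted powers
# realizes one multiply-accumulate chain per output.

rng = np.random.default_rng(0)

n_out, n_in = 4, 4       # size of the weight matrix (hypothetical)
n_wavelengths = 8        # comb lines: independent input vectors in parallel

# Phase-change transmission states in [0, 1] act as the stored weights.
T = rng.uniform(0.0, 1.0, size=(n_out, n_in))

# One input vector per wavelength channel (wavelength-division multiplexing),
# encoded as non-negative optical powers.
X = rng.uniform(0.0, 1.0, size=(n_in, n_wavelengths))

# Each detector integrates the transmitted power over its row of cells, for
# every wavelength at once: a batch of MVMs in a single "measurement".
Y = T @ X                # shape (n_out, n_wavelengths)

macs_per_pass = n_out * n_in * n_wavelengths
print(f"{macs_per_pass} MACs per pass; outputs:\n{Y.round(3)}")
```

The point the sketch captures is that the weights stay in place while the data streams through them optically, and adding wavelength channels multiplies the MAC throughput without adding any weight-movement cost.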