In this paper, we present a parallel Image-to-Mesh Conversion (I2M) algorithm whose quality and fidelity guarantees are achieved through dynamic point insertions and removals. Starting directly from an image, it recovers the isosurface and meshes the volume with well-shaped tetrahedra. Our tightly-coupled, shared-memory, speculative-execution parallel paradigm employs carefully designed contention managers, load balancing, synchronization, and optimization schemes that boost parallel efficiency with little overhead: our single-threaded performance is faster than that of CGAL, which is, as far as we are aware, the state-of-the-art sequential mesh generation software. We demonstrate the effectiveness of our method on Blacklight, the Pittsburgh Supercomputing Center's cache-coherent NUMA machine, through a series of case studies that justify our design choices. We observe a strong scaling efficiency of more than 82% on up to 64 cores and a weak scaling efficiency of more than 95% on up to 144 cores, reaching a rate of 14.7 million elements per second. To the best of our knowledge, this is the fastest and most scalable 3D Delaunay refinement algorithm.