With the increasing amount of strong-motion data, Ground Motion Prediction Equation (GMPE) developers are able to quantify empirical site amplification functions (δS2S) from GMPE residuals, for use in site-specific Probabilistic Seismic Hazard Assessment. In this study, we first derive a GMPE for 5% damped Pseudo Spectral Acceleration (g) of Active Shallow Crustal earthquakes in Japan with 3.4 ≤ Mw ≤ 7.3 and 0 ≤ RJB < 600 km. Using a k-means spectral clustering technique, we then classify our estimated δS2S (T = 0.01-2 s) of 588 well-characterized sites into 8 site clusters with distinct mean site amplification functions and a within-cluster site-to-site variability about 50% smaller than the overall dataset variability (φS2S). Following an evaluation of existing schemes, we propose a revised data-driven site classification characterized by kernel density distributions of Vs30, Vs10, H800, and predominant period (TG) of the site clusters.
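
As a rough illustration of the clustering step, the sketch below groups synthetic site amplification curves and compares the within-cluster site-to-site variability with the overall variability. It is not the procedure used in this study: it substitutes plain Lloyd's k-means for the spectral clustering technique, and the curves, the three cluster shapes, the noise level, and the helper names (`kmeans`, `rms_about_means`) are all assumptions made only for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (stand-in for spectral clustering). Returns labels."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance of every curve to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

def rms_about_means(X, labels):
    """Period-averaged RMS deviation of curves about their own group mean."""
    resid = np.empty_like(X)
    for j in np.unique(labels):
        mask = labels == j
        resid[mask] = X[mask] - X[mask].mean(axis=0)
    return np.sqrt((resid ** 2).mean(axis=0)).mean()

# Synthetic data (assumption): 3 groups of 60 sites, each group with an
# amplification peak at a different period, plus random site-to-site scatter.
rng = np.random.default_rng(42)
periods = np.logspace(np.log10(0.01), np.log10(2.0), 20)  # T = 0.01-2 s
peak_periods = [0.05, 0.3, 1.0]                           # hypothetical
curves = []
for tp in peak_periods:
    mean_curve = 0.8 * np.exp(-0.5 * (np.log(periods / tp) / 0.6) ** 2) - 0.3
    curves.append(mean_curve + rng.normal(0.0, 0.15, size=(60, periods.size)))
X = np.vstack(curves)

labels = kmeans(X, k=3)
phi_overall = rms_about_means(X, np.zeros(len(X), dtype=int))  # one group
phi_within = rms_about_means(X, labels)                        # per cluster
print(f"overall: {phi_overall:.3f}, within-cluster: {phi_within:.3f}")
```

Because each cluster mean absorbs part of the scatter, the within-cluster value can never exceed the overall one; how large the reduction is depends entirely on the data and on the number of clusters chosen.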