This study investigates how listeners describe tonal variants of Putonghua, the official dialect of China, as a factor in the coevolution of dialects. Three sociophonetic factors (target tone familiarity, tonal variant familiarity, and tonal inventory size) are examined to address theoretical questions about the role of familiarity and dialect experience in sound change. Standard Putonghua tones are manipulated in height and shape to create systematically varying stimuli. Speakers from three Chinese dialect groups (Beijing Mandarin, Shanghai Wu, and Guangzhou Cantonese) rate how well a description of pitch contour and height applies to each stimulus. The three dialects differ in tonal inventory size, and their native speakers differ in familiarity with Putonghua tones or Putonghua tonal variants. The three sociophonetic factors therefore make different predictions about listeners' performance. The experimental results confirm the role of tonal variant familiarity in predicting participants' descriptive decisions on tone height variants. Tonal variant familiarity also combines with tonal inventory size to explain how descriptions are assigned to tone shape variants. This suggests that when variants still follow the phonetic pattern of the Putonghua tonal system, listeners give phonetic patterns the primary role in acoustic decisions but still draw on their dialect experience to make more precise decisions. It also suggests that when variants violate the phonetic features of the target tonal system, listeners may depend on familiarity with the individual variant. The study applies an innovative sociophonetic method: a perception experiment conducted online with a self-paced procedure.
The findings are crucial for examining the relationship between sociophonetic factors and listeners' acoustic decisions, and for understanding the cultural coevolution of cross-dialect tonal variation. They also support the validity of the web-based crowd perception experiment design, which is needed to facilitate research under restricted conditions such as a pandemic.