In TV series and movies, faces are typically named by aligning subtitles with the script to obtain the time stamps of the names, which are then tagged to the faces. We study the problem of face naming in videos when subtitles are not available. To this end, we divide the problem into two tasks: face clustering, which groups the faces depicting a certain person into a cluster, and name assignment, which associates a name with each face. Each task is formulated as a structured prediction problem and modeled by a hidden conditional random field (HCRF). We argue that the two tasks are correlated, so the output of each provides prior knowledge about the target prediction of the other. The two HCRFs are coupled in a unified graphical model, called the coupled HCRF, in which the joint dependence between cluster labels and face-name associations is naturally embedded in the correlation between the two HCRFs. We provide an effective algorithm that optimizes the two HCRFs iteratively, and the performance of both tasks on a real-world data set can be improved.