Traditionally, research has focused on extracting useful features from the massive amounts of data generated daily. However, given the legal constraints on personal data protection and the risks of data bias and manipulation, artificial intelligence that relies less on big data and more on reasoning ability has become an emerging trend. This paper demonstrates how to estimate age and gender from names alone. The proposed two-layer comparative model was trained on Taiwanese names, and its generalizability was further examined on bilingual and cross-border names. By considering additional features of the contextual environment, the model achieves high accuracy in age and gender prediction on Taiwanese and bilingual names. However, the prediction results for ethnic-Chinese Malaysian names (in English) do not reach the same level. This is due to linguistic differences among Chinese dialects: features learned from Taiwanese names cannot be directly applied to English names in Malaysia. This study illustrates a path for accomplishing prediction tasks with minimal data and points to possibilities for further research.