“…Data labeled with users' gender were used in the reviewed studies to build and evaluate classification models based on features describing the text in tweets (e.g., n-grams, word embeddings, hashtags, URLs) 36,37,44,47,48,49,50,54,58,61,65,66,71,75,83,88,92,100, the users' profile metadata (e.g., user names, bio, followers, users followed) 28,30,31,51,59,64,91,94,101, a combination of profile metadata and tweets 31,33,55,62,72,86,87,89,96, or images 31,59,87,91,95. One study from Japan included the user's geographic information, under the assumption that, culturally, a person of a certain demographic is more likely to frequent specific places 68.…”