Well-labeled natural language corpus data is essential for most natural language processing techniques, especially in specialized fields. However, cohort bias remains a significant challenge in machine learning: drawing data samples or human annotators from a narrow cohort can induce bias in the final product. During the development of the CryptoLin corpus for another research project, the authors became concerned about the potential influence of cohort bias on the selection of annotators. This paper therefore addresses the question of whether cohort diversity improves labeling results, using a repeated annotation process with two annotator pools and a statistically robust comparison methodology. Statistical tests, such as the Chi-Square test of independence on absolute frequency tables and confidence intervals for Kappa point estimates, support a rigorous analysis of the differences between Kappa estimates. In addition, a two-proportion z-test is applied to compare the accuracy scores of the UTAD and IE annotators for several pre-trained models: Vader Sentiment Analysis, TextBlob Sentiment Analysis, the Flair NLP library, and FinBERT (Financial Sentiment Analysis with BERT). The paper uses Cryptocurrency Linguo (CryptoLin), a corpus containing 2683 cryptocurrency-related news articles spanning more than three years, and compares two different selection criteria for the annotators. CryptoLin was annotated twice with discrete values representing negative, neutral, and positive news, respectively. The first annotation was done by twenty-seven annotators from the same cohort. Each news title was randomly assigned to and blindly annotated by three human annotators. The second annotation was carried out by eighty-three annotators from three cohorts.
Each news title was randomly assigned to and blindly annotated by three human annotators, one from each cohort. In both annotations, a consensus mechanism based on simple voting was applied. The first annotation used a single cohort of students with the same nationality and background. The second used three cohorts of students with a highly diverse set of nationalities and educational backgrounds. The results demonstrate that the manual labeling done by both groups was acceptable according to the inter-rater reliability coefficients Fleiss's Kappa, Krippendorff's Alpha, and Gwet's AC1. Preliminary analysis using Vader, TextBlob, Flair, and FinBERT confirmed the utility of the data set labeling for further refinement of sentiment analysis algorithms. Our results also highlight that the more diverse annotator pool performed better in all measured aspects.
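
The voting and comparison steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the -1/0/1 label encoding, and the choice to return `None` when no label wins outright are assumptions made for illustration, since the text does not state how ties were resolved. The sketch uses only the standard library and shows simple majority voting, Fleiss's Kappa, and a pooled two-proportion z-test.

```python
from collections import Counter
from math import erfc, sqrt

# Assumed encoding: -1 = negative, 0 = neutral, 1 = positive.
LABELS = (-1, 0, 1)


def majority_vote(votes):
    """Return the label with the most votes, or None when the top labels tie."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # assumption: a tie is left unresolved
    return ranked[0][0]


def fleiss_kappa(ratings, labels=LABELS):
    """Fleiss's Kappa for items that were each rated by the same number of raters.

    ratings: list of per-item lists of labels, e.g. [[1, 1, 0], [0, 0, 0]].
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [[item.count(label) for label in labels] for item in ratings]

    # Observed agreement: mean share of agreeing rater pairs per item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each label.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(labels))
    ]
    p_expected = sum(s * s for s in shares)

    return (p_observed - p_expected) / (1 - p_expected)


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-sided z-test for the difference between two proportions."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

In use, `majority_vote` would be applied to the three labels collected for each news title, `fleiss_kappa` to the full set of label triples from one annotation round, and `two_proportion_z_test` to the counts of correct model predictions measured against each round's consensus labels.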