Software engineering is a data-driven discipline and an integral part of data science. The introduction of big data systems has transformed the architecture, methodologies, knowledge domains, and skills related to software engineering. Accordingly, education programs must adapt to these developments by first identifying the competencies that big data software engineering requires, so that they can meet industrial needs and follow the latest trends. This paper aims to reveal the knowledge domains and skill sets required for big data software engineering and to develop a taxonomy by mapping these competencies. A semi-automatic methodology is proposed for the semantic analysis of the textual contents of online job advertisements related to big data software engineering. This methodology uses latent Dirichlet allocation (LDA), a probabilistic topic-modeling technique, to discover the hidden semantic structures in a given textual corpus. The output of this paper is a systematic competency map comprising the essential knowledge domains, skills, and tools for big data software engineering. The findings are expected to help evaluate and improve IT professionals' vocational knowledge and skills, identify professional roles and competencies in the personnel recruitment processes of companies, and meet the skill requirements of the industry through software engineering education programs. Additionally, the proposed model can be extended to blogs, social networks, forums, and other online communities to allow automatic identification of emerging trends and to generate contextual tags.

INDEX TERMS Big data software engineering, competency map, knowledge domains and skill sets, topic modeling, latent Dirichlet allocation.

with the Department of Informatics, Karadeniz Technical University, from 2001 to 2014, where he has been an Instructor with the Center for Research and Application in Distance Education, since 2015.
His research interests include trend analysis, sentiment analysis, statistical topic modeling, engineering education, data mining, machine learning, big data analytics, and text mining.

NERGIZ ERCIL CAGILTAY received the degree in computer engineering and the Ph.D. degree in instructional technologies from Middle East Technical University, Turkey. She worked for commercial and government organizations in Turkey as a Project Manager for more than eight years. She was also with the Indiana University Digital Library Program as a Systems Analyst and Programmer for four years. She has been with the Software Engineering Department, Atilim University, Turkey, since 2003, as an Associate Professor. Her main research interests include information systems, medical information systems, engineering education, instructional systems technologies, distance education, e-learning, and medical education.