Reference architectures for big data and machine learning include not only interconnected building blocks but also important considerations for scalability, manageability, and usability, among others. Even when leveraging such reference architectures, the automated deployment of distributed toolsets and frameworks on various clouds remains challenging due to the diversity of technologies and protocols. This paper focuses on the widely used Apache Spark cluster with Jupyter as the target framework, and on the cloud-agnostic Occopus orchestrator tool for automating its deployment and maintenance stages. The presented approach has been demonstrated and validated with a new, promising text classification application on the Hungarian academic research infrastructure, the OpenStack-based MTA Cloud. The paper explains the concept and the applied components, and illustrates their usage with real use-case measurements.

KEYWORDS
big data, cloud, machine learning, parallel and distributed execution, reference architectures, text classification

1 INTRODUCTION

Research in different scientific fields (e.g., natural and social sciences) often requires huge computational resources and storage capacity to handle Big Data problems. Traditional sequential data processing algorithms are not sufficient to analyze such large volumes of data. Efficient processing and analysis require new approaches, techniques, and tools. Moreover, cloud infrastructures and services are becoming ever more popular and are now widely used to address the computation and storage requirements of many scientific and commercial Big Data applications. Their widespread usage is a consequence of the dynamic and scalable nature of the services maintained by cloud providers.

However, there are several challenges that a data scientist has to face when planning the use or deployment of any Big Data platform on cloud(s) [1]. The selection of the appropriate cloud provider(s) is always a tiresome process since several factors have to be considered, even when only a generic Infrastructure-as-a-Service (IaaS) provider is required: private (e.g., Agrodat Cloud [2]), federated (e.g., MTA Cloud [3] or the pan-European EGI FedCloud [4]), or public cloud (e.g., Amazon AWS [5]).

The Hungarian Academy of Sciences (MTA) provides free IaaS cloud services (MTA Cloud) for research communities, offering easy-to-use, dynamic infrastructures adapted to actual project requirements. MTA Cloud was established to accelerate research for the scientists of MTA. Nearly 100 projects have been deployed on MTA Cloud since its opening, and more and more projects require Big Data and machine learning applications.

However, the many artificial intelligence (AI) tools available for clouds are very complex, and their proper deployment and configuration