The rapid pace of development in machine learning over the last few decades mirrors the advances made in quantum computing. It is natural to ask whether conventional machine learning algorithms can be accelerated using present-day noisy intermediate-scale quantum (NISQ) technology. Training a machine learning model on a classical computer is subject to certain computational limitations, and quantum computation offers a way to overcome some of these limitations and carry out such calculations more efficiently. This study illustrates the working of the quantum support vector machine (SVM) classification model, which promises an exponential speed-up over its classical counterparts. This research applies the quantum SVM model to the classification task of malignant breast cancer diagnosis. It also presents a comparative analysis of different SVM variants with respect to their time complexity and their performance on standard evaluation metrics, namely accuracy, precision, recall, and F1-score, to demonstrate the superiority of quantum SVM over its conventional variants.
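The four evaluation metrics named above are all derived from the same confusion-matrix counts. The sketch below computes them in plain Python for a binary task. It assumes that label 1 encodes "malignant" (the positive class) and 0 encodes "benign". The function names and the toy label vectors are illustrative only and are not taken from the study.

```python
# Minimal sketch: accuracy, precision, recall and F1-score for a binary
# classifier. Assumption (not from the study): 1 = malignant (positive class),
# 0 = benign. The example vectors are hypothetical, not the study's data.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # correctly flagged as malignant
        elif p == positive:
            fp += 1  # benign case flagged as malignant
        elif t == positive:
            fn += 1  # malignant case missed
        else:
            tn += 1  # correctly identified as benign
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return the four metrics; an undefined ratio (zero denominator) yields 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical predictions for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

Here the counts are tp = 3, fp = 2, fn = 1, tn = 2, so accuracy is 5/8, precision 3/5, recall 3/4, and F1 = 2tp / (2tp + fp + fn) = 2/3. In a diagnostic setting, recall (the share of malignant cases caught) is usually weighed more heavily than precision, which is one reason to report all four metrics rather than accuracy alone.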
In recent years, we have witnessed a growing interest in data science not only from academia but especially from companies investing in data science platforms to analyze large amounts of data. In this process, a myriad of data science artifacts, such as datasets and pipeline scripts, are created. Yet, there has so far been no systematic attempt to holistically exploit the knowledge and experience implicitly contained in the specifications of these pipelines, e.g., compatible datasets, cleansing steps, ML algorithms, and parameters. Instead, data scientists still spend a considerable amount of time recovering relevant information and experience through colleagues, trial and error, lengthy exploration, and so on. In this paper, we therefore propose KGLiDS, a scalable system that employs machine learning to extract the semantics of data science pipelines and captures them in a knowledge graph, which can then be exploited to assist data scientists in various ways. This abstraction is key to enabling Linked Data Science because it allows us to share the essence of pipelines between platforms, companies, and institutions without revealing critical internal information, focusing instead on the semantics of what is being processed and how. Our comprehensive evaluation uses thousands of datasets and more than thirteen thousand pipeline scripts extracted from data discovery benchmarks and the Kaggle portal, and shows that KGLiDS significantly outperforms state-of-the-art systems on related tasks such as dataset recommendation and pipeline classification.
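To make the idea of capturing pipeline semantics in a knowledge graph more concrete, the sketch below stores a few pipelines as (subject, predicate, object) triples and answers two questions over them: which algorithms have been applied to a dataset, and which other datasets look related. The predicate names (`readsDataset`, `appliesStep`, `usesAlgorithm`), the sample pipelines, and the overlap-based ranking heuristic are all hypothetical. They are not KGLiDS's actual ontology, data, or recommendation method. The sketch only shows that such triples reference names and semantics rather than raw data.

```python
# Minimal sketch: pipeline semantics as (subject, predicate, object) triples.
# Predicate names, pipelines and the ranking heuristic are hypothetical,
# chosen for illustration; they are not KGLiDS's ontology or algorithm.

triples = {
    ("pipeline:p1", "readsDataset", "dataset:titanic"),
    ("pipeline:p1", "appliesStep", "step:fillna"),
    ("pipeline:p1", "usesAlgorithm", "algo:RandomForestClassifier"),
    ("pipeline:p2", "readsDataset", "dataset:titanic"),
    ("pipeline:p2", "usesAlgorithm", "algo:LogisticRegression"),
    ("pipeline:p3", "readsDataset", "dataset:housing"),
    ("pipeline:p3", "usesAlgorithm", "algo:RandomForestClassifier"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def algorithms_used_on(dataset):
    """Algorithms used by any pipeline that reads the given dataset."""
    algos = set()
    for pipeline in subjects("readsDataset", dataset):
        algos |= objects(pipeline, "usesAlgorithm")
    return algos


def related_datasets(dataset):
    """Other datasets ranked by how many algorithms they share with `dataset`
    (a naive heuristic assumed here); datasets sharing none are dropped."""
    mine = algorithms_used_on(dataset)
    others = {o for _, p, o in triples if p == "readsDataset" and o != dataset}
    ranked = sorted(
        ((len(mine & algorithms_used_on(d)), d) for d in others),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [d for score, d in ranked if score > 0]


print(sorted(algorithms_used_on("dataset:titanic")))
print(related_datasets("dataset:titanic"))
```

A production system would more likely use an RDF store and a query language such as SPARQL instead of Python sets, but the query pattern (follow edges from a dataset to pipelines to algorithms) is the same.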