Machine learning and quantum computing are two technologies, each with the potential to alter how computation is performed to address previously untenable problems. Kernel methods for machine learning are ubiquitous in pattern recognition, with support vector machines (SVMs) being the best-known method for classification problems. However, such problems become hard to solve when the feature space grows large and the kernel functions become computationally expensive to estimate. A core element of the computational speed-ups afforded by quantum algorithms is the exploitation of an exponentially large quantum state space through controllable entanglement and interference.

Here, we propose and experimentally implement two novel methods on a superconducting processor. Both methods represent the feature space of a classification problem by a quantum state, taking advantage of the large dimensionality of quantum Hilbert space to obtain an enhanced solution. The first method, the quantum variational classifier, builds on [1, 2] and uses a variational quantum circuit to classify a training set in direct analogy to conventional SVMs. In the second, a quantum kernel estimator, we estimate the kernel function and optimize the classifier directly. The two methods present a new class of tools for exploring the applications of noisy intermediate-scale quantum computers [3] to machine learning.

The intersection between machine learning and quantum computing has been dubbed quantum machine learning and has attracted considerable attention in recent years [4-6]. This has led to a number of recently proposed quantum algorithms [1, 2, 7-9]. Here, we present a quantum algorithm that has the potential to run on near-term quantum devices. A natural class of algorithms for such noisy devices are short-depth circuits, which are amenable to error-mitigation techniques that reduce the effect of decoherence [10, 11]. There are convincing arguments that even very simple circuits are hard to simulate by classical computational means [12, 13].

The algorithm we propose takes on the original problem of supervised learning: the construction of a classifier. For this problem, we are given data from a training set T and a test set S, both drawn from a subset Ω ⊂ R^d. Both sets are assumed to be labeled by a map m : T ∪ S → {+1, −1} that is unknown to the algorithm. The training algorithm receives only the labels of the training data T. The goal is to infer an approximate map on the test set, m̃ : S → {+1, −1}, such that it agrees with high probability with the true map, m̃(s) = m(s), on the test data s ∈ S. For such a learning task to be meaningful, it is assumed that there is a correlation between the labels given for training and the true map. A classical approach to constructing an approximate labeling function uses so-called support vector machines (SVMs) [14]. The data is mapped non-linearly into a high-dimensional space, the feature space, where a hyperplane is constructed to separate the labeled samples. ...
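For concreteness, the classical setting just described can be sketched in a few lines. The snippet below is a minimal illustration, assuming scikit-learn and a toy data set with a labeling rule invented for this example (neither is from the paper): a kernelized SVM is fit on the labeled training set T, and the inferred map m̃ is evaluated on the held-out test set S.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy data in a subset of R^2, labeled in {+1, -1} by a rule that
# stands in for the unknown map m (hypothetical choices).
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)

X_train, y_train = X[:150], y[:150]   # training set T: labels revealed
X_test, y_test = X[150:], y[150:]     # test set S: labels withheld

# The RBF kernel implicitly maps the data non-linearly into a
# high-dimensional feature space, where the SVM constructs a
# separating hyperplane.
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```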
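The quantum kernel estimator replaces the classical kernel with the overlap of quantum feature states, K(x, y) = |⟨φ(x)|φ(y)⟩|². The sketch below illustrates this idea under strong simplifications: `feature_state` and `quantum_kernel` are hypothetical helpers, the single-rotation-per-coordinate feature map is our assumption rather than the entangling circuit family used in the experiment, and the overlaps are computed exactly by state-vector simulation, whereas on hardware they would be estimated from measurement statistics.

```python
import numpy as np
from sklearn.svm import SVC

def feature_state(x):
    """Toy feature map: one qubit per coordinate, |phi(x)> = tensor_i Ry(x_i)|0>."""
    state = np.array([1.0 + 0j])
    for xi in x:
        qubit = np.array([np.cos(xi / 2), np.sin(xi / 2)], dtype=complex)
        state = np.kron(state, qubit)
    return state

def quantum_kernel(A, B):
    """Kernel matrix K[i, j] = |<phi(a_i)|phi(b_j)>|^2."""
    SA = np.array([feature_state(a) for a in A])
    SB = np.array([feature_state(b) for b in B])
    return np.abs(SA.conj() @ SB.T) ** 2

rng = np.random.default_rng(1)
X = rng.uniform(0, np.pi, size=(80, 2))
y = np.where(np.sin(X[:, 0]) * np.sin(X[:, 1]) > 0.5, 1, -1)  # toy rule
Xtr, ytr, Xte, yte = X[:60], y[:60], X[60:], y[60:]

# The estimated kernel matrix is handed to a conventional SVM.
clf = SVC(kernel="precomputed").fit(quantum_kernel(Xtr, Xtr), ytr)
print("test accuracy:", clf.score(quantum_kernel(Xte, Xtr), yte))
```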
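The variational classifier instead trains circuit parameters directly. The single-qubit sketch below illustrates only the structure we assume from the description above (encode a datum, apply a parameterized unitary W(θ), measure, adjust θ to match the training labels); the circuit family, cost function, and optimizer of the actual method differ.

```python
import numpy as np

def ry(theta):
    """Single-qubit Y rotation."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def predict(x, theta):
    # Encode scalar datum x, apply the variational layer W(theta),
    # and classify by the sign of the Z expectation value.
    state = ry(theta) @ ry(x) @ np.array([1.0, 0.0])
    z_expect = state[0] ** 2 - state[1] ** 2
    return 1 if z_expect >= 0 else -1

rng = np.random.default_rng(2)
x = rng.uniform(0, 2 * np.pi, size=100)
labels = np.where(np.cos(x - 1.0) >= 0, 1, -1)   # hypothetical target rule

# Crude training loop: scan theta and keep the value with the best
# empirical accuracy on the training portion of the data.
thetas = np.linspace(0, 2 * np.pi, 200)
acc = [np.mean([predict(xi, t) == li for xi, li in zip(x[:70], labels[:70])])
       for t in thetas]
best = thetas[int(np.argmax(acc))]
print("test accuracy:",
      np.mean([predict(xi, best) == li for xi, li in zip(x[70:], labels[70:])]))
```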