Introduction to Support Vector Machines

A support vector machine (SVM) is a supervised learning technique from the field of machine learning applicable to both classification and regression.
Introduced in 1995 and rooted in the Statistical Learning Theory developed by Vladimir Vapnik and co-workers at AT&T Bell Laboratories, SVMs are based on the principle of Structural Risk Minimization.

An SVM proceeds in two steps. First, non-linearly map the input space into a very high dimensional feature space; the “kernel trick” allows this mapping to be performed implicitly, through inner products alone.

Then, in the case of classification, construct an optimal separating hyperplane in this space (a maximal margin classifier); or, in the case of regression, perform linear regression in this space, but without penalising small errors.

Sewell (2005)
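
The phrase "without penalising small errors" in the regression case is usually formalised as an epsilon-insensitive loss. The following is a minimal sketch of that idea, not a full regression method; the function name `epsilon_insensitive_loss` and the tube half-width `epsilon=0.1` are assumptions chosen for illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Loss that is zero inside a tube of half-width epsilon around the target.

    epsilon=0.1 is an illustrative default, not a recommended setting.
    """
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Errors no larger than epsilon cost nothing; larger errors grow linearly.
small = epsilon_insensitive_loss(1.0, 1.05)  # |error| = 0.05, inside the tube
large = epsilon_insensitive_loss(1.0, 1.5)   # |error| = 0.5, outside the tube
print(small, large)
```

Only predictions that fall outside the tube contribute to the loss, which is why small errors do not influence the fitted function.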

"The support vector machine (SVM) is a universal constructive learning procedure based on the statistical learning theory (Vapnik, 1995)."
Cherkassky and Mulier (1998)

"The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimensional feature space. In this feature space a linear decision surface is constructed."
Cortes and Vapnik (1995)
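
The non-linear mapping described in this quotation need not be computed explicitly. As a small illustration (a sketch with a hand-picked kernel, not the construction used in the paper), the degree-2 polynomial kernel on 2-D inputs gives the same value as an inner product in an explicit 3-D feature space. The helper names `poly2_kernel`, `explicit_features` and `dot` are hypothetical.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel on 2-D inputs: (x . z) ** 2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def explicit_features(x):
    """Explicit feature map whose inner product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Ordinary inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


x, z = (1.0, 2.0), (3.0, -1.0)
k_implicit = poly2_kernel(x, z)                                # stays in input space
k_explicit = dot(explicit_features(x), explicit_features(z))  # maps to feature space first
print(math.isclose(k_implicit, k_explicit))
```

The two values agree up to floating-point rounding, so a linear method that needs only inner products can work in the feature space while evaluating the kernel on the original inputs.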

"Support Vector Machines (SVM) are learning systems that use a hypothesis space of linear functions in a high dimensional feature space, trained with a learning algorithm from optimisation theory that implements a learning bias derived from statistical learning theory."
Cristianini and Shawe-Taylor (2000)

"These techniques are then generalized to what is known as the support vector machine, which produces nonlinear boundaries by constructing a linear boundary in a large, transformed version of the feature space."
Hastie, Tibshirani and Friedman (2001)

"Support Vector Machines have been developed recently [34]. Originally it was worked out for linear two-class classification with margin, where margin means the minimal distance from the separating hyperplane to the closest data points. SVM learning machine seeks for an optimal separating hyperplane, where the margin is maximal. An important and unique feature of this approach is that the solution is based only on those data points, which are at the margin. These points are called support vectors. The linear SVM can be extended to nonlinear one when first the problem is transformed into a feature space using a set of nonlinear basis functions. In the feature space - which can be very high dimensional - the data points can be separated linearly. An important advantage of the SVM is that it is not necessary to implement this transformation and to determine the separating hyperplane in the possibly very-high dimensional feature space, instead a kernel representation can be used, where the solution is written as a weighted sum of the values of certain kernel function evaluated at the support vectors."
Horváth (2003) in Suykens et al.
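
The kernel representation mentioned at the end of this quotation can be sketched on the XOR toy problem, which is not linearly separable in the input space. The coefficients below (`alphas`, `bias`) are taken as given for this toy problem rather than found by an optimiser, and all names are illustrative assumptions.

```python
def kernel(x, z):
    """Inhomogeneous degree-2 polynomial kernel: (1 + x . z) ** 2."""
    return (1.0 + x[0] * z[0] + x[1] * z[1]) ** 2


# XOR-style toy data: no straight line in the input space separates the classes.
support_vectors = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
labels = [-1.0, 1.0, 1.0, -1.0]
alphas = [0.125, 0.125, 0.125, 0.125]  # assumed given, not computed here
bias = 0.0


def decision(x):
    """Weighted sum of kernel values evaluated at the support vectors."""
    return sum(
        a * y * kernel(sv, x)
        for a, y, sv in zip(alphas, labels, support_vectors)
    ) + bias


def predict(x):
    """Class label from the sign of the decision value."""
    return 1 if decision(x) >= 0 else -1


predictions = [predict(sv) for sv in support_vectors]
print(predictions)
```

The feature-space mapping never appears in the code: classification uses only kernel evaluations between the query point and the support vectors.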

"With their introduction in 1995, Support Vector Machines (SVMs) marked the beginning of a new era in the learning from examples paradigm. Rooted in the Statistical Learning Theory developed by Vladimir Vapnik at AT&T, SVMs quickly gained attention from the pattern recognition community due to a number of theoretical and computational merits. These include, for example, the simple geometrical interpretation of the margin, uniqueness of the solution, statistical robustness of the loss function, modularity of the kernel function, and overfit control through the choice of a single regularization parameter."
Lee and Verri (2002)

"Support Vector machines (SVM) are a new statistical learning technique that can be seen as a new method for training classifiers based on polynomial functions, radial basis functions, neural networks, splines or other functions. Support Vector machines use a hyper-linear separating plane to create a classifier. For problems that can not be linearly separated in the input space, this machine offers a possibility to find a solution by making a non-linear transformation of the original input space into a high dimensional feature space, where an optimal separating hyperplane can be found. Those separating planes are optimal, which means that a maximal margin classifier with respect to the training data set can be obtained."
Rychetsky (2001)
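
To make the notion of a maximal margin concrete, the sketch below computes the margin of two candidate separating hyperplanes on a small, made-up data set. It only compares given hyperplanes and does not search for the optimal one; the function name `margin` and the data are assumptions for illustration.

```python
import math


def margin(w, b, points, labels):
    """Smallest signed distance from the hyperplane w . x + b = 0 to the labelled points.

    A positive result means every point lies on the correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Made-up, linearly separable data: class -1 at x1 = 0, class +1 at x1 = 3.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
labels = [-1, -1, 1, 1]

narrow = margin((1.0, 0.0), -1.0, points, labels)  # hyperplane x1 = 1
wide = margin((1.0, 0.0), -1.5, points, labels)    # hyperplane x1 = 1.5
print(narrow, wide)
```

Both hyperplanes separate the data, but the one midway between the classes has the larger margin, which is the criterion a maximal margin classifier optimises.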

"A learning machine that is based on the principle of Structural Risk Minimization described above is the Support Vector Machine (SVM). The SVM has been developed by Vapnik and co-workers at AT&T Bell Laboratories [9, 115, 116, 19]."
Rychetsky (2001)

"The support vector network implements the following idea [21]: Map the input vectors into a very high dimensional feature space Z through some non-linear mapping chosen a priori. Then construct an optimal separating hyperplane in this space."
Vapnik (2003) in Suykens et al.

"The support vector machine (SVM) is a supervised learning method that generates input-output mapping functions from a set of labeled training data."
Wang (2005)

"Support vector machines (SVMs) are a set of related supervised learning methods, applicable to both classification and regression."
Wikipedia (2004)