- In 1936, R. A. Fisher suggested the first algorithm for pattern recognition (Fisher 1936).
- Aronszajn (1950) introduced the ‘Theory of Reproducing Kernels’.
- In 1957 Frank Rosenblatt invented a linear classifier called the perceptron (the simplest kind of feedforward neural network); see Rosenblatt (1962).
- Vapnik and Lerner (1963) introduced the Generalized Portrait algorithm (the algorithm implemented by support vector machines is a nonlinear generalization of the Generalized Portrait algorithm).
- Aizerman, Braverman and Rozonoer (1964) introduced the geometrical interpretation of the kernels as inner products in a feature space.
- Vapnik and Chervonenkis (1964) further developed the Generalized Portrait algorithm.
- Cover (1965) discussed large margin hyperplanes in the input space and also sparseness.
- Similar optimisation techniques were used in pattern recognition by Mangasarian (1965).
- The use of slack variables to overcome the problem of noise and nonseparability was introduced by Smith (1968).
- Duda and Hart (1973) discussed large margin hyperplanes in the input space.
- The field of ‘statistical learning theory’ began with Vapnik and Chervonenkis (1974) (in Russian).
- SVMs can be said to have started when statistical learning theory was developed further with Vapnik (1979) (in Russian).
- Wapnik and Tscherwonenkis (1979) wrote a German translation of Vapnik and Chervonenkis’s 1974 book.
- Vapnik (1982) wrote an English translation of his 1979 book.
- See also the PhD thesis by Hassoun (1986) for related early work.
- Several statistical mechanics papers (for example Anlauf and Biehl (1989)) suggested using large margin hyperplanes in the input space.
- Poggio and Girosi (1990) and Wahba (1990) discussed the use of kernels.
- Bennett and Mangasarian (1992) improved upon Smith’s 1968 work on slack variables.
- SVMs close to their current form were first introduced with a paper at the COLT 1992 conference (Boser, Guyon and Vapnik 1992).
- In 1995 the soft margin classifier was introduced by Cortes and Vapnik (1995); in the same year the algorithm was extended to the case of regression by Vapnik (1995) in *The Nature of Statistical Learning Theory*.
- The papers by Bartlett (1998) and Shawe-Taylor *et al.* (1998) gave the first rigorous statistical bound on the generalisation of hard margin SVMs.
- Shawe-Taylor and Cristianini (2000) gave statistical bounds on the generalisation of soft margin algorithms and for the regression case.

- AIZERMAN, M. A., E. M. BRAVERMAN, and L. I. ROZONOER, 1964. Theoretical foundations of the potential function method in pattern recognition learning. *Automation and Remote Control*, **25**, 821–837.
- ANLAUF, J. K., and M. BIEHL, 1989. The adatron: An adaptive perceptron algorithm. *Europhysics Letters*, **10**(7), 687–692.
- ARONSZAJN, N., 1950. Theory of reproducing kernels. *Transactions of the American Mathematical Society*, **68**(3), 337–404.
- BARTLETT, Peter L., 1998. The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network. *IEEE Transactions on Information Theory*, **44**(2), 525–536.
- BENNETT, Kristin P., and O. L. MANGASARIAN, 1992. Robust linear programming discrimination of two linearly inseparable sets. *Optimization Methods and Software*, **1**, 23–34.
- BOSER, Bernhard E., Isabelle M. GUYON, and Vladimir N. VAPNIK, 1992. A training algorithm for optimal margin classifiers. *In*: *COLT ’92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory*. New York, NY, USA: ACM Press, pp. 144–152.
- CORTES, Corinna, and Vladimir VAPNIK, 1995. Support-vector networks. *Machine Learning*, **20**(3), 273–297.
- COVER, Thomas M., 1965. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. *IEEE Transactions on Electronic Computers*, **14**(3), 326–334.
- DUDA, Richard O., and Peter E. HART, 1973. *Pattern Classification and Scene Analysis*. New York: John Wiley & Sons Inc.
- FISHER, R. A., 1936. The use of multiple measurements in taxonomic problems. *Annals of Eugenics*, **7**, 111–132.
- HASSOUN, M. H., 1986. *Optical Threshold Gates and Logical Signal Processing*. PhD thesis, Wayne State University, Detroit, USA.
- MANGASARIAN, O. L., 1965. Linear and nonlinear separation of patterns by linear programming. *Operations Research*, **13**(3), 444–452.
- POGGIO, Tomaso, and Federico GIROSI, 1990. Networks for approximation and learning. *Proceedings of the IEEE*, **78**(9), 1481–1497.
- ROSENBLATT, Frank, 1962. *Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms*. Washington DC: Spartan Books.
- SHAWE-TAYLOR, John, *et al.*, 1998. Structural risk minimization over data-dependent hierarchies. *IEEE Transactions on Information Theory*, **44**(5), 1926–1940.
- SHAWE-TAYLOR, John, and Nello CRISTIANINI, 2000. Margin distribution and soft margin. *In*: Alexander J. SMOLA, *et al.*, eds. *Advances in Large Margin Classifiers*. The MIT Press, pp. 349–358.
- SMITH, F. W., 1968. Pattern classifier design by linear programming. *IEEE Transactions on Computers*, **C-17**(4), 367–372.
- VAPNIK, V., 1979. *Estimation of Dependences Based on Empirical Data* [in Russian]. Moscow: Nauka.
- VAPNIK, Vladimir, 1982. *Estimation of Dependences Based on Empirical Data*. Springer Verlag.
- VAPNIK, V., and A. CHERVONENKIS, 1964. A note on one class of perceptrons. *Automation and Remote Control*, **25**.
- VAPNIK, V., and A. LERNER, 1963. Pattern recognition using generalized portrait method. *Automation and Remote Control*, **24**, 774–780.
- VAPNIK, Vladimir N., 1995. *The Nature of Statistical Learning Theory*. Springer-Verlag New York, Inc.
- VAPNIK, V. N., and A. Ya. CHERVONENKIS, 1974. *Teoriya raspoznavaniya obrazov: Statisticheskie problemy obucheniya*. (Russian) [Theory of pattern recognition: Statistical problems of learning]. Moscow: Nauka.
- WAHBA, Grace, 1990. *Spline Models for Observational Data*. Volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PA, USA: SIAM: Society for Industrial and Applied Mathematics.
- WAPNIK, W. N., and A. J. TSCHERWONENKIS, 1979. *Theorie der Zeichenerkennung*. (German) [Theory of pattern recognition]. Berlin: Akademie-Verlag.