Kernel trick
In machine learning, the kernel trick is a method for using a linear classifier algorithm to solve a non-linear problem by mapping the original non-linear observations into a higher-dimensional space, where the linear classifier is subsequently used; this makes a linear classification in the new space equivalent to non-linear classification in the original space.
This is done using Mercer's theorem, which states that any continuous, symmetric, positive semi-definite kernel function K(x, y) can be expressed as a dot product in a high-dimensional space.
More specifically, if the arguments to the kernel are in a measurable space X, and if the kernel is positive semi-definite — i.e.
- <math>\sum_{i,j} K(x_i,x_j) c_i c_j \ge 0</math>
for any finite subset {x1, ..., xn} of X and subset {c1, ..., cn} of objects (typically real numbers) — then there exists a function φ(x) whose range is in an inner product space of possibly high dimension, such that
- <math>K(x,y) = \varphi(x)\cdot\varphi(y).</math>
The kernel trick transforms any algorithm that solely depends on the dot product between two vectors. Wherever a dot product is used, it is replaced with the kernel function. Thus, a linear algorithm can easily be transformed into a non-linear algorithm. This non-linear algorithm is equivalent to the linear algorithm operating in the range space of φ. However, because kernels are used, the φ function is never explicitly computed. This is desirable, because the high-dimensional space may be infinite-dimensional (as is the case when the kernel is a Gaussian).
The kernel trick was first published by Aizerman et al.[1]
It has been applied to several kinds of algorithm in machine learning and statistics, including:
- Perceptrons
- Support vector machines
- Principal components analysis
- Canonical correlation analysis
- Fisher's linear discriminant analysis
- Clustering
The origin of the term kernel trick is not known.
References
- ↑ M. Aizerman, E. Braverman, and L. Rozonoer (1964). "Theoretical foundations of the potential function method in pattern recognition learning". Automation and Remote Control. 25: 821–837.
See also
- Kernel methods
- Integral transforms
- Hilbert space, specifically reproducing kernel Hilbert space
- Mercer kernel
External links
- Carlos C. Rodríguez: The Kernel Trick (pdf)