where is a vector of weights and denotes dot product. (We use the dot product as we are computing a weighted sum.) The sign of is used to classify as either a positive or a negative instance. Since the inputs are fed directly to the output via the weights, the perceptron can be considered the simplest kind of feedforward network.
When the training set is linearly separable, there exists a weight vector such that for all , the perceptron can be trained by a simple online learning algorithm in which examples are presented iteratively and corrections to the weight vectors are made each time a mistake occurs (learning by examples). The correction to the weight vector when a mistake occurs is (with learning rate ). Novikoff (1962) proved that this algorithm converges after a finite number of iterations.
However, if the training set is not linearly separable, the above online algorithm will never converge.
The -perceptron further utilised a preprocessing layer of fixed random weights, with thresholded output units. This enabled the perceptron to classify analogue patterns, by projecting them into a binary space. In fact, for a projection space of sufficiently high dimension, patterns can become linearly separable.
As an example, consider the case of having to classify data into two classes. Here is a small such dataset, consisting of two points coming from two Gaussian distributions.
A linear classifier can only separate things with a hyperplane, so it's not possible to perfectly classify all the examples. On the other hand, we may project the data into a large number of dimensions. In this case a random matrix was used to project the data linearly to a 1000-dimensional space; then each resulting data point was transformed through the hyperbolic tangent function. A linear classifier can then separate the data, as shown in the third figure. However the data may still not be completely separable in this space, in which the perceptron algorithm would not converge. In the example shown, stochastic steepest gradient descent was used to adapt the parameters.
It should be kept in mind, however, that the best classifier is not necessarily that which classifies all the training data perfectly. Indeed, if we had the prior constraint that the data come from equi-variant Gaussian distributions, the linear separation in the input space is optimal.
Other training algorithms for linear classifiers are possible: see, e.g., support vector machine and logistic regression.
More recently, interest in the perceptron learning algorithm has increased again after Freund and Schapire (1998) presented a voted formulation of the original algorithm (attaining large margin) and suggested that one can apply the kernel trick to it. The kernel-perceptron not only can handle nonlinearly separable data but can also go beyond vectors and classify instances having a relational representation (e.g. trees, graphs or sequences).
Classification algorithms | Neural networks
Perzeptron | Perceptrón | Perceptron | Perceptron | パーセプトロン | Perceptron | Perceptron | Perceptron | เพอร์เซปตรอน
This article is licensed under the GNU Free Documentation License.
It uses material from the
"Perceptron".
Home Page • arts • business • computers • games • health • hospitals • home • kids & teens • news • physicians • recreation• reference • regional • science • shopping • society • sports • world