Multiclass perceptron

Like most other techniques for training linear classifiers, the perceptron generalizes naturally to multiclass classification. Here, the input x and the output y are drawn from arbitrary sets. A feature representation function f(x, y) maps each possible input/output pair to a finite-dimensional real-valued feature vector. As before, the feature vector is multiplied by a weight vector w, but now the resulting score is used to choose among many possible outputs:

\hat{y} = \operatorname{argmax}_y \, f(x, y) \cdot w
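As a concrete illustration, here is a minimal Python sketch of the prediction step. The names feature_fn (a user-supplied feature function returning a NumPy vector) and labels (a finite list of candidate outputs) are assumptions for this example, not part of the original formulation:

    import numpy as np

    def predict(weights, x, labels, feature_fn):
        # Hypothetical feature_fn(x, y): returns a fixed-length NumPy vector.
        # Score every candidate output and return the highest-scoring one.
        scores = [np.dot(feature_fn(x, y), weights) for y in labels]
        return labels[int(np.argmax(scores))]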

Learning again iterates over the examples, predicting an output for each, leaving the weights unchanged when the predicted output matches the target, and changing them when it does not. The update becomes:

w_{t+1} = w_t + f(x, y) - f(x, \hat{y})
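A corresponding training loop, under the same assumptions as the prediction sketch above (a finite label set, the hypothetical feature_fn, and an epochs parameter chosen here purely for illustration), might look like:

    def train(examples, labels, feature_fn, dim, epochs=10):
        # examples: list of (x, y) pairs; dim: length of the feature vectors.
        w = np.zeros(dim)
        for _ in range(epochs):
            for x, y in examples:
                y_hat = predict(w, x, labels, feature_fn)
                if y_hat != y:
                    # Move the weights toward the true output's features
                    # and away from the wrongly predicted output's features.
                    w += feature_fn(x, y) - feature_fn(x, y_hat)
        return w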

This multiclass formulation reduces to the original perceptron when x is a real-valued vector, y is chosen from {0, 1}, and f(x, y) = yx.
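To see the reduction explicitly, substituting f(x, y) = yx into the update gives

    w_{t+1} = w_t + f(x, y) - f(x, \hat{y}) = w_t + (y - \hat{y}) \, x

which adds x when a positive example is misclassified (y = 1, ŷ = 0) and subtracts x when a negative example is misclassified (y = 0, ŷ = 1): exactly the classic perceptron update.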

For certain problems, input/output representations and features can be chosen so that argmax_y f(x, y) · w can be found efficiently even though y is chosen from a very large or even infinite set. For example, when y ranges over tag sequences and the features decompose over adjacent tags, the argmax can be computed with dynamic programming (the Viterbi algorithm) rather than by brute-force enumeration.

In recent years, perceptron training has become popular in the field of natural language processing for such tasks as part-of-speech tagging and syntactic parsing (Collins, 2002).

  • Collins, M. (2002). Discriminative training methods for hidden Markov models: Theory and experiments with the perceptron algorithm. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '02).