Multiclass perceptron

Like most other techniques for training linear classifiers, the perceptron generalizes naturally to multiclass classification. Here, the input x and the output y are drawn from arbitrary sets. A feature representation function f(x, y) maps each possible input/output pair to a finite-dimensional real-valued feature vector. As before, the feature vector is multiplied by a weight vector w, but now the resulting score is used to choose among many possible outputs:

\hat{y} = \operatorname{argmax}_y \, f(x, y) \cdot w
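As a concrete illustration, here is a minimal Python sketch of the prediction step. The names feature_fn (a user-supplied feature function returning a NumPy vector) and labels (a finite list of candidate outputs) are assumptions for this example, not part of the original formulation:

    import numpy as np

    def predict(weights, x, labels, feature_fn):
        # Hypothetical feature_fn(x, y): returns a fixed-length NumPy vector.
        # Score every candidate output and return the highest-scoring one.
        scores = [np.dot(feature_fn(x, y), weights) for y in labels]
        return labels[int(np.argmax(scores))]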

Learning again iterates over the examples, predicting an output for each, leaving the weights unchanged when the predicted output matches the target, and changing them when it does not. The update becomes:

w_{t+1} = w_t + f(x, y) - f(x, \hat{y})
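A corresponding training loop, under the same assumptions as the prediction sketch above (a finite label set, the hypothetical feature_fn, and an epochs parameter chosen here purely for illustration), might look like:

    def train(examples, labels, feature_fn, dim, epochs=10):
        # examples: list of (x, y) pairs; dim: length of the feature vectors.
        w = np.zeros(dim)
        for _ in range(epochs):
            for x, y in examples:
                y_hat = predict(w, x, labels, feature_fn)
                if y_hat != y:
                    # Move the weights toward the true output's features
                    # and away from the wrongly predicted output's features.
                    w += feature_fn(x, y) - feature_fn(x, y_hat)
        return w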

This multiclass formulation reduces to the original perceptron when x is a real-valued vector, y is chosen from {0, 1}, and f(x, y) = yx.
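To see the reduction explicitly, substituting f(x, y) = yx into the update gives

    w_{t+1} = w_t + f(x, y) - f(x, \hat{y}) = w_t + (y - \hat{y}) \, x

which adds x when a positive example is misclassified (y = 1, ŷ = 0) and subtracts x when a negative example is misclassified (y = 0, ŷ = 1): exactly the classic perceptron update.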

For certain problems, input/output representations and features can be chosen so that argmax_y f(x, y) · w can be found efficiently even though y is chosen from a very large or even infinite set. For example, when y ranges over tag sequences and the features decompose over adjacent tags, the argmax can be computed with dynamic programming (the Viterbi algorithm) rather than by brute-force enumeration.

In recent years, perceptron training has become popular in the field of natural language processing for such tasks as part-of-speech tagging and syntactic parsing (Collins, 2002).

  • Collins, M. (2002). Discriminative training methods for hidden Markov models: Theory and experiments with the perceptron algorithm. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '02).