Simple example

A convolutional neural net is a feed-forward neural net that has convolutional layers.

A convolutional layer takes as input an M x N matrix (which could, for example, be the redness of each pixel of an input image) and produces as output another M' x N' matrix. Oftentimes convolutional layers are stacked in R parallel channels, and such stacks of convolutional layers are sometimes also called (M x N x R) convolutional layers. For clarity, let's continue with the R = 1 case.

Suppose we want to train a network to recognize features in a 13x13-pixel grayscale image (so the image is a real 13x13 matrix). It is reasonable to make the neurons of the first layer connect only to small contiguous patches of the image, since we expect these neurons to learn to recognize "local features" (like lines or blobs).
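As a concrete sketch of such a layer (the names and the 3x3 kernel size are illustrative assumptions, not fixed by the text above), the following Python snippet slides one small filter over a 13x13 image, so each output value depends only on a 3x3 patch of the input:

    import numpy as np

    def conv2d(image, kernel):
        # Valid 2D convolution: an M x N input and a k x k kernel
        # give an (M - k + 1) x (N - k + 1) output.
        m, n = image.shape
        k = kernel.shape[0]
        out = np.zeros((m - k + 1, n - k + 1))
        for i in range(m - k + 1):
            for j in range(n - k + 1):
                # Each output neuron sees only a small local patch of the input.
                out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
        return out

    image = np.random.rand(13, 13)       # a 13x13 grayscale image
    kernel = np.random.randn(3, 3)       # one (assumed) 3x3 filter to be learned
    print(conv2d(image, kernel).shape)   # (11, 11): here M' = N' = 13 - 3 + 1

With R parallel channels one would simply keep R such kernels and stack the R output matrices.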

Viterbi

Given a sequence $y_1, \dots, y_n$ of observations, emission probabilities $p(x, y)$ of observing $y$ when the hidden state is $x$, and transition probabilities $q(x, x')$ between hidden states, find the most likely path $x_1, \dots, x_n$ of hidden states.
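A minimal Python sketch of the dynamic program (the dictionary-based interface and the initial distribution `start` are assumptions; the text above does not fix how the first hidden state is chosen):

    def viterbi(obs, states, start, trans, emit):
        # start[x]: assumed initial probability of state x
        # trans[x][x2] stands for q(x, x2); emit[x][y] stands for p(x, y)
        V = [{x: start[x] * emit[x][obs[0]] for x in states}]
        back = []
        for y in obs[1:]:
            prev, col, ptr = V[-1], {}, {}
            for x2 in states:
                # Best predecessor of state x2 at this step.
                best = max(states, key=lambda x: prev[x] * trans[x][x2])
                col[x2] = prev[best] * trans[best][x2] * emit[x2][y]
                ptr[x2] = best
            V.append(col)
            back.append(ptr)
        # Trace the most likely path backwards from the best final state.
        path = [max(states, key=lambda x: V[-1][x])]
        for ptr in reversed(back):
            path.append(ptr[path[-1]])
        return path[::-1]

In practice one works with log-probabilities and sums instead of products, to avoid numerical underflow on long observation sequences.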

The algorithm


Let $N$ be a neural network with $e$ edges.

Below, $x_1, x_2, \dots$ will denote vectors in $\mathbb{R}^m$ (where $m$ is the number of inputs of $N$); $y_1, y_2, \dots$ vectors in $\mathbb{R}^n$ (where $n$ is the number of outputs); and $w_0, w_1, \dots$ vectors in $\mathbb{R}^e$. These are called inputs, outputs and weights respectively. The neural network gives a function $f_N(w, x)$ which, given a weight $w$, maps an input $x$ to an output $y$.

We select an error function $E(y, y')$ measuring the difference between two outputs. The standard choice is $E(y, y') = |y - y'|^2$, the square of the Euclidean distance between the vectors $y$ and $y'$.

The backpropagation algorithm takes as input a sequence of training examples $(x_1, y_1), \dots, (x_p, y_p)$ and produces a sequence of weights $w_0, w_1, \dots, w_p$ starting from some initial weight $w_0$, usually chosen to be random. These weights are computed in turn: we compute $w_i$ using only $(x_i, y_i, w_{i-1})$ for $i = 1, \dots, p$. The output of the backpropagation algorithm is then $w_p$, giving us a new function $x \mapsto f_N(w_p, x)$. The computation is the same in each step, so we describe only the case $i = 1$.

Now we describe how to find $w_1$ from $(x_1, y_1, w_0)$. This is done by considering a variable weight $w$ and applying gradient descent to the function $w \mapsto E(f_N(w, x_1), y_1)$ to find a local minimum, starting at $w = w_0$. We then let $w_1$ be the minimizing weight found by gradient descent.
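A minimal Python sketch of this outer loop; the finite-difference gradient here is only for illustration (backpropagation computes the same gradient analytically, as described in the next section), and `f`, the learning rate and the step count are assumptions:

    import numpy as np

    def backprop_outer_loop(f, examples, w0, lr=0.1, steps=100, eps=1e-6):
        w = w0
        for x, y in examples:                  # compute w_i from (x_i, y_i, w_{i-1})
            def E(w_):                         # error as a function of the variable weight
                return np.sum((f(w_, x) - y) ** 2)
            for _ in range(steps):             # gradient descent, starting at the previous weight
                grad = np.array([(E(w + eps * e_k) - E(w)) / eps
                                 for e_k in np.eye(len(w))])
                w = w - lr * grad              # step toward a local minimum of w -> E(f(w, x), y)
        return w                               # w_p, defining the trained function x -> f(w_p, x)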

The algorithm in coordinates

We now compute the gradient of the function $w \mapsto E(f_N(w, x_1), y_1)$ from the previous section, for the special choice $E(y, y') = |y - y'|^2$.

To compute the gradient, it is enough to compute the partial derivatives $\partial E / \partial o_j$, where $o_j$ is the output of node number $j$ in $N$ (whether it is an input, output or hidden neuron). Suppose $o_j = \varphi\left(\sum_k w_{kj} o_k\right)$, where $w_{kj}$ is the weight of the $k$th incoming connection to node $j$, and the other end of that connection has output $o_k$. The choice of $\varphi$ depends on $j$; a common choice is the logistic function $\varphi(t) = \frac{1}{1 + e^{-t}}$ for all $j$.

Then $\frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial o_j} \, \varphi'\!\left(\sum_{k'} w_{k'j} o_{k'}\right) o_k$.

Note that we can apply the same kind of formula again to $\frac{\partial E}{\partial o_k}$ (by the chain rule, it is a sum of such terms over all nodes $j$ that node $k$ feeds into), and that $o_k$ is either a coordinate of the input $x_1$ or the output of another neuron. This process must eventually end if there are no cycles in $N$ (which we assume).
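To make the recursion concrete, here is a worked Python example for a tiny two-layer network with the logistic $\varphi$ and squared error (the layer sizes and names are assumptions for illustration):

    import numpy as np

    def phi(t):                          # the logistic activation from above
        return 1.0 / (1.0 + np.exp(-t))

    x  = np.array([0.5, -0.2])           # input (2 coordinates, assumed)
    W1 = np.random.randn(2, 2)           # weights w_kj into the hidden layer
    W2 = np.random.randn(1, 2)           # weights into the single output node
    y  = np.array([1.0])                 # target output

    # Forward pass: o_j = phi(sum_k w_kj o_k) at each node.
    h = phi(W1 @ x)
    o = phi(W2 @ h)

    # Backward pass, using phi'(t) = phi(t)(1 - phi(t)).
    dE_do  = 2 * (o - y)                 # from E = |o - y|^2
    delta2 = dE_do * o * (1 - o)         # dE/do_j times phi'(net_j) at the output node
    gW2    = np.outer(delta2, h)         # dE/dw_kj = delta2_j * o_k for the output weights
    dE_dh  = W2.T @ delta2               # the same formula applied again, one layer down
    delta1 = dE_dh * h * (1 - h)
    gW1    = np.outer(delta1, x)         # here o_k is a coordinate of the input x_1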


How does Aspirin relieve pain?

Aspirin contains acetylsalicylic acid (ASA)

ASA enters the bloodstream

ASA disables COX
  • By entangling of the molecules (picture)

Prostaglandins are produced by wounded tissue
  • White blood cells produce prostaglandins at wounded cells (= tissue)

COX is required for prostaglandin production

Prostaglandins cause pain