In applied mathematics, K-SVD is a dictionary learning algorithm for creating a dictionary for sparse representations, via a singular value decomposition approach. K-SVD is a generalization of the k-means clustering method, and it works by iteratively alternating between sparse coding the input data based on the current dictionary, and updating the atoms in the dictionary to better fit the data.[1][2] K-SVD is widely used in applications such as image processing, audio processing, biology, and document analysis.

Problem description

Sparse representation with fixed dictionaries

Many applications, such as the JPEG 2000 image standard, leverage the fact that many signals (in the case of JPEG 2000, natural images) can be represented sparsely as linear combinations of elements from known dictionaries, such as wavelets, curvelets, or contourlets. Given such a dictionary of K signal atoms, usually represented as the columns of a matrix $D \in \mathbb{R}^{n \times K}$, and a target vector $y \in \mathbb{R}^{n}$, the goal of sparse approximation is to find a sparse coefficient vector $x \in \mathbb{R}^{K}$ that reconstructs y well. This goal is often formulated in two similar ways:

$\min_{x} \|y - Dx\|_2^2 \quad \text{subject to} \quad \|x\|_0 \le T_0$

or

$\min_{x} \|x\|_0 \quad \text{subject to} \quad \|y - Dx\|_2^2 \le \epsilon$

where $\|\cdot\|_2$ indicates the L2 norm and $\|\cdot\|_0$ indicates the L0 norm, which counts the number of nonzero elements in a vector. Here, T0 and ε are fixed tolerances on the sparsity of the coefficient vector and the reconstruction error, respectively. Finding the truly optimal x for either of these problems, though, is NP-hard, and many algorithms exist for approximating the optimal solution. The sparse coding step of the K-SVD algorithm requires any algorithm that solves the first problem for a fixed T0.

Dictionary learning

Though using predefined dictionaries can simplify the computation of sparse representations, a set of signal atoms tailored to a specific application can outperform general-purpose dictionaries in some contexts. Given a set of N training vectors $y_1, y_2, \ldots, y_N$, collected as the columns of a matrix $Y \in \mathbb{R}^{n \times N}$, the goal of dictionary learning is to find a dictionary D that allows for the best sparse representation. As before, this can be achieved by constraining sparsity and minimizing reconstruction error, or vice versa:

$\min_{D,X} \|Y - DX\|_F^2 \quad \text{subject to} \quad \forall i,\ \|x_i\|_0 \le T_0$   (1)

or

$\min_{D,X} \sum_{i=1}^{N} \|x_i\|_0 \quad \text{subject to} \quad \|Y - DX\|_F^2 \le \epsilon$

where $x_i$ is the ith column of the coefficient matrix X (the sparse representation of $y_i$) and $\|\cdot\|_F$ indicates the Frobenius norm.

Relation to vector quantization

Vector quantization can be considered an extreme form of this dictionary learning problem, wherein each yi must be represented by exactly one signal atom; equivalently, in the first equation above, each column xi must have a single nonzero entry, and that entry must be 1. K-means clustering, the inspiration for the K-SVD algorithm, is a method for solving this problem, and as shown below, K-SVD is a direct generalization of k-means: enforcing this additional constraint on X makes K-SVD identical to the k-means algorithm.
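
To make the connection concrete, when each column $x_i$ of X is constrained to a single nonzero entry equal to 1, selecting atom $k(i)$ for training vector $y_i$, the dictionary learning objective separates into per-vector terms (a sketch of the reduction, using the notation above):

$\|Y - DX\|_F^2 = \sum_{i=1}^{N} \|y_i - Dx_i\|_2^2 = \sum_{i=1}^{N} \|y_i - d_{k(i)}\|_2^2 ,$

which is exactly the k-means distortion, with the dictionary atoms $d_k$ playing the role of cluster centroids.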

K-SVD algorithm

The K-SVD algorithm alternates between two steps: finding the best representation X for the input data Y (using any sparse approximation algorithm that can find such a representation for a fixed T0, such as orthogonal matching pursuit (OMP)), and updating the dictionary D to better fit the data. To accelerate convergence, the coefficient matrix is also updated in a limited fashion during the second step.

Sparse coding step

Any algorithm that approximately solves equation (1) can be used in this step. Orthogonal matching pursuit (OMP) is well suited to this task, though other methods, such as basis pursuit and FOCUSS, can also be adapted to satisfy the sparsity constraint.
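
For illustration, here is a minimal MATLAB sketch of OMP for coding a single vector y against a dictionary D (an illustrative sketch only, not a reference implementation; applying it column by column to Y gives a routine of the form used as sparse_code in the example implementation below).

% Greedy OMP sketch: selects up to T0 atoms of D to approximate y.
function x = omp_single(D, y, T0)
    K = size(D, 2);
    x = zeros(K, 1);
    residual = y;
    support = [];
    for t = 1:T0
        [~, idx] = max(abs(D' * residual));    % atom most correlated with the residual
        support = unique([support, idx]);      % grow the support set
        coeffs = D(:, support) \ y;            % least-squares fit on the current support
        residual = y - D(:, support) * coeffs; % update the residual
    end
    x(support) = coeffs;
end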

Dictionary update step

The K-SVD algorithm then turns to the more difficult task of updating the dictionary elements to further reduce the reconstruction error. This is done by considering one column dk of D at a time and using the singular value decomposition (SVD) to find a replacement for it that reduces the overall error as much as possible, while holding the other columns constant. In contrast to the k-means algorithm, wherein the coefficients remain fixed while the dictionary is updated, the kth row of X (denoted x(k)) is also modified. This gives the algorithm a more up-to-date X for use in updating the subsequent columns, resulting in further error reduction and faster convergence, at the cost of prohibiting parallelism in the column updates.

To update the kth column of D, write the product DX as a sum of K rank-1 outer products and define Ek, the error contributed by all atoms except the kth one, as

$E_k = Y - \sum_{j \ne k} d_j x^{(j)} , \qquad \text{so that} \qquad \|Y - DX\|_F^2 = \left\| E_k - d_k x^{(k)} \right\|_F^2 .$

Because all terms in Ek are being held constant for the update of this column, the search for the dk and x(k) that minimize the above expression is equivalent to finding the rank-1 matrix that is closest to Ek in the Frobenius norm sense. This is easily found using the SVD, by letting dk be the first left singular vector of Ek and x(k) be the first right singular vector scaled by the first singular value. However, this will not preserve the sparsity of X, and so, measures must be taken to continue meeting the sparsity constraint.
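
As a small, self-contained illustration of this rank-1 step (a stand-in error matrix is used here; this check is not part of the algorithm itself):

% The best rank-1 approximation in the Frobenius norm comes from the
% leading singular triplet (Eckart–Young theorem).
Ek = randn(8, 20);                             % hypothetical error matrix
[U, Delta, V] = svd(Ek);
dk_new = U(:,1);                               % candidate atom: leading left singular vector (unit norm)
xk_new = Delta(1,1) * V(:,1)';                 % candidate coefficient row
rank1_err = norm(Ek - dk_new*xk_new, 'fro');   % smallest error achievable by any rank-1 matrix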

To this end, the K-SVD algorithm updates only the nonzero entries of x(k). This is done by ignoring the columns of X that do not "use" the kth signal atom; define ωk as the set of indices of the training vectors whose representations use the kth atom (equivalently, the indices at which x(k) is nonzero):

$\omega_k = \{\, i \mid 1 \le i \le N,\ x^{(k)}(i) \ne 0 \,\}$

Furthermore, define Ωk as an $N \times |\omega_k|$ matrix, with ones on the $(\omega_k(i), i)$th entries and zeros elsewhere. The product $x_R^{(k)} = x^{(k)} \Omega_k$ is then a reduced version of x(k), containing only the nonzero entries. Similarly, $Y_k^R = Y \Omega_k$ is the set of training vectors that currently use the kth atom, and $E_k^R = E_k \Omega_k$ selects the error columns associated with those training vectors.
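
In code, multiplying by Ωk amounts to plain column selection; with the index set omegak used in the example implementation below, the restricted quantities can be formed directly (an equivalent shorthand, not an extra step of the algorithm):

% Keeping only the columns indexed by omegak is equivalent to multiplying by Omegak.
Ek_R = Ek(:, omegak);      % restricted error matrix, i.e. Ek * Omegak
xk_R = X(k, omegak);       % restricted coefficient row, i.e. x(k) * Omegak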

This provides a way to selectively update only the nonzero entries of x(k), by performing the update on $x_R^{(k)}$ and mapping the new values back onto x(k). Now, the column update minimization problem is

$\min_{d_k,\, x_R^{(k)}} \left\| E_k \Omega_k - d_k x_R^{(k)} \right\|_F^2 = \min_{d_k,\, x_R^{(k)}} \left\| E_k^R - d_k x_R^{(k)} \right\|_F^2$

which can be solved straightforwardly using the SVD. Decomposing $E_k^R = U \Delta V^{\top}$, the updated dk is chosen to be the first column of U, and $x_R^{(k)}$ to be the first column of V multiplied by Δ(1,1) (written as a row). The new entries of $x_R^{(k)}$ are then mapped back to the nonzero entries of x(k) from which they came. This update preserves both the normalization of the columns of D and the sparsity of the coefficient matrix X.

Initialization and stopping rule

To begin the algorithm, D must be initialized. As in the k-means algorithm, the number of training vectors is usually much larger than the number of dictionary atoms, so D can be set to a random selection of K distinct columns of Y, each scaled to have unit L2 norm. The algorithm can be terminated after a fixed number of iterations, or once the change in overall error between successive iterations becomes smaller than some tolerance ε.

Convergence

The overall reconstruction error, $\|Y - DX\|_F^2$, is guaranteed not to increase with each application of the dictionary update procedure, since the SVD yields the optimal rank-1 replacement for every column. However, nothing guarantees that the new coefficient matrix produced in each sparse coding step will reduce the error relative to the previous one. This can be remedied with a simple safeguard: if the newly computed X does not reduce the overall error, retain the previous X and continue with the dictionary update step. With this modification, the error is guaranteed to converge (though in practice this is seldom needed, as many modern sparse coding algorithms perform very well). Note, though, that the dictionary update step cannot perform the dictionary learning process on its own, as it never changes the support of X, and allowing training vectors to switch which signal atoms they use is important to both the K-SVD and k-means algorithms.
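
A rough sketch of that safeguard (hypothetical variable names, with sparse_code as in the example implementation below):

% Accept the new sparse codes only if they do not increase the overall error.
X_new = sparse_code(D, Y, T0);
if norm(Y - D*X_new, 'fro')^2 <= norm(Y - D*X, 'fro')^2
    X = X_new;             % keep the improved (or equal) coefficients
end                        % otherwise retain the previous X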

Example MATLAB implementation

Here, X = sparse_code(D,Y,T0) can be an implementation of any function that approximately solves equation (1), as noted above.

% Y : training vectors, n x N
% K : number of signal atoms to use
% T0 : maximum allowable number of nonzero entries per coefficient vector
% max_iters : maximum number of iterations to run
function [D,X] = KSVD(Y, K, T0, max_iters)
    %Initialization
    N = size(Y,2);
    D = normc(Y(:,randperm(N,K)));   % K distinct training vectors, scaled to unit L2 norm
    X = zeros(K,N);
    
    for J = 1:max_iters
        %Sparse coding step
        X = sparse_code(D,Y,T0);
        %Dictionary update step
        for k = 1:K
            Ek = (Y - D*X) + D(:,k)*X(k,:);      % error matrix with atom k's contribution removed
            omegak = find(X(k,:) ~= 0);          % indices of training vectors that use atom k
            if isempty(omegak)
                continue;                        % atom k is unused this iteration; leave it unchanged
            end
            Omegak = eye(N);
            Omegak = Omegak(:,omegak);           % restriction operator (column selection)
            [U,Delta,V] = svd(Ek*Omegak);        % SVD of the restricted error matrix
            D(:,k) = U(:,1);                     % new atom: leading left singular vector (unit norm)
            X(k,omegak) = Delta(1,1)*(V(:,1)');  % new coefficients on the existing support
        end
    end

    %Final sparse coding step
    X = sparse_code(D,Y,T0);
end

Limitations

Choosing an appropriate dictionary for a dataset is a non-convex problem, and K-SVD operates by an iterative update that is not guaranteed to find the global optimum.[2] However, this limitation is shared by other algorithms for this purpose, and K-SVD works fairly well in practice.[2]

References

  1. ^ Aharon, Michal; Elad, Michael; Bruckstein, Alfred (2006), "K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation", IEEE Transactions on Signal Processing, 54 (11): 4311–4322, doi:10.1109/TSP.2006.881199
  2. ^ Rubinstein, R.; Bruckstein, A. M.; Elad, M. (2010), "Dictionaries for Sparse Representation Modeling", Proceedings of the IEEE, 98 (6): 1045–1057, doi:10.1109/JPROC.2010.2040551
