Linköping University 
Department of Mathematics
Lars Eldén

April 2007


Matrix Methods in Data Mining and Pattern Recognition

Computer assignment

Classification of Handwritten Digits by Non-Negative Matrix Factorization





ASSIGNMENT

Examine the approximation properties of non-negative matrix factorization. Construct an algorithm in MATLAB for classification of handwritten digits using NMF.

SPECIFIC TASKS

  1. Program the multiplicative NMF algorithm, and use it first for computing a factorization of the whole training set, with 40 columns in W, say. Experiment with the maximum number of iterations, so that you are sure that the algorithm has converged or is close to convergence. Look at the columns of W using ima2 (see below). Also illustrate, using ima2, for a number of digits in the training set their respective approximation using the NMF basis.

  2. Using a training set, compute the NMF of each class matrix. Use 5-20 basis vectors and classify unknown test digits according to how well they can be represented in terms of the respective bases (use the relative residual vector in the least squares problem as a measure). Compare your classification results with those obtained using the SVD.

DATA

The test data are available at the URL http://www.mai.liu.se/~laeld/matrix-methods/computer-assignments/digits/. The following files are provided:

  1. dzip.mat and azip.mat: the first is a vector that holds the digits (the number) and the second is an array of dimension 256 x 1707 that holds the training images. The images are vectors of dimension 256, that have been constructed from 16 x 16 images.

  2. dtest.mat and testzip.mat hold the test data.

  3. ima2.m takes an image vector as input and displays it.

The data are a subset of the US Postal Service Database, and we downloaded them from the web page of the book The Elements of Statistical Learning, Hastie, Tibshirani and Friedman (2001). Springer-Verlag.