Tune the algorithm for accuracy of classification. Give a table or graph of the percentage of correctly classified digits as a function of the number of basis vectors.
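The accuracy sweep asked for above can be sketched as follows. This is a minimal illustration, not the reference solution. It assumes the usual SVD-basis approach: each class is represented by the leading k left singular vectors of its training images, and a test image is assigned to the class with the smallest residual. The data here are synthetic stand-ins so that the script runs without the .mat files. The names A, d, T, dT, and kmax are hypothetical. With the real data one would use the training array and label vector in place of A and d.

```matlab
% Synthetic stand-in data: three classes, each near its own 3-dimensional subspace.
n = 256; classes = 0:2; nTrain = 40; nTest = 20; r = 3;
A = []; d = []; T = []; dT = [];
for c = classes
    B  = randn(n, r);                                   % hidden subspace of class c
    A  = [A,  B*randn(r, nTrain) + 0.01*randn(n, nTrain)];  % training images (columns)
    d  = [d,  c*ones(1, nTrain)];                       % training labels
    T  = [T,  B*randn(r, nTest) + 0.01*randn(n, nTest)];    % test images (columns)
    dT = [dT, c*ones(1, nTest)];                        % test labels
end

% One thin SVD per class; keep the leading kmax left singular vectors.
kmax = 5;
U = cell(1, numel(classes));
for i = 1:numel(classes)
    [Ui, ~, ~] = svd(A(:, d == classes(i)), 'econ');
    U{i} = Ui(:, 1:kmax);
end

% For each k, classify every test column by the smallest residual
% norm(z - Uk*(Uk'*z)) over the classes, then record the accuracy.
acc = zeros(1, kmax);
for k = 1:kmax
    res = zeros(numel(classes), size(T, 2));
    for i = 1:numel(classes)
        Uk = U{i}(:, 1:k);
        R  = T - Uk*(Uk'*T);              % residuals of all test images at once
        res(i, :) = sqrt(sum(R.^2, 1));
    end
    [~, idx] = min(res, [], 1);
    pred = classes(idx);
    acc(k) = 100 * mean(pred == dT);      % percentage correctly classified
end

disp([(1:kmax)', acc']);                  % table: k, accuracy in percent
```

Computing the SVD once per class and truncating it for each k avoids repeating the factorization inside the sweep.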
Check if all digits are equally easy or difficult to classify. Also look at some of the difficult ones, and see that in many cases they are very badly written.
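A per-digit breakdown like the one described above could be computed along these lines. The vectors truth and pred are small hypothetical examples. In practice they would be the test labels and the output of the classifier.

```matlab
% Hypothetical label vectors, one entry per test image.
truth = [0 0 1 1 1 2 2 2 2 3];
pred  = [0 0 1 7 1 2 2 3 2 3];

% Percentage of correctly classified images for each digit that occurs.
digitsSeen = unique(truth);
rate = zeros(size(digitsSeen));
for i = 1:numel(digitsSeen)
    mask = (truth == digitsSeen(i));
    rate(i) = 100 * mean(pred(mask) == truth(mask));
end
disp([digitsSeen', rate']);      % table: digit, accuracy in percent

% Indices of the misclassified test images, for visual inspection.
wrong = find(pred ~= truth);
disp(wrong);
```

The indices in wrong point at the columns of the test array that are worth displaying to see how the difficult digits were written.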
Check the singular values of the different classes. Is it justified to use different numbers of basis vectors for different classes? If so, perform a few experiments to find out whether it really pays off to use fewer basis vectors in one or two of the classes.
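One way to compare the singular values of the classes is sketched below. It uses two synthetic stand-in classes with different effective ranks, since the real class matrices would come from the training data. The threshold 0.05 and the names ranks, relS, and nSignificant are illustrative assumptions, not part of the assignment.

```matlab
% Two synthetic stand-in classes whose images lie near subspaces of
% different dimension (hypothetical effective ranks 2 and 6).
n = 256; m = 50;
ranks = [2 6];
S = zeros(m, numel(ranks));
for i = 1:numel(ranks)
    X = randn(n, ranks(i))*randn(ranks(i), m) + 0.01*randn(n, m);
    S(:, i) = svd(X);                     % singular values, largest first
end

% Scale each class by its largest singular value so that the decay
% of the different classes can be compared directly.
relS = S ./ repmat(S(1, :), m, 1);

% Count the singular values above an (arbitrary) relative threshold.
nSignificant = sum(relS > 0.05, 1);

disp(relS(1:8, :));                       % leading relative singular values
disp(nSignificant);
```

A class whose singular values decay quickly is a candidate for fewer basis vectors. Plotting the columns of relS with semilogy gives the same comparison graphically.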
The test data are available at readme. The following files are provided:
dzip.mat and azip.mat: The first is a vector that holds the digit labels (the numbers), and the second is an array of dimension 256 x 1707 that holds the training images. Each image is a vector of dimension 256 that has been constructed from a 16 x 16 image.
dtest.mat and testzip.mat hold the test data.
The data are a subset of the US Postal Service Database. We downloaded them from the web page of the book The Elements of Statistical Learning by Hastie, Tibshirani and Friedman (Springer-Verlag, 2001).
Rewrite the ima2.m function so that you can view 20 x 20 images.
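The contents of ima2.m are not reproduced here, so the following is only a hypothetical sketch of a viewer that is not tied to one image size. It infers the side length from the length of the vector, so the same code handles 16 x 16 and 20 x 20 images. The row-by-row pixel order (hence the transpose) is an assumption. If the digits appear rotated or mirrored, drop the transpose.

```matlab
% Stand-in for one 20 x 20 image stored as a vector of length 400.
v = zeros(400, 1);
v(1:20) = 1;                      % mark the first 20 pixels to reveal the orientation

% Infer the side length instead of hard-coding 16 or 20.
side = round(sqrt(numel(v)));
assert(side^2 == numel(v), 'vector length must be a perfect square');

% Assumption: pixels are stored row by row, so transpose after reshape.
img = reshape(v, side, side)';

imagesc(img); colormap(gray); axis image; axis off;
```

With the marked stand-in vector, the top row of the displayed image should be bright, which makes it easy to check the orientation before viewing real digits.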
In the classification, compare each test digit with all the training digits, and classify it as the digit of the closest training image.
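The nearest-neighbour rule described above can be sketched as follows. The text does not say which distance to use, so the Euclidean distance here is an assumption. The tiny arrays train, labels, and test are hypothetical stand-ins with images stored as columns.

```matlab
% Tiny stand-in data: columns are images, labels are the digits they show.
train  = [0 0 10 10;
          0 1 10 11];
labels = [3 3 8 8];
test   = [1  9;
          0 10];

pred = zeros(1, size(test, 2));
for j = 1:size(test, 2)
    % Difference between test image j and every training image.
    diffs = train - repmat(test(:, j), 1, size(train, 2));
    dist2 = sum(diffs.^2, 1);             % squared Euclidean distances
    [~, nearest] = min(dist2);            % index of the closest training image
    pred(j) = labels(nearest);            % take over its label
end
disp(pred);
```

Squared distances are enough for finding the minimum, so the square root is skipped.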
To make the algorithm more robust, the digits must first be smoothed. This has already been done, and the test data are given in the directory digits-smoothed/.