November 17, 2004
In the lab directory /mailocal/lab/numt/ngssc/cancer-data/
there is a file A.mat that contains a 9 x 683 matrix with
data for 683 cancer patients. 444 of the patients have a benign form of cancer,
and 239 have a malignant form.
Classify the patients in the two categories, using k-means clustering and/or
spectral clustering. Compare with the correct classification given
in the file class.mat.
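
As a starting point, k-means (Lloyd's algorithm) can be sketched in a few
lines. This is only an illustrative sketch, not a prescribed solution: the
function name kmeans, the evenly spaced initialization, and the toy points are
assumptions made here, and the toy points are not taken from A.mat. Since A
stores one patient per column, its columns would first have to be turned into
points, for example with list(zip(*A)) if A is held as a list of 9 rows.

```python
def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm on a list of equal-length tuples (one tuple per point)."""
    # Deterministic initialization (an assumption made for reproducibility):
    # take k points evenly spaced through the list as the initial centroids.
    step = max(len(points) // k, 1)
    centroids = [points[i * step] for i in range(k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Made-up toy points forming two well-separated groups (not the patient data).
toy_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(toy_points, k=2)
print(labels)
```

The result of k-means generally depends on the initial centroids, so it can be
worth trying several initializations on the real data.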
It is common in data mining and statistics to preprocess a data matrix by
subtracting the mean of each column. Experiment with the clustering method(s)
using the original data and preprocessed data.
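
The centering step can be sketched as follows. The helper name center_columns
and the toy matrix are assumptions for illustration only. Note that A is
9 x 683 with one patient per column, so whether to center the columns of A as
given or the columns of its transpose (that is, each of the 9 variables) is a
choice this sketch leaves open; the same function applies to either
orientation.

```python
def center_columns(matrix):
    """Subtract from every entry the mean of the column it belongs to.

    `matrix` is a list of rows, each row a list of numbers.
    """
    n_rows = len(matrix)
    # zip(*matrix) iterates over the columns of a row-major matrix.
    means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


# Made-up 3 x 2 example (not the patient data): column means are 3 and 20.
toy = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
centered = center_columns(toy)
print(centered)
```

To center along the other dimension, pass the transpose, for example
center_columns([list(r) for r in zip(*toy)]).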
The report should contain the code, and a confusion matrix for each
of your classifications.
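
A confusion matrix for two classes can be computed as in the sketch below. The
function name, the 0/1 label encoding, and the toy labels are assumptions; the
encoding actually used in class.mat may differ and would need to be mapped to
0 and 1 first. Because cluster numbers produced by a clustering method are
arbitrary, the sketch also swaps the two cluster columns when that puts more
patients on the diagonal.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes=2):
    """cm[t][p] counts items whose true class is t and predicted cluster is p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        cm[t][p] += 1
    return cm


# Made-up labels (not from class.mat), encoded as 0 and 1.
true_toy = [0, 0, 0, 1, 1]
pred_toy = [1, 1, 0, 0, 0]
cm = confusion_matrix(true_toy, pred_toy)

# Cluster indices carry no meaning, so align them with the true classes:
# if the off-diagonal holds more items than the diagonal, swap the columns.
if cm[0][1] + cm[1][0] > cm[0][0] + cm[1][1]:
    aligned = [row[::-1] for row in cm]
else:
    aligned = cm
print(aligned)
```

The diagonal of the aligned matrix then counts the correctly grouped items,
and the off-diagonal entries count the misclassified ones.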
The data have been taken from the Wisconsin
breast cancer dataset.