Linköping University
Department of Mathematics
Lars Eldén

November 17, 2004

 

TANA48: Numerical algorithms for data mining and IT


Computer Assignment

Cancer data

In the lab directory /mailocal/lab/numt/ngssc/cancer-data/ there is a file A.mat that contains a 9 x 683 matrix with data for 683 cancer patients, one column per patient. Of these patients, 444 have a benign form of cancer and 239 have a malignant form.


Classify the patients into the two categories, using k-means clustering and/or spectral clustering. Compare your result with the correct classification given in the file class.mat.
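As a starting point, the k-means step could be sketched as below. This is a minimal illustration of Lloyd's algorithm in plain Python, not a prescribed solution. The function name kmeans, its parameters, and the toy data are assumptions made for the example, and each patient is assumed to be passed as one list of feature values (so the columns of A would first have to be extracted as separate points).

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm (hypothetical helper, not part of the assignment).

    points: list of equal-length lists of floats, one list per observation.
    Returns (labels, centroids).
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k observations drawn at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
    return labels, centroids


# Toy data: two well-separated groups of three points each.
toy = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0],
       [10.0, 10.0], [11.0, 11.0], [12.0, 12.0]]
toy_labels, toy_centroids = kmeans(toy, k=2)
```

The cluster numbers returned are arbitrary (cluster 0 may correspond to either category), and the result in general depends on the initial centroids, so several seeds are worth trying on the real data.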


It is common in data mining and statistics to preprocess a data matrix by subtracting the mean of each column from the entries of that column. Experiment with the clustering method(s) using both the original data and the preprocessed data.
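The preprocessing step can be illustrated with a small sketch. The helper name center_columns is hypothetical, and the matrix is assumed to be stored as a list of rows. Whether the centering should be applied to the 9 x 683 matrix as stored or to its transpose is part of the experiment and is not decided here.

```python
def center_columns(rows):
    """Subtract each column's mean from that column (hypothetical helper).

    rows: list of equal-length lists of floats, one list per matrix row.
    Returns a new matrix; the input is left unchanged.
    """
    n = len(rows)
    # zip(*rows) iterates over the columns of the matrix.
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


# Small example: the column means are 2.0 and 20.0.
centered = center_columns([[1.0, 10.0], [3.0, 30.0]])
```

After centering, every column of the result sums to zero.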

The report should contain the code and a confusion matrix for each of your classifications.
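A confusion matrix simply counts how many patients of each true class ended up in each cluster. The sketch below shows one way to tabulate it. The function name confusion_matrix and the label values used in the example are assumptions, and the actual encoding of the classes in class.mat may differ.

```python
def confusion_matrix(true_labels, cluster_labels):
    """Count co-occurrences of true classes and clusters (hypothetical helper).

    Returns (classes, clusters, table), where table[i][j] is the number of
    items with true class classes[i] that were assigned to clusters[j].
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    table = [[0] * len(clusters) for _ in classes]
    for t, c in zip(true_labels, cluster_labels):
        table[classes.index(t)][clusters.index(c)] += 1
    return classes, clusters, table


# Example with made-up labels for five patients.
truth = ["benign", "benign", "malignant", "malignant", "benign"]
found = [0, 0, 1, 0, 0]
classes, clusters, table = confusion_matrix(truth, found)
```

Because cluster numbers are arbitrary, the columns may have to be permuted before the diagonal can be read as the number of correctly classified patients.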

The data have been taken from the Wisconsin breast cancer dataset.