Linköping University 
Department of Mathematics
Lars Eldén

June 2009

Matrix methods in data mining and pattern recognition

Computer assignment

Spam Classification


Construct an algorithm in MATLAB for spam classification. The data are collected in a matrix of features extracted from a set of 4601 e-mail messages. Compute singular vector bases for the two classes (spam and non-spam) and use them to reclassify the given data.
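A minimal sketch of one possible approach follows. It is not the prescribed solution: the variable names (A, y, U1, U0), the synthetic stand-in data, the column-per-message layout, and the decision rule (assign each message to the class whose basis gives the smaller residual) are all assumptions made for illustration. Replace the synthetic matrix with the real feature matrix once it is loaded.

```matlab
% Sketch: classification with singular vector bases.
% ASSUMPTION: synthetic stand-in data, one message per column of A.
rng(0);                                  % reproducible random numbers
m = 20; n = 200;                         % hypothetical sizes: m features, n messages
y = [ones(1, n/2), zeros(1, n/2)];       % hypothetical labels: 1 = spam, 0 = non-spam
B1 = orth(randn(m, 3));                  % hidden subspace generating class 1
B0 = orth(randn(m, 3));                  % hidden subspace generating class 0
A = [B1*randn(3, n/2), B0*randn(3, n/2)] + 0.05*randn(m, n);

k = 3;                                   % number of basis vectors per class
[U1, ~, ~] = svd(A(:, y == 1), 'econ');  % left singular vectors, spam class
[U0, ~, ~] = svd(A(:, y == 0), 'econ');  % left singular vectors, non-spam class
U1 = U1(:, 1:k);                         % keep the k leading singular vectors
U0 = U0(:, 1:k);

% Residual of every column after orthogonal projection onto each class basis.
R1 = A - U1*(U1'*A);
R0 = A - U0*(U0'*A);
r1 = sqrt(sum(R1.^2, 1));                % column-wise 2-norms
r0 = sqrt(sum(R0.^2, 1));

% ASSUMED decision rule: the class with the smaller residual wins.
yhat = double(r1 < r0);
nErrors = sum(yhat ~= y);
fprintf('k = %d: %d of %d messages misclassified\n', k, nErrors, n);
```

Because the same messages are used both to compute the bases and to be classified, the error count here measures reclassification of the given data, as the assignment asks, rather than performance on unseen messages.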


  1. Use different numbers of basis vectors and determine the dimension for which the number of misclassified messages is lowest.
  2. Give a "confusion table" for the classification.
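The two tasks above could be sketched as follows. Again this is only an illustration under stated assumptions: the data are synthetic, the names (A, y, nErrors, kbest, C), the upper limit kmax, and the smaller-residual decision rule are hypothetical choices, and the table layout (rows = true class, columns = predicted class) is one common convention rather than a requirement of the assignment.

```matlab
% Sketch: sweep the basis dimension k and build a confusion table.
% ASSUMPTION: synthetic stand-in data, one message per column of A.
rng(1);                                  % reproducible random numbers
m = 20; n = 200;                         % hypothetical sizes: m features, n messages
y = [ones(1, n/2), zeros(1, n/2)];       % hypothetical labels: 1 = spam, 0 = non-spam
A = [orth(randn(m, 4))*randn(4, n/2), ...
     orth(randn(m, 4))*randn(4, n/2)] + 0.1*randn(m, n);

[U1, ~, ~] = svd(A(:, y == 1), 'econ');  % left singular vectors, spam class
[U0, ~, ~] = svd(A(:, y == 0), 'econ');  % left singular vectors, non-spam class

kmax = 10;                               % hypothetical largest dimension to try
nErrors = zeros(1, kmax);
yhatAll = zeros(kmax, n);
for k = 1:kmax
    % Residual norms after projection onto the k leading singular vectors.
    r1 = sqrt(sum((A - U1(:, 1:k)*(U1(:, 1:k)'*A)).^2, 1));
    r0 = sqrt(sum((A - U0(:, 1:k)*(U0(:, 1:k)'*A)).^2, 1));
    yhatAll(k, :) = double(r1 < r0);     % smaller residual decides the class
    nErrors(k) = sum(yhatAll(k, :) ~= y);
end

[~, kbest] = min(nErrors);               % first dimension with the fewest errors
yhat = yhatAll(kbest, :);

% Confusion table: rows = true class (spam, non-spam),
% columns = predicted class (spam, non-spam).
C = [sum(y == 1 & yhat == 1), sum(y == 1 & yhat == 0);
     sum(y == 0 & yhat == 1), sum(y == 0 & yhat == 0)];

fprintf('Fewest errors at k = %d (%d of %d)\n', kbest, nErrors(kbest), n);
disp(C);
```

The off-diagonal entries of C are the two kinds of error (spam accepted as non-spam, and non-spam rejected as spam); their sum equals the error count for the chosen dimension.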


The test data are available at the URL. They are described in the file spam.txt.