Linköping University
Department of Mathematics
Lars Eldén

June 2009


Matrix methods in data mining and pattern recognition

Computer assignment

Spam Classification








ASSIGNMENT

Construct an algorithm in MATLAB for spam classification. The data are collected in a matrix of features prepared from a set of 4601 e-mail messages. Compute singular vector bases for the two classes (spam and non-spam) and use them to reclassify the given data.
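
One possible reading of this task can be sketched as follows. The sketch assumes that each message is stored as a column of the data matrix, that class labels are given as a 0/1 vector, and that a message is assigned to the class whose basis leaves the smaller residual; none of these choices is prescribed by the assignment text. The data are synthetic and all variable names are hypothetical.

```matlab
% Sketch only: synthetic data stand in for the real feature matrix.
% Assumed layout: one message per column, labels(j) = 1 for spam, 0 otherwise.
rng(0);                                  % reproducible synthetic example
m = 20;  n = 200;                        % hypothetical sizes: features x messages
labels = [ones(1, n/2), zeros(1, n/2)];
B1 = orth(randn(m, 3));                  % hidden subspace for the spam class
B0 = orth(randn(m, 3));                  % hidden subspace for the non-spam class
A = [B1*randn(3, n/2), B0*randn(3, n/2)] + 0.05*randn(m, n);

k = 3;                                   % number of basis vectors per class
[U1, ~, ~] = svd(A(:, labels == 1), 'econ');
[U0, ~, ~] = svd(A(:, labels == 0), 'econ');
U1 = U1(:, 1:k);                         % leading left singular vectors, spam
U0 = U0(:, 1:k);                         % leading left singular vectors, non-spam

% Residual norm of every column after projection onto each class basis.
r1 = sqrt(sum((A - U1*(U1'*A)).^2, 1));
r0 = sqrt(sum((A - U0*(U0'*A)).^2, 1));

predicted = double(r1 < r0);             % 1 = spam if the spam basis fits better
nWrong = sum(predicted ~= labels);
fprintf('k = %d: %d of %d messages misclassified\n', k, nWrong, n);
```

With the real data, A and labels would be read from the files described under DATA instead of being generated.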

SPECIFIC TASKS

  1. Use different numbers of basis vectors and determine for which dimension the number of incorrectly classified messages is lowest.
  2. Give a "confusion table" for the classification.
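
The two tasks might be approached along the following lines. This is a sketch on synthetic data with hypothetical names, and it assumes messages stored as columns, 0/1 labels, and a smallest-residual classification rule. The table layout (rows for the true class, columns for the predicted class) is also only one common convention.

```matlab
% Sketch only: synthetic data, hypothetical names, residual-based rule assumed.
rng(1);
m = 20;  n = 200;                                % features x messages
labels = [ones(1, n/2), zeros(1, n/2)];          % 1 = spam, 0 = non-spam
A = [orth(randn(m, 4))*randn(4, n/2), orth(randn(m, 4))*randn(4, n/2)] ...
    + 0.1*randn(m, n);

% Full set of left singular vectors for each class; truncated inside the loop.
[U1, ~, ~] = svd(A(:, labels == 1), 'econ');
[U0, ~, ~] = svd(A(:, labels == 0), 'econ');

kMax = 10;                                       % largest basis size tried
nWrong = zeros(1, kMax);
pred = zeros(kMax, n);
for k = 1:kMax
    r1 = sqrt(sum((A - U1(:, 1:k)*(U1(:, 1:k)'*A)).^2, 1));
    r0 = sqrt(sum((A - U0(:, 1:k)*(U0(:, 1:k)'*A)).^2, 1));
    pred(k, :) = r1 < r0;                        % 1 = classified as spam
    nWrong(k) = sum(pred(k, :) ~= labels);
end
[~, kBest] = min(nWrong);                        % first k with fewest errors

% 2-by-2 confusion table: rows = true class, columns = predicted class,
% both ordered (spam, non-spam).
p = pred(kBest, :);
C = [sum(labels == 1 & p == 1), sum(labels == 1 & p == 0);
     sum(labels == 0 & p == 1), sum(labels == 0 & p == 0)];
fprintf('best k = %d, misclassified = %d\n', kBest, nWrong(kBest));
disp(C);
```

The off-diagonal entries of C count the misclassified messages, so their sum equals nWrong(kBest). Because the same data are used to build the bases and to classify, the error counts describe reclassification only.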

DATA

The test data are available at the URL http://www.mai.liu.se/~laeld/matrix-methods/computer-assignments/spam/. They are described in the file spam.txt.