Linköping University
Department of Mathematics
Lars Eldén
June 2009
Matrix methods in data mining and pattern recognition
Computer assignment
Spam Classification
ASSIGNMENT
Construct an algorithm in MATLAB for spam classification, where the data are collected in a matrix with a number of features prepared from a set of 4601 e-mail messages. Use singular vector bases for the two classes and reclassify the given data.
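The idea of classifying with singular vector bases can be illustrated by a minimal MATLAB sketch. It is not a prescribed solution. It assumes that the data have been loaded into a matrix X with one message per row and a label vector y (1 for spam, 0 for non-spam); these names, the synthetic stand-in data, and the choice k = 5 are assumptions made only for illustration, and the actual layout is the one described in spam.txt. Each message is assigned to the class whose subspace leaves the smaller residual.

```matlab
% Synthetic stand-in for the real data (assumption: one message per row).
rng(0);
m = 20;                                              % number of features (hypothetical)
X = [abs(randn(150, m)) * diag(linspace(2, 0.1, m)); % class 1 ("spam")
     abs(randn(200, m)) * diag(linspace(0.1, 2, m))];% class 0 ("non-spam")
y = [ones(150, 1); zeros(200, 1)];                   % hypothetical labels

k = 5;                                               % number of basis vectors per class

% Put messages in columns and compute the left singular vectors of each class.
Z = X';
[U1, ~, ~] = svd(Z(:, y == 1), 'econ');
[U0, ~, ~] = svd(Z(:, y == 0), 'econ');
U1 = U1(:, 1:k);                                     % keep the k leading singular vectors
U0 = U0(:, 1:k);

% Residual norm of every message after projection onto each class subspace.
r1 = sqrt(sum((Z - U1 * (U1' * Z)).^2, 1));
r0 = sqrt(sum((Z - U0 * (U0' * Z)).^2, 1));

% Assign each message to the class with the smaller residual.
yhat = double(r1 < r0)';
fprintf('misclassified: %d of %d\n', sum(yhat ~= y), numel(y));
```

Because the columns of U1 and U0 are orthonormal, the projection onto each class subspace is obtained with two matrix products and no linear system has to be solved.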
SPECIFIC TASKS
- Use different numbers of basis vectors and determine for which dimension the number of incorrectly classified messages is lowest.
- Give a "confusion table" for the classification.
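The two tasks above could be approached along the following lines. This is a sketch under the same assumptions as before: X holds one message per row, y holds the labels (1 for spam, 0 for non-spam), and the data here are synthetic. The variable names and the layout of the table (rows are true classes, columns are assigned classes) are illustrative choices, not requirements of the assignment.

```matlab
% Synthetic stand-in for the real data (assumption: one message per row).
rng(0);
m = 20;                                              % number of features (hypothetical)
X = [abs(randn(150, m)) * diag(linspace(2, 0.1, m)); % class 1 ("spam")
     abs(randn(200, m)) * diag(linspace(0.1, 2, m))];% class 0 ("non-spam")
y = [ones(150, 1); zeros(200, 1)];                   % hypothetical labels
Z = X';                                              % messages in columns

% Full singular vector bases of the two classes, computed once.
[U1, ~, ~] = svd(Z(:, y == 1), 'econ');
[U0, ~, ~] = svd(Z(:, y == 0), 'econ');

% Residual norms of all messages with respect to a given basis U.
resid = @(U) sqrt(sum((Z - U * (U' * Z)).^2, 1));

% Count the misclassified messages for each number of basis vectors.
% With k = m each basis generically spans the whole feature space and
% all residuals vanish, so the sweep stops at m - 1.
kmax   = m - 1;
errors = zeros(kmax, 1);
for k = 1:kmax
    yhat = double(resid(U1(:, 1:k)) < resid(U0(:, 1:k)))';
    errors(k) = sum(yhat ~= y);
end
[minErr, kbest] = min(errors);

% Confusion table for the best k: rows = true class, columns = assigned class.
yhat = double(resid(U1(:, 1:kbest)) < resid(U0(:, 1:kbest)))';
C = [sum(y == 1 & yhat == 1), sum(y == 1 & yhat == 0);
     sum(y == 0 & yhat == 1), sum(y == 0 & yhat == 0)];
fprintf('best k = %d with %d misclassified\n', kbest, minErr);
disp(C);
```

The off-diagonal entries of C count the misclassified messages, so their sum equals the smallest error count found in the sweep.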
DATA
The test data are available at the URL http://www.mai.liu.se/~laeld/matrix-methods/computer-assignments/spam/. They are described in the file spam.txt.