Linköping University 
Department of Mathematics
Lars Eldén

Nov 2006


Matrix Methods in Data Mining and Pattern Recognition

Computer assignment

Pagerank


ASSIGNMENT

Then implement the pagerank computation using the power method and a Lanczos method (e.g. Matlab's eigs function). Experiment with different values of the teleportation parameter.

SPECIFIC TASKS

  1. A very small link matrix is available at QMATRIX.mat. Draw the corresponding link graph. Compute (using eig) and plot the pagerank vector for the following values of the teleportation parameter $\alpha: 0.85, 0.9, 0.99, 1-10^{-8}, 1-10^{-12}$. What conclusions can we draw about the dependence of the pagerank on the parameter? More information can be found in the report PageRank as a function of the damping factor.

  2. Implement the power method for pagerank computation along the lines of Chapter 12 of Matrix Methods in Data Mining and Pattern Recognition. Use the norm of the residual as stopping criterion. Test it on the Stanford matrix and plot the residual.

  3. Implement the computation of the pagerank vector using the Matlab function eigs. In order not to have to store the matrix A (in the notation of Chapter 12) call the function eigs with a function as first argument (see the Matlab help information). Note that you can not control the norm of the vectors in the matrix-vector multiplication (inside eigs) so the matrix-vector multiplication trick in Section 12.3 of the book is useless here.

  4. Test the two algorithms on your link matrix. Use the values 0.85 and 0.99 for the teleportation parameter. Compare the rate of convergence of the two algorithms and the elapsed time (Matlab functions tic and toc). Compare pagerank vectors for the two values of the parameter. Is the ranking sensitive to the value of the parameter?

  5. Run eigs and compute six eigenvectors for the Stanford matrix. Explain the behaviour of the algorithm in view of the perturbation results in Section 15.1 of the book.

Optional tasks

  1. Let A be the Google matric for some link matrix P. Rewrite the eigenvalue equation Ar = r as a linear system. For a link matrix of reasonable dimension and the standard value of the teleportation parameter, solve the linear system by a linear system solver. Try a sparse direct solver (e.g. Matlab's sparse LU decomposition) or an iterative method (e.g. Gauss-Seidel or GMRES). If possible, vary the the dimension of the matrix Compare the efficiency of this approach to that of the eigenvalue-based method.

  2. Web spiders (crawlers) can be obtained via Amy Langville's web page. Alternatively, see the spider (Perl script) published in Andersson, Ekström, Investigating Google's PageRank algorithm.

    Use the spider and generate the link matrix for a web domain of your choice. For instance, http://www.mai.liu.se gave a matrix of dimension 16000 in November 2006. The matrix produced by the Perl spider can be converted to a sparse matrix using the matlab function spconvert. Make sure that the matrix becomes column-stochastic.



DATA

The large matrix stanford-web are available at stanford-web. The Stanford matrix can be read using the MATLAB file loadStanfordMatrix.m.