page rank for a web search engin when a search is made on the internet us a search engin there is first a tradit text process part, where the aim is to find all the web page contain the word of the queri. due to the massiv size of the web, the number of hit is like to be much too larg to be handl by the user. therefor, some measur of qualiti is need to filter awai page that ar like to be less interest. when on us a web search engin it is typic that the search phrase is under-specifi. a googl http://www.googl.com/ search conduct on septemb 29, 2005, us the search phrase univers, gave as a result link to the follow well-known univers: harvard, stanford, cambridg, yale, cornel, oxford. the total number of web page relev to the search phrase wa more than 2 billion. obvious googl us an algorithm for rank all the web page that agre rather well with a common-sens qualiti measur. somewhat surprisingli, the rank procedur is not base on human judgement, but on the link structur of the web. loos speak, googl assign a high rank to a web page, if it ha inlink from other page that have a high rank. we will see that thi ``self-referenc'' statement can be formul mathemat as an eigenvalu equat for a certain matrix. pagerank it is of cours imposs to defin a gener valid measur of relev that would be accept for a major of user of a search engin. googl us the concept of pagerank as a qualiti measur of web page. it is base on the assumpt that the number of link to and from a page give inform about the import of a page. we will give a descript of pagerank base primarili on and . concern googl, see . let all web page be order from 1 to , and let be a particular web page. then will denot the set of page that is link to, the outlink. the number of outlink is denot the set of inlink, denot , ar the page that have an outlink to . in gener, a page can be consid as more import the more inlink it ha. howev, a rank system base onli on the number of inlink is easi to manipul for an exampl of attempt to fool a search engin, see when you design a web page that (e.g. for commerci reason) you would like to be seen by as mani as possibl, you could simpli creat a larg number of (inform-less and unimport) page that have outlink to . in order to discourag thi, on defin the rank of in such a wai that if a highli rank page , ha an outlink to , thi add to the import of in the follow wai: the rank of page is a weight sum of the rank of the page that have outlink to . the weight is such that the rank of a page is divid evenli among it outlink. the preliminari definit of pagerank is thi definit is recurs, so pagerank cannot be comput directli. instead a fix point iter might be us: guess an initi rank vector . then iter there ar a few problem with thi iter: if a page ha no outlink, then in the iter process it onli accumul rank via it inlink, but thi rank is never distribut further. therefor it is not clear if the iter converg. we will come back to thi question later. more insight is gain if we reformul eq:pr1 as an eigenvalu problem for a matrix repres the graph of the internet. let be a squar matrix of dimens . then defin thi definit mean that row ha nonzero element in those posit that correspond to inlink of . similarli, column ha nonzero element equal to in those posit that correspond to the outlink of , and, provid that the page ha outlink, the sum of all the element in column is equal to on. in the follow symbol pictur of the matrix , non-zero element ar denot : the follow link graph illustr a set of web page with outlink and inlink. the correspond matrix becom sinc page 4 ha no outlink, the correspond column is equal to zero. obvious, the definit eq:pr1 is equival to the scalar product of row and the vector , which hold the rank of all page. we can write the equat in matrix form: i.e. is an eigenvector of with eigenvalu . it is now easili seen that the iter eq:pow1 is equival to which is the power method for comput the eigenvector. howev, at thi point it is not clear that pagerank is well-defin, as we do not know if there exist an eigenvalu equal to 1. it turn out that the theori of markov chain is us in the analysi. random walk and markov chain there is a random walk interpret of the pagerank concept. assum that a surfer visit a web page, choos the next page among the outlink with equal probabl. then the random walk induc a markov chain, see e.g. a markov chain is a random process, where the next state is determin complet from the present state; the process ha no memori the transit matrix of the markov chain is . (note that we us a slightli differ notat than is common in the theori of stochast process). the random surfer should never get stuck. in other word, our random walk model should have no page without outlink (such a page correspond to a zero column in ). therefor the model is modifi so that zero column ar replac by a constant valu in all posit. thi mean that there is equal probabl to go to ani other page in the net. defin the vector for , and the modifi matrix is defin with thi modif the matrix is a proper column-stochast matrix: it ha non-neg element and the element of each column sum up to 1. the preced statement can be reformul as follow. a column-stochast matrix satisfi where is defin by eq:edef. the matrix in the previou exampl is modifi to now, in analog to eq:preig, we would like to defin the pagerank vector as a uniqu eigenvector of with eigenvalu 1, howev, the exist of such a uniqu eigenvalu is still not guarante. the eigenvector of the transit matrix correspond to a stationari probabl distribut for the markov chain: the element in posit , , is the probabl that after a larg number of step, the random walker is at web page . to ensur the exist of a uniqu distribut, the matrix must be irreduc, cf. . a squar matrix is call reduc if there is a permut matrix such that where and ar both squar. otherwis the matrix is call irreduc., to illustr the concept of reduc, we give an exampl of a link graph that correspond to a reduc matrix. a random walker who ha enter the left part of the link graph will never get out of it, and similarli he will get stuck in the right part. the correspond matrix is which is of the form eq:reduc. actual, thi matrix ha two eigenvalu equal to 1, and on equal to -1, see exampl below. the direct graph correspond to an irreduc matrix is strongli connect: given ani two node , in the graph, there exist a path lead from to . the uniqu of the largest eigenvalu of an irreduc matrix is guarante by the perron-frobeniu theorem; we state it for the special case treat here. the inequ is understood as all the element of be strictli posit. let be an irreduc column-stochast matrix. the largest eigenvalu in magnitud is equal to 1. there is a uniqu correspond eigenvector satisfi , and ; thi is the onli eigenvector that is non-neg. if , then . due to the fact that is column-stochast we have , which mean that 1 is an eigenvalu of . the rest of the statement can be prove us perron-frobeniu theori . given the size of the internet, we can be sure that the link matrix is reduc, which mean that the pagerank eigenvector of is not well-defin. to ensur irreduc, i.e. to make it imposs for the random walker to get trap in a subgraph, on add, artifici, a link from everi web page to all the other. in matrix term, thi can be made by take a convex combin of and a rank on matrix, for some satisfi . it is easi to see that the matrix is column-stochast: the random walk interpret of the addit rank on term is that in each time step the surfer visit a page will jump to a random page with probabl (sometim refer to as teleport). we now see that the pagerank vector for the matrix is well-defin. the column-stochast matrix defin in eq:a is irreduc (sinc ) and ha the largest in magnitud eigenvalu . the correspond eigenvector satisfi . for the converg of the numer eigenvalu algorithm, it is essenti to know, how the eigenvalu of ar chang by the rank on modif eq:a. [] assum that the eigenvalu of the column-stochast matrix ar . then the eigenvalu of ar . defin to be normal to euclidean length 1, and let be such that is orthogon. then, sinc where , and . sinc we have made a similar transform, the matrix ha the eigenvalu . we further have therefor, the statement now follow immedi. theorem impli that even if ha a multipl eigenvalu equal to 1, which is actual the case for the googl matrix, the second largest eigenvalu in magnitud of is alwai equal to . we comput the eigenvalu and eigenvector of the matrix , with from eq:preduc and . the matlab code give the follow result: it is seen that all other eigenvector except the first on (which correspond to the eigenvalu 1), have both posit and neg compon, as state in theorem . instead of the modif eq:a we can defin where is a non-neg vector with that can be chosen in order to make the search bias toward certain kind of web page. therefor, it is often refer to as a person vector . the vector can also be us for avoid manipul by so call link farm . the power method for pagerank comput we want to solv the eigenvalu problem where is normal . in thi section we denot the sought eigenvector by . in deal with stochast matric and vector that ar probabl distribut, it is natur to us the 1-norm for vector (section ). due to the sparsiti and the dimens of (estim to be of the order billion), it is out of the question to comput the eigenvector us ani of the standard method for dens matric that ar base on appli orthogon transform to the matrix describ in chapter . the onli viabl method so far is the power method. assum that an initi approxim is given. the power method is given in the follow algorithm. the purpos of normal the vector (make it have 1-norm equal to 1) is to avoid that the vector becom either veri larg of veri small, and thu unrepresent in the float point system. we will see later that normal is not necessari in the pagerank comput. in thi context there is no need to comput an eigenvalu approxim, as the eigenvalu sought for is known to be equal to on. the converg of the power method depend on the distribut of eigenvalu. in order to make the present simpler, we assum that is diagonaliz, i.e . there exist a nonsingular matrix of eigenvector, . the eigenvalu ar order expand the initi approxim in term of the eigenvector, where is assum thi assumpt can be expect to be satisfi in float point arithmet, if not at the first iter, so after the second, due to round off.. then we have obvious, sinc for we have , the second term tend to zero and the power method converg to the eigenvector . the rate of converg is determin by the ratio . if thi ratio is close to 1, then the iter is veri slow. fortun thi is not the case for the googl matrix, see theorem and below. a stop criterion for the power iter can be formul in term of the residu vector for the eigenvalu problem. let be the comput approxim of the eigenvalu, and the correspond approxim eigenvector. then it can be shown that the optim error matrix , for which exactli, satisfi where . thi mean that if the residu is small, then the comput approxim eigenvector is the exact eigenvector of a matrix that is close to . in the case of pagerank comput it is natur to us the 1-norm instead , which doe not make much differ, sinc the norm ar equival eq:norm-eq. in view of the huge dimens of the googl matrix, it is non-trivial to comput the matrix-vector product , where . first, we show that normal of the vector produc in the power iter is unnecessari. assum that the vector satisfi , and that the matrix is is column-stochast. then put . then sinc is column-stochast (). then recal that wa construct from the actual link matrix as where the row vector ha an element 1 in all those posit that correspond to web page with no outlink, see eq:p. thi mean that to form , we insert a larg number of full vector in , each of the same dimens as the total number of web page. consequ, we cannot afford to store explicitli. let us look at the multipl in some more detail: where howev, we do not need to comput from thi equat. instead we can us eq:normi in combin with eq:yq: thu, we have an extra bonu is that we do not us the vector at all, i.e we need not know which page lack outlink. the follow matlab code implement the matrix vector multipl. here or a person teleport vector, see p. in order to save memori, we even should avoid us the extra vector yhat and replac it by y. from theorem we know that the second eigenvalu of the googl matrix satisfi . a typic valu of is 0.8 approxim iter ar need to make the factor equal to . thi is report to be close the number of iter us by googl. as an exampl we us the matrix obtain from the domain stanford the number of page is 281903, and the total number of link is 2312497. part of the matrix is displai in figur . we comput the page rank vector us the power method with and iter 63 time until the 1-norm of the residu wa smaller than . the residu and the final pagerank vector ar illustr in figur . if on comput the pagerank for a subset of the internet, on particular domain, sai, then the matrix mai be of a dimens for which on can us other method than the power method. in such case it mai be suffici to us the matlab function , which comput a small number of eigenvalu and the correspond eigenvector of a spars matrix us a variant of the arnoldi (or lanczo) method, cf. section . in view of the fact that on pagerank calcul can take sever dai, sever enhanc of the iter procedur have been propos. in an adapt method is describ that check the converg of the compon of the pagerank vector and avoid perform the power iter for those compon. up to 30 speed up ha been report. the block structur of the web is us in , and speedup of a factor 2 have been report. an acceler method base on aitken extrapol is describ in . aggreg method ar discuss in sever paper by langvil and meyer and in , and the arnoldi method in . a variant of pagerank is propos in . further properti of the pagerank matrix ar given in . hit anoth method base on the link structur of the web wa introduc at the same time as pagerank . it is call hit (hypertext induc topic search), and is base on the concept of author and hub. an author is a web page with sever inlink and a hub ha sever outlink. the basic idea is: good hub point to good author and good author ar point to by good hub. each web page is assign both a hub score and an author score . let be the adjac matrix of the direct web graph. then two equat ar given that mathemat defin the relat between the two score, base on the basic idea: the algorithm for comput the score is the power method, which converg to the left and right singular vector correspond to the largest singular valu of . in the implement of hit it is not the adjac matrix of the whole web that is us, but of all the page relev to the queri. there is now an extens literatur on pagerank, hit and other rank method. for overview, see . a combin of hit and pagerank ha been propos in . obvious the idea underli pagerank and hit ar not restrict to web applic, but can be appli to other network analys. for instanc, a variant of the hit method wa recent us in a studi of suprem court preced . a gener of hit is given in , which also treat synonym extract.