How do search engines find what you want?

When we search on the internet, we want to see the most relevant pages. Page rank algorithm is a tool to determine which pages are more authorative on the internet based on their popularity to ensure users see pages that are most likely to be of use to them.

Here is a brief video on the popular Page Rank Algorithm, which was introduced by Sergey Brin and Larry Page in 1997.

##### What is the Page Rank Algorithm?

The crux of page rank algorithm is that pages which are visited more often are considered more important.

Consider a random surfer who starts from node to node (or a page) in a “random walk”, where the surfer follows one of the outgoing links at random. Then the surfer goes to the next page by clicking another outgoing link at random, and so on. As the surfer continues surfing in this fashion, what is the likelihood that she ends on on a particular page?

Nodes that have many incoming links are likely to be visited more often.

What if a page has no outlinks ? To handle this case, a restart operation is introduced, where the surfer randomly jumps to any page

This is can be alternatively thought of as constructing a markov chain where each node is a webpage and the steady state probability is the probability that a random surfer will end up on a particular page. The page rank is derived from these steady state probabilities.

##### The Math for Page Rank Algorithm

Consider a toy graph with just 4 nodes. All nodes point to A while none of the nodes point to D.

The Page Rank can be iteratively computed for each page as follows:

For Page A, there is contribution from B, C and D. However for Page B there is contribution only from D. For page C, there is contribution from B and D, while for page D there is no contribution from any other page since there are no inlinks.

To give all pages a non-zero chance, usually a damping or a smoothing factor is added and the page rank is computed as follows:

##### Applications of Page Rank Algorithm

- Ranking Tweets: The page rank algorithm can be used to rank tweets by constructing a graph of users and tweets. A follow can be modeled as a link and a user tweeting or retweeting can be modeled as a link from the user to the tweet. The importance or rank of tweets can be infered by running PRA on the graph.
- Product Recommendation: Once can construct a product user graph as shown in the figure below to apply PRA and find the rank of each product. A variation of this algorithm, the
**personalized Page rank algorithm**can be applied to get the products that are most relevent for a particular user or a segment of users.

##### References

https://en.wikipedia.org/wiki/PageRank

Brin, S.; Page, L. (1998). “The anatomy of a large-scale hypertextual Web search engine” (PDF). *Computer Networks and ISDN Systems*. **30** (1–7): 107–117.