Searching for Search Algorithms in Yelp

Posted by on Dec 3, 2016 in Writing Assignment 6 | No Comments

According to Alexa.com, the most popular sites are search engines with Google.com as the leading provider on the World Wide Web. Practically, it is easy to see why this is so. The World Wide Web is an information system of documents that may be linked together with hyperlinks. These documents can be accessed via the Internet, which is a system of interconnected computer networks. To be able to access these documents, the address of the website needs to either be known or “Googled”. While Google didn’t pioneer search engines, it is indeed on the forefront of search engine technology. As the company is dominant in this industry, it become a verb, so to speak. As there are billions of documents on the internet, a search query would seemingly take ages. However, Google searches take less than a second on average as the time is listed whenever you make a search.

The whole concept of searches stems from information retrieval – “a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Croft, et al.) Generally, there are many types of searches in Computer Science. The most simple of all is the linear search where you traverse an array linearly to compare the value of each index to find a target. Another interesting search is the binary search which is seen in the figure. When given a sorted array, simply divide and conquer by asking: is the target greater or smaller than the current index being looked at in the array? A decision tree is shown that will clarify the process for a binary search (Horowitz).

Figure 1: Binary Search Decision Tree

Figure 1: Binary Search Decision Tree (Horowitz)

Google is a crawler-based search engine in that it is divided into three tasks. First, there is a “spider” that goes through a website and all of the links in the directory of that web server. Second, these web pages are coped into an index that will allow for the final step where a search engine software will traverse the index to find pages that would satisfy a target (Sullivan). As discussed before, we know what a linear search and binary search is. A linear search simply could not suffice for billions of pages as your daily Google search will take hours and maybe even days. A binary search would also take very long. Suppose for a small data set, we used a binary search. Entering a search query would then go through this index of pages that point to a set of documents based on the keywords in the search query.

People that use Google for the most part only want a few pages out of the hundreds of thousands of documents returned. An estimated 85% of queries only had the first page of results requested (Henziner, et al.). As there are many results, it is given that users would only want a small subset to satisfy their request (Joachims). How does this relate to Yelp? Like any search engine, it utilizes an indexing system that would allow it to search for restaurants and businesses efficiently. If you did a linear search through the many businesses in Manhattan alone for Alice’s Tea Cup, the resources exhausted for merely one query would be unsustainable in a searching service. By indexing keywords, Yelp can easily return a list of restaurants with a simple search. Furthermore, Yelp’s discrete tags allow you to filter restaurants based on their accommodations and cuisine.

Works Cited

Croft, W. Bruce, Donald Metzler, and Trevor Strohmann. Search engines. Pearson Education, 2010.

Henzinger, Monika R., Rajeev Motwani, and Craig Silverstein. “Challenges in web search engines.ACM SIGIR Forum. Vol. 36. No. 2. ACM, 2002.

Horowitz, Ellis, and Sartaj Sahni. Fundamentals of computer algorithms. Computer Science Press, 1978.

Joachims, Thorsten. “Optimizing search engines using clickthrough data.Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2002.

Sullivan, Danny. “How search engines work.SEARCH ENGINE WATCH, at http://www. searchenginewatch. com/webmasters/work. html (last updated June 26, 2001)(on file with the New York University Journal of Legislation and Public Policy) (2002).

Leave a Reply