Cultivation of Big Data and Crowdsourcing Software Development
Yelp has been one of the most successful crowdsourcing websites since its launch in 2004. Beyond its social media aspect, the site involves a great deal of data science, given the sheer number of businesses and reviews posted on it. Its most successful feat was using computer science to create a product that not only attracts enough users to build a large, sustainable community but also relies on up-to-date algorithms to maintain the system. Reviews are invaluable to a business: higher star ratings are correlated with higher revenue, and a one-star increase can lead to a 5 to 9 percent increase in revenue (Luca 2016). Businesses therefore have an incentive to post fake reviews. Yelp attempts to counteract this with a filter that detects fake reviews (Kamerer 2014). To do so, the filter appears to rely on machine learning algorithms that distinguish legitimate reviews from fake ones (Mukherjee et al.). Text mining is also necessary, since review text must be processed before these algorithms can use it (Mukherjee et al.). To encourage innovation that will carry the website into the future, Yelp also regularly hosts challenges that let researchers download portions of its real data sets and find new ways to use them.
The data sets contain a stockpile of information, including a user database, a business database, and a review database. The Yelp data set covers 366,715 users, 61,184 businesses, and 1,569,264 reviews (Yu et al.). Each file is stored in JSON (JavaScript Object Notation), a format that describes an object through its attributes. In these data sets, the objects are users, businesses, and reviews. Data sets this large open up countless possibilities. For example, one could build algorithms that predict a business's star rating from the content of its reviews alone (Yu et al.). One could also mine review text for latent subtopics, which could help businesses improve their services by revealing a “hidden” demand (Huang et al. 2014).
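To make the JSON format and the star-prediction idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method used in any of the cited papers: the sample records are invented, the field names (business_id, stars, text) are assumed for the example, and the word-averaging predictor is only a naive baseline chosen for its simplicity.

```python
import json
from collections import defaultdict

# Invented sample records, one JSON object per line. The field names
# (business_id, stars, text) are assumptions made for this illustration.
SAMPLE_LINES = [
    '{"business_id": "b1", "stars": 5, "text": "Great food and friendly staff"}',
    '{"business_id": "b1", "stars": 4, "text": "Good food but slow service"}',
    '{"business_id": "b2", "stars": 1, "text": "Terrible service and cold food"}',
]


def parse_reviews(lines):
    """Parse one JSON object per non-empty line into a list of dicts."""
    return [json.loads(line) for line in lines if line.strip()]


def mean_stars_by_business(reviews):
    """Average the star ratings of each business's reviews."""
    totals = defaultdict(lambda: [0.0, 0])  # business_id -> [sum, count]
    for review in reviews:
        entry = totals[review["business_id"]]
        entry[0] += review["stars"]
        entry[1] += 1
    return {biz: total / count for biz, (total, count) in totals.items()}


def word_star_averages(reviews):
    """Map each lowercase word to the mean rating of the reviews containing it."""
    totals = defaultdict(lambda: [0.0, 0])  # word -> [sum, count]
    for review in reviews:
        # set() counts each word once per review
        for word in set(review["text"].lower().split()):
            totals[word][0] += review["stars"]
            totals[word][1] += 1
    return {word: total / count for word, (total, count) in totals.items()}


def predict_stars(text, word_avgs, default=3.0):
    """Naive baseline: average the per-word ratings of the known words."""
    scores = [word_avgs[w] for w in text.lower().split() if w in word_avgs]
    return sum(scores) / len(scores) if scores else default


reviews = parse_reviews(SAMPLE_LINES)
business_means = mean_stars_by_business(reviews)
word_avgs = word_star_averages(reviews)
```

With a real data set, the sample lines would be replaced by the lines of a review file, and a serious predictor would use a proper text model and held-out test data rather than this word-average baseline.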
As media evolves, companies that rely on technology need to stay at the forefront to maintain their standing in the online world. To keep their websites and software current, many companies, such as Google, make a point of recruiting computer scientists and engineers. Many of these big-name companies host code challenges that let them gather ideas and identify future recruits. In this context, computer scientists can be imagined as working in a lab, rapidly creating new ways of using changing technology.
Works Cited
Fan, Mingming, and Maryam Khademi. “Predicting a Business Star in Yelp from Its Reviews
Text Alone.” arXiv preprint arXiv:1401.0864 (2014): n. pag. Web. 16 Sept. 2016.
Huang, J., S. Rogers, and E. Joo. “Improving Restaurants by Extracting Subtopics from Yelp
Reviews.” iConference 2014 (Social Media Expo). 2014.
Kamerer, David. “Understanding the Yelp review filter: An exploratory study.” First
Monday [Online], 19.9 (2014): n. pag. Web. 12 Sept. 2016.
Luca, Michael. “Reviews, Reputation, and Revenue: The Case of Yelp.com.” Harvard Business
School Working Paper, No. 12-016, September 2011. (Revised March 2016. Revise and resubmit at the American Economic Journal – Applied Economics.)
Mukherjee, Arjun, Vivek Venkataraman, Bing Liu, and Natalie Glance. “What Yelp Fake
Review Filter Might Be Doing?” International AAAI Conference on Web and Social
Media (2013): n. pag. Web. 12 Sept. 2016.
Yu, Mengqi, Meng Xue, and Wenjia Ouyang. “Restaurants Review Star Prediction for Yelp