Our Hadoop stack at foursquare
We recently hosted the NYC Hadoop Meetup at foursquare HQ, where we gave an overview of our Hadoop stack to more than 100 attendees. In case you couldn’t make it (or want to take a second look at the slides), here they are:
At foursquare, we generate, collect, and analyze billions of log events each week. We use Hadoop to combine these logs with data exported from MongoDB, such as foursquare’s 2 billion check-ins, in order to build recommendations, power internal reporting, and drive product features.
We’ve recently introduced a number of new components to our Hadoop stack in order to create a reliable, scalable, and production-ready pipeline. These include Oozie for workflow management, Thrift for data serialization, Pig for ad hoc analysis, and Scoobi for writing MapReduce jobs in Scala. We’ll share some of our experiences and lessons learned, and explain how we’re making it easier for developers and non-developers to use Hadoop at foursquare.