In March, I spoke at Queens Open Tech about machine learning at Foursquare. The talk gives a nice overview of the kinds of insights we have about human behavior from check-in data and our machine-learning setup. Learn how we used smarter algorithms to get 20,000 people to try a new place every week.
Foursquare’s geographic infrastructure relies on numerous pieces of open geo software (PostGIS, GDAL, Shapely, Fiona, QGIS, S2, and JTS) as well as open geographic data: OSM, geonames.org, the US Census’ TIGER, Canada’s geogratis, Mexico’s INEGI, and EuroGeoGraphics, to name a few. We’ve been inspired by existing efforts around geographic data, including the alphashapes and betashapes projects. We are excited to contribute back to the open geo ecosystem with a few projects that I demoed recently at foss4g-na and State of the Map US.
Geographic polygon / boundary data is important to us as a way to aggregate venues around places like cities and neighborhoods. Finding a good source of city data around the world has proved difficult. For that reason, we’ve been curating a set of worldwide polygon data that we’re calling Quattroshapes. Quattroshapes debuted at Nathaniel Vaughn Kelso’s talk at State of the Map US this past weekend. The project combines normalizing open government data with synthesizing new polygons out of flickr photos and Foursquare check-in data in places where open government data is unavailable. It’s called Quattroshapes because it’s the fourth iteration (that we know of) of the work flickr did on alphashapes and SimpleGeo did on betashapes. Also, it’s based on a quadtree.
We use this polygon data in twofishes, our coarse, splitting, forward and reverse geocoder based on the geonames.org dataset. Twofishes has been open source since we first wrote it, and we’re now also releasing prebuilt indexes, complete with autocomplete and partial worldwide city-level reverse geocoding functionality. Twofishes is used in Foursquare Explore on the web. We’re looking at using it in our mobile applications as well to provide the best experience to our users. We’re also proud to say that our friends at Twitter have found a use for it.
We’re eager to collaborate with others on continuing to source and create this data. If you know of open (redistributable, commercial-friendly) datasets that we’ve missed, please let us know. If you have large sources of labeled point data that you think could help create more accurate inferred polygons, we’re interested in that too. If you make use of the quattroshapes or twofishes projects, we’d love to hear how you’re using them and how they’re working out for you.
– David Blackman, Geo Lead at Foursquare
The gold standard for systems performance measurement is a load test, which is a deterministic process of putting a demand on a system to establish its capacity. For example, you might load test a web search cluster by playing back actual logged user requests at a controlled rate. Load tests make great benchmarks for performance tuning exactly because they are deterministic and repeatable. Unfortunately, they just don’t work for some of us.
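For reference, the playback-style load test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool we use: the `replay` function, its parameters, and the injectable `clock` and `sleep` arguments are all hypothetical names chosen for the example.

```python
import time


def replay(requests, send, rate_per_sec, clock=time.monotonic, sleep=time.sleep):
    """Send logged requests in order at a fixed rate; return per-request latencies.

    `send` is whatever callable issues one request to the system under test.
    `clock` and `sleep` are injectable so the pacing logic can be checked
    without real waiting.
    """
    interval = 1.0 / rate_per_sec
    start = clock()
    latencies = []
    for i, req in enumerate(requests):
        # Request i is scheduled at start + i * interval, so a slow response
        # does not push back the schedule of the requests that follow it.
        delay = (start + i * interval) - clock()
        if delay > 0:
            sleep(delay)
        t0 = clock()
        send(req)
        latencies.append(clock() - t0)
    return latencies
```

Because the same log is replayed at the same rate every time, two runs against two builds can be compared directly, which is what makes this kind of test a good benchmark.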
At Foursquare, we push new versions of our application code at master/HEAD to production at least daily. We are constantly adding features, tweaking how old features work, running A/B tests on experimental features, and doing behind-the-scenes work like refactoring and optimization to boot. So any load test we might create would have to be constantly updated to keep up with new features and new code. This hypothetical situation is reminiscent of bad unit tests that basically repeat the code being tested: duplicated effort for dubious gain.
To make things even worse, many of our features rely on a lot of data. For example, to surface insights after you check in to a location on Foursquare, we have to consider all your previous check-ins, your friends’ check-ins, popular tips at the venue, nearby venues that are popular right now, and so on. Creating an environment in which we might run a meaningful load test would require us to duplicate a lot of data, maybe as much as the whole site. A lot of data means a lot of RAM to serve it from, and RAM is expensive.
So we usually choose not to attempt these “canned” load tests. In lieu of a classic load test, our go-to pre-launch performance test is what we call a “dark test.” A dark test involves generating extra work in the system in response to actual requests from users.
For example, in June 2012, we rolled out a major Foursquare redesign in which we switched the main view of the app from a simple list of recent friend check-ins to an activity stream which included other types of content like tips and likes. Behind the scenes, the activity stream implementation was much more complex than the old check-in list. This was in part because we wanted to support advanced behavior like collapsing (your friend just added 50 tips to her to-do list, we should collapse them all into a single stream item).
Before and after the redesign
Perhaps surprisingly, the biggest driver of additional complexity was the requirement for infinite scroll, which meant we needed to be ready to materialize any range of activity for all users. Since the intention was for the activity stream to be the main view a user sees upon opening the Foursquare app, we knew that the activity stream API endpoint would receive many, many requests as soon as users started to download and use the new version of the app. Above all, we did not want to make a big fuss about this great new feature and then give our users a bad experience by serving errors to them when they tried to use it. Dark testing was a key factor in making the launch a success.
The first version of the dark test was very simple: whenever a Foursquare client makes a request for the recent check-ins list, generate an activity stream response in parallel with the recent check-ins response, then throw the activity stream response away. We then hooked this up to a runtime control in our application that let us invoke it on an arbitrary percentage of requests, so we were able to generate this work for 1 percent, 5 percent, 20 percent, and so on of all check-in list requests. By the time we were a few weeks out from the redesign launch, we were running this test 24/7 for 100 percent of requests, which gave us pretty good confidence that we could launch this feature without overloading our systems.
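To make the idea concrete, here is a minimal Python sketch of a percentage-gated dark test. It is an illustration only and not our production code: the `DarkTest` class, its `handle` and `drain` methods, and the `percent` attribute are hypothetical names invented for the example. The live handler always serves the user, while the dark handler runs on a background thread for a configurable share of requests and its result is discarded.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor


class DarkTest:
    """Run an experimental code path next to the live one and drop its result."""

    def __init__(self, dark_fn, percent=0.0, max_workers=4, rng=None):
        self.dark_fn = dark_fn
        self.percent = percent  # runtime control: 0 to 100
        self._rng = rng or random.Random()
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self.dark_runs = 0
        self.dark_errors = 0

    def _run_dark(self, request):
        try:
            self.dark_fn(request)  # the response is thrown away
            failed = False
        except Exception:
            failed = True  # dark failures are counted, never shown to users
        with self._lock:
            self.dark_runs += 1
            self.dark_errors += int(failed)

    def handle(self, request, live_fn):
        # Sample the request into the dark test, then serve the live response.
        if self._rng.random() * 100 < self.percent:
            self._pool.submit(self._run_dark, request)
        return live_fn(request)

    def drain(self):
        """Wait for in-flight dark work to finish; the pool is unusable afterwards."""
        self._pool.shutdown(wait=True)
```

Raising `percent` step by step while watching `dark_errors` and system load mirrors the gradual ramp described above. A real setup would also bound the queue of pending dark work so that a slow dark path cannot pile up behind live traffic.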
– Cooper Bethea (@cooperb)
Yesterday, BlackBerry announced their first BB10 devices. We here on the BlackBerry team at Foursquare are really excited about the launch and wanted to give our awesome third-party developers something to help them get the most out of Foursquare and BlackBerry 10. With the help of the amazing Invocation Framework (learn more here), we have opened up a few parts of the Foursquare for BlackBerry 10 app so that developers can enrich their native apps with Foursquare content more easily than ever before. Check out the two examples below and then head over to GitHub to check out our sample app and get started.
Foursquare Single Sign On (SSO)
The first thing we’ve opened up, and a personal favorite, is the ability for your users to connect their Foursquare accounts with the click of a button. Instead of every app having to build its own WebView wrapper around the OAuth flow for obtaining an access token, we’ve built it right into the native Foursquare app for everyone to use. You can now let a user log in to your app through Foursquare in six lines of code, and the user never has to leave the context of your app.
It is up to the user whether to approve or deny your app. We will send their decision back to you, along with the access token if they decide to link their Foursquare account with your app.
Easy as that! So be sure to include a “connect with Foursquare” option in your app to reduce friction for new users signing up!
Foursquare Place Picker
More and more often, the most engaging content that users can create comes with a location attached to it, whether that’s a picture posted on Instagram or a beer being checked in to on Untappd. With the place picker API, you can build this rich content into your app, backed by the more than 50 million places in the Foursquare database. The best part is that, just like with the SSO API, you get the native Foursquare UI, network requests, and GPS functionality built into your app without having to write any of it. Just use the Invocation Framework to launch it. If you already know what your user is looking for, you can pass in a query to prime the search, and if you have already authenticated a user, just pass in their token for personalized results!
Once a user selects a place, we’ll return the JSON data for that place, which you can process and use however you need!
– Kyle, Foursquare BlackBerry Engineer
The Opportunity Gap: learn about the public schools in your neighborhood with @ProPublica and Foursquare
As our recent hackathon showed, there are tons of ways that developers can use Foursquare to power amazing apps – from ones that shame you into going to the gym, to others that alert you to restaurants with health code violations.
Today, ProPublica, an award-winning investigative journalism site, is relaunching its Opportunity Gap news application, which helps people find and compare statistics about public schools across the nation. With their new Foursquare integration, you can connect your Foursquare account to instantly see statistics for schools you’ve checked in to before. And when you’re out, you can instantly get stats about a school on your phone whenever you check in to one. It’s a great example of how news organizations can use Foursquare to reach their readers with relevant information when they’re out in the real world.
Learn more about The Opportunity Gap and connect your Foursquare account here.
And take two! Although Sandy foiled our hackathon plans in November, we’re back and ready to hack it up with the best and brightest. On January 5, we’re inviting developers and designers in NYC, SF, and everywhere around the world to sign up and build some more amazing hacks using the Foursquare API. We’ll have prizes, swag, and (naturally) global glory for the best Foursquare hacks, no matter where in the world they originate.
Head over to our Meetup page and keep an eye on hackathon.foursquare.com for more details as the event draws nearer. If you’re in SF or NYC, sign up to work from our HQ; otherwise, you can use Meetup to connect with hackers in your area.
(Fun fact: back in 2010, two designer/developer friends got together and entered the first Foursquare hackathon. They built a snazzy little hack that took people’s Foursquare check-in histories and resurfaced them in the form of a daily email that showcased people’s check-ins from exactly one year ago. They dubbed their hack, “4SquareAnd7YearsAgo,” and today you might know it as Timehop.)
Now, go sign up for the Foursquare Hackathon 2013!
We’ve been running Mongo as our main storage engine for almost three years. For most of that time, the Mongo servers and all the rest of our infrastructure were hosted on Amazon’s EC2. We recently migrated the Mongo servers onto our own hardware hosted in a datacenter and now have a hybrid environment with everything else still on EC2. I’ll be talking about why and how we did this at the upcoming MongoSV conference on Tuesday, December 4th, 2012.
The name of the talk is “MongoDB at Foursquare: From the cloud to bare metal.” Come check it out!
– Jon Hoffman (@hoffrocket)
Want to know if there’s a teaser after the movie credits, or the healthiest dish to order at a restaurant? There are a bunch of apps you can connect to your Foursquare account to help you make the most of your check-ins. Our gallery keeps growing – check out some of the latest additions:
- The Winester Square – it’s like Untappd for wine! Learn about the reds and whites on the menu after you check in.
- YOLO tells you the most expensive item on the menu.
- #mom helps you quickly let your mom know that you’re safe and sound with a call or text when you check in.
- GeoPollster will tell you which political party the businesses you check in at support.
To connect these and other apps to your Foursquare account, go to our app gallery at Foursquare.com/apps or the settings screen in the Foursquare app on your phone. Got an idea for an app? Build one (and win prizes!) at our upcoming Global Hackathon on November 3.
At Foursquare, we use Apache Oozie to manage the scheduling and control of our offline data processing workflows. We’ve had great success with the project, and we run upwards of 1000 Oozie workflows per day.
Despite the quality of Oozie’s core workflow engine, the web UI is a little clunky, and is frankly unusable in a lot of circumstances, especially when you’re using it at even a moderate scale.
To address this, we built Oozie Web, an alternative UI with the following improvements:
- Unique URLs for coordinators and workflows
- Proper ordering of coordinator / workflow actions
- Syntax highlighting of job definition and configuration files
- Coordinator actions link to their corresponding workflows
- Workflow actions link to their corresponding Hadoop jobs
- Re-run failed coordinator actions with a single click
- A better search implementation that matches substrings in workflow names
We’ve been using Oozie Web internally for a couple of months now, so we figured it was about time to open-source the project and give back to the community. We’re releasing it under the Apache 2.0 license, and it’s available right now on GitHub: http://github.com/foursquare/oozie-web
Last Friday, I spoke at DataGotham about how Foursquare data can provide an unprecedented view into the behavior of cities. In this talk, I focus on what we can learn about New York City from aggregating the check-ins of millions of New Yorkers, and demonstrate tools that we are building to help make cities like New York easier to use.