Quattroshapes: A Global Polygon Gazetteer from Foursquare
Foursquare geographic infrastructure relies on numerous pieces of open geo software: PostGIS, GDAL, Shapely, Fiona, QGIS, S2, and JTS as well as open geographic data: OSM, geonames.org, US Census’ TIGER, Canada’s geogratis, Mexico’s INEGI and EuroGeoGraphics to name a few. We’ve been inspired by existing efforts around geographic data including the alphashapes and betashapes projects. We are eager and excited to contribute back to the open geo ecosystem with a few projects that I demoed recently at foss4g-na and State of the Map US.
Geographic polygon / boundary data is important to us as a way to aggregate venues around places like cities and neighborhood. Finding a good source of city data around the world has proved difficult. For that reason, we’ve been curating a set of worldwide polygon data that we’re calling Quattroshapes. Quattroshapes debuted at Nathaniel Vaughn Kelso’s talk at State of the Map US this past weekend. The project combines normalizing open government data with synthesizing new polygons out of flickr photos and Foursquare checkin data in places where open government data is unavailable. It’s called quattroshapes because it’s the fourth iteration (that we know of) of the work flickr did on alphashapes and SimpleGeo on betashapes also, it’s based on a quadtree.
We use this polygon data in twofishes, our coarse, splitting, forward and reverse geocoder based on the geonames.org dataset. Twofishes has been open source since we first wrote it, but recently we’re releasing prebuilt indexes, complete with autocomplete and partial worldwide city-level reverse geocoding functionality. Twofishes is used in Foursquare Explore on the web. We’re looking at using it with our mobile applications as well to provide the best experience to our users. We’re also proud to say that our friends at Twitter have found a use for it as well.
We’re eager to collaborate with others on continuing to source and create this data. If you know of open (redistributable, commercial-friendly) datasets that we’ve missed, please let us know. If you have large sources of labeled point data that you think could help create more accurate inferred polygons, we’re interested in that too. If you make use of the quattroshapes or twofishes project, we’d love to hear how you’re using it and how it’s working out for you.
- David Blackman, Geo Lead at Foursquare