How we built our Model Training Engine

At Foursquare, we have large-scale machine-learning problems. From choosing which venue a user is trying to check in at based on a noisy GPS signal, to serving personalized recommendations, discounts, and promoted updates to users based on where they or their friends have been, almost every aspect of the app uses machine-learning in some way.  All of these queries happen at a massive scale: we average one million Explore queries and six million check-ins every day. Not only do we have to process each request faster than the blink of an eye, but these millions of user interactions are giving us millions of data points to feed back into our models to make them better. We’ve been building out a Model Training Engine (MTE) to automate our (machine) learning from user data.  Here’s an overview to whet your appetite.

Fitting the model to the data rather than the data to the model.

Many models are built using linear regressions or similar approaches. While these models can help us quickly understand data (and we certainly make use of them), they make convenient but unrealistic assumptions and are limited in the kinds of relationships they can express. The MTE uses techniques liked Boosted Decision Trees or Random Forests (we have both a scikit-learn and an in-house MapReduce based implementation) to learn much more detailed and nuanced models that fit the data better.

Keeping models fresh and relevant.

With 6 million new check-ins a day, models quickly get stale. The MTE automatically retrains models daily based on the latest signals and the latest data. New signals and changes in old signals are immediately incorporated into new models and we monitor and deploy newer models when they outperform older ones.

Model training that scales with data and the organization.

With a large-scale, very interconnected system, changes made by other engineers on a seemingly unrelated app feature could throw off a very carefully calibrated model. How do we make model building scale across billions of check-ins and an entire organization without engineers stepping on each other’s toes?

To make models scalable across our data, we’ve rolled our own online learning algorithms and use clever downsampling techniques when we cannot load the entire dataset into memory. We use techniques like bagging and cross-validation to optimally understand how to combine different signals into a single prediction in a way that maximizes the contribution from each signal without picking up on spurious correlations (aka overfitting). This means that no one can throw off the model by adding or tweaking a signal. For example, If an engineer accidentally adds random noise (e.g. dice rolls) as a signal, the MTE would quickly detect that signal was not predictive and ignore it. This allows us to be open to new ideas and signals from pretty much anyone at the company, not just data scientists.

What’s more, the MTE can adapt to frequent UX and other product changes, all without human intervention. For example, if our mobile team changes the UI to make friends’ prior visits more prominent, the MTE will automatically detect that users are weighing social signals more heavily and adjust our models accordingly. And our automated Model Training Engine means that engineers can concentrate on building signals and let the model training select their best ones.

All of these quality improvements are translating into a better and smarter user experience. More details (with code) and quality improvements to come!

–Michael Li, Data Scientist
@tianhuil