Rogue: A Type-Safe Scala DSL for querying MongoDB

Rogue is our newly open-sourced, badass (read: type-safe) library for querying MongoDB from Scala. It is implemented as an internal DSL and leverages the type information in your Lift records and fields to keep your queries in line. Here’s a quick look:

Checkin where (_.venueid eqs id)
  and (_.userid eqs mayor.id)
  and (_.cheat eqs false)
  and (_._id after sixtyDaysAgo)
  select(_._id) fetch()

VenueRole where (_.venueid eqs this.id)
  and (_.userid eqs userid)
  modify (_.role_type setTo RoleType.manager)
  upsertOne()

We’ve been developing and using it internally at foursquare for the last several months. You can now get the sources on GitHub, and the packaged JAR is available on scala-tools.org under foursquare.com/rogue (current version is 1.0.2).

In this post, we’re going to dive into some of the motivations and implementation details of Rogue, and hopefully show you why we think Scala (and MongoDB and Lift) are so awesome.

Background

At foursquare we use the Lift web framework for our ORM layer. Lift’s Record class represents a database record, and the MetaRecord trait provides “static” methods for querying and updating records in a fully expressive way.

Unfortunately, we found the querying support a bit too expressive — you can pass in a query object that doesn’t represent a valid query, or query against fields that aren’t part of the record. And in addition it isn’t very type-safe. You can ask for, say, all Venue records where mayor = “Bob”, and it happily executes that query for you, returning nothing, never informing you that the mayor field is not a String but a Long representing the ID of the user. Well, we thought we could use the Scala type system to prevent this from ever happening, and that’s what we set out to do.

For reference, here’s a simplified version of our Venue model class:

class Venue extends MongoRecord[Venue] {
  object id extends Field[Venue, ObjectId](this)
  object venuename extends Field[Venue, String](this)
  object categories extends Field[Venue, List[String]](this)
  object mayor extends Field[Venue, Long](this)
  object popularity extends Field[Venue, Long](this)
  object closed extends Field[Venue, Boolean](this)
}

object Venue extends Venue with MongoMetaRecord[Venue] {
  // some configuration pointing to the mongo
  // instance and collection to use
}

Lift’s MongoMetaRecord trait provides a findAll() method that lets you pass in a query as a JSON object (MongoDB queries are in fact JSON objects), returning a list of records. For example, using Lift’s JsonDSL, we can do:

Venue.findAll((Venue.mayor.name -> 1234) ~
              (Venue.popularity.name -> ("$gt" -> 5)))

which is equivalent to

Venue.findAll("{ mayor : 1234, popularity : { $gt : 5 } }")

which will return a List[Venue] containing all venues where the mayor is user 1234 and the popularity is greater than 5. And this all works fine until the day you do

Venue.findAll(Venue.mayor.name -> "Bob")
Venue.findAll(Venue.categories.name -> ("$gt" -> "Steve"))

neither of which makes sense, and both of which the compiler should be able to catch.

Scala to the rescue!

We would like to write an internal Scala DSL that lets you write queries like this:

Venue where (_.mayor eqs 1234)
Venue where (_.mayor eqs 1234) and (_.popularity eqs 5)

while enforcing some kind of type safety among records, fields, conditions and operands. To start off, we need to pimp the MongoMetaRecord class to support the where and and methods.

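// rec must be both an M and a MongoMetaRecord[M], which is exactly
// what QueryBuilder's constructor expects.
implicit def metaRecordToQueryBuilder[M <: MongoRecord[M]]
    (rec: M with MongoMetaRecord[M]): QueryBuilder[M] =
  new QueryBuilder(rec, Nil)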

class QueryBuilder[M <: MongoRecord[M]](
    rec: M with MongoMetaRecord[M],
    clauses: List[QueryClause[_]]) {
  def where[F](clause: M => QueryClause[F]): QueryBuilder[M] =
    new QueryBuilder(rec, clause(rec) :: clauses)
  def and[F](clause: M => QueryClause[F]): QueryBuilder[M] =
    new QueryBuilder(rec, clause(rec) :: clauses)
}

Notice that the where method applies the clause function argument to the MetaRecord (rec) in use. So in a query like Venue where (_.mayor ...), the where method applies _.mayor to Venue, yielding Venue.mayor. So what about the eqs 1234 part? We have something like Venue.mayor, which is a Field, and we need to return a QueryClause[F] (F represents the field type, Boolean or String or whatever). So all we need to do is pimp the Field class and add the method eqs, which will take an operand (e.g., 1234) and return a QueryClause[F].

implicit def fieldToQueryField[M <: MongoRecord[M], F]
    (field: Field[M, F]) =
  new QueryField[M, F](field)

class QueryField[M <: MongoRecord[M], F]
    (field: Field[M, F]) {
  def eqs(v: F) = new QueryClause(field.name, Op.Eq, v)
}

(Op is just an enumeration that defines all the comparison operators that MongoDB supports: Eq, Gt, Lt, In, NotIn, Size, etc. The .name method is provided by Lift’s Field class; through the magic of reflection, it returns a String that matches the name of the field object as it is declared in the Record.)

So an expression like Venue where (_.mayor eqs 1234) gets expanded by the compiler to:

metaRecordToQueryBuilder(Venue).where(rec =>
  fieldToQueryField(rec.mayor).eqs(1234))

This allows the compiler to enforce two things: that the field specified (mayor) is a valid field on the record (Venue), and that the value specified (1234) is of the same type as the field (Long) — notice that the eqs method takes an argument of type F, the same type as the Field.
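To make the mechanics concrete, here is a minimal sketch of the same idea that compiles without Lift on the classpath. The Record, MetaRecord and Field types are hypothetical stand-ins for Lift's MongoRecord, MongoMetaRecord and Field, the operator is a plain String rather than an Op enumeration, and nothing here talks to MongoDB; it only demonstrates how the two implicit conversions line up the types.

```scala
import scala.language.implicitConversions

object RogueSketch {
  // Hypothetical stand-ins for Lift's MongoRecord, MongoMetaRecord and Field.
  trait Record[M <: Record[M]]
  trait MetaRecord[M <: Record[M]]
  class Field[M <: Record[M], F](val name: String)

  // One condition: field name, operator, operand.
  case class QueryClause[F](fieldName: String, op: String, value: F)

  class Venue extends Record[Venue] {
    val venuename = new Field[Venue, String]("venuename")
    val mayor     = new Field[Venue, Long]("mayor")
  }
  object Venue extends Venue with MetaRecord[Venue]

  class QueryBuilder[M <: Record[M]](
      rec: M with MetaRecord[M],
      val clauses: List[QueryClause[_]]) {
    // Apply the clause function to the meta record and prepend the result.
    def where[F](clause: M => QueryClause[F]): QueryBuilder[M] =
      new QueryBuilder[M](rec, clause(rec) :: clauses)
    def and[F](clause: M => QueryClause[F]): QueryBuilder[M] = where(clause)
  }

  class QueryField[M <: Record[M], F](field: Field[M, F]) {
    // The operand must have the field's own type F.
    def eqs(v: F): QueryClause[F] = QueryClause(field.name, "eq", v)
  }

  implicit def metaRecordToQueryBuilder[M <: Record[M]](
      rec: M with MetaRecord[M]): QueryBuilder[M] =
    new QueryBuilder[M](rec, Nil)

  implicit def fieldToQueryField[M <: Record[M], F](
      field: Field[M, F]): QueryField[M, F] =
    new QueryField[M, F](field)

  val query: QueryBuilder[Venue] =
    Venue where (_.mayor eqs 1234L) and (_.venuename eqs "Starbucks")

  // Neither of these compiles:
  // Venue where (_.mayor eqs "Bob")    // String is not a Long
  // Venue where (_.owner eqs 1234L)    // Venue has no field named owner
}
```

Because clauses are prepended, query.clauses holds them in reverse order of appearance; a real implementation would have to decide whether that ordering matters when building the final query.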

More operators

We can extend this to support other conditions besides equality. The Scala type system helps us once again in ensuring that the condition used is appropriate for the field type.

import java.util.regex.Pattern

// Every field gets the generic operators; Long, List and String fields
// additionally get operators that only make sense for those types.
implicit def fieldToQueryField[M <: MongoRecord[M], F](field: Field[M, F]) =
  new QueryField[M, F](field)
implicit def longFieldToQueryField[M <: MongoRecord[M]](field: Field[M, Long]) =
  new NumericQueryField[M, Long](field)
implicit def listFieldToQueryField[M <: MongoRecord[M], F](field: Field[M, List[F]]) =
  new ListQueryField[M, F](field)
implicit def stringFieldToQueryField[M <: MongoRecord[M]](field: Field[M, String]) =
  new StringQueryField[M](field)

class QueryField[M <: MongoRecord[M], F](val field: Field[M, F]) {
  def eqs(v: F) = new QueryClause(field.name, Op.Eq, v)
  def neqs(v: F) = new QueryClause(field.name, Op.Neq, v)
  def in(vs: List[F]) = new QueryClause(field.name, Op.In, vs)
  def nin(vs: List[F]) = new QueryClause(field.name, Op.NotIn, vs)
}

class NumericQueryField[M <: MongoRecord[M], F](val field: Field[M, F]) {
  def lt(v: F) = new QueryClause(field.name, Op.Lt, v)
  def gt(v: F) = new QueryClause(field.name, Op.Gt, v)
}

class ListQueryField[M <: MongoRecord[M], F](val field: Field[M, List[F]]) {
  // Matching a list field against a single value checks membership.
  def contains(v: F) = new QueryClause(field.name, Op.Eq, v)
  def all(vs: List[F]) = new QueryClause(field.name, Op.All, vs)
  def size(s: Int) = new QueryClause(field.name, Op.Size, s)
}

class StringQueryField[M <: MongoRecord[M]](val field: Field[M, String]) {
  // Quote the prefix so regex metacharacters in s are matched literally.
  def startsWith(s: String) =
    new QueryClause(field.name, Op.Eq, Pattern.compile("^" + Pattern.quote(s)))
}

You can see that only certain field types support certain operators. No startsWith on a Field[Long], no contains on a Field[String], etc. So now we can build queries like

Venue where (_.venuename startsWith "Starbucks")
      and   (_.mayor in List(1234, 5678))

without having to worry about the stray (and admittedly contrived)

Venue where (_.mayor startsWith "Steve")
      and   (_.venuename contains List(1234))
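Here is a small self-contained sketch of how the per-type wrappers keep operators and field types in step. To stay short it drops the record type parameter, so Field[F] is a hypothetical simplification of Lift's Field[M, F]; operators are plain Strings, and startsWith stores the regex source as a String instead of a compiled Pattern so that clauses can be compared with ==.

```scala
import java.util.regex.Pattern
import scala.language.implicitConversions

object OperatorSketch {
  // Hypothetical stand-in for a Lift field; only name and value type matter.
  class Field[F](val name: String)
  case class QueryClause[V](fieldName: String, op: String, value: V)

  // Operators available on every field.
  class QueryField[F](field: Field[F]) {
    def eqs(v: F): QueryClause[F] = QueryClause(field.name, "eq", v)
    def in(vs: List[F]): QueryClause[List[F]] = QueryClause(field.name, "in", vs)
  }
  // Operators only available on Long fields.
  class NumericQueryField(field: Field[Long]) {
    def gt(v: Long): QueryClause[Long] = QueryClause(field.name, "gt", v)
  }
  // Operators only available on List fields.
  class ListQueryField[F](field: Field[List[F]]) {
    def contains(v: F): QueryClause[F] = QueryClause(field.name, "eq", v)
  }
  // Operators only available on String fields.
  class StringQueryField(field: Field[String]) {
    def startsWith(s: String): QueryClause[String] =
      QueryClause(field.name, "regex", "^" + Pattern.quote(s))
  }

  implicit def fieldToQueryField[F](f: Field[F]): QueryField[F] =
    new QueryField[F](f)
  implicit def longFieldToQueryField(f: Field[Long]): NumericQueryField =
    new NumericQueryField(f)
  implicit def listFieldToQueryField[F](f: Field[List[F]]): ListQueryField[F] =
    new ListQueryField[F](f)
  implicit def stringFieldToQueryField(f: Field[String]): StringQueryField =
    new StringQueryField(f)

  val venuename  = new Field[String]("venuename")
  val mayor      = new Field[Long]("mayor")
  val categories = new Field[List[String]]("categories")

  val byPrefix   = venuename startsWith "Starbucks"
  val byMayors   = mayor in List(1234L, 5678L)
  val byCategory = categories contains "Thai"
  val popular    = mayor gt 5L

  // Neither of these compiles, because no wrapper for that field type
  // defines the operator:
  // mayor startsWith "Steve"
  // venuename contains List(1234)
}
```

The compiler picks a conversion by looking for one whose result has the requested method, so a Long field can use both eqs (from QueryField) and gt (from NumericQueryField) without any ambiguity.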

Executing queries

Once we have a QueryBuilder object, it is a straightforward exercise to translate it into a JSON object and send it to Lift to execute. This is done by the fetch() method:

Venue where (_.mayor eqs 1234)
      and   (_.categories contains "Thai") fetch()

It’s also a simple matter to support .skip(n), .limit(n) and .fetch(n) methods on QueryBuilder.
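For a feel of what the translation into a JSON object might look like, here is a rough sketch that renders a list of clauses as a MongoDB-style query string. It is not Rogue's actual code: the QueryClause shape, renderValue, renderClause and renderQuery are all hypothetical names, the operand is typed as Any for brevity, and the result is a plain String rather than a Lift JSON value.

```scala
object RenderSketch {
  // Hypothetical clause shape: op is None for equality, or the MongoDB
  // operator name (for example "$gt" or "$in") for everything else.
  case class QueryClause(fieldName: String, op: Option[String], value: Any)

  // Render an operand as JSON. Only strings, sequences and values whose
  // toString is already valid JSON (numbers, booleans) are handled.
  def renderValue(v: Any): String = v match {
    case s: String  => "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
    case xs: Seq[_] => xs.map(renderValue).mkString("[ ", ", ", " ]")
    case other      => other.toString
  }

  // Equality renders as field : value, other operators as
  // field : { operator : value }.
  def renderClause(c: QueryClause): String = c.op match {
    case None =>
      "\"" + c.fieldName + "\" : " + renderValue(c.value)
    case Some(op) =>
      "\"" + c.fieldName + "\" : { \"" + op + "\" : " + renderValue(c.value) + " }"
  }

  def renderQuery(clauses: List[QueryClause]): String =
    clauses.map(renderClause).mkString("{ ", ", ", " }")

  val example: String = renderQuery(List(
    QueryClause("mayor", None, 1234L),
    QueryClause("popularity", Some("$gt"), 5L)))
}
```

This sketch emits one key per clause, so two clauses on the same field (say, a gt and an lt on popularity) would produce a duplicate key; a real translation would need to merge them into a single sub-object.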

Summary

To recap, Rogue enforces the following, at compile time:

  1. the field specified in a query clause is a valid field on the record in question
  2. the comparison operator specified in the query clause makes sense for the field type
  3. the value specified in the query clause is the same type as the field type (or is appropriate for the operator)

In the next post, we’ll show you how we added sort ordering to the DSL and how we used the phantom type pattern in Scala to prevent, again at compile time, constructions like this:

Venue where (_.mayor eqs 1234) skip(3) skip(5) fetch()
Venue where (_.mayor eqs 1234) limit(10) fetch(100)
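As a taste of the phantom type pattern, here is one common way to encode it in Scala. This is a generic sketch, not necessarily how Rogue does it: Query, Skipped, Unskipped, Limited and Unlimited are hypothetical names, and the query carries only its skip and limit values.

```scala
object PhantomSketch {
  // Marker types: never instantiated, they exist only for the type checker.
  sealed trait Skipped
  sealed trait Unskipped
  sealed trait Limited
  sealed trait Unlimited

  final class Query[Sk, Lim] private (
      val skipBy: Option[Int],
      val limitTo: Option[Int]) {
    // Callable only while Sk is still Unskipped; the result is marked Skipped.
    def skip(n: Int)(implicit ev: Sk =:= Unskipped): Query[Skipped, Lim] =
      new Query[Skipped, Lim](Some(n), limitTo)
    // Callable only while Lim is still Unlimited; the result is marked Limited.
    def limit(n: Int)(implicit ev: Lim =:= Unlimited): Query[Sk, Limited] =
      new Query[Sk, Limited](skipBy, Some(n))
  }

  object Query {
    def start: Query[Unskipped, Unlimited] =
      new Query[Unskipped, Unlimited](None, None)
  }

  val q = Query.start.skip(3).limit(10)

  // Neither of these compiles, because the implicit evidence is missing:
  // Query.start.skip(3).skip(5)
  // Query.start.limit(10).limit(100)
}
```

The type parameters Sk and Lim never appear in a field or a method body; they only record, in the static type of the query, which builder methods have already been called.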

In the meantime, go check out the code — contributions and feedback welcome!

- Jason Liszka and Jorge Ortiz, foursquare engineers