Slashem: A Rogue-like, type-safe Scala DSL for querying Solr

Slashem is our new spiffy Rogue-like type-safe* DSL for querying Solr. If you are curious about Rogue, we’ve talked about it in some of our previous blog posts (launch, Going Rogue, Part 2: Phantom Types). Solr is an open source full-text search platform from the Apache Lucene project which we use (not surprisingly) for searching things. Here’s what some simple queries look like in Slashem:

SolrVenue where (_.default contains "club")
      useQueryType("edismax")
      phraseBoost(_.name,2.5)
SolrUser where (_.fullname contains "jon")
      boostQuery(_.friend_ids in List(110714,
                                      1048882,
                                      2775804,
                                      364701,
                                      33).map(_.toString))
      useQueryType("edismax")
SolrUser where (_.fullname contains "jon-shea")
SolrVenue where (_.name eqs "Blue Bottle")

The resulting Solr queries they make are:

/solr/select?q=(club)&defType=edismax&start=0&rows=10&pf=name^2.5&pf2=name^2.5&pf3=name^2.5
/solr/select?q=fullname:(jon)&defType=edismax&start=0&rows=10&bq=friend_ids:("110714" OR "1048882" OR "2775804" OR "364701" OR "33")
/solr/select?q=fullname:(jon\-shea)&start=0&rows=10
/solr/select?q=name:("Blue Bottle")&start=0&rows=10

As you can see, Slashem also takes care of any escaping that might be necessary (the hyphen in jon-shea becomes jon\-shea) :)
If you want to skip the chit-chat and get right to it, you can check out our github project for slashem at https://github.com/foursquare/slashem.
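To make the mapping between the DSL and those URLs a bit more concrete, here is a minimal sketch of how the clauses could be rendered. It is not Slashem's implementation: the object SolrUrlSketch and all of its methods are hypothetical, the quoting and escaping rules are inferred from the example URLs above, and URL-encoding is left out so the output reads like the examples.

```scala
// Hypothetical sketch, NOT Slashem's code: one way to render the clauses and
// parameters seen in the example URLs. URL-encoding is deliberately omitted.
object SolrUrlSketch {
  // Characters with special meaning in the Lucene/Solr query syntax.
  private val special: Set[Char] = "+-&|!(){}[]^\"~*?:\\/".toSet

  // Backslash-escape special characters in a free-text term: jon-shea -> jon\-shea
  def escape(term: String): String = {
    val sb = new StringBuilder
    term.foreach { c =>
      if (special(c)) sb.append('\\')
      sb.append(c)
    }
    sb.toString
  }

  // Wrap a value in double quotes, escaping backslashes and embedded quotes.
  private def quote(v: String): String =
    "\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // contains-style clause: escaped, unquoted term, e.g. fullname:(jon\-shea)
  def contains(field: String, term: String): String = s"$field:(${escape(term)})"

  // eqs-style clause: a quoted phrase, e.g. name:("Blue Bottle")
  def eqs(field: String, value: String): String = s"$field:(${quote(value)})"

  // in-style clause: quoted values joined with OR, as in the bq parameter above
  def in(field: String, values: Seq[String]): String =
    values.map(quote).mkString(s"$field:(", " OR ", ")")

  // phraseBoost appears to expand into edismax's pf, pf2 and pf3 parameters.
  def phraseBoost(field: String, boost: Double): Seq[(String, String)] =
    Seq("pf", "pf2", "pf3").map(p => p -> s"$field^$boost")

  // Assemble /solr/select with the q, defType, start, rows ordering of the examples.
  def select(q: String, queryType: Option[String] = None,
             extra: Seq[(String, String)] = Nil,
             start: Int = 0, rows: Int = 10): String = {
    val params: Seq[(String, String)] =
      Seq("q" -> q) ++
        queryType.map(t => "defType" -> t).toList ++
        Seq("start" -> start.toString, "rows" -> rows.toString) ++
        extra
    params.map { case (k, v) => s"$k=$v" }.mkString("/solr/select?", "&", "")
  }
}
```

For instance, select("(club)", Some("edismax"), phraseBoost("name", 2.5)) reproduces the first URL above, and eqs("name", "Blue Bottle") produces the q clause of the last one.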

Background:

In the dark ages, before Rogue, all of our queries were written by manually crafting strings. Sure, they were lovingly hand-crafted, and some would argue it’s not the same with a robot, but there were mistakes, too. A query would come out a little too large, or end up with two separate limits. Sometimes we would even query against a field that didn’t exist. Worst of all, you had to wait until runtime to find out that your precious hand-crafted query just didn’t make the cut. Our Mongo queries were the first to modernize, with the introduction of Rogue, and now, with Slashem, our Solr queries are generated by an unfeeling type-safe robot as well.

Much like in Rogue, you start by making a record definition. The field types map fairly closely to the ones in your Solr schema.xml. Here is a version of our Event schema:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="venueid" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="name" type="text" indexed="true" stored="true" required="false" />
<field name="tags" type="text" indexed="true" stored="false" required="false" />
<!-- event time -->
<field name="start_time" type="date" indexed="true" stored="true" required="false" />
<field name="expires_time" type="date" indexed="true" stored="true" required="false" />
<field name="lat" type="double" indexed="true" stored="true"/>
<field name="lng" type="double" indexed="true" stored="true"/>
<field name="geo_s2_cell_ids" type="text_ws" indexed="true" stored="false" omitNorms="true" omitTermFreqAndPositions="true"/>

And our resulting model looks like:

class SolrEvent extends SolrSchema[SolrEvent] {
    // meta points at the SolrEvent companion object (not shown here)
    def meta = SolrEvent

    // Queries on the default field run against Solr's default search
    // field or, for an edismax query that specifies a list of fields
    // to query, against those fields instead.
    object default extends SolrDefaultStringField(this)

    // This is a special field to allow for querying of *:*
    object metall extends SolrStringField(this) {
        override def name="*"
    }
    object id extends SolrObjectIdField(this)
    object venueid extends SolrObjectIdField(this)
    object lat extends SolrDoubleField(this)
    object lng extends SolrDoubleField(this)
    object name extends SolrStringField(this)
    object tags extends SolrStringField(this)
    object start_time extends SolrDateTimeField(this)
    object expires_time extends SolrDateTimeField(this)
    object geo_s2_cell_ids extends SolrGeoField(this)
}

A simple query for all events containing “dj hixxy” would be:

SolrEvent where (_.name contains "dj hixxy")

We might also care about all of the events at a given venue:

SolrEvent where (_.venueid eqs
                 new ObjectId("4dc5bc4845dd2645527930a9"))

We could follow up with a query for all events between two dates:

SolrEvent where (_.start_time inRange(startTime,endTime))
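Solr expects date values as ISO-8601 instants in UTC with a trailing Z, and it writes inclusive ranges as field:[lo TO hi]. The sketch below shows that rendering step under those assumptions; it uses java.time rather than the Joda DateTime values in our examples, and SolrDateRange is a hypothetical helper, not part of Slashem.

```scala
import java.time.ZonedDateTime

// Hypothetical sketch: render a date range clause in Solr's inclusive
// [lo TO hi] syntax. java.time stands in for Joda DateTime here.
object SolrDateRange {
  // Normalise any zoned time to a UTC instant string, e.g. 2011-08-02T00:00:00Z.
  def toSolr(t: ZonedDateTime): String = t.toInstant.toString

  // Render field:[lo TO hi], rejecting ranges whose start is after their end.
  def inRange(field: String, lo: ZonedDateTime, hi: ZonedDateTime): String = {
    require(!lo.isAfter(hi), "range start must not be after range end")
    s"$field:[${toSolr(lo)} TO ${toSolr(hi)}]"
  }
}
```

Normalising to UTC matters because a local time such as 8pm in New York on August 1, 2011 is already August 2 in UTC.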

Besides making queries easier to write, Slashem lets the type checker verify that they are sensible. For example, it refuses to perform a date range search on the name field:

SolrEvent where (_.name inRange(startTime,endTime))

results in:

error: type mismatch;
found : org.joda.time.DateTime
required: String

This keeps us from accidentally writing bad queries :)

A geo filter query on the geo_s2_cell_ids field compiles:

SolrEvent where (_.default contains "DJ Hixxy")
                useQueryType("edismax")
                filter(_.geo_s2_cell_ids inRadius(geoLat,
                                                  geoLong,
                                                  1))

Running the same filter against the tags field, however, does not compile. Yay for the compiler :)
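Both compile-time checks fall out of ordinary Scala typing. Here is a toy sketch of the idea, assuming a design in which each field class carries the Scala type of its values and only geo fields define inRadius. The class names, the GeoFilter result and the radius units are all hypothetical; Slashem's real hierarchy (and its S2 cell covering) is more involved.

```scala
import java.time.Instant

// Hypothetical sketch, not Slashem's real classes: a field knows the Scala
// type V of its values, so inRange only accepts values of that type.
abstract class Field[V](val name: String) {
  def inRange(lo: V, hi: V): String = s"$name:[$lo TO $hi]"
}

class StringField(fieldName: String) extends Field[String](fieldName)
class DateTimeField(fieldName: String) extends Field[Instant](fieldName)

// Describes a radius filter; computing the actual S2 cell ids is out of scope here.
final case class GeoFilter(field: String, lat: Double, lng: Double, radius: Double)

// Only geo fields get an inRadius method at all.
class GeoField(fieldName: String) extends Field[String](fieldName) {
  def inRadius(lat: Double, lng: Double, radius: Double): GeoFilter =
    GeoFilter(name, lat, lng, radius)
}

object SEventFields {
  object name extends StringField("name")
  object tags extends StringField("tags")
  object start_time extends DateTimeField("start_time")
  object geo_s2_cell_ids extends GeoField("geo_s2_cell_ids")
}

// SEventFields.start_time.inRange(t0, t1)        compiles
// SEventFields.name.inRange(t0, t1)              type mismatch: found Instant, required String
// SEventFields.geo_s2_cell_ids.inRadius(lat, lng, 1)  compiles
// SEventFields.tags.inRadius(lat, lng, 1)        does not compile: no member inRadius
```

The design choice is to move the check from "does this string look right at runtime" to "does this method exist with these argument types", which is exactly what the compiler is good at.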

Specifying return fields

While the above queries have been lovely, some people actually want data back from their queries (crazy!). You can specify the fields to fetch with a simple fetchField, like so:

SolrEvent where (_.default contains "pirates") fetchField(_.name)
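On the Solr side, the standard way to restrict returned fields is the fl (field list) parameter; without it, Solr returns all stored fields. A minimal sketch, assuming that repeated fetchField calls are collected into a single comma-separated fl value. QuerySketch is hypothetical and takes plain field names rather than the _.name selectors Slashem uses.

```scala
// Hypothetical sketch: accumulate requested fields and emit them as Solr's
// `fl` parameter. Immutable, so each call returns a new query value.
final case class QuerySketch(q: String, fetchFields: Vector[String] = Vector.empty) {
  // Add one field to the list of fields Solr should return.
  def fetchField(field: String): QuerySketch = copy(fetchFields = fetchFields :+ field)

  // Only send fl when fields were requested; otherwise Solr's default applies.
  def params: Seq[(String, String)] =
    Seq("q" -> q) ++
      (if (fetchFields.isEmpty) Nil else Seq("fl" -> fetchFields.distinct.mkString(",")))
}
```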

Executing Queries

Executing your queries is simple. For a blocking request, just call fetch() on the query. The query is then executed against one of your Solr servers (most likely you only have one, but thanks to Finagle we support multiple backends). We also provide a fetchBatch method, which fetches the results in batches of a given size and applies a function to each batch. If you want to perform a non-blocking request and get a future back, you can call .fetchFuture() (thanks again to Finagle for making this easy :)).
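Batching maps naturally onto Solr's start and rows paging parameters. The sketch below shows one way a fetchBatch-style helper could page through results, assuming a fetchPage function that stands in for the real Solr call. All names here are hypothetical, and the future variant uses the standard library's scala.concurrent.Future rather than the Finagle futures Slashem returns.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical sketch of batch fetching; fetchPage(start, rows) stands in
// for a real Solr request with those start/rows parameters.
object BatchSketch {
  final case class Page[D](numFound: Long, docs: Seq[D])

  // Page through all matches batchSize at a time, apply f to each batch,
  // and concatenate what f returns.
  def fetchBatch[D, A](batchSize: Int)(fetchPage: (Int, Int) => Page[D])(f: Seq[D] => Seq[A]): Seq[A] = {
    require(batchSize > 0, "batchSize must be positive")
    val out = Vector.newBuilder[A]
    var start = 0
    var done = false
    while (!done) {
      val page = fetchPage(start, batchSize)
      if (page.docs.nonEmpty) out ++= f(page.docs)
      start += batchSize
      // Stop on an empty page or once we have walked past numFound.
      done = page.docs.isEmpty || start >= page.numFound
    }
    out.result()
  }

  // Non-blocking variant: run the first page (start=0, rows=10) asynchronously.
  def fetchFuture[D](fetchPage: (Int, Int) => Page[D])(implicit ec: ExecutionContext): Future[Page[D]] =
    Future(fetchPage(0, 10))
}
```

With 25 matching documents and a batch size of 10, f sees batches of 10, 10 and 5 documents.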

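The return type is a generic SearchResults. General request information (such as the status and query time) is in the responseHeader, while the matching documents, along with the total number of matches, are in the response. From the response, you can extract your actual resulting documents in a few ways.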

  1. The sanest way is to call .results on the response, which yields a list of instances of your schema class.
  2. If you just want the list of ObjectIds stored in each document’s “id” field, you can call .oids. This option exists mostly because it was a frequent, important code path for us.
  3. If for some reason you don’t want any of the record machinery getting in your way, you can call .docs and get an array of HashMaps of [String, Any]. I don’t recommend this.
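To illustrate how the three accessors relate, here is a minimal sketch that derives .oids and .results from the raw .docs array. It assumes mutable HashMaps for the raw documents, uses plain String ids instead of BSON ObjectIds to stay dependency-free, and substitutes a hand-written EventDoc case class for a real schema record; none of these names are Slashem's.

```scala
import scala.collection.mutable.HashMap

// Hypothetical stand-in for an instance of the schema class.
final case class EventDoc(id: String, name: Option[String])

// Hypothetical sketch: docs is the raw form; oids and results are views of it.
final case class ResponseSketch(docs: Array[HashMap[String, Any]]) {
  // Pull the "id" field out of every document that has one.
  def oids: List[String] = docs.toList.flatMap(_.get("id").map(_.toString))

  // Map each raw document into a typed record, skipping documents without an id
  // and treating a missing or non-string name as None.
  def results: List[EventDoc] = docs.toList.flatMap { d =>
    d.get("id").map(id => EventDoc(id.toString, d.get("name").collect { case s: String => s }))
  }
}
```

The typed view is why .results is the recommended path: missing or mistyped fields surface as Option values instead of runtime casts scattered through calling code.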

Contributions (code, comments, docs) always welcome!

- @holdenkarau, @jliszka, @jonshea, @kabragovind