Taming Search with Data::SearchEngine
Sooner or later it's going to happen: Someone will request a feature of your application's search code. It might be gentle at first. A casual remark about speed, functionality or scalability will meander into your bug tracker, standup meeting or planning session. At first you will nod and file it away, knowing that it takes a few requests for something to really stick. Pretty soon a second, perhaps unrelated, request will arrive. Before you know it you'll be surrounded by reminders, almost Tribble-like, of a sobering fact:
SELECT * FROM table WHERE description LIKE "whatever%" isn't going to cut it anymore.
Investigating Your Options
There are lots of ways to add search to your application, and the details largely depend on the type of data you are searching. You are on your own for evaluating and testing search engines. There are plenty of resources for that task.
Instead, let's focus on what to do to minimize the impact on your application when adopting or changing search engines.
The Problem with Search
Every search library has a different interface. Assuming you are using something similar to MVC, you'll have code in each layer that deals with the implementation-specific functionality. Your controller will have to parse requests and build queries to send to the model, and the view will need to iterate over and display the results. The model will bear the brunt of the changes, but that's what models are for.
Needing to rewrite our controller and view every time we adjust our search or – worse yet – each time we evaluate a new search product is a real pain. I bet we can fix this if we just add another layer of abstraction!
Data::SearchEngine is a toolbox that comes with everything you need to wrap a pretty API around your search implementation. It even has two wrappers already written: one for Solr and one for ElasticSearch. Before we talk about those let's take a moment to wrap up your average SQL-based search with these tools so you can see how they work.
Step 1: Subclass!
First, you'll want to create a Data::SearchEngine::MySearch that wraps your implementation:
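Here is a minimal sketch of what such a class could look like, assuming Moose and Data::SearchEngine are installed from CPAN. The method bodies are placeholders, not a real search:

```perl
package Data::SearchEngine::MySearch;
use Moose;

# Consume the role. It requires both search and find_by_id.
with 'Data::SearchEngine';

sub search {
    my ($self, $query) = @_;

    # Placeholder: run your real search here and return the results.
    return;
}

sub find_by_id {
    my ($self, $id) = @_;

    # Empty stub, present only to satisfy the role for now.
    return;
}

__PACKAGE__->meta->make_immutable;

1;
```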
We're consuming a Moose role called Data::SearchEngine that requires the implementation of a method called search. Let's imagine that your search code just searches a database using LIKE. I'm sure you can imagine a bit of code that executes that query and gets back a resultset, right? Great! Let's move on to the next bit then.
Note: That role also requires that you implement find_by_id. You can just make an empty sub to satisfy it for now.
Step 2: Getting the Query
The query is the request that the user has given us to find something. This is where the rubber really meets the road, as we need to create a query format that any search engine can use. We won't try to abstract the syntax, but we can provide a container:
Data::SearchEngine::Query gives us a simple Query object:
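A small sketch of building one, assuming the query, page and count attributes documented for Data::SearchEngine::Query. The search term is made up, and the call to the backend is left as a comment because it depends on the MySearch class from Step 1:

```perl
use strict;
use warnings;
use Data::SearchEngine::Query;

my $query = Data::SearchEngine::Query->new(
    query => 'ponies',   # what the user typed
    page  => 1,          # which page of results we want
    count => 10,         # how many results per page
);

print 'Searching for "', $query->query, '", page ', $query->page, "\n";

# With the MySearch class from Step 1 loaded, getting results would be:
#   my $search  = Data::SearchEngine::MySearch->new;
#   my $results = $search->search($query);
```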
Easy, eh? Your search backend may need more information or have a more complex query format, but that's OK. Data::SearchEngine::Query has a permissive query attribute plus hooks for things like filters.
Step 3: Results!
That last code example showed getting results back. How does that work? Let's write it! Start with our earlier MySearch example, but put some meat on its bones:
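What follows is one possible sketch, not a definitive implementation. It assumes a DBI handle passed in through a hypothetical dbh attribute, a made-up products table with id, name and description columns, and a database that accepts LIMIT and OFFSET placeholders (SQLite and PostgreSQL do):

```perl
package Data::SearchEngine::MySearch;
use Moose;
use Time::HiRes ();
use Data::Paginator;
use Data::SearchEngine::Item;
use Data::SearchEngine::Results;

with 'Data::SearchEngine';

# Hypothetical attribute: a connected DBI handle supplied by the caller.
has dbh => (is => 'ro', required => 1);

sub search {
    my ($self, $query) = @_;

    my $start  = Time::HiRes::time();
    my $like   = $query->query . '%';
    my $offset = ($query->page - 1) * $query->count;

    # Total number of matches, which the pager needs.
    my ($total) = $self->dbh->selectrow_array(
        'SELECT COUNT(*) FROM products WHERE description LIKE ?',
        undef, $like,
    );

    # Only the rows for the requested page, as hashrefs.
    my $rows = $self->dbh->selectall_arrayref(
        'SELECT id, name, description FROM products'
        . ' WHERE description LIKE ? ORDER BY id LIMIT ? OFFSET ?',
        { Slice => {} }, $like, $query->count, $offset,
    );

    my $results = Data::SearchEngine::Results->new(
        query   => $query,
        elapsed => Time::HiRes::time() - $start,
        pager   => Data::Paginator->new(
            current_page     => $query->page,
            entries_per_page => $query->count,
            total_entries    => $total,
        ),
    );

    # Wrap each row in an Item and add it to the results.
    for my $row (@$rows) {
        $results->add(Data::SearchEngine::Item->new(
            id     => $row->{id},
            values => {
                name        => $row->{name},
                description => $row->{description},
            },
        ));
    }

    return $results;
}

# Still a stub, as in Step 1.
sub find_by_id { return }

1;
```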
That bit of code is pretty simple. We run our query and then store each row that is returned for that page in a Data::SearchEngine::Results object. Now that we have our results we can show them to the user.
Note: Data::SearchEngine uses a special paginator class called Data::Paginator that has many of the features of Data::Page and Data::Pageset. Since all of Data::SearchEngine is serializable there needed to be an easily serializable, Moose-based pagination module. Hence Data::Paginator!
Step 4: Show Our Answers
The aforementioned Results object has an attribute items. This is an array of Data::SearchEngine::Item objects. Displaying our results is as simple as iterating over this array. We'll write this in Perl, but it's easy to translate into your favorite templating module.
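A sketch of that loop follows. So that it runs on its own, it builds a small stand-in Results object by hand with made-up items; in your application the object would come back from search instead:

```perl
use strict;
use warnings;
use Data::Paginator;
use Data::SearchEngine::Item;
use Data::SearchEngine::Query;
use Data::SearchEngine::Results;

# Stand-in results with invented data, in place of a real search call.
my $results = Data::SearchEngine::Results->new(
    query => Data::SearchEngine::Query->new(query => 'ponies'),
    pager => Data::Paginator->new(
        current_page     => 1,
        entries_per_page => 10,
        total_entries    => 2,
    ),
);
$results->add(Data::SearchEngine::Item->new(
    id => 1, values => { name => 'Shetland' },
));
$results->add(Data::SearchEngine::Item->new(
    id => 2, values => { name => 'Welsh' },
));

# The display loop: one line per item.
foreach my $item (@{ $results->items }) {
    print $item->id, ': ', $item->get_value('name'), "\n";
}
```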
That's it! You'll use get_value to retrieve any fields other than id from the item.
Done, So Now What?
You've now successfully wrapped your internal search code with a powerful abstraction. You could now easily experiment with ElasticSearch or Solr, the two search products for which there are existing Data::SearchEngine backends. Or you could take what you've just learned and create a new backend for a different search product.
Some Other Noteworthy Features
The Query object has lots of convenience methods for filtering (limiting your results via a filter such as "price > 20") and faceting (counting the number of items with different attributes so you can filter them). It will also generate a unique digest based on its attributes so that you can cache results.
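As a rough sketch of those conveniences, assuming the set_filter, filter_names and digest methods documented for Data::SearchEngine::Query. The filter names and values here are invented, and how a filter value is interpreted is entirely up to your backend:

```perl
use strict;
use warnings;
use Data::SearchEngine::Query;

my $query = Data::SearchEngine::Query->new(
    query   => 'ponies',
    filters => { price => '> 20' },   # meaning is backend-specific
);

# Filters can also be added after construction.
$query->set_filter(color => 'brown');

print 'Filters: ', join(', ', sort $query->filter_names), "\n";

# The digest can serve as a cache key for this query's results.
print 'Cache key: ', $query->digest, "\n";
```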
Finally, keep in mind that Query is just a guide. Your implementation may require much more complex syntax and Data::SearchEngine tries to stay out of the way. For example, the ElasticSearch query DSL uses hashrefs, not strings:
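A sketch of what that might look like, assuming a version of Data::SearchEngine::Query whose query attribute accepts a hashref as well as a string. The hashref's shape is only illustrative; check your backend's documentation for what it actually expects:

```perl
use strict;
use warnings;
use Data::SearchEngine::Query;

# The query is a data structure rather than a string. The backend
# decides how to turn it into a request for the search engine.
my $query = Data::SearchEngine::Query->new(
    page  => 1,
    count => 10,
    query => { query_string => { query => 'ponies' } },
);

print 'Query is a ', ref($query->query), " reference\n";
```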
You might not change search backends every week, but taking a bit of time to wrap your custom implementation in something featureful can save you a lot of trouble down the road. It also provides you with some great features as a result!