Solr Search: How to Avoid the Black Box
A question that comes up often with Solr search is, "Why are my search results appearing in this particular order?"
You can play with boosts and other UI configuration, but how do you know what Solr is actually returning to determine that order? This blog post aims to shed some light on what some see as a black box. To tackle this, we can perform a direct Solr query in debug mode and look at what Solr returns.
These are the same results used by Search API Views, Search Pages, and other interfaces driven by Solr. To query Solr directly, we first need to determine the Solr URL. Look at your Solr configuration, where you'll see settings like "host", "port", and "path", all of which you will need to construct your URL. In our example, we will assume the following Solr configuration:
'host' => 'localhost',
'port' => 1234,
'path' => '/solr/drupal',
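Our Solr URL would thus be the host, port, and path joined in that order (assuming plain HTTP):
http://localhost:1234/solr/drupal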
Don't try the URL just yet; we still need to add an action before we can use it in the browser. For our purposes, we will use the "select" action (though others are available). Adding the "select" action makes Solr return XML to the browser, representing the content in its own default order, like this:
http://localhost:1234/solr/drupal/select
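The URL construction described above can be sketched in a few lines of Python. This is only an illustration: the build_solr_url helper is a made-up name, and the "http" scheme is an assumption, since the configuration shown only lists host, port, and path.

```python
# Connection settings copied from the example Solr configuration above.
solr_config = {"host": "localhost", "port": 1234, "path": "/solr/drupal"}


def build_solr_url(config, action="", scheme="http"):
    """Join host, port, and path into a URL, optionally appending an action.

    Hypothetical helper for illustration; the scheme is assumed to be http.
    """
    # Normalize the path so stray leading/trailing slashes do not double up.
    path = "/" + config["path"].strip("/")
    base = f"{scheme}://{config['host']}:{config['port']}{path}"
    return f"{base}/{action}" if action else base


base_url = build_solr_url(solr_config)
select_url = build_solr_url(solr_config, action="select")
print(base_url)
print(select_url)
```

If your Solr server is behind HTTPS or uses a different core path, only the scheme argument and the config values need to change.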
We're not done yet, though. The "select" action needs more variables to drill down to the specific query we want. In our test case, we want to search our content for the word "food". For this, we will append the following variables (each explained individually):
- fl=item_id,score,ss_title
  The field list tells Solr to return only the item_id (same as the node ID), score (ranking), and ss_title (the node's title) fields. You can specify fl=* if you want everything and restrict the fields from there.
- fq=ss_type:"recipes"&fq=is_status:"1"&fq=index_id:"node_index_solr"
  Filter queries restrict the results by individual field values. Here they limit the results to published content of type "recipes" from our preferred Solr index (we specify the index in case multiple indexes exist in Solr).
- sort=score+desc
  Sorts the results by score in descending order so the best results come first. The "+" is a URL-encoded space.
- debugQuery=true
  Returns additional debug information from Solr.
- qf=tm_search_api_aggregation_1^1.0
  Tells Solr which field to search, and with what boost (here 1.0), assuming we don't want to use the default Solr fields. This particular field is the aggregated content field available with Search API. Note that qf is read by the DisMax and eDisMax query parsers.
- q="food"
  Finally, the string we want to search our content for.
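All together, our query URL to enter in the browser is therefore the "select" URL, followed by "?" and the variables above joined with "&":
http://localhost:1234/solr/drupal/select?fl=item_id,score,ss_title&fq=ss_type:"recipes"&fq=is_status:"1"&fq=index_id:"node_index_solr"&sort=score+desc&debugQuery=true&qf=tm_search_api_aggregation_1^1.0&q="food"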
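Typing a long query string by hand is error-prone, so here is a minimal sketch that assembles the same variables with Python's standard urllib.parse module. The SELECT_URL value assumes plain HTTP and the example configuration from earlier; adjust it for your own server. The resulting URL is percent-encoded, which Solr treats the same as the unencoded characters.

```python
from urllib.parse import urlencode

# Assumed from the example configuration: plain HTTP and the "select" action.
SELECT_URL = "http://localhost:1234/solr/drupal/select"

# A list of pairs (rather than a dict) so that fq can appear more than once.
params = [
    ("fl", "item_id,score,ss_title"),
    ("fq", 'ss_type:"recipes"'),
    ("fq", 'is_status:"1"'),
    ("fq", 'index_id:"node_index_solr"'),
    ("sort", "score desc"),  # urlencode turns the space into "+"
    ("debugQuery", "true"),
    ("qf", "tm_search_api_aggregation_1^1.0"),
    ("q", '"food"'),
]

# urlencode percent-encodes quotes, colons, commas, and the caret.
query_url = f"{SELECT_URL}?{urlencode(params)}"
print(query_url)
```

Keeping the variables in a list like this also makes it easy to comment one out or change the search term while experimenting.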
Solr will then return the results for us in XML format. For more information on reading the results, refer to the following resources:
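As a rough illustration of what to look for in that XML, the sketch below pulls the requested fields out of each returned document using Python's built-in ElementTree parser. The sample XML is hand-written to resemble the shape of a Solr XML response; its IDs, titles, and scores are invented, and the assumption that item_id comes back as a string field may not match your schema.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like a Solr XML response. The IDs, titles, and
# scores are invented for illustration and are not real query output.
SAMPLE_XML = """<response>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="item_id">203</str>
      <str name="ss_title">Street Food Basics</str>
      <float name="score">1.5</float>
    </doc>
    <doc>
      <str name="item_id">87</str>
      <str name="ss_title">Food Safety Tips</str>
      <float name="score">0.75</float>
    </doc>
  </result>
</response>"""


def ranked_results(xml_text):
    """Return (item_id, title, score) tuples in the order Solr listed them."""
    root = ET.fromstring(xml_text)
    rows = []
    for doc in root.findall("./result[@name='response']/doc"):
        # Each child element carries the field name in its "name" attribute.
        fields = {child.get("name"): child.text for child in doc}
        rows.append((fields["item_id"], fields["ss_title"], float(fields["score"])))
    return rows


for item_id, title, score in ranked_results(SAMPLE_XML):
    print(item_id, title, score)
```

With debugQuery=true, the response also contains a debug section alongside the result list; this sketch ignores it and reads only the documents.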
Q: How do I troubleshoot an existing search page?
A: All Solr queries should be logged. A common location for these logs is /var/log/solr, but this can vary depending on your server configuration. Looking at the log files will allow you to capture queries and replicate them in the browser using the steps above.
Q: How do I figure out why a result is where it is in the search order?
A: explainOther is an additional Solr variable you can pass in a query to ask Solr to explain a result in comparison to a given query. For example, if we wanted to know why node ID 203 isn't higher, we could run our query above and add "&explainOther=id:YourSolrNodeID" to the end, where YourSolrNodeID is the ID of the content node in Solr. You will then see additional debug information about how Solr ranked that result in comparison to the others.
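To avoid editing the URL by hand each time, the explainOther step can be sketched as a small Python helper. The add_explain_other function is a hypothetical name, the example URL is a shortened stand-in for the full query, and "id:YourSolrNodeID" is a placeholder: substitute the field and ID that identify the document in your own index.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_explain_other(query_url, doc_query):
    """Return query_url with debugQuery=true and explainOther=<doc_query> set.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(query_url)
    # Keep every existing variable (including repeated fq values) in order,
    # dropping any earlier debugQuery/explainOther so they are not duplicated.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ("debugQuery", "explainOther")
    ]
    params.append(("debugQuery", "true"))
    params.append(("explainOther", doc_query))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Shortened stand-in for the full query URL; host and port are the example values.
url = "http://localhost:1234/solr/drupal/select?q=%22food%22&fq=is_status%3A%221%22"
explained_url = add_explain_other(url, "id:YourSolrNodeID")
print(explained_url)
```

The helper forces debugQuery=true because the explanation is returned as part of Solr's debug output.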
We hope this has been helpful! If you have any additional questions, feel free to type them in the comments below.