The Solr suggester component allows you to vastly improve your search capabilities and experience. It provides users with automatic suggestions for query terms and can be used to implement useful auto-suggest features in your search application.
In a previous post we covered several suggester cases with the following implicit assumptions:
- There is a separate Solr index for suggestions, and each suggestion is a document within this index; and
- The Solr query is created specifically to find suggestions.
The most common questions we received on those posts were about whether separate indexes and queries are really needed, and whether there is a built-in, dedicated Solr suggester that can manage both of these tasks. So let’s now take a more detailed look at the Solr suggester and how it can be used.
What is Solr suggester?
Suggester is a search component, which is a building block of Solr’s search pipeline.
To make this component work, two things need to be configured in the search engine’s config: the data source for suggestions (dictionaryImpl parameter) and how these suggestions are stored and searched at query time (lookupImpl parameter). The collection of weighted suggestions can be taken from the index or from an alternate data source such as a file stored on disk. Data can be loaded from the source into a lookup-ready data structure automatically on startup or reload (buildOnStartup parameter).
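As a rough mental model of these two settings, the Python sketch below separates a "dictionary" step, which yields weighted suggestions from a source, from a "lookup" step, which builds a structure that can be searched at query time. It is not Solr code: the tab-separated line layout, the function file_dictionary, and the class SortedPrefixLookup are assumptions made for illustration, and a sorted list stands in for the real in-memory structures.

```python
import bisect
from typing import Iterable, Iterator, List, Tuple


def file_dictionary(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Dictionary step: yield (suggestion, weight) pairs from text lines.

    Assumed layout: 'suggestion<TAB>weight'; a missing weight defaults to 1.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        term, _, weight = line.partition("\t")
        yield term, int(weight) if weight else 1


class SortedPrefixLookup:
    """Lookup step: a sorted list standing in for an FST or search tree."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, int]] = []

    def build(self, dictionary: Iterable[Tuple[str, int]]) -> None:
        # Comparable in spirit to the build triggered on startup or reload.
        self._entries = sorted(dictionary)

    def lookup(self, prefix: str, count: int = 5) -> List[str]:
        # Binary search for the first entry >= prefix, then scan the matching run.
        start = bisect.bisect_left(self._entries, (prefix,))
        matches = []
        for term, weight in self._entries[start:]:
            if not term.startswith(prefix):
                break
            matches.append((term, weight))
        # Highest weight first; ties fall back to alphabetical order.
        matches.sort(key=lambda entry: (-entry[1], entry[0]))
        return [term for term, _ in matches[:count]]
```

In this sketch, swapping file_dictionary for a function that reads terms from an index, or SortedPrefixLookup for another structure, leaves the other half untouched, which is the separation the two parameters express.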
Several lookup implementations are available, and they can be compared on the following features:
- AnalyzingSuggester: holds suggestions in an FST (finite-state transducer, an in-memory data structure) and allows prefix matches (single-term suggestions, as described in our previous post). It can also apply analysis to both suggestions and user input (suggestAnalyzerFieldType parameter), which is a powerful feature: you can remove stop words, apply stemming, and address several other requirements. For example, if your desired outcome is multiterm ordered suggestions, then try to:
  - Create a field type with a SuffixSingleTokenFilter in its index chain and put it into schema.xml
  - Use this type as the suggestAnalyzerFieldType value
- AnalyzingInfixSuggester: also supports analysis. However, under the hood it applies the EdgeNGramTokenFilter to suggestions to give multiterm unordered suggestions. In this case, it loads data into a separate internal index and transforms the user’s input into a term query.
- FuzzySuggester: an AnalyzingSuggester with fuzzy matching. It works like a suggester with built-in spell correction. This sounds like a great feature, but it has drawbacks if not used carefully. For example, it is not possible to boost suggestions that match without spell correction over suggestions that match only with it.
- FSTCompletionLookup, WFSTCompletionLookup: these also load suggestions into an FST but provide alternative implementations.
- TSTLookup: stores suggestions in a ternary search tree.
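To make the difference between prefix and infix matching more concrete, here is a small Python sketch of the matching rules only. It is a simplification and not how Solr implements them: the analyze function, its stop word list, and the rule that only the last input token is treated as a prefix in the infix case are assumptions made for illustration.

```python
from typing import List

STOP_WORDS = {"a", "an", "the", "for", "of"}  # illustrative stop list


def analyze(text: str) -> List[str]:
    """Toy analysis chain: lowercase, split on whitespace, drop stop words."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


def prefix_match(user_input: str, suggestion: str) -> bool:
    """Prefix style: the analyzed input is a prefix of the analyzed suggestion."""
    query = " ".join(analyze(user_input))
    return " ".join(analyze(suggestion)).startswith(query)


def infix_match(user_input: str, suggestion: str) -> bool:
    """Infix style: input tokens may appear anywhere, in any order.

    Assumption: earlier tokens must match whole suggestion tokens and
    only the last token (still being typed) is matched as a prefix.
    """
    query_tokens = analyze(user_input)
    suggestion_tokens = analyze(suggestion)
    if not query_tokens:
        return True
    *complete, last = query_tokens
    return all(token in suggestion_tokens for token in complete) and any(
        token.startswith(last) for token in suggestion_tokens
    )
```

Because the same analysis is applied to both sides, the input "The Red sh" matches the suggestion "red shoes for men" as a prefix, while "shoes re" matches it only under the infix rule.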
Several suggesters can be chained inside one component to take suggestions from two fields or from a field and a file.
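On the request side, several suggesters defined in one component can be queried together by repeating the suggest.dictionary parameter. The Python sketch below only builds such a URL. The core name products, the /suggest handler path, and the suggester names titleSuggester and fileSuggester are made up for the example and must match whatever your own configuration defines.

```python
from typing import Sequence
from urllib.parse import urlencode


def build_suggest_url(
    base_url: str, query: str, dictionaries: Sequence[str], count: int = 5
) -> str:
    """Build a suggest request that asks several named suggesters at once."""
    params = [
        ("suggest", "true"),
        ("suggest.q", query),
        ("suggest.count", str(count)),
    ]
    # Repeating suggest.dictionary requests results from each named suggester.
    params += [("suggest.dictionary", name) for name in dictionaries]
    return f"{base_url}?{urlencode(params)}"


# Hypothetical core, handler path, and suggester names.
url = build_suggest_url(
    "http://localhost:8983/solr/products/suggest",
    "red sh",
    ["titleSuggester", "fileSuggester"],
)
```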
SuggestComponent needs to be added to a request handler’s components chain to serve requests. This can be a separate, dedicated handler or a common handler that returns documents, facets, and suggestions in the same request.
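When a common handler is used, suggestions arrive in their own section of the response next to the documents. The Python sketch below pulls the suggested terms out of a JSON response. The sample body is invented for illustration, and its layout (suggest, then suggester name, then query text, then a suggestions list of term/weight/payload entries) is assumed to match what your Solr version returns, so check it against a real response.

```python
import json
from typing import Dict, List


def extract_suggestions(response_body: str, query: str) -> Dict[str, List[str]]:
    """Map each suggester name to the terms it returned for `query`."""
    suggest_section = json.loads(response_body).get("suggest", {})
    return {
        name: [entry["term"] for entry in per_query.get(query, {}).get("suggestions", [])]
        for name, per_query in suggest_section.items()
    }


# Invented sample: documents and suggestions returned in one response.
sample_body = json.dumps({
    "responseHeader": {"status": 0},
    "response": {"numFound": 0, "docs": []},
    "suggest": {
        "titleSuggester": {
            "red sh": {
                "numFound": 2,
                "suggestions": [
                    {"term": "red shoes", "weight": 80, "payload": ""},
                    {"term": "red shirt", "weight": 50, "payload": ""},
                ],
            }
        }
    },
})
terms_by_suggester = extract_suggestions(sample_body, "red sh")
```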
Should you use SuggestComponent?
Ultimately, the choice is yours, and we recommend experimenting with your own data and environment before deciding. Based on our experience, here are the points we suggest weighing:
- Functionality: SuggestComponent doesn’t offer any features that couldn’t be covered (relatively) easily by an “index + query” approach, whether that is a complex weight calculation, stop word removal, or basic context filtering. SuggestComponent does offer some ready-made features, but if you need full control or have complex requirements, you will often find that it is not enough.
- Performance: lookups against an in-memory FST are very fast, and there are several FST-based implementations to compare so that you can choose the one that performs best for a particular case. However, the performance of a term query against a suggestion index in a well-tuned environment is usually sufficient for almost all cases. And if you can reduce the suggestion search to a single term search, at the cost of a correspondingly larger suggestion index, then this is the simplest and safest option to use.
- Maintainability: monitoring a regular Solr index is much more reliable than monitoring an internal in-memory data structure or internal index.
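The single-term idea from the Performance point can be sketched in a few lines of Python. This is one possible expansion scheme under our own assumptions, not a Solr API: each suggestion is expanded at index time into every prefix of every word-aligned suffix, so that at query time the whole user input becomes one exact term lookup. The function names and the plain dictionary standing in for the suggestion index are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def expansion_terms(suggestion: str) -> Set[str]:
    """Every prefix of every word-aligned suffix of the suggestion."""
    words = suggestion.lower().split()
    terms: Set[str] = set()
    for start in range(len(words)):
        suffix = " ".join(words[start:])
        terms.update(suffix[:end] for end in range(1, len(suffix) + 1))
    return terms


def build_suggestion_index(suggestions: List[str]) -> Dict[str, Set[str]]:
    """Index time: map each expansion term to the suggestions that produced it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for suggestion in suggestions:
        for term in expansion_terms(suggestion):
            index[term].add(suggestion)
    return index


def suggest(index: Dict[str, Set[str]], user_input: str) -> List[str]:
    """Query time: normalize the input and do one exact term lookup."""
    term = " ".join(user_input.lower().split())
    return sorted(index.get(term, ()))
```

The trade-off is visible even at this scale: the two-word suggestion "red shoes" alone expands into 14 index terms, which is the index growth that buys the single lookup at query time.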
The built-in suggestion component is a handy element of Solr’s functionality that includes some non-trivial capabilities that you will find useful. However, it is important to recognize that there are no magic tricks going on under the hood and it is possible (and in some cases necessary) to implement suggestion services without it. But you will find that diving into the internals of the suggester’s capabilities will help you to better understand Solr and improve the quality of your suggestions.