The Solr suggester component allows you to vastly improve your search capabilities and experience. It provides users with automatic suggestions for query terms, and can be used to implement useful auto-suggest features in your search application.
In a previous post, we covered several suggester cases with the following implicit assumptions:
There is a separate Solr index for suggestions
Each suggestion is a document within this index
The Solr query is created specifically to find suggestions
The most common questions we received on that post were whether a separate index and dedicated queries are really needed, and whether there is a built-in, dedicated Solr suggester that can manage both of these tasks. So, let’s take a more detailed look at the Solr suggester and how it can be used.
What is the Solr suggester?
Suggester is a search component, which is a building block of Solr’s search pipeline.
To make this component work, two things need to be configured in Solr’s config: the data source for suggestions (dictionaryImpl parameter), and how these suggestions are stored and searched at query time (lookupImpl parameter). A collection of weighted suggestions can be taken from the index, or from an alternate data source such as a file stored on disk. The data can then be loaded from the source into a lookup-ready data structure automatically on startup or reload (buildOnStartup parameter).
Several lookup implementations are available, and their features compare as follows:
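As a minimal sketch of the configuration described above, a single suggester in solrconfig.xml might look like the following. The component name suggest, the suggester name mySuggester, and the fields title and popularity are assumptions made for illustration; FSTLookupFactory and DocumentDictionaryFactory are just one possible pairing of lookup and dictionary.

```xml
<!-- solrconfig.xml: one suggester built from an indexed field (names are hypothetical) -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <!-- where suggestions come from: stored values of a field in the index -->
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <!-- assumed numeric field that supplies the weight of each suggestion -->
    <str name="weightField">popularity</str>
    <!-- how suggestions are stored and searched at query time -->
    <str name="lookupImpl">FSTLookupFactory</str>
    <!-- load the lookup structure automatically on startup or core reload -->
    <str name="buildOnStartup">true</str>
  </lst>
</searchComponent>
```

Building on startup is convenient for small dictionaries, but for a large index it can noticeably slow down startup and reloads, so it is worth measuring before enabling it.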
AnalyzingSuggester: holds suggestions in an FST (an in-memory data structure) and allows prefix matches (single term suggestions, as described in our previous post). It can also apply analysis to both suggestions and user input (suggestAnalyzerFieldType parameter), which is a powerful feature: you can remove stop words, apply stemming, and address several other requirements. For example, to get multiterm ordered suggestions:
Create a field type with a SuffixSingleTokenFilter in its index analysis chain and add it to schema.xml
Use this type as the suggestAnalyzerFieldType value
AnalyzingInfixSuggester: also supports analysis. However, under the hood it applies an EdgeNGramTokenFilter to suggestions in order to provide multiterm unordered suggestions. To do this, it loads the data into a separate internal index and transforms the user’s input into a term query.
FuzzySuggester: an AnalyzingSuggester with fuzzy matching. It works like a suggester with built-in spell correction. This sounds like a great feature, but it has drawbacks if not used carefully. For example, it is not possible to boost suggestions that match without spell correction over suggestions that match only after spell correction.
FSTCompletionLookup and WFSTCompletionLookup: these also load suggestions into an FST, but provide alternative implementations.
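To make the comparison above more concrete, here is a rough sketch of how three of these lookups might be configured side by side. In solrconfig.xml they are selected through lookupImpl factory names (AnalyzingLookupFactory, AnalyzingInfixLookupFactory, FuzzyLookupFactory). The field type text_suggest, the fields title and popularity, the suggester names, and the index path are all assumptions made for illustration. The analysis chain uses stock lowercase and stop-word filters rather than the SuffixSingleTokenFilter mentioned above, since its factory is not covered here.

```xml
<!-- schema.xml: hypothetical field type applied to both suggestions and user input;
     assumes a stopwords.txt file exists in the config directory -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml: three alternative suggesters over the same assumed field -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <!-- analyzed prefix matching, held in an FST -->
  <lst name="suggester">
    <str name="name">prefixSuggester</str>
    <str name="lookupImpl">AnalyzingLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="weightField">popularity</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
  </lst>
  <!-- matches prefixes of any term in the suggestion; keeps its own internal index -->
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="weightField">popularity</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <str name="indexPath">infix_suggester_index</str>
  </lst>
  <!-- analyzed prefix matching that tolerates a limited number of edits -->
  <lst name="suggester">
    <str name="name">fuzzySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="weightField">popularity</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <str name="maxEdits">1</str>
    <str name="nonFuzzyPrefix">1</str>
  </lst>
</searchComponent>
```

Keeping all three in one component makes it easy to send the same input to each suggester and compare the results on your own data before committing to one of them.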
Several suggesters can be chained inside one component to take suggestions from two fields, or from a field and a file.
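A sketch of the field-plus-file case could look like this. The suggester names, the title and popularity fields, and the file name suggestions.txt are hypothetical. The file is assumed to be plain UTF-8 text with one suggestion per line, optionally followed by the delimiter and a weight.

```xml
<!-- solrconfig.xml: two suggesters in one component (all names are assumptions) -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <!-- suggestions taken from a field of the main index -->
  <lst name="suggester">
    <str name="name">fieldSuggester</str>
    <str name="lookupImpl">FSTLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="weightField">popularity</str>
  </lst>
  <!-- suggestions taken from a file on disk -->
  <lst name="suggester">
    <str name="name">fileSuggester</str>
    <str name="lookupImpl">FSTLookupFactory</str>
    <str name="dictionaryImpl">FileDictionaryFactory</str>
    <str name="sourceLocation">suggestions.txt</str>
    <!-- separates the suggestion from its weight on each line; the default is a tab -->
    <str name="fieldDelimiter">;</str>
  </lst>
</searchComponent>
```

A request can then name both dictionaries by repeating the parameter, for example suggest.dictionary=fieldSuggester&suggest.dictionary=fileSuggester, and the response reports the suggestions of each suggester separately.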
SuggestComponent needs to be added to a request handler’s component chain to serve requests. This can be a dedicated handler, or a common handler that returns documents, facets, and suggestions in the same request.
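Both options can be sketched as follows, assuming a search component registered under the name suggest and a suggester called mySuggester (both names are illustrative, as are the handler paths).

```xml
<!-- Option 1: a dedicated handler that runs only the suggest component -->
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">mySuggester</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

<!-- Option 2: append the component to a common handler, so that documents,
     facets, and suggestions can be returned by the same request -->
<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="last-components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

With the first option, a request such as /suggest?suggest.q=elec is enough, because the remaining parameters come from the handler defaults. With the second, the client adds suggest=true, suggest.dictionary, and suggest.q to an ordinary search request.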
Should you use SuggestComponent?
Ultimately, the final choice is up to you, but we believe that experimenting with your data and environments can be highly beneficial. Based on our experience, here are our recommendations for exploring your best options:
Functionality: SuggestComponent does not offer any feature that cannot be covered (relatively) easily with an “index + query” approach, whether that is complex weight calculation, stop word removal, or basic context filtering. It does offer some ready-made features, but if you need full control or have complex requirements, you will often find that it is not enough.
Performance: lookups against an in-memory FST are very fast, and there are several FST-based implementations to compare, so you can choose the one that works best for a particular case. However, the performance of a term query against a suggestion index in a well-tuned environment is usually sufficient for almost all cases. If you can reduce the suggestion search to a single term query, at the cost of a correspondingly larger suggestion index, that is the simplest and safest option.
Maintainability: monitoring a regular Solr index is much more reliable than monitoring an internal in-memory data structure or an internal index.
This built-in suggestion component is a handy part of Solr’s functionality, and it includes some non-trivial capabilities that you will find useful. However, it is important to recognize that there is no magic under the hood, and it is possible (and in some cases necessary) to implement suggestion services without it. Either way, diving into the internals of the suggester will help you to better understand Solr and improve the quality of your suggestions.