Semantic query parsing blueprint
Sep 16, 2020 • 10 min read
Here is a simple type of faceted search you see on many e-commerce websites:
In this example, we’ve already selected “Sleeveless Dresses.” The categories shown in the left column (Price, Features, Customer Rating, Brand) are facets. They differ from simple filters because faceted search lets customers apply several filters at the same time. For example, you can select a “Price” of $10 - $19.99, which gives you 98 choices, then narrow your choice further by specifying your favorite “Color.” After selecting “Color,” you can choose the minimum “Customer Rating” you find acceptable. Do you have a brand of clothing you prefer? You can choose that, too, with each selection acting as one facet in a single multi-faceted search.
Here’s another faceted search example, this time showing basic implementation details:
The screenshot above is taken from an online retailer’s website. According to the graphic, a dress can be blue, pink or red, and only sizes XS and S are available in blue. However, merchandisers and customers consider this dress a single product, not many similar variations. When a customer navigates the site, she should see all SKUs belonging to the same product as a single product, not as multiple products. This means that our facet counts should represent products, not SKUs. Thus, we need to find a way to aggregate SKU-level facets into product-level facets.
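To make the difference between SKU-level and product-level counts concrete, here is a minimal Python sketch. The sample SKUs, the field names, and the helper functions are all hypothetical; only the fact that blue comes in XS and S is taken from the example above.

```python
from collections import Counter

# Hypothetical SKU records for a single product. Only "Blue is available in
# XS and S" comes from the text; the other rows are invented for illustration.
skus = [
    {"sku": "sku_1", "product": "Product_1", "color": "Blue", "size": "XS"},
    {"sku": "sku_2", "product": "Product_1", "color": "Blue", "size": "S"},
    {"sku": "sku_3", "product": "Product_1", "color": "Red", "size": "M"},
    {"sku": "sku_4", "product": "Product_1", "color": "Pink", "size": "M"},
]


def sku_level_counts(skus, field):
    """Count every SKU that carries a given facet value."""
    return Counter(s[field] for s in skus)


def product_level_counts(skus, field):
    """Count each product at most once per facet value."""
    # A set of (value, product) pairs removes repeats within one product.
    pairs = {(s[field], s["product"]) for s in skus}
    return Counter(value for value, _ in pairs)


print(sku_level_counts(skus, "color"))      # Blue is counted twice
print(product_level_counts(skus, "color"))  # Blue is counted once
```

With this data, the SKU-level count for "Blue" is 2 while the product-level count is 1, which is the number a customer browsing products would expect to see.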
A common solution is to propagate properties from the SKU level to the product level and produce a single product document with multivalued fields aggregated from the SKUs. With this approach, our aggregated product looks like this:
However, this approach creates the possibility of false positive matches on combinations of SKU-level fields. For example, if a customer filters by color ‘Blue’ and size ‘M’, Product_1 will be considered a valid match, even though there is no SKU in the original catalog that is both ‘Blue’ and ‘M’. This happens because, when we aggregate values from the SKU level, we lose the information about which value comes from which SKU. Even though this situation looks like an edge case, in a real-life application it can result in a really bad customer experience. Imagine a customer who searches for a particular item and filters by color and size, only to discover on the checkout page that no such item is available in the catalog. This type of website behavior can really frustrate your customers and have a strong negative impact on loyalty.
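The false positive can be reproduced in a few lines of Python. This is only a sketch under assumptions: the SKU rows, the field names, and the helpers `aggregate`, `matches_aggregated`, and `matches_any_sku` are hypothetical and not part of any Solr API.

```python
# Hypothetical SKUs for Product_1: Blue exists only in XS and S,
# while size M exists only in other colors.
skus = [
    {"color": "Blue", "size": "XS"},
    {"color": "Blue", "size": "S"},
    {"color": "Red", "size": "M"},
    {"color": "Pink", "size": "M"},
]


def aggregate(product_id, skus, fields):
    """Flatten SKUs into one product document with multivalued fields."""
    doc = {"id": product_id}
    for field in fields:
        doc[field] = sorted({s[field] for s in skus})
    return doc


def matches_aggregated(doc, filters):
    """Each filter is checked against the multivalued field independently."""
    return all(value in doc[field] for field, value in filters.items())


def matches_any_sku(skus, filters):
    """All filters must hold on the same SKU."""
    return any(
        all(s[field] == value for field, value in filters.items())
        for s in skus
    )


product = aggregate("Product_1", skus, ["color", "size"])
filters = {"color": "Blue", "size": "M"}

print(matches_aggregated(product, filters))  # True: false positive
print(matches_any_sku(skus, filters))        # False: no Blue SKU in size M
```

The aggregated document matches because "Blue" and "M" each appear somewhere in the product, even though they never appear together on one SKU.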
Getting back to the technology, this means we must preserve the product-SKU structure of our catalog when searching and faceting products. The problem of searching structured data is already addressed in Solr with a powerful, high-performance and robust solution: Block Join Query. We write extensively about this approach in another blog post.
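As a rough illustration, the sketch below builds a block join parent query string in Python. It assumes products and SKUs are indexed together as parent and child documents, and that parents can be identified by a `type:product` filter; the `type`, `color`, and `size` field names and the helper function are hypothetical. The `{!parent which=...}` local-params syntax is Solr's block join parent query parser.

```python
from urllib.parse import urlencode


def block_join_parent_query(parent_filter, child_clauses):
    """Build a query that returns parents whose children match all clauses.

    parent_filter: query matching ALL parent documents (assumed field/value).
    child_clauses: mapping of child field -> required value (assumed fields).
    """
    child_query = " ".join(
        f"+{field}:{value}" for field, value in child_clauses.items()
    )
    return f'{{!parent which="{parent_filter}"}}{child_query}'


q = block_join_parent_query("type:product", {"color": "Blue", "size": "M"})
print(q)  # {!parent which="type:product"}+color:Blue +size:M

# URL-encoded form, ready to append to a Solr select endpoint of your choice.
print(urlencode({"q": q}))
```

Because both clauses are evaluated against the same child document, a product is returned only if a single SKU is both blue and size M, which avoids the false positive described above.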
However, the problem of faceting structured data required further work, so we created SOLR-5743 in February 2014 to bring block join faceting support to Solr, and we have worked on it ever since. Now that it is committed to trunk, we have a well-documented method of implementing block join faceting in Solr.