Generally speaking, I tend to always follow the "golden rule" that says to "always put the user in control". One of the things that has made Elasticsearch so wonderfully accepted by everyone is that users had options to configure and use it for what made sense for their particular requirements/needs (since everyone is different), and we just don't want to lose that. So therefore I would allow our users to choose which approach works best for them, based upon a tradeoff of speed for increased memory vs. large scale and off-heap memory usage at the cost of some slight performance. For the 20% of folks (a made-up number following the 80/20 rule with our user base) that need that blistering performance, my worry is that it will hurt them and they won't have an option to choose the old approach. We don't have concrete numbers (that I'm aware of) to validate the expectation that, while there will be some perf hit, it won't matter in most use cases. I just want to make sure we don't have surprises and regressions for customers. Hopefully there has been testing or other work that I missed reading this issue and the associated PRs that proves this a non-issue; apologies if I missed it. Hope I'm not being too noisy, but I would hate to see this be an unpleasant surprise or an upgrade blocker for that 20% of our user base.

I have benchmarked the completion suggester in a single-shard environment using the geonames dataset of ~9.3 million city names. The FST size for the dataset was ~201.5 MB. The following compares the search performance (in KQPS) of the completion suggester (with and without the newly added query-time doc-values-based payload) against an equivalent prefix query with increasing prefix length: the completion suggester was at most +19 KQPS and at least +9 KQPS faster than its prefix query counterpart for prefix lengths of 1 to 6.
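For context, the two request shapes being compared in such a benchmark might look roughly like the following sketch. The index name, field names, and prefix value are hypothetical, and the suggest syntax shown is the modern form, not necessarily what the benchmark itself used:

```json
POST /geonames/_search
{
  "suggest": {
    "city_suggest": {
      "prefix": "ber",
      "completion": {
        "field": "name_suggest",
        "size": 10
      }
    }
  }
}

POST /geonames/_search
{
  "size": 10,
  "query": {
    "prefix": {
      "name": "ber"
    }
  }
}
```

The first request serves completions straight out of the in-memory FST, while the second runs a regular prefix query against the inverted index.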
The completion suggester provides auto-complete/search-as-you-type functionality. This is a navigational feature to guide users to relevant results as they are typing, improving search precision. It is not meant for spell correction or did-you-mean functionality like the term or phrase suggesters. The completions are indexed as a weighted FST (finite state transducer) to provide fast top-N prefix-based searches suitable for serving relevant results as a user types. Completion Suggester V2 is based on LUCENE-6339 and LUCENE-6459, the first iteration of Lucene's new suggest API, and can return document field values via payload. The completion fields are indexed in a special way, hence a field mapping has to be defined for a completion field such as title_suggest.

I have some concerns over performance with the change to remove the payload functionality. Let me try to elaborate, and please let me know if I stated anything incorrectly. With the current 1.X completion suggester, the entire FST (including payloads) is held on heap (but persisted to disk). By leveraging payloads we were able to achieve extreme performance, as we never had to do a FETCH of the associated docs for field values, parse the JSON, etc. My concern is not allowing clients who want to continue making indices with completion suggester mappings that leverage payloads for performance reasons. Instead I believe it would be really nice to default to the new approach, while one could still opt in to using payloads. While payloads can be problematic if abused, many clients are smart about it and/or have small enough suggester FSTs that the extra memory associated with payloads is a non-issue compared to the performance expectation they have.

The analyzer is called autocomplete_analyzer, which makes use of the autocomplete_tokenizer - defined below - and uses a lowercase filter. The filter is important; otherwise the search results will be case-sensitive.
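A minimal sketch of such an analyzer definition, assuming an edge-ngram tokenizer (the index name and the min_gram/max_gram values are illustrative assumptions, not taken from the original post):

```json
PUT /catalog
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "autocomplete_analyzer": {
          "type": "custom",
          "tokenizer": "autocomplete_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

The edge_ngram tokenizer emits prefixes of each token (e.g. "be", "ber", "berl", ... for "berlin"), which is what makes partial word matching work; the lowercase filter then normalizes those prefixes so matching is case-insensitive.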
In a current Magento project we make heavy use of Elasticsearch via the smile/elasticsuite module. Elasticsearch is basically the first point of contact of the single-page application we have built on top of Magento. When we tried to make use of the existing autocomplete features, we realized that partial word matching was not supported. Let's step back first and check which solutions Elasticsearch offers when it comes to building autocomplete functionality. You basically have two options: one is using the edge ngram analyzer, and the other is using the completion suggester feature. While the latter seems the most performant option due to its use of an in-memory data structure called a finite state transducer (FST), we decided to go for the edge ngram analyzer. Modifying elasticsuite to support completion suggester queries looked like a huge task; trying to include the edge ngram analyzer seemed a lot easier, and it was. It just took a while to get all the pieces of the puzzle together. In the end, this is what needs to be done: at first we need to define an analyzer.
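Once the analyzer is defined, it has to be attached to the field being searched. A sketch under stated assumptions: the field name `name`, the index name, and the choice of the built-in `standard` analyzer as search_analyzer are all hypothetical, and the mapping syntax varies by Elasticsearch version:

```json
PUT /catalog/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "analyzer": "autocomplete_analyzer",
      "search_analyzer": "standard"
    }
  }
}
```

Using a plain analyzer at search time is the usual pattern with edge ngrams: only the indexed text is expanded into prefixes, so the user's query is matched as typed instead of being exploded into n-grams itself.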