{
"_source": [
"__title"
],
"from": 0,
"size": 3,
"query": {
"match_phrase_prefix": {
"__title": {
"query": "3 foxe"
}
}
}
}
Now that was what I was looking for!
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 33,
"successful": 33,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 28,
"relation": "eq"
},
"max_score": 12.053555,
"hits": [
{
"_index": "data",
"_type": "_doc",
"_id": "1a42873cead94f18a31d0b102b4fbdcd",
"_score": 12.053555,
"_source": {
"__title": "3 foxes"
}
},
//omitted for brevity
]
}
}
Can we wrap up at this point? Not so soon! As you may recall, our autocomplete functionality uses 3 fields, but we’ve examined only one of them. So how do we combine multiple fields? Since match_phrase_prefix doesn’t support multiple fields, the first guess was the plain old bool query.
{
"_source":[
"__title",
"title",
"commonInfo.RealNameShort"
],
"explain":false,
"from":0,
"size":3,
"query":{
"bool":{
"should":[
{
"match_phrase_prefix":{
"__title":{
"query":"3 foxe"
}
}
},
{
"match_phrase_prefix":{
"title":{
"query":"3 foxe"
}
}
},
{
"match_phrase_prefix":{
"commonInfo.RealNameShort":{
"query":"3 foxe"
}
}
}
]
}
}
}
And the result was:
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 33,
"successful": 33,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 28,
"relation": "eq"
},
"max_score": 28.880083,
"hits": [
{
"_index": "data",
"_type": "_doc",
"_id": "15e4e503cc1d4284aeb34664cb61c5ae",
"_score": 28.880083,
"_source": {
"__title": "apt 3 foxes",
"commonInfo": {
"RealNameShort": "apt 3 foxes"
},
"title": "apartmetnt 3 foxes"
}
},
{
"_index": "data",
"_type": "_doc",
"_id": "83b2653a851c4ca19d3df0410ab1c41f",
"_score": 26.242756,
"_source": {
"__title": "rest/ 3 foxes",
"commonInfo": {
"RealNameShort": "rest/ 3 foxes"
},
"title": "restaraunt/ 3 foxes"
}
},
{
"_index": "data",
"_type": "_doc",
"_id": "1a42873cead94f18a31d0b102b4fbdcd",
"_score": 23.940828,
"_source": {
"__title": "3 foxes",
"title": "3 completely irrelevant to real name words",
"commonInfo": {
"RealNameShort": "3 foxes"
}
}
}
]
}
}
Huh? What happened? Let’s run the same query with "explain": true to understand. Since the output is huge, I’ll focus only on the important parts. In the topmost document, we’ll notice:
"value": 10.268458,
"description": "weight(__title:\"3 (foxe foxes)\" in 1156) [PerFieldSimilarity], result of:",
...
"value": 8.497357,
"description": "weight(title:\"3 foxes\" in 1156) [PerFieldSimilarity], result of:",
...
"value": 10.114267,
"description": "weight(commonInfo.RealNameShort:\"3 (foxes foxe)\" in 1156) [PerFieldSimilarity], result of:",
And here’s the document we expect to be the topmost:
"value": 12.053555,
"description": "weight(__title:\"3 (foxe foxes)\" in 1180) [PerFieldSimilarity], result of:",
...
"value": 11.887274,
"description": "weight(commonInfo.RealNameShort:\"3 (foxes foxe)\" in 1180) [PerFieldSimilarity], result of:",
...
"value": 0.0,
"description": "match on required clause, product of:",
So, as we might expect, the document which contains 3 foxes in __title scores highest on the __title field. But since apt 3 foxes contains somewhat relevant matches in each field of interest, its summed score outweighs that of the desired document. If only we could somehow order documents by their most relevant match!
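The arithmetic behind this ranking can be reproduced by hand. The sketch below assumes that the bool query adds up the scores of its matching should clauses, which agrees with the reported figures up to rounding. The helper name bool_should_score is purely illustrative, and the per-field numbers are copied from the explain fragments above.

```python
def bool_should_score(clause_scores):
    """Sum the scores of all matching should clauses (illustrative helper)."""
    return sum(clause_scores)

# Per-field weights copied from the explain output above.
# Order: __title, title, commonInfo.RealNameShort
apt_3_foxes = [10.268458, 8.497357, 10.114267]
# The desired document matches on only two of the three fields.
three_foxes = [12.053555, 11.887274]

apt_total = bool_should_score(apt_3_foxes)
expected_total = bool_should_score(three_foxes)

print(round(apt_total, 4))       # close to the reported 28.880083
print(round(expected_total, 4))  # close to the reported 23.940828
```

Three mediocre matches add up to more than two strong ones, which is why the document with the best single-field match lands in third place.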
Disjunction Max Query
And indeed, we can try the Disjunction max query for exactly that case. Let’s try the example right from the docs:
{
"_source":[
"__title",
"title",
"commonInfo.RealNameShort"
],
"explain":false,
"from":0,
"size":3,
"query": {
"dis_max": {
"queries": [
{
"match_phrase_prefix":{
"__title":{
"query":"3 foxe"
}
}
},
{
"match_phrase_prefix":{
"title":{
"query":"3 foxe"
}
}
},
{
"match_phrase_prefix":{
"commonInfo.RealNameShort":{
"query":"3 foxe"
}
}
}
],
"tie_breaker": 0.7
}
}
}
Still not good, but at least the scores are closer to each other.
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 33,
"successful": 33,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 28,
"relation": "eq"
},
"max_score": 23.296595,
"hits": [
{
"_index": "data",
"_type": "_doc",
"_id": "15e4e503cc1d4284aeb34664cb61c5ae",
"_score": 23.296595,
"_source": {
"__title": "apt 3 foxes",
"commonInfo": {
"RealNameShort": "apt 3 foxes"
},
"title": "apartmetnt 3 foxes"
}
},
{
"_index": "data",
"_type": "_doc",
"_id": "83b2653a851c4ca19d3df0410ab1c41f",
"_score": 21.053097,
"_source": {
"__title": "rest/ 3 foxes",
"commonInfo": {
"RealNameShort": "rest/ 3 foxes"
},
"title": "restaraunt/ 3 foxes"
}
},
{
"_index": "data",
"_type": "_doc",
"_id": "1a42873cead94f18a31d0b102b4fbdcd",
"_score": 20.374645,
"_source": {
"__title": "3 foxes",
"title": "3 completely irrelevant to real name words",
"commonInfo": {
"RealNameShort": "3 foxes"
}
}
}
]
}
}
It isn’t obvious what the tie_breaker parameter does. Let’s tweak it to find out. First, we’ll set it to 1.
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 33,
"successful": 33,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 28,
"relation": "eq"
},
"max_score": 28.880083,
"hits": [
{
"_index": "data",
"_type": "_doc",
"_id": "15e4e503cc1d4284aeb34664cb61c5ae",
"_score": 28.880083,
"_source": {
"__title": "apt 3 foxes",
"commonInfo": {
"RealNameShort": "apt 3 foxes"
},
"title": "apartmetnt 3 foxes"
}
},
{
"_index": "data",
"_type": "_doc",
"_id": "83b2653a851c4ca19d3df0410ab1c41f",
"_score": 26.242756,
"_source": {
"__title": "rest/ 3 foxes",
"commonInfo": {
"RealNameShort": "rest/ 3 foxes"
},
"title": "restaraunt/ 3 foxes"
}
},
{
"_index": "data",
"_type": "_doc",
"_id": "1a42873cead94f18a31d0b102b4fbdcd",
"_score": 23.940828,
"_source": {
"__title": "3 foxes",
"title": "3 completely irrelevant to real name words",
"commonInfo": {
"RealNameShort": "3 foxes"
}
}
}
]
}
}
As we can see, increasing it leads us in the wrong direction. Let’s remove it altogether.
{
"took": 15,
"timed_out": false,
"_shards": {
"total": 33,
"successful": 33,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 28,
"relation": "eq"
},
"max_score": 12.053555,
"hits": [
{
"_index": "data",
"_type": "_doc",
"_id": "1a42873cead94f18a31d0b102b4fbdcd",
"_score": 12.053555,
"_source": {
"__title": "3 foxes",
"title": "3 completely irrelevant to real name words",
"commonInfo": {
"RealNameShort": "3 foxes"
}
}
},
//omitted for brevity
]
}
}
Success! This was exactly what we were looking for!
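To make the effect of tie_breaker concrete, here is a minimal sketch of the scoring rule as I understand it: the best clause score plus tie_breaker times the sum of the remaining clause scores, with tie_breaker defaulting to 0. The function dis_max_score is a hypothetical helper, not part of any Elasticsearch client, and the inputs are the per-field weights from the explain output shown earlier.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the sum of the other scores."""
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others

# Per-field weights copied from the explain output.
apt_3_foxes = [10.268458, 8.497357, 10.114267]
three_foxes = [12.053555, 11.887274]

# Compare both documents for the three tie_breaker settings tried above.
for tie_breaker in (0.0, 0.7, 1.0):
    print(
        tie_breaker,
        round(dis_max_score(apt_3_foxes, tie_breaker), 4),
        round(dis_max_score(three_foxes, tie_breaker), 4),
    )
```

With tie_breaker at 1 the rule degenerates into the plain sum that the bool query produced, and with 0 only the single best field counts, so the document with the strongest __title match wins.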
Conclusion
When implementing autocomplete functionality with Elasticsearch, don’t jump straight to the naive query_string approach. Explore the rich Elasticsearch query language first. Leveraging the search_as_you_type mapping at index time might not be a silver bullet either, since its main aim is to handle search queries with out-of-order words by creating n-gram fields for you. So it might be sufficient to resort solely to query-time improvements, such as the bool_prefix query type if you want more lenient results, or the match_phrase_prefix query type if you want stricter ones.
When combining autocomplete on multiple fields, you may use the dis_max query type. In that case, increasing the tie_breaker parameter increases the degree to which fields other than the best-matching one influence the resulting score.
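If the list of fields changes often, the request body can be generated rather than written by hand. The following is only a sketch: build_autocomplete_query is a hypothetical helper that returns a plain dictionary mirroring the dis_max request used in this article, and sending it to the cluster is left to whichever client you use.

```python
import json

def build_autocomplete_query(text, fields, tie_breaker=0.0, size=3):
    """Build a dis_max body with one match_phrase_prefix clause per field."""
    clauses = [
        {"match_phrase_prefix": {field: {"query": text}}}
        for field in fields
    ]
    return {
        "_source": list(fields),
        "from": 0,
        "size": size,
        "query": {
            "dis_max": {"queries": clauses, "tie_breaker": tie_breaker}
        },
    }

body = build_autocomplete_query(
    "3 foxe", ["__title", "title", "commonInfo.RealNameShort"]
)
print(json.dumps(body, indent=2))
```

Leaving tie_breaker at 0 reproduces the variant that ranked the desired document first.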
And finally, when in doubt about why query results don’t match your expectations, you may resort to the "explain": true query parameter.
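Because the explain output is huge, it helps to extract just the per-field weights. The sketch below assumes each explanation node is a dictionary with value, description and details keys, where details holds child nodes. The function collect_weights and the sample fragment are illustrative: the sample is hand-written, its nesting and the "sum of:" label are assumptions, and only the values and weight descriptions come from the output shown in this article.

```python
def collect_weights(explanation):
    """Recursively gather (value, description) pairs for weight(...) nodes."""
    found = []
    if explanation.get("description", "").startswith("weight("):
        found.append((explanation["value"], explanation["description"]))
    for child in explanation.get("details", []):
        found.extend(collect_weights(child))
    return found

# Hand-written, trimmed fragment shaped like an explanation tree (assumed nesting).
sample = {
    "value": 23.940828,
    "description": "sum of:",
    "details": [
        {
            "value": 12.053555,
            "description": 'weight(__title:"3 (foxe foxes)" in 1180) '
                           "[PerFieldSimilarity], result of:",
            "details": [],
        },
        {
            "value": 11.887274,
            "description": 'weight(commonInfo.RealNameShort:"3 (foxes foxe)" '
                           "in 1180) [PerFieldSimilarity], result of:",
            "details": [],
        },
    ],
}

weights = collect_weights(sample)
for value, description in weights:
    print(value, description)
```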