Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail span_multi queries that don't use top_terms_* rewriting #27432

Closed
jimczi opened this issue Nov 17, 2017 · 9 comments
Closed

Fail span_multi queries that don't use top_terms_* rewriting #27432

jimczi opened this issue Nov 17, 2017 · 9 comments
Assignees
Labels
>bug :Search/Search Search-related issues that do not fall into other categories

Comments

@jimczi
Copy link
Contributor

jimczi commented Nov 17, 2017

The span multi term query limits the number of expanded terms to search only if the inner multi term query uses top_terms_N rewrite. Otherwise it extracts all terms that match the inner multi term query and use them to build a giant span_or query without any limit.
We should force the inner multi term query to use a top_terms_N rewrite in order to limit the memory. Currently a prefix query like a* would extract all term that starts with a since the default rewrite of the prefix query is constant_score. This mode is optimized for prefix query and should never be used on a span_multi which needs to access all positions for the extracted terms.

@jimczi jimczi added :Search/Search Search-related issues that do not fall into other categories >bug discuss and removed >bug labels Nov 17, 2017
@jimczi jimczi self-assigned this Nov 17, 2017
@jimczi
Copy link
Contributor Author

jimczi commented Nov 17, 2017

Discussed in fixit Friday. We should limit the expansion to a sane value and improve the documentation around rewriting of multi term queries.

@jpountz
Copy link
Contributor

jpountz commented Nov 21, 2017

We should limit the expansion to a sane value

Does it mean enforcing a top-terms rewrite or failing the query if there are more than X terms?

@jimczi
Copy link
Contributor Author

jimczi commented Nov 21, 2017

Does it mean enforcing a top-terms rewrite or failing the query if there are more than X terms?

I'd prefer enforcing a top-terms rewrite with an explicit option to limit the expansion:

"span_multi":{
  "match":{
    "prefix" : { "user" :  { "value" : "ki" } }
   },
  "max_expansions": 100
}

Then we can fail the query if an explicit rewrite is set on the inner multi query or if max_expansions is greater than 1024. I also think that max_expansions should always be set and that we should fail queries without an explicit value. A default value would hide the new behavior which is breaking anyway. WDYT ?

@jpountz
Copy link
Contributor

jpountz commented Nov 21, 2017

Sounds good to me. Should it be a 7.0 change then?

@jimczi
Copy link
Contributor Author

jimczi commented Nov 21, 2017

We can also add max_expansions as an override of the rewrite in 6.x. This would not break anything in that version and we can add a deprecation warning if the value is not set ?

@jpountz
Copy link
Contributor

jpountz commented Nov 21, 2017

👍

@jimczi
Copy link
Contributor Author

jimczi commented Jun 7, 2018

Closed by #30913

@jimczi jimczi closed this as completed Jun 7, 2018
@nirmalc
Copy link
Contributor

nirmalc commented Jun 7, 2018

Thanks @jimczi , are we planning to patch this on 6.4 /6.5 ? - I can rebase against 6.4 if you prefer ( Thats what we plan to launch in prod ;-) )

@jimczi
Copy link
Contributor Author

jimczi commented Jun 7, 2018

@nirmalc it's merged in 6.x already so 6.4 it is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

No branches or pull requests

4 participants