Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow Searches in Listing Views #771

Merged
merged 28 commits into from
Apr 20, 2018
Merged

Conversation

nihadness
Copy link
Contributor

@nihadness nihadness commented Apr 13, 2018

Current behavior before PR

When searching for a string in listing views, catalog metadata values are used to be searched. Basically, the system calls all the brains from the catalog and then filters them by string operations. It works pretty good with catalogs that don't contain big amount of data. But for example, with a AR catalog containing nearly 31.000 records, that filtering operation takes 24-25 seconds.

Desired behavior after PR is merged

Instead of retrieving all the brains and filtering them after, the system should query the catalog. Unfortunately, Zope doesn't let query by wildcards through Field , Keyword and etc. indexes (see: http://zope.readthedocs.io/en/latest/zope2book/SearchingZCatalog.html#defining-indexes). Only ZCTextIndexes can be queried by wildcards if wildcard is in the end of the keyword.
To solve all this problem, new Add-On can be used. TextIndexNG3 helps to save indexes in another index type and then query them easily. So, for catalogs those are planned to be used with big amount of data, a new index- listing_searchable_text can be added (which type will be TextIndexNG3), and the query will be handled in bika_listing. In case listing_searchable_text doesn't exist in the catalog, the search will take place with the older method.


I confirm I have tested this PR thoroughly and coded it according to PEP8
and Plone's Python styleguide standards.

entries.append(value)

# Concatenate all strings to one text blob
return " ".join(entries)
Copy link
Contributor

@ramonski ramonski Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nihadness, is it possible to provide a pluggable index function which works the same like https://github.com/senaite/senaite.core/blob/master/bika/lims/browser/bika_listing.py#L1209 ?

The explicit approach here lacks custom fields and therefore won't find any remarks, dates, review_title etc.
Also any additional index need to be integrated with a code change with this approach...

Does this search works as well case insensitive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ramonski , it could be possible to add all the fields but it is not a good approach to save all values twice. I think beside multiplying memory of AR Listing Catalog by two, it would also slow down the queries. Default Fields such as remarks, dates, review_title and etc. can be added to the list as well, if it makes sense from functional point of view.
Yes you are right that when new field is added and is searchable, then it must be added to the list as well. But this how also Plone's 'SearchableTextFields' work. When a new field is added, it must be defined in Searchable Text Field from ZMI or from the code. So maybe we can sacrifice at this point.
Yes, everything saved in the field is lower case and search terms are always lowered as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the memory wouldn't fall into weight compared to the explicit approach here.

Using the fields from the metadata would make this whole machinery quite flexible and usable for all catalogs. New searchable values could be added w/o code change by simply adding a metadata column to the catalog.

Otherwise the behavior of the search in AR listing behaves differently to all the other listings which use the metadata_search method.

In fact I would need to remove the listing index which comes in from the upgrade step for my customers after this release, because they are searching for columns not listed here...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes, optimizing for performance comes with a cost. With @nihadness ' approach, the cost is that indeed, you'll need to specifically define for which fields you want a portal_type object to be searchable in lists. The amount and complexity of data one would expect from an Analysis Requests list and from other lists like Methods, Analysis Services, etc. is incomparable.
metadata_search will work just fine for most of the lists and you will probably not notice any difference with ng3_searchin terms of performance if the number of objects is <10k. But there are a these few lists (ARs, Worksheets and Samples) that will grow indefinitely and will easily reach >10k. At this point, I truly believe that adding the NG3Index is inalienable, no matter if the listing behaves differently to all other listings.

On the other hand, enabling all lists to use NG3Index approach would be an error imo. Mostly because metadata_search will just work fine in most cases and adding a NG3Index to all them will come with non-justifiable costs: more code and more data in database, but without any benefit.

Disabling NG3Index for a given customer is not harder than adding a client-specific metadata for a given catalog. Just remove the index from the catalog and metadata_search will substitute ng3_search without the need of further actions.

If we agree that some lists (only 3 I can think of) might behave differently, then there is no reason to add all metadata in NG3Index, better to just add those that are meaningful, but allow others to be added in the indexer if necessary. In case we want those indexer to behave different in lab-specific add-ons, or don't like those strings hard-coded there, then we can always override that indexer or make use of adapters to get the fields transparently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is see the need for the index @xispa, no question. I only don't understand why we not simply put the output of metadata_to_searchable_text into this indexer, so I can use it as it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, maybe I'm missing here something...

What is the difference between

  • Concatenating the values of the explicit getters into a single string and put that into the index
  • Getting all metadata, concatenate the strings together and put it then to the index?

Ok, maybe we have some bytes more in the string, but does it really count? We could maybe skip the dates, so it would be even more similar in my opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, I think just having more bytes in the strings is acceptable at some point . But I also think-considering values are saved in the list inside NG3Index, adding more elements (strings) to that list, will slow down the queries.
If you think it wouldn't make a big difference, then we can simply add all the metadata columns there and see.

Copy link
Contributor

@ramonski ramonski Apr 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Searching and finding contents in the sysstem is crucial in my opinion for user acceptance.

Having a quick search, which does not find the expected results, will lower this acceptance and I'd rather accept more memory/time consumption to provide a good search facility than to lose user acceptance.

Also users expect to wait after they entered a search value. We should rather improve performance on pure display sites and listings.

Anyhow, please consider that in your coding @nihadness and thanks for your replies so far.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course, I always try to consider things we talk about in PR's, very helpful for self-improvement :)
So, I will just modify the method and add all the metadata values to that index.
Welcome and thank you and Jordi too!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking (for another PR) that maybe could be nice to create a function like serch_by_term(catalog, portal_type, search_term, base_query=None) in senaite.core.api and move all related logic (metadata_search and ngx3_search) there?
btw, these discussions are one of the reasons I love you all! ;)

""" A method for AR listing catalog metadata
:return: Title of Storage Location
"""
return self.getStorageLocation() and self.getStorageLocation().Title() or ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are calling here a "potentailly expensive" method twice. Better do it once and keep it in a variable

'getSamplePointTitle', 'getCreatorFullName',
'getProfilesTitle', 'getStorageLocationTitle',
'getClientOrderNumber', 'getClientReference',
'getClientSampleID', 'getTemplateTitle', )
Copy link
Contributor

@ramonski ramonski Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better go here through all available columns (see: https://github.com/senaite/senaite.core/blob/master/bika/lims/browser/bika_listing.py#L1163) and call a (recursive) "stringify" function to handle lists/dictionaries and other nested types as well, e.g. like here: https://github.com/senaite/senaite.publisher/blob/master/src/senaite/publisher/reportmodel.py#L257

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see the previous comment about going through all the columns. Also bear in mind that, that index lives inside the object itself, and values are not from the catalog. So our advantage here is also that we can save more values which are not metadata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the index get the instance as the first argument and is therefore capable to return attributes from the object to the catalog index as well.

However, it is also possible to fetch the assigned catalog of the portal_type from the archetypes_tool and have access to all the metadata columns.

So it might be indeed an advantage to have access the attributes, but I would rather use that to augment the already existing metadata columns.

A more flexible approach makes this whole effort you invest at the moment into this much more sustainable w/o further need for code changes in the core...

@ramonski ramonski merged commit b06c0a1 into senaite:master Apr 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants