Slow Searches in Listing Views #771

nihadness · 2018-04-13T12:08:17Z

Current behavior before PR

When a user searches for a string in a listing view, the search runs against the catalog metadata values. Basically, the system fetches all the brains from the catalog and then filters them with string operations. This works well for catalogs that don't hold much data. But with an AR catalog containing nearly 31,000 records, for example, that filtering takes 24-25 seconds.

Desired behavior after PR is merged

Instead of retrieving all the brains and filtering them afterwards, the system should query the catalog. Unfortunately, Zope doesn't allow wildcard queries against Field, Keyword, etc. indexes (see: http://zope.readthedocs.io/en/latest/zope2book/SearchingZCatalog.html#defining-indexes). Only ZCTextIndexes can be queried with wildcards, and only if the wildcard is at the end of the keyword.
To solve this problem, a new add-on can be used. TextIndexNG3 stores the values in another index type that can then be queried easily. So, for catalogs that are expected to hold a large amount of data, a new index, listing_searchable_text (of type TextIndexNG3), can be added, and the query will be handled in bika_listing. If listing_searchable_text doesn't exist in the catalog, the search falls back to the older method.
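
The fallback described above can be sketched in plain Python. This is only an illustration of the control flow, not the code in this PR: `FakeCatalog`, `query_text` and the record layout are hypothetical stand-ins for the real catalog, and only the names `listing_searchable_text` and `metadata_search` come from the discussion.

```python
class FakeCatalog(object):
    """Hypothetical stand-in for a catalog: records plus a set of index names."""

    def __init__(self, records, index_names=()):
        self.records = list(records)
        self.index_names = set(index_names)

    def indexes(self):
        # Names of the indexes available in this catalog
        return sorted(self.index_names)

    def query_text(self, index_name, term):
        # Stand-in for a real indexed text query (assumption, not a Zope API)
        return [r for r in self.records
                if term in r["indexed"].get(index_name, "")]


def metadata_search(catalog, term):
    """Older method: walk every record and compare metadata values as strings."""
    term = term.lower()
    return [r for r in catalog.records
            if any(term in str(v).lower() for v in r["metadata"].values())]


def search(catalog, term, index_name="listing_searchable_text"):
    """Query the text index if the catalog has it, else fall back."""
    if index_name in catalog.indexes():
        # Indexed values are stored lower case, so lower the term as well
        return catalog.query_text(index_name, term.lower())
    return metadata_search(catalog, term)


records = [
    {"metadata": {"getId": "AR-001", "getClientTitle": "Happy Hills"},
     "indexed": {"listing_searchable_text": "ar-001 happy hills"}},
    {"metadata": {"getId": "AR-002", "getClientTitle": "Klaymore"},
     "indexed": {"listing_searchable_text": "ar-002 klaymore"}},
]
with_index = FakeCatalog(records, ["listing_searchable_text"])
without_index = FakeCatalog(records)
```

Both `search(with_index, "Happy")` and `search(without_index, "Happy")` find the same record; only the path taken differs.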

I confirm I have tested this PR thoroughly and coded it according to PEP8
and Plone's Python styleguide standards.

ramonski · 2018-04-17T11:42:30Z

bika/lims/catalog/indexers/analysisrequest.py

+        entries.append(value)
+
+    # Concatenate all strings to one text blob
+    return " ".join(entries)


Hi @nihadness, is it possible to provide a pluggable index function which works the same way as https://github.com/senaite/senaite.core/blob/master/bika/lims/browser/bika_listing.py#L1209 ?

The explicit approach here lacks custom fields and therefore won't find any remarks, dates, review_title etc.
Also, with this approach any additional index needs a code change to be integrated...

Does this search work case-insensitively as well?

@ramonski, it would be possible to add all the fields, but saving all the values twice is not a good approach. I think that, besides doubling the memory used by the AR Listing Catalog, it would also slow down the queries. Default fields such as remarks, dates, review_title etc. can be added to the list as well, if that makes sense from a functional point of view.
Yes, you are right that when a new field is added and should be searchable, it must be added to the list as well. But this is also how Plone's 'SearchableTextFields' work: when a new field is added, it must be declared as a searchable text field in the ZMI or in the code. So maybe we can accept that sacrifice here.
Yes, everything saved in the index is lower case and search terms are always lowered as well.

I think the extra memory wouldn't be significant compared to the explicit approach here.

Using the fields from the metadata would make this whole machinery quite flexible and usable for all catalogs. New searchable values could be added w/o code change by simply adding a metadata column to the catalog.

Otherwise the search in the AR listing behaves differently from all the other listings, which use the metadata_search method.

In fact I would need to remove the listing index which comes in from the upgrade step for my customers after this release, because they are searching for columns not listed here...

Sometimes, optimizing for performance comes with a cost. With @nihadness' approach, the cost is that you do indeed need to define explicitly which fields of a portal_type object should be searchable in lists. The amount and complexity of data one would expect in an Analysis Requests list cannot be compared with other lists like Methods, Analysis Services, etc.
metadata_search will work just fine for most of the lists and you will probably not notice any difference from ng3_search in terms of performance if the number of objects is <10k. But there are a few lists (ARs, Worksheets and Samples) that will grow indefinitely and will easily reach >10k. At this point, I truly believe that adding the NG3Index is indispensable, even if the listing then behaves differently from all the other listings.

On the other hand, enabling all lists to use the NG3Index approach would be an error imo. Mostly because metadata_search will work just fine in most cases and adding an NG3Index to all of them would come with unjustifiable costs: more code and more data in the database, without any benefit.

Disabling NG3Index for a given customer is not harder than adding a client-specific metadata for a given catalog. Just remove the index from the catalog and metadata_search will substitute ng3_search without the need of further actions.

If we agree that some lists (only 3 I can think of) might behave differently, then there is no reason to add all the metadata to the NG3Index. Better to add just those values that are meaningful, but allow others to be added in the indexer if necessary. In case we want that indexer to behave differently in lab-specific add-ons, or don't like those strings hard-coded there, we can always override the indexer or make use of adapters to get the fields transparently.

I see the need for the index @xispa, no question. I just don't understand why we don't simply put the output of metadata_to_searchable_text into this indexer, so I can use it as it is.

Ok, maybe I'm missing something here...

What is the difference between

1. Concatenating the values of the explicit getters into a single string and putting that into the index

2. Getting all the metadata, concatenating the strings together and then putting that into the index?

Ok, maybe we have a few more bytes in the string, but does it really count? We could maybe skip the dates, so it would be even more similar in my opinion.

Well, I think just having more bytes in the strings is acceptable up to a point. But I also think that, considering the values are saved in a list inside the NG3Index, adding more elements (strings) to that list will slow down the queries.
If you think it wouldn't make a big difference, then we can simply add all the metadata columns there and see.

Searching and finding contents in the system is crucial in my opinion for user acceptance.

Having a quick search, which does not find the expected results, will lower this acceptance and I'd rather accept more memory/time consumption to provide a good search facility than to lose user acceptance.

Also users expect to wait after they entered a search value. We should rather improve performance on pure display sites and listings.

Anyhow, please consider that in your coding @nihadness and thanks for your replies so far.

Of course, I always try to consider the things we talk about in PRs, they are very helpful for self-improvement :)
So, I will just modify the method and add all the metadata values to that index.
You're welcome, and thank you and Jordi too!

Just thinking (for another PR) that maybe it could be nice to create a function like search_by_term(catalog, portal_type, search_term, base_query=None) in senaite.core.api and move all the related logic (metadata_search and ng3_search) there?
btw, these discussions are one of the reasons I love you all! ;)

ramonski · 2018-04-17T11:49:46Z

bika/lims/content/analysisrequest.py

+        """ A method for AR listing catalog metadata
+        :return: Title of Storage Location
+        """
+        return self.getStorageLocation() and self.getStorageLocation().Title() or ''


You are calling a "potentially expensive" method twice here. Better to call it once and keep the result in a variable.
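
A minimal sketch of that suggestion, assuming nothing beyond the two method names in the diff. `Location` and `AnalysisRequestSketch` are hypothetical stand-ins (the real AR class is an Archetypes content type), and the `lookups` counter exists only to make the single call visible.

```python
class Location(object):
    """Hypothetical stand-in for a storage location object."""

    def __init__(self, title):
        self._title = title

    def Title(self):
        return self._title


class AnalysisRequestSketch(object):
    """Hypothetical stand-in for the AR content class."""

    def __init__(self, location=None):
        self._location = location
        self.lookups = 0  # counts calls, for illustration only

    def getStorageLocation(self):
        # Imagine a reference lookup here, which may be expensive
        self.lookups += 1
        return self._location

    def getStorageLocationTitle(self):
        """A method for AR listing catalog metadata
        :return: Title of Storage Location
        """
        # Look the location up once and reuse the result
        location = self.getStorageLocation()
        return location.Title() if location else ''
```

The behaviour is unchanged: an AR without a storage location still returns an empty string.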

ramonski · 2018-04-17T11:56:26Z

bika/lims/catalog/indexers/analysisrequest.py

+                         'getSamplePointTitle', 'getCreatorFullName',
+                         'getProfilesTitle', 'getStorageLocationTitle',
+                         'getClientOrderNumber', 'getClientReference',
+                         'getClientSampleID', 'getTemplateTitle', )


Better to go through all the available columns here (see: https://github.com/senaite/senaite.core/blob/master/bika/lims/browser/bika_listing.py#L1163) and call a (recursive) "stringify" function to handle lists, dictionaries and other nested types as well, e.g. like here: https://github.com/senaite/senaite.publisher/blob/master/src/senaite/publisher/reportmodel.py#L257
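
As a rough illustration of what such a recursive "stringify" helper could look like: this is a sketch written for Python 3 and not the linked senaite.publisher code. The function name and the choice to lower-case and skip empty values are assumptions, the latter based on the lower-casing and null-value handling mentioned elsewhere in this PR.

```python
def stringify(value):
    """Flatten a (possibly nested) value into one lower-case string.

    Hypothetical helper: dict values and list/tuple items are visited
    recursively, None and empty strings are skipped.
    """
    if value is None:
        return ""
    if isinstance(value, dict):
        parts = [stringify(v) for v in value.values()]
    elif isinstance(value, (list, tuple)):
        parts = [stringify(v) for v in value]
    else:
        # Plain value: convert to text, trim and lower-case it
        return str(value).strip().lower()
    # Join the non-empty pieces into a single text blob
    return " ".join(p for p in parts if p)


# Example metadata values as they might come from catalog columns
example = ["AR-001", None, {"client": "Happy Hills", "tags": ("Urgent", 7)}]
blob = stringify(example)
```

With such a helper the indexer would not need a hard-coded list of getters, since every column value goes through the same function whatever its type.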

Please see the previous comment about going through all the columns. Also bear in mind that this index is computed from the object itself, and its values do not come from the catalog. So another advantage here is that we can save more values which are not metadata.

Yes, the indexer gets the instance as its first argument and is therefore able to return attributes from the object to the catalog index as well.

However, it is also possible to fetch the assigned catalog of the portal_type from the archetypes_tool and have access to all the metadata columns.

So it might indeed be an advantage to have access to the attributes, but I would rather use that to augment the already existing metadata columns.

A more flexible approach makes the whole effort you are investing in this at the moment much more sustainable, w/o further need for code changes in the core...

nihadness added 7 commits April 13, 2018 10:25

Index and Metadata search methods in Bika Listing

18f64ee

Separate ZCIndex and Metadata Searches in Bika Listing.

b138ca5

ZcTextIndex -> TextIndexNG3 in AR Listing Catalog.

4ab8556

TextIndexNG3 in AR Listing Catalog

19f9b0e

Release Note

7dae403

Indexer method of listing_searchable_test for AR's

4a2523f

Forgotten Index Adapter

bca4152

nihadness added Improvement 🔧 PR: Not Ready ⛔️ P2: Very Important labels Apr 13, 2018

nihadness self-assigned this Apr 13, 2018

nihadness requested review from ramonski and xispa April 13, 2018 12:08

nihadness added 5 commits April 16, 2018 11:16

Do not remove special characters in TextIndexNG3 searches.

987b2f6

Comments.

27dbff3

Always check the manual sort and log the execution time.

f114715

Add more fields to ng3 searchable text index.

9afab35

Changes.rst

59ceca8

nihadness removed the PR: Not Ready ⛔️ label Apr 16, 2018

nihadness added 6 commits April 16, 2018 12:26

Log and do not fail when attribute not found for NG3 Text Index.

cb4fd6d

Renamed adapter of NG3 Text Index.

513ede4

Add 'TextIndexNG3' to dependencies.

3e45b7a

Undo Renaming.

349abd6

More searchable fields in AR Listing

3c2554c

Join list values while saving in searchable text.

f616d5f

nihadness added the PR: Not Ready ⛔️ label Apr 16, 2018

Add 'StorageLocationTitle' method to AR object.

8628d31

xispa mentioned this pull request Apr 17, 2018

Partial search worked before senaite/senaite.health#66

Closed

nihadness added 2 commits April 17, 2018 13:02

Always strip illegal characters & convert to 'utf-8'

6c0e9c6

Do no save null values in searchable index

9e58fe5

A Comment

8b8fffa

nihadness removed the PR: Not Ready ⛔️ label Apr 17, 2018

Remove quotation symbols from 'Search Term'

41bcc19

ramonski reviewed Apr 17, 2018

nihadness added the PR: Not Ready ⛔️ label Apr 18, 2018

nihadness added 4 commits April 18, 2018 14:27

Do not call AR class methods twice.

5f50306

Save all metadata values in NG3 Index.

364ead8

Comments.

ec04e5a

Convert metadata to string.

bc744e8

nihadness removed the PR: Not Ready ⛔️ label Apr 19, 2018

nihadness mentioned this pull request Apr 19, 2018

Index and Metadata optimizations for AR Listing View senaite/senaite.health#70

Merged

xispa approved these changes Apr 20, 2018

ramonski approved these changes Apr 20, 2018

Merge branch 'master' into bika-listing-search

1fa1ff1

ramonski merged commit b06c0a1 into senaite:master Apr 20, 2018

xispa mentioned this pull request Apr 26, 2018

Search Filter box not working properly for ARs #773

Closed
