-
-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slow Searches in Listing Views #771
Conversation
entries.append(value) | ||
|
||
# Concatenate all strings to one text blob | ||
return " ".join(entries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @nihadness, is it possible to provide a pluggable index function which works the same like https://github.com/senaite/senaite.core/blob/master/bika/lims/browser/bika_listing.py#L1209 ?
The explicit approach here lacks custom fields and therefore won't find any remarks, dates, review_title etc.
Also any additional index need to be integrated with a code change with this approach...
Does this search works as well case insensitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ramonski , it could be possible to add all the fields but it is not a good approach to save all values twice. I think beside multiplying memory of AR Listing Catalog by two, it would also slow down the queries. Default Fields such as remarks, dates, review_title and etc. can be added to the list as well, if it makes sense from functional point of view.
Yes you are right that when new field is added and is searchable, then it must be added to the list as well. But this how also Plone's 'SearchableTextFields' work. When a new field is added, it must be defined in Searchable Text Field from ZMI or from the code. So maybe we can sacrifice at this point.
Yes, everything saved in the field is lower case and search terms are always lowered as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the memory wouldn't fall into weight compared to the explicit approach here.
Using the fields from the metadata would make this whole machinery quite flexible and usable for all catalogs. New searchable values could be added w/o code change by simply adding a metadata column to the catalog.
Otherwise the behavior of the search in AR listing behaves differently to all the other listings which use the metadata_search method.
In fact I would need to remove the listing index which comes in from the upgrade step for my customers after this release, because they are searching for columns not listed here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes, optimizing for performance comes with a cost. With @nihadness ' approach, the cost is that indeed, you'll need to specifically define for which fields you want a portal_type
object to be searchable in lists. The amount and complexity of data one would expect from an Analysis Requests list and from other lists like Methods, Analysis Services, etc. is incomparable.
metadata_search
will work just fine for most of the lists and you will probably not notice any difference with ng3_search
in terms of performance if the number of objects is <10k. But there are a these few lists (ARs, Worksheets and Samples) that will grow indefinitely and will easily reach >10k. At this point, I truly believe that adding the NG3Index
is inalienable, no matter if the listing behaves differently to all other listings.
On the other hand, enabling all lists to use NG3Index
approach would be an error imo. Mostly because metadata_search
will just work fine in most cases and adding a NG3Index
to all them will come with non-justifiable costs: more code and more data in database, but without any benefit.
Disabling NG3Index
for a given customer is not harder than adding a client-specific metadata for a given catalog. Just remove the index from the catalog and metadata_search
will substitute ng3_search
without the need of further actions.
If we agree that some lists (only 3 I can think of) might behave differently, then there is no reason to add all metadata in NG3Index
, better to just add those that are meaningful, but allow others to be added in the indexer if necessary. In case we want those indexer to behave different in lab-specific add-ons, or don't like those strings hard-coded there, then we can always override that indexer or make use of adapters to get the fields transparently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is see the need for the index @xispa, no question. I only don't understand why we not simply put the output of metadata_to_searchable_text
into this indexer, so I can use it as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, maybe I'm missing here something...
What is the difference between
- Concatenating the values of the explicit getters into a single string and put that into the index
- Getting all metadata, concatenate the strings together and put it then to the index?
Ok, maybe we have some bytes more in the string, but does it really count? We could maybe skip the dates, so it would be even more similar in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, I think just having more bytes in the strings is acceptable at some point . But I also think-considering values are saved in the list inside NG3Index
, adding more elements (strings) to that list, will slow down the queries.
If you think it wouldn't make a big difference, then we can simply add all the metadata columns there and see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Searching and finding contents in the sysstem is crucial in my opinion for user acceptance.
Having a quick search, which does not find the expected results, will lower this acceptance and I'd rather accept more memory/time consumption to provide a good search facility than to lose user acceptance.
Also users expect to wait after they entered a search value. We should rather improve performance on pure display sites and listings.
Anyhow, please consider that in your coding @nihadness and thanks for your replies so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of course, I always try to consider things we talk about in PR's, very helpful for self-improvement :)
So, I will just modify the method and add all the metadata values to that index.
Welcome and thank you and Jordi too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking (for another PR) that maybe could be nice to create a function like serch_by_term(catalog, portal_type, search_term, base_query=None)
in senaite.core.api
and move all related logic (metadata_search
and ngx3_search
) there?
btw, these discussions are one of the reasons I love you all! ;)
bika/lims/content/analysisrequest.py
Outdated
""" A method for AR listing catalog metadata | ||
:return: Title of Storage Location | ||
""" | ||
return self.getStorageLocation() and self.getStorageLocation().Title() or '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are calling here a "potentailly expensive" method twice. Better do it once and keep it in a variable
'getSamplePointTitle', 'getCreatorFullName', | ||
'getProfilesTitle', 'getStorageLocationTitle', | ||
'getClientOrderNumber', 'getClientReference', | ||
'getClientSampleID', 'getTemplateTitle', ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better go here through all available columns (see: https://github.com/senaite/senaite.core/blob/master/bika/lims/browser/bika_listing.py#L1163) and call a (recursive) "stringify" function to handle lists/dictionaries and other nested types as well, e.g. like here: https://github.com/senaite/senaite.publisher/blob/master/src/senaite/publisher/reportmodel.py#L257
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please see the previous comment about going through all the columns. Also bear in mind that, that index lives inside the object itself, and values are not from the catalog. So our advantage here is also that we can save more values which are not metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the index get the instance as the first argument and is therefore capable to return attributes from the object to the catalog index as well.
However, it is also possible to fetch the assigned catalog of the portal_type from the archetypes_tool and have access to all the metadata columns.
So it might be indeed an advantage to have access the attributes, but I would rather use that to augment the already existing metadata columns.
A more flexible approach makes this whole effort you invest at the moment into this much more sustainable w/o further need for code changes in the core...
Current behavior before PR
When searching for a string in listing views, catalog metadata values are used to be searched. Basically, the system calls all the brains from the catalog and then filters them by string operations. It works pretty good with catalogs that don't contain big amount of data. But for example, with a AR catalog containing nearly 31.000 records, that filtering operation takes 24-25 seconds.
Desired behavior after PR is merged
Instead of retrieving all the brains and filtering them after, the system should query the catalog. Unfortunately, Zope doesn't let query by wildcards through
Field , Keyword and etc.
indexes (see: http://zope.readthedocs.io/en/latest/zope2book/SearchingZCatalog.html#defining-indexes). OnlyZCTextIndexes
can be queried by wildcards if wildcard is in the end of the keyword.To solve all this problem, new Add-On can be used. TextIndexNG3 helps to save indexes in another index type and then query them easily. So, for catalogs those are planned to be used with big amount of data, a new index-
listing_searchable_text
can be added (which type will beTextIndexNG3
), and the query will be handled inbika_listing
. In caselisting_searchable_text
doesn't exist in the catalog, the search will take place with the older method.I confirm I have tested this PR thoroughly and coded it according to PEP8
and Plone's Python styleguide standards.