Slow Searches in Listing Views #771

Merged: 28 commits merged on Apr 20, 2018
Changes from 23 commits

Commits (28):
18f64ee
Index and Metadata search methods in Bika Listing
nihadness Apr 13, 2018
b138ca5
Separate ZCIndex and Metadata Searches in Bika Listing.
nihadness Apr 13, 2018
4ab8556
ZcTextIndex -> TextIndexNG3 in AR Listing Catalog.
nihadness Apr 13, 2018
19f9b0e
TextIndexNG3 in AR Listing Catalog
nihadness Apr 13, 2018
7dae403
Release Note
nihadness Apr 13, 2018
4a2523f
Indexer method of listing_searchable_test for AR's
nihadness Apr 13, 2018
bca4152
Forgotten Index Adapter
nihadness Apr 13, 2018
987b2f6
Do not remove special characters in TextIndexNG3 searches.
nihadness Apr 16, 2018
27dbff3
Comments.
nihadness Apr 16, 2018
f114715
Always check the manual sort and log the execution time.
nihadness Apr 16, 2018
9afab35
Add more fields to ng3 searchable text index.
nihadness Apr 16, 2018
59ceca8
Changes.rst
nihadness Apr 16, 2018
cb4fd6d
Log and do not fail when attribute not found for NG3 Text Index.
nihadness Apr 16, 2018
513ede4
Renamed adapter of NG3 Text Index.
nihadness Apr 16, 2018
3e45b7a
Add 'TextIndexNG3' to dependencies.
nihadness Apr 16, 2018
349abd6
Undo Renaming.
nihadness Apr 16, 2018
3c2554c
More searchable fields in AR Listing
nihadness Apr 16, 2018
f616d5f
Join list values while saving in searchable text.
nihadness Apr 16, 2018
8628d31
Add 'StorageLocationTitle' method to AR object.
nihadness Apr 16, 2018
6c0e9c6
Always strip illegal characters & convert to 'utf-8'
nihadness Apr 17, 2018
9e58fe5
Do no save null values in searchable index
nihadness Apr 17, 2018
8b8fffa
A Comment
nihadness Apr 17, 2018
41bcc19
Remove quotation symbols from 'Search Term'
nihadness Apr 17, 2018
5f50306
Do not call AR class methods twice.
nihadness Apr 18, 2018
364ead8
Save all metadata values in NG3 Index.
nihadness Apr 18, 2018
ec04e5a
Comments.
nihadness Apr 18, 2018
bc744e8
Convert metadata to string.
nihadness Apr 19, 2018
1fa1ff1
Merge branch 'master' into bika-listing-search
ramonski Apr 20, 2018
1 change: 1 addition & 0 deletions CHANGES.rst
@@ -15,6 +15,7 @@ Changelog

**Fixed**

- #771 Slow Searches in Listing Views

**Security**

9 changes: 9 additions & 0 deletions RELEASE_NOTES.rst
@@ -1,6 +1,15 @@
Release notes
=============

Update from 1.2.4 to 1.2.5
--------------------------

- This update requires the execution of `bin/buildout`, because
  Products.TextIndexNG3 has been added. It enables wildcard searches
  (including a leading wildcard) on TextIndexNG3 indexes, instead of
  matching the keyword against the metadata values with a regular
  expression. For now, it is only used in the AR listing catalog.
  https://pypi.python.org/pypi/Products.TextIndexNG3/

Update from 1.2.3 to 1.2.4
--------------------------

70 changes: 56 additions & 14 deletions bika/lims/browser/bika_listing.py
@@ -1291,7 +1291,7 @@ def search(self, searchterm="", ignorecase=True):

# strip whitespaces off the searchterm
searchterm = searchterm.strip()
# strip illegal characters off the searchterm
# Strip illegal characters of the searchterm
searchterm = searchterm.strip(u"*.!$%&/()=-+:'`´^")
logger.info(u"ListingView::search:searchterm='{}'".format(searchterm))
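Note that `str.strip()` only trims the listed characters at the two ends of the term. Symbols inside the term survive, and double quotes are not in the set at all (the `ng3_index_search` method in the next hunk removes them separately). Below is a minimal Python 3 sketch of the two strip calls in this hunk. The `clean_searchterm` helper is hypothetical.

```python
# Characters the listing view strips off the search term (copied from the
# hunk above). str.strip() removes them only at the start and the end.
ILLEGAL_EDGE_CHARS = u"*.!$%&/()=-+:'`´^"


def clean_searchterm(searchterm):
    """Hypothetical helper: trim whitespace first, then the edge symbols."""
    searchterm = searchterm.strip()
    return searchterm.strip(ILLEGAL_EDGE_CHARS)


print(clean_searchterm(u"  *H2O-01*  "))  # H2O-01 (inner '-' is kept)
print(clean_searchterm(u"(WS-0004)"))     # WS-0004
print(clean_searchterm(u'"AR-0001"'))     # "AR-0001" (quotes are not stripped)
```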

@@ -1302,19 +1302,66 @@ def search(self, searchterm="", ignorecase=True):

# search the catalog
catalog = api.get_tool(self.catalog)
brains = catalog(query)

# return the unfiltered catalog results if no searchterm
if not searchterm:
brains = catalog(query)

# check if there is ng3 index in the catalog to query by wildcards
elif "listing_searchable_text" in catalog.indexes():
# Always expand all categories if we have a searchterm
self.expand_all_categories = True
brains = self.ng3_index_search(catalog, query, searchterm)

else:
self.expand_all_categories = True
brains = self.metadata_search(catalog, query, searchterm, ignorecase)

# Sort manually?
if self.manual_sort_on is not None:
brains = self.sort_brains(brains, sort_on=self.manual_sort_on)

# return the unfiltered catalog results
if not searchterm:
logger.info(u"ListingView::search: return {} results".format(len(brains)))
return brains
end = time.time()
logger.info(u"ListingView::search: Search for '{}' executed in "
u"{:.2f}s ({} matches)"
.format(searchterm, end - start, len(brains)))
return brains

# Always expand all categories if we have a searchterm
self.expand_all_categories = True
    def ng3_index_search(self, catalog, query, searchterm):
        """ Searches given catalog by query and also looks for a keyword in the
        specific index called 'listing_searchable_text'
        #REMEMBER TextIndexNG indexes are the only indexes that wildcards can be
        used in the beginning of the string.
        http://zope.readthedocs.io/en/latest/zope2book/SearchingZCatalog.html#textindexng
        :param catalog: catalog to search
        :param query:
        :param searchterm: a keyword to look for in 'listing_searchable_text'
        :return: brains matching the search result
        """
        logger.info(u"ListingView::search: Prepare NG3 index query for '{}'"
                    .format(self.catalog))
        # Remove quotation mark
        searchterm = searchterm.replace('"', '')
        # If the keyword is not encoded in searches, TextIndexNG3 encodes by
        # default encoding which we cannot always trust
        searchterm = searchterm.encode("utf-8")
        query["listing_searchable_text"] = "*" + searchterm + "*"
        return catalog(query)

    def metadata_search(self, catalog, query, searchterm, ignorecase=True):
        """ Retrieves all the brains from given catalog and returns the ones
        with at least one metadata containing the search term
        :param catalog: catalog to search
        :param query:
        :param searchterm:
        :param ignorecase:
        :return: brains matching search result
        """
        # create a catalog query
        logger.info(u"ListingView::search: Prepare metadata query for '{}'"
                    .format(self.catalog))

        brains = catalog(query)

        # Build a regular expression for the given searchterm
        regex = self.make_regex_for(searchterm, ignorecase=ignorecase)
@@ -1332,12 +1379,7 @@ def match(brain):
return False

# Filtered brains by searchterm -> metadata match
out = filter(match, brains)

end = time.time()
logger.info(u"ListingView::search: Search for '{}' executed in {:.2f}s ({} matches)"
.format(searchterm, end - start, len(out)))
return out
return filter(match, brains)

def get_searchterm(self):
"""Get the user entered search value from the request
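The NG3 branch of this change builds its catalog query in three steps: it removes double quotes, encodes the term to UTF-8 and wraps it in `*` wildcards. The sketch below reproduces that preparation in Python 3 with a hypothetical `build_ng3_query` helper. It differs from the merged method in two ways. First, it returns a copy instead of mutating `query` in place. Second, it uses `b"*"`, because Python 3 cannot concatenate `str` and `bytes`, whereas the Python 2 original can.

```python
def build_ng3_query(query, searchterm, index_name="listing_searchable_text"):
    """Hypothetical helper mirroring the term preparation in ng3_index_search.

    Returns a copy of `query` with a wildcard term for the text index.
    """
    # Remove double quotation marks from the term
    term = searchterm.replace('"', '')
    # Encode explicitly instead of relying on the index's default encoding
    term = term.encode("utf-8")
    new_query = dict(query)
    # Leading and trailing wildcards (bytes, to match the encoded term)
    new_query[index_name] = b"*" + term + b"*"
    return new_query


base = {"portal_type": "AnalysisRequest", "sort_on": "created"}
q = build_ng3_query(base, u'"Müller"')
print(q["listing_searchable_text"])       # b'*M\xc3\xbcller*'
print("listing_searchable_text" in base)  # False, the base query is untouched
```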
3 changes: 3 additions & 0 deletions bika/lims/catalog/analysisrequest_catalog.py
@@ -40,6 +40,9 @@
'getClientTitle': 'FieldIndex',
'getPrioritySortkey': 'FieldIndex',
'assigned_state': 'FieldIndex',
# Searchable Text Index by wildcards
# http://zope.readthedocs.io/en/latest/zope2book/SearchingZCatalog.html#textindexng
'listing_searchable_text': 'TextIndexNG3',
}
# Defining the columns for this catalog
_columns_list = [
40 changes: 39 additions & 1 deletion bika/lims/catalog/indexers/analysisrequest.py
@@ -6,8 +6,8 @@
# Some rights reserved. See LICENSE.rst, CONTRIBUTORS.rst.

from bika.lims import api
from bika.lims import logger
from bika.lims.interfaces import IAnalysisRequest
from bika.lims.workflow import getCurrentState
from plone.indexer import indexer


@@ -28,3 +28,41 @@ def assigned_state(instance):
return 'unassigned'

return 'assigned'


@indexer(IAnalysisRequest)
def listing_searchable_text(instance):
    """ Indexes values of desired fields for searches in listing view. All the
    field names added to 'plain_text_fields' will be available to search by
    wildcards.
    Please choose the searchable fields carefully and add only fields that
    can be useful to search by. For example, there is no need to add 'SampleId'
    since 'getId' of AR already contains that value. Nor 'ClientTitle' because
    AR's are/can be filtered by client in Clients' 'AR Listing View'
    :return: values of the fields defined as a string
    """
    entries = []
    plain_text_fields = ('getId', 'getContactFullName', 'getSampleTypeTitle',
                         'getSamplePointTitle', 'getCreatorFullName',
                         'getProfilesTitle', 'getStorageLocationTitle',
                         'getClientOrderNumber', 'getClientReference',
                         'getClientSampleID', 'getTemplateTitle', )

Contributor, @ramonski, Apr 17, 2018:

It would be better to go through all available columns here (see: https://github.com/senaite/senaite.core/blob/master/bika/lims/browser/bika_listing.py#L1163) and call a (recursive) "stringify" function to handle lists, dictionaries and other nested types as well, e.g. like here: https://github.com/senaite/senaite.publisher/blob/master/src/senaite/publisher/reportmodel.py#L257
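The "stringify" suggestion above can be sketched as follows. This is a hypothetical helper, not the code linked from senaite.publisher. It walks lists, tuples, sets and dict values recursively, drops `None` and empty values, and falls back to `str()` for everything else, such as dates. In the usage example, a plain dict stands in for a catalog brain, and its column names and values are made up.

```python
from datetime import datetime


def stringify(value):
    """Hypothetical helper: flatten `value` into one space-separated string."""
    if value is None:
        return ""
    if isinstance(value, (bytes, bytearray)):
        return value.decode("utf-8", "replace")
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        # Only the values of a mapping are searchable, not its keys
        parts = (stringify(v) for v in value.values())
    elif isinstance(value, (list, tuple, set)):
        parts = (stringify(v) for v in value)
    else:
        # Numbers, dates and other scalars
        return str(value)
    return " ".join(p for p in parts if p)


def searchable_text(columns):
    """Join the stringified value of every metadata column into one blob."""
    return " ".join(filter(None, (stringify(v) for v in columns.values())))


brain_like = {
    "getId": "WB-0001",
    "getProfilesTitle": ["Metals", "Microbiology"],
    "getClientReference": None,
    "Remarks": {"text": "resampled", "user": "labman"},
    "getDateReceived": datetime(2018, 4, 17, 9, 30),
}
print(searchable_text(brain_like))
# WB-0001 Metals Microbiology resampled labman 2018-04-17 09:30:00
```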

Contributor Author:

Please see the previous comment about going through all the columns. Also bear in mind that this index takes its values from the object itself, not from the catalog. So another advantage here is that we can store values that are not metadata.

Contributor:

Yes, the indexer gets the instance as its first argument and can therefore return attributes of the object to the catalog index as well.

However, it is also possible to fetch the catalog assigned to the portal_type from the archetypes_tool and access all of its metadata columns.

So having access to the attributes might indeed be an advantage, but I would rather use it to augment the existing metadata columns.

A more flexible approach would make the effort you are investing now much more sustainable, without further code changes in the core...


    # Concatenate plain text fields as they are
    for field_name in plain_text_fields:
        try:
            value = api.safe_getattr(instance, field_name)
        except:
            logger.error("{} has no attribute called '{}' ".format(
                repr(instance), field_name))
            continue

        if not value:
            continue
        if isinstance(value, list):
            value = " ".join(value)

        entries.append(value)

    # Concatenate all strings to one text blob
    return " ".join(entries)

Contributor, @ramonski, Apr 17, 2018:

Hi @nihadness, is it possible to provide a pluggable index function that works the same way as https://github.com/senaite/senaite.core/blob/master/bika/lims/browser/bika_listing.py#L1209 ?

The explicit approach here lacks custom fields and therefore won't find any remarks, dates, review_title etc.
Also, with this approach any additional index has to be integrated through a code change...

Does this search also work case-insensitively?

Contributor Author:

@ramonski, it would be possible to add all the fields, but storing all values twice is not a good approach. Besides doubling the memory used by the AR Listing Catalog, I think it would also slow down the queries. Default fields such as remarks, dates, review_title etc. can be added to the list as well, if that makes sense from a functional point of view.
You are right that when a new field is added and should be searchable, it must be added to the list as well. But this is also how Plone's 'SearchableTextFields' work: when a new field is added, it must be defined as a Searchable Text Field from the ZMI or from the code. So maybe we can accept that trade-off here.
Yes, everything stored in the index is lowercase, and search terms are always lowercased as well.
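Here is a toy model of the case-insensitive behaviour described above. The text is lowercased once when it is indexed, and the search term is lowercased before matching, so `*METAL*` still finds `Metals`. The helpers are hypothetical, and `fnmatch.fnmatchcase` stands in for the wildcard matching. This is not how TextIndexNG3 is implemented internally.

```python
from fnmatch import fnmatchcase


def index_text(text):
    """Hypothetical indexing step: lowercase once, split on whitespace."""
    return text.lower().split()


def wildcard_match(tokens, searchterm):
    """True if any indexed token matches the lowercased wildcard pattern."""
    pattern = searchterm.lower()
    # fnmatchcase never folds case itself; both sides are already lowercase
    return any(fnmatchcase(token, pattern) for token in tokens)


tokens = index_text("WB-0001 Heavy Metals Storage-A")
print(wildcard_match(tokens, "*METAL*"))  # True
print(wildcard_match(tokens, "*0001"))    # True
print(wildcard_match(tokens, "copper*"))  # False
```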

Contributor:

I don't think the extra memory would carry much weight compared to the explicit approach here.

Using the fields from the metadata would make this whole machinery quite flexible and usable for all catalogs. New searchable values could be added without a code change, simply by adding a metadata column to the catalog.

Otherwise, the search in the AR listing behaves differently from all the other listings, which use the metadata_search method.

In fact, after this release I would need to remove the listing index added by the upgrade step for my customers, because they search for columns not listed here...

Member:

Sometimes, optimizing for performance comes at a cost. With @nihadness' approach, the cost is that you do indeed need to define explicitly which fields of a portal_type make its objects searchable in lists. The amount and complexity of data one would expect in an Analysis Requests list is incomparable to that of other lists such as Methods, Analysis Services, etc.
metadata_search will work just fine for most of the lists, and you will probably not notice any performance difference compared to ng3_search if the number of objects is below 10k. But a few lists (ARs, Worksheets and Samples) grow indefinitely and easily exceed 10k objects. For those, I truly believe that adding the NG3Index is indispensable, even if the listing then behaves differently from all the other listings.

On the other hand, making all lists use the NG3Index approach would be a mistake imo. The main reason is that metadata_search works just fine in most cases. Adding an NG3Index to all of them would come with unjustifiable costs (more code and more data in the database) without any benefit.

Disabling the NG3Index for a given customer is no harder than adding client-specific metadata to a given catalog: just remove the index from the catalog, and metadata_search will replace ng3_search without any further action.

If we agree that some lists (I can only think of 3) may behave differently, then there is no reason to add all metadata to the NG3Index. It is better to add only the meaningful fields, while allowing others to be added in the indexer if necessary. If we want the indexer to behave differently in lab-specific add-ons, or don't like those strings being hard-coded there, we can always override the indexer or use adapters to get the fields transparently.

Contributor:

I see the need for the index, @xispa, no question. I just don't understand why we don't simply put the output of metadata_to_searchable_text into this indexer, so I can use it as it is.

Contributor:

Ok, maybe I'm missing something here...

What is the difference between

  • Concatenating the values of the explicit getters into a single string and putting that into the index
  • Getting all metadata, concatenating the strings and then putting the result into the index?

Ok, maybe we have a few more bytes in the string, but does that really matter? We could perhaps skip the dates, which would make the two approaches even more similar in my opinion.

Contributor Author:

Well, I think having more bytes in the strings is acceptable up to a point. However, the values are stored as a list inside the NG3Index, so I also think that adding more elements (strings) to that list will slow down the queries.
If you think it wouldn't make a big difference, then we can simply add all the metadata columns there and see.

Contributor, @ramonski, Apr 18, 2018:

In my opinion, being able to search and find content in the system is crucial for user acceptance.

A quick search that does not find the expected results will lower this acceptance. I'd rather accept more memory/time consumption to provide a good search facility than lose user acceptance.

Also, users expect to wait after they enter a search value. We should rather improve performance on pure display pages and listings.

Anyhow, please take that into account in your code, @nihadness, and thanks for your replies so far.

Contributor Author:

Of course, I always try to take into account what we discuss in PRs; it is very helpful for self-improvement :)
So I will modify the method and add all the metadata values to that index.
You're welcome, and thank you, and Jordi too!

Member:

Just thinking (for another PR): it might be nice to create a function like search_by_term(catalog, portal_type, search_term, base_query=None) in senaite.core.api and move all the related logic (metadata_search and ng3_index_search) there?
btw, these discussions are one of the reasons I love you all! ;)
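Here is a rough sketch of the `search_by_term` idea, assuming the signature proposed above. It is not part of senaite.core.api. `FakeCatalog` is a hypothetical in-memory stand-in for a ZCatalog: it is callable with a query dict and exposes `indexes()`. Brains are plain dicts, and `re.escape` replaces the listing view's `make_regex_for` in the metadata fallback.

```python
import re


def search_by_term(catalog, portal_type, search_term, base_query=None,
                   index_name="listing_searchable_text"):
    """Hypothetical consolidation of ng3_index_search and metadata_search."""
    query = dict(base_query or {}, portal_type=portal_type)
    term = search_term.strip()
    if not term:
        # No search term: plain catalog query
        return catalog(query)
    if index_name in catalog.indexes():
        # Dedicated text index available: let the catalog do the matching
        query[index_name] = "*" + term.replace('"', "") + "*"
        return catalog(query)
    # Fallback: fetch all brains, keep those with a matching metadata value
    regex = re.compile(re.escape(term), re.IGNORECASE)
    return [brain for brain in catalog(query)
            if any(regex.search(str(value)) for value in brain.values())]


class FakeCatalog(object):
    """In-memory stand-in for a catalog; brains are plain dicts."""

    def __init__(self, brains, indexes=()):
        self.brains = brains
        self._indexes = list(indexes)

    def indexes(self):
        return self._indexes

    def __call__(self, query):
        hits = [b for b in self.brains
                if b.get("portal_type") == query.get("portal_type")]
        pattern = query.get("listing_searchable_text")
        if pattern:
            # Treat "*term*" as a case-insensitive substring match on "text"
            needle = pattern.strip("*").lower()
            hits = [b for b in hits if needle in b["text"].lower()]
        return hits


brains = [
    {"portal_type": "AnalysisRequest", "id": "WB-0001", "text": "wb-0001 metals"},
    {"portal_type": "AnalysisRequest", "id": "WB-0002", "text": "wb-0002 copper"},
]
with_index = FakeCatalog(brains, indexes=["listing_searchable_text"])
without_index = FakeCatalog(brains)
print([b["id"] for b in search_by_term(with_index, "AnalysisRequest", "Metals")])
# ['WB-0001']
print([b["id"] for b in search_by_term(without_index, "AnalysisRequest", "0002")])
# ['WB-0002']
```

Dropping `listing_searchable_text` from the catalog's indexes switches the same call to the metadata fallback. This matches the point made earlier in the thread about disabling the index for a given customer.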

1 change: 1 addition & 0 deletions bika/lims/catalog/indexers/configure.zcml
@@ -5,5 +5,6 @@
<adapter name="sortable_title" factory=".analysiscategory.sortable_title"/>
<adapter name="sortable_title" factory=".baseanalysis.sortable_title"/>
<adapter name="assigned_state" factory=".analysisrequest.assigned_state"/>
<adapter name="listing_searchable_text" factory=".analysisrequest.listing_searchable_text"/>

</configure>
6 changes: 6 additions & 0 deletions bika/lims/content/analysisrequest.py
@@ -2551,6 +2551,12 @@ def getSamplingRoundUID(self):
else:
return ''

    def getStorageLocationTitle(self):
        """ A method for AR listing catalog metadata
        :return: Title of Storage Location
        """
        return self.getStorageLocation() and self.getStorageLocation().Title() or ''

Contributor:

You are calling a "potentially expensive" method twice here. Better to call it once and keep the result in a variable.
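The fix suggested above amounts to fetching the storage location once and reusing it. Below is a minimal sketch with hypothetical stub classes that stand in for the AR content class and its storage location. They count how often the getter runs.

```python
class StorageLocation(object):
    """Hypothetical stand-in for a storage location object."""

    def __init__(self, title):
        self._title = title

    def Title(self):
        return self._title


class AnalysisRequestStub(object):
    """Hypothetical stand-in for the AR content class; counts getter calls."""

    def __init__(self, location=None):
        self._location = location
        self.lookups = 0

    def getStorageLocation(self):
        self.lookups += 1  # pretend this is a potentially expensive lookup
        return self._location

    def getStorageLocationTitle(self):
        """Return the Storage Location title, or '' if none is set."""
        location = self.getStorageLocation()  # single call, reused below
        return location.Title() if location else ""


ar = AnalysisRequestStub(StorageLocation("Freezer 2"))
print(ar.getStorageLocationTitle(), ar.lookups)  # Freezer 2 1
print(AnalysisRequestStub().getStorageLocationTitle() == "")  # True
```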


@security.public
def getResultsRange(self):
"""Returns the valid result ranges for the analyses this Analysis
5 changes: 4 additions & 1 deletion bika/lims/upgrade/v01_02_005.py
@@ -6,6 +6,7 @@
# Some rights reserved. See LICENSE.rst, CONTRIBUTORS.rst.
from bika.lims import api
from bika.lims import logger
from bika.lims.catalog.analysisrequest_catalog import CATALOG_ANALYSIS_REQUEST_LISTING
from bika.lims.config import PROJECTNAME as product
from bika.lims.upgrade import upgradestep
from bika.lims.upgrade.utils import UpgradeUtils
@@ -28,7 +29,9 @@ def upgrade(tool):
logger.info("Upgrading {0}: {1} -> {2}".format(product, ver_from, version))

# -------- ADD YOUR STUFF HERE --------

ut.addIndex(CATALOG_ANALYSIS_REQUEST_LISTING, "listing_searchable_text",
"TextIndexNG3")
ut.refreshCatalogs()
logger.info("{0} upgraded to version {1}".format(product, version))

return True
2 changes: 2 additions & 0 deletions setup.py
@@ -42,6 +42,7 @@
'Products.DataGridField',
'Products.AdvancedQuery',
'Products.TinyMCE',
'Products.TextIndexNG3',
'collective.monkeypatcher',
'collective.js.jqueryui',
'plone.app.z3cform',
@@ -61,6 +62,7 @@
'plone.resource',
'CairoSVG==1.0.20',
'collective.taskqueue',
'zopyx.txng3.ext==3.4.0'
],
extras_require={
'test': [