Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed listing tables search logic to operate on catalog metadata #619

Merged
merged 49 commits into from
Feb 8, 2018

Conversation

ramonski
Copy link
Contributor

Description of the issue/feature this PR addresses

Linked issue: #608

Current behavior before PR

Listing table search build a big Advanced Query. It did not find any contents for most of the searches.

Desired behavior after PR is merged

The listing table generates now a searchable text out of all brain metadata and filters the brains where the searchterm is found in the catalog metadata.

--
I confirm I have tested this PR thoroughly and coded it according to PEP8
and Plone's Python styleguide standards.

@ramonski
Copy link
Contributor Author

Note to myself:
When rendering only the tables of the listing, the jQuery events are not rebinded.
This is the reason why a whole page reload is done.

@xispa
Copy link
Member

xispa commented Feb 1, 2018

I've done some functional testing. I just noticed that on filtering (via search term), the "Show more" button makes the whole page to be refreshed (with filtering and other params in GET) instead of adding the next items at the end of the list as it does when no term search is used. Could be nice to keep the same behavior when no search term is used. Is this possible?

EDIT: @3c615a3 + @149c439 resolve this particular issue

@xispa
Copy link
Member

xispa commented Feb 7, 2018

I've been testing your PR from a functional point of view and works excellent, but I am a bit worried about the performance of this search approach for large amounts of items. A search against 44289 ARs took ~30 seconds:

2018-02-07T00:50:51 INFO Bika ListingView::get_catalog_query: query={'path': {'query': '/', 'level': 0}, 'sort_on': 'created', 'sort_order': 'descending', 'cancellation_state': 'active'}
2018-02-07T00:50:51 INFO Bika ListingView::search: return 44289 results
...
2018-02-07T00:51:37 INFO Bika ListingView::search: Search for '3553' executed in 31.52s (11 matches)
...
2018-02-07T00:54:03 INFO Bika ListingView::search: Search for '158' executed in 26.59s (1115 matches)

but... I like more your approach than the previous one with AdvancedQuery.

Maybe we could add a ZCTextIndex and store all metadata we want to be searchable there. If get_catalog_indexes finds a ZCTextIndex it could use it through a direct search for only this index instead of get_metadata_dump_for+ regex.search for each brain. Otherwise, the machinery would use the latter mechanism. This way, we could have both approaches at the same time with a good compromise between performance and flexibility. What do you think?

@ramonski
Copy link
Contributor Author

ramonski commented Feb 7, 2018

Good idea and I really like it. I'm just not sure if these in-between searches are working as well for ZCText Indexes.

An other approach would be maybe to reduce the final filter set even further or just yielding the filtered results until the pagesize is reached?

@xispa
Copy link
Member

xispa commented Feb 7, 2018

I'd say that yielding the filtered results until the pagesize is reached is not worth. If you see the results I obtained above, the first search only found 11 matches from 44k records, while the pagesize is 50 by default. So, with the yeilding approach for this search, you'll not find any difference. Same happens if you are searching for a search tearm that only matches with records that were created long time ago (e.g. if you want to search for records from 2y ago).

I still think that ZCTextIndex approach is worth to try. But maybe we could go one step further an decouple this search machinery from bikalisting and create a function in the api - e.g. search_by_term(portal_type, catalog=None, indexes=[], term=None) - and delegate inside all the logic (check if the catalog has a ZCTextIndex or not, etc.).

@ramonski
Copy link
Contributor Author

ramonski commented Feb 7, 2018

Good catch. Indeed it would be only an improvement if not all catalog results need to be processed. I'll do some further research on this to look for some further approaches. If I don't find one, I would suggest to do the improvement in another PR to get this functionality already in as it is.

Copy link
Member

@xispa xispa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added some unnecessary comments

toggle_cols = self.get_toggle_cols()
for col in self.columns.keys():
self.columns[col]['toggle'] = col in toggle_cols
return self.pagesize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...or return utils.to_int(pagesize, self.pagesize)

if self.get_rows_only():
return self.contents_table(table_only=form_id)
else:
return self.template(self.context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can omit this last else

try:
return int(limit)
except (ValueError, TypeError):
return 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return utils.to_int(limit, 0)

return False
catalog = self.get_catalog()
sort_index = catalog.Indexes.get(sort_on)
if not hasattr(sort_index, 'documentToKeyMap'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pheew!, didn't know that an index without this documentToKeyMap attr was not suitable for sorting, good catch!
I'd add something like logger.warning("Index {} cannot be used for sorting".format(sort_on))

"""
if isinstance(thing, DateTime.DateTime):
return True
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return isinstance(thing, DateTime.DateTime) would be more concise

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants