This repository was archived by the owner on Mar 4, 2020. It is now read-only.

Content Type Filtering for Migration. #16

Merged (25 commits) on Feb 21, 2018

Conversation

@nihadness (Contributor) commented Feb 13, 2018

Current Behavior
While fetching data, it is not possible to filter the objects from the source instance. It would be nice if users could choose content types they want to import into the destination.

Expected Behavior after this PR
Users can enter content types separated by commas to filter the objects that will be imported.


It is possible that the sync will have to create some 'unwanted' objects when they are dependencies of the requested objects. For such objects, however, dependencies will not be created recursively, which might cause an error while 'reindexing' them. A try/except block was added specifically for that case.

Some log levels and messages were also updated to reduce noise in the logs and make them more understandable.

@nihadness nihadness self-assigned this Feb 13, 2018
@nihadness nihadness requested a review from ramonski February 13, 2018 11:03
@ramonski (Contributor) left a comment

Phew, a lot changed in here that I obviously missed... Please take care that you do not increase complexity unnecessarily. However, only minor changes are needed, Nihad, so it would be cool if you could take care of them ;)

if ct:
content_types = [t.strip() for t in ct.split(",") if t]
else:
content_types = None
Contributor:

That's already the default value, so you can skip it.

content_types = form.get("content_types", None)
if content_types is not None:
    content_types = content_types.split(",")

@@ -56,6 +56,7 @@ def verify(self):
storage["credentials"]["url"] = self.url
storage["credentials"]["username"] = self.username
storage["credentials"]["password"] = self.password
storage["credentials"]["content_types"] = self.content_types
Contributor:

Not sure why the selected content_types need to be in the credentials storage... Please choose a better location to increase clarity

Contributor Author:

I had to move it to the storage after I saw that those 'content types' are needed in both the fetch and the import steps. It cannot be just a variable kept in memory, because currently a user can run the fetch, stop the instance, and then run the import...
Actually, @juangallostra is currently working on making settings and registry records optional, so we will have to save those somewhere as well.
So, any better idea where to save all these import options?

Contributor:

I meant that you should better put it into the settings storage;) The credentials should hold only the credentials for the sync.
https://github.com/senaite/senaite.sync/blob/master/src/senaite/sync/browser/views.py#L142

Contributor Author:

Ah true, that's where they should be : )

return self.get_items("registry/{}".format(key))
Contributor:

better use "/".join(["registry", key])

class="form-control"
id="content_types"
tal:attributes="value view/content_types|nothing"
name="content_types"/>
Contributor:

I would render a multi-checkbox field here instead of requiring the user to type in the content types by hand.
Note: You also don't handle possible spaces in between in your code, e.g. Sample, Worksheed , FooType
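A hypothetical parsing helper (not part of the PR) that handles both the surrounding spaces and the empty entries such an input can produce:

```python
def parse_content_types(value):
    """Split a comma-separated user input into clean type names.

    Illustrative sketch only: strip() removes surrounding whitespace
    (e.g. " Worksheed " -> "Worksheed") and the trailing condition drops
    empty tokens produced by inputs like "Sample,,FooType".
    """
    if not value:
        # Empty or missing input means "no filtering"
        return None
    return [t.strip() for t in value.split(",") if t.strip()]
```

With this in place, an input like `Sample, Worksheed , FooType` yields a clean list regardless of how the user spaces the commas.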

Contributor Author:

You are right that using checkboxes would be a way better idea, and we discussed with @xispa how to generate that list of checkboxes.
Should it be hardcoded? Not a good solution, plus what about the different content types of each add-on? Should we retrieve them from the portal types registry? Then the list is too long...
So we ended up letting the user enter them, considering that the person who manages the sync will have some knowledge of the types.

And spaces are removed here: https://github.com/senaite/senaite.sync/pull/16/files#diff-32423e22290aaa3db2cdebe0c85fa752R106

Contributor:

Ok got it. Then maybe you should do something like this afterwards:

portal_types = api.get_tool("portal_types")
content_types = filter(lambda ct: ct in portal_types, content_types)

Just a scribble, so you need to find out how to retrieve the current registered types...
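Based on that scribble, the validation step could look like the sketch below; the `portal_types` argument here is a plain mapping standing in for the real portal_types tool returned by `api.get_tool("portal_types")`:

```python
def filter_known_types(content_types, portal_types):
    """Keep only content types actually registered in the portal.

    Sketch of the reviewer's suggestion: unknown names (e.g. the typo
    "Worksheed") are silently dropped instead of causing failures later.
    """
    return [ct for ct in content_types if ct in portal_types]
```

A list comprehension is used instead of `filter()` so the result is a list under both Python 2 and 3.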

Contributor Author:

That is a good idea, I added it to form submission in the view.

# Knowing the catalog length compute the number of pages we will need
# with the desired window size and overlap
effective_window = window-overlap
number_of_pages = (catalog_data["count"]/effective_window) + 1
number_of_pages = (cd["count"]/effective_window) + 1
Contributor:

Why this? Wasn't there once a yield_items method that went exactly through the pages?

Contributor Author:

Yes there was and there is, thanks to you : )
But since we retrieve the objects ordered by modification time (the 'uid_catalog' actually does this), and some objects can be modified in the source while the destination is fetching, we query some records twice. E.g. the first query retrieves records 0-1000, the second query 990-1990, and so on. So even if 10 objects are modified during each fetch query, we are still safe.
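The overlapping-window scheme described above can be sketched as follows (the function name and default values are illustrative, not from the PR):

```python
def window_ranges(total, window=1000, overlap=10):
    """Yield (start, end) query slices that overlap by `overlap` records.

    Sketch of the fetch pagination: each page re-reads the last `overlap`
    records of the previous one, so up to `overlap` objects may be
    modified between queries without any record being skipped.
    """
    effective = window - overlap
    # Same page-count formula as in the snippet above
    pages = (total // effective) + 1
    for i in range(pages):
        start = i * effective
        yield start, start + window
```

For 2,000 records this produces exactly the slices mentioned in the comment: 0-1000, 990-1990, and so on.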

Contributor:

Ah, tricky :) But what if 11 objects were modified 🙈 That was the reason I suggested that we fetch just the UID, portal_type and physical path in step 1, to be able to create the object slugs. Then in step 2 I would fetch just in time to update the objects and probably re-fetch any missing objects...

Contributor Author:

We already fetch just the UID, portal_type and path in the first step... Do you mean fetching all the objects at once? That can be done here by increasing the window size, but with a source of 800,000 objects it was filling the memory completely, especially when the source and the destination are on the same machine.
Please let me know if I misunderstood what you meant.

@juangallostra juangallostra mentioned this pull request Feb 15, 2018
@nihadness (Contributor Author):
Hi @ramonski, could you please tell me what the state of this PR is? : )

@ramonski (Contributor) left a comment

Hi Nihad, thanks for your efforts!
Only some minor changes are needed, and then we can merge it in ;)

content_types = form.get("content_types", None)
if content_types is not None:
    content_types = [t.strip() for t in content_types.split(",") if t]
Contributor:

I think you can just do a [t.strip() for t in content_types.split(",")] since you filter out invalids below

if not all([self.domain_name, self.url, self.username,
self.password]):
if not all([domain_name, url, username,
password]):
Contributor:

make this a single line

p_local_uid = existing.UID()
self.sh.update_by_path(p_path, local_uid=p_local_uid)

existing = self.portal.unrestrictedTraverse(str(local_path), None)
Contributor:

Better ensure in translate_path that it returns a string than to typecast it here
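A sketch of that suggestion, with a plain dict standing in for whatever lookup translate_path really performs (the `path_mapping` argument is hypothetical):

```python
def translate_path(path_mapping, remote_path):
    """Look up the local counterpart of a remote path.

    Illustrative only: normalizing to a plain str happens here, inside
    the translation, so callers can drop the str() typecast before
    unrestrictedTraverse seen in the snippet above.
    """
    local_path = path_mapping.get(remote_path, remote_path)
    return str(local_path)
```

Moving the cast into the one method that produces the path keeps every call site free of type juggling.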

if existing:
    # Skip if its the portal object.
    if len(p_path.split("/")) < 3:
        return
Contributor:

check for the portal path instead of this. E.g.

portal = api.get_portal()
portal_path = api.get_path(portal)
if ppath == portal_path:
    return

parent_path = item.get("parent_path")
# Skip if the parent is portal object
if len(parent_path.split("/")) < 3:
    return
Contributor:

DRY;) Make a method that returns you the portal path and use it here as well for your check
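A minimal sketch of such a helper. In the real code the portal path would come from `api.get_path(api.get_portal())` as suggested earlier; the `portal_path` default used here is purely illustrative:

```python
def is_portal_path(path, portal_path="/senaite"):
    """Return True if `path` points at the portal object itself.

    Sketch of the DRY helper the review asks for, replacing the repeated
    len(path.split("/")) < 3 checks with one explicit comparison.
    """
    return path == portal_path
```

Both call sites above would then reduce to `if is_portal_path(p_path): return`.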

# TODO: Make sure reindexing won't fail!
try:
    api.get_object_by_uid(uid).reindexObject()
except Exception, e:
Contributor:

Your exception catch is too generic here.
E.g. it swallows both the case when api.get_object_by_uid fails and the case when reindexObject fails.
Try to divide this into two steps and find out which errors occur. The more specific we are, the better we can debug it later.
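The two-step split might look like the sketch below, with the lookup injected as a callable so the example stays self-contained (`safe_reindex` and its signature are illustrative, not from the PR):

```python
def safe_reindex(get_object_by_uid, uid, logger):
    """Reindex the object behind `uid`, logging which step failed.

    Sketch only: `get_object_by_uid` stands in for api.get_object_by_uid
    and `logger` for the module logger.
    """
    try:
        obj = get_object_by_uid(uid)
    except Exception as e:
        # Lookup failures (stale UID, missing object) are logged separately
        logger.error("Lookup failed for UID %s: %s" % (uid, e))
        return False
    try:
        obj.reindexObject()
    except Exception as e:
        # Reindex failures now clearly point at the catalog, not the lookup
        logger.error("Reindex failed for UID %s: %s" % (uid, e))
        return False
    return True
```

Even with both handlers still catching Exception, the log message now says which of the two steps broke.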

Contributor Author:

You are right that a bare Exception is not good at all, but 'reindexing' can fail with too many error types. I added that 'TODO' there to find out which method of the object (one that is not a field in the Schema) can fail, so that we actually do not need this error handling here... For now, to be safe, I think it is better to keep the generic Exception case, if you don't mind.

@ramonski ramonski merged commit 61c096c into senaite:master Feb 21, 2018