Added iter_thru_item to FileBytesReader #28

Closed
wants to merge 2 commits

Conversation

charlie-dufort

No description provided.

Member

@thorwhalen thorwhalen left a comment

Lots to say on this one.

First, I can't accept this pull request because this method is not "tracked" for wrapping; that is, the wrapping tools are set up to work with builtin mapping methods only. When you add a method to the mapping, it will work in its original form, but if you wrap the store (in your case, by adding a key transformer), the added method will not be wrapped, and will therefore create bugs that are hard to find.
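To make the failure mode concrete, here's a minimal, hypothetical sketch (the wrapper class, the prefix transform, and `MyStore` are all illustrative, not dol's actual wrapping API):

```python
class KeyMapped:
    """Hypothetical key-transforming wrapper: it rewrites the mapping
    methods it knows about -- and only those."""
    def __init__(self, store, prefix):
        self._store = store
        self._prefix = prefix

    def __getitem__(self, k):
        return self._store[self._prefix + k]  # key transform applied here

    def __iter__(self):
        return (k[len(self._prefix):] for k in self._store)

    def __getattr__(self, name):
        # any method the wrapper doesn't know about is delegated untouched
        return getattr(self._store, name)


class MyStore(dict):
    def iter_thru_item(self, k):  # extra method the wrapping tools don't track
        yield from self[k]


s = KeyMapped(MyStore({'data/a': b'abc'}), prefix='data/')
s['a']                         # fine: b'abc', via the wrapped __getitem__
# list(s.iter_thru_item('a'))  # KeyError: 'a' -- the delegated method sees raw
#                              # keys, so the key transform is silently skipped
```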

If you want to use this method, you can just subclass FileBytesReader and add it there (still, beware of what I said above -- only do this on your "final" layer). Let's also remember that delegation, rather than inheritance, is often (though not always) the better choice.
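If you do subclass, a hedged sketch could look like the following (the import path, the `rootdir` attribute, and this particular `iter_thru_item` signature are assumptions for illustration, not necessarily what the PR intended):

```python
import os
from dol.filesys import FileBytesReader  # assumed import path

class MyFileBytesReader(FileBytesReader):
    """A "final layer" subclass: don't wrap this further with key transformers."""
    def iter_thru_item(self, k, chunk_size=64 * 1024):
        # Assumption: keys resolve to file paths under the store's rootdir.
        path = os.path.join(self.rootdir, k)
        with open(path, 'rb') as fp:
            for chunk in iter(lambda: fp.read(chunk_size), b''):
                yield chunk
```

The delegation alternative would hold a FileBytesReader instance and forward to it, keeping the extra method clearly separate from the mapping interface.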

That said, two things.

First, the "only mapping methods are wrappable" might be a temporary thing. We are considering a design that would enable users to "register" extra methods so that wrapping tools know about them. We violate the "pure mapping" rule ourselves sometimes. But the redesign hasn't happened yet, and we're not totally sure it's a good idea.
If we do, we'll probably do something like wrapt's wrapt’s excellent ObjectProxy.
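For a taste of what that buys you, here's a minimal wrapt sketch (the store and the logging override are illustrative):

```python
import wrapt

class LoggedStore(wrapt.ObjectProxy):
    """Overrides only __getitem__; everything else, including extra
    (non-mapping) methods, is forwarded to the wrapped object."""
    def __getitem__(self, k):
        print(f'getting {k!r}')
        return self.__wrapped__[k]

s = LoggedStore({'a': 1})
s['a']     # prints "getting 'a'", returns 1
s.keys()   # not overridden: transparently forwarded to the wrapped dict
```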

Secondly, the need for "paged" or "chunked" readers (and writers!) has definitely come up. It shows up not only when we have big files, but in general whenever there's a lot of data and/or IO is expensive (for example, remote DBs). So our tooling needs something reusable for that soon.
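To pin the idea down, here's a generic sketch of a paged reader (the names, the page size, and the file are stand-ins, not dol's API):

```python
def iter_pages(reader, page_size=1024 * 1024):
    """Yield successive fixed-size pages from any object with a read(n) method."""
    while page := reader.read(page_size):
        yield page

total = 0
with open('big_file.bin', 'rb') as fp:  # 'big_file.bin' is a stand-in
    for page in iter_pages(fp):
        total += len(page)  # stand-in for real per-page processing
```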

Here's what exists so far around this theme:

  • Stream, which started out in dol (well, py2store) to provide more fine-tuned control over iteration. Its original purpose was precisely to read large files: a store would return a Stream instance that could then be consumed, automatically taking care of any paging/chunking concerns, but also preparing, filtering, etc. This Stream then grew up to become Creek, in a namesake creek package that has accumulated many other streaming tools.
  • Does your use case need both __iter__ and iter_thru_item? Often the answer is "just one of them", in which case a different solution may be better: one where your store returns values that are then "consumed". In that case, it might be a good idea to consider context managers (see the sketch just below). See some past dol use cases here.
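Here's a hedged sketch of that "consumable value" pattern using a context manager (names and the chunk size are illustrative, not dol's API):

```python
from contextlib import contextmanager

@contextmanager
def open_item(path, chunk_size=64 * 1024):
    """Hand the caller a chunk iterator; the file stays open only
    inside the with-block."""
    fp = open(path, 'rb')
    try:
        yield iter(lambda: fp.read(chunk_size), b'')
    finally:
        fp.close()

with open_item('big_file.bin') as chunks:  # 'big_file.bin' is a stand-in
    for chunk in chunks:
        ...  # consume each chunk
```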

I've opened a discussion to flag this need.

@thorwhalen
Member

See #28 (review)

@thorwhalen thorwhalen closed this Jun 19, 2023
@thorwhalen
Member

See also: #31
