Added iter_thru_item to FileBytesReader #28

Closed
wants to merge 2 commits

Conversation

charlie-dufort

No description provided.

Member

@thorwhalen thorwhalen left a comment

Lots to say on this one.

First, I can't accept this pull request because this method is not "tracked" for wrapping; that is, the wrapping tools are set up to work with builtin mapping methods only. When you add a method to the mapping, it will work in its original form, but if you wrap the store (in your case, by adding a key transformer), the added method will not be wrapped, and will therefore create bugs that are hard to find.
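To make the failure mode concrete, here's a minimal, hypothetical sketch (the wrapper class, the prefix transform, and `MyStore` are all illustrative, not dol's actual wrapping API):

```python
class KeyMapped:
    """Hypothetical key-transforming wrapper: it rewrites the mapping
    methods it knows about -- and only those."""
    def __init__(self, store, prefix):
        self._store = store
        self._prefix = prefix

    def __getitem__(self, k):
        return self._store[self._prefix + k]  # key transform applied here

    def __iter__(self):
        return (k[len(self._prefix):] for k in self._store)

    def __getattr__(self, name):
        # any method the wrapper doesn't know about is delegated untouched
        return getattr(self._store, name)


class MyStore(dict):
    def iter_thru_item(self, k):  # extra method the wrapping tools don't track
        yield from self[k]


s = KeyMapped(MyStore({'data/a': b'abc'}), prefix='data/')
s['a']                         # fine: b'abc', via the wrapped __getitem__
# list(s.iter_thru_item('a'))  # KeyError: 'a' -- the delegated method sees raw
#                              # keys, so the key transform is silently skipped
```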

If you want to use this method, you can just subclass FileBytesReader and add it there (still, beware of what I said above -- only do this on your "final" layer). Let's also remember that delegation, rather than inheritance, is often (though not always) the better choice.
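If you do subclass, a hedged sketch could look like the following (the import path, the `rootdir` attribute, and this particular `iter_thru_item` signature are assumptions for illustration, not necessarily what the PR intended):

```python
import os
from dol.filesys import FileBytesReader  # assumed import path

class MyFileBytesReader(FileBytesReader):
    """A "final layer" subclass: don't wrap this further with key transformers."""
    def iter_thru_item(self, k, chunk_size=64 * 1024):
        # Assumption: keys resolve to file paths under the store's rootdir.
        path = os.path.join(self.rootdir, k)
        with open(path, 'rb') as fp:
            for chunk in iter(lambda: fp.read(chunk_size), b''):
                yield chunk
```

The delegation alternative would hold a FileBytesReader instance and forward to it, keeping the extra method clearly separate from the mapping interface.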

That said, two things.

First, the "only mapping methods are wrappable" might be a temporary thing. We are considering a design that would enable users to "register" extra methods so that wrapping tools know about them. We violate the "pure mapping" rule ourselves sometimes. But the redesign hasn't happened yet, and we're not totally sure it's a good idea.
If we do, we'll probably do something like wrapt's wrapt’s excellent ObjectProxy.
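For a taste of what that buys you, here's a minimal wrapt sketch (the store and the logging override are illustrative):

```python
import wrapt

class LoggedStore(wrapt.ObjectProxy):
    """Overrides only __getitem__; everything else, including extra
    (non-mapping) methods, is forwarded to the wrapped object."""
    def __getitem__(self, k):
        print(f'getting {k!r}')
        return self.__wrapped__[k]

s = LoggedStore({'a': 1})
s['a']     # prints "getting 'a'", returns 1
s.keys()   # not overridden: transparently forwarded to the wrapped dict
```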

Secondly, the need for "paged" or "chunked" readers (and writers!) has definitely come up. It shows up not only when we have big files, but in general whenever there's a lot of data and/or IO is expensive (for example, remote DBs). So our tooling needs something reusable for that soon.
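To pin the idea down, here's a generic sketch of a paged reader (the names, the page size, and the file are stand-ins, not dol's API):

```python
def iter_pages(reader, page_size=1024 * 1024):
    """Yield successive fixed-size pages from any object with a read(n) method."""
    while page := reader.read(page_size):
        yield page

total = 0
with open('big_file.bin', 'rb') as fp:  # 'big_file.bin' is a stand-in
    for page in iter_pages(fp):
        total += len(page)  # stand-in for real per-page processing
```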

Here's what exists so far around this theme:

  • Stream, which started out in dol (well, py2store) to provide more fine-tuned control over iteration. Its original purpose was precisely to read large files: a store would return a Stream instance that could then be consumed, automatically taking care of any paging/chunking concerns, but also preparing, filtering, etc. This Stream then grew up to become Creek, in a namesake creek package that has accumulated many other streaming tools.
  • Does your use case need both __iter__ and iter_thru_item? Often the answer is "just one of them", in which case a different solution may be better: one where your store returns values that are then "consumed". In that case, it might be a good idea to consider context managers (see the sketch just below). See some past dol use cases here.
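Here's a hedged sketch of that "consumable value" pattern using a context manager (names and the chunk size are illustrative, not dol's API):

```python
from contextlib import contextmanager

@contextmanager
def open_item(path, chunk_size=64 * 1024):
    """Hand the caller a chunk iterator; the file stays open only
    inside the with-block."""
    fp = open(path, 'rb')
    try:
        yield iter(lambda: fp.read(chunk_size), b'')
    finally:
        fp.close()

with open_item('big_file.bin') as chunks:  # 'big_file.bin' is a stand-in
    for chunk in chunks:
        ...  # consume each chunk
```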

I've opened a discussion to flag this need.

@thorwhalen
Member

See #28 (review)

@thorwhalen thorwhalen closed this Jun 19, 2023
@thorwhalen
Member

See also: #31
