
Feature Request: recover from partially pulled layer (network error, NOT out of disk space) without restarting #14616

Open
praveenkumar opened this issue Jun 16, 2022 · 19 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@praveenkumar
Contributor

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind feature

Description
When pulling an image with a large layer (>1GB) from a registry, and the pull fails partway through because of a network issue or because the user cancels it,
the layer is re-pulled from scratch instead of resuming from the partially pulled layer.

It would be good to have the partial layer saved, and when the pull is attempted again for the download to resume from where it left off.

Steps to reproduce the issue:

  • Create an image with very large layers and store it in a repository.
  • Pull the image over a slow and possibly unreliable network.
  • Have the pull fail when the layer is close to complete (e.g. exit manually).

Describe the results you received:

  • Pull the same image again and you will see that the layer which was close to complete starts pulling from scratch again.

Describe the results you expected:

  • The pull should resume instead of starting from scratch.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

$ podman version
Client:       Podman Engine
Version:      4.1.0
API Version:  4.1.0
Go Version:   go1.18
Built:        Fri May  6 21:45:54 2022
OS/Arch:      linux/amd64

This is a similar issue to #7497.

@openshift-ci openshift-ci bot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 16, 2022
@Luap99
Member

Luap99 commented Jun 16, 2022

@mtrmac @vrothberg PTAL

@mtrmac
Collaborator

mtrmac commented Jun 16, 2022

This specific RFE is, I think, basically local to c/image/docker/dockerImageSource.GetBlob: on an EOF or some other failures (as long as we received some data), use a range request to continue where we left off. Mostly local and transparent to the rest, apart from having to decide on a heuristic for which failures to handle like that.
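For illustration, a minimal Go sketch of the range-request resume idea described above. This is not the actual c/image code; resumeBlob and its signature are invented for this example:

// Hypothetical sketch: resume a blob download with an HTTP Range request
// after `already` bytes have been received. Not actual c/image code.
package blobresume

import (
	"fmt"
	"io"
	"net/http"
)

// resumeBlob re-requests blobURL starting at byte offset `already` and
// appends the remaining bytes to dst. The registry must answer
// 206 Partial Content; anything else means the caller has to fall back
// to re-pulling the whole layer.
func resumeBlob(client *http.Client, blobURL string, already int64, dst io.Writer) error {
	req, err := http.NewRequest(http.MethodGet, blobURL, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", already))

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusPartialContent {
		return fmt.Errorf("registry did not honor the range request: %s", resp.Status)
	}
	_, err = io.Copy(dst, resp.Body)
	return err
}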

There might be some interaction with c/common/pkg/retry if the failures are timeouts. I’m not sure that we need to handle that, apart from maybe containers/common#654 (impose a total timeout on a single image copy).
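A hedged sketch of what a total timeout on a single image copy (the containers/common#654 idea) could look like; copyImage here is a stand-in for the real copy, not an existing c/image call:

package main

import (
	"context"
	"errors"
	"log"
	"time"
)

// copyImage is a placeholder for the real image copy; it only exists so
// this sketch compiles and runs.
func copyImage(ctx context.Context) error {
	select {
	case <-time.After(2 * time.Second): // pretend the copy takes 2 seconds
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	// Give the whole copy, including any per-layer retries, one fixed budget.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
	defer cancel()

	if err := copyImage(ctx); err != nil {
		if errors.Is(err, context.DeadlineExceeded) {
			log.Fatal("image copy exceeded the total timeout")
		}
		log.Fatal(err)
	}
}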

To be explicit, this RFE is not “retry a layer pull individually, don’t abort the copy and retry on c/common/pkg/retry to do the copy all from scratch again”.

@akiross

akiross commented Jun 30, 2022

Another situation in which the current state of things caused annoyance: the process was interrupted by lack of disk space.

I had to download a 12GB image; after all the blobs were downloaded, the process was interrupted during unpacking due to missing space on the device. The blobs were discarded, and I had to download them all again.

I'm unsure if this is related to what the OP said, but it is part of the same "continue where you left off" user experience.

$ podman version
Client:       Podman Engine
Version:      4.1.1
API Version:  4.1.1
Go Version:   go1.17.10
Built:        Tue Jan  1 01:00:00 1980
OS/Arch:      linux/amd64

@mtrmac
Collaborator

mtrmac commented Jun 30, 2022

@akiross That’s not this specific RFE. (Also, it’s not something I’d expect now that we commit layers immediately, but please report it separately.)

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@alien999999999

alien999999999 commented Dec 19, 2022

So, what happened for me is:

Trying to pull docker.io/some/image...
Getting image source signatures
Copying blob ea362f368469 done  
Copying blob a4431157df48 done  
Copying blob 014f7000a66e done  
Copying blob 85dcc8c0c752 done  
Copying blob b119e9491b35 done  
Copying blob bbcb5ff8fceb done  
Copying blob e83e99622cc7 [=====================>----------------] 1.7GiB / 3.0GiB
Copying blob 074ef7e60abf done  
Getting image source signatures
Copying blob bbcb5ff8fceb done  
Copying blob b119e9491b35 done  
Copying blob ea362f368469 done  
Copying blob 85dcc8c0c752 done  
Copying blob a4431157df48 done  
Copying blob 014f7000a66e done  
Copying blob e83e99622cc7 [=====================>----------------] 1.7GiB / 3.0GiB
Copying blob 074ef7e60abf done  
Getting image source signatures
Copying blob ea362f368469 done  
Copying blob b119e9491b35 done  
Copying blob a4431157df48 done  
Copying blob 014f7000a66e done  
Copying blob bbcb5ff8fceb done  
Copying blob 85dcc8c0c752 done  
Copying blob e83e99622cc7 [=====================>----------------] 1.7GiB / 3.0GiB
Copying blob 074ef7e60abf done  
Getting image source signatures
Copying blob ea362f368469 done  
Copying blob b119e9491b35 done  
Copying blob 014f7000a66e done  
Copying blob 85dcc8c0c752 done  
Copying blob bbcb5ff8fceb done  
Copying blob a4431157df48 done  
Copying blob 074ef7e60abf done  
Copying blob e83e99622cc7 [=====================>----------------] 1.7GiB / 3.0GiB
  write /tmp/storage328542328/8: no space left on device
Error: Error writing blob: error storing blob to file "/tmp/storage328542328/8": write /tmp/storage328542328/8: no space left on device

It looks like some timeout kicks in (can I increase it somewhere?) and it retries, but it seems to start over...

@vrothberg
Member

@alien999999999, your other comment indicated that you're using Podman v3.0. I suggest updating to a more recent version of Podman, and either increasing /tmp or moving it to a bigger partition.

@alien999999999

I'm going to look at that, but I think it's part of the distribution, and I'm not sure I can easily upgrade.

In any case, enlarging /tmp is not going to work; it just endlessly times out and retries... Do you know if there is a timeout setting I can just double? Or could I download this blob manually so it gets picked up, or something? I noticed an ostree option...

@alien999999999

Yeah, it's on the distro, but upgrading it is difficult. What else would I have to upgrade?

@vrothberg
Member

A timeout/retry won't help when there's not enough space on the device. All improvements for this issue are shipped with later versions of Podman.

@alien999999999

/tmp is 8GB and the file is 3GB; it's only after retrying 1.7GB four times that it runs out of disk space...

@alien999999999

According to https://stackoverflow.com/questions/16895294/how-to-set-timeout-for-http-get-requests-in-golang, http.Client has a Timeout field. Could you expose this through the pull_options?
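For reference, a minimal Go example of the Timeout field mentioned above; the registry URL is a made-up placeholder. Note that http.Client.Timeout bounds the entire request, including reading the response body, so for a multi-gigabyte blob on a slow link it would have to be very generous:

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Timeout covers the whole request, response-body read included;
	// 0 means no client-side limit at all.
	client := &http.Client{Timeout: 60 * time.Minute}

	// Placeholder URL, for illustration only.
	resp, err := client.Get("https://registry.example.com/v2/library/example/blobs/sha256:0000")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}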

@alien999999999

If I manually download this blob with curl commands, how can I put it into an ostree? Is there any documentation on that?

@mtrmac
Collaborator

mtrmac commented Dec 23, 2022

@alien999999999 As in #14616 (comment), running out of space is not what we are trying to track in this RFE. Please file a fresh issue and discuss it there.

@mtrmac mtrmac changed the title Feature Request: recover from partially pulled layer without restarting Feature Request: recover from partially pulled layer (network error, NOT out of disk space) without restarting Dec 23, 2022
@alien999999999

So, yes, that's exactly it; this is NOT about the "out of disk space"... that is NOT my problem. My problem is that it restarts after a timeout at 1.7GB (before the 3GB blob is done), and does so in an endless loop; the out-of-disk-space error just happens as a result of the constant retrying (I could add terabytes and eventually it would also fill up). I was hoping there was a tunable timeout, so I could double it and the 3GB would be transferred and I could continue.

@rhatdan
Member

rhatdan commented Jan 3, 2023

@mtrmac Reminder.

@ghost

ghost commented Oct 31, 2024

any chance this will be supported soon?

@mtrmac
Collaborator

mtrmac commented Oct 31, 2024

containers/image#1816 has existed for quite some time now, and I think that's the extent of what makes sense to do.


A strict reading of the RFE is that if podman pull is interrupted, e.g. using Ctrl+C, the partially pulled layer is stored somewhere and reused on a later retry. To do that, we would need to somehow ensure that if there is no "later retry", the partial file (which can be many gigabytes in size) is automatically deleted, without bothering the user. I don't know how that could happen, and I think it's very unlikely we would ever implement that.

@ghost

ghost commented Oct 31, 2024

@mtrmac Is it possible to add an option for a recoverable pull, where the user takes care of the partial file?
It could then be deleted when using system prune.
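
As a rough illustration of that suggestion, a sketch of prune-time cleanup of leftover partial downloads; the directory layout and names here are invented and do not correspond to what containers/storage actually does:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// prunePartialBlobs removes partial-download files older than maxAge from a
// hypothetical partial-blob directory. Purely illustrative: the directory
// and file layout are invented for this sketch.
func prunePartialBlobs(partialDir string, maxAge time.Duration) error {
	entries, err := os.ReadDir(partialDir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		info, err := e.Info()
		if err != nil {
			continue
		}
		if time.Since(info.ModTime()) > maxAge {
			path := filepath.Join(partialDir, e.Name())
			fmt.Println("removing stale partial blob:", path)
			_ = os.Remove(path)
		}
	}
	return nil
}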
