change compression, recompress existing chunks in repo #630

Closed
ThomasWaldmann opened this issue Feb 4, 2016 · 2 comments

Comments

@ThomasWaldmann
Member

create an archive with a different compression than the one used in previous archives, and do not reuse the chunks we already have (stored with the old compression), but store all chunks again (using the new compression).

hmm, is this already done by --no-files-cache? if it doesn't quickly skip the files, it will chunk them and store them again into the repo (using the new compression).

if that works, we should document it. it would of course only change the chunks that are referenced by the new archive; chunks referenced only by old archives would not get recompressed.
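
To illustrate the limitation described above, here is a toy model in Python. It is only a sketch under stated assumptions, not borg's repository code: `store_archive`, the `overwrite` switch and the dict-based `repo` are hypothetical names, and zlib/lzma from the standard library stand in for the old and new compression.

```python
import hashlib
import lzma
import zlib

# Hypothetical toy model, not borg's real repository code.
CODECS = {"zlib": zlib.compress, "lzma": lzma.compress}


def chunk_id(data: bytes) -> str:
    """Content-derived id: identical plaintext always maps to the same id."""
    return hashlib.sha256(data).hexdigest()


def store_archive(repo, archives, name, chunks, codec, overwrite=False):
    """Record an archive and store its chunks with the given codec.

    overwrite=False: an id that already exists is reused as-is (deduplication).
    overwrite=True:  the chunk is stored again using the new codec.
    """
    ids = []
    for data in chunks:
        cid = chunk_id(data)
        if overwrite or cid not in repo:
            repo[cid] = (codec, CODECS[codec](data))
        ids.append(cid)
    archives[name] = ids


repo, archives = {}, {}
store_archive(repo, archives, "old", [b"a" * 100, b"b" * 100], "zlib")
store_archive(repo, archives, "new", [b"b" * 100, b"c" * 100], "lzma", overwrite=True)

# Which codec each stored chunk ended up with.
codec_by_chunk = {cid: codec for cid, (codec, _) in repo.items()}
# Chunks that only the old archive references.
only_old = set(archives["old"]) - set(archives["new"])
```

In this model the chunk shared by both archives ends up stored with the new codec, while the chunk referenced only by the "old" archive keeps its original compression.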


💰 there is a bounty for this

@ThomasWaldmann ThomasWaldmann changed the title borg create --recompress change compression, recompress existing chunks in the repo Feb 4, 2016
@ThomasWaldmann ThomasWaldmann changed the title change compression, recompress existing chunks in the repo change compression, recompress existing chunks in repo Feb 4, 2016
@ThomasWaldmann ThomasWaldmann added this to the 1.1 - near future features milestone Feb 20, 2016
@ThomasWaldmann
Member Author

a somewhat cleaner approach may be to iterate over all chunks in the repo, read the data (decrypt, decompress) and write it back again (compress, encrypt).

when doing that, the same considerations apply as when deleting a lot of archives:
when rewriting the chunks, they are written to new segment files (needing space, potentially lots of space) and the old chunks in the old segment files are not needed any more. try not to double the space requirements, but keep the overhead down to about one segment file size.
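
A minimal sketch of that loop, assuming a toy repo made of in-memory "segments" (dicts mapping chunk id to compressed bytes). Everything here is hypothetical: `recompress_repo` is not a borg function, encryption is left out, and zlib/lzma are placeholders for the old and new compression. The point is only the ordering: each old segment is dropped as soon as its replacement exists, so the extra space stays at about one segment.

```python
import lzma
import zlib

# Hypothetical toy model: a repo is a list of segments, each segment a dict
# mapping chunk id -> compressed bytes. Encryption is left out entirely.


def recompress_repo(segments, decompress, compress):
    """Rewrite every chunk with a new compressor, one segment at a time.

    Returns the new segment list and the peak number of segments that
    existed at the same time during the rewrite.
    """
    pending = list(segments)
    done = []
    peak = len(pending)
    while pending:
        old = pending[0]
        # Read (decompress) each chunk and write it back (compress).
        new = {cid: compress(decompress(blob)) for cid, blob in old.items()}
        done.append(new)
        peak = max(peak, len(pending) + len(done))
        # The old segment is only dropped after its replacement exists.
        pending.pop(0)
    return done, peak


segments = [
    {"id1": zlib.compress(b"x" * 1000), "id2": zlib.compress(b"y" * 1000)},
    {"id3": zlib.compress(b"z" * 1000)},
]
new_segments, peak = recompress_repo(segments, zlib.decompress, lzma.compress)
```

Here `peak` is the original segment count plus one, rather than twice the original count, which is the space behaviour the comment asks for.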

enkore added a commit to enkore/borg that referenced this issue Mar 29, 2016
Use with caution: permanent data loss by specifying incorrect patterns
is easily possible. Make a dry run to make sure you got everything right.

borg rewrite has many uses:
- Can selectively remove files/dirs from old archives, e.g. to free
  space or to purge picturarum biggus dickus from history
- Recompress data
- Rechunkify data, to have upgraded Attic / Borg 0.xx archives deduplicate
  with Borg 1.x archives. (Or to experiment with chunker-params for
  specific use cases.)

It is interruptible and resumable.

Chunks are not freed on-the-fly.
Rationale:
  Freeing only makes sense when rechunkifying, but the logic for deciding
  which new chunks allow freeing which input chunks is complicated and
  *very* delicate.


Current TODOs:
- Detect and skip (unless --force) already recompressed chunks
  -- delayed until current PRs on borg.key APIs are decided
     borgbackup#810 borgbackup#789
- Usage example

Future TODOs:
- Refactor tests using py.test fixtures
  -- would require porting ArchiverTestCase to py.test: many changes,
     this changeset is already borderline too large.
- Possibly add a --target option to not replace the source archive
  -- with the target possibly in another Repo
     (better than "cp" due to full integrity checking, and deduplication
      at the target)

Fixes borgbackup#787 borgbackup#686 borgbackup#630 borgbackup#70 (and probably some I overlooked)
Also see borgbackup#757 and borgbackup#770
enkore added a commit to enkore/borg that referenced this issue Apr 1, 2016
enkore added a commit to enkore/borg that referenced this issue Apr 7, 2016
Use with caution: permanent data loss by specifying incorrect patterns
is easily possible. Make a dry run to make sure you got everything right.

borg recreate has many uses:
- Can selectively remove files/dirs from old archives, e.g. to free
  space or to purge picturarum biggus dickus from history
- Recompress data
- Rechunkify data, to have upgraded Attic / Borg 0.xx archives deduplicate
  with Borg 1.x archives. (Or to experiment with chunker-params for
  specific use cases.)

It is interruptible and resumable.

Chunks are not freed on-the-fly.
Rationale:
  Freeing only makes sense when rechunkifying, but the logic for deciding
  which new chunks allow freeing which input chunks is complicated and
  *very* delicate.

Future TODOs:
- Refactor tests using py.test fixtures
  -- would require porting ArchiverTestCase to py.test: many changes,
     this changeset is already borderline too large.
- Possibly add a --target option to not replace the source archive
  -- with the target possibly in another Repo
     (better than "cp" due to full integrity checking, and deduplication
      at the target)
- Detect and skip (unless --always-recompress) already recompressed chunks

Fixes borgbackup#787 borgbackup#686 borgbackup#630 borgbackup#70 (and probably some I overlooked)
Also see borgbackup#757 and borgbackup#770
enkore added a commit to enkore/borg that referenced this issue Apr 8, 2016
enkore added a commit to enkore/borg that referenced this issue Apr 10, 2016
@enkore
Contributor

enkore commented Apr 10, 2016

"borg recreate" (#812) is now merged into master and allows this as well (among other things).

Skipping chunks already compressed with the specified compression is not implemented yet, but will likely come with a future PR.
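
One way such skipping could work is sketched below. This is an assumption for illustration, not how borg does or will do it: the one-byte tags, `pack`, `unpack` and `recompress` are made-up names, and the `always` parameter only loosely mirrors the `--force` / `--always-recompress` idea from the commit messages above.

```python
import lzma
import zlib

# Hypothetical one-byte tags marking which compressor produced a blob.
# The tag values are made up for this sketch.
TAGS = {"zlib": b"\x01", "lzma": b"\x02"}
COMPRESS = {"zlib": zlib.compress, "lzma": lzma.compress}
DECOMPRESS = {b"\x01": zlib.decompress, b"\x02": lzma.decompress}


def pack(data: bytes, codec: str) -> bytes:
    """Compress data and prefix it with the tag of the codec used."""
    return TAGS[codec] + COMPRESS[codec](data)


def unpack(blob: bytes) -> bytes:
    """Pick the decompressor from the leading tag byte."""
    return DECOMPRESS[blob[:1]](blob[1:])


def recompress(blob: bytes, codec: str, always: bool = False):
    """Return (blob, rewritten); skip blobs already using the target codec."""
    if not always and blob[:1] == TAGS[codec]:
        return blob, False
    return pack(unpack(blob), codec), True


old = pack(b"hello" * 200, "zlib")
converted, rewritten_first = recompress(old, "lzma")
same, rewritten_second = recompress(converted, "lzma")
```

The second call returns the blob unchanged because its tag already matches the requested codec; passing `always=True` would force the rewrite.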
