overlay differ: Do file comparison in some cases.
This change results in the overlay differ comparing files to determine if they are actually part of the diff. This is needed to resolve differences between the blobs created by the overlay differ and the double-walking differ.

Before this change, the overlay differ always assumed that if a file was in the upperdir it must be part of the diff, and included it as an add or a modify change. However, there are situations in which files can appear in the upperdir without having been modified or even opened.

For example, if "foo" is a file or dir present in the lowerdirs of an overlay mount and you run "mv foo footmp; mv footmp foo", then the upperdir will contain foo (in addition to any files found under foo if it's a dir). In this situation, the double-walking differ would not include foo as part of the diff, but the overlay differ would.

This meant that the overlay differ could include extra files in each blob for such diffs relative to the double-walking differ. As of now, while this does increase image size, it doesn't result in any inconsistencies in the contents of images, because it just results in files/dirs getting duplicated on top of their equivalents.

However, for the upcoming DiffOp support, this inconsistency could result in the same operation producing mounts with different contents depending on which differ is used. This change is therefore necessary in order to enforce DiffOp consistency (on top of the possible improvements to exported image size).

The main concern here is that this could undo the performance benefits that the overlay differ was intended to provide. However, in practice the situations where this has worse performance are quite obscure and the benefits should still be present.

First, consider the case where foo is a directory and the user does the equivalent of "mv foo footmp; mv footmp foo".
Even before this change, the overlay differ would see that foo is marked as opaque and thus fall back to using the double-walking differ. So there's no performance regression in this case, as the double-walking differ does the same file comparisons as were added in this commit.

For the case where the user shuffles a file back and forth, there will potentially be a slow file content based comparison if the underlying file has a truncated nanosecond timestamp (i.e. it was unpacked from a tar file). However, the situations in which you shuffle an individual file without changing it (or open it for writing but then write nothing) that is large enough in size for content comparisons to be slow are obscure. Additionally, while the content comparison may be slow, there will be time saved during export because the file won't be included unnecessarily in the exported blob, so it's a tradeoff rather than a pure loss.

In situations where the user actually did change a file and it shows up in the upperdir, it should be extremely rare that the content comparison code path is followed. It would require that the user changed no other metadata of the file, including size, and that both mod timestamps were the same (which could only really happen if their underlying filesystem lacked support for nanosecond precision and they modified the file within 1 second of its modification in the lowerdir, or they manually changed the modtime with chtimes).

Signed-off-by: Erik Sipsma <[email protected]>