Multi-level check method #937

Open
gimdh opened this issue Mar 6, 2023 · 4 comments

@gimdh

gimdh commented Mar 6, 2023

Currently only a single check method can be selected. #908 also suggests finer-grained check method selection. However, IMO it would make more sense to compare file name or size (or both) first, and fall back to hashing only as a last resort. With stackable check methods, less common criteria such as modification date or creation date could also be added. While czkawka is already fast on its own, this would speed it up further by reducing disk I/O, especially on magnetic drives.

The UI I propose is some sort of list to which check methods can be added one by one, in priority order.

image
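
The stacked approach proposed above could be sketched roughly as follows. This is only an illustration, not czkawka's code: the names `by_name`, `by_size`, `by_hash`, `refine` and `find_duplicates` are hypothetical, and BLAKE2b is used simply because it ships with Python's standard library. Each check splits the surviving candidate groups further, so an expensive check such as hashing only runs on files that already matched under every cheaper check before it.

```python
import hashlib
import os
from collections import defaultdict
from typing import Callable, Hashable, Iterable, List

# A check maps a path to a key; files with different keys cannot be
# duplicates under that check. (Hypothetical type, for illustration only.)
Check = Callable[[str], Hashable]


def by_name(path: str) -> Hashable:
    return os.path.basename(path)


def by_size(path: str) -> Hashable:
    return os.path.getsize(path)


def by_hash(path: str) -> Hashable:
    # Reads the whole file, so this is the most expensive check.
    digest = hashlib.blake2b()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refine(groups: List[List[str]], check: Check) -> List[List[str]]:
    """Split every group by the check's key and drop groups of one file."""
    refined = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[check(path)].append(path)
        refined.extend(b for b in buckets.values() if len(b) > 1)
    return refined


def find_duplicates(paths: Iterable[str], checks: List[Check]) -> List[List[str]]:
    """Apply the checks in priority order, cheapest first."""
    groups = [list(paths)]
    for check in checks:
        groups = refine(groups, check)
        if not groups:
            break  # nothing left to compare, skip the remaining checks
    return groups
```

With this shape, `find_duplicates(paths, [by_size, by_hash])` hashes only files that share a size with at least one other file, and a date-based check could be added to the list as one more function returning, for example, `os.path.getmtime(path)`.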

@Die4Ever

Die4Ever commented Mar 6, 2023

I definitely think hashes should only be checked if the file sizes are the same; the hash won't be the same if the sizes are different anyway (collisions are possible, but that totally goes against the use case for this program).

That should honestly be how the hash check always works, but I haven't tested it because I haven't needed hash comparisons before.

@qarmin
Owner

qarmin commented Apr 2, 2023

In #956 I added the size and name checks.

Currently it works quite optimally (not sure if it can be sped up any further):

  • Size - checks the size of files
  • Name - checks the name of files
  • Size and Name - checks the name and size of files
  • Hash - checks the size of files, and if at least 2 files have the same size, the app calculates hashes for them
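
As a rough illustration of the Hash mode described in the last bullet, the sketch below groups files by size first and hashes only groups with at least two members. It is not taken from czkawka's source: `hash_file` and `duplicates_by_hash` are hypothetical names, and SHA-256 is a stand-in for whichever hash the app actually uses. The function also returns how many files were hashed, to make the saved disk reads visible.

```python
import hashlib
import os
from collections import defaultdict
from typing import Iterable, List, Tuple


def hash_file(path: str) -> str:
    # SHA-256 is an assumption made for this sketch only.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicates_by_hash(paths: Iterable[str]) -> Tuple[List[List[str]], int]:
    """Return (duplicate groups, number of files that had to be hashed)."""
    # Step 1: cheap metadata pass, no file contents are read.
    same_size = defaultdict(list)
    for path in paths:
        same_size[os.path.getsize(path)].append(path)

    duplicates = []
    hashed = 0
    # Step 2: hash only files that share their size with another file.
    for group in same_size.values():
        if len(group) < 2:
            continue  # unique size, cannot have a duplicate
        same_digest = defaultdict(list)
        for path in group:
            same_digest[hash_file(path)].append(path)
            hashed += 1
        duplicates.extend(g for g in same_digest.values() if len(g) > 1)
    return duplicates, hashed
```

A file whose size is unique is never opened, which is where the reduction in disk I/O comes from.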

@gimdh
Author

gimdh commented Apr 5, 2023

Great! That will cover most use cases.

I hope to see more sophisticated check methods in the future, though. For example, a simple size check may fail if data is saved in fixed-size chunks, or if two text files have the same character count. In such cases, the modified date may help to further determine whether the files are safe to consider identical. Would you mind if I keep this issue open?

@duracell

duracell commented Nov 7, 2023

I searched for this function, but couldn't find it.
Is there no option in the CLI for this?

Edit:
And if I understand it correctly, it's still not possible to check the hash if the size is the same? Even in the GUI?
