-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Smarter detection of binary files during delocate-merge #236
Conversation
Updates merging to use `lipo` to detect binary files and removes the previous less reliable behavior. This handles the common case of binary files having no suffix. Reusing old tests instead of trying to keep the old behavior. Fuse wheels tests requires lipo
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #236 +/- ##
==========================================
- Coverage 97.18% 97.06% -0.13%
==========================================
Files 16 16
Lines 1350 1361 +11
==========================================
+ Hits 1312 1321 +9
- Misses 38 40 +2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First thoughts.
from_tree: str | PathLike, | ||
lib_exts=(".so", ".dylib", ".a"), | ||
): | ||
to_tree: str | PathLike[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please forgive my ignorance, but what is the role of the [str]
specification to PathLike
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mypy's stricter settings don't like ambiguous generics. Not currently enabled, but now I add them as habit.
PathLike
is generic and PathLike[str]
declares the parameter as expecting Unicode paths. It should generally be the same as string type it's unioned with. So str | PathLike[str]
compared to str | bytes | PathLike[str] | PathLike[bytes]
. Even now the behavior of bytes types in paths is hard to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes - sorry - but I was unclear how the type checker distinguishes byte-based Paths from string-based Paths. When I'm running mypy
, the type checker already detects type mismatch when making Paths from bytes.
delocate/fuse.py
Outdated
if cmp_contents(from_path, to_path): | ||
continue | ||
try: | ||
# Try to fuse this file with lipo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - so you're leaving lipo-fuse to detect if this an executable binary on this platform? Is this safe - I mean - does lipo -create
guarantee to raise an error if one of the input files is not a mach-o binary? I did a quick speedcheck - using file
to check the filetype first for both files nearly doubles the time to do this check, compared to just calling lipo -create
, but using the python-magic
library, and parsing the output, adds virtually nothing to the check - and could also give a more informative message. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lipo
returns error codes on non-library files (or any error). It will say the file isn't a library on stderr I think. I did some basic probing to make sure that it's at least more reliable than otool
but I'm not a native MacOS dev. I'm unsure what to do other than to rely on the current tests.
python-magic
sounds interesting but fuse_trees
is not a function which does anything with errors other than ignoring them. A list if lipo
error messages to ignore might be a better idea as I'm more worried about actual library files silently failing to fuse rather than files being falsely detected as libraries. lipo
doesn't document its error codes very well so you might need to provide me with a list of which stderr strings should be suppressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also keep in mind the previous logic of all files with library-like suffixes being unconditionally fused using lipo
. So failing to operate on libraries might not be a problem since no one has raised an issue on it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to work on stderr - but I'm worried that processing stderr might be fragile across versions. I wonder, should we raise an error when actual libraries fail to fuse, and propagate that error? I guess we should decide if we want to propagate errors - for example - when the two libraries are the same architecture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure what that error would look like. Does lipo
even fail in these cases?
I personally wouldn't fail here if the architectures are the same, especially if lipo
allows that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lipo does not allow fusing two files with the same architecture, no:
$ cp ./delocate/tests/data/np-1.24.1_x86_64_random__sfc64.cpython-311-darwin.so foo.so
$ lipo -create delocate/tests/data/np-1.24.1_x86_64_random__sfc64.cpython-311-darwin.so foo.so -output bar.so
fatal error: /Library/Developer/CommandLineTools/usr/bin/lipo: delocate/tests/data/np-1.24.1_x86_64_random__sfc64.cpython-311-darwin.so and foo.so have the same architectures (x86_64) and can't be in the same fat output file
Co-authored-by: Matthew Brett <[email protected]>
Relfecting more - the behavior change here is that any file that matches across the two archives, and that does not compare equal, will run through That's probably OK - but will now mean |
I guess checking with |
Which MIME types should be expected from |
Here's some outputs: [ins] In [1]: import magic
[ins] In [2]: magic.from_file('README.rst')
Out[2]: 'Unicode text, UTF-8 text'
[ins] In [3]: magic.from_file('delocate/tests/data/np-1.24.1_x86_64_random__sfc64.cpython
...: -311-darwin.so')
Out[3]: 'Mach-O 64-bit x86_64 bundle, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL>'
[ins] In [4]: magic.from_file('delocate/tests/data/np-1.6.0_intel_lib__compiled_base.so')
...:
Out[4]: 'Mach-O universal binary with 2 architectures: [x86_64:\\012- Mach-O 64-bit x86_64 bundle, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL>] [\\012- i386:\\012- Mach-O i386 bundle, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL>]'
[ins] In [5]: magic.from_file('delocate/tests/data/np-1.24.1_arm_random__sfc64.cpython-31
...: 1-darwin.so')
Out[5]: 'Mach-O 64-bit arm64 bundle, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL>'
[ins] In [6]: magic.from_file('wheel_makers/fakepkg_toplevel/module2.c')
Out[6]: 'C source, ASCII text'
[ins] In [7]: !dd if=/dev/random of=randbin bs=1 count=100
...:
100+0 records in
100+0 records out
100 bytes transferred in 0.000585 secs (170940 bytes/sec)
[ins] In [8]: magic.from_file('randbin')
Out[8]: 'data' |
I was asking for
|
Yes, sorry, I sent the not-MIME outputs to show the level of detail they give. I'll send the MIME outputs in a bit. For the external dependency - given we're only running on Mac for the moment, and that binary packaging works currently for Mac - I think that's OK - I'm guessing they'll keep maintaining it. But the CI will show us when they drop binary packaging. |
MIME outputs, as you can see, are much less informative: [ins] In [10]: magic.from_file('README.rst', mime=True)
Out[10]: 'text/plain'
[ins] In [11]: magic.from_file('delocate/tests/data/np-1.24.1_x86_64_random__sfc64.cpytho
...: n-311-darwin.so', mime=True)
Out[11]: 'application/x-mach-binary'
[ins] In [13]: magic.from_file('delocate/tests/data/np-1.6.0_intel_lib__compiled_base.so'
...: , mime=True)
Out[13]: 'application/x-mach-binary'
[ins] In [14]: magic.from_file('delocate/tests/data/np-1.24.1_arm_random__sfc64.cpython-3
...: 11-darwin.so', mime=True)
Out[14]: 'application/x-mach-binary'
[ins] In [15]: magic.from_file('wheel_makers/fakepkg_toplevel/module2.c', mime=True)
Out[15]: 'text/x-c'
[ins] In [16]: magic.from_file('randbin', mime=True)
Out[16]: 'application/octet-stream' |
The detailed output doesn't seem helpful in this situation.
Could you demonstrate |
Sure, segfaults and so on are unlikely - and probably it will be fine. The main advantage of |
There's only one expected error from Example stderr:
I'll start by checking for this. |
lipo_fuse calls _run which messes up the output strings that needed to be checked.
Looks reasonable to me ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Updates merging to automatically detect binary files and removes the older behavior. This handles the common case of binary files having no suffix. Fixes #228
Using
_is_macho_file
has false negatives. Instead, all files are passed to thelipo
command and its return code is used to determine if the file was valid to be merged. This might have its own issues, but I'd need to completely rewritelipo_fuse
to make it graceful.The
lib_exts
parameter offuse_trees
is now ignored and I intend for it to be removed completely later.This reuses old tests instead of trying to keep the old behavior. Updating the tests now could conflict with PR #234.
Pull Request Checklist