Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aarch64: Use LDP/STP if FEAT_LSE2 is available #11

Merged
merged 1 commit into from
May 25, 2022
Merged

aarch64: Use LDP/STP if FEAT_LSE2 is available #11

merged 1 commit into from
May 25, 2022

Conversation

taiki-e
Copy link
Owner

@taiki-e taiki-e commented Apr 30, 2022

aarch64 part of #10

@taiki-e taiki-e added the O-arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state label Apr 30, 2022
@taiki-e taiki-e force-pushed the lse2 branch 3 times, most recently from 568ac2c to 6ddf9be Compare April 30, 2022 18:57
@taiki-e taiki-e changed the title aarch64: Use LDP/STP when FEAT_LSE2 is available aarch64: Use LDP/STP if FEAT_LSE2 is available Apr 30, 2022
@@ -26,6 +26,7 @@ test_task:
- RUSTFLAGS="$RUSTFLAGS -C target-feature=+lse" RUSTDOCFLAGS="$RUSTDOCFLAGS -C target-feature=+lse" cargo test -vv --workspace --exclude asm-test --exclude bench --all-features --release -- -Z unstable-options --report-time
# Use -Z build-std because the prebuilt libtest seems to be incompatible with LTO, causing miscompilation: https://gist.github.com/taiki-e/9713f8e02e8f9f852ccee8d6f089ec24
- RUSTFLAGS="$RUSTFLAGS -C target-feature=+lse" RUSTDOCFLAGS="$RUSTDOCFLAGS -C target-feature=+lse" CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 CARGO_PROFILE_RELEASE_LTO=fat cargo -Z build-std test -vv --workspace --exclude asm-test --exclude bench --all-features --release --tests --target $TARGET -- -Z unstable-options --report-time
# TODO: lse2 is not available on Graviton2 (armv8.2-a)
Copy link
Owner Author

@taiki-e taiki-e Apr 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When Cirrus CI moved to Graviton3 (armv8.5-a? IIRC EDIT: armv8.4-a) we can test this with real aarsh64 machines on CI.

(EDIT: Using Graviton2 actually seems to work too...)

@@ -62,3 +63,4 @@ valgrind_task:
- RUSTFLAGS="$RUSTFLAGS -C target-feature=+lse" RUSTDOCFLAGS="$RUSTDOCFLAGS -C target-feature=+lse" cargo test -vv --workspace --exclude asm-test --exclude bench --all-features --release -Z doctest-xcompile -- -Z unstable-options --report-time
# Use -Z build-std because the prebuilt libtest seems to be incompatible with LTO, causing miscompilation: https://gist.github.com/taiki-e/9713f8e02e8f9f852ccee8d6f089ec24
- RUSTFLAGS="$RUSTFLAGS -C target-feature=+lse" RUSTDOCFLAGS="$RUSTDOCFLAGS -C target-feature=+lse" CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 CARGO_PROFILE_RELEASE_LTO=fat cargo -Z build-std test -vv --workspace --exclude asm-test --exclude bench --all-features --release --tests --target $TARGET -- -Z unstable-options --report-time
# TODO: lse2 is not available on Graviton2 (armv8.2-a)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not confirmed if valgrind can handle CPU features not supported by the host.

@taiki-e
Copy link
Owner Author

taiki-e commented Apr 30, 2022

Multithreaded tests using QEMU with cpu=a64fx failed due to torn reads, but a64fx is armv8.2-a so this is fine.
https://github.com/taiki-e/portable-atomic/runs/6241823005?check_suite_focus=true

However, considering that QEMU with cpu=max also failed, it is also possible that QEMU has not yet properly implemented FEAT_LSE2.
https://github.com/taiki-e/portable-atomic/runs/6241980138?check_suite_focus=true

(Singlethread tests using QEMU succeed)

Comment on lines +177 to +186
// As of rustc nightly-2022-04-30, target_feature "lse2" is not available on rustc side:
// https://github.com/rust-lang/rust/blob/d201c812d40932509b2b5307c0b20c1ce78d21da/compiler/rustc_codegen_ssa/src/target_features.rs#L45
Copy link
Owner Author

@taiki-e taiki-e Apr 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM (and std_detect) support it, so using it is not a problem, but we need to parse rustflags in the build script.

@taiki-e taiki-e marked this pull request as ready for review April 30, 2022 20:30
@taiki-e
Copy link
Owner Author

taiki-e commented Apr 30, 2022

cc @ibraheemdev

Copy link

@ibraheemdev ibraheemdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (from my limited knowledge), thanks!

@taiki-e taiki-e force-pushed the lse2 branch 6 times, most recently from 7cdbb66 to 96f80c6 Compare May 25, 2022 14:18
@taiki-e taiki-e enabled auto-merge (rebase) May 25, 2022 14:50
@taiki-e taiki-e merged commit 4baa9f2 into main May 25, 2022
@taiki-e taiki-e deleted the lse2 branch May 25, 2022 15:24
taiki-e added a commit to taiki-e/atomic-maybe-uninit that referenced this pull request May 30, 2022
taiki-e added a commit to taiki-e/atomic-maybe-uninit that referenced this pull request May 30, 2022
taiki-e added a commit to taiki-e/atomic-maybe-uninit that referenced this pull request May 30, 2022
taiki-e added a commit to taiki-e/atomic-maybe-uninit that referenced this pull request May 30, 2022
taiki-e added a commit to taiki-e/atomic-maybe-uninit that referenced this pull request May 30, 2022
taiki-e added a commit to taiki-e/atomic-maybe-uninit that referenced this pull request May 30, 2022
@taiki-e taiki-e added the O-aarch64 Target: Armv8-A, Armv8-R, or later processors in AArch64 mode label Sep 2, 2024
@taiki-e taiki-e removed the O-arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state label Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-aarch64 Target: Armv8-A, Armv8-R, or later processors in AArch64 mode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants