[AIX] Fix hangs during testing #137967
r? @ChrisDenton rustbot has assigned @ChrisDenton. Use |
LGTM from the AIX perspective. These test cases hang the test run indefinitely at the moment, so this unblocks regular runs.
tests/ui/consts/large_const_alloc.rs
Outdated
@@ -2,6 +2,7 @@
// on 32bit and 16bit platforms it is plausible that the maximum allocation size will succeed
// FIXME (#135952) In some cases on AArch64 Linux the diagnostic does not trigger
//@ ignore-aarch64-unknown-linux-gnu
//@ ignore-aix: FIXME(#137966)
If the system behaves badly on large allocations, then there is nothing to fix here.
- //@ ignore-aix: FIXME(#137966)
+ //@ ignore-aix: alloc failure on AIX can result in SIGKILL instead of nullptr
@daltenty Do you have any idea why it sometimes hangs and sometimes SIGKILLs?
It's not exactly a "hang".
...Is the problem that you literally have 128TiB of RAM?
Hm, wait... laziness in paging due to overcommit, resulting in the system accepting an allocation that can't possibly be honored if it is actually used, on the assumption that no one will call that bluff?
tests/ui/consts/large_const_alloc.rs
Outdated
@@ -2,6 +2,9 @@
// on 32bit and 16bit platforms it is plausible that the maximum allocation size will succeed
// FIXME (#135952) In some cases on AArch64 Linux the diagnostic does not trigger
//@ ignore-aarch64-unknown-linux-gnu
// AIX will allow allow the allocation to go through, and get SIGKILL when zero initializing
- // AIX will allow allow the allocation to go through, and get SIGKILL when zero initializing
+ // AIX will allow the allocation to go through, and get SIGKILL when zero initializing
tests/ui/consts/large_const_alloc.rs
Outdated
@@ -2,6 +2,9 @@
// on 32bit and 16bit platforms it is plausible that the maximum allocation size will succeed
// FIXME (#135952) In some cases on AArch64 Linux the diagnostic does not trigger
//@ ignore-aarch64-unknown-linux-gnu
// AIX will allow allow the allocation to go through, and get SIGKILL when zero initializing
// the overcommited page.
- // the overcommited page.
+ // the overcommitted page.
// AIX will allow allow the allocation to go through, and get SIGKILL when zero initializing
// the overcommited page.
- // AIX will allow allow the allocation to go through, and get SIGKILL when zero initializing
- // the overcommited page.
+ // AIX will allow the allocation to go through, and get SIGKILL when zero initializing
+ // the overcommitted page.
Address nits, squash, and then r=me
bcf78ad to 2a7ad95
Yes, the SIGKILLs are from lazy paging and the OS overcommitting. Addressed the nit and squashed.
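To illustrate the distinction discussed above, here is a minimal sketch of a fallible reservation in Rust. It is not the code under test in `large_const_alloc.rs`; the helper name `try_reserve_bytes` is made up for this example. The reservation can only report failure if the allocator itself refuses the request. On a system that overcommits, the call may succeed for a very large size, and the process can still be killed later when the pages are first written.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: reserves `len` bytes without writing to them.
///
/// An error is returned only when the request is rejected up front
/// (capacity overflow, or the allocator returning null). With OS
/// overcommit, a huge request may be accepted here and fail only when
/// the memory is touched, which this function deliberately never does.
pub fn try_reserve_bytes(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf: Vec<u8> = Vec::new();
    buf.try_reserve_exact(len)?;
    Ok(buf)
}
```

A request of `usize::MAX` bytes is always rejected because it exceeds the maximum capacity of a `Vec`, while a small request such as 4096 bytes is expected to succeed.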
Hm, wait... what are the actual mismatches on the scope ID, exactly?
They have the same scope, but a different zone index? Because it's assigned the 0 ZoneID...? But we create the IP with a scope_id of 0...
I think I included the wrong scope ID in the post. The peer we get back contains a scope ID of 1 for the loopback.
(Edited the scope ID in the PR description to reflect this.)
(gdb) list
41 });
42
43 let server = t!(UdpSocket::bind(&server_ip));
44 tx1.send(()).unwrap();
45 let mut buf = [0];
46 let (nread, src) = t!(server.recv_from(&mut buf));
47 assert_eq!(nread, 1);
48 assert_eq!(buf[0], 99);
49 assert_eq!(compare_ignore_zoneid(&src, &client_ip), true);
50 rx2.recv().unwrap();
(gdb) p src
$2 = core::net::socket_addr::SocketAddr::V6(core::net::socket_addr::SocketAddrV6 {ip: core::net::ip_addr::Ipv6Addr {octets: [0 <repeats 15 times>, 1]}, port: 19603, flowinfo: 0, scope_id: 1})
(gdb) p client_ip
$3 = core::net::socket_addr::SocketAddr::V6(core::net::socket_addr::SocketAddrV6 {ip: core::net::ip_addr::Ipv6Addr {octets: [0 <repeats 15 times>, 1]}, port: 19603, flowinfo: 0, scope_id: 0})
bash-5.2$ cat /etc/hosts | grep ::1
::1 loopback localhost # IPv6 loopback (lo0) name/address
bash-5.2$ ibm-clang++_r test.cpp -o test
bash-5.2$ cat test.cpp
#include <net/if.h>
#include <iostream>
#include <sysexits.h>
int main(void)
{
    auto scope_id = if_nametoindex("lo0");
    std::cout << scope_id << std::endl;
    return EX_OK;
}
bash-5.2$ ./test
1
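The mismatch in the gdb output above (`scope_id: 1` on the received peer versus `scope_id: 0` on the constructed address) can be sketched in isolation. The following is an assumed re-creation of what a helper like `compare_ignore_zoneid` might do, not necessarily the implementation in the PR; `loopback_v6` is a hypothetical helper added only for this example.

```rust
use std::net::{Ipv6Addr, SocketAddr, SocketAddrV6};

/// Assumed sketch of a `compare_ignore_zoneid`-style check: two IPv6
/// socket addresses match if IP, port, and flowinfo agree, whatever
/// their scope IDs are. Any other combination falls back to `==`.
pub fn compare_ignore_zoneid(a: &SocketAddr, b: &SocketAddr) -> bool {
    match (a, b) {
        (SocketAddr::V6(a), SocketAddr::V6(b)) => {
            a.ip() == b.ip() && a.port() == b.port() && a.flowinfo() == b.flowinfo()
        }
        _ => a == b,
    }
}

/// Hypothetical helper: builds `[::1]:port` with the given scope ID.
pub fn loopback_v6(port: u16, scope_id: u32) -> SocketAddr {
    SocketAddr::V6(SocketAddrV6::new(Ipv6Addr::LOCALHOST, port, 0, scope_id))
}
```

With `loopback_v6(19603, 1)` and `loopback_v6(19603, 0)`, plain `==` reports a difference because `SocketAddrV6` equality includes the scope ID, while the relaxed comparison treats them as the same peer.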
Aha.
We probably should be creating these with a scope ID of 1 for the loopback.
For now this change is fine though.
Fixed nits
@mustartt: 🔑 Insufficient privileges: Not in reviewers
Oh, sorry @bors r+ rollup
…kingjubilee
[AIX] Fix hangs during testing
Fixes all current test hangs experienced during CI runs.
1. ipv6 link-local (the loopback device) gets assigned an automatic zone id of 1, causing the assert to fail and hang in `library/std/src/net/udp/tests.rs`
2. Const alloc does not fail gracefully
3. Debuginfo test has problem with gdb auto load safe path
Rollup of 18 pull requests Successful merges: - rust-lang#126856 (remove deprecated tool `rls`) - rust-lang#137314 (change definitely unproductive cycles to error) - rust-lang#137504 (Move methods from Map to TyCtxt, part 4.) - rust-lang#137701 (Convert `ShardedHashMap` to use `hashbrown::HashTable`) - rust-lang#137967 ([AIX] Fix hangs during testing) - rust-lang#138002 (Disable CFI for weakly linked syscalls) - rust-lang#138052 (strip `-Wlinker-messages` wrappers from `rust-lld` rmake test) - rust-lang#138063 (Improve `-Zunpretty=hir` for parsed attrs) - rust-lang#138109 (make precise capturing args in rustdoc Json typed) - rust-lang#138147 (Add maintainers for powerpc64le-unknown-linux-gnu) - rust-lang#138245 (stabilize `ci_rustc_if_unchanged_logic` test for local environments) - rust-lang#138296 (Remove `AdtFlags::IS_ANONYMOUS` and `Copy`/`Clone` condition for anonymous ADT) - rust-lang#138300 (add tracking issue for unqualified_local_imports) - rust-lang#138307 (Allow specifying glob patterns for try jobs) - rust-lang#138313 (Update books) - rust-lang#138315 (use next_back() instead of last() on DoubleEndedIterator) - rust-lang#138318 (Rustdoc: remove a bunch of `@ts-expect-error` from main.js) - rust-lang#138330 (Remove unnecessary `[lints.rust]` sections.) Failed merges: - rust-lang#137147 (Add exclude to config.toml) r? `@ghost` `@rustbot` modify labels: rollup
Rollup of 11 pull requests Successful merges: - rust-lang#135987 (Clarify iterator by_ref docs) - rust-lang#137967 ([AIX] Fix hangs during testing) - rust-lang#138063 (Improve `-Zunpretty=hir` for parsed attrs) - rust-lang#138147 (Add maintainers for powerpc64le-unknown-linux-gnu) - rust-lang#138288 (Document -Z crate-attr) - rust-lang#138300 (add tracking issue for unqualified_local_imports) - rust-lang#138307 (Allow specifying glob patterns for try jobs) - rust-lang#138315 (use next_back() instead of last() on DoubleEndedIterator) - rust-lang#138330 (Remove unnecessary `[lints.rust]` sections.) - rust-lang#138335 (Fix post-merge workflow) - rust-lang#138343 (Enable `f16` tests for `powf`) r? `@ghost` `@rustbot` modify labels: rollup
Rollup of 10 pull requests Successful merges: - rust-lang#135987 (Clarify iterator by_ref docs) - rust-lang#137967 ([AIX] Fix hangs during testing) - rust-lang#138063 (Improve `-Zunpretty=hir` for parsed attrs) - rust-lang#138147 (Add maintainers for powerpc64le-unknown-linux-gnu) - rust-lang#138288 (Document -Z crate-attr) - rust-lang#138300 (add tracking issue for unqualified_local_imports) - rust-lang#138307 (Allow specifying glob patterns for try jobs) - rust-lang#138315 (use next_back() instead of last() on DoubleEndedIterator) - rust-lang#138330 (Remove unnecessary `[lints.rust]` sections.) - rust-lang#138335 (Fix post-merge workflow) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of rust-lang#137967 - mustartt:fix-aix-test-hangs, r=workingjubilee
[AIX] Fix hangs during testing
Fixes all current test hangs experienced during CI runs.
1. ipv6 link-local (the loopback device) gets assigned an automatic zone id of 1, causing the assert to fail and hang in `library/std/src/net/udp/tests.rs`
2. Const alloc does not fail gracefully
3. Debuginfo test has problem with gdb auto load safe path