pthreads segfault on RHEL 8.5 #36
Lloyd,
I'm grabbing a copy of RHEL 8.5 now. Once I get it set up I'll try to
recreate the problem. This is happening somewhere in the multithreaded
aes-ctr cipher which is annoying as I've done a lot of work on that
lately. As I get more information I'll update you.
Chris
…On 2/14/22 1:01 PM, Lloyd Brown wrote:
I'm building on a RHEL 8.5 image, and keep running into segfaults in the
child process after a connection is made and authenticated. I'm not sure
if the problem is yours, or something having changed with pthreads, etc.
I thought I'd post about it here, and see what happens. If I'm doing
something wrong, I'm happy to take feedback.
I've encountered this problem with the master branch (as of commit
ebf1fee).
Basically, when I launch the /sshd/ daemon
(|/usr/local/openssh-hpn/master/sbin/sshd -ddd -p 2200 -f
/etc/ssh/sshd_config|, in this case), it runs and waits for the
connection. When I connect from another host, it gets all the way
through the authentication, and then the child process that it
"fork()"ed off, segfaults (backtrace below), and the connection closes.
For reference, this is on RHEL 8.5, with GCC 8.5.0, glibc-2.28-164.el8.
I manually ran the configure/make/make install, with the following
syntax on the configure line:
./configure --prefix=/usr/local/openssh-hpn/master \
    --sysconfdir=/etc/ssh/ \
    --with-default-path=/usr/local/bin:/bin:/usr/bin \
    --with-superuser-path=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin \
    --with-md5-passwords --with-pam --with-privsep-path=/var/empty/sshd \
    --with-libedit --with-xauth=/usr/bin/xauth --disable-strip
When I use |gdb| and the core file generated to get a backtrace, here's
what I find:
(gdb) bt
#0  __pthread_cancel (th=0) at pthread_cancel.c:33
#1  0x0000561e20178d77 in stop_and_join_pregen_threads ***@***.***=0x7f77a8ae3010) at cipher-ctr-mt.c:221
#2  0x0000561e20178e8e in ssh_aes_ctr_cleanup (ctx=0x561e21baf280) at cipher-ctr-mt.c:638
#3  0x00007f77b0cee534 in EVP_CIPHER_CTX_reset () from /lib64/libcrypto.so.1.1
#4  0x00007f77b0cee64d in EVP_CIPHER_CTX_free () from /lib64/libcrypto.so.1.1
#5  0x0000561e20178767 in cipher_init ***@***.***=0x561e21b91858, cipher=0x561e2040b400 <ciphers+160>, key=0x561e21b86b70 "\301\367\">e\255\273\235\353Q\363b@{,\314\020\314\303\020\365\231\357\324\364\351\036P\274\215\n}", keylen=16, iv=0x561e21bc3e30 "\302\002F\330.>", ivlen=<optimized out>, do_encrypt=1) at cipher.c:357
#6  0x0000561e2017ffb8 in ssh_set_newkeys ***@***.***=0x561e21b96540, ***@***.***=1) at packet.c:914
#7  0x0000561e201808ef in ssh_packet_send2_wrapped ***@***.***=0x561e21b96540) at packet.c:1252
#8  0x0000561e20180988 in ssh_packet_send2 (ssh=0x561e21b96540) at packet.c:1319
#9  0x0000561e2018213b in sshpkt_send ***@***.***=0x561e21b96540) at packet.c:2741
#10 0x0000561e20197970 in kex_send_newkeys ***@***.***=0x561e21b96540) at kex.c:460
#11 0x0000561e2019ad0c in input_kex_gen_init (type=<optimized out>, seq=<optimized out>, ssh=0x561e21b96540) at kexgen.c:337
#12 0x0000561e2018928a in ssh_dispatch_run ***@***.***=0x561e21b96540, ***@***.***=1, ***@***.***=0x0) at dispatch.c:113
#13 0x0000561e20189359 in ssh_dispatch_run_fatal ***@***.***=0x561e21b96540, ***@***.***=1, ***@***.***=0x0) at dispatch.c:133
#14 0x0000561e20136d1f in process_buffered_input_packets (ssh=0x561e21b96540) at serverloop.c:365
#15 server_loop2 ***@***.***=0x561e21b96540, ***@***.***=0x561e21b98090) at serverloop.c:365
#16 0x0000561e2014106f in do_authenticated2 (authctxt=0x561e21b98090, ssh=0x561e21b96540) at session.c:2642
#17 do_authenticated (ssh=0x561e21b96540, authctxt=0x561e21b98090) at session.c:365
#18 0x0000561e20127ac1 in main (ac=<optimized out>, av=<optimized out>) at sshd.c:2343
(gdb)
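Frame #0 of the backtrace shows pthread_cancel() being called with th=0, reached from the cipher cleanup path. One possible reading is that cleanup ran against a thread handle that had never been filled in by pthread_create(). The sketch below shows a generic way to guard against that case. It is not the code in cipher-ctr-mt.c and not necessarily the project's fix: struct worker, worker_start(), workers_stop() and worker_main() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical bookkeeping: pthread_t has no portable "invalid" value,
 * so record explicitly whether each worker was ever created. */
struct worker {
    pthread_t tid;
    bool started;
};

/* Placeholder worker body: blocks in sleep(), a cancellation point,
 * until the thread is cancelled. */
static void *worker_main(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);
    return NULL;
}

/* Create one worker. Returns 0 on success or the pthread_create() error. */
static int worker_start(struct worker *w)
{
    assert(w != NULL);
    int rc = pthread_create(&w->tid, NULL, worker_main, NULL);
    w->started = (rc == 0);
    return rc;
}

/* Cancel and join only the workers that were actually started.
 * Returns the number of threads joined. */
static size_t workers_stop(struct worker *ws, size_t n)
{
    assert(ws != NULL || n == 0);
    size_t joined = 0;
    for (size_t i = 0; i < n; i++) {
        if (!ws[i].started)
            continue;               /* never created: nothing to cancel */
        pthread_cancel(ws[i].tid);
        pthread_join(ws[i].tid, NULL);
        ws[i].started = false;      /* a second cleanup call is harmless */
        joined++;
    }
    return joined;
}
```

With this pattern, calling workers_stop() on a zero-initialized array is a no-op instead of a call to pthread_cancel() on an unset handle.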
If there are further debugging steps I can take to help isolate this
problem, please let me know. I may be more of a sysadmin than a
developer, but I'll do my best to follow instructions.
Lloyd
Lloyd, I just got RHEL 8.5 running on a VM. This is fresh out of the box with only the updates applied and the necessary libraries (getting libedit-devel was annoying though). I built it with the configuration you gave me. The only thing I did differently from you was run autoconf before ./configure. I was not able to recreate the problem. I tried a few different configurations, settings, and ciphers and everything was working as expected. Did you make any other changes?
This is an NFS-rooted image for deployment on a large HPC cluster. There have been several things that I've had to customize, but I can't think of anything in particular that would affect this. Would it make sense to compare version numbers of specific packages? I'm not sure which would be the most relevant, but I'm happy to try that. I did do a bunch of […]. I'm re-cloning again from scratch, to see if there's anything I accidentally did in the repository that might've had an effect. I tried building based on at least 2 other git tags before using the master branch, so it's possible there was something residual. I'll get back here shortly with the result.
Hmm. Unfortunately I'm getting the same result, after using this newly-cloned copy of the repository:
It's a long shot, but could it be affected by FIPS mode? I do have […]. There could certainly be others, but here are the versions of all the packages that provide any of the paths in the output of ldd for the sshd binary:
# for i in `ldd /usr/local/openssh-hpn/master/sbin/sshd | awk '{print $3}'`; do rpm -q --whatprovides "$i"; done | sort -u
audit-libs-3.0-0.17.20191104git1c2f876.el8.x86_64
glibc-2.28-164.el8.x86_64
libcap-ng-0.7.11-1.el8.x86_64
libxcrypt-4.1.1-6.el8.x86_64
openssl-libs-1.1.1k-5.el8_5.x86_64
pam-1.3.1-15.el8.x86_64
zlib-1.2.11-17.el8.x86_64
#
After I rebooted the node without the |fips=1| I no longer see the problem occurring. I'm able to log in normally.
I'm going to keep testing, and see if I can figure out anything further about what's going on. For reference, this page <https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/security_hardening/assembly_installing-a-rhel-8-system-with-fips-mode-enabled_security-hardening> is RH's official documentation about how to enable FIPS mode, in case you want to verify my findings.
I know that with RHEL7, which shipped OpenSSH 7.4p1, OpenSSH was included in the list of packages that had to be certified for FIPS mode compliance, but with RHEL8, which shipped OpenSSH 8.0p1, it was no longer included <https://www.redhat.com/en/blog/how-rhel-8-designed-fips-140-2-requirements>. I had heard that OpenSSH had started using OpenSSL libs exclusively for its crypto setup, which would explain the change between RHEL7 and RHEL8. I had assumed that would still be true with your HPN-modified code, as long as it was based on something >= OpenSSH v 8.0, but perhaps that isn't a correct assumption.
I'm not suggesting that you necessarily need to fix this, or anything. Just trying to understand the situation, and what the limitations are. Deciding to explicitly not support FIPS mode is a totally understandable response.
Lloyd
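For anyone trying to reproduce this, it can help to confirm whether the kernel is actually in FIPS mode before testing. A minimal sketch follows, assuming a Linux system that exposes /proc/sys/crypto/fips_enabled (which holds 1 when FIPS mode is on). The function name fips_enabled_at() is hypothetical, and the path is passed in as a parameter so the check can be pointed at any file.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if the file at `path` begins with '1', 0 if it begins with
 * any other character, and -1 if it is missing, unreadable, or empty.
 * Intended to be called with "/proc/sys/crypto/fips_enabled". */
static int fips_enabled_at(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;              /* no such file: kernel lacks the knob */
    int c = fgetc(f);
    fclose(f);
    if (c == EOF)
        return -1;
    return c == '1';
}
```

Calling fips_enabled_at("/proc/sys/crypto/fips_enabled") on the affected node and on a working node would show whether the two differ in this setting.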
I'll take a look at that. I haven't even considered what is going on
with FIPS so it could be a problem. I'm not opposed to supporting FIPS
but I'll need to learn more about it.
The problem could be that the multithreaded aes-ctr mode does use
OpenSSL to generate the keystream but XORing the data happens outside of
OpenSSL. So it could be an issue there or it could be an issue with how
I'm handling the threads.
I'm probably not going to get a chance to look at this in the next
couple of days but I do want to figure out what is going on. So please
let me know if you find out anything else. I'll also be keeping this
ticket open until I either make a fix or explicitly decide against it.
Chris
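The split Chris describes, where a cipher library produces the keystream and the data is XORed with it in separate code, can be illustrated with the generic CTR-mode data path below. This is only a sketch of the general technique, not the code in cipher-ctr-mt.c: ctr_xor() is a hypothetical name, and the keystream is simply a caller-supplied buffer rather than real AES output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CTR-mode data path: out[i] = in[i] XOR keystream[i].
 * The keystream would normally be produced elsewhere by encrypting
 * successive counter blocks; here it is just a buffer passed in.
 * The same call both encrypts and decrypts. */
static void ctr_xor(uint8_t *out, const uint8_t *in,
                    const uint8_t *keystream, size_t len)
{
    assert(out != NULL && in != NULL && keystream != NULL);
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ keystream[i];
}
```

Because XOR is its own inverse, applying ctr_xor() twice with the same keystream returns the original bytes. The consequence for this thread is that only the keystream generation passes through the crypto library, while the XOR step runs in the application's own code.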
…On 2/15/22 11:49 AM, Lloyd Brown wrote:
After I rebooted the node without the |fips=1| I no longer see the
problem occurring. I'm able to log in normally.
I'm going to keep testing, and see if I can figure out anything further
about what's going on. For reference, this page
<https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/security_hardening/assembly_installing-a-rhel-8-system-with-fips-mode-enabled_security-hardening>
is RH's official documentation about how to enable FIPS mode, in case
you want to verify my findings.
I know that with RHEL7, which shipped OpenSSH 7.4p1, OpenSSH was
included in the list of packages that had to be certified for FIPS mode
compliance, but with RHEL8, which shipped OpenSSH 8.0p1, it was no
longer included
<https://www.redhat.com/en/blog/how-rhel-8-designed-fips-140-2-requirements>.
I had heard that OpenSSH had started using OpenSSL libs exclusively for
its crypto setup, which would explain the change between RHEL7 and
RHEL8. I had assumed that would still be true with your HPN-modified
code, as long as it was based on something >= OpenSSH v 8.0, but perhaps
that isn't a correct assumption.
I'm not suggesting that you necessarily need to fix this, or anything.
Just trying to understand the situation, and what the limitations are.
Deciding to explicitly *not* support FIPS mode, is a totally
understandable response.
Lloyd
By the way, I did confirm that it is because of FIPS and something to do
with how the multithreaded cipher is interacting with it. In the
meantime, you can disable the multithreaded version by using
-oDisableMTAES=yes when starting the server (or setting it in the
sshd_config). You'll also need to disable it in the client: same option,
but you'd need to add it to the system ssh_config file.
I'm curious as to what's happening here and I will work on it in the
next few days. I need to finish up some packaging for Ubuntu and
Fedora first.
Chris
Chris, Thank you. I can confirm with FIPS mode on, launching using the syntax below, that I can connect successfully with a non-HPN client, which I was not able to do before.
That will probably be an acceptable workaround for my purposes for now, though I am also curious what happens with your further investigations. But I totally understand about the uncertain timeline. Lloyd
I am getting what appears to be this same issue on RHEL8 FIPS boxes (OpenSSL 1.1.1k, no ability to pull OpenSSL 3) using the latest HPNSSH 18.4.1. Here are my findings in general:
At this point, I'm a bit lost where to continue looking or how to resolve this one (as are various teammates who've been helping debug this), and so I'd like to reopen this issue thread for some advice/pointers, and to try to help contribute to a fix if I can. Thanks! Below is a stack trace from GDB if it helps.
And for version info:
So I forgot about FIPS puking on the multithreaded AES. I've accepted your PR and it's moving through the process of making it into master. Have you seen any problems with the default chacha20 cipher? Just curious as that's threaded as well.