Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GH-445] Implemented OpenSSH strict KEX protocol #446

Closed
wants to merge 2 commits into from

Conversation

lgoldstein
Copy link
Contributor

@lgoldstein lgoldstein commented Dec 22, 2023

@tomaswolf @gnodet I value your opinion so I would appreciate it if you could take a look at this. I had some doubts as to some pieces of code (don't want to bias your review so I won't say where) so I would really appreciate a fresh set of eyes on this. I apologize for the seemingly large diff on AbstractSession, but it is not my doing - the Maven formatting plugin we use did this (for whatever reason...). However, I believe it is still quite readable.

As far as testing goes, I have tested it with OpenSSH_8.9p1 Ubuntu-3ubuntu0.5, OpenSSL 3.0.2 15 Mar 2022. I do plan on adding unit tests for the code, but before I do that I prefer having your feedback, in case more changes are necessary. In this context, I do encourage you to test both client and server interoperability with this code as well - perhaps on a different OpenSSH version. Note that the feature requires turning it on explicitly - I have added a custom CLI option that you can use for this purpose with the CLI code - -o UseStrictKex=yes

@lgoldstein lgoldstein force-pushed the GH-445 branch 2 times, most recently from 44015d4 to ad699b0 Compare December 22, 2023 09:45
@lgoldstein lgoldstein force-pushed the GH-445 branch 6 times, most recently from 379df82 to 1d0e421 Compare December 22, 2023 14:15
@lgoldstein
Copy link
Contributor Author

I have also added unit tests (except one) - please review them as well and let me know if you think there is a use-case that is not covered by them

@tomaswolf
Copy link
Member

tomaswolf commented Dec 27, 2023

I find this very hard to review.

  1. That reformatting should be avoided. It indicates some problem with the configuration in your IDE or in your maven build. Please ensure that your IDE uses the same settings as the maven build, then such spurious reformatting should not happen.
  2. There is completely unrelated stuff in this change. These things must not be in this change at all.
    • The command execution timeout looks suspicious to me. At least the timeout for running a command should by default be zero, as it was before.
    • There was some refactoring grouping together isolated variables related to KEX (SessionCountersDetails/SessionKexDetails). Probably not needed to implement strict KEX.
  3. I don't think a custom SSH config property "UseStrictKex" is needed.
  4. In fact I don't think any customization flag is needed at all. (Also not in CoreModuleProperties.)
  5. This needs tests against OpenSSH 9.6. A container test once that OpenSSH version is available in some Linux repositories; until then, at least manual testing. We do have various container tests using older OpenSSH versions, so that bit of interoperability should already be covered.

As I understand it:

Both parties can unconditionally include the "strict kex" flag in their initial KEX_INIT message. "strict kex" is active if and only if both parties announce it in their initial KEX_INIT proposal. So both parties know whether strict kex is active once they have received the peer's proposal. If a party concludes that strict kex is active, it can:

  • check that the receive sequence number of the peer's initial KEX_INIT is 1. If not, there were earlier messages, and they disconnect.
    • It does not seem necessary to actively avoid sending IGNORE/DEBUG messages before one's own initial KEX_INIT. Normally that shouldn't happen anyway. If it is needed, it could be done unconditionally (i.e., irrespective of whether strict kex will be active).
  • as long as initialKexDone == false, only allow KEX messages. Reception of any other message causes the party to disconnect.
  • When a party sends its own NEW_KEYS message, it resets the send sequence number to zero after having encoded the NEW_KEYS message itself. (So basically where it installs the new keys it also resets the message sequence counter.)
  • When a party receives a NEW_KEYS message, it resets the receive sequence number to zero after having decoded the message. (Same here: where we install the new keys, we also reset the counter.)

Protocol-wise, this should be a fairly simple change. I would refrain from any beautification or not strictly needed refactoring; such things can be done in later changes once the basic modified protocol works. Probably unit tests would be the bulk of the change.

@lgoldstein
Copy link
Contributor Author

That reformatting should be avoided. It indicates some problem with the configuration in your IDE or in your maven build. Please ensure that your IDE uses the same settings as the maven build, then such spurious reformatting should not happen.

As I have indicated in my original posting, the reformatting was definitely not done by my IDE but by the Maven plugin. as can be clearly seen in the PR I have not changed its definitions in any way, so why it decided to do what it did is beyond me.

@lgoldstein
Copy link
Contributor Author

There is completely unrelated stuff in this change. These things must not be in this change at all.

Let's agree to disagree on this. My philosophy is that if, while changing a piece of code, one sees another piece of code that can be improved it can be considered. I agree with you that the extra change should not be too big as to shadow the original goal and introduce too much "noise". How big is "too big" ? How much is "too much" ? that's a judgment call. I believe that the unrelated changes introduced in this PR do not pose the risk of drowning the original intention. Obviously, you disagree - and that's fine too.

@lgoldstein
Copy link
Contributor Author

The command execution timeout looks suspicious to me. At least the timeout for running a command should by default be zero, as it was before.

Well put - will fix this

@lgoldstein
Copy link
Contributor Author

There was some refactoring grouping together isolated variables related to KEX (SessionCountersDetails/SessionKexDetails). Probably not needed to implement strict KEX.

Here I have to disagree with you. My open source philosophy puts a great emphasis on the word "open". This is why 99.999% of the code uses public or protected visibility. Furthermore, I want to provide the user as much flexibility as possible as well as the maximum information with the least amount of effort for the user. That being said, the 2 classes you mention make testing very easy since they exposes the internal counters in an easy-to-use manner instead of having to sub-class the session in a cumbersome manner.

@lgoldstein
Copy link
Contributor Author

I don't think a custom SSH config property "UseStrictKex" is needed

How else can one use our CLI to easily (emphasis on easily) activate/deactivate the feature ?

@lgoldstein
Copy link
Contributor Author

In fact I don't think any customization flag is needed at all. (Also not in CoreModuleProperties.)

In this issue I have to disagree wholeheartedly. As I have explained in a response to others - this is a major change in the code that has not gained enough "mileage" in the "wild" since we have not released it. It may contain some subtle bug that neither you nor i have detected. Therefore, we must have a "kill switch" for it so that our software is not rendered inoperable and then we have to scramble to fix it. This way, if a subtle bug is indeed detected the users can simply disable it, and they would simply revert to the situation before the vulnerability was detected - but at least something would be working.

You could make the case that the default for the property should be on. It is a valid claim, but I personally do not feel confident enough to do so in view of the major behavior change it entails. If, after running more tests, you should feel comfortable with making it active by default - please do so in a future code change. I will not object to it...

@lgoldstein
Copy link
Contributor Author

It does not seem necessary to actively avoid sending IGNORE/DEBUG messages before one's own initial KEX_INIT. Normally that shouldn't happen anyway. If it is needed, it could be done unconditionally (i.e., irrespective of whether strict kex will be active).

Perhaps, but since I don't want to rely on any timeouts or counters I thought it wise to prevent this explicitly. I am a fan of the adage

Just because you are paranoid does not mean that nobody is after you...

@lgoldstein
Copy link
Contributor Author

This needs tests against OpenSSH 9.6

Agreed - like I said, I cannot/do not have enough time to obtain and install this. The server I do have (version 8.9) is equipped with this feature and I tested the code (successfully) with it. I welcome your testing it with 9.6 if you can

@lgoldstein
Copy link
Contributor Author

Probably unit tests would be the bulk of the change

See StrictKexTest class

@lgoldstein
Copy link
Contributor Author

check that the receive sequence number of the peer's initial KEX_INIT is 1. If not, there were earlier messages, and they disconnect.

Done

as long as initialKexDone == false, only allow KEX messages. Reception of any other message causes the party to disconnect.

You know the KEX code better than I do. I was not aware of this - please feel free to introduce it later on. However, I do have to ask why this is needed if the first step is implemented

When a party sends its own NEW_KEYS message, it resets the send sequence number to zero after having encoded the NEW_KEYS message itself. (So basically where it installs the new keys it also resets the message sequence counter.)

Done

When a party receives a NEW_KEYS message, it resets the receive sequence number to zero after having decoded the message. (Same here: where we install the new keys, we also reset the counter.)

Done

@lgoldstein
Copy link
Contributor Author

Protocol-wise, this should be a fairly simple change.

:-) not so simple ...

I would refrain from any beautification or not strictly needed refactoring; such things can be done in later changes once the basic modified protocol works.

A matter of personal taste ... I respect and value your contribution to the project enormously, and usually agree with 99.99999999% of the time - this is the 0.00000000001% :-)

@tomaswolf
Copy link
Member

As you're a committer and PMC member I cannot stop you from merging this, but I fundamentally disagree with this change.

A PR should have one concern ("provide strict KEX"), and commits in a PR also should have one concern ("implement strict KEX functionality", "provide unit tests for strict KEX", "provide container tests (integration tests) for strict KEX").

Please note that OpenSSH 8.9 does not have the strict KEX feature. The first public version that does have it is OpenSSH 9.6. See the OpenSSH release notes. Or did you build your own version of 8.9 with the strict KEX patch applied?

As for the reformatting: of course I don't know how it happened... it looks to me as if your IDE reformatted the whole file to line length 80 on save. We normally use a line length of 120 characters.

Implementing strict KEX is about a 20, maybe 30-line change in AbstractSession. And then there may be hundreds of lines for tests. I'll try to put together a simpler PR over New Year.

@lgoldstein
Copy link
Contributor Author

lgoldstein commented Dec 29, 2023

Please note that OpenSSH 8.9 does not have the strict KEX feature. The first public version that does have it is OpenSSH 9.6. See the OpenSSH release notes. Or did you build your own version of 8.9 with the strict KEX patch applied?

I don't know what to tell you - when i run sshd -V it claims to be 8.9, and when I run the code it claims to have strict KEX... Maybe there is a subtle bug in my code - will try to look at it again

@lgoldstein
Copy link
Contributor Author

lgoldstein commented Dec 29, 2023

As you're a committer and PMC member I cannot stop you from merging this, but I fundamentally disagree with this change.

Then I won't - until at least we reach some agreement

Implementing strict KEX is about a 20, maybe 30-line change in AbstractSession. And then there may be hundreds of lines for tests. I'll try to put together a simpler PR over New Year.

I will definitely wait for it - no rush, you don't have to work over the New Year...

@lgoldstein
Copy link
Contributor Author

As for the reformatting: of course I don't know how it happened... it looks to me as if your IDE reformatted the whole file to line length 80 on save. We normally use a line length of 120 characters.

I will look into it - I don't remember setting it up this way, but who knows ...

@lgoldstein
Copy link
Contributor Author

lgoldstein commented Dec 29, 2023

it looks to me as if your IDE reformatted the whole file to line length 80 on save.

Does not look that way...
image

@lgoldstein
Copy link
Contributor Author

Here are more settings - as can be seen, the "Format source code" option is disabled. There is a very limited set of actions that it is set up to do on save

image

@lgoldstein
Copy link
Contributor Author

lgoldstein commented Dec 29, 2023

I'll try to put together a simpler PR over New Year.

I just had a thought - maybe you can test your code's interoperability with mine as well as OpenSSH's and thus perhaps detect bugs (in either of our PRs). Should be relatively easy to test your client with my server and vice versa...

@lgoldstein
Copy link
Contributor Author

lgoldstein commented Dec 29, 2023

I cannot stop you from merging this, but I fundamentally disagree with this change.

Implementing strict KEX is about a 20, maybe 30-line change in AbstractSession

Just to make clear - I am not offended in any way - quite the opposite. Like I said, I have tremendous respect for your work despite our differences in style and since you feel so strongly about it I will respect that - especially since you want to work on your own PR counter-proposal. I think this is a very positive outcome since we are both committed (no pun intended :-))) to the quality of our code.

I do want to stress 2 things:

  1. the importance of thorough unit tests - regardless of your opinion on the way I propose to fix the issue, my own experience is that the unit tests helped find some problems in the code I wrote.

  2. I feel I must insist on having a "kill switch" for this feature (i.e., a CoreModuleProperties entry). Like I said, it is a major change in the protocol that not all clients/servers support (yet). There may be a bug in our implementation, or more importantly in theirs. We want to give our users the capability to disable it globally or for a specific session (e.g., one that attempts to communicate with an improperly/faulty implemented peer). As I have stated, if you prefer making it ENABLED by default I will not object, but IMO a kill switch we must have.

In this context, please make sure that the tests include both ENABLED and DISABLED state for the kill switch in order to make sure it works properly - i.e., enabled/disabled client/server vs a disabled/enabled counterpart

@ecki
Copy link

ecki commented Dec 29, 2023

Debian bookworm backported the strictkex fix, so you could use that container (or maybe pinning Sid, it has the upstream fix, but that’s harder to keep available in the long run)

@tomaswolf
Copy link
Member

Please note that OpenSSH 8.9 does not have the strict KEX feature. The first public version that does have it is OpenSSH 9.6. See the OpenSSH release notes. Or did you build your own version of 8.9 with the strict KEX patch applied?

I don't know what to tell you - when i run sshd -V it claims to be 8.9, and when I run the code it claims to have strict KEX... Maybe there is a subtle bug in my code - will try to look at it again

Seems like the Linux distro package maintainers have backported the strict kex patch also to older versions. For Ubuntu, I see it having been applied to OpenSSH versions 8.2, 8.9, 9.3, and in debian for 8.4 and 9.2.

@lgoldstein
Copy link
Contributor Author

Seems like the Linux distro package maintainers have backported the strict kex patch also to older versions. For Ubuntu, I see it having been applied to OpenSSH versions 8.2, 8.9, 9.3, and in debian for 8.4 and 9.2.

Great - then it means you will be able to test your PR's interoperability not only with my code but also with an official OpenSSH implementation.

@tomaswolf
Copy link
Member

An alternate single-purpose PR that only implements strict KEX and nothing else is available and ready for review now at #449. Not claiming it was "better", but it's much more focused. It includes interoperability tests with OpenSSH with and without strict KEX.

@lgoldstein lgoldstein closed this Jan 6, 2024
@lgoldstein lgoldstein deleted the GH-445 branch January 6, 2024 08:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants