Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[grid] Node flag register-shutdown-on-failure #15297

Merged
merged 1 commit into from
Feb 19, 2025
Merged

Conversation

VietND96
Copy link
Member

@VietND96 VietND96 commented Feb 18, 2025

User description

Thanks for contributing to Selenium!
A PR well described will help maintainers to quickly review and merge it

Before submitting your PR, please check our contributing guidelines.
Avoid large PRs, help reviewers by making them as simple and short as possible.

Motivation and Context

register-shutdown-on-failure - If this flag is enabled, the Node will shut down after the register period is completed. This is useful for container environments to restart and register again. If restarted multiple times, the Node container status will be CrashLoopBackOff.

This also adds a message for the event onFailure in RetryPolicy to clarify the behavior of --register-period (After this period is completed, the Node will not attempt to register again), notify the user that needs to take action (e.g restart the Node, or spawn the process again for registration, or increase the default period 120 seconds) based on different situation.

java -jar selenium-server-4.29.0-SNAPSHOT.jar node --register-period 20
...
23:51:26.531 INFO [NodeServer.execute] - Started Selenium node 4.29.0 (revision Unknown): http://192.168.1.101:5555
23:51:26.543 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
23:51:36.563 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
23:51:46.549 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
23:51:46.549 ERROR [NodeServer$2.lambda$start$1] - Registration event failed after period of 20 seconds. Node will not attempt to register again

Enable register-shutdown-on-failure

java -jar selenium-server-4.29.0-SNAPSHOT.jar node --register-shutdown-on-failure --register-period 20
...
07:33:43.417 INFO [NodeServer.execute] - Started Selenium node 4.29.0-SNAPSHOT (revision Unknown): http://192.168.1.101:5555
07:33:43.428 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
07:33:53.450 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
07:34:03.431 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
07:34:03.431 ERROR [NodeServer$2.lambda$start$1] - Registration event failed after period of 20 seconds. Node will not attempt to register again
07:34:03.432 ERROR [NodeServer$2.lambda$start$1] - Shutting down

echo $?
1

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Error Handling

The System.exit(1) call for registration failure may be too abrupt. Consider implementing a graceful shutdown that cleans up resources and allows pending operations to complete.

System.exit(1);
Documentation

The flag description should clarify what happens to in-progress sessions when the node shuts down, and whether there are any precautions users should take before enabling this feature.

description =
    "If enable this flag, after register period is completed, the Node will shutdown itself."
        + " This is useful for container environments to restart and register again. If"
        + " restart multiple times, Node container status will be CrashLoopBackOff.")

Copy link
Contributor

qodo-merge-pro bot commented Feb 18, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Use graceful shutdown instead of force-exit

Use a more graceful shutdown approach instead of System.exit(1). Consider using
server.stop() or similar shutdown hooks to properly cleanup resources.

java/src/org/openqa/selenium/grid/node/httpd/NodeServer.java [225-228]

 if (nodeOptions.getRegisterLoopBack()) {
-  LOG.severe("Shutting down");
-  System.exit(1);
+  LOG.severe("Initiating graceful shutdown");
+  executor.shutdown();
+  server.stop();
 }
  • Apply this suggestion
Suggestion importance[1-10]: 8

__

Why: Using System.exit(1) is a harsh way to terminate and can lead to resource leaks. A graceful shutdown approach would ensure proper cleanup of resources and more reliable container orchestration.

Medium
General
Add detailed error logging

Add error details to the failure log message to help diagnose registration
issues.

java/src/org/openqa/selenium/grid/node/httpd/NodeServer.java [221-224]

 LOG.severe(
   String.format(
-    "Registration event failed after period of %s seconds. Node will not attempt to register again.",
-    nodeOptions.getRegisterPeriod().getSeconds()));
+    "Registration event failed after period of %s seconds. Failure reason: %s. Node will not attempt to register again.",
+    nodeOptions.getRegisterPeriod().getSeconds(),
+    event.getLastFailure().getMessage()));
  • Apply this suggestion
Suggestion importance[1-10]: 7

__

Why: Adding detailed error information would significantly improve debugging capabilities by providing specific reasons for registration failures, making troubleshooting more efficient.

Medium
  • Update

Copy link
Contributor

qodo-merge-pro bot commented Feb 18, 2025

CI Feedback 🧐

(Feedback updated until commit f7ebb01)

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

Action: Ruby / Local Tests (edge, windows) / Local Tests (edge, windows)

Failed stage: Run Bazel [❌]

Failed test name: takes_screenshot-edge

Failure summary:

The action failed due to two main issues:

  • The build system (Bazel) couldn't access a required Ruby toolchain package due to a symlink error in
    Windows: Cannot read link (name=D:_bazel\external\rules_ruby++ruby+ruby\dist\msys64\etc\mtab)
  • The specific target //rb/spec/integration/selenium/webdriver:takes_screenshot-edge failed to build
    because of the missing Ruby toolchain dependency
  • Additionally, no test targets were found when testing was requested

  • Relevant error logs:
    1:  ##[group]Operating System
    2:  Microsoft Windows Server 2022
    ...
    
    478:  �[32mAnalyzing:�[0m 31 targets (210 packages loaded, 1559 targets configured)
    479:  �[32mAnalyzing:�[0m 31 targets (236 packages loaded, 3335 targets configured)
    480:  �[33mDEBUG: �[0mD:/_bazel/external/aspect_rules_js+/npm/private/npm_translate_lock_state.bzl:27:14: 
    481:  WARNING: `update_pnpm_lock` attribute in `npm_translate_lock(name = "npm")` is not yet supported on Windows. This feature
    482:  will be disabled for this build.
    483:  �[32mAnalyzing:�[0m 31 targets (242 packages loaded, 4109 targets configured)
    484:  �[32mAnalyzing:�[0m 31 targets (244 packages loaded, 4109 targets configured)
    485:  �[32mAnalyzing:�[0m 31 targets (244 packages loaded, 7351 targets configured)
    486:  �[31m�[1mERROR: �[0mno such package '@@rules_ruby++ruby+ruby//': error globbing [dist/**/*] op=FILES: Cannot read link (name=D:\_bazel\external\rules_ruby++ruby+ruby\dist\msys64\etc\mtab): unknown link type
    487:  �[31m�[1mERROR: �[0mD:/a/selenium/selenium/rb/spec/integration/selenium/webdriver/BUILD.bazel:31:24: Target '//rb/spec/integration/selenium/webdriver:takes_screenshot-edge' depends on toolchain '@@rules_ruby++ruby+ruby//:toolchain', which cannot be found: no such package '@@rules_ruby++ruby+ruby//': error globbing [dist/**/*] op=FILES: Cannot read link (name=D:\_bazel\external\rules_ruby++ruby+ruby\dist\msys64\etc\mtab): unknown link type'
    488:  �[31m�[1mERROR: �[0mAnalysis of target '//rb/spec/integration/selenium/webdriver:takes_screenshot-edge' failed; build aborted: Analysis failed
    489:  �[32mINFO: �[0mElapsed time: 31.035s, Critical Path: 1.28s
    490:  �[32mINFO: �[0m1 process: 1 internal.
    491:  �[31m�[1mERROR: �[0mBuild did NOT complete successfully
    492:  �[31m�[1mFAILED:�[0m 
    493:  �[31m�[1mERROR: �[0mNo test targets were found, yet testing was requested
    494:  �[0m
    495:  ##[error]Process completed with exit code 1.
    

    @VietND96 VietND96 force-pushed the node-register-loopback branch from 7544113 to 18becb4 Compare February 19, 2025 00:42
    @VietND96 VietND96 changed the title [grid] Node flag register-loopback to shutdown after register period failed [grid] Node flag register-shutdown-on-failure Feb 19, 2025
    @VietND96 VietND96 force-pushed the node-register-loopback branch from 18becb4 to f7ebb01 Compare February 19, 2025 00:45
    @VietND96 VietND96 merged commit 21d8e10 into trunk Feb 19, 2025
    25 of 33 checks passed
    @VietND96 VietND96 deleted the node-register-loopback branch February 19, 2025 01:10
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    1 participant