Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Improve flexibility of the MaxRetriesFailureHandler #554

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

beilCrxmarkets
Copy link
Contributor

Enhance Flexibility in MaxRetriesExceededHandler Configuration

This pull request introduces modifications to the MaxRetriesFailureHandler to enhance the flexibility and action possibilities within the maxRetriesExceededHandler. Previously, invoking executionOperations.stop() universally, even when a maxRetriesExceededHandler was defined, limited the potential actions, such as task rescheduling.
Key Changes:

  • The maxRetriesExceededHandler can now decide independently whether to remove the task, allowing for more nuanced handling (e.g., rescheduling tasks).
  • These modifications represent a breaking change, affecting the behavior of the MaxRetriesFailureHandler. With no defined handler, the behavior remains unchanged via a default handler executing executionOperations.stop().
  • We considered adding a RetriesFailureHandler, which would function similarly to MaxRetriesFailureHandler but without automatic invocation of executionOperations.stop(). However, to avoid user confusion on choosing between handlers, this was not pursued.

Fixes

#538

Reminders

  • Added/ran automated tests
  • Update README and/or examples
  • Ran mvn spotless:apply

cc @kagkarlsson

@beilCrxmarkets beilCrxmarkets changed the title Improve flexibility of the MaxRetriesFailureHandler feat: Improve flexibility of the MaxRetriesFailureHandler Nov 29, 2024
Comment on lines 96 to 99
executionOperations.stop();
maxRetriesExceededHandler.accept(executionComplete, executionOperations);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure about the original intent of the maxRetriesExceededHandler. The handler should not have taken executionOperations as a parameter since the execution had already been stopped.

@kagkarlsson
Copy link
Owner

What is your use-case if I might ask? (what operation rather than stop() are you considering)

@beilCrxmarkets
Copy link
Contributor Author

What is your use-case if I might ask? (what operation rather than stop() are you considering)

If the executionOperation is not stoped jet, we could pass new FailureHandler.OnFailureReschedule<Void>(schedule)::onFailure or new FailureHandler.OnFailureRetryLater<Void>(schedule)::onFailure for the maxRetriesExceededHandler.

Our RecurringTask is triggered once a day. If this fails, we want to repeat it. If we exceed the maxRetires, we still want the recurring task to be triggered the next day.

@kagkarlsson
Copy link
Owner

Not sure I understand. Why are you using MaxRetriesFailureHandler if you do not want the "max"-behavior?

You can always implement your own fully custom failurehandler..

@beilCrxmarkets
Copy link
Contributor Author

Not sure I understand. Why are you using MaxRetriesFailureHandler if you do not want the "max"-behavior?

You can always implement your own fully custom failurehandler..

At the moment we use our own implemented FailureHandler to get the desired behavior.

However, I assume that we are not the only ones who need the described behavior, which is why it makes sense to store it in a central location.

I agree with you that the name “MaxRetriesFailureHandler” implies that after the maximum number of attempts is reached, the instance is terminated. We have therefore also considered proposing to add an additional FailureHandler with the name “RetriesFailureHandler”. However, the existence of a “MaxRetriesFailureHandler” and a “RetriesFailureHandler” can be confusing for beginners at first glance, as the difference is not 100% obvious.

If we were to start this project from scratch, it would make sense to add only the “RetriesFailureHandler” instead of the “MaxRetriesFailureHandler”, as the latter offers more flexibility. Since the library and thus the “MaxRetriesFailureHandler” have been around for a while and are used by several users as they are, the compromise would be to adapt the “MaxRetriesFailureHandler” accordingly so that it offers more flexibility.

@kagkarlsson
Copy link
Owner

Are you looking for behavior like:

  • up until X retries, use this FailureHandler
  • after X retries, use this other FailureHandler
    ?

@beilCrxmarkets
Copy link
Contributor Author

Are you looking for behavior like:

  • up until X retries, use this FailureHandler
  • after X retries, use this other FailureHandler
    ?

Yes, our task will be triggered once a day. If this fails, we want to try again 5 times shortly after (RetriesFailureHandler). After all 5 attempts are used up, we want to try again tomorrow at the usual time (OnFailureReschedule).

However, with the current implementation of the MaxRetriesFailureHandler, we cannot do this because the task is deleted from the DB after the 5 attempts are exhausted.

We also combine this with the ExponentialBackoffFailureHandler to get a delay between the 5 additional attempts. This does not cause any problems.

@beilCrxmarkets
Copy link
Contributor Author

@kagkarlsson How should we continue with this topic?

As stated in #538 (comment) @adamalexandru4 has the same issue.

@kagkarlsson
Copy link
Owner

kagkarlsson commented Mar 8, 2025

I don't want to add this functionality in to MaxRetries...

And it is not meant to handle a secondary FailureHandler, the signature is:

    public MaxRetriesFailureHandler(
        int maxRetries,
        FailureHandler<T> failureHandler,
        BiConsumer<ExecutionComplete, ExecutionOperations<T>> maxRetriesExceededHandler) {
      this.maxRetries = maxRetries;
      this.failureHandler = failureHandler;
      this.maxRetriesExceededHandler = maxRetriesExceededHandler;
    }

i.e. BiConsumer<ExecutionComplete, ExecutionOperations<T>>

We might add a warning if the type supplied is actually a FailureHandler?

(It might be there is a better name for MaxRetries..., not sure but since you both made that assumption...)

Your requirement I think should be captured in its own handler. If you want to supply a standard one for the project that is fine 👍. But what should it be called 😅. CompositeFailureHandler.primary(handler1).afterTries(5, handler2) (.afterTime(Duration.ofHours(1), handler3))` (just toying with ideas)

@beilCrxmarkets
Copy link
Contributor Author

I have reverted the changes made to the MaxRetriesFailureHandler and introduced a new SequenceFailureHandler equipped with builder functionality.

The SequenceFailureHandler facilitates the creation of a sequence involving two FailureHandlers. Initially, the primaryFailureHandler is invoked on the first failure, followed by the secondaryFailureHandler for subsequent failures.

While we can consider alternate names, I believe "sequence" aptly describes the concept of two failure handlers operating sequentially. In contrast, "composite" suggests a combination, which may not convey the intended functionality.

However, even with this setup, we cannot achieve the desired behavior. The limitation arises because the MaxRetriesFailureHandler within the SequenceFailureHandler removes the task execution. As a workaround, I've introduced an optional parameter primaryHandlerRetryCount to allow retries for the primaryFailureHandler.

This can create the wanted behavior via

SequenceFailureHandler.builder()
  .primary(new ExponentialBackoffFailureHandler(Duration.ofSeconds(3), 1.0))
  .afterTries(5, OnFailureRescheduleUsingTaskDataSchedule())
  .build()

Despite this solution, I remain uncertain if it constitutes the optimal API, given its inherent limitations.
An alternative approach might be the adoption of a list instead of the primaryFailureHandler, illustrated as follows:

  class SequenceFailureHandler<T> {

    private final List<FailureHandler<T>> failureHandlerSequence = new ArrayList<>();
    private final FailureHandler<T> fallbackFailureHandler;

    public SequenceFailureHandler(
        List<FailureHandler<T>> failureHandlerSequence, FailureHandler<T> fallbackFailureHandler) {
      this.failureHandlerSequence.addAll(failureHandlerSequence);
      this.fallbackFailureHandler = fallbackFailureHandler;
    }

    public void onFailure(
        ExecutionComplete executionComplete, ExecutionOperations<T> executionOperations) {
      int consecutiveFailures = executionComplete.getExecution().consecutiveFailures;
      if (consecutiveFailures < failureHandlerSequence.size()) {
        FailureHandler<T> failureHandler = failureHandlerSequence.get(consecutiveFailures);
        failureHandler.onFailure(executionComplete, executionOperations);
      } else {
        fallbackFailureHandler.onFailure(executionComplete, executionOperations);
      }
    }

    public static <T> List<FailureHandler<T>> times(int times, FailureHandler<T> failureHandler) {
      List<FailureHandler<T>> failureHandlers = new ArrayList<>();
      for (int i = 0; i < times; i++) {
        failureHandlers.add(failureHandler);
      }
      return failureHandlers;
    }
  }

This can be implemented as demonstrated below:

  private static void main() {
    new SequenceFailureHandler<>(
      times(5, new ExponentialBackoffFailureHandler<>(Duration.ofSeconds(3), 1.0)),
      new OnFailureReschedule<>(new CronSchedule("*****")));
  }

What do you think?

@beilCrxmarkets
Copy link
Contributor Author

I've discovered that our current approach will only partially resolve our issues. We need specific error handling behavior:

Retry the operation 5 times with a delay of 3 seconds between attempts. If the operation still fails, reschedule it for the next execution time, typically the following day, and repeat the process.

Currently, the consecutiveFailures count does not reset to 0 after rescheduling to the next day. As a result, if there is a failure the next day, the task is rescheduled immediately instead of attempting it 5 times.

You don't encounter this issue with the MaxRetriesFailureHandler, as it terminates the task when a certain number of consecutive failures is reached.

This problem isn't resolved by the SequenceFailureHandler, which utilizes a list. Once the end of the list is reached, it simply triggers the fallbackFailureHandler, causing the task to be rescheduled immediately.

I believe we need a LoopFailureHandler which accepts a list of FailureHandler objects and starts over with the first handler once the end of the list is reached.

Alternatively, we could implement a retry functionality using the modulo/remainder operator as follows:

    public void onFailure(
        final ExecutionComplete executionComplete,
        final ExecutionOperations<T> executionOperations) {
      int consecutiveFailures = executionComplete.getExecution().consecutiveFailures;
      int totalNumberOfFailures = consecutiveFailures + 1;
      if (totalNumberOfFailures % maxRetries == 0) {
        LOG.error(
            "Execution has failed {} times for task instance {}. Cancelling execution.",
            totalNumberOfFailures,
            executionComplete.getExecution().taskInstance);
        executionOperations.stop();
        maxRetriesExceededHandler.accept(executionComplete, executionOperations);
      } else {
        this.failureHandler.onFailure(executionComplete, executionOperations);
      }
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants