Skip to content

Change KafkaBackOffException Log Level to Debug #2063

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

tomazfernandes
Copy link
Contributor

Resolves #2009

Currently KafkaBackOffException is being logged at WARN level, which is verbose and creates confusion.

This PR leverages the existing KafkaException.Level structure to change the logging level to DEBUG and remove the coupling between the KafkaMessageListenerContainer and KafkaBackOffException.

The expected KafkaBackoffException's message is being logged at WARN level.

Change the logging level to DEBUG.
@tomazfernandes
Copy link
Contributor Author

This seems to be a proper solution without any meaningful side effects, WDYT?

While looking into this, I noticed that it seems the original exception's message (the one being thrown by the listener in the first place) is not being logged.

Looks like since the recoverer kicks in and has a successful execution (sending to the next topic), the error message never gets logged.

Would it make sense to log every Listener execution failure? The problem is if I understood correctly the original exception never gets logged.

LMKWYT

Thanks!

@tomazfernandes tomazfernandes changed the title Change KafkaBackOffException Log to Debug Change KafkaBackOffException Log Level to Debug Jan 11, 2022
The expected KafkaBackoffException's message is being logged at WARN level.

Make KafkaBackoffException be logged in DEBUG level

Remove coupling between KafkaMessageListenerContainer and KafkaBackoffException
@tomazfernandes
Copy link
Contributor Author

Changed the solution because the previous one would interfere with other exceptions.

My intention here is to remove BackOff related code from the KafkaMessageListenerContainer. This solution duplicates logic, albeit a very small one, from the DefaultErrorHandler, so maybe there's room for improvement there.

But maybe it'd be simpler to just change the logging level directly in the KMLC as it was before 😄

@garyrussell
Copy link
Contributor

I agree the decoupling makes sense; but I don't think we need a new subclass, we could just add

Copy link
Contributor

@garyrussell garyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the decoupling from the KMLC makes sense; but I don't think we need a new subclass, we could just add

/**
* Set the level at which the exception thrown by this handler is logged.
* @param the throwable type.
* @return the level.
*/
protected KafkaException.Level getLogLevel(Class<? extends Throwable> throwable) {
          ...
}

to KafkaExceptionLogLevelAware and add a setter for the backoff exception log level (default DEBUG).

@tomazfernandes
Copy link
Contributor Author

I'm not sure I follow; you mean checking the exception type in the DefaultErrorHandler? This would add the same coupling there. Maybe that's ok, since it's coupling an exception to an error handler, and is a simpler solution.

Another way I see around this would be if KafkaException could, maybe optionally, look into its causes to figure out the root logging level, in case there's a deeper KafkaException. Then we could set the debug logging level into the backoff exception itself. Since KE is responsible for its own logging, maybe that's not a bad complexity for it to have.

@garyrussell
Copy link
Contributor

I mean doing this in the DEH....

SeekUtils.seekOrRecover(thrownException, records, consumer, container, isCommitRecovered(), // NOSONAR
		getRecoveryStrategy(records, thrownException), this.logger, getLogLevel(thrownException.getClass)); // <<<<< pass type

and then, maybe a more general API in KafkaExceptionLogLevelAware

public abstract class KafkaExceptionLogLevelAware {

	private KafkaException.Level logLevel = KafkaException.Level.ERROR;

	private final Map<Class<? extends Throwable>, KafkaException.Level> levels = new HashMap<>();

	{
		this.levels.put(KafkaBackoffException.class, KafkaException.Level.DEBUG);
	}

	/**
	 * Set the level at which the exception thrown by this handler is logged.
	 * @param logLevel the level (default ERROR).
	 */
	public void setLogLevel(KafkaException.Level logLevel) {
		Assert.notNull(logLevel, "'logLevel' cannot be null");
		this.logLevel = logLevel;
	}

	/**
	 * Set the level at which the exception thrown by this handler is logged.
	 * @return the level.
	 */
	protected KafkaException.Level getLogLevel() {
		return this.logLevel;
	}

	protected KafkaException.Level getLogLevel(Class<? extends Throwable> throwable) {
		return this.levels.containsKey(throwable) ? this.levels.get(throwable) : this.logLevel;
	}
	
	public void configureLogLevel(Class<? extends Throwable> throwable, KafkaException.Level level) {
		this.levels.put(throwable, level);
	}

}

I didn't answer your other point.

Would it make sense to log every Listener execution failure? The problem is if I understood correctly the original exception never gets logged.

The reason we added the KafkaExceptionLogLevelAware was because a lot of people complained about the log noise with it being logged on every retry.

But I see your point; since we always recover these failures, they are never logged. Maybe the DLPR needs an option to log the exception after successfully recovering a record (but that doesn't need to be part of this fix).

@garyrussell
Copy link
Contributor

Actually; maybe we don't need any of this additional complexity; with just your change to the KMLC, the retryable topic code could simply set the DEH's log level to DEBUG for all exceptions?

@tomazfernandes
Copy link
Contributor Author

Actually; maybe we don't need any of this additional complexity; with just your change to the KMLC, the retryable topic code could simply set the DEH's log level to DEBUG for all exceptions?

That was my previous attempt, but wouldn't that log every exception the recovery code might throw to DEBUG? Since they're all wrapped into a KE.

@garyrussell
Copy link
Contributor

Yeah - need to take a step back and rethink; recovery exceptions should always log at ERROR, maybe in the FailedRecordTracker directly, regardless of the log level for KEs. I'd rather see such errors logged twice rather than not at all.

@tomazfernandes
Copy link
Contributor Author

and then, maybe a more general API in KafkaExceptionLogLevelAware

The thing with this one is that the KafkaBackOffException comes wrapped in a ListenerExecutionException, so we'd have to either unwrap it or look into nested exceptions for this to work. But I agree since this is a class responsible for log levels it might be a good place to add the coupling, if we were going this way.

@tomazfernandes
Copy link
Contributor Author

tomazfernandes commented Jan 11, 2022

Yeah - need to take a step back and rethink; recovery exceptions should always log at ERROR, maybe in the FailedRecordTracker directly, regardless of the log level for KEs. I'd rather see such errors logged twice rather than not at all.

Actually, I stand corrected. There is logic in the SeekUtils class that logs the error regardless of it being a KE or not.

  catch (Exception ex) {
	  if (isBackoffException(ex)) {
		  logger.debug(ex, () -> ListenerUtils.recordToString(record)
				  + " included in seeks due to retry back off");
	  }
	  else {
		  logger.error(ex, () -> "Failed to determine if this record ("
				  + ListenerUtils.recordToString(record)
				  + ") should be recovererd, including in seeks");
	  }
	  skipped.set(false);
  }

So for the task at hand I think setting the DHE log level to DEBUG in the retry topic logic seems to be the right call.

@tomazfernandes
Copy link
Contributor Author

I opened a new PR with the DHE solution - that worked well in my tests. If that's ok, we can close this one.

@tomazfernandes
Copy link
Contributor Author

The reason we added the KafkaExceptionLogLevelAware was because a lot of people complained about the log noise with it being logged on every retry.

But I see your point; since we always recover these failures, they are never logged. Maybe the DLPR needs an option to log the exception after successfully recovering a record (but that doesn't need to be part of this fix).

We could log the error message just before we either send the record to the DLT or give up processing - that way nothing gets logged if any of the retrials work, and gets logged only once if none.

@garyrussell
Copy link
Contributor

Replaced by #2065

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

@RetryableTopic appear topic not ready for consumption, backing off for approx
2 participants