Disable flushing metrics in thread #120
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Disable
flushing_in_thread
ofthreadstats
by default. Previously, for synchronous metric submission (without piggybacking on the forwarder and logs), thethreadstats
is configured to flush the metrics both at the end of the invocation (blocking) and every 10s in a background thread (non-blocking). This PR disables the periodic background flushing.This change is a soft breaking change, though it shouldn't result in any data loss, it does delay synchronous metric submission for long-running Lambda functions (duration > 10s). In the worst case, metrics submitted at the beginning of an invocation could be delayed for showing up in Datadog for 15 min (the max Lambda duration allowed).
For affected functions, the behavior could be re-enabled by setting env var
DD_FLUSH_IN_THREAD
totrue
. However, given the problematic nature of this flushing behavior, I would recommend submitting metrics through the new Datadog Lambda extension instead.Datadog Forwarder relies on the
datadog-lambda
to flush metrics, so thedatadog-lambda
version needs to be bumped to get the issue fixed for the Datadog Forwarder DataDog/datadog-serverless-functions#436Also fixed the broken and flaky integration tests.
Motivation
A bug has been recently identified (see sample logs below). If there happen to be a background flushing in progress, the end-of-invocation flushing will be skipped (because there is already an in-flight flushing). However, if the background flushing doesn't finish before the AWS freezing the execution environment, it will resume at the beginning of the next invocation. There are two problems with it. First, if AWS decides to destroy the current execution environment, this request will never complete. Second, if the next invocation only happens a few min later (e.g., a cronjob), the request will fail due to connection timeout.
In short, the background thread flushing isn't suitable for Lambda and should be disabled.
Testing Guidelines
Additional Notes
Types of Changes
Check all that apply