Send errors metrics for 5xx response from API Gateway, Lambda Function URL, or ALB #229

Merged
10 changes: 10 additions & 0 deletions datadog_lambda/constants.py
@@ -42,3 +42,13 @@ class XrayDaemon(object):
    XRAY_TRACE_ID_HEADER_NAME = "_X_AMZN_TRACE_ID"
    XRAY_DAEMON_ADDRESS = "AWS_XRAY_DAEMON_ADDRESS"
    FUNCTION_NAME_HEADER_NAME = "AWS_LAMBDA_FUNCTION_NAME"


SERVER_ERRORS_STATUS_CODES = {
Collaborator:

You probably don't need these any longer.

    "500": "500 Internal Server Error",
    "501": "501 Not Implemented",
    "502": "502 Bad Gateway",
    "503": "503 Service Unavailable",
    "504": "504 Gateway Timeout",
    "505": "505 HTTP Version Not Supported",
}
19 changes: 18 additions & 1 deletion datadog_lambda/wrapper.py
@@ -8,11 +8,14 @@
import traceback
from importlib import import_module

from ddtrace.constants import ERROR_MSG, ERROR_STACK, ERROR_TYPE

from datadog_lambda.extension import should_use_extension, flush_extension
from datadog_lambda.cold_start import set_cold_start, is_cold_start
from datadog_lambda.constants import (
    XraySubsegment,
    SERVER_ERRORS_STATUS_CODES,
    TraceContextSource,
    XraySubsegment,
)
from datadog_lambda.metric import (
    flush_stats,
@@ -150,6 +153,7 @@ def __call__(self, event, context, **kwargs):
            self._after(event, context)

    def _before(self, event, context):
        self.response = None
Collaborator:

Let's put this line inside the try: block, so that Datadog code avoids crashing the customer application as much as possible. That is, Datadog should fail quietly (only log something) without interrupting the customer application. I know this line is pretty safe, but it may "invite" future developers to add more lines outside the try block because they saw some lines there.
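The fail-quietly pattern described in this comment can be sketched as follows. This is a minimal illustration, not the library's code: `QuietWrapper`, `_setup_instrumentation`, and the log message are hypothetical names chosen for the example.

```python
import logging

logger = logging.getLogger(__name__)


class QuietWrapper:
    """Hypothetical stand-in for the Lambda decorator (not the library's class)."""

    def __init__(self, func):
        self.func = func
        self.response = None

    def _before(self, event, context):
        try:
            # Every instrumentation step, including state resets, sits inside
            # the try block so a failure is logged instead of raised.
            self.response = None
            self._setup_instrumentation(event, context)
        except Exception:
            logger.exception("Instrumentation setup failed")

    def _setup_instrumentation(self, event, context):
        # Hypothetical helper that simulates a bug in instrumentation code.
        raise RuntimeError("simulated instrumentation failure")

    def __call__(self, event, context):
        self._before(event, context)
        self.response = self.func(event, context)
        return self.response


def handler(event, context):
    return {"statusCode": 200}


wrapped = QuietWrapper(handler)
result = wrapped({}, None)  # the handler still runs despite the setup failure
```

Keeping even trivially safe lines inside the try block means the method has one obvious place for new instrumentation code, which is the point the reviewer makes about future contributors.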

        try:
            set_cold_start()
@@ -190,6 +194,19 @@ def _after(self, event, context):
            status_code = extract_http_status_code_tag(self.trigger_tags, self.response)
Collaborator:

I replied in the original conversation, but want to point out that if an invocation fails, self.response still holds the value from the previous successful invocation, so you would end up emitting an error metric based on the previous invocation instead of the current one.
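The stale-state problem raised here can be reproduced with a small sketch. The classes below are hypothetical stand-ins, not the library's wrapper; `seen_status` is an invented attribute that plays the role of the status code read in the post-invocation hook.

```python
class StaleWrapper:
    """Hypothetical wrapper that never resets its stored response."""

    def __init__(self, func):
        self.func = func
        self.response = None
        self.seen_status = None

    def __call__(self, event):
        try:
            self.response = self.func(event)
            return self.response
        finally:
            # Runs even when the handler raises, like a post-invocation hook.
            self.seen_status = (self.response or {}).get("statusCode")


class ResettingWrapper(StaleWrapper):
    """Same wrapper, but clears the response before each invocation."""

    def __call__(self, event):
        self.response = None
        return super().__call__(event)


def handler(event):
    if event.get("fail"):
        raise ValueError("handler crashed")
    return {"statusCode": 500}


def status_after_failed_invocation(wrapper_cls):
    wrapped = wrapper_cls(handler)
    wrapped({})  # first invocation returns a response
    try:
        wrapped({"fail": True})  # second invocation raises
    except ValueError:
        pass
    return wrapped.seen_status


stale = status_after_failed_invocation(StaleWrapper)
fresh = status_after_failed_invocation(ResettingWrapper)
```

In this sketch the non-resetting wrapper reports the first invocation's status code after the second invocation has failed, while the resetting variant reports nothing for the failed invocation.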

            if status_code:
                self.trigger_tags["http.status_code"] = status_code
                if len(status_code) == 3 and status_code.startswith("5"):
                    submit_errors_metric(context)
                    if self.span:
                        self.span.set_traceback()
Collaborator:

I'm not sure you have a valid stack trace in this case, since no real Python exception is being thrown. I suspect you need to set the error message and type fields directly: https://github.com/DataDog/dd-trace-py/blob/fb8dfa2f33fff37d21df9728d8386c0260df9744/ddtrace/contrib/grpc/server_interceptor.py#L41-L42

We should set both fields to something meaningful that helps users understand the problem when they see it. For error.type, we can use StatusCode 5xx; for error.msg, we can say something like "Lambda invocation returns status code 500" (spelling out the actual status code in the message instead of 5xx).

@brettlangdon we are trying to mark Lambda invocations returning statusCode 5xx as errors in the trace. Is what I mentioned above the best way to do it?

Member:

This is what we do in the tracer: https://github.com/DataDog/dd-trace-py/blob/3b91b0da8/ddtrace/contrib/trace_utils.py#L274-L282

tl;dr: you just need to do self.span.error = 1
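Combining the two suggestions above, marking a span as an error without a Python exception might look like the sketch below. `FakeSpan` is a hypothetical stand-in so the example runs without ddtrace, and the tag keys are assumptions taken from the field names used in the review comment rather than values imported from `ddtrace.constants`.

```python
# Assumed tag keys, named after the fields mentioned in the review comment.
ERROR_TYPE = "error.type"
ERROR_MSG = "error.msg"


class FakeSpan:
    """Hypothetical stand-in for a tracer span; only what the sketch needs."""

    def __init__(self):
        self.error = 0
        self.tags = {}

    def set_tags(self, tags):
        self.tags.update(tags)


def is_server_error(status_code):
    # Same check as in the diff: a three-character string starting with "5".
    return len(status_code) == 3 and status_code.startswith("5")


def mark_server_error(span, status_code):
    """Flag the span as an error and attach a descriptive type and message."""
    if not is_server_error(status_code):
        return False
    span.error = 1
    span.set_tags(
        {
            ERROR_TYPE: "StatusCode 5xx",
            ERROR_MSG: f"Lambda invocation returns status code {status_code}",
        }
    )
    return True


span = FakeSpan()
marked = mark_server_error(span, "502")

ok_span = FakeSpan()
not_marked = mark_server_error(ok_span, "200")
```

Building the message from the actual status code, as the first comment proposes, gives users the concrete code in the error message instead of a generic 5xx label.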

                        self.span.error = 1
                        self.span.set_tags(
                            {
                                ERROR_TYPE: "5xx Server Errors",
                                ERROR_MSG: SERVER_ERRORS_STATUS_CODES.get(
                                    status_code, "5xx Server Errors"
                                ),
                            }
                        )
            # Create a new dummy Datadog subsegment for function trigger tags so we
            # can attach them to X-Ray spans when hybrid tracing is used
            if self.trigger_tags:
58 changes: 58 additions & 0 deletions tests/test_wrapper.py
@@ -275,6 +275,64 @@ def lambda_handler(event, context):
            ]
        )

    @patch('datadog_lambda.wrapper.extract_trigger_tags')
    def test_5xx_sends_errors_metric_and_set_tags(self, mock_extract_trigger_tags):
        mock_extract_trigger_tags.return_value = {
            "function_trigger.event_source": "api-gateway",
            "function_trigger.event_source_arn":
                "arn:aws:apigateway:us-west-1::/restapis/1234567890/stages/prod",
            "http.url": "70ixmpl4fl.execute-api.us-east-2.amazonaws.com",
            "http.url_details.path": "/prod/path/to/resource",
            "http.method": "GET",
        }

        @datadog_lambda_wrapper
        def lambda_handler(event, context):
            return {
                "statusCode": 500,
                "body": "fake response body"
            }

        lambda_event = {}

        lambda_handler(lambda_event, get_mock_context())

        self.mock_write_metric_point_to_stdout.assert_has_calls(
            [
                call(
                    "aws.lambda.enhanced.invocations",
                    1,
                    tags=[
                        "region:us-west-1",
                        "account_id:123457598159",
                        "functionname:python-layer-test",
                        "resource:python-layer-test:1",
                        "cold_start:true",
                        "memorysize:256",
                        "runtime:python3.9",
                        "datadog_lambda:v6.6.6",
                        "dd_lambda_layer:datadog-python39_X.X.X",
                    ],
                    timestamp=None,
                ),
                call(
                    "aws.lambda.enhanced.errors",
                    1,
                    tags=[
                        "region:us-west-1",
                        "account_id:123457598159",
                        "functionname:python-layer-test",
                        "resource:python-layer-test:1",
                        "cold_start:true",
                        "memorysize:256",
                        "runtime:python3.9",
                        "datadog_lambda:v6.6.6",
                        "dd_lambda_layer:datadog-python39_X.X.X",
                    ],
                    timestamp=None,
                ),
            ]
        )

    def test_enhanced_metrics_cold_start_tag(self):
        @datadog_lambda_wrapper
        def lambda_handler(event, context):