Infer API Gateway spans #172

Merged
Changes from 25 commits
33 commits
19f9da3
Infer spans from API Gateway events
Jul 26, 2021
c4cedfa
Adding some prints. remove later
Jul 27, 2021
2cfaee7
Change some info on the API Gateway span
Jul 27, 2021
e077667
Merge branch 'main' into chris.agocs/inferring_api_gateway_spans_poc
Sep 7, 2021
c1f11ee
Rename something
Sep 7, 2021
0cd45fd
black
Sep 8, 2021
40c79ca
>:(
Sep 8, 2021
c4d751e
black
Sep 8, 2021
96b7844
fix time
Sep 8, 2021
586033d
Support various API Gateway, HTTPAPI, and Websocket events
Sep 10, 2021
7d91108
black
Sep 10, 2021
2605f2e
Add DD_INFERRED_SPANS env var to turn inferred spans on and off
Sep 10, 2021
4ecc91d
infer spans in integration tests
Sep 10, 2021
ceea1cc
specify which env var to set true in order to enable inferred spans
Sep 10, 2021
f180439
try setting inferred span name to inferred span URL
Sep 10, 2021
ef9439c
s/beta/experimental/
Sep 14, 2021
07d2e97
Correctly create spans in separate services, assuming the extension i…
Sep 17, 2021
99ca466
Remove function_name
Sep 17, 2021
dd7840e
Flush after closing spans
Sep 21, 2021
33274e3
black
Sep 21, 2021
8503277
black
Sep 22, 2021
279ffe4
update snapshots
Sep 22, 2021
a580e45
Make the snapshots valid json
Sep 22, 2021
df376c9
merge integration tests
Sep 22, 2021
3a3cc4e
black
Sep 22, 2021
e277053
Remove the inferredSpansFilter
Oct 25, 2021
657ee36
Refactor inferred-span event type detection to use the trigger event …
Oct 25, 2021
a6170ff
remove unused import
Oct 25, 2021
e75a3a0
lines too long >=(
Oct 25, 2021
5bd5cb5
Merge main and blow away the integration test snapshots
Oct 25, 2021
38b6987
Finish refactor using _EventSource object
Oct 26, 2021
a1c0b9d
lol, remove println debugging
Oct 26, 2021
b325ee7
Update snapshots
Oct 26, 2021
11 changes: 11 additions & 0 deletions README.md
@@ -87,6 +87,17 @@ Initialize the Datadog tracer when set to `true`. Defaults to `false`.

Set to `true` to merge the X-Ray trace and the Datadog trace, when using both the X-Ray and Datadog tracing. Defaults to `false`.

### DD_INFERRED_SPANS (experimental)

Inferred Spans are spans that Datadog can create based on incoming event metadata.
Set `DD_INFERRED_SPANS` to `true` to infer spans based on Lambda events.
Inferring upstream spans is only supported if you are using the [Datadog Lambda Extension](https://docs.datadoghq.com/serverless/libraries_integrations/extension/).
Contributor

Is this actually true? I know we said we wouldn't do any specific work to make it function in the forwarder, but the span will still get generated. I think I originally said the forwarder might remap the service name, but I don't know if we verified that.

Contributor Author

We did verify it. The forwarder adds tags from the tags cache to the trace payloads it sends back. If we want, we can always fix this and add forwarder support later on.

Collaborator

Hmm, I guess the forwarder applies the service tag to all the trace payloads from the same event object? If so, even though the trace payload carrying the inferred span has no Lambda function ARN, does the forwarder still assume it comes from the same function as the other payloads and apply the tag? If yes, we would need to make some changes in the forwarder, hopefully small ones.

Defaults to `false`.
Infers spans for:
- API Gateway REST events
- API Gateway websocket events
- HTTP API events
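
As a minimal sketch of how the flag is evaluated (mirroring the check this PR adds to `wrapper.py`), inferred spans are enabled only when `DD_INFERRED_SPANS` is the string `true`, compared case-insensitively, and the extension is in use. The helper name `inferred_spans_enabled` and the `extension_running` parameter are hypothetical stand-ins for illustration, not part of the library.

```python
import os


def inferred_spans_enabled(extension_running, environ=None):
    """Return True only if DD_INFERRED_SPANS is "true" and the extension is in use.

    `extension_running` is a hypothetical stand-in for the library's own check.
    """
    environ = os.environ if environ is None else environ
    flag = environ.get("DD_INFERRED_SPANS", "false").lower() == "true"
    return flag and extension_running
```

Any value other than `true` (including an unset variable) leaves the feature off, which matches the documented default of `false`.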

## Opening Issues

If you encounter a bug with this package, we want to hear about it. Before opening a new issue, search the existing issues to avoid duplicates.
147 changes: 147 additions & 0 deletions datadog_lambda/tracing.py
@@ -6,6 +6,7 @@
import logging
import os
import json
from enum import Enum

from datadog_lambda.constants import (
    SamplingPriority,
@@ -19,17 +20,48 @@
)
from ddtrace import tracer, patch
from ddtrace import __version__ as ddtrace_version
from ddtrace.filters import TraceFilter
from ddtrace.propagation.http import HTTPPropagator
from datadog_lambda import __version__ as datadog_lambda_version

logger = logging.getLogger(__name__)


SPAN_TYPE_TAG = "_dd.span_type"
SPAN_TYPE_INFERRED = "inferred"


class InferredSpanFilter(TraceFilter):
Collaborator

I think this class definitely deserves a comprehensive docstring to explain why it's needed and the background.

Contributor

I think this warrants a comment explaining why we are using this.
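
One possible shape for the requested docstring, with the splitting rule isolated so it can be run without `ddtrace`. This is a sketch, not the PR's code: spans are plain dicts, `split_inferred_spans` is a hypothetical helper, and the stated rationale (the inferred span travels in its own payload so it can be attributed to a separate service) is only inferred from the debug messages and commit titles in this PR.

```python
SPAN_TYPE_TAG = "_dd.span_type"
SPAN_TYPE_INFERRED = "inferred"


def split_inferred_spans(trace):
    """Separate inferred spans from the rest of a trace.

    Returns (spans_to_send, spans_to_write_separately). An inferred span is
    pulled out only when the trace holds more than one span, so a trace that
    consists solely of an inferred span passes through unchanged.
    """
    to_send, to_write = [], []
    for span in trace:
        is_inferred = span.get(SPAN_TYPE_TAG) == SPAN_TYPE_INFERRED
        if is_inferred and len(trace) > 1:
            to_write.append(span)
        else:
            to_send.append(span)
    return to_send, to_write
```

In the filter itself, the second list corresponds to the spans handed to `tracer.write([span])` and the first to the list returned from `process_trace`.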

    def process_trace(self, trace):
        logger.debug("InferredSpanFilter got a trace of length {}".format(len(trace)))
        trace_to_send = []
        for span in trace:
            if span.get_tag(SPAN_TYPE_TAG) == SPAN_TYPE_INFERRED and len(trace) > 1:
                logger.debug(
                    "Found an inferred span. Filtering it out and writing it separately."
                )
                tracer.write([span])
            else:
                logger.debug("Appending a span to the returned trace")
                trace_to_send.append(span)
        return trace_to_send


tracer.configure(settings={"FILTERS": [InferredSpanFilter()]})
dd_trace_context = {}
dd_tracing_enabled = os.environ.get("DD_TRACE_ENABLED", "false").lower() == "true"

propagator = HTTPPropagator()


class ManagedService(Enum):
    UNKNOWN = 0
    API_GATEWAY = 1
    API_GATEWAY_WEBSOCKET = 2
    HTTP_API = 3
    APPSYNC = 4


def _convert_xray_trace_id(xray_trace_id):
    """
    Convert X-Ray trace id (hex)'s last 63 bits to a Datadog trace id (int).
@@ -377,13 +409,125 @@ def set_dd_trace_py_root(trace_context_source, merge_xray_traces):
)


def create_inferred_span(event, context, function_name):
    managed_service = detect_inferrable_span_type(event)
    try:
        if managed_service == ManagedService.API_GATEWAY:
            logger.debug("API Gateway event detected. Inferring a span")
            return create_inferred_span_from_api_gateway_event(event, context)
        elif managed_service == ManagedService.HTTP_API:
            logger.debug("HTTP API event detected. Inferring a span")
            return create_inferred_span_from_http_api_event(event, context)
        elif managed_service == ManagedService.API_GATEWAY_WEBSOCKET:
            logger.debug("API Gateway Websocket event detected. Inferring a span")
            return create_inferred_span_from_api_gateway_websocket_event(event, context)
    except Exception as e:
        # logging interpolates arguments with %-style placeholders, not "{}"
        logger.debug(
            "Unable to infer span. Detected type: %s. Reason: %s", managed_service, e
        )
        return None
    logger.debug("Unable to infer a span: unknown event type")
    return None


def detect_inferrable_span_type(event):
Collaborator

Can you reuse (maybe some refactoring required) the same logic from https://github.com/DataDog/datadog-lambda-python/blob/main/datadog_lambda/trigger.py? Want to avoid inconsistency between what's shown by the trace and the invocation.

Contributor

Is there a way we can plug this into the existing trigger detection logic? I'm wondering if we are duplicating the functionality somewhat.

Contributor Author

You and Tian are really on the same brain wave here 😄

    if "httpMethod" in event:  # likely some kind of API Gateway event
        return ManagedService.API_GATEWAY
    if "routeKey" in event:  # likely HTTP API
        return ManagedService.HTTP_API
    if (
        "requestContext" in event and "messageDirection" in event["requestContext"]
    ):  # likely a websocket API
        return ManagedService.API_GATEWAY_WEBSOCKET
    return ManagedService.UNKNOWN
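
To illustrate which keys drive the detection above, here is a self-contained sketch run against trimmed-down events. The sample payloads are illustrative assumptions that contain only the keys the function inspects, not complete AWS events, and the function is renamed `detect_type` to mark it as a copy made for demonstration.

```python
from enum import Enum


class ManagedService(Enum):
    UNKNOWN = 0
    API_GATEWAY = 1
    API_GATEWAY_WEBSOCKET = 2
    HTTP_API = 3


def detect_type(event):
    # Same key checks, in the same order, as the function in the diff.
    if "httpMethod" in event:
        return ManagedService.API_GATEWAY
    if "routeKey" in event:
        return ManagedService.HTTP_API
    if "messageDirection" in event.get("requestContext", {}):
        return ManagedService.API_GATEWAY_WEBSOCKET
    return ManagedService.UNKNOWN


# Hypothetical minimal events, one per supported source.
rest_event = {"httpMethod": "GET", "path": "/hello"}
http_api_event = {"routeKey": "GET /hello", "rawPath": "/hello"}
websocket_event = {"requestContext": {"routeKey": "$connect", "messageDirection": "IN"}}
```

In the websocket sample the route key sits under `requestContext`, which is where the websocket span builder in this diff reads it, so the top-level `routeKey` check does not match it.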


def create_inferred_span_from_api_gateway_websocket_event(event, context):
    domain = event["requestContext"]["domainName"]
    endpoint = event["requestContext"]["routeKey"]
    tags = {
        "operation_name": "aws.apigateway.websocket",
        "service.name": domain,
        "url": domain + endpoint,
Collaborator

`http_tags["http.url"] = request_context["domainName"]`
Sounds like the standard field is `http.url`?

        "endpoint": endpoint,
        "resource_name": domain + endpoint,
        "request_id": context.aws_request_id,
        "connection_id": event["requestContext"]["connectionId"],
        SPAN_TYPE_TAG: SPAN_TYPE_INFERRED,
    }
    request_time_epoch = event["requestContext"]["requestTimeEpoch"]
    args = {
        "resource": domain + endpoint,
        "span_type": "web",
    }
    tracer.set_tags({"_dd.origin": "lambda"})
    span = tracer.trace("aws.apigateway.websocket", **args)
    if span:
        span.set_tags(tags)
    span.start = request_time_epoch / 1000
    return span
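
The division by 1000 converts the gateway's request timestamp into the unit the span's start field expects. A small sketch, assuming `requestTimeEpoch` is in milliseconds since the Unix epoch and the span start is in seconds; `span_start_seconds` is a hypothetical helper and the sample value is made up.

```python
from datetime import datetime, timezone


def span_start_seconds(request_context):
    """Convert the gateway timestamp (assumed milliseconds) to seconds."""
    return request_context["requestTimeEpoch"] / 1000


sample = {"requestTimeEpoch": 1631200000000}  # made-up value
start = span_start_seconds(sample)

# Render the converted value as a UTC timestamp for a quick sanity check.
print(datetime.fromtimestamp(start, tz=timezone.utc).isoformat())
```

The HTTP API variant in this diff reads `timeEpoch` instead of `requestTimeEpoch`, but applies the same conversion.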


def create_inferred_span_from_api_gateway_event(event, context):
    domain = event["requestContext"]["domainName"]
    path = event["path"]
    tags = {
        "operation_name": "aws.apigateway.rest",
        "service.name": domain,
        "url": domain + path,
        "endpoint": path,
        "http.method": event["httpMethod"],
        "resource_name": domain + path,
        "request_id": context.aws_request_id,
        SPAN_TYPE_TAG: SPAN_TYPE_INFERRED,
    }
    request_time_epoch = event["requestContext"]["requestTimeEpoch"]
    args = {
        "resource": domain + path,
        "span_type": "http",
    }
    tracer.set_tags({"_dd.origin": "lambda"})
    span = tracer.trace("aws.apigateway", **args)
    if span:
        span.set_tags(tags)
    span.start = request_time_epoch / 1000
    return span


def create_inferred_span_from_http_api_event(event, context):
    domain = event["requestContext"]["domainName"]
    path = event["rawPath"]
    tags = {
        "operation_name": "aws.httpapi",
        "service.name": domain,
        "url": domain + path,
        "endpoint": path,
        "http.method": event["requestContext"]["http"]["method"],
        "resource_name": domain + path,
        "request_id": context.aws_request_id,
        SPAN_TYPE_TAG: SPAN_TYPE_INFERRED,
    }
    request_time_epoch = event["requestContext"]["timeEpoch"]
    args = {
        "resource": domain + path,
        "span_type": "http",
    }
    tracer.set_tags({"_dd.origin": "lambda"})
    span = tracer.trace("aws.httpapi", **args)
    if span:
        span.set_tags(tags)
    span.start = request_time_epoch / 1000
    return span


def create_function_execution_span(
    context,
    function_name,
    is_cold_start,
    trace_context_source,
    merge_xray_traces,
    trigger_tags,
    upstream=None,
Collaborator

Is "parent" a better name?

):
    tags = {}
    if context:
@@ -402,6 +546,7 @@ def create_function_execution_span(
else None,
"datadog_lambda": datadog_lambda_version,
"dd_trace": ddtrace_version,
"span.name": "aws.lambda",
Collaborator

Why do you need this? Doesn't `span = tracer.trace("aws.lambda", **args)` (later in this function) set the span name to `aws.lambda`?

}
if trace_context_source == TraceContextSource.XRAY and merge_xray_traces:
tags["_dd.parent_source"] = trace_context_source
@@ -415,4 +560,6 @@ def create_function_execution_span(
    span = tracer.trace("aws.lambda", **args)
    if span:
        span.set_tags(tags)
    if upstream:
        span.parent_id = upstream.span_id
    return span
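
A sketch of what the `upstream` link achieves, using a tiny stand-in class rather than `ddtrace` spans (the `FakeSpan` type, the `link_to_upstream` helper, and the ids are all hypothetical): pointing the function span's `parent_id` at the inferred span's `span_id` makes the Lambda execution a child of the inferred gateway span.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSpan:
    """Hypothetical stand-in carrying only the fields the link touches."""

    name: str
    span_id: int
    parent_id: Optional[int] = None


def link_to_upstream(span, upstream=None):
    # Mirrors the two lines added in the diff: re-parent only when an
    # upstream (inferred) span exists.
    if upstream:
        span.parent_id = upstream.span_id
    return span


gateway = FakeSpan("aws.apigateway", span_id=1)
function = link_to_upstream(FakeSpan("aws.lambda", span_id=2), upstream=gateway)
```

When no inferred span was created, `upstream` stays `None` and the function span keeps whatever parent it already had.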
27 changes: 21 additions & 6 deletions datadog_lambda/wrapper.py
@@ -26,12 +26,12 @@
    set_correlation_ids,
    set_dd_trace_py_root,
    create_function_execution_span,
    create_inferred_span,
)
from datadog_lambda.trigger import extract_trigger_tags, extract_http_status_code_tag

logger = logging.getLogger(__name__)


"""
Usage:

@@ -96,6 +96,11 @@ def __init__(self, func):
self.extractor_env = os.environ.get("DD_TRACE_EXTRACTOR", None)
self.trace_extractor = None
self.span = None
self.inferred_span = None
self.make_inferred_span = (
os.environ.get("DD_INFERRED_SPANS", "false").lower() == "true"
and should_use_extension
Collaborator

Any reason to not support inferred spans when using the forwarder?

Contributor Author

The forwarder adds tags from the tag cache to the trace payload(s) being sent back to the trace intake endpoint. This is an additional variable and an additional thing we would need to fix before the inferred spans show up as their own services. We can always remove this check and add forwarder support later.

)
self.response = None

if self.extractor_env:
@@ -138,6 +143,7 @@ def __call__(self, event, context, **kwargs):

    def _before(self, event, context):
        try:

            set_cold_start()
            submit_invocations_metric(context)
            self.trigger_tags = extract_trigger_tags(event, context)
@@ -153,13 +159,18 @@ def _before(self, event, context):

            if dd_tracing_enabled:
                set_dd_trace_py_root(trace_context_source, self.merge_xray_traces)
                if self.make_inferred_span:
                    self.inferred_span = create_inferred_span(
                        event, context, self.function_name
                    )
                self.span = create_function_execution_span(
                    context,
                    self.function_name,
                    is_cold_start(),
                    trace_context_source,
                    self.merge_xray_traces,
                    self.trigger_tags,
                    upstream=self.inferred_span,
                )
            else:
                set_correlation_ids()
@@ -180,16 +191,20 @@ def _after(self, event, context):
self.trigger_tags, XraySubsegment.LAMBDA_FUNCTION_TAGS_KEY
)

-            if not self.flush_to_log or should_use_extension:
-                flush_stats()
-            if should_use_extension:
-                flush_extension()
-
             if self.span:
                 if status_code:
                     self.span.set_tag("http.status_code", status_code)
                 self.span.finish()
+            if self.inferred_span:
+                if status_code:
+                    self.inferred_span.set_tag("http.status_code", status_code)
+                self.inferred_span.finish()
             logger.debug("datadog_lambda_wrapper _after() done")
Collaborator

This log line is no longer at the right place.


+            if not self.flush_to_log or should_use_extension:
+                flush_stats()
+            if should_use_extension:
+                flush_extension()
         except Exception:
             traceback.print_exc()
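
The reordering in this hunk (see the commit titled "Flush after closing spans") can be sketched with stand-ins that record call order. `Recorder`, `finish_spans_then_flush`, and the span names are hypothetical; the point is only that both spans are finished before anything is flushed.

```python
class Recorder:
    """Stand-in for a span; appends to a shared log when finished."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def finish(self):
        self.log.append("finish:" + self.name)


def finish_spans_then_flush(span, inferred_span, flush):
    # Close the function span first, then the inferred span, then flush.
    if span:
        span.finish()
    if inferred_span:
        inferred_span.finish()
    flush()


calls = []
finish_spans_then_flush(
    Recorder("aws.lambda", calls),
    Recorder("aws.apigateway", calls),
    flush=lambda: calls.append("flush"),
)
```

After the call, `calls` lists the two finish events followed by the flush, which is the order the updated `_after` follows.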

6 changes: 3 additions & 3 deletions scripts/run_integration_tests.sh
@@ -208,9 +208,9 @@ for handler_name in "${LAMBDA_HANDLERS[@]}"; do
 sed -E "s/(\"span_id\"\: \")[A-Z0-9\.\-]+/\1XXXX/g" |
 sed -E "s/(\"parent_id\"\: \")[A-Z0-9\.\-]+/\1XXXX/g" |
 sed -E "s/(\"request_id\"\: \")[a-z0-9\.\-]+/\1XXXX/g" |
-sed -E "s/(\"duration\"\: )[0-9\.\-]+/\1XXXX/g" |
-sed -E "s/(\"start\"\: )[0-9\.\-]+/\1XXXX/g" |
-sed -E "s/(\"system\.pid\"\: )[0-9\.\-]+/\1XXXX/g" |
+sed -E "s/(\"duration\"\: )[0-9\.\-]+/\1\"XXXX\"/g" |
+sed -E "s/(\"start\"\: )[0-9\.\-]+/\1\"XXXX\"/g" |
+sed -E "s/(\"system\.pid\"\: )[0-9\.\-]+/\1\"XXXX\"/g" |
 sed -E "s/(\"runtime-id\"\: \")[a-z0-9\.\-]+/\1XXXX/g" |
 sed -E "s/(\"datadog_lambda\"\: \")([0-9]+\.[0-9]+\.[0-9])/\1X.X.X/g" |
 sed -E "s/(\"dd_trace\"\: \")([0-9]+\.[0-9]+\.[0-9])/\1X.X.X/g"
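
The three changed substitutions wrap the placeholder in quotes, which fits the commit titled "Make the snapshots valid json": a bare `XXXX` is not a valid JSON value, while `"XXXX"` is. Below is a Python sketch of the same idea using `re.sub` and `json.loads`. The sample line is invented, and the pattern is a simplified analogue of the `sed` expressions rather than a translation of the script.

```python
import json
import re

line = '{"duration": 1234567, "start": 1631200000.5}'  # invented sample

# Unquoted placeholder: the result is no longer parseable as JSON.
bare = re.sub(r'("(?:duration|start)": )[0-9.\-]+', r"\1XXXX", line)

# Quoted placeholder: the result stays valid JSON.
quoted = re.sub(r'("(?:duration|start)": )[0-9.\-]+', r'\1"XXXX"', line)


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

Checking `is_valid_json(bare)` against `is_valid_json(quoted)` shows why only the quoted form can be stored as a JSON snapshot.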