@jessitron
Created May 28, 2026 16:11
OpenTelemetry Collector as a Lambda — companion files (Dockerfile, config, deploy scripts). See blog post: https://jessitron.com/

OpenTelemetry Collector as a Lambda — example files

Companion gist for Running the OpenTelemetry Collector as a Lambda.

A working starting point. Edit config.env to set a name and region, then:

./bootstrap.sh   # one-time: ECR repo, IAM role, generates a bearer token
# edit .env: set HONEYCOMB_API_KEY
./build.sh       # build the container image, push to ECR
./deploy.sh      # create-or-update the Lambda + Function URL (idempotent)

The collector forwards to Honeycomb by default. To send elsewhere, edit the exporter in config.yaml — any OTLP/HTTP backend works with the same shape.

See the blog post for what each piece does and the gotchas you'll hit without these scripts (dual-permission Function URLs, --provenance=false on buildx, sending_queue: false, and so on).

bootstrap.sh

#!/usr/bin/env bash
# One-time per AWS account/region: ECR repo, Lambda execution role, bearer token.
# Idempotent — safe to re-run.
set -euo pipefail
cd "$(dirname "$0")"
# shellcheck disable=SC1091
source ./config.env
if [[ ! -f .env ]]; then
  cp env.example .env
  echo "created .env from template; fill in HONEYCOMB_API_KEY before deploy" >&2
fi
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
echo "==> account ${ACCOUNT}, region ${AWS_REGION}"
if aws ecr describe-repositories \
    --repository-names "${ECR_REPO}" \
    --region "${AWS_REGION}" >/dev/null 2>&1; then
  echo "ecr: repo ${ECR_REPO} already exists"
else
  echo "ecr: creating repo ${ECR_REPO}"
  aws ecr create-repository \
    --repository-name "${ECR_REPO}" \
    --region "${AWS_REGION}" \
    --image-scanning-configuration scanOnPush=true \
    --image-tag-mutability MUTABLE \
    >/dev/null
fi
if aws iam get-role --role-name "${LAMBDA_ROLE_NAME}" >/dev/null 2>&1; then
  echo "iam: role ${LAMBDA_ROLE_NAME} already exists"
else
  echo "iam: creating role ${LAMBDA_ROLE_NAME}"
  aws iam create-role \
    --role-name "${LAMBDA_ROLE_NAME}" \
    --assume-role-policy-document file://role-trust.json \
    --description "Execution role for the OTel collector Lambda" \
    >/dev/null
  aws iam attach-role-policy \
    --role-name "${LAMBDA_ROLE_NAME}" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
fi
if ! grep -E '^export INGEST_BEARER_TOKEN=.+' .env >/dev/null 2>&1 \
    || grep -E '^export INGEST_BEARER_TOKEN=$' .env >/dev/null 2>&1; then
  TOKEN=$(openssl rand -hex 32)
  # The heredoc body and its terminator must stay at column 0.
  python3 - "$TOKEN" <<'PY'
import pathlib, re, sys
token = sys.argv[1]
p = pathlib.Path(".env")
text = p.read_text()
new_line = f'export INGEST_BEARER_TOKEN={token}'
if re.search(r'^export INGEST_BEARER_TOKEN=', text, flags=re.M):
    text = re.sub(r'^export INGEST_BEARER_TOKEN=.*$', new_line, text, flags=re.M)
else:
    text = text.rstrip() + "\n" + new_line + "\n"
p.write_text(text)
PY
  echo "token: generated INGEST_BEARER_TOKEN, wrote .env"
else
  echo "token: INGEST_BEARER_TOKEN already set in .env"
fi
echo
echo "bootstrap complete."
echo "next: edit .env and set HONEYCOMB_API_KEY, then run ./build.sh && ./deploy.sh"
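The token step above either fills in an empty `export INGEST_BEARER_TOKEN=` line or appends one. A minimal sketch of that logic as pure Python functions, so the behavior can be checked without touching a real `.env`. The function names `needs_token` and `upsert_token` are hypothetical and not part of the gist, and `secrets.token_hex(32)` stands in for `openssl rand -hex 32`.

```python
import re
import secrets

# Matches the whole assignment line, including an empty value.
TOKEN_LINE = re.compile(r"^export INGEST_BEARER_TOKEN=.*$", flags=re.M)


def needs_token(env_text: str) -> bool:
    """True when the variable is absent or assigned an empty value."""
    return re.search(r"^export INGEST_BEARER_TOKEN=.+$", env_text, flags=re.M) is None


def upsert_token(env_text: str, token: str) -> str:
    """Return env_text with INGEST_BEARER_TOKEN set to token."""
    new_line = f"export INGEST_BEARER_TOKEN={token}"
    if TOKEN_LINE.search(env_text):
        # Replace the existing (possibly empty) assignment in place.
        return TOKEN_LINE.sub(lambda _match: new_line, env_text)
    # No assignment yet: append one, normalising trailing whitespace.
    return env_text.rstrip() + "\n" + new_line + "\n"


if __name__ == "__main__":
    template = "export HONEYCOMB_API_KEY=\nexport INGEST_BEARER_TOKEN=\n"
    if needs_token(template):
        # 32 random bytes rendered as 64 hex characters.
        print(upsert_token(template, secrets.token_hex(32)))
```

Because an already-populated token makes `needs_token` return false, re-running the bootstrap leaves an existing token alone, which is what keeps the script idempotent.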
build.sh

#!/usr/bin/env bash
# Build the container image and push to ECR.
set -euo pipefail
cd "$(dirname "$0")"
# shellcheck disable=SC1091
source ./config.env
LOCAL_TAG="${COLLECTOR_NAME}:local"
# Bake a build identifier into the image so every span carries collector.version.
# Falls back to a timestamp if not in a git repo.
if SHA=$(git rev-parse --short HEAD 2>/dev/null); then
  if ! git diff --quiet 2>/dev/null || ! git diff --cached --quiet 2>/dev/null; then
    SHA="${SHA}-dirty"
  fi
else
  SHA=$(date +%Y%m%d-%H%M%S)
fi
echo "==> baking COLLECTOR_VERSION=${SHA}"
# --provenance=false --sbom=false: default buildx output is an OCI manifest
# with attestations, which Lambda rejects with InvalidParameterValueException.
docker buildx build \
  --platform "linux/${LAMBDA_ARCH}" \
  --provenance=false \
  --sbom=false \
  --build-arg "COLLECTOR_VERSION=${SHA}" \
  --load \
  -t "${LOCAL_TAG}" .
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
REMOTE_TAG="${ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPO}:latest"
aws ecr get-login-password --region "${AWS_REGION}" \
  | docker login --username AWS --password-stdin "${ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com"
docker tag "${LOCAL_TAG}" "${REMOTE_TAG}"
docker push "${REMOTE_TAG}"
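The build script makes two small decisions: which version string to bake into the image, and which ECR tag to push to. A minimal sketch of both as pure Python functions, to make the rules explicit. The function names and the account ID in the example are placeholders, not part of the gist.

```python
from datetime import datetime
from typing import Optional


def collector_version(short_sha: Optional[str], dirty: bool, now: datetime) -> str:
    """Pick the build identifier the way build.sh does.

    short_sha is None when the directory is not a git repository.
    """
    if short_sha is None:
        # Same shape as `date +%Y%m%d-%H%M%S`.
        return now.strftime("%Y%m%d-%H%M%S")
    return f"{short_sha}-dirty" if dirty else short_sha


def remote_tag(account: str, region: str, repo: str, tag: str = "latest") -> str:
    """Compose the ECR image reference that build.sh pushes and deploy.sh deploys."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(collector_version("a1b2c3d", True, datetime.now()))
    print(remote_tag("123456789012", "us-west-2", "collector"))
```

Note that the image tag is always `latest`; the build identifier travels inside the image as `COLLECTOR_VERSION` rather than in the tag.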
config.env

# Module-level config. Edit to rename the Lambda, swap the ECR repo, etc.
COLLECTOR_NAME=collector
ECR_REPO=collector
LAMBDA_ROLE_NAME=CollectorLambda
AWS_REGION=us-west-2
LAMBDA_ARCH=arm64
LAMBDA_MEMORY_MB=512
LAMBDA_TIMEOUT_S=30
config.yaml

extensions:
  bearertokenauth/ingest:
    scheme: Bearer
    token: ${env:INGEST_BEARER_TOKEN}
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        # include_metadata exposes incoming HTTP headers as request metadata
        # so processors can reach them via from_context (see attributes/provenance).
        include_metadata: true
        auth:
          authenticator: bearertokenauth/ingest
processors:
  # Example transform: lift gen_ai.* attributes that OpenTelemetry GenAI
  # instrumentation records as span events onto the parent span. The producer
  # SDK can't do this (ReadableSpan.events is immutable at on_end), but the
  # collector can write to span.attributes via OTTL. Adjust or remove for your
  # workload.
  transform/lift_gen_ai_event_attrs:
    error_mode: ignore
    trace_statements:
      - context: spanevent
        conditions:
          - name == "gen_ai.client.inference.operation.details"
        statements:
          - merge_maps(span.attributes, attributes, "upsert")
  # Stamp every span with collector provenance: which build of the collector
  # processed it, and which Lambda invocation. Useful for debugging "did this
  # trace go through the collector?" and "which deploy is this from?".
  attributes/provenance:
    actions:
      - action: insert
        key: collector.version
        value: ${env:COLLECTOR_VERSION}
      - action: insert
        key: collector.invocation_trace_id
        from_context: metadata.x-amzn-trace-id
  # NO batch processor: Lambda freezes the container after the handler returns,
  # so batched spans never flush. Synchronous export only.
exporters:
  otlphttp/honeycomb:
    endpoint: ${env:HONEYCOMB_OTLP_ENDPOINT}
    headers:
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}
    # Same reason as above: async queueing strands spans across the freeze.
    sending_queue:
      enabled: false
service:
  extensions: [bearertokenauth/ingest]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [transform/lift_gen_ai_event_attrs, attributes/provenance]
      exporters: [otlphttp/honeycomb]
  telemetry:
    logs:
      level: info
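To make the two processors in the pipeline concrete, here is a rough simulation of their effect on a single span, using plain Python dicts. This is not collector code: the dict layout and the function names are illustrative assumptions. Only the semantics are taken from the config above, where `"upsert"` overwrites existing keys and `insert` sets a key only when it is absent. Skipping the insert when the metadata header is missing is also an assumption of this sketch.

```python
from typing import Any, Dict, Optional

DETAILS_EVENT = "gen_ai.client.inference.operation.details"


def lift_event_attrs(span: Dict[str, Any]) -> None:
    """Mimic merge_maps(span.attributes, attributes, "upsert") for matching events."""
    for event in span.get("events", []):
        if event["name"] == DETAILS_EVENT:
            # "upsert": add missing keys and overwrite existing ones.
            span["attributes"].update(event.get("attributes", {}))


def insert_attr(span: Dict[str, Any], key: str, value: Optional[str]) -> None:
    """Mimic the `insert` action: set the key only if it is not already present."""
    if value is not None and key not in span["attributes"]:
        span["attributes"][key] = value


def process_span(span: Dict[str, Any], version: str, trace_header: Optional[str]) -> None:
    """Apply both steps in the same order as the traces pipeline."""
    lift_event_attrs(span)
    insert_attr(span, "collector.version", version)
    insert_attr(span, "collector.invocation_trace_id", trace_header)


if __name__ == "__main__":
    span = {
        "attributes": {"gen_ai.request.model": "stale"},
        "events": [
            {"name": DETAILS_EVENT, "attributes": {"gen_ai.request.model": "m1"}},
            {"name": "unrelated", "attributes": {"ignored": True}},
        ],
    }
    process_span(span, "a1b2c3d", "Root=1-example")
    print(span["attributes"])
```

One consequence of using `insert` rather than `upsert` for provenance: a span that already carries `collector.version` keeps its original value.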
deploy.sh

#!/usr/bin/env bash
# Create-or-update the Lambda + Function URL. Idempotent.
set -euo pipefail
cd "$(dirname "$0")"
# shellcheck disable=SC1091
source ./config.env
# shellcheck disable=SC1091
source ./.env
for var in HONEYCOMB_API_KEY HONEYCOMB_OTLP_ENDPOINT INGEST_BEARER_TOKEN; do
  if [[ -z "${!var:-}" ]]; then
    echo "missing ${var} in .env" >&2
    exit 1
  fi
done
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
IMAGE_URI="${ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPO}:latest"
ROLE_ARN="arn:aws:iam::${ACCOUNT}:role/${LAMBDA_ROLE_NAME}"
ENV_JSON=$(python3 - <<PY
import json, os
print(json.dumps({"Variables": {
    "HONEYCOMB_API_KEY": os.environ["HONEYCOMB_API_KEY"],
    "HONEYCOMB_OTLP_ENDPOINT": os.environ["HONEYCOMB_OTLP_ENDPOINT"],
    "INGEST_BEARER_TOKEN": os.environ["INGEST_BEARER_TOKEN"],
}}))
PY
)
if aws lambda get-function \
    --function-name "${COLLECTOR_NAME}" \
    --region "${AWS_REGION}" >/dev/null 2>&1; then
  echo "lambda: updating ${COLLECTOR_NAME} image"
  aws lambda update-function-code \
    --function-name "${COLLECTOR_NAME}" \
    --image-uri "${IMAGE_URI}" \
    --region "${AWS_REGION}" \
    >/dev/null
  aws lambda wait function-updated \
    --function-name "${COLLECTOR_NAME}" \
    --region "${AWS_REGION}"
  echo "lambda: updating ${COLLECTOR_NAME} env"
  aws lambda update-function-configuration \
    --function-name "${COLLECTOR_NAME}" \
    --environment "${ENV_JSON}" \
    --region "${AWS_REGION}" \
    >/dev/null
  aws lambda wait function-updated \
    --function-name "${COLLECTOR_NAME}" \
    --region "${AWS_REGION}"
else
  echo "lambda: creating ${COLLECTOR_NAME}"
  aws lambda create-function \
    --function-name "${COLLECTOR_NAME}" \
    --package-type Image \
    --code "ImageUri=${IMAGE_URI}" \
    --role "${ROLE_ARN}" \
    --architectures "${LAMBDA_ARCH}" \
    --memory-size "${LAMBDA_MEMORY_MB}" \
    --timeout "${LAMBDA_TIMEOUT_S}" \
    --environment "${ENV_JSON}" \
    --region "${AWS_REGION}" \
    >/dev/null
  aws lambda wait function-active-v2 \
    --function-name "${COLLECTOR_NAME}" \
    --region "${AWS_REGION}"
  echo "lambda: creating function URL (auth=NONE; bearer enforced inside collector)"
  aws lambda create-function-url-config \
    --function-name "${COLLECTOR_NAME}" \
    --auth-type NONE \
    --region "${AWS_REGION}" \
    >/dev/null
  # Function URLs with auth_type=NONE require BOTH lambda:InvokeFunctionUrl
  # AND lambda:InvokeFunction (October 2025 change). Missing the second one
  # yields 403 AccessDeniedException at the URL gate with no CloudWatch logs.
  aws lambda add-permission \
    --function-name "${COLLECTOR_NAME}" \
    --statement-id FunctionURLAllowInvokeUrl \
    --action lambda:InvokeFunctionUrl \
    --principal '*' \
    --function-url-auth-type NONE \
    --region "${AWS_REGION}" \
    >/dev/null
  aws lambda add-permission \
    --function-name "${COLLECTOR_NAME}" \
    --statement-id FunctionURLAllowInvoke \
    --action lambda:InvokeFunction \
    --principal '*' \
    --invoked-via-function-url \
    --region "${AWS_REGION}" \
    >/dev/null
fi
URL=$(aws lambda get-function-url-config \
  --function-name "${COLLECTOR_NAME}" \
  --region "${AWS_REGION}" \
  --query FunctionUrl --output text)
echo
echo "function URL: ${URL}"
echo "OTLP traces endpoint: ${URL}v1/traces"
echo
echo "set on the producer:"
echo " OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=${URL}v1/traces"
echo " OTEL_EXPORTER_OTLP_HEADERS=authorization=Bearer ${INGEST_BEARER_TOKEN}"
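After a deploy, a quick way to check the endpoint is to post one hand-built span with the bearer token. The sketch below only constructs the request, using OTLP's JSON encoding and the standard library. The function name, the example URL, the token, the service name, and the fixed trace and span IDs are all placeholders, not values from the gist.

```python
import json
import time
import urllib.request


def build_trace_request(function_url: str, token: str, now_ns: int) -> urllib.request.Request:
    """Build (but do not send) an OTLP/HTTP JSON request carrying one span."""
    endpoint = function_url.rstrip("/") + "/v1/traces"
    payload = {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": "smoke-test"}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual"},
                "spans": [{
                    # OTLP JSON carries IDs as hex strings: 32 chars for the
                    # trace ID, 16 for the span ID.
                    "traceId": "5b8efff798038103d269b633813fc60c",
                    "spanId": "eee19b7ec3c1b174",
                    "name": "collector-smoke-test",
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(now_ns - 1_000_000),
                    "endTimeUnixNano": str(now_ns),
                }],
            }],
        }],
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL and token; substitute the values deploy.sh prints.
    req = build_trace_request("https://example.lambda-url.us-west-2.on.aws/", "t0ken", time.time_ns())
    print(req.get_method(), req.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen(req, timeout=10)`. A rejected token should surface as an HTTP error from the collector's bearer-token check rather than from the Function URL itself, since the URL is configured with `--auth-type NONE`.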
Dockerfile

# The contrib image is distroless + non-root, which Lambda's container runtime
# doesn't get along with. Stage the binary into Alpine. Critical detail from
# the Lambda Web Adapter (LWA) examples: use CMD only, no ENTRYPOINT. With
# ENTRYPOINT set, Lambda's container init loops "app is not ready" and times
# out at 10s.
FROM otel/opentelemetry-collector-contrib:0.151.0 AS collector
FROM alpine:3.20
RUN apk add --no-cache ca-certificates
COPY --from=collector /otelcol-contrib /app/otelcol-contrib
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:1.0.0 /lambda-adapter /opt/extensions/lambda-adapter
COPY config.yaml /etc/otel/config.yaml
ENV AWS_LWA_PORT=4318
ENV AWS_LWA_INVOKE_MODE=buffered
ENV AWS_LWA_READINESS_CHECK_PATH=/
# Stamped onto every span by the attributes/provenance processor.
# Set via --build-arg COLLECTOR_VERSION=... in build.sh: the short git SHA
# (suffixed with -dirty for uncommitted changes), or a timestamp outside a git repo.
ARG COLLECTOR_VERSION=unknown
ENV COLLECTOR_VERSION=${COLLECTOR_VERSION}
CMD ["/app/otelcol-contrib", "--config=/etc/otel/config.yaml"]
env.example

# Per-deployment secrets. bootstrap.sh copies this to .env and generates
# INGEST_BEARER_TOKEN. You fill in HONEYCOMB_API_KEY before running deploy.sh.
export HONEYCOMB_API_KEY=
export HONEYCOMB_OTLP_ENDPOINT=https://api.honeycomb.io
export INGEST_BEARER_TOKEN=
role-trust.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}