Skip to content

CreateProxyTxn is called twice for a reqID #387

@trentm

Description

@trentm

/cc @lahsivjar

APM AWS Lambda extension version (apm-aws-lambda version): latest main (commit 02f4ef1) plus a patch for some debug logging

Describe the bug

I believe the extension is reporting a transaction twice in a certain case -- at least it appears that CreateProxyTxn is called twice. My case is when (a) using the extension and Node.js APM agent layers that support the recent partial transactions work and (b) a lambda function executes that results in terminating the function process (via process.exit(42) in my case). This results in the extension receiving the platform.runtimeDone event from the Logs API, and then the SHUTDOWN event. The handlers for both of those events result in calling b.finalizeInvocation(...).

I'll show full repro steps below, but the annotated log is here: https://gist.github.com/trentm/66ebb5eb18c43eea0a8c013e193a41e6

My guess at the issue: should this code be deleting reqID from b.invocations?

func (b *Batch) OnLambdaLogRuntimeDone(reqID, status string, time time.Time) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.finalizeInvocation(reqID, status, time)
}

To Reproduce

  • create a new Lambda function, mine is named trentm-play-fn3

    name: trentm-play-fn3
    runtime: Node.js 18.x
    architecture: x86_64
    advanced settings/
        [x] Enable function URL
            Auth type: NONE
    

    Function URL: https://[REDACTED].lambda-url.us-west-2.on.aws/
    You don't have to enable a function URL, but it is a convenient way to invoke the function.

  • Handler code.
    WARNING: The AWS Lambda console defaults to creating an index.mjs handler
    file. This uses ES modules that the Node.js APM agent doesn't currently support.
    You must (a) delete index.mjs and (b) create a index.js file with:

    module.exports.handler = async (event) => {
        console.log('[handler] going to process.exit now');
        process.exit(42);
    };
  • Add layers to get the latest (unreleased) ext change, and the draft Node.js APM
    changes (feat: pre-register partial transactions in AWS Lambda apm-agent-nodejs#3285) to make the
    POST /register/transaction calls to the extension. You can use the following, or build your own:

    arn:aws:lambda:us-west-2:627286350134:layer:trentm-play-apm-lambda-extension:8
    arn:aws:lambda:us-west-2:627286350134:layer:trentm-dev-elastic-apm-node:53
    

    See note below for the patch to the extension in my build (for debug logging).

  • Configure envvars for the Lambda:

    NODE_OPTIONS=-r elastic-apm-node/start
    ELASTIC_APM_LAMBDA_APM_SERVER=https://trentm-play11.apm.us-west2.gcp.elastic-cloud.com:9243
    ELASTIC_APM_SECRET_TOKEN=[REDACTED]
    ELASTIC_APM_LOG_LEVEL=trace
    
  • Call the Lambda function once. We expect a 502 because it will crash.

    % curl -i https://[REDACTED].lambda-url.us-west-2.on.aws/
    HTTP/1.1 502 Bad Gateway
    Date: Tue, 25 Apr 2023 22:30:32 GMT
    Content-Type: application/json
    Content-Length: 21
    Connection: keep-alive
    x-amzn-RequestId: 343754cf-7763-402d-a7d8-0d2e5bda956a
    X-Amzn-Trace-Id: root=1-64485487-0126d8ea0a9690263d1bb94b;sampled=0;lineage=d073293c:0
    
    Internal Server Error
    
  • Look at the logs via aws logs tail --since 10s --follow /aws/lambda/trentm-play-fn3
    Gist of the annotated log output: https://gist.github.com/trentm/66ebb5eb18c43eea0a8c013e193a41e6

Patch to apm-aws-lambda.git for debug logging

the diff
diff --git a/accumulator/batch.go b/accumulator/batch.go
index 61618a4..5db1869 100644
--- a/accumulator/batch.go
+++ b/accumulator/batch.go
@@ -25,6 +25,7 @@ import (
        "time"

        "github.com/tidwall/gjson"
+       "go.uber.org/zap"
 )

 var (
@@ -79,15 +80,17 @@ type Batch struct {
        // invoke lifecycle then it is possible to receive the agent init request
        // before extension invoke is registered.
        currentlyExecutingRequestID string
+       logger                      *zap.SugaredLogger
 }

 // NewBatch creates a new BatchData which can accept a
 // maximum number of entries as specified by the arguments.
-func NewBatch(maxSize int, maxAge time.Duration) *Batch {
+func NewBatch(maxSize int, maxAge time.Duration, logger *zap.SugaredLogger) *Batch {
        return &Batch{
                invocations: make(map[string]*Invocation),
                maxSize:     maxSize,
                maxAge:      maxAge,
+               logger:      logger,
        }
 }

@@ -311,6 +314,7 @@ func (b *Batch) finalizeInvocation(reqID, status string, time time.Time) error {
        if !ok {
                return fmt.Errorf("invocation for requestID %s does not exist", reqID)
        }
+       b.logger.Infof("XXX call inc.CreateProxyTxn: status=%s time=%s", status, time)
        proxyTxn, err := inc.CreateProxyTxn(status, time)
        if err != nil {
                return err
diff --git a/app/app.go b/app/app.go
index 23ef134..d981c00 100644
--- a/app/app.go
+++ b/app/app.go
@@ -61,7 +61,6 @@ func New(ctx context.Context, opts ...ConfigOption) (*App, error) {

        app := &App{
                extensionName: c.extensionName,
-               batch:         accumulator.NewBatch(defaultMaxBatchSize, defaultMaxBatchAge),
        }

        var err error
@@ -70,6 +69,8 @@ func New(ctx context.Context, opts ...ConfigOption) (*App, error) {
                return nil, err
        }

+       app.batch = accumulator.NewBatch(defaultMaxBatchSize, defaultMaxBatchAge, app.logger)
+
        apmServerAPIKey, apmServerSecretToken, err := loadAWSOptions(ctx, c.awsConfig, app.logger)
        if err != nil {
                return nil, err

Metadata

Metadata

Assignees

No one assigned

    Labels

    aws-λ-extensionAWS Lambda ExtensionbugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions