@purple4reina
Contributor

What Does This Do

Instead of creating a new TCP connection for each call to the extension (i.e., the start-invocation and end-invocation calls), reuse connections from a pool.
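
To make the difference concrete, here is a minimal sketch in plain Java that counts how many connections get opened with and without an idle pool. It is a simplified model written for illustration, not OkHttp's implementation: the class `IdlePoolModel`, its methods, and the one-invocation-per-second workload are all assumptions. The pool size of 5 and the 300-second keep-alive are the values from this PR's diff.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified, hypothetical model of an idle-connection pool (not OkHttp's code). */
final class IdlePoolModel {
  private final int maxIdleConnections;
  private final long keepAliveSeconds;
  // Time (in seconds) at which each idle connection was returned, oldest first.
  private final Deque<Long> idleSince = new ArrayDeque<>();
  private int opened = 0;

  IdlePoolModel(int maxIdleConnections, long keepAliveSeconds) {
    this.maxIdleConnections = maxIdleConnections;
    this.keepAliveSeconds = keepAliveSeconds;
  }

  /** Returns true if an idle connection was reused, false if a new one was opened. */
  boolean acquire(long nowSeconds) {
    evict(nowSeconds);
    if (idleSince.pollLast() != null) {
      return true;
    }
    opened++;
    return false;
  }

  /** Returns a connection to the pool, then applies the eviction rules. */
  void release(long nowSeconds) {
    idleSince.addLast(nowSeconds);
    evict(nowSeconds);
  }

  int connectionsOpened() {
    return opened;
  }

  // Drops the oldest idle connections while the pool is over capacity
  // or the oldest one has been idle for at least the keep-alive duration.
  private void evict(long nowSeconds) {
    while (!idleSince.isEmpty()
        && (idleSince.size() > maxIdleConnections
            || nowSeconds - idleSince.peekFirst() >= keepAliveSeconds)) {
      idleSince.pollFirst();
    }
  }

  /** Assumed workload: one invocation per second, each making a start and an end call. */
  static int simulate(int maxIdleConnections, long keepAliveSeconds, int invocations) {
    IdlePoolModel pool = new IdlePoolModel(maxIdleConnections, keepAliveSeconds);
    for (long now = 0; now < invocations; now++) {
      pool.acquire(now); // start-invocation call
      pool.release(now);
      pool.acquire(now); // end-invocation call
      pool.release(now);
    }
    return pool.connectionsOpened();
  }
}
```

Under these assumptions, `IdlePoolModel.simulate(5, 300, 10)` opens a single connection for all 20 calls, while `IdlePoolModel.simulate(0, 300, 10)` (no idle connections kept) opens 20.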

Motivation

This change reduces aws.lambda.enhanced.runtime_duration by approximately 2%.

[Image: Screenshot 2023-12-26 at 2 00 50 PM]

See https://ddserverless.datadoghq.com/notebook/2987494/reys-purple-notebook?range=3600000&view=view-mode&start=1703617009962&live=false

Additional Notes

Jira ticket: [PROJ-IDENT]

.writeTimeout(REQUEST_TIMEOUT_IN_S, SECONDS)
.readTimeout(REQUEST_TIMEOUT_IN_S, SECONDS)
.callTimeout(REQUEST_TIMEOUT_IN_S, SECONDS)
.connectionPool(new ConnectionPool(5, 300, SECONDS))
Contributor Author

5 connections, 300 second keep-alive.

Where do the magic numbers come from?

Contributor Author

Thin air. I mostly just made them up. Very happy to take any suggestions for edits though.

I figured we'll only be creating two connections, one for start and one for end. I wanted the number of connections in this pool to be a bit larger than the number we're expecting to need. This is why I chose 5.

The keep-alive value of 5 minutes also just felt like a good amount of time to keep a connection alive. I want to say that Lambda only keeps the container alive for about 6 minutes, though I could very well be wrong.

Maybe we could define a constant for them?

Sounds good. I'd verify with Darcy or AJ the maximum time a Lambda container will be alive; AFAIK it can be longer than that (15 minutes?), but I'm not entirely sure.
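
One way to name the values, sketched in plain Java. The constant names `MAX_IDLE_CONNECTIONS` and `KEEP_ALIVE_DURATION` are the ones that appear in the later revision of the diff in this thread; the holder class, `KEEP_ALIVE_UNIT`, and `keepAliveMillis()` are hypothetical, and the values 5 and 300 are taken from the first revision, so they may not match what was finally merged.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical holder for the pool settings; the class name is an assumption. */
final class ExtensionConnectionPoolSettings {
  /** Idle connections to keep: one each for the start and end calls, plus headroom. */
  static final int MAX_IDLE_CONNECTIONS = 5;

  /** How long an idle connection is kept before eviction, in KEEP_ALIVE_UNIT. */
  static final long KEEP_ALIVE_DURATION = 300;

  /** Unit for KEEP_ALIVE_DURATION (the diff passes SECONDS). */
  static final TimeUnit KEEP_ALIVE_UNIT = TimeUnit.SECONDS;

  private ExtensionConnectionPoolSettings() {}

  /** Convenience conversion, e.g. for logging or for comparing against other timeouts. */
  static long keepAliveMillis() {
    return KEEP_ALIVE_UNIT.toMillis(KEEP_ALIVE_DURATION);
  }
}
```

Keeping the unit next to the duration makes it harder to change one without the other when the values are tuned later.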

Contributor

If this client is static, does that mean it's reused across multiple invocations? Can we test how many connections it needs by firing a bunch of requests (> 5) within a short period of time?

Contributor Author

@duncanista Good idea about adding the constants. And we can't merge this until next week anyway, so I'll be sure to ask @DarcyRaynerDD what he thinks before we merge.

Contributor Author

@joeyzhao2018 the test I performed yesterday (with the image and link to the notebook above) used a simple hello-world Java function invoked 10 times per second. I looked for a way to log when a new connection is made, but while this functionality exists in later versions of OkHttp, it doesn't exist in the one we're using. (We actually vendor the package, see https://github.com/DataDog/okhttp/blob/java7/okhttp/src/main/java/okhttp3/ConnectionPool.java)

So, I don't know how else to confirm this. But I suppose that if we're seeing all the data appear as expected and the average runtime duration is lower than it was, then this seems like a good thing.
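
If connection-level logging is not available, one indirect check is to measure how many calls are ever in flight at the same time: assuming each connection carries one request at a time (as with HTTP/1.1), a pool should not need more connections than that peak. The sketch below is a hypothetical helper in plain Java, not something from this PR. `PeakConcurrencyProbe`, `track`, and `measureBurst` are made-up names, and the latch-based fake call only stands in for the real HTTP call to the extension.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper: records the highest number of calls in flight at once. */
final class PeakConcurrencyProbe {
  private final AtomicInteger inFlight = new AtomicInteger();
  private final AtomicInteger peak = new AtomicInteger();

  /** Runs the call while counting it as in flight. */
  <T> T track(Supplier<T> call) {
    int now = inFlight.incrementAndGet();
    peak.accumulateAndGet(now, Math::max);
    try {
      return call.get();
    } finally {
      inFlight.decrementAndGet();
    }
  }

  int peak() {
    return peak.get();
  }

  /** Fires `requests` overlapping fake calls (requests >= 1) and returns the observed peak. */
  static int measureBurst(int requests) {
    PeakConcurrencyProbe probe = new PeakConcurrencyProbe();
    CountDownLatch allStarted = new CountDownLatch(requests);
    ExecutorService executor = Executors.newFixedThreadPool(requests);
    try {
      List<Future<Boolean>> results = new ArrayList<>();
      for (int i = 0; i < requests; i++) {
        Callable<Boolean> trackedCall = () -> probe.track(() -> {
          // Stand-in for the real HTTP call: block until every call has started.
          allStarted.countDown();
          try {
            return allStarted.await(5, TimeUnit.SECONDS);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
          }
        });
        results.add(executor.submit(trackedCall));
      }
      for (Future<Boolean> result : results) {
        result.get();
      }
      return probe.peak();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("burst did not complete", e);
    } finally {
      executor.shutdown();
    }
  }
}
```

In a real test, the fake call would be replaced by the actual request to the extension, and the reported peak could then be compared against the pool's maximum number of idle connections.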

Contributor

Unfortunately, we're stuck on an old version of OkHttp because the new version depends on Kotlin.
But it sounds like you've done your due diligence.
Thanks

pr-commenter bot commented Dec 26, 2023

Benchmarks

Startup

Parameters

| Parameter | Baseline | Candidate |
| --- | --- | --- |
| baseline_or_candidate | baseline | candidate |
| git_branch | master | rey.abolofia/conn-pool |
| git_commit_date | 1707852060 | 1707852516 |
| git_commit_sha | c96efd6 | 72bee06 |
| release_version | 1.31.0-SNAPSHOT~c96efd6181 | 1.31.0-SNAPSHOT~72bee06be2 |

Matching parameters

| Parameter | Baseline | Candidate |
| --- | --- | --- |
| application | insecure-bank | insecure-bank |
| ci_job_date | 1707855474 | 1707855474 |
| ci_job_id | 433204214 | 433204214 |
| ci_pipeline_id | 28234927 | 28234927 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| module | Agent | Agent |
| parent | None | None |
| variant | iast | iast |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 41 metrics, 13 unstable metrics.

Load

Parameters

| Parameter | Baseline | Candidate |
| --- | --- | --- |
| baseline_or_candidate | baseline | candidate |
| end_time | 2024-02-13T19:53:53 | 2024-02-13T20:12:50 |
| git_branch | master | rey.abolofia/conn-pool |
| git_commit_date | 1707852060 | 1707852516 |
| git_commit_sha | c96efd6 | 72bee06 |
| release_version | 1.31.0-SNAPSHOT~c96efd6181 | 1.31.0-SNAPSHOT~72bee06be2 |
| start_time | 2024-02-13T19:53:40 | 2024-02-13T20:12:37 |

Matching parameters

| Parameter | Baseline | Candidate |
| --- | --- | --- |
| application | insecure-bank | insecure-bank |
| ci_job_date | 1707855474 | 1707855474 |
| ci_job_id | 433204214 | 433204214 |
| ci_pipeline_id | 28234927 | 28234927 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| variant | iast | iast |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 15 unstable metrics.

Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.31.0-SNAPSHOT~72bee06be2, baseline=1.31.0-SNAPSHOT~c96efd6181
    dateFormat X
    axisFormat %s
section baseline
no_agent (369.81 µs) : 350, 390
.   : milestone, 370,
iast (482.325 µs) : 461, 503
.   : milestone, 482,
iast_FULL (534.882 µs) : 515, 555
.   : milestone, 535,
iast_GLOBAL (497.112 µs) : 475, 519
.   : milestone, 497,
iast_HARDCODED_SECRET_DISABLED (486.868 µs) : 466, 508
.   : milestone, 487,
iast_INACTIVE (458.742 µs) : 437, 480
.   : milestone, 459,
iast_TELEMETRY_OFF (473.846 µs) : 452, 495
.   : milestone, 474,
tracing (445.381 µs) : 424, 466
.   : milestone, 445,
section candidate
no_agent (373.615 µs) : 353, 394
.   : milestone, 374,
iast (468.6 µs) : 448, 489
.   : milestone, 469,
iast_FULL (539.831 µs) : 519, 560
.   : milestone, 540,
iast_GLOBAL (498.995 µs) : 478, 520
.   : milestone, 499,
iast_HARDCODED_SECRET_DISABLED (471.592 µs) : 451, 492
.   : milestone, 472,
iast_INACTIVE (451.986 µs) : 431, 473
.   : milestone, 452,
iast_TELEMETRY_OFF (470.474 µs) : 449, 492
.   : milestone, 470,
tracing (452.673 µs) : 432, 474
.   : milestone, 453,
  • baseline results

| Variant | Request duration [CI 0.99] | Δ no_agent |
| --- | --- | --- |
| no_agent | 369.81 µs [349.661 µs, 389.959 µs] | - |
| iast | 482.325 µs [461.266 µs, 503.383 µs] | 112.515 µs (30.4%) |
| iast_FULL | 534.882 µs [514.543 µs, 555.221 µs] | 165.072 µs (44.6%) |
| iast_GLOBAL | 497.112 µs [475.373 µs, 518.85 µs] | 127.302 µs (34.4%) |
| iast_HARDCODED_SECRET_DISABLED | 486.868 µs [466.223 µs, 507.514 µs] | 117.058 µs (31.7%) |
| iast_INACTIVE | 458.742 µs [437.116 µs, 480.369 µs] | 88.933 µs (24.0%) |
| iast_TELEMETRY_OFF | 473.846 µs [452.442 µs, 495.25 µs] | 104.037 µs (28.1%) |
| tracing | 445.381 µs [424.408 µs, 466.353 µs] | 75.571 µs (20.4%) |

  • candidate results

| Variant | Request duration [CI 0.99] | Δ no_agent |
| --- | --- | --- |
| no_agent | 373.615 µs [353.363 µs, 393.867 µs] | - |
| iast | 468.6 µs [448.144 µs, 489.056 µs] | 94.985 µs (25.4%) |
| iast_FULL | 539.831 µs [519.214 µs, 560.448 µs] | 166.216 µs (44.5%) |
| iast_GLOBAL | 498.995 µs [478.057 µs, 519.933 µs] | 125.38 µs (33.6%) |
| iast_HARDCODED_SECRET_DISABLED | 471.592 µs [451.158 µs, 492.026 µs] | 97.977 µs (26.2%) |
| iast_INACTIVE | 451.986 µs [430.99 µs, 472.982 µs] | 78.371 µs (21.0%) |
| iast_TELEMETRY_OFF | 470.474 µs [449.153 µs, 491.795 µs] | 96.859 µs (25.9%) |
| tracing | 452.673 µs [431.663 µs, 473.684 µs] | 79.058 µs (21.2%) |

Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.31.0-SNAPSHOT~72bee06be2, baseline=1.31.0-SNAPSHOT~c96efd6181
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.349 ms) : 1330, 1367
.   : milestone, 1349,
appsec (1.778 ms) : 1752, 1804
.   : milestone, 1778,
iast (1.525 ms) : 1501, 1549
.   : milestone, 1525,
profiling (1.528 ms) : 1501, 1555
.   : milestone, 1528,
tracing (1.51 ms) : 1484, 1535
.   : milestone, 1510,
section candidate
no_agent (1.354 ms) : 1335, 1373
.   : milestone, 1354,
appsec (1.789 ms) : 1764, 1814
.   : milestone, 1789,
iast (1.536 ms) : 1512, 1561
.   : milestone, 1536,
profiling (1.579 ms) : 1553, 1605
.   : milestone, 1579,
tracing (1.5 ms) : 1475, 1525
.   : milestone, 1500,
  • baseline results

| Variant | Request duration [CI 0.99] | Δ no_agent |
| --- | --- | --- |
| no_agent | 1.349 ms [1.33 ms, 1.367 ms] | - |
| appsec | 1.778 ms [1.752 ms, 1.804 ms] | 429.432 µs (31.8%) |
| iast | 1.525 ms [1.501 ms, 1.549 ms] | 176.122 µs (13.1%) |
| profiling | 1.528 ms [1.501 ms, 1.555 ms] | 179.293 µs (13.3%) |
| tracing | 1.51 ms [1.484 ms, 1.535 ms] | 161.063 µs (11.9%) |

  • candidate results

| Variant | Request duration [CI 0.99] | Δ no_agent |
| --- | --- | --- |
| no_agent | 1.354 ms [1.335 ms, 1.373 ms] | - |
| appsec | 1.789 ms [1.764 ms, 1.814 ms] | 434.569 µs (32.1%) |
| iast | 1.536 ms [1.512 ms, 1.561 ms] | 182.289 µs (13.5%) |
| profiling | 1.579 ms [1.553 ms, 1.605 ms] | 224.726 µs (16.6%) |
| tracing | 1.5 ms [1.475 ms, 1.525 ms] | 145.739 µs (10.8%) |

@purple4reina force-pushed the rey.abolofia/conn-pool branch from 35176fa to 0c3db2c on December 27, 2023 17:12
Contributor

@joeyzhao2018 left a comment

:shipit:

.writeTimeout(REQUEST_TIMEOUT_IN_S, SECONDS)
.readTimeout(REQUEST_TIMEOUT_IN_S, SECONDS)
.callTimeout(REQUEST_TIMEOUT_IN_S, SECONDS)
.connectionPool(new ConnectionPool(MAX_IDLE_CONNECTIONS, KEEP_ALIVE_DURATION, SECONDS))
Contributor Author

@DarcyRaynerDD do you have any opinions on the values we choose here?

@purple4reina force-pushed the rey.abolofia/conn-pool branch from 0c3db2c to 3272716 on February 13, 2024 17:45
@purple4reina force-pushed the rey.abolofia/conn-pool branch from 3272716 to 72bee06 on February 13, 2024 19:28
@bm1549 merged commit 3210501 into master on Feb 13, 2024
@bm1549 deleted the rey.abolofia/conn-pool branch on February 13, 2024 21:40
@github-actions bot added this to the 1.30.0 milestone on Feb 13, 2024
@smola added the "tag: serverless" (Serverless support) label on Feb 15, 2024
Labels

tag: serverless Serverless support

7 participants