Skip to content

Conversation

estolfo
Copy link
Contributor

@estolfo estolfo commented Nov 17, 2021

Instead of using io.Copy, use a more straightforward compression approach. Also use a buffer pool to avoid creating a new buffer each time a request is sent.

@github-actions github-actions bot added the aws-λ-extension AWS Lambda Extension label Nov 17, 2021
@ghost
Copy link

ghost commented Nov 17, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2021-11-18T10:47:18.484+0000

  • Duration: 8 min 21 sec

  • Commit: d967323

Test stats 🧪

Test Results
Failed 0
Passed 54
Skipped 0
Total 54

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@estolfo estolfo changed the title Use more straightforward compression strategy and buffer pool Use more straightforward compression approach and buffer pool Nov 17, 2021
@felixbarny
Copy link
Member

Is there a specific issue with io.Pipe? I think the advantage of it is that there are no buffers required at all.

@estolfo
Copy link
Contributor Author

estolfo commented Nov 17, 2021

Yes, there are no synchronization guarantees and the client could start issuing the request before the gzip writer had finished writing everything. If there's an error, the body would be incomplete. Using a buffer ensures that the data is properly gzipped before it is sent.

Copy link
Contributor

@marclop marclop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on removing io.Pipe() too, I think that going forward we could also have some micro benchmarks to get a good understanding of the impact.

@estolfo estolfo requested a review from marclop November 17, 2021 12:54
@estolfo estolfo requested review from astorm and felixbarny November 17, 2021 12:56
@felixbarny
Copy link
Member

the client could start issuing the request before the gzip writer had finished writing everything. If there's an error, the body would be incomplete.

I don't see an issue with that. In fact, I think that's preferable. It avoids unnecessarily copying from buffer to buffer. The Elastic APM Java agent is also directly serializing the events to the HTTP request body to avoid having to copy and allocate buffers. I have never seen a situation where compression would fail.

@marclop
Copy link
Contributor

marclop commented Nov 18, 2021

I modified the PostToAPM function to accept an *http.Client so it doesn't cause an allocation every time it is called and did some benchmarking on my machine and the pipes seem to be slower than the pooled bytes.Buffer approach most likely because the io.Pipe() call causes extra allocations.

Overall it seems to be around 10% faster than the io.Pipe approach and it causes less allocations, which is also desirable.

$ benchstat old.txt new.txt
name         old time/op    new time/op    delta
PostToAPM-8     151µs ±11%     135µs ± 8%  -10.88%  (p=0.032 n=5+5)

name         old alloc/op   new alloc/op   delta
PostToAPM-8    1.25MB ± 0%    1.22MB ± 0%   -2.46%  (p=0.008 n=5+5)

name         old allocs/op  new allocs/op  delta
PostToAPM-8       112 ± 1%       104 ± 1%   -7.33%  (p=0.008 n=5+5)

Copy link
Contributor

@astorm astorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for avoid concurrency complications where we can. I ran this branch through the usual quick manual smoke test and did not see any regressions -- data continued to flow into my elastic cloud instance.

No objections to this change going in, but also no strong opinions on which direction we should go (with a slight edit to note that @marclop's quick benchmarking is a an additional +1 for a move away from the io.Pipe approach.

Copy link
Member

@felixbarny felixbarny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the benchmarks, Marc! I have no objections now.

I modified the PostToAPM function to accept an *http.Client

Could you add that commit to this PR?

Signed-off-by: Marc Lopez Rubio <[email protected]>
@estolfo estolfo merged commit 550b9e0 into main Nov 18, 2021
@estolfo estolfo deleted the estolfo/compression-2 branch November 18, 2021 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

aws-λ-extension AWS Lambda Extension

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants