Improve Docker image caching and testing #78552

pugnascotia · 2021-10-01T10:17:12Z

Firstly: we tag our Docker images with various pieces of information,
including a timestamp for when the image was built. However, this makes
it impossible completely cache the image. When developing the Docker
images, it's very tedious to completely rebuild an image for every
single change. Therefore, provided we're not building a proper release
build, we fix the build time to midnight so that the Docker build cache
is usable.

Secondly: the DockerBuildTask outputs a marker file to indicate that
an image has been built, but that isn't enough for a meaningful
up-to-date check by Gradle. Improve this by fetching the newly-built
image's hash, and writing that to the output file.

Thirdly: improve the Docker tests to make them more ergonomic, and also
disble ingest.geoip.downloader.enabled by default.

Fourthly: add missing test coverage for sourcing settings from env vars.

Firstly: we tag our Docker images with various pieces of information, including a timestamp for when the image was built. However, this makes it impossible completely cache the image. When developing the Docker images, it's very tedious to completely rebuild an image for every single change. Therefore, provided we're not building a proper release build, we fix the build time to midnight so that the Docker build cache is usable. Secondly: the `DockerBuildTask` outputs a marker file to indicate that an image has been built, but that isn't enough for a meaningful up-to-date check by Gradle. Improve this by fetching the newly-built image's hash, and writing that to the output file. Thirdly: improve the Docker tests to make them more ergonomic, and also disble `ingest.geoip.downloader.enabled` by default.

elasticmachine · 2021-10-01T10:17:15Z

Pinging @elastic/es-delivery (Team:Delivery)

mark-vieira

I think we can simplify this to only include the change to how we handle noCache, the other changes in practice don't have meaningful effects.

Firstly: we tag our Docker images with various pieces of information,
including a timestamp for when the image was built. However, this makes
it impossible completely cache the image.

Not exactly. It makes it impossible to cache those layers, but since we're following best practice and adding these labels at the end of our Dockerfile, we can still cache all the expensive stuff. In practice, rebuilding the layers that simply add the labels is trivial and amounts to almost no time at all.

Secondly: the DockerBuildTask outputs a marker file to indicate that
an image has been built, but that isn't enough for a meaningful
up-to-date check by Gradle.

In this case the content of the marker file is mostly irrelevant. It's simply there to tell Gradle "hey, you've run this task". The only time Gradle actually cares about the content of the file is if something were to modify it, but the only thing that will do that is a clean, which will just delete it, and in any case, any modifications (regardless of the original content) would simply trigger the task to run again.

The marker file is just there to be a placeholder for the fact that the "output" of this task is not a simple file or directory that Gradle can track. The bit that's important is not the output, it's the inputs. Which is to say that so long as that marker file exists, and the inputs haven't changed since the last execution, Gradle will continue to treat the task as UP-TO-DATE. The reality is that basically any code change (aside from plugins) will change the ES distro, so we have to rebuild the Docker image. This already works, so there's no improvement to make here.

I did some experiments locally. All I did was build the image, make a code change, then rebuild. Comparing this PR to simply changing master to use noCache = false the results are identical.

This pull request: 48.6s
master w/ cache: 48.2s

So the problem wasn't we were inefficiently leveraging Docker caching, it's that we weren't using it at all. I'd simply reduce this to just set noCache = false, but instead of doing this conditionally based on whether it's a snapshot build, I'd use BuildParams.isCi. I think we always want to rebuild in CI, even for snapshot build. What we really care about is local developer builds being quicker, which using the Docker cache will definitely improve.

pugnascotia · 2021-10-05T09:27:30Z

The reason for writing the image SHA to the output file is so that Gradle knows to re-run tasks that depend on the Docker image - specifically the Cloud ESS image, which depends on the Cloud image. I was finding that when I edited the Cloud image, it was rebuilt but the Cloud ESS one was skipped. This is non-obvious and means you need to remove the marker file or run a clean to rebuild the ESS image.

As for the timestamp stuff - you're right that these are quick tasks and we do them at the end of the build in order to leverage the layer cache, but they still change the output SHA, which means that if you're running tests, the export image step still runs, and that's not lightening quick. So minor cleanups, comment changes or formatting changes in the Dockerfile slow down the build / test cycle. I was hitting this myself, so it is a DX issue, albeit a pretty minor one.

I'll make the BuildParams.isCi change.

…provements

mark-vieira · 2021-10-05T17:00:28Z

The reason for writing the image SHA to the output file is so that Gradle knows to re-run tasks that depend on the Docker image - specifically the Cloud ESS image, which depends on the Cloud image. I was finding that when I edited the Cloud image, it was rebuilt but the Cloud ESS one was skipped.

Right. This would only come into play if the output of the normal docker image was an input to the cloud image task but I don't see where that's happening. I assume you're referring to the ESS image, which depends on the cloud image. There's a task dependency there, but we aren't wiring up any inputs, so as it is, updating the marker file to include a hash wouldn't change any behavior.

So to get this working the way we want we'll want to make the output for the cloud build image an input to the ess cloud image. Also, like I said above, it's not super critical what the content of that marker file is, only that it changes to indicate to downstream tasks that they need to run again. So it can just be a timestamp for all we care, and that'll save us to work of having to shell out to docker to grab the image sha.

distribution/docker/build.gradle

mark-vieira · 2021-10-05T17:03:46Z

Nevermind, I see where you're adding the input now. That makes sense. The point about the content of the marker file stands though. It doesn't have to be a hash, it could be anything that simply changes on every run. The only benefit of using a hash was that if you say, made a change, built, then reverted and built again, the build cache could kick in since we'd already built that input. Docker tasks aren't cacheable obviously because the "output" isn't a file on disk so it's not applicable here.

pugnascotia · 2021-10-06T13:12:27Z

The image hash is a digest of the contents, so surely it's the most appropriate data to put in the marker file? Anything else is just arbitrary. If I adjust the Dockerfile in a way that doesn't change the layers (commenting, formatting), then downstream tasks shouldn't be affected, so it makes sense to me to output the image hash.

pugnascotia · 2021-10-06T13:24:09Z

@mark-vieira I'd like to get this merged so that I can reduce the diff in #77544.

…provements

mark-vieira · 2021-10-06T18:09:16Z

If I adjust the Dockerfile in a way that doesn't change the layers (commenting, formatting), then downstream tasks shouldn't be affected, so it makes sense to me to output the image hash.

Good point. That's a scenario I hadn't considered. You are definitely correct, using the hash is the more "correct" and it does better handle some scenarios. I'm good with keeping the implementation as is. Thanks, Rory!

…provements

pugnascotia · 2021-10-07T09:18:06Z

Backported to 7.x in f75ba8b.

Firstly: we tag our Docker images with various pieces of information, including a timestamp for when the image was built. However, this makes it impossible completely cache the image. When developing the Docker images, it's very tedious to completely rebuild an image for every single change. Therefore, provided we're not building a proper release build, we fix the build time to midnight so that the Docker build cache is usable. Secondly: the `DockerBuildTask` outputs a marker file to indicate that an image has been built, but that isn't enough for a meaningful up-to-date check by Gradle. Improve this by fetching the newly-built image's hash, and writing that to the output file. Thirdly: improve the Docker tests to make them more ergonomic, and also disable `ingest.geoip.downloader.enabled` by default. Fourthly: add missing test coverage for sourcing settings from env vars.

…' into feature/data_stream_support_routing * wjp/feature/data_stream_support_routing: (44 commits) Revert "Adjust /_cat/templates not to request all metadata (elastic#78812)" Allow indices lookup to be built lazily (elastic#78745) [DOCS] Document default security in alpha2 (elastic#78227) Add cluster applier stats (elastic#77552) Fix failing URLDecodeProcessorTests::testProcessor test (elastic#78690) Upgrade to lucene snapshot ba75dc5e6bf (elastic#78817) Adjust /_cat/templates not to request all metadata (elastic#78812) Simplify build plugin license handling (elastic#77009) Fix SearchableSnapshotsBlobStoreCacheIntegTests.testBlobStoreCache (elastic#78616) Improve Docker image caching and testing (elastic#78552) Load knn vectors format with mmapfs (elastic#78724) Fix date math zone test to use negative minutes (elastic#78796) Changing name of shards field in node/stats api to shard_stats (elastic#78531) [DOCS] Fix system index refs in restore tutorial (elastic#78582) Add previously removed settings back for 8.0 (elastic#78784) TSDB: Fix template name in test Add a system property to forcibly format everything (elastic#78768) Revert "Adding config so that some tests will break if over-the-wire encryption fails (elastic#78409)" (elastic#78787) Must date math test failure Adding config so that some tests will break if over-the-wire encryption fails (elastic#78409) ...

pugnascotia added >enhancement >non-issue >test Issues or PRs that are addressing/adding tests :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts v8.0.0 v7.16.0 labels Oct 1, 2021

pugnascotia requested a review from breskeby October 1, 2021 10:17

elasticmachine added the Team:Delivery Meta label for Delivery team label Oct 1, 2021

Imports

c42fb62

pugnascotia requested a review from mark-vieira October 1, 2021 13:54

pugnascotia mentioned this pull request Oct 1, 2021

Introduce declarative plugin management #77544

Merged

mark-vieira requested changes Oct 1, 2021

View reviewed changes

pugnascotia added 6 commits October 5, 2021 10:31

Use BuildParams.isCi

6c6fae9

Merge remote-tracking branch 'upstream/master' into docker-caching-im…

18bb56e

…provements

Merge remote-tracking branch 'upstream/master' into docker-caching-im…

2dc4331

…provements

Fix typo

f112f31

Revert accidental change

b48fd09

Fix snafu

6b84abc

mark-vieira reviewed Oct 5, 2021

View reviewed changes

distribution/docker/build.gradle Outdated Show resolved Hide resolved

Simpify input -> output wiring

f2e28e7

Merge remote-tracking branch 'upstream/master' into docker-caching-im…

c6659e0

…provements

Merge remote-tracking branch 'upstream/master' into docker-caching-im…

c362f72

…provements

pugnascotia merged commit 67e310e into elastic:master Oct 7, 2021

pugnascotia deleted the docker-caching-improvements branch October 7, 2021 08:19

jakelandis added v8.0.0-beta1 and removed v8.0.0 labels Oct 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve Docker image caching and testing #78552

Improve Docker image caching and testing #78552

Uh oh!

pugnascotia commented Oct 1, 2021

Uh oh!

elasticmachine commented Oct 1, 2021

Uh oh!

mark-vieira left a comment •

edited

Loading

Uh oh!

pugnascotia commented Oct 5, 2021

Uh oh!

mark-vieira commented Oct 5, 2021

Uh oh!

Uh oh!

mark-vieira commented Oct 5, 2021

Uh oh!

pugnascotia commented Oct 6, 2021

Uh oh!

pugnascotia commented Oct 6, 2021

Uh oh!

mark-vieira commented Oct 6, 2021 •

edited

Loading

Uh oh!

pugnascotia commented Oct 7, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Improve Docker image caching and testing #78552

Improve Docker image caching and testing #78552

Uh oh!

Conversation

pugnascotia commented Oct 1, 2021

Uh oh!

elasticmachine commented Oct 1, 2021

Uh oh!

mark-vieira left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pugnascotia commented Oct 5, 2021

Uh oh!

mark-vieira commented Oct 5, 2021

Uh oh!

Uh oh!

mark-vieira commented Oct 5, 2021

Uh oh!

pugnascotia commented Oct 6, 2021

Uh oh!

pugnascotia commented Oct 6, 2021

Uh oh!

mark-vieira commented Oct 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pugnascotia commented Oct 7, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mark-vieira left a comment •

edited

Loading

mark-vieira commented Oct 6, 2021 •

edited

Loading