-
Notifications
You must be signed in to change notification settings - Fork 456
CDRIVER-5971 Use Amazon ECR to obtain OCI images in EVG #2058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
vector-of-bool
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really hate that this is necessary. AFAIK, the "proper" way to handle container image proxying is to use a container regisries configuration file that redirects image requests, and a credentials helper that runs on the system to automatically authenticate against the registry.
I asked in devprod and, to my surprise (not), Docker is the limiting factor as it still refuses to implement years-old consensus among containerization tools. I don't want to sound too accusatory, but I will anyway, as Docker (the tool) seems to consistently drag its feet on any feature that would make it easier to remove dependence on Docker (the org) infrastructure. IIUC, if we could ensure everything ran via podman, devprod would be able to deploy a registries config that could do all of this for us and we wouldn't need to do any of these changes at all.
Anyway... my biggest objection would be the hardcoding of our infrastructure specifics into our main tooling scripts (Earthfiles, specifically), as it renders them unusable outside of our org and puts an "expiration date" on the file, which will become unusable when the referenced ECR host is inevitably moved or decommissioned (I wish we could at least use a CNAME rather than 901841024863.dkr.ecr.us-east-1.amazonaws.com (??)).
I thought about it for a while and have come up with a solution that will remove the "expiration date" from Earthfile: Just parameterize on the container registry, whith docker.io/library being the default, and setting that parameter when running in EVG. See the suggested changes for reference.
EVG tasks that specify this may want to use the .arg file or EARTHLY_ARGS environment variable for setting this parameter, since it would probably be used everywhere and its easier than trying to pipe it through all the build commands in the file.
Aside: Earthly appears to have some functionality for dealing with registries and credential helpers, but I'm not sure if it's relevant or useful at the moment.
| @@ -1,4 +1,4 @@ | |||
| VERSION --arg-scope-and-set --pass-args 0.7 | |||
| VERSION --arg-scope-and-set --pass-args --use-function-keyword 0.7 | |||
| LOCALLY | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a --global build arg to specify where to get containers by default:
| LOCALLY | |
| LOCALLY | |
| ARG --global default_container_registry = "docker.io/library" |
(I'm not set on this parameter name, but just an example.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chose the name default_search_registry in reference to podman's unqualified-search-registries setting.
Earthfile
Outdated
| # Create a release archive of the source tree. (Refer to dev docs) | ||
| release-archive: | ||
| FROM alpine:3.20 | ||
| FROM 901841024863.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/alpine:3.20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameterize the registry:
| FROM 901841024863.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/alpine:3.20 | |
| FROM $default_container_registry/alpine:3.20 |
and so on for other updated image references
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced all DockerHub-provided image names with $default_search_registry/library/<image>. DevProd-provided release tool images remain fully-qualified with the Amazon ECR registry name. Omitted the /library suffix to support orgname-qualified image short-names.
tools/earthly.sh
Outdated
|
|
||
| run-earthly() { | ||
| "$EARTHLY_EXE" "$@" | ||
| "$EARTHLY_EXE" --buildkit-image 901841024863.dkr.ecr.us-east-1.amazonaws.com/dockerhub/earthly/buildkitd:v0.8.3 "$@" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the suggested refactor, specifying --buildkit-image=... would need to be done in the generated tools/earthly.sh commands for the EVG tasks, rather than hardcoded in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EVG tasks that specify this may want to use the
.argfile orEARTHLY_ARGSenvironment variable for setting this parameter, since it would probably be used everywhere and its easier than trying to pipe it through all the build commands in the file.
Moving both into earthly_exec() as an additional set of CLI arguments to earthly.sh seems to be sufficient.
|
@vector-of-bool Thank you for the suggestions! Amazon ECR is no longer unconditionally required to obtain images which are hosted on DockerHub. However, authentication with Amazon ECR remains a requirement to obtain DevProd Release Tools images used during a release (e.g. |
vector-of-bool
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments, LGTM
| # Create a release archive of the source tree. (Refer to dev docs) | ||
| release-archive: | ||
| FROM alpine:3.20 | ||
| FROM $default_search_registry/library/alpine:3.20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Embedding the /library suffix in the variable rules out its use with image names containing an orgname, e.g. if the /library suffix is embedded:
$default_search_registry/ubuntu:24.04->docker.io/library/ubuntu:24.04(correct)$default_search_registry/earthly/buildkitd->docker.io/library/earthly/buildkitd(incorrect)$default_search_registry/mongodb/mongo-cxx-driver->docker.io/library/mongodb/mongo-cxx-driver(incorrect)
I was expecting use of non-DockerHub-provided images to use fully-qualified names which include the registry name, as currently done for DevProd Release Tools images, e.g. quay.io/centos/centos:stream10 rather than centos:stream10, which is also consistent with Red Hat recommendations:
Users pulling without specifying the full path, commonly referred to as a short name, leaves them open to a particularly nasty form of attack, called image squatting. [...] Red Hat recommends that users always use fully qualified names when referring to container images in any context. [...] Consider removing all registries from the search list so that short names don’t resolve.
I do not think the ability to set $default_search_registry to anything but the default (DockerHub) or Amazon ECR (in CI) is really in scope for these changes, which are primarily to avoid Artifactory without hitting DockerHub rate limits again. Let me know if you'd like the suffix to be embedded regardless.
| return _SEPARATOR.join(f"{k}={v}" for k, v in self._asdict().items()) | ||
|
|
||
|
|
||
| # Use DevProd-provided Amazon ECR instance to obtain earthly-buildkitd in advance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment appears unrelated to this class's purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, outdated comment from old changes. Fixed.
kevinAlbs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for investigating and making these changes. I expect this will help the reliability of Evergreen tasks. LGTM with similar changes applied to the SBOM scripts.
docs/dev/earthly.rst
Outdated
| above. | ||
|
|
||
| .. seealso:: `"DevProd Platforms Container Registry" | ||
| <https://docs.devprod.prod.corp.mongodb.com/devprod-platforms-ecr>`_ and `"Configuring IAM Identity Center authentication with the AWS CLI" <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html>_`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| <https://docs.devprod.prod.corp.mongodb.com/devprod-platforms-ecr>`_ and `"Configuring IAM Identity Center authentication with the AWS CLI" <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html>_`. | |
| <https://docs.devprod.prod.corp.mongodb.com/devprod-platforms-ecr>`_ and `"Configuring IAM Identity Center authentication with the AWS CLI" <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html>`_. |
| return _SEPARATOR.join(f"{k}={v}" for k, v in self._asdict().items()) | ||
|
|
||
|
|
||
| # Authenticate with DevProd-provided Amazon ECR instance to use as pull-through cache for DockerHub. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest applying similar changes to sbom.py / sbom.sh since sbom.sh still refers to an artifactory image.
* Address Earthly 0.8 compatibility warnings * Workaround repository metalink retrieval errors on CentOS * Update release instructions to use AWS ECR instead of Artifactory * Use `$default_search_registry` to avoid forcing use of Amazon ECR * Also migrate sbom task from Artifactory to Amazon ECR
* Address Earthly 0.8 compatibility warnings * Workaround repository metalink retrieval errors on CentOS * Update release instructions to use AWS ECR instead of Artifactory * Use `$default_search_registry` to avoid forcing use of Amazon ECR * Also migrate sbom task from Artifactory to Amazon ECR
* CDRIVER-5891 address implicit bool-to-null conversion in return statements (#1888) * CDRIVER-5891 extend C standard coverage up to C23 (#1890) * CDRIVER-5959 sync change streams unified spec tests with 0aee4aad (#1971) * CDRIVER-5304 set cluster time in test sessions (#2008) * Ignore -Wpre-c11-compat warnings when building public-header-warnings (#2027) * CDRIVER-6042 Migrate Python scripts from Poetry to Astral UV (#2039) * Migrate EVG task coverage to latest Debian, Ubuntu, and RHEL distros (#2044) * CDRIVER-6010 use `ec2.assume_role` for Azure KMS task (#2051) * CDRIVER-5971 Use Amazon ECR to obtain OCI images in EVG (#2058) * update Earthly version from 0.8.3 to 0.8.16 (#2072) --------- Co-authored-by: Kevin Albertson <[email protected]>
* Address Earthly 0.8 compatibility warnings * Workaround repository metalink retrieval errors on CentOS * Update release instructions to use AWS ECR instead of Artifactory * Use `$default_search_registry` to avoid forcing use of Amazon ECR * Also migrate sbom task from Artifactory to Amazon ECR
* CDRIVER-5959 temporarily restrict updateDescription tests to 8.1 or older (#1970) * CDRIVER-5959 sync change streams unified spec tests with 0aee4aad (#1971) * CDRIVER-5304 set cluster time in test sessions (#2008) * CDRIVER-6010 use `ec2.assume_role` for mongohouse task (#2025) * Ignore -Wpre-c11-compat warnings when building public-header-warnings (#2027) * CDRIVER-6042 Migrate Python scripts from Poetry to Astral UV (#2039) * Migrate EVG task coverage to latest Debian, Ubuntu, and RHEL distros (#2044) * CDRIVER-6010 use `ec2.assume_role` for Azure KMS task (#2051) * CDRIVER-5971 Use Amazon ECR to obtain OCI images in EVG (#2058) * CDRIVER-5874 Drop Debian 10 EVG task coverage (#2062) * update Earthly version from 0.8.3 to 0.8.16 (#2072) --------- Co-authored-by: Kevin Albertson <[email protected]>
Resolves CDRIVER-5971. These changes should improve the performance and reliability of Earthly tasks on Evergreen by avoiding both DockerHub rate limits and Artifactory's limited bandwidth.
@mdb-ad and @rcsanchez97: even if not review-requested, I recommended attempting the documented instructions to authenticate with Amazon ECR and pull images on your local machine and verify if it works as expected, since these changes and instructions are meant to preserve/support the ability to run Earthly tasks locally (not Evergreen only).
DevProd plans to decommission the Artifactory instance currently providing DevProd Release Tools images (e.g. garasign-gpg, silkbomb, etc.). We temporarily used Artifactory as a workaround for DockerHub rate limits (#1794) but reverted shortly after (#1927). The new DevProd-provided Amazon ECR instance is intended to the proper long-term solution going forward.
The Amazon ECR instance is both a host for internal images (e.g. DevProd Release Tools) as well as a pull-through cache for DockerHub images. Therefore, it serves all our present OCI image acquisition needs. DockerHub images
<name>and<orgname>/<name>are specified aslibrary/<name>and<orgname>/<name>respectively. DevProd Release Tools imagesrelease-tools-container-registry-local/<name>are specified asrelease-infrastructure/<name>.However, using this Amazon ECR instance requires performing IAM Identity Center authentication with the
devprod-platforms-ecrAWS account. This is simple in Evergreen usingec2.assume_role, but the steps are a bit more involved on local developer machines. These steps are documented inearthly.rst, but reiterated below for locality.Using Amazon ECR requires authenticating with the
devprod-platforms-ecrAWS account. Before pulling any OCI images (viadocker,podman, orearthly), you must obtain AWS short-term credentials using SSO authentication with AWS CLI v2 (note: not v1, which does not supportaws sso). AWS CLI v2 can be obtained by runningsnap install awscli --classic(see: install docs). Initial configuration can be done by runningaws configure ssoor directly setting$HOME/.aws/configto the following:Note
"<profile>" is the AWS account, role, etc. "<session_name>" is the short-term credentials. I recommend setting them to something like "amazon-ecr" and "<your username>" respectively.
To modify an existing configuration, run
aws configure sso --profile <profile>.When short-term credentials expire, the following error is emitted:
The expired credentials be refreshed by running either one of:
The AWS credentials must be passed to
podmanand/ordockerafter (re)authentication by running:At this point, you should be able to pull images from Amazon ECR, e.g. by running
podman pull 901841024863.dkr.ecr.us-east-1.amazonaws.com/release-infrastructure/garasign-gpg.Note
Update: per feedback, implemented a
$default_search_registryEarthly argument with a default value ofdocker.ioto avoid unconditionally using Amazon ECR for images hosted on DockerHub. Amazon ECR can be used by default by passing the flag:DevProd Release Tools images remain fully-qualified with the Amazon ECR registry name.
I was unable to find a method of configuring the default container image registry that preserves the use of short-names (e.g.
alpine:3.20), but which doesn't involve system-level or user-level configuration, which would conflict with Evergreen task reproducibility. I am open to suggestions or a followup PR; otherwise, all images require the901841024863.dkr.ecr.us-east-1.amazonaws.com/prefix to be pulled from Amazon ECR. This is not necessary when manually runningpullcommands on local machines (such as to pre-pull images from DockerHub prior to their use by Earthly), e.g.podman pull alpineshould resolve to DockerHub (by default) whereaspodman pull garasign-gpgshould resolve to Amazon ECR (when authenticated).