feat: Add S3 support for artifacts #3563
Summary of Changes
Hello @wmsnp, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request adds a new S3-compatible artifact storage service. It addresses the need for an official, self-hosted solution, allowing users to store and retrieve artifacts asynchronously in S3 buckets. It also fixes path handling in the existing file-based artifact service, improving its reliability across operating systems.
Response from ADK Triaging Agent
Hello @wmsnp, thank you for creating this PR! Could you please provide the manual end-to-end (E2E) test plan in the "Manual End-to-End (E2E) Tests" section of your PR description? This should include instructions on how to manually test your changes, along with any necessary setup or configuration details, logs, or screenshots that can help reviewers better understand the fix. This information will help reviewers to review your PR more efficiently. Thanks!
Code Review
This pull request introduces an S3ArtifactService for artifact storage, which is a great addition. The implementation is comprehensive and includes good test coverage.
My review focuses on a few key areas for improvement:
- There is a critical race condition in `save_artifact` that could lead to data loss under concurrent writes.
- Several methods that list S3 objects (`list_artifact_keys`, `list_versions`, `list_artifact_versions`) do not handle API pagination, which will lead to incomplete results when more than 1000 items exist.
- There are opportunities to improve performance in `delete_artifact` and `get_artifact_version` by using more efficient S3 operations.
- A minor improvement is suggested for the test setup to enhance test isolation.
Overall, this is a solid foundation for S3 support, and addressing these points will make it more robust and performant.
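The pagination point can be illustrated with a minimal sketch. It is not the code from this PR: `iter_all_keys`, `list_all_keys`, and `FakeS3Client` are hypothetical names, and the fake client only imitates the `IsTruncated` / `NextContinuationToken` fields that `list_objects_v2` returns, with a page size of 2 standing in for the 1000-key limit.

```python
import asyncio
from typing import Any, AsyncIterator, List, Optional


async def iter_all_keys(client: Any, bucket: str, prefix: str) -> AsyncIterator[str]:
    """Yield every key under `prefix`, following continuation tokens."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = await client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page has no matching keys.
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            break
        # Ask for the next page; without this only the first page is seen.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


async def list_all_keys(client: Any, bucket: str, prefix: str) -> List[str]:
    """Collect all keys under `prefix` into a list."""
    return [key async for key in iter_all_keys(client, bucket, prefix)]


class FakeS3Client:
    """Hypothetical stand-in for an async S3 client, for illustration only."""

    def __init__(self, keys: List[str], page_size: int = 2) -> None:
        self._keys = sorted(keys)
        self._page_size = page_size

    async def list_objects_v2(
        self, Bucket: str, Prefix: str = "", ContinuationToken: Optional[str] = None
    ) -> dict:
        matching = [k for k in self._keys if k.startswith(Prefix)]
        start = int(ContinuationToken or 0)
        chunk = matching[start : start + self._page_size]
        end = start + len(chunk)
        page = {"Contents": [{"Key": k} for k in chunk], "IsTruncated": end < len(matching)}
        if page["IsTruncated"]:
            page["NextContinuationToken"] = str(end)
        return page
```

With a real client, the paginator returned by `get_paginator("list_objects_v2")` is likely the more idiomatic route; the explicit loop is shown here only to make the continuation-token handling visible.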
- Fixed data race during save_artifact
- Added pagination to multiple list_objects_v2 calls
- Deleted all versions when removing an artifact
- Introduced potential data race in mock_s3_artifact_service to better simulate real concurrency
- Correctly injected aioboto3 mock in unit tests
/gemini review
Code Review
This pull request introduces an S3ArtifactService to provide a self-hosted artifact storage solution using S3-compatible services. The implementation is robust, leveraging asynchronous operations, pagination for listing, and atomic uploads. It also includes comprehensive unit tests with detailed mocks for the S3 client. Additionally, a fix for Windows path handling in FileArtifactService is included.
My review includes a few suggestions to improve the new service:
- Properly packaging the `aioboto3` dependency as an optional extra for users.
- Increasing the robustness of metadata parsing.
- Improving the clarity of the S3 client lifecycle management.
    self._s3_client = (
        await aioboto3.Session()
        .client(service_name="s3", **self.aws_configs)
        .__aenter__()
    )
The manual invocation of __aenter__ to initialize the S3 client is a bit unusual and relies on the corresponding __aexit__ call in the close() method. While this is a valid pattern for managing a long-lived client, it's not immediately obvious. To improve code clarity and maintainability for future developers, please add a comment explaining why this manual context management is used (e.g., for performance by reusing the client and its connection pool across multiple calls).
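One way to make this lifecycle more explicit, offered as a sketch rather than as what the PR does, is to let `contextlib.AsyncExitStack` own the context manager so that `__aenter__` and `__aexit__` are never called by hand. `LongLivedClient` and `client_factory` are hypothetical names; in the PR's setting the factory would presumably wrap `aioboto3.Session().client(service_name="s3", **aws_configs)`.

```python
import contextlib
from typing import Any, Callable, Optional


class LongLivedClient:
    """Keeps one async-context-managed client open across many calls.

    Reusing the client avoids re-creating its connection pool per request.
    """

    def __init__(self, client_factory: Callable[[], Any]) -> None:
        # client_factory returns an async context manager (hypothetical hook).
        self._client_factory = client_factory
        self._stack: Optional[contextlib.AsyncExitStack] = None
        self._client: Any = None

    async def get(self) -> Any:
        """Return the shared client, entering its context on first use."""
        if self._client is None:
            stack = contextlib.AsyncExitStack()
            # The stack calls __aenter__ now and remembers __aexit__ for later.
            self._client = await stack.enter_async_context(self._client_factory())
            self._stack = stack
        return self._client

    async def close(self) -> None:
        """Exit the client's context; safe to call more than once."""
        if self._stack is not None:
            await self._stack.aclose()
        self._stack = None
        self._client = None
```

This sketch does not guard against two coroutines calling `get()` concurrently before the first one finishes; a real service would likely need a lock around the first-use branch.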
Hi @wmsnp, thank you for your contribution through this pull request! This PR has merge conflicts that require changes on your end. Could you please rebase your branch onto the latest main branch to address these?
- Use `override` from `typing_extensions` instead of `typing` for better compatibility.
- Add error handling to `_unflatten_metadata` to improve robustness.
- Update `pyproject.toml` to resolve previous merge conflicts.
@ryanaiagent please check it
Hi @wmsnp, your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.
Hi @DeanChensj, can you please review this?
Hi @wmsnp, can we move this to the adk-community repo?
This is a really great contribution. It fits well with the ADK community repo. Please move it there: https://github.com/google/adk-python-community. Thanks!
Please ensure you have read the contribution guide before creating a pull request.
Link to Issue or Description of Change
Problem:
A previous PR attempted to add an Artifact Service but was closed due to merge conflicts and incompatibility with the latest ADK interfaces. While the ADK community plugin repository could offer unofficial implementations, there is currently no official, production-ready, self-hosted Artifact Service.
Solution:
- Adds `S3ArtifactService` to provide a self-hosted artifact storage solution:
  - Requires `aioboto3` (not yet added to `pyproject.toml`; users must install it manually)
- Updates `FileArtifactService` and its tests to ensure cross-platform compatibility
Testing Plan
Unit Tests:
Manual End-to-End (E2E) Tests:
Please provide instructions on how to manually test your changes, including any
necessary setup or configuration. Please provide logs or screenshots to help
reviewers better understand the fix.
Checklist
(`aioboto3`, not yet in `pyproject.toml`)