(app) Add s3 drive type (1/2) #14002
Please see the new PR description for the updated changes. I've taken over work here for @panos-is now that he is on call, and we have taken this PR in a slightly different direction.
tchaton left a comment:
LGTM!
I don't think we should use the protocol (s3:// vs lit://) as an indication of how the drive should be mounted.
We should instead have an explicit flag for mounting, so we'll be able to mount non-s3:// locations (e.g. lit:// themselves, or GCP storage).
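To make the two positions concrete, here is a minimal sketch contrasting protocol-inferred mounting (the approach taken in this PR) with an explicit mount flag. `DriveRef`, its `mount` field, and both helper functions are hypothetical names used only for illustration; they are not the Lightning `Drive` API.

```python
from dataclasses import dataclass

# Hypothetical: the protocols this sketch treats as valid drive prefixes.
SUPPORTED_PROTOCOLS = ("lit://", "s3://")


@dataclass
class DriveRef:
    """A hypothetical, minimal drive reference used only for this comparison."""

    root: str
    mount: bool = False  # explicit opt-in flag (the alternative proposed above)

    @property
    def protocol(self) -> str:
        # Return the first supported protocol the root starts with.
        for proto in SUPPORTED_PROTOCOLS:
            if self.root.startswith(proto):
                return proto
        raise ValueError(f"Unknown protocol in {self.root!r}")


def should_mount_by_protocol(drive: DriveRef) -> bool:
    # Protocol-inferred: only s3:// drives are mounted into the Work.
    return drive.protocol == "s3://"


def should_mount_by_flag(drive: DriveRef) -> bool:
    # Explicit flag: the user opts in regardless of protocol.
    # The protocol is still validated so that unknown schemes fail early.
    _ = drive.protocol
    return drive.mount
```

With the flag-based variant, a lit:// drive could opt into mounting without changing its protocol, which is the use case raised later in this thread; the protocol-based variant keeps the API smaller until that need is addressed.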
Eventually yes, but there's no particular reason to frontload this work here. Let's add these features when we get to them instead of adding a mount flag but only supporting
@lantiga I've been going back and forth here, but I think I agree with @panos-is. When we have additional drive types and options, it'll make sense to introduce more flags, but for now the proposed solution is simple to explain and understand, and it doesn't require us to introduce breaking changes later, when we have a demonstrated need for additional arguments and behaviors.
My need is about the now, though (while designing the training app): how can we mount a lit:// drive? Should we surface its underlying s3 bucket? Here's the use case: I use a notebook to download a dataset from somewhere (not s3) and use a lit:// Drive to store the data. Then I want to mount that data to a Work. How do I do that? The same goes for this case: I have a data-generating process that puts data into a lit:// Drive, and then I want to mount the Drive into a Work to run a training job. Let's try to figure out a way to enable this, because it's going to be a pretty common use case once we roll out the app for training. I'm good with proceeding incrementally, but relying on the protocol is guaranteed not to be future-proof, even over a short timespan.
* Add S3 protocol and optimization field to the drive object
* Add a list of drives to the work specification
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add only protocol for s3 drives, no optimization arguments, and add tests
* added trailing slash criteria
* allow slash in s3 drives
* fix
* fixed test issues

Co-authored-by: Panos Lantavos-Stratigakis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rick Izzo <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Rick Izzo <[email protected]>
What does this PR do?
This PR adds a list of Drives to the app Work specification. This will enable apps to mount a given drive in the Work and access it through the file system (for example, with file-system data loaders). This is an alternative to the existing access pattern for drives and will later be expanded with additional options for optimizations.

@rlizzo edits: This PR has been updated after feedback to include only the configuration for setting an s3-based Drive type. A follow-up PR will allow us to inspect the `Drive` objects attached to each Work attribute and then configure them based on the protocol (s3, etc.). At the advice of @tchaton, we have removed the ability to use s3-based drives from the `Drive` `get`/`list`/`put`/`delete` methods. If we want to enable this in the future, we will discuss the API with @awaelchli and other interested parties. All the changes in this PR relate to running in a cloud environment; when running locally, specifying Work drives has no effect (however, you can mimic the cloud environment by placing or symlinking files into the same root folder).
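The Work-side behavior described above can be sketched as follows, assuming a heavily simplified spec. `WorkSpec`, `DriveRef`, `plan_mounts`, and the `/data` mount root are all hypothetical; the sketch only illustrates three points from the description: `drives` is optional, configuration is chosen per protocol, and drives have no effect outside the cloud.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DriveRef:
    # Hypothetical stand-in for a Drive: just its root, e.g. "s3://bucket/data/".
    root: str

    @property
    def protocol(self) -> str:
        # "s3://bucket/data/" -> "s3://"
        return self.root.split("://", 1)[0] + "://"


@dataclass
class WorkSpec:
    # Hypothetical Work specification. `drives` defaults to an empty list,
    # so existing specs that omit it keep working unchanged.
    name: str
    drives: List[DriveRef] = field(default_factory=list)


def plan_mounts(spec: WorkSpec, in_cloud: bool, mount_root: str = "/data") -> Dict[str, str]:
    """Map each mountable drive root to a local path (hypothetical layout).

    When running locally, drives have no effect, so the plan is empty.
    """
    if not in_cloud:
        return {}
    plan: Dict[str, str] = {}
    for drive in spec.drives:
        # Configuration is selected from the protocol; only s3:// is handled here.
        if drive.protocol == "s3://":
            bucket_path = drive.root[len("s3://"):].rstrip("/")
            plan[drive.root] = f"{mount_root.rstrip('/')}/{bucket_path}"
    return plan
```

Keeping the local plan empty mirrors the statement that Work drives are cloud-only; to reproduce the layout locally, one would place or symlink files under the same root folder by hand.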
Additionally, this PR updates the Drive spec:
- Supporting the `s3://` protocol, allowing the creation of drives directly from S3 buckets.
- Adding a new, optional `optimization` field, allowing users to specify optimizations for Drives when running in the cloud environment.

Does your PR introduce any breaking changes? If yes, please list them.
This PR should be fully backwards compatible with existing code; the `Work.drives` field is optional.
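The Drive spec updates listed above (`s3://` support plus an optional `optimization` field) can be sketched as a small validated dataclass. This is a minimal sketch under stated assumptions: `DriveSpec` is a hypothetical name, the validation rules are inferred from the commit bullets "added trailing slash criteria" and "allow slash in s3 drives" rather than taken from the actual implementation, and `optimization` is modeled as an opaque optional string.

```python
from dataclasses import dataclass, field
from typing import Optional

# Protocols accepted by this sketch (assumed; other schemes are rejected).
SUPPORTED_PROTOCOLS = ("lit://", "s3://")


@dataclass
class DriveSpec:
    """Hypothetical drive spec: a protocol-prefixed id plus an optional optimization."""

    id: str
    # Opaque placeholder; None (the default) keeps existing drives unchanged.
    optimization: Optional[str] = None
    protocol: str = field(init=False, default="")

    def __post_init__(self) -> None:
        protocol = next((p for p in SUPPORTED_PROTOCOLS if self.id.startswith(p)), None)
        if protocol is None:
            raise ValueError(
                f"Unknown protocol for drive {self.id!r}; expected one of {SUPPORTED_PROTOCOLS}"
            )
        name = self.id[len(protocol):]
        if not name:
            raise ValueError("The drive name must not be empty.")
        if protocol == "s3://":
            # Assumed rule: an s3 drive points at a folder, so it must end with "/".
            if not self.id.endswith("/"):
                raise ValueError(f"S3 drives must end with a trailing slash: {self.id!r}")
        elif "/" in name:
            # Assumed rule: lit:// ids are flat names; only s3:// ids may contain "/".
            raise ValueError(f"lit:// drive ids must not contain '/': {self.id!r}")
        self.protocol = protocol
```

For example, `DriveSpec("s3://my-bucket/images/")` is accepted, while `DriveSpec("s3://my-bucket/images")` is rejected for the missing trailing slash. Because `optimization` defaults to `None`, existing drive definitions validate exactly as before.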
cc @Borda