Skip to content

Conversation

@panos-is
Copy link
Contributor

@panos-is panos-is commented Aug 3, 2022

What does this PR do?

This PR adds a list of Drives to the app Work specification, this will enable apps to mount a given drive in the work and be able to access it through the file system (Using for example file system data loaders). This is an alternative way to work with drives than the existing access pattern and will later be expanded with additional options for optimizations.


@rlizzo edits: This PR has been updated after feedback to basically only include the configuration for setting an s3 based Drive Type. A follow up PR will be made which allows us to inspect Drive objects attacked to each work attribute and then configure them based on the protocol (s3, etc). At the advice of @tchaton we have have removed the ability to use s3 based drives from the Drive get/list/put/delete method. If we want to enable this in the future, we will have a discussion on the API with @awaelchli and other interested parties.


All the changes presented in this PR related to running in a cloud environment, when running locally, specifying work drives will not have any effect (However, you can mimic the cloud environment by placing of symlinking files to the same root folder).

Additionally, this PR updates the Drive spec:

  1. Adding a new protocol s3:// allowing the creation of drives directly from S3 buckets.
  2. Adding a new, optional optimization field allowing users to specify optimizations for Drives when running on the cloud environment.

Does your PR introduce any breaking changes? If yes, please list them.

This PR should be fully backwards compatible with existing code and the Work.drives field is optional.

cc @Borda

@panos-is panos-is added feature Is an improvement or enhancement app (removed) Generic label for Lightning App package labels Aug 3, 2022
@rlizzo rlizzo force-pushed the panos/grid-9730-add-drives-to-work-spec branch from dedbc62 to 87361e5 Compare August 8, 2022 14:43
@rlizzo rlizzo self-assigned this Aug 8, 2022
@rlizzo
Copy link
Contributor

rlizzo commented Aug 8, 2022

Please see new PR description for updated changes here. I've taken over work here for @panos-is now that he is on-call and we brought this PR in a slightly different direction.

@rlizzo rlizzo requested review from awaelchli and tchaton August 8, 2022 15:57
@rlizzo rlizzo changed the title Feature: Add a list of drives to work spec Feature: Add s3 drive type Aug 8, 2022
Copy link
Contributor

@tchaton tchaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

@tchaton tchaton changed the title Feature: Add s3 drive type (app) Add s3 drive type Aug 8, 2022
@tchaton tchaton added this to the app:0.6 milestone Aug 8, 2022
@mergify mergify bot added the ready PRs ready to be merged label Aug 8, 2022
Copy link
Collaborator

@lantiga lantiga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use the protocol (s3:// vs lit://) as an indication on how the drive should be mounted.
We should instead have an explicit flag for mounting, so we'll be able to mount non-s3:// locations (e.g. lit:// themselves, or GCP storage).

@panos-is
Copy link
Contributor Author

panos-is commented Aug 8, 2022

I don't think we should use the protocol (s3:// vs lit://) as an indication on how the drive should be mounted. We should instead have an explicit flag for mounting, so we'll be able to mount non-s3:// locations (e.g. lit:// themselves, or GCP storage).

Eventually yes, but there's no particular reason to frontload this work here. Lets add these features when we get to them instead of adding a mount flag but only supporting lit+false and s3+true here. That's bound to be somewhat confusing to the users.

@rlizzo
Copy link
Contributor

rlizzo commented Aug 9, 2022

@lantiga I've been going back and forth here, but I think I agree with @panos-is. When we have additional drive types and options, it'll make sense to introduce more flags, but for right now the proposed solution is simple to explain/understand & doesn't require us to introduce breaking changes in the near future when we have a demonstrated need for additional arguments & behaviors.

@lantiga
Copy link
Collaborator

lantiga commented Aug 9, 2022

My need is about the now though (while designing the training app): how can we mount a lit:// drive? Should we surface its underlying s3 bucket?

Here's the use case: I use a notebook to download a dataset from somewhere (not s3) and I use a lit:// Drive to store the data. Then I want to mount that data to a Work. How do I do that?

Same goes with: I have a data-generating process that puts data into a lit:// Drive. Then I want to mount the Drive into a Work to run a training job.

Let's try to figure out a way to enable this because it's going to be a pretty common use case once we roll out the app for training. I'm good with proceeding incrementally, but relying on protocol is guaranteed to not be future-proof in a short timespan.

@Borda Borda modified the milestones: app:0.6, app:0.7 Aug 9, 2022
@rlizzo rlizzo changed the title (app) Add s3 drive type (app) Add s3 drive type (1/2) Aug 9, 2022
@rlizzo rlizzo force-pushed the panos/grid-9730-add-drives-to-work-spec branch from d08e2f8 to 56aa999 Compare August 10, 2022 01:27
@rlizzo rlizzo force-pushed the panos/grid-9730-add-drives-to-work-spec branch from 56aa999 to 411d272 Compare August 10, 2022 15:13
@Borda Borda enabled auto-merge (squash) August 10, 2022 15:46
@rlizzo rlizzo disabled auto-merge August 10, 2022 16:33
@rlizzo rlizzo enabled auto-merge (squash) August 10, 2022 16:34
@rlizzo rlizzo merged commit 784b604 into master Aug 10, 2022
@rlizzo rlizzo deleted the panos/grid-9730-add-drives-to-work-spec branch August 10, 2022 20:07
rlizzo added a commit that referenced this pull request Aug 16, 2022
* Add S3 protocol and optimization field to the drive object

* Add a list of drives to the work specification

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add only protocol for s3 drives, no optimization arguments, and add tests

* added trailing slash criteria

* allow slash in s3 drives

* fix

* fixed test issues

Co-authored-by: Panos Lantavos-Stratigakis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rick Izzo <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Rick Izzo <[email protected]>
jessecambon pushed a commit to jessecambon/lightning that referenced this pull request Aug 16, 2022
* Add S3 protocol and optimization field to the drive object

* Add a list of drives to the work specification

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add only protocol for s3 drives, no optimization arguments, and add tests

* added trailing slash criteria

* allow slash in s3 drives

* fix

* fixed test issues

Co-authored-by: Panos Lantavos-Stratigakis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rick Izzo <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Rick Izzo <[email protected]>
@Borda Borda modified the milestones: app:0.7, app:0.6 Aug 22, 2022
Borda added a commit that referenced this pull request Sep 7, 2022
Borda added a commit that referenced this pull request Sep 8, 2022
@Borda Borda modified the milestones: app:0.6, app:0.7 Sep 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

app (removed) Generic label for Lightning App package feature Is an improvement or enhancement ready PRs ready to be merged

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

8 participants