Skip to content

Conversation

@gaborbernat
Copy link
Contributor

@gaborbernat gaborbernat commented Oct 17, 2020

Create the isolated build environment upfront to avoid executable, pip executable or path not yet set. Remove unused code segments.

Signed-off-by: Bernát Gábor [email protected]

As promised in #140 this is the follow-up PR to clean up the env code a bit and avoid some redundant type casts. There have been many code segments we don't need actually anymore with the switch to venv/virtualenv. While doing it I've taken the liberty to add a few tests to get 100% coverage for the module.

@gaborbernat gaborbernat requested a review from FFY00 as a code owner October 17, 2020 12:24
@gaborbernat gaborbernat added the enhancement New feature or request label Oct 17, 2020
@FFY00
Copy link
Member

FFY00 commented Oct 18, 2020

I am fine with the refactoring, as well as moving the environment creation to __init__, but what warrants breaking the API? Is there any good motivation behind that or is it just a nice to have? I'd greatly prefer to keep API compatibility.

@gaborbernat
Copy link
Contributor Author

gaborbernat commented Oct 18, 2020

I am fine with the refactoring, as well as moving the environment creation to __init__, but what warrants breaking the API?

We haven't yet officially released the tool yet (hence version 0.0.4) and as such, I don't think we are in a state yet where we must ensure API compatibility with the earlier version. Seconds I feel like the API was created as optimized for the old way of isolation builds, and as such a bit incompatible with where we landed now.

The biggest problem with the current API is my comment above:

We could put this to be a class method but wasn't sure if it's a good idea to annotate a class method with a contextmanager. And I don't think this class makes sense without always using enter right after creation.

The earlier API IMHO is not good with the constraint that this class must be entered to be used. It allowed people to do:

env = IsolatedBuildEnv.for_x()

# here basically any method call on env is a really bad idea 
# and undefined behavior, and as such IMHO the API itself should
# not allow it rather than adding defensive error raises at runtime

with env:
	pass 

The new API achieves exactly this.

By the way how much of the API you consider public? I'd personally restrict it to just the main method until people provide use cases where they'd need more. Making the API wider than we must always comes back to bite us...

@FFY00
Copy link
Member

FFY00 commented Oct 18, 2020

The earlier API IMHO is not good with the constraint that this class must be entered to be used. It allowed people to do:

env = IsolatedBuildEnv.for_x()

# here basically any method call on env is a really bad idea 
# and undefined behavior, and as such IMHO the API itself should
# not allow it rather than adding defensive error raises at runtime

with env:
	pass 

The new API achieves exactly this.

By the way how much of the API you consider public? I'd personally restrict it to just the main method, until people provide use cases where they'd need more.

I agree, and that is what I suggested in the code comment.

Overall I do not like the unbundling of the environment creation from IsolatedEnvironment. Let's keep it there, please 😊

@FFY00
Copy link
Member

FFY00 commented Oct 18, 2020

Sorry, misread your comment.

@gaborbernat
Copy link
Contributor Author

The earlier API IMHO is not good with the constraint that this class must be entered to be used. It allowed people to do:

env = IsolatedBuildEnv.for_x()

# here basically any method call on env is a really bad idea 
# and undefined behavior, and as such IMHO the API itself should
# not allow it rather than adding defensive error raises at runtime

with env:
	pass 

The new API achieves exactly this.

By the way how much of the API you consider public? I'd personally restrict it to just the main method, until people provide use cases where they'd need more.

I agree, and that is what I suggested in the code comment.

Overall I do not like the unbundling of the environment creation from IsolatedEnvironment. Let's keep it there, please 😊

But calling env.build without entering the context of the class is also broken behavior...

@FFY00
Copy link
Member

FFY00 commented Oct 18, 2020

here basically any method call on env is a really bad idea
and undefined behavior, and as such IMHO the API itself should
not allow it rather than adding defensive error raises at runtime

The only public method in IsolatedEnvironment is install, which can be called outside the environment. Could you elaborate?

But calling env.build without entering the context of the class is also broken behavior...

What is this env.build?

@gaborbernat
Copy link
Contributor Author

I'll amend the PR in the morning.

@FFY00
Copy link
Member

FFY00 commented Oct 18, 2020

@gaborbernat I hope I am not being too pushy, we can keep discussing this if you want.

@gaborbernat
Copy link
Contributor Author

What is this env.build?

My point was that achieving the workaround for pypa/pyproject-hooks#93 we patched the sys.executable within our context manager of the class. If one would call builder.build without using the context manager (but just creates the environment) things can still be misused. That being said with the merge and release of that PR this is no longer the case, so I've now adopted the release of pypa/pyproject-hooks#93 into this PR to avoid this problem.

src/build/env.py Outdated
:param exc_val: the value of exception raised (if any)
:param exc_tb: the traceback of exception raised (if any)
"""
self.close()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users who want to reuse an isolated build environment multiple times should not invoke this class's context manager. However, they must invoke close at the end of their application to trigger the cleanup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should call close here. This should deactivate the environment, not close it.

We should call close instead in __del__. I am not even sure if there is need for an explicit destructor. IMO we should just rename close to __del__.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very much against the language design and guarantees. Relying on __del__ to be called is just not supported/guaranteed within the Python language. See here https://bugs.python.org/issue39360 for more information about a nasty bug this can cause, and where core developers express that in Python you should always explicitly cleanup and never rely on __del__ to be called for resources.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to explicitly clean up the temporary directory, you can call the destructor. But you are not necessarily required to. We are not relying on __del__ to be called for anything.
If you want to keep env.close() as an alternative to del env, sure.

The main point here is that I do not think we should be forcefully cleaning up the environment when we exit. Maybe we can pass an argument to opt in/out?

Copy link
Contributor Author

@gaborbernat gaborbernat Oct 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only thing you should be cleaning up within a __del__ method is memory. Not folders and files.

The main point here is that I do not think we should be forcefully cleaning up the environment when we exit

Why? We created it just to be able to perform the build. Why would we not clean it up? At least within build. If we're talking about some custom application they can always use the API and not call the close /context amanger if they want to reuse across runs.

We are not relying on __del__ to be called for anything.

If we rename the __enter__ to __del__ we make this application non-deterministic. The close might or might not be called, there are no guarantees.

self._remove_paths = [] # type: List[str]
self._executable = _executable
self._old_executable = None # type: Optional[str]
self._pip_executable = None # type: Optional[str]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've realized that pip_executable is assuming too much. In the case of conda for example, the user would be most likely to use conda. To acknowledge this I've renamed this to instal_executable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to keep the original name. If you want to use conda you should subclass IsolatedEnvironment, where you can have a _conda_executable or something else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, potentially. It's a question of design. But then the base should be an abstract class because within conda you should not have pip.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like an abstract base class IsolatedEnvironment that defines the API as expected by the project builder, and then we add implementation IsolatedEnvironmentVenvPip? Because then yes naming it directly pip_executable makes a lot of sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this instead of [self._executable, "-m", "pip"]?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would 'sys.executable', '-m', 'pip' not work in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in general, hence why is a config value. Sometimes the host doesn't have pip, and in such cases, it's good to fall back to ensurepip. Why you don't like it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer -m as a rule since entry points add an additional layer of indirection that can easily go wrong in edge cases.

I also wanted to reduce the number of instance members since the more there are the more possibility things can go wrong, but from your explaination above, that’s no longer an issue now since there’s no way to avoid it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not we're still using -m (if you read the install method implementation), the pip_executable is not actually the pip executable script, but the python executable having the pip inside 🤔 The name is not best, but it's an implementation detail (part of a private class).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the clarification.

@gaborbernat gaborbernat requested a review from pganssle October 19, 2020 06:05
@gaborbernat
Copy link
Contributor Author

@FFY00 this is now ready for review.

src/build/env.py Outdated

def __enter__(self): # type: () -> IsolatedEnvironment
"""Enable the isolated build environment"""
return self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should adjust PATH here so that you can call scripts from the environment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm strongly against that, but can you expand on why?

My reasoning is IMHO mangling with the path is always a really bad idea. See https://bugs.python.org/issue42041 for a recent bug and discussion on why.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Then we should implement a get_script method. I can do this in a later PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why and where would you need it though? Can you expand on that?

Copy link
Member

@FFY00 FFY00 Oct 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need it if I want to call a script that is provided by a package.

One could argue that you should be able to call entrypoints directly without need for a script, but they are not always public API. And regardless, per PEP 427, packages may bring their own scripts, so I have no other alternative than to call the script directly.

Copy link
Contributor Author

@gaborbernat gaborbernat Oct 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need it if I want to call a script that is provided by a package.

And regardless, per PEP 427, packages may bring their own scripts, so I have no other alternative than to call the script directly.

The above still doesn't answer where and why you would want to call a script?

P.S. Package backends per PEP-517 are always called via their Python API and not console scripts (per PEP-517). And the way I see it, the build package should only interact with package backends via that API and provision the environment for those backends (which is where the install executable comes in the picture).

src/build/env.py Outdated
:param exc_val: the value of exception raised (if any)
:param exc_tb: the traceback of exception raised (if any)
"""
self.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should call close here. This should deactivate the environment, not close it.

We should call close instead in __del__. I am not even sure if there is need for an explicit destructor. IMO we should just rename close to __del__.

@gaborbernat
Copy link
Contributor Author

@FFY00 @pganssle made the requested changes 👍 please review again.

Copy link
Member

@FFY00 FFY00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 Just some minor comments.

setup.cfg Outdated
exclude_lines =
\#\s*pragma: no cover
^\s*raise NotImplementedError\b
^if __name__ == ['"]__main__['"]:$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? We test this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Inside the __main__ file. This makes that line covered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we test that via integration tests, it should already be covered 🤔

def _get_env_path(self, path): # type: (str) -> Optional[str]
"""
Returns sysconfig path from our environment
def _create_isolated_env(path): # type: (str) -> Tuple[str, str]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't make sense for this to be in IsolatedEnvBuilder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, maybe, however, these are implementation detail methods. If we move them, you'd need to make these static methods and increase the indent by one level. So overall, probably not worth the change.


[coverage:report]
exclude_lines =
\#\s*pragma: no cover
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the core goal is to add raise NotImplementedError to be automatically marked with no coverage needed for them (because the metaclass makes it impossible to hit it). By default, coverage only excludes pragma no cover lines, so we want to extend that. There's no extend the exclude method, though, override of the exclude lines matcher, so we need to set it again.

Create the isolated build environment upfront to avoid executable, pip
executable or path not yet set. Remove unused code segments.

Signed-off-by: Bernát Gábor <[email protected]>
Signed-off-by: Bernát Gábor <[email protected]>
Signed-off-by: Bernát Gábor <[email protected]>
Signed-off-by: Bernát Gábor <[email protected]>
Signed-off-by: Bernát Gábor <[email protected]>
@gaborbernat gaborbernat merged commit 82bc16c into pypa:master Oct 21, 2020
@gaborbernat gaborbernat deleted the env-refact branch October 21, 2020 20:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants