-
-
Notifications
You must be signed in to change notification settings - Fork 688
Add typing to sage_input methods #40897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
b14d865 to
62b5191
Compare
|
The ruff failure is strange, I don't think that this is violating a linter rule. Things to try:
|
|
Documentation preview for this PR (built with commit c76581b; changes) is ready! 🎉 |
|
Thanks for looking into this. Adding the future imports fixed it, indeed! |
src/sage/geometry/fan.py
Outdated
| self._virtual_rays = PointCollection(virtual_rays, self.lattice()) | ||
|
|
||
| def _sage_input_(self, sib, coerced): | ||
| def _sage_input_(self, sib: SageInputBuilder, coerced: bool | Literal[2]) -> SageInputExpression: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why you have Literal[2] here, where is this behaviour for _sage_input_ documented?
Also, Literal only works for Python literals. So Literal[2] means the int 2, not the Sage Integer 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's defined here:
sage/src/sage/misc/sage_input.py
Lines 56 to 66 in 899d400
| A ``_sage_input_`` method takes two parameters, conventionally named | |
| ``sib`` and ``coerced``. The first argument is a | |
| :class:`SageInputBuilder`; it has methods to build SIEs. The second | |
| argument, ``coerced``, is a boolean. This is only useful if your class | |
| is a subclass of :class:`Element` (although it is always present). If | |
| ``coerced`` is ``False``, then your method must generate an expression | |
| which will evaluate to a value of the correct type with the correct | |
| parent. If ``coerced`` is ``True``, then your method may generate an | |
| expression of a type that has a canonical coercion to your type; and if | |
| ``coerced`` is 2, then your method may generate an expression of a type | |
| that has a conversion to your type. |
not the Sage Integer 2.
I don't think there is a good way to express this as a type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not the Sage Integer 2.
I don't think there is a good way to express this as a type.
If Integer(2) is acceptable input (which I believe it is from some of the examples) then we can't use Literal[2] here. If only Integer(2) is acceptable then use Integer. If Integer(2) and int(2) are both acceptable then either Integer | int, or if you want to use Literal then Integer | Literal[2].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good, simple example of the sage vs python integer typing issue we were discussing in the other PR.
My impression is that python is not aiming to become strongly typed but that the purpose of these type annotations is to mainly help developers write better code. Applying this philosophy here, the typing for coerced should reveal at least these two 'bugs':
- Pass
1or3(or another integer) as an argument to sage_input - Assume in the implementation of sage_input that it's always a boolean
So in particular, if you have an integer and you would like to pass it as coerced, you would want to check that it's equal to 2 before going ahead, eg using:
Integer num = ...
if num == 2:
_sage_input(..., num)
else:
raise "expected num to be 2, but got ..."
At this point the type checker is able to deduct that num is the literal 2 (although it's not, it's only equal to it)
I feel like supporting (and highlighting the need) for such a pattern is more important than being 100% right about the typing info. One can always add a pyright: ignore comment if the typing is to strict/wrong, but one is sure to have covered the case correctly. Of course, you don't want to force to many such comments - it would be stupid to artificially restrict the possible input to int if half of sage's code is passing in Integer and that works well.
Curious what you think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ended up being a longer response than I expected, I've been thinking about type annotations a lot lately.
This is a good, simple example of the sage vs python integer typing issue we were discussing in the other PR.
Just thinking out loud, definitely don't do this in this PR and don't do this without a sage-devel discussion, but is there anything preventing Integer from being a subclass of int? It would solve a lot of annoyances with Integer. I'm guessing it would be too slow for such an important Sage object.
My impression is that python is not aiming to become strongly typed but that the purpose of these type annotations is to mainly help developers write better code.
Interesting, I think about it differently. My impression is that type annotations are for linters/type checkers to check that typing rules are respected without having to enforce type checking at runtime. But using Literal for parameters I think is beginning to veer into the territory of validating the values of inputs rather than just their types. So to me the utility of type annotations is the ability to have most of the benefits of strongly typed code without the runtime overhead by running a linter/type checker on the code which enforces the rules strictly. The more we adhere to this approach the more useful the type checkers will become (and the more bugs they'll be able to find). Perhaps user code does weirder things that respect the spirit but not the letter of type annotations (for example, a function that accepts Integer as a parameter but a user passes a Rational with denominator 1). Users shouldn't have to worry about details like that, and the nice thing about types not being checked at runtime is that something like that will (usually) work fine. I don't think many Sage users are running mypy or pyright on their Sage scripts. On the other hand, Sage developers should be expected to write slightly more explicit code that doesn't rely on the coercion system handling incorrect types (if only because relying on the coercion system will introduce overhead that could be avoided by more explicit code).
A big reason I favour the stricter annotations approach is that in theory it could be used by Cython to generate more efficient code one day (Cython's ability to use Python annotation is currently a bit limited though, but if we have strict type annotations then as Cython improves here so will the speed of Sage).
Something to consider: when Literal is used for a parameter, arguments cannot be a variable. So this is invalid:
def foo(a: Literal[5]) -> int:
return a + 1
b = 5
foo(b) # invalid, b is not a literalThat's kind of awkward, so I don't think I like Literal for parameters. It's probably best used only as a return type annotation for functions that return a constant (so things like .is_finite for objects that are always/never finite).
Curious what you think about it.
I think that we should use the most specific type annotation that is correct (doesn't require pyright: ignore). Having to use pyright: ignore defeats the purpose of type checking the function, the only time I'd use it is if there is a bug in the type checker and we don't want to wait for the type checker to get fixed before merging the PR. If it's legitimately too hard to come up with a correct annotation then we shouldn't annotate it. Or, just annotate it with the most specific correct thing (even if that annotation is more permissive than the function in reality). I don't think an annotation that is slightly too permissive is an issue. We probably have plenty of functions in Sage that accept only a positive Integer and check at runtime if the Integer is non-positive and raise an error if so, but I don't think there is a need to have a more specific PositiveInteger type.
Literal doesn't support Integer, so if a function accepts Integer then we can't use Literal there. I was discussing Literal with @fchapoton in #40972 and he thinks we should probably avoid it for now, and this situation seem like a good example of why we might want to avoid Literal. Personally I don't think Literal needs to always be avoided, but the only situation where I think it is likely to be useful is for return annotations, not parameter annotations (in #40972 (comment) I give an example of a hypothetical optimization Cython could make using Literal, but the Cython compiler doesn't currently have that optimization so it's a somewhat moot point).
(@fchapoton curious what you think about using Literal only for return but not parameter annotations.)
So for this PR, my suggestion is to change the annotation to bool | Integer | int, as this is the most specific correct annotation that avoids the Literal weirdness (even bool | Integer | Literal[2] is weird as the foo example above shows). Setting to needs work for now.
If you want to do a bit more as a separate PR, there is also my suggestion from #40634 (comment) to have a types.py module where we can define common type aliases/unions that are used in Sage (like Integer | int).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot! I don't have too much time at the moment to properly respond, but please have a look at this playground where I played around with your example:
The foo(b) call is not marked as invalid in this example, as pyright is able to properly recognize that b is Literal[5]. Some other code like
num = 1
foo(num)
is properly marked as invalid.
(Note that the behavior under mypy might be different - in my experience pyright is usually the better option.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Note that the behavior under mypy might be different - in my experience pyright is usually the better option.)
Good to know, I did use mypy for that example (I happened to have mypy but not pyright already installed).
Still, my suggestion is the same. Let's not use Literal for parameters for now (at least not for integers because the whole int vs Integer thing makes it too complicated). I don't want to rely on implementation-specific behaviour of one of the two major type checkers. From a quick reading of the differences, it seems pyright is probably the best choice for our CI at least since it's faster right now, but it looks like mypy also has some advantages. And mypy might get improvements in the future that make it preferable (mypyc looks very interesting, but it's in alpha right now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A big reason I favour the stricter annotations approach is that in theory it could be used by Cython to generate more efficient code one day (Cython's ability to use Python annotation is currently a bit limited though, but if we have strict type annotations then as Cython improves here so will the speed of Sage).
It would be indeed nice if the typing info could be used to increase performance. The PEP that introduced typing in Python stated that as a non-goal and said
Using type hints for performance optimizations is left as an exercise for the reader.
I looked around and it seems typing info is still not used for performance. There were some initial discussions about using it to warm up the JIT compiler and there are some promising experiments like https://github.com/facebookincubator/cinder?tab=readme-ov-file#static-python. But all of those go into the direction of optimizing the performance without changing the runtime behavior (eg removing complete parts of the code execution path based on typing as in #40972 (comment) is not considered as far as I can see).
My impression is that python is not aiming to become strongly typed but that the purpose of these type annotations is to mainly help developers write better code.
Interesting, I think about it differently. My impression is that type annotations are for linters/type checkers to check that typing rules are respected without having to enforce type checking at runtime.
Okay, this is actually not that different to what I have in mind. I only would add "better intellisense" to the advantages of typing (eg if you type vector(...). your IDE is suggesting methods that are defined on the type that is returned by the vector function).
But using
Literalfor parameters I think is beginning to veer into the territory of validating the values of inputs rather than just their types.
I would use Literal only where the allowed space of inputs is very restricted, mostly in cases where you should actually use an enum but for historical reasons it's another set of fixed input values. For me, coerced is such a case as it only allows True/False/2.
So we essentially have two options:
coerced: bool | Literal[2]:- Good: complains about
3as input, and makes it clear that the code in the method doesn't need to worry about other integers - Bad: complains about
Int(2)
- Good: complains about
coerced: bool | int | Integer:- Good: accepts
Int(2) - Bad: but also
3orInt(3)(both to the callee and to the one implementing the method)
- Good: accepts
I honestly cannot come up with a scenario where one would need to input Int(2) instead of just a python 2. Just because most implementations actually handle the Int(2) case as well, doesn't mean the typing info should necessarily reflect that - in fact you can also pass floats like 2.00000000000000000000000000000000000001 into the method and it might work - but that doesn't mean it's desired. When it comes to finding bugs, the option using Literal seems to be more robust/helpful.
For me coerced is just an enum, with three values - that happen to be called True, False, 2.
Just to emphasize: I don't have a strong opinion about this particular case, and see the discussion more about a means to extract general principles that could guide how we use typing in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked around and it seems typing info is still not used for performance. There were some initial discussions about using it to warm up the JIT compiler and there are some promising experiments like https://github.com/facebookincubator/cinder?tab=readme-ov-file#static-python. But all of those go into the direction of optimizing the performance without changing the runtime behavior (eg removing complete parts of the code execution path based on typing as in #40972 (comment) is not considered as far as I can see).
Yes, it would change runtime behaviour and hence not happen in CPython. But it could happen in Cython or by something like mypyc, as these only aim to have correct type annotations not impact runtime behaviour. Cython does do some basic analysis to remove dead code, for example if False: ... is optimized out by Cython.
Just to emphasize: I don't have a strong opinion about this particular case, and see the discussion more about a means to extract general principles that could guide how we use typing in the future.
Yeah, it's an interesting case! The only part of this I feel strongly about is that we shouldn't rely on pyright-specific behaviour. Putting aside issues of CI usage limits, I think it would be reasonable to run both pyright and mypy on CI (perhaps pyright as a first pass, and if it passes then we run the slower but potentially more detailed mypy. There are some interesting things that only mypy can do, notably the currently alpha mypyc looks like it could be very useful for Sage once it matures.
For me
coercedis just an enum, with three values - that happen to be calledTrue, False, 2.
Agree.
So we essentially have two options:
* `coerced: bool | Literal[2]`: * Good: complains about `3` as input, and makes it clear that the code in the method doesn't need to worry about other integers * Bad: complains about `Int(2)` * `coerced: bool | int | Integer`: * Good: accepts `Int(2)` * Bad: but also `3` or `Int(3)` (both to the callee and to the one implementing the method)
There is a third option, and this would be my preferred option if we use Literal here:
coerced: bool | Integer | Literal[2]- Good: complains about
3as input, acceptsInteger(2). So it accepts all correct inputs, and rejects some incorrect inputs. - Bad: Also accepts
Integer(3). The type annotation is somewhat confusing, but one can just read the docstring to understand it.
- Good: complains about
I would also add a = 2; foo(a) not working in mypy to the "bad" list for all of these. This specific function seems unlikely to be called in a way where this is an issue though, so I can let it go as long as we understand that Literal shouldn't be used in places where one would reasonably use a variable as the argument (unless mypy behaviour changes to allow this of course, and I hope it does).
I honestly cannot come up with a scenario where one would need to input
Int(2)instead of just a python2. Just because most implementations actually handle theInt(2)case as well, doesn't mean the typing info should necessarily reflect that - in fact you can also pass floats like2.00000000000000000000000000000000000001into the method and it might work - but that doesn't mean it's desired. When it comes to finding bugs, the option using Literal seems to be more robust/helpful.
The reason to accept Integer is that the preparser changes 2 to Integer(2). However looking at it again now, is SageInputBuilder.__call__ the only path to _sage_input_ through the public API? If so, then my point about the preparser is mostly moot if we just convert Integer to int in SageInputBuilder.__call__ (but the annotation for __call__ would still need to allow for Integer). It's only mostly moot because some of our doctests call _sage_input_ directly with Sage doctests, so we would be testing it with Integer(2) there, but I'm not sure if we use 2 in any of those tests.
Do you know if there is anywhere in the code that we actually use a value of 2 for coereced? I couldn't find any places where we do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be a reasonable option:
bool | Literal[2]for_sage_input_since it is internal.bool | int | IntegerforSageInputBuilder.__call__since it is public (and thus can be called with the preparser in effect). Validate thatcoercedis eitherboolor2at runtime. Ifcoercedis anIntegerconvert it to anintbefore calling_sage_input_.
There is still the matter of if/where 2 is currently used and any potential doctests that are testing _sage_input_ with Integer(2) directly.
|
Anyway, this looks good other than the |
|
I like the type alias, that's quite nice and makes things more readable. Also provides one place to modify if we come to a consensus on the I only have two more suggestions:
# In case this method was called with `Integer(2)` because the preparser was in effect.
if coerced == 2:
coerced = 2 |
|
Needs work because of test failures and the unimplemented suggested changes in my previous comment. |
vincentmacri
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarifying my remaining suggestion. Also offering an alternative to my suggestion.
src/sage/misc/sage_input.py
Outdated
| self._locals = {} | ||
|
|
||
| def __call__(self, x, coerced: bool | Literal[2] = False): | ||
| def __call__(self, x, coerced: CoercionMode = False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've been thinking about this a bit. If you really want to use Literal and don't want to allow Integer here, how about adding an INPUT block to the docstring. Mention in the docstring that if the preparser is in effect this should be called with coerced=2r rather than with coerced=2. If we have any doctests that use coerced=2 then change them to coerced=2r (if you do a find and replace for this take into account that this could be called without specifying the name of the optional parameter).
I'd still prefer CoercionMode | Integer here because of the preparser thing, and a conversion to int in the function because it calls functions that only accept CoercionMode, but I can live with explaining this one in the docstring. This is a very weird function though, let's not make a habit of doing things like this.
Declare every sage_input method as:
📝 Checklist
⌛ Dependencies