Skip to content

Conversation

@tobiasdiez
Copy link
Contributor

@tobiasdiez tobiasdiez commented Sep 28, 2025

Declare every sage_input method as:

def _sage_input_(self, sib: SageInputBuilder, coerced: bool | Literal[2]) -> SageInputExpression:

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

@vincentmacri
Copy link
Member

The ruff failure is strange, I don't think that this is violating a linter rule. Things to try:

  1. Update ruff and see if it still fails (so rebase this on top of your other PR that updates the dependencies, making that PR a dependency of this one). There are recent PRs merged into ruff that mention changes to this rule, perhaps it has been fixed already.
  2. According to TC004 diagnostics/docs should clarify that type hints can be executed at run-time astral-sh/ruff#16490 maybe we need from __future__ import annotations?
  3. My only other idea is a bug in ruff that we should report if 1. and 2. both don't help.

@github-actions
Copy link

github-actions bot commented Oct 4, 2025

Documentation preview for this PR (built with commit c76581b; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@tobiasdiez
Copy link
Contributor Author

Thanks for looking into this. Adding the future imports fixed it, indeed!

self._virtual_rays = PointCollection(virtual_rays, self.lattice())

def _sage_input_(self, sib, coerced):
def _sage_input_(self, sib: SageInputBuilder, coerced: bool | Literal[2]) -> SageInputExpression:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you have Literal[2] here, where is this behaviour for _sage_input_ documented?

Also, Literal only works for Python literals. So Literal[2] means the int 2, not the Sage Integer 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's defined here:

A ``_sage_input_`` method takes two parameters, conventionally named
``sib`` and ``coerced``. The first argument is a
:class:`SageInputBuilder`; it has methods to build SIEs. The second
argument, ``coerced``, is a boolean. This is only useful if your class
is a subclass of :class:`Element` (although it is always present). If
``coerced`` is ``False``, then your method must generate an expression
which will evaluate to a value of the correct type with the correct
parent. If ``coerced`` is ``True``, then your method may generate an
expression of a type that has a canonical coercion to your type; and if
``coerced`` is 2, then your method may generate an expression of a type
that has a conversion to your type.

not the Sage Integer 2.

I don't think there is a good way to express this as a type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not the Sage Integer 2.

I don't think there is a good way to express this as a type.

If Integer(2) is acceptable input (which I believe it is from some of the examples) then we can't use Literal[2] here. If only Integer(2) is acceptable then use Integer. If Integer(2) and int(2) are both acceptable then either Integer | int, or if you want to use Literal then Integer | Literal[2].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good, simple example of the sage vs python integer typing issue we were discussing in the other PR.

My impression is that python is not aiming to become strongly typed but that the purpose of these type annotations is to mainly help developers write better code. Applying this philosophy here, the typing for coerced should reveal at least these two 'bugs':

  1. Pass 1 or 3 (or another integer) as an argument to sage_input
  2. Assume in the implementation of sage_input that it's always a boolean

So in particular, if you have an integer and you would like to pass it as coerced, you would want to check that it's equal to 2 before going ahead, eg using:

Integer num = ... 
if num == 2:
    _sage_input(..., num)
else:
    raise "expected num to be 2, but got ..."

At this point the type checker is able to deduct that num is the literal 2 (although it's not, it's only equal to it)

I feel like supporting (and highlighting the need) for such a pattern is more important than being 100% right about the typing info. One can always add a pyright: ignore comment if the typing is to strict/wrong, but one is sure to have covered the case correctly. Of course, you don't want to force to many such comments - it would be stupid to artificially restrict the possible input to int if half of sage's code is passing in Integer and that works well.

Curious what you think about it.

Copy link
Member

@vincentmacri vincentmacri Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ended up being a longer response than I expected, I've been thinking about type annotations a lot lately.

This is a good, simple example of the sage vs python integer typing issue we were discussing in the other PR.

Just thinking out loud, definitely don't do this in this PR and don't do this without a sage-devel discussion, but is there anything preventing Integer from being a subclass of int? It would solve a lot of annoyances with Integer. I'm guessing it would be too slow for such an important Sage object.

My impression is that python is not aiming to become strongly typed but that the purpose of these type annotations is to mainly help developers write better code.

Interesting, I think about it differently. My impression is that type annotations are for linters/type checkers to check that typing rules are respected without having to enforce type checking at runtime. But using Literal for parameters I think is beginning to veer into the territory of validating the values of inputs rather than just their types. So to me the utility of type annotations is the ability to have most of the benefits of strongly typed code without the runtime overhead by running a linter/type checker on the code which enforces the rules strictly. The more we adhere to this approach the more useful the type checkers will become (and the more bugs they'll be able to find). Perhaps user code does weirder things that respect the spirit but not the letter of type annotations (for example, a function that accepts Integer as a parameter but a user passes a Rational with denominator 1). Users shouldn't have to worry about details like that, and the nice thing about types not being checked at runtime is that something like that will (usually) work fine. I don't think many Sage users are running mypy or pyright on their Sage scripts. On the other hand, Sage developers should be expected to write slightly more explicit code that doesn't rely on the coercion system handling incorrect types (if only because relying on the coercion system will introduce overhead that could be avoided by more explicit code).

A big reason I favour the stricter annotations approach is that in theory it could be used by Cython to generate more efficient code one day (Cython's ability to use Python annotation is currently a bit limited though, but if we have strict type annotations then as Cython improves here so will the speed of Sage).


Something to consider: when Literal is used for a parameter, arguments cannot be a variable. So this is invalid:

def foo(a: Literal[5]) -> int:
    return a + 1

b = 5
foo(b)  # invalid, b is not a literal

That's kind of awkward, so I don't think I like Literal for parameters. It's probably best used only as a return type annotation for functions that return a constant (so things like .is_finite for objects that are always/never finite).

Curious what you think about it.

I think that we should use the most specific type annotation that is correct (doesn't require pyright: ignore). Having to use pyright: ignore defeats the purpose of type checking the function, the only time I'd use it is if there is a bug in the type checker and we don't want to wait for the type checker to get fixed before merging the PR. If it's legitimately too hard to come up with a correct annotation then we shouldn't annotate it. Or, just annotate it with the most specific correct thing (even if that annotation is more permissive than the function in reality). I don't think an annotation that is slightly too permissive is an issue. We probably have plenty of functions in Sage that accept only a positive Integer and check at runtime if the Integer is non-positive and raise an error if so, but I don't think there is a need to have a more specific PositiveInteger type.

Literal doesn't support Integer, so if a function accepts Integer then we can't use Literal there. I was discussing Literal with @fchapoton in #40972 and he thinks we should probably avoid it for now, and this situation seem like a good example of why we might want to avoid Literal. Personally I don't think Literal needs to always be avoided, but the only situation where I think it is likely to be useful is for return annotations, not parameter annotations (in #40972 (comment) I give an example of a hypothetical optimization Cython could make using Literal, but the Cython compiler doesn't currently have that optimization so it's a somewhat moot point).

(@fchapoton curious what you think about using Literal only for return but not parameter annotations.)


So for this PR, my suggestion is to change the annotation to bool | Integer | int, as this is the most specific correct annotation that avoids the Literal weirdness (even bool | Integer | Literal[2] is weird as the foo example above shows). Setting to needs work for now.

If you want to do a bit more as a separate PR, there is also my suggestion from #40634 (comment) to have a types.py module where we can define common type aliases/unions that are used in Sage (like Integer | int).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! I don't have too much time at the moment to properly respond, but please have a look at this playground where I played around with your example:

https://pyright-play.net/?code=GYJw9gtgBALgngBwJYDsDmUkQWEMoAySMApiAIYA2AsAFB0AmJwUwYYAFOQFyHFlUA2gFYAugEooAWgB8mFDG50oKqCBIwAriBRRyUANRQAjHToAjKAF4owum07nJUAMTyAblSQMANFEtIAM5QKGD4%2BpT8FDT0tCia0DamtA4c8RDiZrSopGi8OdZQAORFrlCBkBoAFqgYqJ6RvlBhVWQA7kEksK3yudY2wpjBKCTuZFCjVJrkpAx6wQhwIEhoVfjqAMZgaChIAF4kwTBVM5j4G%2BQoofgnCAgkKHTqY1QA%2BvD3HDkkaJJuOfZ2F8FD9MrEkCxvhgrAMlLRVGpRiQ3h8SMDcn8%2BKRoiJRMpVKkoWCgA

The foo(b) call is not marked as invalid in this example, as pyright is able to properly recognize that b is Literal[5]. Some other code like

num = 1
foo(num)

is properly marked as invalid.

(Note that the behavior under mypy might be different - in my experience pyright is usually the better option.)

Copy link
Member

@vincentmacri vincentmacri Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note that the behavior under mypy might be different - in my experience pyright is usually the better option.)

Good to know, I did use mypy for that example (I happened to have mypy but not pyright already installed).

Still, my suggestion is the same. Let's not use Literal for parameters for now (at least not for integers because the whole int vs Integer thing makes it too complicated). I don't want to rely on implementation-specific behaviour of one of the two major type checkers. From a quick reading of the differences, it seems pyright is probably the best choice for our CI at least since it's faster right now, but it looks like mypy also has some advantages. And mypy might get improvements in the future that make it preferable (mypyc looks very interesting, but it's in alpha right now).

Copy link
Contributor Author

@tobiasdiez tobiasdiez Oct 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A big reason I favour the stricter annotations approach is that in theory it could be used by Cython to generate more efficient code one day (Cython's ability to use Python annotation is currently a bit limited though, but if we have strict type annotations then as Cython improves here so will the speed of Sage).

It would be indeed nice if the typing info could be used to increase performance. The PEP that introduced typing in Python stated that as a non-goal and said

Using type hints for performance optimizations is left as an exercise for the reader.

I looked around and it seems typing info is still not used for performance. There were some initial discussions about using it to warm up the JIT compiler and there are some promising experiments like https://github.com/facebookincubator/cinder?tab=readme-ov-file#static-python. But all of those go into the direction of optimizing the performance without changing the runtime behavior (eg removing complete parts of the code execution path based on typing as in #40972 (comment) is not considered as far as I can see).

My impression is that python is not aiming to become strongly typed but that the purpose of these type annotations is to mainly help developers write better code.

Interesting, I think about it differently. My impression is that type annotations are for linters/type checkers to check that typing rules are respected without having to enforce type checking at runtime.

Okay, this is actually not that different to what I have in mind. I only would add "better intellisense" to the advantages of typing (eg if you type vector(...). your IDE is suggesting methods that are defined on the type that is returned by the vector function).

But using Literal for parameters I think is beginning to veer into the territory of validating the values of inputs rather than just their types.

I would use Literal only where the allowed space of inputs is very restricted, mostly in cases where you should actually use an enum but for historical reasons it's another set of fixed input values. For me, coerced is such a case as it only allows True/False/2.


So we essentially have two options:

  • coerced: bool | Literal[2]:
    • Good: complains about 3 as input, and makes it clear that the code in the method doesn't need to worry about other integers
    • Bad: complains about Int(2)
  • coerced: bool | int | Integer:
    • Good: accepts Int(2)
    • Bad: but also 3 or Int(3) (both to the callee and to the one implementing the method)

I honestly cannot come up with a scenario where one would need to input Int(2) instead of just a python 2. Just because most implementations actually handle the Int(2) case as well, doesn't mean the typing info should necessarily reflect that - in fact you can also pass floats like 2.00000000000000000000000000000000000001 into the method and it might work - but that doesn't mean it's desired. When it comes to finding bugs, the option using Literal seems to be more robust/helpful.

For me coerced is just an enum, with three values - that happen to be called True, False, 2.


Just to emphasize: I don't have a strong opinion about this particular case, and see the discussion more about a means to extract general principles that could guide how we use typing in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked around and it seems typing info is still not used for performance. There were some initial discussions about using it to warm up the JIT compiler and there are some promising experiments like https://github.com/facebookincubator/cinder?tab=readme-ov-file#static-python. But all of those go into the direction of optimizing the performance without changing the runtime behavior (eg removing complete parts of the code execution path based on typing as in #40972 (comment) is not considered as far as I can see).

Yes, it would change runtime behaviour and hence not happen in CPython. But it could happen in Cython or by something like mypyc, as these only aim to have correct type annotations not impact runtime behaviour. Cython does do some basic analysis to remove dead code, for example if False: ... is optimized out by Cython.

Just to emphasize: I don't have a strong opinion about this particular case, and see the discussion more about a means to extract general principles that could guide how we use typing in the future.

Yeah, it's an interesting case! The only part of this I feel strongly about is that we shouldn't rely on pyright-specific behaviour. Putting aside issues of CI usage limits, I think it would be reasonable to run both pyright and mypy on CI (perhaps pyright as a first pass, and if it passes then we run the slower but potentially more detailed mypy. There are some interesting things that only mypy can do, notably the currently alpha mypyc looks like it could be very useful for Sage once it matures.

For me coerced is just an enum, with three values - that happen to be called True, False, 2.

Agree.

So we essentially have two options:

* `coerced: bool | Literal[2]`:
  
  * Good: complains about `3` as input, and makes it clear that the code in the method doesn't need to worry about other integers
  * Bad: complains about `Int(2)`

* `coerced: bool | int | Integer`:
  
  * Good: accepts `Int(2)`
  * Bad: but also `3` or `Int(3)` (both to the callee and to the one implementing the method)

There is a third option, and this would be my preferred option if we use Literal here:

  • coerced: bool | Integer | Literal[2]
    • Good: complains about 3 as input, accepts Integer(2). So it accepts all correct inputs, and rejects some incorrect inputs.
    • Bad: Also accepts Integer(3). The type annotation is somewhat confusing, but one can just read the docstring to understand it.

I would also add a = 2; foo(a) not working in mypy to the "bad" list for all of these. This specific function seems unlikely to be called in a way where this is an issue though, so I can let it go as long as we understand that Literal shouldn't be used in places where one would reasonably use a variable as the argument (unless mypy behaviour changes to allow this of course, and I hope it does).

I honestly cannot come up with a scenario where one would need to input Int(2) instead of just a python 2. Just because most implementations actually handle the Int(2) case as well, doesn't mean the typing info should necessarily reflect that - in fact you can also pass floats like 2.00000000000000000000000000000000000001 into the method and it might work - but that doesn't mean it's desired. When it comes to finding bugs, the option using Literal seems to be more robust/helpful.

The reason to accept Integer is that the preparser changes 2 to Integer(2). However looking at it again now, is SageInputBuilder.__call__ the only path to _sage_input_ through the public API? If so, then my point about the preparser is mostly moot if we just convert Integer to int in SageInputBuilder.__call__ (but the annotation for __call__ would still need to allow for Integer). It's only mostly moot because some of our doctests call _sage_input_ directly with Sage doctests, so we would be testing it with Integer(2) there, but I'm not sure if we use 2 in any of those tests.

Do you know if there is anywhere in the code that we actually use a value of 2 for coereced? I couldn't find any places where we do.

Copy link
Member

@vincentmacri vincentmacri Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be a reasonable option:

  • bool | Literal[2] for _sage_input_ since it is internal.
  • bool | int | Integer for SageInputBuilder.__call__ since it is public (and thus can be called with the preparser in effect). Validate that coerced is either bool or 2 at runtime. If coerced is an Integer convert it to an int before calling _sage_input_.

There is still the matter of if/where 2 is currently used and any potential doctests that are testing _sage_input_ with Integer(2) directly.

@vincentmacri
Copy link
Member

Anyway, this looks good other than the Literal issue discussed above. Once that is fixed I'll give it one last read and unless I've missed something I'll then set it to positive review.

@vincentmacri
Copy link
Member

I like the type alias, that's quite nice and makes things more readable. Also provides one place to modify if we come to a consensus on the int versus Integer thing later on that doesn't agree with what you've done here.

I only have two more suggestions:

  1. Change __call__ in sage_input.py to coerced: CoercionMode | Integer, because of the preparser issue I mentioned. I think this is the only public method here, so it's the only one that could reasonably be called with the preparser in effect. You can mention in the docstring why the annotation allows Integer.
  2. In __call__ in sage_input.py, add a check and conversion from Integer(2) to 2. Something like this should be fine:
# In case this method was called with `Integer(2)` because the preparser was in effect.
if coerced == 2:
    coerced = 2

@vincentmacri
Copy link
Member

Needs work because of test failures and the unimplemented suggested changes in my previous comment.

Copy link
Member

@vincentmacri vincentmacri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarifying my remaining suggestion. Also offering an alternative to my suggestion.

self._locals = {}

def __call__(self, x, coerced: bool | Literal[2] = False):
def __call__(self, x, coerced: CoercionMode = False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've been thinking about this a bit. If you really want to use Literal and don't want to allow Integer here, how about adding an INPUT block to the docstring. Mention in the docstring that if the preparser is in effect this should be called with coerced=2r rather than with coerced=2. If we have any doctests that use coerced=2 then change them to coerced=2r (if you do a find and replace for this take into account that this could be called without specifying the name of the optional parameter).

I'd still prefer CoercionMode | Integer here because of the preparser thing, and a conversion to int in the function because it calls functions that only accept CoercionMode, but I can live with explaining this one in the docstring. This is a very weird function though, let's not make a habit of doing things like this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants