Expensive eager `__repr__` calls when serializing untagged unions

Hi!

I was doing some profiling recently and noticed that our application was spending a lot of time running `__repr__()`s inside `model_dump_json()`:

![Image](https://github.com/user-attachments/assets/cca699a8-0da9-4423-b31a-ad841c47af64)

(snippet of a flamegraph at the bottom of the call stack, sorry, can't show the full graph :( )

It turns out that the model we're serializing has an untagged union in it. Here's a reproduction where I've got an example model that does have the discriminator, and one that doesn't (yes, we really should have a discriminator here in our app code, but I think the performance issue is not really about whether we have a discriminator, it's purely about the order of operations in the non-tagged-union serialization code).

```python
from typing import Union, Annotated, Literal

from pydantic import BaseModel, Field

class A(BaseModel):
    type: Literal["a"] = "a"

    def __repr__(self):
        print("A.__repr__ called")
        super().__repr__()


class B(BaseModel):
    type: Literal["b"] = "b"

    def __repr__(self):
        print("B.__repr__ called")
        super().__repr__()


AB_NO_DISCRIMINATOR = Annotated[Union[A, B], Field(...)]
AB_WITH_DISCRIMINATOR = Annotated[Union[A, B], Field(..., discriminator="type")]


class NotDiscriminated(BaseModel):
    field: list[AB_NO_DISCRIMINATOR]


class Discriminated(BaseModel):
    field: list[AB_WITH_DISCRIMINATOR]


a = A()
b = B()

# When it's not a discriminated union,
# the reprs of the models that aren't first in the union get evaluated during serialization!

not_discriminated = NotDiscriminated(field=[a, b])
discriminated = Discriminated(field=[a, b])

print("##################")
print(f"{not_discriminated.model_dump_json()=}")

print("##################")
print(f"{discriminated.model_dump_json()=}")
```

If you run this, you'll see:
```
##################
B.__repr__ called
not_discriminated.model_dump_json()='{"field":[{"type":"a"},{"type":"b"}]}'
##################
discriminated.model_dump_json()='{"field":[{"type":"a"},{"type":"b"}]}'
```

This is surprising because I would only expect to need `B`'s repr during serialization if something went wrong and an error message was generated, but there are no errors or warnings generated by this example. Note that `A` appears first in the `Union` and its repr is not called.

I started tracing through the code and it looks like what's happening is that because we have an untagged union, we end up in this loop trying to find the serializer that we should use: https://github.com/pydantic/pydantic-core/blob/298e4eadc7e3745ae94309e6dc6e3855e7e50b62/src/serializers/type_serializers/union.rs#L87-L92

For the first item in the `field`, which is an `A`, that loop first tries to serialize it as an `A` since that's the first serializer in the union. This eventually gets to https://github.com/pydantic/pydantic-core/blob/8c59fa2956f224a46ecebe3ddf22400d92c497c5/src/serializers/type_serializers/model.rs#L199-L201 , which is allowed via https://github.com/pydantic/pydantic-core/blob/8c59fa2956f224a46ecebe3ddf22400d92c497c5/src/serializers/type_serializers/model.rs#L130 because the first item is in fact an `A`. 

However, for the second item in the list, which is a `B`, that first loop iteration still tries to serialize it using `A`'s serializer first (since `A` is first in the union). It gets to the same point, but `allow_value` fails, putting us in the `else` case here https://github.com/pydantic/pydantic-core/blob/8c59fa2956f224a46ecebe3ddf22400d92c497c5/src/serializers/type_serializers/model.rs#L202-L205 . It seems like `extra.check` is enabled at this point, because https://github.com/pydantic/pydantic-core/blob/8c59fa2956f224a46ecebe3ddf22400d92c497c5/src/serializers/extra.rs#L417-L426 runs and generates an error message about unexpected types, which includes the repr of the object - this is where the repr call happens!

However, that `Err` bounces back up to the serialization choice loop at https://github.com/pydantic/pydantic-core/blob/298e4eadc7e3745ae94309e6dc6e3855e7e50b62/src/serializers/type_serializers/union.rs#L87-L92 , which stashes it away and advances to the next potential serializer, which is for `B` and therefore succeeds on the `B` instance. If all serializers had failed we would need that error message, but since one succeeded, it ended up being irrelevant. The real union in our app has many options, each of which is a somewhat-nested model, so that's why we ended up spending so much time evaluating reprs.

If there's a discriminator, this all gets bypassed because there's no loop to discover the right serializer. Obviously it would be preferable to have a tagged union here, but it seems like the work of generating the reprs for errors that won't be used is still unnecessary in the case where you don't have one for whatever reason. No reason for untagged unions to be slower than they can be!

	for comb_serializer in choices {
	match selector(comb_serializer, &new_extra) {
	Ok(v) => return Ok(Some(v)),
	Err(err) => errors.push(err),
	}
	}

	} else if extra.check.enabled() {
	let type_name = value
	.get_type()
	.qualname()
	.unwrap_or_else(\|_\| PyString::new(value.py(), "<unknown python object>"));

	let value_str = truncate_safe_repr(value, None);
	Err(PydanticSerializationUnexpectedValue::new_err(Some(format!(
	"Expected `{field_type}` but got `{type_name}` with value `{value_str}` - serialized value may not be as expected"
	))))

	for comb_serializer in choices {
	match selector(comb_serializer, &new_extra) {
	Ok(v) => return Ok(Some(v)),
	Err(err) => errors.push(err),
	}
	}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Expensive eager `repr` calls when serializing untagged unions #1649

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

	} else if self.allow_value(value, &model_extra)? {
	let inner_value = self.get_inner_value(value, &model_extra)?;
	self.serializer.to_python(&inner_value, include, exclude, &model_extra)

	} else {
	extra.warnings.on_fallback_py(self.get_name(), value, &model_extra)?;
	infer_to_python(value, include, exclude, &model_extra)
	}

Expensive eager __repr__ calls when serializing untagged unions #1649

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Expensive eager `repr` calls when serializing untagged unions #1649