Skip to content

Conversation

GianlucaFicarelli
Copy link
Collaborator

@GianlucaFicarelli GianlucaFicarelli commented May 27, 2025

This PR allows to import and expose the traces used to generate an EModel, using the derivation pattern.
Fixes https://github.com/openbraininstitute/prod-platform-architecture/issues/70

The derivations are stored in a generic derivation table with the intention that it can be reused for other entities.
For the moment this association table contains only used_id and generated_id referencing other entities because we don't need other information, but it should be possible to add other fields if needed (for example the type of derivation, or activity, generation, usage, attributes as indicated in https://www.w3.org/TR/prov-dm/#concept-derivation)

The traces are exposed in 3 ways because of the discussion for ion-channel-models in #183 (comment) , but we may decide to keep only one, or some:

  1. expose GET /emodel/<uuid>/electrical-cell-recording)
  2. allow to search by emodel in GET /electrical-cell-recording?generated_emodel__id=<uuid>
  3. always return the list of traces in EModelReadExpanded.electrical_cell-recording, but include only id and type to prevent loading too many nested entities.

We could have another approach, that is to expose a dedicated /derivation endpoint, but it would be more beneficial if we store other information in the derivation itself.

The creation of the derivations can be implemented here or in a separate PR, but this needs more clarifications (cc @jdcourcol ):

  • should the derivations be created at the same time when the emodel is registered? Are they mandatory? in this case, they could be included in the create schema of EModel, and it would make sense if they are returned also by the read schema (approach 3 above)
  • should it be possible to create the derivations after the EModel has been registered? In this case, they could be registered through a dedicated endpoint, for example PUT /emodel/<uuid>/electrical-cell-recording that would make sense with the approach 1 above (alternatively, PATCH /emodel/<uuid> but this would make more sense for the attributes of the emodel, and it would need more clarifications on what other fields of EModel can be updated)
  • should it be possible to delete/recreate the derivations? in this case, in bulk or individually? This can also be decided later.

@GianlucaFicarelli GianlucaFicarelli changed the title Import emodel traces Import and expose emodel traces May 27, 2025
@jdcourcol
Copy link
Contributor

should the derivations be created at the same time when the emodel is registered? Are they mandatory? in this case, they could be included in the create schema of EModel, and it would make sense if they are returned also by the read schema (approach 3 above)

no it is not mandatory. In certain case we may not have it. I would not return it in the read schema.

should it be possible to create the derivations after the EModel has been registered? In this case, they could be registered through a dedicated endpoint, for example PUT /emodel//electrical-cell-recording that would make sense with the approach 1 above (alternatively, PATCH /emodel/ but this would make more sense for the attributes of the emodel, and it would need more clarifications on what other fields of EModel can be updated)

yes we can create the derivation after. it is not a field of the emodel in my opinion. it is a relationship between 2 entities.

should it be possible to delete/recreate the derivations? in this case, in bulk or individually? This can also be decided later.

yes we can delete / recreate them in order to fix errors.

@@ -10,3 +19,24 @@
read_many = router.get("")(app.service.emodel.read_many)
read_one = router.get("/{id_}")(app.service.emodel.read_one)
create_one = router.post("")(app.service.emodel.create_one)


@router.get("/{id_}/electrical-cell-recording")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have done /{id_}/derived_from as it sounds more generic to me (since we will have these generic relationships in multiple place). Of course, in that case it returns all things the entity derive from (since we can derive also from other thing than electrical cell recording)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this endpoint? If the metadata returns a list of id and type then we can fetch the entities using the get endpoint for electrical cell recording.

Copy link
Collaborator Author

@GianlucaFicarelli GianlucaFicarelli May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this endpoint? If the metadata returns a list of id and type then we can fetch the entities using the get endpoint for electrical cell recording.

This is open for discussion as mentioned in the description and in the other JD's comment (to not return the derivations in the read schema). It mainly depend on the use cases:
Are these derivations usually needed by the clients when fetching the emodels? If not, the join with the derivation table would be useless in these cases.
Is it possible that the client needs only the derivations for a given id, without fetching the entity itself? In this case it could be beneficial to provide a separate endpoint.

That endpoint can also be used to create/update the derivations for a specific entity.

I would have done /{id_}/derived_from as it sounds more generic to me (since we will have these generic relationships in multiple place). Of course, in that case it returns all things the entity derive from (since we can derive also from other thing than electrical cell recording)

We can make the endpoint name generic (we could use the terms "used" and "generated" as indicated in the PROV document, so it could be /emodel/{id_}/used).
However, in this way the schema that specifies the returned resources should be a list of generic Entity, or a list of union of Entities.
Specifically, I think that we need to explicitly define the union of Entities if we want to instantiate models of different types with Pydantic, using a discriminator, but this adds some complexity to the model.
Alternatively, the generic endpoint can return a list of entities with only id and type, and the user should make an additional call to fetch the full entities based on them.

Copy link
Contributor

@eleftherioszisis eleftherioszisis May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make the endpoint name generic (we could use the terms "used" and "generated" as indicated in the PROV document, so it could be /emodel/{id_}/used).

I am not sure it's a good idea to use activity-specific properties for an entity as we may want to introduce activities in the near future. derived-from/derived_from seems the right candidate for this job as it is an entity-related property:
https://www.w3.org/TR/prov-o/#wasDerivedFrom

Do we have multiple types of derivations for emodel?

Copy link
Collaborator Author

@GianlucaFicarelli GianlucaFicarelli May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

derived-from/derived_from seems the right candidate for this job as it is an entity-related property: https://www.w3.org/TR/prov-o/#wasDerivedFrom

Agreed, it seems more appropriate.

Do we have multiple types of derivations for emodel?

Not sure for emodels, I think JD was referring to entities in general, to find an approach that can be adopted also in other cases, but it can be useful to consider more specific examples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a charm, I have to admit, in not having to load the metadata to check its derivations or its generations if it's an activity for instance.

Copy link
Collaborator Author

@GianlucaFicarelli GianlucaFicarelli May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a showcase (number 4) I added a new endpoint GET /{entity_route}/{entity_id}/derived-from that can work with any entity, and allows to filter by type.
This returns a paginated list of BasicEntity, that is a new pydantic schema with only id and type, because returning different types of entities in the same list would require special handling of polymorphism on both the server and the client side as mentioned in a previous comment.
This can be changed if needed, although it would require different queries and different joins to resolve the attributes of different types of entities.

To recap, there are 4 possible ways to retrieve the derivation, do you think that this last one can replace the previous ones for our use cases?
Do we need an additional attribute to differentiate different types of derivations for the same entity? (this could be also added later if needed)

  1. GET /emodel/<uuid>/electrical-cell-recording) that returns a list of ElectricalCellRecording (paginated)
  2. GET /electrical-cell-recording?generated_emodel__id=<uuid> that returns a list of ElectricalCellRecording (paginated)
  3. GET /emodel/<uuid> that returns a list of BasicEntity in EModelReadExpanded.electrical_cell-recording (not paginated)
  4. GET /{entity_route}/{entity_id}/derived-from that returns a list of BasicEntity (paginated)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use only number 4.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there aren't objections I'm going to remove 3 as well, and leave only 4.
The endpoint 3 returns the same result as 4, it would just allow to avoid an additional call.

Comment on lines 23 to 30
class ElectricalCellRecordingFilter(
CustomFilter, BrainRegionFilterMixin, SubjectFilterMixin, EntityFilterMixin, NameFilterMixin
):
generated_emodel: Annotated[
_NestedEModelFilter | None,
FilterDepends(with_prefix("generated_emodel", _NestedEModelFilter)),
] = None
order_by: list[str] = ["-creation_date"] # noqa: RUF012
Copy link
Contributor

@eleftherioszisis eleftherioszisis May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that experimental data is usually used to produce models, isn't it strange allow the filtering of emodels via electrical cell recordings? Then we would probably need this generated_X for all different types of models that use traces. Maybe this can be internal just for the needs of the filter in the service but not exposed to the /electrical-cell-recording endpoint filter?

Copy link
Collaborator Author

@GianlucaFicarelli GianlucaFicarelli May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically similar to the filter approach GET /ion-channel-model?emodel__id__in=['emodel1_id', 'emodel2_id1 ] suggested in #183 (comment)

However, if there aren't objections I'm going to remove:

  • this filter (approach 2)
  • and GET /emodel/<uuid>/electrical-cell-recording (approach 1 superseded by 4, using /derived-from)

while still keeping:

  • EModelReadExpanded.electrical_cell_recordings returns BasicEntity with only id and type (approach 3)
  • and 4 GET /{entity_route}/{entity_id}/derived-from

Copy link
Contributor

@g-bar g-bar May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the webapp we need to show a thumbnail for each emodel which we're missing
Capture d’écran 2025-05-30 à 12 03 11

Not sure if that is an electrical cell recording, or another data source?
If it is indeed an electrical cell recording we probably need the assets of electrical cell recording in the list response of /emodel

Probably @bilalesi knows what data is needed.

Copy link
Collaborator Author

@GianlucaFicarelli GianlucaFicarelli May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By calling:
3. GET /emodel/{entity_id} you would get EModelReadExpanded.electrical_cell_recordings that is a list of BasicEntityRead with id and type
4. GET /{entity_route}/{entity_id}/derived-from (that is generic, but in this case it's GET /emodel/{entity_id}/derived-from) you would get only the list of BasicEntityRead with id and type

With the ids, it's possible to get the full details by calling GET /electrical-cell-recording?id__in={id_1},{id_2}....

Alternatively, EModelReadExpanded.electrical_cell_recordings could return the full details already, but this doesn't scale well because the same EModelReadExpanded is used for both the read_one and read_many endpoints, changed in #177.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3, 4, we run into N+1 problem.

GET /electrical-cell-recording?id__in={id_1} might work but it gets a bit awkward, say we have 30 emodels in the GET /emodel response and we want the electrical cell recordings for them. If there are more than one electrical_cell_recording per emodel Calling GET /electrical-cell-recording?id__in={id_1} ... {id_30} needs more than one call to get all the data, and some processing on the frontend side to group them by emodel.

However, apparently we only need one trace per emodel in the list view, not sure exactly how we choose which one is used for the thumbnail or if we can show any of them, if it's indeed arbitrary then we could nest 1 record only to keep the response small.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thumbnail is not an experimental trace linked to the e-model. It is a plot of a run of the e-model with a supra-threshold current.
When we have an e-model configuration, we plot for the supra-threshold current with the lowest amplitude from the protocols from the configuration. For the e-models where we didn't have the configuration, we did the thumbnail plotting with a current equal to 130% of threshold if I recall correctly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for the clarification. That's a separate issue irrelevant to this to the traces conversation , so I created: #205

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the nexus data, there are:

  1. images linked to the emodel: optimisation, traces, scores, thumbnail, parameters_distribution, evo_parameter_density...
  2. images linked to the ElectricalCellRecording (SingleCellExperimentalTrace): ResponseTrace, StimulationTrace...

I understand that the images in 1 are out of scope for this PR, that should only link the emodel with the electrical-cell-recordings.
Is there a page in the UI that shows the traces used by a specific emodel (or more emodels)?
Is there a page that shows the images linked to ElectricalCellRecording/SingleCellExperimentalTrace?

Moreover, in nexus they were images and not distributions, so they aren't present in the current nexus dump. It was discussed before if they should be re-generated, or included in the next dump, but I'm not sure of the decision (cc @jdcourcol )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a page in the UI that shows the traces used by a specific emodel (or more emodels)?

The EModel detail page, but that use case is covered by 3.

Is there a page that shows the images linked to ElectricalCellRecording/SingleCellExperimentalTrace?

The trace detail page

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

images linked to the emodel: optimisation, traces, scores, thumbnail, parameters_distribution, evo_parameter_density

These should be "ValidationResult" as these are created by the validation of a emodel or a memodel.
Thumbnail is the one that should probably be moved somehow to the thumbnail service.

1. expose GET /emodel/<uuid>/electrical-cell-recording)
2. allow to search by emodel in GET /electrical-cell-recording?generated_emodel__id=<uuid>
* origin/main:
  Wire in assets to the BrainAtlas and BrainAtlasRead (#207)
  me model calibration result proposal (#188)
  Remove legacy distribution metadata from assets during import (#206)
  sort the returned hierarchy alphabetically (#201)
Copy link
Contributor

@AurelienJaquier AurelienJaquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@jdcourcol jdcourcol merged commit a1d1371 into main Jun 3, 2025
1 check passed
@jdcourcol jdcourcol deleted the import_emodel_traces branch June 3, 2025 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants