-
Notifications
You must be signed in to change notification settings - Fork 0
Import and expose emodel traces #200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
84a5dbe
to
c4c8519
Compare
no it is not mandatory. In certain case we may not have it. I would not return it in the read schema.
yes we can create the derivation after. it is not a field of the emodel in my opinion. it is a relationship between 2 entities.
yes we can delete / recreate them in order to fix errors. |
app/routers/emodel.py
Outdated
@@ -10,3 +19,24 @@ | |||
read_many = router.get("")(app.service.emodel.read_many) | |||
read_one = router.get("/{id_}")(app.service.emodel.read_one) | |||
create_one = router.post("")(app.service.emodel.create_one) | |||
|
|||
|
|||
@router.get("/{id_}/electrical-cell-recording") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have done /{id_}/derived_from
as it sounds more generic to me (since we will have these generic relationships in multiple place). Of course, in that case it returns all things the entity derive from (since we can derive also from other thing than electrical cell recording)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this endpoint? If the metadata returns a list of id and type then we can fetch the entities using the get endpoint for electrical cell recording.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this endpoint? If the metadata returns a list of id and type then we can fetch the entities using the get endpoint for electrical cell recording.
This is open for discussion as mentioned in the description and in the other JD's comment (to not return the derivations in the read schema). It mainly depend on the use cases:
Are these derivations usually needed by the clients when fetching the emodels? If not, the join with the derivation table would be useless in these cases.
Is it possible that the client needs only the derivations for a given id, without fetching the entity itself? In this case it could be beneficial to provide a separate endpoint.
That endpoint can also be used to create/update the derivations for a specific entity.
I would have done
/{id_}/derived_from
as it sounds more generic to me (since we will have these generic relationships in multiple place). Of course, in that case it returns all things the entity derive from (since we can derive also from other thing than electrical cell recording)
We can make the endpoint name generic (we could use the terms "used" and "generated" as indicated in the PROV document, so it could be /emodel/{id_}/used
).
However, in this way the schema that specifies the returned resources should be a list of generic Entity, or a list of union of Entities.
Specifically, I think that we need to explicitly define the union of Entities if we want to instantiate models of different types with Pydantic, using a discriminator, but this adds some complexity to the model.
Alternatively, the generic endpoint can return a list of entities with only id and type, and the user should make an additional call to fetch the full entities based on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make the endpoint name generic (we could use the terms "used" and "generated" as indicated in the PROV document, so it could be /emodel/{id_}/used).
I am not sure it's a good idea to use activity-specific properties for an entity as we may want to introduce activities in the near future. derived-from/derived_from
seems the right candidate for this job as it is an entity-related property:
https://www.w3.org/TR/prov-o/#wasDerivedFrom
Do we have multiple types of derivations for emodel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
derived-from/derived_from
seems the right candidate for this job as it is an entity-related property: https://www.w3.org/TR/prov-o/#wasDerivedFrom
Agreed, it seems more appropriate.
Do we have multiple types of derivations for emodel?
Not sure for emodels, I think JD was referring to entities in general, to find an approach that can be adopted also in other cases, but it can be useful to consider more specific examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a charm, I have to admit, in not having to load the metadata to check its derivations or its generations if it's an activity for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a showcase (number 4) I added a new endpoint GET /{entity_route}/{entity_id}/derived-from
that can work with any entity, and allows to filter by type
.
This returns a paginated list of BasicEntity, that is a new pydantic schema with only id
and type
, because returning different types of entities in the same list would require special handling of polymorphism on both the server and the client side as mentioned in a previous comment.
This can be changed if needed, although it would require different queries and different joins to resolve the attributes of different types of entities.
To recap, there are 4 possible ways to retrieve the derivation, do you think that this last one can replace the previous ones for our use cases?
Do we need an additional attribute to differentiate different types of derivations for the same entity? (this could be also added later if needed)
GET /emodel/<uuid>/electrical-cell-recording)
that returns a list of ElectricalCellRecording (paginated)GET /electrical-cell-recording?generated_emodel__id=<uuid>
that returns a list of ElectricalCellRecording (paginated)GET /emodel/<uuid>
that returns a list of BasicEntity inEModelReadExpanded.electrical_cell-recording
(not paginated)GET /{entity_route}/{entity_id}/derived-from
that returns a list of BasicEntity (paginated)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use only number 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there aren't objections I'm going to remove 3 as well, and leave only 4.
The endpoint 3 returns the same result as 4, it would just allow to avoid an additional call.
class ElectricalCellRecordingFilter( | ||
CustomFilter, BrainRegionFilterMixin, SubjectFilterMixin, EntityFilterMixin, NameFilterMixin | ||
): | ||
generated_emodel: Annotated[ | ||
_NestedEModelFilter | None, | ||
FilterDepends(with_prefix("generated_emodel", _NestedEModelFilter)), | ||
] = None | ||
order_by: list[str] = ["-creation_date"] # noqa: RUF012 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that experimental data is usually used to produce models, isn't it strange allow the filtering of emodels via electrical cell recordings? Then we would probably need this generated_X for all different types of models that use traces. Maybe this can be internal just for the needs of the filter in the service but not exposed to the /electrical-cell-recording endpoint filter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is technically similar to the filter approach GET /ion-channel-model?emodel__id__in=['emodel1_id', 'emodel2_id1 ]
suggested in #183 (comment)
However, if there aren't objections I'm going to remove:
- this filter (approach 2)
- and
GET /emodel/<uuid>/electrical-cell-recording
(approach 1 superseded by 4, using /derived-from)
while still keeping:
EModelReadExpanded.electrical_cell_recordings
returns BasicEntity with only id and type (approach 3)- and 4
GET /{entity_route}/{entity_id}/derived-from
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the webapp we need to show a thumbnail for each emodel which we're missing
Not sure if that is an electrical cell recording, or another data source?
If it is indeed an electrical cell recording we probably need the assets of electrical cell recording in the list response of /emodel
Probably @bilalesi knows what data is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By calling:
3. GET /emodel/{entity_id}
you would get EModelReadExpanded.electrical_cell_recordings
that is a list of BasicEntityRead
with id
and type
4. GET /{entity_route}/{entity_id}/derived-from
(that is generic, but in this case it's GET /emodel/{entity_id}/derived-from
) you would get only the list of BasicEntityRead
with id
and type
With the ids, it's possible to get the full details by calling GET /electrical-cell-recording?id__in={id_1},{id_2}...
.
Alternatively, EModelReadExpanded.electrical_cell_recordings
could return the full details already, but this doesn't scale well because the same EModelReadExpanded is used for both the read_one and read_many endpoints, changed in #177.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3, 4, we run into N+1 problem.
GET /electrical-cell-recording?id__in={id_1}
might work but it gets a bit awkward, say we have 30 emodels in the GET /emodel
response and we want the electrical cell recordings for them. If there are more than one electrical_cell_recording per emodel Calling GET /electrical-cell-recording?id__in={id_1} ... {id_30}
needs more than one call to get all the data, and some processing on the frontend side to group them by emodel.
However, apparently we only need one trace per emodel in the list view, not sure exactly how we choose which one is used for the thumbnail or if we can show any of them, if it's indeed arbitrary then we could nest 1 record only to keep the response small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thumbnail is not an experimental trace linked to the e-model. It is a plot of a run of the e-model with a supra-threshold current.
When we have an e-model configuration, we plot for the supra-threshold current with the lowest amplitude from the protocols from the configuration. For the e-models where we didn't have the configuration, we did the thumbnail plotting with a current equal to 130% of threshold if I recall correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for the clarification. That's a separate issue irrelevant to this to the traces conversation , so I created: #205
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the nexus data, there are:
- images linked to the emodel: optimisation, traces, scores, thumbnail, parameters_distribution, evo_parameter_density...
- images linked to the ElectricalCellRecording (SingleCellExperimentalTrace): ResponseTrace, StimulationTrace...
I understand that the images in 1 are out of scope for this PR, that should only link the emodel with the electrical-cell-recordings.
Is there a page in the UI that shows the traces used by a specific emodel (or more emodels)?
Is there a page that shows the images linked to ElectricalCellRecording/SingleCellExperimentalTrace?
Moreover, in nexus they were images and not distributions, so they aren't present in the current nexus dump. It was discussed before if they should be re-generated, or included in the next dump, but I'm not sure of the decision (cc @jdcourcol )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a page in the UI that shows the traces used by a specific emodel (or more emodels)?
The EModel detail page, but that use case is covered by 3.
Is there a page that shows the images linked to ElectricalCellRecording/SingleCellExperimentalTrace?
The trace detail page
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
images linked to the emodel: optimisation, traces, scores, thumbnail, parameters_distribution, evo_parameter_density
These should be "ValidationResult" as these are created by the validation of a emodel or a memodel.
Thumbnail is the one that should probably be moved somehow to the thumbnail service.
1. expose GET /emodel/<uuid>/electrical-cell-recording) 2. allow to search by emodel in GET /electrical-cell-recording?generated_emodel__id=<uuid>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me
This PR allows to import and expose the traces used to generate an EModel, using the derivation pattern.
Fixes https://github.com/openbraininstitute/prod-platform-architecture/issues/70
The derivations are stored in a generic
derivation
table with the intention that it can be reused for other entities.For the moment this association table contains only
used_id
andgenerated_id
referencing other entities because we don't need other information, but it should be possible to add other fields if needed (for example the type of derivation, or activity, generation, usage, attributes as indicated in https://www.w3.org/TR/prov-dm/#concept-derivation)The traces are exposed in 3 ways because of the discussion for ion-channel-models in #183 (comment) , but we may decide to keep only one, or some:
GET /emodel/<uuid>/electrical-cell-recording)
GET /electrical-cell-recording?generated_emodel__id=<uuid>
EModelReadExpanded.electrical_cell-recording
, but include onlyid
andtype
to prevent loading too many nested entities.We could have another approach, that is to expose a dedicated
/derivation
endpoint, but it would be more beneficial if we store other information in the derivation itself.The creation of the derivations can be implemented here or in a separate PR, but this needs more clarifications (cc @jdcourcol ):
PUT /emodel/<uuid>/electrical-cell-recording
that would make sense with the approach 1 above (alternatively,PATCH /emodel/<uuid>
but this would make more sense for the attributes of the emodel, and it would need more clarifications on what other fields of EModel can be updated)