-
Notifications
You must be signed in to change notification settings - Fork 77
"Append row" for table classes #1254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1254 +/- ##
=======================================
Coverage 93.82% 93.83%
=======================================
Files 26 26
Lines 21995 22024 +29
Branches 991 992 +1
=======================================
+ Hits 20637 20666 +29
Misses 1324 1324
Partials 34 34
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There's a lot to be said for the way you're doing it now - it's less code and it's robust to adding new columns. If this works for both (.e.g) NodeTableRow and Node, then I'm happy. Are both of these attrs classes at the moment? In the long run, we can say "any class that we can call |
I think I'll do the dataclasses migration now as a separate PR. |
I'm just dealing with the container classes. Seeing if they should be dataclasses. Will need tweaks as they have extra things like |
One other question is whether to take encoded or unencoded metadata when appending a container like |
I think we could be more flexible about this, and shouldn't worry too much about performance (this is row-by-row in Python, after all). As a high-level requirement, something like tables = tskit.TableCollection(1)
for node in other_tables.nodes():
tables.nodes.append(node) tables = tskit.TableCollection(1)
for node in ts.nodes():
tables.nodes.append(node) should always work, under the various permutations of schema present/not present in either the source or destination tables. We could perhaps emit warnings, if we're writing out without a schema, or something? |
Yes, def agree that those should work, current tests look like that. In the case where the destination table has no schema we can just take the bytes from the source object. The question is, when the receiving table has a schema, do we decode, validate against the schema and then encode? I guess the answer is yes, as otherwise the user won't find out till that row fails to decode at what could be a much later date. |
Yes, I think so. The guiding principle here is ease of use and developer friendliness; performance is very much secondary. |
So currently the table row classes do not decode metadata, so we couldn't check it. I can't remember why they don't, maybe as the table API is more low-level. All this is making me wonder why we have different classes for trees and tables, might be simpler to merge the two. Lets talk about it later? |
8f71d39
to
046b18e
Compare
This is now stacked on #1261 and is modified to take any class with the right attributes. |
86849ae
to
23c27e6
Compare
Will be easier to review once #1261 is merged. |
d62c3ec
to
85a566b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - see comment about the common usage pattern for modified rows
python/tests/test_highlevel.py
Outdated
table = getattr(tables, table_name) | ||
for i in range(len(getattr(ts_fixture.tables, table_name))): | ||
table.append(getattr(ts_fixture, table_name[:-1])(i)) | ||
print(ts_fixture.tables, tables) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
python/tests/simplify.py
Outdated
metadata=node.metadata, | ||
individual=node.individual, | ||
) | ||
output_id = self.tables.nodes.append(replace(node, flags=flags)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to say I'm not a big fan of this pattern. It seems awkward to me to have to know that the rows are dataclasses, and then you need to call this function dataclasses.replace
with the stuff you want to change. It's assuming that users are well versed in dataclasses.
What if we have a method
def replace(self, **kwargs):
return dataclasses.replace(self, **kwargs)
on the row classes so that this line would look like
output_id = self.tables.nodes.append(node.replace(flags=flags))
at least then the recommended pattern is discoverable and we can document it locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I see what you mean. I'm not sure why those methods aren't on dataclasses in the first place. I've added a method by inheritance of a class in util
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Closes #1111