"Append row" for table classes #1254

Merged
merged 3 commits into from
Mar 24, 2021

Conversation

benjeffery
Member

@benjeffery benjeffery commented Mar 18, 2021

Closes #1111

@codecov

codecov bot commented Mar 18, 2021

Codecov Report

Merging #1254 (0e4c6be) into main (ecf9bcb) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1254   +/-   ##
=======================================
  Coverage   93.82%   93.83%           
=======================================
  Files          26       26           
  Lines       21995    22024   +29     
  Branches      991      992    +1     
=======================================
+ Hits        20637    20666   +29     
  Misses       1324     1324           
  Partials       34       34           
Flag Coverage Δ
c-tests 92.49% <ø> (ø)
lwt-tests 92.97% <ø> (ø)
python-c-tests 95.12% <100.00%> (+0.01%) ⬆️
python-tests 98.83% <100.00%> (+<0.01%) ⬆️

Impacted Files Coverage Δ
python/tskit/tables.py 99.74% <100.00%> (+<0.01%) ⬆️
python/tskit/trees.py 97.88% <100.00%> (+0.01%) ⬆️
python/tskit/util.py 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ecf9bcb...0e4c6be. Read the comment docs.

@jeromekelleher
Member

jeromekelleher commented Mar 18, 2021

I'm not happy with this as-is, since using attr.asdict won't support duck-typed objects with the right properties.

There's a lot to be said for the way you're doing it now: it's less code and it's robust to adding new columns. If this works for both (e.g.) NodeTableRow and Node, then I'm happy. Are both of these attrs classes at the moment?

In the long run, we can say that any class we can call dataclasses.asdict on, and which has at least the required fields, will work. (We don't want to make attrs part of our API contract right now, since we're planning on moving to dataclasses when it's convenient.)
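
The duck-typing concern above might be sketched as follows. This is an illustration only, not the code in this PR: `row_to_dict`, `NODE_FIELDS` and `NodeRow` are hypothetical names, and the sketch assumes the required fields are read with `getattr` rather than with `attr.asdict` or `dataclasses.asdict`, so that any object exposing the right attributes is accepted.

```python
import dataclasses
from types import SimpleNamespace

# Hypothetical field list for a node row; not taken from the tskit source.
NODE_FIELDS = ("flags", "time", "population", "individual", "metadata")


def row_to_dict(row, fields=NODE_FIELDS):
    """Collect the required fields from any object exposing them as attributes."""
    missing = [name for name in fields if not hasattr(row, name)]
    if missing:
        raise AttributeError(f"row is missing required fields: {missing}")
    # Extra attributes on the row (for example an id) are simply ignored.
    return {name: getattr(row, name) for name in fields}


@dataclasses.dataclass
class NodeRow:
    """Hypothetical stand-in for a table row class."""

    flags: int
    time: float
    population: int
    individual: int
    metadata: bytes


# A dataclass row and a duck-typed object yield the same dictionary.
as_dataclass = NodeRow(flags=1, time=0.0, population=-1, individual=-1, metadata=b"")
duck_typed = SimpleNamespace(
    id=7, flags=1, time=0.0, population=-1, individual=-1, metadata=b""
)
assert row_to_dict(as_dataclass) == row_to_dict(duck_typed)
```

Reading attributes by name keeps the accepted input independent of whether the row is an attrs class, a dataclass, or something else entirely, at the cost of maintaining an explicit field list.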

@benjeffery
Member Author

I think I'll do the dataclasses migration now as a separate PR.

@benjeffery
Member Author

I'm just dealing with the container classes and seeing whether they should be dataclasses. They will need tweaks, as they carry extra things like individual.node.

@benjeffery
Member Author

One other question is whether to take encoded or unencoded metadata when appending a container like Node. Taking encoded metadata would be faster, but if the user were adding rows to a table with a different, incompatible schema, they would silently add bad metadata. I lean towards taking the encoded metadata with a warning in the docs.

@jeromekelleher
Copy link
Member

I think we could be more flexible about this, and shouldn't worry too much about performance (this is row-by-row in Python, after all).

As a high-level requirement, something like

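# Copy rows from the node table of another table collection
# (tables.nodes is an attribute, so it is iterated without calling it)
tables = tskit.TableCollection(1)
for node in other_tables.nodes:
    tables.nodes.append(node)

# Copy nodes from a tree sequence
tables = tskit.TableCollection(1)
for node in ts.nodes():
    tables.nodes.append(node)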

should always work, under the various permutations of schema present/not present in either the source or destination tables. We could perhaps emit warnings, if we're writing out without a schema, or something?

@benjeffery
Member Author

benjeffery commented Mar 19, 2021

Yes, def agree that those should work, current tests look like that. In the case where the destination table has no schema we can just take the bytes from the source object. The question is, when the receiving table has a schema, do we decode, validate against the schema and then encode?

I guess the answer is yes, as otherwise the user won't find out till that row fails to decode at what could be a much later date.

@jeromekelleher
Member

> I guess the answer is yes, as otherwise the user won't find out till that row fails to decode at what could be a much later date.

Yes, I think so. The guiding principle here is ease of use and developer friendliness; performance is very much secondary.
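
The decode, validate, re-encode flow agreed on above could look roughly like the sketch below. It makes several assumptions that are not part of this thread: metadata is JSON, the destination "schema" is reduced to a list of required keys, and `prepare_metadata` is a hypothetical helper rather than a tskit function.

```python
import json


def prepare_metadata(raw, required_keys=None):
    """Return the bytes to store for one appended row.

    raw: encoded metadata bytes taken from the source row.
    required_keys: hypothetical stand-in for a destination schema;
        None means the destination table has no schema.
    """
    if required_keys is None:
        # No schema on the destination: store the source bytes unchanged.
        return raw
    # Decode first, so a bad row fails here rather than at a much later read.
    try:
        decoded = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("metadata is not valid JSON") from exc
    if not isinstance(decoded, dict):
        raise ValueError("metadata must decode to an object")
    missing = sorted(set(required_keys) - set(decoded))
    if missing:
        raise ValueError(f"metadata is missing required keys: {missing}")
    # Re-encode so the stored bytes follow the destination's encoding.
    return json.dumps(decoded, sort_keys=True).encode("utf-8")
```

The extra decode and encode per row costs time, which matches the stated priority here: ease of use first, performance second.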

@benjeffery
Member Author

So currently the table row classes do not decode metadata, so we couldn't check it. I can't remember why they don't, maybe as the table API is more low-level.

All this is making me wonder why we have different classes for trees and tables; it might be simpler to merge the two. Let's talk about it later?

@benjeffery benjeffery force-pushed the row_append branch 3 times, most recently from 8f71d39 to 046b18e Compare March 24, 2021 15:22
@benjeffery
Member Author

This is now stacked on #1261 and is modified to take any class with the right attributes.

@benjeffery benjeffery marked this pull request as ready for review March 24, 2021 15:22
@benjeffery benjeffery force-pushed the row_append branch 2 times, most recently from 86849ae to 23c27e6 Compare March 24, 2021 15:25
@benjeffery
Member Author

Will be easier to review once #1261 is merged.

@benjeffery benjeffery force-pushed the row_append branch 2 times, most recently from d62c3ec to 85a566b Compare March 24, 2021 16:48
Member

@jeromekelleher jeromekelleher left a comment

Looks good - see comment about the common usage pattern for modified rows

table = getattr(tables, table_name)
for i in range(len(getattr(ts_fixture.tables, table_name))):
    table.append(getattr(ts_fixture, table_name[:-1])(i))
print(ts_fixture.tables, tables)
Member

print

Member Author

Fixed

metadata=node.metadata,
individual=node.individual,
)
output_id = self.tables.nodes.append(replace(node, flags=flags))
Member

I have to say I'm not a big fan of this pattern. It seems awkward to me to have to know that the rows are dataclasses, and then you need to call this function dataclasses.replace with the stuff you want to change. It's assuming that users are well versed in dataclasses.

What if we have a method

def replace(self, **kwargs):
    return dataclasses.replace(self, **kwargs)

on the row classes so that this line would look like

output_id = self.tables.nodes.append(node.replace(flags=flags))

at least then the recommended pattern is discoverable and we can document it locally.

Member Author

Yeah, I see what you mean. I'm not sure why those methods aren't on dataclasses in the first place. I've added the method via inheritance from a class in util.
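
A minimal sketch of the pattern discussed in this thread, using only the standard library. The class names `RowBase` and `NodeRow` are hypothetical; the thread does not show the name of the class added to util, nor the full set of row fields.

```python
import dataclasses


class RowBase:
    """Hypothetical base class; the real class name in util is not shown here."""

    def replace(self, **kwargs):
        # Return a copy of this row with the given fields changed.
        return dataclasses.replace(self, **kwargs)


@dataclasses.dataclass(frozen=True)
class NodeRow(RowBase):
    """Hypothetical row with a reduced set of fields."""

    flags: int
    time: float


node = NodeRow(flags=0, time=1.5)
# The original row is left untouched; only the copy has the new flags.
sample = node.replace(flags=1)
```

With the method on the row itself, callers can write `node.replace(flags=flags)` without importing `dataclasses` or knowing that rows are dataclasses.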

Member

@jeromekelleher jeromekelleher left a comment

LGTM!

@mergify mergify bot merged commit a6e1da9 into tskit-dev:main Mar 24, 2021
Successfully merging this pull request may close these issues.

Need a way to add entire rows to tables.