"Append row" for table classes #1254

benjeffery · 2021-03-18T02:07:19Z

codecov · 2021-03-18T02:36:10Z

Codecov Report

Merging #1254 (0e4c6be) into main (ecf9bcb) will increase coverage by 0.00%.
The diff coverage is 100.00%.

```diff
@@           Coverage Diff           @@
##             main    #1254   +/-   ##
=======================================
  Coverage   93.82%   93.83%
=======================================
  Files          26       26
  Lines       21995    22024   +29
  Branches      991      992    +1
=======================================
+ Hits        20637    20666   +29
  Misses       1324     1324
  Partials       34       34
```

Flag	Coverage Δ
c-tests	`92.49% <ø> (ø)`
lwt-tests	`92.97% <ø> (ø)`
python-c-tests	`95.12% <100.00%> (+0.01%)`	⬆️
python-tests	`98.83% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
python/tskit/tables.py	`99.74% <100.00%> (+<0.01%)`	⬆️
python/tskit/trees.py	`97.88% <100.00%> (+0.01%)`	⬆️
python/tskit/util.py	`100.00% <100.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ecf9bcb...0e4c6be.

jeromekelleher · 2021-03-18T08:17:54Z

I'm not happy with this as-is, because using attr.asdict won't support duck-typed objects that have the right properties.

There's a lot to be said for the way you're doing it now: it's less code, and it's robust to adding new columns. If this works for both (e.g.) NodeTableRow and Node, then I'm happy. Are both of these attrs classes at the moment?

In the long run, we can say "any class that we can call dataclasses.asdict on, and that has at least the required fields" will work. (We don't want to make attrs part of our API contract right now, since we're planning on moving to dataclasses when it's convenient.)
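A minimal sketch of the contract suggested above, assuming row objects are `dataclasses` instances. `NodeRow`, `NODE_COLUMNS` and `row_kwargs` are hypothetical names for illustration, not tskit API. Fields beyond the required columns are dropped and missing ones are reported, but, as noted at the start of this thread, an `asdict`-based approach rejects duck-typed objects: `dataclasses.asdict` raises `TypeError` for anything that is not a dataclass instance.

```python
import dataclasses
from types import SimpleNamespace

# Hypothetical node-table column names, for illustration only.
NODE_COLUMNS = ("flags", "time", "population", "individual", "metadata")


@dataclasses.dataclass
class NodeRow:
    # Hypothetical row class; "id" is an extra field that is not a column.
    id: int
    flags: int
    time: float
    population: int
    individual: int
    metadata: bytes


def row_kwargs(obj, columns=NODE_COLUMNS):
    """Column values of obj, assuming obj is a dataclass instance."""
    values = dataclasses.asdict(obj)  # TypeError if obj is not a dataclass instance
    missing = [name for name in columns if name not in values]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Keep only the required columns; extra fields such as "id" are dropped.
    return {name: values[name] for name in columns}


row = NodeRow(id=0, flags=1, time=0.0, population=-1, individual=-1, metadata=b"")
print(row_kwargs(row))

# A duck-typed object with the right attributes is still rejected by asdict.
duck = SimpleNamespace(flags=1, time=0.0, population=-1, individual=-1, metadata=b"")
try:
    row_kwargs(duck)
except TypeError as err:
    print("rejected:", err)
```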

benjeffery · 2021-03-18T10:10:35Z

I think I'll do the dataclasses migration now as a separate PR.

benjeffery · 2021-03-18T14:20:16Z

I'm just dealing with the container classes now, seeing whether they should be dataclasses. They will need some tweaks, as they contain extra attributes such as individual.node.

benjeffery · 2021-03-18T15:57:30Z

One other question is whether to take encoded or unencoded metadata when appending a container like Node. Taking encoded metadata would be more performant, but if the user were adding rows to a table with a different, incompatible schema, they would silently be adding bad metadata. I lean towards taking the encoded metadata, with a warning in the docs.

jeromekelleher · 2021-03-18T18:15:45Z

I think we could be more flexible about this, and shouldn't worry too much about performance (this is row-by-row in Python, after all).

As a high-level requirement, something like

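```python
import tskit

# Appending rows from another TableCollection (other_tables)
tables = tskit.TableCollection(1)
for node in other_tables.nodes:
    tables.nodes.append(node)

# Appending objects from a TreeSequence (ts)
tables = tskit.TableCollection(1)
for node in ts.nodes():
    tables.nodes.append(node)
```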

should always work, under all permutations of a schema being present or absent in the source and destination tables. We could perhaps emit warnings if we're writing out without a schema?

benjeffery · 2021-03-19T01:36:19Z

Yes, I definitely agree that those should work; the current tests look like that. In the case where the destination table has no schema, we can just take the bytes from the source object. The question is: when the receiving table has a schema, do we decode, validate against the schema and then encode?

I guess the answer is yes, as otherwise the user won't find out till that row fails to decode at what could be a much later date.
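The policy agreed here can be sketched with the standard library alone. This is a hypothetical illustration, not tskit's implementation: `ToySchema` stands in for a real metadata schema (it accepts JSON objects with some required keys), and `encode_for_destination` is an invented helper. Metadata arrives either as raw bytes (as from a table row, which does not decode it) or already decoded (as from a tree sequence object). Without a destination schema the bytes are copied unchanged; with one, the metadata is decoded if needed, validated and re-encoded, so bad metadata fails on append rather than at some later read. Re-encoding decoded metadata with the source schema when the destination has none is an assumption of this sketch.

```python
import json
from typing import Optional, Union


class ToySchema:
    """Hypothetical stand-in for a metadata schema: JSON objects with required keys."""

    def __init__(self, required):
        self.required = set(required)

    def decode(self, raw: bytes) -> dict:
        return json.loads(raw.decode("utf-8")) if raw else {}

    def validate_and_encode(self, value: dict) -> bytes:
        missing = self.required - set(value)
        if missing:
            raise ValueError(f"metadata missing keys: {sorted(missing)}")
        return json.dumps(value, sort_keys=True).encode("utf-8")


def encode_for_destination(
    metadata: Union[bytes, dict],
    src_schema: Optional[ToySchema],
    dst_schema: Optional[ToySchema],
) -> bytes:
    """Hypothetical helper: metadata is raw bytes or an already-decoded value."""
    if dst_schema is None:
        if isinstance(metadata, bytes):
            return metadata  # no schema to check against: copy the bytes unchanged
        if src_schema is None:
            raise TypeError("decoded metadata needs a schema to be encoded")
        return src_schema.validate_and_encode(metadata)
    if isinstance(metadata, bytes):
        # Decode with the source schema if known, else assume the destination format.
        metadata = (src_schema or dst_schema).decode(metadata)
    # Validate against the destination schema now, not at some later read.
    return dst_schema.validate_and_encode(metadata)
```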

jeromekelleher · 2021-03-19T08:44:37Z

> I guess the answer is yes, as otherwise the user won't find out till that row fails to decode at what could be a much later date.

Yes, I think so. The guiding principle here is ease of use and developer friendliness; performance is very much secondary.

benjeffery · 2021-03-19T13:38:21Z

Currently the table row classes do not decode metadata, so we couldn't check it. I can't remember why they don't; perhaps because the table API is lower-level.

All this is making me wonder why we have different classes for trees and tables; it might be simpler to merge the two. Let's talk about it later?

benjeffery · 2021-03-24T15:22:38Z

This is now stacked on #1261 and is modified to take any class with the right attributes.
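As a rough illustration of "any class with the right attributes", here is a toy `append` built on `getattr`. `ToyNodeTable`, `Node` and `NodeTuple` are hypothetical stand-ins, not tskit's classes, and the columns are plain Python lists. Only the table's own columns are read, so a dataclass, a named tuple and a plain namespace all work, and extra attributes are ignored. The method returns the new row's ID, in the same way the simplify.py change uses the value returned by `append` as `output_id`.

```python
import collections
import dataclasses
from types import SimpleNamespace


class ToyNodeTable:
    """Hypothetical in-memory table; each column is stored as a Python list."""

    columns = ("flags", "time", "population", "individual", "metadata")

    def __init__(self):
        self.data = {name: [] for name in self.columns}

    def __len__(self):
        return len(self.data["flags"])

    def add_row(self, **kwargs):
        for name in self.columns:
            self.data[name].append(kwargs[name])
        return len(self) - 1  # ID of the new row

    def append(self, row):
        # Duck typing: read only the table's columns; extra attributes are ignored.
        return self.add_row(**{name: getattr(row, name) for name in self.columns})


@dataclasses.dataclass
class Node:
    id: int  # extra attribute, not a column
    flags: int
    time: float
    population: int
    individual: int
    metadata: bytes


NodeTuple = collections.namedtuple("NodeTuple", ToyNodeTable.columns)

table = ToyNodeTable()
table.append(Node(id=7, flags=1, time=0.0, population=-1, individual=-1, metadata=b""))
table.append(NodeTuple(0, 1.5, 0, -1, b"x"))
table.append(SimpleNamespace(flags=0, time=2.0, population=0, individual=-1, metadata=b""))
print(len(table), table.data["time"])
```

An object missing one of the columns fails immediately with `AttributeError`, before anything is written to the table.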

benjeffery · 2021-03-24T15:26:10Z

Will be easier to review once #1261 is merged.

jeromekelleher

Looks good; see my comment about the common usage pattern for modified rows.

jeromekelleher · 2021-03-24T17:00:30Z

python/tests/test_highlevel.py

```diff
+            table = getattr(tables, table_name)
+            for i in range(len(getattr(ts_fixture.tables, table_name))):
+                table.append(getattr(ts_fixture, table_name[:-1])(i))
+        print(ts_fixture.tables, tables)
```


jeromekelleher · 2021-03-24T17:08:30Z

python/tests/simplify.py

```diff
-            metadata=node.metadata,
-            individual=node.individual,
-        )
+        output_id = self.tables.nodes.append(replace(node, flags=flags))
```


I have to say I'm not a big fan of this pattern. It seems awkward to have to know that the rows are dataclasses and then call dataclasses.replace with the fields you want to change. It assumes that users are well versed in dataclasses.

What if we add a method

```python
def replace(self, **kwargs):
    return dataclasses.replace(self, **kwargs)
```

to the row classes, so that this line would look like

```python
output_id = self.tables.nodes.append(node.replace(flags=flags))
```

At least then the recommended pattern is discoverable, and we can document it locally.

Yeah, I see what you mean. I'm not sure why those methods aren't on dataclasses in the first place. I've added the method via a base class in util that the row classes inherit from.
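A minimal sketch of that design, assuming a small base class that the row classes inherit from; `ReplaceMixin` and `ToyNodeRow` are hypothetical names and may not match what is in util. The method simply delegates to `dataclasses.replace`, so the row class must itself be a dataclass. The row is frozen here so that the original is left unchanged and `replace` is the way to get a modified copy.

```python
import dataclasses


class ReplaceMixin:
    """Hypothetical base class adding a discoverable replace() to dataclass rows."""

    def replace(self, **kwargs):
        # Return a copy with the given fields changed; self is not modified.
        return dataclasses.replace(self, **kwargs)


@dataclasses.dataclass(frozen=True)
class ToyNodeRow(ReplaceMixin):
    # Hypothetical stand-in for a node row class.
    flags: int
    time: float
    population: int
    individual: int
    metadata: bytes


node = ToyNodeRow(flags=0, time=1.0, population=0, individual=-1, metadata=b"")
modified = node.replace(flags=1)
print(node.flags, modified.flags)
```

Unknown field names raise `TypeError`, just as they do with `dataclasses.replace` itself.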

jeromekelleher

LGTM!

benjeffery force-pushed the row_append branch from a526d84 to 85ed52a on March 18, 2021 02:07

benjeffery force-pushed the row_append branch from 85ed52a to 8a62d7d on March 18, 2021 12:33

benjeffery force-pushed the row_append branch 3 times, most recently from 8f71d39 to 046b18e on March 24, 2021 15:22

benjeffery marked this pull request as ready for review March 24, 2021 15:22

benjeffery requested a review from jeromekelleher March 24, 2021 15:22

benjeffery force-pushed the row_append branch 2 times, most recently from 86849ae to 23c27e6 on March 24, 2021 15:25

benjeffery added 2 commits March 24, 2021 16:37

Fix missing changes from 1261

fbb7dcf

Test ragged arrays with ragged data

dc76c8c

benjeffery force-pushed the row_append branch 2 times, most recently from d62c3ec to 85a566b on March 24, 2021 16:48

jeromekelleher reviewed Mar 24, 2021


benjeffery force-pushed the row_append branch from 85a566b to c3a8683 on March 24, 2021 17:33

Add table append method

0e4c6be

benjeffery force-pushed the row_append branch from c3a8683 to 0e4c6be on March 24, 2021 17:50

jeromekelleher approved these changes Mar 24, 2021


benjeffery added the AUTOMERGE-REQUESTED label Mar 24, 2021

mergify bot merged commit a6e1da9 into tskit-dev:main Mar 24, 2021

mergify bot removed the AUTOMERGE-REQUESTED label Mar 24, 2021