[DataFrame] Inconsistent null handling in DataFrameColumn arithmetic #5650

@rhysparry

Description

Suppose that I have the following DataFrame:

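using System.Linq;
using Microsoft.Data.Analysis;
var df = new DataFrame(
    // Foo: 10 rows, all null (only a length is given)
    new PrimitiveDataFrameColumn<int>("Foo", 10),
    // Bar: the values 1 through 10
    new PrimitiveDataFrameColumn<int>("Bar", Enumerable.Range(1, 10))
);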

When performing arithmetic where either operand is null, I would expect the null to propagate to the resulting column.

Indeed, that is what happens when the null value is the left-hand operand. For example:

df.Columns["Foo"] + df.Columns["Bar"]

Here the result is a column of nulls, but if we reverse the operands:

df.Columns["Bar"] + df.Columns["Foo"]

The nulls in the Foo column are effectively treated as 0, so the result simply equals Bar.

It looks like this occurs because the Arithmetic classes operate on the underlying value buffers, which don't keep track of null values (nulls seem to be tracked separately, in a NullBitMapBuffers property on the container).
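To make the suspected cause concrete, here is a minimal, self-contained sketch that does not use Microsoft.Data.Analysis at all. SketchColumn, AddValuesOnly and AddPropagatingNulls are hypothetical names invented for illustration: the first method mimics the asymmetry reported above by adding raw values and keeping only the left operand's validity, and the second shows the behaviour I would expect. Whether the library's internals actually look like this is an assumption.

```csharp
using System;

// Hypothetical stand-in for a nullable int column: a raw value buffer plus a
// separate validity mask. This is NOT the Microsoft.Data.Analysis implementation.
public sealed class SketchColumn
{
    public int[] Values { get; }
    public bool[] IsValid { get; }

    public SketchColumn(int?[] items)
    {
        Values = new int[items.Length];
        IsValid = new bool[items.Length];
        for (int i = 0; i < items.Length; i++)
        {
            IsValid[i] = items[i].HasValue;
            Values[i] = items[i].GetValueOrDefault(); // null slots hold 0 in the raw buffer
        }
    }

    // Reads a slot back as a nullable value.
    public int? this[int i] => IsValid[i] ? Values[i] : (int?)null;

    // Adds the raw buffers and consults only the left operand's validity,
    // so a null on the right is silently read as 0 (assumes equal lengths).
    public static SketchColumn AddValuesOnly(SketchColumn left, SketchColumn right)
    {
        var result = new int?[left.Values.Length];
        for (int i = 0; i < result.Length; i++)
            result[i] = left.IsValid[i] ? left.Values[i] + right.Values[i] : (int?)null;
        return new SketchColumn(result);
    }

    // A slot is valid only when both operands are valid, so a null on
    // either side propagates to the result (assumes equal lengths).
    public static SketchColumn AddPropagatingNulls(SketchColumn left, SketchColumn right)
    {
        var result = new int?[left.Values.Length];
        for (int i = 0; i < result.Length; i++)
            result[i] = left.IsValid[i] && right.IsValid[i]
                ? left.Values[i] + right.Values[i]
                : (int?)null;
        return new SketchColumn(result);
    }
}
```

With an all-null foo and a bar holding 1, 2, 3, AddValuesOnly(foo, bar) yields nulls while AddValuesOnly(bar, foo) yields bar's values unchanged, matching the behaviour described above. AddPropagatingNulls yields nulls regardless of operand order.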
