Closed
Labels: bug, help wanted, regression
Description
Describe the bug
encode(..., 'hex') can be used to get the hexadecimal representation of a string or a binary. Since DataFusion v43 (specifically, since commit 1b3608d, i.e. #12308), only strings and binaries that happen to be valid UTF-8 are supported; any other binary value makes the query fail.
To Reproduce
vlorentz@maxxi:~/datafusion/datafusion-cli$ git checkout 1b3608da7ca59d8d987804834d004e8b3e349d18
HEAD is now at 1b3608da7 fix: coalesce schema issues (#12308)
vlorentz@maxxi:~/datafusion/datafusion-cli$ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.27s
Running `target/debug/datafusion-cli`
DataFusion CLI v42.0.0
> create table test ( foo bytea );
0 row(s) fetched.
Elapsed 0.007 seconds.
> insert into test (foo) values (X'8f50d3f60eae370ddbf85c86219c55108a350165');
+-------+
| count |
+-------+
| 1 |
+-------+
1 row(s) fetched.
Elapsed 0.006 seconds.
> EXPLAIN SELECT encode(foo, 'hex') FROM test;
+---------------+-----------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+-----------------------------------------------------------------------------------------+
| logical_plan | Projection: encode(CAST(test.foo AS Utf8), Utf8("hex")) |
| | TableScan: test projection=[foo] |
| physical_plan | ProjectionExec: expr=[encode(CAST(foo@0 AS Utf8), hex) as encode(test.foo,Utf8("hex"))] |
| | MemoryExec: partitions=1, partition_sizes=[1] |
| | |
+---------------+-----------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.007 seconds.
> SELECT encode(foo, 'hex') FROM test;
Arrow error: Invalid argument error: Encountered non UTF-8 data: invalid utf-8 sequence of 1 bytes from index 0
>
\q
Expected behavior
vlorentz@maxxi:~/datafusion/datafusion-cli$ git checkout 1b3608da7ca59d8d987804834d004e8b3e349d18^
Previous HEAD position was 1b3608da7 fix: coalesce schema issues (#12308)
HEAD is now at 9a3f8d115 Minor: Encapsulate type check in GroupValuesColumn, avoid panic (#12620)
vlorentz@maxxi:~/datafusion/datafusion-cli$ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 53.01s
Running `target/debug/datafusion-cli`
DataFusion CLI v42.0.0
> create table test ( foo bytea );
0 row(s) fetched.
Elapsed 0.005 seconds.
> insert into test (foo) values (X'8f50d3f60eae370ddbf85c86219c55108a350165');
+-------+
| count |
+-------+
| 1 |
+-------+
1 row(s) fetched.
Elapsed 0.005 seconds.
> EXPLAIN SELECT encode(foo, 'hex') FROM test;
+---------------+---------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------+
| logical_plan | Projection: encode(test.foo, Utf8("hex")) |
| | TableScan: test projection=[foo] |
| physical_plan | ProjectionExec: expr=[encode(foo@0, hex) as encode(test.foo,Utf8("hex"))] |
| | MemoryExec: partitions=1, partition_sizes=[1] |
| | |
+---------------+---------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.005 seconds.
> SELECT encode(foo, 'hex') FROM test;
+------------------------------------------+
| encode(test.foo,Utf8("hex")) |
+------------------------------------------+
| 8f50d3f60eae370ddbf85c86219c55108a350165 |
+------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.
>
\q
Additional context
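Note the CAST(test.foo AS Utf8) in the first query plan, which does not appear in the second one: the binary column is cast to a string before being encoded, and that cast fails on bytes that are not valid UTF-8.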
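A minimal Rust sketch (not DataFusion's actual code) of why the extra CAST matters: hex-encoding the raw bytes works for any binary value, while first interpreting them as a UTF-8 string, which is what CAST(foo AS Utf8) requires, fails on the value used in the reproduction. `hex_encode` and `FOO` are hypothetical names for illustration; `std::str::from_utf8` is the standard library check, and its error text is the same one that ends the Arrow error above.

```rust
/// Hypothetical helper: hex-encode raw bytes (lowercase, two digits per byte)
/// without any UTF-8 validation.
fn hex_encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// The value inserted in the reproduction: X'8f50d3f60eae370ddbf85c86219c55108a350165'.
const FOO: [u8; 20] = [
    0x8f, 0x50, 0xd3, 0xf6, 0x0e, 0xae, 0x37, 0x0d, 0xdb, 0xf8,
    0x5c, 0x86, 0x21, 0x9c, 0x55, 0x10, 0x8a, 0x35, 0x01, 0x65,
];

fn main() {
    // Encoding the bytes directly succeeds (the pre-regression behavior).
    println!("{}", hex_encode(&FOO));

    // Validating as UTF-8 first (analogous to CAST(foo AS Utf8)) fails:
    // 0x8f is a continuation byte and cannot start a UTF-8 sequence.
    match std::str::from_utf8(&FOO) {
        Ok(s) => println!("{}", hex_encode(s.as_bytes())),
        Err(e) => println!("error: {}", e),
    }
}
```

Running it prints the expected hex string followed by `error: invalid utf-8 sequence of 1 bytes from index 0`, mirroring the two plans above.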
cc @mesejo