Conversation

@armitage420 (Contributor) commented Sep 15, 2025

What changes were proposed in this pull request?

Added an explicit ORDER BY to the queries.

Why are the changes needed?

The test itself uses SORT_QUERY_RESULTS to keep the query output ordering deterministic. But there is still room for non-determinism: SORT_QUERY_RESULTS sorts each query's output lexicographically on the unmasked rows, so if the to-be-masked values change, the output ordering changes as well. Hence we need to add an explicit ORDER BY to the queries.
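A minimal sketch of the failure mode, using made-up file names and sizes (the real rows come from Iceberg metadata queries): the rows are sorted before masking, so two runs that differ only in the masked-away values can emit the masked rows in different orders.

```python
# Hypothetical sketch of sort-then-mask non-determinism.
# Each row is "path<TAB>size"; the path gets masked, the size does not.
def mask(row):
    path, size = row.split("\t")
    return "#Masked#\t" + size

# Two runs of the same test: identical sizes, different (masked-away) paths.
run1 = ["a_file1.orc\t378", "a_file2.orc\t365"]
run2 = ["b_file2.orc\t365", "b_file9.orc\t378"]

# SORT_QUERY_RESULTS behavior: sort on the unmasked rows, then mask.
out1 = [mask(r) for r in sorted(run1)]  # the row with size 378 sorts first
out2 = [mask(r) for r in sorted(run2)]  # the row with size 365 sorts first
print(out1 == out2)  # False: the masked outputs disagree on row order
```

An explicit ORDER BY on an unmasked column pins the order independently of the masked values, which is what this PR does.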

Does this PR introduce any user-facing change?

No

How was this patch tested?

Verified the q.out file results and ran the test pipeline.

github-actions bot commented Sep 15, 2025

@check-spelling-bot Report

🔴 Please review

See the files view or the action log for details.

Unrecognized words (3)

bucketedtables
languagemanual
teradatabinaryserde

Previously acknowledged words that are now absent (6)

aarry
bytecode
cwiki
HIVEFETCHOUTPUTSERDE
timestamplocal
yyyy
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands

... in a clone of the [email protected]:armitage420/hive.git repository
on the flakyTest branch:

update_files() {
perl -e '
my @expect_files=qw('".github/actions/spelling/expect.txt"');
@ARGV=@expect_files;
my @stale=qw('"$patch_remove"');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect.txt";
use File::Path qw(make_path);
use File::Basename qw(dirname);
make_path (dirname($new_expect_file));
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"$patch_add"');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;
system("git", "add", $new_expect_file);
'
}

comment_json=$(mktemp)
curl -L -s -S \
-H "Content-Type: application/json" \
"https://api.github.com/repos/apache/hive/issues/comments/3291850494" > "$comment_json"
comment_body=$(mktemp)
jq -r ".body // empty" "$comment_json" > $comment_body
rm $comment_json

patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")

patch_add=$(perl -e '$/=undef; $_=<>; if (m{Unrecognized words[^<]*</summary>\n*```\n*([^<]*)```\n*</details>$}m) { print "$1" } elsif (m{Unrecognized words[^<]*\n\n((?:\w.*\n)+)\n}m) { print "$1" };' < "$comment_body")

update_files
rm $comment_body
git add -u
If the flagged items do not appear to be text

If items relate to a ...

  • well-formed pattern.

    If you can write a pattern that would match it,
    try adding it to the patterns.txt file.

    Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

    Note that patterns can't match multiline strings.

  • binary file.

    Please add a file path to the excludes.txt file matching the containing file.

    File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

    ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

@armitage420 changed the title from "[WIP] Flaky test" to "HIVE-29201: Fix flaky test query_iceberg_metadata_of_unpartitioned_table.q" on Sep 15, 2025
github-actions bot commented Sep 16, 2025

@check-spelling-bot Report

🔴 Please review

Same three unrecognized words and the same remediation script as the Sep 15 report above; only the comment ID in the curl URL differs (3294726465).

@deniskuzZ (Member)

I would rather not select columns that are going to be masked

@armitage420 (Contributor, Author) commented Sep 16, 2025

@deniskuzZ Not selecting the masked columns is not feasible for this particular test: only parts of the column values (related to the metadata itself) are masked, not whole columns.

@deniskuzZ (Member) commented Sep 17, 2025

@deniskuzZ Not selecting the masked columns is not feasible for this particular test: only parts of the column values (related to the metadata itself) are masked, not whole columns.

oh, ok.

@armitage420

if the present masked values change

why would they change?

@armitage420 (Contributor, Author)

@armitage420

if the present masked values change

why would they change?

Thank you for your time @deniskuzZ !

Table total-size properties might change with a file format upgrade; in our case the format is ORC. Here's the JIRA for reference: HIVE-25607

Following the above-mentioned JIRA, another JIRA introduced masking for the same reason in the Iceberg q-files: HIVE-25658

@thomasrebele (Contributor) commented Sep 18, 2025

I had a look at this flaky test, too. If you look at the expected query result, the first columns are the same, but the first differing column is not sorted lexicographically:

0	hdfs://### HDFS PATH ###	ORC	0	#Masked#	378	...
0	hdfs://### HDFS PATH ###	ORC	0	#Masked#	365	...
0	hdfs://### HDFS PATH ###	ORC	0	#Masked#	374	...

The problem is that the output is sorted on the original values of ### HDFS PATH ### and #Masked#, and the values are replaced only after sorting.

Changing the query to make this deterministic is a workaround for this particular q file. A proposal for a more general fix: refactor the masking so that it is done before the sorting (the out stream is an org.apache.hadoop.hive.common.io.SortPrintStream).
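As a sketch of the proposed general fix, using the same made-up rows as above (hypothetical data, not the actual q-file output): masking each row before the lexicographic sort removes the dependence on the original, unstable values.

```python
# Hypothetical sketch of the mask-then-sort fix.
def mask(row):
    path, size = row.split("\t")
    return "#Masked#\t" + size

run1 = ["a_file1.orc\t378", "a_file2.orc\t365"]
run2 = ["b_file2.orc\t365", "b_file9.orc\t378"]

# Mask first, then sort: the sort key no longer contains unstable values.
out1 = sorted(mask(r) for r in run1)
out2 = sorted(mask(r) for r in run2)
print(out1 == out2)  # True: order is deterministic regardless of the paths
```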

@armitage420 (Contributor, Author) commented Sep 18, 2025

@thomasrebele Thank you for your input!
You are correct: the lexicographical sorting is done on unmasked values, so a better (and more accurate) fix would be to apply masking before sorting the results.
Currently, sorting is performed for every single query, whereas masking is only applied at the end, once all query results for the entire qfile have been collected. To implement the actual fix, we would need to change the test architecture so that masking is done per query, followed by sorting.
I'm not sure if this approach would be agreed upon, but if suggested, I can implement it!

@deniskuzZ @thomasrebele Do let me know what both of you think!
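A minimal sketch of what per-query mask-then-sort could look like (the class and method names here are hypothetical, not Hive's actual FetchConverter/SortPrintStream API): buffer each query's lines, apply the masking patterns as the lines arrive, then emit the sorted result when the query ends.

```python
import re

class MaskThenSortStream:
    """Hypothetical per-query stream: mask lines as they arrive,
    then flush them in sorted order when the query ends."""

    def __init__(self, out, patterns):
        self.out = out                                 # underlying line sink
        self.patterns = [(re.compile(p), r) for p, r in patterns]
        self.buffer = []

    def write_line(self, line):
        for pat, repl in self.patterns:                # mask BEFORE buffering
            line = pat.sub(repl, line)
        self.buffer.append(line)

    def end_query(self):
        for line in sorted(self.buffer):               # sort the masked lines
            self.out.append(line)
        self.buffer.clear()

# Usage: HDFS paths are masked before sorting, so output order only
# depends on the visible (unmasked) columns.
sink = []
s = MaskThenSortStream(sink, [(r"hdfs://\S+", "hdfs://### HDFS PATH ###")])
s.write_line("hdfs://nn:8020/warehouse/f2.orc\t365")
s.write_line("hdfs://nn:8020/warehouse/f1.orc\t378")
s.end_query()
```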

@thomasrebele (Contributor)

I've been working on a draft of applying the masking before the sorting (in addition to applying the masking at the end of the processing) in https://github.com/thomasrebele/hive/tree/tr/HIVE-29201-v1. The design of FetchConverter makes it difficult to implement this cleanly. Alternatively, we could make FetchConverter an interface (and the old class would become FetchConverterImpl) to simplify the logic of LambdaFetchConverter. What do you think, @armitage420, @deniskuzZ?

@deniskuzZ (Member) commented Sep 19, 2025

I think masking in this specific test isn't very effective, as it bypasses validation for several Iceberg metadata fields.
Doesn't query_iceberg_metadata_of_partitioned_table.q suffer from the same issue?

@armitage420 (Contributor, Author)

I think masking in this specific test isn't very effective, as it bypasses validation for several Iceberg metadata fields. Doesn't query_iceberg_metadata_of_partitioned_table.q suffer from the same issue?

Masking is only applied to HDFS paths, file_size_in_bytes, and the total file size in the table properties; it doesn't really affect the validation of the test.

@deniskuzZ (Member) commented Sep 19, 2025

Masking is only applied to HDFS paths, file_size_in_bytes, and the total file size in the table properties; it doesn't really affect the validation of the test.

@armitage420 the test adds some additional masking as well; try removing it and see for yourself. Why do we mask the row count instead of size_in_bytes?
https://iceberg.apache.org/docs/1.9.0/spark-queries/#all-data-files

0	hdfs://### HDFS PATH ###	ORC	0	5	        378	{1:7,2:30}	

0	hdfs://### HDFS PATH ###	ORC	0	#Masked#	378	{1:7,2:30}
