HIVE-29201: Fix flaky test query_iceberg_metadata_of_unpartitioned_table.q #6075
@check-spelling-bot Report: 🔴 Please review. See the files view or the action log for details.
Unrecognized words (3): bucketedtables
Previously acknowledged words that are now absent: aarry, bytecode, cwiki, HIVEFETCHOUTPUTSERDE, timestamplocal, yyyy
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands... in a clone of the git@github.com:armitage420/hive.git repository
If the flagged items do not appear to be text
If items relate to a ...
I would rather not select columns that are going to be masked.
@deniskuzZ Not selecting masked columns is not feasible for this particular test, as only part of the column values is masked, not the column (related to the metadata itself) as a whole.
oh, ok.
why would they change?
Thank you for your time, @deniskuzZ! Total size properties might change with a file format upgrade; in our case, the format is ORC. Here's the Jira for reference: HIVE-25607. Following that Jira, another one introduced masking in the Iceberg qfiles for the same reason: HIVE-25658.
I had a look at this flaky test, too. If you look at the expected query result, the first columns are the same, but the first different column is not sorted lexicographically:
The problem is that it is sorted on the original value of
Changing the query to make this deterministic is a workaround for this particular q file. A proposal for a more general fix: refactor the masking so that it is done before the sorting (
@thomasrebele Thank you for your input! @deniskuzZ @thomasrebele Do let me know what both of you think!
I've been working on a draft of applying the masking before the sorting (in addition to applying the masking at the end of the processing) in https://github.com/thomasrebele/hive/tree/tr/HIVE-29201-v1. The design of
I think masking in this specific test isn't very effective, as it bypasses validation for several Iceberg metadata fields.
The masking is only done for HDFS paths, file_size_in_bytes, and the total file size in the table properties; it doesn't really affect the validation of the test.
@armitage420 The test adds some additional masking as well; try removing it and see for yourself. Why do we mask row count instead of size_in_bytes?
What changes were proposed in this pull request?
Added an explicit ORDER BY to the queries.
Why are the changes needed?
The test uses SORT_QUERY_RESULTS to keep the query output in a deterministic order. However, there is still room for nondeterminism: SORT_QUERY_RESULTS sorts the output of each query lexicographically on the unmasked rows, so if the values that are later masked change, the output ordering changes as well. Hence, we need to add an explicit ORDER BY to the queries.
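To illustrate the ordering problem described above, here is a minimal Python sketch. It is not Hive's test framework code: the row layout, the `total-files-size` field, the `#Masked#` placeholder, and all function names are assumptions made up for this illustration. It compares sorting raw rows and masking afterwards with masking first and sorting afterwards, the alternative raised in the conversation.

```python
import re

# Hypothetical masking rule: hide the numeric value of a size field.
SIZE = re.compile(r"(total-files-size=)\d+")


def mask(row: str) -> str:
    """Replace the size value with a fixed placeholder."""
    return SIZE.sub(r"\1#Masked#", row)


def sort_then_mask(rows):
    # Row order is decided by the raw (unmasked) values, then hidden.
    return [mask(r) for r in sorted(rows)]


def mask_then_sort(rows):
    # Row order is decided only by what remains visible after masking.
    return sorted(mask(r) for r in rows)


# Two hypothetical runs whose rows differ only in a value that gets masked.
run_a = [
    "snap\ttotal-files-size=100\tappend",
    "snap\ttotal-files-size=200\toverwrite",
]
run_b = [
    "snap\ttotal-files-size=300\tappend",
    "snap\ttotal-files-size=200\toverwrite",
]

# Same masked rows in both runs, but in a different order.
print(sort_then_mask(run_a) == sort_then_mask(run_b))
# Masking first removes the dependency on the hidden value.
print(mask_then_sort(run_a) == mask_then_sort(run_b))
```

In this sketch the two runs produce the same set of masked rows, yet sorting before masking orders them differently, which is the kind of q.out mismatch the PR describes. An explicit ORDER BY on columns that are not masked removes that dependency for a single query without touching the masking logic.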
Does this PR introduce any user-facing change?
No
How was this patch tested?
q.out file results, and test pipeline