[SPARK-44137] Change handling of iterable objects for on field in joins
#41686
The `on` field complained when I passed it a tuple. That's because it checked for `list` exactly, and so wrapped the tuple into a list like `[on]`, leading to immediate failure. This was surprising: typically, tuple and list should be interchangeable, and typically tuple is the more readily accepted type. I have proposed a change that moves towards the principle of least surprise for this situation.
The reason it checked for `list` exactly is that `Column` is actually an `Iterable` object, because it implements `__iter__`. It only does this because it has `__getitem__` implemented, which allows it to be iterated over with `iter()`. This caused bad behavior, and so `__iter__` was implemented to raise an exception any time a `Column` is iterated over. That change was implemented in SPARK-10417: #8574
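
To make the mechanism concrete, here is a minimal sketch that needs no Spark installation. `FakeColumn` and `wrap_exact_list` are hypothetical stand-ins written for illustration; they are not the actual PySpark classes or join code.

```python
from collections.abc import Iterable


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column; not the real class."""

    def __init__(self, name):
        self.name = name

    def __getitem__(self, key):
        # Indexing returns another column expression and never raises
        # IndexError, so iterating via __getitem__ would never terminate.
        return FakeColumn(f"{self.name}[{key}]")

    def __iter__(self):
        # Guard in the spirit of SPARK-10417: refuse iteration outright.
        raise TypeError("Column is not iterable")


col = FakeColumn("id")

# Defining __iter__ makes the class pass the ABC check ...
looks_iterable = isinstance(col, Iterable)

# ... even though actually asking for an iterator fails.
try:
    iter(col)
    really_iterable = True
except TypeError:
    really_iterable = False


def wrap_exact_list(on):
    # Assumed shape of the old check: anything that is not a list gets
    # wrapped, so a tuple ends up nested inside a one-element list.
    return on if isinstance(on, list) else [on]


print(looks_iterable, really_iterable)  # True False
print(wrap_exact_list(("a", "b")))      # [('a', 'b')]
```

The last line shows the surprise described above: the tuple is treated as a single join key rather than as a sequence of keys.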
It also happens that the Python docs specifically advise against checking for iterability by using `isinstance(x, Iterable)`, and say that checking for the ability to call `iter()` is preferred. For reference:
https://stackoverflow.com/questions/1952464/in-python-how-do-i-determine-if-an-object-is-iterable
https://docs.python.org/3/library/collections.abc.html#collections.abc.Iterable
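
The difference between the two checks can be seen with a small sketch. `LegacySequence` and `can_iter` are illustrative names introduced here, not part of the patch.

```python
from collections.abc import Iterable


class LegacySequence:
    """Iterable only through the old __getitem__ protocol (no __iter__)."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        # IndexError past the end is what stops the iteration.
        return self._items[index]


def can_iter(obj):
    """Return True if iter(obj) succeeds, False if it raises TypeError."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


seq = LegacySequence(["a", "b"])

abc_says = isinstance(seq, Iterable)  # misses __getitem__-only classes
iter_says = can_iter(seq)             # detects them

print(abc_says, iter_says, list(seq))  # False True ['a', 'b']
```

The ABC check reports `False` for an object that iterates perfectly well, which is why calling `iter()` is the more reliable test.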
There will be no user-facing changes for existing working code. It will only fix code that did not work previously.
How was this patch tested?
Tests for:
- `isinstance_interable` behaves as expected for all combinations of (str, col) and (bare, list, tuple).
- `to_list_column_style` creates a list when passed any of these types, and that list contains a non-iterable (as defined).
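
As a rough illustration of the behavior these tests describe, the sketch below normalizes an `on` argument into a list. The helper names `is_listlike` and `to_list`, the `FakeColumn` class, and the choice to treat `str` as a scalar are assumptions made for this example; the helpers in the patch may be implemented differently.

```python
class FakeColumn:
    """Hypothetical stand-in for a Column that refuses iteration."""

    def __init__(self, name):
        self.name = name

    def __iter__(self):
        raise TypeError("Column is not iterable")


def is_listlike(obj):
    # Assumption: strings count as scalars here, even though iter("ab") works.
    if isinstance(obj, (str, bytes)):
        return False
    try:
        iter(obj)
    except TypeError:
        return False
    return True


def to_list(on):
    # Lists and tuples are copied into a list; bare values are wrapped.
    return list(on) if is_listlike(on) else [on]


col = FakeColumn("id")

# All combinations of (str, col) and (bare, list, tuple).
cases = ["a", ["a", "b"], ("a", "b"), col, [col], (col,)]
results = [to_list(case) for case in cases]

for case, result in zip(cases, results):
    # Every result is a list whose elements are scalars by the rule above.
    assert isinstance(result, list)
    assert not any(is_listlike(item) for item in result)
```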