Skip to content

Conversation

@swapnilushinde
Copy link

What changes were proposed in this pull request?

Currently, DataType equals* methods has lot of code duplication. This PR adds -

  1. Helper function that removes code duplication and makes it easier to understand.
  2. It makes adding new variations of equals* methods easy by simply choosing right parameter combinations of helper function.

Current functionality -

DataTypes are matched on three factors - (fieldName, dataType, nullability)
Current APIs

  1. equalsIgnoreNullability - (fieldNameCheck: Yes, dataType: Yes, nullability: No)
  2. equalsIgnoreCompatibleNullability - (fieldNameCheck: Yes, dataType: Yes, nullability: Compatible)
  3. equalsIgnoreCaseAndNullability - (fieldNameCheck: Yes(case insensitive), dataType: Yes, nullability: No)
  4. equalsStructurally - (fieldNameCheck: No, dataType: Yes, nullability: Yes/No)

Same API contracts are maintained but just new helper functions equalsDataTypes, isSameFieldName, isSameNullability added to give more flexibility & code centralization.
Additional private vals are added to set behavior of helper functions.

private val NoNameCheck = 0
private val CaseSensitiveNameCheck = 1
private val CaseInsensitiveNameCheck = 2
private val NoNullabilityCheck = 0
private val NullabilityCheck = 1
private val CompatibleNullabilityCheck = 2

Any combination of above variables can be generated in future to add new equals* APIs. For instance,

equalsIgnoreCaseCompatibleNullability

can easily be generated by calling -

equalsDataTypes(left, right, CaseInsensitiveNameCheck, CompatibleNullabilityCheck)

New scalatests added to test missing API unit testing as well.

How was this patch tested?

Code was tested with existing scalatests. New scalatests added for more robust testing of this functionality.
All tests were ran locally to make sure it doesn't affect elsewhere.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much does this deduplicate? Seems overkill and even more complicated and not reducing the codes much. Roughly guess 60ish line deletion and 70ish line addition.

@swapnilushinde
Copy link
Author

swapnilushinde commented Jul 17, 2018

Thanks @HyukjinKwon! I removed old commented code in new comment.
Old code was repeating same recursive datatype checks again & again in every equals* function.

  • This commit abstracts that recursive equality checks and gives flexibility to easily add any combinations of equals* method in future. (eg. equalsIgnoreCaseCompatibleNullability() can be created with one line of code)
  • I have modularized dataType equality in smaller helper functions for more readability.
  • Equals* functions are just one line with this commit.

Please let me know if I need to change anything to make code changes more readable & less complex.

@HyukjinKwon
Copy link
Member

But the current code looks less readable and adding more lines. I would rather leave this as was.

@swapnilushinde
Copy link
Author

swapnilushinde commented Jul 18, 2018

true. Currently we have just 3 variations of comparing two datatypes for equality. Adding even one more equality function would easily cause writing same repetitive code which would negate observed increase in code lines.
Comparing two datasets' with different criteria for datatype equality is very handy for unit testing purposes too.
Can you please let me know what do you find unreadable/complex in new code so I can try to simplify it if possible?
@HyukjinKwon

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jul 18, 2018

Constant variables and squashing the logic into one function look not worth enough and overkill. Less duplication is good of course but it doesn't look worth enough by the overkill. I would focus on more important stuff. -1 from me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants