22 enceladus schema utils #23
…us-schema-utils # Conflicts: # src/test/scala/za/co/absa/spark/commons/implicits/StructFieldImplicitsTest.scala
…mnimplicits # Conflicts: # README.md
src/main/scala/za/co/absa/spark/commons/implicits/StructFieldImplicits.scala: 1 resolved thread (outdated)
src/main/scala/za/co/absa/spark/commons/implicits/StructTypeImplicits.scala: 6 resolved threads (5 outdated)
…s-schema-utils # Conflicts: # README.md
It's a little confusing that the feature branch is named feature/21 while the PR topic is feature/22.
Aside from the comment I made, LGTM (read, checked out, ran some integration tests).
val maxSpark2XVersionExcluded: SemanticVersion = semver"3.0.0"

val minSpark3XVersionIncluded: SemanticVersion = semver"3.0.0"
val maxSpark3XVersionExcluded: SemanticVersion = semver"4.0.0"
This seems tricky. When this lived in Enceladus, we could set the highest version we had tried and adjust the max version as new versions were released and tested.
Now that it has moved to commons, this logic no longer applies, because the check can be used elsewhere with different compatibility requirements. This leads me to believe that, in this general case, the method fromSpark3XCompatibilitySettings should perhaps enforce only the minimum Spark version. Open for discussion.
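To illustrate the suggestion, here is a minimal sketch of a compatibility check in which only the lower bound is mandatory and the upper bound is optional. `Version`, `CompatibilityRange` and `withTestedMax` are hypothetical, dependency-free stand-ins; the PR itself uses `SemanticVersion` with the `semver"..."` interpolator, and `fromSpark3XCompatibilitySettings` is not reproduced here.

```scala
import scala.math.Ordering.Implicits._

// Hypothetical stand-in for SemanticVersion: major.minor.patch only, no pre-release tags.
final case class Version(major: Int, minor: Int, patch: Int)

object Version {
  // Lexicographic ordering on (major, minor, patch).
  implicit val ordering: Ordering[Version] =
    Ordering.by((v: Version) => (v.major, v.minor, v.patch))
}

// Hypothetical range check: the minimum is always enforced,
// the maximum only when the caller explicitly supplies one.
final case class CompatibilityRange(minIncluded: Version, maxExcluded: Option[Version] = None) {
  def accepts(v: Version): Boolean =
    v >= minIncluded && maxExcluded.forall(max => v < max)
}

object CompatibilityRange {
  // Library-wide Spark 3.x default that fixes only the lower bound.
  val spark3X: CompatibilityRange = CompatibilityRange(minIncluded = Version(3, 0, 0))

  // A consumer such as Enceladus can still add the upper bound it has actually tested.
  def withTestedMax(maxExcluded: Version): CompatibilityRange =
    spark3X.copy(maxExcluded = Some(maxExcluded))
}
```

The design choice is that the upper bound becomes the consumer's decision rather than the library's, which matches the concern that commons cannot know which future Spark versions each project has tested.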
src/main/scala/za/co/absa/spark/commons/implicits/StructTypeImplicits.scala: 3 resolved threads (outdated)
True, that was a mistake when naming the branch.
…mnimplicits # Conflicts: # README.md
…us-schema-utils # Conflicts: # README.md # src/main/scala/za/co/absa/spark/commons/schema/SchemaUtils.scala # src/test/scala/za/co/absa/spark/commons/implicits/StructFieldImplicitsTest.scala
LGTM now (just read the code this time)
Except for one comment on the README, looks good.
1. Determine the name of a field overridden by metadata

```scala
structField.getFieldNameOverriddenByMetadata()
```

Of them, metadata methods are:

1. Gets the metadata Option[String] value given a key
This seems off.
Some test classes use the suffix Test and others Suite. It would look better to be consistent.
 *
 * @return Metadata "sourcecolumn" if it exists or field.name
 */
def getFieldNameOverriddenByMetadata(): String = {
Questioning its SparkCommons status.
(There may be more of these that I think are too Enceladus-specific to be placed in SparkCommons. Happy to discuss their status.)
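For context on that question, a dependency-free sketch of the documented contract: return the "sourcecolumn" metadata value if present, otherwise the field name. `FieldModel` and `FieldNames` are hypothetical; metadata is simplified to a `Map[String, String]`, whereas with Spark's own `StructField` one would presumably check `field.metadata.contains(key)` and read `field.metadata.getString(key)`.

```scala
// Hypothetical, simplified schema field: Spark keeps metadata in a Metadata object,
// modelled here as a plain Map[String, String].
final case class FieldModel(name: String, metadata: Map[String, String] = Map.empty)

object FieldNames {
  // Metadata key named in the PR's scaladoc.
  val SourceColumnKey = "sourcecolumn"

  // The "sourcecolumn" metadata value if present, otherwise the field's own name.
  def fieldNameOverriddenByMetadata(field: FieldModel): String =
    field.metadata.getOrElse(SourceColumnKey, field.name)
}
```

The generic part is a one-line lookup; arguably the part that ties the method to Enceladus is the "sourcecolumn" key convention itself.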
src/main/scala/za/co/absa/spark/commons/implicits/StructTypeImplicits.scala: 1 resolved thread (outdated)
 * @return the keys of the returned map are the columns' names after renames, the values are the source columns;
 *         the names are full paths denoted with dot notation
 */
def getRenamesInSchema(includeIfPredecessorChanged: Boolean = true): Map[String, String] = {
Questioning its SparkCommons status.
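A simplified sketch of the behaviour the scaladoc describes, to make the question of its generality concrete. `Node` and `Renames` are hypothetical: each field has an optional source-column name and nested children (structs only; arrays are not modelled). The handling of `includeIfPredecessorChanged` is an assumption based on the parameter name: when true, descendants of a renamed field are listed as well, because their full paths change even if their own names do not.

```scala
// Hypothetical schema node: a field name, an optional source-column override, nested fields.
final case class Node(name: String, sourceColumn: Option[String] = None, children: List[Node] = Nil)

object Renames {
  private def appendPath(prefix: String, name: String): String =
    if (prefix.isEmpty) name else s"$prefix.$name"

  // Map of "full path after renames" -> "full source path", both in dot notation.
  def renamesInSchema(fields: List[Node], includeIfPredecessorChanged: Boolean = true): Map[String, String] = {
    def loop(nodes: List[Node], path: String, sourcePath: String,
             predecessorChanged: Boolean, acc: Map[String, String]): Map[String, String] =
      nodes.foldLeft(acc) { (soFar, node) =>
        val fullName = appendPath(path, node.name)
        val sourceName = node.sourceColumn.getOrElse(node.name)
        val fullSource = appendPath(sourcePath, sourceName)
        val renamedHere = sourceName != node.name
        val include = renamedHere || (predecessorChanged && includeIfPredecessorChanged)
        val updated = if (include) soFar + (fullName -> fullSource) else soFar
        // Children inherit the "an ancestor was renamed" flag.
        loop(node.children, fullName, fullSource, predecessorChanged || renamedHere, updated)
      }

    loop(fields, "", "", predecessorChanged = false, acc = Map.empty)
  }
}
```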
 * @param path The path to the attribute
 * @return The path of the first array field or "" if none were found
 */
def getFirstArrayPath(path: String): String = {
Questioning its SparkCommons status.
 * @param path The path to the attribute
 * @return Seq of dot-separated paths for all array fields in the provided path
 */
def getAllArraysInPath(path: String): Seq[String] = {
Questioning its SparkCommons status.
Moving
 * @param fieldPathName A field to check
 * @return true if the specified field is an array
 */
def isArray(fieldPathName: String): Boolean = {
Couldn't this effectively be replaced with getFieldType?
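One way to read this suggestion: given a single path lookup, `isArray`, `getAllArraysInPath` and `getFirstArrayPath` each reduce to a line or two on top of it. A sketch under stated assumptions: `DType`, `ArrayT`, `StructT` and `PathHelpers` are hypothetical stand-ins for Spark's types and the PR's methods, and arrays of structs are assumed to be stepped through transparently when resolving a path.

```scala
// Hypothetical stand-ins for Spark data types (not org.apache.spark.sql.types).
sealed trait DType
case object IntT extends DType
case object StringT extends DType
final case class ArrayT(element: DType) extends DType
final case class StructT(fields: List[(String, DType)]) extends DType

object PathHelpers {
  // The struct reachable from dt, stepping through any depth of array nesting.
  private def innermostStruct(dt: DType): Option[StructT] = dt match {
    case s: StructT => Some(s)
    case ArrayT(el) => innermostStruct(el)
    case _          => None
  }

  // Type of the field at a dot-separated path, or None if the path does not resolve.
  def fieldType(path: String, schema: StructT): Option[DType] = {
    def loop(names: List[String], struct: StructT): Option[DType] = names match {
      case Nil => None
      case name :: rest =>
        struct.fields.find(_._1 == name).map(_._2).flatMap { dt =>
          if (rest.isEmpty) Some(dt) else innermostStruct(dt).flatMap(loop(rest, _))
        }
    }
    loop(path.split('.').toList, schema)
  }

  def isArray(path: String, schema: StructT): Boolean =
    fieldType(path, schema).exists(_.isInstanceOf[ArrayT])

  // Every prefix of the path ("a", "a.b", ...) that resolves to an array.
  def allArraysInPath(path: String, schema: StructT): Seq[String] = {
    val names = path.split('.').toList
    (1 to names.length).map(n => names.take(n).mkString(".")).filter(isArray(_, schema))
  }

  // First such prefix, or "" when the path contains no array.
  def firstArrayPath(path: String, schema: StructT): String =
    allArraysInPath(path, schema).headOption.getOrElse("")
}
```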
 * @param path the fully qualified field name
 * @return unique top level field name
 */
def unpath(path: String): String = {
Questioning its SparkCommons status.
This is a really weird function. It works in our cases, but it is rather utilitarian, not very conceptual, and has a cryptic name.
 * @return A non-array data type at the bottom of array nesting
 */
@tailrec
final def getDeepestArrayType(arrayType: ArrayType): DataType = {
Questioning its SparkCommons status.
Add implicits for ArrayType.
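A sketch of how the helper might look both as a plain tail-recursive function and behind an implicit wrapper, in the style of the PR's `*Implicits` objects. `DType`, `ArrayT`, `ArrayTypeSketch` and `ArrayTypeEnhancements` are hypothetical names; they are not Spark's `ArrayType` or the PR's actual API.

```scala
import scala.annotation.tailrec

// Hypothetical stand-ins for Spark data types.
sealed trait DType
case object IntT extends DType
final case class ArrayT(element: DType) extends DType

object ArrayTypeSketch {
  // Unwraps nested arrays until a non-array element type is reached.
  @tailrec
  def deepestArrayType(arrayType: ArrayT): DType = arrayType.element match {
    case inner: ArrayT => deepestArrayType(inner)
    case other         => other
  }

  // Hypothetical implicit wrapper so the helper reads as a method on the array type.
  implicit class ArrayTypeEnhancements(val arrayType: ArrayT) {
    def getDeepestArrayType: DType = deepestArrayType(arrayType)
  }
}
```

Usage: after `import ArrayTypeSketch._`, `ArrayT(ArrayT(IntT)).getDeepestArrayType` yields `IntT`.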
src/main/scala/za/co/absa/spark/commons/schema/SchemaUtils.scala: 2 resolved threads (outdated)
The base branch was changed.
…s-schema-utils # Conflicts: # README.md # src/main/scala/za/co/absa/spark/commons/implicits/StructFieldImplicits.scala # src/test/scala/za/co/absa/spark/commons/implicits/ColumnImplicitsTest.scala # src/test/scala/za/co/absa/spark/commons/implicits/StructFieldImplicitsTest.scala
Otherwise LGTM (just read the code)
README.md (outdated)

1. Get a field from a text path

```scala
arrayType.isEquivalentArrayType(otherArrayType)
```
Here, the description does not seem to match the method.
README.md (outdated)

1. Get a field from a text path

```scala
dataType.isEquivalentDataType(otherDt)
```
Same here.
1. Get a field from a text path

```scala
structType.getField(path)
```
This is probably where that description came from; here it makes sense.
 * @return true if provided arrays are the same ignoring nullability
 */
@scala.annotation.tailrec
final def isEquivalentArrayType(other: ArrayType): Boolean = {
This seems very similar to DataTypeEnhancements(dt: DataType).isEquivalentDataType(). Couldn't one of these methods delegate to the other instead of duplicating the same logic?
Let's fix only the obvious errors now. We can improve and add items in the next release, a minor one.
As Adrian rightly pointed out, endless improvements prevent a release, and therefore usage.
Fine by me, makes sense.
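For the follow-up minor release, the deduplication suggested for isEquivalentArrayType could look roughly like this: one recursive equivalence check that ignores nullability, with the array variant delegating to it. The types and the `Equivalence` object are hypothetical stand-ins for Spark's types and the PR's implicits; comparing struct fields positionally and by exact name is an assumption about the intended semantics.

```scala
// Hypothetical stand-ins for Spark data types, including nullability flags.
sealed trait DType
case object IntT extends DType
case object StringT extends DType
final case class ArrayT(element: DType, containsNull: Boolean) extends DType
final case class FieldT(name: String, dataType: DType, nullable: Boolean)
final case class StructT(fields: List[FieldT]) extends DType

object Equivalence {
  // Structural equality that ignores containsNull and nullable at every level.
  def isEquivalentDataType(a: DType, b: DType): Boolean = (a, b) match {
    case (ArrayT(ea, _), ArrayT(eb, _)) => isEquivalentDataType(ea, eb)
    case (StructT(fa), StructT(fb)) =>
      fa.length == fb.length && fa.zip(fb).forall { case (x, y) =>
        x.name == y.name && isEquivalentDataType(x.dataType, y.dataType)
      }
    case _ => a == b
  }

  // The array-specific variant delegates instead of repeating the logic.
  def isEquivalentArrayType(a: ArrayT, b: ArrayT): Boolean =
    isEquivalentDataType(a, b)
}
```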
…s-schema-utils # Conflicts: # src/test/scala/za/co/absa/spark/commons/implicits/DataFrameImplicitsTest.scala # src/test/scala/za/co/absa/spark/commons/schema/SchemaUtilsSpec.scala
…s' into feature/21-enceladus-schema-utils
Good work again 👍
Closes #22