Conversation

@AdrianOlosutean AdrianOlosutean commented Jan 13, 2022

Closes #22

@AdrianOlosutean AdrianOlosutean changed the title from Feature/21 enceladus schema utils to Feature/22 enceladus schema utils Jan 13, 2022
@AdrianOlosutean AdrianOlosutean changed the base branch from master to feature/19-add-columnimplicits January 13, 2022 14:24
…us-schema-utils

# Conflicts:
#	src/test/scala/za/co/absa/spark/commons/implicits/StructFieldImplicitsTest.scala
@AdrianOlosutean AdrianOlosutean marked this pull request as ready for review January 20, 2022 11:19

@dk1844 dk1844 left a comment

It's a little confusing that the feature branch is marked as feature/21 while the PR topic is feature/22.

Aside from the comment I have made, LGTM. (read, checked out, ran some integration tests)

```scala
val maxSpark2XVersionExcluded: SemanticVersion = semver"3.0.0"

val minSpark3XVersionIncluded: SemanticVersion = semver"3.0.0"
val maxSpark3XVersionExcluded: SemanticVersion = semver"4.0.0"
```
Contributor

This seems tricky. When this was in Enceladus, we could set the highest version that we had tried and adjust the max version as new versions got released and tested.

Now that it has been moved to commons, this logic no longer applies, because the check can be used elsewhere with different compatibility requirements. This leads me to believe that in this general case the method fromSpark3XCompatibilitySettings could perhaps only enforce the minimal Spark version. Can be discussed.
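
To make the suggestion concrete, here is a minimal, self-contained sketch of a minimum-only compatibility check. `SimpleVersion`, `Spark3Compatibility` and `isCompatible` are hypothetical names for illustration only, not the PR's actual API:

```scala
// Hypothetical sketch: enforce only a lower bound on the Spark version,
// leaving any upper bound to the caller's own compatibility requirements.
final case class SimpleVersion(major: Int, minor: Int, patch: Int) extends Ordered[SimpleVersion] {
  def compare(that: SimpleVersion): Int =
    Ordering[(Int, Int, Int)].compare((major, minor, patch), (that.major, that.minor, that.patch))
}

object Spark3Compatibility {
  val minSpark3XVersionIncluded: SimpleVersion = SimpleVersion(3, 0, 0)

  // No maximum tested version is assumed here.
  def isCompatible(current: SimpleVersion): Boolean =
    current >= minSpark3XVersionIncluded
}

Spark3Compatibility.isCompatible(SimpleVersion(3, 2, 1)) // true
```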

@AdrianOlosutean
Contributor Author

> It's a little confusing that the feature branch is marked as feature/21 while the PR topic is feature/22.

True, that was a mistake when naming the branch.

@AdrianOlosutean AdrianOlosutean changed the title from Feature/22 enceladus schema utils to 22 enceladus schema utils Jan 25, 2022
…us-schema-utils

# Conflicts:
#	README.md
#	src/main/scala/za/co/absa/spark/commons/schema/SchemaUtils.scala
#	src/test/scala/za/co/absa/spark/commons/implicits/StructFieldImplicitsTest.scala
dk1844 previously approved these changes Jan 25, 2022

@dk1844 dk1844 left a comment

LGTM now (just read the code this time)

Zejnilovic previously approved these changes Jan 28, 2022

@Zejnilovic Zejnilovic left a comment

Except for one comment on the README, looks good.

Comment on lines 82 to 91

1. Determine the name of a field overridden by metadata

```scala
structField.getFieldNameOverriddenByMetadata()
```

Of them, metadata methods are:

1. Gets the metadata Option[String] value given a key
Contributor

this seems off
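
For context on that list item, a minimal sketch of what a key-based `Option[String]` metadata getter can amount to, written against plain Spark API; `metadataStringOpt` is a hypothetical name, not the implicit introduced by this PR:

```scala
import scala.util.Try
import org.apache.spark.sql.types.StructField

// Reads a metadata value as Option[String]; None when the key is absent
// or the stored value is not a string.
def metadataStringOpt(field: StructField, key: String): Option[String] =
  Try(field.metadata.getString(key)).toOption
```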


@benedeki benedeki left a comment

Sometimes the tests have the suffix Test, sometimes Suite. It would look better to be consistent.

```scala
*
* @return Metadata "sourcecolumn" if it exists or field.name
*/
def getFieldNameOverriddenByMetadata(): String = {
```
Contributor

Questioning its SparkCommons status.
(There might be more of these which, I think, are too Enceladus-specific to be placed in SparkCommons. Happy to discuss their status.)
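
Based on the scaladoc quoted above, a sketch of the documented behaviour (prefer the `sourcecolumn` metadata entry, otherwise fall back to the field's own name). The body is an assumption drawn from the doc comment, not necessarily the PR's exact code:

```scala
import org.apache.spark.sql.types.StructField

// Documented behaviour: metadata "sourcecolumn" if present, else field.name.
def fieldNameOverriddenByMetadata(field: StructField): String =
  if (field.metadata.contains("sourcecolumn")) field.metadata.getString("sourcecolumn")
  else field.name
```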

```scala
* @return the keys of the returned map are the columns' names after renames, the values are the source columns;
*         the names are full paths denoted with dot notation
*/
def getRenamesInSchema(includeIfPredecessorChanged: Boolean = true): Map[String, String] = {
```
Contributor

Questioning its SparkCommons status.

```scala
* @param path The path to the attribute
* @return The path of the first array field or "" if none were found
*/
def getFirstArrayPath(path: String): String = {
```
Contributor

Questioning its SparkCommons status.

```scala
* @param path The path to the attribute
* @return Seq of dot-separated paths for all array fields in the provided path
*/
def getAllArraysInPath(path: String): Seq[String] = {
```
Contributor

Questioning its SparkCommons status.

Contributor Author

Moving
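
For reference, a simplified, self-contained sketch of the behaviour described by the scaladoc above: walk the dot-separated path and record every prefix whose field is an array. It deliberately ignores arrays of arrays and is an illustration, not the PR's implementation:

```scala
import org.apache.spark.sql.types.{ArrayType, StructType}

// Collect the dot-separated prefixes of `path` that point at array fields.
def allArraysInPath(schema: StructType, path: String): Seq[String] = {
  def loop(struct: StructType, remaining: List[String], prefix: List[String], acc: List[String]): List[String] =
    remaining match {
      case Nil => acc.reverse
      case name :: rest =>
        struct.fields.find(_.name == name) match {
          case None => acc.reverse
          case Some(field) =>
            val fullName = (prefix :+ name).mkString(".")
            field.dataType match {
              case ArrayType(elem: StructType, _) => loop(elem, rest, prefix :+ name, fullName :: acc)
              case nested: StructType             => loop(nested, rest, prefix :+ name, acc)
              case _: ArrayType                   => (fullName :: acc).reverse
              case _                              => acc.reverse
            }
        }
    }
  loop(schema, path.split('.').toList, Nil, Nil)
}
```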

```scala
* @param fieldPathName A field to check
* @return true if the specified field is an array
*/
def isArray(fieldPathName: String): Boolean = {
```
Contributor

Couldn't this be effectively replaced with getFieldType?
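
To illustrate that suggestion: assuming some path-based lookup such as `getFieldType(path): Option[DataType]` were available (the signature here is a guess), `isArray` would reduce to a type check on its result:

```scala
import org.apache.spark.sql.types.{ArrayType, DataType}

// Hypothetical: getFieldType resolves a dot-separated path to its DataType, if any.
def isArray(fieldPathName: String, getFieldType: String => Option[DataType]): Boolean =
  getFieldType(fieldPathName).exists(_.isInstanceOf[ArrayType])
```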

```scala
* @param path the fully qualified field name
* @return unique top level field name
*/
def unpath(path: String): String = {
```
Contributor

Questioning its SparkCommons status.
This is a really weird function. I mean, it works in our cases, but it's rather utilitarian, not very conceptual, and has a cryptic name.

```scala
* @return A non-array data type at the bottom of array nesting
*/
@tailrec
final def getDeepestArrayType(arrayType: ArrayType): DataType = {
```
Contributor

Questioning its SparkCommons status.

Contributor Author

Add implicits for ArrayType
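
For reference, a self-contained sketch of the behaviour described by the scaladoc (unwrap nested arrays until a non-array element type is reached); it mirrors the documentation rather than the exact implementation:

```scala
import scala.annotation.tailrec
import org.apache.spark.sql.types.{ArrayType, DataType}

// Keep unwrapping ArrayType(ArrayType(...)) until the element type is not an array.
@tailrec
def deepestArrayType(arrayType: ArrayType): DataType = arrayType.elementType match {
  case nested: ArrayType => deepestArrayType(nested)
  case other             => other
}
```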

@AdrianOlosutean AdrianOlosutean changed the base branch from feature/19-add-columnimplicits to master February 8, 2022 06:59
@AdrianOlosutean AdrianOlosutean dismissed stale reviews from Zejnilovic and dk1844 February 8, 2022 06:59

The base branch was changed.

…s-schema-utils

# Conflicts:
#	README.md
#	src/main/scala/za/co/absa/spark/commons/implicits/StructFieldImplicits.scala
#	src/test/scala/za/co/absa/spark/commons/implicits/ColumnImplicitsTest.scala
#	src/test/scala/za/co/absa/spark/commons/implicits/StructFieldImplicitsTest.scala
dk1844 previously approved these changes Feb 10, 2022

@dk1844 dk1844 left a comment

Otherwise LGTM (just read the code)

README.md Outdated
Comment on lines 144 to 148
1. Get a field from a text path

```scala
arrayType.isEquivalentArrayType(otherArrayType)
```
Contributor

Here, the description does not seem to match the method.

README.md Outdated
Comment on lines 161 to 165
1. Get a field from a text path

```scala
dataType.isEquivalentDataType(otherDt)
```
Contributor

Same here.

Comment on lines +183 to +187
1. Get a field from a text path

```scala
structType.getField(path)
```
Contributor

This is probably the source where it made sense.
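
For readers of that README item, a simplified sketch of what "get a field from a text path" typically means on a `StructType` (dot-separated traversal of nested structs); this version skips array handling and is an illustration, not the PR's code:

```scala
import org.apache.spark.sql.types.{StructField, StructType}

// Walk a dot-separated path ("a.b.c") through nested structs; None if any segment is missing.
def getField(schema: StructType, path: String): Option[StructField] = {
  def loop(struct: StructType, segments: List[String]): Option[StructField] =
    segments match {
      case Nil => None
      case name :: rest =>
        struct.fields.find(_.name == name).flatMap { field =>
          if (rest.isEmpty) Some(field)
          else field.dataType match {
            case nested: StructType => loop(nested, rest)
            case _                  => None
          }
        }
    }
  loop(schema, path.split('.').toList)
}
```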

```scala
* @return true if provided arrays are the same ignoring nullability
*/
@scala.annotation.tailrec
final def isEquivalentArrayType(other: ArrayType): Boolean = {
```
Contributor

This seems very similar to DataTypeEnhancements(dt: DataType).isEquivalentDataType(). Couldn't one of these methods use the other instead of containing the same logic?
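
To make that suggestion concrete, a hedged sketch of how the array-specific check could delegate to a general data-type equivalence that ignores nullability; both bodies are illustrative assumptions, not the PR's code:

```scala
import org.apache.spark.sql.types.{ArrayType, DataType, StructType}

// Structural equivalence that ignores nullability and containsNull flags.
def isEquivalentDataType(dt: DataType, other: DataType): Boolean = (dt, other) match {
  case (a: ArrayType, b: ArrayType)   => isEquivalentDataType(a.elementType, b.elementType)
  case (a: StructType, b: StructType) =>
    a.fields.length == b.fields.length &&
      a.fields.zip(b.fields).forall { case (f, g) =>
        f.name == g.name && isEquivalentDataType(f.dataType, g.dataType)
      }
  case (a, b) => a == b
}

// The array-specific variant then becomes a thin wrapper over the general one.
def isEquivalentArrayType(arrayType: ArrayType, other: ArrayType): Boolean =
  isEquivalentDataType(arrayType, other)
```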

Contributor

Let's fix only the obvious errors now. We can improve and add items in the next release - a minor one.
As Adrian rightly pointed out, endless improvements prevent release, and therefore usage.

Contributor

Fine by me, makes sense.

…s-schema-utils

# Conflicts:
#	src/test/scala/za/co/absa/spark/commons/implicits/DataFrameImplicitsTest.scala
#	src/test/scala/za/co/absa/spark/commons/schema/SchemaUtilsSpec.scala

@benedeki benedeki left a comment

Another piece of good work 👍

@AdrianOlosutean AdrianOlosutean merged commit e6289b8 into master Feb 11, 2022
@AdrianOlosutean AdrianOlosutean deleted the feature/21-enceladus-schema-utils branch February 11, 2022 06:45

Development

Successfully merging this pull request may close these issues.

Add Enceladus utils SchemaUtils
