@cloud-fan
Contributor

This PR adds UnsafeArrayData. Currently we encode it this way:

The first 4 bytes hold the number of elements.
Next comes one 4-byte entry per element giving that element's start offset; a negative entry means the element is null.
The elements themselves follow.

An example: [10, 11, 12, 13, null, 14] will be encoded as:
6, 28, 32, 36, 40, -44, 44, 10, 11, 12, 13, 14
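To make the layout concrete, here is a minimal Java sketch of an encoder for a nullable int array. It is not the code from this PR: the class and method names are hypothetical, it handles only 4-byte int elements, and it assumes offsets are measured from the start of the encoded bytes (which is what a first offset of 28 = 4 + 6 * 4 suggests). For the example input it produces a leading 6 (the element count) followed by the same offsets and values as listed above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration of the described layout; not Spark's UnsafeArrayData.
public class UnsafeArrayLayoutSketch {
    /** Encodes as: element count, one 4-byte offset per element, then the non-null values. */
    static ByteBuffer encode(Integer[] elements) {
        int n = elements.length;
        int nonNull = 0;
        for (Integer e : elements) {
            if (e != null) {
                nonNull++;
            }
        }
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * n + 4 * nonNull);
        buf.putInt(n);                // first 4 bytes: number of elements
        int cursor = 4 + 4 * n;       // byte position where the next value will land
        for (Integer e : elements) {
            if (e == null) {
                buf.putInt(-cursor);  // negative offset marks a null; no value bytes are consumed
            } else {
                buf.putInt(cursor);
                cursor += 4;
            }
        }
        for (Integer e : elements) {
            if (e != null) {
                buf.putInt(e);        // values are written in order, skipping nulls
            }
        }
        buf.flip();
        return buf;
    }

    /** Views the encoded bytes as a sequence of 4-byte words, for printing. */
    static int[] words(ByteBuffer buf) {
        int[] out = new int[buf.remaining() / 4];
        buf.duplicate().asIntBuffer().get(out);
        return out;
    }

    public static void main(String[] args) {
        Integer[] input = {10, 11, 12, 13, null, 14};
        System.out.println(Arrays.toString(words(encode(input))));
    }
}
```

Because a null consumes no value bytes, the null entry and the element after it share the same magnitude (44 in the example); only the sign tells them apart.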

Note that when we read an UnsafeArrayData from bytes, we can read the first 4 bytes as numElements and treat the rest (with the first 4 bytes skipped) as the value region.
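The read side can be sketched the same way. The Java below is a hypothetical reader, not the accessor code from this PR: the names are made up for illustration, elements are assumed to be 4-byte ints, offsets are assumed to count from the start of the encoded bytes, and the sample words use 6 as the element count for the six-element example.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for the described layout; not Spark's UnsafeArrayData.
public class UnsafeArrayReadSketch {
    /** Packs 4-byte words into a buffer, standing in for bytes read from elsewhere. */
    static ByteBuffer fromWords(int... words) {
        ByteBuffer buf = ByteBuffer.allocate(4 * words.length);
        for (int w : words) {
            buf.putInt(w);
        }
        buf.flip();
        return buf;
    }

    /** The element count lives in the first 4 bytes. */
    static int numElements(ByteBuffer buf) {
        return buf.getInt(0);
    }

    /** A negative offset entry means the element is null. */
    static boolean isNullAt(ByteBuffer buf, int ordinal) {
        return offsetEntry(buf, ordinal) < 0;
    }

    /** Follows the offset entry to the element's 4 value bytes. */
    static int getInt(ByteBuffer buf, int ordinal) {
        int offset = offsetEntry(buf, ordinal);
        if (offset < 0) {
            throw new IllegalStateException("element " + ordinal + " is null");
        }
        return buf.getInt(offset);
    }

    private static int offsetEntry(ByteBuffer buf, int ordinal) {
        if (ordinal < 0 || ordinal >= numElements(buf)) {
            throw new IndexOutOfBoundsException("ordinal " + ordinal);
        }
        return buf.getInt(4 + 4 * ordinal);  // entries start right after the count
    }

    public static void main(String[] args) {
        ByteBuffer buf = fromWords(6, 28, 32, 36, 40, -44, 44, 10, 11, 12, 13, 14);
        for (int i = 0; i < numElements(buf); i++) {
            System.out.println(i + ": " + (isNullAt(buf, i) ? "null" : String.valueOf(getInt(buf, i))));
        }
    }
}
```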

Unsafe map data just uses two unsafe array data: the first 4 bytes are the number of elements, the second 4 bytes are numBytes of the key array, followed by the key array data and then the value array data.
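A rough Java sketch of that map framing follows. It only shows how the two header fields let a reader split the bytes into a key region and a value region; it treats the two arrays as opaque byte blobs and says nothing about how they are encoded internally. All names are hypothetical, and the idea that the value array's size is simply "whatever remains" is my reading of the description, not something stated in the PR.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the described map framing; not Spark's UnsafeMapData.
public class UnsafeMapLayoutSketch {
    /** Writes: element count, key array size in bytes, key array bytes, value array bytes. */
    static ByteBuffer encodeMap(int numElements, byte[] keyArray, byte[] valueArray) {
        ByteBuffer buf = ByteBuffer.allocate(8 + keyArray.length + valueArray.length);
        buf.putInt(numElements);
        buf.putInt(keyArray.length);
        buf.put(keyArray);
        buf.put(valueArray);
        buf.flip();
        return buf;
    }

    /** The key array starts after the 8-byte header and spans numBytes of the key array. */
    static ByteBuffer keyRegion(ByteBuffer map) {
        int keyBytes = map.getInt(4);
        ByteBuffer view = map.duplicate();
        view.position(8);
        view.limit(8 + keyBytes);
        return view.slice();
    }

    /** The value array is assumed to be everything after the key array. */
    static ByteBuffer valueRegion(ByteBuffer map) {
        int keyBytes = map.getInt(4);
        ByteBuffer view = map.duplicate();
        view.position(8 + keyBytes);
        return view.slice();
    }

    public static void main(String[] args) {
        // Placeholder payloads; real key and value arrays would be encoded array data.
        ByteBuffer map = encodeMap(1, new byte[] {1, 2, 3}, new byte[] {9, 9});
        System.out.println("numElements = " + map.getInt(0));
        System.out.println("key bytes   = " + keyRegion(map).remaining());
        System.out.println("value bytes = " + valueRegion(map).remaining());
    }
}
```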


SparkQA commented Jul 29, 2015

Test build #38855 has finished for PR 7752 at commit bb1fc15.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class UnsafeArrayData extends ArrayData
    • public static class ArrayWriter
    • abstract class ArrayData extends SpecializedGetters with Serializable
    • class GenericArrayData(array: Array[Any]) extends ArrayData

Contributor

This will fail in all places that use toArray on UnsafeArrayData, right?

Contributor Author

we'll never call toArray in codegen.

Contributor

OK, the thing I'm worried about is fallback mode. I also want to make it so UnsafeRow can be passed on to normal non-codegen/unsafe operators.

@cloud-fan cloud-fan changed the title [SPARK-9404][SQL][WIP] unsafe array data [SPARK-9404][SQL] unsafe array data Jul 30, 2015

SparkQA commented Jul 30, 2015

Test build #38969 has finished for PR 7752 at commit 8e33afa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class UnsafeArrayData extends ArrayData
    • public static class ArrayWriter
    • abstract class ArrayData extends SpecializedGetters with Serializable
    • class GenericArrayData(array: Array[Any]) extends ArrayData


SparkQA commented Jul 30, 2015

Test build #38984 has finished for PR 7752 at commit d589bcb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jul 30, 2015

I've merged the other ArrayData patch. Can you bring this one up to date? Thanks.

Contributor Author

This is a more abstract version of UnsafeRowWriters; we can make it work for both rows and arrays in a future refactor.


SparkQA commented Aug 1, 2015

Test build #39356 has finished for PR 7752 at commit 0bc44a4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

we should definitely document the binary format of UnsafeArrayData in the javadoc.


SparkQA commented Aug 2, 2015

Test build #39400 has finished for PR 7752 at commit ba75fb7.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class UnsafeArrayData extends ArrayData
    • public static class ArrayWriter
    • public class UnsafeWriters
    • public static class DecimalWriter
    • public static class UTF8StringWriter
    • public static class BinaryWriter
    • public static class StructWriter
    • public static class IntervalWriter
    • public static class ArrayWriter
    • case class FromUnsafe(child: Expression) extends UnaryExpression


SparkQA commented Aug 2, 2015

Test build #39402 has finished for PR 7752 at commit 2d6af4b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class UnsafeArrayData extends ArrayData
    • public static class ArrayWriter
    • public class UnsafeWriters
    • public static class DecimalWriter
    • public static class UTF8StringWriter
    • public static class BinaryWriter
    • public static class StructWriter
    • public static class IntervalWriter
    • public static class ArrayWriter
    • case class FromUnsafe(child: Expression) extends UnaryExpression

Contributor

We could just use UnsafeRow for ArrayType. It's not the most efficient option, but it would make the deadline easier to hit, and we could optimize it later once we have a clearer idea of the design.

Contributor

He's already done with it, hasn't he?


SparkQA commented Aug 2, 2015

Test build #39410 has finished for PR 7752 at commit d56708e.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • public class UnsafeArrayData extends ArrayData
    • public static class ArrayWriter
    • public class UnsafeWriters
    • public static class DecimalWriter
    • public static class UTF8StringWriter
    • public static class BinaryWriter
    • public static class StructWriter
    • public static class IntervalWriter
    • public static class ArrayWriter
    • case class FromUnsafe(child: Expression) extends UnaryExpression


SparkQA commented Aug 2, 2015

Test build #39411 has finished for PR 7752 at commit 20d1039.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class UnsafeArrayData extends ArrayData
    • public static class ArrayWriter
    • public class UnsafeWriters
    • public static class DecimalWriter
    • public static class UTF8StringWriter
    • public static class BinaryWriter
    • public static class StructWriter
    • public static class IntervalWriter
    • public static class ArrayWriter
    • case class FromUnsafe(child: Expression) extends UnaryExpression
    • class SpecificSafeProjection extends $
    • case class FromUTCTimestamp(left: Expression, right: Expression)
    • case class ToUTCTimestamp(left: Expression, right: Expression)
    • case class DateDiff(endDate: Expression, startDate: Expression)
    • case class InitCap(child: Expression) extends UnaryExpression with ImplicitCastInputTypes

@cloud-fan cloud-fan changed the title [SPARK-9404][SQL] unsafe array data [SPARK-9404][SPARK-9542][SQL] unsafe array data and map data Aug 2, 2015

SparkQA commented Aug 2, 2015

Test build #39430 has finished for PR 7752 at commit 6445289.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class UnsafeArrayData extends ArrayData
    • public class UnsafeMapData extends MapData
    • public class UnsafeReaders
    • public static class ArrayWriter
    • public static class MapWriter
    • public class UnsafeWriters
    • public static class DecimalWriter
    • public static class UTF8StringWriter
    • public static class BinaryWriter
    • public static class StructWriter
    • public static class IntervalWriter
    • public static class ArrayWriter
    • public static class MapWriter
    • case class FromUnsafe(child: Expression) extends UnaryExpression


SparkQA commented Aug 2, 2015

Test build #39436 has finished for PR 7752 at commit 36542b7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class UnsafeArrayData extends ArrayData
    • public class UnsafeMapData extends MapData
    • public class UnsafeReaders
    • public static class ArrayWriter
    • public static class MapWriter
    • public class UnsafeWriters
    • public static class DecimalWriter
    • public static class UTF8StringWriter
    • public static class BinaryWriter
    • public static class StructWriter
    • public static class IntervalWriter
    • public static class ArrayWriter
    • public static class MapWriter
    • case class FromUnsafe(child: Expression) extends UnaryExpression

Contributor

ghe -> the

@cloud-fan cloud-fan force-pushed the unsafe-array branch 2 times, most recently from feda148 to 345c064 Compare August 3, 2015 01:16
Contributor

label baseObject as @nullable


SparkQA commented Aug 3, 2015

Test build #39486 has finished for PR 7752 at commit 345c064.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class UnsafeArrayData extends ArrayData
    • public class UnsafeMapData extends MapData
    • public class UnsafeReaders
    • public static class ArrayWriter
    • public static class MapWriter
    • public class UnsafeWriters
    • public static class DecimalWriter
    • public static class UTF8StringWriter
    • public static class BinaryWriter
    • public static class StructWriter
    • public static class IntervalWriter
    • public static class ArrayWriter
    • public static class MapWriter
    • case class FromUnsafe(child: Expression) extends UnaryExpression


SparkQA commented Aug 3, 2015

Test build #39489 timed out for PR 7752 at commit 3269bd7 after a configured wait of 175m.


SparkQA commented Aug 3, 2015

Test build #1303 has finished for PR 7752 at commit 3269bd7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class UnsafeArrayData extends ArrayData
    • public class UnsafeMapData extends MapData
    • public class UnsafeReaders
    • public static class ArrayWriter
    • public static class MapWriter
    • public class UnsafeWriters
    • public static class DecimalWriter
    • public static class UTF8StringWriter
    • public static class BinaryWriter
    • public static class StructWriter
    • public static class IntervalWriter
    • public static class ArrayWriter
    • public static class MapWriter
    • case class FromUnsafe(child: Expression) extends UnaryExpression

@rxin
Contributor

rxin commented Aug 3, 2015

OK going to merge this.

@asfgit asfgit closed this in 608353c Aug 3, 2015
asfgit pushed a commit that referenced this pull request Aug 8, 2015
…enerateSafe

In #7752 we added `FromUnsafe` to convert nested unsafe data like array/map/struct to safe versions. It was a quick solution, and we already have `GenerateSafe`, which is codegened, to do the conversion. So we should remove `FromUnsafe` and implement its codegen version in `GenerateSafe`.

Author: Wenchen Fan <[email protected]>

Closes #8029 from cloud-fan/from-unsafe and squashes the following commits:

ed40d8f [Wenchen Fan] add the copy back
a93fd4b [Wenchen Fan] cogengen FromUnsafe

(cherry picked from commit 106c078)
Signed-off-by: Davies Liu <[email protected]>
asfgit pushed a commit that referenced this pull request Aug 8, 2015
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015
kiszk pushed a commit to kiszk/spark-gpu that referenced this pull request Dec 26, 2015