
Conversation


@terma terma commented Jan 30, 2017

What changes were proposed in this pull request?

Add CalendarIntervalType to SQL Data Types in documentation

How was this patch tested?

unit tests

@marmbrus please review

@AmplabJenkins

Can one of the admins verify this patch?


srowen commented Jan 31, 2017

@HyukjinKwon is this OK by you?


HyukjinKwon commented Jan 31, 2017

I am OK with this, but I remember there were some discussions about whether this type should be exposed or not, and I could not track down the conclusion. I remember seeing @rxin there, IIRC.


srowen commented Feb 6, 2017

CC @cloud-fan for #13008 (comment) and @yhuai for #8597 (comment), as those might be what you're referring to?


HyukjinKwon commented Feb 6, 2017

Actually, the one I remember was in the JIRA (the comment was something like "do we really want to support this as an external type..." IIRC; sorry, I can't find the JIRA). It seems there are several related discussions here and there. Maybe #15751 (comment) is related too, because it is about supporting reading/writing out that type, where it suggests we can explicitly give the schema with that type. cc @marmbrus too.

@HyukjinKwon
Member

(FWIW, I am OK with this, but just worried that it might be meant to stay an internal type, maybe in the future.)

@terma
Copy link
Author

terma commented Feb 7, 2017

@srowen As I understand it, CalendarIntervalType exists only for compatibility with a similar type in Hive. So probably better to mark it as internal and close the JIRA?

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Feb 7, 2017

^ I want to be very sure whether we are going to expose this or not. Could any SQL committer or PMC member confirm this?

CalendarIntervalType only for compatibility with similar type in Hive... mark it as internal ...

This means a lot of things; for example,

  • we might have to hide this:
scala> import org.apache.spark.sql.types.CalendarIntervalType
import org.apache.spark.sql.types.CalendarIntervalType

and so on.

@cloud-fan
Contributor

Actually, CalendarInterval is already exposed to users, e.g. we can call collect on a DataFrame with a CalendarIntervalType field and get rows containing CalendarInterval. We don't support reading/writing CalendarIntervalType, though.

So I'm OK with adding documentation for it.
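A minimal spark-shell sketch of the behavior described above, assuming a local session where spark is in scope (the commented-out write illustrates the unsupported read/write path):

// Exposed: collect() surfaces org.apache.spark.unsafe.types.CalendarInterval values.
val df = spark.sql("SELECT interval 1 second AS i")
val firstValue = df.collect()(0).get(0)   // a CalendarInterval instance

// Not supported: persisting a CalendarIntervalType column through a data source;
// the call below is expected to fail, so it is left commented out.
// df.write.parquet("/tmp/interval_out")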


HyukjinKwon commented Feb 7, 2017

Then it looks okay to me as a description of the current state; I just checked it after building the docs with this change.

Also, we can already use it as below:

scala> sql("SELECT interval 1 second").schema(0).dataType.getClass
res0: Class[_ <: org.apache.spark.sql.types.DataType] = class org.apache.spark.sql.types.CalendarIntervalType$

scala> sql("SELECT interval 1 second").collect()(0).get(0).getClass
res1: Class[_] = class org.apache.spark.unsafe.types.CalendarInterval
scala> val rdd = spark.sparkContext.parallelize(Seq(Row(new CalendarInterval(0, 0))))
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at <console>:32

scala> spark.createDataFrame(rdd, StructType(StructField("a", CalendarIntervalType) :: Nil))
res1: org.apache.spark.sql.DataFrame = [a: calendarinterval]

Another meta concern is that org.apache.spark.unsafe.types.CalendarInterval seems undocumented in both the scaladoc and javadoc (as is the entire unsafe module). Once we document this, even as a weak promise for this API, we might have to keep it for backward compatibility.

Maybe just describe it as a SQL-dedicated type, or as not supported for now with a Note:, rather than describing org.apache.spark.unsafe.types.CalendarInterval?

@cloud-fan
Contributor

CC @rxin, if we are going to expose CalendarInterval and CalendarIntervalType officially, shall we move CalendarInterval to the same package as Decimal, or create a new class as the external representation of CalendarIntervalType?

@gatorsmile
Member

@terma To avoid confusing Spark SQL users, we might not document it. How about closing this PR now? Thanks!

@terma terma closed this Jun 13, 2017