Commit b34ec2d
[SPARK-31735][CORE] Include date/timestamp in the summary report
Currently dates are missing from the export:

```python
from datetime import datetime, timedelta, timezone

from pyspark.sql import Row
from pyspark.sql import functions as F
from pyspark.sql import types as T

START = datetime(2014, 1, 1, tzinfo=timezone.utc)
n_days = 22

date_range = [Row(date=(START + timedelta(days=n))) for n in range(0, n_days)]
schema = T.StructType([T.StructField(name="date", dataType=T.DateType(), nullable=False)])

rdd = spark.sparkContext.parallelize(date_range)
df = spark.createDataFrame(data=rdd, schema=schema)

df.agg(F.max("date")).show()
df.summary().show()
```

The date column is absent from the summary output:

```
+-------+
|summary|
+-------+
|  count|
|   mean|
| stddev|
|    min|
|    25%|
|    50%|
|    75%|
|    max|
+-------+
```

Would be nice to include these as well.

Signed-off-by: Fokko Driesprong <[email protected]>
1 parent 2012d58 commit b34ec2d

File tree

1 file changed: +4 additions, -1 deletion


sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala

Lines changed: 4 additions & 1 deletion
```diff
@@ -264,7 +264,10 @@ object StatFunctions extends Logging {
     }

     val selectedCols = ds.logicalPlan.output
-      .filter(a => a.dataType.isInstanceOf[NumericType] || a.dataType.isInstanceOf[StringType])
+      .filter(a => a.dataType.isInstanceOf[NumericType]
+        || a.dataType.isInstanceOf[StringType]
+        || a.dataType.isInstanceOf[DateType]
+        || a.dataType.isInstanceOf[TimestampType])

     val aggExprs = statisticFns.flatMap { func =>
       selectedCols.map(c => Column(Cast(func(c), StringType)).as(c.name))
```
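The effect of the patched filter can be modeled in plain Python. This is a minimal sketch with hypothetical stand-in classes mirroring Spark's type hierarchy (not the real pyspark API): before the patch only numeric and string columns survived the filter, so a `DateType` column never reached the summary aggregations.

```python
# Hypothetical stand-ins for Spark's data-type hierarchy (illustration only).
class DataType: ...
class NumericType(DataType): ...
class IntegerType(NumericType): ...
class StringType(DataType): ...
class DateType(DataType): ...
class TimestampType(DataType): ...
class BooleanType(DataType): ...

# After the patch, date and timestamp columns pass the filter as well.
SUMMARY_TYPES = (NumericType, StringType, DateType, TimestampType)

def selected_cols(schema):
    """Mirror the patched filter: keep numeric, string, date and timestamp columns."""
    return [name for name, dtype in schema if isinstance(dtype, SUMMARY_TYPES)]

schema = [("id", IntegerType()), ("name", StringType()),
          ("date", DateType()), ("flag", BooleanType())]
print(selected_cols(schema))  # → ['id', 'name', 'date']
```

Before this change the `DateType` check was absent, so `"date"` would have been filtered out, which is exactly the empty summary shown in the commit message.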
