[SPARK-31735][CORE] Include date/timestamp in the summary report
Currently, date columns are missing from the output of df.summary():
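from datetime import datetime, timedelta, timezone
from pyspark.sql import SparkSession
from pyspark.sql import types as T
from pyspark.sql import Row
from pyspark.sql import functions as F
# Reuse the active session (already defined as `spark` in the pyspark shell)
spark = SparkSession.builder.getOrCreate()
# Build 22 consecutive days starting at 2014-01-01
START = datetime(2014, 1, 1, tzinfo=timezone.utc)
n_days = 22
date_range = [Row(date=(START + timedelta(days=n))) for n in range(0, n_days)]
# Single non-nullable DateType column
schema = T.StructType([T.StructField(name="date", dataType=T.DateType(), nullable=False)])
rdd = spark.sparkContext.parallelize(date_range)
df = spark.createDataFrame(data=rdd, schema=schema)
# Aggregating the date column directly works
df.agg(F.max("date")).show()
# The summary lists the statistics but omits the date column
df.summary().show()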
+-------+
|summary|
+-------+
|  count|
|   mean|
| stddev|
|    min|
|    25%|
|    50%|
|    75%|
|    max|
+-------+
It would be nice to include date and timestamp columns in the summary as well.
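
As a rough illustration of what such statistics could look like for a date column, here is a minimal plain-Python sketch. It is not Spark's implementation: `summarize_dates` is a hypothetical helper, and averaging dates through their ordinal day numbers (rounded down to a whole day) is an assumption made only for this example.

```python
from datetime import date, timedelta


def summarize_dates(dates):
    """Hypothetical helper: count, min, mean, and max of a list of dates."""
    # Map each date to its proleptic Gregorian ordinal so it can be averaged.
    ordinals = [d.toordinal() for d in dates]
    # Integer division rounds the mean down to a whole day (an assumption).
    mean_ordinal = sum(ordinals) // len(ordinals)
    return {
        "count": len(dates),
        "min": min(dates),
        "mean": date.fromordinal(mean_ordinal),
        "max": max(dates),
    }


# Same 22-day range as in the reproduction above.
start = date(2014, 1, 1)
dates = [start + timedelta(days=n) for n in range(22)]
summary = summarize_dates(dates)
```

Count, min, and max are well defined for dates without any conversion; only the mean (and, by extension, stddev and percentiles) needs a numeric representation, which is where a design decision would have to be made.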
Signed-off-by: Fokko Driesprong <[email protected]>