
@rekhajoshm
Contributor

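// Run in spark-shell (Spark 1.x), where `sc` and `sqlContext` are predefined.
import org.apache.spark.ml.feature.StringIndexer

val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
val df = sqlContext.createDataFrame(data).toDF("id", "label")
// Fit a StringIndexerModel that maps each label string to a Double index.
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
  .fit(df)
val transformed = indexer.transform(df)

// Schema of the actual output vs. the schema that transformSchema predicts.
println(transformed.schema.toString())
println(indexer.transformSchema(df.schema))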

With a quick look, I verified that the two printed schemas differ in the nullability of labelIndex (true from transform, false from transformSchema):
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,true))

StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,false))
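To pinpoint which column disagrees, here is a minimal sketch that assumes spark-sql is on the classpath (the object and method names are hypothetical). It rebuilds the two printed schemas with Spark's public `StructType`/`StructField` classes, which need no SparkContext, and reports fields whose `nullable` flags differ.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper object; the schemas are copied from the printed output above.
object NullabilityDiff {
  // As printed by transformed.schema
  val fromTransform: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("label", StringType, nullable = true),
    StructField("labelIndex", DoubleType, nullable = true)))

  // As printed by indexer.transformSchema(df.schema)
  val fromTransformSchema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("label", StringType, nullable = true),
    StructField("labelIndex", DoubleType, nullable = false)))

  // Names of position-aligned, same-named top-level fields whose nullable flags differ.
  def nullabilityMismatches(a: StructType, b: StructType): Seq[String] =
    a.fields.toSeq.zip(b.fields.toSeq).collect {
      case (x, y) if x.name == y.name && x.nullable != y.nullable => x.name
    }

  def main(args: Array[String]): Unit =
    println(nullabilityMismatches(fromTransform, fromTransformSchema))
}
```

Applied to the two schemas above, only labelIndex is reported; id and label agree.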

rekhajoshm added 5 commits May 5, 2015 16:10
Pulling functionality from apache spark
pull latest from apache spark
Pulling functionality from apache spark
Pulling functionality from apache spark
@SparkQA

SparkQA commented Nov 3, 2015

Test build #44942 has finished for PR 9440 at commit eae53fb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Nov 3, 2015

cc @yanboliang

@yanboliang
Contributor

I don't think this patch is appropriate: it does not pass the regression tests and will produce errors in other test cases. I have found the cause of this bug but have not yet figured out a way to resolve it; please see my comments on SPARK-11478. Disabling the nullable check would be a workaround. Looking forward to others' opinions. @rekhajoshm @mengxr
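The thread does not say which test performs the nullable check, but one way to read "disabling the nullable check" is to compare schemas on field names and data types only. Here is a minimal sketch under that assumption, with spark-sql on the classpath; `SchemaCompare` and `sameIgnoringNullability` are hypothetical names. Only top-level nullability is ignored: nested struct, array, and map types still compare their own nullability flags through `dataType` equality.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper object for a relaxed schema comparison.
object SchemaCompare {
  // Same top-level field names and data types, in the same order,
  // regardless of each top-level field's nullable flag.
  def sameIgnoringNullability(a: StructType, b: StructType): Boolean =
    a.fields.toSeq.map(f => (f.name, f.dataType)) ==
      b.fields.toSeq.map(f => (f.name, f.dataType))

  // Two one-column schemas that differ only in nullability.
  val nullableIndex: StructType =
    StructType(Seq(StructField("labelIndex", DoubleType, nullable = true)))
  val nonNullIndex: StructType =
    StructType(Seq(StructField("labelIndex", DoubleType, nullable = false)))

  def main(args: Array[String]): Unit = {
    println(nullableIndex == nonNullIndex)                        // strict comparison
    println(sameIgnoringNullability(nullableIndex, nonNullIndex)) // relaxed comparison
  }
}
```

The trade-off is that a relaxed check also hides genuine nullability regressions, which is why it is a workaround rather than a fix.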

@rekhajoshm
Contributor Author

Thanks @yanboliang for your comments. My findings were similar to yours: the nullable flag is the cause, driven by attr.toStructField(). This was a quick first look and pull request; I agree it needs more discussion. @mengxr
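The attr.toStructField() path can be checked in isolation. Here is a minimal sketch that assumes spark-mllib (Spark 1.x or later) is on the classpath. `NominalAttribute.defaultAttr`, `withName`, and `toStructField()` are Spark ML attribute APIs, while the object name is hypothetical. It builds a field from a nominal attribute named like the indexer's output column and prints its nullable flag. This is consistent with the `labelIndex,DoubleType,false` entry that transformSchema returns above.

```scala
import org.apache.spark.ml.attribute.NominalAttribute
import org.apache.spark.sql.types.StructField

// Hypothetical object; builds a StructField the way attr.toStructField() does.
object AttributeFieldCheck {
  // A nominal (categorical) attribute named like the indexer's output column,
  // converted to a schema field.
  val labelIndexField: StructField =
    NominalAttribute.defaultAttr
      .withName("labelIndex")
      .toStructField()

  def main(args: Array[String]): Unit =
    println(s"${labelIndexField.name} ${labelIndexField.dataType} nullable=${labelIndexField.nullable}")
}
```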

@SparkQA

SparkQA commented Apr 13, 2016

Test build #2781 has finished for PR 9440 at commit eae53fb.

  • This patch fails R style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@jkbradley
Member

Is this still active? If not, can you please close this issue pending discussion on the JIRA? Thanks!

@asfgit asfgit closed this in 5c5396c Sep 23, 2016
@rekhajoshm rekhajoshm deleted the SPARK-11478 branch June 21, 2018 06:13