[SPARK-12104][SPARKR] collect() does not handle multiple columns with same name. #10118
Conversation
cc @falaki
Test build #47114 has finished for PR 10118 at commit
looks good
This is slightly different from 1.5: we will get the exact same column names in the local data.frame, whereas in Spark 1.5 subsequent instances of the same name are appended with numbers. I am not sure which one is better; in fact, I slightly prefer your suggested behavior. But just in case others want to chime in: cc @shivaram
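For context, base R itself supports both behaviors, depending on `check.names`; a small illustration (not from this PR, values are arbitrary):

```r
# Base R deduplicates repeated column names by default, which is what the
# Spark 1.5-style numbering looks like:
data.frame(a = 1, a = 2)
#>   a a.1
#> 1 1   2

# With check.names = FALSE the duplicates are preserved, matching the
# behavior this PR proposes for collect():
data.frame(a = 1, a = 2, check.names = FALSE)
#>   a a
#> 1 1 2
```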
I think the current behavior in 1.6 is actually an unintentional side effect of a recent change to the collect() code. Matching the 1.5.x behavior seems to make sense.
I tested with Spark 1.4.1 and 1.5.1; both just keep the same names instead of making the duplicated names unique, so this PR's behavior is backward compatible.
```r
> df <- createDataFrame(sqlContext, list(list(1, 2)), schema = c("a", "a"))
> collect(df)
  a a
1 1 2
```
Actually, it is very easy to make the column names unique, for example:

```r
names(df) <- make.names(names(df), unique = TRUE)
```

But we need to discuss: is this the preferred behavior?
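For reference, `make.names(..., unique = TRUE)` appends numeric suffixes to repeats; a quick illustration with arbitrary names:

```r
# Repeated names get ".1", ".2", ... suffixes:
make.names(c("a", "a", "a"), unique = TRUE)
#> [1] "a"   "a.1" "a.2"
```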
Yeah, let's keep the local DF names consistent with the schema in SQL (i.e., duplicated `name`, `name` is fine). If this is a breaking change we can add a note in the release notes.
@falaki Just curious: what is the query you used to create the numbered columns?
I was using a left outer join and then collecting it.
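A minimal SparkR sketch of that scenario (table and column names are illustrative, not taken from the PR):

```r
# Two DataFrames that share column names:
left  <- createDataFrame(sqlContext, data.frame(key = c(1, 2), value = c("x", "y")))
right <- createDataFrame(sqlContext, data.frame(key = 1, value = "z"))

# A left outer join keeps both "key" and both "value" columns, so the
# result has duplicate column names when collected:
joined <- join(left, right, left$key == right$key, "left_outer")
collect(joined)
```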
@sun-rui Can we add a test with a left outer join and then collect?
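A minimal sketch of such a test, assuming the usual SparkR testthat setup with a `sqlContext` in scope (names and values are illustrative):

```r
test_that("collect() preserves duplicate column names after a left outer join", {
  df1 <- createDataFrame(sqlContext, data.frame(key = c(1, 2), v = c("a", "b")))
  df2 <- createDataFrame(sqlContext, data.frame(key = 1, v = "c"))
  joined <- join(df1, df2, df1$key == df2$key, "left_outer")
  ldf <- collect(joined)
  # Duplicate names should come back exactly as they appear in the SQL schema.
  expect_equal(names(ldf), c("key", "v", "key", "v"))
})
```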
BTW: as I said, I like this behavior.
Yeah this behavior is fine. I just want to make sure that example doesn't trigger some other code path etc.
Thanks!
@falaki, I can't reproduce your result. For example, in Spark 1.5.1:
@shivaram, no, done with 1.5.1
OK. Change LGTM. Merging this. We can discuss the left_join issue later if required.
Test build #47178 has finished for PR 10118 at commit
[SPARK-12104][SPARKR] collect() does not handle multiple columns with same name.

Author: Sun Rui <[email protected]>

Closes #10118 from sun-rui/SPARK-12104.

(cherry picked from commit 5011f26)
Signed-off-by: Shivaram Venkataraman <[email protected]>
No description provided.