[SPARK-10863][SPARKR] Method coltypes() to get R's data types of a DataFrame #8984

olarayej · 2015-10-05T19:49:16Z

Method coltypes() to get R's data types of a DataFrame

Changed setMethod style. Removed return().

Conflicts: R/pkg/R/DataFrame.R

# Conflicts: # R/pkg/R/DataFrame.R

shivaram · 2015-10-05T23:15:24Z

Jenkins, add to whitelist

shivaram · 2015-10-05T23:15:39Z

Jenkins, ok to test

SparkQA · 2015-10-05T23:37:43Z

Test build #43260 has finished for PR 8984 at commit b44152e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2015-10-06T19:06:44Z

R/pkg/R/DataFrame.R

could you check for the case when it doesn't match the known types?

@felixcheung Yeah, that's a good point. I'm thinking coltypes() should always have an equivalent R data type for each column. We don't want coltypes() to return NAs or throw an unsupported-type error, because that would mean the input DataFrame is inconsistent.

Therefore, it would just be a matter of putting into DATA_TYPES the list of all possible values returned by dtypes() (in case I'm missing any). I couldn't find that list in the docs. Could you point me to it?

Finally, I think the check for unsupported data types should instead be done in the coltypes()<- method and at DataFrame initialization. coltypes() assumes the input DataFrame was assigned valid data types, which makes sense to me.

@felixcheung, @shivaram: Any thoughts on this one?

http://spark.apache.org/docs/latest/sql-programming-guide.html#data-types is a list that might be helpful.

Also, I think it might make sense to try to map them to R types and, if we fail to find a relevant one, fall back to the Spark SQL type.

@shivaram I agree. I could use the mapping below (I got the short type names from schema.R:118):
Scala -> R
"string"="character",
"long"="integer",
"short"="integer",
"integer"="integer",
"byte"="integer",
"double"="numeric",
"float"="numeric",
"decimal"="numeric",
"boolean"="logical"

In any other case, I will use the Scala type unchanged. Sounds good?

Yep. This sounds good.
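The mapping-with-fallback idea agreed on above can be sketched in plain R. This is only an illustration, not the code that was merged: the names `SPARK_TO_R_TYPES` and `toRTypes` are hypothetical, and the input is assumed to be a character vector of type names such as the second element of each pair returned by dtypes().

```r
# Hypothetical lookup table built from the mapping proposed in this thread.
SPARK_TO_R_TYPES <- c(
  "string"  = "character",
  "long"    = "integer",
  "short"   = "integer",
  "integer" = "integer",
  "byte"    = "integer",
  "double"  = "numeric",
  "float"   = "numeric",
  "decimal" = "numeric",
  "boolean" = "logical"
)

# Hypothetical helper (an assumption for illustration, not the SparkR API):
# translate Spark SQL type names to R type names, keeping the Spark SQL
# name whenever no R equivalent is listed in the table.
toRTypes <- function(sparkTypes) {
  # Indexing a named vector by an unknown name yields NA.
  rTypes <- unname(SPARK_TO_R_TYPES[sparkTypes])
  unmapped <- is.na(rTypes)
  # Fall back to the original Spark SQL type for unmapped entries.
  rTypes[unmapped] <- sparkTypes[unmapped]
  rTypes
}

toRTypes(c("string", "double", "array<int>"))
```

With this sketch, a complex type such as "array<int>" is returned unchanged rather than as NA, which is the fallback behavior discussed above.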

shivaram · 2015-10-08T16:38:55Z

@olarayej Could you bring this PR up to date with the master branch?

olarayej · 2015-10-08T21:39:36Z

@shivaram Could you share the best practices for merging changes from the master branch into the PR branch? This looks like a very common task, and the team (@NarineK, @adrian555, and I) has tried quite a few options already, but none of them look pretty. We'd really appreciate any guidance. Thanks!

shivaram · 2015-10-08T21:50:21Z

There are a number of ways to do this, so this is just the way I do it personally. In my case I have two remotes in my git setup, so my .git/config looks something like this:

...
[remote "origin"]
        url = https://github.com/shivaram/spark-1.git
        fetch = +refs/heads/*:refs/remotes/origin/*
[remote "apache-spark"]
        url = https://github.com/apache/spark.git
        fetch = +refs/heads/*:refs/remotes/apache-spark/*
...

So if I'm on a feature branch, say SPARK-10863, I do the following:

> git fetch apache-spark master 
...
From https://github.com/apache/spark
 * branch            master     -> FETCH_HEAD
...
> git merge FETCH_HEAD
... Accept the merge commit message that shows up
> git log -2 # Optionally use this to verify if things look fine
> git push origin SPARK-10863
... This will push changes to your fork for this branch

Let me know if this works for you.

SparkQA · 2015-10-08T22:43:54Z

Test build #43418 has finished for PR 8984 at commit 523bfbf.

This patch fails PySpark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

olarayej · 2015-10-09T03:10:21Z

@shivaram Yes, that was helpful. Thank you! I have done the merge already. Jenkins, could you run tests?

SparkQA · 2015-10-09T03:36:19Z

Test build #43456 has finished for PR 8984 at commit b1afe8e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2015-10-09T18:26:22Z

+1 on @shivaram's comment on data types above.

SparkQA · 2015-10-09T19:17:15Z

Test build #43487 has finished for PR 8984 at commit 76fe59a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

olarayej · 2015-10-09T19:26:43Z

Thanks, @felixcheung @shivaram. I have committed my changes and tests have passed :-)

shivaram · 2015-10-09T19:49:07Z

R/pkg/inst/tests/test_sparkSQL.R

Could you add a test with some other types? Another test that runs into the NA case and falls back to the SQL type would also be useful.

SparkQA · 2015-10-09T22:05:27Z

Test build #43495 has finished for PR 8984 at commit d53e8b3.

This patch fails R style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-10-09T22:43:37Z

Test build #43498 has finished for PR 8984 at commit baec23f.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

shivaram · 2015-10-09T22:46:20Z

Jenkins, retest this please

SparkQA · 2015-10-09T23:08:44Z

Test build #43499 has finished for PR 8984 at commit baec23f.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

shivaram · 2015-10-09T23:24:20Z

@shaneknapp This one seems to be failing with

sbt.ResolveException: download failed: org.apache.spark#spark-unsafe_2.10;1.5.0!spark-unsafe_2.10.jar

Any idea what's up?

olarayej · 2015-11-06T22:23:47Z

@felixcheung I have tried quite a few things already, but unfortunately I haven't been able to do the rebase. Could you provide some suggestions? Thanks!

shivaram · 2015-11-09T18:30:51Z

@olarayej Do the git merge commands in #8984 (comment) not work?

olarayej · 2015-11-09T19:16:12Z

@shivaram @felixcheung
I followed the same steps described by @shivaram.

What's confusing for us is that every time we run a fetch followed by a merge, it triggers conflicts in a number of files that we haven't modified (even outside the R folder). After I resolved all the conflicts and ran a push, it also pushed those files. Now there are 194 modified files, which makes things pretty messy.

I'm thinking about creating a new branch and discarding this one. Thoughts?

SparkQA · 2015-11-09T19:23:09Z

Test build #45398 has finished for PR 8984 at commit 0bc5b35.

This patch fails to build.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2015-11-09T19:39:30Z

Test build #45400 has finished for PR 8984 at commit ba091fb.

This patch fails to build.
This patch does not merge cleanly.
This patch adds no public classes.

shivaram · 2015-11-09T19:55:25Z

Yeah, something seems to be messed up. You shouldn't get other files changed if you do a fetch + merge, as long as the rest of your tree is synced to the same place.

You can open a new PR if you feel that it's getting messy in this case. The only downside is that we lose all the comments we had, but since this PR is close to being merged, it's probably fine.

olarayej · 2015-11-10T00:50:12Z

I have created a new branch and PR #9579 to follow up on this.

This is a follow up on PR #8984, as the corresponding branch for such PR was damaged. Author: Oscar D. Lara Yejas <[email protected]> Closes #9579 from olarayej/SPARK-10863_NEW14. (cherry picked from commit 47735cd) Signed-off-by: Shivaram Venkataraman <[email protected]>

shivaram · 2015-11-10T19:11:13Z

@olarayej Could you close this PR? Only the person who opened the PR can close it, and it helps clear our PR queue at https://spark-prs.appspot.com/#r

olarayej · 2015-11-10T19:20:38Z

Closing this PR as #9579 has been created to follow up....

Oscar D. Lara Yejas and others added 11 commits September 24, 2015 22:01

SPARK-10807. Added as.data.frame as a synonym for collect().

461714d

Removed operator %++%, which is a synonym for paste()

e9e34b5

Removed extra blank space.

c65b682

Removed extra spaces to comply with R style

cee871c

Moved setGeneric declaration to generics.R.

0851163

Changed setMethod style. Removed return().

Added test cases for as.data.frame

7a8e62a

Merge remote-tracking branch 'origin/SPARK-10807' into SPARK-10807

de6d164

Conflicts: R/pkg/R/DataFrame.R

Changed setMethod declaration to comply with standard

a346cc6

Removed changes to .gitignore

6c4dcbc

Merge remote-tracking branch 'upstream/master'

99e6304

# Conflicts: # R/pkg/R/DataFrame.R

coltypes

30c5d26

felixcheung reviewed Oct 6, 2015

shivaram reviewed Oct 9, 2015

Oscar D. Lara Yejas and others added 13 commits November 6, 2015 22:08

Update DataFrame.R

9a9618e

Update types.R

25faa4e

Update types.R

57a47a4

Update DataFrame.R

e5ab466

Added tests for complex types

772de99

Update types.R

67b12a4

Update types.R

0bb39dc

Update test_sparkSQL.R

8aa13ef

Removed for loop

9b36955

Update DataFrame.R

95a8ece

Update DataFrame.R

462b1f1

Removed blank space

cd033c0

Merge tests and description files

ba091fb

olarayej force-pushed the SPARK-10863-NEW9 branch from 0bc5b35 to ba091fb Compare November 9, 2015 19:24

olarayej mentioned this pull request Nov 10, 2015

[SPARK-10863][SPARKR] Method coltypes() (New version) #9579

Closed

olarayej closed this Nov 10, 2015
