-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-13174][SparkR] Add read.csv and write.csv for SparkR #11457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixcheung @sun-rui (Firstly, sorry I am not pretty much fluent to R).
I am not too sure why it says they are being masked. I anyway added those here and could pass all tests but still I am wondering why it is and if it is correct or not. When I change the function name to another, it looks fine but it looks there are no such functions to mask.
It looks it masks the functions called
read.csvandwrite.csv. The console output was below:Would you please give me some advices on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are existing read.csv()/csv2() and write.csv()/csv2() in the R base package. It would be great that we can implement these functions in SparkR with the same signature in the R base package. Thus users can use these function in both SparkR and base package. It seems that the supported option in the signatures of these functions in the R base package are also supported in spark-csv.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sun-rui Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To oversimplify it, it would mask the other implementation when the name conflicts but the parameter signature is not exactly the same. Therefore try to see if you could change it to match as sun-rui suggests.
https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixcheung Actually, I have been looking up the link and working on this; however, I realised that it actually cannot use the extactly same parameters as spark-csv does not support all. So, I tried to add some possible parameters (locally) but it looks it ended up with incomplete versions.
This drags me into a thought that this one might have to be merged first and then dealt with later (also partly because CSV options are not confirmed yet and they are not fully merged yet, SPARK-12420).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, would you maybe create another PR based on this? I think it takes pretty much to figure out for me since this is my very second R codes (sorry).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely an issue though - R base:: read.csv is very commonly used and a name conflict as detected here will make base version inaccessible whenever SparkR package is loaded.
I can probably help to sort out the right signature - I'll try to check this out in the next few days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixcheung Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its better to match the arguments of
read.csvand not mask the localread.csv-- One thing is that we don't need to support all the options and in case some of the flags we don't support are set we can throw a warning / error.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixcheung Could you maybe give me the link for the PR if you opened? I will close this after checking another PR is open.