Skip to content

Conversation

@tanwanirahul
Copy link
Contributor

Fix type inference issue for sparse CSV data - https://issues.apache.org/jira/browse/SPARK-13309

@rxin
Copy link
Contributor

rxin commented Feb 14, 2016

cc @HyukjinKwon want to review this one?

@SparkQA
Copy link

SparkQA commented Feb 14, 2016

Test build #2542 has finished for PR 11194 at commit 5edaa2a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

@rxin Sure.

}

def mergeRowTypes(first: Array[DataType], second: Array[DataType]): Array[DataType] = {
first.zipAll(second, NullType, NullType).map { case ((a, b)) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is not the part of diff but it might be great to change ((a, b)) to (a, b))

@HyukjinKwon
Copy link
Member

Overall, it looks good to me. This was already merged in databricks/spark-csv#261 and the logic looks identical.

@HyukjinKwon
Copy link
Member

Actually, I have had a thought that we might have to make a class such as TestCSVData for dataset for testing (similarly with TestJsonData for JSON datasource) or a class like CSVTest (similarly with OrcTest fpr ORC datasource) rather than adding test CSV files for everytime.

I think this might better be done in another PR. If you agree on this, I will create an issue and PR for this after this one is merged.

@tanwanirahul
Copy link
Contributor Author

Yes, could be done. I personally prefer to keep code and data separate. This way:

  • Changing the data does not require us to compile the code.
  • Reading through the source code is friendly if it does not contain data in between.

@tanwanirahul
Copy link
Contributor Author

@HyukjinKwon @rxin Is this waiting on me? Just want to confirm it I am expected to add anything more.

@tanwanirahul
Copy link
Contributor Author

Could we please merge this?

@rxin
Copy link
Contributor

rxin commented Feb 29, 2016

Sorry for the delay. I'm merging this in master. Thanks!

@asfgit asfgit closed this in dd3b545 Feb 29, 2016
@tanwanirahul
Copy link
Contributor Author

@rxin @HyukjinKwon thank you.

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
Fix type inference issue for sparse CSV data - https://issues.apache.org/jira/browse/SPARK-13309

Author: Rahul Tanwani <[email protected]>

Closes apache#11194 from tanwanirahul/master.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants