[SPARK-13309][SQL] Fix type inference issue with CSV data #11194

tanwanirahul · 2016-02-13T18:59:41Z

Fix type inference issue for sparse CSV data - https://issues.apache.org/jira/browse/SPARK-13309

rxin · 2016-02-14T08:21:11Z

cc @HyukjinKwon want to review this one?

SparkQA · 2016-02-14T09:45:17Z

Test build #2542 has finished for PR 11194 at commit 5edaa2a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2016-02-14T12:17:48Z

@rxin Sure.

HyukjinKwon · 2016-02-14T12:22:40Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala

  }

  def mergeRowTypes(first: Array[DataType], second: Array[DataType]): Array[DataType] = {
    first.zipAll(second, NullType, NullType).map { case ((a, b)) =>


(This is not part of the diff, but it would be nice to change ((a, b)) to (a, b).)
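A simplified, Spark-free model of the inference-and-merge flow may help show why the merge step matters for sparse columns. It assumes, without confirmation from this thread, that sparse columns were mis-typed because an all-empty column was turned into a string type while per-partition results were being merged, instead of once at the very end. DType, widen, inferField, mergeRowTypesEager and the other names are hypothetical and are not Spark's API. The merge uses zipAll with the plain case (a, b) pattern; the extra parentheses in case ((a, b)) only group the pattern and change nothing.

```scala
import scala.util.Try

// Simplified, Spark-free model of CSV schema inference.
// All names here are hypothetical; this is not Spark's CSVInferSchema.
object CsvInferSketch {
  sealed trait DType
  case object NullT extends DType   // only empty cells seen so far
  case object IntT extends DType
  case object DoubleT extends DType
  case object StringT extends DType

  // Smallest type that can hold values of both a and b.
  def widen(a: DType, b: DType): DType = (a, b) match {
    case (x, y) if x == y                  => x
    case (NullT, t)                        => t
    case (t, NullT)                        => t
    case (IntT, DoubleT) | (DoubleT, IntT) => DoubleT
    case _                                 => StringT // any mix involving StringT
  }

  // Type of one cell, combined with the type inferred so far for its column.
  def inferField(soFar: DType, value: String): DType =
    if (value.isEmpty) soFar // an empty cell carries no type information
    else {
      val t =
        if (Try(value.toInt).isSuccess) IntT
        else if (Try(value.toDouble).isSuccess) DoubleT
        else StringT
      widen(soFar, t)
    }

  // Assumes every row has the same number of columns as soFar.
  def inferRow(soFar: Array[DType], row: Array[String]): Array[DType] =
    soFar.zip(row).map { case (t, v) => inferField(t, v) }

  // Deferred merge: an all-empty column stays NullT, so a later partition can still refine it.
  def mergeRowTypes(first: Array[DType], second: Array[DType]): Array[DType] =
    first.zipAll(second, NullT, NullT).map { case (a, b) => widen(a, b) }

  // Eager merge: NullT becomes StringT during the merge, so a sparse numeric column is lost.
  def mergeRowTypesEager(first: Array[DType], second: Array[DType]): Array[DType] =
    mergeRowTypes(first, second).map { case NullT => StringT; case t => t }

  // Fall back to StringT only once, after every partition has been merged.
  def finalizeTypes(types: Array[DType]): Array[DType] =
    types.map { case NullT => StringT; case t => t }

  // Infer each partition independently, then merge the per-partition results.
  private def run(
      partitions: Seq[Seq[Array[String]]],
      width: Int,
      merge: (Array[DType], Array[DType]) => Array[DType]): Array[DType] = {
    val start = Array.fill[DType](width)(NullT)
    val perPartition = partitions.map(_.foldLeft(start)(inferRow))
    finalizeTypes(perPartition.foldLeft(start)(merge))
  }

  def infer(partitions: Seq[Seq[Array[String]]], width: Int): Array[DType] =
    run(partitions, width, mergeRowTypes)

  def inferEager(partitions: Seq[Seq[Array[String]]], width: Int): Array[DType] =
    run(partitions, width, mergeRowTypesEager)
}
```

For example, with two partitions Seq(Array("", "a"), Array("", "b")) and Seq(Array("1", "c"), Array("2", "")), the first partition sees only empty cells in column 0. infer returns IntT, StringT for the two columns, while inferEager returns StringT, StringT, because the empty column was already widened to StringT before the integers from the second partition were merged in.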

HyukjinKwon · 2016-02-14T12:52:43Z

Overall, it looks good to me. This change was already merged in databricks/spark-csv#261, and the logic looks identical.

HyukjinKwon · 2016-02-14T14:37:13Z

Actually, I have been thinking that we might need a class such as TestCSVData that holds test datasets (similar to TestJsonData for the JSON datasource), or a class like CSVTest (similar to OrcTest for the ORC datasource), rather than adding test CSV files every time.

I think this would be better done in a separate PR. If you agree, I will create an issue and a PR for it after this one is merged.
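To make the TestCSVData idea concrete, here is a minimal sketch, assuming fixtures are kept as in-memory lines and written to a temporary file whenever a test needs a path. TestCSVDataSketch, sparseIntColumn and writeTemp are hypothetical names; this is not an existing Spark class and does not reproduce the design of TestJsonData.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical test-data holder: CSV fixtures live in code instead of checked-in files.
object TestCSVDataSketch {
  // A mostly empty "value" column, the shape that exercises sparse-data inference.
  val sparseIntColumn: Seq[String] = Seq(
    "id,value",
    "1,",
    "2,",
    "3,42"
  )

  // Write the lines to a fresh temp file so a reader that only accepts paths can consume them.
  def writeTemp(lines: Seq[String]): Path = {
    val file = Files.createTempFile("test-csv-", ".csv")
    Files.write(file, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
    file.toFile.deleteOnExit() // clean up when the JVM exits
    file
  }
}
```

A test could hand the returned path to the CSV reader. The trade-off is that editing a fixture defined this way requires recompiling the test code.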

tanwanirahul · 2016-02-14T15:32:02Z

Yes, that could be done, though I personally prefer to keep code and data separate. This way:

- Changing the data does not require recompiling the code.
- The source code is easier to read when it does not have data mixed in.

tanwanirahul · 2016-02-18T08:48:06Z

@HyukjinKwon @rxin Is this waiting on me? I just want to confirm whether I am expected to add anything more.

tanwanirahul · 2016-02-25T14:54:08Z

Could we please merge this?

rxin · 2016-02-29T07:16:07Z

Sorry for the delay. I'm merging this in master. Thanks!

tanwanirahul · 2016-02-29T07:28:37Z

@rxin @HyukjinKwon thank you.

Fix type inference issue for sparse CSV data - https://issues.apache.org/jira/browse/SPARK-13309

Author: Rahul Tanwani

Closes apache#11194 from tanwanirahul/master.

[SPARK-13309][SQL] Fix type inference issue with CSV data

5edaa2a

HyukjinKwon reviewed Feb 14, 2016

Fix review comments

60cd75c

Merge branch 'master' of https://github.com/apache/spark

410c518

asfgit closed this in dd3b545 Feb 29, 2016

anandab mentioned this pull request Feb 29, 2016

[SPARK-13309][SQL] Fix type inference issue with CSV data anandab/spark#2

Merged


4 participants
