[SPARK-1852] prevents queries with sorts submitting jobs prematurely #948

liancheng · 2014-06-03T06:53:10Z

This issue is related to SPARK-1021, but this PR doesn't try to solve that one. Instead, it works around the problem by forcing query planning only when running DDL and other native commands.

AmplabJenkins · 2014-06-03T06:57:58Z

Merged build triggered.

AmplabJenkins · 2014-06-03T06:58:08Z

Merged build started.

AmplabJenkins · 2014-06-03T07:59:30Z

Merged build finished.

AmplabJenkins · 2014-06-03T07:59:31Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15376/

AmplabJenkins · 2014-06-03T10:22:58Z

Merged build triggered.

AmplabJenkins · 2014-06-03T10:23:06Z

Merged build started.

AmplabJenkins · 2014-06-03T11:37:01Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-03T11:37:01Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15380/

marmbrus · 2014-06-03T18:43:25Z

Thanks for looking into this! Just a thought: what do you think about instead putting a method called process() or something similar into QueryExecution? That way we wouldn't have to duplicate the logic in two places, and subclasses of QueryExecution could also override it if there were special cases.
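For illustration, the suggestion above might look roughly like the sketch below. It is a toy model under stated assumptions: every type is a simplified stand-in invented for this example rather than Spark's actual class, `toRdd` returns a description string instead of an RDD, and the `Explain` special case in the subclass is hypothetical.

```scala
object ProcessHookSketch {
  // Hypothetical, minimal logical plans (not Spark's real classes).
  sealed trait LogicalPlan
  case class NativeCommand(text: String) extends LogicalPlan
  case class Query(text: String) extends LogicalPlan
  case class Explain(child: LogicalPlan) extends LogicalPlan

  // The command handling lives in one place: process().
  class QueryExecution(val logical: LogicalPlan) {
    // Stand-in for executing the plan; computed at most once.
    lazy val toRdd: String = s"ran $logical"

    // Some(result) if the plan was executed eagerly, None if it stays deferred.
    def process(): Option[String] = logical match {
      case _: NativeCommand => Some(toRdd)
      case _                => None
    }
  }

  // A subclass adds its own special case and defers to the shared logic.
  class HiveQueryExecution(plan: LogicalPlan) extends QueryExecution(plan) {
    override def process(): Option[String] = logical match {
      case _: Explain => Some(toRdd)
      case _          => super.process()
    }
  }
}
```

With this shape, both entry points would call `process()` on whatever `QueryExecution` they build, so the eager-execution rule is written once and a subclass only states what it does differently.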

marmbrus · 2014-06-03T18:44:43Z

/cc @concretevitamin who is looking at similar parts of the code at the moment.

concretevitamin · 2014-06-03T22:00:48Z

Removing duplication is definitely a good thing. The other benefit is that it seems more natural to me to push such DDL/command processing logic into QueryExecution, instead of putting it into two thin entry methods (sql and hql).

I am super new to Spark SQL so bear with me if this is silly -- what is the reason we don't do this as well for InsertIntoHiveTable and InsertIntoParquetTable?

liancheng · 2014-06-04T02:46:24Z

@marmbrus Agreed. I'll try to remove the lazy qualifier from SchemaRDD.queryExecution, since the field object itself is rather cheap (all of its fields are lazy), and then call the process() method in the QueryExecution constructor to evaluate native commands eagerly.

@concretevitamin InsertIntoHiveTable and InsertIntoParquetTable are physical operators, which are part of physical plans. However, we only need to take care of the logical plan here. InsertIntoTable and InsertIntoCreatedTable are translated into those physical operators after logical plan optimization.
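The logical/physical distinction described above can be sketched as follows. This is a toy model, not Spark's planner: the case classes reuse operator names from the discussion, but their fields are heavily simplified, and `shouldRunEagerly` and `plan` are hypothetical helpers invented for the example.

```scala
object LogicalVsPhysicalSketch {
  // Logical operators: what the query means.
  sealed trait LogicalPlan
  case class Relation(name: String) extends LogicalPlan
  case class InsertIntoTable(table: String, child: LogicalPlan) extends LogicalPlan

  // Physical operators: how the query is executed.
  sealed trait PhysicalPlan
  case class TableScan(name: String) extends PhysicalPlan
  case class InsertIntoHiveTable(table: String, child: PhysicalPlan) extends PhysicalPlan

  // The eager-or-deferred decision inspects only the logical plan.
  def shouldRunEagerly(logical: LogicalPlan): Boolean = logical match {
    case _: InsertIntoTable => true
    case _                  => false
  }

  // Hypothetical planning step: the logical insertion becomes a physical operator.
  def plan(logical: LogicalPlan): PhysicalPlan = logical match {
    case Relation(name)                => TableScan(name)
    case InsertIntoTable(table, child) => InsertIntoHiveTable(table, plan(child))
  }
}
```

Because the physical insert operator only appears after planning, matching on the logical `InsertIntoTable` is enough to decide which plans should run immediately.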

AmplabJenkins · 2014-06-10T02:42:50Z

Merged build triggered.

AmplabJenkins · 2014-06-10T03:12:50Z

Merged build triggered.

liancheng · 2014-06-10T03:32:36Z

@marmbrus @concretevitamin While working on the cache table SQL command and reviewing PR #956, I realized that this PR may make both changes cleaner, so I tried to finish this one first. For example, we won't need processCmd and eagerlyProcess anymore, since all commands and insertions are eagerly executed in a unified way.

Instead of QueryExecution, I factored the duplicated code into SchemaRDDLike to make the change cleaner and more straightforward (after some experiments, it turned out that moving code to QueryExecution complicates things a lot).
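As a rough sketch of what factoring the shared logic into a trait could look like (a toy model, not the code from this PR): the plan types are simplified stand-ins, `executed` is a hypothetical buffer that records which plans have run, `eagerResult` is an invented name, and the second mixing class `JavaSchemaRDD` is an assumption made for the example.

```scala
import scala.collection.mutable

object SchemaRDDLikeSketch {
  // Hypothetical, minimal logical plans (not Spark's real classes).
  sealed trait LogicalPlan
  case class NativeCommand(text: String) extends LogicalPlan
  case class InsertIntoTable(table: String) extends LogicalPlan
  case class Scan(table: String) extends LogicalPlan
  case class Sort(child: LogicalPlan) extends LogicalPlan

  // The eager-execution rule is written once, in the shared trait.
  trait SchemaRDDLike {
    def logicalPlan: LogicalPlan
    def executed: mutable.Buffer[LogicalPlan] // records every plan that has run

    // Stand-in for queryExecution.toRdd: runs the plan once, on first access.
    lazy val toRdd: LogicalPlan = {
      executed += logicalPlan
      logicalPlan
    }

    // Evaluated while the mixing class is constructed: commands and
    // insertions run immediately, everything else (e.g. sorts) stays deferred.
    val eagerResult: Option[LogicalPlan] = logicalPlan match {
      case _: NativeCommand | _: InsertIntoTable => Some(toRdd)
      case _                                     => None
    }
  }

  // Constructor parameters are initialized before the trait body runs,
  // so the trait can safely read them during construction.
  class SchemaRDD(val logicalPlan: LogicalPlan,
                  val executed: mutable.Buffer[LogicalPlan]) extends SchemaRDDLike

  class JavaSchemaRDD(val logicalPlan: LogicalPlan,
                      val executed: mutable.Buffer[LogicalPlan]) extends SchemaRDDLike
}
```

In this model, constructing a `SchemaRDD` around a `Sort` records nothing until `toRdd` is accessed, while constructing one around a `NativeCommand` or `InsertIntoTable` runs it right away, and accessing `toRdd` again does not run it a second time.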

concretevitamin · 2014-06-10T05:06:11Z

Hey @liancheng - I think this refactoring will solve this particular ticket (i.e. queries with sorts will not be eagerly executed anymore). However, I don't see why we wouldn't need processCmd and eagerlyProcess anymore; for instance, for set commands this PR's changes will eagerly call toRdd, but we still need some logic in the two toRdd definitions to handle the set. Are you proposing to just take that piece of logic and put it into SchemaRDDLike? That could work. And what about processCmd then?

AmplabJenkins · 2014-06-10T05:08:00Z

Merged build started.

AmplabJenkins · 2014-06-10T05:08:03Z

Merged build finished.

AmplabJenkins · 2014-06-10T05:08:03Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15596/

AmplabJenkins · 2014-06-10T05:08:07Z

Merged build started.

AmplabJenkins · 2014-06-10T05:08:10Z

Merged build finished.

AmplabJenkins · 2014-06-10T05:08:11Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15598/

liancheng · 2014-06-10T20:59:00Z

Just a memo: please refer to this comment and SPARK-2094 for more details about this PR and PR #956.

liancheng · 2014-06-11T22:33:33Z

Closing this PR because the corresponding change has been merged into another upcoming PR that aims to solve SPARK-2094.

[SPARK-1852] prevents queries with sorts submitting jobs prematurely

d7674d0

Took insertion into account

aef8721

liancheng added 2 commits June 9, 2014 17:39

WriteToFile should be executed eagerly too

2bf0e20

Refactored duplicated code to SchemaRDD

d459c15

Removed an unused import

2c9052a

liancheng mentioned this pull request Jun 10, 2014

[SPARK-1508][SQL] Add SQLConf to SQLContext. #956

Closed

liancheng closed this Jun 11, 2014

liancheng deleted the spark-1852 branch September 24, 2014 00:12

agirish pushed a commit to HPEEzmeral/apache-spark that referenced this pull request May 5, 2022

MapR [SPARK-966] Streaming application with the latest offset read 1 …

571ab35

…message from mapr stream which was produced before application start (apache#948)

udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024

MapR [SPARK-966] Streaming application with the latest offset read 1 …

6a78d00

…message from mapr stream which was produced before application start (apache#948)

4 participants