Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Apr 19, 2018

What changes were proposed in this pull request?

Add fallback logic for UnsafeProjection. In production we can try to create unsafe projection using codegen implementation. Once any compile error happens, it fallbacks to interpreted implementation.

How was this patch tested?

Added test.

@viirya
Copy link
Member Author

viirya commented Apr 19, 2018

cc @hvanhovell

import org.codehaus.commons.compiler.CompileException
import org.codehaus.janino.InternalCompilerException

object CodegenObjectFactory {
Copy link
Contributor

@hvanhovell hvanhovell Apr 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this an abstract class in which we just need to implement hooks for the code generated and interpreted versions, i.e.:

object CodegenError {
  def unapply(throwable: Throwable): Option[Exception] = throwable match {
    case e: InternalCompilerException => Some(e)
    case e: CompileException => Some(e)
    case _ => None
  }
}

abstract class CodegenObjectFactory[IN, OUT] {
  def create(in: IN): OUT = try createCodeGeneratedObject(in) catch {
    case CodegenError(_) =>createInterpretedObject(in)
  }

  protected def createCodeGeneratedObject(in: IN): OUT
  protected def createInterpretedObject(in: IN): OUT
}

object UnsafeProjectionCreator extends CodegenObjectFactory[Seq[Expression], UnsafeProjection] {
  ...
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, except that I think UnsafeProjectionCreator should still support UnsafeProjectionCreator's APIs. So we can replace all usage of UnsafeProjection with UnsafeProjectionCreator.

@SparkQA
Copy link

SparkQA commented Apr 19, 2018

Test build #89578 has finished for PR 21106 at commit fe2a1cd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 20, 2018

Test build #89603 has finished for PR 21106 at commit 36f90cf.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class CodegenObjectFactory[IN, OUT]

@viirya
Copy link
Member Author

viirya commented Apr 20, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented Apr 20, 2018

Test build #89604 has finished for PR 21106 at commit 36f90cf.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class CodegenObjectFactory[IN, OUT]

@viirya
Copy link
Member Author

viirya commented Apr 20, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented Apr 20, 2018

Test build #89615 has finished for PR 21106 at commit 36f90cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class CodegenObjectFactory[IN, OUT]

@viirya
Copy link
Member Author

viirya commented Apr 23, 2018

@hvanhovell Any more comments on the current design? If not, I will apply this to all places that we create unsafe projection.

@hvanhovell
Copy link
Contributor

Yeah, let's move forward slowly. I am still pondering about the what the right abstraction here would look like; this looks promising though.

Can you try to unify this class with UnsafeProjectionCreator?

@cloud-fan
Copy link
Contributor

I think it's very hard to unify the entry point(input type) for all the code generators. E.g. some use Seq[Expression] as input, some use Seq[DataType].

I'd like to make CodegenObjectFactory only define internal interfaces, and each implementation defines its own public interface by its need. E.g.

object UnsafeProjectionCreator
    extends CodegenObjectFactory[(Seq[Expression], Boolean), UnsafeProjection] {

  protected def createCodeGeneratedObject(
      expressions: Seq[Expression],
      subexpressionEliminationEnabled: Boolean) = ...

  protected def createInterpretedObject(
      expressions: Seq[Expression],
      subexpressionEliminationEnabled: Boolean) = ...

  def create(expressions: Seq[Expression]) = ...
  def create(schema: StructType) = ...
}

@viirya
Copy link
Member Author

viirya commented Apr 25, 2018

I tried to unify this class with UnsafeProjectionCreator. So there is no more UnsafeProjectionCreator trait.

Without the trait UnsafeProjectionCreator, one problem is we can't easily test both against UnsafeProjection and InterpretedUnsafeProjection objects with the same interface of UnsafeProjectionCreator (e.g., UnsafeRowConverterSuite).

For test, I think we still need to test codegen and interpreted unsafe projection implementations individually as UnsafeRowConverterSuite does.

@cloud-fan
Copy link
Contributor

I think UnsafeProjectionCreator is only useful for tests, because in production we should always try codegen first, and fallback to interpretation if codegen fails. There is no need to disable the fallback(always codegen) or skip the codegen(always interpretation) in production.

How about we provide a sql conf for test only, to be able to disable the fallback(always codegen) or skip the codegen(always interpretation)?

@SparkQA
Copy link

SparkQA commented Apr 26, 2018

Test build #89882 has finished for PR 21106 at commit 68213fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait CodegenObjectFactoryBase[IN, OUT]
  • trait CodegenObjectFactory[IN, OUT] extends CodegenObjectFactoryBase[IN, OUT]
  • trait CodegenObjectFactoryWithoutFallback[IN, OUT] extends CodegenObjectFactoryBase[IN, OUT]
  • trait InterpretedCodegenObjectFactory[IN, OUT] extends CodegenObjectFactoryBase[IN, OUT]

@viirya
Copy link
Member Author

viirya commented Apr 27, 2018

A sql conf sounds good to me. @hvanhovell What do you think?

@viirya
Copy link
Member Author

viirya commented May 3, 2018

@hvanhovell If you don't have other comment, I will follow @cloud-fan's suggestion to add a sql conf for it. Thanks.

@viirya
Copy link
Member Author

viirya commented May 6, 2018

@hvanhovell @cloud-fan A SQL config was added to control codegen/interpreted for test. Please let me know if you have any comment on current approach. Thanks.

@SparkQA
Copy link

SparkQA commented May 6, 2018

Test build #90277 has finished for PR 21106 at commit 32e2766.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 6, 2018

Test build #90278 has finished for PR 21106 at commit aeef468.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class CodegenObjectFactorySuite extends SparkFunSuite with PlanTestBase

// Creates wanted object. First trying codegen implementation. If any compile error happens,
// fallbacks to interpreted version.
def createObject(in: IN): OUT = {
val fallbackMode = SQLConf.get.getConf(SQLConf.CODEGEN_OBJECT_FALLBACK)
Copy link
Member Author

@viirya viirya May 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can only access SQLConf on the driver. Now this causes test failure. We can't know this config value inside createObject which can be called on executors.

To solve this, UnsafeProjection can't be an object as now, but a case class with a fallback mode parameter. So we can create this factory on driver. @cloud-fan @hvanhovell WDYT?

@SparkQA
Copy link

SparkQA commented May 7, 2018

Test build #90313 has finished for PR 21106 at commit 06cc8cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 7, 2018

Test build #90319 has finished for PR 21106 at commit 1f5cc17.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (unsafeRow != expectedRow) {
fail("Incorrect evaluation in unsafe mode: " +
s"$expression, actual: $unsafeRow, expected: $expectedRow$input")
for (fallbackMode <- Seq("CODEGEN_ONLY", "NO_CODEGEN")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not hardcode it, call CodegenObjectFactoryMode.xxx instead

f(UnsafeProjection)
f(InterpretedUnsafeProjection)
private def testBothCodegenAndInterpreted(name: String)(f: => Unit): Unit = {
for (fallbackMode <- Seq("CODEGEN_ONLY", "NO_CODEGEN")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@SparkQA
Copy link

SparkQA commented May 17, 2018

Test build #90725 has finished for PR 21106 at commit 8c56ae4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 17, 2018

Test build #90724 has finished for PR 21106 at commit 716d88f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 18, 2018

Test build #90776 has finished for PR 21106 at commit 5a735af.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented May 18, 2018

retest this please.

@cloud-fan
Copy link
Contributor

also cc @rednaxelafx

@rednaxelafx
Copy link
Contributor

Should this PR wait for #21299 so that accessing confs on executors can be made simpler?

@SparkQA
Copy link

SparkQA commented May 18, 2018

Test build #90783 has finished for PR 21106 at commit 5a735af.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented May 22, 2018

@cloud-fan @rednaxelafx Accessing confs are now simpler. Please review this again. Thanks.

@SparkQA
Copy link

SparkQA commented May 22, 2018

Test build #90932 has finished for PR 21106 at commit ed525f6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented May 22, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented May 22, 2018

Test build #90939 has finished for PR 21106 at commit ed525f6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented May 22, 2018

retest this please

private def testBothCodegenAndInterpreted(name: String)(f: => Unit): Unit = {
val modes = Seq(CodegenObjectFactoryMode.CODEGEN_ONLY, CodegenObjectFactoryMode.NO_CODEGEN)
for (fallbackMode <- modes) {
test(name + " with " + fallbackMode) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s"$name with $fallbackMode"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@SparkQA
Copy link

SparkQA commented May 22, 2018

Test build #90947 has finished for PR 21106 at commit ed525f6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 22, 2018

Test build #90951 has finished for PR 21106 at commit c9ec817.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

UnsafeKVExternalSorterSuite should not create a task context with null properties, can you fix that?

@viirya
Copy link
Member Author

viirya commented May 22, 2018

@cloud-fan Fixed. Thanks.

@SparkQA
Copy link

SparkQA commented May 22, 2018

Test build #90965 has finished for PR 21106 at commit 5527595.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in a40ffc6 May 23, 2018
@viirya viirya deleted the SPARK-23711 branch December 27, 2023 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants