Skip to content

Conversation

@JoshRosen
Copy link
Contributor

What changes were proposed in this pull request?

This PR fixes three longstanding bugs in Spark's JsonProtocol:

  • TaskResourceRequest loses precision for amount < 0.5. The amount is a floating point number which is either between 0 and 0.5 or is a positive integer, but the JSON read path assumes it is an integer.
  • ExecutorResourceRequest integer overflows for values larger than Int.MaxValue because the write path writes longs but the read path assumes integers.
  • Off heap StorageLevels are not handled properly: the useOffHeap field isn't included in the JSON, so this StorageLevel cannot be round-tripped through JSON. This could cause the History Server to display inaccurate "off heap memory used" stats on the executors page.

I discovered these bugs while working on #36885.

Why are the changes needed?

JsonProtocol should be able to roundtrip events through JSON without loss of information.

Does this PR introduce any user-facing change?

Yes: it fixes bugs that impact information shown in the History Server Web UI. The new StorageLevel JSON field will be visible to tools which process raw event log JSON.

How was this patch tested?

Updated existing unit tests to cover the changed logic.

def taskResourceRequestFromJson(json: JValue): TaskResourceRequest = {
val rName = (json \ "Resource Name").extract[String]
val amount = (json \ "Amount").extract[Int]
val amount = (json \ "Amount").extract[Double]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amount is defined as a double at

@Evolving
@Since("3.1.0")
class TaskResourceRequest(val resourceName: String, val amount: Double)
extends Serializable {

def executorResourceRequestFromJson(json: JValue): ExecutorResourceRequest = {
val rName = (json \ "Resource Name").extract[String]
val amount = (json \ "Amount").extract[Int]
val amount = (json \ "Amount").extract[Long]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amount is defined as a long at

@Evolving
@Since("3.1.0")
class ExecutorResourceRequest(
val resourceName: String,
val amount: Long,
val discoveryScript: String = "",
val vendor: String = "") extends Serializable {

def storageLevelToJson(storageLevel: StorageLevel): JValue = {
("Use Disk" -> storageLevel.useDisk) ~
("Use Memory" -> storageLevel.useMemory) ~
("Use Off Heap" -> storageLevel.useOffHeap) ~
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@dcoliversun dcoliversun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

Copy link
Contributor

@mridulm mridulm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice set of bugfixes @JoshRosen !

@JoshRosen
Copy link
Contributor Author

Thanks for the reviews. I'm going to merge this to the master branch (Spark 3.4.0).

@JoshRosen JoshRosen closed this in a39fc87 Jun 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants