
Conversation

Member

@dongjoon-hyun dongjoon-hyun commented Jul 21, 2025

What changes were proposed in this pull request?

This PR aims to support On-Demand Log Loading in the History Server by looking up rolling event log locations even when Spark's listing has not yet finished loading the event log files.

val EVENT_LOG_ROLLING_ON_DEMAND_LOAD_ENABLED =
  ConfigBuilder("spark.history.fs.eventLog.rolling.onDemandLoadEnabled")
    .doc("Whether to look up rolling event log locations on demand manner before listing files.")
    .version("4.1.0")
    .booleanConf
    .createWithDefault(true)
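
For illustration, here is a minimal sketch of the directory-name lookup this enables (the prefix value and the helper's naming behavior are inferred from the discussion below, so treat them as assumptions):

  // Hypothetical appId; no attempt id in this example.
  val appId = "local-1753228384662"
  val attemptId: Option[String] = None
  // Utils.nameForAppAndAttempt appends "_<attemptId>" when an attempt id exists.
  val name = appId + attemptId.map("_" + _).getOrElse("")
  // RollingEventLogFilesWriter.EVENT_LOG_DIR_NAME_PREFIX is "eventlog_v2_".
  val fallbackLogDir = "eventlog_v2_" + name  // eventlog_v2_local-1753228384662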

Previously, the Spark History Server would show the Application ... Not Found page if a job was requested before it had been scanned, even if the file existed in the correct location. So, this PR doesn't introduce any regressions; it adds a kind of fallback logic to improve error handling.

[Screenshot 2025-07-22 14:08: the previous "Application ... Not Found" page]

Why are the changes needed?

Since Apache Spark 3.0, we have been using event log rolling not only for long-running jobs, but also for some failed jobs to archive the partial event logs incrementally.

Since Apache Spark 4.0, event log rolling is enabled by default.

On top of that, this PR aims to reduce storage cost in Apache Spark 4.1. By supporting on-demand loading for rolled event logs, we can use larger values for spark.history.fs.update.interval instead of the default 10s. Although Spark history logs are consumed in various ways, this has a big benefit because most successful Spark jobs' logs are never visited by users.
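
As an illustrative deployment (the interval value is arbitrary, in the SPARK_HISTORY_OPTS style used later in this thread), a History Server could then run with a much slower scan:

SPARK_HISTORY_OPTS="\
    -Dspark.history.fs.update.interval=10m \
    -Dspark.history.fs.eventLog.rolling.onDemandLoadEnabled=true"

Requests for applications that the 10-minute scan has not picked up yet are then served by the on-demand fallback.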

Does this PR introduce any user-facing change?

No. This is a new feature.

How was this patch tested?

Pass the CIs with a newly added test case.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the CORE label Jul 21, 2025
@dongjoon-hyun
Member Author

Could you review this PR, @LuciferYang?

@LuciferYang
Contributor

A bit busy today... I'll take a look later.

Contributor

@LuciferYang LuciferYang left a comment

+1, LGTM
Thank you @dongjoon-hyun

Contributor

mridulm commented Jul 22, 2025

+CC @thejdeep
You had implemented something similar internally

addListing(new ApplicationInfoWrapper(
  ApplicationInfo(appId, appId, None, None, None, None, List.empty),
  List(new AttemptInfoWrapper(info, logPath, 0, Some(1), None, None, None, None))))
load(appId)
Contributor

What is the behavior if the application does not exist (a typo in the user query, for example)?
Will the listing now have an invalid entry?

Contributor

+1, do you think it would be better to check for the existence of the file at its location before adding an entry? This would keep parity with how checkForLogs works - we only add entries whose event log locations exist.

Member Author

No, we had better avoid that because it requires the full path, including "s3://...", @thejdeep.

case _: NoSuchElementException if this.conf.get(ON_DEMAND_ENABLED) =>
  val name = Utils.nameForAppAndAttempt(appId, attemptId)
  loadFromFallbackLocation(appId, attemptId,
    RollingEventLogFilesWriter.EVENT_LOG_DIR_NAME_PREFIX + name)
Contributor

@mridulm mridulm Jul 22, 2025


We should not assume it will be RollingEventLogFilesWriter; users don't need to be running with the default enabled, right?

Member Author

dongjoon-hyun commented Jul 22, 2025

Thank you for the review, @LuciferYang and @mridulm.

This is a continuation of the Apache Spark 4.0.0 effort SPARK-45869 (Revisit and Improve Spark Standalone Cluster) to support Apache Spark as a subsystem.

For your questions,

What is the behavior if the application does not exist ? (typo in user query for example)
Will the listing now have an invalid entry ?

  1. No, this PR intentionally doesn't update the list UI immediately. The list UI will be updated periodically in the existing manner to stay in sync with the storage. The new log entry will appear there with the correct uploaded information. The typo(?) entry will be removed because it's the same situation as when a user arbitrarily deletes log files from the storage.

We should not assume it will be RollingEventLogFilesWriter; users don't need to be running with the default enabled, right?

  2. Yes, you are right. Initially, I also had single-file event log logic in this PR, but I removed it to simplify the PR and focus on rolling event logs only. The legacy log format is out of scope here. As you may have noticed at SPARK-45771, I have always been using rolling event logs since Nov 2, 2023 in order to guarantee getting partial logs.

As a footnote, I want to promote the default (rolling) event log type more, instead of the legacy single-file event log type.

@dongjoon-hyun
Member Author

cc @peter-toth

Contributor

mridulm commented Jul 22, 2025

  1. No, this PR intentionally doesn't update the list UI immediately. The list UI will be updated periodically in the existing manner to stay in sync with the storage. The new log entry will appear there with the correct uploaded information. The typo(?) entry will be removed because it's the same situation as when a user arbitrarily deletes log files from the storage.

Use of addListing (here) will result in updating the listing db, right? And so it will show up the next time we list apps? Or is it getting cleaned up when the load fails (for an invalid app, for example)?
Please do let me know if I am missing something here!

  2. Yes, you are right. Initially, I also had single-file event log logic in this PR, but I removed it to simplify the PR and focus on rolling event logs only. The legacy log format is out of scope here. As you may have noticed at SPARK-45771, I have always been using rolling event logs since Nov 2, 2023 in order to guarantee getting partial logs.

As a footnote, I want to promote the default (rolling) event log type more, instead of the legacy single-file event log type.

The default has flipped, but we still support single event logs ... we have nontrivial usages of this pattern.

Member Author

dongjoon-hyun commented Jul 22, 2025

And so it will show up the next time we list apps? Or is it getting cleaned up when the load fails (for an invalid app, for example)? I might have missed that!

I see what you missed. In short, it's not exposed to users if no file exists there. Please see getAppUI's logic: it returns None on FileNotFoundException.

override def getAppUI(appId: String, attemptId: Option[String]): Option[LoadedAppUI] = {
  val app = try {
    load(appId)
  } catch {
    case _: NoSuchElementException =>
      return None
  }
  val attempt = app.attempts.find(_.info.attemptId == attemptId).orNull
  if (attempt == null) {
    return None
  }
  val conf = this.conf.clone()
  val secManager = createSecurityManager(conf, attempt)
  val kvstore = try {
    diskManager match {
      case Some(sm) =>
        loadDiskStore(sm, appId, attempt)
      case _ =>
        createInMemoryStore(attempt)
    }
  } catch {
    case _: FileNotFoundException =>
      return None
  }

So, the dummy metadata will be there in a hidden manner but cleaned up at the next periodic scan because there are no matching files.

The default has flipped, but we still support single event logs ... we have nontrivial usages of this pattern.

For the above question: sorry for that, but what I meant is that this PR doesn't aim to support it, as the PR description says, @mridulm. I don't think that's a blocker for this improvement because this feature only kicks in as a replacement for the App Not Found page.

Contributor

mridulm commented Jul 22, 2025

getListing will expose it, no? It would end up putting invalid listing entries into the listing db.

Sorry for that, but what I meant is that this PR doesn't aim to support it, as the PR description says,

In that case, please document it in the config name (namespace it appropriately) and make that a precondition to enabling the flag for ON_DEMAND_ENABLED.
It is unfortunate that undeprecated features are seeing a feature regression, but I will let you decide the best course of action.

@dongjoon-hyun
Member Author

getListing will expose it, no? It would end up putting invalid listing entries into the listing db.

Sorry for that, but what I meant is that this PR doesn't aim to support it, as the PR description says,

In that case, please document it in the config and make that a precondition to enabling the flag for ON_DEMAND_ENABLED. It is unfortunate that undeprecated features are seeing a feature regression, but I will let you decide the best course of action.

Where do you mean this? Did you see the invalid data in the UI somewhere?

@dongjoon-hyun
Member Author

If you are worried about the incomplete job UI, I can filter it out on the UI page.

Or, I'm wondering if you are considering a non-Apache Spark library which depends on this.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-52914][CORE] Support On-Demand Log Loading in History Server [SPARK-52914][CORE] Support On-Demand Log Loading for rolling logs in History Server Jul 22, 2025
Contributor

mridulm commented Jul 22, 2025

Where do you mean this? Did you see the invalid data in the UI somewhere?

It has been a while since I looked at history server :-)
Consider this flow:
HistoryPage -> render -> shouldDisplayApplications -> getApplicationList -> getListing -> listing.view(classOf[ApplicationInfoWrapper])

(This is an example, there are other paths to view over ApplicationInfoWrapper)

Will this not result in the added entry getting surfaced?
You are absolutely right that if the user 'selects' the application, nothing should show up, as the app does not exist.

Contributor

mridulm commented Jul 22, 2025

If you are worried about the incomplete job UI, I can filter it out on the UI page.

It would be easier to simply remove the entry if it is invalid (that is, the load failed) - that will keep the API and UI in sync.

... assuming the example I gave is valid!

@dongjoon-hyun
Member Author

Of course, I know you are the expert, but the dummy metadata has the incomplete flag. So, I asked whether you were worried about the Incomplete job listing UI.

  def shouldDisplayApplications(requestedIncomplete: Boolean): Boolean = {
    parent.getApplicationList().exists(isApplicationCompleted(_) != requestedIncomplete)
  }
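
A simplified reading of that check for the placeholder entry (the flag values below are assumptions based on the snippet above):

  // The placeholder attempt is created with completed = false.
  val isCompleted = false
  val requestedIncomplete = false                  // the default listing page
  val shown = isCompleted != requestedIncomplete   // false: filtered out of the default view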

In addition, I agree with you that we had better delete it on a FileNotFoundException.

Member Author

dongjoon-hyun commented Jul 22, 2025

For the next items, according to your advice:

  • I revised the PR title to be clear.
  • I will rename the config.
  • I will add cleanup logic at FileNotFoundException.

Contributor

mridulm commented Jul 22, 2025

Let me tag @thejdeep, who worked on the internal solution (no bandwidth to open source it yet :( ) ... I did not review it, so I am not as aware of the intricacies as he is.
His suggestions should help the work we are doing here, Dongjoon!

@dongjoon-hyun
Member Author

Thank you for the invaluable review, @mridulm. I addressed your comments.


assert(dir.listFiles().length === 1)
assert(provider.getAppUI("nonexist", None).isEmpty)
assert(provider.getListing().length === (if (onDemandEnabled) 1 else 0))
Member Author

This new line verifies the cleanup, @mridulm.

val app = try {
  load(appId)
} catch {
  case _: NoSuchElementException if this.conf.get(EVENT_LOG_ROLLING_ON_DEMAND_LOAD_ENABLED) =>
Contributor

I understand that we are trying to push for usage of RollingEventLogFilesWriter as the new default, but for users who have single event logs, if they try to get the UI for an app, will this functionality not break for them, since EVENT_LOG_ROLLING_ON_DEMAND_LOAD_ENABLED is true by default?

Member Author

Could you elaborate on what we could break here, @thejdeep?

will this functionality not break for them, since EVENT_LOG_ROLLING_ON_DEMAND_LOAD_ENABLED is true by default?

Member Author

The dummy metadata is added and then cleaned up immediately in this function on a FileNotFoundException, as @mridulm requested. It works for both a non-existing appId and single-file logs.

val info = ApplicationAttemptInfo(attemptId, date, date, date, 0, "spark", false, "unknown")
addListing(new ApplicationInfoWrapper(
  ApplicationInfo(appId, appId, None, None, None, None, List.empty),
  List(new AttemptInfoWrapper(info, logPath, 0, Some(1), None, None, None, None))))
Contributor

Shouldn't we rely on the event log for information like startTime, endTime, user, etc.? Won't this lead to incorrect information being displayed on the home page of the SHS?

Member Author

This is only a dummy placeholder to let the SHS show the application logs before the periodic scanning happens. The periodic scanning will keep it in sync.

BTW, how often do you think this fallback would be used in production environments, @thejdeep? I'm curious whether you are thinking about turning off the periodic scanning.

Contributor

Oh, I see that the intention is just to have dummy placeholders until the scanning takes care of it.

If users operate a large Spark cluster, my two cents are that users may tend to access their apps on demand much more frequently, and it might just lead to an incorrect listing page. For example, we noticed that a good fraction of our SHS requests are on-demand, since users would like to get their reports as soon as their apps finish and before checkForLogs completes.

Member Author

@dongjoon-hyun dongjoon-hyun Jul 22, 2025


Yes, and technically, it's not exposed in the listing page. Could you build this PR and test it yourself?

an incorrect listing page

Member Author

@dongjoon-hyun dongjoon-hyun Jul 22, 2025


It sounds like a limitation of the single-file event log, @thejdeep. If you have rolling event logs, the SHS already has the correct partial information while your jobs are running.

For example, we noticed that a good fraction of our SHS requests are on-demand, since users would like to get their reports as soon as their apps finish and before checkForLogs completes.

Member Author

@dongjoon-hyun dongjoon-hyun Jul 22, 2025


Just some questions to understand your use cases:

  • How do you handle Spark Streaming jobs with a single-file event log? Do your jobs still not use rolling event logs?
  • Are you assuming only Spark 2.x or 3.x jobs, given that Spark 4 jobs generate rolling event logs by default since SPARK-45771?

Contributor

Thanks for sharing the context, @dongjoon-hyun.

We currently do not use rolling event logs, since we only serve batch use cases at the moment. All applications are currently on 3.x.

I can build your PR locally and test it on single-file event logs to see how it works with listing and cleanup. I can get back to you by tomorrow at the earliest, if that works.

Member Author

@dongjoon-hyun dongjoon-hyun Jul 22, 2025


Thank you so much for the info and your efforts on reviewing this. Take your time.

Contributor

@dongjoon-hyun wanted to get your thoughts on #51604 (comment)

Thank you!

Member Author

dongjoon-hyun commented Jul 22, 2025

To @mridulm and @thejdeep, I'm wondering whether you have proposed your implementation to the Apache Spark community. Do you have a PR where I can take a look at your approach?

+CC @thejdeep You had implemented something similar internally

@thejdeep
Contributor

@dongjoon-hyun These are two scenarios I wanted to get your thoughts on after testing your change. These are run with rolling event logs:

  1. SHS started with a cleaner interval, let's say spark.history.fs.cleaner.interval=1m and spark.history.fs.update.interval=10m. In this case, if an app starts, with your change we will be able to load it on demand, but since the cleaner runs every minute, we delete the event log directory (since the endTime is the start of the epoch) even if the app is still running and using the rolling app log directory (see the sketch after this list).
  2. The listing page does not display the on-demand applications, I agree, but this is due to filtering in the UI's historypage.js, since we are adding an ApplicationAttemptInfo saying the app is not completed even though an end time is specified.
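
A minimal Scala sketch of the scenario-1 hazard (simplified; the real logic lives in FsHistoryProvider.cleanLogs, and the exact comparison here is an assumption):

import java.util.Date

// Simplified expiry check: entries whose timestamp predates (now - maxAge)
// are treated as expired and their event log directories are deleted.
def looksExpired(lastTimestamp: Date, nowMs: Long, maxAgeMs: Long): Boolean =
  lastTimestamp.getTime < nowMs - maxAgeMs

val maxAgeMs = 7L * 24 * 60 * 60 * 1000  // e.g. spark.history.fs.cleaner.maxAge=7d
// An epoch-dated placeholder (new Date(0)) always looks expired:
looksExpired(new Date(0), System.currentTimeMillis(), maxAgeMs)  // true

The fix adopted later in this thread sets the placeholder's lastUpdate to the current time, so it survives until the next scan.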

@dongjoon-hyun
Member Author

Is it true that Apache Spark deletes the running app's rolling logs, @thejdeep?

we delete the event log directory (since the endTime is the start of the epoch) even if the app is still running and using the rolling app log directory.

@thejdeep
Contributor

@dongjoon-hyun This is my test bed:

SPARK_HISTORY_OPTS="\
    -Dspark.history.fs.cleaner.enabled=true \
    -Dspark.history.fs.cleaner.interval=1m \
    -Dspark.history.fs.update.interval=10m"

Steps:

  1. Start SHS and let the initial checkForLogs run.
  2. Start an application before the next checkForLogs runs:

bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.rolling.enabled=true \
  --conf spark.eventLog.rolling.maxFileSize=10m \
  --conf spark.eventLog.dir=file:///tmp/spark-events \
  /Users/example/spark/master/examples/target/scala-2.13/jars/spark-examples_2.13-4.1.0-SNAPSHOT.jar \
  10000

  3. Check the application ID in the log directory and access the SHS via on-demand loading.
  4. It will load the app.
  5. When the cleaner thread runs, it will delete the directory.

@dongjoon-hyun
Member Author

Thanks. Let me check the details, @thejdeep.

@dongjoon-hyun
Member Author

I confirmed the regression. Thank you for reporting and sharing the setup. Let me fix it.

25/07/22 16:53:57 INFO FsHistoryProvider: Deleting expired event log for eventlog_v2_local-1753228384662

@dongjoon-hyun
Member Author

To @thejdeep, it's fixed, and the test case is updated to provide test coverage for the case you reported.

private def loadFromFallbackLocation(appId: String, attemptId: Option[String], logPath: String)
    : ApplicationInfoWrapper = {
  val date = new Date(0)
  val lastUpdate = new Date()
Member Author

lastUpdate is set to the current time, @thejdeep.


// The dummy entry should be protected from cleanLogs()
provider.cleanLogs()
assert(dir.listFiles().length === 1)
Member Author

This is the test coverage, @thejdeep.

@dongjoon-hyun
Member Author

Could you review this PR when you have some time, @viirya and @yaooqinn?

Member

@yaooqinn yaooqinn left a comment


LGTM

Comment on lines +352 to +358
case _: FileNotFoundException if this.conf.get(EVENT_LOG_ROLLING_ON_DEMAND_LOAD_ENABLED) =>
  if (app.attempts.head.info.appSparkVersion == "unknown") {
    listing.synchronized {
      listing.delete(classOf[ApplicationInfoWrapper], appId)
    }
  }
  return None
Member

Hmm, I don't quite understand what this did. It seems loadFromFallbackLocation loads a dummy record from the log path into the listing?

But what does this FileNotFoundException catch? Why does it delete the dummy record immediately?

Member Author

Thank you for the review, @viirya.

Yes, this adds a dummy record (based on the user request) in order to proceed to load the actual file. However, if the actual file doesn't exist, a FileNotFoundException will be thrown, which means it was a user mistake. In that case, since we don't need the dummy record, we clean it up.

@thejdeep
Contributor

Thanks for addressing the cleanup regression, @dongjoon-hyun.

Another thing to call out, just for posterity's sake, is that the version field will be shown as unknown for on-demand loaded apps until checkForLogs runs on them later.

[Screenshot 2025-07-22 15:59: app listing showing "unknown" in the version field for an on-demand loaded app]

Member Author

dongjoon-hyun commented Jul 23, 2025

Thanks for addressing the cleanup regression, @dongjoon-hyun.

Another thing to call out, just for posterity's sake, is that the version field will be shown as unknown for on-demand loaded apps until checkForLogs runs on them later.

[Screenshot 2025-07-22 15:59: app listing showing "unknown" in the version field for an on-demand loaded app]

As I mentioned before, it will be updated at the next scan, @thejdeep.

Previously, it was the following situation until the next scan, whereas this PR proposes the above until the next scan.
[Screenshot 2025-07-22 14:08: the previous "Application ... Not Found" page]

Member Author

dongjoon-hyun commented Jul 23, 2025

Do you have any other comments, @thejdeep and @mridulm?

val info = ApplicationAttemptInfo(
  attemptId, date, date, lastUpdate, 0, "spark", false, "unknown")
addListing(new ApplicationInfoWrapper(
  ApplicationInfo(appId, appId, None, None, None, None, List.empty),
Member

So supposedly, the appId must be correct to load the record, but the other info is dummy?

Member

And once periodic scanning happens, it will update the record with correct information?

Member Author

Yes, correct, @viirya ~

Member Author

This is a kind of placeholder.

@thejdeep
Contributor

@dongjoon-hyun Thanks for addressing all my comments. LGTM.

@dongjoon-hyun
Member Author

Thank you for your thorough reviews, @thejdeep. They helped this feature a lot.

Comment on lines 324 to 325
val logPath = RollingEventLogFilesWriter.EVENT_LOG_DIR_NAME_PREFIX +
  Utils.nameForAppAndAttempt(appId, attemptId)
Member

RollingEventLogFilesWriter.getAppEventLogDirPath also considers logBaseDir, don't we need to consider it?

Member Author

We don't need logBaseDir in this info.

Member

Oh, I see. AttemptInfoWrapper.logPath doesn't contain the base dir. When FsHistoryProvider tries to read the attempt, it will append the base dir.
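
A tiny illustration of that layering (the paths are hypothetical):

  val logBaseDir = "s3a://bucket/spark-events"    // e.g. spark.history.fs.logDirectory
  val relativeLogPath = "eventlog_v2_app-123"     // what AttemptInfoWrapper.logPath stores
  val resolved = s"$logBaseDir/$relativeLogPath"  // joined when the attempt is read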

  load(appId)
} catch {
  case _: NoSuchElementException if this.conf.get(EVENT_LOG_ROLLING_ON_DEMAND_LOAD_ENABLED) =>
    loadFromFallbackLocation(appId, attemptId, logPath)
Member

What if EVENT_LOG_ENABLE_ROLLING is disabled? Should we only do this if EVENT_LOG_ENABLE_ROLLING is enabled?

Member Author

@dongjoon-hyun dongjoon-hyun Jul 23, 2025


No, EVENT_LOG_ENABLE_ROLLING is a per-application setting. This is the SHS, @viirya.

Member Author

@dongjoon-hyun dongjoon-hyun Jul 23, 2025


Initially, I proposed the config name spark.history.fs.update.onDemandEnabled in the first commit because this is an SHS setting. However, it was revised during the review to be clearer in the context of rolling.

- spark.history.fs.update.onDemandEnabled
+ spark.history.fs.eventLog.rolling.onDemandLoadEnabled

Member

Okay, I see. It's a bit of a confusing setup to me. EVENT_LOG_ENABLE_ROLLING takes effect at the EventLoggingListener in the SparkContext application. When FsHistoryProvider creates an EventLogFileReader, it doesn't care about this config but decides whether it is a single log or a rolling log based on the attempt's lastIndex.

Member Author

@dongjoon-hyun dongjoon-hyun Jul 23, 2025


I agree that it sounds confusing. Basically, it's the same for the event log compression codec.

Since a Spark job can choose spark.eventLog.compress and spark.eventLog.compression.codec arbitrarily, we need to infer them from the file name. It's inevitable because the writers (Spark jobs) and the reader (SHS) are independent.
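
A hedged sketch of that file-name-based inference (illustrative only; the actual reader logic differs in detail, and the extension handling here is an assumption):

  // e.g. "events_1_app-123.zstd" -> Some("zstd"); "events_1_app-123" -> None (uncompressed).
  def inferCodec(fileName: String): Option[String] = {
    val idx = fileName.lastIndexOf('.')
    if (idx > 0) Some(fileName.substring(idx + 1)) else None
  }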

Comment on lines 324 to 325
val logPath = RollingEventLogFilesWriter.EVENT_LOG_DIR_NAME_PREFIX +
Utils.nameForAppAndAttempt(appId, attemptId)
Member

It seems better to call the existing method for preparing the log name. Otherwise, they might get out of sync unintentionally.

Suggested change:

- val logPath = RollingEventLogFilesWriter.EVENT_LOG_DIR_NAME_PREFIX +
-   Utils.nameForAppAndAttempt(appId, attemptId)
+ val logPath = RollingEventLogFilesWriter.EVENT_LOG_DIR_NAME_PREFIX +
+   EventLogFileWriter.nameForAppAndAttempt(appId, attemptId)

Member Author

Thanks. Ya, I agree. For now, it's a simple alias, but you are right about the future.

Member Author

dongjoon-hyun commented Jul 23, 2025

Thank you so much, @LuciferYang, @mridulm, @thejdeep, @yaooqinn, @viirya, @peter-toth.

Merged to master for Apache Spark 4.1.0.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-52914 branch July 23, 2025 17:33
dongjoon-hyun added a commit that referenced this pull request Oct 16, 2025
### What changes were proposed in this pull request?

This PR aims to document newly added `core` module configurations as a part of Apache Spark 4.1.0 preparation.

### Why are the changes needed?

To help the users use new features easily.

- #47856
- #51130
- #51163
- #51604
- #51630
- #51708
- #51885
- #52091
- #52382

### Does this PR introduce _any_ user-facing change?

No behavior change because this is a documentation update.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52626 from dongjoon-hyun/SPARK-53926.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>