Closed
@@ -175,10 +175,19 @@ abstract class FileCommitProtocol extends Logging {

   /**
    * Specifies that a file should be deleted with the commit of this job. The default
-   * implementation deletes the file immediately.
+   * implementation deletes the file immediately or moves file to trash based on whether
+   * the trash feature is enabled.
+   *
+   * See https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml
+   * for the relevant trash configuration
    */
   def deleteWithJob(fs: FileSystem, path: Path, recursive: Boolean): Boolean = {
-    fs.delete(path, recursive)
+    if (fs.getConf.getInt("fs.trash.interval", 0) > 0 &&

@cloud-fan (Contributor), Feb 18, 2022:

Where can I find the documentation for fs.trash.interval? It would be better to add a comment linking to the Hadoop configuration doc page, explaining that this setting controls whether the trash feature is enabled.

Contributor Author:

I have added a comment linking to the Hadoop configuration doc page. Thanks.

+      Trash.moveToAppropriateTrash(fs, path, fs.getConf)) {

Contributor:

Can this function call fail? If it can, it would be better to wrap it in a try-catch, log the error, and fall back to the non-trash behavior.

Member:

This reminds me of #29552. I think at least the concerns raised there should be addressed here.

Member:

Yes, the trash feature was previously merged into the Apache Spark repository and then reverted.

Contributor:

Trash is a feature of the Hadoop file system, and we need to think about how to integrate it with Spark. This may not be the only place where the trash feature needs to be considered.

+      true
+    } else {
+      fs.delete(path, recursive);
+    }
   }
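
One review comment on this hunk asks whether the trash move can fail, and suggests catching the failure, logging it, and falling back to the non-trash behavior. A minimal sketch of that control flow might look like the following. It is not the PR's code. The object `TrashFallbackSketch`, the method `deleteWithTrashFallback`, and its function parameters are hypothetical stand-ins, so the example runs without Hadoop on the classpath: `trashIntervalMinutes` stands in for `fs.getConf.getInt("fs.trash.interval", 0)`, `moveToTrash` for `Trash.moveToAppropriateTrash(fs, path, fs.getConf)`, and `delete` for `fs.delete(path, recursive)`.

```scala
import scala.util.control.NonFatal

object TrashFallbackSketch {

  /**
   * Hypothetical helper illustrating the reviewer's suggestion.
   * Tries the trash move only when trash is enabled (interval > 0) and
   * falls back to a plain delete if the move throws or returns false.
   */
  def deleteWithTrashFallback(
      trashIntervalMinutes: Int,
      moveToTrash: () => Boolean,
      delete: () => Boolean,
      logWarning: String => Unit = msg => Console.err.println(msg)): Boolean = {
    val movedToTrash = trashIntervalMinutes > 0 && {
      try {
        moveToTrash()
      } catch {
        // Only non-fatal errors are swallowed; fatal ones still propagate.
        case NonFatal(e) =>
          logWarning(s"Moving to trash failed, falling back to delete: ${e.getMessage}")
          false
      }
    }
    // Short-circuits: delete() runs only when nothing was moved to trash.
    movedToTrash || delete()
  }
}
```

Passing the two file system operations as functions keeps the sketch testable in isolation. In `deleteWithJob` itself the same shape would presumably wrap the real Hadoop calls, and the warning would go through the `Logging` trait that `FileCommitProtocol` already extends.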

/**