Commits (31)
6b6273d
Fri Nov 4 19:03:57 PDT 2016
ericl Nov 5, 2016
40f4368
Fri Nov 4 19:42:34 PDT 2016
ericl Nov 5, 2016
3c8b43e
Fri Nov 4 19:56:49 PDT 2016
ericl Nov 5, 2016
38e09f4
Merge branch 'master' into sc-5027
ericl Nov 7, 2016
fb7ba10
Mon Nov 7 15:11:38 PST 2016
ericl Nov 7, 2016
3318970
Mon Nov 7 15:22:33 PST 2016
ericl Nov 7, 2016
aa2536f
correct several partition related behaviours of ExternalCatalog
cloud-fan Nov 7, 2016
bbe1e12
more test
ericl Nov 8, 2016
a905b09
Mon Nov 7 19:25:37 PST 2016
ericl Nov 8, 2016
3c43fa5
Mon Nov 7 19:28:49 PST 2016
ericl Nov 8, 2016
13aa481
Mon Nov 7 19:34:38 PST 2016
ericl Nov 8, 2016
37afbb1
Mon Nov 7 20:02:13 PST 2016
ericl Nov 8, 2016
dddee47
use Path
cloud-fan Nov 8, 2016
e97c17f
address comments
cloud-fan Nov 8, 2016
f85bb27
fix bug
cloud-fan Nov 8, 2016
6dc4cb5
Tue Nov 8 11:29:45 PST 2016
ericl Nov 8, 2016
d68d1db
Tue Nov 8 11:45:44 PST 2016
ericl Nov 8, 2016
b5bbb1d
Tue Nov 8 11:50:51 PST 2016
ericl Nov 8, 2016
d939b37
work around
cloud-fan Nov 8, 2016
fae929e
Tue Nov 8 19:42:53 PST 2016
ericl Nov 9, 2016
fbd7b42
Tue Nov 8 19:43:30 PST 2016
ericl Nov 9, 2016
4296612
Tue Nov 8 20:04:12 PST 2016
ericl Nov 9, 2016
001cb1d
Tue Nov 8 20:09:31 PST 2016
ericl Nov 9, 2016
91f87de
Tue Nov 8 20:11:02 PST 2016
ericl Nov 9, 2016
9dbc3f1
fix test
cloud-fan Nov 9, 2016
81bb266
Merge branch 'master' into sc-5027
ericl Nov 9, 2016
8e10ff7
upper case the test
ericl Nov 9, 2016
1f090b1
Merge remote-tracking branch 'github/pr/15797' into sc-5027
ericl Nov 9, 2016
1500566
merge case sensitivity fixes
ericl Nov 9, 2016
51c1322
fix crash in commit protocol
ericl Nov 10, 2016
63f7f2e
Merge remote-tracking branch 'upstream/master' into sc-5027
ericl Nov 10, 2016
@@ -82,9 +82,24 @@ abstract class FileCommitProtocol {
*
* The "dir" parameter specifies 2, and "ext" parameter specifies both 4 and 5, and the rest
* are left to the commit protocol implementation to decide.
*
* Important: it is the caller's responsibility to add uniquely identifying content to "ext"
* if a task is going to write out multiple files to the same dir. The file commit protocol only
* guarantees that files written by different tasks will not conflict.
*/
def newTaskTempFile(taskContext: TaskAttemptContext, dir: Option[String], ext: String): String

/**
* Similar to newTaskTempFile(), but allows files to be committed to an absolute output location.
* Depending on the implementation, there may be weaker guarantees around adding files this way.
*
* Important: it is the caller's responsibility to add uniquely identifying content to "ext"
* if a task is going to write out multiple files to the same dir. The file commit protocol only
* guarantees that files written by different tasks will not conflict.
*/
def newTaskTempFileAbsPath(
taskContext: TaskAttemptContext, absoluteDir: String, ext: String): String

Contributor: Do we want a default implementation? If the protocol doesn't implement this, things will go seriously wrong at runtime, wouldn't it?

Contributor Author: Done

Contributor: Can you unify this and newTaskTempFile? We could treat the default partition location like a custom location.

Contributor Author: I thought about combining them, but I think the method semantics would become too subtle.
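For context, a minimal hypothetical sketch of how a write task might use the two methods side by side; the helper name, the partition directory "p=1", and the custom location are invented for illustration and are not part of this diff.

```scala
import org.apache.hadoop.mapreduce.TaskAttemptContext

import org.apache.spark.internal.io.FileCommitProtocol

// Hypothetical helper (not part of this PR): stages one file under the table's own
// layout and one under a user-supplied absolute partition location.
def stageOutputs(committer: FileCommitProtocol, taskContext: TaskAttemptContext): Unit = {
  // Managed case: the protocol picks the base directory; "p=1" is a relative partition dir,
  // and the caller must make "ext" unique if it writes several files to the same dir.
  val managedPath = committer.newTaskTempFile(taskContext, Some("p=1"), ".parquet")

  // Custom-location case: the file is staged and only moved to /data/custom/p=2 on commit.
  val customPath = committer.newTaskTempFileAbsPath(taskContext, "/data/custom/p=2", ".parquet")

  // ... open writers on managedPath and customPath, write rows, then call commitTask(...).
}
```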

/**
* Commits a task after the writes succeed. Must be called on the executors when running tasks.
*/
@@ -17,7 +17,9 @@

package org.apache.spark.internal.io

import java.util.Date
import java.util.{Date, UUID}

import scala.collection.mutable

import org.apache.hadoop.conf.Configurable
import org.apache.hadoop.fs.Path
@@ -42,6 +44,19 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String)
/** OutputCommitter from Hadoop is not serializable so marking it transient. */
@transient private var committer: OutputCommitter = _

/**
* Tracks files staged by this task for absolute output paths. These outputs are not managed by
* the Hadoop OutputCommitter, so we must move these to their final locations on job commit.
*
* The mapping is from the temp output path to the final desired output path of the file.
*/
@transient private var addedAbsPathFiles: mutable.Map[String, String] = null
Contributor: We should document whether the strings are paths to files or just paths to directories. I think they are just directories, right? The naming suggests that they are files.

Contributor Author: They are files. We need to track the unique output location of each file here in order to know where to place it. We could use directories, but they would end up with one file each anyway.


/**
* The staging directory for all files committed with absolute output paths.
*/
private def absPathStagingDir: Path = new Path(path, "_temporary-" + jobId)
Contributor: Document this.

Contributor Author: Done


protected def setupCommitter(context: TaskAttemptContext): OutputCommitter = {
val format = context.getOutputFormatClass.newInstance()
// If OutputFormat is Configurable, we should set conf to it.
@@ -54,11 +69,7 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String)

override def newTaskTempFile(
taskContext: TaskAttemptContext, dir: Option[String], ext: String): String = {
// The file name looks like part-r-00000-2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb_00003.gz.parquet
// Note that %05d does not truncate the split number, so if we have more than 100000 tasks,
// the file name is fine and won't overflow.
val split = taskContext.getTaskAttemptID.getTaskID.getId
val filename = f"part-$split%05d-$jobId$ext"
val filename = getFilename(taskContext, ext)

val stagingDir: String = committer match {
// For FileOutputCommitter it has its own staging path called "work path".
@@ -73,6 +84,28 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String)
}
}

override def newTaskTempFileAbsPath(
taskContext: TaskAttemptContext, absoluteDir: String, ext: String): String = {
val filename = getFilename(taskContext, ext)
val absOutputPath = new Path(absoluteDir, filename).toString

// Include a UUID here to prevent file collisions for one task writing to different dirs.
// In principle we could include hash(absoluteDir) instead but this is simpler.
val tmpOutputPath = new Path(
absPathStagingDir, UUID.randomUUID().toString() + "-" + filename).toString

addedAbsPathFiles(tmpOutputPath) = absOutputPath
tmpOutputPath
}

private def getFilename(taskContext: TaskAttemptContext, ext: String): String = {
// The file name looks like part-r-00000-2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb_00003.gz.parquet
// Note that %05d does not truncate the split number, so if we have more than 100000 tasks,
// the file name is fine and won't overflow.
val split = taskContext.getTaskAttemptID.getTaskID.getId
f"part-$split%05d-$jobId$ext"
}
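As a quick illustration of the naming scheme above; the split, jobId, and extension values below are invented, not taken from a real run.

```scala
// Illustration only; split, jobId, and ext are invented values.
val split = 3
val jobId = "2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb"
val ext = ".gz.parquet"
val filename = f"part-$split%05d-$jobId$ext"
// filename == "part-00003-2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb.gz.parquet"
// A file staged via newTaskTempFileAbsPath would then live at roughly
//   <path>/_temporary-<jobId>/<random-uuid>-part-00003-....gz.parquet
// until job commit moves it under the requested absolute directory.
```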

override def setupJob(jobContext: JobContext): Unit = {
// Setup IDs
val jobId = SparkHadoopWriterUtils.createJobID(new Date, 0)
@@ -93,26 +126,42 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String)

override def commitJob(jobContext: JobContext, taskCommits: Seq[TaskCommitMessage]): Unit = {
committer.commitJob(jobContext)
val filesToMove = taskCommits.map(_.obj.asInstanceOf[Map[String, String]])
.foldLeft(Map[String, String]())(_ ++ _)
logDebug(s"Committing files staged for absolute locations $filesToMove")
val fs = absPathStagingDir.getFileSystem(jobContext.getConfiguration)
for ((src, dst) <- filesToMove) {
fs.rename(new Path(src), new Path(dst))
}
fs.delete(absPathStagingDir, true)
}
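A small sketch of the merge step above, with invented paths, showing how the per-task maps carried in each TaskCommitMessage combine into one rename list.

```scala
// Invented example paths; each task reports its own (tempPath -> finalPath) entries
// through its TaskCommitMessage, and commitJob folds them into a single rename list.
val task0 = Map("/t/_temporary-j1/u1-part-00000-j1.parquet" -> "/custom/p=2/part-00000-j1.parquet")
val task1 = Map("/t/_temporary-j1/u2-part-00001-j1.parquet" -> "/custom/p=3/part-00001-j1.parquet")
val filesToMove = Seq(task0, task1).foldLeft(Map[String, String]())(_ ++ _)
// filesToMove now has both entries; commitJob renames each key to its value and
// finally deletes the _temporary-<jobId> staging directory.
```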

override def abortJob(jobContext: JobContext): Unit = {
committer.abortJob(jobContext, JobStatus.State.FAILED)
val fs = absPathStagingDir.getFileSystem(jobContext.getConfiguration)
fs.delete(absPathStagingDir, true)
}

override def setupTask(taskContext: TaskAttemptContext): Unit = {
committer = setupCommitter(taskContext)
committer.setupTask(taskContext)
addedAbsPathFiles = mutable.Map[String, String]()
}

override def commitTask(taskContext: TaskAttemptContext): TaskCommitMessage = {
val attemptId = taskContext.getTaskAttemptID
SparkHadoopMapRedUtil.commitTask(
committer, taskContext, attemptId.getJobID.getId, attemptId.getTaskID.getId)
EmptyTaskCommitMessage
new TaskCommitMessage(addedAbsPathFiles.toMap)
Member: Why don't we just rename temp files to dest files in commitTask?

Contributor: Yeah, it can go either way; it's unclear which one is better. Renaming on job commit gives a higher chance of corrupting data, whereas renaming in task commit is slightly more performant.

Member: I'd prefer renaming in task commit.

}

override def abortTask(taskContext: TaskAttemptContext): Unit = {
committer.abortTask(taskContext)
// best effort cleanup of other staged files
for ((src, _) <- addedAbsPathFiles) {
val tmp = new Path(src)
tmp.getFileSystem(taskContext.getConfiguration).delete(tmp, false)
}
}

/** Whether we are using a direct output committer */
@@ -172,24 +172,20 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
val tableIdent = visitTableIdentifier(ctx.tableIdentifier)
val partitionKeys = Option(ctx.partitionSpec).map(visitPartitionSpec).getOrElse(Map.empty)

val dynamicPartitionKeys = partitionKeys.filter(_._2.isEmpty)
val dynamicPartitionKeys: Map[String, Option[String]] = partitionKeys.filter(_._2.isEmpty)
if (ctx.EXISTS != null && dynamicPartitionKeys.nonEmpty) {
throw new ParseException(s"Dynamic partitions do not support IF NOT EXISTS. Specified " +
"partitions with value: " + dynamicPartitionKeys.keys.mkString("[", ",", "]"), ctx)
}
val overwrite = ctx.OVERWRITE != null
val overwritePartition =
if (overwrite && partitionKeys.nonEmpty && dynamicPartitionKeys.isEmpty) {
Some(partitionKeys.map(t => (t._1, t._2.get)))
} else {
None
}
val staticPartitionKeys: Map[String, String] =
partitionKeys.filter(_._2.nonEmpty).map(t => (t._1, t._2.get))

InsertIntoTable(
UnresolvedRelation(tableIdent, None),
partitionKeys,
query,
OverwriteOptions(overwrite, overwritePartition),
OverwriteOptions(overwrite, if (overwrite) staticPartitionKeys else Map.empty),
ctx.EXISTS != null)
}
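To make the static/dynamic split concrete, here is a hedged illustration; the table, column, and source names are invented, and an existing SparkSession named `spark` is assumed.

```scala
// Fully static spec: every partition column has a literal value, so with OVERWRITE only
// the matching partition is replaced (staticPartitionKeys = Map("p1" -> "a", "p2" -> "b")).
spark.sql("INSERT OVERWRITE TABLE t PARTITION (p1 = 'a', p2 = 'b') SELECT * FROM src")

// Mixed spec: p1 is static, p2 is dynamic, so staticPartitionKeys = Map("p1" -> "a").
spark.sql("INSERT OVERWRITE TABLE t PARTITION (p1 = 'a', p2) SELECT key, value FROM src")

// Dynamic partition columns cannot be combined with IF NOT EXISTS; the parser raises the
// ParseException shown above for a statement like this:
// spark.sql("INSERT OVERWRITE TABLE t PARTITION (p1 = 'a', p2) IF NOT EXISTS SELECT key, value FROM src")
```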

@@ -349,13 +349,15 @@ case class BroadcastHint(child: LogicalPlan) extends UnaryNode {
* Options for writing new data into a table.
*
* @param enabled whether to overwrite existing data in the table.
* @param specificPartition only data in the specified partition will be overwritten.
* @param staticPartitionKeys if non-empty, specifies that we only want to overwrite partitions
* that match this partial partition spec. If empty, all partitions
* will be overwritten.
*/

Contributor: Can the partition spec be partial? We should document that here.

Contributor Author: Yep, it is in the next sentence.
case class OverwriteOptions(
enabled: Boolean,
specificPartition: Option[CatalogTypes.TablePartitionSpec] = None) {
if (specificPartition.isDefined) {
assert(enabled, "Overwrite must be enabled when specifying a partition to overwrite.")
staticPartitionKeys: CatalogTypes.TablePartitionSpec = Map.empty) {
if (staticPartitionKeys.nonEmpty) {
assert(enabled, "Overwrite must be enabled when specifying specific partitions.")
}
}
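A brief sketch of the new contract; the partition column name p1 is invented. A partial static spec is allowed, but a non-empty spec without overwrite trips the assertion.

```scala
// OK: overwrite only the partitions matching p1 = "a" (a partial spec over (p1, p2)).
val opts = OverwriteOptions(enabled = true, staticPartitionKeys = Map("p1" -> "a"))

// Not OK: the assertion above fires, because static partition keys only make sense
// when overwrite is enabled.
// OverwriteOptions(enabled = false, staticPartitionKeys = Map("p1" -> "a"))
```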

@@ -185,9 +185,9 @@ class PlanParserSuite extends PlanTest {
OverwriteOptions(
overwrite,
if (overwrite && partition.nonEmpty) {
Some(partition.map(kv => (kv._1, kv._2.get)))
partition.map(kv => (kv._1, kv._2.get))
} else {
None
Map.empty
}),
ifNotExists)

@@ -417,15 +417,17 @@ case class DataSource(
// will be adjusted within InsertIntoHadoopFsRelation.
val plan =
InsertIntoHadoopFsRelationCommand(
outputPath,
columns,
bucketSpec,
format,
_ => Unit, // No existing table needs to be refreshed.
options,
data.logicalPlan,
mode,
catalogTable)
outputPath = outputPath,
staticPartitionKeys = Map.empty,
customPartitionLocations = Map.empty,
partitionColumns = columns,
bucketSpec = bucketSpec,
fileFormat = format,
refreshFunction = _ => Unit, // No existing table needs to be refreshed.
options = options,
query = data.logicalPlan,
mode = mode,
catalogTable = catalogTable)
sparkSession.sessionState.executePlan(plan).toRdd
// Replace the schema with that of the DataFrame we just wrote out to avoid re-inferring it.
copy(userSpecifiedSchema = Some(data.schema.asNullable)).resolveRelation()
@@ -24,10 +24,10 @@ import org.apache.hadoop.fs.Path
import org.apache.spark.internal.Logging
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
import org.apache.spark.sql.catalyst.{CatalystConf, CatalystTypeConverters, InternalRow}
import org.apache.spark.sql.catalyst.{CatalystConf, CatalystTypeConverters, InternalRow, TableIdentifier}
import org.apache.spark.sql.catalyst.CatalystTypeConverters.convertToScala
import org.apache.spark.sql.catalyst.analysis._
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, SimpleCatalogRelation}
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTablePartition, SimpleCatalogRelation}
import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
import org.apache.spark.sql.catalyst.expressions
import org.apache.spark.sql.catalyst.expressions._
@@ -37,7 +37,7 @@ import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, Union}
import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, UnknownPartitioning}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.{RowDataSourceScanExec, SparkPlan}
import org.apache.spark.sql.execution.command.{AlterTableAddPartitionCommand, DDLUtils, ExecutedCommandExec}
import org.apache.spark.sql.execution.command._
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String
@@ -182,41 +182,53 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
"Cannot overwrite a path that is also being read from.")
}

val overwritingSinglePartition =
overwrite.specificPartition.isDefined &&
val partitionSchema = query.resolve(
t.partitionSchema, t.sparkSession.sessionState.analyzer.resolver)
val partitionsTrackedByCatalog =
t.sparkSession.sessionState.conf.manageFilesourcePartitions &&
l.catalogTable.isDefined && l.catalogTable.get.partitionColumnNames.nonEmpty &&
l.catalogTable.get.tracksPartitionsInCatalog

val effectiveOutputPath = if (overwritingSinglePartition) {
val partition = t.sparkSession.sessionState.catalog.getPartition(
l.catalogTable.get.identifier, overwrite.specificPartition.get)
new Path(partition.location)
} else {
outputPath
}

val effectivePartitionSchema = if (overwritingSinglePartition) {
Nil
} else {
query.resolve(t.partitionSchema, t.sparkSession.sessionState.analyzer.resolver)
var initialMatchingPartitions: Seq[TablePartitionSpec] = Nil
var customPartitionLocations: Map[TablePartitionSpec, String] = Map.empty

// When partitions are tracked by the catalog, compute all custom partition locations that
// may be relevant to the insertion job.
if (partitionsTrackedByCatalog) {
val matchingPartitions = t.sparkSession.sessionState.catalog.listPartitions(
l.catalogTable.get.identifier, Some(overwrite.staticPartitionKeys))
initialMatchingPartitions = matchingPartitions.map(_.spec)
customPartitionLocations = getCustomPartitionLocations(
t.sparkSession, l.catalogTable.get, outputPath, matchingPartitions)
}

Contributor: I'd create a new API in the external catalog and push this into it. In the future we can look into how to optimize this.

Contributor Author: You also need the set of matching partitions (including those with default locations) in order to determine which ones to delete at the end of an overwrite call. This makes the optimization quite messy, so I'd rather not push it into the catalog for now.

// Callback for updating metastore partition metadata after the insertion job completes.
// TODO(ekl) consider moving this into InsertIntoHadoopFsRelationCommand
def refreshPartitionsCallback(updatedPartitions: Seq[TablePartitionSpec]): Unit = {
if (l.catalogTable.isDefined && updatedPartitions.nonEmpty &&
l.catalogTable.get.partitionColumnNames.nonEmpty &&
l.catalogTable.get.tracksPartitionsInCatalog) {
val metastoreUpdater = AlterTableAddPartitionCommand(
l.catalogTable.get.identifier,
updatedPartitions.map(p => (p, None)),
ifNotExists = true)
metastoreUpdater.run(t.sparkSession)
if (partitionsTrackedByCatalog) {
val newPartitions = updatedPartitions.toSet -- initialMatchingPartitions
if (newPartitions.nonEmpty) {
AlterTableAddPartitionCommand(
l.catalogTable.get.identifier, newPartitions.toSeq.map(p => (p, None)),
ifNotExists = true).run(t.sparkSession)
}
if (overwrite.enabled) {
val deletedPartitions = initialMatchingPartitions.toSet -- updatedPartitions
if (deletedPartitions.nonEmpty) {
AlterTableDropPartitionCommand(
l.catalogTable.get.identifier, deletedPartitions.toSeq,
ifExists = true, purge = true).run(t.sparkSession)
}
}
}
t.location.refresh()
}
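A tiny worked example of the set arithmetic in the callback above, using invented partition specs.

```scala
// Invented partition specs: which partitions matched the static spec before the job ran,
// versus which ones the job actually wrote.
val initialMatchingPartitions = Seq(Map("p" -> "1"), Map("p" -> "2"))
val updatedPartitions = Seq(Map("p" -> "2"), Map("p" -> "3"))

val newPartitions = updatedPartitions.toSet -- initialMatchingPartitions
// Set(Map(p -> 3)): registered with AlterTableAddPartitionCommand.

val deletedPartitions = initialMatchingPartitions.toSet -- updatedPartitions
// Set(Map(p -> 1)): removed with AlterTableDropPartitionCommand, but only when overwriting.
```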

val insertCmd = InsertIntoHadoopFsRelationCommand(
effectiveOutputPath,
effectivePartitionSchema,
outputPath,
if (overwrite.enabled) overwrite.staticPartitionKeys else Map.empty,
customPartitionLocations,
partitionSchema,
t.bucketSpec,
t.fileFormat,
refreshPartitionsCallback,
@@ -227,6 +239,34 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {

insertCmd
}

/**
* Given a set of input partitions, returns those that have locations that differ from the
* Hive default (e.g. /k1=v1/k2=v2). These partitions were manually assigned locations by
* the user.
*
* @return a mapping from partition specs to their custom locations
*/
private def getCustomPartitionLocations(
spark: SparkSession,
table: CatalogTable,
basePath: Path,
partitions: Seq[CatalogTablePartition]): Map[TablePartitionSpec, String] = {
val hadoopConf = spark.sessionState.newHadoopConf
val fs = basePath.getFileSystem(hadoopConf)
val qualifiedBasePath = basePath.makeQualified(fs.getUri, fs.getWorkingDirectory)
partitions.flatMap { p =>
val defaultLocation = qualifiedBasePath.suffix(
"/" + PartitioningUtils.getPathFragment(p.spec, table.partitionSchema)).toString
val catalogLocation = new Path(p.location).makeQualified(
fs.getUri, fs.getWorkingDirectory).toString
if (catalogLocation != defaultLocation) {
Some(p.spec -> catalogLocation)
} else {
None
}
}.toMap
}

Contributor: Why do we distinguish partition locations that are equal to the default location? Partitions always have locations (custom-specified or generated by default); do we really need to care about who set it?

Contributor Author: The only purpose here is to optimize the common case where the directory scheme is followed. Otherwise, we have to broadcast all the partition locations even if they are using the default.
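For clarity, a worked example of what the method above returns; the table path, partition columns (a, b), and locations are hypothetical, not from a real catalog.

```scala
import org.apache.hadoop.fs.Path

// Hypothetical table at /warehouse/t partitioned by (a, b).
val qualifiedBasePath = new Path("/warehouse/t")
val defaultLocation = new Path(qualifiedBasePath, "a=1/b=2").toString // Hive-style default layout
val customLocation = "/mnt/other/custom_b3"                           // user-assigned location

// Only the partition whose catalog location differs from the default layout is returned:
val customPartitionLocations: Map[Map[String, String], String] =
  Map(Map("a" -> "1", "b" -> "3") -> customLocation)
// a=1/b=2 lives at defaultLocation, so it is omitted and its files stay under the table path.
```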
}

