Skip to content

Conversation

@pan3793
Copy link
Member

@pan3793 pan3793 commented Feb 8, 2022

What changes were proposed in this pull request?

This is a backport PR of #33664 for branch-3.1.

Why are the changes needed?

We found a query in production that cost lots of time in optimize phase when enable DPP, the SQL pattern likes

select <cols...>
from a
left join b on a.<col> = b.<col>
left join c on b.<col> = c.<col>
left join d on c.<col> = d.<col>
left join e on d.<col> = e.<col>
left join f on e.<col> = f.<col>
left join g on f.<col> = g.<col>
left join h on g.<col> = h.<col>
...
Before this PR, Analyzer/Optimizer costs 15821.6 seconds
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 24763671
Total time: 15821.588159083 seconds

Rule                                                                                               Effective Time / Total Time                     Effective Runs / Total Runs

org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries                               6325784430499 / 14213630904736                  2047 / 342802
org.apache.spark.sql.catalyst.optimizer.ColumnPruning                                              28009746 / 267164777817                         1 / 691736
org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability                                  0 / 118329246545                                0 / 342804
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases                                     8997600 / 71070686823                           1 / 348934
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification                                      923149600 / 59903945393                         2036 / 348934
org.apache.spark.sql.catalyst.optimizer.NullPropagation                                            12343892 / 52895299683                          1 / 348934
org.apache.spark.sql.execution.datasources.SchemaPruning                                           0 / 51638842573                                 0 / 171401
org.apache.spark.sql.catalyst.optimizer.UnwrapCastInBinaryComparison                               0 / 42686293006                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals                                       0 / 41190229736                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints                                1482075440 / 40899690529                        2048 / 171401
org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison                                   0 / 40747831162                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions                          0 / 39946334062                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.FoldablePropagation                                        655084587 / 39426073462                         2047 / 348934
org.apache.spark.sql.catalyst.optimizer.PruneFilters                                               489780 / 34731670364                            1 / 520335
org.apache.spark.sql.catalyst.optimizer.OptimizeJsonExprs                                          0 / 33533545307                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.ReplaceNullWithFalseInPredicate                            0 / 31910754610                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.ConstantFolding                                            22871375 / 31308048659                          1 / 348934
org.apache.spark.sql.catalyst.optimizer.LikeSimplification                                         0 / 31276042884                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.OptimizeWindowFunctions                                    0 / 31100888868                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator                                 0 / 30855866688                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.SimplifyCasts                                              0 / 30414758547                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.OptimizeIn                                                 0 / 30211101358                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.OptimizeUpdateFields                                       0 / 29986491532                                 0 / 348936
org.apache.spark.sql.catalyst.optimizer.RemoveDispensableExpressions                               0 / 29740191900                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.SimplifyExtractValueOps                                    0 / 25399289000                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery                            0 / 21381558331                                 0 / 348934
org.apache.spark.sql.catalyst.optimizer.PushDownPredicates                                         5831922974 / 18327095753                        3072 / 691737
org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions                                         5007709 / 17431480903                           1 / 171401
org.apache.spark.sql.catalyst.optimizer.GetCurrentDatabaseAndCatalog                               0 / 15321596322                                 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTime                                         0 / 15060720651                                 0 / 171401
org.apache.spark.sql.catalyst.optimizer.DecimalAggregates                                          0 / 14874103589                                 0 / 171401
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions                               50420951 / 14798400219                          1 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceUpdateFieldsExpression                              0 / 14791439186                                 0 / 171401
org.apache.spark.sql.catalyst.optimizer.EliminateMapObjects                                        0 / 14361213979                                 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReassignLambdaVariableID                                   0 / 14258459712                                 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RewriteNonCorrelatedExists                                 0 / 14094898098                                 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RemoveNoopOperators                                        1975195 / 12967136767                           2 / 691736
org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates                                 0 / 12573667353                                 0 / 171401
org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown                               0 / 11355809540                                 0 / 171401
org.apache.spark.sql.execution.dynamicpruning.CleanupDynamicPruningFilters                         350911193 / 9931582633                          2048 / 171401
org.apache.spark.sql.catalyst.optimizer.ConstantPropagation                                        0 / 8268511133                                  0 / 348934
org.apache.spark.sql.catalyst.optimizer.CollapseProject                                            10135014 / 6764166524                           1 / 520335
org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint                                      0 / 6244880486                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.CombineUnions                                              0 / 5811583863                                  0 / 520335
org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates                                  28926471 / 5423624109                           1 / 171401
org.apache.spark.sql.catalyst.optimizer.CollapseRepartition                                        0 / 5245332974                                  0 / 348934
org.apache.spark.sql.catalyst.optimizer.EliminateSorts                                             0 / 5051398253                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation                                     0 / 4975057807                                  0 / 342802
org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers                                   0 / 4804919393                                  0 / 171401
org.apache.spark.sql.execution.python.ExtractPythonUDFFromAggregate                                0 / 4408441952                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.CollapseWindow                                             0 / 4349408795                                  0 / 348934
org.apache.spark.sql.catalyst.optimizer.EliminateSerialization                                     0 / 4170274724                                  0 / 348934
org.apache.spark.sql.catalyst.optimizer.TransposeWindow                                            0 / 4113327220                                  0 / 348934
org.apache.spark.sql.catalyst.optimizer.ReorderJoin                                                0 / 4107443950                                  0 / 348934
org.apache.spark.sql.catalyst.optimizer.CombineFilters                                             0 / 4061601886                                  0 / 348934
org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion                                 0 / 4042615936                                  0 / 348934
org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin                                   0 / 4024293929                                  0 / 348934
org.apache.spark.sql.catalyst.optimizer.EliminateLimits                                            0 / 3908219408                                  0 / 348934
org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery                                   135506567 / 3895045754                          2047 / 171401
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin                                         0 / 3875669986                                  0 / 348934
org.apache.spark.sql.catalyst.optimizer.PushLeftSemiLeftAntiThroughJoin                            0 / 3875663441                                  0 / 348934
org.apache.spark.sql.catalyst.optimizer.LimitPushDown                                              0 / 3859481346                                  0 / 348934
org.apache.spark.sql.catalyst.optimizer.ReplaceDeduplicateWithAggregate                            0 / 3404517096                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.PushExtraPredicateThroughJoin                              0 / 2876802099                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughNonJoin                                0 / 2546037832                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.ExtractPythonUDFFromJoinCondition                          0 / 2392444200                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.RemoveLiteralFromGroupExpressions                          0 / 2297492778                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.RewriteExceptAll                                           0 / 2239751163                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromGenerate                                   0 / 2189398940                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.RemoveRepetitionFromGroupExpressions                       0 / 2187645769                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithFilter                                    0 / 2135169574                                  0 / 171401
org.apache.spark.sql.execution.python.ExtractGroupingPythonUDFFromAggregate                        0 / 2058019085                                  0 / 171401
org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases                                    602948 / 2030110600                             1 / 171401
org.apache.spark.sql.catalyst.optimizer.OptimizeLimitZero                                          0 / 2028208284                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.CombineTypedFilters                                        0 / 1985665817                                  0 / 171401
org.apache.spark.sql.catalyst.analysis.EliminateView                                               0 / 1965869470                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning                                    0 / 1892132711                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.RewriteIntersectAll                                        0 / 1891839429                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin                               0 / 1873028976                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate                               0 / 1872377392                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin                                  0 / 1860128741                                  0 / 171401
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts                                     0 / 405470117                                   0 / 342802
org.apache.spark.sql.execution.OptimizeMetadataOnlyQuery                                           0 / 242854215                                   0 / 171401
org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder                                       0 / 226833963                                   0 / 171401
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                                  117151873 / 123533385                           7 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion                     102066025 / 121689146                           5 / 13
org.apache.spark.sql.catalyst.optimizer.CombineConcats                                             0 / 112256648                                   0 / 348934
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts                              76183548 / 108546928                            4 / 13
org.apache.spark.sql.catalyst.optimizer.EliminateAggregateFilter                                   0 / 107905167                                   0 / 348934
org.apache.spark.sql.catalyst.optimizer.EliminateDistinct                                          0 / 103804760                                   0 / 171401
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions                                   85607389 / 94162213                             8 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division                                       76446295 / 93129469                             5 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings                                 53450988 / 87954605                             1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                                   0 / 85268434                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion                                     50348094 / 80404327                             4 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion                               39786113 / 59089899                             3 / 13
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                                             56110133 / 57619849                             10 / 13
org.apache.spark.sql.execution.dynamicpruning.PartitionPruning                                     22888739 / 54329716                             1 / 171401
org.apache.spark.sql.catalyst.analysis.DecimalPrecision                                            0 / 49832160                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                                   0 / 46158859                                    0 / 13
org.apache.spark.sql.execution.datasources.FindDataSourceTable                                     43323608 / 44575170                             1 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion                                 14900959 / 39466022                             1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveBinaryArithmetic                            0 / 38815701                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality                                0 / 36470552                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations                             7216638 / 35530574                              1 / 13
org.apache.spark.sql.execution.python.ExtractPythonUDFs                                            0 / 34271447                                    0 / 171401
org.apache.spark.sql.catalyst.analysis.ResolveLambdaVariables                                      0 / 32166544                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IntegralDivision                               0 / 30547791                                    0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveExpressionsWithNamePlaceholders                      0 / 30506602                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StringLiteralCoercion                          0 / 30116261                                    0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveHigherOrderFunctions                                 0 / 27859446                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion                                    0 / 24218064                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion                            0 / 23413444                                    0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRandomSeed                                  0 / 22036715                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion                                  0 / 21957835                                    0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder                                 0 / 21463552                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$MapZipWithCoercion                             0 / 20489026                                    0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame                                 0 / 20481142                                    0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates                                   0 / 19715779                                    0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences                           0 / 19693693                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TimeWindowing                                               0 / 19530333                                    0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                                   16526728 / 18814733                             1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery                                    0 / 16649474                                    0 / 13
org.apache.spark.sql.execution.aggregate.ResolveEncodersInScalaAgg                                 0 / 15275758                                    0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions                                    0 / 11329936                                    0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer                                4281027 / 10765657                              1 / 13
org.apache.spark.sql.catalyst.analysis.CTESubstitution                                             0 / 9577794                                     0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases                                     0 / 9217208                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns                                 0 / 8289279                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions                           0 / 8192539                                     0 / 13
org.apache.spark.sql.catalyst.analysis.ApplyCharTypePadding                                        0 / 7991101                                     0 / 2
org.apache.spark.sql.catalyst.analysis.CleanupAliases                                              5307130 / 7435920                               1 / 3
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes                         0 / 6506053                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast                                      0 / 6446727                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance                                 0 / 6075593                                     0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog                                       0 / 4092936                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy                  0 / 3430531                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTables                                      0 / 2966446                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics                           0 / 2864876                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF                             0 / 2818668                                     0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy                           0 / 2816430                                     0 / 13
org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin                                    0 / 2491546                                     0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveUnion                                                0 / 2477483                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation                              0 / 2463866                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF                               0 / 2376711                                     0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate                                    0 / 2133804                                     0 / 13
org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec                                        0 / 1730246                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases                       0 / 1712259                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions                          0 / 1698918                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot                                       0 / 1618569                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin                         0 / 1589833                                     0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveCatalogs                                             0 / 1560151                                     0 / 13
org.apache.spark.sql.catalyst.expressions.codegen.package$ExpressionCanonicalizer$CleanExpressions 0 / 1500410                                     0 / 1049
org.apache.spark.sql.catalyst.analysis.ResolveInlineTables                                         0 / 1463204                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUserSpecifiedColumns                        0 / 1355593                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNamespace                                   0 / 1339813                                     0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions                                 0 / 1338443                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveInsertInto                                  0 / 1323062                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic                            0 / 1318399                                     0 / 2
org.apache.spark.sql.execution.datasources.ResolveSQLOnFile                                        0 / 1302134                                     0 / 13
org.apache.spark.sql.execution.datasources.FallBackFileSourceV2                                    0 / 1074454                                     0 / 13
org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals                                0 / 698629                                      0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveJoinStrategyHints                       0 / 452536                                      0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveCoalesceHints                           0 / 306642                                      0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution                                0 / 300756                                      0 / 2
org.apache.spark.sql.execution.datasources.PreprocessTableCreation                                 0 / 276685                                      0 / 2
org.apache.spark.sql.catalyst.analysis.EliminateUnions                                             0 / 214857                                      0 / 2
org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences                                       0 / 203514                                      0 / 2
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion                                0 / 164163                                      0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveNoopDropTable                                        0 / 108345                                      0 / 2
org.apache.spark.sql.execution.datasources.DataSourceAnalysis                                      0 / 90574                                       0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAlterTableChanges                           0 / 85175                                       0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints                                 0 / 82212                                       0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$DisableHints                                   0 / 7780                                        0 / 2
After this PR, Analyzer/Optimizer costs 2.4 seconds seconds
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 2140
Total time: 2.407325128 seconds

Rule                                                                                               Effective Time / Total Time                     Effective Runs / Total Runs

org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts                              79087019 / 116017648                            4 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion                     88569423 / 112854377                            5 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                                  89163583 / 95807127                             7 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division                                       73197512 / 91715660                             5 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions                                   73390555 / 81845702                             8 / 13
org.apache.spark.sql.catalyst.optimizer.ColumnPruning                                              24099474 / 80271976                             1 / 6
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                                   0 / 79928019                                    0 / 13
org.apache.spark.sql.catalyst.optimizer.PushDownPredicates                                         73617831 / 79035882                             2 / 7
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion                                     52077624 / 78992522                             4 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings                                 39724587 / 73183778                             1 / 13
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                                             52807613 / 54633834                             10 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations                             21340706 / 53040407                             1 / 13
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions                               51595369 / 51595369                             1 / 1
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion                               32842401 / 49095324                             3 / 13
org.apache.spark.sql.catalyst.analysis.DecimalPrecision                                            0 / 48574229                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                                   0 / 43384882                                    0 / 13
org.apache.spark.sql.execution.datasources.FindDataSourceTable                                     40348276 / 41709977                             1 / 13
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints                                40235278 / 40235278                             1 / 1
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion                                 13542681 / 38418852                             1 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality                                0 / 35130781                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IntegralDivision                               0 / 35129152                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StringLiteralCoercion                          0 / 32116493                                    0 / 13
org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates                                  32065069 / 32065069                             1 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveBinaryArithmetic                            0 / 30166678                                    0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveLambdaVariables                                      0 / 30082254                                    0 / 13
org.apache.spark.sql.catalyst.optimizer.ConstantFolding                                            23910538 / 29049259                             1 / 4
org.apache.spark.sql.catalyst.analysis.ResolveExpressionsWithNamePlaceholders                      0 / 26220500                                    0 / 13
org.apache.spark.sql.execution.dynamicpruning.PartitionPruning                                     25857276 / 25857276                             1 / 1
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases                                     11800935 / 25453815                             1 / 4
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion                                    0 / 24882479                                    0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveHigherOrderFunctions                                 0 / 24773800                                    0 / 13
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification                                      0 / 23687102                                    0 / 4
org.apache.spark.sql.catalyst.optimizer.OptimizeUpdateFields                                       0 / 22496833                                    0 / 6
org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability                                  0 / 22086610                                    0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder                                 0 / 21776763                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion                            0 / 21668319                                    0 / 13
org.apache.spark.sql.execution.datasources.SchemaPruning                                           0 / 21119650                                    0 / 1
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion                                  0 / 20995370                                    0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$MapZipWithCoercion                             0 / 20852224                                    0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame                                 0 / 20482582                                    0 / 13
org.apache.spark.sql.catalyst.optimizer.NullPropagation                                            8449073 / 20310580                              1 / 4
org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals                                       0 / 19685107                                    0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences                           0 / 18005681                                    0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRandomSeed                                  0 / 17873443                                    0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates                                   0 / 16909722                                    0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                                   14231820 / 16732791                             1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery                                    0 / 16280559                                    0 / 13
org.apache.spark.sql.catalyst.optimizer.FoldablePropagation                                        0 / 16121492                                    0 / 4
org.apache.spark.sql.catalyst.analysis.TimeWindowing                                               0 / 15969419                                    0 / 13
org.apache.spark.sql.execution.aggregate.ResolveEncodersInScalaAgg                                 0 / 15441100                                    0 / 13
org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison                                   0 / 14334553                                    0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer                                5534984 / 12617237                              1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions                                    0 / 10014043                                    0 / 2
org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions                          0 / 9761592                                     0 / 4
org.apache.spark.sql.catalyst.analysis.CTESubstitution                                             0 / 9747538                                     0 / 2
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery                            0 / 9694808                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.UnwrapCastInBinaryComparison                               0 / 9665154                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.OptimizeJsonExprs                                          0 / 9430804                                     0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns                                 0 / 9322119                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases                                     0 / 9113696                                     0 / 13
org.apache.spark.sql.catalyst.optimizer.SimplifyExtractValueOps                                    0 / 8938695                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.OptimizeIn                                                 0 / 8921298                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.ReplaceNullWithFalseInPredicate                            0 / 8695066                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.RemoveDispensableExpressions                               0 / 8438285                                     0 / 4
org.apache.spark.sql.catalyst.analysis.ApplyCharTypePadding                                        0 / 8406029                                     0 / 2
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator                                 0 / 8196064                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.CollapseProject                                            6878987 / 8000632                               1 / 5
org.apache.spark.sql.catalyst.optimizer.SimplifyCasts                                              0 / 7996708                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.LikeSimplification                                         0 / 7983621                                     0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions                           0 / 7948113                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast                                      0 / 7930266                                     0 / 13
org.apache.spark.sql.catalyst.optimizer.OptimizeWindowFunctions                                    0 / 7569548                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.PruneFilters                                               1298537 / 7511669                               1 / 5
org.apache.spark.sql.execution.python.ExtractPythonUDFs                                            0 / 7419094                                     0 / 1
org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown                               0 / 7129007                                     0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance                                 0 / 6953704                                     0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes                         0 / 6409068                                     0 / 13
org.apache.spark.sql.catalyst.optimizer.RemoveNoopOperators                                        2948625 / 6073057                               2 / 6
org.apache.spark.sql.catalyst.analysis.CleanupAliases                                              3304603 / 5512866                               1 / 3
org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions                                         5247113 / 5247113                               1 / 1
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog                                       0 / 4331593                                     0 / 13
org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers                                   0 / 3939580                                     0 / 1
org.apache.spark.sql.catalyst.optimizer.ConstantPropagation                                        0 / 3872746                                     0 / 4
org.apache.spark.sql.execution.dynamicpruning.CleanupDynamicPruningFilters                         3285689 / 3285689                               1 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTables                                      0 / 2954363                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics                           0 / 2772122                                     0 / 13
org.apache.spark.sql.catalyst.optimizer.DecimalAggregates                                          0 / 2751645                                     0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF                             0 / 2718652                                     0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy                  0 / 2682978                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy                           0 / 2598472                                     0 / 13
org.apache.spark.sql.execution.python.ExtractPythonUDFFromAggregate                                0 / 2542252                                     0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF                               0 / 2451105                                     0 / 2
org.apache.spark.sql.catalyst.optimizer.EliminateMapObjects                                        0 / 2438919                                     0 / 1
org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin                                    0 / 2428491                                     0 / 2
org.apache.spark.sql.catalyst.expressions.codegen.package$ExpressionCanonicalizer$CleanExpressions 0 / 2277468                                     0 / 1049
org.apache.spark.sql.catalyst.optimizer.EliminateAggregateFilter                                   0 / 2273198                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateSorts                                             0 / 2256059                                     0 / 1
org.apache.spark.sql.catalyst.optimizer.ReplaceUpdateFieldsExpression                              0 / 2227798                                     0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveUnion                                                0 / 2220071                                     0 / 13
org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates                                 0 / 2216158                                     0 / 1
org.apache.spark.sql.catalyst.optimizer.ReassignLambdaVariableID                                   0 / 2190345                                     0 / 1
org.apache.spark.sql.catalyst.optimizer.RewriteNonCorrelatedExists                                 0 / 2106499                                     0 / 1
org.apache.spark.sql.catalyst.optimizer.RemoveRepetitionFromGroupExpressions                       0 / 2088289                                     0 / 1
org.apache.spark.sql.catalyst.optimizer.CombineConcats                                             0 / 2073285                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint                                      0 / 1922211                                     0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases                       0 / 1900401                                     0 / 13
org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTime                                         0 / 1895150                                     0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate                                    0 / 1863956                                     0 / 13
org.apache.spark.sql.catalyst.optimizer.GetCurrentDatabaseAndCatalog                               0 / 1816019                                     0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions                          0 / 1787068                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin                         0 / 1769046                                     0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveCatalogs                                             0 / 1605576                                     0 / 13
org.apache.spark.sql.catalyst.optimizer.EliminateDistinct                                          0 / 1591594                                     0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveInlineTables                                         0 / 1547587                                     0 / 13
org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery                                   0 / 1526146                                     0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot                                       0 / 1506795                                     0 / 13
org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec                                        0 / 1502020                                     0 / 13
org.apache.spark.sql.catalyst.optimizer.PushExtraPredicateThroughJoin                              0 / 1442567                                     0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions                                 0 / 1420037                                     0 / 13
org.apache.spark.sql.execution.datasources.ResolveSQLOnFile                                        0 / 1399535                                     0 / 13
org.apache.spark.sql.execution.python.ExtractGroupingPythonUDFFromAggregate                        0 / 1385780                                     0 / 1
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries                               0 / 1359249                                     0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNamespace                                   0 / 1356344                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation                              0 / 1355910                                     0 / 13
org.apache.spark.sql.execution.datasources.FallBackFileSourceV2                                    0 / 1325995                                     0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUserSpecifiedColumns                        0 / 1325072                                     0 / 13
org.apache.spark.sql.catalyst.optimizer.CollapseRepartition                                        0 / 1322557                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin                                   0 / 1300710                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateSerialization                                     0 / 1227217                                     0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic                            0 / 1226182                                     0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveInsertInto                                  0 / 1225119                                     0 / 13
org.apache.spark.sql.catalyst.optimizer.PushLeftSemiLeftAntiThroughJoin                            0 / 1172415                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.ReorderJoin                                                0 / 1158346                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation                                     0 / 1156508                                     0 / 2
org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion                                 0 / 1082244                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.CombineUnions                                              0 / 1050254                                     0 / 5
org.apache.spark.sql.catalyst.optimizer.CollapseWindow                                             0 / 1003136                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.TransposeWindow                                            0 / 1001447                                     0 / 4
org.apache.spark.sql.catalyst.optimizer.ExtractPythonUDFFromJoinCondition                          0 / 1001440                                     0 / 1
org.apache.spark.sql.catalyst.optimizer.LimitPushDown                                              0 / 974802                                      0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateLimits                                            0 / 927602                                      0 / 4
org.apache.spark.sql.catalyst.optimizer.CombineFilters                                             0 / 924219                                      0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin                                         0 / 768081                                      0 / 4
org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals                                0 / 733641                                      0 / 2
org.apache.spark.sql.catalyst.optimizer.RemoveLiteralFromGroupExpressions                          0 / 449017                                      0 / 1
org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases                                    434360 / 434360                                 1 / 1
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughNonJoin                                0 / 379178                                      0 / 1
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromGenerate                                   0 / 295206                                      0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution                                0 / 292907                                      0 / 2
org.apache.spark.sql.catalyst.optimizer.CombineTypedFilters                                        0 / 256204                                      0 / 1
org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning                                    0 / 249912                                      0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveCoalesceHints                           0 / 249513                                      0 / 2
org.apache.spark.sql.catalyst.analysis.EliminateUnions                                             0 / 240773                                      0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveJoinStrategyHints                       0 / 230937                                      0 / 2
org.apache.spark.sql.catalyst.optimizer.ReplaceDeduplicateWithAggregate                            0 / 222074                                      0 / 1
org.apache.spark.sql.catalyst.analysis.EliminateView                                               0 / 216679                                      0 / 1
org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences                                       0 / 137675                                      0 / 2
org.apache.spark.sql.execution.datasources.PreprocessTableCreation                                 0 / 124279                                      0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveNoopDropTable                                        0 / 111526                                      0 / 2
org.apache.spark.sql.execution.datasources.DataSourceAnalysis                                      0 / 100454                                      0 / 2
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithFilter                                    0 / 99833                                       0 / 1
org.apache.spark.sql.catalyst.optimizer.OptimizeLimitZero                                          0 / 98320                                       0 / 1
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion                                0 / 96634                                       0 / 2
org.apache.spark.sql.catalyst.optimizer.RewriteExceptAll                                           0 / 96403                                       0 / 1
org.apache.spark.sql.catalyst.optimizer.RewriteIntersectAll                                        0 / 94322                                       0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAlterTableChanges                           0 / 94092                                       0 / 2
org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin                               0 / 92092                                       0 / 1
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin                                  0 / 91821                                       0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints                                 0 / 91637                                       0 / 2
org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate                               0 / 90672                                       0 / 1
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts                                     0 / 42624                                       0 / 2
org.apache.spark.sql.execution.OptimizeMetadataOnlyQuery                                           0 / 25905                                       0 / 1
org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder                                       0 / 9020                                        0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveHints$DisableHints                                   0 / 8111                                        0 / 2

The original description of SPARK-36444 did not show this improvement, but it does significantly improve the SQL compile performance for such cases.

Does this PR introduce any user-facing change?

Significant SQL compile performance improvement for some cases.

How was this patch tested?

Added UT.

@github-actions github-actions bot added the SQL label Feb 8, 2022
@pan3793
Copy link
Member Author

pan3793 commented Feb 8, 2022

cc @wangyum, I know it's a perf only change, but consider the huge improvement of sql compile performance, I think it's worth to backport to branch-3.1

@pan3793
Copy link
Member Author

pan3793 commented Feb 8, 2022

Also cc @cloud-fan @maryannxue @yaooqinn

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @pan3793 .
Thank you for making a PR, but Apache Spark community follows Semantic Versioning and doesn't allow backporting of new feature or improvements.

I'd like to recommend you to use Apache Spark 3.2.1 instead. If you need this in your fork, you can backport it into your fork directly instead.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, it would be great if you can upgrade Kyuubi to Apache Spark 3.2.1 instead of Apache Spark 3.1.2.

@pan3793
Copy link
Member Author

pan3793 commented Feb 8, 2022

Apache Spark community follows Semantic Versioning and doesn't allow backporting of new feature or improvements.

Thanks @dongjoon-hyun I think maybe we can treat it as a bug based on #33664 (comment), comparing to 2.4s Analyzer/Optimizer costs 15821.6 seconds looks really unreasonable.

To be clear, I'm asking you to upgrade Kyuubi to 3.2.1 instead of 3.1.2

Thanks for tips, Kyuubi supports both Spark 3.0/3.1/3.2, and builds against Spark 3.1 in default, we are planning to switch the default to Spark 3.2

cloud-fan pushed a commit that referenced this pull request Feb 18, 2022
### What changes were proposed in this pull request?

This PR propose to materialize `QueryPlan#subqueries` and pruned by `PLAN_EXPRESSION` on searching to improve the SQL compile performance.

### Why are the changes needed?

We found a query in production that cost lots of time in optimize phase (also include AQE optimize phase) when enable DPP, the SQL pattern likes

```
select <cols...>
from a
left join b on a.<col> = b.<col>
left join c on b.<col> = c.<col>
left join d on c.<col> = d.<col>
left join e on d.<col> = e.<col>
left join f on e.<col> = f.<col>
left join g on f.<col> = g.<col>
left join h on g.<col> = h.<col>
...
```
SPARK-36444 significantly reduces the optimize time (exclude AQE phase), see detail at #35431, but there are still lots of time costs in `InsertAdaptiveSparkPlan` on AQE optimize phase.

Before this change, the query costs 658s, after this change only costs 65s.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.

Closes #35438 from pan3793/subquery.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@pan3793 pan3793 deleted the SPARK-36444-3.1 branch March 23, 2022 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants