@viper-kun
Contributor

When caching a table in memory in Spark SQL, we allocate more memory than necessary.

InMemoryColumnarTableScan.class
val initialBufferSize = columnType.defaultSize * batchSize
ColumnBuilder(attribute.dataType, initialBufferSize, attribute.name, useCompression)

BasicColumnBuilder.class
buffer = ByteBuffer.allocate(4 + size * columnType.defaultSize)

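The caller passes a byte count (columnType.defaultSize * batchSize) as size, and the builder multiplies it by columnType.defaultSize again. So the total allocated size is 4 + batchSize * columnType.defaultSize * columnType.defaultSize bytes. We should change it to 4 + batchSize * columnType.defaultSize.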
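To make the arithmetic concrete, here is a rough sketch of the two allocation steps. It is a simplified model, not the Spark source: `BufferSizeSketch` and `allocatedBytes` are hypothetical names, and the values for `defaultSize` (8 bytes, as for a long column) and `batchSize` (10000 rows) are assumptions chosen for illustration.

```scala
// Simplified model of the allocation described above; not the actual Spark classes.
object BufferSizeSketch {
  // Mirrors ByteBuffer.allocate(4 + size * columnType.defaultSize):
  // a 4-byte header plus `size` values of `defaultSize` bytes each.
  def allocatedBytes(size: Int, defaultSize: Int): Long =
    4L + size.toLong * defaultSize

  def main(args: Array[String]): Unit = {
    val defaultSize = 8      // assumed: bytes per value
    val batchSize   = 10000  // assumed: rows per batch

    // Before the fix: the caller passes a byte count where a row count is expected,
    // so defaultSize is applied twice.
    val before = allocatedBytes(defaultSize * batchSize, defaultSize)

    // After the fix: the caller passes the row count.
    val after = allocatedBytes(batchSize, defaultSize)

    println(s"before = $before bytes, after = $after bytes")
  }
}
```

Under these assumptions the initial buffer shrinks from 640004 bytes to 80004 bytes per column, so the over-allocation factor is roughly `defaultSize`.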

@viper-kun
Contributor Author

@liancheng @scwf is it OK?

@liancheng
Contributor

Could you file a JIRA ticket and update the PR title to [SPARK-XXXX] [SQL] <title>?

@viper-kun viper-kun changed the title from "correct buffer size" to "[SPARK-9973][SQL]correct buffer size" Aug 14, 2015
@JoshRosen
Contributor

Jenkins, this is ok to test.

@SparkQA

SparkQA commented Aug 15, 2015

Test build #40973 has finished for PR 8189 at commit 6741f23.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

Thanks, I'm merging this to master.

@rxin Is it OK to have this one in branch-1.5 at this time?

@asfgit asfgit closed this in 182f9b7 Aug 16, 2015
@liancheng
Contributor

@viper-kun Please add your name and email address to GitHub so that we can include that information while merging your PRs. I've added your information gathered from JIRA by hand this time.

@rxin
Contributor

rxin commented Aug 16, 2015

This is fine for 1.5.

asfgit pushed a commit that referenced this pull request Aug 16, 2015
The `initialSize` argument of `ColumnBuilder.initialize()` should be the
number of rows rather than bytes.  However `InMemoryColumnarTableScan`
passes in a byte size, which makes Spark SQL allocate more memory than
necessary when building in-memory columnar buffers.

Author: Kun Xu <[email protected]>

Closes #8189 from viper-kun/errorSize.

(cherry picked from commit 182f9b7)
Signed-off-by: Cheng Lian <[email protected]>
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015
@viper-kun viper-kun deleted the errorSize branch January 18, 2017 09:20