Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Mar 13, 2024

Rationale for this change

While implementing Variable-size Binary View Layout (thanks @ariesdevil !) in apache/arrow-rs#5481 it was not 100% clear if the inlined string was zero padded.

@bkietz noted that

The spec does say "padded with zero" https://github.com/apache/arrow/blob/main/docs/source/format/Columnar.rst?plain=1#L384 but it could be repeated in the surrounding paragraph. In any case, padded with zero is definitely the intent

    * Short strings, length <= 12
      | Bytes 0-3  | Bytes 4-15                            |
      |------------|---------------------------------------|
      | length     | data (padded with 0)                  |

What changes are included in this PR?

Add a sentence in the surrounding text to make it clear the inlined strings values are zero padded

Note I do not think this is a specification change (and therefore doesn't need a vote on the mailing list) as the spec already specifies the padding is zero (in the diagram). This simply clarifies the text to emphasize this point for ease of understanding

Are these changes tested?

Are there any user-facing changes?

@alamb alamb changed the title [Minor] Clarify VariableLengthStringView padding [Minor] Clarify VariableLengthStringView padding is 0 Mar 13, 2024
@alamb alamb changed the title [Minor] Clarify VariableLengthStringView padding is 0 [Minor] Clarify inlined strings in VariableLengthStringView is padded with0 Mar 13, 2024
@alamb alamb changed the title [Minor] Clarify inlined strings in VariableLengthStringView is padded with0 [Minor] Clarify inlined strings in VariableLengthStringView is padded with0 Mar 13, 2024
@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Mar 13, 2024
@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@kou kou changed the title [Minor] Clarify inlined strings in VariableLengthStringView is padded with0 MINOR: [Docs] Clarify inlined strings in VariableLengthStringView is padded with 0 Mar 13, 2024
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kou kou merged commit bd3fab4 into apache:main Mar 13, 2024
@kou kou removed the awaiting committer review Awaiting committer review label Mar 13, 2024
@github-actions github-actions bot added the awaiting merge Awaiting merge label Mar 13, 2024
@alamb alamb deleted the alamb/clarify_padding branch March 13, 2024 21:42
@alamb
Copy link
Contributor Author

alamb commented Mar 13, 2024

Thank you for the review @kou

@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit bd3fab4.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting merge Awaiting merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants