Skip to content

Conversation

@ahrens
Copy link
Contributor

@ahrens ahrens commented Mar 5, 2021

The histogram approximate median is calculated as about double what it
should be. When we are on the bucket that goes past the median, the
histogram is the previous bucket, plus part of this bucket.

Signed-off-by: Matthew Ahrens [email protected]

Pull request checklist

Please check if your PR fulfills the following requirements:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been reviewed and added / updated if needed (for bug fixes / features)
  • Build was run locally and any changes were pushed
  • Lint has passed locally and any fixes were made for failures

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

sdb> spa -H
ADDR               NAME
------------------------------------------------------------
0xffff9e71f45e0000 dcenter
     seg-size   count
     --------   -----
       512.0B: 3821078 *
        1.0KB: 18548449 *****
        2.0KB: 13593997 ****
        4.0KB: 27047835 *******
        8.0KB: 65859562 ******************
       16.0KB: 97068066 **************************
       32.0KB: 120618582 *********************************
       64.0KB: 146639534 ****************************************
      128.0KB: 63314242 *****************
      256.0KB: 12225817 ***
      512.0KB: 2406714 *
        1.0MB: 223821
        2.0MB:   9345
        4.0MB:    568
        8.0MB:     94
       16.0MB:      5
     Approx. Median: 118.6KB

What is the new behavior?

ADDR               NAME
------------------------------------------------------------
0xffff9e71f45e0000 dcenter
     seg-size   count
     --------   -----
       512.0B: 3820762 *
        1.0KB: 18548933 *****
        2.0KB: 13591975 ****
        4.0KB: 27042948 *******
        8.0KB: 65845786 ******************
       16.0KB: 97047272 **************************
       32.0KB: 120593164 *********************************
       64.0KB: 146600971 ****************************************
      128.0KB: 63292556 *****************
      256.0KB: 12221468 ***
      512.0KB: 2404953 *
        1.0MB: 223489
        2.0MB:   9749
        4.0MB:   1048
        8.0MB:    341
       16.0MB:     43
       32.0MB:      7
     Approx. Median: 59.3KB

independent verification of approx median in this internal spreadsheet

Does this introduce a breaking change?

  • Yes
  • No

Other information

@ahrens ahrens requested review from mmaybee and sdimitro March 5, 2021 18:14
@sdimitro
Copy link
Contributor

sdimitro commented Mar 5, 2021

You probably need to change/regenerate the regression output now that you fixed this.
This link described how you can go about doing that: https://github.com/delphix/sdb/wiki/Integration-Tests

@ahrens
Copy link
Contributor Author

ahrens commented Mar 5, 2021

@sdimitro I was wondering how to do that, thanks! Perhaps that doc could be linked from the README and/or the wiki top page?

Copy link
Contributor

@mmaybee mmaybee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I should have noticed this when I was working on the histogram code!

The histogram approximate median is calculated as about double what it
should be.  When we are on the bucket that goes past the median, the
histogram is the *previous* bucket, plus part of this bucket.

Signed-off-by: Matthew Ahrens <[email protected]>
@ahrens ahrens force-pushed the histo branch 2 times, most recently from b0ce779 to d557319 Compare March 9, 2021 18:58
Copy link
Contributor

@sdimitro sdimitro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and thank you for taking time to fix this!

@ahrens ahrens merged commit db1b967 into delphix:master Mar 9, 2021
@ahrens ahrens deleted the histo branch March 9, 2021 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants