
Add pointers tree to TempSpace class #8421


Merged: 5 commits merged into FirebirdSQL:master from XaBbl4:add_fast_search_to_tempspace on Feb 22, 2025

Conversation

XaBbl4 (Contributor) commented Jan 31, 2025

The problem is the linear enumeration of the tree of free segments to find the best segment of the required size.
It shows up when a large number of very small free segments accumulate in the tree: every allocateSpace request runs through this loop again.

This patch proposes a solution by adding a second tree, where the key is the size of the free segment and the payload is a reference to that segment in the first tree.
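
To illustrate the idea, here is a minimal sketch (an illustration only, not the patch itself: the real code keeps these structures in Firebird's BePlusTree, and the names and types here are assumptions):

// Minimal sketch of the two-tree idea, using std::map in place of BePlusTree.
#include <cstdint>
#include <map>

using offset_t = std::uint64_t;

struct Segment
{
    offset_t position;  // start of the free region in temp space
    offset_t size;      // length of the free region
};

// First tree: free segments ordered by position (used when merging neighbours).
std::map<offset_t, Segment> freeSegments;

// Second tree: segment size -> reference to the segment in the first tree.
// A best-fit lookup becomes one ordered search instead of a linear scan.
std::multimap<offset_t, Segment*> freeSegmentsBySize;

// Smallest free segment whose size >= requestedSize, found in O(log n).
Segment* findBestFit(offset_t requestedSize)
{
    const auto it = freeSegmentsBySize.lower_bound(requestedSize);
    return (it != freeSegmentsBySize.end()) ? it->second : nullptr;
}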

Test case in which this problem occurs:

CREATE TABLE LOG_TABLE (
    LOG_ID      BIGINT NOT NULL,
    FIELD_NAME  VARCHAR(31) NOT NULL,
    OLD_VALUE   BLOB SUB_TYPE 1 SEGMENT SIZE 80
);

CREATE TABLE TEST_TABLE (
    ID             BIGINT NOT NULL,
    FLD_01         VARCHAR(40),
    FLD_02         DATE,
    FLD_03         VARCHAR(40),
    FLD_04         DATE,
    FLD_05         VARCHAR(1000),
    FLD_06         VARCHAR(1000),
    FLD_07         NUMERIC(15,2),
    FLD_08         DATE,
    FLD_09         VARCHAR(95),
    FLD_10         BIGINT,
    FLD_11         BIGINT,
    FLD_12         BIGINT,
    FLD_13         VARCHAR(250),
    FLD_14         BIGINT,
    FLD_15         DATE,
    FLD_16         INTEGER,
    FLD_17         BIGINT,
    FLD_18         DATE,
    FLD_19         BIGINT,
    FLD_20         VARCHAR(95),
    FLD_21         DATE,
    FLD_22         BIGINT,
    FLD_23         NUMERIC(16,0),
    FLD_24         SMALLINT,
    FLD_25         BIGINT,
    FLD_26         BIGINT,
    FLD_27         SMALLINT default 0,
    FLD_28         DATE,
    FLD_29         SMALLINT,
    FLD_30         DATE,
    FLD_31         DATE,
    FLD_32         DATE,
    FLD_33         DATE,
    FLD_34         DATE,
    FLD_35         DATE,
    FLD_36         DATE,
    FLD_37         DATE,
    FLD_38         DATE,
    FLD_39         DATE,
    FLD_40         SMALLINT,
    FLD_41         SMALLINT,
    FLD_42         SMALLINT,
    FLD_43         BIGINT,
    FLD_44         BIGINT,
    FLD_45         BIGINT,
    FLD_46         SMALLINT,
    FLD_47         VARCHAR(1000),
    FLD_48         BIGINT,
    FLD_49         VARCHAR(4000),
    FLD_50         SMALLINT
);

ALTER TABLE TEST_TABLE ADD CONSTRAINT PK_TEST_TABLE PRIMARY KEY (ID) USING DESCENDING INDEX PK_TEST_TABLE;

set term ^ ;

create generator audit^

CREATE OR ALTER TRIGGER H$TEST_TABLE FOR TEST_TABLE
ACTIVE AFTER UPDATE POSITION 0
as
    declare id bigint;
    declare o BLOB SUB_TYPE 1 SEGMENT SIZE 80;
begin
    id = gen_id(audit, 1);
    o = old.FLD_01; if (o is distinct from new.FLD_01) then begin insert into LOG_TABLE(log_id, field_name, old_value) values(:id, 'FLD_01', :o); end
    o = old.FLD_02; if (o is distinct from new.FLD_02) then begin insert into LOG_TABLE(log_id, field_name, old_value) values(:id, 'FLD_02', :o); end
    
    -- ... the remaining lines for fields FLD_03..48
    
    o = old.FLD_49; if (o is distinct from new.FLD_49) then begin insert into LOG_TABLE(log_id, field_name, old_value) values(:id, 'FLD_49', :o); end
    o = old.FLD_50; if (o is distinct from new.FLD_50) then begin insert into LOG_TABLE(log_id, field_name, old_value) values(:id, 'FLD_50', :o); end
end^

commit^

-- Fill the table with any data
execute block
as
    declare id bigint;
begin
    id = 0;
    while (id < 128000) do
    begin
        insert into TEST_TABLE (ID,FLD_01,FLD_02,FLD_03,FLD_04,FLD_05,FLD_06,FLD_07,FLD_08,FLD_09,FLD_10,FLD_11,FLD_12,FLD_13,FLD_14,FLD_15,FLD_16,FLD_17,FLD_18,FLD_19,FLD_20,FLD_21,FLD_22,FLD_23,FLD_24,FLD_25,FLD_26,FLD_27,FLD_28,FLD_29,FLD_30,FLD_31,FLD_32,FLD_33,FLD_34,FLD_35,FLD_36,FLD_37,FLD_38,FLD_39,FLD_40,FLD_41,FLD_42,FLD_43,FLD_44,FLD_45,FLD_46,FLD_47,FLD_48,FLD_49,FLD_50)
            values (:id, 'Identifier', '2025-01-23', '1234567890123456789', '2025-01-23', 'Test data', 'Test another data', 500, '2025-01-24', 'Test Test Test', 3, 12345678901234, 12, 'Test', 12345678901234, '2025-01-30', 0, NULL, '2025-01-31', 12345679801234, 'Test Test data', NULL, 1234, 1234567, 2025, NULL, NULL, 0, '2025-02-01', 0, NULL, NULL, NULL, NULL, '2025-01-15', NULL, NULL, NULL, NULL, NULL, 1, 0, 0, NULL, 12346579801234, 12345678901234, NULL, 'Long test data for varchar(1000)', 12345678901234, 'Very long test data for varchar(4000)... Very long test data for varchar(4000)... Very long test data for varchar(4000)... Very long test data for varchar(4000)... Very long test data for varchar(4000)...', 0);

        id = id + 1;
    end
end^

commit^

-- Procedure for running the test
create procedure run_test(a int)
returns (t_cnt int, t_diff bigint)
as
    declare id bigint;
    declare t_begin timestamp;
    declare t_end timestamp;
begin
    t_cnt = a;
    t_begin = 'now';
    for select id from TEST_TABLE where id >= 0 and id < :t_cnt into :id do
        update TEST_TABLE set FLD_50 = :t_cnt where id = :id;
    t_end = 'now';
    t_diff = datediff(millisecond from :t_begin to :t_end);
    suspend;
end^

commit^

set term ; ^

-- Run the test with different number of records
select t.* from run_test(1000) as t;
commit;
select t.* from run_test(2000) as t;
commit;
select t.* from run_test(4000) as t;
commit;
select t.* from run_test(8000) as t;
commit;
select t.* from run_test(16000) as t;
commit;
select t.* from run_test(32000) as t;
commit;
select t.* from run_test(64000) as t;
commit;
select t.* from run_test(128000) as t;
commit;

Running the test before and after the patch gives:

CNT       BEFORE     K_B   AFTER     K_A        K
1000         937       -     648       -    1.446
2000        1824   1.947    1090   1.682    1.673
4000        4918   2.696    2326   2.134    2.114
8000       16217   3.297    4639   1.994    3.496
16000      53338   3.289    9340   2.013    5.711
32000     199328   3.737   19625   2.101   10.157
64000     826012   4.144   43653   2.224   18.922
128000   3924806   4.752   99759   2.285   39.343
  • before the patch, execution time grows unevenly (K_B, e.g. BEFORE(128000) / BEFORE(64000) = 4.752), and the more records are changed in one transaction, the more the time goes to the sky :-)
  • after the patch, execution time grows at a roughly constant rate (K_A, e.g. AFTER(128000) / AFTER(64000) = 2.285), with an overall speed-up relative to the unpatched variant (K, e.g. BEFORE(128000) / AFTER(128000) = 39.343)

Necessary for faster search of a free segment of the required size. When using temporary blobs, there are situations where a large number of small free segments accumulate during one transaction.
hvlad (Member) commented Jan 31, 2025

> This patch proposes a solution by adding a second tree, where the key is the size of the free segment and the payload is a reference to that segment in the first tree.

AFAIU, the second tree contains chains of pointers to segments of the same size. The code looks overcomplicated to me; SegmentLastPointer is definitely not the best name and adds a lot of confusion for the reader.

Did you consider implementing exactly what was described, and much simpler: a second tree with plain segment pointers, ordered by segment size?

hvlad (Member) commented Jan 31, 2025

Even better could be a map<size, position> (not <size, Segment*>): it allows getting back to the Segment in the FreeSegmentTree without allocating every single Segment separately (as it was before the patch).
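
As a sketch of this variant (same illustrative assumptions as above, not the actual code), storing the position keeps the second tree valid across node reallocation, at the cost of one extra lookup:

// Sketch of the <size, position> variant: the second tree stores positions,
// so its entries stay valid even if the first tree reallocates its nodes.
#include <cstdint>
#include <map>

using offset_t = std::uint64_t;

struct Segment { offset_t position; offset_t size; };

std::map<offset_t, Segment> freeSegments;              // keyed by position
std::multimap<offset_t, offset_t> freeSegmentsBySize;  // size -> position

Segment* getBestFit(offset_t requestedSize)
{
    const auto bySize = freeSegmentsBySize.lower_bound(requestedSize);
    if (bySize == freeSegmentsBySize.end())
        return nullptr;

    // The extra step compared to storing Segment* directly: locate the
    // segment in the first tree by its position.
    const auto byPos = freeSegments.find(bySize->second);
    return (byPos != freeSegments.end()) ? &byPos->second : nullptr;
}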

hvlad (Member) commented Feb 1, 2025

Tried to run the test with unpatched master:

select t.* from run_test(64000) as t;

Arithmetic overflow or division by zero has occurred.
arithmetic exception, numeric overflow, or string truncation.
numeric value is out of range.
At procedure RUN_TEST line: 11, col: 9.

Note: FLD_50 SMALLINT

I've fixed RUN_TEST to avoid the overflow, and my results are very different:

       T_CNT                T_DIFF
============ =====================
        1000                    16
        2000                    40
        4000                    66
        8000                   130
       16000                   327
       32000                   534
       64000                  1987
      128000                  7023

Then I set the page cache to 50000 pages and TempCacheLimit to 1G to avoid on-disk operations until commit,
and the results are:

       T_CNT                T_DIFF
============ =====================
        1000                    15
        2000                    42
        4000                    65
        8000                   135
       16000                   249
       32000                   539
       64000                  1092
      128000                  2317

It shows almost ideal growth of execution time.

XaBbl4 (Contributor, Author) commented Feb 3, 2025

> Note: FLD_50 SMALLINT

Sorry, my mistake: I initially tested on FLD_48 when preparing the result data, then changed it to FLD_50 and removed the 64k and 128k runs from the test, because the result is visible even at these values.

> I've fixed RUN_TEST to avoid the overflow, and my results are very different.

On a release build the effect is not as visible with such a sample, so the results were collected on a debug build. That is not entirely correct, but it shows the effect.

> Even better could be a map<size, position> (not <size, Segment*>): it allows getting back to the Segment in the FreeSegmentTree without allocating every single Segment separately (as it was before the patch).

The FreeSegmentTree cannot hold Segments directly again (as it was before the patch), because when the tree level changes, all objects in the tree are recreated and the internal pointers of the doubly linked list (prev, next) start pointing to nowhere.
I can change it to map<size, position>, but that adds an extra call to freeSegments.locate(position), whereas in the current version of the patch we immediately get a Segment* pointer.

Earlier I added debug counters to the code:

  • number of allocateSpace calls;
  • number of releaseSpace calls;
  • approximate number of segments in the freeSegments tree at the end of query execution;
  • number of loop iterations in allocateSpace while searching for a suitable free segment.

And on a real DB the results were as follows:

  T_CNT   allocateSpace   releaseSpace   count in tree      loop iterations
  5 000         149 000          5 000           4 500          332 000 000
 10 000         298 000         10 000           9 000        1 350 000 000
 20 000         596 000         20 000          18 000        5 440 000 000
700 000      21 000 000        700 000         630 000   ~6 615 000 000 000

It follows that almost all free segments are never reused (their size was less than 16 bytes), and with each releaseSpace call the number of passes through the loop in allocateSpace grew.
But I haven't collected this information for this test case.

hvlad (Member) commented Feb 4, 2025

Could you provide results on a RELEASE build?
So far I see no effect; perhaps another test case is needed to show it.

XaBbl4 (Contributor, Author) commented Feb 7, 2025

My work computer is weaker, and the OS may matter (I'm testing on Windows), but the results on the release build are as follows:

CNT       BEFORE     K_B     AFTER     K_A        K
1000         127   0.000       127   0.000    1.000
2000         287   2.260       186   1.465    1.543
4000         754   2.627       399   2.145    1.890
8000        2237   2.967       854   2.140    2.619
16000       7343   3.283      1648   1.930    4.456
32000      27770   3.782      3327   2.019    8.347
64000     115308   4.152      7175   2.157   16.071
128000    647598   5.616     14707   2.050   44.033

The config contained:

TempCacheLimit = 1G
DefaultDbCachePages = 50000

> It shows almost ideal growth of execution time.

I was trying to figure out why you got such good growth without the patch.
If the commit between test runs with different record counts is removed, similar results can be achieved. In that case the following results were obtained:

CNT       BEFORE     K_B     AFTER     K_A        K
1000          90   0.000        97   0.000    0.928
2000         241   2.678       187   1.928    1.289
4000         358   1.485       365   1.952    0.981
8000         744   2.078       732   2.005    1.016
16000       1560   2.097      1576   2.153    0.990
32000       3384   2.169      3164   2.008    1.070
64000       7066   2.088      6745   2.132    1.048
128000     14759   2.089     14281   2.117    1.033

These are approximately the same (within the margin of error), both with and without the patch.

There hasn't been enough time to figure out why the commit has such an impact on execution time without the patch and nearly none with it.

I also tried running the test on another, more powerful computer (with the config parameters and a commit between requests); execution is faster there, but the patch still gives a strong advantage:

CNT       BEFORE     K_B     AFTER     K_A        K
1000          43   0.000        52   0.000    0.827
2000         122   2.837       104   2.000    1.173
4000         315   2.582       207   1.990    1.522
8000         900   2.857       351   1.696    2.564
16000       2854   3.171       735   2.094    3.883
32000      10199   3.574      1471   2.001    6.933
64000      45324   4.444      3028   2.058   14.968
128000    232800   5.136      6396   2.112   36.398

I'm attaching the test case, but I'm not sure it will run as-is, since it uses a modified pytest to run two DBMS instances simultaneously.
test_185051.zip

hvlad (Member) commented Feb 9, 2025

Finally, I reproduced the issue with the original test case: initially I created trigger H$TEST_TABLE as you wrote it, for 4 fields, not for all 50. I considered that OK since the test updates FLD_50 only, but I missed that o is a blob and every assignment creates a new tiny blob.

So now I can confirm the issue and the fix. I've also tried an alternative fix (with map<size, position>, as I offered to consider) and it shows almost the same results.

I still think the code could be made a bit more understandable, and I'm looking at how to improve it.

hvlad (Member) commented Feb 16, 2025

Let me first explain how I see what this PR is about.

Free segments are organized into a set of doubly-linked lists, where each list contains segments of equal size.
The list pointers are embedded in the segment (Segment::prev, Segment::next).
Class SegmentLastPointer contains the head of the list (named 'last') and the size of the list elements.
All list heads are kept in the new BePlusTree (freeSegmentLastPointers).
The code works with each list as with a stack: new elements are inserted at the head position, and existing elements are also taken from the list head.

I consider this a nice way to add an extra index without much overhead.
The main confusion I see is that the names used in the PR don't reflect the actual structures explained above.

I offer to:

  • rename SegmentLastPointer to SegmentsList or SegmentsStack,
  • rename SegmentLastPointer::last to SegmentsList::head (SegmentsStack::head),
  • replace the typedef FreeSegmentLastPointerTree with a class SegmentBySize that contains the former FreeSegmentLastPointerTree, and move the methods lastPointerAdd() and lastPointerRemove() into this class,
  • rename them to addSegment() and removeSegment() accordingly,
  • extract the code that searches freeSegmentLastPointers in TempSpace::allocateSpace() into a separate method SegmentBySize::getSegment().

I believe such (or similar) changes make the code much clearer and easier to read and understand.
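
In code, the structure described above with the proposed names could look roughly like this (a sketch only: std::map stands in for the BePlusTree, and field types and method bodies are illustrative assumptions, not the actual patch):

// Sketch of the structure described above, using the proposed names.
// std::map stands in for the BePlusTree; details differ from the patch.
#include <cstdint>
#include <map>

using offset_t = std::uint64_t;

struct Segment
{
    offset_t position = 0;
    offset_t size = 0;
    Segment* prev = nullptr;  // list pointers embedded into the segment
    Segment* next = nullptr;
};

// One doubly-linked list, used as a stack, per distinct segment size.
struct SegmentsStack
{
    Segment* head = nullptr;  // stack top: push and pop both happen here
};

class SegmentBySize
{
public:
    void addSegment(Segment* seg)
    {
        Segment*& head = tree[seg->size].head;  // find or create the list
        seg->prev = nullptr;
        seg->next = head;                       // push onto the stack
        if (head)
            head->prev = seg;
        head = seg;
    }

    void removeSegment(Segment* seg)
    {
        if (seg->next)
            seg->next->prev = seg->prev;
        if (seg->prev)
            seg->prev->next = seg->next;
        else if ((tree[seg->size].head = seg->next) == nullptr)
            tree.erase(seg->size);              // last segment of this size
        seg->prev = seg->next = nullptr;
    }

    // The search formerly inlined in TempSpace::allocateSpace(): take the
    // smallest list whose segment size fits the request and pop its head.
    Segment* getSegment(offset_t requestedSize)
    {
        const auto it = tree.lower_bound(requestedSize);
        if (it == tree.end())
            return nullptr;

        Segment* seg = it->second.head;
        removeSegment(seg);
        return seg;
    }

private:
    std::map<offset_t, SegmentsStack> tree;     // size -> list head
};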

hvlad (Member) commented Feb 20, 2025

Looks like you (@XaBbl4) agreed with the proposed changes. So, how are we going to proceed?
I see two options:

  • accept this PR as is and I'll rework it as proposed, or
  • you do it yourself and commit into this PR.

Other opinions?

XaBbl4 (Contributor, Author) commented Feb 20, 2025

I will do this as part of the PR, but after fixing a bug found along the way: for certain operations an fb_assert is triggered, and there is a reproducible case.

hvlad (Member) commented Feb 21, 2025

Much better, thanks!

Two notes:

  • SegmentsStack::head is actually the tail, not the head, of the list ;)
    It could be confusing for new readers of the code, I think.
    Rename it, or change addSegment() and removeSegment() accordingly.

  • It would be good to add some code checking freeSegmentsBySize consistency to TempSpace::validate().

XaBbl4 (Contributor, Author) commented Feb 21, 2025

Thanks for the notes.

I renamed the variable; it is indeed more correct.

I also added to the validate function a check of the segment count against the source freeSegments tree. I think this check is enough.
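
For illustration, such a count check could look roughly like this (a sketch against the illustrative two-map layout from the earlier sketches; the actual TempSpace::validate() walks Firebird's own trees and uses fb_assert):

// Sketch of a consistency check between the two indexes (illustrative;
// the actual TempSpace::validate() works on BePlusTree structures).
#include <cassert>
#include <cstdint>
#include <map>

using offset_t = std::uint64_t;
struct Segment { offset_t position; offset_t size; };

void validateFreeSegments(const std::map<offset_t, Segment>& freeSegments,
                          const std::multimap<offset_t, offset_t>& freeSegmentsBySize)
{
    // Both trees must describe exactly the same set of free segments.
    assert(freeSegments.size() == freeSegmentsBySize.size());

    for (const auto& [size, position] : freeSegmentsBySize)
    {
        const auto it = freeSegments.find(position);
        assert(it != freeSegments.end());   // entry points at a real segment
        assert(it->second.size == size);    // and the recorded size matches
    }
}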

dyemanov merged commit 8d6c46e into FirebirdSQL:master on Feb 22, 2025. 24 checks passed.
dyemanov (Member) commented:

@hvlad, any objection to backporting into v5?

XaBbl4 deleted the add_fast_search_to_tempspace branch on February 24, 2025 07:39.
hvlad (Member) commented Feb 24, 2025

No objections, of course.

dyemanov pushed a commit that referenced this pull request Feb 25, 2025
* Add pointers tree to TempSpace class

Necessary for faster search of a free segment of the required size. When using temporary blobs, there are situations where a large number of small free segments accumulate during one transaction.

* Replace NULL with nullptr

* Refactor class and fix server crash

* Rename head to tail for better understanding

Also add consistency check in validate function

---------

Co-authored-by: Andrey Kravchenko <[email protected]>