Add BeginInsert/InsertData/EndInsert flow #443

theory · 2025-10-30T22:02:46Z

Add a new pattern for "prepared inserts". It works like this:

* Call `BeginInsert` with an `INSERT` query with optional columns and ending in `VALUES`. No values should be included in the string.
* It returns a `Block` pre-configured with columns as declared in the `INSERT` statement.
* Add data to the block and periodically call `InsertData` to insert data and clear the block.
* Call `EndInsert()` or just let the `Client` object go out of scope to signal the server that it's done inserting.

This allows one to send smaller batches of blocks, thereby using less memory, but still in a single ClickHouse INSERT operation.

Expected to be useful in the Postgres foreign data wrapper insert API, where a single statement can insert multiple rows but the API hands them over one at a time. It will also support the FDW COPY API, which can submit huge batches of data to insert.
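The flow described above can be sketched with a small self-contained mock. `MockClient`, `MockBlock`, and `StreamRows` are hypothetical stand-ins written for illustration, not the clickhouse-cpp types, and the mock takes column names directly rather than parsing an `INSERT` statement. The sketch only shows how periodic flushing bounds the number of rows held in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a block of rows; not the library's Block.
struct MockBlock {
    std::vector<std::string> column_names;
    std::vector<std::vector<std::string>> rows;
    void Clear() { rows.clear(); }
};

// Hypothetical stand-in for the client; it only counts what it is given.
class MockClient {
public:
    // Start an insert and return a block pre-configured with the columns.
    MockBlock BeginInsert(std::vector<std::string> columns) {
        inserting_ = true;
        MockBlock block;
        block.column_names = std::move(columns);
        return block;
    }

    // "Send" the current batch, then clear the block so it can be reused.
    void InsertData(MockBlock& block) {
        assert(inserting_);
        rows_sent_ += block.rows.size();
        ++batches_sent_;
        block.Clear();
    }

    // Signal that the insert is finished.
    void EndInsert() { inserting_ = false; }

    std::size_t rows_sent() const { return rows_sent_; }
    std::size_t batches_sent() const { return batches_sent_; }
    bool inserting() const { return inserting_; }

private:
    bool inserting_ = false;
    std::size_t rows_sent_ = 0;
    std::size_t batches_sent_ = 0;
};

// Stream `total` rows in batches of at most `batch_size` (assumed > 0).
// Returns the largest number of rows the block held at any one time.
std::size_t StreamRows(MockClient& client, std::size_t total, std::size_t batch_size) {
    MockBlock block = client.BeginInsert({"id", "name"});
    std::size_t peak = 0;
    for (std::size_t i = 0; i < total; ++i) {
        block.rows.push_back({std::to_string(i), "row"});
        peak = std::max(peak, block.rows.size());
        if (block.rows.size() == batch_size) {
            client.InsertData(block);  // flush a full batch
        }
    }
    if (!block.rows.empty()) {
        client.InsertData(block);      // flush the final partial batch
    }
    client.EndInsert();
    return peak;
}
```

With 10 rows and a batch size of 4, the mock receives three batches and the block never holds more than 4 rows.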

theory · 2025-10-30T22:05:42Z

clickhouse/client.cpp

+            if (chtype->GetCode() == Type::LowCardinality) {
+                chtype = col->As<ColumnLowCardinality>()->GetNestedType();
+            }


I'm honestly not sure this is the right thing to do. Might one need Type::LowCardinality?

Actually I think we can probably do away with this elision of LowCardinality if we can fix this issue. I can't figure out what to construct to append there. The error from Append there is:

no suitable user-defined conversion from "clickhouse::ItemView" to "clickhouse::ColumnRef" (aka "std::__1::shared_ptr<clickhouse::Column>") exists C/C++(312)

theory · 2025-10-30T22:06:23Z

clickhouse/client.cpp

+
+    void FinishInsert();
+
+    void SendData(const Block& block);


I had to move this to public so that PreparedInsert can call it. Not in the header file, though, so shouldn't matter.


theory · 2025-10-30T22:08:12Z

clickhouse/client.h

+    public:
+        Block * GetBlock();
+        void Execute();
+        // XXX This shouldn't be public.


I couldn't figure out how to make this private. Suggestions appreciated.

Would be nice if it worked declared public in the .cpp file, but I think I could also use an Impl class like Client does to hide such things.
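The Impl-class idea mentioned above is usually called the pimpl idiom. A minimal sketch follows, assuming a hypothetical `Inserter` class that is not part of the PR: the public class exposes only what callers need, while helpers live in a nested `Impl` that would be defined in the .cpp file.

```cpp
#include <cassert>
#include <memory>

// --- What would go in the header ---
// Hypothetical class for illustration; not the PR's code.
class Inserter {
public:
    Inserter();
    ~Inserter();              // defined below, where Impl is a complete type
    void Add(int value);
    int Total() const;

private:
    class Impl;               // forward declaration only; details stay hidden
    std::unique_ptr<Impl> impl_;
};

// --- What would go in the .cpp file ---
class Inserter::Impl {
public:
    void Add(int value) {
        Validate(value);
        total_ += value;
    }
    int Total() const { return total_; }

private:
    // Internal helper that never appears in the header.
    void Validate(int value) const { assert(value >= 0); }
    int total_ = 0;
};

Inserter::Inserter() : impl_(std::make_unique<Impl>()) {}
Inserter::~Inserter() = default;
void Inserter::Add(int value) { impl_->Add(value); }
int Inserter::Total() const { return impl_->Total(); }
```

Because `Impl` is only forward-declared in the header, its members can change without touching the public interface, and nothing in it needs to be marked public for outside callers.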

Copilot

Pull Request Overview

This PR introduces a PreparedInsert pattern for more memory-efficient bulk data insertion. Instead of accumulating all data before sending, users can now prepare an INSERT statement once and execute multiple smaller batches within a single ClickHouse operation.

Key Changes:

* Added PreparedInsert class with GetBlock(), Execute(), and Finish() methods for iterative data insertion
* Implemented PrepareInsert() methods in Client for initiating prepared inserts
* Added comprehensive unit test demonstrating the prepared insert workflow

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| clickhouse/client.h | Declared `PreparedInsert` nested class and `PrepareInsert()` methods with detailed documentation |
| clickhouse/client.cpp | Implemented `PreparedInsert` class methods, `ReceivePreparePackets()`, and refactored insert finalization logic |
| clickhouse/block.h | Fixed spelling in comments ("Convinience" → "Convenience") |
| ut/client_ut.cpp | Added `PrepareInsert` test case and fixed spelling in existing comment ("Spontaneosly" → "Spontaneously") |

slabko

Thank you very much for contributing this feature. It has been on the list for quite some time, and I’m glad someone has started looking into it.

However, I have a few remarks.

In general, if you look at the codebase, there is no manual memory management: instead of using new and delete, we rely on std::unique_ptr and std::shared_ptr to manage heap-allocated resources. In fact, the delete keyword is never used anywhere in the project. Manual memory management in the PreparedInsert class creates a dangerous situation where a PreparedInsert can be inadvertently copied. The compiler automatically generates the copy constructor and the copy assignment operator, which make shallow copies of the pointers and can ultimately cause a double free if users are not careful. This can easily happen by accident.
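The copy hazard described here can be illustrated with two small hypothetical classes (`RawOwner` and `UniqueOwner` are invented for this sketch and are not from the PR). A class that owns a raw pointer is still copyable by default, while a `std::unique_ptr` member makes the class move-only without any extra code.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical payload type for illustration.
struct Buffer {
    std::vector<int> data;
};

// Owns a raw pointer. The compiler still generates copy operations,
// so a copy would share buf_ and both destructors would delete it.
class RawOwner {
public:
    RawOwner() : buf_(new Buffer) {}
    ~RawOwner() { delete buf_; }

private:
    Buffer* buf_;
};

// Owns the buffer through std::unique_ptr. Copying is disabled
// automatically; moving transfers ownership.
class UniqueOwner {
public:
    UniqueOwner() : buf_(std::make_unique<Buffer>()) {}
    Buffer& buffer() { return *buf_; }

private:
    std::unique_ptr<Buffer> buf_;
};

static_assert(std::is_copy_constructible<RawOwner>::value,
              "raw-pointer owner is silently copyable");
static_assert(!std::is_copy_constructible<UniqueOwner>::value,
              "unique_ptr member disables copy construction");
static_assert(!std::is_copy_assignable<UniqueOwner>::value,
              "unique_ptr member disables copy assignment");
static_assert(std::is_move_constructible<UniqueOwner>::value,
              "ownership can still be moved");
```

The `static_assert` lines check the property at compile time, so an accidental copy of `UniqueOwner` is a build error rather than a runtime double free.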

My second remark is a bit tougher. I know you’ve put thought and care into this design, but I’ll have to ask for large changes. The PreparedInsert is not needed here, and the API is simpler without it. The insert operation should be simple and not require many visible moving parts. Ideally, I would approach it like this:

Block block = client.BeginInsert("INSERT INTO test_clickhouse_cpp_insert VALUES");
for (const auto& td : TEST_DATA) {
    id->Append(td.id);
    name->Append(td.name);
    f->Append(td.f);
}
client.SendData(block);
...
client.SendData(block);
...
client.SendData(block);
client.EndInsert();

The main points here are:

* BeginInsert and EndInsert clearly form a pair and serve one another.
* It's unambiguous that no other insert or select statements should occur between them. The current PreparedInsert design leaves room for passing the PreparedInsert around, which risks losing the connection state and using the client object for something else in the meantime. The proposed pattern enforces a clear principle: one operation → one connection → one client object. Need another parallel operation? Create another client.
* Here the Block object is detached, and ownership is passed to the user code. The user knows it's not an internal part of PreparedInsert and can freely modify it if needed.
* You can still preserve automatic EndInsert behavior when the client goes out of scope by tracking its state: if it's in insert mode, call EndInsert in the destructor.
* I would avoid using the word Prepare... here, because it seems to convey a somewhat different idea from what we are trying to achieve.
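The state-tracking point can be sketched as follows. `MockClient` is a hypothetical stand-in that records calls in a log, and rejecting other operations while an insert is open is an assumption made for this sketch, not confirmed behavior of the library.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical client that tracks whether an insert is in progress.
class MockClient {
public:
    explicit MockClient(std::vector<std::string>& log) : log_(log) {}

    // If the client goes out of scope mid-insert, finish the insert.
    ~MockClient() {
        if (inserting_) {
            EndInsert();
        }
    }

    // One operation, one connection, one client: no copies.
    MockClient(const MockClient&) = delete;
    MockClient& operator=(const MockClient&) = delete;

    void BeginInsert() {
        if (inserting_) {
            throw std::logic_error("insert already in progress");
        }
        inserting_ = true;
        log_.push_back("begin");
    }

    // Assumption for this sketch: other operations are refused mid-insert.
    void Select() {
        if (inserting_) {
            throw std::logic_error("cannot select during an insert");
        }
        log_.push_back("select");
    }

    void EndInsert() {
        if (!inserting_) {
            return;  // nothing to finish
        }
        inserting_ = false;
        log_.push_back("end");
    }

private:
    std::vector<std::string>& log_;
    bool inserting_ = false;
};
```

Letting a `MockClient` leave scope after `BeginInsert()` appends "end" to the log, which mirrors the automatic EndInsert behavior described in the list above.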

Thank you again for your work. Please let me know if you’d like any help, I’d be happy to assist.

theory · 2025-11-05T15:54:45Z

Thank you for the design suggestions. I'll work on them this afternoon.

theory · 2025-11-05T19:29:00Z

Done in a91ff8a.

theory · 2025-11-05T19:31:27Z

clickhouse/client.h

+     */
+    std::unique_ptr<Block> BeginInsert(const std::string& query);
+    std::unique_ptr<Block> BeginInsert(const std::string& query, const std::string& query_id);
+    void InsertData(Block& block);


Holler if you'd rather pass a std::unique_ptr<Block>. Seems okay to me to pass a *block instead, but I'm not yet up to snuff on idiomatic C++.

Does BeginInsert have to return std::unique_ptr? It seems to me that it doesn't need to be a pointer at all, i.e.:

Block BeginInsert(const std::string& query);

InsertData looks good, except it should be

void InsertData(const Block& block);

But InsertData does modify the block, by design:

void Client::Impl::InsertData(Block& block) {
    assert(inserting);
    block.RefreshRowCount();
    SendData(block);
    block.Clear();
}

Would you rather that refreshing the count and clearing be done by the caller?

Switched to returning a Block in 0a3da16, and also moved the docs to the README. Diff.

No, only new Block() would need delete. The destructor handles cleaning up the vector's heap allocation.

Changed to void InsertData(const Block& block); in 17467b7.
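One consequence of taking `const Block&` is that the sending function can no longer refresh the row count or clear the block, so those steps fall to the caller. The sketch below shows that split using hypothetical `MockBlock`, `SendConst`, and `FlushBatch` names. It is an assumption about how caller-side code could look, not a description of what commit 17467b7 does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block with a cached row count; not the library's Block.
struct MockBlock {
    std::vector<int> ids;
    std::size_t rows = 0;

    void RefreshRowCount() { rows = ids.size(); }
    void Clear() {
        ids.clear();
        rows = 0;
    }
};

// Takes the block by const reference, so it can only read it.
// Returns the number of rows it "sent".
std::size_t SendConst(const MockBlock& block) {
    return block.rows;
}

// Caller-side helper: refresh the count, send, then clear for reuse.
void FlushBatch(MockBlock& block, std::size_t& sent) {
    block.RefreshRowCount();
    sent += SendConst(block);
    block.Clear();
}
```

Keeping the mutation in a caller-side helper leaves the send function free of side effects on its argument, at the cost of one more step for users to remember.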

theory force-pushed the insert-block branch 5 times, most recently from 51d8216 to c93c844 Compare October 31, 2025 20:50

serprex approved these changes Nov 3, 2025

mshustov requested review from Copilot and slabko November 4, 2025 08:25

Copilot AI reviewed Nov 4, 2025

theory force-pushed the insert-block branch from c93c844 to d2e84c7 Compare November 4, 2025 18:02

slabko requested changes Nov 4, 2025

theory force-pushed the insert-block branch from d2e84c7 to a91ff8a Compare November 5, 2025 19:28

theory changed the title ~~Add PreparedInsert flow~~ Add BeginInsert/InsertData/EndInsert flow Nov 5, 2025

theory requested review from serprex and slabko November 5, 2025 19:32

theory force-pushed the insert-block branch 5 times, most recently from d054c36 to 17467b7 Compare November 7, 2025 21:38

theory force-pushed the insert-block branch from 17467b7 to d3b7dcd Compare November 7, 2025 21:41
