kafka1991
Contributor

Use a non-blocking stdin check to prevent hanging.
Closes #260 and #152.

Changelog category (leave one):

  • Bug Fix

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@kafka1991 kafka1991 changed the title from "fix(jupyter):use non-blocking stdin check to prevent hanging" to "Fix(jupyter):use non-blocking stdin check to prevent hanging" Sep 16, 2025
@auxten
Member

auxten commented Sep 16, 2025

It would be better to have some tests that insert data from a Jupyter Notebook.

int flags = fcntl(fd, F_GETFL);
if (flags != -1)
{
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
Member

This changes the behavior of the stdin empty check. The original one blocks and waits for data if no EOL is found.
I still think this change is risky when processing large chunks of input data over slow I/O.
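To make the concern concrete, here is a minimal Python sketch of what a non-blocking probe can observe. It is not the chDB or ClickHouse code: the helpers `set_nonblocking` and `probe` are hypothetical names, and an `os.pipe()` stands in for stdin. It assumes a POSIX system, since it relies on the `fcntl` module. The point is that a single non-blocking read cannot tell "the writer is slow" apart from "no data will ever arrive"; both show up as a would-block result.

```python
import fcntl
import os


def set_nonblocking(fd: int) -> None:
    """Add O_NONBLOCK to the descriptor's status flags (same two fcntl calls as in the diff)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def probe(fd: int) -> str:
    """Classify one non-blocking read attempt. Note that it consumes one byte if data is present."""
    try:
        chunk = os.read(fd, 1)
    except BlockingIOError:
        # EAGAIN: nothing buffered right now, but the writer is still open.
        return "would_block"
    return "data" if chunk else "eof"


# A pipe stands in for stdin (assumption made for illustration only).
read_fd, write_fd = os.pipe()
set_nonblocking(read_fd)

slow_writer_state = probe(read_fd)   # writer has not produced anything yet

os.write(write_fd, b"1,2,3\n")
ready_state = probe(read_fd)         # data is buffered now

os.close(write_fd)
os.read(read_fd, 1024)               # drain the remaining bytes
eof_state = probe(read_fd)           # writer closed and buffer empty
os.close(read_fd)

print(slow_writer_state, ready_state, eof_state)
```

If the caller treats the would-block case as "stdin is empty", a slow producer on a pipe would be misread as having no input, which is the risk raised in this comment.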

Contributor Author

After checking, I can confirm:

  • The std_in check here is used only for INSERT statements on the clickhouse-local client side; engines such as CSV do not use it when reading.

  • The interaction code between Python and C++ does not use the std_in fd.

"\n",
"# Insert from INFILE and VALUES got stuck either\n",
"# chs.query(\"INSERT INTO embeddings FROM INFILE 'movie_embeddings.csv' FORMAT CSV\")\n",
"chs.query(\"INSERT INTO embeddings VALUES (1, [1,2,3,4,5,6,7,8,9,10])\")\n",
Member

It might be better to generate a larger CSV file to test this.
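One way to produce such a file is sketched below. This is only an illustration, not the test added in this PR: `make_embeddings_csv` is a hypothetical helper, and the two-column layout (an integer id plus a bracketed array of floats written as a quoted CSV field) is an assumption based on the `embeddings` table used elsewhere in this PR.

```python
import csv
import io
import random


def make_embeddings_csv(rows: int, dim: int = 10, seed: int = 0) -> str:
    """Return CSV text with `rows` lines of the form: movieId,"[v1,v2,...]"."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same file
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for movie_id in range(1, rows + 1):
        values = ",".join(f"{rng.random():.6f}" for _ in range(dim))
        # The array contains commas, so csv.writer wraps the field in double quotes.
        writer.writerow([movie_id, f"[{values}]"])
    return buf.getvalue()


csv_text = make_embeddings_csv(rows=1000)
print(len(csv_text.splitlines()), "rows generated")
```

Writing `csv_text` to a file and raising `rows` gives an input of whatever size the test needs.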

Contributor Author

Done, added more tests.

@kafka1991
Contributor Author

kafka1991 commented Oct 2, 2025

@auxten

It's worth noting that the following code doesn't work correctly: unlike clickhouse-local, we can't accept and insert data from standard input.

    def testStdin(self):
        print("start")
        from chdb import session
        chs = session.Session()
        chs.query("CREATE DATABASE IF NOT EXISTS test ENGINE = Atomic")
        chs.query("USE test")
        chs.query('DROP TABLE IF EXISTS embeddings')
        time.sleep(60)
        chs.query("""CREATE TABLE embeddings
                     (
                         movieId   UInt32 NOT NULL,
                         embedding Array(Float32) NOT NULL
                     ) ENGINE = MergeTree()
                     ORDER BY movieId""")
        chs.query("""INSERT INTO embeddings FORMAT CSV""")
        count = chs.query('SELECT COUNT(*) as count FROM embeddings')
        print(f"Records: {count}")

Run in a terminal:

cat tests/movie_embeddings.csv | python tests/test_query_py.py testStdin()

This raises the following exception:

RuntimeError: Code: 108. DB::Exception: Code: 108. DB::Exception: No data to insert. (NO_DATA_TO_INSERT) (version 25.5.2.1). (NO_DATA_TO_INSERT)

This issue already exists on our main branch; it was introduced by commit 6123ace. Reading INSERT data from standard input is mutually exclusive with the non-blocking check needed for Jupyter Notebook, so we can confirm that it is not supported.

Development

Successfully merging this pull request may close these issues.

Using session.query("insert xxx") will stuck