
@davidshepherd7

Hi!

We're in the process of migrating to cockroachdb. We found that naively creating dev databases using sqlalchemy and cockroachdb was dramatically slower than postgres (~20 minutes vs ~30 seconds). We narrowed most of this time down to CREATE INDEX statements.

When we talked with @data-matt, he suggested doing index creation inside CREATE TABLE rather than as separate statements. This makes a huge difference, bringing our overall db initialisation time down to ~2m30s.
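To make the two shapes of DDL concrete, here is a minimal sketch in plain Python that builds both variants as strings. The helper names `separate_ddl` and `inline_ddl` are hypothetical and are not part of sqlalchemy or sqlalchemy-cockroachdb; identifier quoting and every index option are ignored.

```python
from typing import Dict, List


def separate_ddl(table: str, columns: Dict[str, str], indexes: Dict[str, List[str]]) -> List[str]:
    """One CREATE TABLE followed by one CREATE INDEX statement per index."""
    cols = ", ".join(f"{name} {type_}" for name, type_ in columns.items())
    statements = [f"CREATE TABLE {table} ({cols})"]
    for index_name, index_cols in indexes.items():
        statements.append(
            f"CREATE INDEX {index_name} ON {table} ({', '.join(index_cols)})"
        )
    return statements


def inline_ddl(table: str, columns: Dict[str, str], indexes: Dict[str, List[str]]) -> List[str]:
    """A single CREATE TABLE whose body also carries the INDEX clauses."""
    parts = [f"{name} {type_}" for name, type_ in columns.items()]
    parts += [
        f"INDEX {index_name} ({', '.join(index_cols)})"
        for index_name, index_cols in indexes.items()
    ]
    return [f"CREATE TABLE {table} ({', '.join(parts)})"]


if __name__ == "__main__":
    columns = {"invoice_number": "INT PRIMARY KEY", "account_number": "INT"}
    indexes = {"ix_invoice_account_number": ["account_number"]}
    for statement in separate_ddl("invoice", columns, indexes):
        print(statement)
    for statement in inline_ddl("invoice", columns, indexes):
        print(statement)
```

The inline variant sends one statement per table no matter how many indexes the table has, which is the property the speedup relies on.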

So I've experimented with getting sqlalchemy to do it this way. The only approach I could find is to have visit_create_table do the index creation and visit_create_index be a no-op. I've got some prototype code here which works for how we use sqlalchemy at Wave.

I don't know sqlalchemy's internals very well so I'm kind of uncertain about my approach, in particular:

  • Are there important cases where sqlalchemy could emit DDL for the index without emitting it for the table, e.g. does it have any native migration generation which does this?
  • I can't see any places in built-in sqlalchemy DDL Compilers where they return a no-op from a visit function. Is this just a completely insane idea?

Do you have any ideas/thoughts?

Then my other question: if this works, how do we release it? Presumably this is a breaking change, so do we need to put it behind some kind of config flag?

@davidshepherd7 davidshepherd7 changed the title from "Moving index creation inside CREATE TABLE for massive database creation speedup" to "Move index creation inside CREATE TABLE for massive database creation speedup" on Jul 18, 2025
index = element.target
assert isinstance(index, Index)
was_created = index.info.get("_cockroachdb_index_created_by_create_table", False)
assert was_created
Author

@davidshepherd7 davidshepherd7 Jul 18, 2025


If we do need to handle emitting CREATE INDEX in cases where we aren't also creating the corresponding table, then we might be able to do that here by doing something like:

if not was_created:
    return compiler.visit_create_index(...)

(Assuming that sqlalchemy always does index creations after the corresponding table creation.)
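The marker-plus-fallback idea can be sketched without sqlalchemy at all. Everything below is a toy model: `ToyIndex`, `toy_visit_create_table` and `toy_visit_create_index` are hypothetical stand-ins for the real `Index` object and compiler hooks, and only the marker key is taken from the diff above. Whether sqlalchemy tolerates an empty string returned from a visit function is not verified here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Same key as in the prototype diff; it lives in the index's info dict.
MARKER = "_cockroachdb_index_created_by_create_table"


@dataclass
class ToyIndex:
    """Stand-in for sqlalchemy's Index: a name, a table, columns, and an info dict."""
    name: str
    table: str
    columns: List[str]
    info: Dict[str, object] = field(default_factory=dict)


def toy_visit_create_table(table: str, column_sql: List[str], indexes: List[ToyIndex]) -> str:
    """Emit CREATE TABLE with inline INDEX clauses and mark each index as handled."""
    parts = list(column_sql)
    for index in indexes:
        parts.append(f"INDEX {index.name} ({', '.join(index.columns)})")
        index.info[MARKER] = True  # remember that CREATE TABLE covered this index
    return f"CREATE TABLE {table} ({', '.join(parts)})"


def toy_visit_create_index(index: ToyIndex) -> str:
    """No-op for indexes already emitted inline, plain CREATE INDEX otherwise."""
    if index.info.get(MARKER, False):
        return ""  # assumption: an empty string stands for "emit nothing"
    return f"CREATE INDEX {index.name} ON {index.table} ({', '.join(index.columns)})"
```

One thing the toy makes visible: the marker lives on the index object and outlives the CREATE TABLE that set it, so whether a later standalone CREATE INDEX for the same object should still be skipped (for example after the table was dropped and recreated by other means) is left open.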

Collaborator


We still need to be able to emit CREATE INDEX statements because an Alembic migration might want to add an index to an existing column. With the changes I proposed in wavemm#1, this code works with the current master branch (bc87688):

from alembic.migration import MigrationContext
from alembic.operations import Operations
import sqlalchemy as sa

myengine = sa.create_engine("cockroachdb://root@localhost:26257/defaultdb")
conn = myengine.connect()
ctx = MigrationContext.configure(conn)
op = Operations(ctx)

op.drop_table("invoice", if_exists=True)
op.create_table(
    "invoice",
    sa.Column("invoice_number", sa.Integer(), nullable=False),
    sa.Column("account_number", sa.Integer(), nullable=True),
    sa.PrimaryKeyConstraint("invoice_number"),
)
op.create_index(op.f("ix_invoice_account_number"), "invoice", ["account_number"], unique=False)

but your modified version fails with

Traceback (most recent call last):
  File "/home/gord/git/sqlalchemy-cockroachdb/.gord_stuff/alembic_op.py", line 20, in <module>
    op.create_index(op.f("ix_invoice_account_number"), "invoice", ["account_number"], unique=False)
  File "<string>", line 3, in create_index
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/alembic/operations/ops.py", line 1013, in create_index
    return operations.invoke(op)
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/alembic/operations/base.py", line 441, in invoke
    return fn(self, operation)
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/alembic/operations/toimpl.py", line 112, in create_index
    operations.impl.create_index(idx, **kw)
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/alembic/ddl/postgresql.py", line 99, in create_index
    self._exec(CreateIndex(index, **kw))
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/alembic/ddl/impl.py", line 246, in _exec
    return conn.execute(construct, params)
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1415, in execute
    return meth(
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/sqlalchemy/sql/ddl.py", line 187, in _execute_on_connection
    return connection._execute_ddl(
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1523, in _execute_ddl
    compiled = ddl.compile(
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 308, in compile
    return self._compiler(dialect, **kw)
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/sqlalchemy/sql/ddl.py", line 76, in _compiler
    return dialect.ddl_compiler(dialect, self, **kw)
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/sqlalchemy/sql/compiler.py", line 886, in __init__
    self.string = self.process(self.statement, **compile_kwargs)
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/sqlalchemy/sql/compiler.py", line 932, in process
    return obj._compiler_dispatch(self, **kwargs)
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/sqlalchemy/ext/compiler.py", line 538, in <lambda>
    lambda *arg, **kw: existing(*arg, **kw),
  File "/home/gord/git/sqlalchemy-cockroachdb/.venv/lib/python3.9/site-packages/sqlalchemy/ext/compiler.py", line 591, in __call__
    expr = fn(element, compiler, **kw)
  File "/home/gord/git/sqlalchemy-cockroachdb/sqlalchemy_cockroachdb/ddl_compiler.py", line 67, in visit_create_index
    assert was_created
AssertionError

IDX_USING = re.compile(r"^(?:btree|hash|gist|gin|[\w_]+)$", re.I)


# Heavily based on DDLCompiler.visit_create_index
Author

@davidshepherd7 davidshepherd7 Jul 18, 2025


This is almost PostgresqlDDLCompiler.visit_create_index. Differences are:

  • Removing the CREATE and ON {table_name} bits
  • Replacing USING with INVERTED (seems to be needed for crdb?)
  • Removing/commenting out some features that I don't think crdb supports.

In the final version I would clean this up a lot more.

I don't think we should attempt to reuse PostgresqlDDLCompiler.visit_create_index - I think the string munging required for that would be quite brittle.
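As an illustration of building the clause directly instead of post-processing a rendered CREATE INDEX string, a minimal sketch might look like the following. `inline_index_clause` is a hypothetical helper, not the code in this PR. Mapping `gin` to INVERTED follows the note above and is an assumption, rejecting other USING values is a choice made only for this sketch, and combinations of options are not validated.

```python
from typing import List, Optional


def inline_index_clause(
    name: str,
    columns: List[str],
    unique: bool = False,
    using: Optional[str] = None,
) -> str:
    """Render an index definition suitable for the body of CREATE TABLE.

    There is no leading CREATE and no ON <table>: the enclosing
    CREATE TABLE statement supplies both.
    """
    method = (using or "").lower()
    if method in ("gin", "inverted"):
        inverted = True  # assumption: gin-style indexes become INVERTED
    elif method in ("", "btree"):
        inverted = False
    else:
        # Sketch-only behaviour: refuse anything this toy does not model.
        raise ValueError(f"unsupported index method for inline clause: {using!r}")

    keywords = []
    if unique:
        keywords.append("UNIQUE")
    if inverted:
        keywords.append("INVERTED")
    keywords.append("INDEX")
    return f"{' '.join(keywords)} {name} ({', '.join(columns)})"
```

Because the clause is assembled from the index's attributes, nothing has to be cut out of an already rendered statement, which is the brittleness the comment above is worried about.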

@data-matt

@rafiss

@dikshant

@gordthompson would you mind taking a look at this?

@dikshant dikshant requested a review from gordthompson July 28, 2025 15:23
Collaborator

@gordthompson gordthompson left a comment


After making the changes noted below I got

$ pytest -k test_create_table /home/gord/git/sqlalchemy-cockroachdb/test/test_suite_sqlalchemy.py

to run. (2 tests passed, 4 ignored, 6 tests total)

@@ -1,5 +1,19 @@
from sqlalchemy import exc
Collaborator


Missing import re statement above this line.

Comment on lines +11 to +16
from sqlalchemy_cockroachdb.base import (  # type: ignore[import-untyped]
    CockroachDBDialect,
)
from sqlalchemy_cockroachdb.ddl_compiler import (  # type: ignore[import-untyped]
    CockroachDDLCompiler,
)
Collaborator


I had to remove these two imports to avoid

ImportError: cannot import name 'CockroachDBDialect' from partially initialized module 'sqlalchemy_cockroachdb.base' (most likely due to a circular import) (/home/gord/git/sqlalchemy-cockroachdb/sqlalchemy_cockroachdb/base.py)
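If those names are needed only for type annotations, one common alternative to deleting the imports is to guard them with `typing.TYPE_CHECKING`. The sketch below assumes that annotation-only use, which this thread does not confirm; `describe_dialect` is a hypothetical function used only to show the pattern.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers. This branch never runs at import
    # time, so it cannot take part in a circular import.
    from sqlalchemy_cockroachdb.base import CockroachDBDialect


def describe_dialect(dialect: "CockroachDBDialect") -> str:
    """Hypothetical helper: the class name appears only in a string annotation."""
    return type(dialect).__name__
```

The annotation is written as a string so that it is never evaluated at runtime. If the module needs the classes at runtime, this pattern does not apply and the import order itself has to change.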

@davidshepherd7
Author

Hi @gordthompson, thanks for the review and for getting the tests passing.

Do you have any thoughts on the general approach taken here? e.g. whether it's likely to have unexpected effects on edge case uses or cause maintenance difficulties in the future?

If not I can clean this up to something more ready to merge.

@gordthompson
Collaborator

> Do you have any thoughts on the general approach taken here?

The general approach seems reasonable to me, but

> Presumably this is a breaking change, so do we need to put it behind some kind of config flag?

yes, I agree that the change is significant enough that it probably should be an opt-in feature.

@davidshepherd7
Author

Great, thanks! I'll keep tinkering with this stuff inside our codebase for a while longer.

I'll probably want to upstream it around/before we start serious use of cockroachdb in production, so I'll aim to clean this PR up before then. So anytime from the next few weeks up to the end of this year.
