njupopsicle

Hello authors, as mentioned in #17, the database names in train/test.parquet are inconsistent with those of the actual SynSQL-2.5M databases, which breaks the training process. To fix this issue and make SQL-R1 more reproducible, I added example_data/fix_db_names.py, which was generated by ChatGPT and can fix all inconsistent database names using a minimum edit distance algorithm.
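
For readers who want a feel for the approach, here is a minimal sketch of minimum edit distance matching. It is not the contents of example_data/fix_db_names.py; the function names (`edit_distance`, `closest_db_name`, `build_rename_map`) and the sample database names are hypothetical, and it assumes the two sets of names are already loaded as plain lists of strings.

```python
from typing import Dict, Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_db_name(name: str, candidates: Iterable[str]) -> str:
    """Return the candidate with the smallest edit distance to name."""
    candidates = list(candidates)
    if not candidates:
        raise ValueError("no candidate database names given")
    return min(candidates, key=lambda c: edit_distance(name, c))


def build_rename_map(parquet_names: Iterable[str],
                     actual_names: Iterable[str]) -> Dict[str, str]:
    """Map each parquet name with no exact match to its closest actual name."""
    actual = sorted(set(actual_names))  # sorted so ties resolve deterministically
    return {n: closest_db_name(n, actual)
            for n in set(parquet_names) if n not in actual}


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_rename_map(["movie_db", "music_db"], ["movies_db", "music_db"]))
```

Nearest-name matching can pair unrelated databases when no close match exists, so it may be worth reviewing the resulting map (or rejecting matches above a distance threshold) before rewriting the parquet files.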
