Add .parquet file generator & files #111
Open
Updates
- parquet_generator.py and generate_parquet.py.
- generate_all_versions.py.
- requirements.txt for quick Python dependency installation.
  - pyarrow (for .parquet files)
  - pymysql (missing for mysql files)
- .gitignore to prevent tracking of common Python environment files and folders.
Reason for Changes
- *.parquet files are widely used in the data science industry for efficient, compressed, columnar data storage. They enable faster reads, smaller file sizes, and seamless integration with analytics tools like Pandas, Polars, PySpark, and Databricks.
- requirements.txt to simplify dependency management for both development and usage.
Testing
I loaded the generated KJV.parquet file into a local Jupyter Notebook to verify that it is structured correctly.
Linked Issue
Resolves #112