@jonnyg23 commented Oct 31, 2025

Updates

  • Added parquet_generator.py and generate_parquet.py.
  • Integrated the new generator into generate_all_versions.py.
    • Ran the script and uploaded all generated files (~275.6 MB, the smallest format bundle).
  • Added a requirements.txt for quick Python dependency installation.
    • Updated documentation to reflect the new setup and usage.
    • Added pyarrow (for .parquet files)
    • Added pymysql (previously missing; required for the MySQL files)
  • Added a root-level .gitignore to prevent tracking of common Python environment files and folders.

Reason for Changes

  • *.parquet files are widely used in data science for efficient, compressed, columnar storage. They enable faster reads, smaller file sizes, and straightforward integration with analytics tools such as pandas, Polars, PySpark, and Databricks.
  • Added requirements.txt to simplify dependency management for both development and usage.
  • Updated the existing pip installation docs to use the package names as they appear on pypi.org.

Testing

I loaded the generated KJV.parquet file into a local Jupyter Notebook to verify that it is structured correctly.

[Screenshot: showing-parquet-has-correct-format]

Linked Issue

Resolves #112

Development

Successfully merging this pull request may close these issues.

Support for .parquet file format