A lightweight, no‑code file ingestion workflow. Configure a set of tables, get a volume path for each, and drop files into those paths—your data lands in Unity Catalog tables via Auto Loader and Lakeflow Pipeline.
Edit table configs in ./src/configs/tables.json. Only name and format are required.
Currently supported formats are csv, json, avro and parquet.
For supported format_options, see the Auto Loader options. Not all options are supported here. If unsure, specify only name and format, or follow Debug Table Issues to discover the correct options.
[
  {
    "name": "table1",
    "format": "csv",
    "format_options":
    {
      "escape": "\""
    },
    "schema_hints": "id int, name string"
  },
  {
    "name": "table2",
    "format": "json"
  }
]
Tip: Keep schema_hints minimal; Auto Loader can evolve the schema as new columns appear.
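For example, a hypothetical JSON table that receives multi-line records might combine one format option with a single hint. The table name, column, and the multiLine option below are illustrative; multiLine is a standard Auto Loader JSON option, but confirm it is among the options this bundle supports before relying on it:
[
  {
    "name": "events",
    "format": "json",
    "format_options":
    {
      "multiLine": "true"
    },
    "schema_hints": "event_time timestamp"
  }
]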
databricks bundle deploy
databricks bundle run configuration_job
Wait for the configuration job to finish before moving on.
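If you want to double-check what the bundle manages, the standard bundle commands below print validation results and a summary of deployed resources (optional; summary is available in recent Databricks CLI versions):
# validate the bundle configuration
databricks bundle validate
# list the resources the bundle deployed
databricks bundle summary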
First, grant write permissions to the volume. This enables the client to push files:
databricks bundle open filepush_volume
Fetch the volume path for uploading files to a specific table (example: table1):
databricks tables get chi_cata.filepushschema.table1 --output json \
  | jq -r '.properties["filepush.table_volume_path_data"]'
Example output:
/Volumes/chi_cata/filepushschema/chi_cata_filepushschema_filepush_volume/data/table1
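If you plan to script uploads, you can capture that path in a shell variable (the same command as above, wrapped in command substitution):
# store the table's upload path for reuse
VOLUME_PATH=$(databricks tables get chi_cata.filepushschema.table1 --output json \
  | jq -r '.properties["filepush.table_volume_path_data"]')
echo "$VOLUME_PATH"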
Upload files to the path above using any of the Volumes file APIs.
Databricks CLI example (destination uses the dbfs: scheme):
databricks fs cp /local/file/path/datafile1.csv \
  dbfs:/Volumes/chi_cata/filepushschema/chi_cata_filepushschema_filepush_volume/data/table1
REST API example:
# prerequisites: export DATABRICKS_HOST and DATABRICKS_TOKEN (a personal access token)
curl -X PUT "$DATABRICKS_HOST/api/2.0/fs/files/Volumes/chi_cata/filepushschema/chi_cata_filepushschema_filepush_volume/data/table1/datafile1.csv" \
-H "Authorization: Bearer $DATABRICKS_TOKEN" \
-H "Content-Type: application/octet-stream" \
  --data-binary @"/local/file/path/datafile1.csv"
Within about a minute, the data should appear in the table chi_cata.filepushschema.table1.
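To push many files at once, or to confirm what has landed in the volume, the same CLI file commands work against the table's data path (the --recursive flag reflects current databricks fs cp behavior; verify with databricks fs cp --help if your CLI version differs):
# upload every file in a local directory to table1's path
databricks fs cp --recursive /local/file/path/ \
  dbfs:/Volumes/chi_cata/filepushschema/chi_cata_filepushschema_filepush_volume/data/table1
# list what is currently in the table's upload path
databricks fs ls dbfs:/Volumes/chi_cata/filepushschema/chi_cata_filepushschema_filepush_volume/data/table1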
If data isn’t parsed as expected, use dev mode to iterate on table options safely.
Configure tables as in Step 1 of Quick Start.
databricks bundle deploy -t dev
databricks bundle run configuration_job -t dev
Wait for the configuration job to finish. Example output:
2025-09-23 22:03:04,938 [INFO] initialization - ==========
catalog_name: chi_cata
schema_name: dev_first_last_filepushschema
volume_path_root: /Volumes/chi_cata/dev_first_last_filepushschema/chi_cata_filepushschema_filepush_volume
volume_path_data: /Volumes/chi_cata/dev_first_last_filepushschema/chi_cata_filepushschema_filepush_volume/data
volume_path_archive: /Volumes/chi_cata/dev_first_last_filepushschema/chi_cata_filepushschema_filepush_volume/archive
==========
Note: In dev mode, the schema name is prefixed. Use the printed schema name for the remaining steps.
Get the dev volume path (note the prefixed schema):
databricks tables get chi_cata.dev_first_last_filepushschema.table1 --output json \
  | jq -r '.properties["filepush.table_volume_path_data"]'
Example output:
/Volumes/chi_cata/dev_first_last_filepushschema/chi_cata_filepushschema_filepush_volume/data/table1
Then follow the upload instructions from Quick Start → Step 3 to send test files.
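For example, a single test file can be pushed with the same CLI command, just pointed at the dev path (the local file name is illustrative):
databricks fs cp /local/file/path/datafile1.csv \
  dbfs:/Volumes/chi_cata/dev_first_last_filepushschema/chi_cata_filepushschema_filepush_volume/data/table1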
Open the pipeline in the workspace:
databricks bundle open refresh_pipeline -t dev
Click Edit pipeline to launch the development UI. Open the debug_table_config notebook and follow its guidance to refine the table options. When satisfied, copy the final config back to ./src/configs/tables.json.
Redeploy the updated config and run a full refresh to correct existing data for an affected table:
databricks bundle deploy
databricks bundle run refresh_pipeline --full-refresh table1
That's it! You now have a managed, push-based file ingestion workflow with debuggable table configs and repeatable deployments.
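Optional cleanup: once the production config is updated, the dev copy of the bundle can be removed. This deletes the bundle-managed dev resources (including the prefixed dev schema), so review the confirmation prompt carefully:
databricks bundle destroy -t dev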