Skip to content

Commit 70793f5

Browse files
eadgbearosipovartemYaroslavLitvinovVedinrampage644
authored
(WIP) Metastore + Unified Query 1.0 (#312)
* Initial metastore crate * Metastore implementation * Upgrading Slatedb to 0.4, adding snapshot tests * Checkpoint commit * Checkpoint before crate re-org * Fix timestamp displaying format (#279) * fix timestamp format * Revert timestamp change * in login request use database_name instead of warehouse (#277) * Adding missing licenses + license check (#278) * Adding missing licenses + license check * Fixing license config path * Fix IPC arrow format issue (#280) * Fix IPC arrow format issue * Fix IPC arrow format issue * Fix comments * Control plane tests (batch 3) (#274) * Add CP service tests * Add create/delete session tests * Add create/delete session tests * Use converted S3 error * Move DBT_SERIALIZATION_FORMAT to cli (#286) * Move dbt_ser_fmt to cli * Rename the field * Update query runner with session id (#285) * Issue 115 Arrow support: replace uints by ints in resulted RecordBatch (#287) * replace uints by ints in resulted RecordBatch * cargo clippy --workspace --tests * rebase & fixes * Set warehouse location based on storage profile path (#288) * map float types to real logical_type (#289) * map float types to real logical_type * separate sessions tests to avoid blocking issues * unit test for floats datatype maped into real logical type * Snowflake timestamp_from_parts udf (#297) * Inital version * snowflake timestamp_from_parts udf * Fix clippy * Fix example typo * Support negative values * Snowflake time_from_parts udf (#298) * Inital version * snowflake timestamp_from_parts udf * snowflake time_from_parts udf * snowflake time_from_parts udf * Fix clippy * Fix clippy * Fix example typo * Support negative values * Merge * snowflake date_from_parts udf (#299) * Inital version * snowflake timestamp_from_parts udf * snowflake time_from_parts udf * snowflake time_from_parts udf * snowflake date_from_parts udf * Fix clippy * Implement extraction of the corresponding date part from a date or timestamp (#302) * Implement extracts the corresponding date part from a date or timestamp * Add units for visit expressions * Add geospatial udfs (#304) * Fix object store for custom endpoint (#295) AWS S3 builder uses dedicated setting to allow http. Parse endpoint and enable if it isn't https. * Unified query, execution, catalog * Adding temporary tables to metastore * Custom DataFusion Catalog implementation (WIP) * Adding execution User Session type * Checkpoint * Catalog implementation for IB * Crate re-org complete, writing tests * Changing metastore crate to use iceberg_rust types * Metastore crate cleanup * Create table query fixed * Fixing copy_into query * Finished updating query type * Cleaning up service type * Wrapping up changes to the execution crate * Bumping Datafusion git ref * Fixing up after package reorg * Fixed DBT handler * Removing old crates * Disabling some UI handlers and models until new metastore integration is done * Disabling extra OpenAPI endpoints and some unit tests while test are fixed * Deleting manifest files that snuck into the repo (my bad) * Final compilation and clippy fixes * `cargo fix` * `cargo fmt` * Adding license blocks to new files * Adding missing licenses + license check (#278) * Adding missing licenses + license check * Fixing license config path * Fix IPC arrow format issue (#280) * Fix IPC arrow format issue * Fix IPC arrow format issue * Fix comments * Control plane tests (batch 3) (#274) * Add CP service tests * Add create/delete session tests * Add create/delete session tests * Use converted S3 error * Move DBT_SERIALIZATION_FORMAT to cli (#286) * Move dbt_ser_fmt to cli * Rename the field * Snowflake timestamp_from_parts udf (#297) * Inital version * snowflake timestamp_from_parts udf * Fix clippy * Fix example typo * Support negative values * Implement extraction of the corresponding date part from a date or timestamp (#302) * Implement extracts the corresponding date part from a date or timestamp * Add units for visit expressions * Add geospatial udfs (#304) * patch chrono 0.4.40 as it breaks build (#310) * Added support for `day` and other datetime keywords in DF (#294) * start here * no macro, just simple vistor mut * remvoe redundant trait creation * integration test add, fixed logic, next unit tests * activated impl and cargo fmt + clippy * added unit tests + impl extension * cargo fmt + clippy * cargo clippy * cargo clippy * removed redundant check * fix * merged with Artem PR * cargo fmt + clippy * small fixes * cargo fmt + clippy * update hash commis for datafusion, iceberg-rust (#301) * Update tests.yml (#313) Adding github large runners to tests * Return hardcoded is_dynamic (#318) * Add geospatial UDF st_makeline (#315) * Add geospatial udfs * Add geospatial st_makeline * Fix cargo * Add lic header * Change allocator to snmalloc (#317) * Add ST_MakePolygon, st_polygon (#319) * Add ST_MakePolygon, st_polygon * Add docs * Add ST_DIMENSION (#321) * Add ST_MakePolygon, st_polygon * Add docs * Add ST_dimention * Add license * Add ST_PointN and ST_Endpoint (#322) * Add ST_MakePolygon, st_polygon * Add docs * Add ST_dimention * Add license * Add ST_Endpoint, ST_PointN * Reimplement ST_Point to work with snowflake linestring * Add St_x, st_y, st_srid (#324) * Add ST_MakePolygon, st_polygon * Add docs * Add ST_dimention * Add license * Add ST_Endpoint, ST_PointN * Reimplement ST_Point to work with snowflake linestring * Add St_x, st_y, st_srid * Moving UDF registration * Changing Metastore to use Dashmap * Adding missing geospatial functions after merge * Fixing unit test structure, still need to fix some unit tests * Post-merge cleanup --------- Co-authored-by: Artem Osipov <[email protected]> Co-authored-by: Yaroslav Litvinov <[email protected]> Co-authored-by: Denys Tsomenko <[email protected]> Co-authored-by: Sergei Turukin <[email protected]> Co-authored-by: DanCodedThis <[email protected]>
1 parent 21265cb commit 70793f5

File tree

134 files changed

+9747
-5628
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

134 files changed

+9747
-5628
lines changed

.gitignore

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,4 +35,8 @@ warehouse-prefix/
3535
**/*.parquet
3636
crates/catalog/prefix/**
3737
embucket-warehouse-test/
38-
dummyprefix/
38+
dummyprefix/
39+
40+
*.manifest
41+
*.sst
42+
*.manifest.json

Cargo.toml

Lines changed: 8 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,7 @@
22
default-members = ["bin/bucketd"]
33
members = [
44
"bin/bucketd",
5-
"crates/catalog",
6-
"crates/control_plane",
7-
"crates/nexus",
5+
"crates/metastore",
86
"crates/runtime",
97
"crates/utils",
108
]
@@ -21,9 +19,9 @@ axum-macros = "0.5"
2119
tokio = { version = "1", features = ["full"] }
2220
async-trait = { version = "0.1.84" }
2321
serde = { version = "1.0", features = ["derive"] }
24-
slatedb = { version = "0.2.0" }
22+
slatedb = { version = "0.4.0" }
23+
bytes = { version = "1.8.0" }
2524
snmalloc-rs = { version = "0.3" }
26-
bytes = { version = "1" }
2725
object_store = { version = "0.11.1", features = ["aws", "gcp", "azure"] }
2826
serde_json = "1.0"
2927
serde_yaml = "0.9"
@@ -46,10 +44,10 @@ datafusion-doc = { version = "45.0.0" }
4644
datafusion-macros = { version = "45.0.0" }
4745

4846
datafusion-functions-json = { git = "https://github.com/Embucket/datafusion-functions-json.git", rev = "f2b9b88cd9b4282bc0286a970333e8c01cec177b" }
49-
50-
iceberg-rust = { git = "https://github.com/Embucket/iceberg-rust.git", rev = "1d486d6127ba18dc2dced7ed5f4c403dd8a6a8b9" }
51-
iceberg-rest-catalog = { git = "https://github.com/Embucket/iceberg-rust.git", rev = "1d486d6127ba18dc2dced7ed5f4c403dd8a6a8b9" }
52-
datafusion_iceberg = { git = "https://github.com/Embucket/iceberg-rust.git", rev = "1d486d6127ba18dc2dced7ed5f4c403dd8a6a8b9" }
47+
iceberg-rust = { git = "https://github.com/Embucket/iceberg-rust.git", rev = "1bfc71ee1353f91773ab7354d64a8ca2c3c3bb3d" }
48+
iceberg-rust-spec = { git = "https://github.com/Embucket/iceberg-rust.git", rev = "1bfc71ee1353f91773ab7354d64a8ca2c3c3bb3d" }
49+
iceberg-rest-catalog = { git = "https://github.com/Embucket/iceberg-rust.git", rev = "1bfc71ee1353f91773ab7354d64a8ca2c3c3bb3d" }
50+
datafusion_iceberg = { git = "https://github.com/Embucket/iceberg-rust.git", rev = "1bfc71ee1353f91773ab7354d64a8ca2c3c3bb3d" }
5351

5452
[patch.crates-io]
5553
datafusion = { git = "https://github.com/Embucket/datafusion.git", rev = "eecc47ec20fec83158bd71611e799cb7d373646f" }
@@ -76,3 +74,4 @@ missing_errors_doc = { level = "allow", priority = 2 }
7674
missing_panics_doc = { level = "allow", priority = 2 }
7775
significant_drop_tightening = { level = "allow", priority = 2 }
7876
module_name_repetitions = { level = "allow", priority = 2 }
77+
option_if_let_else = {level = "allow", priority = 2}

bin/bucketd/Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ license-file = {workspace = true}
88
[dependencies]
99
clap = { version = "4.5.27", features = ["env", "derive"] }
1010
dotenv = { version = "0.15.0" }
11-
nexus = { path = "../../crates/nexus" }
11+
icebucket_runtime = { path = "../../crates/runtime" }
1212
object_store = { workspace = true }
1313
tokio = { workspace = true }
1414
snmalloc-rs = { workspace = true }

bin/bucketd/src/cli.rs

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@ use object_store::{
2222
};
2323
use std::fs;
2424
use std::path::PathBuf;
25+
use std::sync::Arc;
2526

2627
#[derive(Parser)]
2728
#[command(version, about, long_about=None)]
@@ -145,7 +146,7 @@ enum StoreBackend {
145146

146147
impl IceBucketOpts {
147148
#[allow(clippy::unwrap_used, clippy::as_conversions)]
148-
pub fn object_store_backend(self) -> ObjectStoreResult<Box<dyn ObjectStore>> {
149+
pub fn object_store_backend(self) -> ObjectStoreResult<Arc<dyn ObjectStore>> {
149150
match self.backend {
150151
StoreBackend::S3 => {
151152
let s3_allow_http = self.allow_http.unwrap_or(false);
@@ -162,11 +163,11 @@ impl IceBucketOpts {
162163
.with_endpoint(&endpoint)
163164
.with_allow_http(s3_allow_http)
164165
.build()
165-
.map(|s3| Box::new(s3) as Box<dyn ObjectStore>)
166+
.map(|s3| Arc::new(s3) as Arc<dyn ObjectStore>)
166167
} else {
167168
s3_builder
168169
.build()
169-
.map(|s3| Box::new(s3) as Box<dyn ObjectStore>)
170+
.map(|s3| Arc::new(s3) as Arc<dyn ObjectStore>)
170171
}
171172
}
172173
StoreBackend::File => {
@@ -176,9 +177,9 @@ impl IceBucketOpts {
176177
fs::create_dir(path).unwrap();
177178
}
178179
LocalFileSystem::new_with_prefix(file_storage_path)
179-
.map(|fs| Box::new(fs) as Box<dyn ObjectStore>)
180+
.map(|fs| Arc::new(fs) as Arc<dyn ObjectStore>)
180181
}
181-
StoreBackend::Memory => Ok(Box::new(InMemory::new()) as Box<dyn ObjectStore>),
182+
StoreBackend::Memory => Ok(Arc::new(InMemory::new()) as Arc<dyn ObjectStore>),
182183
}
183184
}
184185
}

bin/bucketd/src/main.rs

Lines changed: 19 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,11 @@ pub(crate) mod cli;
1919

2020
use clap::Parser;
2121
use dotenv::dotenv;
22+
use icebucket_runtime::{
23+
config::{IceBucketDbConfig, IceBucketRuntimeConfig},
24+
http::config::IceBucketWebConfig,
25+
run_icebucket,
26+
};
2227
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};
2328

2429
#[global_allocator]
@@ -60,17 +65,20 @@ async fn main() {
6065
Ok(object_store) => {
6166
tracing::info!("Starting 🧊🪣 IceBucket...");
6267

63-
if let Err(e) = nexus::run_icebucket(
64-
object_store,
65-
slatedb_prefix,
66-
host,
67-
port,
68-
allow_origin,
69-
&dbt_serialization_format,
70-
)
71-
.await
72-
{
73-
tracing::error!("Failed to start IceBucket: {:?}", e);
68+
let runtime_config = IceBucketRuntimeConfig {
69+
db: IceBucketDbConfig {
70+
slatedb_prefix: slatedb_prefix.clone(),
71+
},
72+
web: IceBucketWebConfig {
73+
host: host.clone(),
74+
port,
75+
allow_origin: allow_origin.clone(),
76+
data_format: dbt_serialization_format.clone(),
77+
},
78+
};
79+
80+
if let Err(e) = run_icebucket(object_store, runtime_config).await {
81+
tracing::error!("Error while running IceBucket: {:?}", e);
7482
}
7583
}
7684
}

crates/catalog/src/error.rs

Lines changed: 0 additions & 64 deletions
This file was deleted.

0 commit comments

Comments
 (0)