
Commit 39437f2

Merge branch 'main' of github.com:apache/iceberg-python into fd-support-initial-value

2 parents: e99707d + b85127e


76 files changed: +3117 −1967 lines

.github/dependabot.yml

Lines changed: 2 additions & 2 deletions

```diff
@@ -22,9 +22,9 @@ updates:
   - package-ecosystem: "pip"
     directory: "/"
     schedule:
-      interval: "daily"
+      interval: "weekly"
     open-pull-requests-limit: 50
   - package-ecosystem: "github-actions"
     directory: "/"
     schedule:
-      interval: "daily"
+      interval: "weekly"
```

.github/workflows/pypi-build-artifacts.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -62,7 +62,7 @@ jobs:
         if: startsWith(matrix.os, 'ubuntu')

       - name: Build wheels
-        uses: pypa/[email protected]
+        uses: pypa/[email protected]
        with:
          output-dir: wheelhouse
          config-file: "pyproject.toml"
```

.github/workflows/python-ci.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -58,6 +58,8 @@ jobs:
          python-version: ${{ matrix.python }}
          cache: poetry
          cache-dependency-path: ./poetry.lock
+      - name: Install system dependencies
+        run: sudo apt-get update && sudo apt-get install -y libkrb5-dev # for kerberos
      - name: Install
        run: make install-dependencies
      - name: Linters
```

.github/workflows/python-integration.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -50,6 +50,8 @@ jobs:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2
+      - name: Install system dependencies
+        run: sudo apt-get update && sudo apt-get install -y libkrb5-dev # for kerberos
      - name: Install
        run: make install
      - name: Run integration tests
```

.github/workflows/svn-build-artifacts.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -57,7 +57,7 @@ jobs:
         if: startsWith(matrix.os, 'ubuntu')

       - name: Build wheels
-        uses: pypa/[email protected]
+        uses: pypa/[email protected]
        with:
          output-dir: wheelhouse
          config-file: "pyproject.toml"
```

Makefile

Lines changed: 1 addition & 1 deletion

```diff
@@ -19,7 +19,7 @@
 help: ## Display this help
 	@awk 'BEGIN {FS = ":.*##"; printf "\nUsage:\n  make \033[36m<target>\033[0m\n"} /^[a-zA-Z_-]+:.*?##/ { printf "  \033[36m%-20s\033[0m %s\n", $$1, $$2 } /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) } ' $(MAKEFILE_LIST)

-POETRY_VERSION = 2.0.1
+POETRY_VERSION = 2.1.1
 install-poetry: ## Ensure Poetry is installed and the correct version is being used.
 	@if ! command -v poetry &> /dev/null; then \
 		echo "Poetry could not be found. Installing..."; \
```

dev/Dockerfile

Lines changed: 3 additions & 3 deletions

```diff
@@ -39,20 +39,20 @@ WORKDIR ${SPARK_HOME}
 # Remember to also update `tests/conftest`'s spark setting
 ENV SPARK_VERSION=3.5.4
 ENV ICEBERG_SPARK_RUNTIME_VERSION=3.5_2.12
-ENV ICEBERG_VERSION=1.8.0
+ENV ICEBERG_VERSION=1.9.0-SNAPSHOT
 ENV PYICEBERG_VERSION=0.9.0

 RUN curl --retry 5 -s -C - https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop3.tgz -o spark-${SPARK_VERSION}-bin-hadoop3.tgz \
     && tar xzf spark-${SPARK_VERSION}-bin-hadoop3.tgz --directory /opt/spark --strip-components 1 \
     && rm -rf spark-${SPARK_VERSION}-bin-hadoop3.tgz

 # Download iceberg spark runtime
-RUN curl --retry 5 -s https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}/${ICEBERG_VERSION}/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar \
+RUN curl --retry 5 -s https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.9.0-SNAPSHOT/iceberg-spark-runtime-3.5_2.12-1.9.0-20250409.001855-44.jar \
     -Lo /opt/spark/jars/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar


 # Download AWS bundle
-RUN curl --retry 5 -s https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/${ICEBERG_VERSION}/iceberg-aws-bundle-${ICEBERG_VERSION}.jar \
+RUN curl --retry 5 -s https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-aws-bundle/1.9.0-SNAPSHOT/iceberg-aws-bundle-1.9.0-20250409.002731-88.jar \
     -Lo /opt/spark/jars/iceberg-aws-bundle-${ICEBERG_VERSION}.jar

 COPY spark-defaults.conf /opt/spark/conf
```

dev/provision.py

Lines changed: 85 additions & 74 deletions

```diff
@@ -14,6 +14,7 @@
 # KIND, either express or implied. See the License for the
 # specific language governing permissions and limitations
 # under the License.
+import math

 from pyspark.sql import SparkSession
 from pyspark.sql.functions import current_date, date_add, expr
@@ -113,89 +114,99 @@
     """
 )

-spark.sql(
-    f"""
-  CREATE OR REPLACE TABLE {catalog_name}.default.test_positional_mor_deletes (
-    dt date,
-    number integer,
-    letter string
-  )
-  USING iceberg
-  TBLPROPERTIES (
-    'write.delete.mode'='merge-on-read',
-    'write.update.mode'='merge-on-read',
-    'write.merge.mode'='merge-on-read',
-    'format-version'='2'
-  );
-"""
-)
+# Merge on read has been implemented in version ≥2:
+# v2: Using positional deletes
+# v3: Using deletion vectors

-spark.sql(
-    f"""
-  INSERT INTO {catalog_name}.default.test_positional_mor_deletes
-  VALUES
-    (CAST('2023-03-01' AS date), 1, 'a'),
-    (CAST('2023-03-02' AS date), 2, 'b'),
-    (CAST('2023-03-03' AS date), 3, 'c'),
-    (CAST('2023-03-04' AS date), 4, 'd'),
-    (CAST('2023-03-05' AS date), 5, 'e'),
-    (CAST('2023-03-06' AS date), 6, 'f'),
-    (CAST('2023-03-07' AS date), 7, 'g'),
-    (CAST('2023-03-08' AS date), 8, 'h'),
-    (CAST('2023-03-09' AS date), 9, 'i'),
-    (CAST('2023-03-10' AS date), 10, 'j'),
-    (CAST('2023-03-11' AS date), 11, 'k'),
-    (CAST('2023-03-12' AS date), 12, 'l');
-"""
-)
+for format_version in [2, 3]:
+    identifier = f'{catalog_name}.default.test_positional_mor_deletes_v{format_version}'
+    spark.sql(
+        f"""
+      CREATE OR REPLACE TABLE {identifier} (
+        dt date,
+        number integer,
+        letter string
+      )
+      USING iceberg
+      TBLPROPERTIES (
+        'write.delete.mode'='merge-on-read',
+        'write.update.mode'='merge-on-read',
+        'write.merge.mode'='merge-on-read',
+        'format-version'='{format_version}'
+      );
+    """
+    )
+
+    spark.sql(
+        f"""
+      INSERT INTO {identifier}
+      VALUES
+        (CAST('2023-03-01' AS date), 1, 'a'),
+        (CAST('2023-03-02' AS date), 2, 'b'),
+        (CAST('2023-03-03' AS date), 3, 'c'),
+        (CAST('2023-03-04' AS date), 4, 'd'),
+        (CAST('2023-03-05' AS date), 5, 'e'),
+        (CAST('2023-03-06' AS date), 6, 'f'),
+        (CAST('2023-03-07' AS date), 7, 'g'),
+        (CAST('2023-03-08' AS date), 8, 'h'),
+        (CAST('2023-03-09' AS date), 9, 'i'),
+        (CAST('2023-03-10' AS date), 10, 'j'),
+        (CAST('2023-03-11' AS date), 11, 'k'),
+        (CAST('2023-03-12' AS date), 12, 'l');
+    """
+    )

-spark.sql(f"ALTER TABLE {catalog_name}.default.test_positional_mor_deletes CREATE TAG tag_12")
+    spark.sql(f"ALTER TABLE {identifier} CREATE TAG tag_12")

-spark.sql(f"ALTER TABLE {catalog_name}.default.test_positional_mor_deletes CREATE BRANCH without_5")
+    spark.sql(f"ALTER TABLE {identifier} CREATE BRANCH without_5")

-spark.sql(f"DELETE FROM {catalog_name}.default.test_positional_mor_deletes.branch_without_5 WHERE number = 5")
+    spark.sql(f"DELETE FROM {identifier}.branch_without_5 WHERE number = 5")

-spark.sql(f"DELETE FROM {catalog_name}.default.test_positional_mor_deletes WHERE number = 9")
+    spark.sql(f"DELETE FROM {identifier} WHERE number = 9")

-spark.sql(
-    f"""
-  CREATE OR REPLACE TABLE {catalog_name}.default.test_positional_mor_double_deletes (
-    dt date,
-    number integer,
-    letter string
-  )
-  USING iceberg
-  TBLPROPERTIES (
-    'write.delete.mode'='merge-on-read',
-    'write.update.mode'='merge-on-read',
-    'write.merge.mode'='merge-on-read',
-    'format-version'='2'
-  );
-"""
-)
+    identifier = f'{catalog_name}.default.test_positional_mor_double_deletes_v{format_version}'

-spark.sql(
-    f"""
-  INSERT INTO {catalog_name}.default.test_positional_mor_double_deletes
-  VALUES
-    (CAST('2023-03-01' AS date), 1, 'a'),
-    (CAST('2023-03-02' AS date), 2, 'b'),
-    (CAST('2023-03-03' AS date), 3, 'c'),
-    (CAST('2023-03-04' AS date), 4, 'd'),
-    (CAST('2023-03-05' AS date), 5, 'e'),
-    (CAST('2023-03-06' AS date), 6, 'f'),
-    (CAST('2023-03-07' AS date), 7, 'g'),
-    (CAST('2023-03-08' AS date), 8, 'h'),
-    (CAST('2023-03-09' AS date), 9, 'i'),
-    (CAST('2023-03-10' AS date), 10, 'j'),
-    (CAST('2023-03-11' AS date), 11, 'k'),
-    (CAST('2023-03-12' AS date), 12, 'l');
-"""
-)
+    spark.sql(
+        f"""
+      CREATE OR REPLACE TABLE {identifier} (
+        dt date,
+        number integer,
+        letter string
+      )
+      USING iceberg
+      TBLPROPERTIES (
+        'write.delete.mode'='merge-on-read',
+        'write.update.mode'='merge-on-read',
+        'write.merge.mode'='merge-on-read',
+        'format-version'='2'
+      );
+    """
+    )

-spark.sql(f"DELETE FROM {catalog_name}.default.test_positional_mor_double_deletes WHERE number = 9")
+    spark.sql(
+        f"""
+      INSERT INTO {identifier}
+      VALUES
+        (CAST('2023-03-01' AS date), 1, 'a'),
+        (CAST('2023-03-02' AS date), 2, 'b'),
+        (CAST('2023-03-03' AS date), 3, 'c'),
+        (CAST('2023-03-04' AS date), 4, 'd'),
+        (CAST('2023-03-05' AS date), 5, 'e'),
+        (CAST('2023-03-06' AS date), 6, 'f'),
+        (CAST('2023-03-07' AS date), 7, 'g'),
+        (CAST('2023-03-08' AS date), 8, 'h'),
+        (CAST('2023-03-09' AS date), 9, 'i'),
+        (CAST('2023-03-10' AS date), 10, 'j'),
+        (CAST('2023-03-11' AS date), 11, 'k'),
+        (CAST('2023-03-12' AS date), 12, 'l');
+    """
+    )

-spark.sql(f"DELETE FROM {catalog_name}.default.test_positional_mor_double_deletes WHERE letter == 'f'")
+    # Perform two deletes, should produce:
+    # v2: two positional delete files in v2
+    # v3: one deletion vector since they are merged
+    spark.sql(f"DELETE FROM {identifier} WHERE number = 9")
+    spark.sql(f"DELETE FROM {identifier} WHERE letter == 'f'")

 all_types_dataframe = (
     spark.range(0, 5, 1, 5)
```
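Outside the diff, a minimal verification sketch of the tables this script provisions, assuming a PyIceberg catalog named `default` is configured (e.g. in `~/.pyiceberg.yaml`) and points at the same warehouse; the expectations follow the DELETE statements above:

```python
from pyiceberg.catalog import load_catalog

# Assumption: a catalog named "default" is configured and reachable.
catalog = load_catalog("default")

# Check the v2 table; reading the v3 variant additionally requires
# deletion-vector read support on the PyIceberg side.
tbl = catalog.load_table("default.test_positional_mor_deletes_v2")
numbers = tbl.scan().to_arrow()["number"].to_pylist()

# number = 9 was deleted on the main branch; number = 5 only on the
# without_5 branch, so it should still be visible here.
assert 9 not in numbers
assert 5 in numbers
```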

mkdocs/docs/api.md

Lines changed: 11 additions & 0 deletions

````diff
@@ -215,6 +215,17 @@ static_table = StaticTable.from_metadata(

 The static-table is considered read-only.

+Alternatively, if your table metadata directory contains a `version-hint.text` file, you can just specify
+the table root path, and the latest metadata file will be picked automatically.
+
+```python
+from pyiceberg.table import StaticTable
+
+static_table = StaticTable.from_metadata(
+    "s3://warehouse/wh/nyc.db/taxis"
+)
+```
+
 ## Check if a table exists

 To check whether the `bids` table exists:
````
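As a follow-on to the doc change above (not part of the diff), a short usage sketch; the warehouse path is the illustrative one from the docs:

```python
from pyiceberg.table import StaticTable

# Illustrative path; assumes the directory holds a version-hint.text file
# pointing at the latest metadata JSON.
static_table = StaticTable.from_metadata("s3://warehouse/wh/nyc.db/taxis")

# A StaticTable is read-only, so only scans are supported.
first_rows = static_table.scan(limit=10).to_arrow()
print(first_rows)
```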

mkdocs/docs/configuration.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -189,7 +189,7 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya
 | s3.access-key-id            | admin          | Configure the static access key id used to access the FileIO.     |
 | s3.secret-access-key        | password       | Configure the static secret access key used to access the FileIO. |
 | s3.session-token            | AQoDYXdzEJr... | Configure the static session token used to access the FileIO.     |
-| s3.force-virtual-addressing | True           | Whether to use virtual addressing of buckets. This must be set to True as OSS can only be accessed with virtual hosted style address. |
+| s3.force-virtual-addressing | True           | Whether to use virtual addressing of buckets. This is set to `True` by default as OSS can only be accessed with virtual hosted style address. |

 <!-- markdown-link-check-enable-->
```
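For context (not part of the diff), a hedged example of overriding this property when loading a catalog; the catalog name, endpoint, and credentials below are placeholders:

```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "default",  # placeholder catalog name
    **{
        "s3.endpoint": "http://localhost:9000",  # placeholder endpoint
        "s3.access-key-id": "admin",             # placeholder credentials
        "s3.secret-access-key": "password",
        # Override the default when the target store needs
        # path-style (non-virtual-hosted) addressing.
        "s3.force-virtual-addressing": "false",
    },
)
```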
