2 changes: 2 additions & 0 deletions .dockerignore
@@ -6,3 +6,5 @@
**/.terraform
**/node_modules
**/.terraform
**/docs/_build
**/htmlcov
74 changes: 74 additions & 0 deletions datasets/nasa-nex-gddp-cmip6-netcdf/Dockerfile
@@ -0,0 +1,74 @@
FROM ubuntu:20.04

# Setup timezone info
ENV TZ=UTC

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

RUN apt-get update && apt-get install -y software-properties-common

RUN add-apt-repository ppa:ubuntugis/ppa && \
apt-get update && \
apt-get install -y build-essential python3-dev python3-pip \
jq unzip ca-certificates wget curl git && \
apt-get autoremove && apt-get autoclean && apt-get clean

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 10

# See https://github.com/mapbox/rasterio/issues/1289
ENV CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt

# Install Mambaforge, then use it to install Python 3.10 and GDAL
RUN curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh" \
&& bash "Mambaforge-$(uname)-$(uname -m).sh" -b -p /opt/conda \
&& rm -rf "Mambaforge-$(uname)-$(uname -m).sh"

ENV PATH /opt/conda/bin:$PATH
ENV LD_LIBRARY_PATH /opt/conda/lib/:$LD_LIBRARY_PATH

RUN mamba install -y -c conda-forge python=3.10 gdal=3.3.3 pip setuptools cython numpy==1.21.5

RUN python -m pip install --upgrade pip

# Install common packages
COPY requirements-task-base.txt /tmp/requirements.txt
RUN python -m pip install --no-build-isolation -r /tmp/requirements.txt

#
# Copy and install packages
#

COPY pctasks/core /opt/src/pctasks/core
RUN cd /opt/src/pctasks/core && \
pip install .

COPY pctasks/cli /opt/src/pctasks/cli
RUN cd /opt/src/pctasks/cli && \
pip install .

COPY pctasks/task /opt/src/pctasks/task
RUN cd /opt/src/pctasks/task && \
pip install .

COPY pctasks/client /opt/src/pctasks/client
RUN cd /opt/src/pctasks/client && \
pip install .

COPY pctasks/ingest /opt/src/pctasks/ingest
RUN cd /opt/src/pctasks/ingest && \
pip install .

COPY pctasks/dataset /opt/src/pctasks/dataset
RUN cd /opt/src/pctasks/dataset && \
pip install .

COPY ./datasets/nasa-nex-gddp-cmip6-netcdf/requirements.txt /opt/src/datasets/nasa-nex-gddp-cmip6-netcdf/requirements.txt
RUN python3 -m pip install -r /opt/src/datasets/nasa-nex-gddp-cmip6-netcdf/requirements.txt

# Setup Python Path to allow import of test modules
ENV PYTHONPATH=/opt/src:$PYTHONPATH

WORKDIR /opt/src
44 changes: 44 additions & 0 deletions datasets/nasa-nex-gddp-cmip6-netcdf/README.md
@@ -0,0 +1,44 @@
# planetary-computer-tasks dataset: nasa-nex-gddp-cmip6-netcdf

NASA NEX GDDP CMIP6 Dataset

## Building the Docker image

To build and push a custom Docker image to our container registry:

```shell
az acr build -r {the registry} --subscription {the subscription} -t pctasks-nasa-nex-gddp-cmip6-netcdf:latest -t pctasks-nasa-nex-gddp-cmip6-netcdf:{date}.{count} -f datasets/nasa-nex-gddp-cmip6-netcdf/Dockerfile .
```

## Version Information

The upstream provider will occasionally update certain assets in the dataset
(e.g. the `pr` variable will be updated for some models). We want to host just
the latest version of each asset.

The code in `nasa_nex_gddp_cmip6.py` lists files under a prefix and discovers
the latest version of each asset. Those latest-version files are then read and
passed to the STAC item creation method.
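
As a rough illustration of that discovery step, here is a minimal sketch that
groups paths by their unversioned name and keeps the highest version. It
assumes `_v1.1`-style suffixes on otherwise identical file names; the
authoritative logic lives in `nasa_nex_gddp_cmip6.py` and may differ.

```python
import re
from collections import defaultdict


def latest_versions(paths: list[str]) -> list[str]:
    """Keep the newest version of each asset, grouping by unversioned name.

    Hypothetical helper: assumes versions appear as an optional ``_vX.Y``
    suffix just before the ``.nc`` extension (e.g. ``pr_day_..._2015_v1.1.nc``).
    """
    version_re = re.compile(r"_v(\d+(?:\.\d+)*)(?=\.nc$)")
    groups: dict[str, list[tuple[tuple[int, ...], str]]] = defaultdict(list)
    for path in paths:
        match = version_re.search(path)
        version = tuple(int(p) for p in match.group(1).split(".")) if match else (0,)
        key = version_re.sub("", path)  # path with the version suffix removed
        groups[key].append((version, path))
    # For each group, keep the path with the highest version tuple.
    return [max(candidates)[1] for candidates in groups.values()]


# latest_versions(["pr_day_x_2015.nc", "pr_day_x_2015_v1.1.nc"])
# -> ["pr_day_x_2015_v1.1.nc"]
```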

## Static update

This collection is not regularly updated.

```console
$ pctasks dataset process-items \
-d datasets/nasa-nex-gddp-cmip6-netcdf/dataset.yaml \
nasa-nex-gddp-cmip-test \
--arg registry pccomponents.azurecr.io \
--upsert --submit
```

## Kerchunk Index Files

We provide "experimental" Kerchunk index files and include a
[kerchunk-workflow](./kerchunk-workflow.yaml) for generating them; a minimal
generation sketch follows the notes below.


**Notes:**

- Currently uses a chunk size of one because item creation was timing out with a chunk size of 100; intermediate chunk sizes have not been investigated.
- Runs in about 10 hours.
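
Below is a minimal sketch of generating a single Kerchunk reference file,
assuming the `kerchunk` and `fsspec` packages are available; the asset URL and
output path are hypothetical, and the actual process is defined in
[kerchunk-workflow](./kerchunk-workflow.yaml).

```python
import json

import fsspec
from kerchunk.hdf import SingleHdf5ToZarr

# Hypothetical asset URL; real assets live in Azure Blob Storage.
url = "https://example.blob.core.windows.net/nasa-nex-gddp-cmip6/pr_day_example.nc"

# Translate the NetCDF (HDF5) chunk layout into Kerchunk references:
# a JSON mapping of Zarr keys to byte ranges in the original file.
with fsspec.open(url, mode="rb") as f:
    refs = SingleHdf5ToZarr(f, url).translate()

with open("pr_day_example_reference.json", "w") as out:  # hypothetical output path
    json.dump(refs, out)
```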