Skip to content

Conversation

antscloud
Copy link
Contributor

As discussed in #8302, here is a first attempt to implement cftime_rs.

There are a lot of tests and I struggle to understand all the processing in coding/times.py.
However, with this first attempt I've been able to make the test_cf_datetime work (ignoring one test)

@requires_cftime
@pytest.mark.filterwarnings("ignore:Ambiguous reference date string")
@pytest.mark.filterwarnings("ignore:Times can't be serialized faithfully")
@pytest.mark.parametrize(["num_dates", "units", "calendar"], _CF_DATETIME_TESTS)
def test_cf_datetime(num_dates, units, calendar) -> None:

Also there are some key differences betwwen cftime and cftime-rs :

  • A long int is used to represent the timestamp internally, so cftime-rs will not overflow as soon as numpy, python or cftime. It can go from -291,672,107,014 BC to 291,672,107,014 AD approximately and this depends on calendar.
  • There is no only_use_python_datetimes argument. Instead there are 4 distinct functions :
  • These functions only take a python list of one dimension and return a list of one dimension. A conversion should be done before.
  • There is no multiple datetime type (there are hidden) but instead a single object PyCFDatetime
  • There is no conda repository at the moment

Finally, and regardless of this PR, I guess there could be a speed improvement by vectorizing operations by replacing this :

def _encode_datetime_with_cftime(dates, units: str, calendar: str) -> np.ndarray:
"""Fallback method for encoding dates using cftime.
This method is more flexible than xarray's parsing using datetime64[ns]
arrays but also slower because it loops over each element.
"""
if cftime is None:
raise ModuleNotFoundError("No module named 'cftime'")
if np.issubdtype(dates.dtype, np.datetime64):
# numpy's broken datetime conversion only works for us precision
dates = dates.astype("M8[us]").astype(datetime)
def encode_datetime(d):
# Since netCDF files do not support storing float128 values, we ensure
# that float64 values are used by setting longdouble=False in num2date.
# This try except logic can be removed when xarray's minimum version of
# cftime is at least 1.6.2.
try:
return (
np.nan
if d is None
else cftime.date2num(d, units, calendar, longdouble=False)
)
except TypeError:
return np.nan if d is None else cftime.date2num(d, units, calendar)
return np.array([encode_datetime(d) for d in dates.ravel()]).reshape(dates.shape)

by something like this :

def encode_datetime_with_cftime(
dates, units: str, calendar: str
) -> np.ndarray[int | float]:
"""Fallback method for encoding dates using cftime.
This method is more flexible than xarray's parsing using datetime64[ns]
arrays but also slower because it loops over each element.
"""
if cftime is None:
raise ModuleNotFoundError("No module named 'cftime-rs'")
dates = np.array(dates)
if np.issubdtype(dates.dtype, np.datetime64):
# numpy's broken datetime conversion only works for us precision
dates = dates.astype("M8[us]").astype(datetime)
# Find all the none or NaN position
none_position = np.equal(dates, None)
# Remove NaN from the dates
filtered_dates = dates[~none_position]
print(filtered_dates)
# encoded_nums will be the same size as filtered_dates
# Try converting to f64 first to avoid unnecessary conversion to i64
try:
encoded_nums = cftime.pydate2num(
filtered_dates.tolist(), units, calendar, dtype="f64"
)
except TypeError:
encoded_nums = cftime.pydate2num(
filtered_dates.tolist(), units, calendar, dtype="i64"
)
# Create a full matrix of NaN
# And fill the num dates in the not NaN or None position
result = np.full(dates.shape, np.nan)
result[np.nonzero(~none_position)] = encoded_nums
return result

We can use numpy instead of list comprehensions. It takes a bit more of memory though.

@github-actions github-actions bot added topic-cftime CI Continuous Integration tools dependencies Pull requests that update a dependency file labels Oct 17, 2023
@dcherian
Copy link
Contributor

Finally, and regardless of this PR, I guess there could be a speed improvement by vectorizing operations by replacing this

Interesting. Can you open a separate PR with that change please? It'd be nice to get it in ASAP if it works.

antscloud added a commit to antscloud/xarray that referenced this pull request Oct 17, 2023
@Illviljan Illviljan added the run-benchmark Run the ASV benchmark workflow label Oct 17, 2023
antscloud added a commit to antscloud/xarray that referenced this pull request Jul 14, 2024
spencerkclark added a commit that referenced this pull request May 30, 2025
* feat: implement as discussed in PR #8322

* fix: Stick closer to the original function and add PR suggestions

* Update xarray/coding/times.py

Co-authored-by: Michael Niklas  <[email protected]>

* Update xarray/coding/times.py

* Update xarray/coding/times.py

* Add what's new entry

---------

Co-authored-by: Illviljan <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Michael Niklas <[email protected]>
Co-authored-by: Kai Mühlbauer <[email protected]>
Co-authored-by: Kai Mühlbauer <[email protected]>
Co-authored-by: Spencer Clark <[email protected]>
dcherian added a commit to dcherian/xarray that referenced this pull request May 30, 2025
* main:
  Fix performance regression in interp from pydata#9881 (pydata#10370)
  html repr: improve style for dropdown sections (pydata#10354)
  Grouper tweaks. (pydata#10362)
  Docs: Add links to getting help mermaid diagram (pydata#10324)
  Enforce ruff/flynt rules (FLY) (pydata#10375)
  Add missing AbstractWritableDataStore base methods and arguments (pydata#10343)
  Improve html repr in dark mode (Jupyterlab + Xarray docs) (pydata#10353)
  Pin Mypy to 1.15 (pydata#10378)
  use numpy dtype exposed by zarr array instead of metadata.data_type (pydata#10348)
  Fix doc typo for caption "Interoperability" (pydata#10374)
  Implement cftime vectorization as discussed in PR pydata#8322 (pydata#8324)
  Enforce ruff/flake8-pyi rules (PYI) (pydata#10359)
  Apply assorted ruff/Pylint rules (PL) / Enforce PLE rules (pydata#10366)
  (fix): pandas extension array repr for int64[pyarrow] (pydata#10317)
  Enforce ruff/flake8-implicit-str-concat rules (ISC) (pydata#10368)
  Enforce ruff/refurb rules (FURB) (pydata#10367)
  Ignore ruff/Pyflakes rule F401 more precisely (pydata#10369)
  Apply assorted ruff/flake8-simplify rules (SIM) (pydata#10364)
  Apply assorted ruff/flake8-pytest-style rules (PT) (pydata#10363)
  Fix "a array" misspelling (pydata#10365)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration tools dependencies Pull requests that update a dependency file run-benchmark Run the ASV benchmark workflow topic-cftime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants