Commit 7f6e5e2

pp-mo and trexfeathers authored

2v4 mergeback picks (#3668)

* Stop PPDataProxy accessing the file when no data is needed. (#3659)
* Add 2.4 whatsnew into full whatsnew list.

Co-authored-by: Martin Yeo <[email protected]>

1 parent ecfbcf2 commit 7f6e5e2

File tree (5 files changed: +266 −15 lines)

* docs/iris/src/whatsnew/2.4.rst
* docs/iris/src/whatsnew/index.rst
* lib/iris/fileformats/pp.py
* lib/iris/tests/unit/fileformats/pp/test_PPDataProxy.py
* lib/iris/util.py

docs/iris/src/whatsnew/2.4.rst

Lines changed: 59 additions & 0 deletions

@@ -0,0 +1,59 @@
+What's New in Iris 2.4.0
+************************
+
+:Release: 2.4.0
+:Date: 2020-02-20
+
+This document explains the new/changed features of Iris in version 2.4.0
+(:doc:`View all changes <index>`.)
+
+
+Iris 2.4.0 Features
+===================
+
+.. admonition:: Last Python 2 version of Iris
+
+    Iris 2.4 is a final extra release of Iris 2, which back-ports specific desired features from
+    Iris 3 (not yet released).
+
+    The purpose of this is both to support early adoption of certain newer features,
+    and to provide a final release for Python 2.
+
+    The next release of Iris will be version 3.0: a major-version release which
+    introduces breaking API and behavioural changes, and only supports Python 3.
+
+* :class:`iris.coord_systems.Geostationary` can now accept creation arguments of
+  `false_easting=None` or `false_northing=None`, equivalent to values of 0.
+  Previously these kwargs could be omitted, but could not be set to `None`.
+  This also enables loading of NetCDF data on a Geostationary grid where either of these
+  keys is not present as a grid-mapping variable property: previously, loading any
+  such data caused an exception.
+* The area weights used when performing area weighted regridding with :class:`iris.analysis.AreaWeighted`
+  are now cached.
+  This allows a significant speedup when regridding multiple similar cubes, by repeatedly using
+  a `'regridder' object <../iris/iris/analysis.html?highlight=regridder#iris.analysis.AreaWeighted.regridder>`_
+  which you created first.
+* Name constraint matching against cubes during loading or extracting has been relaxed from strictly matching
+  against the :meth:`~iris.cube.Cube.name`, to matching against either the
+  ``standard_name``, ``long_name``, NetCDF ``var_name``, or ``STASH`` attributes metadata of a cube.
+* Cubes and coordinates now have a new ``names`` property that contains a tuple of the
+  ``standard_name``, ``long_name``, NetCDF ``var_name``, and ``STASH`` attributes metadata.
+* The :class:`~iris.NameConstraint` provides richer name constraint matching when loading or extracting
+  against cubes, by supporting a constraint against any combination of
+  ``standard_name``, ``long_name``, NetCDF ``var_name`` and ``STASH``
+  from the attributes dictionary of a :class:`~iris.cube.Cube`.
+
+
+Iris 2.4.0 Dependency Updates
+=============================
+* Iris is now able to use the latest version of matplotlib.
+
+
+Bugs Fixed
+==========
+* Fixed a problem which was causing file loads to fetch *all* field data
+  whenever UM files (PP or Fieldsfiles) were loaded.
+  With large source files, initial file loads are slow, with large memory usage
+  before any cube data is even fetched. Large enough files will cause a crash.
+  The problem occurs only with Dask versions >= 2.0.
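The AreaWeighted caching feature above is easiest to see as a regridder-reuse pattern. The following is a minimal, self-contained sketch, not code from this commit: the `make_grid_cube` helper is invented for illustration, and the small lat/lon grids are arbitrary.

import numpy as np

import iris.coord_systems
from iris.analysis import AreaWeighted
from iris.coords import DimCoord
from iris.cube import Cube


def make_grid_cube(nlat, nlon, max_lat):
    # Hypothetical helper: build a small rectilinear lat/lon cube with
    # bounded coordinates, as area-weighted regridding requires bounds.
    cs = iris.coord_systems.GeogCS(6371229.0)
    lat = DimCoord(
        np.linspace(-max_lat, max_lat, nlat),
        standard_name="latitude",
        units="degrees",
        coord_system=cs,
    )
    lon = DimCoord(
        np.linspace(0.0, 360.0, nlon, endpoint=False),
        standard_name="longitude",
        units="degrees",
        coord_system=cs,
        circular=True,
    )
    lat.guess_bounds()
    lon.guess_bounds()
    cube = Cube(np.zeros((nlat, nlon), dtype=np.float32))
    cube.add_dim_coord(lat, 0)
    cube.add_dim_coord(lon, 1)
    return cube


src_cubes = [make_grid_cube(20, 30, 80.0) for _ in range(4)]  # shared grid
target = make_grid_cube(10, 15, 60.0)  # coarser target, inside source extent

# Build the regridder once; per the whatsnew entry above, the area weights
# it computes are now cached inside it.
regridder = AreaWeighted().regridder(src_cubes[0], target)

# Re-applying the same regridder to similar cubes reuses the cached weights,
# rather than recomputing them for every cube.
results = [regridder(cube) for cube in src_cubes]
print(results[0].shape)  # (10, 15)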

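Likewise, the relaxed name matching, the new ``names`` property, and :class:`~iris.NameConstraint` can be sketched with in-memory cubes. This is illustrative only, assuming Iris >= 2.4; the cube names are arbitrary.

import numpy as np

from iris import NameConstraint
from iris.cube import Cube, CubeList

# Two simple in-memory cubes, to avoid needing a real data file.
cubes = CubeList(
    [
        Cube(np.zeros(3), standard_name="air_temperature", var_name="tas"),
        Cube(np.zeros(3), standard_name="air_pressure", var_name="pres"),
    ]
)

# Constrain on the NetCDF variable name alone ...
(match,) = cubes.extract(NameConstraint(var_name="tas"))
print(match.name())  # air_temperature

# ... or on any combination of the name attributes.
(match,) = cubes.extract(
    NameConstraint(standard_name="air_temperature", var_name="tas")
)

# The new 'names' property gives all the name metadata at once, as a tuple
# of the standard_name, long_name, var_name and STASH values.
print(match.names)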
docs/iris/src/whatsnew/index.rst

Lines changed: 1 addition & 0 deletions

@@ -11,6 +11,7 @@ Iris versions.
 
    latest.rst
    3.0.rst
+   2.4.rst
    2.3.rst
    2.2.rst
    2.1.rst

lib/iris/fileformats/pp.py

Lines changed: 20 additions & 14 deletions

@@ -38,7 +38,7 @@
 )
 import iris.fileformats.rules
 import iris.coord_systems
-
+from iris.util import _array_slice_ifempty
 
 try:
     import mo_pack
@@ -594,19 +594,25 @@ def ndim(self):
         return len(self.shape)
 
     def __getitem__(self, keys):
-        with open(self.path, "rb") as pp_file:
-            pp_file.seek(self.offset, os.SEEK_SET)
-            data_bytes = pp_file.read(self.data_len)
-            data = _data_bytes_to_shaped_array(
-                data_bytes,
-                self.lbpack,
-                self.boundary_packing,
-                self.shape,
-                self.src_dtype,
-                self.mdi,
-            )
-            data = data.__getitem__(keys)
-            return np.asanyarray(data, dtype=self.dtype)
+        # Check for 'empty' slicings, in which case don't fetch the data.
+        # Because, since Dask v2, 'dask.array.from_array' performs an empty
+        # slicing and we must not fetch the data at that time.
+        result = _array_slice_ifempty(keys, self.shape, self.dtype)
+        if result is None:
+            with open(self.path, "rb") as pp_file:
+                pp_file.seek(self.offset, os.SEEK_SET)
+                data_bytes = pp_file.read(self.data_len)
+                data = _data_bytes_to_shaped_array(
+                    data_bytes,
+                    self.lbpack,
+                    self.boundary_packing,
+                    self.shape,
+                    self.src_dtype,
+                    self.mdi,
+                )
+                result = data.__getitem__(keys)
+
+        return np.asanyarray(result, dtype=self.dtype)
 
     def __repr__(self):
         fmt = (
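To see why the new guard in `__getitem__` matters, here is a stand-alone sketch (not Iris code) of the Dask behaviour described in the comments above: since Dask 2.0, `dask.array.from_array` probes the wrapped object with an all-empty slice to capture array metadata. `RecordingProxy` is a hypothetical stand-in for `PPDataProxy`.

import dask.array as da
import numpy as np


class RecordingProxy:
    # Hypothetical stand-in for PPDataProxy: records whether __getitem__
    # was ever invoked, instead of reading a file.
    shape = (3, 4)
    dtype = np.dtype("float32")
    ndim = 2

    def __init__(self):
        self.accessed = False

    def __getitem__(self, keys):
        # In the real PPDataProxy, this is where the file would be opened.
        self.accessed = True
        return np.zeros(self.shape, dtype=self.dtype)[keys]


proxy = RecordingProxy()
lazy = da.from_array(proxy, chunks=proxy.shape)
# On Dask >= 2.0 the metadata probe has already sliced proxy[0:0, 0:0]:
print(proxy.accessed)  # True -- hence the _array_slice_ifempty short-circuit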

lib/iris/tests/unit/fileformats/pp/test_PPDataProxy.py

Lines changed: 125 additions & 1 deletion

@@ -10,6 +10,7 @@
 import iris.tests as tests
 
 from unittest import mock
+import numpy as np
 
 from iris.fileformats.pp import PPDataProxy, SplittableInt
 
@@ -21,7 +22,7 @@ def test_lbpack_SplittableInt(self):
         self.assertEqual(proxy.lbpack, lbpack)
         self.assertIs(proxy.lbpack, lbpack)
 
-    def test_lnpack_raw(self):
+    def test_lbpack_raw(self):
         lbpack = 4321
         proxy = PPDataProxy(None, None, None, None, None, lbpack, None, None)
         self.assertEqual(proxy.lbpack, lbpack)
@@ -33,5 +34,128 @@ def test_lnpack_raw(self):
         self.assertEqual(proxy.lbpack.n4, lbpack // 1000 % 10)
 
 
+class SliceTranslator:
+    """
+    Class to translate an array-indexing expression into a tuple of keys.
+
+    An instance just returns the argument of its __getitem__ call.
+
+    """
+
+    def __getitem__(self, keys):
+        return keys
+
+
+# A multidimensional-indexable object that returns its index keys, so we can
+# use multidimensional-indexing notation to specify a slicing expression.
+Slices = SliceTranslator()
+
+
+class Test__getitem__slicing(tests.IrisTest):
+    def _check_slicing(
+        self, test_shape, indices, result_shape, data_was_fetched=True
+    ):
+        # Check behaviour of the getitem call with specific slicings.
+        # Especially: check cases where a fetch does *not* read from the file.
+        # This is necessary because, since Dask 2.0, the "from_array" function
+        # takes a zero-length slice of its array argument, to capture array
+        # metadata, and in those cases we want to avoid file access.
+        test_dtype = np.dtype(np.float32)
+        proxy = PPDataProxy(
+            shape=test_shape,
+            src_dtype=test_dtype,
+            path=None,
+            offset=None,
+            data_len=None,
+            lbpack=0,  # Note: a 'real' value is needed.
+            boundary_packing=None,
+            mdi=None,
+        )
+
+        # Mock out the file-open call, to see if the file would be read.
+        builtin_open_func_name = "builtins.open"
+        mock_fileopen = self.patch(builtin_open_func_name)
+
+        # Also mock out the 'databytes_to_shaped_array' call, to fake minimal
+        # operation in the cases where file-open *does* get called.
+        fake_data = np.zeros(test_shape, dtype=test_dtype)
+        self.patch(
+            "iris.fileformats.pp._data_bytes_to_shaped_array",
+            mock.MagicMock(return_value=fake_data),
+        )
+
+        # Test the requested indexing operation.
+        result = proxy.__getitem__(indices)
+
+        # Check the behaviour and results were as expected.
+        self.assertEqual(mock_fileopen.called, data_was_fetched)
+        self.assertIsInstance(result, np.ndarray)
+        self.assertEqual(result.dtype, test_dtype)
+        self.assertEqual(result.shape, result_shape)
+
+    def test_slicing_1d_normal(self):
+        # A 'normal' 1d testcase with no empty slices.
+        self._check_slicing(
+            test_shape=(3,),
+            indices=Slices[1:10],
+            result_shape=(2,),
+            data_was_fetched=True,
+        )
+
+    def test_slicing_1d_empty(self):
+        # A 1d testcase with an empty slicing.
+        self._check_slicing(
+            test_shape=(3,),
+            indices=Slices[0:0],
+            result_shape=(0,),
+            data_was_fetched=False,
+        )
+
+    def test_slicing_2d_normal(self):
+        # A 2d testcase with no empty slices.
+        self._check_slicing(
+            test_shape=(3, 4),
+            indices=Slices[2, :3],
+            result_shape=(3,),
+            data_was_fetched=True,
+        )
+
+    def test_slicing_2d_allempty(self):
+        # A 2d testcase with all empty slices.
+        self._check_slicing(
+            test_shape=(3, 4),
+            indices=Slices[0:0, 0:0],
+            result_shape=(0, 0),
+            data_was_fetched=False,
+        )
+
+    def test_slicing_2d_empty_dim0(self):
+        # A 2d testcase with an empty slice.
+        self._check_slicing(
+            test_shape=(3, 4),
+            indices=Slices[0:0],
+            result_shape=(0, 4),
+            data_was_fetched=False,
+        )
+
+    def test_slicing_2d_empty_dim1(self):
+        # A 2d testcase with an empty slice, and an integer index.
+        self._check_slicing(
+            test_shape=(3, 4),
+            indices=Slices[1, 0:0],
+            result_shape=(0,),
+            data_was_fetched=False,
+        )
+
+    def test_slicing_complex(self):
+        # Multiple dimensions with multiple empty slices.
+        self._check_slicing(
+            test_shape=(3, 4, 2, 5, 6, 3, 7),
+            indices=Slices[1:3, 2, 0:0, :, 1:1, :100],
+            result_shape=(2, 0, 5, 0, 3, 7),
+            data_was_fetched=False,
+        )
+
+
 if __name__ == "__main__":
     tests.main()
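The `SliceTranslator` idiom above deserves a quick stand-alone illustration: indexing an instance simply hands back the keys, so the tests can spell out slicings in ordinary subscript notation.

class SliceTranslator:
    def __getitem__(self, keys):
        return keys


Slices = SliceTranslator()
print(Slices[1:3, 2, 0:0])  # (slice(1, 3, None), 2, slice(0, 0, None))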

lib/iris/util.py

Lines changed: 61 additions & 0 deletions

@@ -959,6 +959,67 @@ def __lt__(self, other):
         return NotImplemented
 
 
+def _array_slice_ifempty(keys, shape, dtype):
+    """
+    Detect cases where an array slice will contain no data, as it contains a
+    zero-length dimension, and produce an equivalent result for those cases.
+
+    The function indicates 'empty' slicing cases, by returning an array equal
+    to the slice result in those cases.
+
+    Args:
+
+    * keys (indexing key, or tuple of keys):
+        The argument from an array __getitem__ call.
+        Only tuples of integers and slices are supported, in particular no
+        newaxis, ellipsis or array keys.
+        These are the types of array access usage we expect from Dask.
+    * shape (tuple of int):
+        The shape of the array being indexed.
+    * dtype (numpy.dtype):
+        The dtype of the array being indexed.
+
+    Returns:
+        result (np.ndarray or None):
+            If 'keys' contains a slice(0, 0), this is an ndarray of the correct
+            resulting shape and provided dtype.
+            Otherwise it is None.
+
+    .. note::
+
+        This is used to prevent DataProxy arraylike objects from fetching their
+        file data when wrapped as Dask arrays.
+        This is because, for Dask >= 2.0, the "dask.array.from_array" call
+        performs a fetch like [0:0, 0:0, ...], to 'snapshot' array metadata.
+        This function enables us to avoid triggering a file data fetch in those
+        cases : This is consistent because the result will not contain any
+        actual data content.
+
+    """
+    # Convert a single key into a 1-tuple, so we always have a tuple of keys.
+    if isinstance(keys, tuple):
+        keys_tuple = keys
+    else:
+        keys_tuple = (keys,)
+
+    if any(key == slice(0, 0) for key in keys_tuple):
+        # An 'empty' slice is present : Return a 'fake' array instead.
+        target_shape = list(shape)
+        for i_dim, key in enumerate(keys_tuple):
+            if key == slice(0, 0):
+                # Reduce dims with empty slicing to length 0.
+                target_shape[i_dim] = 0
+        # Create a prototype result : no memory usage, as some dims are 0.
+        result = np.zeros(target_shape, dtype=dtype)
+        # Index with original keys to produce the desired result shape.
+        # Note : also ok in 0-length dims, as the slice is always '0:0'.
+        result = result[keys]
+    else:
+        result = None
+
+    return result
+
+
 def create_temp_filename(suffix=""):
     """Return a temporary file name.
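The helper's contract can be demonstrated directly. Here is a small sketch grounded in the docstring above; note that `_array_slice_ifempty` is a private helper, so this is illustration rather than public API.

import numpy as np

from iris.util import _array_slice_ifempty

# An 'empty' slicing yields a correctly-shaped zero-size array, so the
# caller can skip any file access:
result = _array_slice_ifempty(
    (slice(0, 0), slice(None)), shape=(3, 4), dtype=np.dtype("float32")
)
print(result.shape, result.dtype)  # (0, 4) float32

# A 'normal' slicing returns None, telling the caller to fetch real data:
assert _array_slice_ifempty((slice(1, 3),), (3, 4), np.dtype("float32")) is None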
