From ce05a10dc8c4aeed10ebf6b7c19720dea644a0be Mon Sep 17 00:00:00 2001
From: Richard Hattersley <rhattersley@gmail.com>
Date: Wed, 27 Apr 2016 15:29:53 +0100
Subject: [PATCH 1/4] First draft of IEP 1.

---
 docs/iris/src/IEP/IEP001.adoc | 138 ++++++++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)
 create mode 100644 docs/iris/src/IEP/IEP001.adoc
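
Note on the "index by position, coordinate name" cell of the tables in this
patch: the sketch below shows one way a helper could support both the
`helper[dict(...)]` and `helper(...)` forms by generalising the
`key = [slice(None)] * cube.ndim` idiom from the Background section. The class
name `NamedPositionIndexer` and the "one name per dimension" model are
assumptions made for illustration only; they are not part of Iris, and a real
implementation would have to resolve names through the cube's coordinates.

```python
class NamedPositionIndexer:
    """Toy stand-in for the proposed position-based helper (hypothetical)."""

    def __init__(self, dim_names):
        # Assumption: exactly one name per dimension, in dimension order.
        self.dim_names = list(dim_names)

    def _build_key(self, selections):
        # Start from "select everything" on every dimension.
        key = [slice(None)] * len(self.dim_names)
        for name, index in selections.items():
            if name not in self.dim_names:
                raise KeyError(name)
            key[self.dim_names.index(name)] = index
        return tuple(key)

    def __getitem__(self, selections):
        # Supports the helper[dict(name=index)] form.
        return self._build_key(dict(selections))

    def __call__(self, **selections):
        # Supports the helper(name=index) form.
        return self._build_key(selections)


iloc = NamedPositionIndexer(['time', 'model_level_number', 'latitude'])
key = iloc(model_level_number=slice(3, 9, 2))
same_key = iloc[dict(model_level_number=slice(3, 9, 2))]
```

Both forms produce the same positional key, which could then be passed to
ordinary `cube[key]` indexing regardless of the dimension order.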

diff --git a/docs/iris/src/IEP/IEP001.adoc b/docs/iris/src/IEP/IEP001.adoc
new file mode 100644
index 0000000000..18493bc2ab
--- /dev/null
+++ b/docs/iris/src/IEP/IEP001.adoc
@@ -0,0 +1,138 @@
+# IEP 1 - Enhanced indexing
+
+## Background
+
+Currently, to select a subset of a Cube based on coordinate values we use something like:
+[source,python]
+----
+cube.extract(iris.Constraint(realization=3,
+                             model_level_number=[1, 5],
+                             latitude=lambda cell: 40 <= cell <= 60))
+----
+On the plus side, this works irrespective of the dimension order of the data, but the drawbacks with this form of indexing include:
+
+* It uses a completely different syntax to position-based indexing, e.g. `cube[4, 0:6]`.
+* It uses a completely different syntax to pandas and xarray value-based indexing, e.g. `df[4, 0:6]`.
+* It is long-winded.
+
+Similarly, to select a subset of a Cube using positional indices but where the dimension is unknown has no standard syntax _at all_! Instead it requires code akin to:
+[source,python]
+----
+key = [slice(None)] * cube.ndim
+key[cube.coord_dims('model_level_number')[0]] = slice(3, 9, 2)
+cube[tuple(key)]
+----
+
+The only form of indexing that is well supported is indexing by position where the dimension order is known:
+[source,python]
+----
+cube[4, 0:6, 30:]
+----
+
+## Proposal
+
+Provide indexing helpers on the Cube to extend support to all permutations of positional vs. named dimensions and positional vs. coordinate-value based selection.
+
+### Extended pandas style
+
+Use a single helper for index by position, and a single helper for index by value. Helper names taken from pandas, but their behaviour is extended by making them callable to support named dimensions.
+
+|===
+2.2+| 2+h|Index by
+h|Position h|Value
+
+.2+h|Dimension
+h|Position
+
+a|[source,python]
+----
+cube[:, 2]  # No change
+cube.iloc[:, 2]
+----
+
+a|[source,python]
+----
+cube.loc[:, 1.5]
+----
+
+h|Name
+
+a|[source,python]
+----
+cube[dict(height=2)]
+cube.iloc[dict(height=2)]
+cube.iloc(height=2)
+----
+
+a|[source,python]
+----
+cube.loc[dict(height=1.5)]
+cube.loc(height=1.5)
+----
+|===
+
+### xarray style
+
+xarray introduces a second set of helpers for accessing named dimensions that provide the callable syntax `(foo=...)`.
+
+|===
+2.2+| 2+h|Index by
+h|Position h|Value
+
+.2+h|Dimension
+h|Position
+
+a|[source,python]
+----
+cube[:, 2]  # No change
+----
+
+a|[source,python]
+----
+cube.loc[:, 1.5]
+----
+
+h|Name
+
+a|[source,python]
+----
+cube[dict(height=2)]
+cube.isel(height=2)
+----
+
+a|[source,python]
+----
+cube.loc[dict(height=1.5)]
+cube.sel(height=1.5)
+----
+|===
+
+### TODO
+* Consistent terminology
+* `coord.name()` vs. `var_name` vs. "dimension name"?
+* Names that aren't valid Python identifiers
+* Inclusive vs. exclusive
+** Default: Inclusive? (as for pandas & xarray)
+** Use boolean otherwise.
+* Multi-dimensional coordinates
+* Non-orthogonal coordinates
+* Bounds
+* Boolean array indexing
+* Lambdas?
+* What to do about constrained loading?
+* Relationship to http://scitools.org.uk/iris/docs/v1.9.2/iris/iris/cube.html#iris.cube.Cube.intersection[iris.cube.Cube.intersection]?
+* Relationship to interpolation (especially nearest-neighbour)?
+** e.g. What to do about values that don't exist?
+*** pandas throws a KeyError
+*** xarray supports (several) nearest-neighbour schemes via http://xarray.pydata.org/en/stable/indexing.html#nearest-neighbor-lookups[`data.sel()`]
+*** Apparently http://holoviews.org/[holoviews] does nearest-neighbour interpolation.
+* Time handling
+** e.g. Rich Signell's http://nbviewer.jupyter.org/gist/rsignell-usgs/13d7ce9d95fddb4983d4cbf98be6c71d[xarray/iris comparison]
+
+## References
+. Iris
+ * http://scitools.org.uk/iris/docs/v1.9.2/iris/iris.html#iris.Constraint[iris.Constraint]
+ * http://scitools.org.uk/iris/docs/v1.9.2/userguide/subsetting_a_cube.html[Subsetting a cube]
+. http://pandas.pydata.org/pandas-docs/stable/indexing.html[pandas indexing]
+. http://xarray.pydata.org/en/stable/indexing.html[xarray indexing]
+. http://legacy.python.org/dev/peps/pep-0472/[PEP 472 - Support for indexing with keyword arguments]

From e080ea047efc59094890627f0e7e2bf69e95c1d6 Mon Sep 17 00:00:00 2001
From: Richard Hattersley <rhattersley@gmail.com>
Date: Wed, 27 Apr 2016 16:43:02 +0100
Subject: [PATCH 2/4] Add "Out of scope" and "Work required"

---
 docs/iris/src/IEP/IEP001.adoc | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/docs/iris/src/IEP/IEP001.adoc b/docs/iris/src/IEP/IEP001.adoc
index 18493bc2ab..430cd995f3 100644
--- a/docs/iris/src/IEP/IEP001.adoc
+++ b/docs/iris/src/IEP/IEP001.adoc
@@ -33,6 +33,15 @@ cube[4, 0:6, 30:]
 
 Provide indexing helpers on the Cube to extend support to all permutations of positional vs. named dimensions and positional vs. coordinate-value based selection.
 
+### Out of scope
+
+* Deliberately enhancing the performance.
+This is a very valuable topic and should be addressed by subsequent efforts.
+
+* Time/date values as strings.
+Providing pandas-style string representations for convenient representation of partial date/times should be addressed in a subsequent effort.
+There is a risk that this topic could bog down when dealing with non-standard calendars and climatological date ranges.
+
 ### Extended pandas style
 
 Use a single helper for index by position, and a single helper for index by value. Helper names taken from pandas, but their behaviour is extended by making them callable to support named dimensions.
@@ -107,6 +116,12 @@ cube.sel(height=1.5)
 ----
 |===
 
+## Work required
+
+* Implementations for each of the new helper objects.
+* An update to the documentation to demonstrate best practice. Known impacted areas include:
+** The "Subsetting a Cube" chapter of the user guide.
+
 ### TODO
 * Consistent terminology
 * `coord.name()` vs. `var_name` vs. "dimension name"?
@@ -126,8 +141,6 @@ cube.sel(height=1.5)
 *** pandas throws a KeyError
 *** xarray supports (several) nearest-neighbour schemes via http://xarray.pydata.org/en/stable/indexing.html#nearest-neighbor-lookups[`data.sel()`]
 *** Apparently http://holoviews.org/[holoviews] does nearest-neighbour interpolation.
-* Time handling
-** e.g. Rich Signell's http://nbviewer.jupyter.org/gist/rsignell-usgs/13d7ce9d95fddb4983d4cbf98be6c71d[xarray/iris comparison]
 
 ## References
 . Iris
@@ -136,3 +149,4 @@ cube.sel(height=1.5)
 . http://pandas.pydata.org/pandas-docs/stable/indexing.html[pandas indexing]
 . http://xarray.pydata.org/en/stable/indexing.html[xarray indexing]
 . http://legacy.python.org/dev/peps/pep-0472/[PEP 472 - Support for indexing with keyword arguments]
+. http://nbviewer.jupyter.org/gist/rsignell-usgs/13d7ce9d95fddb4983d4cbf98be6c71d[Time slicing NetCDF or OPeNDAP datasets] - Rich Signell's xarray/iris comparison focussing on time handling and performance

From 8f6b53675742a083f53cd804a144128ff5c2f9d9 Mon Sep 17 00:00:00 2001
From: Richard Hattersley <rhattersley@gmail.com>
Date: Wed, 27 Apr 2016 17:01:32 +0100
Subject: [PATCH 3/4] Names that aren't valid Python identifiers

---
 docs/iris/src/IEP/IEP001.adoc | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
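
Note on the `helper(**{'12': 0})` form described in this patch: it relies on
CPython accepting arbitrary string keys when a dict is unpacked into a function
that takes `**kwargs`, even when a key is not a valid identifier. A minimal
check of that behaviour, using a hypothetical stand-in function `loc` rather
than any real Iris API:

```python
def loc(**selections):
    """Hypothetical stand-in for a callable value-based helper."""
    # Simply return the selections so the received names can be inspected.
    return selections


# '12' cannot be written as a keyword argument directly (loc(12=0) is a
# syntax error), but it can be passed by unpacking an explicit dict.
by_unpacking = loc(**{'12': 0})

# Names that are valid identifiers work with either spelling.
by_keyword = loc(height=1.5)
```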

diff --git a/docs/iris/src/IEP/IEP001.adoc b/docs/iris/src/IEP/IEP001.adoc
index 430cd995f3..764536974b 100644
--- a/docs/iris/src/IEP/IEP001.adoc
+++ b/docs/iris/src/IEP/IEP001.adoc
@@ -33,6 +33,10 @@ cube[4, 0:6, 30:]
 
 Provide indexing helpers on the Cube to extend support to all permutations of positional vs. named dimensions and positional vs. coordinate-value based selection.
 
+Commonly, the names of dimensions are also valid Python identifiers.
+For names where this is not true, the names can expressed through either the `helper[...]` or `helper(...)` syntax by constructing an explicit dict.
+For example: `cube.loc[{'12': 0}]` or `cube.loc(**{'12': 0})`.
+
 ### Out of scope
 
 * Deliberately enhancing the performance.
@@ -125,7 +129,6 @@ cube.sel(height=1.5)
 ### TODO
 * Consistent terminology
 * `coord.name()` vs. `var_name` vs. "dimension name"?
-* Names that aren't valid Python identifiers
 * Inclusive vs. exclusive
 ** Default: Inclusive? (as for pandas & xarray)
 ** Use boolean otherwise.

From 993339549f702e10143883a994db00cbb7032421 Mon Sep 17 00:00:00 2001
From: Richard Hattersley <rhattersley@gmail.com>
Date: Thu, 28 Apr 2016 11:10:35 +0100
Subject: [PATCH 4/4] Slice behaviour and misc clarifications

---
 docs/iris/src/IEP/IEP001.adoc | 98 ++++++++++++++++++++++++-----------
 1 file changed, 68 insertions(+), 30 deletions(-)
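
Note on the "Slices" section added by this patch: the sketch below shows how an
inclusive value-based slice could be mapped onto a normal half-open positional
slice. It assumes a one-dimensional coordinate whose points are sorted in
ascending order and compared exactly (no float tolerance); the function name
`value_slice_to_positions` is hypothetical and not part of Iris.

```python
from bisect import bisect_left, bisect_right


def value_slice_to_positions(points, start=None, stop=None):
    """Map an inclusive value range onto a half-open positional slice.

    Assumes `points` is sorted in ascending order.
    """
    # First position whose value is >= start (so start itself is included).
    lo = 0 if start is None else bisect_left(points, start)
    # One past the last position whose value is <= stop (so stop is included).
    hi = len(points) if stop is None else bisect_right(points, stop)
    return slice(lo, hi)


heights = [0.5, 1.5, 2.5, 3.5, 4.5]
positions = value_slice_to_positions(heights, 1.5, 3.5)
selected = heights[positions]
```

Here both end values, 1.5 and 3.5, are present in `selected`, whereas the
positional slice `heights[1:3]` would stop before position 3.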

diff --git a/docs/iris/src/IEP/IEP001.adoc b/docs/iris/src/IEP/IEP001.adoc
index 764536974b..d38b2e8478 100644
--- a/docs/iris/src/IEP/IEP001.adoc
+++ b/docs/iris/src/IEP/IEP001.adoc
@@ -12,10 +12,11 @@ cube.extract(iris.Constraint(realization=3,
 On the plus side, this works irrespective of the dimension order of the data, but the drawbacks with this form of indexing include:
 
 * It uses a completely different syntax to position-based indexing, e.g. `cube[4, 0:6]`.
-* It uses a completely different syntax to pandas and xarray value-based indexing, e.g. `df[4, 0:6]`.
-* It is long-winded.
+* It uses a completely different syntax to pandas and xarray value-based indexing, e.g. `df.loc[4, 0:6]`.
+* It is long-winded and requires the use of an additional class.
+* It requires the use of lambda functions even when just selecting a range.
 
-Similarly, to select a subset of a Cube using positional indices but where the dimension is unknown has no standard syntax _at all_! Instead it requires code akin to:
+Arguably, subsetting by positional indices when the dimension order is unknown is even worse: it has no standard syntax _at all_! Instead it requires code akin to:
 [source,python]
 ----
 key = [slice(None)] * cube.ndim
@@ -31,31 +32,26 @@ cube[4, 0:6, 30:]
 
 ## Proposal
 
-Provide indexing helpers on the Cube to extend support to all permutations of positional vs. named dimensions and positional vs. coordinate-value based selection.
+Provide indexing helpers on the Cube to extend explicit support to all permutations of:
 
-Commonly, the names of dimensions are also valid Python identifiers.
-For names where this is not true, the names can expressed through either the `helper[...]` or `helper(...)` syntax by constructing an explicit dict.
-For example: `cube.loc[{'12': 0}]` or `cube.loc(**{'12': 0})`.
-
-### Out of scope
+* implicit dimension vs. named coordinate,
+* and positional vs. coordinate-value based selection.
 
-* Deliberately enhancing the performance.
-This is a very valuable topic and should be addressed by subsequent efforts.
+### Helper syntax options
 
-* Time/date values as strings.
-Providing pandas-style string representations for convenient representation of partial date/times should be addressed in a subsequent effort.
-There is a risk that this topic could bog down when dealing with non-standard calendars and climatological date ranges.
+Commonly, the names of coordinates are also valid Python identifiers.
+For names where this is not true, the names can be expressed through either the `helper[...]` or `helper(...)` syntax by constructing an explicit dict.
+For example: `cube.loc[{'12': 0}]` or `cube.loc(**{'12': 0})`.
 
-### Extended pandas style
+#### Extended pandas style
 
-Use a single helper for index by position, and a single helper for index by value. Helper names taken from pandas, but their behaviour is extended by making them callable to support named dimensions.
+Use a single helper for indexing by position, and a single helper for indexing by value. The helper names are taken from pandas, but their behaviour is extended by making them callable to support named coordinates.
 
 |===
-2.2+| 2+h|Index by
+.2+| 2+h|Index by
 h|Position h|Value
 
-.2+h|Dimension
-h|Position
+h|Implicit dimension
 
 a|[source,python]
 ----
@@ -68,7 +64,7 @@ a|[source,python]
 cube.loc[:, 1.5]
 ----
 
-h|Name
+h|Coordinate name
 
 a|[source,python]
 ----
@@ -84,16 +80,15 @@ cube.loc(height=1.5)
 ----
 |===
 
-### xarray style
+#### xarray style
 
 xarray introduces a second set of helpers for accessing named dimensions that provide the callable syntax `(foo=...)`.
 
 |===
-2.2+| 2+h|Index by
+.2+| 2+h|Index by
 h|Position h|Value
 
-.2+h|Dimension
-h|Position
+h|Implicit dimension
 
 a|[source,python]
 ----
@@ -105,7 +100,7 @@ a|[source,python]
 cube.loc[:, 1.5]
 ----
 
-h|Name
+h|Coordinate name
 
 a|[source,python]
 ----
@@ -120,6 +115,40 @@ cube.sel(height=1.5)
 ----
 |===
 
+### Slices
+
+The semantics of position-based slices will continue to match those of normal Python slices: the start position is included and the end position is excluded.
+
+Value-based slices will be strictly inclusive, with both the start and end values included. This behaviour differs from normal Python slices but matches pandas.
+
+Just as for normal Python slices, we do not need to provide the ability to control the include/exclude behaviour for slicing.
+
+### Value-based indexing
+
+#### Equality
+
+Should the behaviour of value-based equality depend on the data type of the coordinate?
+
+* integer: exact match
+* float: tolerance match, tolerance determined by bit-width
+* string: exact match
+
+#### Scalar/category
+
+Whether, and how, to handle category selection such as `cube.loc(season='JJA')` is still open. Defer to `groupby()`?
+
+* `cube.loc[12]` - must always match a single value or raise KeyError; the corresponding dimension will be removed.
+* `cube.loc[[12]]` - may match any number of values? (incl. zero?); the dimension will be retained.
+
+### Out of scope
+
+* Deliberately enhancing the performance.
+This is a very valuable topic and should be addressed by subsequent efforts.
+
+* Time/date values as strings.
+Providing pandas-style strings for conveniently expressing partial date/times should be addressed in a subsequent effort, perhaps in conjunction with an explicit performance test suite.
+There is a risk that this topic could bog down when dealing with non-standard calendars and climatological date ranges.
+
 ## Work required
 
 * Implementations for each of the new helper objects.
@@ -127,11 +156,6 @@ cube.sel(height=1.5)
 ** The "Subsetting a Cube" chapter of the user guide.
 
 ### TODO
-* Consistent terminology
-* `coord.name()` vs. `var_name` vs. "dimension name"?
-* Inclusive vs. exclusive
-** Default: Inclusive? (as for pandas & xarray)
-** Use boolean otherwise.
 * Multi-dimensional coordinates
 * Non-orthogonal coordinates
 * Bounds
@@ -144,6 +168,20 @@ cube.sel(height=1.5)
 *** pandas throws a KeyError
 *** xarray supports (several) nearest-neighbour schemes via http://xarray.pydata.org/en/stable/indexing.html#nearest-neighbor-lookups[`data.sel()`]
 *** Apparently http://holoviews.org/[holoviews] does nearest-neighbour interpolation.
+* multi-dimensional coordinate => unroll?
+* var_name only selection? `cube.vloc(t0=12)`
+* Orthogonal only? Or also independent? `cube.loc_points(lon=[1, 1, 5], lat=[31, 33, 32])`
+  ** This seems quite closely linked to interpolation. Is the interpolation scheme orthogonal to cross-product vs. independent?
++
+[source,python]
+----
+cube.interpolate(
+    scheme='nearest',
+    mesh=dict(lon=[5, 10, 15], lat=[40, 50]))
+cube.interpolate(
+    scheme=Nearest(mode='spherical'),
+    locations=Ortho(lon=[5, 10, 15], lat=[40, 50]))
+----
 
 ## References
 . Iris