Commit f80c471

Sequence grouped(by:) and keyed(by:)
1 parent cff79d9 commit f80c471

1 file changed: +249 -0 lines changed
proposals/NNNN-sequence-grouping.md

Lines changed: 249 additions & 0 deletions
@@ -0,0 +1,249 @@
1+
# Sequence `grouped(by:)` and `keyed(by:)`
2+
3+
* Proposal: [SE-NNNN](NNNN-filename.md)
* Authors: [Alexander Momchilov](https://github.com/amomchilov)
* Review Manager: TBD
* Status: **Awaiting review**
* Implementation: [apple/swift#NNNNN](https://github.com/apple/swift/pull/NNNNN) or [apple/swift-evolution-staging#NNNNN](https://github.com/apple/swift-evolution-staging/pull/NNNNN)
* Review: ([pitch](https://forums.swift.org/...))
9+
10+
## Introduction
11+
12+
This proposal adds new APIs to `Sequence` that let you group or key elements more naturally and fluently than is currently possible with `Dictionary`'s initializers.
13+
14+
## Motivation
15+
16+
[SE-0165 Dictionary & Set Enhancements](https://github.com/apple/swift-evolution/blob/main/proposals/0165-dict.md) added several useful utility APIs to `Dictionary`. Three of its initializers are relevant to this proposal:
17+
18+
1. [`Dictionary.init(grouping:by:)`](https://developer.apple.com/documentation/swift/dictionary/init(grouping:by:))
2. [`Dictionary.init(uniqueKeysWithValues:)`](https://developer.apple.com/documentation/swift/dictionary/init(uniquekeyswithvalues:))
3. [`Dictionary.init(_:uniquingKeysWith:)`](https://developer.apple.com/documentation/swift/dictionary/init(_:uniquingkeyswith:))
21+
22+
These APIs have proven very useful, and have replaced a plethora of hand-rolled calls to `reduce(into: [:]) { ... }`, but as initializers, they have some usability shortcomings.
23+
24+
### Grouping
25+
26+
It's not uncommon to want to group the results of a chain of method calls that transform a `Sequence`, and perhaps to keep chaining more transformations onto the resulting `Dictionary`. For example:
27+
28+
```swift
let studentsWithDuplicateNames = Dictionary(
    grouping: loadPeople()
        .map { json in decoder.decode(Person.self, from: json) }
        .filter(\.isStudent),
    by: \.firstName
).filter { name, students in 1 < students.count }
```
36+
37+
The initializer breaks the simple top-to-bottom flow: readers must start in the middle, scan up to the top, and then jump back to the bottom to follow the chain of transformations.
38+
39+
If this same capability existed as a method on `Sequence`, the code could be much more fluent and prose-like:
40+
41+
```swift
let studentsWithDuplicateNames = loadPeople()
    .map { json in decoder.decode(Person.self, from: json) }
    .filter(\.isStudent)
    .grouped(by: \.firstName) // Replaces Dictionary.init(grouping:by:)
    .filter { name, students in 1 < students.count }
```
48+
49+
### Keying by a value
50+
51+
Many usages of [`Dictionary.init(uniqueKeysWithValues:)`](https://developer.apple.com/documentation/swift/dictionary/init(uniquekeyswithvalues:)) and [`Dictionary.init(_:uniquingKeysWith:)`](https://developer.apple.com/documentation/swift/dictionary/init(_:uniquingkeyswith:)) express the idea of creating a Dictionary of values keyed by some key (typically derived from the values themselves). [Many such uses](https://github.com/search?q=%2Fdictionary%5C%28uniqueKeysWithValues%3A.*%5C.map%2F+language%3ASwift&type=code&l=Swift) can be found where these initializers are paired with a call to `map`. This adds syntactic complexity, plus an intermediate `Array` allocation if the author forgets to use `.lazy.map`.
52+
53+
```swift
let studentsById = Dictionary(
    uniqueKeysWithValues: loadPeople()
        .map { json in decoder.decode(Person.self, from: json) }
        .filter(\.isStudent)
        .map { student in (key: student.id, value: student) }
)
```
61+
62+
This initializer is pretty syntactically heavy, its combination with `map` is a non-obvious pattern, and it suffers from the same reading up-and-down problem as the grouping case.
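
As a rough illustration of the allocation point, the sketch below builds the same dictionary twice with today's initializer, once through an eager `map` and once through `.lazy.map`. The `Student` type and sample data are hypothetical and exist only for this example.

```swift
// Hypothetical model type, used only for illustration.
struct Student {
    let id: Int
    let name: String
}

let students = [
    Student(id: 1, name: "Ada"),
    Student(id: 2, name: "Grace"),
]

// Eager: `map` first materializes a temporary array of (id, student) pairs,
// which the initializer then consumes.
let viaEagerMap = Dictionary(
    uniqueKeysWithValues: students.map { ($0.id, $0) }
)

// Lazy: the pairs are produced one at a time as the initializer iterates,
// so no temporary array is built.
let viaLazyMap = Dictionary(
    uniqueKeysWithValues: students.lazy.map { ($0.id, $0) }
)
```

Both spellings produce the same dictionary; the only difference is whether the temporary array is created along the way.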
63+
64+
This concept could be expressed more clearly, and the intermediate array allocation avoided, if this were a method on `Sequence`:
65+
66+
```swift
let studentsById = loadPeople()
    .map { json in decoder.decode(Person.self, from: json) }
    .filter(\.isStudent)
    .keyed(by: \.id)
```
72+
73+
### Prior art
74+
75+
| Language      | Grouping API | "Keying" API |
|---------------|--------------|--------------|
| Java          | [`groupingBy`](https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/stream/Collectors.html#groupingBy(java.util.function.Function)) | [`toMap`](https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/stream/Collectors.html#toMap(java.util.function.Function,java.util.function.Function)) |
| Kotlin        | [`groupBy`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/group-by.html) | [`associateBy`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/associate-by.html) |
| C#            | [`GroupBy`](https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.groupby?view=net-7.0#system-linq-enumerable-groupby) | [`ToDictionary`](https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.todictionary?view=net-7.0#system-linq-enumerable-todictionary) |
| Rust          | [`group_by`](https://doc.rust-lang.org/std/primitive.slice.html#method.group_by) | - |
| Ruby          | [`group_by`](https://ruby-doc.org/3.2.2/Enumerable.html#method-i-group_by) | [`index_by`](https://rubydoc.info/gems/activesupport/7.0.5/Enumerable#index_by-instance_method)\* |
| Python        | [`groupby`](https://docs.python.org/3/library/itertools.html#itertools.groupby) | [dict comprehensions](https://peps.python.org/pep-0274/) |
| PHP (Laravel) | [`groupBy`](https://laravel.com/docs/10.x/collections#method-groupby) | [`keyBy`](https://laravel.com/docs/10.x/collections#method-keyby) |

\* Provided by the ActiveSupport gem rather than Ruby's core `Enumerable`.
84+
85+
## Proposed solution
86+
87+
This proposal introduces 2 new methods on `Sequence`. Here are simple examples of their usages:
88+
89+
```swift
let digitsGroupedByMod3 = (0...9).grouped(by: { $0 % 3 })
// Results in:
// [
//     0: [0, 3, 6, 9],
//     1: [1, 4, 7],
//     2: [2, 5, 8],
// ]

let fruitsByFirstLetter = ["Apple", "Banana", "Cherry"].keyed(by: { $0.first! })
// Results in:
// [
//     "A": "Apple",
//     "B": "Banana",
//     "C": "Cherry",
// ]
```
106+
107+
## Detailed design
108+
109+
```swift
extension Sequence {
    /// Groups the elements of `self` into a new Dictionary, whose keys are
    /// the groupings returned by the given closure and whose values are
    /// arrays of the elements that returned each key.
    /// - Parameters:
    ///   - keyForValue: A closure that returns a key for each element in
    ///     `self`.
    /// - Returns: A dictionary containing grouped elements of self, keyed by
    ///   the keys derived by the `keyForValue` closure.
    func grouped<GroupKey: Hashable>(
        by keyForValue: (Element) throws -> GroupKey
    ) rethrows -> [GroupKey: [Element]]

    /// Creates a new Dictionary from the elements of `self`, keyed by the
    /// results returned by the given `keyForValue` closure. As the dictionary is
    /// built, this method calls the `combine` closure with the key and the
    /// current and new values for any duplicate keys. Pass a closure as
    /// `combine` that returns the value to use in the resulting dictionary: The
    /// closure can choose between the two values, combine them to produce a new
    /// value, or even throw an error.
    ///
    /// If no `combine` closure is provided, deriving the same duplicate key for
    /// more than one element of self results in a runtime error.
    ///
    /// - Parameters:
    ///   - keyForValue: A closure that returns a key for each element in
    ///     `self`.
    ///   - combine: A closure that is called with the values for any duplicate
    ///     keys that are encountered. The closure returns the desired value for
    ///     the final dictionary.
    func keyed<Key: Hashable>(
        by keyForValue: (Element) -> Key,
        uniquingKeysWith combine: ((Key, Element, Element) throws -> Element)? = nil
    ) rethrows -> [Key: Element]
}
```
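
The declarations above have no bodies, so here is one way they might be implemented. This is a minimal sketch, not the proposal's implementation. It makes two simplifying assumptions that are not in the proposal: the keys are constrained to `Hashable` (which `Dictionary` requires), and the optional `combine` parameter is replaced by two overloads, which sidesteps the question of how `rethrows` interacts with an optional closure.

```swift
extension Sequence {
    /// Sketch: forwards to the existing `Dictionary.init(grouping:by:)`.
    func grouped<GroupKey: Hashable>(
        by keyForValue: (Element) throws -> GroupKey
    ) rethrows -> [GroupKey: [Element]] {
        try Dictionary(grouping: self, by: keyForValue)
    }

    /// Sketch: builds the dictionary in a single pass, passing the key and
    /// both candidate values to `combine` whenever a key repeats.
    func keyed<Key: Hashable>(
        by keyForValue: (Element) throws -> Key,
        uniquingKeysWith combine: (Key, Element, Element) throws -> Element
    ) rethrows -> [Key: Element] {
        var result: [Key: Element] = [:]
        for element in self {
            let key = try keyForValue(element)
            if let existing = result[key] {
                result[key] = try combine(key, existing, element)
            } else {
                result[key] = element
            }
        }
        return result
    }

    /// Sketch: without a `combine` closure, a duplicate key traps at runtime.
    func keyed<Key: Hashable>(
        by keyForValue: (Element) throws -> Key
    ) rethrows -> [Key: Element] {
        try keyed(by: keyForValue, uniquingKeysWith: { key, _, _ in
            fatalError("Duplicate key: \(key)")
        })
    }
}

let digitsGroupedByMod3 = (0...9).grouped(by: { $0 % 3 })
let fruitsByFirstLetter = ["Apple", "Banana", "Cherry"].keyed(by: { $0.first! })

// On a duplicate first letter, keep the longer of the two names.
let longestByFirstLetter = ["Apple", "Avocado", "Banana"].keyed(
    by: { $0.first! },
    uniquingKeysWith: { _, old, new in new.count > old.count ? new : old }
)
```

Forwarding `grouped(by:)` to the initializer keeps the behaviour identical to what callers get today, while the hand-written loop in `keyed` never builds an intermediate array of key-value pairs.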
146+
147+
## Source compatibility
148+
149+
The proposed changes are purely additive.
150+
151+
## ABI compatibility
152+
153+
This proposal is purely an extension of the standard library which
154+
can be implemented without any ABI support.
155+
156+
## Implications on adoption
157+
158+
TODO
159+
160+
The compatibility sections above are focused on the direct impact
161+
of the proposal on existing code. In this section, describe issues
162+
that intentional adopters of the proposal should be aware of.
163+
164+
For proposals that add features to the language or standard library,
165+
consider whether the features require ABI support. Will adopters need
166+
a new version of the library or language runtime? Be conservative: if
167+
you're hoping to support back-deployment, but you can't guarantee it
168+
at the time of review, just say that the feature requires a new
169+
version.
170+
171+
Consider also the impact on library adopters of those features. Can
172+
adopting this feature in a library break source or ABI compatibility
173+
for users of the library? If a library adopts the feature, can it
174+
be *un*-adopted later without breaking source or ABI compatibility?
175+
Will package authors be able to selectively adopt this feature depending
176+
on the tools version available, or will it require bumping the minimum
177+
tools version required by the package?
178+
179+
If there are no concerns to raise in this section, leave it in with
180+
text like "This feature can be freely adopted and un-adopted in source
181+
code with no deployment constraints and without affecting source or ABI
182+
compatibility."
183+
184+
## Alternatives considered
185+
186+
### Wait for a "pipe" operator, and just use the existing initializers
187+
188+
The general issue here is the ergonomics of free functions (and similarly, initializers and static functions), and how they don't chain together as nicely as instance functions. There has been community discussion around introducing a generalized solution to this problem, usually an Elixir-style [pipe operator](https://elixir-lang.org/getting-started/enumerables-and-streams.html#the-pipe-operator), `|>`.
189+
190+
This operator takes the value on its left and passes it as the first argument to the function on its right. It might look like so:
191+
192+
```swift
let studentsWithDuplicateNames = loadPeople()
    .map { json in decoder.decode(Person.self, from: json) }
    .filter(\.isStudent)
    |> { Dictionary(grouping: $0, by: \.name) }
    .filter { name, students in 1 < students.count }
```
199+
200+
This composes nicely, and reuses the existing Dictionary initializer, but brings its own challenges.
201+
202+
`Dictionary(grouping:by:)` takes two arguments, but the `|>` operator would expect a right-hand-side closure that takes only 1 argument. Resolving this requires one of several approaches, each with some downsides:
203+
1. Explicitly wrap the rhs in a closure as shown. This is quite noisy.
2. Introduce a generalized function-currying syntax that can take `Dictionary.init(grouping:by:)`, bind the `by` argument to `\.name`, and return a single-argument function. This seems unlikely to be added to the language, and a long way off in any case.
3. Implement the `|>` as a special form in the language, that gives it special behaviour for cases like this. (e.g. `|> Dictionary(grouping:by: \.name)`). This adds syntactic complexity to the language, and privileges the first argument over the others, which might not always work nicely.
206+
207+
In any case, the resultant spelling would still be quite wordy, and less clear than the simple `grouped(by:)` and `keyed(by:)` methods.
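
For reference, approach 1 can be approximated in today's Swift with a user-defined operator. The sketch below is only an illustration: the `|>` operator, its `PipePrecedence` group, and the sample `names` array are all hypothetical and are not part of the standard library or of this proposal.

```swift
// Hypothetical precedence group and operator, defined only for this sketch.
precedencegroup PipePrecedence {
    associativity: left
    higherThan: AssignmentPrecedence
}
infix operator |> : PipePrecedence

/// Passes the left-hand value to the single-argument function on the right.
func |> <Input, Output>(
    value: Input,
    transform: (Input) throws -> Output
) rethrows -> Output {
    try transform(value)
}

let names = ["Ann", "Bob", "Ann", "Cy"]

// The two-argument initializer has to be wrapped in a closure by hand.
let namesGroupedByValue = names |> { Dictionary(grouping: $0, by: { $0 }) }
let duplicatedNames = namesGroupedByValue.filter { $0.value.count > 1 }
```

Because member access binds more tightly than any infix operator, chaining `.filter` directly after the piped closure would need extra parentheses, which is why this sketch uses an intermediate constant.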
208+
209+
### Don't pass the `Key` to `keyed(by:)`'s `combine` closure
210+
211+
The proposed `keyed(by:uniquingKeysWith:)` API takes an optional `combine` closure with this type:
212+
213+
```swift
(Key, Element, Element) throws -> Element
```
216+
217+
This differs from the `combine` closure expected by the current `Dictionary.init(_:uniquingKeysWith:)` API, which only passes the old and new element, but not the `Key`:
218+
219+
```swift
(Element, Element) throws -> Element
```
222+
223+
If the caller needs the `key` in their decision to pick between the old and new value, they would be required to re-compute it for themselves. This looks like a needless artificial restriction: at the point at which the `combine` closure is called, the key is already available, and it could just be provided directly.
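
To make the recomputation concrete, here is a small sketch using today's `Dictionary.init(_:uniquingKeysWith:)`. The `words` array and the `keyFor` helper are hypothetical, chosen only to show that the closure must derive the key again because the initializer does not pass it.

```swift
let words = ["Apple", "Banana", "apple", "BANANA", "Cherry"]

// Hypothetical key derivation: case-insensitive spelling.
let keyFor: (String) -> String = { $0.lowercased() }

// Records every key that had to be derived a second time.
var recomputedKeys: [String] = []

let wordsByLowercase: [String: String] = Dictionary(
    words.lazy.map { (keyFor($0), $0) },
    uniquingKeysWith: { old, new in
        // The key was already computed for this pair, but it is not passed
        // to the closure, so it has to be derived again from a value.
        let key = keyFor(old)
        recomputedKeys.append(key)
        return old // Keep the first spelling that was seen.
    }
)
```

With the proposed closure shape, the `key` would arrive as the closure's first argument and the second call to `keyFor` would be unnecessary.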
224+
225+
### `groupedBy` and `keyedBy`
226+
227+
The `by:` argument labels are lost when using trailing-closure syntax:
228+
229+
```swift
(0...9).grouped { $0 % 3 }
```
232+
233+
Authors are forced to pick between the terseness of trailing-closure syntax as shown, or the prose-like clarity of passing the closure as a labeled argument:
234+
235+
```swift
(0...9).grouped(by: { $0 % 3 })
```
238+
239+
There's a fair argument to be made that the best of both worlds would be to move the `by:` from the argument label, into the function's base name, allowing for these two spellings:
240+
241+
```swift
(0...9).groupedBy({ $0 % 3 })
(0...9).groupedBy { $0 % 3 }
```
245+
246+
247+
## Acknowledgments
248+
249+
None (yet?)
