From 00015c5cc9d90e227ec32fa0637e9f8c190ed784 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Wed, 8 Feb 2023 15:03:33 +0100 Subject: [PATCH 01/43] Add text, first pass * General considerations on error handling * Predefined exceptions * Runtime crashes * Usage of the Option type In an upcoming commit, text on the Result type will be added --- data/tutorials/lg_08_error_handling.md | 443 ++++++++++++++++++++++--- 1 file changed, 402 insertions(+), 41 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 5a1a094e91..30c204d5bd 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -9,62 +9,120 @@ date: 2021-05-27T21:07:30-00:00 # Error Handling +## Error as Special Values + +Don't do that. + +Some languages, most emblematically C, treats certain values as errors. For +instance, when receiving data through a network connection, a function expected +to return the number of received bytes might return a negative number meaning: +“timed out waiting”. Another example would be returning the empty string when +extracting a substring of negative length. Great software was written using this +style, but is is not the proper way to deal with errors in OCaml. + +OCaml has three major ways to deal with errors: +1. Exceptions +1. `Option` values +1. `Result` values + +Use them. Do not encode errors inside data. Exceptions provide a mean to deal +with errors at the control flow level while `Option` and `Result` provide a mean +to turn errors into dedicated data. + +The rest of this document presents and compares approaches towards error +handling. + ## Exceptions -One way of handling errors in OCaml is exceptions. The +Historically, the first way of handling errors in OCaml is exceptions. The standard library relies heavily upon them. -Exceptions belong to the type `exn` (an extensible sum type): +The biggest issue with exceptions is that they do not appear in types. One has +to read the documentation to see that, indeed, `List.find` or `String.sub` are not +total functions, and that they might fail by raising an exception. + +However, exceptions have the great merit of being compiled into efficient +machine code. When implementing trial and error approaches likely to back-track +often, exceptions can be used to acheive good performance. + +Exceptions belong to the type `exn` which is an [extensible sum type](/releases/latest/manual/extensiblevariants.html). ```ocaml +# exception Foo of string;; exception Foo of string -let i_will_fail () = - raise (Foo "Oh no!") +# let i_will_fail () = + raise (Foo "Oh no!");; + +# i_will_fail ();; +Exception: Foo "Oh no!". ``` -Here, we add a variant `Foo` to the type `exn`, and create a function -that will raise this exception. Now, how do we handle exceptions? -The construct is `try ... with ...`: +Here, we add a variant `Foo` to the type `exn`, and create a function that will +raise this exception. Now, how do we handle exceptions? The construct is `try +... with ...`: ```ocaml -let safe_inverse n = - try Some (1 / n) with - Division_by_zero -> None - -let safe_list_find p l = - try Some (List.find p l) with - Not_found -> None +# try i_will_fail () with Foo _ -> ();; +- : unit = () ``` -We can try those functions: +### Predefined Exceptions + +The standard library predefines several exceptions, see +[`Stdlib`](/releases/latest/api/Stdlib.html). Here are a few examples: ```ocaml # 1 / 0;; Exception: Division_by_zero. -# safe_inverse 2;; -- : int option = Some 0 -# safe_inverse 0;; -- : int option = None # List.find (fun x -> x mod 2 = 0) [1; 3; 5];; Exception: Not_found. -# safe_list_find (fun x -> x mod 2 = 0) [1; 3; 4; 5];; -- : int option = Some 4 -# safe_list_find (fun x -> x mod 2 = 0) [1; 3; 5];; -- : int option = None +# String.sub "Hello world!" 3 (-2);; +Exception: Invalid_argument "String.sub / Bytes.sub". +# let rec loop x = x :: loop x +val loop : 'a -> 'a list = +# loop 42;; +Stack overflow during evaluation (looping recursion?). +``` + +Although the last one doesn't look as an exception, it actually is. +```ocaml +# try loop 42 with Stack_overflow -> [];; +- : int list = [] +``` + +Among them the predefined exceptions of the standard library, the following ones +are intended to be raised by user written functions: +```ocaml +exception Exit +exception Not_found +exception Invalid_argument of string +exception Failure of string ``` -The biggest issue with exceptions is that they do not appear in types. -One has to read the documentation to see that, indeed, `Map.S.find` -or `List.hd` are not total functions, and that they might fail. +* `Exit` terminates your program with a success status, which is 0 in Unices (they do error values) +* `Not_found` should be raised when searching failed because there isn't anything satisfactory to be found +* `Invalid_argument` should be raised when a parameter can't be accepted +* `Failure` should be raised when a result can't be produced -It is considered good practice nowadays, when a function can fail in -cases that are not bugs (i.e., not `assert false`, but network failures, -keys not present, etc.) -to return a more explicit type such as `'a option` or `('a, 'b) result`. -A relatively common idiom is to have such a safe version of the function, -say, `val foo : a -> b option`, and an exception raising -version `val foo_exn : a -> b`. +Functions are provided to raise `Invalid_argument` and `Failure` using a string parameter: +```ocaml +val invalid_arg : string -> 'a +(** @raise Invalid_argument *) +val failwith : string -> 'a +(** @raise Failure *) +``` + +When implementing a software component which exposes functions raising +exceptions, a design decision must be made: +* Use the prexisting exceptions +* Raise custom exceptions + +Both can make sense, there isn't a general rule. If the exceptions of the +standard library are used, they must be raised under the conditions they are +intended to, otherwise handlers will have trouble processing them. Using custom +exceptions will force client code to include dedicated catch conditions. This +can be desirable for errors that must be handled at the the client level. ### Documentation @@ -73,14 +131,15 @@ Functions that can raise exceptions should be documented like this: ```ocaml val foo : a -> b -(** foo does this and that, here is how it works, etc. +(** [foo] does this and that, here is how it works, etc. @raise Invalid_argument if [a] doesn't satisfy ... - @raise Sys_error if filesystem is not happy *) + @raise Sys_error if filesystem is not happy +*) ``` -### Stacktraces +### Stack traces -To get a stacktrace when a unhandled exception makes your program crash, you +To get a stack trace when an unhandled exception makes your program crash, you need to compile the program in "debug" mode (with `-g` when calling `ocamlc`, or `-tag 'debug'` when calling `ocamlbuild`). Then: @@ -89,7 +148,7 @@ Then: OCAMLRUNPARAM=b ./myprogram [args] ``` -And you will get a stacktrace. Alternatively, you can call, from within the program, +And you will get a stack trace. Alternatively, you can call, from within the program, ```ocaml let () = Printexc.record_backtrace true @@ -100,7 +159,7 @@ let () = Printexc.record_backtrace true To print an exception, the module `Printexc` comes in handy. For instance, the following function `notify_user : (unit -> 'a) -> 'a` can be used to call a function and, if it fails, print the exception on `stderr`. -If stacktraces are enabled, this function will also display it. +If stack traces are enabled, this function will also display it. ```ocaml let notify_user f = @@ -129,9 +188,311 @@ Each printer should take care of the exceptions it knows about, returning `Some `, and return `None` otherwise (let the other printers do the job!). -## Result Type +## Runtime Crashes + +Although OCaml is a very safe language, it is possible to trigger unrecoverable +errors at runtime. + +### Exceptions not Raised + +Under panic circumstances, the native code compiler does a best-effort at +raising meaningful exceptions. However, some error conditions may remain +undetected, which will result in a segmentation fault. This is the specially the +case for stack overflows, which aren't always detected. + +> But catching stack overflows is tricky, both in Unix-like systems and under Windows, so the current implementation in OCaml is a best effort that is occasionally buggy. + +[Xavier Leroy, October 2021](https://discuss.ocaml.org/t/stack-overflow-reported-as-segfault/8646/8?u=cuihtlauac) + +### Bypassing Type-Safety + +OCaml provides a mean to bypass it's own type safety. Don't use it. Here is how +to shoot in its own feet: + +```shell +> echo "(Obj.magic () : int array).(0)" > foo.ml +> ocamlopt foo.ml +> ./a.out +Segmentation fault (core dumped) +``` + +### Language Bugs + +When a crash isn't coming from: +* A limitation of the native code compiler +* `Obj.magic` + +It may be a language bug. It happens. Here is what to do when this is suspected: + +1. Make sure the crash affects both compilers: bytecode and native +1. Write a self-contained and minimal proof-of-concept code which does nothing but triggering the crash +1. File an issue in the [OCaml Bug Tracker in GitHub](https://github.com/ocaml/ocaml/issues) + +Here is an example of such a bug: https://github.com/ocaml/ocaml/issues/7241 + +### Safe vs. Unsafe Functions + +Uncaught exceptions raise runtime crashes. Therefore, there is a tendency to use +the following terminology: +* Function raising exceptions: Unsafe +* Function handling errors in data: Safe + +The main means to write such kind of safe error handling functions is to use +either `Option` (next section) or `Result` (following section). + +## Using the `Option` Type for Errors + +The `Option` module provides the first alternative to exceptions. The `'a +option` datatype allows to express either the availability of data for instance +`Some 42` or the absence of data using `None`, which can represent an error. + +Using `Option` it is possible to write function that return `None` instead of +throwing an exception. +```ocaml +let div_opt m n = + try Some (m / n) with + Division_by_zero -> None + +let find_opt p l = + try Some (List.find p l) with + Not_found -> None +``` +We can try those functions: + +```ocaml +# 1 / 0;; +Exception: Division_by_zero. +# div_opt 42 2;; +- : int option = Some 24 +# div_opt 42 0;; +- : int option = None +# List.find (fun x -> x mod 2 = 0) [1; 3; 5];; +Exception: Not_found. +# find_opt (fun x -> x mod 2 = 0) [1; 3; 4; 5];; +- : int option = Some 4 +# find_opt (fun x -> x mod 2 = 0) [1; 3; 5];; +- : int option = None +``` + +This can even be turned into a higher-order generic function: +```ocaml +# let try_opt f x = try Some (f x) with _ -> None;; +val try_opt : ('a -> 'b) -> 'a -> 'b option = +``` + +It tends to be considered good practice nowadays when a function can fail in +cases that are not bugs (i.e., not `assert false`, but network failures, keys +not present, etc.) to return a more explicit type such as `'a option` or `('a, +'b) result` (see next section). + +### Naming Conventions + +There are two naming conventions to have two versions of the same partial +function, one raising exception, the other returning an option. In the above +examples, the convention of the standard library is used add an `_opt` suffix to +name of the version of the function which returns an option instead of raising +exceptions. +```ocaml +val find: ('a -> bool) -> 'a list -> 'a +(** @raise Not_found *) +val find_opt: ('a -> bool) -> 'a list -> 'a option +``` +This is extracted from the `List` module of the standard library. + +However, some project tend to avoid or reduce the usage of exceptions. In such a +context, reversing the convention is a relatively common idiom. It is the +version of the function which raises exceptions that is suffixed with `_exn`. +Using the same functions, that would be the specification +```ocaml +val find_exn: ('a -> bool) -> 'a list -> 'a +(** @raise Not_found *) +val find: ('a -> bool) -> 'a list -> 'a option +``` +### Composing Functions Returning Options + +The function `div_opt` can't raise exceptions. However, since it doesn't return +a result of type `int`, it can't be used in place of an `int`. The same way +OCaml doesn't +[promote](https://en.wikipedia.org/wiki/Type_conversion#Type_promotion) integers +into floats, it doesn't automatically converts `int option` into `int` or _vice +versa_. + +```ocaml +# 21 + Some 21;; +Error: This expression has type 'a option + but an expression was expected of type int +``` + +In order to combine option values with other values, conversion functions are +needed. Here are the functions provided by the `Option` module to extract the +data contained in an option: +```ocaml +val get : 'a t -> 'a +val value : 'a t -> default:'a -> 'a +val fold : none:'a -> some:('b -> 'a) -> 'b t -> 'a +``` +`get` returns the content or raises `Invalid_argument` if applied to `None`. +`value` essentially behaves as `get`, except it must be called with a default +value which will be returned of if applied to `None`. `fold` also needs to be +passed a default value that is returned when called on `None`, but it also +expects a function that will be applied to the content of the option, when not +empty. + +As a remark, observe that `value` can be implemented using `fold`: +```ocaml +# let value ~default = Option.fold ~none:default ~some:Fun.id;; +val value : default:'a -> 'a option -> 'a = +# Option.value ~default:() None = value ~default:() None;; +- : bool = true +# Option.value ~default:() (Some ()) = value ~default:() (Some ());; +- : bool = true +``` + +It is also possible to perform pattern matching on option values: +```ocaml +match opt with +| None -> ... (* Something *) +| Some x -> ... (* Something else *) +``` +However, sequencing such expressions leads to deep nesting which is often considered bad: + +> if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program. + +[Linux Kernel Style Guide](https://www.kernel.org/doc/Documentation/process/coding-style.rst) + +The recomended way to avoid that is to refrain from or delay attempting to +access the content of an option value. + +### Using on `Option.map` and `Option.bind` + +Let's start with an example. Let's imagine one needs to write a function +returning the [hostname](https://en.wikipedia.org/wiki/Hostname) part of an +email address. For instance, given the email +"gaston.lagaffe@courrier.dupuis.be", it would return "courrier". + +Here is a questionable but straitforward implementation, using exceptions: +```ocaml +let host email = + let fqdn_pos = String.index email '@' + 1 in + let fqdn_len = String.length email - fqdn_pos in + let fqdn = String.sub email fqdn_pos fqdn_len in + try + let host_len = String.index fqdn '.' in + String.sub fqdn 0 host_len + with Not_found -> + if fqdn <> "" then fqdn else raise Not_found +``` +This may fail by raising `Not_found` if the first the call to `String.index` +does, which make sense since if there is no '@' character in input string, it's +not an email address. However, if the second call to `String.index` fails, +meaning no dot character was found, we may return the whole fully qualified +domain name (FQDN) as a fallback, but only if it isn't the empty string. + +Note that `String.sub` may throw `Invalid_argument`. Fortunately, this can't +happen. In the worst case, the `@` character is the last one, then `fqdn_pos` is +off range by one but `fqdn_len` is null and that combination of parameters +doesn't count as an invalid substring. + +Here the equivalent function, using the same logic but `Option` instead of +exceptions: +```ocaml +let host_opt email = + match String.index_opt email '@' with + | Some at_pos -> begin + let fqdn_pos = at_pos + 1 in + let fqdn_len = String.length email - fqdn_pos in + let fqdn = String.sub email fqdn_pos fqdn_len in + match String.index_opt fqdn '.' with + | Some host_len -> Some (String.sub fqdn 0 host_len) + | None -> if fqdn <> "" then Some fqdn else None + end + | None -> None +``` + +Although it qualifies as safe, its legibility isn't improved. Some claim even +claim it is worse. + +Before showing how to improve this code, we need to explain how `Option.map` and +`Option.bind` work. +```ocaml +val Option.map : ('a -> 'b) -> 'a option -> 'b option +val Option.bind : 'a option -> ('a -> 'b option) -> 'b option +``` + +`Option.map` applies a function `f` to an option parameter, if it isn't `None` +```ocaml +let map f = function +| Some x -> Some (f x) +| None as opt -> opt +``` + +If `f` can be applied to something, its result is rewrapped into a fresh option. +If there isn't anything to supply to `f`, `None` is forwarded. + +If we don't take arguments order into account, `Option.bind` is almost exacly +the same, except we assume `f` returns an option, therefore there is no need to +rewrapped its result, it's already an option value: +```ocaml +let bind o f = match opt with +| Some x -> f x +| None -> None +``` + +`bind` having flipped parameter with respect to `map` allows to use it as custom +let binder: +```ocaml +# let ( let* ) = Option.bind;; +val ( let* ) : 'a option -> ('a -> 'b option) -> 'b option = +``` + +Using these mechanisms, here a possible way to rewrite `host_opt`: +```ocaml +# let host_opt email = + let* fqdn_pos = Option.map (( + ) 1) (String.index_opt email '@') in + let fqdn_len = String.length email - fqdn_pos in + let fqdn = String.sub email fqdn_pos fqdn_len in + String.index_opt fqdn '.' + |> Option.map (fun dot_pos -> String.sub fqdn 0 dot_pos) + |> function None when fqdn <> "" -> Some fqdn | opt -> opt;; +val host_opt : string -> string option = +``` -The Stdlib module contains the following type: +This version was picked to illustrate how to use and combine operations on +options allowing to acheive some balance between understandability and +robustness. A couple of observations: +* As in the original `host` function (with exceptions): + - The calls to `String` functions (`index_opt`, `length` and `sub`) are written in + the same order + - The same local names are used, with the same types +* There isn't any indentation or matching left +* Line 1: + - right-hand side of `=` : `Option.map` allows adding 1 to the result of `String.index_opt`, if it didn't failed + - left-hand side of `=` : the `let*` syntax turns all the rest of the + code (from line 2 to the end) into the body of an anonymous function which + takes `fqdn_pos` as parameter, and the function `( let* )` is called with the + right-hand side of `=` (as first parameter) and that anonymous function (as + second parameter). +* Lines 2 and 3: same as in the original +* Line 4: `try` or `match` is removed +* Line 5: `String.sub` is applied, if the previous step didn't failed, otherwise the error is forwarded +* Line 6: if nothing was found earlier, and if isn't empty, `fqdn` is returned as a fallback + + + +### Options and Return Early + +One of the limitation of the option type is it doesn't record the reason which +prevented having a value. `None` is silent, it doesn't say anything about what +went wrong. For this reason, function returning option values should document +the circumstances under which it may return `None`. Such a documentation is +likely to ressemble to the one required for exceptions using `@raise`. The +`Result` type is intended to fill this gap: manage error in data, like option +values but also provide information on errors, like exceptions. It is the topic +of the next section. + +## `Result` Type + +The `Result` module of the standard library contains the following type: ```ocaml type ('a, 'b) result = From e3bba316bf21f23d6669121d22304772eb3e1e30 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Fri, 24 Feb 2023 19:00:35 +0100 Subject: [PATCH 02/43] Add missing text - Result - Assertions - Conclusing remarks --- data/tutorials/lg_08_error_handling.md | 298 ++++++++++++++++++++----- 1 file changed, 243 insertions(+), 55 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 30c204d5bd..88f049e9ae 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -383,8 +383,8 @@ let host email = if fqdn <> "" then fqdn else raise Not_found ``` This may fail by raising `Not_found` if the first the call to `String.index` -does, which make sense since if there is no '@' character in input string, it's -not an email address. However, if the second call to `String.index` fails, +does, which make sense since if there is no `@` character in the input string, +it's not an email address. However, if the second call to `String.index` fails, meaning no dot character was found, we may return the whole fully qualified domain name (FQDN) as a fallback, but only if it isn't the empty string. @@ -409,7 +409,7 @@ let host_opt email = | None -> None ``` -Although it qualifies as safe, its legibility isn't improved. Some claim even +Although it qualifies as safe, its legibility isn't improved. Some may even claim it is worse. Before showing how to improve this code, we need to explain how `Option.map` and @@ -423,7 +423,7 @@ val Option.bind : 'a option -> ('a -> 'b option) -> 'b option ```ocaml let map f = function | Some x -> Some (f x) -| None as opt -> opt +| None -> None ``` If `f` can be applied to something, its result is rewrapped into a fresh option. @@ -433,7 +433,7 @@ If we don't take arguments order into account, `Option.bind` is almost exacly the same, except we assume `f` returns an option, therefore there is no need to rewrapped its result, it's already an option value: ```ocaml -let bind o f = match opt with +let bind opt f = match opt with | Some x -> f x | None -> None ``` @@ -452,7 +452,7 @@ Using these mechanisms, here a possible way to rewrite `host_opt`: let fqdn_len = String.length email - fqdn_pos in let fqdn = String.sub email fqdn_pos fqdn_len in String.index_opt fqdn '.' - |> Option.map (fun dot_pos -> String.sub fqdn 0 dot_pos) + |> Option.map (fun host_len -> String.sub fqdn 0 host_len) |> function None when fqdn <> "" -> Some fqdn | opt -> opt;; val host_opt : string -> string option = ``` @@ -461,10 +461,10 @@ This version was picked to illustrate how to use and combine operations on options allowing to acheive some balance between understandability and robustness. A couple of observations: * As in the original `host` function (with exceptions): - - The calls to `String` functions (`index_opt`, `length` and `sub`) are written in - the same order + - The calls to `String` functions (`index_opt`, `length` and `sub`) are the + same and in the same order - The same local names are used, with the same types -* There isn't any indentation or matching left +* There isn't any indentation or pattern-matching left * Line 1: - right-hand side of `=` : `Option.map` allows adding 1 to the result of `String.index_opt`, if it didn't failed - left-hand side of `=` : the `let*` syntax turns all the rest of the @@ -477,22 +477,24 @@ robustness. A couple of observations: * Line 5: `String.sub` is applied, if the previous step didn't failed, otherwise the error is forwarded * Line 6: if nothing was found earlier, and if isn't empty, `fqdn` is returned as a fallback - - -### Options and Return Early +When used to handle errors with catch statements, it requires some time to get +used the latter style. The key idea is avoiding or defering from directly +looking into option values, instead pass them along using _ad-hoc_ pipes (such +as `map` and `bind`). Erik Meijer calls that following the happy path. Visually, +it also looks like to the “early return“ pattern often found in C. One of the limitation of the option type is it doesn't record the reason which -prevented having a value. `None` is silent, it doesn't say anything about what -went wrong. For this reason, function returning option values should document -the circumstances under which it may return `None`. Such a documentation is -likely to ressemble to the one required for exceptions using `@raise`. The -`Result` type is intended to fill this gap: manage error in data, like option -values but also provide information on errors, like exceptions. It is the topic -of the next section. +prevented from having a value. `None` is silent, it doesn't say anything about +what went wrong. For this reason, function returning option values should +document the circumstances under which it may return `None`. Such a +documentation is likely to ressemble to the one required for exceptions using +`@raise`. The `Result` type is intended to fill this gap: manage error in data, +like option values but also provide information on errors, like exceptions. It +is the topic of the next section. ## `Result` Type -The `Result` module of the standard library contains the following type: +The `Result` module of the standard library defines the following type: ```ocaml type ('a, 'b) result = @@ -500,53 +502,220 @@ type ('a, 'b) result = | Error of 'b ``` -A value `Ok x` means that the computation succeeded with `x`, and -a value `Error e` means that it failed. -Pattern matching can be used to deal with both cases, as with any -other sum type. The advantage here is that a function `a -> b` that -fails can be modified so its type is `a -> (b, error) result`, -which makes the failure explicit. -The error case `e` in `Error e` can be of any type -(the `'b` type variable), but a few possible choices -are: +A value `Ok x` means that the computation succeeded and produced `x`, and a +value `Error e` means that it failed and `e` represents whatever information +collected in the process. Pattern matching can be used to deal with both cases, +as with any other sum type. However using `map` and `bind` can be more +convinient, maybe even more as it was with `Option`. + +Before taking a look at `Result.map`, let's think about `List.map` and +`Option.map` under a changed perspective. Both functions behaves like the +identity when applied to `[]` or `None`, respectively. That's the only +possibility since those parameters don't carry any data. Which isn't the case in +`Result` with its `Error` constructor. Nethertheless, `Result.map` is +implemented likewise, on `Error`, it also behaves like the identity. +```ocaml +let map f = function +| Ok x -> Ok (f x) +| Error e -> Error e +``` + +The `Result` module has two map functions: the one we've just seen and another +one, with the same logic, applied to `Error`: +```ocaml +let map_error f = function +| Ok x -> Ok x +| Error e -> f e +``` + +The same reasoning applies to `Result.bind`, except there's no `bind_error`. +Using those functions, here is an hypothetical example of code using [Anil +Madhavapeddy OCaml Yaml library](https://github.com/avsm/ocaml-yaml): +```ocaml +let file_opt = File.read_opt path in +let file_res = Option.to_result ~none:(`Msg "File not found") file_opt in begin + let* yaml = Yaml.of_string file_res in + let* found_opt = Yaml.Util.find key yaml in + let* found = Option.to_result ~none:(`Msg (key ^ ", key not found")) found_opt in + found +end |> Result.map_error (Printf.sprintf "%s, error: %s: " path) +``` + +Here are the types of the involved functions: +```ocaml +val File.read_opt : string -> string option +val Yaml.of_string : string -> (Yaml.value, [`Msg of string]) result +val Yaml.Util.find : string -> Yaml.value -> (Yaml.value option, [`Msg of string]) result +val Option.to_result : none:'e -> 'a option -> ('a, 'e) result +``` + +- `File.read_opt` is supposed to open a file, read its contents and return it as a +string wrapped in an option, if anything goes wrong `None` is returned. +- `Yaml.of_string` parses a string an turns into an ad-hoc OCaml type +- `Yaml.find` recursively searches a key in a Yaml tree, if found, it returns the + corresponding data, wrapped in an option +- `Option.to_result` perform conversion of an `option` into a `result` +- Finally, `let*` stands for `Result.bind` + +Since functions from the `Yaml` module both returns `result` data, it is easier +to write a pipe which process that type all along. That's why `Option.to_result` +needs to be used. Stages which produce `result` must be chained using `bind`, +stages which do not must be chained using some map function, in order for the +result to be wrapped back into a `result`. + +The map functions of the `Result` module allows processing of data or errors, +but the routines used must not fail, as `Result.map` will never turn an `Ok` +into an `Error` and `Result.map_error` will never turn an `Error` into an `Ok`. +On the other hand, functions passed to `Result.bind` are allowed to fail. As +stated before there isn't a `Result.bind_error`. One way to make sense out of +that absence is to consider its type, it would have to be: +```ocaml +val Result.bind_error : ('a, 'e) result -> ('e -> ('a, 'f) result) -> ('a, 'f) result +``` +We would have: +* `Result.map_error f (Ok x) = Ok x` +* And either: + - `Result.map_error f (Error e) = Ok y` + - `Result.map_error f (Error e) = Error e'` +Which means an error would be turned back into valid data, or changed into +another error. This is almost like recovering from an error. However, when +recovery fails, it may be preferable to preserve the initial cause of failure. +That behaviour can be acheived by defining the following function: + +```ocaml +# let recover f = Result.(fold ~ok:ok ~error:(fun (e : 'e) -> Option.to_result ~none:e (f e)));; +val recover : ('e -> 'a option) -> ('a, 'e) result -> ('a, 'e) result = +``` + +Although any kind of data can be wrapped as a `result` `Error`, it is +recommended to use that constructor to carry actual errors, for instance: - `exn`, in which case the result type just makes exceptions explicit. - `string`, where the error case is a message that indicates what failed. - `string Lazy.t`, a more elaborate form of error message that is only evaluated if printing is required. -- some polymorphic variant, with one case per - possible error. This is very accurate (each error can be dealt with - explicitly and occurs in the type) but the use of polymorphic variants - sometimes make error messages hard to read. +- some polymorphic variant, with one case per possible error. This is very + accurate (each error can be dealt with explicitly and occurs in the type) but + the use of polymorphic variants sometimes make the code harder to read. + +Note that some say the types `result` and `Either.t` are +[ismorphic](https://en.wikipedia.org/wiki/Isomorphism). Concretely, it means +it's always possible to replace one by the other, like in a completely neutral +refactoring. Values of type `result` and `Either.t` can be translated back and +forth, and appling both translations one after the other, in any order, returns +to the starting value. Nethertheless, this doesn't mean `result` should be used +in place of `Either.t`, or vise versa. Naming things matters, as punned by Phil +Karlton famous quote: + +> There are only two hard things in Computer Science: cache invalidation and +> naming things. + +Properly handling errors always makes the code harder to read. Using the right +tools, data and functions can help. Use them. + +## `bind` as a Binary Operator + +When `Option.bind` or `Result.bind` are used, they are often aliased into a +custom binding operator, such as `let*`. However, it is also possible to use +it as binary operator, which is almost always writen `>>=`. Using `bind` this +way must be detailed because it is extremly popular in other functional +programming language, and specially in OCaml's arch-rival _Which Must Not Be +Named_. -For easy combination of functions that can fail, many alternative standard -libraries provide useful combinators on the `result` type: `map`, `>>=`, etc. +Assuming `a` and `b` are valid OCaml expressions, the three pieces of sources +code are exactly the same: + +```ocaml +bind a (fun x -> b) +``` +```ocaml +let* x = a in b +``` +```ocaml +a >>= fun x -> b +``` + +It may seem pointless. To make sense, one must look at expressions where several +calls to `bind` are chained. The following three are also equivalent: + +```ocaml +bind a (fun x -> bind b (fun y -> c)) +``` +```ocaml +let* x = a in +let* y = b in +c +``` +```ocaml +a >>= fun x -> b >>= fun y -> c +``` +Variables `x` and `y` may appear in `c` in the three cases. The first form isn't +very convinent, it uses a lot of parenthesis. The second one is often the +prefered one due to its ressemblance with regular local definitions. The third +one is harder to read, `>>=` associates to the right in order to avoid +parenthesis in that precise case, but it's easy to get lost. Nethertheless, it +has some appeal when named functions are used. It looks a bit like good old Unix +pipes: +```ocaml +a >>= f >>= g +``` +looks better than: +```ocaml +let* x = a in +let* y = f x in +g y +``` +Writing `x >>= f` is very close to what is found in functionally tained +programming languages which have methods and receivers such as Kotlin, Scala, +Go, Rust, Swift or even modern Java, where it would be looking like: `x.bind(f)`. + +Here is the same code as presented at the end of the previous section, rewritten +using `Result.bind` as a binary opeator: +```ocaml +File.read_opt path +|> Option.to_result ~none:(`Msg "File not found") +>>= Yaml.of_string +>>= Yaml.Util.find key +>>= Option.to_result ~none:(`Msg (key ^ ", key not found")) +|> Result.map_error (Printf.sprintf "%s, error: %s: " path) +``` + +By the way, this style is called [Tacit +Programming](https://en.wikipedia.org/wiki/Tacit_programming). Thanks to the +associativity priorities of the >>= and |> operators, no parenthesis are needed. + +OCaml has a strict typing discipline, not a strict styling discipline, therefore +picking the right style is an author's decision. See the [OCaml Programming +Guidelines](/docs/guidelines) for more details on those matters. ## Assertions -The built-in `assert` takes an expression as an argument and throws an -exception *if* the provided expression evaluates to `false`. -Assuming that you don't catch this exception (it's probably -unwise to catch this exception, particularly for beginners), this -results in the program stopping and printing out the source file and -line number where the error occurred. An example: + +The built-in `assert` instruction takes an expression as an argument and throws +the `Assert_failure` exception if the provided expression evaluates to `false`. +Assuming that you don't catch this exception (it's probably unwise to catch this +exception, particularly for beginners), this results in the program stopping and +printing out the source file and line number where the error occurred. An +example: ```ocaml # assert (Sys.os_type = "Win32");; -Exception: Assert_failure ("//toplevel//", 1, 1). -Called from Stdlib__Fun.protect in file "fun.ml", line 33, characters 8-15 -Re-raised at Stdlib__Fun.protect in file "fun.ml", line 38, characters 6-52 -Called from Topeval.load_lambda in file "toplevel/byte/topeval.ml", line 89, characters 4-150 +Exception: Assert_failure ("//toplevel//", 1, 0). ``` -(Running this on Win32, of course, won't throw an error). +Running this on Win32, of course, won't throw an error. + +Writing `assert false` would just stop your program. This idiom is sometimes +used to indicate [dead code](https://en.wikipedia.org/wiki/Dead_code), parts of +the program that must be writen (often for type-checking or pattern matching +completeness) but are unreachable at run time. -You can also just call `assert false` to stop your program if things -just aren't going well, but you're probably better to use ... +Asserts should be undetstood as executable comments. There aren't supposed to +fail, unless during debugging or truely extraordinary circumstances absolutely +preventing the execution from making any kind of progress. -`failwith "error message"` throws a `Failure` exception, which again -assuming you don't try to catch it, will stop the program with the given -error message. `failwith` is often used during pattern matching, like -this real example: +When the execution reaches conditions which can't be handled, the right thing to +do is to throw a `Failure`, using `failwith "error message"`. Assertions aren't + meant to handle those cases. For instance, in the following code: ```ocaml @@ -557,6 +726,25 @@ match Sys.os_type with | _ -> failwith "this system is not supported" ``` -Note a couple of extra pattern matching features in this example too. A -so-called "range pattern" is used to match either `"Unix"` or -`"Cygwin"`, and the special `_` pattern which matches "anything else". +It is right to use `failwith`, using `assert` would be wrong. Here is the dual +example: +```ocaml +function x when true -> () | _ -> assert false +``` +Here, it would be wrong to use `failwith` since it requires the compiler to be +bugged or the system to be corrupted for second code path to be executed. +Breakage of the language semantics qualifies as extraordinary circumstances, it +is catastrophic. + +# Concluding Remarks + +Properly handling errors is a complex matter. It is [cross-cutting +concern](https://en.wikipedia.org/wiki/Cross-cutting_concern), it touches all +parts of an application and can't be isolated in dedicated module. In contrast +to several other main stream languages, OCaml provides several mechanisms to +handled exceptional circumstances, all with good runtime performances and code +understandability. Using them properly requires some initial learning and +partice. Later, it always require some thinking, which is good since proper +management of errors shouldn't ever be overlooked. No error handling is better +than the others, and is should be matter of adequacy to the context rather some +of taste. But opiniated OCaml code is also fine, so it's a balance. From 6cf262ed01ca898d00e1293b1cfaab73460a7452 Mon Sep 17 00:00:00 2001 From: Cuihtlauac Alvarado Date: Wed, 1 Mar 2023 18:00:12 +0100 Subject: [PATCH 03/43] Apply suggestions from @dustanddreams Thanks a lot Miod, I've merged all you suggestions Co-authored-by: Miod Vallat <118974489+dustanddreams@users.noreply.github.com> --- data/tutorials/lg_08_error_handling.md | 72 +++++++++++++------------- 1 file changed, 36 insertions(+), 36 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 88f049e9ae..ecead21445 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -92,7 +92,7 @@ Although the last one doesn't look as an exception, it actually is. ``` Among them the predefined exceptions of the standard library, the following ones -are intended to be raised by user written functions: +are intended to be raised by user-written functions: ```ocaml exception Exit exception Not_found @@ -206,8 +206,8 @@ case for stack overflows, which aren't always detected. ### Bypassing Type-Safety -OCaml provides a mean to bypass it's own type safety. Don't use it. Here is how -to shoot in its own feet: +OCaml provides a mean to bypass its own type safety. Don't use it. Here is how +to shoot in one's own feet: ```shell > echo "(Obj.magic () : int array).(0)" > foo.ml @@ -246,7 +246,7 @@ The `Option` module provides the first alternative to exceptions. The `'a option` datatype allows to express either the availability of data for instance `Some 42` or the absence of data using `None`, which can represent an error. -Using `Option` it is possible to write function that return `None` instead of +Using `Option` it is possible to write functions that return `None` instead of throwing an exception. ```ocaml let div_opt m n = @@ -289,7 +289,7 @@ not present, etc.) to return a more explicit type such as `'a option` or `('a, There are two naming conventions to have two versions of the same partial function, one raising exception, the other returning an option. In the above -examples, the convention of the standard library is used add an `_opt` suffix to +examples, the convention of the standard library is to add an `_opt` suffix to name of the version of the function which returns an option instead of raising exceptions. ```ocaml @@ -299,7 +299,7 @@ val find_opt: ('a -> bool) -> 'a list -> 'a option ``` This is extracted from the `List` module of the standard library. -However, some project tend to avoid or reduce the usage of exceptions. In such a +However, some projects tend to avoid or reduce the usage of exceptions. In such a context, reversing the convention is a relatively common idiom. It is the version of the function which raises exceptions that is suffixed with `_exn`. Using the same functions, that would be the specification @@ -429,9 +429,9 @@ let map f = function If `f` can be applied to something, its result is rewrapped into a fresh option. If there isn't anything to supply to `f`, `None` is forwarded. -If we don't take arguments order into account, `Option.bind` is almost exacly +If we don't take arguments order into account, `Option.bind` is almost exactly the same, except we assume `f` returns an option, therefore there is no need to -rewrapped its result, it's already an option value: +rewrap its result, since it's already an option value: ```ocaml let bind opt f = match opt with | Some x -> f x @@ -466,7 +466,7 @@ robustness. A couple of observations: - The same local names are used, with the same types * There isn't any indentation or pattern-matching left * Line 1: - - right-hand side of `=` : `Option.map` allows adding 1 to the result of `String.index_opt`, if it didn't failed + - right-hand side of `=` : `Option.map` allows adding 1 to the result of `String.index_opt`, if it didn't fail - left-hand side of `=` : the `let*` syntax turns all the rest of the code (from line 2 to the end) into the body of an anonymous function which takes `fqdn_pos` as parameter, and the function `( let* )` is called with the @@ -474,18 +474,18 @@ robustness. A couple of observations: second parameter). * Lines 2 and 3: same as in the original * Line 4: `try` or `match` is removed -* Line 5: `String.sub` is applied, if the previous step didn't failed, otherwise the error is forwarded +* Line 5: `String.sub` is applied, if the previous step didn't fail, otherwise the error is forwarded * Line 6: if nothing was found earlier, and if isn't empty, `fqdn` is returned as a fallback When used to handle errors with catch statements, it requires some time to get -used the latter style. The key idea is avoiding or defering from directly +used the latter style. The key idea is avoiding or deferring from directly looking into option values, instead pass them along using _ad-hoc_ pipes (such as `map` and `bind`). Erik Meijer calls that following the happy path. Visually, it also looks like to the “early return“ pattern often found in C. -One of the limitation of the option type is it doesn't record the reason which -prevented from having a value. `None` is silent, it doesn't say anything about -what went wrong. For this reason, function returning option values should +One of the limitations of the option type is it doesn't record the reason which +prevented having a return value. `None` is silent, it doesn't say anything about +what went wrong. For this reason, functions returning option values should document the circumstances under which it may return `None`. Such a documentation is likely to ressemble to the one required for exceptions using `@raise`. The `Result` type is intended to fill this gap: manage error in data, @@ -503,17 +503,17 @@ type ('a, 'b) result = ``` A value `Ok x` means that the computation succeeded and produced `x`, and a -value `Error e` means that it failed and `e` represents whatever information -collected in the process. Pattern matching can be used to deal with both cases, +value `Error e` means that it failed and `e` represents whatever error information +has been collected in the process. Pattern matching can be used to deal with both cases, as with any other sum type. However using `map` and `bind` can be more -convinient, maybe even more as it was with `Option`. +convenient, maybe even more as it was with `Option`. Before taking a look at `Result.map`, let's think about `List.map` and -`Option.map` under a changed perspective. Both functions behaves like the +`Option.map` under a changed perspective. Both functions behave as identity when applied to `[]` or `None`, respectively. That's the only possibility since those parameters don't carry any data. Which isn't the case in `Result` with its `Error` constructor. Nethertheless, `Result.map` is -implemented likewise, on `Error`, it also behaves like the identity. +implemented likewise, on `Error`, it also behaves like identity. ```ocaml let map f = function | Ok x -> Ok (f x) @@ -558,7 +558,7 @@ string wrapped in an option, if anything goes wrong `None` is returned. - Finally, `let*` stands for `Result.bind` Since functions from the `Yaml` module both returns `result` data, it is easier -to write a pipe which process that type all along. That's why `Option.to_result` +to write a pipe which processes that type all along. That's why `Option.to_result` needs to be used. Stages which produce `result` must be chained using `bind`, stages which do not must be chained using some map function, in order for the result to be wrapped back into a `result`. @@ -604,8 +604,8 @@ it's always possible to replace one by the other, like in a completely neutral refactoring. Values of type `result` and `Either.t` can be translated back and forth, and appling both translations one after the other, in any order, returns to the starting value. Nethertheless, this doesn't mean `result` should be used -in place of `Either.t`, or vise versa. Naming things matters, as punned by Phil -Karlton famous quote: +in place of `Either.t`, or vice versa. Naming things matters, as punned by Phil +Karlton's famous quote: > There are only two hard things in Computer Science: cache invalidation and > naming things. @@ -618,12 +618,12 @@ tools, data and functions can help. Use them. When `Option.bind` or `Result.bind` are used, they are often aliased into a custom binding operator, such as `let*`. However, it is also possible to use it as binary operator, which is almost always writen `>>=`. Using `bind` this -way must be detailed because it is extremly popular in other functional -programming language, and specially in OCaml's arch-rival _Which Must Not Be +way must be detailed because it is extremely popular in other functional +programming languages, and specially in OCaml's arch-rival _Which Must Not Be Named_. -Assuming `a` and `b` are valid OCaml expressions, the three pieces of sources -code are exactly the same: +Assuming `a` and `b` are valid OCaml expressions, the following three pieces of sources +code are functionally identical: ```ocaml bind a (fun x -> b) @@ -650,7 +650,7 @@ c a >>= fun x -> b >>= fun y -> c ``` Variables `x` and `y` may appear in `c` in the three cases. The first form isn't -very convinent, it uses a lot of parenthesis. The second one is often the +very convenient, it uses a lot of parenthesis. The second one is often the prefered one due to its ressemblance with regular local definitions. The third one is harder to read, `>>=` associates to the right in order to avoid parenthesis in that precise case, but it's easy to get lost. Nethertheless, it @@ -665,7 +665,7 @@ let* x = a in let* y = f x in g y ``` -Writing `x >>= f` is very close to what is found in functionally tained +Writing `x >>= f` is very close to what is found in functionally tainted programming languages which have methods and receivers such as Kotlin, Scala, Go, Rust, Swift or even modern Java, where it would be looking like: `x.bind(f)`. @@ -685,7 +685,7 @@ Programming](https://en.wikipedia.org/wiki/Tacit_programming). Thanks to the associativity priorities of the >>= and |> operators, no parenthesis are needed. OCaml has a strict typing discipline, not a strict styling discipline, therefore -picking the right style is an author's decision. See the [OCaml Programming +picking the right style is left to the author's decision. See the [OCaml Programming Guidelines](/docs/guidelines) for more details on those matters. ## Assertions @@ -709,7 +709,7 @@ used to indicate [dead code](https://en.wikipedia.org/wiki/Dead_code), parts of the program that must be writen (often for type-checking or pattern matching completeness) but are unreachable at run time. -Asserts should be undetstood as executable comments. There aren't supposed to +Asserts should be understood as executable comments. There aren't supposed to fail, unless during debugging or truely extraordinary circumstances absolutely preventing the execution from making any kind of progress. @@ -739,12 +739,12 @@ is catastrophic. # Concluding Remarks Properly handling errors is a complex matter. It is [cross-cutting -concern](https://en.wikipedia.org/wiki/Cross-cutting_concern), it touches all -parts of an application and can't be isolated in dedicated module. In contrast -to several other main stream languages, OCaml provides several mechanisms to -handled exceptional circumstances, all with good runtime performances and code +concern](https://en.wikipedia.org/wiki/Cross-cutting_concern), touches all +parts of an application and can't be isolated in a dedicated module. In contrast +to several other mainstream languages, OCaml provides several mechanisms to +handle exceptional circumstances, all with good runtime performance and code understandability. Using them properly requires some initial learning and -partice. Later, it always require some thinking, which is good since proper +practice. Later, it always requires some thinking, which is good since proper management of errors shouldn't ever be overlooked. No error handling is better -than the others, and is should be matter of adequacy to the context rather some +than the others, and is should be matter of adequacy to the context rather than of taste. But opiniated OCaml code is also fine, so it's a balance. From 779458532e16211cdf7bcefe8664f659dc45372f Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Wed, 1 Mar 2023 18:14:33 +0100 Subject: [PATCH 04/43] Minor fixes --- data/tutorials/lg_08_error_handling.md | 146 ++++++++++++++----------- 1 file changed, 81 insertions(+), 65 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index ecead21445..266a82de21 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -38,14 +38,15 @@ Historically, the first way of handling errors in OCaml is exceptions. The standard library relies heavily upon them. The biggest issue with exceptions is that they do not appear in types. One has -to read the documentation to see that, indeed, `List.find` or `String.sub` are not -total functions, and that they might fail by raising an exception. +to read the documentation to see that, indeed, `List.find` or `String.sub` are +not total functions, and that they might fail by raising an exception. However, exceptions have the great merit of being compiled into efficient machine code. When implementing trial and error approaches likely to back-track often, exceptions can be used to acheive good performance. -Exceptions belong to the type `exn` which is an [extensible sum type](/releases/latest/manual/extensiblevariants.html). +Exceptions belong to the type `exn` which is an [extensible sum +type](/releases/latest/manual/extensiblevariants.html). ```ocaml # exception Foo of string;; @@ -100,12 +101,16 @@ exception Invalid_argument of string exception Failure of string ``` -* `Exit` terminates your program with a success status, which is 0 in Unices (they do error values) -* `Not_found` should be raised when searching failed because there isn't anything satisfactory to be found +* `Exit` terminates your program with a success status, which is 0 in Unices + (where success is 0 and any other value is an error, that is, errors are + handled as special values, like mentionned in the first section) +* `Not_found` should be raised when searching failed because there isn't + anything satisfactory to be found * `Invalid_argument` should be raised when a parameter can't be accepted * `Failure` should be raised when a result can't be produced -Functions are provided to raise `Invalid_argument` and `Failure` using a string parameter: +Functions are provided to raise `Invalid_argument` and `Failure` using a string +parameter: ```ocaml val invalid_arg : string -> 'a (** @raise Invalid_argument *) @@ -140,15 +145,15 @@ val foo : a -> b ### Stack traces To get a stack trace when an unhandled exception makes your program crash, you -need to compile the program in "debug" mode (with `-g` when calling -`ocamlc`, or `-tag 'debug'` when calling `ocamlbuild`). -Then: +need to compile the program in "debug" mode (with `-g` when calling `ocamlc`, or +`-tag 'debug'` when calling `ocamlbuild`). Then: ``` OCAMLRUNPARAM=b ./myprogram [args] ``` -And you will get a stack trace. Alternatively, you can call, from within the program, +And you will get a stack trace. Alternatively, you can call, from within the +program, ```ocaml let () = Printexc.record_backtrace true @@ -156,10 +161,10 @@ let () = Printexc.record_backtrace true ### Printing -To print an exception, the module `Printexc` comes in handy. For instance, -the following function `notify_user : (unit -> 'a) -> 'a` can be used -to call a function and, if it fails, print the exception on `stderr`. -If stack traces are enabled, this function will also display it. +To print an exception, the module `Printexc` comes in handy. For instance, the +following function `notify_user : (unit -> 'a) -> 'a` can be used to call a +function and, if it fails, print the exception on `stderr`. If stack traces are +enabled, this function will also display it. ```ocaml let notify_user f = @@ -170,8 +175,8 @@ let notify_user f = raise e ``` -OCaml knows how to print its built-in exception, but you can also tell it -how to print your own exceptions: +OCaml knows how to print its built-in exception, but you can also tell it how to +print your own exceptions: ```ocaml exception Foo of int @@ -184,9 +189,9 @@ let () = ) ``` -Each printer should take care of the exceptions it knows about, returning -`Some `, and return `None` otherwise (let the other printers -do the job!). +Each printer should take care of the exceptions it knows about, returning `Some +`, and return `None` otherwise (let the other printers do the +job!). ## Runtime Crashes @@ -200,9 +205,12 @@ raising meaningful exceptions. However, some error conditions may remain undetected, which will result in a segmentation fault. This is the specially the case for stack overflows, which aren't always detected. -> But catching stack overflows is tricky, both in Unix-like systems and under Windows, so the current implementation in OCaml is a best effort that is occasionally buggy. +> But catching stack overflows is tricky, both in Unix-like systems and under +> Windows, so the current implementation in OCaml is a best effort that is +> occasionally buggy. -[Xavier Leroy, October 2021](https://discuss.ocaml.org/t/stack-overflow-reported-as-segfault/8646/8?u=cuihtlauac) +[Xavier Leroy, October +2021](https://discuss.ocaml.org/t/stack-overflow-reported-as-segfault/8646/8) ### Bypassing Type-Safety @@ -225,8 +233,10 @@ When a crash isn't coming from: It may be a language bug. It happens. Here is what to do when this is suspected: 1. Make sure the crash affects both compilers: bytecode and native -1. Write a self-contained and minimal proof-of-concept code which does nothing but triggering the crash -1. File an issue in the [OCaml Bug Tracker in GitHub](https://github.com/ocaml/ocaml/issues) +1. Write a self-contained and minimal proof-of-concept code which does nothing + but triggering the crash +1. File an issue in the [OCaml Bug Tracker in + GitHub](https://github.com/ocaml/ocaml/issues) Here is an example of such a bug: https://github.com/ocaml/ocaml/issues/7241 @@ -299,8 +309,8 @@ val find_opt: ('a -> bool) -> 'a list -> 'a option ``` This is extracted from the `List` module of the standard library. -However, some projects tend to avoid or reduce the usage of exceptions. In such a -context, reversing the convention is a relatively common idiom. It is the +However, some projects tend to avoid or reduce the usage of exceptions. In such +a context, reversing the convention is a relatively common idiom. It is the version of the function which raises exceptions that is suffixed with `_exn`. Using the same functions, that would be the specification ```ocaml @@ -354,11 +364,14 @@ match opt with | None -> ... (* Something *) | Some x -> ... (* Something else *) ``` -However, sequencing such expressions leads to deep nesting which is often considered bad: +However, sequencing such expressions leads to deep nesting which is often +considered bad: -> if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program. +> if you need more than 3 levels of indentation, you're screwed anyway, and +> should fix your program. -[Linux Kernel Style Guide](https://www.kernel.org/doc/Documentation/process/coding-style.rst) +[Linux Kernel Style +Guide](https://www.kernel.org/doc/Documentation/process/coding-style.rst) The recomended way to avoid that is to refrain from or delay attempting to access the content of an option value. @@ -466,16 +479,19 @@ robustness. A couple of observations: - The same local names are used, with the same types * There isn't any indentation or pattern-matching left * Line 1: - - right-hand side of `=` : `Option.map` allows adding 1 to the result of `String.index_opt`, if it didn't fail - - left-hand side of `=` : the `let*` syntax turns all the rest of the - code (from line 2 to the end) into the body of an anonymous function which - takes `fqdn_pos` as parameter, and the function `( let* )` is called with the + - right-hand side of `=` : `Option.map` allows adding 1 to the result of + `String.index_opt`, if it didn't fail + - left-hand side of `=` : the `let*` syntax turns all the rest of the code + (from line 2 to the end) into the body of an anonymous function which takes + `fqdn_pos` as parameter, and the function `( let* )` is called with the right-hand side of `=` (as first parameter) and that anonymous function (as second parameter). * Lines 2 and 3: same as in the original * Line 4: `try` or `match` is removed -* Line 5: `String.sub` is applied, if the previous step didn't fail, otherwise the error is forwarded -* Line 6: if nothing was found earlier, and if isn't empty, `fqdn` is returned as a fallback +* Line 5: `String.sub` is applied, if the previous step didn't fail, otherwise + the error is forwarded +* Line 6: if nothing was found earlier, and if isn't empty, `fqdn` is returned + as a fallback When used to handle errors with catch statements, it requires some time to get used the latter style. The key idea is avoiding or deferring from directly @@ -503,17 +519,17 @@ type ('a, 'b) result = ``` A value `Ok x` means that the computation succeeded and produced `x`, and a -value `Error e` means that it failed and `e` represents whatever error information -has been collected in the process. Pattern matching can be used to deal with both cases, -as with any other sum type. However using `map` and `bind` can be more -convenient, maybe even more as it was with `Option`. +value `Error e` means that it failed and `e` represents whatever error +information has been collected in the process. Pattern matching can be used to +deal with both cases, as with any other sum type. However using `map` and `bind` +can be more convenient, maybe even more as it was with `Option`. Before taking a look at `Result.map`, let's think about `List.map` and -`Option.map` under a changed perspective. Both functions behave as -identity when applied to `[]` or `None`, respectively. That's the only -possibility since those parameters don't carry any data. Which isn't the case in -`Result` with its `Error` constructor. Nethertheless, `Result.map` is -implemented likewise, on `Error`, it also behaves like identity. +`Option.map` under a changed perspective. Both functions behave as identity when +applied to `[]` or `None`, respectively. That's the only possibility since those +parameters don't carry any data. Which isn't the case in `Result` with its +`Error` constructor. Nethertheless, `Result.map` is implemented likewise, on +`Error`, it also behaves like identity. ```ocaml let map f = function | Ok x -> Ok (f x) @@ -549,19 +565,19 @@ val Yaml.Util.find : string -> Yaml.value -> (Yaml.value option, [`Msg of string val Option.to_result : none:'e -> 'a option -> ('a, 'e) result ``` -- `File.read_opt` is supposed to open a file, read its contents and return it as a -string wrapped in an option, if anything goes wrong `None` is returned. +- `File.read_opt` is supposed to open a file, read its contents and return it as +a string wrapped in an option, if anything goes wrong `None` is returned. - `Yaml.of_string` parses a string an turns into an ad-hoc OCaml type -- `Yaml.find` recursively searches a key in a Yaml tree, if found, it returns the - corresponding data, wrapped in an option +- `Yaml.find` recursively searches a key in a Yaml tree, if found, it returns + the corresponding data, wrapped in an option - `Option.to_result` perform conversion of an `option` into a `result` - Finally, `let*` stands for `Result.bind` Since functions from the `Yaml` module both returns `result` data, it is easier -to write a pipe which processes that type all along. That's why `Option.to_result` -needs to be used. Stages which produce `result` must be chained using `bind`, -stages which do not must be chained using some map function, in order for the -result to be wrapped back into a `result`. +to write a pipe which processes that type all along. That's why +`Option.to_result` needs to be used. Stages which produce `result` must be +chained using `bind`, stages which do not must be chained using some map +function, in order for the result to be wrapped back into a `result`. The map functions of the `Result` module allows processing of data or errors, but the routines used must not fail, as `Result.map` will never turn an `Ok` @@ -616,14 +632,13 @@ tools, data and functions can help. Use them. ## `bind` as a Binary Operator When `Option.bind` or `Result.bind` are used, they are often aliased into a -custom binding operator, such as `let*`. However, it is also possible to use -it as binary operator, which is almost always writen `>>=`. Using `bind` this -way must be detailed because it is extremely popular in other functional -programming languages, and specially in OCaml's arch-rival _Which Must Not Be -Named_. +custom binding operator, such as `let*`. However, it is also possible to use it +as binary operator, which is almost always writen `>>=`. Using `bind` this way +must be detailed because it is extremely popular in other functional programming +languages, and specially in OCaml's arch-rival _Which Must Not Be Named_. -Assuming `a` and `b` are valid OCaml expressions, the following three pieces of sources -code are functionally identical: +Assuming `a` and `b` are valid OCaml expressions, the following three pieces of +sources code are functionally identical: ```ocaml bind a (fun x -> b) @@ -667,7 +682,8 @@ g y ``` Writing `x >>= f` is very close to what is found in functionally tainted programming languages which have methods and receivers such as Kotlin, Scala, -Go, Rust, Swift or even modern Java, where it would be looking like: `x.bind(f)`. +Go, Rust, Swift or even modern Java, where it would be looking like: +`x.bind(f)`. Here is the same code as presented at the end of the previous section, rewritten using `Result.bind` as a binary opeator: @@ -685,8 +701,8 @@ Programming](https://en.wikipedia.org/wiki/Tacit_programming). Thanks to the associativity priorities of the >>= and |> operators, no parenthesis are needed. OCaml has a strict typing discipline, not a strict styling discipline, therefore -picking the right style is left to the author's decision. See the [OCaml Programming -Guidelines](/docs/guidelines) for more details on those matters. +picking the right style is left to the author's decision. See the [OCaml +Programming Guidelines](/docs/guidelines) for more details on those matters. ## Assertions @@ -739,10 +755,10 @@ is catastrophic. # Concluding Remarks Properly handling errors is a complex matter. It is [cross-cutting -concern](https://en.wikipedia.org/wiki/Cross-cutting_concern), touches all -parts of an application and can't be isolated in a dedicated module. In contrast -to several other mainstream languages, OCaml provides several mechanisms to -handle exceptional circumstances, all with good runtime performance and code +concern](https://en.wikipedia.org/wiki/Cross-cutting_concern), touches all parts +of an application and can't be isolated in a dedicated module. In contrast to +several other mainstream languages, OCaml provides several mechanisms to handle +exceptional circumstances, all with good runtime performance and code understandability. Using them properly requires some initial learning and practice. Later, it always requires some thinking, which is good since proper management of errors shouldn't ever be overlooked. No error handling is better From c546b8f87c580ab3816c433abeac04a6ee65c851 Mon Sep 17 00:00:00 2001 From: Cuihtlauac Alvarado Date: Thu, 2 Mar 2023 08:15:52 +0100 Subject: [PATCH 05/43] Apply suggestions from @christinerose Co-authored-by: Christine Rose --- data/tutorials/lg_08_error_handling.md | 88 +++++++++++++------------- 1 file changed, 45 insertions(+), 43 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 266a82de21..6cf84d3269 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -25,7 +25,9 @@ OCaml has three major ways to deal with errors: 1. `Option` values 1. `Result` values -Use them. Do not encode errors inside data. Exceptions provide a mean to deal +Use them. Do not encode errors inside data. + +Exceptions provide a mean to deal with errors at the control flow level while `Option` and `Result` provide a mean to turn errors into dedicated data. @@ -92,7 +94,7 @@ Although the last one doesn't look as an exception, it actually is. - : int list = [] ``` -Among them the predefined exceptions of the standard library, the following ones +Among them, the predefined exceptions of the standard library. The following ones are intended to be raised by user-written functions: ```ocaml exception Exit @@ -123,8 +125,8 @@ exceptions, a design decision must be made: * Use the prexisting exceptions * Raise custom exceptions -Both can make sense, there isn't a general rule. If the exceptions of the -standard library are used, they must be raised under the conditions they are +Both can make sense, and there isn't a general rule. If the standard library exceptions +are used, they must be raised under the conditions they are intended to, otherwise handlers will have trouble processing them. Using custom exceptions will force client code to include dedicated catch conditions. This can be desirable for errors that must be handled at the the client level. @@ -142,7 +144,7 @@ val foo : a -> b *) ``` -### Stack traces +### Stack Traces To get a stack trace when an unhandled exception makes your program crash, you need to compile the program in "debug" mode (with `-g` when calling `ocamlc`, or @@ -198,9 +200,9 @@ job!). Although OCaml is a very safe language, it is possible to trigger unrecoverable errors at runtime. -### Exceptions not Raised +### Exceptions Not Raised -Under panic circumstances, the native code compiler does a best-effort at +Under panic circumstances, the native code compiler gives its best effort for raising meaningful exceptions. However, some error conditions may remain undetected, which will result in a segmentation fault. This is the specially the case for stack overflows, which aren't always detected. @@ -298,9 +300,9 @@ not present, etc.) to return a more explicit type such as `'a option` or `('a, ### Naming Conventions There are two naming conventions to have two versions of the same partial -function, one raising exception, the other returning an option. In the above +function: one raising exception, the other returning an option. In the above examples, the convention of the standard library is to add an `_opt` suffix to -name of the version of the function which returns an option instead of raising +name of the version of the function that returns an option instead of raising exceptions. ```ocaml val find: ('a -> bool) -> 'a list -> 'a @@ -311,7 +313,7 @@ This is extracted from the `List` module of the standard library. However, some projects tend to avoid or reduce the usage of exceptions. In such a context, reversing the convention is a relatively common idiom. It is the -version of the function which raises exceptions that is suffixed with `_exn`. +version of the function that raises exceptions that is suffixed with `_exn`. Using the same functions, that would be the specification ```ocaml val find_exn: ('a -> bool) -> 'a list -> 'a @@ -381,9 +383,9 @@ access the content of an option value. Let's start with an example. Let's imagine one needs to write a function returning the [hostname](https://en.wikipedia.org/wiki/Hostname) part of an email address. For instance, given the email -"gaston.lagaffe@courrier.dupuis.be", it would return "courrier". +"gaston.lagaffe@courrier.dupuis.be," it would return "courrier." -Here is a questionable but straitforward implementation, using exceptions: +Here is a questionable but straightforward implementation using exceptions: ```ocaml let host email = let fqdn_pos = String.index email '@' + 1 in @@ -397,16 +399,16 @@ let host email = ``` This may fail by raising `Not_found` if the first the call to `String.index` does, which make sense since if there is no `@` character in the input string, -it's not an email address. However, if the second call to `String.index` fails, +signifying that it's not an email address. However, if the second call to `String.index` fails, meaning no dot character was found, we may return the whole fully qualified domain name (FQDN) as a fallback, but only if it isn't the empty string. Note that `String.sub` may throw `Invalid_argument`. Fortunately, this can't happen. In the worst case, the `@` character is the last one, then `fqdn_pos` is -off range by one but `fqdn_len` is null and that combination of parameters +off range by one but `fqdn_len` is null, and that combination of parameters doesn't count as an invalid substring. -Here the equivalent function, using the same logic but `Option` instead of +Below is the equivalent function using the same logic, but using `Option` instead of exceptions: ```ocaml let host_opt email = @@ -443,7 +445,7 @@ If `f` can be applied to something, its result is rewrapped into a fresh option. If there isn't anything to supply to `f`, `None` is forwarded. If we don't take arguments order into account, `Option.bind` is almost exactly -the same, except we assume `f` returns an option, therefore there is no need to +the same, except we assume `f` returns an option. Therefore, there is no need to rewrap its result, since it's already an option value: ```ocaml let bind opt f = match opt with @@ -471,13 +473,13 @@ val host_opt : string -> string option = ``` This version was picked to illustrate how to use and combine operations on -options allowing to acheive some balance between understandability and +options allowing users to achieve some balance between understandability and robustness. A couple of observations: * As in the original `host` function (with exceptions): - - The calls to `String` functions (`index_opt`, `length` and `sub`) are the + - The calls to `String` functions (`index_opt`, `length`, and `sub`) are the same and in the same order - - The same local names are used, with the same types -* There isn't any indentation or pattern-matching left + - The same local names are used with the same types +* There isn't any remaining indentation or pattern-matching * Line 1: - right-hand side of `=` : `Option.map` allows adding 1 to the result of `String.index_opt`, if it didn't fail @@ -495,9 +497,9 @@ robustness. A couple of observations: When used to handle errors with catch statements, it requires some time to get used the latter style. The key idea is avoiding or deferring from directly -looking into option values, instead pass them along using _ad-hoc_ pipes (such +looking into option values. Instead, pass them along using _ad-hoc_ pipes (such as `map` and `bind`). Erik Meijer calls that following the happy path. Visually, -it also looks like to the “early return“ pattern often found in C. +it also resembles the “early return“ pattern often found in C. One of the limitations of the option type is it doesn't record the reason which prevented having a return value. `None` is silent, it doesn't say anything about @@ -505,7 +507,7 @@ what went wrong. For this reason, functions returning option values should document the circumstances under which it may return `None`. Such a documentation is likely to ressemble to the one required for exceptions using `@raise`. The `Result` type is intended to fill this gap: manage error in data, -like option values but also provide information on errors, like exceptions. It +like option values, but also provide information on errors, like exceptions. It is the topic of the next section. ## `Result` Type @@ -570,8 +572,8 @@ a string wrapped in an option, if anything goes wrong `None` is returned. - `Yaml.of_string` parses a string an turns into an ad-hoc OCaml type - `Yaml.find` recursively searches a key in a Yaml tree, if found, it returns the corresponding data, wrapped in an option -- `Option.to_result` perform conversion of an `option` into a `result` -- Finally, `let*` stands for `Result.bind` +- `Option.to_result` performs conversion of an `option` into a `result`. +- Finally, `let*` stands for `Result.bind`. Since functions from the `Yaml` module both returns `result` data, it is easier to write a pipe which processes that type all along. That's why @@ -594,7 +596,7 @@ We would have: - `Result.map_error f (Error e) = Ok y` - `Result.map_error f (Error e) = Error e'` -Which means an error would be turned back into valid data, or changed into +This means an error would be turned back into valid data or changed into another error. This is almost like recovering from an error. However, when recovery fails, it may be preferable to preserve the initial cause of failure. That behaviour can be acheived by defining the following function: @@ -606,12 +608,12 @@ val recover : ('e -> 'a option) -> ('a, 'e) result -> ('a, 'e) result = Although any kind of data can be wrapped as a `result` `Error`, it is recommended to use that constructor to carry actual errors, for instance: -- `exn`, in which case the result type just makes exceptions explicit. -- `string`, where the error case is a message that indicates what failed. +- `exn`, in which case the result type just makes exceptions explicit +- `string`, where the error case is a message that indicates what failed - `string Lazy.t`, a more elaborate form of error message that is only evaluated - if printing is required. + if printing is required - some polymorphic variant, with one case per possible error. This is very - accurate (each error can be dealt with explicitly and occurs in the type) but + accurate (each error can be dealt with explicitly and occurs in the type), but the use of polymorphic variants sometimes make the code harder to read. Note that some say the types `result` and `Either.t` are @@ -627,7 +629,7 @@ Karlton's famous quote: > naming things. Properly handling errors always makes the code harder to read. Using the right -tools, data and functions can help. Use them. +tools, data, and functions can help. Use them. ## `bind` as a Binary Operator @@ -667,7 +669,7 @@ a >>= fun x -> b >>= fun y -> c Variables `x` and `y` may appear in `c` in the three cases. The first form isn't very convenient, it uses a lot of parenthesis. The second one is often the prefered one due to its ressemblance with regular local definitions. The third -one is harder to read, `>>=` associates to the right in order to avoid +one is harder to read, as `>>=` associates to the right in order to avoid parenthesis in that precise case, but it's easy to get lost. Nethertheless, it has some appeal when named functions are used. It looks a bit like good old Unix pipes: @@ -681,7 +683,7 @@ let* y = f x in g y ``` Writing `x >>= f` is very close to what is found in functionally tainted -programming languages which have methods and receivers such as Kotlin, Scala, +programming languages, which have methods and receivers such as Kotlin, Scala, Go, Rust, Swift or even modern Java, where it would be looking like: `x.bind(f)`. @@ -698,9 +700,9 @@ File.read_opt path By the way, this style is called [Tacit Programming](https://en.wikipedia.org/wiki/Tacit_programming). Thanks to the -associativity priorities of the >>= and |> operators, no parenthesis are needed. +associativity priorities of the `>>=` and `|>` operators, no parenthesis are needed. -OCaml has a strict typing discipline, not a strict styling discipline, therefore +OCaml has a strict typing discipline, not a strict styling discipline; therefore, picking the right style is left to the author's decision. See the [OCaml Programming Guidelines](/docs/guidelines) for more details on those matters. @@ -709,8 +711,8 @@ Programming Guidelines](/docs/guidelines) for more details on those matters. The built-in `assert` instruction takes an expression as an argument and throws the `Assert_failure` exception if the provided expression evaluates to `false`. Assuming that you don't catch this exception (it's probably unwise to catch this -exception, particularly for beginners), this results in the program stopping and -printing out the source file and line number where the error occurred. An +exception, particularly for beginners), this causes the program to stop and +print the source file and line number where the error occurred. An example: ```ocaml @@ -726,8 +728,8 @@ the program that must be writen (often for type-checking or pattern matching completeness) but are unreachable at run time. Asserts should be understood as executable comments. There aren't supposed to -fail, unless during debugging or truely extraordinary circumstances absolutely -preventing the execution from making any kind of progress. +fail, unless during debugging or truly extraordinary circumstances that absolutely +prevent the execution from making any kind of progress. When the execution reaches conditions which can't be handled, the right thing to do is to throw a `Failure`, using `failwith "error message"`. Assertions aren't @@ -742,14 +744,14 @@ match Sys.os_type with | _ -> failwith "this system is not supported" ``` -It is right to use `failwith`, using `assert` would be wrong. Here is the dual +It is right to use `failwith` because using `assert` would be incorrect. Here is the dual example: ```ocaml function x when true -> () | _ -> assert false ``` -Here, it would be wrong to use `failwith` since it requires the compiler to be +Here, it wouldn't be beneficial to use `failwith` since it requires the compiler to be bugged or the system to be corrupted for second code path to be executed. -Breakage of the language semantics qualifies as extraordinary circumstances, it +Breakage of the language semantics qualifies as extraordinary circumstances; it is catastrophic. # Concluding Remarks @@ -761,6 +763,6 @@ several other mainstream languages, OCaml provides several mechanisms to handle exceptional circumstances, all with good runtime performance and code understandability. Using them properly requires some initial learning and practice. Later, it always requires some thinking, which is good since proper -management of errors shouldn't ever be overlooked. No error handling is better +error management shouldn't ever be overlooked. No error handling is better than the others, and is should be matter of adequacy to the context rather than of taste. But opiniated OCaml code is also fine, so it's a balance. From d6586fd62430c9b9a42549499f4e777c10d38d26 Mon Sep 17 00:00:00 2001 From: Cuihtlauac Alvarado Date: Thu, 2 Mar 2023 08:43:42 +0100 Subject: [PATCH 06/43] Apply suggestions from code review Co-authored-by: Miod Vallat <118974489+dustanddreams@users.noreply.github.com> --- data/tutorials/lg_08_error_handling.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 6cf84d3269..9bf89157c4 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -105,7 +105,7 @@ exception Failure of string * `Exit` terminates your program with a success status, which is 0 in Unices (where success is 0 and any other value is an error, that is, errors are - handled as special values, like mentionned in the first section) + handled as special values, like mentioned in the first section) * `Not_found` should be raised when searching failed because there isn't anything satisfactory to be found * `Invalid_argument` should be raised when a parameter can't be accepted @@ -177,7 +177,7 @@ let notify_user f = raise e ``` -OCaml knows how to print its built-in exception, but you can also tell it how to +OCaml knows how to print its built-in exceptions, but you can also tell it how to print your own exceptions: ```ocaml From a241df1fe87d09108c3a96a05a37cec568641a48 Mon Sep 17 00:00:00 2001 From: Cuihtlauac Alvarado Date: Thu, 2 Mar 2023 16:15:22 +0100 Subject: [PATCH 07/43] Apply suggestions from code review Co-authored-by: Riku Silvola <1518025+rikusilvola@users.noreply.github.com> --- data/tutorials/lg_08_error_handling.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 9bf89157c4..c0a1540e31 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -216,8 +216,8 @@ case for stack overflows, which aren't always detected. ### Bypassing Type-Safety -OCaml provides a mean to bypass its own type safety. Don't use it. Here is how -to shoot in one's own feet: +OCaml provides means to bypass its type safety. Don't use it. Here is how +to shoot oneself in the foot: ```shell > echo "(Obj.magic () : int array).(0)" > foo.ml From 2de9cb2991618cd8fed2581892df259f102b43a3 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Tue, 7 Mar 2023 14:54:21 +0100 Subject: [PATCH 08/43] Take more feedback into account --- data/tutorials/lg_08_error_handling.md | 48 +++++++++++++++++--------- 1 file changed, 32 insertions(+), 16 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index c0a1540e31..3e40670b4f 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -54,8 +54,8 @@ type](/releases/latest/manual/extensiblevariants.html). # exception Foo of string;; exception Foo of string -# let i_will_fail () = - raise (Foo "Oh no!");; +# let i_will_fail () = raise (Foo "Oh no!");; +val i_will_fail : unit -> 'a = # i_will_fail ();; Exception: Foo "Oh no!". @@ -82,7 +82,7 @@ Exception: Division_by_zero. Exception: Not_found. # String.sub "Hello world!" 3 (-2);; Exception: Invalid_argument "String.sub / Bytes.sub". -# let rec loop x = x :: loop x +# let rec loop x = x :: loop x;; val loop : 'a -> 'a list = # loop 42;; Stack overflow during evaluation (looping recursion?). @@ -111,8 +111,8 @@ exception Failure of string * `Invalid_argument` should be raised when a parameter can't be accepted * `Failure` should be raised when a result can't be produced -Functions are provided to raise `Invalid_argument` and `Failure` using a string -parameter: +Functions are provided by the standard libarry to raise `Invalid_argument` and +`Failure` using a string parameter: ```ocaml val invalid_arg : string -> 'a (** @raise Invalid_argument *) @@ -150,7 +150,7 @@ To get a stack trace when an unhandled exception makes your program crash, you need to compile the program in "debug" mode (with `-g` when calling `ocamlc`, or `-tag 'debug'` when calling `ocamlbuild`). Then: -``` +```shell OCAMLRUNPARAM=b ./myprogram [args] ``` @@ -261,13 +261,15 @@ option` datatype allows to express either the availability of data for instance Using `Option` it is possible to write functions that return `None` instead of throwing an exception. ```ocaml -let div_opt m n = +# let div_opt m n = try Some (m / n) with - Division_by_zero -> None + Division_by_zero -> None;; +val div_opt : int -> int -> int option = -let find_opt p l = +# let find_opt p l = try Some (List.find p l) with - Not_found -> None + Not_found -> None;; +val find_opt : ('a -> bool) -> 'a list -> 'a option = ``` We can try those functions: @@ -275,7 +277,7 @@ We can try those functions: # 1 / 0;; Exception: Division_by_zero. # div_opt 42 2;; -- : int option = Some 24 +- : int option = Some 21 # div_opt 42 0;; - : int option = None # List.find (fun x -> x mod 2 = 0) [1; 3; 5];; @@ -387,7 +389,7 @@ email address. For instance, given the email Here is a questionable but straightforward implementation using exceptions: ```ocaml -let host email = +# let host email = let fqdn_pos = String.index email '@' + 1 in let fqdn_len = String.length email - fqdn_pos in let fqdn = String.sub email fqdn_pos fqdn_len in @@ -395,7 +397,8 @@ let host email = let host_len = String.index fqdn '.' in String.sub fqdn 0 host_len with Not_found -> - if fqdn <> "" then fqdn else raise Not_found + if fqdn <> "" then fqdn else raise Not_found;; +val host : string -> string = ``` This may fail by raising `Not_found` if the first the call to `String.index` does, which make sense since if there is no `@` character in the input string, @@ -411,7 +414,7 @@ doesn't count as an invalid substring. Below is the equivalent function using the same logic, but using `Option` instead of exceptions: ```ocaml -let host_opt email = +# let host_opt email = match String.index_opt email '@' with | Some at_pos -> begin let fqdn_pos = at_pos + 1 in @@ -421,7 +424,8 @@ let host_opt email = | Some host_len -> Some (String.sub fqdn 0 host_len) | None -> if fqdn <> "" then Some fqdn else None end - | None -> None + | None -> None;; +val host_opt : string -> string option = ``` Although it qualifies as safe, its legibility isn't improved. Some may even @@ -532,6 +536,12 @@ applied to `[]` or `None`, respectively. That's the only possibility since those parameters don't carry any data. Which isn't the case in `Result` with its `Error` constructor. Nethertheless, `Result.map` is implemented likewise, on `Error`, it also behaves like identity. + +Here is its type: +```ocaml +val map : ('a -> 'b) -> ('a, 'c) result -> ('b, 'c) result +``` +And here is how it is written: ```ocaml let map f = function | Ok x -> Ok (f x) @@ -539,7 +549,13 @@ let map f = function ``` The `Result` module has two map functions: the one we've just seen and another -one, with the same logic, applied to `Error`: +one, with the same logic, applied to `Error` + +Here is its type: +```ocaml +val map : ('c -> 'd) -> ('a, 'c) result -> ('a, 'd) result +``` +And here is how it is written: ```ocaml let map_error f = function | Ok x -> Ok x From a24943ada0e5dffe6fe19dbc601e1b7beff2572f Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Wed, 26 Apr 2023 11:15:47 +0200 Subject: [PATCH 09/43] Revisit Text After Sleeping on it For a While --- data/tutorials/lg_08_error_handling.md | 81 +++++++++++++------------- 1 file changed, 41 insertions(+), 40 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 3e40670b4f..301037b876 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -9,6 +9,8 @@ date: 2021-05-27T21:07:30-00:00 # Error Handling +In OCaml, errors can be handled in several ways. This document present most of the available means. However, handling errors using the effect handlers introduced in OCaml 5 isn't addressed yet. + ## Error as Special Values Don't do that. @@ -25,11 +27,10 @@ OCaml has three major ways to deal with errors: 1. `Option` values 1. `Result` values -Use them. Do not encode errors inside data. +Use them. Do not encode errors inside data. -Exceptions provide a mean to deal -with errors at the control flow level while `Option` and `Result` provide a mean -to turn errors into dedicated data. +Exceptions provide a mean to deal with errors at the control flow level while +`Option` and `Result` provide means to turn errors into dedicated data. The rest of this document presents and compares approaches towards error handling. @@ -40,8 +41,7 @@ Historically, the first way of handling errors in OCaml is exceptions. The standard library relies heavily upon them. The biggest issue with exceptions is that they do not appear in types. One has -to read the documentation to see that, indeed, `List.find` or `String.sub` are -not total functions, and that they might fail by raising an exception. +to read the documentation to see that, indeed, `List.find` or `String.sub` are functions that they might fail by raising an exception. However, exceptions have the great merit of being compiled into efficient machine code. When implementing trial and error approaches likely to back-track @@ -94,7 +94,7 @@ Although the last one doesn't look as an exception, it actually is. - : int list = [] ``` -Among them, the predefined exceptions of the standard library. The following ones +Among the predefined exceptions of the standard library, the following ones are intended to be raised by user-written functions: ```ocaml exception Exit @@ -111,7 +111,7 @@ exception Failure of string * `Invalid_argument` should be raised when a parameter can't be accepted * `Failure` should be raised when a result can't be produced -Functions are provided by the standard libarry to raise `Invalid_argument` and +Functions are provided by the standard library to raise `Invalid_argument` and `Failure` using a string parameter: ```ocaml val invalid_arg : string -> 'a @@ -147,7 +147,7 @@ val foo : a -> b ### Stack Traces To get a stack trace when an unhandled exception makes your program crash, you -need to compile the program in "debug" mode (with `-g` when calling `ocamlc`, or +need to compile the program in debug mode (with `-g` when calling `ocamlc`, or `-tag 'debug'` when calling `ocamlbuild`). Then: ```shell @@ -163,10 +163,10 @@ let () = Printexc.record_backtrace true ### Printing -To print an exception, the module `Printexc` comes in handy. For instance, the -following function `notify_user : (unit -> 'a) -> 'a` can be used to call a -function and, if it fails, print the exception on `stderr`. If stack traces are -enabled, this function will also display it. +To print an exception, the module `Printexc` comes in handy. For instance, it +allows the definition of a function such as `notify_user : (unit -> 'a) -> 'a` +that calls a function and, if it fails, print the exception on `stderr`. If +stack traces are enabled, this function will also display it. ```ocaml let notify_user f = @@ -193,7 +193,7 @@ let () = Each printer should take care of the exceptions it knows about, returning `Some `, and return `None` otherwise (let the other printers do the -job!). +job). ## Runtime Crashes @@ -259,7 +259,7 @@ option` datatype allows to express either the availability of data for instance `Some 42` or the absence of data using `None`, which can represent an error. Using `Option` it is possible to write functions that return `None` instead of -throwing an exception. +throwing an exception. Here are two examples of such functions: ```ocaml # let div_opt m n = try Some (m / n) with @@ -296,16 +296,16 @@ val try_opt : ('a -> 'b) -> 'a -> 'b option = It tends to be considered good practice nowadays when a function can fail in cases that are not bugs (i.e., not `assert false`, but network failures, keys -not present, etc.) to return a more explicit type such as `'a option` or `('a, -'b) result` (see next section). +not present, etc.) to return type such as `'a option` or `('a, 'b) result` (see +next section) rather than throwing an exception. ### Naming Conventions There are two naming conventions to have two versions of the same partial function: one raising exception, the other returning an option. In the above -examples, the convention of the standard library is to add an `_opt` suffix to -name of the version of the function that returns an option instead of raising -exceptions. +examples, the convention of the standard library is used: adding an `_opt` +suffix to name of the version of the function that returns an option instead of +raising exceptions. ```ocaml val find: ('a -> bool) -> 'a list -> 'a (** @raise Not_found *) @@ -322,6 +322,7 @@ val find_exn: ('a -> bool) -> 'a list -> 'a (** @raise Not_found *) val find: ('a -> bool) -> 'a list -> 'a option ``` + ### Composing Functions Returning Options The function `div_opt` can't raise exceptions. However, since it doesn't return @@ -347,7 +348,7 @@ val fold : none:'a -> some:('b -> 'a) -> 'b t -> 'a ``` `get` returns the content or raises `Invalid_argument` if applied to `None`. `value` essentially behaves as `get`, except it must be called with a default -value which will be returned of if applied to `None`. `fold` also needs to be +value which will be returned if applied to `None`. `fold` also needs to be passed a default value that is returned when called on `None`, but it also expects a function that will be applied to the content of the option, when not empty. @@ -378,14 +379,14 @@ considered bad: Guide](https://www.kernel.org/doc/Documentation/process/coding-style.rst) The recomended way to avoid that is to refrain from or delay attempting to -access the content of an option value. +access the content of an option value, as explained in the next sub section. ### Using on `Option.map` and `Option.bind` Let's start with an example. Let's imagine one needs to write a function returning the [hostname](https://en.wikipedia.org/wiki/Hostname) part of an -email address. For instance, given the email -"gaston.lagaffe@courrier.dupuis.be," it would return "courrier." +email address provided as a string. For instance, given the string +`"gaston.lagaffe@courrier.dupuis.be"` it would return the string `"courrier"` (one may have a point arguing against such a design, but this is only an example). Here is a questionable but straightforward implementation using exceptions: ```ocaml @@ -489,9 +490,8 @@ robustness. A couple of observations: `String.index_opt`, if it didn't fail - left-hand side of `=` : the `let*` syntax turns all the rest of the code (from line 2 to the end) into the body of an anonymous function which takes - `fqdn_pos` as parameter, and the function `( let* )` is called with the - right-hand side of `=` (as first parameter) and that anonymous function (as - second parameter). + `fqdn_pos` as parameter, and the function `( let* )` is called with `fqdn_pos` + and that anonymous function. * Lines 2 and 3: same as in the original * Line 4: `try` or `match` is removed * Line 5: `String.sub` is applied, if the previous step didn't fail, otherwise @@ -502,8 +502,8 @@ robustness. A couple of observations: When used to handle errors with catch statements, it requires some time to get used the latter style. The key idea is avoiding or deferring from directly looking into option values. Instead, pass them along using _ad-hoc_ pipes (such -as `map` and `bind`). Erik Meijer calls that following the happy path. Visually, -it also resembles the “early return“ pattern often found in C. +as `map` and `bind`). Erik Meijer call that style: “following the happy path”. +Visually, it also resembles the “early return“ pattern often found in C. One of the limitations of the option type is it doesn't record the reason which prevented having a return value. `None` is silent, it doesn't say anything about @@ -514,7 +514,7 @@ documentation is likely to ressemble to the one required for exceptions using like option values, but also provide information on errors, like exceptions. It is the topic of the next section. -## `Result` Type +## Using the `Result` Type for Errors The `Result` module of the standard library defines the following type: @@ -716,11 +716,12 @@ File.read_opt path By the way, this style is called [Tacit Programming](https://en.wikipedia.org/wiki/Tacit_programming). Thanks to the -associativity priorities of the `>>=` and `|>` operators, no parenthesis are needed. +associativity priorities of the `>>=` and `|>` operators, no parenthesis extends beyond a single line. -OCaml has a strict typing discipline, not a strict styling discipline; therefore, -picking the right style is left to the author's decision. See the [OCaml -Programming Guidelines](/docs/guidelines) for more details on those matters. +OCaml has a strict typing discipline, not a strict styling discipline; +therefore, picking the right style is left to the author's decision. That +applies error handling, pick a style knowingly. See the [OCaml Programming +Guidelines](/docs/guidelines) for more details on those matters. ## Assertions @@ -744,8 +745,8 @@ the program that must be writen (often for type-checking or pattern matching completeness) but are unreachable at run time. Asserts should be understood as executable comments. There aren't supposed to -fail, unless during debugging or truly extraordinary circumstances that absolutely -prevent the execution from making any kind of progress. +fail, unless during debugging or truly extraordinary circumstances that +absolutely prevent the execution from making any kind of progress. When the execution reaches conditions which can't be handled, the right thing to do is to throw a `Failure`, using `failwith "error message"`. Assertions aren't @@ -760,13 +761,13 @@ match Sys.os_type with | _ -> failwith "this system is not supported" ``` -It is right to use `failwith` because using `assert` would be incorrect. Here is the dual -example: +It is right to use `failwith` because using `assert` would be incorrect. Here is +the dual example: ```ocaml function x when true -> () | _ -> assert false ``` -Here, it wouldn't be beneficial to use `failwith` since it requires the compiler to be -bugged or the system to be corrupted for second code path to be executed. +Here, it wouldn't be beneficial to use `failwith` since it requires the compiler +to be bugged or the system to be corrupted for second code path to be executed. Breakage of the language semantics qualifies as extraordinary circumstances; it is catastrophic. From fffe5ac6e93d8ebc00d1dcbbbb6f60e7e4d657e9 Mon Sep 17 00:00:00 2001 From: Zineb-Ada <85489706+Zineb-Ada@users.noreply.github.com> Date: Wed, 26 Apr 2023 15:23:38 +0200 Subject: [PATCH 10/43] PR review 950 --- data/tutorials/lg_08_error_handling.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 301037b876..60cf72edcd 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -9,7 +9,7 @@ date: 2021-05-27T21:07:30-00:00 # Error Handling -In OCaml, errors can be handled in several ways. This document present most of the available means. However, handling errors using the effect handlers introduced in OCaml 5 isn't addressed yet. +In OCaml, errors can be handled in several ways. This document presents most of the available means. However, handling errors using the effect handlers introduced in OCaml 5 isn't addressed yet. ## Error as Special Values @@ -20,7 +20,7 @@ instance, when receiving data through a network connection, a function expected to return the number of received bytes might return a negative number meaning: “timed out waiting”. Another example would be returning the empty string when extracting a substring of negative length. Great software was written using this -style, but is is not the proper way to deal with errors in OCaml. +style, but it is not the proper way to deal with errors in OCaml. OCaml has three major ways to deal with errors: 1. Exceptions @@ -45,7 +45,7 @@ to read the documentation to see that, indeed, `List.find` or `String.sub` are f However, exceptions have the great merit of being compiled into efficient machine code. When implementing trial and error approaches likely to back-track -often, exceptions can be used to acheive good performance. +often, exceptions can be used to achieve good performance. Exceptions belong to the type `exn` which is an [extensible sum type](/releases/latest/manual/extensiblevariants.html). @@ -129,7 +129,7 @@ Both can make sense, and there isn't a general rule. If the standard library exc are used, they must be raised under the conditions they are intended to, otherwise handlers will have trouble processing them. Using custom exceptions will force client code to include dedicated catch conditions. This -can be desirable for errors that must be handled at the the client level. +can be desirable for errors that must be handled at the client level. ### Documentation @@ -378,7 +378,7 @@ considered bad: [Linux Kernel Style Guide](https://www.kernel.org/doc/Documentation/process/coding-style.rst) -The recomended way to avoid that is to refrain from or delay attempting to +The recommended way to avoid that is to refrain from or delay attempting to access the content of an option value, as explained in the next sub section. ### Using on `Option.map` and `Option.bind` @@ -534,7 +534,7 @@ Before taking a look at `Result.map`, let's think about `List.map` and `Option.map` under a changed perspective. Both functions behave as identity when applied to `[]` or `None`, respectively. That's the only possibility since those parameters don't carry any data. Which isn't the case in `Result` with its -`Error` constructor. Nethertheless, `Result.map` is implemented likewise, on +`Error` constructor. Nevertheless, `Result.map` is implemented likewise, on `Error`, it also behaves like identity. Here is its type: @@ -615,7 +615,7 @@ We would have: This means an error would be turned back into valid data or changed into another error. This is almost like recovering from an error. However, when recovery fails, it may be preferable to preserve the initial cause of failure. -That behaviour can be acheived by defining the following function: +That behaviour can be achieved by defining the following function: ```ocaml # let recover f = Result.(fold ~ok:ok ~error:(fun (e : 'e) -> Option.to_result ~none:e (f e)));; @@ -637,7 +637,7 @@ Note that some say the types `result` and `Either.t` are it's always possible to replace one by the other, like in a completely neutral refactoring. Values of type `result` and `Either.t` can be translated back and forth, and appling both translations one after the other, in any order, returns -to the starting value. Nethertheless, this doesn't mean `result` should be used +to the starting value. Nevertheless, this doesn't mean `result` should be used in place of `Either.t`, or vice versa. Naming things matters, as punned by Phil Karlton's famous quote: @@ -686,7 +686,7 @@ Variables `x` and `y` may appear in `c` in the three cases. The first form isn't very convenient, it uses a lot of parenthesis. The second one is often the prefered one due to its ressemblance with regular local definitions. The third one is harder to read, as `>>=` associates to the right in order to avoid -parenthesis in that precise case, but it's easy to get lost. Nethertheless, it +parenthesis in that precise case, but it's easy to get lost. Nevertheless, it has some appeal when named functions are used. It looks a bit like good old Unix pipes: ```ocaml From 962f032ce9c9616dada3af103abd72e91555c11c Mon Sep 17 00:00:00 2001 From: Cuihtlauac Alvarado Date: Wed, 26 Apr 2023 17:22:03 +0200 Subject: [PATCH 11/43] Apply suggestions from @dustanddreams Co-authored-by: Miod Vallat <118974489+dustanddreams@users.noreply.github.com> --- data/tutorials/lg_08_error_handling.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 60cf72edcd..f69540f4ae 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -15,7 +15,7 @@ In OCaml, errors can be handled in several ways. This document presents most of Don't do that. -Some languages, most emblematically C, treats certain values as errors. For +Some languages, most emblematically C, treat certain values as errors. For instance, when receiving data through a network connection, a function expected to return the number of received bytes might return a negative number meaning: “timed out waiting”. Another example would be returning the empty string when @@ -165,7 +165,7 @@ let () = Printexc.record_backtrace true To print an exception, the module `Printexc` comes in handy. For instance, it allows the definition of a function such as `notify_user : (unit -> 'a) -> 'a` -that calls a function and, if it fails, print the exception on `stderr`. If +that calls a function and, if it fails, prints the exception on `stderr`. If stack traces are enabled, this function will also display it. ```ocaml From c98b795d9092480dd708a2bf58a37e71b6ef07e3 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Thu, 27 Apr 2023 16:14:19 +0200 Subject: [PATCH 12/43] Feedback from @AshineFoster and @Tchou --- data/tutorials/lg_08_error_handling.md | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index f69540f4ae..47f41b1b5e 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -16,13 +16,15 @@ In OCaml, errors can be handled in several ways. This document presents most of Don't do that. Some languages, most emblematically C, treat certain values as errors. For -instance, when receiving data through a network connection, a function expected -to return the number of received bytes might return a negative number meaning: -“timed out waiting”. Another example would be returning the empty string when -extracting a substring of negative length. Great software was written using this -style, but it is not the proper way to deal with errors in OCaml. +instance in `man 2 read`, one can read: +> On error, -1 is returned, and `errno` is set to indicate the error. -OCaml has three major ways to deal with errors: +Great software was written using this style. However, since correct are errors +values can't be distinguished, nothing but the programmer's discipline ensures +errors aren't ignored. This has been the cause of many bugs, some with dire +consequences. This is not the proper way to deal with errors in OCaml. + +There are three major ways to make it impossible to ignore errors in OCaml: 1. Exceptions 1. `Option` values 1. `Result` values @@ -48,7 +50,7 @@ machine code. When implementing trial and error approaches likely to back-track often, exceptions can be used to achieve good performance. Exceptions belong to the type `exn` which is an [extensible sum -type](/releases/latest/manual/extensiblevariants.html). +type](/manual/extensiblevariants.html). ```ocaml # exception Foo of string;; @@ -250,7 +252,11 @@ the following terminology: * Function handling errors in data: Safe The main means to write such kind of safe error handling functions is to use -either `Option` (next section) or `Result` (following section). +either `Option` (next section) or `Result` (following section). Although +handling errors in data using those types allows avoiding the issues of error +values and execeptions, it incurs extracting the enclosed value at every step, which: +* may require some boilerplate code. This +* come with a runtime cost. ## Using the `Option` Type for Errors @@ -458,8 +464,7 @@ let bind opt f = match opt with | None -> None ``` -`bind` having flipped parameter with respect to `map` allows to use it as custom -let binder: +`bind` having flipped parameter with respect to `map` allows using it as a [binding operator](/manual/bindingops.html), which is an extension of OCaml providing means to create “custom `let`”. Here is how it goes: ```ocaml # let ( let* ) = Option.bind;; val ( let* ) : 'a option -> ('a -> 'b option) -> 'b option = From 94e07a3197e8ccd24255c25f14f23374f9427164 Mon Sep 17 00:00:00 2001 From: Cuihtlauac Alvarado Date: Fri, 5 May 2023 08:59:43 +0200 Subject: [PATCH 13/43] Apply suggestions from @christinerose Co-authored-by: Christine Rose --- data/tutorials/lg_08_error_handling.md | 32 +++++++++++++------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 47f41b1b5e..305b0497aa 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -9,7 +9,7 @@ date: 2021-05-27T21:07:30-00:00 # Error Handling -In OCaml, errors can be handled in several ways. This document presents most of the available means. However, handling errors using the effect handlers introduced in OCaml 5 isn't addressed yet. +In OCaml, errors can be handled in several ways. This document presents most of the available means. However, handling errors using the effect handlers introduced in OCaml 5 hasn't been addressed yet. ## Error as Special Values @@ -63,7 +63,7 @@ val i_will_fail : unit -> 'a = Exception: Foo "Oh no!". ``` -Here, we add a variant `Foo` to the type `exn`, and create a function that will +Here, we add a variant `Foo` to the type `exn` and create a function that will raise this exception. Now, how do we handle exceptions? The construct is `try ... with ...`: @@ -106,7 +106,7 @@ exception Failure of string ``` * `Exit` terminates your program with a success status, which is 0 in Unices - (where success is 0 and any other value is an error, that is, errors are + (where success is 0 and any other value is an error; that is, errors are handled as special values, like mentioned in the first section) * `Not_found` should be raised when searching failed because there isn't anything satisfactory to be found @@ -128,8 +128,8 @@ exceptions, a design decision must be made: * Raise custom exceptions Both can make sense, and there isn't a general rule. If the standard library exceptions -are used, they must be raised under the conditions they are -intended to, otherwise handlers will have trouble processing them. Using custom +are used, they must be raised under their intended conditions, +otherwise handlers will have trouble processing them. Using custom exceptions will force client code to include dedicated catch conditions. This can be desirable for errors that must be handled at the client level. @@ -166,8 +166,8 @@ let () = Printexc.record_backtrace true ### Printing To print an exception, the module `Printexc` comes in handy. For instance, it -allows the definition of a function such as `notify_user : (unit -> 'a) -> 'a` -that calls a function and, if it fails, prints the exception on `stderr`. If +allows the definition of a function, such as `notify_user : (unit -> 'a) -> 'a` +to call a function and, if it fails, prints the exception on `stderr`. If stack traces are enabled, this function will also display it. ```ocaml @@ -529,8 +529,8 @@ type ('a, 'b) result = | Error of 'b ``` -A value `Ok x` means that the computation succeeded and produced `x`, and a -value `Error e` means that it failed and `e` represents whatever error +A value `Ok x` means that the computation succeeded and produced `x`, a +value `Error e` means that it failed, and `e` represents whatever error information has been collected in the process. Pattern matching can be used to deal with both cases, as with any other sum type. However using `map` and `bind` can be more convenient, maybe even more as it was with `Option`. @@ -766,25 +766,25 @@ match Sys.os_type with | _ -> failwith "this system is not supported" ``` -It is right to use `failwith` because using `assert` would be incorrect. Here is +It is right to use `failwith`, because using `assert` would be incorrect. Here is the dual example: ```ocaml function x when true -> () | _ -> assert false ``` -Here, it wouldn't be beneficial to use `failwith` since it requires the compiler -to be bugged or the system to be corrupted for second code path to be executed. -Breakage of the language semantics qualifies as extraordinary circumstances; it -is catastrophic. +Here, it wouldn't be beneficial to use `failwith` because it requires a corrupted system or +for the compiler to be bugged for the second code path to be executed. +Breakage of the language semantics qualifies as extraordinary circumstances. It +is catastrophic! # Concluding Remarks Properly handling errors is a complex matter. It is [cross-cutting concern](https://en.wikipedia.org/wiki/Cross-cutting_concern), touches all parts -of an application and can't be isolated in a dedicated module. In contrast to +of an application, and can't be isolated in a dedicated module. In contrast to several other mainstream languages, OCaml provides several mechanisms to handle exceptional circumstances, all with good runtime performance and code understandability. Using them properly requires some initial learning and practice. Later, it always requires some thinking, which is good since proper error management shouldn't ever be overlooked. No error handling is better than the others, and is should be matter of adequacy to the context rather than -of taste. But opiniated OCaml code is also fine, so it's a balance. +of taste. But opinionated OCaml code is also fine, so it's a balance. From 57bedcc7193000efec2b91c700f209f9b8fda924 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Thu, 12 Jan 2023 10:15:47 +0100 Subject: [PATCH 14/43] Inception of a document about Seq --- data/tutorials/ds_05_seq.md | 71 +++++++++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) create mode 100644 data/tutorials/ds_05_seq.md diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md new file mode 100644 index 0000000000..7da4746ce3 --- /dev/null +++ b/data/tutorials/ds_05_seq.md @@ -0,0 +1,71 @@ +--- +id: Sequences +title: Sequences +description: > + Learn about one of OCaml's must used, built-in data types +category: "data-structures" +date: 2023-01-12T09:00:00-01:00 +--- + +# Sequences + +A sequence looks a lot like a list. However from a pragmatic perspective, one +should imagine it may be infinite. One way to look at a value of type `'a Seq.t` +is to consider it as an icicle, a frozen stream of data. To understand this +analogy, consider how sequences are defined in the standard libary: +```ocaml +type 'a node = + | Nil + | Cons of 'a * 'a t +and 'a t = unit -> 'a node +``` +This is the mutually recursive definition of two types; `Seq.node` which is almost +the same as `list`: +```ocaml +type 'a list = + | [] + | (::) of 'a * 'a list +``` +and `Seq.t` which is merely a type alias for `unit -> 'a Seq.node`. The whole +point of this definition is the type of second argument of `Seq.Cons`, in `list` +it is a list while in `Seq.t` it is function. Empty lists and empty sequence are +defined the same way (`Seq.Nil` and `[]`). Non-empty lists are non-empty +sequences values are both pairs those former member is a piece of data. But non +empty sequence values have a sequence returning function as latter member +instead of a list. That function is the frozen part of the sequence. When a +non-empty sequence is processed, access to data at the tip of the sequence is +immediate, but access to the rest of the sequence is deferred. To access the +tail of non-empty sequence, it has to be microwaved, that is, the tail returning +function must be passed a `unit` value. + +Having frozen-by-function tails explains why sequences should be considered +potentially infinite. Unless a `Seq.Nil` has been found in the sequence, one +can't say for sure if some will ever appear. The tail could be a stream of +client requests in a server, readings from an embedded sensor or logs. All have +unforseenable termination and should be considered infinite. + +Here is how to build seemingly infinite sequences of integers. +```ocaml +# let rec ints_from n : int Seq.t = fun () -> Seq.Cons (n, ints_from (n + 1));; +val ints_from : int -> int Seq.t = +# let ints = ints_from 0;; +val ints : ints Seq.t = +``` +The function `ints_from n` looks as if building the infinite sequence $(n; n + +1; n + 2; n + 3;...)$ while the value `ints` looks as if representing the +infinite sequence $(0; 1; 2; 3; ...)$. In reality, since there isn't an infinite +amount of distinct values of type `int`, those sequences are not increasing, +when reaching `max_int` the values will circle down to `min_int`, actually they +are ultimately periodic. + +The OCaml standard library contains a module on sequences called `Seq`. It contains an `Seq.iter` function, which has the same behaviour as `List.iter`. Writting this +```ocaml +# Seq.iter print_int ints;; +``` +in an OCaml toplevel actually means: “print integers forever” and you have to +type `Crtl-C` to interrupt the execution. Perhaps more interestingly, the +following code is an infinite loop: +```ocaml +# Seq.iter ignore ints;; +``` +But the key point is: it doesn't leak memory. From 1faf742f05eda0a92004b44fde19f9940de44c53 Mon Sep 17 00:00:00 2001 From: Cuihtlauac Alvarado Date: Thu, 12 Jan 2023 20:56:38 +0100 Subject: [PATCH 15/43] Apply suggestions from code review Co-authored-by: Christine Rose --- data/tutorials/ds_05_seq.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index 7da4746ce3..f82f05f232 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -19,50 +19,50 @@ type 'a node = | Cons of 'a * 'a t and 'a t = unit -> 'a node ``` -This is the mutually recursive definition of two types; `Seq.node` which is almost +This is the mutually recursive definition of two types: `Seq.node`, which is almost the same as `list`: ```ocaml type 'a list = | [] | (::) of 'a * 'a list ``` -and `Seq.t` which is merely a type alias for `unit -> 'a Seq.node`. The whole -point of this definition is the type of second argument of `Seq.Cons`, in `list` -it is a list while in `Seq.t` it is function. Empty lists and empty sequence are +and `Seq.t`, which is merely a type alias for `unit -> 'a Seq.node`. The whole +point of this definition is `Seq.Cons` second argument type; in `list`; in `list` +it is a list, while in `Seq.t`, it is a function. Empty lists and empty sequence are defined the same way (`Seq.Nil` and `[]`). Non-empty lists are non-empty sequences values are both pairs those former member is a piece of data. But non empty sequence values have a sequence returning function as latter member instead of a list. That function is the frozen part of the sequence. When a non-empty sequence is processed, access to data at the tip of the sequence is immediate, but access to the rest of the sequence is deferred. To access the -tail of non-empty sequence, it has to be microwaved, that is, the tail returning +tail of non-empty sequence, it has to be microwaved, i.e., the tail returning function must be passed a `unit` value. Having frozen-by-function tails explains why sequences should be considered potentially infinite. Unless a `Seq.Nil` has been found in the sequence, one can't say for sure if some will ever appear. The tail could be a stream of -client requests in a server, readings from an embedded sensor or logs. All have +client requests in a server, readings from an embedded sensor, or logs. All have unforseenable termination and should be considered infinite. -Here is how to build seemingly infinite sequences of integers. +Here is how to build seemingly infinite sequences of integers: ```ocaml # let rec ints_from n : int Seq.t = fun () -> Seq.Cons (n, ints_from (n + 1));; val ints_from : int -> int Seq.t = # let ints = ints_from 0;; val ints : ints Seq.t = ``` -The function `ints_from n` looks as if building the infinite sequence $(n; n + -1; n + 2; n + 3;...)$ while the value `ints` looks as if representing the -infinite sequence $(0; 1; 2; 3; ...)$. In reality, since there isn't an infinite +The function `ints_from n` looks as if building the infinite sequence `$(n; n + +1; n + 2; n + 3;...)$`, while the value `ints` looks as if representing the +infinite sequence `$(0; 1; 2; 3; ...)$`. In reality, since there isn't an infinite amount of distinct values of type `int`, those sequences are not increasing, -when reaching `max_int` the values will circle down to `min_int`, actually they +when reaching `max_int` the values will circle down to `min_int`. Actually, they are ultimately periodic. -The OCaml standard library contains a module on sequences called `Seq`. It contains an `Seq.iter` function, which has the same behaviour as `List.iter`. Writting this +The OCaml standard library contains a module on sequences called `Seq`. It contains an `Seq.iter` function, which has the same behaviour as `List.iter`. Writing this ```ocaml # Seq.iter print_int ints;; ``` -in an OCaml toplevel actually means: “print integers forever” and you have to +in an OCaml toplevel actually means “print integers forever,” and you have to type `Crtl-C` to interrupt the execution. Perhaps more interestingly, the following code is an infinite loop: ```ocaml From 2e510fba5a4a63e0ff4c82fd4cc8eee8d2208f79 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Tue, 17 Jan 2023 18:32:26 +0100 Subject: [PATCH 16/43] Add sieve of Eratosthenes example --- data/tutorials/ds_05_seq.md | 152 ++++++++++++++++++++++++++++-------- 1 file changed, 118 insertions(+), 34 deletions(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index f82f05f232..8ba653ab57 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -2,70 +2,154 @@ id: Sequences title: Sequences description: > - Learn about one of OCaml's must used, built-in data types + Learn about one an OCaml's must used, built-in data types category: "data-structures" date: 2023-01-12T09:00:00-01:00 --- # Sequences -A sequence looks a lot like a list. However from a pragmatic perspective, one -should imagine it may be infinite. One way to look at a value of type `'a Seq.t` -is to consider it as an icicle, a frozen stream of data. To understand this -analogy, consider how sequences are defined in the standard libary: +## Introduction + +Sequences look a lot like lists. However from a pragmatic perspective, one +should imagine they may be infinite. That's the key intuition to understanding +and using sequences. + +One way to look at a value of type `'a Seq.t` is to consider it as an icicle, a +frozen stream of data. To understand this analogy, consider how sequences are +defined in the standard library: ```ocaml type 'a node = | Nil | Cons of 'a * 'a t and 'a t = unit -> 'a node ``` -This is the mutually recursive definition of two types: `Seq.node`, which is almost -the same as `list`: +This is the mutually recursive definition of two types; `Seq.node` which is +almost the same as `list`: ```ocaml type 'a list = | [] | (::) of 'a * 'a list ``` -and `Seq.t`, which is merely a type alias for `unit -> 'a Seq.node`. The whole -point of this definition is `Seq.Cons` second argument type; in `list`; in `list` -it is a list, while in `Seq.t`, it is a function. Empty lists and empty sequence are -defined the same way (`Seq.Nil` and `[]`). Non-empty lists are non-empty -sequences values are both pairs those former member is a piece of data. But non -empty sequence values have a sequence returning function as latter member -instead of a list. That function is the frozen part of the sequence. When a -non-empty sequence is processed, access to data at the tip of the sequence is -immediate, but access to the rest of the sequence is deferred. To access the -tail of non-empty sequence, it has to be microwaved, i.e., the tail returning -function must be passed a `unit` value. - -Having frozen-by-function tails explains why sequences should be considered +and `Seq.t` which is merely a type alias for `unit -> 'a Seq.node`. The whole +point of this definition is the type of the second argument of `Seq.Cons`, which +is a function returning a sequence while its `list` sibling is a list. Let's +compare the constructors of `list` and `Seq.node`: +1. Empty lists and sequences are defined the same way, a constructor without any + parameter: `Seq.Nil` and `[]`. +1. Non-empty lists and sequences are both pairs whose former member is a piece + of data; +1. but the latter member, in lists, is a `list` too, while in sequences, it is a + function returning a `Seq.node`. + +A value of type `Seq.t` is “frozen” because the data it contains isn't +immediately available, a `unit` value has to be supplied to recover it, and +that's “unfreezing”. However, unfreezing only gives access to the tip of the +icicle, since the second argument of `Seq.Cons` is a function too. + +Having frozen-by-function tails explains why sequences may be considered potentially infinite. Unless a `Seq.Nil` has been found in the sequence, one -can't say for sure if some will ever appear. The tail could be a stream of -client requests in a server, readings from an embedded sensor, or logs. All have -unforseenable termination and should be considered infinite. +can't say for sure if some will ever appear. The sequence could be a stream of +client requests in a server, readings from an embedded sensor or system logs. +All have unforeseeable termination and it is easier to consider them infinite. Here is how to build seemingly infinite sequences of integers: ```ocaml -# let rec ints_from n : int Seq.t = fun () -> Seq.Cons (n, ints_from (n + 1));; +# let rec ints_from n : int Seq.t = fun () -> Seq.Cons (n, ints_from (n + 1)) + let ints = ints_from 0;; val ints_from : int -> int Seq.t = -# let ints = ints_from 0;; val ints : ints Seq.t = ``` -The function `ints_from n` looks as if building the infinite sequence `$(n; n + -1; n + 2; n + 3;...)$`, while the value `ints` looks as if representing the -infinite sequence `$(0; 1; 2; 3; ...)$`. In reality, since there isn't an infinite +The function `ints_from n` looks as if building the infinite sequence +$(n; n + 1; n + 2; n + 3;...)$ +while the value `ints` look as if representing the +infinite sequence $(0; 1; 2; 3; ...)$. In reality, since there isn't an infinite amount of distinct values of type `int`, those sequences are not increasing, -when reaching `max_int` the values will circle down to `min_int`. Actually, they -are ultimately periodic. +when reaching `max_int` the values will circle down to `min_int`. They are +ultimately periodic. -The OCaml standard library contains a module on sequences called `Seq`. It contains an `Seq.iter` function, which has the same behaviour as `List.iter`. Writing this +The OCaml standard library contains a module on sequences called `Seq`. It +contains a `Seq.iter` function, which has the same behaviour as `List.iter`. +Writing this: ```ocaml # Seq.iter print_int ints;; ``` -in an OCaml toplevel actually means “print integers forever,” and you have to -type `Crtl-C` to interrupt the execution. Perhaps more interestingly, the -following code is an infinite loop: +in an OCaml top-level means: “print integers forever” and you have to type +`Crtl-C` to interrupt the execution. Perhaps more interestingly, the following +code is also an infinite loop: ```ocaml # Seq.iter ignore ints;; ``` But the key point is: it doesn't leak memory. + +## Example + +Strangely, the `Seq` module of the OCaml standard library does not (yet) define +a function returning the elements at the beginning of a sequence. Here is a +possible implementation: +```ocaml +let rec take n seq () = match seq () with +| Seq.Cons (x, seq) when n > 0 -> Seq.Cons (x, take (n - 1) seq) +| _ -> Seq.Nil +``` +`take n seq` returns, at most, the `n` first elements of the sequence `seq`. If +`seq` contains less than `n` elements, an identical sequence is returned. In +particular, if `seq` is empty, an empty sequence is returned. + +Observe the first line of `take`, it is the common pattern for recursive +functions over sequences. The last two parameters are: +* a sequence called `seq`; +* a `unit` value. + +When executed, the function begins by unfreezing `seq` (that is, calling `seq +()`) and then pattern match to look inside the data made available. However, +this does not happen unless a `unit` parameter is passed to `take`. Writing +`take 10 seq` does not compute anything, it is a partial application and returns +a function needing a `unit` to produce a result. + +This can be used to print integers without looping forever as shown previously: +```ocaml +# ints |> take 43 |> List.of_seq;; +- : int list = +[0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21; + 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; + 41; 42] +``` + +The `Seq` module has a function `Seq.filter`: +```ocaml +# Seq.filter;; +- : ('a -> bool) -> 'a Seq.t -> 'a Seq.t = +``` +It builds a sequence of elements satisfying a condition. + +Using `Seq.filter`, it is possible to make a straightforward implementation of the +[Sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes). +Here it is: +```ocaml +let rec sieve seq () = match seq () with +| Seq.Cons (m, seq) -> Seq.Cons (m, sieve (Seq.filter (fun n -> n mod m > 0) seq)) +| seq -> seq;; +let facts = ints_from 2 |> sieve +``` + +This code can be used to generate lists of prime numbers. For instance, here is +the list of 100 first prime numbers: +```ocaml +# facts |> take 100 |> List.of_seq;; +- : int list = +[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71; + 73; 79; 83; 89; 97; 101; 103; 107; 109; 113; 127; 131; 137; 139; 149; 151; + 157; 163; 167; 173; 179; 181; 191; 193; 197; 199; 211; 223; 227; 229; 233; + 239; 241; 251; 257; 263; 269; 271; 277; 281; 283; 293; 307; 311; 313; 317; + 331; 337; 347; 349; 353; 359; 367; 373; 379; 383; 389; 397; 401; 409; 419; + 421; 431; 433; 439; 443; 449; 457; 461; 463; 467; 479; 487; 491; 499; 503; + 509; 521; 523] +``` + +The function `sieve` is recursive, in OCaml and common senses: it is defined +using the `rec` keyword and calls itself. However, some call that kind of +function “corecursive”. This word is used to emphasize that, by design, it does +not terminate. Strictly speaking, the sieve of Eratosthenes is not an +algorithm either since it does not terminate. This implementation behaves the +same. From a2f1e8a9ca3eca78a9e520712619c5ecb0b9102b Mon Sep 17 00:00:00 2001 From: Cuihtlauac Alvarado Date: Wed, 18 Jan 2023 08:14:19 +0100 Subject: [PATCH 17/43] Update data/tutorials/ds_05_seq.md Co-authored-by: Christine Rose --- data/tutorials/ds_05_seq.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index 8ba653ab57..fad4a37d83 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -2,7 +2,7 @@ id: Sequences title: Sequences description: > - Learn about one an OCaml's must used, built-in data types + Learn about an OCaml's most-used, built-in data types category: "data-structures" date: 2023-01-12T09:00:00-01:00 --- From f966704b22cc408340b4863ad7aa2ac0b7de8c2c Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Wed, 18 Jan 2023 17:45:44 +0100 Subject: [PATCH 18/43] Add unfold and conversion sections --- data/tutorials/ds_05_seq.md | 140 +++++++++++++++++++++++++++++------- 1 file changed, 115 insertions(+), 25 deletions(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index fad4a37d83..182277ca33 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -9,6 +9,16 @@ date: 2023-01-12T09:00:00-01:00 # Sequences +## Prerequisites + +| Concept | Status | Documentation | Reference | +|---|---|---|---| +| Basic types | Mandatory | | | +| Functions | Mandatory | | | +| Lists | Mandatory | | | +| Options | Recommended | | | +| Arrays | Nice to have | | | + ## Introduction Sequences look a lot like lists. However from a pragmatic perspective, one @@ -48,49 +58,47 @@ that's “unfreezing”. However, unfreezing only gives access to the tip of the icicle, since the second argument of `Seq.Cons` is a function too. Having frozen-by-function tails explains why sequences may be considered -potentially infinite. Unless a `Seq.Nil` has been found in the sequence, one -can't say for sure if some will ever appear. The sequence could be a stream of -client requests in a server, readings from an embedded sensor or system logs. +potentially infinite. Until a `Seq`.Nil` value has been found in the sequence, +one can't say for sure if some will ever appear. The sequence could be a stream +of client requests in a server, readings from an embedded sensor or system logs. All have unforeseeable termination and it is easier to consider them infinite. Here is how to build seemingly infinite sequences of integers: ```ocaml -# let rec ints_from n : int Seq.t = fun () -> Seq.Cons (n, ints_from (n + 1)) - let ints = ints_from 0;; -val ints_from : int -> int Seq.t = -val ints : ints Seq.t = +# let rec ints n : int Seq.t = fun () -> Seq.Cons (n, ints_from (n + 1)) +val ints : int -> int Seq.t = ``` -The function `ints_from n` looks as if building the infinite sequence -$(n; n + 1; n + 2; n + 3;...)$ -while the value `ints` look as if representing the -infinite sequence $(0; 1; 2; 3; ...)$. In reality, since there isn't an infinite +The function `ints n` look as if building the infinite sequence +$(n; n + 1; n + 2; n + 3;...)$. In reality, since there isn't an infinite amount of distinct values of type `int`, those sequences are not increasing, when reaching `max_int` the values will circle down to `min_int`. They are -ultimately periodic. +ultimately periodic. -The OCaml standard library contains a module on sequences called `Seq`. It -contains a `Seq.iter` function, which has the same behaviour as `List.iter`. -Writing this: +The OCaml standard library contains a module on sequences called +[`Seq`](/releases/5.0/api/Seq.html). It contains a `Seq.iter` function, which +has the same behaviour as `List.iter`. Writing this: ```ocaml -# Seq.iter print_int ints;; +# Seq.iter print_int (ints 0);; ``` in an OCaml top-level means: “print integers forever” and you have to type `Crtl-C` to interrupt the execution. Perhaps more interestingly, the following code is also an infinite loop: ```ocaml -# Seq.iter ignore ints;; +# Seq.iter ignore (ints 0);; ``` But the key point is: it doesn't leak memory. ## Example -Strangely, the `Seq` module of the OCaml standard library does not (yet) define +The `Seq` module of the OCaml standard library contains + +does not (yet) define a function returning the elements at the beginning of a sequence. Here is a possible implementation: ```ocaml let rec take n seq () = match seq () with -| Seq.Cons (x, seq) when n > 0 -> Seq.Cons (x, take (n - 1) seq) -| _ -> Seq.Nil + | Seq.Cons (x, seq) when n > 0 -> Seq.Cons (x, take (n - 1) seq) + | _ -> Seq.Nil ``` `take n seq` returns, at most, the `n` first elements of the sequence `seq`. If `seq` contains less than `n` elements, an identical sequence is returned. In @@ -109,14 +117,14 @@ a function needing a `unit` to produce a result. This can be used to print integers without looping forever as shown previously: ```ocaml -# ints |> take 43 |> List.of_seq;; +# Seq.ints 0 |> Seq.take 43 |> List.of_seq;; - : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 41; 42] ``` -The `Seq` module has a function `Seq.filter`: +The `Seq` module also has a function `Seq.filter`: ```ocaml # Seq.filter;; - : ('a -> bool) -> 'a Seq.t -> 'a Seq.t = @@ -128,9 +136,9 @@ Using `Seq.filter`, it is possible to make a straightforward implementation of t Here it is: ```ocaml let rec sieve seq () = match seq () with -| Seq.Cons (m, seq) -> Seq.Cons (m, sieve (Seq.filter (fun n -> n mod m > 0) seq)) -| seq -> seq;; -let facts = ints_from 2 |> sieve + | Seq.Cons (m, seq) -> Seq.Cons (m, sieve (Seq.filter (fun n -> n mod m > 0) seq)) + | seq -> seq +let facts = ints_from 2 |> sieve;; ``` This code can be used to generate lists of prime numbers. For instance, here is @@ -153,3 +161,85 @@ function “corecursive”. This word is used to emphasize that, by design, it d not terminate. Strictly speaking, the sieve of Eratosthenes is not an algorithm either since it does not terminate. This implementation behaves the same. + +## Unfolding Sequences + +Standard higher-order iteration functions are available on Sequences. For instance: +* `Seq.iter` +* `Seq.map` +* `Seq.fold_left` + +All those are also available for `Array`, `List` and `Set`. Since OCaml 4.11 +sequences have something which isn't (yet) available on those: `unfold`. Here is +how it is implemented: +```ocaml +let rec unfold f seq () = match f seq with + | None -> Nil + | Some (x, seq) -> Cons (x, unfold f seq) +``` +And here is its type: +```ocaml +val unfold : ('a -> ('b * 'a) option) -> 'a -> 'b Seq.t = +``` +Unlike previously mentioned iterators `Seq.unfold` does not have a sequence +parameter, but a sequence result. `unfold` provides a general means to build +sequences. For instance, `Seq.ints` can be implemented using `Seq.unfold` in a +fairly compact way: +```ocaml +let ints = Seq.unfold (fun n -> Some (n, n + 1));; +``` + +As a fun fact, observe `map` over sequences can be implemented +using `Seq.unfold`. Here is how to write it: +```ocaml +# let map f = Seq.unfold (fun seq -> seq |> Seq.uncons |> Option.map (fun (x, y) -> (f x, y)));; +val map : ('a -> 'b) -> 'a Seq.t -> 'b Seq.t = +``` +Here is a quick check: +```ocaml +# Seq.ints 0 |> map (fun x -> x * x) |> Seq.take 10 |> List.of_seq;; +- : int list = [0; 1; 4; 9; 16; 25; 36; 49; 64; 81] +``` + +Using this function: +```ocaml +let input_line_opt chan = + try Some (input_line chan, chan) + with End_of_file -> close_in chan; None +``` +It is possible to read a file using `Seq.unfold`: +```ocaml +"README.md" |> open_in |> Seq.unfold input_line_opt |> Seq.iter print_endline +``` + +Although this can be an appealing style, bear in mind it does not prevent from +taking care of open files. While the code above is fine, this one no longer is: +```ocaml +"README.md" |> open_in |> Seq.unfold input_line_opt |> Seq.take 10 |> Seq.iter print_endline +``` +Here, `close_in` will never be called over the input channel opened on `README.md`. + + +## Sequences for Conversions + +Throughout the standard library, sequences are used as a bridge to perform +conversions between many datatypes. For instance, here are the signatures of +some of those functions: +* Lists + ```ocaml + val List.of_seq : 'a list -> 'a Seq.t + val List.to_seq : 'a Seq.t -> 'a list + ``` +* Arrays + ```ocaml + val Array.of_seq : 'a array -> 'a Seq.t + val Array.to_seq : 'a Seq.t -> 'a array + ``` +* Strings + ```ocaml + val String.of_seq : string -> char Seq.t + val String.to_seq : char Seq.t -> string + ``` +Similar functions are also provided for sets, maps, hash tables (`Hashtbl`) and +others (except `Seq`, obviously). When implementing a datatype module, it is +advised to expose `to_seq` and `of_seq` functions. From 1bd800d7eb625e7df15d7890cadd799af260b588 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Fri, 3 Feb 2023 17:25:22 +0100 Subject: [PATCH 19/43] Cleanup --- data/tutorials/ds_05_seq.md | 54 +++++++++++++++++++++++++------------ 1 file changed, 37 insertions(+), 17 deletions(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index 182277ca33..3aff10e4c5 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -58,21 +58,27 @@ that's “unfreezing”. However, unfreezing only gives access to the tip of the icicle, since the second argument of `Seq.Cons` is a function too. Having frozen-by-function tails explains why sequences may be considered -potentially infinite. Until a `Seq`.Nil` value has been found in the sequence, +potentially infinite. Until a `Seq.Nil` value has been found in the sequence, one can't say for sure if some will ever appear. The sequence could be a stream -of client requests in a server, readings from an embedded sensor or system logs. +of incoming requests in a server, readings from an embedded sensor or system logs. All have unforeseeable termination and it is easier to consider them infinite. +In OCaml, any value `a` of type `t` can be turned into a constant function by +writing `fun _ -> a`, which has type `'a -> t`. When writing `fun () -> a` +instead, we get a function of type `unit -> t`. Such a function is called a +[_thunk_](https://en.wikipedia.org/wiki/Thunk). Using this terminology, sequence +values are thunks. With the analogy used earlier, `a` is frozen in its thunk. + Here is how to build seemingly infinite sequences of integers: ```ocaml -# let rec ints n : int Seq.t = fun () -> Seq.Cons (n, ints_from (n + 1)) +# let rec ints n : int Seq.t = fun () -> Seq.Cons (n, ints (n + 1)) val ints : int -> int Seq.t = ``` The function `ints n` look as if building the infinite sequence $(n; n + 1; n + 2; n + 3;...)$. In reality, since there isn't an infinite amount of distinct values of type `int`, those sequences are not increasing, when reaching `max_int` the values will circle down to `min_int`. They are -ultimately periodic. +ultimately periodic. The OCaml standard library contains a module on sequences called [`Seq`](/releases/5.0/api/Seq.html). It contains a `Seq.iter` function, which @@ -90,11 +96,9 @@ But the key point is: it doesn't leak memory. ## Example -The `Seq` module of the OCaml standard library contains - -does not (yet) define -a function returning the elements at the beginning of a sequence. Here is a -possible implementation: +The `Seq` module of the OCaml standard library contains the definition of the +function `Seq.take` which returns a specified number of elements from the +beginning of a sequence. Here is a simplified implementation: ```ocaml let rec take n seq () = match seq () with | Seq.Cons (x, seq) when n > 0 -> Seq.Cons (x, take (n - 1) seq) @@ -164,14 +168,16 @@ same. ## Unfolding Sequences -Standard higher-order iteration functions are available on Sequences. For instance: +Standard higher-order iteration functions are available on sequences. For +instance: * `Seq.iter` * `Seq.map` * `Seq.fold_left` -All those are also available for `Array`, `List` and `Set`. Since OCaml 4.11 -sequences have something which isn't (yet) available on those: `unfold`. Here is -how it is implemented: +All those are also available for `Array`, `List` and `Set` and behave +essentially the same. Observe that there is no `fold_right` function. Since +OCaml 4.11 there is something which isn't (yet) available on other types: +`unfold`. Here is how it is implemented: ```ocaml let rec unfold f seq () = match f seq with | None -> Nil @@ -189,10 +195,10 @@ fairly compact way: let ints = Seq.unfold (fun n -> Some (n, n + 1));; ``` -As a fun fact, observe `map` over sequences can be implemented -using `Seq.unfold`. Here is how to write it: +As a fun fact, one should observe `map` over sequences can be implemented using +`Seq.unfold`. Here is how to write it: ```ocaml -# let map f = Seq.unfold (fun seq -> seq |> Seq.uncons |> Option.map (fun (x, y) -> (f x, y)));; +# let map f = Seq.unfold (fun s -> s |> Seq.uncons |> Option.map (fun (x, y) -> (f x, y)));; val map : ('a -> 'b) -> 'a Seq.t -> 'b Seq.t = ``` Here is a quick check: @@ -200,6 +206,7 @@ Here is a quick check: # Seq.ints 0 |> map (fun x -> x * x) |> Seq.take 10 |> List.of_seq;; - : int list = [0; 1; 4; 9; 16; 25; 36; 49; 64; 81] ``` +The function `Seq.uncons` returns the head and tail of a sequence if it is not empty or `None` otherwise. Using this function: ```ocaml @@ -219,7 +226,6 @@ taking care of open files. While the code above is fine, this one no longer is: ``` Here, `close_in` will never be called over the input channel opened on `README.md`. - ## Sequences for Conversions Throughout the standard library, sequences are used as a bridge to perform @@ -243,3 +249,17 @@ some of those functions: Similar functions are also provided for sets, maps, hash tables (`Hashtbl`) and others (except `Seq`, obviously). When implementing a datatype module, it is advised to expose `to_seq` and `of_seq` functions. + +## Miscellaneous + +There are a couple of related Libraries, all providing means to handle large +flows of data: + +* Rizo I [Streaming](/p/streaming) +* Gabriel Radanne [Iter](/p/iter) +* Jane Street `Base.Sequence` + +There used to be a module called [`Stream`](/releases/4.13/api/Stream.html) in +the OCaml standard library. It was +[removed](https://github.com/ocaml/ocaml/pull/10482) in 2021 with the release of +OCaml 4.14. Beware books and documentation written before may still mention it. From f764b994d38d31a3209b600c43e45c8d1f126ad3 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Fri, 3 Feb 2023 17:46:10 +0100 Subject: [PATCH 20/43] Add corecursion link --- data/tutorials/ds_05_seq.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index 3aff10e4c5..6a6efc0b0f 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -161,10 +161,10 @@ the list of 100 first prime numbers: The function `sieve` is recursive, in OCaml and common senses: it is defined using the `rec` keyword and calls itself. However, some call that kind of -function “corecursive”. This word is used to emphasize that, by design, it does -not terminate. Strictly speaking, the sieve of Eratosthenes is not an -algorithm either since it does not terminate. This implementation behaves the -same. +function [_corecursive_](https://en.wikipedia.org/wiki/Corecursion). This word +is used to emphasize that, by design, it does not terminate. Strictly speaking, +the sieve of Eratosthenes is not an algorithm either since it does not +terminate. This implementation behaves the same. ## Unfolding Sequences From 9850ccaa4901b5cb7ff30f3f1e4539cf1768050c Mon Sep 17 00:00:00 2001 From: Cuihtlauac Alvarado Date: Mon, 6 Feb 2023 08:02:12 +0100 Subject: [PATCH 21/43] Apply suggestions from code review Thanks @dustanddreams, all suggestions merged Co-authored-by: Miod Vallat <118974489+dustanddreams@users.noreply.github.com> --- data/tutorials/ds_05_seq.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index 6a6efc0b0f..e53c65e064 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -43,7 +43,7 @@ type 'a list = ``` and `Seq.t` which is merely a type alias for `unit -> 'a Seq.node`. The whole point of this definition is the type of the second argument of `Seq.Cons`, which -is a function returning a sequence while its `list` sibling is a list. Let's +is a function returning a sequence while its `list` counterpart returns a list. Let's compare the constructors of `list` and `Seq.node`: 1. Empty lists and sequences are defined the same way, a constructor without any parameter: `Seq.Nil` and `[]`. @@ -87,7 +87,7 @@ has the same behaviour as `List.iter`. Writing this: # Seq.iter print_int (ints 0);; ``` in an OCaml top-level means: “print integers forever” and you have to type -`Crtl-C` to interrupt the execution. Perhaps more interestingly, the following +`Ctrl-C` to interrupt the execution. Perhaps more interestingly, the following code is also an infinite loop: ```ocaml # Seq.iter ignore (ints 0);; @@ -252,7 +252,7 @@ advised to expose `to_seq` and `of_seq` functions. ## Miscellaneous -There are a couple of related Libraries, all providing means to handle large +There are a couple of related libraries, all providing means to handle large flows of data: * Rizo I [Streaming](/p/streaming) From b06bd438957289ccede15a46c2a30b6824d1956b Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Mon, 6 Feb 2023 09:11:35 +0100 Subject: [PATCH 22/43] Add fibs example --- data/tutorials/ds_05_seq.md | 48 ++++++++++++++++++++++++++++++++++++- 1 file changed, 47 insertions(+), 1 deletion(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index e53c65e064..55dfab4f15 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -1,6 +1,6 @@ --- id: Sequences -title: Sequences +title: sequences description: > Learn about an OCaml's most-used, built-in data types category: "data-structures" @@ -226,6 +226,52 @@ taking care of open files. While the code above is fine, this one no longer is: ``` Here, `close_in` will never be called over the input channel opened on `README.md`. +## Sequences are Functions + +Although this looks like a possible way to define the [Fibonacci +sequence](https://en.wikipedia.org/wiki/Fibonacci_number): +```ocaml +# let rec fibs m n = Seq.cons m (fibs n (n + m));; +val fibs : int -> int -> int Seq.t = +``` +It actually isn't. It's a non-ending recursion which blows away the stack. +``` +# fibs 0 1;; +Stack overflow during evaluation (looping recursion?). +``` +This definition is behaving as expected: +```ocaml +# let rec fibs m n () = Seq.Cons (m, fibs n (n + m));; +val fibs : int -> int -> int Seq.t = +``` +It can be used to produce some Fibonacci numbers: +```ocaml +# fibs 0 1 |> Seq.take 10 |> List.of_seq;; +- : int list = [0; 1; 1; 2; 3; 5; 8; 13; 21; 34] +``` +Why is it so? The key difference lies in the recursive call `fibs n (n + m)`. In +the former definition, the application is complete, `fibs` is provided all the +arguments it expects; in the latter definition, the application is partial, the +`()` argument is missing. Since evaluation is +[eager](https://en.wikipedia.org/wiki/Evaluation_strategy#Eager_evaluation) in +OCaml, in the former case, evaluation of the recursive call is triggered, and +non-terminating looping occurs. In contrast, in the latter case, the partially +applied function is immediately returned as a +[closure](https://en.wikipedia.org/wiki/Closure_(computer_programming)). + +Sequences are functions, as stated by their type: +```ocaml +# #show Seq.t;; +type 'a t = unit -> 'a Seq.node +``` +Functions working with sequences must be written accordingly. +* Sequence consumer: partially applied function parameter +* Sequence producer: partially applied function result + +When code dealing with sequences does not behave as expected, in particular, if +it is crashing or hanging, there's a fair chance a mistake like in the first +definition of `fibs` was made. + ## Sequences for Conversions Throughout the standard library, sequences are used as a bridge to perform From 5165fef221796239aa41ef78d8b54c8441d371b2 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Mon, 6 Feb 2023 09:57:16 +0100 Subject: [PATCH 23/43] Fix Typo --- data/tutorials/ds_05_seq.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index 55dfab4f15..8a686cd253 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -1,6 +1,6 @@ --- -id: Sequences -title: sequences +id: sequences +title: Sequences description: > Learn about an OCaml's most-used, built-in data types category: "data-structures" From 74308d6d7c0bb87c2101876b293665d51b13f4c1 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Mon, 6 Feb 2023 18:56:41 +0100 Subject: [PATCH 24/43] Add exercises --- data/problems/diag.md | 30 ++++++++++++++++++ data/problems/stream.md | 61 +++++++++++++++++++++++++++++++++++++ data/tutorials/ds_05_seq.md | 8 ++++- 3 files changed, 98 insertions(+), 1 deletion(-) create mode 100644 data/problems/diag.md create mode 100644 data/problems/stream.md diff --git a/data/problems/diag.md b/data/problems/diag.md new file mode 100644 index 0000000000..8fb4fdec1a --- /dev/null +++ b/data/problems/diag.md @@ -0,0 +1,30 @@ +--- +title: Diagonal of a Sequence of Sequences +number: "100" +difficulty: intermediate +tags: [ "seq" ] +--- + +# Solution + +```ocaml +let rec diag seq_seq () = + let hds, tls = Seq.filter_map Seq.uncons seq_seq |> Seq.split in + let hd, tl = Seq.uncons hds |> Option.map fst, Seq.uncons tls |> Option.map snd in + let d = Option.fold ~none:Seq.empty ~some:diag tl in + Option.fold ~none:Fun.id ~some:Seq.cons hd d () +``` + +# Statement + +Write a function `diag : 'a Seq.t Seq.t -> 'a Seq` that returns the _diagonal_ +of a sequence of sequences. The returned sequence is formed the following way: +The first element of the returned sequence is the first element of the first +sequence; the second element of the returned sequence is the second element of +the second sequence; the third element of the returned sequence is the third +element of the third sequence; and so on. + + + by taking the first element of the +first sequence, the second element of the second second + diff --git a/data/problems/stream.md b/data/problems/stream.md new file mode 100644 index 0000000000..0fa3610e82 --- /dev/null +++ b/data/problems/stream.md @@ -0,0 +1,61 @@ +--- +title: Never-Ending Sequences +number: "101" +difficulty: beginner +tags: [ "seq" ] +--- + +# Solution + +```ocaml +type 'a cons = Cons of 'a * 'a stream +and 'a stream = unit -> 'a cons + +let hd (seq : 'a stream) = let (Cons (x, _)) = seq () in x +let tl (seq : 'a stream) = let (Cons (_, seq)) = seq () in seq +let rec take n seq = if n = 0 then [] else let (Cons (x, seq)) = seq () in x :: take (n - 1) seq +let rec unfold f x () = let (y, x) = f x in Cons (y, unfold f x) +let bang x = unfold (fun x -> (x, x)) x +let ints x = unfold (fun x -> (x, x + 1)) x +let rec map f seq () = let (Cons (x, seq)) = seq () in Cons (f x, map f seq) +let rec filter p seq () = let (Cons (x, seq)) = seq () in let seq = filter p seq in if p x then Cons (x, seq) else seq () +let rec iter f seq = let (Cons (x, seq)) = seq () in f x; iter f seq +let to_seq seq = Seq.unfold (fun seq -> Some (hd seq, tl seq)) seq +let rec of_seq seq () = match seq () with +| Seq.Nil -> failwith "Not a infinite sequence" +| Seq.Cons (x, seq) -> Cons (x, of_seq seq) +``` + +# Statement + +Lists are finite, they always contain a finite number of elements. Sequences may +be finite or infinite. + +The goal of this exercise is to define a type `'a stream` which only contains +infinite sequences. Using this type, define the functions following functions: +```ocaml +val hd : 'a stream -> 'a +(** Returns the first element of a stream *) +val tl : 'a stream -> 'a stream +(** Removes the first element of a stream *) +val take : int -> 'a stream -> 'a list +(** [take n seq] returns the n first values of [seq] *) +val unfold : ('a -> 'b * 'a) -> 'a -> 'b stream +(** Similar to Seq.unfold *) +val bang : 'a -> 'a stream +(** [bang x] produces a infinitely repeating sequences of [x] values. *) +val ints : int -> int stream +(* Similar to Seq.ints *) +val map : ('a -> 'b) -> 'a stream -> 'b stream +(** Similar to List.map and Seq.map *) +val filter: ('a -> bool) -> 'a stream -> 'a stream +(** Similar to List.filter and Seq.filter *) +val iter : ('a -> unit) -> 'a stream -> 'b +(** Similar to List.iter and Seq.iter *) +val to_seq : 'a stream -> 'a Seq.t +(** Translates an ['a stream] into an ['a Seq.t] *) +val of_seq : 'a Seq.t -> 'a stream +(** Translates an ['a Seq.t] into an ['a stream] + @raise Failure if the input sequence is finite. *) +``` +Pro tip: Use irrefutable patterns. \ No newline at end of file diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index 8a686cd253..9cedaa5868 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -296,7 +296,7 @@ Similar functions are also provided for sets, maps, hash tables (`Hashtbl`) and others (except `Seq`, obviously). When implementing a datatype module, it is advised to expose `to_seq` and `of_seq` functions. -## Miscellaneous +## Miscellaneous Considerations There are a couple of related libraries, all providing means to handle large flows of data: @@ -309,3 +309,9 @@ There used to be a module called [`Stream`](/releases/4.13/api/Stream.html) in the OCaml standard library. It was [removed](https://github.com/ocaml/ocaml/pull/10482) in 2021 with the release of OCaml 4.14. Beware books and documentation written before may still mention it. + +## Exercices + +* [Diagonal](/problems#100) +* [Streams](/problems#101) + From 2ae4c58720206330b04f473f2b52231f803f7440 Mon Sep 17 00:00:00 2001 From: Cuihtlauac Alvarado Date: Tue, 7 Feb 2023 08:17:08 +0100 Subject: [PATCH 25/43] Apply suggestions from code review Grammar fixes Co-authored-by: Christine Rose --- data/problems/diag.md | 6 ++-- data/problems/stream.md | 2 +- data/tutorials/ds_05_seq.md | 63 ++++++++++++++++++------------------- 3 files changed, 35 insertions(+), 36 deletions(-) diff --git a/data/problems/diag.md b/data/problems/diag.md index 8fb4fdec1a..ce4f7b18ed 100644 --- a/data/problems/diag.md +++ b/data/problems/diag.md @@ -18,13 +18,13 @@ let rec diag seq_seq () = # Statement Write a function `diag : 'a Seq.t Seq.t -> 'a Seq` that returns the _diagonal_ -of a sequence of sequences. The returned sequence is formed the following way: +of a sequence of sequences. The returned sequence is formed as follows: The first element of the returned sequence is the first element of the first sequence; the second element of the returned sequence is the second element of the second sequence; the third element of the returned sequence is the third element of the third sequence; and so on. - by taking the first element of the -first sequence, the second element of the second second +By taking the first element of the +first sequence, the second element of the second sequence diff --git a/data/problems/stream.md b/data/problems/stream.md index 0fa3610e82..aa2afb3b78 100644 --- a/data/problems/stream.md +++ b/data/problems/stream.md @@ -28,7 +28,7 @@ let rec of_seq seq () = match seq () with # Statement -Lists are finite, they always contain a finite number of elements. Sequences may +Lists are finite, meaning they always contain a finite number of elements. Sequences may be finite or infinite. The goal of this exercise is to define a type `'a stream` which only contains diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index 9cedaa5868..50a30f8321 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -34,34 +34,33 @@ type 'a node = | Cons of 'a * 'a t and 'a t = unit -> 'a node ``` -This is the mutually recursive definition of two types; `Seq.node` which is +This is the mutually recursive definition of two types: `Seq.node`, which is almost the same as `list`: ```ocaml type 'a list = | [] | (::) of 'a * 'a list ``` -and `Seq.t` which is merely a type alias for `unit -> 'a Seq.node`. The whole -point of this definition is the type of the second argument of `Seq.Cons`, which +and `Seq.t`, which is merely a type alias for `unit -> 'a Seq.node`. The whole +point of this definition is the second argument's type `Seq.Cons`, which is a function returning a sequence while its `list` counterpart returns a list. Let's compare the constructors of `list` and `Seq.node`: 1. Empty lists and sequences are defined the same way, a constructor without any parameter: `Seq.Nil` and `[]`. 1. Non-empty lists and sequences are both pairs whose former member is a piece - of data; -1. but the latter member, in lists, is a `list` too, while in sequences, it is a + of data. +1. However, the latter member in lists is a `list` too, while in sequences, it is a function returning a `Seq.node`. A value of type `Seq.t` is “frozen” because the data it contains isn't -immediately available, a `unit` value has to be supplied to recover it, and -that's “unfreezing”. However, unfreezing only gives access to the tip of the +immediately available. A `unit` value has to be supplied to recover it, which is called “unfreezing.” However, unfreezing only gives access to the tip of the icicle, since the second argument of `Seq.Cons` is a function too. -Having frozen-by-function tails explains why sequences may be considered +Frozen-by-function tails explain why sequences may be considered potentially infinite. Until a `Seq.Nil` value has been found in the sequence, one can't say for sure if some will ever appear. The sequence could be a stream -of incoming requests in a server, readings from an embedded sensor or system logs. -All have unforeseeable termination and it is easier to consider them infinite. +of incoming requests in a server, readings from an embedded sensor, or system logs. +All have unforeseeable termination, and it is easier to consider them infinite. In OCaml, any value `a` of type `t` can be turned into a constant function by writing `fun _ -> a`, which has type `'a -> t`. When writing `fun () -> a` @@ -76,8 +75,8 @@ val ints : int -> int Seq.t = ``` The function `ints n` look as if building the infinite sequence $(n; n + 1; n + 2; n + 3;...)$. In reality, since there isn't an infinite -amount of distinct values of type `int`, those sequences are not increasing, -when reaching `max_int` the values will circle down to `min_int`. They are +amount of distinct values of type `int`, those sequences don't increase. +When reaching `max_int`, the values will circle down to `min_int`. They are ultimately periodic. The OCaml standard library contains a module on sequences called @@ -86,7 +85,7 @@ has the same behaviour as `List.iter`. Writing this: ```ocaml # Seq.iter print_int (ints 0);; ``` -in an OCaml top-level means: “print integers forever” and you have to type +in an OCaml toplevel means “print integers forever,” and you have to type `Ctrl-C` to interrupt the execution. Perhaps more interestingly, the following code is also an infinite loop: ```ocaml @@ -97,7 +96,7 @@ But the key point is: it doesn't leak memory. ## Example The `Seq` module of the OCaml standard library contains the definition of the -function `Seq.take` which returns a specified number of elements from the +function `Seq.take`, which returns a specified number of elements from the beginning of a sequence. Here is a simplified implementation: ```ocaml let rec take n seq () = match seq () with @@ -108,18 +107,18 @@ let rec take n seq () = match seq () with `seq` contains less than `n` elements, an identical sequence is returned. In particular, if `seq` is empty, an empty sequence is returned. -Observe the first line of `take`, it is the common pattern for recursive +Observe the first line of `take`. It is the common pattern for recursive functions over sequences. The last two parameters are: -* a sequence called `seq`; -* a `unit` value. +* a sequence called `seq` +* a `unit` value When executed, the function begins by unfreezing `seq` (that is, calling `seq -()`) and then pattern match to look inside the data made available. However, +()`) and then pattern matching to look inside the available data. However, this does not happen unless a `unit` parameter is passed to `take`. Writing -`take 10 seq` does not compute anything, it is a partial application and returns +`take 10 seq` does not compute anything; it is a partial application and returns a function needing a `unit` to produce a result. -This can be used to print integers without looping forever as shown previously: +This can be used to print integers without looping forever, as shown previously: ```ocaml # Seq.ints 0 |> Seq.take 43 |> List.of_seq;; - : int list = @@ -159,7 +158,7 @@ the list of 100 first prime numbers: 509; 521; 523] ``` -The function `sieve` is recursive, in OCaml and common senses: it is defined +The function `sieve` is recursive in OCaml and common sense. It is defined using the `rec` keyword and calls itself. However, some call that kind of function [_corecursive_](https://en.wikipedia.org/wiki/Corecursion). This word is used to emphasize that, by design, it does not terminate. Strictly speaking, @@ -174,9 +173,9 @@ instance: * `Seq.map` * `Seq.fold_left` -All those are also available for `Array`, `List` and `Set` and behave +All those are also available for `Array`, `List`, and `Set` and behave essentially the same. Observe that there is no `fold_right` function. Since -OCaml 4.11 there is something which isn't (yet) available on other types: +OCaml 4.11, there is something which isn't (yet) available on other types: `unfold`. Here is how it is implemented: ```ocaml let rec unfold f seq () = match f seq with @@ -187,7 +186,7 @@ And here is its type: ```ocaml val unfold : ('a -> ('b * 'a) option) -> 'a -> 'b Seq.t = ``` -Unlike previously mentioned iterators `Seq.unfold` does not have a sequence +Unlike previously mentioned iterators, `Seq.unfold` does not have a sequence parameter, but a sequence result. `unfold` provides a general means to build sequences. For instance, `Seq.ints` can be implemented using `Seq.unfold` in a fairly compact way: @@ -195,7 +194,7 @@ fairly compact way: let ints = Seq.unfold (fun n -> Some (n, n + 1));; ``` -As a fun fact, one should observe `map` over sequences can be implemented using +As a fun fact, one should observe `map` over sequences, as it can be implemented using `Seq.unfold`. Here is how to write it: ```ocaml # let map f = Seq.unfold (fun s -> s |> Seq.uncons |> Option.map (fun (x, y) -> (f x, y)));; @@ -206,7 +205,7 @@ Here is a quick check: # Seq.ints 0 |> map (fun x -> x * x) |> Seq.take 10 |> List.of_seq;; - : int list = [0; 1; 4; 9; 16; 25; 36; 49; 64; 81] ``` -The function `Seq.uncons` returns the head and tail of a sequence if it is not empty or `None` otherwise. +The function `Seq.uncons` returns the head and tail of a sequence if it is not empty, or it otherwise returns `None`. Using this function: ```ocaml @@ -219,14 +218,14 @@ It is possible to read a file using `Seq.unfold`: "README.md" |> open_in |> Seq.unfold input_line_opt |> Seq.iter print_endline ``` -Although this can be an appealing style, bear in mind it does not prevent from +Although this can be an appealing style, bear in mind that it does not prevent taking care of open files. While the code above is fine, this one no longer is: ```ocaml "README.md" |> open_in |> Seq.unfold input_line_opt |> Seq.take 10 |> Seq.iter print_endline ``` Here, `close_in` will never be called over the input channel opened on `README.md`. -## Sequences are Functions +## Sequences Are Functions Although this looks like a possible way to define the [Fibonacci sequence](https://en.wikipedia.org/wiki/Fibonacci_number): @@ -250,11 +249,11 @@ It can be used to produce some Fibonacci numbers: - : int list = [0; 1; 1; 2; 3; 5; 8; 13; 21; 34] ``` Why is it so? The key difference lies in the recursive call `fibs n (n + m)`. In -the former definition, the application is complete, `fibs` is provided all the -arguments it expects; in the latter definition, the application is partial, the +the former definition, the application is complete because `fibs` is provided all the +arguments it expects. In the latter definition, the application is partial because the `()` argument is missing. Since evaluation is [eager](https://en.wikipedia.org/wiki/Evaluation_strategy#Eager_evaluation) in -OCaml, in the former case, evaluation of the recursive call is triggered, and +OCaml, in the former case, evaluation of the recursive call is triggered and a non-terminating looping occurs. In contrast, in the latter case, the partially applied function is immediately returned as a [closure](https://en.wikipedia.org/wiki/Closure_(computer_programming)). @@ -268,7 +267,7 @@ Functions working with sequences must be written accordingly. * Sequence consumer: partially applied function parameter * Sequence producer: partially applied function result -When code dealing with sequences does not behave as expected, in particular, if +When code dealing with sequences does not behave as expected, like if it is crashing or hanging, there's a fair chance a mistake like in the first definition of `fibs` was made. From 3a78203cb597b26729f8fbd7b1e8945804d07044 Mon Sep 17 00:00:00 2001 From: Cuihtlauac Alvarado Date: Thu, 9 Feb 2023 07:56:37 +0100 Subject: [PATCH 26/43] Apply suggestions from code review Co-authored-by: Miod Vallat <118974489+dustanddreams@users.noreply.github.com> --- data/problems/stream.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/data/problems/stream.md b/data/problems/stream.md index aa2afb3b78..d448908322 100644 --- a/data/problems/stream.md +++ b/data/problems/stream.md @@ -32,7 +32,7 @@ Lists are finite, meaning they always contain a finite number of elements. Seque be finite or infinite. The goal of this exercise is to define a type `'a stream` which only contains -infinite sequences. Using this type, define the functions following functions: +infinite sequences. Using this type, define the following functions: ```ocaml val hd : 'a stream -> 'a (** Returns the first element of a stream *) From c62a6a2f116f474a3cd676afd47f555d57486162 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Thu, 9 Feb 2023 08:10:14 +0100 Subject: [PATCH 27/43] Some fixes --- data/problems/diag.md | 5 ----- data/problems/stream.md | 2 +- data/tutorials/ds_05_seq.md | 13 ++++--------- 3 files changed, 5 insertions(+), 15 deletions(-) diff --git a/data/problems/diag.md b/data/problems/diag.md index ce4f7b18ed..96f3640bf6 100644 --- a/data/problems/diag.md +++ b/data/problems/diag.md @@ -23,8 +23,3 @@ The first element of the returned sequence is the first element of the first sequence; the second element of the returned sequence is the second element of the second sequence; the third element of the returned sequence is the third element of the third sequence; and so on. - - -By taking the first element of the -first sequence, the second element of the second sequence - diff --git a/data/problems/stream.md b/data/problems/stream.md index d448908322..b06386fd14 100644 --- a/data/problems/stream.md +++ b/data/problems/stream.md @@ -58,4 +58,4 @@ val of_seq : 'a Seq.t -> 'a stream (** Translates an ['a Seq.t] into an ['a stream] @raise Failure if the input sequence is finite. *) ``` -Pro tip: Use irrefutable patterns. \ No newline at end of file +**Tip:** Use `let ... =` patterns. \ No newline at end of file diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index 50a30f8321..80ac34778b 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -11,13 +11,8 @@ date: 2023-01-12T09:00:00-01:00 ## Prerequisites -| Concept | Status | Documentation | Reference | -|---|---|---|---| -| Basic types | Mandatory | | | -| Functions | Mandatory | | | -| Lists | Mandatory | | | -| Options | Recommended | | | -| Arrays | Nice to have | | | +You should be comfortable with writing functions over lists and, ideally, +understand what an option is. ## Introduction @@ -85,7 +80,7 @@ has the same behaviour as `List.iter`. Writing this: ```ocaml # Seq.iter print_int (ints 0);; ``` -in an OCaml toplevel means “print integers forever,” and you have to type +in an OCaml top-level means “print integers forever,” and you have to type `Ctrl-C` to interrupt the execution. Perhaps more interestingly, the following code is also an infinite loop: ```ocaml @@ -267,7 +262,7 @@ Functions working with sequences must be written accordingly. * Sequence consumer: partially applied function parameter * Sequence producer: partially applied function result -When code dealing with sequences does not behave as expected, like if +When code dealing with sequences does not behave as expected like if it is crashing or hanging, there's a fair chance a mistake like in the first definition of `fibs` was made. From 60543204f3aeceb43845780b6b55de08b4aa3508 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Wed, 3 May 2023 18:51:30 +0200 Subject: [PATCH 28/43] Update text after letting it rest for a while --- data/tutorials/ds_05_seq.md | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index 80ac34778b..326b9388ff 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -11,18 +11,17 @@ date: 2023-01-12T09:00:00-01:00 ## Prerequisites -You should be comfortable with writing functions over lists and, ideally, -understand what an option is. +You should be comfortable with writing functions over lists and options. ## Introduction -Sequences look a lot like lists. However from a pragmatic perspective, one +Sequences are very much like lists. However from a pragmatic perspective, one should imagine they may be infinite. That's the key intuition to understanding and using sequences. -One way to look at a value of type `'a Seq.t` is to consider it as an icicle, a -frozen stream of data. To understand this analogy, consider how sequences are -defined in the standard library: +One way to look at a value of type `'a Seq.t` is to consider it as a list, with +a twist when it's not emprty: a frozen tail. To understand this analogy, +consider how sequences are defined in the standard library: ```ocaml type 'a node = | Nil @@ -44,12 +43,13 @@ compare the constructors of `list` and `Seq.node`: parameter: `Seq.Nil` and `[]`. 1. Non-empty lists and sequences are both pairs whose former member is a piece of data. -1. However, the latter member in lists is a `list` too, while in sequences, it is a - function returning a `Seq.node`. +1. However, the latter member in lists is recursively a `list`, while in + sequences, it is a function returning a `Seq.node`. A value of type `Seq.t` is “frozen” because the data it contains isn't -immediately available. A `unit` value has to be supplied to recover it, which is called “unfreezing.” However, unfreezing only gives access to the tip of the -icicle, since the second argument of `Seq.Cons` is a function too. +immediately available. A `unit` value has to be supplied to recover it, which we +may see as “unfreezing.” However, unfreezing only gives access to the tip of the +sequence, since the second argument of `Seq.Cons` is a function too. Frozen-by-function tails explain why sequences may be considered potentially infinite. Until a `Seq.Nil` value has been found in the sequence, @@ -60,7 +60,7 @@ All have unforeseeable termination, and it is easier to consider them infinite. In OCaml, any value `a` of type `t` can be turned into a constant function by writing `fun _ -> a`, which has type `'a -> t`. When writing `fun () -> a` instead, we get a function of type `unit -> t`. Such a function is called a -[_thunk_](https://en.wikipedia.org/wiki/Thunk). Using this terminology, sequence +[_thunk_](https://en.wikipedia.org/wiki/Thunk). Using this terminology, `Seq.t` values are thunks. With the analogy used earlier, `a` is frozen in its thunk. Here is how to build seemingly infinite sequences of integers: @@ -68,10 +68,10 @@ Here is how to build seemingly infinite sequences of integers: # let rec ints n : int Seq.t = fun () -> Seq.Cons (n, ints (n + 1)) val ints : int -> int Seq.t = ``` -The function `ints n` look as if building the infinite sequence -$(n; n + 1; n + 2; n + 3;...)$. In reality, since there isn't an infinite -amount of distinct values of type `int`, those sequences don't increase. -When reaching `max_int`, the values will circle down to `min_int`. They are +The function `ints n` look as if building the infinite sequence `(n; n + 1; n + +2; n + 3;...)`. In reality, since there isn't an infinite amount of distinct +values of type `int`, those sequences aren't indefinitely increasing. When +reaching `max_int`, the values will circle down to `min_int`. They are ultimately periodic. The OCaml standard library contains a module on sequences called @@ -189,7 +189,7 @@ fairly compact way: let ints = Seq.unfold (fun n -> Some (n, n + 1));; ``` -As a fun fact, one should observe `map` over sequences, as it can be implemented using +As a fun fact, one should observe `map` over sequences can be implemented using `Seq.unfold`. Here is how to write it: ```ocaml # let map f = Seq.unfold (fun s -> s |> Seq.uncons |> Option.map (fun (x, y) -> (f x, y)));; From baed03028dc9b0e280781a09ae9f751a6b211f16 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Wed, 3 May 2023 18:58:50 +0200 Subject: [PATCH 29/43] Add feedback from @gpetiot --- data/problems/diag.md | 2 +- data/problems/stream.md | 4 +-- data/tutorials/ds_05_seq.md | 60 +++++++++++++++++++------------------ 3 files changed, 34 insertions(+), 32 deletions(-) diff --git a/data/problems/diag.md b/data/problems/diag.md index 96f3640bf6..e239fed169 100644 --- a/data/problems/diag.md +++ b/data/problems/diag.md @@ -1,6 +1,6 @@ --- title: Diagonal of a Sequence of Sequences -number: "100" +number: "101" difficulty: intermediate tags: [ "seq" ] --- diff --git a/data/problems/stream.md b/data/problems/stream.md index b06386fd14..9670597fdd 100644 --- a/data/problems/stream.md +++ b/data/problems/stream.md @@ -1,6 +1,6 @@ --- title: Never-Ending Sequences -number: "101" +number: "100" difficulty: beginner tags: [ "seq" ] --- @@ -43,7 +43,7 @@ val take : int -> 'a stream -> 'a list val unfold : ('a -> 'b * 'a) -> 'a -> 'b stream (** Similar to Seq.unfold *) val bang : 'a -> 'a stream -(** [bang x] produces a infinitely repeating sequences of [x] values. *) +(** [bang x] produces an infinitely repeating sequence of [x] values. *) val ints : int -> int stream (* Similar to Seq.ints *) val map : ('a -> 'b) -> 'a stream -> 'b stream diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index 326b9388ff..7b169055c8 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -36,8 +36,8 @@ type 'a list = | (::) of 'a * 'a list ``` and `Seq.t`, which is merely a type alias for `unit -> 'a Seq.node`. The whole -point of this definition is the second argument's type `Seq.Cons`, which -is a function returning a sequence while its `list` counterpart returns a list. Let's +point of this definition is the second argument's type `Seq.Cons`, which is a +function returning a sequence while its `list` counterpart returns a list. Let's compare the constructors of `list` and `Seq.node`: 1. Empty lists and sequences are defined the same way, a constructor without any parameter: `Seq.Nil` and `[]`. @@ -51,11 +51,11 @@ immediately available. A `unit` value has to be supplied to recover it, which we may see as “unfreezing.” However, unfreezing only gives access to the tip of the sequence, since the second argument of `Seq.Cons` is a function too. -Frozen-by-function tails explain why sequences may be considered -potentially infinite. Until a `Seq.Nil` value has been found in the sequence, -one can't say for sure if some will ever appear. The sequence could be a stream -of incoming requests in a server, readings from an embedded sensor, or system logs. -All have unforeseeable termination, and it is easier to consider them infinite. +Frozen-by-function tails explain why sequences may be considered potentially +infinite. Until a `Seq.Nil` value has been found in the sequence, one can't say +for sure if some will ever appear. The sequence could be a stream of incoming +requests in a server, readings from an embedded sensor, or system logs. All have +unforeseeable termination, and it is easier to consider them infinite. In OCaml, any value `a` of type `t` can be turned into a constant function by writing `fun _ -> a`, which has type `'a -> t`. When writing `fun () -> a` @@ -68,7 +68,7 @@ Here is how to build seemingly infinite sequences of integers: # let rec ints n : int Seq.t = fun () -> Seq.Cons (n, ints (n + 1)) val ints : int -> int Seq.t = ``` -The function `ints n` look as if building the infinite sequence `(n; n + 1; n + +The function `ints n` looks as if building the infinite sequence `(n; n + 1; n + 2; n + 3;...)`. In reality, since there isn't an infinite amount of distinct values of type `int`, those sequences aren't indefinitely increasing. When reaching `max_int`, the values will circle down to `min_int`. They are @@ -80,7 +80,7 @@ has the same behaviour as `List.iter`. Writing this: ```ocaml # Seq.iter print_int (ints 0);; ``` -in an OCaml top-level means “print integers forever,” and you have to type +in an OCaml top-level means “print integers forever,” and you have to press `Ctrl-C` to interrupt the execution. Perhaps more interestingly, the following code is also an infinite loop: ```ocaml @@ -108,10 +108,10 @@ functions over sequences. The last two parameters are: * a `unit` value When executed, the function begins by unfreezing `seq` (that is, calling `seq -()`) and then pattern matching to look inside the available data. However, -this does not happen unless a `unit` parameter is passed to `take`. Writing -`take 10 seq` does not compute anything; it is a partial application and returns -a function needing a `unit` to produce a result. +()`) and then pattern matching to look inside the available data. However, this +does not happen unless a `unit` parameter is passed to `take`. Writing `take 10 +seq` does not compute anything; it is a partial application and returns a +function needing a `unit` to produce a result. This can be used to print integers without looping forever, as shown previously: ```ocaml @@ -129,9 +129,9 @@ The `Seq` module also has a function `Seq.filter`: ``` It builds a sequence of elements satisfying a condition. -Using `Seq.filter`, it is possible to make a straightforward implementation of the -[Sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes). -Here it is: +Using `Seq.filter`, it is possible to make a straightforward implementation of +the [Sieve of +Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes). Here it is: ```ocaml let rec sieve seq () = match seq () with | Seq.Cons (m, seq) -> Seq.Cons (m, sieve (Seq.filter (fun n -> n mod m > 0) seq)) @@ -153,12 +153,12 @@ the list of 100 first prime numbers: 509; 521; 523] ``` -The function `sieve` is recursive in OCaml and common sense. It is defined -using the `rec` keyword and calls itself. However, some call that kind of -function [_corecursive_](https://en.wikipedia.org/wiki/Corecursion). This word -is used to emphasize that, by design, it does not terminate. Strictly speaking, -the sieve of Eratosthenes is not an algorithm either since it does not -terminate. This implementation behaves the same. +The function `sieve` is recursive in OCaml and common sense. It is defined using +the `rec` keyword and calls itself. However, some call that kind of function +[_corecursive_](https://en.wikipedia.org/wiki/Corecursion). This word is used to +emphasize that, by design, it does not terminate. Strictly speaking, the sieve +of Eratosthenes is not an algorithm either since it does not terminate. This +implementation behaves the same. ## Unfolding Sequences @@ -200,7 +200,8 @@ Here is a quick check: # Seq.ints 0 |> map (fun x -> x * x) |> Seq.take 10 |> List.of_seq;; - : int list = [0; 1; 4; 9; 16; 25; 36; 49; 64; 81] ``` -The function `Seq.uncons` returns the head and tail of a sequence if it is not empty, or it otherwise returns `None`. +The function `Seq.uncons` returns the head and tail of a sequence if it is not +empty, or it otherwise returns `None`. Using this function: ```ocaml @@ -218,7 +219,8 @@ taking care of open files. While the code above is fine, this one no longer is: ```ocaml "README.md" |> open_in |> Seq.unfold input_line_opt |> Seq.take 10 |> Seq.iter print_endline ``` -Here, `close_in` will never be called over the input channel opened on `README.md`. +Here, `close_in` will never be called over the input channel opened on +`README.md`. ## Sequences Are Functions @@ -244,9 +246,9 @@ It can be used to produce some Fibonacci numbers: - : int list = [0; 1; 1; 2; 3; 5; 8; 13; 21; 34] ``` Why is it so? The key difference lies in the recursive call `fibs n (n + m)`. In -the former definition, the application is complete because `fibs` is provided all the -arguments it expects. In the latter definition, the application is partial because the -`()` argument is missing. Since evaluation is +the former definition, the application is complete because `fibs` is provided +all the arguments it expects. In the latter definition, the application is +partial because the `()` argument is missing. Since evaluation is [eager](https://en.wikipedia.org/wiki/Evaluation_strategy#Eager_evaluation) in OCaml, in the former case, evaluation of the recursive call is triggered and a non-terminating looping occurs. In contrast, in the latter case, the partially @@ -262,8 +264,8 @@ Functions working with sequences must be written accordingly. * Sequence consumer: partially applied function parameter * Sequence producer: partially applied function result -When code dealing with sequences does not behave as expected like if -it is crashing or hanging, there's a fair chance a mistake like in the first +When code dealing with sequences does not behave as expected like if it is +crashing or hanging, there's a fair chance a mistake like in the first definition of `fibs` was made. ## Sequences for Conversions From 28bac912f0194940683fc8395753a6edd5f7e1a9 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Wed, 3 May 2023 19:17:53 +0200 Subject: [PATCH 30/43] More feedback from @gpetiot --- data/tutorials/ds_05_seq.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index 7b169055c8..b8fcbaa6a7 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -20,7 +20,7 @@ should imagine they may be infinite. That's the key intuition to understanding and using sequences. One way to look at a value of type `'a Seq.t` is to consider it as a list, with -a twist when it's not emprty: a frozen tail. To understand this analogy, +a twist when it's not empty: a frozen tail. To understand this analogy, consider how sequences are defined in the standard library: ```ocaml type 'a node = @@ -306,7 +306,7 @@ the OCaml standard library. It was [removed](https://github.com/ocaml/ocaml/pull/10482) in 2021 with the release of OCaml 4.14. Beware books and documentation written before may still mention it. -## Exercices +## Exercises * [Diagonal](/problems#100) * [Streams](/problems#101) From dc571d3be6abd84d30c4e659c5df33ed6f94ec4a Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Thu, 4 May 2023 13:53:02 +0200 Subject: [PATCH 31/43] Add feedback from @xvw --- data/tutorials/ds_05_seq.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index b8fcbaa6a7..db266025d2 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -298,7 +298,7 @@ There are a couple of related libraries, all providing means to handle large flows of data: * Rizo I [Streaming](/p/streaming) -* Gabriel Radanne [Iter](/p/iter) +* Simon Cruanes and Gabriel Radanne [Iter](/p/iter) * Jane Street `Base.Sequence` There used to be a module called [`Stream`](/releases/4.13/api/Stream.html) in From e9afbff6c9b9861cb36fadcde47d60b0f48b9e44 Mon Sep 17 00:00:00 2001 From: Cuihtlauac Alvarado Date: Fri, 5 May 2023 10:18:13 +0200 Subject: [PATCH 32/43] Apply suggestions from @christinerose Co-authored-by: Christine Rose --- data/tutorials/ds_05_seq.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index db266025d2..1c7d420a62 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -80,7 +80,7 @@ has the same behaviour as `List.iter`. Writing this: ```ocaml # Seq.iter print_int (ints 0);; ``` -in an OCaml top-level means “print integers forever,” and you have to press +in an OCaml toplevel, this means “print integers forever,” and you have to press `Ctrl-C` to interrupt the execution. Perhaps more interestingly, the following code is also an infinite loop: ```ocaml @@ -110,7 +110,7 @@ functions over sequences. The last two parameters are: When executed, the function begins by unfreezing `seq` (that is, calling `seq ()`) and then pattern matching to look inside the available data. However, this does not happen unless a `unit` parameter is passed to `take`. Writing `take 10 -seq` does not compute anything; it is a partial application and returns a +seq` does not compute anything. It is a partial application and returns a function needing a `unit` to produce a result. This can be used to print integers without looping forever, as shown previously: @@ -264,7 +264,7 @@ Functions working with sequences must be written accordingly. * Sequence consumer: partially applied function parameter * Sequence producer: partially applied function result -When code dealing with sequences does not behave as expected like if it is +When code dealing with sequences does not behave as expected, like if it is crashing or hanging, there's a fair chance a mistake like in the first definition of `fibs` was made. From eb9ba226de8183172cebd355d2e750a7b0fa466b Mon Sep 17 00:00:00 2001 From: Cuihtlauac Alvarado Date: Fri, 5 May 2023 10:19:33 +0200 Subject: [PATCH 33/43] Update ds_05_seq.md --- data/tutorials/ds_05_seq.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md index 1c7d420a62..79a5dce66f 100644 --- a/data/tutorials/ds_05_seq.md +++ b/data/tutorials/ds_05_seq.md @@ -247,7 +247,7 @@ It can be used to produce some Fibonacci numbers: ``` Why is it so? The key difference lies in the recursive call `fibs n (n + m)`. In the former definition, the application is complete because `fibs` is provided -all the arguments it expects. In the latter definition, the application is +with all the arguments it expects. In the latter definition, the application is partial because the `()` argument is missing. Since evaluation is [eager](https://en.wikipedia.org/wiki/Evaluation_strategy#Eager_evaluation) in OCaml, in the former case, evaluation of the recursive call is triggered and a From 0025c8891d089903cd3e54f0334e5288647c71c1 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Fri, 5 May 2023 15:22:11 +0200 Subject: [PATCH 34/43] Minor edits --- data/tutorials/lg_08_error_handling.md | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 305b0497aa..458aa834cc 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -9,7 +9,9 @@ date: 2021-05-27T21:07:30-00:00 # Error Handling -In OCaml, errors can be handled in several ways. This document presents most of the available means. However, handling errors using the effect handlers introduced in OCaml 5 hasn't been addressed yet. +In OCaml, errors can be handled in several ways. This document presents most of +the available means. However, handling errors using the effect handlers +introduced in OCaml 5 isn't addressed yet. ## Error as Special Values @@ -26,13 +28,13 @@ consequences. This is not the proper way to deal with errors in OCaml. There are three major ways to make it impossible to ignore errors in OCaml: 1. Exceptions -1. `Option` values -1. `Result` values +1. `option` values +1. `result` values Use them. Do not encode errors inside data. Exceptions provide a mean to deal with errors at the control flow level while -`Option` and `Result` provide means to turn errors into dedicated data. +`option` and `result` provide means to turn errors into dedicated data. The rest of this document presents and compares approaches towards error handling. @@ -43,7 +45,8 @@ Historically, the first way of handling errors in OCaml is exceptions. The standard library relies heavily upon them. The biggest issue with exceptions is that they do not appear in types. One has -to read the documentation to see that, indeed, `List.find` or `String.sub` are functions that they might fail by raising an exception. +to read the documentation to see that, indeed, `List.find` or `String.sub` are +functions that might fail by raising an exception. However, exceptions have the great merit of being compiled into efficient machine code. When implementing trial and error approaches likely to back-track @@ -75,7 +78,7 @@ raise this exception. Now, how do we handle exceptions? The construct is `try ### Predefined Exceptions The standard library predefines several exceptions, see -[`Stdlib`](/releases/latest/api/Stdlib.html). Here are a few examples: +[`Stdlib`](/api/Stdlib.html). Here are a few examples: ```ocaml # 1 / 0;; From fd49da0857b00cd919f6de66b00f01456a83c1ab Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Fri, 5 May 2023 15:32:05 +0200 Subject: [PATCH 35/43] Delete accidentaly imported files --- data/problems/diag.md | 25 --- data/problems/stream.md | 61 ------- data/tutorials/ds_05_seq.md | 313 ------------------------------------ 3 files changed, 399 deletions(-) delete mode 100644 data/problems/diag.md delete mode 100644 data/problems/stream.md delete mode 100644 data/tutorials/ds_05_seq.md diff --git a/data/problems/diag.md b/data/problems/diag.md deleted file mode 100644 index e239fed169..0000000000 --- a/data/problems/diag.md +++ /dev/null @@ -1,25 +0,0 @@ ---- -title: Diagonal of a Sequence of Sequences -number: "101" -difficulty: intermediate -tags: [ "seq" ] ---- - -# Solution - -```ocaml -let rec diag seq_seq () = - let hds, tls = Seq.filter_map Seq.uncons seq_seq |> Seq.split in - let hd, tl = Seq.uncons hds |> Option.map fst, Seq.uncons tls |> Option.map snd in - let d = Option.fold ~none:Seq.empty ~some:diag tl in - Option.fold ~none:Fun.id ~some:Seq.cons hd d () -``` - -# Statement - -Write a function `diag : 'a Seq.t Seq.t -> 'a Seq` that returns the _diagonal_ -of a sequence of sequences. The returned sequence is formed as follows: -The first element of the returned sequence is the first element of the first -sequence; the second element of the returned sequence is the second element of -the second sequence; the third element of the returned sequence is the third -element of the third sequence; and so on. diff --git a/data/problems/stream.md b/data/problems/stream.md deleted file mode 100644 index 9670597fdd..0000000000 --- a/data/problems/stream.md +++ /dev/null @@ -1,61 +0,0 @@ ---- -title: Never-Ending Sequences -number: "100" -difficulty: beginner -tags: [ "seq" ] ---- - -# Solution - -```ocaml -type 'a cons = Cons of 'a * 'a stream -and 'a stream = unit -> 'a cons - -let hd (seq : 'a stream) = let (Cons (x, _)) = seq () in x -let tl (seq : 'a stream) = let (Cons (_, seq)) = seq () in seq -let rec take n seq = if n = 0 then [] else let (Cons (x, seq)) = seq () in x :: take (n - 1) seq -let rec unfold f x () = let (y, x) = f x in Cons (y, unfold f x) -let bang x = unfold (fun x -> (x, x)) x -let ints x = unfold (fun x -> (x, x + 1)) x -let rec map f seq () = let (Cons (x, seq)) = seq () in Cons (f x, map f seq) -let rec filter p seq () = let (Cons (x, seq)) = seq () in let seq = filter p seq in if p x then Cons (x, seq) else seq () -let rec iter f seq = let (Cons (x, seq)) = seq () in f x; iter f seq -let to_seq seq = Seq.unfold (fun seq -> Some (hd seq, tl seq)) seq -let rec of_seq seq () = match seq () with -| Seq.Nil -> failwith "Not a infinite sequence" -| Seq.Cons (x, seq) -> Cons (x, of_seq seq) -``` - -# Statement - -Lists are finite, meaning they always contain a finite number of elements. Sequences may -be finite or infinite. - -The goal of this exercise is to define a type `'a stream` which only contains -infinite sequences. Using this type, define the following functions: -```ocaml -val hd : 'a stream -> 'a -(** Returns the first element of a stream *) -val tl : 'a stream -> 'a stream -(** Removes the first element of a stream *) -val take : int -> 'a stream -> 'a list -(** [take n seq] returns the n first values of [seq] *) -val unfold : ('a -> 'b * 'a) -> 'a -> 'b stream -(** Similar to Seq.unfold *) -val bang : 'a -> 'a stream -(** [bang x] produces an infinitely repeating sequence of [x] values. *) -val ints : int -> int stream -(* Similar to Seq.ints *) -val map : ('a -> 'b) -> 'a stream -> 'b stream -(** Similar to List.map and Seq.map *) -val filter: ('a -> bool) -> 'a stream -> 'a stream -(** Similar to List.filter and Seq.filter *) -val iter : ('a -> unit) -> 'a stream -> 'b -(** Similar to List.iter and Seq.iter *) -val to_seq : 'a stream -> 'a Seq.t -(** Translates an ['a stream] into an ['a Seq.t] *) -val of_seq : 'a Seq.t -> 'a stream -(** Translates an ['a Seq.t] into an ['a stream] - @raise Failure if the input sequence is finite. *) -``` -**Tip:** Use `let ... =` patterns. \ No newline at end of file diff --git a/data/tutorials/ds_05_seq.md b/data/tutorials/ds_05_seq.md deleted file mode 100644 index 79a5dce66f..0000000000 --- a/data/tutorials/ds_05_seq.md +++ /dev/null @@ -1,313 +0,0 @@ ---- -id: sequences -title: Sequences -description: > - Learn about an OCaml's most-used, built-in data types -category: "data-structures" -date: 2023-01-12T09:00:00-01:00 ---- - -# Sequences - -## Prerequisites - -You should be comfortable with writing functions over lists and options. - -## Introduction - -Sequences are very much like lists. However from a pragmatic perspective, one -should imagine they may be infinite. That's the key intuition to understanding -and using sequences. - -One way to look at a value of type `'a Seq.t` is to consider it as a list, with -a twist when it's not empty: a frozen tail. To understand this analogy, -consider how sequences are defined in the standard library: -```ocaml -type 'a node = - | Nil - | Cons of 'a * 'a t -and 'a t = unit -> 'a node -``` -This is the mutually recursive definition of two types: `Seq.node`, which is -almost the same as `list`: -```ocaml -type 'a list = - | [] - | (::) of 'a * 'a list -``` -and `Seq.t`, which is merely a type alias for `unit -> 'a Seq.node`. The whole -point of this definition is the second argument's type `Seq.Cons`, which is a -function returning a sequence while its `list` counterpart returns a list. Let's -compare the constructors of `list` and `Seq.node`: -1. Empty lists and sequences are defined the same way, a constructor without any - parameter: `Seq.Nil` and `[]`. -1. Non-empty lists and sequences are both pairs whose former member is a piece - of data. -1. However, the latter member in lists is recursively a `list`, while in - sequences, it is a function returning a `Seq.node`. - -A value of type `Seq.t` is “frozen” because the data it contains isn't -immediately available. A `unit` value has to be supplied to recover it, which we -may see as “unfreezing.” However, unfreezing only gives access to the tip of the -sequence, since the second argument of `Seq.Cons` is a function too. - -Frozen-by-function tails explain why sequences may be considered potentially -infinite. Until a `Seq.Nil` value has been found in the sequence, one can't say -for sure if some will ever appear. The sequence could be a stream of incoming -requests in a server, readings from an embedded sensor, or system logs. All have -unforeseeable termination, and it is easier to consider them infinite. - -In OCaml, any value `a` of type `t` can be turned into a constant function by -writing `fun _ -> a`, which has type `'a -> t`. When writing `fun () -> a` -instead, we get a function of type `unit -> t`. Such a function is called a -[_thunk_](https://en.wikipedia.org/wiki/Thunk). Using this terminology, `Seq.t` -values are thunks. With the analogy used earlier, `a` is frozen in its thunk. - -Here is how to build seemingly infinite sequences of integers: -```ocaml -# let rec ints n : int Seq.t = fun () -> Seq.Cons (n, ints (n + 1)) -val ints : int -> int Seq.t = -``` -The function `ints n` looks as if building the infinite sequence `(n; n + 1; n + -2; n + 3;...)`. In reality, since there isn't an infinite amount of distinct -values of type `int`, those sequences aren't indefinitely increasing. When -reaching `max_int`, the values will circle down to `min_int`. They are -ultimately periodic. - -The OCaml standard library contains a module on sequences called -[`Seq`](/releases/5.0/api/Seq.html). It contains a `Seq.iter` function, which -has the same behaviour as `List.iter`. Writing this: -```ocaml -# Seq.iter print_int (ints 0);; -``` -in an OCaml toplevel, this means “print integers forever,” and you have to press -`Ctrl-C` to interrupt the execution. Perhaps more interestingly, the following -code is also an infinite loop: -```ocaml -# Seq.iter ignore (ints 0);; -``` -But the key point is: it doesn't leak memory. - -## Example - -The `Seq` module of the OCaml standard library contains the definition of the -function `Seq.take`, which returns a specified number of elements from the -beginning of a sequence. Here is a simplified implementation: -```ocaml -let rec take n seq () = match seq () with - | Seq.Cons (x, seq) when n > 0 -> Seq.Cons (x, take (n - 1) seq) - | _ -> Seq.Nil -``` -`take n seq` returns, at most, the `n` first elements of the sequence `seq`. If -`seq` contains less than `n` elements, an identical sequence is returned. In -particular, if `seq` is empty, an empty sequence is returned. - -Observe the first line of `take`. It is the common pattern for recursive -functions over sequences. The last two parameters are: -* a sequence called `seq` -* a `unit` value - -When executed, the function begins by unfreezing `seq` (that is, calling `seq -()`) and then pattern matching to look inside the available data. However, this -does not happen unless a `unit` parameter is passed to `take`. Writing `take 10 -seq` does not compute anything. It is a partial application and returns a -function needing a `unit` to produce a result. - -This can be used to print integers without looping forever, as shown previously: -```ocaml -# Seq.ints 0 |> Seq.take 43 |> List.of_seq;; -- : int list = -[0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21; - 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; - 41; 42] -``` - -The `Seq` module also has a function `Seq.filter`: -```ocaml -# Seq.filter;; -- : ('a -> bool) -> 'a Seq.t -> 'a Seq.t = -``` -It builds a sequence of elements satisfying a condition. - -Using `Seq.filter`, it is possible to make a straightforward implementation of -the [Sieve of -Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes). Here it is: -```ocaml -let rec sieve seq () = match seq () with - | Seq.Cons (m, seq) -> Seq.Cons (m, sieve (Seq.filter (fun n -> n mod m > 0) seq)) - | seq -> seq -let facts = ints_from 2 |> sieve;; -``` - -This code can be used to generate lists of prime numbers. For instance, here is -the list of 100 first prime numbers: -```ocaml -# facts |> take 100 |> List.of_seq;; -- : int list = -[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71; - 73; 79; 83; 89; 97; 101; 103; 107; 109; 113; 127; 131; 137; 139; 149; 151; - 157; 163; 167; 173; 179; 181; 191; 193; 197; 199; 211; 223; 227; 229; 233; - 239; 241; 251; 257; 263; 269; 271; 277; 281; 283; 293; 307; 311; 313; 317; - 331; 337; 347; 349; 353; 359; 367; 373; 379; 383; 389; 397; 401; 409; 419; - 421; 431; 433; 439; 443; 449; 457; 461; 463; 467; 479; 487; 491; 499; 503; - 509; 521; 523] -``` - -The function `sieve` is recursive in OCaml and common sense. It is defined using -the `rec` keyword and calls itself. However, some call that kind of function -[_corecursive_](https://en.wikipedia.org/wiki/Corecursion). This word is used to -emphasize that, by design, it does not terminate. Strictly speaking, the sieve -of Eratosthenes is not an algorithm either since it does not terminate. This -implementation behaves the same. - -## Unfolding Sequences - -Standard higher-order iteration functions are available on sequences. For -instance: -* `Seq.iter` -* `Seq.map` -* `Seq.fold_left` - -All those are also available for `Array`, `List`, and `Set` and behave -essentially the same. Observe that there is no `fold_right` function. Since -OCaml 4.11, there is something which isn't (yet) available on other types: -`unfold`. Here is how it is implemented: -```ocaml -let rec unfold f seq () = match f seq with - | None -> Nil - | Some (x, seq) -> Cons (x, unfold f seq) -``` -And here is its type: -```ocaml -val unfold : ('a -> ('b * 'a) option) -> 'a -> 'b Seq.t = -``` -Unlike previously mentioned iterators, `Seq.unfold` does not have a sequence -parameter, but a sequence result. `unfold` provides a general means to build -sequences. For instance, `Seq.ints` can be implemented using `Seq.unfold` in a -fairly compact way: -```ocaml -let ints = Seq.unfold (fun n -> Some (n, n + 1));; -``` - -As a fun fact, one should observe `map` over sequences can be implemented using -`Seq.unfold`. Here is how to write it: -```ocaml -# let map f = Seq.unfold (fun s -> s |> Seq.uncons |> Option.map (fun (x, y) -> (f x, y)));; -val map : ('a -> 'b) -> 'a Seq.t -> 'b Seq.t = -``` -Here is a quick check: -```ocaml -# Seq.ints 0 |> map (fun x -> x * x) |> Seq.take 10 |> List.of_seq;; -- : int list = [0; 1; 4; 9; 16; 25; 36; 49; 64; 81] -``` -The function `Seq.uncons` returns the head and tail of a sequence if it is not -empty, or it otherwise returns `None`. - -Using this function: -```ocaml -let input_line_opt chan = - try Some (input_line chan, chan) - with End_of_file -> close_in chan; None -``` -It is possible to read a file using `Seq.unfold`: -```ocaml -"README.md" |> open_in |> Seq.unfold input_line_opt |> Seq.iter print_endline -``` - -Although this can be an appealing style, bear in mind that it does not prevent -taking care of open files. While the code above is fine, this one no longer is: -```ocaml -"README.md" |> open_in |> Seq.unfold input_line_opt |> Seq.take 10 |> Seq.iter print_endline -``` -Here, `close_in` will never be called over the input channel opened on -`README.md`. - -## Sequences Are Functions - -Although this looks like a possible way to define the [Fibonacci -sequence](https://en.wikipedia.org/wiki/Fibonacci_number): -```ocaml -# let rec fibs m n = Seq.cons m (fibs n (n + m));; -val fibs : int -> int -> int Seq.t = -``` -It actually isn't. It's a non-ending recursion which blows away the stack. -``` -# fibs 0 1;; -Stack overflow during evaluation (looping recursion?). -``` -This definition is behaving as expected: -```ocaml -# let rec fibs m n () = Seq.Cons (m, fibs n (n + m));; -val fibs : int -> int -> int Seq.t = -``` -It can be used to produce some Fibonacci numbers: -```ocaml -# fibs 0 1 |> Seq.take 10 |> List.of_seq;; -- : int list = [0; 1; 1; 2; 3; 5; 8; 13; 21; 34] -``` -Why is it so? The key difference lies in the recursive call `fibs n (n + m)`. In -the former definition, the application is complete because `fibs` is provided -with all the arguments it expects. In the latter definition, the application is -partial because the `()` argument is missing. Since evaluation is -[eager](https://en.wikipedia.org/wiki/Evaluation_strategy#Eager_evaluation) in -OCaml, in the former case, evaluation of the recursive call is triggered and a -non-terminating looping occurs. In contrast, in the latter case, the partially -applied function is immediately returned as a -[closure](https://en.wikipedia.org/wiki/Closure_(computer_programming)). - -Sequences are functions, as stated by their type: -```ocaml -# #show Seq.t;; -type 'a t = unit -> 'a Seq.node -``` -Functions working with sequences must be written accordingly. -* Sequence consumer: partially applied function parameter -* Sequence producer: partially applied function result - -When code dealing with sequences does not behave as expected, like if it is -crashing or hanging, there's a fair chance a mistake like in the first -definition of `fibs` was made. - -## Sequences for Conversions - -Throughout the standard library, sequences are used as a bridge to perform -conversions between many datatypes. For instance, here are the signatures of -some of those functions: -* Lists - ```ocaml - val List.of_seq : 'a list -> 'a Seq.t - val List.to_seq : 'a Seq.t -> 'a list - ``` -* Arrays - ```ocaml - val Array.of_seq : 'a array -> 'a Seq.t - val Array.to_seq : 'a Seq.t -> 'a array - ``` -* Strings - ```ocaml - val String.of_seq : string -> char Seq.t - val String.to_seq : char Seq.t -> string - ``` -Similar functions are also provided for sets, maps, hash tables (`Hashtbl`) and -others (except `Seq`, obviously). When implementing a datatype module, it is -advised to expose `to_seq` and `of_seq` functions. - -## Miscellaneous Considerations - -There are a couple of related libraries, all providing means to handle large -flows of data: - -* Rizo I [Streaming](/p/streaming) -* Simon Cruanes and Gabriel Radanne [Iter](/p/iter) -* Jane Street `Base.Sequence` - -There used to be a module called [`Stream`](/releases/4.13/api/Stream.html) in -the OCaml standard library. It was -[removed](https://github.com/ocaml/ocaml/pull/10482) in 2021 with the release of -OCaml 4.14. Beware books and documentation written before may still mention it. - -## Exercises - -* [Diagonal](/problems#100) -* [Streams](/problems#101) - From 76f23e22498e3ef30b08bdd3d17e28c76a6dd506 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Fri, 5 May 2023 16:49:35 +0200 Subject: [PATCH 36/43] Add realworldocaml ref --- data/tutorials/lg_08_error_handling.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 458aa834cc..61fb5a7ff6 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -11,7 +11,10 @@ date: 2021-05-27T21:07:30-00:00 In OCaml, errors can be handled in several ways. This document presents most of the available means. However, handling errors using the effect handlers -introduced in OCaml 5 isn't addressed yet. +introduced in OCaml 5 isn't addressed yet. This topic is also addressed in the +chapter [“Error Handling”](https://dev.realworldocaml.org/error-handling.html) +of the “Real World OCaml” book by Yaron Minsky and Anil Madhavapeddy (2ⁿᵈ +edition, Cambridge University Press, October 2022). ## Error as Special Values From 80c8538ba03437e73dc37a6d928b94735908be75 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Wed, 17 May 2023 18:20:32 +0200 Subject: [PATCH 37/43] Fix Exit exception text --- data/tutorials/lg_08_error_handling.md | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 61fb5a7ff6..73a36dc45e 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -104,16 +104,8 @@ Although the last one doesn't look as an exception, it actually is. Among the predefined exceptions of the standard library, the following ones are intended to be raised by user-written functions: -```ocaml -exception Exit -exception Not_found -exception Invalid_argument of string -exception Failure of string -``` -* `Exit` terminates your program with a success status, which is 0 in Unices - (where success is 0 and any other value is an error; that is, errors are - handled as special values, like mentioned in the first section) +* `Exit` can be used to terminate a loop, like a `break` statement * `Not_found` should be raised when searching failed because there isn't anything satisfactory to be found * `Invalid_argument` should be raised when a parameter can't be accepted From 6299385526c068c43b8b00e475845476df17becb Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Wed, 17 May 2023 18:21:02 +0200 Subject: [PATCH 38/43] Add Fun.protect text --- data/tutorials/lg_08_error_handling.md | 57 +++++++++++++++++++++++--- 1 file changed, 52 insertions(+), 5 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 73a36dc45e..d8ce8a9654 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -125,11 +125,58 @@ exceptions, a design decision must be made: * Use the prexisting exceptions * Raise custom exceptions -Both can make sense, and there isn't a general rule. If the standard library exceptions -are used, they must be raised under their intended conditions, -otherwise handlers will have trouble processing them. Using custom -exceptions will force client code to include dedicated catch conditions. This -can be desirable for errors that must be handled at the client level. +Both can make sense, and there isn't a general rule. If the standard library +exceptions are used, they must be raised under their intended conditions, +otherwise handlers will have trouble processing them. Using custom exceptions +will force client code to include dedicated catch conditions. This can be +desirable for errors that must be handled at the client level. + +### Using `Fun.protect` + +The `Fun` module of the standard library contains the following defintion: +```ocaml +val protect : finally:(unit -> unit) -> (unit -> 'a) -> 'a +``` +This function is meant to be used when something _always_ needs to be done +_after_ a computation is complete, either it succeded or failed. Any computation +can be postponed by wrapping it into a dummy function with only `()` as + parameter. Here, the computation triggered by passing `x` to `f` (including its + side effects) will not take place: +```ocaml +let work () = f x +``` + +It would, on execution of `work ()`. This is what `protect` does, and such a +`work` function is the kind of parameter `protect` expects. The `finally` +function is called by `protect`, after the completion of `work ()`, in two +possible way, depending on its outcome +1. If it successed, the result produced is forwarded +2. If it failed, exception raised is forwarded + +The `finally` function is only expected to perform some side-effect. In summary, +`protect` performs two computations in order: `work` and then `finally`, and +forwards the outcome of `work` either result or exception. + +The `finally` function shall not raise any exception. If it does, it will be +raised again, but wrapped into `Finally_raised`. + +Here is an example of how it can be used. Let's imagine a function reading the +`n` first line of a text file is needed (like the `head` Unix command). If the +file hasn't enough lines, the function must throw `End_of_file`. Here is a +possible implementation using `Fun.protect` to make sure the file is always closed: +```ocaml +# let rec head_channel chan = + let rec loop acc n = match input_line chan with + | line when n > 0 -> loop (line :: acc) (n - 1) + | _ -> List.rev acc in + loop [];; +val head_channel : in_channel -> int -> string list = +# let head_file filename n = + let ic = open_in filename in + let finally () = close_in ic in + Fun.protect ~finally (fun () -> head_channel ic n);; +val head_file : string -> int -> string list = +``` ### Documentation From 8b5525a9ff703c60555301b861dfe9663c90aef5 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Wed, 17 May 2023 18:23:17 +0200 Subject: [PATCH 39/43] Fix sample syntax --- data/tutorials/lg_08_error_handling.md | 40 ++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 3 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index d8ce8a9654..af41432060 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -465,6 +465,7 @@ doesn't count as an invalid substring. Below is the equivalent function using the same logic, but using `Option` instead of exceptions: + ```ocaml # let host_opt email = match String.index_opt email '@' with @@ -484,13 +485,15 @@ Although it qualifies as safe, its legibility isn't improved. Some may even claim it is worse. Before showing how to improve this code, we need to explain how `Option.map` and -`Option.bind` work. +`Option.bind` work. Here are their types: + ```ocaml -val Option.map : ('a -> 'b) -> 'a option -> 'b option -val Option.bind : 'a option -> ('a -> 'b option) -> 'b option +val map : ('a -> 'b) -> 'a option -> 'b option +val bind : 'a option -> ('a -> 'b option) -> 'b option ``` `Option.map` applies a function `f` to an option parameter, if it isn't `None` + ```ocaml let map f = function | Some x -> Some (f x) @@ -833,3 +836,34 @@ practice. Later, it always requires some thinking, which is good since proper error management shouldn't ever be overlooked. No error handling is better than the others, and is should be matter of adequacy to the context rather than of taste. But opinionated OCaml code is also fine, so it's a balance. + +# External Ressources + +- [“Exceptions”](https://v2.ocaml.org/releases/5.0/htmlman/coreexamples.html#s%3Aexceptions) in ”The OCaml Manual, The Core Language”, chapter 1, section 6, December 2022 +- [Module `Option`](https://v2.ocaml.org/releases/5.0/api/Option.html) in OCaml Library +- [Module `Result`](https://v2.ocaml.org/releases/5.0/api/Result.html) in Ocaml Library +- [“Error Handling”](https://dev.realworldocaml.org/error-handling.html) in “Real World OCaml”, part 7, Yaron Minsky and Anil Madhavapeddy, 2ⁿᵈ edition, Cambridge University Press, October 2022 +- “Add "finally" function to Pervasives”, Marcello Seri, GitHub PR, [ocaml/ocaml/pull/1855](https://github.com/ocaml/ocaml/pull/1855) + +# Acknowledgements + +- Authors + 1. Simon Cruanes [@c-cube](https://github.com/c-cubeauthored) + 2. John Whitington [@johnwhitington](https://github.com/johnwhitington) + 3. Cuihtlauac Alvarado [@cuihtlauac](https://github.com/cuihtlauac) +- Contributors + * Dan Frumin [@co-dan](https://github.com/co-dan) + * Jean-Pierre Rodi + * Thibaut Mattio [@tmattio](https://github.com/tmattio) + * Jonah Beckford [@jonahbeckford](https://github.com/jonahbeckford) +- Suggestions and Corrections: + * Claude Jager-Rubinson + * [@rand00](https://github.com/rand00) + * Guillaume Munch-Maccagnoni [@gadmm](https://github.com/gadmm) + * Edwin Török [@edwintorok](https://github.com/edwintorok) + * Kim Nguyễn [@Tchou](https://github.com/Tchou) + * Ashine Foster [@AshineFoster](https://github.com/AshineFoster) + * Miod Vallat [@dustanddreams](https://github.com/dustanddreams) + * Christine Rose [@christinerose](https://github.com/christinerose) + * Riku Silvola [@rikusilvola](https://github.com/rikusilvola) + * Guillaume Petiot [@gpetiot](https://github.com/gpetiot) \ No newline at end of file From 4574f58be36944ee6ec5454b51dfc7070ff6942e Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Wed, 17 May 2023 18:23:51 +0200 Subject: [PATCH 40/43] Fix references text --- data/tutorials/lg_08_error_handling.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index af41432060..e07fc1b559 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -11,10 +11,8 @@ date: 2021-05-27T21:07:30-00:00 In OCaml, errors can be handled in several ways. This document presents most of the available means. However, handling errors using the effect handlers -introduced in OCaml 5 isn't addressed yet. This topic is also addressed in the -chapter [“Error Handling”](https://dev.realworldocaml.org/error-handling.html) -of the “Real World OCaml” book by Yaron Minsky and Anil Madhavapeddy (2ⁿᵈ -edition, Cambridge University Press, October 2022). +introduced in OCaml 5 isn't addressed yet. This topic is also addressed in th [_Error Handling_](https://dev.realworldocaml.org/error-handling.html) chapter +of the _Real World OCaml_ book by Yaron Minsky and Anil Madhavapeddy (see references). ## Error as Special Values From 7e8bf1010158902ddde0db4fd97499246207b990 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Fri, 19 May 2023 17:37:04 +0200 Subject: [PATCH 41/43] Include more feedback --- data/tutorials/lg_08_error_handling.md | 33 +++++++++++++++++++++++--- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index e07fc1b559..6aaa9e135f 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -19,7 +19,15 @@ of the _Real World OCaml_ book by Yaron Minsky and Anil Madhavapeddy (see refere Don't do that. Some languages, most emblematically C, treat certain values as errors. For -instance in `man 2 read`, one can read: +instance, in Unix systems, here what is contained in `man 2 read`: +> read - read from a file descriptor +> +> `#include ` +> +> `ssize_t read(int fd, void *buf, size_t count);` +> +> [...] +> > On error, -1 is returned, and `errno` is set to indicate the error. Great software was written using this style. However, since correct are errors @@ -176,6 +184,24 @@ val head_channel : in_channel -> int -> string list = val head_file : string -> int -> string list = ``` +### Asynchronous Exceptions + +Some exceptions don't arise because something attempted by the program failed, +but rather because an external factor is impeding its execution. Those exeptions +are called asynchronous. This is the case, for instance, of the following ones: + +* `Out_of_memory` +* `Stack_overflow` +* `Sys.Break` + +The latter is thrown when the user interrupts an interactive execution. Because +they are losely or unrelated with the program logic, it mostly doesn't make +sense to track the place where an asynchronous exceptions was thrown, could be +anywhere. Defining if an application needs to catch those exceptions and how it +should be done is beyond the scope of this tutorial. Interrested readers may +refer to Guillaume Munch-Maccagnoni [A Guide to recover from +interrupts](https://guillaume.munch.name/software/ocaml/memprof-limits/recovering.html). + ### Documentation Functions that can raise exceptions should be documented like this: @@ -702,9 +728,9 @@ tools, data, and functions can help. Use them. When `Option.bind` or `Result.bind` are used, they are often aliased into a custom binding operator, such as `let*`. However, it is also possible to use it -as binary operator, which is almost always writen `>>=`. Using `bind` this way +as a binary operator, which is very often writen `>>=`. Using `bind` this way must be detailed because it is extremely popular in other functional programming -languages, and specially in OCaml's arch-rival _Which Must Not Be Named_. +languages, and specially in Haskell. Assuming `a` and `b` are valid OCaml expressions, the following three pieces of sources code are functionally identical: @@ -842,6 +868,7 @@ of taste. But opinionated OCaml code is also fine, so it's a balance. - [Module `Result`](https://v2.ocaml.org/releases/5.0/api/Result.html) in Ocaml Library - [“Error Handling”](https://dev.realworldocaml.org/error-handling.html) in “Real World OCaml”, part 7, Yaron Minsky and Anil Madhavapeddy, 2ⁿᵈ edition, Cambridge University Press, October 2022 - “Add "finally" function to Pervasives”, Marcello Seri, GitHub PR, [ocaml/ocaml/pull/1855](https://github.com/ocaml/ocaml/pull/1855) +- “A guide to recover from interrupts”, Guillaume Munch-Maccagnoni, parf the [`memprof-limits`](https://gitlab.com/gadmm/memprof-limits/) documentation # Acknowledgements From d8807671f53fa432eff1439fb5040af08a093127 Mon Sep 17 00:00:00 2001 From: Cuihtlauac ALVARADO Date: Fri, 19 May 2023 19:00:15 +0200 Subject: [PATCH 42/43] More feedback --- data/tutorials/lg_08_error_handling.md | 150 ++++++++++++++++--------- 1 file changed, 100 insertions(+), 50 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index 6aaa9e135f..d19a3a6d16 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -37,13 +37,13 @@ consequences. This is not the proper way to deal with errors in OCaml. There are three major ways to make it impossible to ignore errors in OCaml: 1. Exceptions -1. `option` values -1. `result` values +1. **`option`** values +1. **`result`** values Use them. Do not encode errors inside data. Exceptions provide a mean to deal with errors at the control flow level while -`option` and `result` provide means to turn errors into dedicated data. +`option` and **`result`** provide means to turn errors into dedicated data. The rest of this document presents and compares approaches towards error handling. @@ -111,7 +111,7 @@ Although the last one doesn't look as an exception, it actually is. Among the predefined exceptions of the standard library, the following ones are intended to be raised by user-written functions: -* `Exit` can be used to terminate a loop, like a `break` statement +* `Exit` can be used to terminate an iteration, like a `break` statement * `Not_found` should be raised when searching failed because there isn't anything satisfactory to be found * `Invalid_argument` should be raised when a parameter can't be accepted @@ -273,10 +273,10 @@ errors at runtime. ### Exceptions Not Raised -Under panic circumstances, the native code compiler gives its best effort for -raising meaningful exceptions. However, some error conditions may remain -undetected, which will result in a segmentation fault. This is the specially the -case for stack overflows, which aren't always detected. +The compiler and runtime makes a best effort for raising meaningful exceptions. +However, some error conditions may remain undetected, which can result in a +segmentation fault. This is the specially the case for `Out_of_memory`, which +is not reliable. It used to be the case for `Stack_overflow`: > But catching stack overflows is tricky, both in Unix-like systems and under > Windows, so the current implementation in OCaml is a best effort that is @@ -285,14 +285,16 @@ case for stack overflows, which aren't always detected. [Xavier Leroy, October 2021](https://discuss.ocaml.org/t/stack-overflow-reported-as-segfault/8646/8) -### Bypassing Type-Safety +This has improved since. Only linked C code should be able to trigger an +undetected stack overflow. -OCaml provides means to bypass its type safety. Don't use it. Here is how -to shoot oneself in the foot: +### Genuinely Unsafe Functions + +Some OCaml functions are genuinely unsafe. Use them with care; not like this: ```shell -> echo "(Obj.magic () : int array).(0)" > foo.ml -> ocamlopt foo.ml +> echo "fst Marshal.(from_string (to_string 0 []) 0)" > boom.ml +> ocamlc boom.ml > ./a.out Segmentation fault (core dumped) ``` @@ -301,7 +303,7 @@ Segmentation fault (core dumped) When a crash isn't coming from: * A limitation of the native code compiler -* `Obj.magic` +* An genuinely unsafe function such as found in modules `Marshal` and `Obj` It may be a language bug. It happens. Here is what to do when this is suspected: @@ -321,19 +323,19 @@ the following terminology: * Function handling errors in data: Safe The main means to write such kind of safe error handling functions is to use -either `Option` (next section) or `Result` (following section). Although +either **`option`** (next section) or **`result`** (following section). Although handling errors in data using those types allows avoiding the issues of error values and execeptions, it incurs extracting the enclosed value at every step, which: * may require some boilerplate code. This * come with a runtime cost. -## Using the `Option` Type for Errors +## Using the **`option`** Type for Errors -The `Option` module provides the first alternative to exceptions. The `'a +The **`option`** module provides the first alternative to exceptions. The `'a option` datatype allows to express either the availability of data for instance `Some 42` or the absence of data using `None`, which can represent an error. -Using `Option` it is possible to write functions that return `None` instead of +Using **`option`** it is possible to write functions that return `None` instead of throwing an exception. Here are two examples of such functions: ```ocaml # let div_opt m n = @@ -363,12 +365,6 @@ Exception: Not_found. - : int option = None ``` -This can even be turned into a higher-order generic function: -```ocaml -# let try_opt f x = try Some (f x) with _ -> None;; -val try_opt : ('a -> 'b) -> 'a -> 'b option = -``` - It tends to be considered good practice nowadays when a function can fail in cases that are not bugs (i.e., not `assert false`, but network failures, keys not present, etc.) to return type such as `'a option` or `('a, 'b) result` (see @@ -414,7 +410,7 @@ Error: This expression has type 'a option ``` In order to combine option values with other values, conversion functions are -needed. Here are the functions provided by the `Option` module to extract the +needed. Here are the functions provided by the **`option`** module to extract the data contained in an option: ```ocaml val get : 'a t -> 'a @@ -487,7 +483,7 @@ happen. In the worst case, the `@` character is the last one, then `fqdn_pos` is off range by one but `fqdn_len` is null, and that combination of parameters doesn't count as an invalid substring. -Below is the equivalent function using the same logic, but using `Option` instead of +Below is the equivalent function using the same logic, but using **`option`** instead of exceptions: ```ocaml @@ -536,7 +532,9 @@ let bind opt f = match opt with | None -> None ``` -`bind` having flipped parameter with respect to `map` allows using it as a [binding operator](/manual/bindingops.html), which is an extension of OCaml providing means to create “custom `let`”. Here is how it goes: +`bind` having flipped parameter with respect to `map` allows using it as a +[binding operator](/manual/bindingops.html), which is a popular extension of +OCaml providing means to create “custom `let`”. Here is how it goes: ```ocaml # let ( let* ) = Option.bind;; val ( let* ) : 'a option -> ('a -> 'b option) -> 'b option = @@ -587,13 +585,13 @@ prevented having a return value. `None` is silent, it doesn't say anything about what went wrong. For this reason, functions returning option values should document the circumstances under which it may return `None`. Such a documentation is likely to ressemble to the one required for exceptions using -`@raise`. The `Result` type is intended to fill this gap: manage error in data, +`@raise`. The **`result`** type is intended to fill this gap: manage error in data, like option values, but also provide information on errors, like exceptions. It is the topic of the next section. -## Using the `Result` Type for Errors +## Using the **`result`** Type for Errors -The `Result` module of the standard library defines the following type: +The **`result`** module of the standard library defines the following type: ```ocaml type ('a, 'b) result = @@ -605,12 +603,12 @@ A value `Ok x` means that the computation succeeded and produced `x`, a value `Error e` means that it failed, and `e` represents whatever error information has been collected in the process. Pattern matching can be used to deal with both cases, as with any other sum type. However using `map` and `bind` -can be more convenient, maybe even more as it was with `Option`. +can be more convenient, maybe even more as it was with **`option`**. Before taking a look at `Result.map`, let's think about `List.map` and `Option.map` under a changed perspective. Both functions behave as identity when applied to `[]` or `None`, respectively. That's the only possibility since those -parameters don't carry any data. Which isn't the case in `Result` with its +parameters don't carry any data. Which isn't the case in **`result`** with its `Error` constructor. Nevertheless, `Result.map` is implemented likewise, on `Error`, it also behaves like identity. @@ -625,7 +623,7 @@ let map f = function | Error e -> Error e ``` -The `Result` module has two map functions: the one we've just seen and another +The **`result`** module has two map functions: the one we've just seen and another one, with the same logic, applied to `Error` Here is its type: @@ -665,16 +663,16 @@ a string wrapped in an option, if anything goes wrong `None` is returned. - `Yaml.of_string` parses a string an turns into an ad-hoc OCaml type - `Yaml.find` recursively searches a key in a Yaml tree, if found, it returns the corresponding data, wrapped in an option -- `Option.to_result` performs conversion of an `option` into a `result`. +- `Option.to_result` performs conversion of an **`option`** into a **`result`**. - Finally, `let*` stands for `Result.bind`. -Since functions from the `Yaml` module both returns `result` data, it is easier +Since functions from the `Yaml` module both returns **`result`** data, it is easier to write a pipe which processes that type all along. That's why -`Option.to_result` needs to be used. Stages which produce `result` must be +`Option.to_result` needs to be used. Stages which produce **`result`** must be chained using `bind`, stages which do not must be chained using some map -function, in order for the result to be wrapped back into a `result`. +function, in order for the result to be wrapped back into a **`result`**. -The map functions of the `Result` module allows processing of data or errors, +The map functions of the **`result`** module allows processing of data or errors, but the routines used must not fail, as `Result.map` will never turn an `Ok` into an `Error` and `Result.map_error` will never turn an `Error` into an `Ok`. On the other hand, functions passed to `Result.bind` are allowed to fail. As @@ -699,7 +697,7 @@ That behaviour can be achieved by defining the following function: val recover : ('e -> 'a option) -> ('a, 'e) result -> ('a, 'e) result = ``` -Although any kind of data can be wrapped as a `result` `Error`, it is +Although any kind of data can be wrapped as a **`result`** `Error`, it is recommended to use that constructor to carry actual errors, for instance: - `exn`, in which case the result type just makes exceptions explicit - `string`, where the error case is a message that indicates what failed @@ -709,12 +707,12 @@ recommended to use that constructor to carry actual errors, for instance: accurate (each error can be dealt with explicitly and occurs in the type), but the use of polymorphic variants sometimes make the code harder to read. -Note that some say the types `result` and `Either.t` are +Note that some say the types **`result`** and `Either.t` are [ismorphic](https://en.wikipedia.org/wiki/Isomorphism). Concretely, it means it's always possible to replace one by the other, like in a completely neutral -refactoring. Values of type `result` and `Either.t` can be translated back and +refactoring. Values of type **`result`** and `Either.t` can be translated back and forth, and appling both translations one after the other, in any order, returns -to the starting value. Nevertheless, this doesn't mean `result` should be used +to the starting value. Nevertheless, this doesn't mean **`result`** should be used in place of `Either.t`, or vice versa. Naming things matters, as punned by Phil Karlton's famous quote: @@ -733,7 +731,7 @@ must be detailed because it is extremely popular in other functional programming languages, and specially in Haskell. Assuming `a` and `b` are valid OCaml expressions, the following three pieces of -sources code are functionally identical: +source code are functionally identical: ```ocaml bind a (fun x -> b) @@ -800,6 +798,58 @@ therefore, picking the right style is left to the author's decision. That applies error handling, pick a style knowingly. See the [OCaml Programming Guidelines](/docs/guidelines) for more details on those matters. +## Convertions Between Errors + +### Throwing Exceptions From **`option`** or **`result`** + +This is done by using the following functions: + +- From **`option`** to `Failure` exception, use function `Option.get`: + ```ocaml + val get : 'a option -> 'a + ``` + +- From **`result`** to `Failure`, exception use function `Result.get_ok` and `Result.get_error`: + ```ocaml + val get_ok : ('a, 'e) result -> 'a + val get_error : ('a, 'e) result -> 'e + ``` + +To raise other exceptions, pattern matching and `raise` must be used. + +## Convertion Between **`option`** and **`result`** + +This is done by using the following functions: + +- From **`option`** to **`result`**, use function `Option.to_result`: + ```ocaml + val to_result : none:'e -> 'a option -> ('a, 'e) result + ``` +- From **`result`** to **`option`**, use function `Result.to_option`: + ```ocaml + val to_option : ('a, 'e) result -> 'a option + ``` + +## Turning Exceptions in to **`option`** or **`result`** + +The standard library does not provide such functions. This must be done using +**`try ... with`** or `match ... exception` statements. For instance, here is +how to create a version of `Stdlib.input_line` which returns and **`option`** +instead of throwing an exception: + +```ocaml +let input_line_opt ic = try Some (input_line ic) with End_of_file -> None +``` + +It would be same for **`result`**, except some data must be provided to the +`Error` constructor. + +Some may like to turn this into a higher-order generic function: +```ocaml +# let catch f x = try Some (f x) with _ -> None;; +val catch : ('a -> 'b) -> 'a -> 'b option = +``` + ## Assertions The built-in `assert` instruction takes an expression as an argument and throws @@ -838,13 +888,13 @@ match Sys.os_type with | _ -> failwith "this system is not supported" ``` -It is right to use `failwith`, because using `assert` would be incorrect. Here is -the dual example: +It is right to use `failwith`, other operating systems aren't supported, but +they are possible. Here is the dual example: ```ocaml function x when true -> () | _ -> assert false ``` -Here, it wouldn't be beneficial to use `failwith` because it requires a corrupted system or -for the compiler to be bugged for the second code path to be executed. +Here, it wouldn't be correct to use `failwith` because it requires a corrupted +system or the compiler to be bugged for the second code path to be executed. Breakage of the language semantics qualifies as extraordinary circumstances. It is catastrophic! @@ -854,7 +904,7 @@ Properly handling errors is a complex matter. It is [cross-cutting concern](https://en.wikipedia.org/wiki/Cross-cutting_concern), touches all parts of an application, and can't be isolated in a dedicated module. In contrast to several other mainstream languages, OCaml provides several mechanisms to handle -exceptional circumstances, all with good runtime performance and code +exceptional events, all with good runtime performance and code understandability. Using them properly requires some initial learning and practice. Later, it always requires some thinking, which is good since proper error management shouldn't ever be overlooked. No error handling is better @@ -864,8 +914,8 @@ of taste. But opinionated OCaml code is also fine, so it's a balance. # External Ressources - [“Exceptions”](https://v2.ocaml.org/releases/5.0/htmlman/coreexamples.html#s%3Aexceptions) in ”The OCaml Manual, The Core Language”, chapter 1, section 6, December 2022 -- [Module `Option`](https://v2.ocaml.org/releases/5.0/api/Option.html) in OCaml Library -- [Module `Result`](https://v2.ocaml.org/releases/5.0/api/Result.html) in Ocaml Library +- [Module **`option`**](https://v2.ocaml.org/releases/5.0/api/Option.html) in OCaml Library +- [Module **`result`**](https://v2.ocaml.org/releases/5.0/api/Result.html) in Ocaml Library - [“Error Handling”](https://dev.realworldocaml.org/error-handling.html) in “Real World OCaml”, part 7, Yaron Minsky and Anil Madhavapeddy, 2ⁿᵈ edition, Cambridge University Press, October 2022 - “Add "finally" function to Pervasives”, Marcello Seri, GitHub PR, [ocaml/ocaml/pull/1855](https://github.com/ocaml/ocaml/pull/1855) - “A guide to recover from interrupts”, Guillaume Munch-Maccagnoni, parf the [`memprof-limits`](https://gitlab.com/gadmm/memprof-limits/) documentation From c6adbf89594343e534bef8aea2dd7cf3c55160a8 Mon Sep 17 00:00:00 2001 From: sabine Date: Wed, 7 Jun 2023 10:41:34 +0200 Subject: [PATCH 43/43] Update data/tutorials/lg_08_error_handling.md --- data/tutorials/lg_08_error_handling.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/data/tutorials/lg_08_error_handling.md b/data/tutorials/lg_08_error_handling.md index d19a3a6d16..2bf200bc47 100644 --- a/data/tutorials/lg_08_error_handling.md +++ b/data/tutorials/lg_08_error_handling.md @@ -920,7 +920,8 @@ of taste. But opinionated OCaml code is also fine, so it's a balance. - “Add "finally" function to Pervasives”, Marcello Seri, GitHub PR, [ocaml/ocaml/pull/1855](https://github.com/ocaml/ocaml/pull/1855) - “A guide to recover from interrupts”, Guillaume Munch-Maccagnoni, parf the [`memprof-limits`](https://gitlab.com/gadmm/memprof-limits/) documentation -# Acknowledgements + \ No newline at end of file