diff --git a/asset/feed.xml b/asset/feed.xml index d351708050..c0a6385f74 100644 --- a/asset/feed.xml +++ b/asset/feed.xml @@ -1,5 +1,5 @@ -https://ocaml.org/feed.xmlOCaml.org blog2023-05-02T05:51:46-00:00https://tarides.com/feed.xmltarides<p><a href="https://tn23.mini.debconf.org/">MinidebConf TN 23</a> was organised by Debian Developers and Villupuram Linux Users Group (VGLUG) as a precursor to DebConf 23 in September at Kochi, India. I had an opportunity to attend and speak at MiniDebConf TN.</p> +https://ocaml.org/feed.xmlOCaml.org blog2023-05-02T14:42:52-00:00https://tarides.com/feed.xmltarides<p><a href="https://tn23.mini.debconf.org/">MinidebConf TN 23</a> was organised by Debian Developers and Villupuram Linux Users Group (VGLUG) as a precursor to DebConf 23 in September at Kochi, India. I had an opportunity to attend and speak at MiniDebConf TN.</p> <p>I presented two sessions, one built on our experiences of introducing <a href="https://github.com/ocaml/code-of-conduct">a Code of Conduct</a> to an <a href="https://discuss.ocaml.org/t/adopting-the-ocaml-code-of-conduct/10870">open source community</a> <a href="https://hackmd.io/JIWCOrBfQ7CfzPqeDw4t2Q#/">(slides here</a>), and one called <a href="https://hackmd.io/wgB3EzlAQA6aTnQGyyp5Rw#/"><em>An Invitation to OCaml</em></a>, aimed at people with no prior OCaml experience. 
I was pleased to see a lot of folks getting interested in learning OCaml.</p> <p>Over the course of two days, I attended interesting sessions by speakers from across India and other parts of the world.</p> <h3 style="position:relative;"><a href="https://tarides.com/feed.xml#first-day" aria-label="first day permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>First Day</h3> diff --git a/data/planet/ahrefs.xml b/data/planet/ahrefs.xml new file mode 100644 index 0000000000..a7559d7b8b --- /dev/null +++ b/data/planet/ahrefs.xml @@ -0,0 +1,2 @@ + +https://medium.com/feed/ahrefs/tagged/ocamlahrefs2023-05-02T14:42:52-00:00https://medium.com/feed/ahrefs/tagged/ocamlahrefs<h3>Intro</h3><p>There has never been a better time to learn OCaml, one of the premier statically-typed functional programming languages used in industry. We at Ahrefs have used it on our backend since the early days of the company, and <a href="https://tech.ahrefs.com/one-and-a-half-years-of-reasonml-in-production-2250cf5ba63b">since 2018 have even used it extensively for our frontend code</a>. 
Today we&rsquo;re very excited to share with you our favorite recommendations for getting started with&nbsp;OCaml!</p><h3>Best resources for learning&nbsp;OCaml</h3><p>If you have little functional programming experience or are even a beginner to programming, the place to start is <a href="https://ocaml-book.com/">OCaml from the Very Beginning</a>, a book that is now free thanks to generous funding from the <a href="https://ocaml-sf.org/">OCaml Software Foundation</a> (which Ahrefs is a sponsor of). The book can be viewed directly from its website and can also be downloaded as a PDF. While advertised as being approachable even to new programmers, this doesn&rsquo;t quite seem to be true, at least based on feedback we&rsquo;ve received. The text itself introduces concepts in a structured way, but the exercises require a little background in programming to complete. Such background info can be obtained by watching the opening videos in <a href="https://www.youtube.com/playlist?list=PLre5AT9JnKShBOPeuiD9b-I4XROIJhkIU">this series of lectures</a> from Cornell&rsquo;s OCaml programming course. Speaking of the Cornell course, their official online textbook <a href="https://cs3110.github.io/textbook/cover.html">OCaml Programming: Correct + Efficient + Beautiful</a> is an excellent resource for learners with some programming experience under their belt (specifically, you should have written some code using a mainstream imperative language like Python or&nbsp;Java).</p><h3>How to install&nbsp;OCaml</h3><p>The <a href="https://ocaml.org/docs/up-and-running">official installation instructions</a> are entirely adequate. 
They show you how to install the compiler and some useful dev tools like dune (build system), utop (interactive read-eval-print loop), and ocaml-lsp-server (useful for editor integration).</p><p>Actually, if you are working through <a href="https://ocaml-book.com/">OCaml from the Very Beginning</a>, you do not need to install OCaml during the first several chapters, as you can execute code snippets directly on the <a href="https://try.ocamlpro.com/">Try OCaml</a> page, or create a notebook at <a href="https://sketch.sh/">Sketch.sh</a>, an interactive OCaml notebook site maintained by Ahrefs through our monthly Open Source Friday&nbsp;program.</p><h3>Editor support</h3><p>For most beginners to the language, we recommend the <a href="https://marketplace.visualstudio.com/items?itemName=ocamllabs.ocaml-platform">official OCaml Platform Visual Studio Code extension</a>. The official OCaml installation guide has a <a href="https://ocaml.org/docs/up-and-running#editor-support-for-ocaml">good section on setting it up</a>. There is also great support for users of <a href="https://github.com/ocaml/tuareg">Emacs</a> and&nbsp;<a href="https://github.com/ocaml/vim-ocaml">Vim</a>.</p><h3>Tips</h3><p>Even early on, it&rsquo;s a good idea to start saving your code in&nbsp;.ml files and learning how to run it. The easiest way to run a simple program is to start up utop and inside of it run #use &quot;name_of_your_program.ml&quot; as <a href="https://ocaml.org/docs/first-hour#running-ocaml-programs">described here</a>.</p><p>When creating a new notebook in <a href="https://sketch.sh/">Sketch.sh</a>, the default syntax is ReasonML (it&rsquo;s not a different language, just <a href="https://en.wikipedia.org/wiki/Reason_(programming_language)">an alternate syntax that more closely resembles JavaScript</a>). 
Click on ML in the top left corner to switch to the original OCaml&nbsp;syntax.</p><figure><img src="https://cdn-images-1.medium.com/max/1024/0*EgPCQlLR4C54N3Ge" alt=""/></figure><h3>Conclusion</h3><p>OCaml originated from French academia more than 25 years ago, and from there spread to elite universities and forward-thinking companies around the world. Now the OCaml community has produced high quality learning material that is both free and easy to access. So take the initiative and learn you some&nbsp;OCaml!</p><img src="https://medium.com/_/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=2f22b578b984" width="1" height="1" alt=""/><hr/><p><a href="https://tech.ahrefs.com/how-to-get-started-with-ocaml-in-2022-2f22b578b984">How to get started with OCaml in 2022</a> was originally published in <a href="https://tech.ahrefs.com">Ahrefs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>https://tech.ahrefs.com/how-to-get-started-with-ocaml-in-2022-2f22b578b984?source=rss----303662d88bae--ocamlHow to get started with OCaml in 20222022-10-31T16:13:10-00:00ahrefshttps://medium.com/feed/ahrefs/tagged/ocamlahrefs<h3>Monorobot: a notification bot for monorepos</h3><figure><img src="https://cdn-images-1.medium.com/max/1024/1*BQNCNLNPxGV3eXajYe6y2g.png" alt=""/><figcaption>Monorobot enables configurable directory tree notifications for your monorepo.</figcaption></figure><p>A few years ago, we decided to move most of our code into a monorepo. Many <a href="https://danluu.com/monorepo/">advocates</a> have highlighted its upsides, which include better cross-project coordination and simpler dependency management.</p><p>But one problem remained: <strong>none of the available GitHub integrations for Slack work nicely with monorepos</strong>. Slack is vital for day to day communication among Ahrefs&rsquo; globally distributed team, so a Slack integration was a must-have feature. 
We needed a service that could map activity from our various subprojects to their corresponding Slack channels&#8202;&mdash;&#8202;something existing solutions didn&rsquo;t&nbsp;offer.</p><p>That&rsquo;s why we built our own integration: <strong>Monorobot</strong>, a notification bot for monorepos. We&rsquo;ve improved it iteratively since the monorepo transition, incorporating real time feedback from our engineers over time. Today, Monorobot is an active member of Ahrefs&rsquo; Slack workspace, dutifully routing GitHub activity notifications to different channels based on the relevance of each activity.</p><p>And now we&rsquo;re <a href="https://github.com/ahrefs/monorobot">open-sourcing Monorobot</a>, for anybody to use in their monorepo setup! The package is available via&nbsp;<a href="https://opam.ocaml.org/packages/monorobot/">OPAM</a>:</p><pre>opam install monorobot</pre><p>Read on for more details about the motivation, an overview of the main features, and what&rsquo;s in the pipeline.</p><h3>Existing Slack integrations lack monorepo&nbsp;support</h3><p>Within a monorepo, multiple projects have their code located in separate, nested directories. Correspondingly, each project&rsquo;s Slack channel is only interested in activity from that part of the overall repository. The issue with most GitHub-to-Slack integrations is that once you subscribe a Slack channel to a GitHub repository, the channel receives <em>all</em> activity from that repository.</p><p>Suppose we operate various camel-related services, and we&rsquo;re planning to launch a new camel ride-sharing app called Camel Ride. 
The directory structure could look like&nbsp;this:</p><pre>monorepo/<br/>| frontend/<br/>| | camelride_ui/<br/>| | | mobile/<br/>| | | web/<br/>| | cameldance/<br/>| backend/<br/>| | camelride/<br/>| | | routing/<br/>| | | pricing/<br/>| | camelfood/</pre><p>As you can see, both the frontend/ and backend/ directories contain code for our fictitious ride-sharing service, along with code from other projects.</p><p>If we were to connect the <a href="https://slack.com/help/articles/232289568-GitHub-for-Slack">GitHub for Slack</a> integration to this repository, notifications for activity from all projects would be sent to the same channel. Even if I were only interested in activity from the Camel Ride project, I&rsquo;d need to sift through notifications from the other, unrelated projects. Imagine the volume of notifications this would create for a larger monorepo with dozens of projects. What a&nbsp;mess!</p><h3>Enter Monorobot</h3><figure><img src="https://cdn-images-1.medium.com/max/1024/1*ZHkBDdpNcMMw29BmHirM_A.png" alt=""/><figcaption>Monorobot, hard at&nbsp;work.</figcaption></figure><p>Monorobot enables more granular control over where notifications from the same repository get routed, depending on the type of activity. This routing behavior can be defined in a configuration file named&nbsp;.monorobot.json, which should be committed to the root of the monorepo. 
Once you create a <a href="https://docs.github.com/en/developers/webhooks-and-events/webhooks/about-webhooks">GitHub webhook</a> from the repository to a running instance of Monorobot, it will use the configuration file to route notifications to relevant channels based on the webhook event&nbsp;payload:</p><ul><li>For pushed commits, it checks the path prefixes of the files modified in the&nbsp;commits.</li><li>For activity related to PRs and issues, it checks their&nbsp;labels.</li><li>For status updates on pushed commits (e.g., CI builds), it uses the same path prefix logic as pushed&nbsp;commits.</li></ul><p>Additionally, Monorobot supports unfurling GitHub links shared in&nbsp;Slack.</p><h4>Path prefix routing for commit push notifications</h4><p>Continuing with our example, suppose we want to route all commit activity related to the Camel Ride project to a Slack channel called <em>#camelride</em>. Our configuration file might look like&nbsp;this:</p><pre>{<br/> ...,<br/> &quot;prefix_rules&quot;: {<br/> &quot;rules&quot;: [<br/> {<br/> &quot;match&quot;: [<br/> &quot;frontend/camelride_ui/&quot;,<br/> &quot;backend/camelride/&quot;<br/> ],<br/> &quot;ignore&quot;: [<br/> &quot;frontend/camelride_ui/images&quot;<br/> ],<br/> &quot;channel&quot;: &quot;camelride&quot;<br/> }<br/> ]<br/> }<br/>}</pre><p>Each rule &ldquo;matches&rdquo; a file path to a channel. Whenever someone pushes a commit touching files with either of these prefixes, the <em>#camelride</em> channel will be notified. All other commits will be&nbsp;ignored.</p><p>If a file prefix appears in a rule&rsquo;s optional ignore field, the rule won&rsquo;t be matched even if the prefix is also in the match field. 
In the above snippet, the Camel Ride frontend team has decided to silence notifications for activity in the images/ subdirectory.</p><p>Now, let&rsquo;s say the project&rsquo;s price optimization team is growing, and they&rsquo;ve decided to create their own separate Slack channel called <em>#camelride-pricing</em>. We can simply commit an update to the&nbsp;.monorobot.json file, and Monorobot will detect the configuration change:</p><pre>{<br/> ...,<br/> &quot;prefix_rules&quot;: {<br/> &quot;rules&quot;: [<br/> {<br/> &quot;match&quot;: [<br/> &quot;frontend/camelride_ui/&quot;,<br/> &quot;backend/camelride/&quot;<br/> ],<br/> &quot;ignore&quot;: [<br/> &quot;frontend/camelride_ui/images&quot;<br/> ],<br/> &quot;channel&quot;: &quot;camelride&quot;<br/> },<br/> {<br/> &quot;match&quot;: [<br/> &quot;backend/camelride/pricing/&quot;<br/> ],<br/> &quot;channel&quot;: &quot;camelride-pricing&quot;<br/> }<br/> ]<br/> }<br/>}</pre><p>Since Monorobot will match the rule with the longest matched prefix, only commits related to the price optimization aspect of Camel Ride will notify <em>#camelride-pricing</em>, and all other general Camel Ride commits will notify <em>#camelride</em>.</p><p>There are additional configuration options for prefix rules (and for label rules discussed in the next section) that aren&rsquo;t mentioned here. Visit the <a href="https://github.com/ahrefs/monorobot">repository</a> for the full&nbsp;details.</p><h4>Label-based routing for PRs and issue notifications</h4><p>For activity related to pull requests and issues (opening, closing, merging, commenting, and reviewing), Monorobot uses labels to determine routing. 
The format is largely the same as for path prefix&nbsp;routing:</p><pre>{<br/> ...,<br/> &quot;label_rules&quot;: {<br/> &quot;default_channel&quot;: &quot;notifications&quot;,<br/> &quot;rules&quot;: [<br/> {<br/> &quot;match&quot;: [<br/> &quot;Camel Ride&quot;<br/> ],<br/> &quot;channel&quot;: &quot;camelride&quot;<br/> },<br/> {<br/> &quot;match&quot;: [<br/> &quot;Price Optimization&quot;<br/> ],<br/> &quot;channel&quot;: &quot;camelride-pricing&quot;<br/> }<br/> ]<br/> }<br/>}</pre><p>Here, all PRs and issues with the &ldquo;Camel Ride&rdquo; label will have activity sent to <em>#camelride</em>; those with the &ldquo;Price Optimization&rdquo; label to <em>#camelride-pricing</em>; and those with both labels to both channels.</p><p>The default_channel field provides an option to fall back on a channel if no rule is matched; this option is available for prefix rules as&nbsp;well.</p><h4>Status notifications</h4><p>Monorobot also supports build status notifications for CI pipelines. When it receives a status update for a pushed commit, it routes it to the relevant channel(s) by applying the prefix rules to the commit associated with the build. Further filtering based on status (e.g., ignoring canceled builds, and only notifying for a successful build when preceded by a failed one) is also possible.</p><h4>Link unfurling</h4><p>Finally, Monorobot can unfurl links to GitHub repositories shared on Slack (including private ones, if a personal access token is provided). This applies to commit, issue, and pull request&nbsp;URLs.</p><h3>What&rsquo;s next</h3><p>Monorobot is actively used at Ahrefs today, but there are lots of promising future directions it could take. Here, we list a&nbsp;few.</p><h4>Unifying GitHub and Slack identities</h4><p>It would be useful to allow GitHub user IDs to be mapped to Slack ones. 
This would enable more personalized features for Monorobot, such as sending a direct message to a user when their review is requested or when a CI build fails on a feature branch they authored.</p><h4>Consolidating notifications</h4><p>Sometimes it makes sense to group multiple GitHub webhook events and deliver them as a single Slack notification. For example, pull request reviews generate discrete webhook events for each review comment, but it would make more sense to pool them together, so as not to spam a channel with many notifications.</p><h4>Better status notifications</h4><p>A CI build failure can have multiple potential causes:</p><ol><li>A bad&nbsp;commit</li><li>A previous bad commit that has yet to be&nbsp;fixed</li><li>An issue with the pipeline itself (this is out of scope for Monorobot)</li></ol><p>It can be quite tricky to distinguish between the first two causes from the GitHub webhook event alone. Cause 1 is handled well by our current approach of using path prefix routing on the commit associated with the build. But with cause 2, that same approach doesn&rsquo;t always send the build failure notification to the channel where it is actually relevant. In that case, the originator of the initial bad commit won&rsquo;t be nagged about all subsequent failures, and Slack channels with no relevance to the cause of failure will get polluted with unnecessary notifications.</p><p>How can we best determine whether a status notification is &ldquo;relevant&rdquo; to a Slack channel? This is still an open question, but one possible direction is to track build state per <em>build step</em> rather than per <em>status</em>, and route notifications based on that. 
For example, if an overall build fails due to a backend build step failure, then it could be sent to a channel where the frontend team won&rsquo;t be notified.</p><h3>Wrapping up</h3><p>The overall goal of Monorobot is to make Slack notifications more <em>relevant</em> for all teams in a large <em>monorepo environment</em>, using the information available from GitHub webhook events. We&rsquo;ve had fairly positive results with our own internal usage, and now we hope others find it useful as&nbsp;well.</p><p>Monorobot is written in OCaml. <a href="https://github.com/ahrefs/monorobot">We welcome your feedback and contributions on&nbsp;GitHub!</a></p><p>P.S. If anyone does make an actual ride sharing service for camels, do let us&nbsp;know&hellip;</p><p><em>Thanks to Feihong, Igor, and Louis for feedback on this&nbsp;post.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=374260e2ca43" width="1" height="1" alt=""/><hr/><p><a href="https://tech.ahrefs.com/monorobot-a-slack-bot-for-monorepos-374260e2ca43">Monorobot: a Slack bot for monorepos</a> was originally published in <a href="https://tech.ahrefs.com">Ahrefs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>https://tech.ahrefs.com/monorobot-a-slack-bot-for-monorepos-374260e2ca43?source=rss----303662d88bae--ocamlMonorobot: a Slack bot for monorepos2021-12-09T15:19:04-00:00ahrefshttps://medium.com/feed/ahrefs/tagged/ocamlahrefs<figure><img src="https://cdn-images-1.medium.com/max/1024/1*tYLUO4FDmJ6bzlsPp14LdQ.jpeg" alt=""/><figcaption>Photo by <a href="https://unsplash.com/@madebyjens?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Jens Lelie</a> on&nbsp;<a href="https://unsplash.com/s/photos/fork-road?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></figcaption></figure><p>At Ahrefs, we have been using BuckleScript and ReasonML in 
production <a href="https://tech.ahrefs.com/one-and-a-half-years-of-reasonml-in-production-2250cf5ba63b">for more than two years</a>. We already have a codebase of tens of thousands of lines of code, with several web applications that are data intensive and communicate with backend services written in <a href="http://ocaml.org/">OCaml</a>, using tools like&nbsp;<a href="https://github.com/ahrefs/atd">atd</a>.</p><p>Given our investment in these technologies, we have been following closely the recent changes in <a href="https://rescript-lang.org/">ReScript</a>, with its rebrand and renaming, and the split with the ReasonML project, explained in the project <a href="https://rescript-lang.org/blog/bucklescript-is-rebranding">blog&nbsp;post</a>.</p><h3>ReScript: becoming its own&nbsp;language</h3><p>We are excited about the way ReScript is unifying the experience and making it easier for developers who are getting started to find documentation in a single place, as well as continuing its strong focus on performance and readable JavaScript output.</p><p>On the other hand, we are trying to figure out the implications of this change in the mid- and long-term, especially regarding the integration with the OCaml ecosystem. And more importantly, what this evolution will mean for production users like us who rely on this integration.</p><p>ReScript integration with OCaml has historically been seamless, as BuckleScript started originally as a <a href="https://www.reddit.com/r/ocaml/comments/4enok3/bloombergbucklescript_a_back_end_for_the_ocaml/">new backend for the OCaml compiler</a>. 
However, in recent months, there have been several hints that ReScript wants to evolve towards becoming its own language:</p><ul><li>It now has <a href="https://github.com/rescript-lang/syntax">its own parser</a>, incompatible with OCaml native applications</li><li>The official repository guidelines for technical writing explicitly mention that <a href="https://github.com/rescript-association/rescript-lang.org/blob/master/CONTRIBUTING.md#technical-writing-documentation">no reference to OCaml</a> should appear in&nbsp;docs</li><li>Upgrades to the latest version of the OCaml compiler, which <a href="https://web.archive.org/web/20210208054855if_/https://github.com/rescript-lang/rescript-compiler/wiki">used to be part of the roadmap</a>, have been <a href="https://forum.rescript-lang.org/t/some-thoughts-on-community-building/1474">deprioritized</a> recently.</li></ul><p>So, even if officially ReScript has not announced that they will break backwards compatibility with OCaml, just the fact that it is sticking with an old version of the OCaml compiler poses some challenges for us in terms of tooling. The uncertainty about the future and the pace of changes add some risk to the high-level goals we have for our teams and codebase: we would like to share <em>more</em> code between frontend and backend, not&nbsp;less.</p><h3>Melange: a fork of ReScript, focused on OCaml compatibility</h3><p>When Ant&oacute;nio Monteiro <a href="https://anmonteiro.com/2021/03/on-ocaml-and-the-js-platform/">announced Melange</a>, a fork of ReScript but with a strong focus on keeping compatibility with OCaml, we decided to try it out and see how it could work for&nbsp;us.</p><p>Ultimately, the experiment was successful. We managed to build all our frontend applications with Melange, while keeping the existing bundling setup, which currently uses&nbsp;Webpack.</p><p>Throughout this process, we had to modify some parts of the code. 
We will now go through the most relevant parts of the&nbsp;process:</p><ul><li>Upgrade to OCaml 4.12: the most relevant part was the deprecation of the Pervasives module in favor of&nbsp;Stdlib.</li><li>Use ppxlib in our ppxs: we had to upgrade the two ppxs that we use in the frontend codebase to the latest compiler version, <a href="https://github.com/ahrefs/bs-emotion/compare/master...jchavarri:ocaml4.12-ppxlib">bs-emotion-ppx</a> and an in-house <a href="https://github.com/ahrefs/bs-react-intl-ppx">ppx for internationalization</a>.</li><li>Configure esy: we were already using esy to bring the editor tooling into scope of the developer environment, so we just had to make sure melange would also be included in the json configuration.</li><li>Upgrade to Reason 3.7.0: a quite simple change too, as the whole process is automated by using refmt. As a side note, we ran into <a href="https://github.com/reasonml/reason/issues/2636">a small bug</a> with some type annotations, which we were able to work&nbsp;around.</li><li>&ldquo;Lift&rdquo; dune workspace to the root of our monorepo: this is probably the most intrusive change. Because we have shared code between backend and frontend, and Dune needs to have access to all sources under its workspace, we had to &ldquo;lift&rdquo; the Dune workspace from the backend directory to the root of the monorepo.</li></ul><h3>The good</h3><p>This experiment allowed us to experience what a project like Melange could offer for our use case. Here are some of the things we might be able to leverage in a codebase built with&nbsp;Melange:</p><ul><li>Recent version of the OCaml compiler: at some point, we could pin the compiler version between backend and frontend teams, making upgrades more straightforward as they would happen atomically.</li><li>Shared editor tooling: the official OCaml <a href="https://github.com/ocamllabs/vscode-ocaml-platform">vscode extension</a> works great with Melange, as well as any other OCaml editor integration. 
Having backend and frontend teams use a similar editor setup removes a lot of maintenance work for&nbsp;us.</li><li>Consuming ppxs from source: Melange allows consuming ppxs from source, which also removes issues with pre-compiled ppxs (like this issue with the recent <a href="https://github.com/ahrefs/bs-emotion/issues/53">M1&nbsp;Macs</a>).</li><li>Melange allows running all ppxs <a href="https://github.com/melange-re/melange/pull/171">from a single executable file</a>, which has some nice performance benefits.</li><li>Use Dune for atd files generators: ReScript &ldquo;generators&rdquo; are unfortunately <a href="https://web.archive.org/web/20200710044513if_/https://reasonml.org/docs/reason-compiler/latest/build-advanced">not documented anymore</a>, but we use them extensively for atd file generation. Being able to share Dune rules in backend and frontend would make our build setup&nbsp;easier.</li><li>Access to OCaml documentation tooling: Melange makes it possible to leverage existing tooling for generating documentation, like&nbsp;<a href="https://github.com/ocaml/odoc/">odoc</a>.</li><li>Async syntax: the latest Reason version <a href="https://github.com/reasonml/reason/pull/2487">supports &ldquo;let op&rdquo; syntax</a>, which is handy for client-side code.</li></ul><h3>The bad</h3><p>While there are many things that are exciting about Melange, there are some other parts that can be improved.</p><ul><li>Build performance: We already knew that performance would be far worse than ReScript, as Melange uses Dune in a way that it was not designed for. In our tests, builds with Melange are roughly one order of magnitude slower than ReScript&nbsp;ones.</li><li>First-class Dune support: if there was a deeper integration between Dune and Melange, we could explore features like shared libraries or shared rules between backend and frontend. 
As of today, Dune has no knowledge about the Melange environment, so it can perform basic rules execution, but there is no access to high level stanzas like library in&nbsp;Melange.</li><li>Two-headed goal: finally, we see a more strategic risk in Melange&rsquo;s proposition. Right now it has two goals: keep compatibility with both ReScript and OCaml. But we don&rsquo;t know how long these goals will be feasible. If at some point ReScript decides to move away from the OCaml compiler fully, then Melange users would not be able to consume any updates to the ReScript ecosystem anymore.</li></ul><h3>Alright, but are you migrating to Melange or ReScript?</h3><p>With all the information available, the answer is: we don&rsquo;t know yet. &#128516; We want to keep exploring all the available options and have as much information as possible before committing further. So for now, we are upgrading the codebase to recent versions of ReScript, but we are holding off on features that only work one way. For example, we have not migrated our codebase to the ReScript syntax yet, as <a href="https://github.com/rescript-lang/syntax/issues/405">there is no way to translate back to Reason&nbsp;syntax</a>.</p><p>In the meantime, we will keep exploring how far the limitations of Melange can be mitigated. To be continued! 
&#128640;</p><p><em>Thanks to Igor and Feihong for reviewing and improving earlier versions of this&nbsp;post.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=9f881f6d022b" width="1" height="1" alt=""/><hr/><p><a href="https://tech.ahrefs.com/building-ahrefs-codebase-with-melange-9f881f6d022b">Building Ahrefs codebase with Melange</a> was originally published in <a href="https://tech.ahrefs.com">Ahrefs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>https://tech.ahrefs.com/building-ahrefs-codebase-with-melange-9f881f6d022b?source=rss----303662d88bae--ocamlBuilding Ahrefs codebase with Melange2021-05-18T15:24:20-00:00ahrefshttps://medium.com/feed/ahrefs/tagged/ocamlahrefs<figure><img src="https://cdn-images-1.medium.com/max/1024/1*Nl5vYk_k-mC4j32XEjryHQ.jpeg" alt=""/><figcaption>Photo by <a href="https://unsplash.com/@willianjusten">https://unsplash.com/@willianjusten</a></figcaption></figure><p>The first <a href="https://reasonml.org/">Reason</a> application at <a href="https://ahrefs.com">Ahrefs</a> went online on January 31, 2019. Since then, many more applications have been either rewritten in Reason, are being slowly migrated from React to ReasonReact, or are conceived from the start as Reason projects. It is safe to say that the bet placed on Reason paid off big time. We will never go back to doing pure JavaScript again, with the possible exception of simple backend&nbsp;scripts.</p><p>In the past few years, it&rsquo;s come to light that there are a number of other <a href="https://www.messenger.com/">large</a> <a href="https://www.onegraph.com/">Reason</a>/<a href="https://darklang.com/">BuckleScript</a> <a href="https://onivim.io/">codebases</a> in the wild, but there still isn&rsquo;t a ton of information out there about what it&rsquo;s really like to work with Reason in production. 
To help remedy that, we thought it would be instructive to ask each of our frontend team members what their Reason journey has been like so&nbsp;far.</p><p>We gave them the following questions as starting points (but they were free to talk about anything they&nbsp;wanted):</p><ul><li>How does Reason compare to other languages you&rsquo;ve used in the&nbsp;past?</li><li>What&rsquo;s your favorite thing about&nbsp;Reason?</li><li>What&rsquo;s your least favorite thing about&nbsp;Reason?</li><li>How does ReasonReact compare to other frameworks you&rsquo;ve&nbsp;used?</li><li>Was it easy to pick up Reason? Why or why&nbsp;not?</li></ul><h4>Javi</h4><ul><li>How does Reason compare to other languages you&rsquo;ve used in the&nbsp;past?</li></ul><p>In the past I worked with languages like Java and C, and lesser-known ones like Pascal and Prolog. But the languages I&rsquo;ve spent the most time with are Objective-C and JavaScript. The main difference between all those languages and Reason is the exhaustiveness that you get from the OCaml type checker. This is maybe awkward, but it feels like you stop coding alone and suddenly you have a sidekick always sitting next to you, helping you notice the things you forgot about, or find new code that is not consistent with code you or someone else wrote&nbsp;before.</p><p>In a world that is moving towards remote work, where many of us spend hours every day coding physically far from our colleagues, it makes the experience much more delightful. Plus, it allows teams working in different time zones to keep a healthier work-life balance, because there is less need for synchronous communication than with more dynamic languages, as more assumptions and design decisions are &ldquo;embedded&rdquo; into the&nbsp;code.</p><ul><li>What&rsquo;s your favorite thing about&nbsp;Reason?</li></ul><p>Can I pick two things? 
It&rsquo;s hard to choose only&nbsp;one.</p><p>The first one is the exhaustiveness and quality of the type checker, as mentioned above. Sometimes it takes a bit longer to build a feature than it would in other languages, until the types are figured out. But this is largely compensated for by the confidence one has when shipping code to production, or diving into large refactors.</p><p>The second one is the speed of the BuckleScript build system, which is built on top of <a href="https://ninja-build.org/">ninja</a>. I had never worked with such a fast build system. As an example, we have recently started to use remote machines to develop at Ahrefs. On one of these machines, which has 72 cores, BuckleScript takes roughly 3 seconds to clean build <em>all</em> our Reason code: application, libs, decoders&hellip; everything. Many tens of thousands of lines of code! We thought there was something wrong, but we realized the compiler is just So Blazing&nbsp;Fast&trade;&#65039;.</p><ul><li>What&rsquo;s your least favorite thing about&nbsp;Reason?</li></ul><p>I guess we&rsquo;re going through a necessary stage until things stabilize in the future, but there is a lot of fragmentation at the moment between &ldquo;Reason native&rdquo;, which tries to stay closer to OCaml, and &ldquo;Reason web&rdquo;, which aims to become friendlier for JavaScript developers.</p><p>I am excited to see what <a href="https://reasonml.org/blog/bucklescript-8-1-new-syntax">BuckleScript&rsquo;s new syntax</a> will lead to, but I would also love to see a &ldquo;universal&rdquo; solution that works for the main use cases out of the box, becoming a sort of Rails for OCaml or Reason. <a href="https://github.com/oxidizing/sihl/">sihl</a> is a project that seems to go in that direction and looks very promising.</p><ul><li>How does ReasonReact compare to other frameworks you&rsquo;ve&nbsp;used?</li></ul><p>I consider ReasonReact mostly like React + types on top, because the bindings layer is very thin. 
The thing that I like most about React is that it follows the Unix philosophy: it does one thing and it does it really well. Maybe we have already forgotten it today, but having to maintain and mutate UI based on data updates was one of the main sources of bugs in the past. The other nice thing is that there is so much good content about it: blog posts, documentation, etc.</p><ul><li>Was it easy to pick up Reason? Why or why&nbsp;not?</li></ul><p>It took some time, as with any other language. We have things like syntax or semantics much more ingrained in our brains than we think, so there is always some &ldquo;rewiring&rdquo; time that is needed to learn a new language, even if Reason makes an effort to stay close to JavaScript syntax. The most challenging part was probably the bindings, because coming from JavaScript, there is no previous knowledge that one can use as a foundation to build upon; it&rsquo;s all &ldquo;new knowledge&rdquo;. glennsl&rsquo;s <a href="https://github.com/glennsl/bucklescript-ffi-cheatsheet">BuckleScript FFI cheatsheet</a> was a huge help for&nbsp;me.</p><h4>Ze</h4><p>I really like working with Reason, and have wanted to do so for a while. I was quite happy to see that working with it matched my expectations.</p><p>You get so much support from the type system, and still have a lot of flexibility to represent your domain model. Coming from other languages or paradigms, you don&rsquo;t feel limited at all in what you can&nbsp;achieve.</p><p>The language has such a strong type system that you feel much more comfortable with your&nbsp;coding.</p><p>The OCaml type system is there to make sure you code with assurance. This is especially true when refactoring code. You can be sure that everything will work fine after it compiles. If it compiles, it works&nbsp;:)</p><p>It&rsquo;s also very helpful when working in a monorepo. You don&rsquo;t have to keep reading the source code of everything you use to make sure you don&rsquo;t have type mistakes. 
Changes to code in one lib are reflected immediately in all the others. This makes the feedback loop much shorter and&nbsp;safer.</p><p>The editor integrations with the type system are quite good and help a lot in writing code better and&nbsp;faster.</p><p>Also, compilation times are super&nbsp;fast.</p><p>Last, but not least, ReasonReact is, for me, the hidden gem of ReasonML. Newcomers who have some difficulty with the language should start with it. IMHO, ReasonReact is simpler and has a better developer experience than React itself. It should be the gateway drug frontend developers need to get started with Reason/OCaml &#128516;</p><h4>Liubomyr</h4><p>To me, all those language features boil down to one essential thing, and it&rsquo;s the ease of refactoring. New business requirements pop up all the time, and often your initial code assumptions are no longer correct. It was such a pain to modify code in a large JS codebase, as you never know how many things you potentially break in the process. With Reason, it has never been easier. If you need to change your data shape or some component API, you just do it, and from there, the compiler will guide you through all the places you broke, and help to fix&nbsp;those.</p><p>Coming from the JS world, it feels like the initial development is slower, because of the learning curve, missing bindings, and fewer Stack Overflow answers, but in the end, you are getting stable software that is way easier to maintain and add features&nbsp;to.</p><h4>Egor</h4><p>I switched to Reason when I joined the Ahrefs team about a year ago; before that I worked mostly with Ruby.</p><p>The first thing that impressed me about ReasonML was code refactoring. Refactoring in a language with a strong type system, like ReasonML and OCaml, is much easier than what I am used to. 
If your program compiles after your refactoring, most likely you did everything right; if it doesn&rsquo;t compile, you can immediately see what you forgot to change. In languages with a dynamic type system, this can be achieved only with a huge number of tests (and maintaining a big test suite is as time-consuming as maintaining the code itself).</p><p>The other thing that I really like about the ReasonML codebase is how readable it is. When you first enter the ReasonML world, some things can seem unfriendly at first sight, for example immutable let bindings, but in the end, you realize that these language decisions help you to write cleaner and simpler&nbsp;code.</p><h4>Seif</h4><p>The programming language I used the most in the past is JavaScript. I switched to Reason when I joined Ahrefs a few months ago. From the start, I worked mainly on the code shared by the majority of the tools and I don&rsquo;t think I would have had the same confidence making changes if I were doing it with JavaScript. I love JavaScript&rsquo;s developer experience and accessibility. Reason gave me predictability without hurting these very same things I like about JavaScript.</p><h4>Bryan</h4><p>Reason (and OCaml) is, by far, one of the easiest languages to work with. Easy in the sense that the compiler helps eliminate an entire class of errors so you don&rsquo;t have to worry about them. Additionally, in most other web-centric languages, it&rsquo;s a pain to add features to existing code that you&rsquo;ve not touched for a long time. With strong static typing, I can usually add the feature I want in either the backend or frontend, and then let the compiler tell me what needs to be&nbsp;updated.</p><p>Pattern-matching is one of my favourite features in Reason. 
To me, it makes more sense to be able to explicitly specify the conditions I&rsquo;m interested in, in a clear and concise manner, and let the compiler tell me if I missed a particular condition. Records go hand-in-hand with this. As software programs are made up of data and instructions, records are the perfect data containers. They are quick to define and query, focusing on data rather than behaviour (think classes and instance methods).</p><p>It definitely took a while to pick up Reason, mainly because it takes time to become familiar with idiomatic OCaml. But once I crested that learning curve, everything just made sense, and all the features of the language that made Reason seemingly difficult to learn (strong typing, the functional paradigm, etc.) became assistants that helped me to write better&nbsp;code.</p><h4>Feihong</h4><p><a href="https://reasonml.github.io/reason-react/en/">ReasonReact</a> is a great library for making complex UIs in a large codebase because you get the familiarity of React coupled with the type safety of OCaml. Having two well-established technologies in its foundation is a big advantage that ReasonReact has over other functional UI libraries/frameworks in the transpile-to-JS universe. I didn&rsquo;t have any professional OCaml experience before joining, yet the ramp up was made much easier by my existing knowledge of React and the (somewhat superficial) similarity of the Reason syntax to JS. Oftentimes it was possible to correctly guess the intent of existing Reason code without knowing all the syntax, because most React concepts carry over pretty directly. And even though the documentation is incomplete and not perfect, it&rsquo;s quite usable already and among conceptually-similar frameworks is second only to the Elm documentation.</p><p>The compiler errors were difficult to get used to at first. 
The compiler is fairly good at pointing out the location of the error, but not necessarily as good at explaining the nature or cause of the error. As such, having a REPL would be extremely useful. Actually, OCaml does have its own REPL, but BuckleScript (the compiler used by Reason to translate OCaml to JS) does not at the moment. Nonetheless, the <a href="https://reasonml.github.io/en/try">Try Reason</a> page is a really good tool to try out small snippets of code and is extremely useful while learning the language (we will still occasionally post Try Reason links in our slack channel).</p><h3>Summary</h3><p>The reality is that Ahrefs has always been an OCaml shop, but in the past OCaml was only used to build the backend. Now that we are also using it on the frontend, we get the benefits that our backend colleagues have enjoyed for many years: the expressiveness afforded by pattern matching, the ease of refactoring in large codebases, the stability of a mature programming language, and the confidence of &ldquo;if it compiles, it works&rdquo;. To make a shoddy nautical analogy, it is as if we had built a wooden ship powered by a turbo engine. But now the wooden parts are being replaced with steel and plastic, bringing the exterior of the ship up to modern standards as well. As a result, the ship runs faster and more reliably, making the passengers (our users) more satisfied. Also, pirates (bugs) have a harder time hijacking the ship because it&rsquo;s sturdier and defended by well-disciplined camels. Because the ship keeps getting more and more passengers who want to experience a delightful ride and take pictures with enigmatic camels, we require a constant influx of willing and able boat engineers (who aren&rsquo;t allergic to camels) to extend and maintain the ship. 
(Yes, that means that <a href="https://ahrefs.com/jobs">we are hiring</a>&#65039;.)</p><p><em>Thanks to Raman and Louis for fact checking this&nbsp;post.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=2250cf5ba63b" width="1" height="1" alt=""/><hr/><p><a href="https://tech.ahrefs.com/one-and-a-half-years-of-reasonml-in-production-2250cf5ba63b">One and a half years of ReasonML in production</a> was originally published in <a href="https://tech.ahrefs.com">Ahrefs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>https://tech.ahrefs.com/one-and-a-half-years-of-reasonml-in-production-2250cf5ba63b?source=rss----303662d88bae--ocamlOne and a half years of ReasonML in production2020-07-26T15:19:31-00:00ahrefshttps://medium.com/feed/ahrefs/tagged/ocamlahrefs<p><em>Written with </em><a href="https://twitter.com/javierwchavarri"><em>Javier Ch&aacute;varri</em></a><em> and </em><a href="https://github.com/feihong/"><em>Feihong&nbsp;Hsu</em></a><em>.</em></p><p>The first <a href="https://www.reason-conf.us/">Reason Conf US</a> just ended. Many talks mentioned native compilation. Sharing code between BuckleScript and native artifacts is a use case which is more and more common. This blog post is an introduction on how to set up a library available for both worlds, sharing as much code as possible.</p><h3>The goal</h3><p>What we try to produce is a library with an identical interface for BuckleScript and native. But without duplicating code. It should also be possible to have some parts of the library that are a different implementation depending on the target, as we want to be able to leverage existing libraries that are working only in one of the&nbsp;worlds.</p><h3>The build&nbsp;systems</h3><p>For BuckleScript, there is only one build system: bsb. It is driven by a bsconfig.json file. 
And is installed as part of the bs-platform.</p><p>On the native side, there are a lot of different build systems that are available. But recently one of them became a de facto standard: dune. It works with a very minimal amount of configuration. And it supports the reason syntax by&nbsp;default.</p><p>These two tools are working in a way which is pretty similar. They share a lot of concepts. And it is easy to set them up so that both are working in the same codebase.</p><p>The main similarities that interest us&nbsp;are:</p><ul><li>The ability to work on specific source directories</li><li>Namespacing in bsb and wrapping in dune are both putting all the<br/>files of the library under a single module&nbsp;name</li></ul><h3>The source code file&nbsp;tree</h3><p>The code of the library is split into 3 directories.</p><pre>&#9500;&#9472;&#9472; js/<br/>&#9500;&#9472;&#9472; native/<br/>&#9492;&#9472;&#9472; shared/</pre><ul><li>shared is meant to host most of the code and all the code in this directory will be compiled in both&nbsp;modes.</li><li>js contains the parts that are specific to BuckleScript.</li><li>native contains the parts that are specific to native&nbsp;OCaml.</li></ul><h3>Set up the build&nbsp;systems</h3><p>Once we have our basic skeleton for the library, it is time to set up the build systems. We want to have two configurations as similar as possible to make them easier to understand. Once we are done, the tree will look like&nbsp;this:</p><pre>&#9500;&#9472;&#9472; bsconfig.json<br/>&#9500;&#9472;&#9472; dune<br/>&#9500;&#9472;&#9472; dune-project<br/>&#9500;&#9472;&#9472; js/<br/>&#9500;&#9472;&#9472; native/<br/>&#9492;&#9472;&#9472; shared/</pre><h4>BuckleScript</h4><p>At the root of the library we need a bsconfig.json file to drive<br/>bsb. 
The documentation is available at <a href="https://bucklescript.github.io/docs/en/build-configuration">https://bucklescript.github.io/docs/en/build-configuration</a>.</p><p>The main part for us is sources. We will use it to tell bsb to look at the js and shared folders. We also want to set namespace to true, which will wrap all your project&rsquo;s files under a common module&nbsp;name.</p><pre> &quot;namespace&quot;: true,<br/> &quot;sources&quot;: [<br/> {<br/> &quot;dir&quot;: &quot;js&quot;,<br/> &quot;subdirs&quot;: true<br/> }, {<br/> &quot;dir&quot;: &quot;shared&quot;,<br/> &quot;subdirs&quot;: true<br/> }<br/> ],</pre><p>The rest of the file is as&nbsp;usual.</p><pre>{<br/> &quot;name&quot;: &quot;sharedlib&quot;,<br/> &quot;namespace&quot;: true,<br/> &quot;sources&quot;: [<br/> {<br/> &quot;dir&quot;: &quot;js&quot;,<br/> &quot;subdirs&quot;: true<br/> }, {<br/> &quot;dir&quot;: &quot;shared&quot;,<br/> &quot;subdirs&quot;: true<br/> }<br/> ],<br/> &quot;package-specs&quot;: {<br/> &quot;module&quot;: &quot;es6&quot;,<br/> &quot;in-source&quot;: true<br/> },<br/> &quot;refmt&quot;: 3,<br/> &quot;suffix&quot;: &quot;.bs.js&quot;,<br/> &quot;generate-merlin&quot;: true<br/>}</pre><h4>Dune</h4><p>We must also add a dune file to the root of the library. For dune, we have two options: ignore the js directory but read everything else, or check only shared and native. To make the configuration similar to BuckleScript, we will go with the second solution.</p><p>The dune directive to do that is dirs. By default it tells dune to explore every directory except hidden ones (starting with a dot) and ones starting with an underscore. <a href="https://dune.readthedocs.io/en/stable/dune-files.html#dirs-since-1-6">More details in dune&rsquo;s documentation</a>. 
To make it do what we want, the configuration should&nbsp;be:</p><pre>(dirs shared native)</pre><p>We also use another option of dune to tell it to include the content of those two directories as if it were at the root of the project. Without this stanza, dune would only use the source files at the root of the project and ignore everything in the subdirectories.</p><pre>(include_subdirs unqualified)</pre><p>Then we need the usual library stanza to give a name to our library, state the dependencies, compilation flags, etc. In our simple case, the only information needed is the name. We can explicitly set wrapped to true, but this is already the default behavior. The <a href="https://dune.readthedocs.io/en/stable/dune-files.html#library">documentation for the whole library stanza</a> describes how to specify more&nbsp;details.</p><p>The final dune file looks like&nbsp;this:</p><pre>(dirs shared native)<br/>(include_subdirs unqualified)<br/>(library<br/> (name sharedlib))</pre><p>We also want a basic dune-project. If we don&rsquo;t write it by hand, dune will generate it for us. I am using version 1.10 as an example. But it can be changed to whatever version suits your&nbsp;project.</p><pre>(lang dune 1.10)</pre><h3>Compilation</h3><p>With the setup described above, the compilation for BuckleScript and native is the same as in a setup with only one or the&nbsp;other.</p><ul><li>bsb -make-world for BuckleScript</li><li>dune build @all for&nbsp;dune</li></ul><p>The call to bsb is usually put in package.json in the scripts part, so that the usual yarn build can be used. 
For native, it depends on whether you rely on esy or&nbsp;opam.</p><h3>How to consume the&nbsp;library</h3><p>This is exactly the same setup that would be used in a pure BuckleScript or pure native&nbsp;library.</p><p>To use your library in BuckleScript:</p><ul><li>Add the name and version to package.json</li><li>Add the name to bs-dependencies in the bsconfig.json of the consuming library/app</li></ul><p>To use your library in native OCaml, add the name of your library to the libraries field of an executable or library stanza,&nbsp;e.g.</p><pre>(executable<br/> (name main)<br/> (libraries sharedlib))</pre><h3>Module naming</h3><p>If you want your module name to contain capital letters in the middle (e.g. TeenageMutantNinjaTurtles), then be aware that <a href="https://bucklescript.github.io/docs/en/build-configuration.html#name-namespace">name munging</a> works differently between bsconfig.json and dune. For example, if you want to refer to your module as CoolSharedLib in your code, then the name in bsconfig.json must be cool-shared-lib, and in dune it must be coolSharedLib.</p><h3>Platform specific&nbsp;code</h3><p>The whole library does not have to be exactly the same on the two platforms. It is possible to add modules that are available only in one mode, or to have modules with a different interface.</p><p>For example, by adding a file Foo.re in js but not in native, the library now has a module Foo available when compiled to JavaScript. But only when compiled to JavaScript.</p><h3>Downsides</h3><ul><li>Both bsb and dune generate&nbsp;.merlin files when they compile our library. They overwrite each other. It might be troublesome if the version of OCaml used for native code is not 4.02.3. Simply recompile the library for your platform to solve the&nbsp;problem.</li><li>Out of the box, this approach doesn&rsquo;t really allow us to share interface files between both platforms: native and BuckleScript. One workaround for that, if we wanted to share some module Foo, is to:<br/>1. 
add Foo.mli or Foo.rei file in shared<br/>2. add include FooImplementation in Foo.ml<br/>3. add FooImplementation in both native and js&nbsp;folder</li><li>It&rsquo;s not possible to be platform specific for just a few lines of code (e.g. if IS_NATIVE foo else bar), the minimal per-platform unit is a file/module.</li></ul><h3>Example project</h3><p>We have set up a simple library to showcase what a repository looks like once the whole configuration is in place. It is <a href="https://github.com/ahrefs/hello-native-bucklescript">available on&nbsp;github</a>.</p><p>For now the repository contains only a library. But with this setup, it is actually possible to build an executable too. It is also possible to enrich it, for example by adding <a href="https://tech.ahrefs.com/getting-started-with-atdgen-and-bucklescript-1f3a14004081">atdgen to communicate between both sides of the&nbsp;library</a>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=22f45e5e946d" width="1" height="1" alt=""/><hr/><p><a href="https://tech.ahrefs.com/how-to-write-a-library-for-bucklescript-and-native-22f45e5e946d">How to write a library for BuckleScript and Native</a> was originally published in <a href="https://tech.ahrefs.com">Ahrefs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>https://tech.ahrefs.com/how-to-write-a-library-for-bucklescript-and-native-22f45e5e946d?source=rss----303662d88bae--ocamlHow to write a library for BuckleScript and Native2019-10-22T10:09:09-00:00ahrefshttps://medium.com/feed/ahrefs/tagged/ocamlahrefs<p><a href="https://github.com/mjambon/atd">atdgen</a> is a project to create types and data structures that can be serialized to JSON. It is very convenient when communicating between multiple processes, creating a REST API or consuming JSON objects from other tools. 
It can be compared to <a href="https://json-schema.org/">JSON schema</a> or <a href="https://developers.google.com/protocol-buffers/">Protocol Buffers</a>, but with richer types and more features.</p><p>The idea is to write a list of types in a specification file, an&nbsp;.atd file. Then, by running atdgen, it is possible to generate OCaml or Java code to serialize/deserialize values of those types to/from the corresponding JSON.</p><p>Until very recently, atdgen could generate code only for native OCaml. But <a href="https://github.com/mjambon/atd/pull/44">the support of bucklescript has been merged</a>! The atdgen CLI tool is still a native OCaml binary. But it can output some OCaml code that can be compiled using <a href="https://bucklescript.github.io/">bucklescript</a>.</p><p>The work to implement this new feature of atdgen has been funded by <a href="https://ahrefs.com/">Ahrefs</a>. We highly appreciate open source tools. And as much as possible, we prefer to contribute to existing open source projects rather than to re-invent the wheel internally.</p><h3>Installation</h3><p>To install atdgen we first need to install <a href="https://opam.ocaml.org">opam</a> (the OCaml package manager), as atdgen doesn&rsquo;t provide ready-to-use binaries and is only distributed as a source package via opam. The procedure is simple and documented here: <a href="https://opam.ocaml.org/doc/2.0/Install.html">https://opam.ocaml.org/doc/2.0/Install.html</a></p><p>Then we need to initialize opam and create a switch. Any version of OCaml greater than or equal to 4.03.0 should be&nbsp;fine.</p><pre>opam init -a<br/>opam switch create . 4.07.1 -y</pre><p>Once it is done, we have to install the development version of atdgen. 
The support of bucklescript is not officially released.</p><pre>opam pin add atd --dev-repo <br/>opam pin add atdgen --dev-repo</pre><p>Make sure that atdgen is available.</p><pre>$ which atdgen <br/>(current $PWD)/_opam/bin/atdgen</pre><p>Of course, we need bucklescript.</p><pre>yarn init <br/>yarn add bs-platform --dev</pre><p>We also need the bucklescript runtime for atdgen, as it is not currently provided by atdgen itself. So we have written and open-sourced our version of the runtime&nbsp;: <a href="https://github.com/ahrefs/bs-atdgen-codec-runtime">https://github.com/ahrefs/bs-atdgen-codec-runtime</a>.</p><p>This runtime is responsible for the conversion between JSON values and OCaml values. The JSON values are based on the standard <a href="https://bucklescript.github.io/bucklescript/api/Js.Json.html#TYPEt">Js.Json.t type</a> provided by bucklescript to be sure that it is easy to interoperate with the rest of the ecosystem.</p><p>It is published on npm for easy integration in bucklescript projects.</p><pre>yarn add @ahrefs/bs-atdgen-codec-runtime</pre><h3>Project configuration</h3><p>After the previous section, package.json should be almost ready. We can add a few scripts to make it more convenient to compile the project. 
Here is how it should look once completed.</p><pre>{<br/> &quot;name&quot;: &quot;demo-bs-atdgen&quot;,<br/> &quot;version&quot;: &quot;0.0.1&quot;,<br/> &quot;description&quot;: &quot;demo of atdgen with bucklescript&quot;,<br/> &quot;scripts&quot;: {<br/> &quot;clean&quot;: &quot;bsb -clean-world&quot;,<br/> &quot;build&quot;: &quot;bsb -make-world&quot;,<br/> &quot;watch&quot;: &quot;bsb -make-world -w&quot;,<br/> &quot;atdgen&quot;: &quot;atdgen -t meetup.atd &amp;&amp; atdgen -bs meetup.atd&quot;<br/> },<br/> &quot;devDependencies&quot;: {<br/> &quot;bs-platform&quot;: &quot;^4.0.5&quot;<br/> },<br/> &quot;peerDependencies&quot;: {<br/> &quot;bs-platform&quot;: &quot;^4.0.5&quot;<br/> },<br/> &quot;dependencies&quot;: {<br/> &quot;@ahrefs/bs-atdgen-codec-runtime&quot;: &quot;^1.0.4&quot;<br/> }<br/>}</pre><p>The bucklescript configuration is very simple. We use the basic configuration that can be found in any bucklescript project. Except that we need to add one dependency to bsconfig.json:</p><pre>{<br/> &quot;name&quot;: &quot;demo-bs-atdgen&quot;,<br/> &quot;version&quot;: &quot;0.0.1&quot;,<br/> &quot;sources&quot;: {<br/> &quot;dir&quot;: &quot;src&quot;,<br/> &quot;subdirs&quot;: true<br/> },<br/> &quot;package-specs&quot;: {<br/> &quot;module&quot;: &quot;commonjs&quot;,<br/> &quot;in-source&quot;: true<br/> },<br/> &quot;suffix&quot;: &quot;.bs.js&quot;,<br/> &quot;bs-dependencies&quot;: [<br/> &quot;@ahrefs/bs-atdgen-codec-runtime&quot;<br/> ],<br/> &quot;warnings&quot;: {<br/> &quot;error&quot;: &quot;+101&quot;<br/> },<br/> &quot;generate-merlin&quot;: true,<br/> &quot;namespace&quot;: true,<br/> &quot;refmt&quot;: 3<br/>}</pre><h3>First ATD definitions</h3><p>It is time to create a first&nbsp;.atd file, containing our types. 
This part is also documented on <a href="https://atd.readthedocs.io/en/latest/tutorial.html#getting-started">https://atd.readthedocs.io/en/latest/tutorial.html#getting-started</a></p><p>For this example, I decided to go with a meetup event. Put the type definitions in src/meetup.atd.</p><pre>(* This is a comment. Same syntax as in ocaml. *)</pre><pre>type access = [ Private | Public ]</pre><pre>(* the date will be a float in the json and a Js.Date.t in ocaml *)<br/>type date = float wrap &lt;ocaml module=&quot;Js.Date&quot; wrap=&quot;Js.Date.fromFloat&quot; unwrap=&quot;Js.Date.valueOf&quot;&gt;</pre><pre>(* Some people don't want to provide a phone number, make it optional *)<br/>type person = {<br/> name: string;<br/> email: string;<br/> ?phone: string nullable;<br/>}</pre><pre>type event = {<br/> access: access;<br/> name: string;<br/> host: person;<br/> date: date;<br/> guests: person list;<br/>}</pre><pre>type events = event list</pre><p>We use the atdgen binary (compiled previously) to generate the ocaml types and the code to serialize/deserialize those&nbsp;types.</p><pre>atdgen -t meetup.atd # generates an ocaml file containing the types<br/>atdgen -bs meetup.atd # generates the code to (de)serialize</pre><p>The generated files&nbsp;are:</p><ul><li>meetup_t.ml(i) which contain the ocaml types corresponding to our ATD definitions.</li><li>meetup_bs.ml(i) which contain the ocaml code to transform from and to json&nbsp;values.</li></ul><p>At this point we can compile our&nbsp;project.</p><pre>yarn build</pre><p>If everything worked properly, we now have two&nbsp;.bs.js files in the src directory.</p><pre>$ tree src<br/>src<br/>&#9500;&#9472;&#9472; meetup.atd<br/>&#9500;&#9472;&#9472; meetup_bs.bs.js<br/>&#9500;&#9472;&#9472; meetup_bs.ml<br/>&#9500;&#9472;&#9472; meetup_bs.mli<br/>&#9500;&#9472;&#9472; meetup_t.bs.js<br/>&#9500;&#9472;&#9472; meetup_t.ml<br/>&#9492;&#9472;&#9472; meetup_t.mli</pre><pre>0 directories, 7 files</pre><p>At this point, we can 
create new OCaml/Reason files in the src directory and use all the code atdgen generated for us. Here are two examples to illustrate that.</p><h3>Query a REST&nbsp;API</h3><p>A common usage of atdgen is to decode the JSON returned by a REST API. Here is a short example, using the Reason syntax and bs-fetch.</p><pre>let get = (url, decode) =&gt;<br/> Js.Promise.(<br/> Fetch.fetchWithInit(<br/> url,<br/> Fetch.RequestInit.make(~method_=Get, ()),<br/> )<br/> |&gt; then_(Fetch.Response.json)<br/> |&gt; then_(json =&gt; json |&gt; decode |&gt; resolve)<br/> );</pre><pre>let v: Js.Promise.t(Meetup_t.events) =<br/> get(<br/> &quot;http://localhost:8000/events&quot;,<br/> Atdgen_codec_runtime.Decode.decode(Meetup_bs.read_events),<br/> );</pre><h3>Read and write a JSON&nbsp;file</h3><p>Atdgen for bucklescript doesn&rsquo;t take care of converting a string to a JSON object, which allows us to use the performant JSON parser included in Node.js or the&nbsp;browser.</p><pre>let read_events filename =<br/> (* Read and parse the json file from disk, this doesn't involve atdgen. *)<br/> let json =<br/> Node_fs.readFileAsUtf8Sync filename<br/> |&gt; Js.Json.parseExn<br/> in<br/> (* Turn it into a proper record. The annotation is of course optional. *)<br/> let events: Meetup_t.events =<br/> Atdgen_codec_runtime.Decode.decode Meetup_bs.read_events json<br/> in<br/> events</pre><p>The reverse operation, converting a record to a JSON object and writing it to a file, is also straightforward.</p><pre>let write_events filename events =<br/> Atdgen_codec_runtime.Encode.encode Meetup_bs.write_events events (* turn a list of records into json *)<br/> |. 
Js.Json.stringifyWithSpace 2 (* convert the json to a pretty string *)<br/> |&gt; Node_fs.writeFileAsUtf8Sync filename (* write the json in our file *)</pre><h3>Full example</h3><p>Now that we have our functions to read and write events, we can build a small cli to pretty print the list of events and add new&nbsp;events.</p><p>The source code of the full example is available <a href="https://github.com/ahrefs/bs-atdgen-codec-runtime/tree/master/example">on&nbsp;github</a>.</p><p>You can run it like&nbsp;this:</p><pre>$ echo &quot;[]&quot; &gt; events.json<br/>$ nodejs src/cli.bs.js add louis <a href="mailto:louis@nospam.com">louis@nospam.com</a><br/>$ nodejs src/cli.bs.js add bob <a href="mailto:bob@nospam.com">bob@nospam.com</a><br/>$ nodejs src/cli.bs.js print<br/>=== OCaml/Reason Meetup! summary ===<br/>date: Tue, 11 Sep 2018 15:04:16 GMT<br/>access: public<br/>host: bob &lt;<a href="mailto:bob@nospam.com">bob@nospam.com</a>&gt;<br/>guests: 1<br/>=== OCaml/Reason Meetup! summary ===<br/>date: Tue, 11 Sep 2018 15:04:13 GMT<br/>access: public<br/>host: louis &lt;<a href="mailto:louis@nospam.com">louis@nospam.com</a>&gt;<br/>guests: 1<br/>$ cat events.json<br/>[<br/> {<br/> &quot;guests&quot;: [<br/> {<br/> &quot;email&quot;: &quot;<a href="mailto:bob@nospam.com">bob@nospam.com</a>&quot;,<br/> &quot;name&quot;: &quot;bob&quot;<br/> }<br/> ],<br/> &quot;date&quot;: 1536678256177,<br/> &quot;host&quot;: {<br/> &quot;email&quot;: &quot;<a href="mailto:bob@nospam.com">bob@nospam.com</a>&quot;,<br/> &quot;name&quot;: &quot;bob&quot;<br/> },<br/> &quot;name&quot;: &quot;OCaml/Reason Meetup!&quot;,<br/> &quot;access&quot;: &quot;Public&quot;<br/> },<br/> {<br/> &quot;guests&quot;: [<br/> {<br/> &quot;email&quot;: &quot;<a href="mailto:louis@nospam.com">louis@nospam.com</a>&quot;,<br/> &quot;name&quot;: &quot;louis&quot;<br/> }<br/> ],<br/> &quot;date&quot;: 1536678253790,<br/> &quot;host&quot;: {<br/> &quot;email&quot;: &quot;<a 
href="mailto:louis@nospam.com">louis@nospam.com</a>&quot;,<br/> &quot;name&quot;: &quot;louis&quot;<br/> },<br/> &quot;name&quot;: &quot;OCaml/Reason Meetup!&quot;,<br/> &quot;access&quot;: &quot;Public&quot;<br/> }<br/>]</pre><img src="https://medium.com/_/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=1f3a14004081" width="1" height="1" alt=""/><hr/><p><a href="https://tech.ahrefs.com/getting-started-with-atdgen-and-bucklescript-1f3a14004081">Getting started with atdgen and bucklescript</a> was originally published in <a href="https://tech.ahrefs.com">Ahrefs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>https://tech.ahrefs.com/getting-started-with-atdgen-and-bucklescript-1f3a14004081?source=rss----303662d88bae--ocamlGetting started with atdgen and bucklescript2018-09-12T02:53:58-00:00ahrefshttps://medium.com/feed/ahrefs/tagged/ocamlahrefs<blockquote>It was a dark and stormy night; the skylake CPU buzzed with excitement, and then, suddenly, the hyperthreads started to lock&nbsp;up..</blockquote><p>Or something like&nbsp;that.</p><p>This week a new erratum for the Intel Skylake and Kabylake processors families was brought to public attention on <a href="https://lists.debian.org/debian-devel/2017/06/msg00308.html">the Debian mailing list</a>, and then on <a href="https://news.ycombinator.com/item?id=14630183">various</a> <a href="https://www.reddit.com/r/programming/comments/6jfgfp/warning_intel_skylakekaby_lake_processors_broken/">social media</a> and <a href="http://www.theregister.co.uk/2017/06/25/intel_skylake_kaby_lake_hyperthreading/">news&nbsp;outlets</a>.</p><p>We have been investigating this issue since January with the core <a href="http://ocaml.org">OCaml</a> team, as we were struggling with a mysterious bug affecting our developers machines, and ultimately our production system, resulting in a corruption of important data in our databases.</p><p>At <a 
href="https://ahrefs.com">Ahrefs</a>, we operate a fleet of thousands of servers, running a wide variety of services (huge web crawler among others). At this scale, dealing with unexpected application behaviors is common. While we try to reduce the probability of the software not functioning as expected, bugs are sadly a real part of our everyday life. Even though we can assume the underlying hardware running any infrastructure can be thought of as more reliable and less prone to bugs than software components, issues can still arise in unexpected ways. When the number of servers increases, it is not unusual to observe faults in the hardware preventing the system from functioning as specified.</p><p>It is certainly not frequent to encounter such problems in CPUs but reading through <a href="https://www3.intel.com/content/dam/www/public/us/en/documents/specification-updates/desktop-6th-gen-core-family-spec-update.pdf">the list of errata published by any manufacturer,</a> each CPU model contains a fair amount of bugs. This story is about the bug in the microcode of Skylake processor leading to incorrect code execution under certain conditions. This is certainly scary at first sight: how can we trust our system if we cannot trust its main component&nbsp;? Yet, like software bugs, processor defects can be identified, contained, and we can take actions to prevent them from impacting the operation of the infrastructure.</p><p>We do not know the full implications of this particular bug, especially security implications in case of untrusted code execution. But we&rsquo;d like to tell the story of this erratum from our point of view, to provide some context, and show that dealing with it was not much different than dealing with any usual software flaw. 
While this post aims to cover our own perspective on this adventure, we would like to thank Mark Shinwell, Xavier Leroy, Fr&eacute;d&eacute;ric Bour, everyone involved in the <a href="https://caml.inria.fr/mantis/view.php?id=7452">Mantis issue</a> and the OCaml IRC channel for their help and time spent investigating with us. Update: Xavier Leroy told his own side of the story in another <a href="http://gallium.inria.fr/blog/intel-skylake-bug/">blog&nbsp;post.</a></p><h3>Setting the&nbsp;scene</h3><p>Our story starts in late 2016 after some of our backend developers received new laptops to work on. After a few days Enguerrand Decorne noticed unusual crashes during compilation of our OCaml codebase.</p><p>This issue, considered mildly annoying at first, seemed to affect only Enguerrand&rsquo;s machine. For a few days no other machine would exhibit the same behavior, so we figured this was a fault specific to his system configuration.</p><p>However, concerns were subsequently raised after witnessing the generation of invalid machine code and later on, after the deployment of a service on one of our new clusters composed of Skylake Xeon processors, leading to the insertion of corrupted data into our storage system. The priority raised from the annoying level, to potentially critical. Other developers started working together to obtain more information and assess the impact on our infrastructure. Soon after we were able to reproduce the issue on several machines.</p><p>The remainder of this post is a technical description of the steps taken to ensure that our systems were operating safely. 
It is intended to show that such low-level CPU issues are not necessarily fatal: in less than two weeks, with the great help of core OCaml developers, we identified the conditions of the crash and set up a workaround.</p><h3>Tracking down crashes in&nbsp;OCaml</h3><p>Most of our backend code is written in <a href="https://ocaml.org">OCaml</a>, a high-level and expressive language supporting functional programming style (among others), which allows us to develop robust systems with ease, thanks to its strong type system and mature&nbsp;legacy.</p><p>The compiler segfaults were definitely a surprise, since this shouldn&rsquo;t happen for any program written in OCaml, as the type system and other features (such as automatic bounds-checking) usually guard us from such errors. However, stack overflows are a possible source of segfaults (when a non-optimal recursion runs too deep), so our first intuition was to increase the stack size when running the compiler. This didn&rsquo;t change anything, and the reported fault address wasn&rsquo;t anywhere near the stack address&nbsp;bound.</p><p>Before witnessing the crash on other machines, we suspected a failure in the virtualization software used by the two developers who were able to reproduce the crash, both of whom use VMware as a part of their development workflow. We tried early on to switch to Virtualbox, but the migration proved fruitless as the crashes kept appearing. After a short while we began encountering the same issue on physical machines, so we ruled out a possible virtualization software&nbsp;bug.</p><p>The usual debugging process for crashing OCaml code didn&rsquo;t prove effective; we needed to narrow down our approach.</p><p>OCaml ships with <a href="https://realworldocaml.org/v1/en/html/the-compiler-backend-byte-code-and-native-code.html">two backend implementations</a>: a bytecode interpreter and a native compiler. 
We were able to reproduce the issue using both a native compiler and a compiler running on the bytecode interpreter. Consequently, this ruled out a miscompilation coming from the code <em>emitted</em> by the compiler, the OCaml runtime <em>itself</em> was misbehaving.</p><p>The runtime code is written in C, and implements low level functionalities, including the garbage collector used by both backends. After rebuilding the runtime with debug symbols, we were able to retrieve a proper stack trace and core dump. The stack trace pointed to the garbage collector&rsquo;s mark phase. OCaml&rsquo;s GC is a classic generational mark and sweep collector. The mark phase walks the heap starting from pointers on the stack and other registered root values, and marks every reachable block of&nbsp;memory.</p><p>Further inspection with <strong><em>gdb</em></strong> of the frame and address of crash revealed that the marking code encountered a corrupted block header with invalid size information, causing what looked like a buffer overrun error. Each memory block allocated in OCaml heap begins with a header word, storing metadata used by the GC, including a tag describing the kind of value present in this memory block. The header contains the size of the block, and the crash happened when the mark code was attempting to scan an array which was supposed to be more than 1TB&nbsp;large.</p><p>This was obviously not the cause of the problem but rather the consequence: something corrupted the header word after this block had been properly allocated, postponing the crash until the next GC cycle. 
It was the right time to escalate <a href="https://caml.inria.fr/mantis/view.php?id=7452">the issue to the OCaml bugtracker</a>, after isolating a proper test case to reproduce the&nbsp;issue.</p><h3>A set of strange&nbsp;leads</h3><p>Escalating the issue to Mantis made us take a step back and gather our findings, and we quickly got great feedback from the OCaml core&nbsp;team.</p><p>At this point, what did the problem look&nbsp;like?</p><p>We only had sparse information, but <strong><em>dmesg</em></strong> gave us an interesting data point. When a page fault occurs and the kernel detects an incorrect memory access, it logs a line in the kernel log buffer containing the fault address, the instruction pointer and the stack&nbsp;pointer.</p><p>[22985.879907] ocamlopt.opt[48221]: segfault at af8 ip 00005564455169bd sp 00007ffc9f36b130 error 4 in ocamlopt.opt[556445006000+613000]</p><p>Next to the 3 addresses, already available in the coredumps, an error code is reported. This number in decimal form is actually a bitset, and the flags are documented in the Linux kernel sources in <a href="https://github.com/torvalds/linux/blob/v4.11/arch/x86/mm/fault.c#L41">arch/x86/mm/fault.c</a>. Error 4 (only the user-mode bit set) can thus be read as a read access page fault from user mode, trying to read memory which had not been previously mmap&rsquo;ed.</p><p>Error codes reported following our crashes involved protection faults or access to unmapped addresses, which corroborated our earlier buffer overrun hypothesis. More interestingly, we witnessed a crash with the PF_RSVD flag enabled. This left us puzzled: none of us had ever seen such a fault before. Apparently it indicates that the page table was somehow corrupted, with some entries having non-zero bits reserved by the x86 architecture specification.</p><p>It was scary that the corruption would escape the process address space, and to our limited knowledge, it could only have been caused by a kernel issue or potentially hardware issues, like memory errors. 
Yet we were able to reproduce this on several machines with different kernel versions and different hardware. We had blamed virtual machines earlier but this theory was debunked already. We still have no explanation at this time, and pursuit on this front would require intimate knowledge of virtual memory implementations that we didn&rsquo;t&nbsp;have.</p><p>One developer wasn&rsquo;t able to reproduce the problem at all on his machine after hours of testing, but something was fishy: it didn&rsquo;t sound right that an OCaml runtime bug would be able to modify the page table. Maybe it was some corner case with reserved addresses, but this was beyond our reach here. Out of ideas, it was time to get some assistance from tools intended to track memory corruptions, like <a href="https://github.com/google/sanitizers/wiki/AddressSanitizer">asan and&nbsp;ubsan</a>.</p><p>Running <strong><em>Asan</em></strong> didn&rsquo;t yield any meaningful results. <strong><em>Valgrind</em></strong> was later tried, following advice from the OCaml team, but every tool we tried prevented the crash from occurring. Quickly reproducing the bug for testing required running code in a loop, keeping the CPU and memory fully&nbsp;busy.</p><p>This was harder to do on developers&rsquo; machines, due to limited resources and other processes running, and Address Sanitizer would only increase the resource usage. Dedicating a powerful server would make further investigations more comfortable, and increase the likelihood of reproducing with instrumented code.</p><p>But to our great surprise, it was not possible to reproduce the problem on a server machine, with or without instrumented code. 
This is when we realised that all the machines exhibiting the crashes were running a processor of the Intel Skylake processor family, while the server and other developer machines had CPUs from the Broadwell family.</p><h3>The hardware, an unusual&nbsp;suspect</h3><p>In the meantime several core OCaml developers had been closely investigating the issue and started auditing recent changes in the runtime, and identified a few suspicious changes and known&nbsp;bugs.</p><p>Certainly they were more qualified for this task, but it acted as an incentive to examine the history of this bug from our angle. At first, we had assumed that the bug was specific to the new laptop with virtual machines. This could not explain why the crash never manifested on older workstations equipped with Skylake processors. Several other developers had been using them for a few months, and only noticed the crash after awareness of the issue had been raised by Enguerrand.</p><p>What had changed, besides Skylake? Only a few weeks before, an internal migration from OCaml version 4.02.3 to 4.03.0 was rolled out in our codebase. Intrigued, we went ahead and tested OCaml 4.02.3 again, which showed no memory corruptions after several tests. It was time to browse the <a href="https://raw.githubusercontent.com/ocaml/ocaml/trunk/Changes">OCaml changelog</a> for runtime-related entries. The search stopped quickly on a promising item in the list: the OCaml C runtime build optimisation level had been increased to -O2 from&nbsp;-O1.</p><p>Could the optimizations dig out an undefined behavior in C code, leading to bad assumptions in the GC code corrupting the heap? Rebuilding the runtime with -O1 did not corrupt memory, so the source of the corruption was in the runtime <em>and</em> was triggered by some gcc-specific optimization pass. 
This sounded like undefined behavior, although the information we had pointed towards a hardware&nbsp;bug.</p><p>The next day, Xavier Leroy commented on the bug report, noting that the crash had been observed in the past. Another industrial OCaml user was affected, and they had discovered HyperThreading was part of the necessary conditions. After running the test case for several hours on several machines with HT disabled in the UEFI setup, it was clear we were facing a similar situation. This led to the hypothesis of a hardware&nbsp;bug:</p><blockquote><em>Is it crazy to imagine that gcc -O2 on the OCaml 4.03 runtime produces a specific instruction sequence that causes hardware issues in (some steppings of) Skylake processors with hyperthreading? Perhaps it is&nbsp;crazy.</em></blockquote><p>This possibility had struck us too, motivated by the HyperThreading, the page table corruption and the Skylake-specific set of conditions.</p><p>This issue certainly had a strange profile. But nobody was ready to fully embrace the CPU bug hypothesis yet. We convinced ourselves that disabling HT could affect cache pressure and unfold some undefined behaviours.</p><p>HT could also explain the non-determinism, since cache pressure would depend on timings and scheduling. None of us had sufficient experience in this area to assess the strength of such a hypothesis, and we did not quite buy it on a single-threaded OCaml program. Our debugging motto claims that &ldquo;assumptions are not&nbsp;facts&rdquo;.</p><p>It was time to browse Intel&rsquo;s errata list and attempt to update the CPU microcode. Although the errata descriptions are formulated in vague terms, none of the issues disclosed at that time looked similar to the situation under investigation. Unfortunately, the CPU microcode had no fix waiting for us either. 
OCaml developers investigated the errata list from their side but the lack of detailed information turned this into a fruitless and complex&nbsp;task.</p><p>In the absence of a better alternative, we focused our work on pinpointing the exact source of the crash as if it was a software bug, in the hope of either finding a code issue or ruling out this hypothesis while getting more detailed data. We needed a way to identify the problematic code and find a workaround. From our side, it was not only a matter of finding whether or not there was a bug in OCaml code, but more crucially we needed a guarantee on the quality of our generated code running critical services in production.</p><h3>Identifying the offending code</h3><p>The other OCaml user affected by this issue reported that they had solved the problem by switching to another C compiler. Building the runtime with clang instead of GCC would prevent the GC from crashing. They also suggested obtaining a diff of the generated assembly. Indeed, once built with clang, the runtime would not crash. But clang generates very different assembly from GCC and we did not have the resources to analyse several hundred thousand lines of&nbsp;changes.</p><p>If we could isolate the problematic C code, comparing the generated code would be easier. The problem had the form of a well-known&nbsp;nail:</p><ul><li>Around 50 C files composing the OCaml&nbsp;runtime,</li><li>There is a good state (when built with gcc&nbsp;-O1)</li><li>And a bad state (when built with gcc&nbsp;-O2)</li></ul><p>This nail comes with a precious hammer: bisection.</p><p>The bisection approach had a downside on this occasion. Any state can be labeled bad with certainty as soon as the test crashes, but we would need to wait several hours to be confident enough to trust a non-crashing test as a good data point. 
The reproducibility was not always consistent and a non-crashing state could be a false negative still waiting to trigger the conditions leading to the crash. A reduction of the search space was necessary.</p><p>All the coredumps we had showed that the fault was caused by a corrupted heap block header, and our testcase involved the compiler. The OCaml compiler is not 100% deterministic, and I/O primitives and the Unix environment in the runtime can affect timings and allocation patterns. But it sounded sensible to assume that the code corrupting a heap block header was also the code reading and writing those blocks: the major&nbsp;GC.</p><p>This hypothesis made bisecting fast: the first file we tried, <strong><em>major_gc.c</em></strong>, turned out to be the one. To make sure it was not a subtle issue in the linker reordering symbols or code blocks, we tried a few other files and confirmed that changing the optimization level of some other files alone made no difference.</p><p>But the generated code difference was still way too large. Bringing this topic up on the <a href="http://webchat.freenode.net/?channels=#ocaml">OCaml IRC</a> discussion channel led to some useful input. We were taught that gcc supports an attribute to enable specific optimizations at the function level, using __attribute__((optimize(&quot;options,...&quot;))). Following the same strategy, it was easy to trace the source of the malfunctioning code to the <strong><em>sweep_slice</em></strong> function, which implements the sweeping phase of the classic mark and sweep garbage collector for the old generation.</p><p>Ignoring the subtle details of incremental GC, the <strong><em>sweep_slice</em></strong> function is the last pass of a normal major collection cycle. 
It is responsible for scanning all blocks in the major heap, and returning unreachable blocks to the list of unallocated space.</p><p>The bulk of this function is a switch taking action for each block depending on its status:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/cfbc19fddccc5f2c1a76fcc802fae049/href">https://medium.com/media/cfbc19fddccc5f2c1a76fcc802fae049/href</a></iframe><p>This finding felt consistent with the information at hand. When the block is reachable, the color (describing the reachability status of the block) is reset. If the block became unreachable (<em>white color</em>), it is reclaimed. In both cases, the block header is modified.</p><p>Getting back to the assembly&nbsp;diff.</p><p>Nobody in the team knows a great deal about assembly and we only have a really basic understanding of most of the instructions used in both versions. It quickly became obvious that the noise level in this diff, with thousands of changed lines, was still too high for us to spot anything related to the problem. This problem was getting far beyond the common knowledge of everyone in the&nbsp;team.</p><p>But this still sounded like a day-to-day bug-tracking process. The less you know, the more careful you need to be, tackling the problem step by step. We stuck to the approach that had served us well until now: bisecting.</p><p>We went through the list of optimisation passes enabled by GCC at -O2. This is a fair number of optimisation passes and it would have been too time-consuming to try them one by one, given the time needed to trigger the crash. Yet we had a hint: a memory corruption was happening semi-randomly in the garbage collector. We were also keeping the undefined behaviour bug as a potential explanation. 
It was likely a pass which would change the structure of the code, reordering blocks and changing conditions.</p><p>After reading the description of all switches in the detailed gcc manual, the -ftree-* pass family looked promising. This set of transformations works on the <a href="https://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a> internal representation, a widespread intermediate language representation which has the benefit of being easy to read. They seem to make a huge impact on the generated assembly code, moving code blocks around and making assumptions on code invariants in order to move around, simplify or eliminate conditional checks altogether.</p><p>By looking at the output of those passes on the related source code, we narrowed down the list of transformations to a couple of interesting passes, one of them being -ftree-vrp, which stands for Value Range Propagation. This pass computes bounds for each name binding and propagates proofs that a value must lie in a given&nbsp;range.</p><p>It turned out most of the other passes depended on it for further optimisations. Even though the issue ended up not being a bad assumption about the range values, checking this pass proved to be worthwhile: enabling -ftree-vrp on the <strong><em>sweep_slice</em></strong> function while everything else was built with -O1 was enough to trigger a&nbsp;crash.</p><p>GCC provides very good diagnostics output, and after reading the manual we found the -fdump-tree-* switch to dump the SSA form before and after a specific pass. The output is designed to be read by a human and provides meaningful naming, with source code locations, alongside the ranges propagated by the VRP pass. 
We spent some time studying the output and matched the difference in SSA tree between the crashing and not crashing&nbsp;code.</p><p>Examining the bounds and invariants derived by gcc, it was clear that no wrong hypothesis was&nbsp;stated.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a5a311d0b5a4f890f9541f0aed91e73e/href">https://medium.com/media/a5a311d0b5a4f890f9541f0aed91e73e/href</a></iframe><p>The only meaningful observable change involves the suppression of rechecking the loop condition in the else branch of the <strong><em>sweep_slice</em></strong> function, after Value Range Propagation proved that the condition was invariant in this&nbsp;branch.</p><p>Often, reading the code carefully is the fastest way to find a bug. But after spending hours staring at the major GC code, it was clear enough that this check removal should not cause any semantic&nbsp;changes.</p><p>In this process, we identified a suspicious bit of code, where a signed long variable was promoted to unsigned according to C standard rules, which was changing the bounds derived by gcc, assuming it was always positive. But after some thinking we realised it made no difference at assembly level and although wrong, this assumption was not used anywhere.</p><p>We were now ready to rule out the possibility of a bug in OCaml runtime. It was still possible that GCC backend had a bug and was miscompiling this particular shape of code. And we were back at the assembly level again. After writing some awk formatting script to cleanup assembly and minimise noise in the diff (by renaming labels, detecting spurious code move, etc), and preventing inlining, we found a minimal assembly patch causing the&nbsp;crash.</p><p>There were only cosmetic differences. 
The test removal was propagated down to assembly and caused gcc to reorganise the layout of each switch case&nbsp;block.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/532ab4896b11177a96203fd0c81eaa58/href">https://medium.com/media/532ab4896b11177a96203fd0c81eaa58/href</a></iframe><p>Among those minor differences and changes of layout, we noticed one change that affected exactly the update of the reachable block header, which could have caused header corruption. In the unoptimised version, the updating code looked like&nbsp;this:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/38ca30a1decf86233084721c3803b1c4/href">https://medium.com/media/38ca30a1decf86233084721c3803b1c4/href</a></iframe><p>For some reason, the block pointer was spilled to the&nbsp;stack.</p><p>Perhaps naively, and because we had earlier put forward the hypothesis of HT impacting cache pressure, we spent a few hours staring at this code, checking whether we were missing something subtle that could affect the control flow of the whole function, or whether the stack location from which the pointer was reloaded could be corrupted.</p><p>Despite our lack of assembly knowledge, after spending several hours reading this tiny change, we became convinced that it made strictly no semantic difference. Reading the x86 manual carefully didn&rsquo;t give any hint of a subtle behavior that this change could trigger. Executing either of those two instruction sequences should give the exact same&nbsp;output.</p><h3>Mitigating the&nbsp;issue</h3><p>We were now quite certain it was a CPU&nbsp;bug.</p><p>The OCaml developers had reached the same conclusion, and were working on escalating the issue to Intel. 
After internal discussions we decided to keep this bug as low profile as possible since we were unsure about potential security implications, especially for JIT implementations.</p><p>Even though we had no confirmation at this point, nor any explanation of the cause of this bug, which was beyond our reach, we could take&nbsp;action.</p><p>The first step was to decide against getting any new Skylake-based servers until further announcement. We were left with several Skylake machines but we refrained from deploying any OCaml code on them. OCaml comes with a great package manager, <a href="https://opam.ocaml.org/">opam</a>, which supports compiler switches. Switches make it possible to set up a clean and distinct environment with specific packages and compiler configuration.</p><p>We patched our internal opam repository to distribute an unoptimised runtime to all developers and moved forward, waiting for further announcements.</p><p>This situation made us realise that microcode requires constant updates, just like any other software in the stack. We raised awareness on this topic in our devops team, and they took measures to ensure we could roll out updates to prod&nbsp;easily.</p><h3>Happy end</h3><p>In late May, the devops team noticed a <a href="http://metadata.ftp-master.debian.org/changelogs/non-free/i/intel-microcode/intel-microcode_3.20170511.1_changelog">debian package update for intel-microcode</a> containing the following change:</p><pre>Likely fix nightmare-level Skylake erratum SKL150. Fortunately,<br/>either this erratum is very-low-hitting, or gcc/clang/icc/msvc<br/>won&rsquo;t usually issue the affected opcode pattern and it ends up<br/>being rare.<br/>SKL150 &mdash; Short loops using both the AH/BH/CH/DH registers and<br/>the corresponding wide register *may* result in unpredictable<br/>system behavior. Requires both logical processors of the same<br/>core (i.e. 
sibling hyperthreads) to be active to trigger, as<br/>well as a &ldquo;complex set of micro-architectural conditions&rdquo;</pre><p>The erratum description immediately rang a bell as it matched the diff in the assembly we had observed. We tested the microcode update and confirmed it fixed the corruption.</p><p>Finally, our Skylake CPUs were feeling safe and OCaml compiler was&nbsp;happy.</p><p><a href="https://ahrefs.com"><em>Ahrefs</em></a><em> runs an internet-scale bot that crawls the whole Web 24/7. Our backend system is powered by a custom petabyte-scale distributed key-value storage implemented in OCaml (and some C++ and Rust). We are a small team and strongly believe in better technology leading to better solutions for real-world problems. We worship functional languages and static typing, extensively employ code generation and meta-programming, value code clarity and predictability, and are constantly seeking to automate repetitive tasks and eliminate boilerplate. And we are&nbsp;</em><a href="https://ahrefs.com/jobs"><em>hiring</em></a><em>!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=ab1ad2beddcd" width="1" height="1" alt=""/><hr/><p><a href="https://tech.ahrefs.com/skylake-bug-a-detective-story-ab1ad2beddcd">Skylake bug: a detective story</a> was originally published in <a href="https://tech.ahrefs.com">Ahrefs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>https://tech.ahrefs.com/skylake-bug-a-detective-story-ab1ad2beddcd?source=rss----303662d88bae--ocamlSkylake bug: a detective story2017-06-28T18:34:51-00:00ahrefs \ No newline at end of file diff --git a/data/planet/hannes.xml b/data/planet/hannes.xml new file mode 100644 index 0000000000..e0d4a25c81 --- /dev/null +++ b/data/planet/hannes.xml @@ -0,0 +1,1100 @@ + +https://hannes.robur.coop/atomhannes2023-05-02T14:42:52-00:00https://hannes.robur.coop/atomhannes<h2>Deploying 
MirageOS unikernels</h2> +<p>More than five years ago, I posted <a href="https://hannes.robur.coop/Posts/VMM">how to deploy MirageOS unikernels</a>. My motivation to work on this topic is that I'm convinced of the reduced complexity, improved security, and more sustainable resource footprint of MirageOS unikernels, and want to ease deployment thereof. More than one year ago, I described <a href="https://hannes.robur.coop/Posts/Deploy">how to deploy reproducible unikernels</a>.</p> +<h2>Albatross</h2> +<p>In recent months we worked hard on the underlying infrastructure: <a href="https://github.com/roburio/albatross">albatross</a>. Albatross is the orchestration system for MirageOS unikernels that use solo5 with <a href="https://github.com/Solo5/solo5/blob/master/docs/architecture.md">hvt or spt tender</a>. It deals with three tasks:</p> +<ul> +<li>unikernel creation (destruction, restart) +</li> +<li>capturing console output +</li> +<li>collecting metrics in the host system about unikernels +</li> +</ul> +<p>In addition to the above, it deals with multiple tenants on the same machine: remote management of your unikernel fleet via TLS, and resource policies.</p> +<h2>History</h2> +<p>The initial commit of albatross was in May 2017. Back then it replaced the shell scripts and manual <code>scp</code> of unikernel images to the server. Over time it evolved and adapted to new environments. Initially a solo5 unikernel would only know of a single network interface; these days there can be multiple, distinguished by name. Initially there was no support for block devices. Only FreeBSD was supported in the early days. Nowadays we build daily packages for Debian, Ubuntu, FreeBSD, and have support for NixOS, and the client side is supported on macOS as well.</p> +<h3>ASN.1</h3> +<p>The communication format between the albatross daemons and clients was changed multiple times. 
I'm glad that albatross uses ASN.1 as communication format, which makes extension with optional fields easy, and also allows &quot;choice&quot; (the sum type) to be untagged (the binary encoding is the same as without the choice type); thus adding a choice to an existing grammar while preserving the old encoding in the default (untagged) case is a decent solution.</p> +<p>So, if you care about backward and forward compatibility, as we do, since we may be in control of which albatross servers are deployed on our machine, but not what albatross versions the clients are using -- it may be wise to look into ASN.1. Recent efforts (json with schema, ...) may solve similar issues, but ASN.1 is also very small in size.</p> +<h2>What resources does a unikernel need?</h2> +<p>A unikernel is just an operating system for a single service, there can't be much it can need.</p> +<h3>Name</h3> +<p>So, first of all a unikernel has a name, or a handle. This is useful for reporting statistics, but also to specify which console output you're interested in. The name is a string with printable ASCII characters (and dash '-' and dot '.'), with a length up to 64 characters - so yes, you can use a UUID if you like.</p> +<h3>Memory</h3> +<p>Another resource is the amount of memory assigned to the unikernel. This is specified in megabytes (as solo5 does), with the range being 10 (below not even a hello world wants to start) to 1024.</p> +<h3>Arguments</h3> +<p>Of course, you can pass boot parameters to the unikernel via albatross. Albatross doesn't impose any restrictions here, but the lower levels may.</p> +<h3>CPU</h3> +<p>Due to multiple tenants, and side channel attacks, it looked right at the beginning like a good idea to restrict each unikernel to a specific CPU. This way, one tenant may use CPU 5, and another CPU 9 - and they'll not starve each other (best to make sure that these CPUs are in different packages). 
So, albatross takes a number as the CPU, and executes the solo5 tender within <code>taskset</code>/<code>cpuset</code>.</p> +<h3>Fail behaviour</h3> +<p>In normal operations, exceptional behaviour may occur. I have to admit that I've seen MirageOS unikernels that suffer from not freeing all the memory they have allocated. To avoid having to get up at 4 AM just to start the unikernel that went out of memory, there's the possibility to restart the unikernel when it exited. You can even specify on which exit codes it should be restarted (the exit code is the only piece of information we have from the outside what caused the exit). This feature was implemented in October 2019, and has been very precious since then. :)</p> +<h3>Network</h3> +<p>This becomes a bit more complex: a MirageOS unikernel can have network interfaces, and solo5 specifies a so-called manifest with a list of these (name and type, and type is so far always basic). Then, on the actual server there are bridges (virtual switches) configured. Now, these may have the same name, or may need to be mapped. And of course, the unikernel expects a tap interface that is connected to such a bridge, not the bridge itself. Thus, albatross creates tap devices, attaches these to the respective bridges, and takes care about cleaning them up on teardown. The albatross client verifies that for each network interface in the manifest, there is a command-line argument specified (<code>--net service:my_bridge</code> or just <code>--net service</code> if the bridge is named service). The tap interface name is not really of interest to the user, and will not be exposed.</p> +<h3>Block devices</h3> +<p>On the host system, it's just a file, and passed to the unikernel. There's the need to be able to create one, dump it, and ensure that each file is only used by one unikernel. 
That's all that is there.</p> +<h2>Metrics</h2> +<p>Everyone likes graphs, over time, showing how much traffic or CPU or memory or whatever has been used by your service. Some of these statistics are only available in the host system, and it is also crucial for development purposes to compare whether the bytes sent in the unikernel sum up to the same on the host system's tap interface.</p> +<p>The albatross-stats daemon collects metrics from three sources: network interfaces, getrusage (of a child process), VMM debug counters (to count VM exits etc.). Since the recent 1.5.3, albatross-stats connects at startup to the albatross-daemon, retrieves the information about which unikernels are up and running, and starts periodically collecting data in memory.</p> +<p>Other clients, be it a dump on your console window, a write into an rrd file (good old MRTG times), or a push to influx, can use the stats data to correlate and better analyse what is happening on the grand scale of things. This helped a lot when running several unikernels with different opam package sets to figure out which opam packages hold on to memory over time.</p> +<p>As a side note, if you make the unikernel name also available in the unikernel, it can tag its own metrics with the same identifier, and you can correlate high-level events (such as the number of HTTP requests) with low-level things such as &quot;allocated more memory&quot; or &quot;consumed a lot of CPU&quot;.</p> +<h2>Console</h2> +<p>There's not much to say about the console, just that the albatross-console daemon is running with low privileges, and reading from a FIFO that the unikernel writes to. It never writes anything to disk, but keeps the last 1000 lines in memory, available to a client asking for them.</p> +<h2>The daemons</h2> +<p>So, the main albatross-daemon runs with superuser privileges to create virtual machines, and opens a unix domain socket that the clients and other daemons connect to. 
The other daemons are executed with normal user privileges, and never write anything to disk.</p> +<p>The albatross-daemon keeps state about the running unikernels, and if it is restarted, the unikernels are started again. It may be worth mentioning that this sometimes led to headaches (data is dumped to disk, so the old format must always be supported), but it was also a huge relief not to have to re-create all the unikernels just because albatross-daemon was killed.</p> +<h2>Remote management</h2> +<p>There's one more daemon program, either albatross-tls-inetd (to be executed by inetd), or albatross-tls-endpoint. They accept clients via a remote TCP connection, and establish a mutually authenticated TLS handshake. When done, they forward the command to the respective Unix domain socket, and send back the reply.</p> +<p>The daemon itself has an X.509 certificate to authenticate, but the client is requested to show its certificate chain as well. This by now requires TLS 1.3, so the client certificates are sent over the encrypted channel.</p> +<p>Taking a step back: an X.509 certificate contains a public key and a signature from one level up. When the server knows the root (or certificate authority (CA)) certificate, it can follow the chain and verify that the leaf certificate is valid. Additionally, an X.509 certificate is an ASN.1 structure with some fixed fields, but also contains extensions, a key-value store where the keys are object identifiers, and the values are key-dependent data. Also note that this key-value store is cryptographically signed.</p> +<p>Albatross uses the object identifier assigned to Camelus Dromedarius (MirageOS - 1.3.6.1.4.1.49836.42) to encode the command to be executed. 
This means that once the TLS handshake is established, the command to be executed is already transferred.</p> +<p>In the leaf certificate, there may be the &quot;create unikernel&quot; command with the unikernel image, its boot parameters, and other resources. Or a &quot;read the console of my unikernel&quot; command. In the intermediate certificates (from root to leaf), resource policies are encoded (this path may only have X unikernels running with a total of Y MB memory, and Z MB of block storage, using CPUs A and B, accessing bridges C and D). From the root downwards these policies may only decrease. When a unikernel should be created (or other commands are executed), the policies are verified to hold. If they do not, an error is reported.</p> +<h2>Fleet management</h2> +<p>Of course it is perfectly fine to deploy your locally compiled unikernel to your albatross server, go for it. But in terms of &quot;what is actually running here?&quot; and &quot;does this unikernel need to be updated because some opam package had a security issue?&quot;, this is not optimal.</p> +<p>Since we provide <a href="https://builds.robur.coop">daily reproducible builds</a> with the current HEAD of the main opam-repository, and these unikernels have no configuration embedded (but take everything as boot parameters), we just deploy them. They come with the information about which opam packages contributed to the binary, which environment variables were set, and which system packages were installed with which versions.</p> +<p>The whole result of reproducible builds for us means: we have a hash of a unikernel image that we can look up in our build infrastructure, and check whether there is a newer image for the same job. And if there is, we provide a diff between the packages that contributed to the currently running unikernel and the new image. 
That is what the albatross-client update command is all about.</p> +<p>Of course, your mileage may vary and you may want automated deployments where each git commit triggers recompilation and redeployment. The downside would be that sometimes only dependencies are updated and you have to cope with that.</p> +<p>At the moment, there is a client connecting directly to the unix domain sockets, <code>albatross-client-local</code>, and one connecting to the TLS endpoint, <code>albatross-client-bistro</code>. The latter applies compression to the unikernel image.</p> +<h2>Installation</h2> +<p>For Debian and Ubuntu systems, we provide package repositories. Browse the dists folder for one matching your distribution, and add it to <code>/etc/apt/sources.list</code>:</p> +<pre><code>$ wget -q -O /etc/apt/trusted.gpg.d/apt.robur.coop.gpg https://apt.robur.coop/gpg.pub +$ echo &quot;deb https://apt.robur.coop ubuntu-20.04 main&quot; &gt;&gt; /etc/apt/sources.list # replace ubuntu-20.04 with e.g. debian-11 on a debian bullseye machine +$ apt update +$ apt install solo5 albatross +</code></pre> +<p>On FreeBSD:</p> +<pre><code>$ fetch -o /usr/local/etc/pkg/robur.pub https://pkg.robur.coop/repo.pub # download RSA public key +$ echo 'robur: { + url: &quot;https://pkg.robur.coop/${ABI}&quot;, + mirror_type: &quot;srv&quot;, + signature_type: &quot;pubkey&quot;, + pubkey: &quot;/usr/local/etc/pkg/robur.pub&quot;, + enabled: yes +}' &gt; /usr/local/etc/pkg/repos/robur.conf # Check https://pkg.robur.coop which ABI are available +$ pkg update +$ pkg install solo5 albatross +</code></pre> +<p>For other distributions and systems we do not (yet?) provide binary packages. You can compile and install them using opam (<code>opam install solo5 albatross</code>). Get in touch if you're keen on adding some other distribution to our reproducible build infrastructure.</p> +<h2>Conclusion</h2> +<p>After five years of development and operating albatross, feel free to get it and try it out. 
Or read the code, discuss issues and shortcomings with us - either at the issue tracker or via eMail.</p> +<p>Please reach out to us (at team AT robur DOT coop) if you have feedback and suggestions. We are a non-profit company, and rely on <a href="https://robur.coop/Donate">donations</a> for doing our work - everyone can contribute.</p> +https://hannes.robur.coop/Posts/AlbatrossDeploying reproducible unikernels with albatross2022-11-17T12:41:11-00:00hanneshttps://hannes.robur.coop/atomhannes<p>We at <a href="https://robur.coop">robur</a> developed <a href="https://git.robur.io/robur/opam-mirror">opam-mirror</a> in the last month and run a public opam mirror at https://opam.robur.coop (updated hourly).</p> +<h1>What is opam and why should I care?</h1> +<p><a href="https://opam.ocaml.org">Opam</a> is the OCaml package manager (also used by other projects such as <a href="https://coq.inria.fr">coq</a>). It is a source based system: the so-called repository contains the metadata (url to source tarballs, build dependencies, author, homepage, development repository) of all packages. The main repository is hosted on GitHub as <a href="https://github.com/ocaml/opam-repository">ocaml/opam-repository</a>, where authors of OCaml software can contribute (as pull request) their latest releases.</p> +<p>When opening a pull request, automated systems attempt to build not only the newly released package on various platforms and OCaml versions, but also all reverse dependencies, and also with dependencies with the lowest allowed version numbers. 
That's crucial since semantic versioning has not been adopted across the OCaml ecosystem (which is tricky: for example, due to local opens any newly introduced binding will lead to a major version bump), nor do many people add upper bounds on dependencies when releasing a package (nobody is keen to state &quot;my package will not work with <a href="https://erratique.ch/software/cmdliner">cmdliner</a> in version 1.2.0&quot;).</p> +<p>So, the opam-repository holds the metadata of lots of OCaml packages (around 4000 at the moment this article was written) with lots of versions (in total 25000) that have been released. It is used by the opam client to figure out which packages to install or upgrade (using a solver that takes the version bounds into consideration).</p> +<p>Of course, opam can use other repositories (overlays) or forks thereof. So nothing stops you from using any other opam repository. The url to the source code of each package may point to a tarball, a git repository, or another version control system.</p> +<p>The vast majority of opam packages released to the opam-repository include a link to the source tarball and a cryptographic hash of the tarball. This is crucial for security (under the assumption the opam-repository has been downloaded from a trustworthy source - check back later this year for updates on <a href="https://hannes.robur.coop/Posts/Conex">conex</a>). At the moment, there are some weak spots with respect to security: md5 is still allowed, and the hash and the tarball are downloaded from the same server: anyone who is in control of that server can inject arbitrary malicious data. As outlined above, we're working on infrastructure which fixes the latter issue.</p> +<h1>How does the opam client work?</h1> +<p>Opam, after initialisation, downloads the <code>index.tar.gz</code> from <code>https://opam.ocaml.org/index.tar.gz</code>, and uses this as the local opam universe. 
An <code>opam install cmdliner</code> will resolve the dependencies, and download all required tarballs. The download is first tried from the cache, and if that fails, the URL in the package file is used. The download from the cache uses the base url, appends the archive-mirror, followed by the hash algorithm, the first two characters of the hash of the tarball, and the hex encoded hash of the archive, e.g. for cmdliner 1.1.1 which specifies its sha512: <code>https://opam.ocaml.org/cache/sha512/54/5478ad833da254b5587b3746e3a8493e66e867a081ac0f653a901cc8a7d944f66e4387592215ce25d939be76f281c4785702f54d4a74b1700bc8838a62255c9e</code>.</p> +<h1>How does the opam repository work?</h1> +<p>According to DNS, opam.ocaml.org is a machine at Amazon. It likely, apart from the website, uses <code>opam admin index</code> periodically to create the index tarball and the cache. There's an observable delay between a package merge in the opam-repository and when it shows up at opam.ocaml.org. Recently, there was <a href="https://discuss.ocaml.org/t/opam-ocaml-org-is-currently-down-is-that-where-indices-are-kept-still/">a reported downtime</a>.</p> +<p>Apart from opam.ocaml.org being a single point of failure, if you're compiling a lot of opam projects (e.g. a continuous integration / continuous build system), it makes sense from a network usage (and thus sustainability) perspective to move the cache closer to where you need the source archives. 
We're also organising the MirageOS <a href="http://retreat.mirage.io">hack retreats</a> in a northern African country with poor connectivity - so if you gather two dozen camels you'd better bring your opam repository cache with you to reduce the bandwidth usage (NB: this requires at the moment cooperation of all participants to configure their default opam repository accordingly).</p> +<h1>Re-developing &quot;opam admin create&quot; as MirageOS unikernel</h1> +<p>Given the need for a local opam cache at our <a href="https://builds.robur.coop">reproducible build infrastructure</a> and the retreats, we decided to develop <a href="https://git.robur.io/robur/opam-mirror">opam-mirror</a> as a <a href="https://mirage.io">MirageOS unikernel</a>. Apart from a useful showcase using persistent storage (that won't fit into memory), and having fun while developing it, our aim was to reduce our time spent on system administration (the <code>opam admin index</code> is only one part of the story, it needs a Unix system and a webserver next to it - plus remote access for doing software updates - which has quite some attack surface).</p> +<p>Another reason for re-developing the functionality was that the opam code (what opam admin index actually does) is part of the opam source code, which totals 50_000 lines of code -- looking up whether one or all checksums are verified before adding the tarball to the cache was rather tricky.</p> +<p>In earlier years, we avoided persistent storage and block devices in MirageOS (by embedding it into the source code with <a href="https://github.com/mirage/ocaml-crunch">crunch</a>, or using a remote git repository), but recent development, e.g. of <a href="https://somerandomidiot.com/blog/2022/03/04/chamelon/">chamelon</a> sparked some interest in actually using file systems and figuring out whether MirageOS is ready in that area. 
A month ago we started the opam-mirror project.</p> +<p>Opam-mirror takes a remote repository URL, and downloads all referenced archives. It serves as a cache and opam-repository - and does periodic updates from the remote repository. The idea is to validate all available checksums and store the tarballs only once, and store overlays (as maps) from the other hash algorithms.</p> +<h1>Code development and improvements</h1> +<p>Initially, our plan was to use <a href="https://github.com/mirage/ocaml-git">ocaml-git</a> for pulling the repository, <a href="https://github.com/yomimono/chamelon">chamelon</a> for persistent storage, and <a href="https://github.com/inhabitedtype/httpaf">httpaf</a> as web server. With <a href="https://github.com/mirage/ocaml-tar">ocaml-tar</a>'s recent support of <a href="https://github.com/mirage/ocaml-tar/pull/88">gzip</a> we should be all set, and done within a few days.</p> +<p>There is already a gap in the above plan: which http client to use - in the best case something similar to our <a href="https://github.com/roburio/http-lwt-client">http-lwt-client</a> - in MirageOS: it should support HTTP 1.1 and HTTP 2, TLS (with certificate validation), and use <a href="https://github.com/roburio/happy-eyeballs">happy-eyeballs</a> to seamlessly support both IPv6 and legacy IPv4. Of course it should follow redirects; without that we won't get far in the current Internet.</p> +<p>Along the way (over the last month), we fixed file descriptor leaks (memory leaks) in <a href="https://github.com/dinosaure/paf-le-chien">paf</a> -- which is used as a runtime for httpaf and h2.</p> +<p>Then we ran into some trouble with chamelon (<a href="https://github.com/yomimono/chamelon/issues/11">out of memory</a>, some degraded performance, it reporting out of disk space), and re-thought our demands for opam-mirror. Since the cache is only ever growing (new packages are released), there's no need to ever remove anything: it is append-only. 
Once we figured that out, we investigated what needs to be done in ocaml-tar (where tar is in fact a tape archive, and was initially designed as file format to be appended to) to support appending to an archive.</p> +<p>We also re-thought our bandwidth usage, and instead of cloning the git remote at startup, we developed <a href="https://git.robur.io/robur/git-kv">git-kv</a> which can dump and restore the git state.</p> +<p>Also, initially we computed all hashes of all tarballs, but with the size increasing (all archives are around 7.5GB) this led to a major issue of startup time (around 5 minutes on a laptop), so we wanted to save and restore the maps as well.</p> +<p>Neither the git state nor the maps are suitable for tar's append-only semantics, and we didn't want to investigate yet another file system (<a href="https://github.com/mirage/ocaml-fat">fat</a> may just work fine, but the code looks slightly bitrotten, and the reported issues and inactivity don't make this package very trustworthy from our point of view). Instead, we developed <a href="https://github.com/reynir/mirage-block-partition">mirage-block-partition</a> to partition a block device into two. Then we just store the maps and the git state at the end - the end of a tar archive is 2 blocks of zeroes, so stuff at the far end isn't considered by any tooling. Extending the tar archive is also possible, only the maps and git state need to be moved to the end (or recomputed). As file system, we developed <a href="https://git.robur.io/reynir/oneffs">oneffs</a> which stores a single value on the block device.</p> +<p>We observed a high memory usage, since each requested archive was first read from the block device into memory, and then sent out. Thanks to Pierre Alain's <a href="https://github.com/mirage/mirage-kv/pull/28">recent enhancements</a> of the mirage-kv API, there is a <code>get_partial</code>, which we use to read the archive chunk-wise and send it via HTTP. 
Now, the memory usage is around 20MB (the git repository and the generated tarball are kept in memory).</p> +<p>What is next? Downloading and writing to the tar archive could be done chunk-wise as well; also dumping and restoring the git state is quite CPU intensive, and we would like to improve that. Adding the TLS frontend (currently done on our site by our TLS termination proxy <a href="https://github.com/roburio/tlstunnel">tlstunnel</a>) similar to how <a href="https://github.com/roburio/unipi">unipi</a> does it, including Let's Encrypt provisioning -- should be straightforward (drop us a note if you'd be interested in that feature).</p> +<h1>Conclusion</h1> +<p>To conclude, we managed within a month to develop this opam-mirror cache from scratch. It has a reasonable footprint (CPU and memory-wise), is easy to maintain and easy to update - if you want to use it, we also provide <a href="https://builds.robur.coop/job/opam-mirror">reproducible binaries</a> for solo5-hvt. You can use our opam mirror with <code>opam repository set-url default https://opam.robur.coop</code> (revert to the other with <code>opam repository set-url default https://opam.ocaml.org</code>) or use it as a backup with <code>opam repository add robur --rank 2 https://opam.robur.coop</code>.</p> +<p>Please reach out to us (at team AT robur DOT coop) if you have feedback and suggestions. We are a non-profit company, and rely on <a href="https://robur.coop/Donate">donations</a> for doing our work - everyone can contribute.</p> +https://hannes.robur.coop/Posts/OpamMirrorMirroring the opam repository and all tarballs2022-09-29T13:04:14-00:00hanneshttps://hannes.robur.coop/atomhannes<h1>Introduction to monitoring</h1> +<p>At <a href="https://robur.coop">robur</a> we use a range of MirageOS unikernels. Recently, we worked on improving the operations story thereof. One part is shipping binaries using our <a href="https://builds.robur.coop">reproducible builds infrastructure</a>. 
Another part is, once deployed we want to observe what is going on.</p> +<p>I first got in touch with monitoring - collecting and graphing metrics - with <a href="https://oss.oetiker.ch/mrtg/">MRTG</a> and <a href="https://munin-monitoring.org/">munin</a> - and the simple network management protocol <a href="https://en.wikipedia.org/wiki/Simple_Network_Management_Protocol">SNMP</a>. From the whole system perspective, I find it crucial that the monitoring part of a system does not add pressure. This favours a push-based design, where reporting is done at the disposition of the system.</p> +<p>The rise of monitoring where graphs are done dynamically (such as <a href="https://grafana.com/">Grafana</a>) and can be programmed (with a query language) by the operator is very neat: it allows putting metrics in relation after they have been recorded - thus if there's a thesis why something went berserk, you can graph the collected data from the past and prove or disprove the thesis.</p> +<h1>Monitoring a MirageOS unikernel</h1> +<p>From the operational perspective, taking security into account - either the data should be authenticated and integrity-protected, or be transmitted on a private network. We chose the latter: there's a private network interface only for monitoring. Access to that network is only granted to the unikernels and metrics collector.</p> +<p>For MirageOS unikernels, we use the <a href="https://github.com/mirage/metrics">metrics</a> library - whose design shares the idea of <a href="https://erratique.ch/software/logs">logs</a> that only if there's a reporter registered, work is performed. We use the Influx line protocol via TCP to report via <a href="https://www.influxdata.com/time-series-platform/telegraf/">Telegraf</a> to <a href="https://www.influxdata.com/">InfluxDB</a>. 
But due to the design of <a href="https://github.com/mirage/metrics">metrics</a>, other reporters can be developed and used -- prometheus, SNMP, your-other-favourite are all possible.</p> +<p>Apart from monitoring metrics, we use the same network interface for logging via syslog. Since the logs library separates the log message generation (in the OCaml libraries) from the reporting, we developed <a href="https://github.com/hannesm/logs-syslog">logs-syslog</a>, which registers a log reporter sending each log message to a syslog sink.</p> +<p>We developed a small library for metrics reporting of a MirageOS unikernel into the <a href="https://github.com/roburio/monitoring-experiments">monitoring-experiments</a> package - which also allows to dynamically adjust log level and disable or enable metrics sources.</p> +<h2>Required components</h2> +<p>Install from your operating system the packages providing telegraf, influxdb, and grafana.</p> +<p>Setup telegraf to contain a socket listener:</p> +<pre><code>[[inputs.socket_listener]] + service_address = &quot;tcp://192.168.42.14:8094&quot; + keep_alive_period = &quot;5m&quot; + data_format = &quot;influx&quot; +</code></pre> +<p>Use a unikernel that reports to Influx (below the heading &quot;Unikernels (with metrics reported to Influx)&quot; on <a href="https://builds.robur.coop">builds.robur.coop</a>) and provide <code>--monitor=192.168.42.14</code> as boot parameter. Conventionally, these unikernels expect a second network interface (on the &quot;management&quot; bridge) where telegraf (and a syslog sink) are running. You'll need to pass <code>--net=management</code> and <code>--arg='--management-ipv4=192.168.42.x/24'</code> to albatross-client-local.</p> +<p>Albatross provides a <code>albatross-influx</code> daemon that reports information from the host system about the unikernels to influx. 
Start it with <code>--influx=192.168.42.14</code>.</p> +<h2>Adding monitoring to your unikernel</h2> +<p>If you want to extend your own unikernel with metrics, follow along these lines.</p> +<p>An example is the <a href="https://github.com/roburio/dns-primary-git">dns-primary-git</a> unikernel, where on the branch <code>future</code> we have a single commit ahead of main that adds monitoring. The difference is in the unikernel configuration and the main entry point. See the <a href="https://builds.robur.coop/job/dns-primary-git-monitoring/build/latest/">binary builds</a> in contrast to the <a href="https://builds.robur.coop/job/dns-primary-git/build/latest/">non-monitoring builds</a>.</p> +<p>In config, four new command line arguments are added: <code>--monitor=IP</code>, <code>--monitor-adjust=PORT</code>, <code>--syslog=IP</code> and <code>--name=STRING</code>. In addition, the package <code>monitoring-experiments</code> is required. And a second network interface <code>management_stack</code> using the prefix <code>management</code> is required and passed to the unikernel. Since the syslog reporter requires a console (to report when logging fails), a console is passed to the unikernel as well. 
Each reported metrics includes a tag <code>vm=&lt;name&gt;</code> that can be used to distinguish several unikernels reporting to the same InfluxDB.</p> +<p>Command line arguments:</p> +<pre><code class="language-patch"> let doc = Key.Arg.info ~doc:&quot;The fingerprint of the TLS certificate.&quot; [ &quot;tls-cert-fingerprint&quot; ] in + Key.(create &quot;tls_cert_fingerprint&quot; Arg.(opt (some string) None doc)) + ++let monitor = ++ let doc = Key.Arg.info ~doc:&quot;monitor host IP&quot; [&quot;monitor&quot;] in ++ Key.(create &quot;monitor&quot; Arg.(opt (some ip_address) None doc)) ++ ++let monitor_adjust = ++ let doc = Key.Arg.info ~doc:&quot;adjust monitoring (log level, ..)&quot; [&quot;monitor-adjust&quot;] in ++ Key.(create &quot;monitor_adjust&quot; Arg.(opt (some int) None doc)) ++ ++let syslog = ++ let doc = Key.Arg.info ~doc:&quot;syslog host IP&quot; [&quot;syslog&quot;] in ++ Key.(create &quot;syslog&quot; Arg.(opt (some ip_address) None doc)) ++ ++let name = ++ let doc = Key.Arg.info ~doc:&quot;Name of the unikernel&quot; [&quot;name&quot;] in ++ Key.(create &quot;name&quot; Arg.(opt string &quot;ns.nqsb.io&quot; doc)) ++ + let mimic_impl random stackv4v6 mclock pclock time = + let tcpv4v6 = tcpv4v6_of_stackv4v6 $ stackv4v6 in + let mhappy_eyeballs = mimic_happy_eyeballs $ random $ time $ mclock $ pclock $ stackv4v6 in +</code></pre> +<p>Requiring <code>monitoring-experiments</code>, registering command line arguments:</p> +<pre><code class="language-patch"> package ~min:&quot;3.7.0&quot; ~max:&quot;3.8.0&quot; &quot;git-mirage&quot;; + package ~min:&quot;3.7.0&quot; &quot;git-paf&quot;; + package ~min:&quot;0.0.8&quot; ~sublibs:[&quot;mirage&quot;] &quot;paf&quot;; ++ package &quot;monitoring-experiments&quot;; ++ package ~sublibs:[&quot;mirage&quot;] ~min:&quot;0.3.0&quot; &quot;logs-syslog&quot;; + ] in + foreign +- ~keys:[Key.abstract remote_k ; Key.abstract axfr] ++ ~keys:[ ++ Key.abstract remote_k ; Key.abstract axfr ; ++ Key.abstract name 
; Key.abstract monitor ; Key.abstract monitor_adjust ; Key.abstract syslog ++ ] + ~packages +</code></pre> +<p>Added console and a second network stack to <code>foreign</code>:</p> +<pre><code class="language-patch"> &quot;Unikernel.Main&quot; +- (random @-&gt; pclock @-&gt; mclock @-&gt; time @-&gt; stackv4v6 @-&gt; mimic @-&gt; job) ++ (console @-&gt; random @-&gt; pclock @-&gt; mclock @-&gt; time @-&gt; stackv4v6 @-&gt; mimic @-&gt; stackv4v6 @-&gt; job) ++ +</code></pre> +<p>Passing a console implementation (<code>default_console</code>) and a second network stack (with <code>management</code> prefix) to <code>register</code>:</p> +<pre><code class="language-patch">+let management_stack = generic_stackv4v6 ~group:&quot;management&quot; (netif ~group:&quot;management&quot; &quot;management&quot;) + + let () = + register &quot;primary-git&quot; +- [dns_handler $ default_random $ default_posix_clock $ default_monotonic_clock $ +- default_time $ net $ mimic_impl] ++ [dns_handler $ default_console $ default_random $ default_posix_clock $ default_monotonic_clock $ ++ default_time $ net $ mimic_impl $ management_stack] +</code></pre> +<p>Now, in the unikernel module the functor changes (console and second network stack added):</p> +<pre><code class="language-patch">@@ -4,17 +4,48 @@ + + open Lwt.Infix + +-module Main (R : Mirage_random.S) (P : Mirage_clock.PCLOCK) (M : Mirage_clock.MCLOCK) (T : Mirage_time.S) (S : Mirage_stack.V4V6) (_ : sig e +nd) = struct ++module Main (C : Mirage_console.S) (R : Mirage_random.S) (P : Mirage_clock.PCLOCK) (M : Mirage_clock.MCLOCK) (T : Mirage_time.S) (S : Mirage +_stack.V4V6) (_ : sig end) (Management : Mirage_stack.V4V6) = struct + + module Store = Irmin_mirage_git.Mem.KV(Irmin.Contents.String) + module Sync = Irmin.Sync(Store) +</code></pre> +<p>And in the <code>start</code> function, the command line arguments are processed and used to setup syslog and metrics monitoring to the specified addresses. 
Also, a TCP listener waits for monitoring and logging adjustments if <code>--monitor-adjust</code> is provided:</p> +<pre><code class="language-patch"> module D = Dns_server_mirage.Make(P)(M)(T)(S) ++ module Monitoring = Monitoring_experiments.Make(T)(Management) ++ module Syslog = Logs_syslog_mirage.Udp(C)(P)(Management) + +- let start _rng _pclock _mclock _time s ctx = ++ let start c _rng _pclock _mclock _time s ctx management = ++ let hostname = Key_gen.name () in ++ (match Key_gen.syslog () with ++ | None -&gt; Logs.warn (fun m -&gt; m &quot;no syslog specified, dumping on stdout&quot;) ++ | Some ip -&gt; Logs.set_reporter (Syslog.create c management ip ~hostname ())); ++ (match Key_gen.monitor () with ++ | None -&gt; Logs.warn (fun m -&gt; m &quot;no monitor specified, not outputting statistics&quot;) ++ | Some ip -&gt; Monitoring.create ~hostname ?listen_port:(Key_gen.monitor_adjust ()) ip management); + connect_store ctx &gt;&gt;= fun (store, upstream) -&gt; + load_git None store upstream &gt;&gt;= function + | Error (`Msg msg) -&gt; +</code></pre> +<p>Once you have compiled the unikernel (or downloaded a binary with monitoring), start it by passing <code>--net:service=tap0</code> and <code>--net:management=tap10</code> (or whichever your <code>tap</code> interfaces are) to the tender, and as unikernel arguments <code>--ipv4=&lt;my-ip-address&gt;</code> and <code>--management-ipv4=192.168.42.2/24</code> for IPv4 configuration, <code>--monitor=192.168.42.14</code>, <code>--syslog=192.168.42.10</code>, <code>--name=my.unikernel</code>, <code>--monitor-adjust=12345</code>.</p> +<p>With this, your unikernel will report metrics using the influx protocol to 192.168.42.14 on port 8094 (every 10 seconds), and syslog messages via UDP to 192.168.42.10 (port 514). 
You should see your InfluxDB getting filled and syslog server receiving messages.</p> +<p>When you configure <a href="https://grafana.com/docs/grafana/latest/getting-started/getting-started-influxdb/">Grafana to use InfluxDB</a>, you'll be able to see the data in the data sources.</p> +<p>Please reach out to us (at team AT robur DOT coop) if you have feedback and suggestions.</p> +https://hannes.robur.coop/Posts/MonitoringAll your metrics belong to influx2022-03-08T11:26:31-00:00hanneshttps://hannes.robur.coop/atomhannes<h2>Introduction</h2> +<p>MirageOS development focus has been a lot on tooling and the developer experience, but to accomplish <a href="https://robur.coop">our</a> goal to &quot;get MirageOS into production&quot;, we need to lower the barrier. This means for us to release binary unikernels. As described <a href="https://hannes.robur.coop/Posts/NGI">earlier</a>, we received a grant for &quot;Deploying MirageOS&quot; from <a href="https://pointer.ngi.eu">NGI Pointer</a> to work on the required infrastructure. This is joint work with <a href="https://reynir.dk/">Reynir</a>.</p> +<p>We provide at <a href="https://builds.robur.coop">builds.robur.coop</a> binary unikernel images (and supplementary software). Doing binary releases of MirageOS unikernels is challenging in two aspects: firstly to be useful for everyone, a binary unikernel should not contain any configuration (such as private keys, certificates, etc.). Secondly, the binaries should be <a href="https://reproducible-builds.org">reproducible</a>. This is crucial for security; everyone can reproduce the exact same binary and verify that our build service did only use the sources. 
No malware or backdoors included.</p> +<p>This post describes how you can deploy MirageOS unikernels without compiling it from source, then dives into the two issues outlined above - configuration and reproducibility - and finally describes how to setup your own reproducible build infrastructure for MirageOS, and how to bootstrap it.</p> +<h2>Deploying MirageOS unikernels from binary</h2> +<p>To execute a MirageOS unikernel, apart from a hypervisor (Xen/KVM/Muen), a tender (responsible for allocating host system resources and passing these to the unikernel) is needed. Using virtio, this is conventionally done with qemu on Linux, but its code size (and attack surface) is huge. For MirageOS, we develop <a href="https://github.com/solo5/solo5">Solo5</a>, a minimal tender. It supports <em>hvt</em> - hardware virtualization (Linux KVM, FreeBSD BHyve, OpenBSD VMM), <em>spt</em> - sandboxed process (a tight seccomp ruleset (only a handful of system calls allowed, no hardware virtualization needed), Linux only). Apart from that, <a href="https://muen.sk"><em>muen</em></a> (a hypervisor developed in Ada), <em>virtio</em> (for some cloud deployments), and <em>xen</em> (PVHv2 or Qubes 4.0) - <a href="https://github.com/Solo5/solo5/blob/master/docs/building.md">read more</a>. We deploy our unikernels as hvt with FreeBSD BHyve as hypervisor.</p> +<p>On <a href="https://builds.robur.coop">builds.robur.coop</a>, next to the unikernel images, <a href="https://builds.robur.coop/job/solo5-hvt/"><em>solo5-hvt</em> packages</a> are provided - download the binary and install it. A <a href="https://github.com/NixOS/nixpkgs/tree/master/pkgs/os-specific/solo5">NixOS package</a> is already available - please note that <a href="https://github.com/Solo5/solo5/pull/494">soon</a> packaging will be much easier (and we will work on packages merged into distributions).</p> +<p>When the tender is installed, download a unikernel image (e.g. 
the <a href="https://builds.robur.coop/job/traceroute/build/latest/">traceroute</a> described in <a href="https://hannes.robur.coop/Posts/Traceroute">an earlier post</a>), and execute it:</p> +<pre><code>$ solo5-hvt --net:service=tap0 -- traceroute.hvt --ipv4=10.0.42.2/24 --ipv4-gateway=10.0.42.1 +</code></pre> +<p>If you plan to orchestrate MirageOS unikernels, you may be interested in <a href="https://github.com/roburio/albatross">albatross</a> - we provide <a href="https://builds.robur.coop/job/albatross/">binary packages as well for albatross</a>. An upcoming post will go into further details of how to setup albatross.</p> +<h2>MirageOS configuration</h2> +<p>A MirageOS unikernel has a specific purpose - composed of OCaml libraries - selected at compile time, which allows to only embed the required pieces. This reduces the attack surface drastically. At the same time, to be widely useful to multiple organisations, no configuration data must be embedded into the unikernel.</p> +<p>Early MirageOS unikernels such as <a href="https://github.com/mirage/mirage-www">mirage-www</a> embed content (blog posts, ..) and TLS certificates and private keys in the binary (using <a href="https://github.com/mirage/ocaml-crunch">crunch</a>). The <a href="https://github.com/mirage/qubes-mirage-firewall">Qubes firewall</a> (read the <a href="http://roscidus.com/blog/blog/2016/01/01/a-unikernel-firewall-for-qubesos/">blog post by Thomas</a> for more information) used to include the firewall rules until <a href="https://github.com/mirage/qubes-mirage-firewall/releases/tag/v0.6">v0.6</a> in the binary, since <a href="https://github.com/mirage/qubes-mirage-firewall/tree/v0.7">v0.7</a> the rules are read dynamically from QubesDB. 
This is a big usability improvement.</p> +<p>We have several possibilities to provide configuration information in MirageOS. One is via boot parameters (these can be pre-filled at development time and further refined at configuration time, but those passed at boot time take precedence). Boot parameters have a length limitation.</p> +<p>Another option is to <a href="https://github.com/roburio/tlstunnel/">use a block device</a> - where the TLS reverse proxy stores the configuration, modifiable via a TCP control socket (authentication using a shared hmac secret).</p> +<p>Several other unikernels, such as <a href="https://github.com/Engil/Canopy">this website</a> and <a href="https://github.com/roburio/caldav">our CalDAV server</a>, store the content in a remote git repository. The git URI and credentials (private key seed, host key fingerprint) are passed via boot parameters.</p> +<p>Finally, another option that we take advantage of is to introduce a post-link step that rewrites the binary to embed configuration. The tool <a href="https://github.com/dinosaure/caravan">caravan</a>, developed by Romain, does this rewrite and is used by our <a href="https://github.com/roburio/openvpn/tree/robur/mirage-router">openvpn router</a> (<a href="https://builds.robur.coop/job/openvpn-router/build/latest/">binary</a>).</p> +<p>In the future, some configuration information - such as monitoring system, syslog sink, IP addresses - may be provided via DHCP on one of the private network interfaces - this would mean that the DHCP server has some global configuration option, and the unikernels no longer require that many boot parameters. 
Another option we want to investigate is where the tender shares a file as a read-only memory-mapped region from the host system to the guest system - but this is tricky considering all targets above (especially virtio and muen).</p> +<h2>Behind the scenes: reproducible builds</h2> +<p>To provide a high level of assurance and trust, if you distribute binaries in 2021, you should have a recipe for how they can be reproduced in a bit-by-bit identical way. This way, different organisations can run builders and rebuilders, and a user can decide to only use a binary if it has been reproduced by multiple organisations in different jurisdictions using different physical machines - to avoid malware being embedded in the binary.</p> +<p>For a reproduction to be successful, you need to collect the checksums of all sources that contributed to the build, together with other things (host system packages, environment variables, etc.). Of course, you can record the entire OS and sources as a tarball (or file system snapshot) and distribute that - but this may be suboptimal in terms of bandwidth requirements.</p> +<p>With opam, we already have precise tracking of which opam packages are used, and since opam 2.1 the <code>opam switch export</code> includes <a href="https://github.com/ocaml/opam/pull/4040">extra-files (patches)</a> and <a href="https://github.com/ocaml/opam/pull/4055">records the VCS version</a>. Based on this functionality, <a href="https://github.com/roburio/orb">orb</a>, an alternative command line application using the opam-client library, can be used to collect (a) the switch export, (b) host system packages, and (c) the environment variables. Only required environment variables are kept, all others are unset while conducting a build. The only required environment variables are <code>PATH</code> (sanitized with an allow list, <code>/bin</code>, <code>/sbin</code>, with <code>/usr</code>, <code>/usr/local</code>, and <code>/opt</code> prefixes), and <code>HOME</code>. 
To enable Debian's <code>apt</code> to install packages, <code>DEBIAN_FRONTEND</code> is set to <code>noninteractive</code>. The <code>SWITCH_PATH</code> is recorded to allow orb to use the same path during a rebuild. The <code>SOURCE_DATE_EPOCH</code> is set to enable tools that record a timestamp to use a static one. The <code>OS*</code> variables are only used for recording the host OS and version.</p> +<p>The goal of reproducible builds can certainly be achieved in several ways, including storing all sources and used executables in a huge tarball (or docker container), which is preserved for rebuilders. The questions of a minimal trusted computing base and of how such a container could be rebuilt from sources in a reproducible way are open.</p> +<p>The opam-repository is a community repository, to which a lot of OCaml developers release packages on a daily basis. Package dependencies usually only use lower bounds of other packages, and the continuous integration system of the opam repository takes care that upon API changes all reverse dependencies include the right upper bounds. Using the head commit of opam-repository usually leads to a working package universe.</p> +<p>For our MirageOS unikernels, we don't want to stay behind with ancient versions of libraries. That's why our automated building is done on a daily basis with the head commit of opam-repository. Since our unikernels are not part of the main opam repository (they include the configuration information which target to use, e.g. <em>hvt</em>), and we occasionally use development versions of opam packages, we use <a href="https://git.robur.io/robur/unikernel-repo">the unikernel-repo</a> as an overlay.</p> +<p>If no dependency got a new release, the resulting binary has the same checksum. If any dependency got a newer release, this is picked up, and eventually the checksum changes.</p> +<p>Each unikernel (and non-unikernel) job (e.g. 
<a href="https://builds.robur.coop/job/dns-primary-git/build/latest/">dns-primary</a>) outputs some artifacts:</p> +<ul> +<li>the <a href="https://builds.robur.coop/job/dns-primary-git/build/latest/f/bin/primary_git.hvt">binary image</a> (in <code>bin/</code>, unikernel image, OS package) +</li> +<li>the <a href="https://builds.robur.coop/job/dns-primary-git/build/latest/f/build-environment"><code>build-environment</code></a> containing the environment variables used for this build +</li> +<li>the <a href="https://builds.robur.coop/job/dns-primary-git/build/latest/f/system-packages"><code>system-packages</code></a> containing all packages installed on the host system +</li> +<li>the <a href="https://builds.robur.coop/job/dns-primary-git/build/latest/f/opam-switch"><code>opam-switch</code></a> that contains all opam packages, including git commit or tarball with checksum, and potentially extra patches, used for this build +</li> +<li>a job script and console output +</li> +</ul> +<p>To reproduce such a build, you need to get the same operating system (OS, OS_FAMILY, OS_DISTRIBUTION, OS_VERSION in build-environment) and the same set of system packages, and then you can run <code>orb rebuild</code>, which sets the environment variables and installs the opam packages from the opam-switch.</p> +<p>You can <a href="https://builds.robur.coop/job/dns-primary-git/">browse</a> the different builds, and if there are checksum changes, you can browse to a diff between the opam switches to reason about whether the checksum change was intentional (e.g. 
<a href="https://builds.robur.coop/compare/ba9ab091-9400-4e8d-ad37-cf1339114df8/23341f6b-cd26-48ab-9383-e71342455e81/opam-switch">here</a> the checksum of the unikernel changed when the x509 library was updated).</p> +<p>The opam reproducible build infrastructure is driven by:</p> +<ul> +<li><a href="https://github.com/roburio/orb">orb</a> conducting reproducible builds (<a href="https://builds.robur.coop/job/orb/">packages</a>) +</li> +<li><a href="https://github.com/roburio/builder">builder</a> scheduling builds in contained environments (<a href="https://builds.robur.coop/job/builder/">packages</a>) +</li> +<li><a href="https://git.robur.io/robur/builder-web">builder-web</a> storing builds in a database and providing a HTTP interface (<a href="https://builds.robur.coop/job/builder-web/">packages</a>) +</li> +</ul> +<p>These tools are themselves reproducible, and built on a daily basis. The infrastructure executing the build jobs installs the most recent packages of orb and builder before conducting a build. This means that our build infrastructure is reproducible as well, and uses the latest code when it is released.</p> +<h2>Conclusion</h2> +<p>Thanks to NGI funding we now have reproducible MirageOS binary builds available at <a href="https://builds.robur.coop">builds.robur.coop</a>. The underlying infrastructure is reproducible, available for multiple platforms (Ubuntu using docker, FreeBSD using jails), and can be easily bootstrapped from source (once you have OCaml and opam working, getting builder and orb should be easy). All components are open source software, mostly with permissive licenses.</p> +<p>We also have an index over sha-256 checksum of binaries - in the case you find a running unikernel image where you forgot which exact packages were used, you can do a reverse lookup.</p> +<p>We are aware that the web interface can be improved (PRs welcome). 
We will also work on the rebuilder setup and run some rebuilds.</p> +<p>Please reach out to us (at team AT robur DOT coop) if you have feedback and suggestions.</p> +https://hannes.robur.coop/Posts/DeployDeploying binary MirageOS unikernels2021-06-30T13:13:37-00:00hanneshttps://hannes.robur.coop/atomhannes<h2>Introduction</h2> +<p>TL;DR: mirage-crypto-ec, with x509 0.12.0, and tls 0.13.0, provide fast and secure elliptic curve support in OCaml and MirageOS - using the verified <a href="https://github.com/mit-plv/fiat-crypto/">fiat-crypto</a> stack (Coq to OCaml to executable which generates C code that is interfaced by OCaml). In x509, a long-standing issue (countryName encoding) is fixed, and the archive (PKCS 12) format is now supported, in addition to EC keys. In tls, ECDH key exchanges as well as ECDSA and EdDSA certificates are supported.</p> +<h2>Elliptic curve cryptography</h2> +<p><a href="https://mirage.io/blog/tls-1-3-mirageos">Since May 2020</a>, our <a href="https://usenix15.nqsb.io">OCaml-TLS</a> stack supports TLS 1.3 (since tls version 0.12.0 on opam).</p> +<p>TLS 1.3 requires elliptic curve cryptography - which was not available in <a href="https://github.com/mirage/mirage-crypto">mirage-crypto</a> (the maintained fork of <a href="https://github.com/mirleft/ocaml-nocrypto">nocrypto</a>).</p> +<p>There are two major uses of elliptic curves: <a href="https://en.wikipedia.org/wiki/Elliptic-curve_Diffie%E2%80%93Hellman">key exchange (ECDH)</a> for establishing a shared secret over an insecure channel, and <a href="https://en.wikipedia.org/wiki/Elliptic_Curve_Digital_Signature_Algorithm">digital signature (ECDSA)</a> for authentication, integrity, and non-repudiation. 
(Please note that the construction of digital signatures on Edwards curves (Curve25519, Ed448) is called EdDSA instead of ECDSA.)</p> +<p>Elliptic curve cryptography is <a href="https://eprint.iacr.org/2020/615">vulnerable</a> <a href="https://raccoon-attack.com/">to</a> <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-5407">various</a> <a href="https://github.com/mimoo/timing_attack_ecdsa_tls">timing</a> <a href="https://minerva.crocs.fi.muni.cz/">attacks</a> - have a read of the <a href="https://blog.trailofbits.com/2020/06/11/ecdsa-handle-with-care/">overview article on ECDSA</a>. When implementing elliptic curve cryptography, it is best to avoid these known attacks. Fortunately, there are some projects which address these issues by construction.</p> +<p>In addition, to use the code in MirageOS, it should be boring C code: no heap allocations, only using a very small number of C library functions -- the code needs to be compiled in an environment with <a href="https://github.com/mirage/ocaml-freestanding/tree/v0.6.4/nolibc">nolibc</a>.</p> +<p>Two projects started in semantics, to solve the issue from the ground up: <a href="https://github.com/mit-plv/fiat-crypto/">fiat-crypto</a> and <a href="https://github.com/project-everest/hacl-star/">hacl-star</a>: their approach is to use a proof system (<a href="https://coq.inria.fr">Coq</a> or <a href="https://www.fstar-lang.org/">F*</a>) to verify that the code executes in constant time, not depending on data input. Both projects provide C code as output of their proof systems.</p> +<p>For our initial TLS 1.3 stack, <a href="https://github.com/pascutto/">Cl&eacute;ment</a>, <a href="https://github.com/NathanReb/">Nathan</a> and <a href="https://github.com/emillon/">Etienne</a> developed <a href="https://github.com/mirage/fiat">fiat-p256</a> and <a href="https://github.com/mirage/hacl">hacl_x25519</a>. 
Both were one-shot interfaces for a narrow use case (ECDH for NIST P-256 and X25519), worked well for their purpose, and allowed to gather some experience from the development side.</p> +<h3>Changed requirements</h3> +<p>Revisiting our cryptography stack with the elliptic curve perspective had several reasons, on the one side the customer project <a href="https://www.nitrokey.com/products/nethsm">NetHSM</a> asked for feasibility of ECDSA/EdDSA for various elliptic curves, on the other side <a href="https://github.com/mirage/ocaml-dns/pull/251">DNSSec</a> uses elliptic curve cryptography (ECDSA), and also <a href="https://www.wireguard.com/">wireguard</a> relies on elliptic curve cryptography. The number of X.509 certificates using elliptic curves is increasing, and we don't want to leave our TLS stack in a state where it can barely talk to a growing number of services on the Internet.</p> +<p>Looking at <a href="https://github.com/project-everest/hacl-star/"><em>hacl-star</em></a>, their <a href="https://hacl-star.github.io/Supported.html">support</a> is limited to P-256 and Curve25519, any new curve requires writing F*. Another issue with hacl-star is C code quality: their C code does neither <a href="https://github.com/mirage/hacl/issues/46">compile with older C compilers (found on Oracle Linux 7 / CentOS 7)</a>, nor when enabling all warnings (&gt; 150 are generated). We consider the C compiler as useful resource to figure out undefined behaviour (and other problems), and when shipping C code we ensure that it compiles with <code>-Wall -Wextra -Wpedantic --std=c99 -Werror</code>. The hacl project <a href="https://github.com/mirage/hacl/tree/master/src/kremlin">ships</a> a bunch of header files and helper functions to work on all platforms, which is a clunky <code>ifdef</code> desert. 
The hacl approach is to generate a whole algorithm solution: from arithmetic primitives, group operations, up to cryptographic protocol - everything included.</p> +<p>In contrast, <a href="https://github.com/mit-plv/fiat-crypto/"><em>fiat-crypto</em></a> is a Coq development, which as part of compilation (proof verification) generates executables (via OCaml code extraction from Coq). These executables are used to generate modular arithmetic (as C code) given a curve description. The <a href="https://github.com/mirage/mirage-crypto/tree/main/ec/native">generated C code</a> is highly portable, independent of platform (word size is taken as input) - it only requires a <code>&lt;stdint.h&gt;</code>, and compiles with all warnings enabled (once <a href="https://github.com/mit-plv/fiat-crypto/pull/906">a minor PR</a> got merged). Supporting a new curve is simple: generate the arithmetic code using fiat-crypto with the new curve description. The downside is that group operations and protocol needs to implemented elsewhere (and is not part of the proven code) - gladly this is pretty straightforward to do, especially in high-level languages.</p> +<h3>Working with fiat-crypto</h3> +<p>As mentioned, our initial <a href="https://github.com/mirage/fiat">fiat-p256</a> binding provided ECDH for the NIST P-256 curve. Also, BoringSSL uses fiat-crypto for ECDH, and developed the code for group operations and cryptographic protocol on top of it.</p> +<p>The work needed was (a) ECDSA support and (b) supporting more curves (let's focus on NIST curves). For ECDSA, the algorithm requires modular arithmetics in the field of the group order (in addition to the prime). We generate these primitives with fiat-crypto (named <code>npYYY_AA</code>) - that required <a href="https://github.com/mit-plv/fiat-crypto/commit/e31a36d5f1b20134e67ccc5339d88f0ff3cb0f86">a small fix in decoding hex</a>. 
Fiat-crypto also provides inversion <a href="https://github.com/mit-plv/fiat-crypto/pull/670">since late October 2020</a> (<a href="https://eprint.iacr.org/2021/549">paper</a>) - which allowed us to reduce the code we had taken from BoringSSL. The ECDSA protocol was easy to implement in OCaml using the generated arithmetic.</p> +<p>Addressing the issue of more curves was also easy to achieve: the C code (group operations) consists of macros that are instantiated for each curve - the OCaml code consists of functors that are applied to each curve description.</p> +<p>Thanks to the test vectors (as structured data) from <a href="https://github.com/google/wycheproof/">wycheproof</a> (and again thanks to Etienne, Nathan, and Cl&eacute;ment for their OCaml code decoding them), I feel confident that our elliptic curve code works as desired.</p> +<p>What was left was X25519 and Ed25519 - dropping the hacl dependency entirely felt appealing (less C code to maintain from fewer projects). This turned out to require more C code, which we took from BoringSSL. 
It may be desirable to reduce the imported C code, or to wait until a project on top of fiat-crypto which provides proven cryptographic protocols is in a usable state.</p> +<p>To avoid performance degradation, I distilled some <a href="https://github.com/mirage/mirage-crypto/pull/107#issuecomment-799701703">X25519 benchmarks</a>, turns out the fiat-crypto and hacl performance is very similar.</p> +<h3>Achievements</h3> +<p>The new opam package <a href="https://mirage.github.io/mirage-crypto/doc/mirage-crypto-ec/Mirage_crypto_ec/index.html">mirage-crypto-ec</a> is released, which includes the C code generated by fiat-crypto (including <a href="https://github.com/mit-plv/fiat-crypto/pull/670">inversion</a>), <a href="https://github.com/mirage/mirage-crypto/blob/main/ec/native/point_operations.h">point operations</a> from BoringSSL, and some <a href="https://github.com/mirage/mirage-crypto/blob/main/ec/mirage_crypto_ec.ml">OCaml code</a> for invoking these functions and doing bounds checks, and whether points are on the curve. The OCaml code are some functors that take the curve description (consisting of parameters, C function names, byte length of value) and provide Diffie-Hellman (Dh) and digital signature algorithm (Dsa) modules. 
The nonce for ECDSA is computed deterministically, as suggested by <a href="https://tools.ietf.org/html/rfc6979">RFC 6979</a>, to avoid private key leakage.</p> +<p>The code has been developed in <a href="https://github.com/mirage/mirage-crypto/pull/101">NIST curves</a>, <a href="https://github.com/mirage/mirage-crypto/pull/106">removing blinding</a> (since we use operations that are verified to be constant-time), <a href="https://github.com/mirage/mirage-crypto/pull/108">added missing length checks</a> (reported by <a href="https://github.com/greg42">Greg</a>), <a href="https://github.com/mirage/mirage-crypto/pull/107">curve25519</a>, <a href="https://github.com/mirage/mirage-crypto/pull/117">a fix for signatures that do not span the entire byte size (discovered while adapting X.509)</a>, <a href="https://github.com/mirage/mirage-crypto/pull/118">fix X25519 when the input has offset &lt;&gt; 0</a>. It works on x86 and arm, both 32 and 64 bit (checked by CI). The development was partially sponsored by Nitrokey.</p> +<p>What is left to do, apart from further security reviews, is <a href="https://github.com/mirage/mirage-crypto/issues/109">performance improvements</a>, <a href="https://github.com/mirage/mirage-crypto/issues/112">Ed448/X448 support</a>, and <a href="https://github.com/mirage/mirage-crypto/issues/105">investigating deterministic k for P521</a>. Pull requests are welcome.</p> +<p>When you use the code, and encounter any issues, please <a href="https://github.com/mirage/mirage-crypto/issues">report them</a>.</p> +<h2>Layer up - X.509 now with ECDSA / EdDSA and PKCS 12 support, and a long-standing issue fixed</h2> +<p>With the sign and verify primitives, the next step is to interoperate with other tools that generate and use these public and private keys. This consists of serialisation to and deserialisation from common data formats (ASN.1 DER and PEM encoding), and support for handling X.509 certificates with elliptic curve keys. 
Since X.509 0.12.0, it supports EC private and public keys, including certificate validation and issuance.</p> +<p>Releasing X.509 also included to go through the issue tracker and attempt to solve the existing issues. This time, the <a href="https://github.com/mirleft/ocaml-x509/issues/69">&quot;country name is encoded as UTF8String, while RFC demands PrintableString&quot;</a> filed more than 5 years ago by <a href="https://github.com/reynir">Reynir</a>, re-reported by <a href="https://github.com/paurkedal">Petter</a> in early 2017, and again by <a href="https://github.com/NightBlues">Vadim</a> in late 2020, <a href="https://github.com/mirleft/ocaml-x509/pull/140">was fixed by Vadim</a>.</p> +<p>Another long-standing pull request was support for <a href="https://tools.ietf.org/html/rfc7292">PKCS 12</a>, the archive format for certificate and private key bundles. This has <a href="https://github.com/mirleft/ocaml-x509/pull/114">been developed and merged</a>. PKCS 12 is a widely used and old format (e.g. when importing / exporting cryptographic material in your browser, used by OpenVPN, ...). Its specification uses RC2 and 3DES (see <a href="https://unmitigatedrisk.com/?p=654">this nice article</a>), which are the default algorithms used by <code>openssl pkcs12</code>.</p> +<h2>One more layer up - TLS</h2> +<p>In TLS we are finally able to use ECDSA (and EdDSA) certificates and private keys, this resulted in slightly more complex configuration - the constraints between supported groups, signature algorithms, ciphersuite, and certificates are intricate:</p> +<p>The ciphersuite (in TLS before 1.3) specifies which key exchange mechanism to use, but also which signature algorithm to use (RSA/ECDSA). The supported groups client hello extension specifies which elliptic curves are supported by the client. The signature algorithm hello extension (TLS 1.2 and above) specifies the signature algorithm. 
In the end, at load time the TLS configuration is validated and groups, ciphersuites, and signature algorithms are condensed depending on configured server certificates. At session initiation time, once the client reports what it supports, these parameters are further cut down to eventually find some suitable cryptographic parameters for this session.</p> +<p>From the user perspective, the certificate bundle and private key were earlier a pair of <code>X509.Certificate.t list</code> and <code>Mirage_crypto_pk.Rsa.priv</code>; now the second part is a <code>X509.Private_key.t</code> - all provided constructors have been updated (notably <code>X509_lwt.private_of_pems</code> and <code>Tls_mirage.X509.certificate</code>).</p> +<h2>Finally, conduit and mirage</h2> +<p>Thanks to <a href="https://github.com/dinosaure">Romain</a>, conduit* 4.0.0 was released which supports the modified API of X.509 and TLS. Romain also developed patches and released mirage 3.10.3 which supports the above-mentioned work.</p> +<h2>Conclusion</h2> +<p>Elliptic curve cryptography is now available in OCaml using verified cryptographic primitives from the fiat-crypto project - <code>opam install mirage-crypto-ec</code>. X.509 since 0.12.0, TLS since 0.13.0, and MirageOS since 3.10.3 support this new development which gives rise to smaller EC keys. Our old bindings, fiat-p256 and hacl_x25519, have been archived and will no longer be maintained.</p> +<p>Thanks to everyone involved on this journey: reporting issues, sponsoring parts of the work, helping with integration, developing initial prototypes, and motivating me to continue this until the release is done.</p> +<p>In the future, it may be possible to remove zarith and gmp from the dependency chain, and provide EC-only TLS servers and clients for MirageOS. 
The benefit will be much less C code (libgmp-freestanding.a is 1.5MB in size) in our trusted code base.</p> +<p>Another potential project that is very close now is a certificate authority developed in MirageOS - now that EC keys, PKCS 12, revocation lists, ... are implemented.</p> +<h2>Footer</h2> +<p>If you want to support our work on MirageOS unikernels, please <a href="https://robur.coop/Donate">donate to robur</a>. I'm interested in feedback, either via <a href="https://twitter.com/h4nnes">twitter</a>, <a href="https://mastodon.social/@hannesm">hannesm@mastodon.social</a> or via eMail.</p> +https://hannes.robur.coop/Posts/ECCryptography updates in OCaml and MirageOS2021-04-23T13:33:06-00:00hanneshttps://hannes.robur.coop/atomhannes<h2>Introduction</h2> +<p>2020 was an intense year. I hope you're healthy and keep being healthy. I am privileged (as lots of software engineers and academics are) to be able to work from home during the pandemic. Let's not forget people in less privileged situations, and let&rsquo;s try to give them as much practical, psychological and financial support as we can these days. And as much joy as possible to everyone around :)</p> +<p>I cancelled the autumn MirageOS retreat due to the pandemic. Instead I collected donations for our hosts in Marrakech - they were very happy to receive our financial support, since they had a difficult year, since their income is based on tourism. I hope that in autumn 2021 we'll have an on-site retreat again.</p> +<p>For 2021, we (at <a href="https://robur.coop">robur</a>) got a grant from the EU (via <a href="https://pointer.ngi.eu">NGI pointer</a>) for &quot;Deploying MirageOS&quot; (more details below), and another grant from <a href="https://ocaml-sf.org">OCaml software foundation</a> for securing the opam supply chain (using <a href="https://github.com/hannesm/conex">conex</a>). 
Some long-awaited releases for MirageOS libraries, namely an <a href="https://discuss.ocaml.org/t/ann-first-release-of-awa-ssh">ssh implementation</a> and a rewrite of our <a href="https://discuss.ocaml.org/t/ann-release-of-ocaml-git-v3-0-duff-encore-decompress-etc/">git implementation</a>, have already been published.</p> +<p>From my MirageOS point of view, 2020 was a pretty successful year: we managed to add more features, fix lots of bugs, and pave the road ahead. I want to thank <a href="https://ocamllabs.io/">OCamlLabs</a> for funding work on MirageOS maintenance.</p> +<h2>Recap 2020</h2> +<p>Here is a very subjective, random collection of accomplishments in 2020 in which I was involved to some degree.</p> +<h3>NetHSM</h3> +<p><a href="https://www.nitrokey.com/products/nethsm">NetHSM</a> is a hardware security module in software. It is a product that uses MirageOS for security, and is based on the <a href="https://muen.sk">muen</a> separation kernel. We at <a href="https://robur.coop">robur</a> were heavily involved in this product. It has already been security audited by an external team. You can pre-order it from Nitrokey.</p> +<h3>TLS 1.3</h3> +<p>Back in 2016, at the <a href="https://www.ndss-symposium.org/ndss2016/tron-workshop-programme/">TRON</a> workshop (TLS 1.3 Ready or NOt), we developed a first draft of a TLS 1.3 implementation for <a href="https://github.com/mirleft/ocaml-tls">OCaml-TLS</a>. Finally in May 2020 we got our act together, including ECC (ECDH P256 from <a href="https://github.com/mit-plv/fiat-crypto/">fiat-crypto</a>, X25519 from <a href="https://project-everest.github.io/">hacl</a>) and testing with <a href="https://github.com/tlsfuzzer/tlsfuzzer">tlsfuzzer</a>, and released tls 0.12.0 with TLS 1.3 support. 
Later we added <a href="https://github.com/mirleft/ocaml-tls/pull/414">ECC ciphersuites to TLS version 1.2</a>, implemented <a href="https://github.com/mirleft/ocaml-tls/pull/414">ChaCha20/Poly1305</a>, and fixed an <a href="https://github.com/mirleft/ocaml-tls/pull/424">interoperability issue with Go's implementation</a>.</p> +<p><a href="https://github.com/mirage/mirage-crypto">Mirage-crypto</a> provides the underlying cryptographic primitives, initially released in March 2020 as a fork of <a href="https://github.com/mirleft/ocaml-nocrypto">nocrypto</a> -- huge thanks to <a href="https://github.com/pqwy">pqwy</a> for his great work. Mirage-crypto detects <a href="https://github.com/mirage/mirage-crypto/pull/53">CPU features at runtime</a> (thanks to <a href="https://github.com/Julow">Julow</a>) (<a href="https://github.com/mirage/mirage-crypto/pull/96">bugfix for bswap</a>), using constant time modular exponentation (powm_sec) and hardens against Lenstra's CRT attack, supports <a href="https://github.com/mirage/mirage-crypto/pull/39">compilation on Windows</a> (thanks to <a href="https://github.com/avsm">avsm</a>), <a href="https://github.com/mirage/mirage-crypto/pull/90">async entropy harvesting</a> (thanks to <a href="https://github.com/seliopou">seliopou</a>), <a href="https://github.com/mirage/mirage-crypto/pull/65">32 bit support</a>, <a href="https://github.com/mirage/mirage-crypto/pull/72">chacha20/poly1305</a> (thanks to <a href="https://github.com/abeaumont">abeaumont</a>), <a href="https://github.com/mirage/mirage-crypto/pull/84">cross-compilation</a> (thanks to <a href="https://github.com/EduardoRFS">EduardoRFS</a>) and <a href="https://github.com/mirage/mirage-crypto/pull/78">various</a> <a href="https://github.com/mirage/mirage-crypto/pull/81">bug</a> <a href="https://github.com/mirage/mirage-crypto/pull/83">fixes</a>, even <a href="https://github.com/mirage/mirage-crypto/pull/95">memory leak</a> (thanks to <a 
href="https://github.com/talex5">talex5</a> for reporting several of these issues), and <a href="https://github.com/mirage/mirage-crypto/pull/99">RSA</a> <a href="https://github.com/mirage/mirage-crypto/pull/100">interoperability</a> (thanks to <a href="https://github.com/psafont">psafont</a> for investigation and <a href="https://github.com/mattjbray">mattjbray</a> for reporting). This library feels very mature now - being used by multiple stakeholders, and lots of issues have been fixed in 2020.</p> +<h3>Qubes Firewall</h3> +<p>The <a href="https://github.com/mirage/qubes-mirage-firewall/">MirageOS based Qubes firewall</a> is the most widely used MirageOS unikernel. And it got major updates: in May <a href="https://github.com/linse">Steffi</a> <a href="https://groups.google.com/g/qubes-users/c/Xzplmkjwa5Y">announced</a> her and <a href="https://github.com/yomimono">Mindy's</a> work on improving it for Qubes 4.0 - including <a href="https://www.qubes-os.org/doc/vm-interface/#firewall-rules-in-4x">dynamic firewall rules via QubesDB</a>. Thanks to <a href="https://prototypefund.de/project/portable-firewall-fuer-qubesos/">prototypefund</a> for sponsoring.</p> +<p>In October 2020, we released <a href="https://mirage.io/blog/announcing-mirage-39-release">Mirage 3.9</a> with PVH virtualization mode (thanks to <a href="https://github.com/mato">mato</a>). There's still a <a href="https://github.com/mirage/qubes-mirage-firewall/issues/120">memory leak</a> to be investigated and fixed.</p> +<h3>IPv6</h3> +<p>In December, with <a href="https://mirage.io/blog/announcing-mirage-310-release">Mirage 3.10</a> we got the IPv6 code up and running. Now MirageOS unikernels have a dual stack available, besides IPv4-only and IPv6-only network stacks. 
Thanks to <a href="https://github.com/nojb">nojb</a> for the initial code and <a href="https://github.com/MagnusS">MagnusS</a>.</p> +<p>Turns out this blog, but also robur services, are now available via IPv6 :)</p> +<h3>Albatross</h3> +<p>Also in December, I pushed an initial release of <a href="https://github.com/roburio/albatross">albatross</a>, a unikernel orchestration system with remote access. <em>Deploy your unikernel via a TLS handshake -- the unikernel image is embedded in the TLS client certificates.</em></p> +<p>Thanks to <a href="https://github.com/reynir">reynir</a> for statistics support on Linux and improvements of the systemd service scripts. Also thanks to <a href="https://github.com/cfcs">cfcs</a> for the initial Linux port.</p> +<h3>CA certs</h3> +<p>For several years I postponed the problem of how to actually use the operating system trust anchors for OCaml-TLS connections. Thanks to <a href="https://github.com/emillon">emillon</a> for initial code, there are now <a href="https://github.com/mirage/ca-certs">ca-certs</a> and <a href="https://github.com/mirage/ca-certs-nss">ca-certs-nss</a> opam packages (see <a href="https://discuss.ocaml.org/t/ann-ca-certs-and-ca-certs-nss">release announcement</a>) which fills this gap.</p> +<h2>Unikernels</h2> +<p>I developed several useful unikernels in 2020, and also pushed <a href="https://mirage.io/wiki/gallery">a unikernel gallery</a> to the Mirage website:</p> +<h3>Traceroute in MirageOS</h3> +<p>I already wrote about <a href="https://hannes.robur.coop/Posts/Traceroute">traceroute</a> which traces the routing to a given remote host.</p> +<h3>Unipi - static website hosting</h3> +<p><a href="https://github.com/roburio/unipi">Unipi</a> is a static site webserver which retrieves the content from a remote git repository. 
It supports Let's Encrypt certificate provisioning and dynamic updates via a webhook that is executed for every push.</p> +<h4>TLSTunnel - TLS demultiplexing</h4> +<p>The physical machine this blog and other robur infrastructure runs on was relocated from Sweden to Germany in mid-December. Thanks to UPS! Fewer IPv4 addresses are available in the new data center, which motivated me to develop <a href="https://github.com/roburio/tlstunnel">tlstunnel</a>.</p> +<p>The new behaviour is as follows (see the <code>monitoring</code> branch):</p> +<ul> +<li>listener on TCP port 80 which replies with a permanent redirect to <code>https</code> +</li> +<li>listener on TCP port 443 which forwards to a backend host if the requested server name is configured +</li> +<li>its configuration is stored on a block device, and can be dynamically changed (with a custom protocol authenticated with an HMAC) +</li> +<li>it is set up to hold a wildcard TLS certificate, and in DNS a wildcard entry points to it +</li> +<li>setting up a new service is very straightforward: only the new name needs to be registered with tlstunnel together with the TCP backend, and everything will just work +</li> +</ul> +<h2>2021</h2> +<p>The year started with a release of <a href="https://discuss.ocaml.org/t/ann-first-release-of-awa-ssh">awa</a>, an SSH implementation in OCaml (thanks to <a href="https://github.com/haesbaert">haesbaert</a> for initial code). 
This was followed by a <a href="https://discuss.ocaml.org/t/ann-release-of-ocaml-git-v3-0-duff-encore-decompress-etc/">git 3.0 release</a> (thanks to <a href="https://github.com/dinosaure">dinosaure</a>).</p> +<h3>Deploying MirageOS - NGI Pointer</h3> +<p>For 2021 we at robur received funding from the EU (via <a href="https://pointer.ngi.eu/">NGI pointer</a>) for &quot;Deploying MirageOS&quot;, which boils down to three parts:</p> +<ul> +<li>reproducible binary releases of MirageOS unikernels, +</li> +<li>monitoring (and other devops features: profiling) and integration into existing infrastructure, +</li> +<li>and further documentation and advertisement. +</li> +</ul> +<p>Of course this will all be available as open source. Please get in touch via eMail (team aT robur dot coop) if you're eager to integrate MirageOS unikernels into your infrastructure.</p> +<p>We discovered at an initial meeting with an infrastructure provider that a DNS resolver is of interest - even more so now that dnsmasq suffered from <a href="https://www.jsof-tech.com/wp-content/uploads/2021/01/DNSpooq_Technical-Whitepaper.pdf">dnspooq</a>. We are already working on an <a href="https://github.com/mirage/ocaml-dns/pull/251">implementation of DNSSEC</a>.</p> +<p>MirageOS unikernels are binary reproducible, and <a href="https://github.com/rjbou/orb/pull/1">infrastructure tools are available</a>. We are working hard on a web interface (and REST API - think of it as &quot;Docker Hub for MirageOS unikernels&quot;), and more tooling to verify reproducibility.</p> +<h3>Conex - securing the supply chain</h3> +<p>Another grant, from the <a href="http://ocaml-sf.org/">OCSF</a>, is for continuing the development and deployment of <a href="https://github.com/hannesm/conex">conex</a> - to bring trust into opam-repository. 
This is a great combination with the reproducible build efforts, and will bring much more trust into retrieving OCaml packages and using MirageOS unikernels.</p> +<h3>MirageOS 4.0</h3> +<p>Mirage so far still uses ocamlbuild and ocamlfind for compiling the virtual machine binary. But the switch to dune is <a href="https://github.com/mirage/mirage/issues/1195">close</a>; a lot of work has been done. This will make the developer experience of MirageOS much smoother, with a per-unikernel monorepo workflow where you can push your changes to the individual libraries.</p> +<h2>Footer</h2> +<p>If you want to support our work on MirageOS unikernels, please <a href="https://robur.coop/Donate">donate to robur</a>. I'm interested in feedback, either via <a href="https://twitter.com/h4nnes">twitter</a>, <a href="https://mastodon.social/@hannesm">hannesm@mastodon.social</a> or via eMail.</p> +https://hannes.robur.coop/Posts/NGIThe road ahead for MirageOS in 20212021-01-25T12:45:54-00:00hanneshttps://hannes.robur.coop/atomhannes<h2>Traceroute</h2> +<p>Traceroute is a diagnostic utility which displays the route and measures transit delays of +packets across an Internet protocol (IP) network.</p> +<pre><code class="language-bash">$ doas solo5-hvt --net:service=tap0 -- traceroute.hvt --ipv4=10.0.42.2/24 --ipv4-gateway=10.0.42.1 --host=198.167.222.207 + | ___| + __| _ \ | _ \ __ \ +\__ \ ( | | ( | ) | +____/\___/ _|\___/____/ +Solo5: Bindings version v0.6.5 +Solo5: Memory map: 512 MB addressable: +Solo5: reserved @ (0x0 - 0xfffff) +Solo5: text @ (0x100000 - 0x212fff) +Solo5: rodata @ (0x213000 - 0x24bfff) +Solo5: data @ (0x24c000 - 0x317fff) +Solo5: heap &gt;= 0x318000 &lt; stack &lt; 0x20000000 +2020-06-22 15:41:25 -00:00: INF [netif] Plugging into service with mac 76:9b:36:e0:e5:74 mtu 1500 +2020-06-22 15:41:25 -00:00: INF [ethernet] Connected Ethernet interface 76:9b:36:e0:e5:74 +2020-06-22 15:41:25 -00:00: INF [ARP] Sending gratuitous ARP for 10.0.42.2 (76:9b:36:e0:e5:74) +2020-06-22 
15:41:25 -00:00: INF [udp] UDP interface connected on 10.0.42.2 +2020-06-22 15:41:25 -00:00: INF [application] 1 10.0.42.1 351us +2020-06-22 15:41:25 -00:00: INF [application] 2 192.168.42.1 1.417ms +2020-06-22 15:41:25 -00:00: INF [application] 3 192.168.178.1 1.921ms +2020-06-22 15:41:25 -00:00: INF [application] 4 88.72.96.1 16.716ms +2020-06-22 15:41:26 -00:00: INF [application] 5 * +2020-06-22 15:41:27 -00:00: INF [application] 6 92.79.215.112 16.794ms +2020-06-22 15:41:27 -00:00: INF [application] 7 145.254.2.215 21.305ms +2020-06-22 15:41:27 -00:00: INF [application] 8 145.254.2.217 22.05ms +2020-06-22 15:41:27 -00:00: INF [application] 9 195.89.99.1 21.088ms +2020-06-22 15:41:27 -00:00: INF [application] 10 62.115.9.133 20.105ms +2020-06-22 15:41:27 -00:00: INF [application] 11 213.155.135.82 30.861ms +2020-06-22 15:41:27 -00:00: INF [application] 12 80.91.246.200 30.716ms +2020-06-22 15:41:27 -00:00: INF [application] 13 80.91.253.163 28.315ms +2020-06-22 15:41:27 -00:00: INF [application] 14 62.115.145.27 30.436ms +2020-06-22 15:41:27 -00:00: INF [application] 15 80.67.4.239 42.826ms +2020-06-22 15:41:27 -00:00: INF [application] 16 80.67.10.147 47.213ms +2020-06-22 15:41:27 -00:00: INF [application] 17 198.167.222.207 48.598ms +Solo5: solo5_exit(0) called +</code></pre> +<p>This means with a traceroute utility you can investigate which route is taken +to a destination host, and what the round trip time(s) on the path are. The +sample output above is taken from a virtual machine on my laptop to the remote +host 198.167.222.207. You can see there are 17 hops between us, with the first +being my laptop with a tiny round trip time of 351us, the second and third are +using private IP addresses, and are my home network. The round trip time of the +fourth hop is much higher, this is the first hop on the other side of my DSL +modem. 
You can see various hops on the public Internet: the packets pass from +my Internet provider's backbone across some exchange points to the destination +Internet provider somewhere in Sweden.</p> +<p>The implementation of traceroute relies mainly on the time-to-live (ttl) field +(in IPv6 lingo it is &quot;hop limit&quot;) of IP packets, which is meant to avoid route +cycles that would infinitely forward IP packets in circles. Every router, when +forwarding an IP packet, first checks that the ttl field is greater than zero, +and then forwards the IP packet where the ttl is decreased by one. If the ttl +field is zero, instead of forwarding, an ICMP time exceeded packet is sent back +to the source.</p> +<p>Traceroute works by exploiting this mechanism: a series of IP packets with +increasing ttls is sent to the destination. Since the length of the path is +unknown upfront, it is a reactive system: first send an IP packet with a ttl of +one, if an ICMP time exceeded packet is returned, send an IP packet with a ttl of +two, etc. -- until an ICMP packet of type destination unreachable is received. +Since some hosts do not reply with a time exceeded message, it is crucial to +use a timeout for each packet to avoid getting stuck: when the timeout is reached, +an IP packet with an increased ttl is sent and an unknown hop for that ttl is +printed (see the fifth hop in the example above).</p> +<p>The packets sent out are conventionally UDP packets without payload. From a +development perspective, one question is how to correlate the ICMP packet +with the sent UDP packet. Conveniently, ICMP packets contain the IP header and +the first eight bytes of the next protocol - the UDP header containing source +port, destination port, length, and checksum (each field is two bytes in +size). This means we can record the outgoing ports together with the sent +timestamp, and correlate the later received ICMP packet with the sent packet. 
+Great.</p> +<p>But as a functional programmer, let's figure out whether we can abolish the +(globally shared) state. Since the ICMP packet contains the original IP +header and the first eight bytes of the UDP header, this is where we will +embed data. As described above, the data is the sent timestamp and the value +of the ttl field. For the latter, we can arbitrarily restrict it to 31 (5 bits). +For the timestamp, it is mainly a question about precision and maximum expected +round trip time. The source and destination ports together are 32 bits; using 5 for +the ttl, 27 bits remain (an unsigned value up to 134217727). Looking at the +decimal representation, 1 second is likely too small, 13 seconds are sufficient +for the round trip time measurement. This implies our precision is 100ns, by +counting the digits.</p> +<p>Finally to the code. First we need back and forth conversions between ports +and ttl, timestamp:</p> +<pre><code class="language-OCaml">(* takes a time-to-live (int) and timestamp (int64, nanoseconds), encodes them + into 16 bit source port and 16 bit destination port: + - the timestamp precision is 100ns (thus, it is divided by 100) + - use the bits 26-11 of the divided timestamp as source port + - use the bits 10-0 as upper 11 bits of the destination port, and the lower 5 bits for the ttl +*) +let ports_of_ttl_ts ttl ts = + let ts = Int64.div ts 100L in + let src_port = 0xffff land (Int64.(to_int (shift_right ts 11))) + and dst_port = 0xffe0 land (Int64.(to_int (shift_left ts 5))) lor (0x001f land ttl) + in + src_port, dst_port + +(* inverse operation of ports_of_ttl_ts for the range (src_port and dst_port + are 16 bit values) *) +let ttl_ts_of_ports src_port dst_port = + let ttl = 0x001f land dst_port in + let ts = + let low = Int64.of_int (dst_port lsr 5) + and high = Int64.(shift_left (of_int src_port) 11) + in + Int64.add low high + in + let ts = Int64.mul ts 100L in + ttl, ts +</code></pre> +<p>They should be inverses over the range of valid input: ports are 16 bit numbers, +ttl 
is expected to be at most 31, and ts is an int64 expressed in nanoseconds.</p> +<p>Related is the function to print out one hop and round trip measurement:</p> +<pre><code class="language-OCaml">(* write a log line of a hop: the number, IP address, and round trip time *) +let log_one now ttl sent ip = + let now = Int64.(mul (logand (div now 100L) 0x7FFFFFFL) 100L) in + let duration = Mtime.Span.of_uint64_ns (Int64.sub now sent) in + Logs.info (fun m -&gt; m &quot;%2d %a %a&quot; ttl Ipaddr.V4.pp ip Mtime.Span.pp duration) +</code></pre> +<p>Most of the logic runs when an ICMP packet is received:</p> +<pre><code class="language-OCaml">module Icmp = struct + type t = { + send : int -&gt; unit Lwt.t ; + log : int -&gt; int64 -&gt; Ipaddr.V4.t -&gt; unit ; + task_done : unit Lwt.u ; + } + + let connect send log task_done = + let t = { send ; log ; task_done } in + Lwt.return t + + (* This is called for each received ICMP packet. *) + let input t ~src ~dst buf = + let open Icmpv4_packet in + (* Decode the received buffer (the IP header has been cut off already). 
*) + match Unmarshal.of_cstruct buf with + | Error s -&gt; + Lwt.fail_with (Fmt.strf &quot;ICMP: error parsing message from %a: %s&quot; Ipaddr.V4.pp src s) + | Ok (message, payload) -&gt; + let open Icmpv4_wire in + (* There are two interesting cases: Time exceeded (-&gt; send next packet), + and Destination (port) unreachable (-&gt; we reached the final host and can exit) *) + match message.ty with + | Time_exceeded -&gt; + (* Decode the payload, which should be an IPv4 header and a protocol header *) + begin match Ipv4_packet.Unmarshal.header_of_cstruct payload with + | Ok (pkt, off) when + (* Ensure this packet matches our sent packet: the protocol is UDP + and the destination address is the host we're tracing *) + pkt.Ipv4_packet.proto = Ipv4_packet.Marshal.protocol_to_int `UDP &amp;&amp; + Ipaddr.V4.compare pkt.Ipv4_packet.dst (Key_gen.host ()) = 0 -&gt; + let src_port = Cstruct.BE.get_uint16 payload off + and dst_port = Cstruct.BE.get_uint16 payload (off + 2) + in + (* Retrieve ttl and sent timestamp, encoded in the source port and + destination port of the UDP packet we sent, and received back as + ICMP payload. *) + let ttl, sent = ttl_ts_of_ports src_port dst_port in + (* Log this hop. *) + t.log ttl sent src; + (* Sent out the next UDP packet with an increased ttl. *) + let ttl' = succ ttl in + Logs.debug (fun m -&gt; m &quot;ICMP time exceeded from %a to %a, now sending with ttl %d&quot; + Ipaddr.V4.pp src Ipaddr.V4.pp dst ttl'); + t.send ttl' + | Ok (pkt, _) -&gt; + (* Some stray ICMP packet. *) + Logs.debug (fun m -&gt; m &quot;unsolicited time exceeded from %a to %a (proto %X dst %a)&quot; + Ipaddr.V4.pp src Ipaddr.V4.pp dst pkt.Ipv4_packet.proto Ipaddr.V4.pp pkt.Ipv4_packet.dst); + Lwt.return_unit + | Error e -&gt; + (* Decoding error. 
*) + Logs.warn (fun m -&gt; m &quot;couldn't parse ICMP time exceeded payload (IPv4) (%a -&gt; %a) %s&quot; + Ipaddr.V4.pp src Ipaddr.V4.pp dst e); + Lwt.return_unit + end + | Destination_unreachable when Ipaddr.V4.compare src (Key_gen.host ()) = 0 -&gt; + (* We reached the final host, and the destination port was not listened to *) + begin match Ipv4_packet.Unmarshal.header_of_cstruct payload with + | Ok (_, off) -&gt; + let src_port = Cstruct.BE.get_uint16 payload off + and dst_port = Cstruct.BE.get_uint16 payload (off + 2) + in + (* Retrieve ttl and sent timestamp. *) + let ttl, sent = ttl_ts_of_ports src_port dst_port in + (* Log the final hop. *) + t.log ttl sent src; + (* Wakeup the waiter task to exit the unikernel. *) + Lwt.wakeup t.task_done (); + Lwt.return_unit + | Error e -&gt; + (* Decoding error. *) + Logs.warn (fun m -&gt; m &quot;couldn't parse ICMP unreachable payload (IPv4) (%a -&gt; %a) %s&quot; + Ipaddr.V4.pp src Ipaddr.V4.pp dst e); + Lwt.return_unit + end + | ty -&gt; + Logs.debug (fun m -&gt; m &quot;ICMP unknown ty %s from %a to %a: %a&quot; + (ty_to_string ty) Ipaddr.V4.pp src Ipaddr.V4.pp dst + Cstruct.hexdump_pp payload); + Lwt.return_unit +end +</code></pre> +<p>Now, the remaining main unikernel is the module <code>Main</code>:</p> +<pre><code class="language-OCaml">module Main (R : Mirage_random.S) (M : Mirage_clock.MCLOCK) (Time : Mirage_time.S) (N : Mirage_net.S) = struct + module ETH = Ethernet.Make(N) + module ARP = Arp.Make(ETH)(Time) + module IPV4 = Static_ipv4.Make(R)(M)(ETH)(ARP) + module UDP = Udp.Make(IPV4)(R) + + (* Global mutable state: the timeout task for a sent packet. *) + let to_cancel = ref None + + (* Send a single packet with the given time to live. *) + let rec send_udp udp ttl = + (* This is called by the ICMP handler which successfully received a + time exceeded, thus we cancel the timeout task. 
*) + (match !to_cancel with + | None -&gt; () + | Some t -&gt; Lwt.cancel t ; to_cancel := None); + (* Our hop limit is 31 - 5 bit - should be sufficient for most networks. *) + if ttl &gt; 31 then + Lwt.return_unit + else + (* Create a timeout task which: + - sleeps for --timeout interval + - logs an unknown hop + - sends another packet with increased ttl + *) + let cancel = + Lwt.catch (fun () -&gt; + Time.sleep_ns (Duration.of_ms (Key_gen.timeout ())) &gt;&gt;= fun () -&gt; + Logs.info (fun m -&gt; m &quot;%2d *&quot; ttl); + send_udp udp (succ ttl)) + (function Lwt.Canceled -&gt; Lwt.return_unit | exc -&gt; Lwt.fail exc) + in + (* Assign this timeout task. *) + to_cancel := Some cancel; + (* Figure out which source and destination port to use, based on ttl + and current timestamp. *) + let src_port, dst_port = ports_of_ttl_ts ttl (M.elapsed_ns ()) in + (* Send packet via UDP. *) + UDP.write ~ttl ~src_port ~dst:(Key_gen.host ()) ~dst_port udp Cstruct.empty &gt;&gt;= function + | Ok () -&gt; Lwt.return_unit + | Error e -&gt; Lwt.fail_with (Fmt.strf &quot;while sending udp frame %a&quot; UDP.pp_error e) + + (* The main unikernel entry point. *) + let start () () () net = + let cidr = Key_gen.ipv4 () + and gateway = Key_gen.ipv4_gateway () + in + let log_one = fun port ip -&gt; log_one (M.elapsed_ns ()) port ip + (* Create a task to wait on and a waiter to wakeup. *) + and t, w = Lwt.task () + in + (* Setup network stack: ethernet, ARP, IPv4, UDP, and ICMP. *) + ETH.connect net &gt;&gt;= fun eth -&gt; + ARP.connect eth &gt;&gt;= fun arp -&gt; + IPV4.connect ~cidr ~gateway eth arp &gt;&gt;= fun ip -&gt; + UDP.connect ip &gt;&gt;= fun udp -&gt; + let send = send_udp udp in + Icmp.connect send log_one w &gt;&gt;= fun icmp -&gt; + + (* The callback cascade for an incoming network packet. 
*) + let ethif_listener = + ETH.input + ~arpv4:(ARP.input arp) + ~ipv4:( + IPV4.input + ~tcp:(fun ~src:_ ~dst:_ _ -&gt; Lwt.return_unit) + ~udp:(fun ~src:_ ~dst:_ _ -&gt; Lwt.return_unit) + ~default:(fun ~proto ~src ~dst buf -&gt; + match proto with + | 1 -&gt; Icmp.input icmp ~src ~dst buf + | _ -&gt; Lwt.return_unit) + ip) + ~ipv6:(fun _ -&gt; Lwt.return_unit) + eth + in + (* Start the callback in a separate asynchronous task. *) + Lwt.async (fun () -&gt; + N.listen net ~header_size:Ethernet_wire.sizeof_ethernet ethif_listener &gt;|= function + | Ok () -&gt; () + | Error e -&gt; Logs.err (fun m -&gt; m &quot;netif error %a&quot; N.pp_error e)); + (* Send the initial UDP packet with a ttl of 1. This entails the domino + effect to receive ICMP packets, send out another UDP packet with ttl + increased by one, etc. - until a destination unreachable is received, + or the hop limit is reached. *) + send 1 &gt;&gt;= fun () -&gt; + t +end +</code></pre> +<p>The configuration (<code>config.ml</code>) for this unikernel is as follows:</p> +<pre><code class="language-OCaml">open Mirage + +let host = + let doc = Key.Arg.info ~doc:&quot;The host to trace.&quot; [&quot;host&quot;] in + Key.(create &quot;host&quot; Arg.(opt ipv4_address (Ipaddr.V4.of_string_exn &quot;141.1.1.1&quot;) doc)) + +let timeout = + let doc = Key.Arg.info ~doc:&quot;Timeout (in millisecond)&quot; [&quot;timeout&quot;] in + Key.(create &quot;timeout&quot; Arg.(opt int 1000 doc)) + +let ipv4 = + let doc = Key.Arg.info ~doc:&quot;IPv4 address&quot; [&quot;ipv4&quot;] in + Key.(create &quot;ipv4&quot; Arg.(required ipv4 doc)) + +let ipv4_gateway = + let doc = Key.Arg.info ~doc:&quot;IPv4 gateway&quot; [&quot;ipv4-gateway&quot;] in + Key.(create &quot;ipv4-gateway&quot; Arg.(required ipv4_address doc)) + +let main = + let packages = [ + package ~sublibs:[&quot;ipv4&quot;; &quot;udp&quot;; &quot;icmpv4&quot;] &quot;tcpip&quot;; + package &quot;ethernet&quot;; + package &quot;arp-mirage&quot;; + package 
&quot;mirage-protocols&quot;; + package &quot;mtime&quot;; + ] in + foreign + ~keys:[Key.abstract ipv4 ; Key.abstract ipv4_gateway ; Key.abstract host ; Key.abstract timeout] + ~packages + &quot;Unikernel.Main&quot; + (random @-&gt; mclock @-&gt; time @-&gt; network @-&gt; job) + +let () = + register &quot;traceroute&quot; + [ main $ default_random $ default_monotonic_clock $ default_time $ default_network ] +</code></pre> +<p>And voila, that's all the code. If you copy it together (or download the two +files from <a href="https://github.com/roburio/traceroute">the GitHub repository</a>), +and have OCaml, opam, and <a href="https://mirage.io/wiki/install">mirage (&gt;= 3.8.0)</a> installed, +you should be able to:</p> +<pre><code class="language-bash">$ mirage configure -t hvt +$ make depend +$ make +$ solo5-hvt --net:service=tap0 -- traceroute.hvt ... +... get the output shown at top ... +</code></pre> +<p>Possible enhancements are to use a different protocol (TCP, or any other protocol ID, which may be used to encode more information), to encode data into the IPv4 ID or the full 8 bytes of the upper protocol, to encrypt/authenticate the data transmitted (and verify it has not been tampered with in the ICMP reply), to improve error handling and recovery, to send multiple packets for improved round trip time measurements, ...</p> +<p>If you develop enhancements you'd like to share, please send a pull request to the git repository.</p> +<p>The motivation for this traceroute unikernel came up while talking with <a href="https://twitter.com/networkservice">Aaron</a> and <a href="https://github.com/phaer">Paul</a>, who contributed several patches to the IP stack which pass the ttl through.</p> +<p>If you want to support our work on MirageOS unikernels, please <a href="https://robur.coop/Donate">donate to robur</a>. 
I'm interested in feedback, either via <a href="https://twitter.com/h4nnes">twitter</a>, <a href="https://mastodon.social/@hannesm">hannesm@mastodon.social</a> or via eMail.</p> +https://hannes.robur.coop/Posts/TracerouteTraceroute2020-06-24T10:38:10-00:00hanneshttps://hannes.robur.coop/atomhannes<h2>Goal</h2> +<p>Have your domain served by OCaml-DNS authoritative name servers. Data is stored in a git remote, and Let's Encrypt certificates can be requested via DNS. This software has been deployed for more than two years for several domains such as <code>nqsb.io</code> and <code>robur.coop</code>. This post presents the authoritative server side and the certificate library of the OCaml-DNS implementation formerly known as <a href="https://hannes.robur.coop/Posts/DNS">&micro;DNS</a>.</p> +<h2>Prerequisites</h2> +<p>You need to own a domain, and be able to delegate the name service to your own servers. +You also need two spare public IPv4 addresses (in different /24 networks) for your name servers. +A git server or remote repository reachable via git over ssh. +Servers which support <a href="https://github.com/solo5/solo5">solo5</a> guests, and have the corresponding tender installed. +A computer with <a href="https://opam.ocaml.org">opam</a> (&gt;= 2.0.0) installed.</p> +<h2>Data preparation</h2> +<p>Figure out a way to get the DNS entries of your domain in a <a href="https://tools.ietf.org/html/rfc1034">&quot;master file format&quot;</a>, i.e. what bind uses.</p> +<p>This is a master file for the <code>mirage</code> domain, defining <code>$ORIGIN</code> to avoid typing the domain name after each hostname (use <code>@</code> if you need the domain name only; if you need to refer to a hostname in a different domain, end it with a dot (<code>.</code>), e.g. <code>ns2.foo.com.</code>). The default time to live <code>$TTL</code> is an hour (3600 seconds). 
+The zone contains a <a href="https://tools.ietf.org/html/rfc1035#section-3.3.13">start of authority (<code>SOA</code>) record</a> containing the nameserver, hostmaster, serial, refresh, retry, expiry, and minimum. +Also, a single <a href="https://tools.ietf.org/html/rfc1035#section-3.3.11">name server (<code>NS</code>) record</a> <code>ns1</code> is specified with an accompanying <a href="https://tools.ietf.org/html/rfc1035#section-3.4.1">address (<code>A</code>) record</a> pointing to its IPv4 address.</p> +<pre><code class="language-shell">git-repo&gt; cat mirage +$ORIGIN mirage. +$TTL 3600 +@ SOA ns1 hostmaster 1 86400 7200 1048576 3600 +@ NS ns1 +ns1 A 127.0.0.1 +www A 1.1.1.1 +git-repo&gt; git add mirage &amp;&amp; git commit -m initial &amp;&amp; git push +</code></pre> +<h2>Installation</h2> +<p>On your development machine, you need to install various OCaml packages. You don't need privileged access if common tools (C compiler, make, libgmp) are already installed. This assumes you have <code>opam</code> installed.</p> +<p>Let's create a fresh <code>switch</code> for the DNS journey:</p> +<pre><code class="language-shell">$ opam init +$ opam update +$ opam switch create udns 4.14.1 +# waiting a bit, a fresh OCaml compiler is getting bootstrapped +$ eval `opam env` #sets some environment variables +</code></pre> +<p>The last command sets environment variables in your current shell session; please use the same shell for the following commands (or run <code>eval $(opam env)</code> in another shell and proceed in there - the output of <code>opam switch</code> should point to <code>udns</code>).</p> +<h3>Validation of our zonefile</h3> +<p>First let's check that OCaml-DNS can parse our zonefile:</p> +<pre><code class="language-shell">$ opam install dns-cli #installs ~/.opam/udns/bin/ozone and other binaries +$ ozone &lt;git-repo&gt;/mirage # see ozone --help +successfully checked zone +</code></pre> +<p>Great. 
Error reporting is not great, but line numbers are indicated (<code>ozone: zone parse problem at line 3: syntax error</code>), <a href="https://github.com/mirage/ocaml-dns/tree/v4.2.0/zone">lexer and parser are lex/yacc style</a> (PRs welcome).</p> +<p>FWIW, <code>ozone</code> accepts <code>--old &lt;filename&gt;</code> to check whether an update from the old zone to the new is fine. This can be used as <a href="https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks">pre-commit hook</a> in your git repository to avoid bad parse states in your name servers.</p> +<h3>Getting the primary up</h3> +<p>The next step is to compile the primary server and run it to serve the domain data. Since the git-via-ssh client is not yet released, we need to add a custom opam repository to this switch.</p> +<pre><code class="language-shell"># get the `mirage` application via opam +$ opam install lwt mirage + +# get the source code of the unikernels +$ git clone https://github.com/roburio/dns-primary-git.git +$ cd dns-primary-git + +# let's build the server first as unix application +$ mirage configure #--no-depext if you have all system dependencies +$ make + +# run it +$ dist/primary-git +# starts a unix process which clones https://github.com/roburio/udns.git +# attempts to parse the data as zone files, and fails on parse error +$ dist/primary-git --remote=https://my-public-git-repository +# this should fail with ENOACCESS since the DNS server tries to listen on port 53 + +# which requires a privileged user, i.e. 
su, sudo or doas +$ sudo dist/primary-git --remote=https://my-public-git-repository +# leave it running, run the following programs in a different shell + +# test it +$ host ns1.mirage 127.0.0.1 +ns1.mirage has address 127.0.0.1 +$ dig any mirage @127.0.0.1 +# a DNS packet printout with all records available for mirage +</code></pre> +<p>That's exciting, the DNS server is serving answers from a remote git repository.</p> +<h3>Securing the git access with ssh</h3> +<p>Let's authenticate the access by using ssh, so we feel ready to push data there as well. The primary-git unikernel already includes an experimental <a href="https://github.com/haesbaert/awa-ssh">ssh client</a>, all we need to do is set up credentials - in the following an Ed25519 keypair and the server fingerprint.</p> +<pre><code class="language-shell"># collect the ED25519 host key fingerprint +$ ssh-keyscan &lt;git-server&gt; &gt; /tmp/git-server-public-keys +$ ssh-keygen -l -E sha256 -f /tmp/git-server-public-keys | grep ED25519 +256 SHA256:a5kkkuo7MwTBkW+HDt4km0gGPUAX0y1bFcPMXKxBaD0 &lt;git-server&gt; (ED25519) +# we're interested in the SHA256:yyy only + +# generate a ssh keypair +$ awa_gen_key --keytype ed25519 # installed by the make step above in ~/.opam/udns/bin +private key: ed25519:nO7ervdJqzPfuvdM/J4qImipwVoI5gl53fpqgjZnv9w= +ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICyliWbwWWXBc1+DRIzReLQ4UFiVGXJ6Paw1Jts+XQte awa@awa.local +# please run your own awa_gen_key, don't use the numbers above +</code></pre> +<p>The public key is in standard OpenSSH format and needs to be added to the list of accepted keys on your server - the exact steps depend on your git server, if you're running your own with <a href="https://github.com/tv42/gitosis">gitosis</a>, add it as new public key file and grant that key access to the data repository. 
If you use gitlab or github, you may want to create a new user account with the generated key.</p> +<p>The private key is not displayed, but only the seed required to re-generate it, when using the same random number generator, in our case <a href="https://mirage.github.io/mirage-crypto/doc/mirage-crypto-rng/Mirage_crypto_rng/index.html">fortuna implemented by mirage-crypto</a> - used by both <code>awa_gen_key</code> and <code>primary-git</code>. The seed is provided as a command-line argument while starting <code>primary-git</code>:</p> +<pre><code class="language-shell"># execute with git over ssh, authenticator from ssh-keyscan, seed from awa_gen_key +$ ./primary-git --authenticator=SHA256:a5kkkuo7MwTBkW+HDt4km0gGPUAX0y1bFcPMXKxBaD0 --ssh-key=ed25519:nO7ervdJqzPfuvdM/J4qImipwVoI5gl53fpqgjZnv9w= --remote=git@&lt;git-server&gt;:repo-name.git +# started up, you can try the host and dig commands from above if you like +</code></pre> +<p>To wrap up, we now have a primary authoritative name server for our zone running as a Unix process, which clones a remote git repository via ssh on startup and then serves it.</p> +<h3>Authenticated data updates</h3> +<p>Our remote git repository is the source of truth: if you need to add a DNS entry to the zone, you git pull, edit the zone file, remember to increase the serial in the SOA line, run <code>ozone</code>, git commit and push to the repository.</p> +<p>So, the <code>primary-git</code> needs to be informed of git pushes. This requires a communication channel from the git server (or somewhere else, e.g. your laptop) to the DNS server. I prefer in-protocol solutions over adding yet another protocol stack, no way my DNS server will talk HTTP REST.</p> +<p>The DNS protocol has an extension for <a href="https://tools.ietf.org/html/rfc1996">notifications of zone changes</a> (as a DNS packet), usually used between the primary and secondary servers. The <code>primary-git</code> accepts these notify requests (i.e. 
bends the standard slightly), and upon receipt pulls the remote git repository, and serves the fresh zone files. Since a git pull may be rather excessive in terms of CPU cycles and network bandwidth, only authenticated notifications are accepted.</p> +<p>The DNS protocol specifies in another extension <a href="https://tools.ietf.org/html/rfc2845">authentication (DNS TSIG)</a> with transaction signatures on DNS packets including a timestamp and fudge to avoid replay attacks. As key material, hmac secrets distributed to both communication endpoints are used.</p> +<p>To recap, the primary server is configured with command line parameters (for remote repository url and ssh credentials), and serves data from a zonefile. If the secrets were provided via command line, a restart would be necessary for adding and removing keys. If put into the zonefile, they would be publicly served on request. So instead, we'll use another file, still in zone file format, in the top-level domain <code>_keys</code>, i.e. the <code>mirage._keys</code> file contains keys for the <code>mirage</code> zone. All files ending in <code>._keys</code> are parsed with the normal parser, but put into an authentication store instead of the domain data store, which is served publicly.</p> +<p>For encoding hmac secrets into DNS zone file format, the <a href="https://tools.ietf.org/html/rfc4034#section-2"><code>DNSKEY</code></a> format is used (designed for DNSSEC). The <a href="https://www.isc.org/bind/">bind</a> software comes with <code>dnssec-keygen</code> and <code>tsig-keygen</code> to generate DNSKEY output: flags is 0, protocol is 3, and algorithm identifier for SHA256 is 163 (SHA384 164, SHA512 165). This is reused by the OCaml DNS library. The key material itself is base64 encoded.</p> +<p>Access control and naming of keys follows the DNS domain name hierarchy - a key has the form name._operation.domain, and has access granted to domain and all subdomains of it. 
Two operations are supported: update and transfer. In the future there may be a dedicated notify operation, for now we'll use update. The name part is ignored for the update operation.</p> +<p>Since we now embed secret information in the git repository, it is a good idea to restrict access to it, i.e. make it private and not publicly cloneable or viewable. Let's generate a first hmac secret and send a notify:</p> +<pre><code class="language-shell">$ dd if=/dev/random bs=1 count=32 | b64encode - +begin-base64 644 - +kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg= +==== +[..] +git-repo&gt; echo &quot;personal._update.mirage. DNSKEY 0 3 163 kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg=&quot; &gt; mirage._keys +git-repo&gt; git add mirage._keys &amp;&amp; git commit -m &quot;add hmac secret&quot; &amp;&amp; git push + +# now we need to restart the primary git to get the git repository with the key +$ ./primary-git --ssh-key=... # arguments from above, remote git, host key fingerprint, private key seed + +# now test that a notify results in a git pull +$ onotify 127.0.0.1 mirage --key=personal._update.mirage:SHA256:kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg= +# onotify was installed by dns-cli in ~/.opam/udns/bin/onotify, see --help for options +# further changes to the hmac secrets don't require a restart anymore, a notify packet is sufficient :D +</code></pre> +<p>Ok, this onotify command line could be set up as a git post-commit hook, or run manually after each manual git push.</p> +<h3>Secondary</h3> +<p>It's time to figure out how to integrate the secondary name server. An already existing bind or something else that accepts notifications and issues zone transfers with hmac-sha256 secrets should work out of the box. 
If you encounter interoperability issues, please get in touch with me.</p> +<p>The <code>secondary</code> unikernel is available from another git repository:</p> +<pre><code class="language-shell"># get the secondary sources +$ git clone https://github.com/roburio/dns-secondary.git +$ cd dns-secondary +</code></pre> +<p>Its only command line argument is a list of hmac secrets used for authenticating that the received data originates from the primary server. Data is initially transferred by a <a href="https://tools.ietf.org/html/rfc5936">full zone transfer (AXFR)</a>, later updates (upon refresh timer or notify request sent by the primary) use <a href="https://tools.ietf.org/html/rfc1995">incremental (IXFR)</a>. Zone transfer requests and data are authenticated with transaction signatures again.</p> +<p>A convenience of OCaml DNS is that transfer key names matter, and are of the form &lt;primary-ip&gt;.&lt;secondary-ip&gt;._transfer.domain, i.e. <code>1.1.1.1.2.2.2.2._transfer.mirage</code> if the primary server is 1.1.1.1, and the secondary 2.2.2.2. Encoding the IP address in the name allows both parties to start the communication: the secondary starts by requesting a SOA for all domains for which keys are provided on command line, and if an authoritative SOA answer is received, the AXFR is triggered. The primary server emits notification requests on startup and then on every zone change (i.e. via git pull) to all secondary IP addresses of transfer keys present for the specific zone in addition to the notifications to the NS records in the zone.</p> +<pre><code class="language-shell">$ mirage configure +$ make +$ ./dist/secondary +</code></pre> +<h3>IP addresses and routing</h3> +<p>Both primary and secondary serve the data on the DNS port (53) on UDP and TCP. 
To run both on the same machine and bind them to different IP addresses, we'll use a layer 2 network (ethernet frames) with a host system software switch (bridge interface <code>service</code>), the unikernels as virtual machines (or seccomp-sandboxed) via the <a href="https://github.com/solo5/solo5">solo5</a> backend. Using xen is possible as well. As IP address range we'll use 10.0.42.0/24, and the host system uses 10.0.42.1.</p> +<p>The primary git needs connectivity to the remote git repository, thus on a laptop in a private network we need network address translation (NAT) from the bridge where the unikernels speak to the Internet where the git repository resides.</p> +<pre><code class="language-shell"># on FreeBSD: +# configure NAT with pf, you need to have forwarding enabled +$ sysctl net.inet.ip.forwarding=1 +$ echo 'nat pass on wlan0 inet from 10.0.42.0/24 to any -&gt; (wlan0)' &gt;&gt; /etc/pf.conf +$ service pf restart + +# make tap interfaces UP on open() +$ sysctl net.link.tap.up_on_open=1 + +# bridge creation, naming, and IP setup +$ ifconfig bridge create +bridge0 +$ ifconfig bridge0 name service +$ ifconfig bridge0 10.0.42.1/24 + +# two tap interfaces for our unikernels +$ ifconfig tap create +tap0 +$ ifconfig tap create +tap1 +# add them to the bridge +$ ifconfig service addm tap0 addm tap1 +</code></pre> +<h3>Primary and secondary setup</h3> +<p>Let's update our zone slightly to reflect the IP changes.</p> +<pre><code class="language-shell">git-repo&gt; cat mirage +$ORIGIN mirage. +$TTL 3600 +@ SOA ns1 hostmaster 2 86400 7200 1048576 3600 +@ NS ns1 +@ NS ns2 +ns1 A 10.0.42.2 +ns2 A 10.0.42.3 + +# we also need an additional transfer key +git-repo&gt; cat mirage._keys +personal._update.mirage. DNSKEY 0 3 163 kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg= +10.0.42.2.10.0.42.3._transfer.mirage. DNSKEY 0 3 163 cDK6sKyvlt8UBerZlmxuD84ih2KookJGDagJlLVNo20= +git-repo&gt; git commit -m &quot;updates&quot; . 
&amp;&amp; git push +</code></pre> +<p>Ok, the git repository is ready, now we need to compile the unikernels for the virtualisation target (see <a href="https://mirage.io/wiki/hello-world#Building-for-Another-Backend">other targets</a> for further information).</p> +<pre><code class="language-shell"># back to primary +$ cd ../dns-primary-git +$ mirage configure -t hvt # or e.g. -t spt (and solo5-spt below) +# installs backend-specific opam packages, recompiles some +$ make +[...] +$ solo5-hvt --net:service=tap0 -- primary_git.hvt --ipv4=10.0.42.2/24 --ipv4-gateway=10.0.42.1 --seed=.. --authenticator=.. --remote=... +# should now run as a virtual machine (kvm, bhyve), and clone the git repository +$ dig any mirage @10.0.42.2 +# should reply with the SOA and NS records, and also the name server address records in the additional section + +# secondary +$ cd ../dns-secondary +$ mirage configure -t hvt +$ make +$ solo5-hvt --net:service=tap1 -- secondary.hvt --ipv4=10.0.42.3/24 --keys=10.0.42.2.10.0.42.3._transfer.mirage:SHA256:cDK6sKyvlt8UBerZlmxuD84ih2KookJGDagJlLVNo20= +# an ipv4-gateway is not needed in this setup, but in real deployment later +# it should start up and transfer the mirage zone from the primary + +$ dig any mirage @10.0.42.3 +# should now output the same information as from 10.0.42.2 + +# testing an update and propagation +# edit mirage zone, add a new record and increment the serial number +git-repo&gt; echo &quot;foo A 127.0.0.1&quot; &gt;&gt; mirage +git-repo&gt; vi mirage &lt;- increment serial +git-repo&gt; git commit -m 'add foo' . 
&amp;&amp; git push +$ onotify 10.0.42.2 mirage --key=personal._update.mirage:SHA256:kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg= + +# now check that it worked +$ dig foo.mirage @10.0.42.2 # primary +$ dig foo.mirage @10.0.42.3 # secondary got notified and transferred the zone +</code></pre> +<p>You can also check the behaviour when restarting either of the VMs, whenever the primary is available the zone is synchronised. If the primary is down, the secondary still serves the zone. When the secondary is started while the primary is down, it won't serve any data until the primary is online (the secondary polls periodically, the primary sends notifies on startup).</p> +<h3>Dynamic data updates via DNS, pushed to git</h3> +<p>DNS is a rich protocol, and it also has builtin <a href="https://tools.ietf.org/html/rfc2136">updates</a> that are supported by OCaml DNS, again authenticated with hmac-sha256 and shared secrets. Bind provides the command-line utility <code>nsupdate</code> to send these update packets, a simple <code>oupdate</code> unix utility is available as well (i.e. for integration of dynamic DNS clients). You know the drill, add a shared secret to the primary, git push, notify the primary, and voila we can dynamically in-protocol update. 
An update received by the primary this way will trigger a git push to the remote git repository, and notifications to the secondary servers as described above.</p> +<pre><code class="language-shell"># being lazy, I reuse the key above +$ oupdate 10.0.42.2 personal._update.mirage:SHA256:kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg= my-other.mirage 1.2.3.4 + +# let's observe the remote git +git-repo&gt; git pull +# there should be a new commit generated by the primary +git-repo&gt; git log + +# test it, should return 1.2.3.4 +$ dig my-other.mirage @10.0.42.2 +$ dig my-other.mirage @10.0.42.3 +</code></pre> +<p>So we can deploy further <code>oupdate</code> (or <code>nsupdate</code>) clients, distribute hmac secrets, and have the DNS zone updated. The source of truth is still the git repository, where the primary-git pushes to. Merge conflicts and timing of pushes are not yet dealt with. They are unlikely to happen since the primary is notified on pushes and should have up-to-date data in storage. Sorry, I'm unsure about the error semantics, try it yourself.</p> +<h3>Let's encrypt!</h3> +<p><a href="https://letsencrypt.org/">Let's encrypt</a> is a certificate authority (CA), whose certificate is shipped as trust anchor in web browsers. They specified a protocol for <a href="https://tools.ietf.org/html/draft-ietf-acme-acme-05">automated certificate management environment (ACME)</a>, used to get X509 certificates for your services. In the protocol, a certificate signing request (public key and hostname) is sent to let's encrypt servers, which send a challenge to prove the ownership of the hostnames. 
One widely-used way to solve this challenge is running a web server, another is to serve it as text record from the authoritative DNS server.</p> +<p>Since I avoid persistent storage when possible, and also don't want to integrate an HTTP client stack in the primary server, I developed a third unikernel that acts as (hidden) secondary server, performs the tedious HTTP communication with let's encrypt servers, and stores all data in the public DNS zone.</p> +<p>For encoding of certificates, the DANE working group specified <a href="https://tools.ietf.org/html/rfc6698.html#section-7.1">TLSA</a> records in DNS. They are quadruples of usage, selector, matching type, and ASN.1 DER-encoded material. We set usage to 3 (domain-issued certificate), matching type to 0 (no hash), and selector to 0 (full certificate) or 255 (private usage) for certificate signing requests. The interaction is as follows:</p> +<ol> +<li>Primary, secondary, and let's encrypt unikernels are running +</li> +<li>A service (<code>ocertify</code>, <code>unikernels/certificate</code>, or the <code>dns-certify.mirage</code> library) demands a TLS certificate, and has a hmac-secret for the primary DNS +</li> +<li>The service generates a certificate signing request with the desired hostname(s), and performs an nsupdate with TLSA 255 &lt;DER encoded signing-request&gt; +</li> +<li>The primary accepts the update, pushes the new zone to git, and sends notifies to secondary and let's encrypt unikernels which (incrementally) transfer the zone +</li> +<li>The let's encrypt unikernel notices while transferring the zone a signing request without a certificate, starts HTTP interaction with let's encrypt +</li> +<li>The let's encrypt unikernel solves the challenge, sends the response as update of a TXT record to the primary nameserver +</li> +<li>The primary pushes the TXT record to git, and notifies secondaries (which transfer the zone) +</li> +<li>The let's encrypt servers request the 
TXT record from either or both authoritative name servers +</li> +<li>The let's encrypt unikernel polls for the issued certificate and sends an update to the primary with TLSA 0 &lt;DER encoded certificate&gt; +</li> +<li>The primary pushes the certificate to git, notifies secondaries (which transfer the zone) +</li> +<li>The service polls TLSA records for the hostname, and uses the certificate upon retrieval +</li> +</ol> +<p>Note that neither the signing request nor the certificate contain private key material, thus it is fine to serve them publicly. Please also note that the service first polls DNS for a certificate for the hostname: if a valid certificate (start and end date) using the same public key is found, this certificate is used and steps 3-10 are not executed.</p> +<p>The let's encrypt unikernel does not serve anything, it is a reactive system which acts upon notification from the primary. Thus, it can be executed in a private address space (with a NAT). Since the OCaml DNS server stack needs to push notifications to it, it preserves all incoming signed SOA requests as candidates for notifications on update. The let's encrypt unikernel ensures that it always has a connection to the primary to receive notifications.</p> +<pre><code class="language-shell"># getting let's encrypt up and running +$ cd .. +$ git clone https://github.com/roburio/dns-letsencrypt-secondary.git +$ cd dns-letsencrypt-secondary +$ mirage configure -t hvt +$ make + +# run it +$ solo5-hvt --net:service=tap2 -- letsencrypt.hvt --keys=... + +# test it +$ ocertify 10.0.42.2 foo.mirage +</code></pre> +<p>For actual testing with let's encrypt servers you need to have the primary and secondary deployed on your remote hosts, and your domain needs to be delegated to these servers. Good luck. And ensure you have backed up your git repository.</p> +<p>As fine print, while this tutorial was about the <code>mirage</code> zone, you can stick any number of zones into the git repository. 
If you use a <code>_keys</code> file (without any domain prefix), you can configure hmac secrets for all zones, i.e. something to use in your let's encrypt unikernel and secondary unikernel. Dynamic addition of zones is supported, just create a new zonefile and notify the primary, the secondary will be notified and pick it up. The primary responds to a signed SOA for the root zone (i.e. requested by the secondary) with the SOA response (not authoritative), and additionally notifications for all domains of the primary.</p> +<h3>Conclusion and thanks</h3> +<p>This tutorial presented how to use the OCaml DNS based unikernels to run authoritative name servers for your domain, using a git repository as the source of truth, dynamic authenticated updates, and let's encrypt certificate issuing.</p> +<p>There are further steps to take, such as monitoring (<code>mirage configure --monitoring</code>), which uses a second network interface for reporting syslog and metrics to telegraf / influx / grafana. Some DNS features are still missing, most prominently DNSSec.</p> +<p>I'd like to thank all people involved in this software stack, which would not be possible without other key components, including <a href="https://github.com/mirage/ocaml-git">git</a>, <a href="https://github.com/mirage/mirage-crypto">mirage-crypto</a>, <a href="https://github.com/mirage/awa-ssh">awa-ssh</a>, <a href="https://github.com/solo5/solo5">solo5</a>, <a href="https://github.com/mirage/mirage">mirage</a>, <a href="https://github.com/mmaker/ocaml-letsencrypt">ocaml-letsencrypt</a>, and more.</p> +<p>If you want to support our work on MirageOS unikernels, please <a href="https://robur.coop/Donate">donate to robur</a>. 
I'm interested in feedback, either via <a href="https://twitter.com/h4nnes">twitter</a>, <a href="https://mastodon.social/@hannesm">hannesm@mastodon.social</a> or via eMail.</p> +https://hannes.robur.coop/Posts/DnsServerDeploying authoritative OCaml-DNS servers as MirageOS unikernels2019-12-23T21:30:53-00:00hanneshttps://hannes.robur.coop/atomhannes<h2>Reproducible builds summit</h2> +<p>I'm just back from the <a href="https://reproducible-builds.org/events/Marrakesh2019/">Reproducible builds summit 2019</a>. In 2018, several people developing <a href="https://ocaml.org">OCaml</a> and <a href="https://opam.ocaml.org">opam</a> and <a href="https://mirage.io">MirageOS</a> attended <a href="https://reproducible-builds.org/events/paris2018/">the Reproducible builds summit in Paris</a>. The notes from last year on <a href="https://reproducible-builds.org/events/paris2018/report/#Toc11410_331763073">opam reproducibility</a> and <a href="https://reproducible-builds.org/events/paris2018/report/#Toc11681_331763073">MirageOS reproducibility</a> are online. After last year's workshop, Raja started developing the opam reproducibility builder <a href="https://github.com/rjbou/orb">orb</a>, which I extended at and after this year's summit. This year before and after the facilitated summit there were hacking days, which allowed further interaction with participants, writing some code and conducting experiments. I again had an exciting time at the summit and hacking days this year, thanks to our hosts, organisers, and all participants.</p> +<h2>Goal</h2> +<p>Stepping back a bit, first a look at the <a href="https://reproducible-builds.org/">goal of reproducible builds</a>: when compiling source code multiple times, the produced binaries should be identical. It should be sufficient if the binaries are behaviourally equal, but this is pretty hard to check. 
It is much easier to check <strong>bit-wise identity of binaries</strong>, and relaxes the burden on the checker -- checking for reproducibility is reduced to computing the hash of the binaries. Let's stick to the bit-wise identical binary definition, which also means software developers have to avoid non-determinism during compilation in their toolchains, dependent libraries, and developed code.</p> +<p>A <a href="https://reproducible-builds.org/docs/test-bench/">checklist</a> of potential things leading to non-determinism has been written up by the reproducible builds project. Examples include recording the build timestamp into the binary, ordering of code and embedded data. The reproducible builds project also developed <a href="https://packages.debian.org/sid/disorderfs">disorderfs</a> for testing reproducibility and <a href="https://diffoscope.org/">diffoscope</a> for comparing binaries with file-dependent readers, falling back to <code>objdump</code> and <code>hexdump</code>. A giant <a href="https://tests.reproducible-builds.org/">test infrastructure</a> with <a href="https://tests.reproducible-builds.org/debian/index_variations.html">lots of variations</a> between the builds, mostly using Debian, has been set up over the years.</p> +<p>Reproducibility is a precondition for trustworthy binaries. See <a href="https://reproducible-builds.org/#why-does-it-matter">why does it matter</a>. If there are no instructions how to get from the published sources to the exact binary, why should anyone trust and use the binary which claims to be the result of the sources? It may as well contain different code, including a backdoor, bitcoin mining code, outputting the wrong results for specific inputs, etc. Reproducibility does not imply the software is free of security issues or backdoors, but instead of an audit of the binary - which is tedious and rarely done - the source code can be audited - but the toolchain (compiler, linker, ..) 
used for compilation needs to be taken into account, i.e. trusted or audited to not be malicious. <strong>I will only ever publish binaries if they are reproducible</strong>.</p> +<p>My main interest at the summit was to enhance existing tooling and conduct some experiments about the reproducibility of <a href="https://mirage.io">MirageOS unikernels</a> -- a unikernel is a statically linked ELF binary to be run as Unix process or <a href="https://github.com/solo5/solo5">virtual machine</a>. MirageOS heavily uses <a href="https://ocaml.org">OCaml</a> and <a href="https://opam.ocaml.org">opam</a>, the OCaml package manager, and is an opam package itself. Thus, <em>checking reproducibility of a MirageOS unikernel is the same problem as checking reproducibility of an opam package</em>.</p> +<h2>Reproducible builds with opam</h2> +<p>Testing for reproducibility is achieved by taking the sources and compiling them twice independently. Afterwards the equality of the resulting binaries can be checked. In trivial projects, the sources are just a single file, or originate from a single tarball. In OCaml, opam uses <a href="https://github.com/ocaml/opam-repository">a community repository</a> where OCaml developers publish their package releases to, but can also use custom repositories, and in addition pin packages to git remotes (url including branch or commit), or a directory on the local filesystem. Manually tracking and updating all dependent packages of a MirageOS unikernel is not feasible: our hello-world compiled for hvt (kvm/BHyve) already has 79 opam dependencies, including the OCaml compiler which is distributed as opam package. The unikernel serving this website depends on 175 opam packages.</p> +<p>Conceptually there should be two tools, the <em>initial builder</em>, which takes the latest opam packages which do not conflict, and exports exact package versions used during the build, as well as hashes of binaries. 
The other tool is a <em>rebuilder</em>, which imports the export, conducts a build, and outputs the hashes of the produced binaries.</p> +<p>Opam has the concept of a <code>switch</code>, which is an environment where a package set is installed. Switches are independent of each other, and can already be exported and imported. Unfortunately the export is incomplete: if a package includes additional patches as part of the repository -- sometimes needed for fixing releases where the actual author or maintainer of a package responds slowly -- neither these packages nor the patches end up in the export. Also, if a package is pinned to a git branch, the branch appears in the export, but this may change over time by pushing more commits or even force-pushing to that branch. In <a href="https://github.com/ocaml/opam/pull/4040">PR #4040</a> (under discussion and review), also developed during the summit, I propose to embed the additional files as base64 encoded values in the opam file. To solve the latter issue, I modified the export mechanism to <a href="https://github.com/ocaml/opam/pull/4055">embed the git commit hash (PR #4055)</a>, and to avoid sources which come from a local directory and do not have a checksum.</p> +<p>So the opam export contains the information required to gather the exact same sources and build instructions of the opam packages. If the opam repository were self-contained (i.e. not depending on any other tools), this would be sufficient. But opam does not run in thin air, it requires some system utilities such as <code>/bin/sh</code>, <code>sed</code>, a GNU make, commonly <code>git</code>, a C compiler, a linker, an assembler. Since opam is available on various operating systems, the plugin <code>depext</code> handles host system dependencies, e.g. 
if your opam package requires <code>gmp</code> to be installed, this requires slightly different names depending on host system or distribution, take a look at <a href="https://github.com/ocaml/opam-repository/blob/master/packages/conf-gmp/conf-gmp.1/opam">conf-gmp</a>. This also means, opam has rather good information about both the opam dependencies and the host system dependencies for each package. Please note that the host system packages used during compilation are not yet recorded (i.e. which <code>gmp</code> package was installed and used during the build, only that a <code>gmp</code> package has to be installed). The base utilities mentioned above (C compiler, linker, shell) are also not recorded yet.</p> +<p>Operating system information available in opam (such as architecture, distribution, version), which in some cases maps to exact base utilities, is recorded in the build-environment, a separate artifact. The environment variable <a href="https://reproducible-builds.org/specs/source-date-epoch/"><code>SOURCE_DATE_EPOCH</code></a>, used for communicating the same timestamp when software is required to record a timestamp into the resulting binary, is also captured in the build environment.</p> +<p>Additional environment variables may be captured or used by opam packages to produce different output. To avoid this, both the initial builder and the rebuilder are run with minimal environment variables: only <code>PATH</code> (normalised to a whitelist of <code>/bin</code>, <code>/usr/bin</code>, <code>/usr/local/bin</code> and <code>/opt/bin</code>) and <code>HOME</code> are defined. Missing information at the moment includes CPU features: some libraries (gmp?, nocrypto) emit different code depending on the CPU feature.</p> +<h2>Tooling</h2> +<p><em>TL;DR: A <strong>build</strong> builds an opam package, and outputs <code>.opam-switch</code>, <code>.build-hashes.N</code>, and <code>.build-environment.N</code>. 
A <strong>rebuild</strong> uses these artifacts as input, builds the package and outputs another <code>.build-hashes.M</code> and <code>.build-environment.M</code>.</em></p> +<p>The command-line utility <code>orb</code> can be installed and used:</p> +<pre><code class="language-sh">$ opam pin add orb git+https://github.com/hannesm/orb.git#active +$ orb build --twice --keep-build-dir --diffoscope &lt;your-favourite-opam-package&gt; +</code></pre> +<p>It provides two subcommands <code>build</code> and <code>rebuild</code>. The <code>build</code> command takes a list of local opam <code>--repos</code> from which to take opam packages (defaults to <code>default</code>), a compiler (either a variant <code>--compiler=4.09.0+flambda</code>, a version <code>--compiler=4.06.0</code>, or a pin to a local development version <code>--compiler-pin=~/ocaml</code>), and optionally an existing switch <code>--use-switch</code>. It creates a switch, builds the packages, and emits the opam export, hashes of all files installed by these packages, and the build environment. The flag <code>--keep-build</code> retains the build products; opam's <code>--keep-build-dir</code> additionally retains temporary build products and generated source code. If <code>--twice</code> is provided, a rebuild (described next) is executed after the initial build.</p> +<p>The <code>rebuild</code> command takes a directory with the opam export and build environment to build the opam package. It first compares the build-environment with the host system, sets the <code>SOURCE_DATE_EPOCH</code> and switch location accordingly and executes the import. Once the build is finished, it compares the hashes of the resulting files with the previous run. On divergence, if build directories were kept in the previous build, and if diffoscope is available and <code>--diffoscope</code> was provided, diffoscope is run on the diverging files.
If <code>--keep-build-dir</code> was provided as well, <code>diff -ur</code> can be used to compare the temporary build and sources, including build logs.</p> +<p>The builds are run in parallel, as opam does; this parallelism did not lead to different binaries in my experiments.</p> +<h2>Results and discussion</h2> +<p><strong>All MirageOS unikernels I have deployed are reproducible \o/</strong>. Also, several binaries such as <code>orb</code> itself, <code>opam</code>, <code>solo5-hvt</code>, and all <code>albatross</code> utilities are reproducible.</p> +<p>The unikernels range from hello world and web servers (e.g. this blog, getting its data on startup via a git clone to memory) to authoritative DNS servers and a CalDAV server. They vary in size between 79 and 200 opam packages, resulting in ELF binaries of 2MB - 16MB (including debug symbols). The <a href="https://github.com/roburio/reproducible-unikernel-repo">unikernel opam repository</a> contains some reproducible unikernels used for testing. Some work-in-progress enhancements are needed to achieve this:</p> +<p>At the moment, the opam package of a MirageOS unikernel is automatically generated by <code>mirage configure</code>, but only used for tracking opam dependencies. I worked on <a href="https://github.com/mirage/mirage/pull/1022">mirage PR #1022</a> to extend the generated opam package with build and install instructions.</p> +<p>As mentioned above, if locale is set, ocamlgraph needs to be patched to emit a (locale-dependent) timestamp.</p> +<p>The OCaml program <a href="https://github.com/mirage/ocaml-crunch"><code>crunch</code></a> embeds a subdirectory as OCaml code into a binary, which we use in MirageOS quite regularly for static assets, etc.
This plays into reproducibility in several ways: on the one hand, it needs a timestamp for its <code>last_modified</code> functionality (and has adhered since <a href="https://github.com/mirage/ocaml-crunch/pull/45">June 2018</a> to the <code>SOURCE_DATE_EPOCH</code> spec, thanks to Xavier Clerc). On the other hand, before version 3.2.0 (released Dec 14th) it used hashtables for storing the file contents, where iteration is not deterministic (the insertion is not sorted), <a href="https://github.com/mirage/ocaml-crunch/pull/51">fixed in PR #51</a> by using a Map instead.</p> +<p>Functoria, a tool used to configure MirageOS devices and their dependencies, can emit a list of opam packages which were required to build the unikernel. This uses <code>opam list --required-by --installed --rec &lt;pkgs&gt;</code>, which uses the cudf graph (<a href="https://github.com/mirage/functoria/pull/189#issuecomment-566696426">thanks to Raja for explanation</a>) and drops some packages during the rebuild. <a href="https://github.com/mirage/functoria/pull/189">PR #189</a> avoids this by not using the <code>--rec</code> argument and instead manually computing the fixpoint.</p> +<p>Certainly, the choice of environment variables, and whether to vary them (as <a href="https://tests.reproducible-builds.org/debian/index_variations.html">debian does</a>) or to not define them (or normalise) while building, is arguable. Since MirageOS supports neither time zones nor internationalisation, there is no need to solve this issue prematurely.
On a related note, even with different locale settings, MirageOS unikernels are reproducible apart from an <a href="https://github.com/backtracking/ocamlgraph/pull/90">issue in ocamlgraph #90</a> embedding the output of <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/date.html"><code>date</code></a>, which is different depending on <code>LANG</code> and locale (<code>LC_*</code>) settings.</p> +<p>Prior art in reproducible MirageOS unikernels is the <a href="https://github.com/mirage/qubes-mirage-firewall/">mirage-qubes-firewall</a>. It has been reproducible since <a href="https://github.com/mirage/qubes-mirage-firewall/commit/07ff3d61477383860216c69869a1ffee59145e45">early 2017</a>. Their approach differs: they build in a docker container with the opam repository pinned to an exact git commit.</p> +<h2>Further work</h2> +<p>I only tested a certain subset of opam packages and MirageOS unikernels, mainly on a single machine (my laptop) running FreeBSD, and would be happy if others tested the reproducibility of their OCaml programs with the tools provided. There could as well be CI machines rebuilding opam packages and reporting results to a central repository. I'm pretty sure there are more reproducibility issues in the opam ecosystem. I developed a <a href="https://github.com/roburio/reproducible-testing-repo">reproducible testing opam repository</a> with opam packages that do not depend on OCaml, mainly for further tooling development. Some tests were also conducted on a Debian system with the same result. The variations, apart from build time, were using a different user, and different locale settings.</p> +<p>As mentioned above, more of the environment, such as the CPU features and external system packages, should be captured in the build environment.</p> +<p>When comparing OCaml libraries, some output files (cmt / cmti / cma / cmxa) are not deterministic, but contain minimal divergences whose root cause I was not able to spot.
It would be great to fix this, likely in the OCaml compiler distribution. Since the final result, the binary I'm interested in, is not affected by non-identical intermediate build products, I hope someone (you?) is interested in improving on this side. OCaml bytecode output also seems to be non-deterministic. There is <a href="https://github.com/coq/coq/issues/11229">a discussion on the coq issue tracker</a> which may be related.</p> +<p>In contrast to initial plans, I did not use the <a href="https://reproducible-builds.org/specs/build-path-prefix-map/"><code>BUILD_PATH_PREFIX_MAP</code></a> environment variable, which is implemented in OCaml by <a href="https://github.com/ocaml/ocaml/pull/1515">PR #1515</a> (and followups). The main reasons are that something in the OCaml toolchain (I suspect the bytecode interpreter) needed absolute paths to find libraries, thus I'd need a symlink from the left-hand side to the current build directory, which was tedious. Also, my installed assembler does not respect the build path prefix map, and BUILD_PATH_PREFIX_MAP is not widely supported. See e.g. the Debian <a href="https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/ocaml-zarith.html">zarith</a> package with different build paths and its effects on the binary.</p> +<p>I'm fine with recording the build path (switch location) in the build environment for now - it turns out to end up only once in MirageOS unikernels, likely by the last linking step, which will <a href="http://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html">hopefully soon be solved by llvm 9.0</a>.</p> +<p>What was fun was to compare the unikernel when built on Linux with gcc against a build on FreeBSD with clang and lld - spoiler: they emit debug sections with different dwarf versions, so the diff is pretty big.
Other fun differences were between OCaml compiler versions: the difference between minor versions (4.08.0 vs 4.08.1) is pretty small (~100kB as human-readable output), while the difference between major versions (4.08.1 vs 4.09.0) is rather big (~900kB as human-readable diff).</p> +<p>An item on my list for the future is to distribute the opam export, build hashes and build environment artifacts in an authenticated way. I want to integrate this in <a href="https://in-toto.io/">in-toto</a> style into <a href="https://github.com/hannesm/conex">conex</a>, my not-yet-deployed implementation of <a href="https://theupdateframework.github.io/">tuf</a> for opam that needs further development and a test installation, hopefully in 2020.</p> +<p>If you want to support our work on MirageOS unikernels, please <a href="https://robur.coop/Donate">donate to robur</a>. I'm interested in feedback, either via <a href="https://twitter.com/h4nnes">twitter</a>, <a href="https://mastodon.social/@hannesm">hannesm@mastodon.social</a> or via eMail.</p> +https://hannes.robur.coop/Posts/ReproducibleOPAMReproducible MirageOS unikernel builds2019-12-16T18:29:30-00:00hanneshttps://hannes.robur.coop/atomhannes<h2>Cryptographic material</h2> +<p>Once a private and public key pair is generated (it doesn't matter whether it is plain RSA, DSA, ECC on any curve), this is fine from a scientific point of view, and can already be used for authenticating and encrypting. From a practical point of view, the public parts need to be exchanged and verified (usually a fingerprint or hash thereof). This leads to the struggle of how to encode this cryptographic material, and how to embed an identity (or multiple), capabilities, and other information into it. <a href="https://en.wikipedia.org/wiki/X.509">X.509</a> is a standard to solve this encoding and embedding, and provides more functionality, such as establishing chains of trust and revocation of invalidated or compromised material.
X.509 uses certificates, which contain the public key, and additional information (in an extensible key-value store), and are signed by an issuer, either by the private key corresponding to the public key - a so-called self-signed certificate - or by a different private key, an authority one step up the chain. A rather long, but very good introduction to certificates by Mike Malone is <a href="https://smallstep.com/blog/everything-pki.html">available here</a>.</p> +<h2>OCaml ecosystem evolving</h2> +<p>More than 5 years ago David Kaloper and I <a href="https://mirage.io/blog/introducing-x509">released the initial ocaml-x509</a> package as part of our <a href="https://nqsb.io">TLS stack</a>, which contained code for decoding and encoding certificates, and path validation of a certificate chain (as described in <a href="https://tools.ietf.org/html/rfc6125">RFC 5280</a>). The validation logic and the decoder/encoder (based on the ASN.1 grammar specified in the RFC and implemented using David's <a href="https://github.com/mirleft/ocaml-asn1-combinators">asn1-combinators</a> library) changed much over time.</p> +<p>The OCaml ecosystem evolved over the years, which led to some changes:</p> +<ul> +<li>Camlp4 deprecation - we used camlp4 for stream parsers of PEM-encoded certificates, and sexplib.syntax to derive s-expression decoders and encoders; +</li> +<li>Avoiding brittle ppx converters - which we used for s-expression decoders and encoders of certificates after camlp4 was deprecated; +</li> +<li>Build and release system iterations - initially oasis and a packed library, then topkg and ocamlbuild, now dune; +</li> +<li>Introduction of the <code>result</code> type in the standard library - we used to use <code>[ `Ok of certificate option | `Fail of failure ]</code>; +</li> +<li>No more leaking exceptions in the public API; +</li> +<li>Usage of pretty-printers, esp with the <a href="https://erratique.ch/software/fmt">fmt</a> library <code>val pp : Format.formatter -&gt; 'a
-&gt; unit</code>, instead of <code>val to_string : t -&gt; string</code> functions; +</li> +<li>Release of <a href="https://erratique.ch/software/ptime">ptime</a>, a library providing platform-independent POSIX time support; +</li> +<li>Release of <a href="https://erratique.ch/software/rresult">rresult</a>, which includes combinators for computation <code>result</code>s; +</li> +<li>Release of <a href="https://github.com/hannesm/gmap">gmap</a>, a <code>Map</code> whose value types depend on the key, used for X.509 extensions, GeneralName, DistinguishedName, etc.; +</li> +<li>Release of <a href="https://github.com/hannesm/domain-name">domain-name</a>, a library for domain name operations (as specified in <a href="https://tools.ietf.org/html/rfc1035">RFC 1035</a>) - used for name validation; +</li> +<li>Usage of the <a href="https://github.com/mirage/alcotest">alcotest</a> unit testing framework (instead of oUnit). +</li> +</ul> +<h2>More use cases for X.509</h2> +<p>Initially, we designed and used ocaml-x509 for providing TLS server endpoints and validation in TLS clients - mostly on the public web, where each operating system ships a set of ~100 trust anchors to validate any web server certificate against. But once you have an X.509 implementation, every authentication problem can be solved by applying it.</p> +<h3>Authentication with path building</h3> +<p>It turns out that the trust anchor sets are not equal across operating systems and versions, thus some web servers serve sets, instead of chains, of certificates - as described in <a href="https://tools.ietf.org/html/rfc4158">RFC 4158</a>, where the client implementation needs to build valid paths and accept a connection if any path can be validated.
The path building was initially slightly wrong in 0.5.2, but was fixed quickly in <a href="https://github.com/mirleft/ocaml-x509/commit/1a1476308d24bdcc49d45c4cd9ef539ca57461d2">0.5.3</a>.</p> +<h3>Fingerprint authentication</h3> +<p>The chain of trust validation is useful for the open web, where you as a software developer don't know which remote endpoint your software will ever connect to - as long as the remote has a certificate signed (via intermediates) by any of the trust anchors. In the early days, before <a href="https://letsencrypt.org/">let's encrypt</a> was launched and embedded as trust anchors (or cross-signed by already deployed trust anchors), operators needed to pay for a certificate - a business model where some CAs did not bother to check the authenticity of a certificate signing request, and thus random people ended up owning valid certificates for microsoft.com or google.com.</p> +<p>Instead of using the set of trust anchors, the fingerprint of the server certificate, or preferably the fingerprint of the public key of the certificate, can be used for authentication, as has optionally been done for some years in <a href="https://github.com/hannesm/jackline/commit/a1e6f3159be1e45e6b690845e1b29366c41239a2">jackline</a>, an XMPP client. Support for this certificate / public key pinning was added in x509 0.2.1 / 0.5.0.</p> +<h3>Certificate signing requests</h3> +<p>Until x509 0.4.0 there was no support for generating certificate signing requests (CSR), as defined in PKCS 10, which are self-signed blobs containing a public key, an identity, and possibly extensions. Such a CSR is sent to the certificate authority, and after validation of ownership of the identity and paying a fee, the certificate is issued.
Let's encrypt specified the ACME protocol which automates the proof of ownership: they provide an HTTP API for requesting a challenge, providing the response (the proof of ownership) via HTTP or DNS, and then allow the submission of a CSR and downloading the signed certificate. The ocaml-x509 library provides operations for creating such a CSR, and also for signing a CSR to generate a certificate.</p> +<p>Mindy developed the command-line utility <a href="https://github.com/yomimono/ocaml-certify/">certify</a> which uses these operations from the ocaml-x509 library and acts as a swiss-army knife purely in OCaml for these required operations.</p> +<p>Maker developed a <a href="https://github.com/mmaker/ocaml-letsencrypt">let's encrypt library</a> which implements the above mentioned ACME protocol for provisioning CSR to certificates, also using our ocaml-x509 library.</p> +<p>To complete the required certificate authority functionality, support for certificate revocation lists, both validation and signing, was implemented in x509 0.6.0.</p> +<h3>Deploying unikernels</h3> +<p>As <a href="https://hannes.robur.coop/Posts/VMM">described in another post</a>, I developed <a href="https://github.com/hannesm/albatross">albatross</a>, an orchestration system for MirageOS unikernels. This uses ASN.1 for internal socket communication and allows remote management via a TLS connection which is mutually authenticated with an X.509 client certificate. To encrypt the X.509 client certificate, first a TLS handshake where the server authenticates itself to the client is established, and over that connection another TLS handshake is established where the client certificate is requested.
Note that this mechanism can be dropped with TLS 1.3, since there the certificates are transmitted over an already encrypted channel.</p> +<p>The client certificate already contains the command to execute remotely - as a custom extension, be it &quot;show me the console output&quot;, or &quot;destroy the unikernel with name = YYY&quot;, or &quot;deploy the included unikernel image&quot;. The advantage is that the commands are already authenticated, and there is no need for developing an ad-hoc protocol on top of the TLS session. The resource limits, assigned by the authority, are also part of the certificate chain - i.e. the number of unikernels, access to network bridges, available accumulated memory, accumulated size for block devices, are constrained by the certificate chain presented to the server, and currently running unikernels. The names of the chain are used for access control - if Alice and Bob have intermediate certificates from the same CA, neither may Alice manage Bob's unikernels, nor may Bob manage Alice's unikernels. I have been using albatross for 2.5 years in production on two physical machines with ~20 unikernels total (multiple users, multiple administrative domains), and it is stable and much nicer to deal with than <code>scp</code> and custom hacked shell scripts.</p> +<h2>Why 0.7?</h2> +<p>There are still some missing pieces in our ocaml-x509 implementation, namely modern ECC certificates (depending on elliptic curve primitives not yet available in OCaml), RSA-PSS signing (should be straightforward), PKCS 12 (there is a <a href="https://github.com/mirleft/ocaml-x509/pull/114">pull request</a>, but this should wait until asn1-combinators supports the <code>ANY defined BY</code> construct to clean up the code), ... 
+Once these features are supported, the library should likely be named PKCS since it supports more than X.509, and released as 1.0.</p> +<p>The 0.7 release series moved a lot of modules and function names around, thus it is a major breaking release. By using a map instead of lists for extensions, GeneralName, ..., the API was further revised - invariants that each extension key (an ASN.1 object identifier) may occur at most once are now enforced. By not leaking exceptions through the public interface, the API is easier to use safely - see <a href="https://github.com/mmaker/ocaml-letsencrypt/commit/dc53518f46310f384c9526b1d96a8e8f815a09c7">let's encrypt</a>, <a href="https://git.robur.io/?p=openvpn.git%3Ba=commitdiff%3Bh=929c53116c1438ba1214f53df7506d32da566ccc">openvpn</a>, <a href="https://github.com/yomimono/ocaml-certify/pull/17">certify</a>, <a href="https://github.com/mirleft/ocaml-tls/pull/394">tls</a>, <a href="https://github.com/mirage/capnp-rpc/pull/158">capnp</a>, <a href="https://github.com/hannesm/albatross/commit/50ed6a8d1ead169b3e322aaccb469e870ad72acc">albatross</a>.</p> +<p>I intended in 0.7.0 to have much more precise types, esp. for the SubjectAlternativeName (SAN) extension that uses a GeneralName, but it turns out the GeneralName is also used for NameConstraints (NC) in a different way -- IP in SAN is an IPv4 or IPv6 address, in NC it is the IP/netmask; DNS is a domain name in SAN, in NC it is a name starting with a leading dot (i.e. &quot;.example.com&quot;), which is not a valid domain name. In 0.7.1, based on a bug report, I had to revert these variants and use less precise types.</p> +<h2>Conclusion</h2> +<p>The work on X.509 was sponsored by <a href="http://ocamllabs.io/">OCaml Labs</a>. You can support our work at robur by a <a href="https://robur.io/Donate">donation</a>, which we will use to work on our OCaml and MirageOS projects.
You can also reach out to us to realize commercial products.</p> +<p>I'm interested in feedback, either via <strike><a href="https://twitter.com/h4nnes">twitter</a></strike> <a href="https://mastodon.social/@hannesm">hannesm@mastodon.social</a> or via eMail.</p> +https://hannes.robur.coop/Posts/X50907X509 0.72019-08-15T11:21:30-00:00hannes \ No newline at end of file diff --git a/data/planet/janestreet.xml b/data/planet/janestreet.xml new file mode 100644 index 0000000000..da8ad1b8ca --- /dev/null +++ b/data/planet/janestreet.xml @@ -0,0 +1,587 @@ + +https://blog.janestreet.com/feed.xmljanestreet2023-05-02T14:42:52-00:00https://blog.janestreet.com/feed.xmljanestreet<p>Our traders and researchers love Python for its agility and for its huge +open-source ecosystem, especially when it comes to machine learning. But +the heavy use of notebooks can make it difficult to support. Notebooks +have a very different lifecycle than regular code, and aren&rsquo;t always +rigorously version controlled. And while most of our code (much of it +written in OCaml) lives in a monorepo, putting all notebooks there is +difficult; many notebooks end up being stored all over the place.</p> + +https://blog.janestreet.com/building-reproducible-python-environments-with-xars/Building reproducible Python environments with XARs2023-04-14T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>At Jane Street we use a pattern/library called &ldquo;expect tests&rdquo; that +makes test-writing feel like a REPL session, or like exploratory +programming in a Jupyter notebook&mdash;with feedback cycles so fast and +joyful that it feels almost tactile. 
Having used them for some time now +this is the only way I&rsquo;d ever want to write tests.</p> + +https://blog.janestreet.com/the-joy-of-expect-tests/What if writing tests was a joyful experience?2023-01-09T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>In 2022 a consortium of companies ran an international competition, +called the <a href="https://www.zprize.io/">ZPrize</a>, to advance the state of +the art in &ldquo;zero-knowledge&rdquo; cryptography. We decided to have a go in +our free time at submitting solutions to both the Multi-Scalar +Multiplication (MSM) and Number Theoretic Transform (NTT) tracks, +using the same open source <a href="https://hardcaml.com/">Hardcaml</a> libraries +that Jane Street uses for our own FPGA development. We believe by +using Hardcaml we were able to more efficiently and robustly come up +with designs in the short competition period. These designs also +interact with the standard vendor RTL flow and so we hope they will be +useful to others.</p> + +https://blog.janestreet.com/zero-knowledge-fpgas-hardcaml/Accelerating zk-SNARKs - MSM and NTT algorithms on FPGAs with Hardcaml2022-12-07T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>The Dojima rice market, established around 1716, is widely considered to +be the world&rsquo;s first organized futures exchange. Instead of directly +exchanging money for rice on the spot, merchants would agree on a price +and future date at which rice and money would be exchanged. This allowed +farmers and consumers to hedge their risk. 
As a result, information +about the abundance or lack of rice would travel across the country as +fast as rice merchants carried it.</p> + +https://blog.janestreet.com/visualizing-information-propagation-in-markets-index/Visualizing information propagation in markets2022-11-23T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>One of the problems we wrestle with at Jane Street is how to +understand and manage the costs associated with the positions we hold: +things like margin, financing costs, market risk, regulatory capital +requirements, and so on. To that end, we&rsquo;ve built systems that +estimate these costs and propose ways to reduce them. Essentially, +this is a numerical optimization problem.</p> + +https://blog.janestreet.com/computations-that-differentiate-debug-and-document-themselves/Computations that differentiate, debug, and document themselves2022-11-17T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>We are excited to announce the launch of the Jane Street Graduate Research Fellowship!</p> + +https://blog.janestreet.com/graduate-research-fellowship/Introducing the Jane Street Graduate Research Fellowship2022-08-30T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>We&rsquo;re once again at the end of our internship season, and it&rsquo;s my task +to provide a few highlights of what the interns accomplished while +they were here.</p> + +https://blog.janestreet.com/what-the-interns-have-wrought-2022/What the interns have wrought, 2022 edition2022-08-25T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>We are excited to announce research internships in our Tools and +Compilers group.</p> + +https://blog.janestreet.com/research-internships-tnc/Research internships in our Tools and Compilers group2022-03-04T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Software engineering intern candidates often ask how team placement +works and how 
much input incoming interns have over their teams and +projects. We know team placement is an important factor for many +students when deciding which internship to accept. We&rsquo;ve spent +considerable time and thought on this process in recent years and hope +to demystify the experience with this post. <sup><a href="https://blog.janestreet.com/feed.xml#fn:1" class="footnote">1</a></sup></p> + +<div class="footnotes"> + <ol> + <li> + <p>This process is used in New York and London. Due to their smaller size Hong Kong&rsquo;s process is slightly different&nbsp;<a href="https://blog.janestreet.com/feed.xml#fnref:1" class="reversefootnote">&#8617;</a></p> + </li> + </ol> +</div> +https://blog.janestreet.com/project-pairing/How Jane Street Pairs Interns to Projects and Teams During the Software Engineering Internship2022-01-14T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Intel Processor Trace is a hardware technology that can record all +program execution flow along with timing information accurate to +around 30ns. As far as I can tell <a href="https://engineering.fb.com/2021/04/27/developer-tools/reverse-debugging/">a</a><a href="https://easyperf.net/blog/2019/08/23/Intel-Processor-Trace">l</a><a href="https://github.com/nyx-fuzz/libxdc">m</a><a href="https://blog.trailofbits.com/2021/03/19/un-bee-lievable-performance-fast-coverage-guided-fuzzing-with-honeybee-and-intel-processor-trace/">o</a><a href="http://halobates.de/blog/p/410">s</a><a href="https://dl.acm.org/doi/10.1145/3029806.3029830">t</a> +nobody uses it, seemingly because capturing the data is tricky and, +without any visualization tools, you&rsquo;re forced to read enormous text +dumps.</p> + +https://blog.janestreet.com/magic-trace/Magic-trace: Diagnosing tricky performance issues easily with Intel Processor Trace2022-01-11T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>We spend a lot of time on education at Jane Street. 
Like, really a +lot.</p> + +https://blog.janestreet.com/hiring-a-developer-educator/Hiring a Developer Educator2021-10-21T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>We recently restructured our standard libraries at Jane Street in a +way that eliminates the difference between <code class="highlighter-rouge">Core_kernel</code> and <code class="highlighter-rouge">Core</code> +and we&rsquo;re happy with the result. The new layout should reach the open +source world before the end of the year.</p> + +https://blog.janestreet.com/goodbye-Core_kernel/Goodbye Core_kernel2021-08-26T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>It&rsquo;s the end of another dev internship season, and this one marked +something of a transition, since halfway through the season, NY-based +interns were invited back to the recently reinvigorated office. Which +means that many more of us got the chance to meet and hang out with +the interns in person than we did last year. And hopefully the +interns were able to get a better sense of Jane Street and how it +operates.</p> + +https://blog.janestreet.com/what-the-interns-have-wrought-2021/What the interns have wrought, 2021 edition2021-08-09T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p><em><strong>This role has been filled</strong></em></p> + +https://blog.janestreet.com/looking-for-a-developer-experience-engineer-index/Looking for a developer experience engineer2021-06-15T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>I am pleased to announce that we have recently released a slew of new +Hardcaml libraries!</p> + +https://blog.janestreet.com/growing-the-hardcaml-toolset-index/Growing the Hardcaml toolset2020-12-01T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Jane Street is running a Kaggle contest based on a real problem with +real financial data. 
If you like ML projects, or think you might, +<a href="https://www.kaggle.com/c/jane-street-market-prediction" target="_blank">head over and check it +out</a>. +We think it&rsquo;s a pretty fun one. The prizes are pretty good too, with a +total $100K being paid out.</p> + +https://blog.janestreet.com/announcing-our-market-prediction-kaggle-competition-index/Announcing Our Market Prediction Kaggle Competition2020-11-24T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Memory issues can be hard to track down. A function that only +allocates a few small objects can cause a space leak if it&rsquo;s called +often enough and those objects are never collected. Even then, many +objects are <em>supposed</em> to be long-lived. How can a tool, armed with data +on allocations and their lifetimes, +help sort out the expected from the suspicious?</p> + +https://blog.janestreet.com/finding-memory-leaks-with-memtrace/Finding memory leaks with Memtrace2020-10-06T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreetSince version 4.10, OCaml offers a new best-fit memory allocator +alongside its existing default, the next-fit allocator. At Jane +Street, we've seen a big improvement after switching over to the new +allocator. + +This post isn't about how the new allocator works. For that, the best +source is these notes from a talk by its +author. Instead, this post is about just how tricky it is to compare two +allocators in a reasonable way, especially for a garbage-collected +system. 
+ +https://blog.janestreet.com/memory-allocator-showdown/Memory allocator showdown2020-09-15T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>I&rsquo;m excited (and slightly terrified) to announce that Jane Street is +releasing a new podcast, called <a href="https://signalsandthreads.com/">Signals and +Threads</a>, and I&rsquo;m going to be the +host.</p> + +https://blog.janestreet.com/announcing-signals-and-threads-index/Announcing Signals and Threads, a new podcast from Jane Street2020-08-31T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>It&rsquo;s been an unusual internship season.</p> + +https://blog.janestreet.com/what-the-interns-have-wrought-2020/What the interns have wrought, 2020 edition2020-08-17T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>We&rsquo;re busy preparing for our software engineering <a href="https://blog.janestreet.com/unraveling/">fall hiring +season</a>. Over the years we&rsquo;ve +done our best to make our interview process more transparent to +candidates. While many candidates show up knowing something about what +our interviews look like, much of the information floating around on +the internet is outdated or wrong. These past few months have also +changed a lot about the process as we&rsquo;ve adapted to working from home +and other effects of COVID-19.</p> + +https://blog.janestreet.com/jane-street-interview-process-2020/The Jane Street Interview Process &mdash; 2020 Edition2020-07-24T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>At Jane Street, we have some experience using FPGAs for low-latency +systems&ndash;FPGAs are programmable hardware where you get the speed of an +application-specific integrated circuit (ASIC) but without being +committed to a design that&rsquo;s burned into the chip. 
It wasn&rsquo;t so long +ago that FPGAs were expensive and rare, but these days, you can rent a +$5,000 card on the Amazon AWS cloud for less than $3 an hour.</p> + +https://blog.janestreet.com/really-low-latency-multipliers-and-cryptographic-puzzles/Really low latency multipliers and cryptographic puzzles2020-06-22T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>At Jane Street, an <a href="https://blog.janestreet.com/testing-with-expectations">&ldquo;expect +test&rdquo;</a> is a +test where you don&rsquo;t manually write the output you&rsquo;d like to check +your code against &ndash; instead, this output is captured automatically +and inserted by a tool into the testing code itself. If further runs +produce different output, the test fails, and you&rsquo;re presented with +the diff.</p> + +https://blog.janestreet.com/using-ascii-waveforms-to-test-hardware-designs/Using ASCII waveforms to test hardware designs2020-06-01T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Web browsers have supported custom +<a href="https://en.wikipedia.org/wiki/NPAPI">plug-ins</a> and +<a href="https://en.wikipedia.org/wiki/Browser_extension">extensions</a> since +the 1990s, giving users the ability to add their own features and +tools for improving workflow or building closer integration with +applications or databases running on back-end servers.</p> + +https://blog.janestreet.com/chrome-extensions-finding-the-missing-proof/Chrome extensions: Finding the missing proof2020-04-17T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Jane Street has been posting tech talks from internal speakers and +invited guests for years&mdash;and they&rsquo;re all available on our YouTube +channel:</p> + +https://blog.janestreet.com/watch-all-of-jane-streets-tech-talks/Watch all of Jane Street's tech talks2020-02-20T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>When we set up a schedule on a computer, 
such as a list of commands to +run every day at particular times via Linux <a href="https://www.ostechnix.com/a-beginners-guide-to-cron-jobs">cron +jobs</a>, we +expect that schedule to execute reliably. Of course we&rsquo;ll check the +logs to see whether the job has failed, but we never question whether +the cron daemon itself will function. We always assume that it will, +as it always has done; we are not expecting mutiny in the ranks of the +operating system.</p> + +https://blog.janestreet.com/troubleshooting-systemd-with-systemtap/Troubleshooting systemd with SystemTap2020-02-03T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<div style="width: 75%; margin: auto; text-align: center; font-style: italic; font-size: 75%"> +The cover image is based on <a href="https://commons.wikimedia.org/wiki/File:Jupiter_family.jpg">Jupiter family</a> by NASA/JPL. +</div> + +https://blog.janestreet.com/using-python-and-ocaml-in-the-same-jupyter-notebook/Using Python and OCaml in the same Jupyter notebook2019-12-16T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<h2>Updates and a New Run</h2> + +https://blog.janestreet.com/deep-learning-the-hardest-go-problem-in-the-world/Deep-Learning the Hardest Go Problem in the World2019-12-06T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>My job involves a lot of staring at large numbers, mostly latencies in +nanoseconds, and picking out magnitudes like microseconds. 
I noticed +myself constantly counting digits in my text editor, in my terminal, +and in <a href="https://jupyter.org/">Jupyter</a> notebooks in my browser.</p> + +https://blog.janestreet.com/commas-in-big-numbers-everywhere/Commas in big numbers everywhere: An OpenType adventure2019-10-14T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Jane Street&rsquo;s intern program yet again is coming to an end, which is a +nice opportunity to look back over the summer and see what they&rsquo;ve +accomplished.</p> + +https://blog.janestreet.com/what-the-interns-have-wrought-2019/What the interns have wrought, 2019 edition2019-08-30T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Back when the Raspberry Pi was first released in 2012 Michael Bacarella wrote +a <a href="https://blog.janestreet.com/bootstrapping-ocamlasync-on-the-raspberry-pi/">blog post</a> +on using OCaml and Async on this little device. +Since then installing OCaml via opam has become a pretty smooth experience +and everything works out of the box when using Raspbian &ndash; the default Raspberry Pi +distribution.</p> + +https://blog.janestreet.com/using-ocaml-to-drive-a-raspberry-pi-robot-car/Using OCaml to drive a Raspberry Pi robot car2019-08-19T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>As our Tools &amp; Compilers team has grown, the kinds of projects we work +on have become more ambitious. Here are some of the major things we&rsquo;re +currently working on:</p> + +https://blog.janestreet.com/applied-PL-research/Do applied programming languages research at Jane Street!2019-08-16T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Now that OCaml 4.08 has been released, let&rsquo;s have a look at what was +accomplished, with a particular focus on how <a href="https://blog.janestreet.com/plans-for-ocaml-408/">our plans for +4.08</a> fared.
I&rsquo;ll mostly focus on work that we +in the Jane Street Tools &amp; Compilers team were involved with, but we are +just some of the contributors to the OCaml compiler, and I&rsquo;ll have a +quick look at the end of the post at some of the other work that went +into 4.08.</p> + +https://blog.janestreet.com/a-look-at-ocaml-4.08/A look at OCaml 4.082019-07-12T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Welcome to another post in our series of how to use OCaml for machine learning. +In previous posts we&rsquo;ve discussed <a href="https://blog.janestreet.com/deep-learning-experiments-in-ocaml/">artistic style-transfer</a> and +<a href="https://blog.janestreet.com/playing-atari-games-with-ocaml-and-deep-rl/">reinforcement learning</a>. If you haven&rsquo;t read these feel +free to do so now, we&rsquo;ll wait right here until you&rsquo;re done. Ready? Ok, let&rsquo;s +continue &hellip;</p> + +https://blog.janestreet.com/of-pythons-and-camels/Of Pythons and Camels2019-07-09T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>At Jane Street, for the last several years, we have been increasingly interested +in machine learning and its many use cases. This is why it was exciting when +earlier this year myself and a few of my colleagues had the opportunity to +attend the AAAI 2019 conference. 
We&rsquo;d like to take this space to share with you +some of the interesting projects and themes we saw at the conference.</p> + +https://blog.janestreet.com/thoughts-from-aaai-19/Thoughts from AAAI 20192019-05-13T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>If you haven&rsquo;t heard of it, <a href="https://www.depthfirstlearning.com/2018/DFL-Fellowship">Depth First +Learning</a> is a +wonderful resource for learning about machine learning.</p> + +https://blog.janestreet.com/learning-ml-depth-first/Learning ML Depth-First2019-04-17T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Jane Street is sponsoring this year&rsquo;s <a href="https://makemit.org">MakeMIT +hackathon</a>, and we wanted to create a prize for +the winners that would do justice to the maker spirit of the +competition. As makers ourselves &ndash; it&rsquo;s not unusual to find a +&ldquo;software&rdquo; engineer here who hacks on FPGAs or who has a CNC machine +at home &ndash; it felt natural to get our hands dirty.</p> + +https://blog.janestreet.com/hackathon-keyboards/Machining the ultimate hackathon prize2019-02-28T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>At Jane Street, over the last few years, we&rsquo;ve been increasingly exploring machine learning to improve our models. 
Many of us are fascinated by the rapid improvement we see in a wide variety of applications due to developments in deep learning and reinforcement learning, both for its exciting potential for our own problems, and also on a personal level of pure interest and curiosity outside of work.</p> + +https://blog.janestreet.com/accelerating-self-play-learning-in-go/Accelerating Self-Play Learning in Go2019-02-28T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>In a <a href="https://blog.janestreet.com/deep-learning-experiments-in-ocaml/">previous blog post</a> +we detailed how we used OCaml to reproduce some classical deep-learning results +that would usually be implemented in Python. Here we will do the same with +some Reinforcement Learning (RL) experiments.</p> + +https://blog.janestreet.com/playing-atari-games-with-ocaml-and-deep-rl/Playing Atari Games with OCaml and Deep Reinforcement Learning2019-02-02T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>This blog post is about an interesting detail about machine learning +that I came across as a researcher at Jane Street - that of the +interaction between L2 regularization, also known as +weight decay, and batch normalization.</p> + +https://blog.janestreet.com/l2-regularization-and-batch-norm/L2 Regularization and Batch Norm2019-01-29T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>At Jane Street, our web UIs are built on top of an in-house framework +called <a href="https://github.com/janestreet/incr_dom">Incr_dom</a>, modeled in +part on <a href="https://reactjs.org/docs/faq-internals.html">React&rsquo;s virtual +DOM</a>. 
Rendering different +views efficiently in response to changes made to a shared model is a +quintessentially incremental computation&mdash;so it should be no surprise +that Incr_dom is built on top of +<a href="https://blog.janestreet.com/introducing-incremental/">Incremental</a>.</p> + +https://blog.janestreet.com/a-tutorial-for-building-web-applications-with-incrdom/A tutorial for building web applications with Incr_dom2019-01-15T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>At Jane Street, we often work with data that has a very low +signal-to-noise ratio, but fortunately we also have a <em>lot</em> of data. +Where practitioners in many fields might be accustomed to +having tens or hundreds of thousands of correctly labeled +examples, some of our problems are more like having a billion training +examples whose labels have only a slight tendency to be correct. +These large datasets present a number of interesting engineering +challenges. The one we address here: <em>How do you shuffle a really +large dataset?</em> (If you&rsquo;re not familiar with why one might need this, +jump to the section <a href="https://blog.janestreet.com/feed.xml#whyshuffle">Why shuffle</a> below.)</p> + +https://blog.janestreet.com/how-to-shuffle-a-big-dataset/How to shuffle a big dataset2018-09-26T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Last year we held a machine learning seminar in our London office, +which was an opportunity to reproduce some classical deep learning +results with a nice twist: we used OCaml as a programming language +rather than Python. 
This allowed us to train models defined in a +functional way in OCaml on a GPU using TensorFlow.</p> + +https://blog.janestreet.com/deep-learning-experiments-in-ocaml/Deep learning experiments in OCaml2018-09-20T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Yet again, intern season is coming to a close, and so it&rsquo;s time to +look back at what the interns have achieved in their short time with +us. I&rsquo;m always impressed by what our interns manage to squeeze into +the summer, and this year is no different.</p> + +https://blog.janestreet.com/what-the-interns-have-wrought-2018/What the interns have wrought, 2018 edition2018-08-06T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>With the external release of OCaml 4.07.0 imminent, we in Jane Street&rsquo;s +Tools &amp; Compilers group have been planning what we want to work on for +inclusion in OCaml 4.08. These days OCaml uses (or at least attempts) a +time-based release process with releases scheduled every 6 months. We&rsquo;re +trying to avoid rushing in changes at the last minute &ndash; as we&rsquo;ve been +prone to do in the past &ndash; so this list is restricted to things we could +conceivably finish in the next 4-5 months.</p> + +https://blog.janestreet.com/plans-for-ocaml-408/Plans for OCaml 4.082018-06-29T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Expect tests are a technique I&rsquo;ve written about +<a href="https://blog.janestreet.com/testing-with-expectations">before</a>, but until recently, it&rsquo;s been a +little on the theoretical side. 
That&rsquo;s because it&rsquo;s been hard to take +these ideas out for a spin due to lack of tooling outside of Jane +Street&rsquo;s walls.</p> + +https://blog.janestreet.com/repeatable-exploratory-programming/Repeatable exploratory programming2018-04-22T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>One of the joys of working at Jane Street for the last 15 or so years +has been seeing how our software stack has grown in scope. When I +started, I was building pretty narrowly focused systems for doing +statistical research on trading strategies, and then building systems +for executing those same strategies.</p> + +https://blog.janestreet.com/ocaml-all-the-way-down/OCaml all the way down2018-04-04T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Imagine a system for editing and reviewing code where:</p> + +https://blog.janestreet.com/putting-the-i-back-in-ide-towards-a-github-explorer/Putting the I back in IDE: Towards a Github Explorer2018-03-27T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Interested in learning OCaml? In the NYC area? Then this might +be for you!</p> + +https://blog.janestreet.com/learn-ocaml-nyc/Learn OCaml in NYC2018-02-16T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>People often think of formal methods and theorem provers as forbidding +tools, cool in theory but with a steep learning curve that makes them +hard to use in real life. In this post, we&rsquo;re going to describe a case +we ran into recently where we were able to leverage theorem proving +technology, Z3 in particular, to validate some real world engineering +we were doing on the OCaml compiler. 
This post is aimed at readers +interested in compilers, but assumes no familiarity with actual +compiler development.</p> + +https://blog.janestreet.com/proofs-and-refutations-using-z3/Proofs (and Refutations) using Z32018-02-15T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>As Jane Street grows, the quality of the development tools we use +matters more and more. We increasingly work on the OCaml compiler +itself: adding useful language features, fine-tuning the type system +and improving the performance of the generated code. Alongside this, +we also work on the surrounding toolchain, developing new tools for +profiling, debugging, documentation and build automation.</p> + +https://blog.janestreet.com/work-on-the-ocaml-compiler-at-jane-street/Work on the OCaml compiler at Jane Street!2017-12-20T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p><i>This post is aimed at readers who are already familiar with +<a href="https://en.wikipedia.org/wiki/Stochastic_gradient_descent">stochastic gradient descent</a> +(SGD) and terms like &ldquo;batch size&rdquo;. For an introduction to these +ideas, I recommend Goodfellow et al.&rsquo;s +<a href="http://www.deeplearningbook.org/">Deep Learning</a>, in particular the +introduction and, for more about SGD, Chapter 8. The relevance of SGD +is that it has made it feasible to work with much more complex models +than was formerly possible.</i></p> + +https://blog.janestreet.com/does-batch-size-matter/Does batch size matter?2017-10-31T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>It&rsquo;s time for our next +<a href="https://www.janestreet.com/tech-talks/">Jane Street Tech Talk</a>. When +we&rsquo;ve solicited suggestions for topics, one common request has been to +talk about our internal development process. Our next talk, +<a href="https://www.janestreet.com/tech-talks/janestreet-code-review/">How Jane Street Does Code Review</a>, +should fit the bill. 
The talk is being given by our own Ian Henry, and +discusses how we approach code review, and in particular how Iron, the +code review system we&rsquo;ve been using and improving for some years now, +fits in to that process.</p> + +https://blog.janestreet.com/jane-street-tech-talk-how-jane-street-does-code-review/How Jane Street Does Code Review (Jane Street Tech Talk)2017-10-29T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>After a summer hiatus, the Jane Street Tech Talks series is back on +for the fall! Last we left it, our very own Dominick LoBraico +presented on the evolution of our internal configuration methodology +and the systems that support it. For anybody that missed it, you can +check out a recording of the talk <a href="https://www.youtube.com/watch?v=0pX7-AG52BU">on YouTube</a>.</p> + +https://blog.janestreet.com/jane-street-tech-talk-verifying-network-data-planes/Jane Street Tech Talk, Verifying Network Data Planes2017-09-26T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Trading is a competitive business. You need great people and great +technology, of course, but also trading strategies that make money. +Where do those strategies come from? 
In this post we&rsquo;ll discuss how +the interplay of data, math and technology informs how we develop and +run strategies.</p> + +https://blog.janestreet.com/real-world-machine-learning-part-1/Real world machine learning (part 1)2017-08-28T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>For those of you interested in +<a href="https://blog.janestreet.com/what-the-interns-have-wrought-rpc_parallel-and-core_profiler">what</a> +<a href="https://blog.janestreet.com/what-the-interns-have-wrought-2016">interns</a> +<a href="https://blog.janestreet.com/what-the-interns-have-wrought-2017">do</a> at Jane Street, here&rsquo;s a +<a href="http://thume.ca/2017/06/17/tree-diffing/">post</a> from former intern +Tristan Hume, on his work developing tree-diffing algorithms last +summer at Jane Street. It&rsquo;s a fun (and very detailed!) read.</p> +https://blog.janestreet.com/how-to-design-a-tree-diffing-algorithm/How to design a tree diffing algorithm2017-08-25T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>People seem to enjoy talking about programming methodologies.
They +give them cute names, like +<a href="http://www.extremeprogramming.org/">eXtreme programming</a>, +<a href="https://www.agilealliance.org/">Agile</a>, and +<a href="https://www.scrum.org/resources/what-is-scrum">Scrum</a>; run +<a href="https://www.scrumalliance.org/sgcal">conferences</a> and build +<a href="https://www.scrumalliance.org/community">communities</a> around them; +write +<a href="https://www.amazon.com/Extreme-Programming-Explained-Embrace-Change/dp/0321278658/ref=sr_1_1?ie=UTF8&amp;qid=1503346126&amp;sr=8-1&amp;keywords=extreme%20programming">books</a> +that describe how to use them in excruciating detail; and +<a href="http://agilemanifesto.org/">manifestos</a> that lay out their +philosophy.</p> + +https://blog.janestreet.com/ironing-out-your-development-style/Ironing out your development style2017-08-24T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Jane Street is looking to hire an engineer with experience in both +software and hardware design to work on FPGA-based applications, and +on tools for creating such applications.</p> + +https://blog.janestreet.com/hiring-an-fpga-engineer/Hiring an FPGA engineer2017-08-16T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Intern season is coming to a close, and it&rsquo;s a nice time to look back +(as I&rsquo;ve done in +<a href="https://blog.janestreet.com/what-the-interns-have-wrought-rpc_parallel-and-core_profiler">previous</a> +<a href="https://blog.janestreet.com/what-the-interns-have-wrought-2016">years</a>) and review some of what +the interns did while they were here. 
The dev intern program has grown +considerably, with almost 40 dev interns between our NY, London, and +Hong Kong offices.</p> + +https://blog.janestreet.com/what-the-interns-have-wrought-2017/What the interns have wrought, 2017 edition2017-08-14T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>There are abundant resources online trying to scare programmers away from using +shell scripts. Most of them, if anything, succeed in convincing the reader to +blindly put something that resembles</p> + +https://blog.janestreet.com/when-bash-scripts-bite/When Bash Scripts Bite2017-05-11T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p><em>Update: I&rsquo;m excited to say that we&rsquo;ve now hired a (great!) technical +writer, so the position is closed.</em></p> + +https://blog.janestreet.com/looking-for-a-technical-writer/Looking for a technical writer2017-05-01T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>We have a new <a href="https://www.janestreet.com/tech-talks/">tech talk</a> coming up on +May 17th, from our very own Dominick LoBraico. This one is about how to +represent configurations with programs. In some sense, this is an obvious idea. +Lots of programmers have experienced the dysphoria that comes from watching your +elegant little configuration format metamorphize into a badly constructed +programming language with miserable tools. This happens because, as you try to +make your configs clearer and more concise, you often end up walking down the +primrose path of making your config format ever more language-like. 
But you +never really have the time to make it into a proper language.</p> + +https://blog.janestreet.com/caveat-configurator-how-to-replace-configs-with-code-and-why-you-might-not-want-to/Caveat Configurator: how to replace configs with code, and why you might not want to2017-04-25T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>It&rsquo;s often surprising just how much software performance depends on how the +software is deployed. All the time and effort you&rsquo;ve invested in optimization +can be erased by a few bad decisions in scheduler policy, affinity, or +background workload on a server.</p> + +https://blog.janestreet.com/this-is-not-the-performance-you-were-looking-for-the-tricks-systems-play-on-us/This is not the performance you were looking for: the tricks systems play on us2017-04-20T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Every now and then, I find myself having to write some mechanical and repetitive +code. The usual solution for this is to write a code generator, for instance in +the form of a ppx rewriter in the case of OCaml code. This, however, comes with a +cost: code generators are harder to review than plain code, and they introduce a new +syntax for other developers to learn. So when the repetitive pattern is local to +a specific library or not widely used, it is often not worth the effort, +especially if the code in question is meant to be reviewed and maintained by +several people.</p> + +https://blog.janestreet.com/trivial-meta-programming-with-cinaps/Trivial meta-programming with cinaps2017-03-20T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>I&rsquo;m happy to announce our next <a href="https://www.janestreet.com/tech-talks/">public tech +talk</a>, called <strong>Seven +Implementations of Incremental</strong>, on Wednesday, April 5th, presented by yours +truly.
You can register +<a href="https://docs.google.com/forms/d/e/1FAIpQLSdtly4y-jYcLUVH8BJS-uKoiaKrQlRXSIWZeczw3tgwTx_6HA/viewform?c=0&amp;w=1">here</a>.</p> + +https://blog.janestreet.com/one-more-talk-two-more-videos/One more talk, two more videos2017-03-15T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Are you thinking about +<a href="https://www.janestreet.com/join-jane-street/apply/">applying</a> to Jane Street +for a software engineering role? Or already have a phone interview scheduled but unsure +what to expect? Read on as we walk through an example phone interview with you.</p> + +https://blog.janestreet.com/what-a-jane-street-dev-interview-is-like/What a Jane Street software engineering interview is like2017-02-28T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Our first <a href="https://blog.janestreet.com/how-to-build-an-exchange/">Jane Street Tech Talk</a> went really well! +Thanks to everyone who came and made it a fun event.</p> + +https://blog.janestreet.com/jane-street-tech-talks-verifying-puppet-configs/Jane Street Tech Talks: Verifying Puppet Configs2017-02-16T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p><strong>UPDATE</strong>: <em>We are full up. Tons of people signed up for the talk, and we&rsquo;re +now at the limit of what we feel like we can support in the space. Thanks for +all the interest, and if you didn&rsquo;t get into this one, don&rsquo;t worry, we have more +talks coming!</em></p> + +https://blog.janestreet.com/how-to-build-an-exchange/How to Build an Exchange2017-01-11T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Spacetime is a new memory profiling facility for OCaml to help find space leaks +and unwanted allocations. Whilst still a little rough around the edges, we&rsquo;ve +found it to be a very useful tool. 
Since there&rsquo;s not much documentation for +using spacetime beyond <a href="https://github.com/lpw25/prof_spacetime/blob/master/Readme.md">this +readme</a>, I&rsquo;ve +written a little intro to give people an idea of how to use it.</p> + +https://blog.janestreet.com/a-brief-trip-through-spacetime/A brief trip through Spacetime2017-01-09T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Ppx is a preprocessing system for OCaml where one maps over the OCaml abstract +syntax tree (AST) to interpret some special syntax fragments to generate code.</p> + +https://blog.janestreet.com/an-solution-to-the-ppx-versioning-problem/A solution to the ppx versioning problem2016-11-08T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>I was recently invited to do the keynote at the <a href="http://cufp.org/2016/">Commercial Users of Functional +Programming</a> workshop, a 15-year-old gathering which is +attached to ICFP, the primary academic functional programming conference.</p> + +https://blog.janestreet.com/observations-of-a-functional-programmer/Observations of a functional programmer2016-10-27T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Now that the interns have mostly gone back to school, it&rsquo;s a good time to look +back at what they did while they were here. 
We had a bumper crop &ndash; more than 30 +dev interns between our London, New York and Hong Kong offices &ndash; and they +worked on just about every corner of our code-base.</p> + +https://blog.janestreet.com/what-the-interns-have-wrought-2016/What the interns have wrought, 20162016-09-13T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Recruiting talented people has always been challenging.</p> + +https://blog.janestreet.com/unraveling/Unraveling of the tech hiring market2016-08-31T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>In the last few years, we&rsquo;ve spent more and more effort working on developer +tools, to the point where we now have a tools-and-compilers group devoted to the +area, for which we&rsquo;re actively hiring.</p> + +https://blog.janestreet.com/do-you-love-dev-tools-come-work-at-jane-street/Do you love dev tools? Come work at Jane Street.2016-08-30T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Earlier this year, we created +<a href="http://github.com/janestreet/ppx_let">ppx_let</a>, a PPX rewriter that +introduces a syntax for working with monadic and applicative libraries like +Command, Async, Result and Incremental. We&rsquo;ve now amassed about six months of +experience with it, and we&rsquo;ve seen enough to recommend it to a wider +audience.</p> + +https://blog.janestreet.com/let-syntax-and-why-you-should-use-it/Let syntax, and why you should use it2016-06-21T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>At Jane Street, we have always been heavy users of pre-processors, first with +camlp4 and now ppx.
Pre-processing makes the infrastructure a bit more complex, +but it saves us a lot of time by taking care of a lot of tedious boilerplate code +and in some cases makes the code a bit prettier.</p> + +https://blog.janestreet.com/ppx_core-context-free-rewriters-for-better-semantic-and-faster-compilation/ppx_core: context-free rewriters for better semantics and faster compilation2016-05-23T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>We finally got a decent recording of one of my favorite talks. This one is about +our <a href="https://github.com/janestreet/incremental">Incremental</a> library (which I +wrote about <a href="https://blog.janestreet.com/introducing-incremental/">here</a>), and in particular about the +story of how we got to the present, quite performant, implementation.</p> + +https://blog.janestreet.com/seven-implementations-of-incremental/Seven Implementations of Incremental2016-03-09T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>In my <a href="https://blog.janestreet.com/flambda">previous post</a> I wrote about Flambda, which is the single +biggest feature coming to OCaml in this release.
In this post, I&rsquo;ll review the +other features of 4.03 that caught my eye.</p> + +https://blog.janestreet.com/ocaml-4-03-everything-else/OCaml 4.03: Everything else2016-03-01T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>OCaml 4.03 is branched and a first release candidate is imminent, so it seems +like a good time to take stock of what&rsquo;s coming.</p> + +https://blog.janestreet.com/flambda/A better inliner for OCaml, and why it matters2016-02-24T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>In my last <a href="https://blog.janestreet.com/self-adjusting-dom/">post</a>, I gave some simple examples showing how +you could use +<a href="http://www.umut-acar.org/self-adjusting-computation">self adjusting computations</a>, +or SAC, as embodied by our <a href="https://blog.janestreet.com/introducing-incremental/">Incremental</a> library, to +incrementalize the computation of virtual dom nodes. In this post, I&rsquo;d like to +discuss how we can extend this approach to more realistic scales, and some of +the extensions to Incremental itself that are required to get there.</p> + +https://blog.janestreet.com/self-adjusting-dom-and-diffable-data/Self Adjusting DOM and Diffable Data2016-02-10T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>I&rsquo;ve been <a href="https://blog.janestreet.com/incrementality-and-the-web/">thinking recently</a> about how to +structure dynamic web applications, and in particular about the role that +incremental computation should play.</p> + +https://blog.janestreet.com/self-adjusting-dom/Self Adjusting DOM2016-02-06T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>I&rsquo;ve recently been thinking about the world of JavaScript and web applications. +That&rsquo;s odd for me, since I know almost nothing about the web. 
Indeed, Jane +Street&rsquo;s use of web technologies is quite minimal &ndash; nearly all of our user +interfaces are text based, and all told we&rsquo;ve been pretty happy with that.</p> + +https://blog.janestreet.com/incrementality-and-the-web/Incremental computation and the web2016-01-30T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<div class="video-container"> + <iframe src="https://youtube.com/embed/v1CmGbOGb2I?rel=0" width="560" height="315" frameborder="0" allowfullscreen=""></iframe> +</div> + +https://blog.janestreet.com/why-ocaml/Why OCaml?2016-01-25T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Testing is important, and it&rsquo;s hard to get people to do as much of it as they +should. Testing tools matter because the smoother the process is, the more tests +people will write.</p> + +https://blog.janestreet.com/testing-with-expectations/Testing with expectations2015-12-02T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Automated testing is a powerful tool for finding bugs and specifying correctness +properties of code. Haskell&rsquo;s QuickCheck library is the most well-known +automated testing library, based on over 15 years of research into how to write +property-based tests, generate useful sources of inputs, and report manageable +counterexamples. 
Jane Street&rsquo;s Core library has not had anything comparable up +until now; version 113.00 of Core finally has a version of Quickcheck, +integrating automated testing with our other facilities like s-expression +reporting for counterexample values, and support for asynchronous tests using +Async.</p> + +https://blog.janestreet.com/quickcheck-for-core/Quickcheck for Core2015-10-26T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>I&rsquo;m not sure how I&rsquo;ve managed to use rsync for so many years without ever +noticing this, but hey, you learn something new every day!</p> + +https://blog.janestreet.com/rsync-rounds-timestamps-to-the-nearest-second/rsync rounds timestamps to the nearest second2015-10-07T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Jane Street is a serious functional programming shop. We use OCaml, a statically +typed functional language for almost everything and have what is probably the +largest OCaml codebase anywhere.</p> + +https://blog.janestreet.com/no-functional-experience-required/No (functional) experience required2015-08-19T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>I&rsquo;m pleased to announce the release of +<a href="https://github.com/janestreet/incremental">Incremental</a> (well +commented mli +<a href="https://github.com/janestreet/incremental/blob/master/src/incremental_intf.ml">here</a>), +a powerful library for building <em>self-adjusting computations</em>, <em>i.e.</em>, +computations that can be updated efficiently when their inputs change.</p> + +https://blog.janestreet.com/introducing-incremental/Introducing Incremental2015-07-18T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>As with many projects in the OCaml world, at Jane Street we have been working on +migrating from camlp4 to ppx. 
After having developed equivalent ppx rewriters +for our camlp4 syntax extensions, the last step is to actually translate the +source code of all our libraries and applications from the camlp4 syntax to the +standard OCaml syntax with extension points and attributes.</p> + +https://blog.janestreet.com/converting-a-code-base-from-camlp4-to-ppx/Converting a code base from camlp4 to ppx2015-07-08T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Even though registers are a low-level CPU concept, having some knowledge about +them can help write faster code. Simply put, a CPU register is storage for a +single variable. A CPU can keep data in memory, in cache, or in registers, and +registers are often much faster. Furthermore, some operations are possible only +when the data is in registers. Hence, the OCaml compiler tries to keep as many +variables as it can in the registers.</p> + +https://blog.janestreet.com/cpu-registers-and-ocaml-2/CPU Registers and OCaml2015-05-05T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>In the spirit of reinventing the wheel for fun, I hacked this together as a +quick challenge to myself last week. It&rsquo;s a little rough around the edges, but I +thought it was too cute not to share. If you have any bug fixes, please post +them in the comments.</p> + +https://blog.janestreet.com/reverse-web-proxy-in-50-lines-of-bash/Reverse web proxy in ~50 lines of BASH2015-05-01T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>We&rsquo;ve been doing a bunch of work recently on improving the responsiveness of +OCaml&rsquo;s garbage collector. 
I thought it would be worth discussing these +developments publicly to see if there was any useful feedback to be had on the +ideas that we&rsquo;re investigating.</p> + +https://blog.janestreet.com/building-a-lower-latency-gc/Building a lower-latency GC2015-04-10T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>The official OCaml documentation <a href="http://caml.inria.fr/pub/docs/manual-ocaml-4.01/intfc.html">&ldquo;Interfacing C with +OCaml&rdquo;</a> doesn&rsquo;t +document some interesting performance features.</p> + +https://blog.janestreet.com/faster-ocaml-to-c-calls/Faster OCaml to C calls2015-04-09T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>When GADTs (<a href="http://en.wikipedia.org/wiki/Generalized_algebraic_data_type">Generalized Algebraic Data +Types</a>) landed in +OCaml, I wasn&rsquo;t particularly happy about it. I assumed that it was the kind of +nonsense you get when you let compiler writers design your programming language.</p> + +https://blog.janestreet.com/why-gadts-matter-for-performance/Why GADTs matter for performance2015-03-30T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>We recently released a version of our open source libraries with a much +anticipated +<a href="https://github.com/janestreet/async_kernel/commit/bf11c4211595b2589b6517aefafceb2ad3bdc0fd">change</a> +&ndash; Async_kernel, the heart of the Async concurrent programming library, now +depends only on Core_kernel rather than on Core.</p> + +https://blog.janestreet.com/a-lighter-core/A lighter Core2015-03-21T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>7 years ago, I wrote a <a href="https://blog.janestreet.com/centralizing-distributed-version-control/" title="Centralizing Distributed Version Control">blog +post</a> +about how we at Jane Street were using our distributed version control system +(<code class="highlighter-rouge">hg</code>, though the story would be the 
same for <code class="highlighter-rouge">git</code>) in a partially centralized +way. Essentially, we built a centralized repo and a continuous integration +system whose job was to merge in new changesets. The key responsibility of this +system was to make sure that a change was rejected unless it merged, compiled +and <a href="http://graydon2.dreamwidth.org/1597.html" title="The Not Rocket Science Rule">tested +cleanly</a>.</p> + +https://blog.janestreet.com/centralizing-distributed-version-control-revisited/Centralizing distributed version control, revisited2015-03-04T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>We spend a lot of time and effort on training new people, and it never stops for +long. Right now our winter-intern class is ending; in five months we&rsquo;ll have a +slew of new interns to get up to speed, and a few months after that we&rsquo;ll have +an incoming class of new hires.</p> + +https://blog.janestreet.com/making-making-better/Making making better2015-01-31T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Very early on in his life, while on a lengthy voyage from London to Philadelphia, +Ben Franklin created a system of thirteen virtues to live his life by. He spent +the remainder of his days giving special focus to one virtue per week in a 13 +week cycle, as well as noting the virtues he failed to live up to at the end of +each day.</p> + +https://blog.janestreet.com/13-virtues/13 Virtues2015-01-02T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>Sometimes it&rsquo;s useful to be able to see the values of environment variables in +running processes. 
We can use the following test program to see how well we can +accomplish this:</p> + +https://blog.janestreet.com/inspecting-the-environment-of-a-running-process/Inspecting the Environment of a Running Process2014-12-01T00:00:00-00:00janestreethttps://blog.janestreet.com/feed.xmljanestreet<p>If you were teaching a programming course, what language would you teach it in?</p> + +https://blog.janestreet.com/how-to-choose-a-teaching-language/How to choose a teaching language2014-11-17T00:00:00-00:00janestreet \ No newline at end of file diff --git a/data/planet/tarides.xml b/data/planet/tarides.xml new file mode 100644 index 0000000000..bbb3163389 --- /dev/null +++ b/data/planet/tarides.xml @@ -0,0 +1,8999 @@ + +https://tarides.com/feed.xmltarides2023-05-02T14:42:52-00:00https://tarides.com/feed.xmltarides<p><a href="https://tn23.mini.debconf.org/">MiniDebConf TN 23</a> was organised by Debian Developers and the Villupuram Linux Users Group (VGLUG) as a precursor to DebConf 23 in September in Kochi, India. I had an opportunity to attend and speak at MiniDebConf TN.</p> +<p>I presented two sessions, one built on our experiences of introducing <a href="https://github.com/ocaml/code-of-conduct">a Code of Conduct</a> to an <a href="https://discuss.ocaml.org/t/adopting-the-ocaml-code-of-conduct/10870">open source community</a> (<a href="https://hackmd.io/JIWCOrBfQ7CfzPqeDw4t2Q#/">slides here</a>), and one called <a href="https://hackmd.io/wgB3EzlAQA6aTnQGyyp5Rw#/"><em>An Invitation to OCaml</em></a>, aimed at people with no prior OCaml experience. 
I was pleased to see a lot of folks getting interested in learning OCaml.</p> +<p>Over the course of two days, I attended interesting sessions by speakers from across India and other parts of the world.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#first-day" aria-label="first day permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>First Day</h3> +<p>The conference was inaugurated by Dr. Ravikumar, a Member of Parliament (MP) from the Villupuram district. A tech-savvy politician who has a presence in the fediverse, the MP emphasised in his speech the importance of the government adopting FOSS technologies. He released the private beta of <a href="https://prav.app/">Prav</a>, a privacy-focussed communication app.</p> +<p>The first session was &quot;Introduction to Debian&quot; by Sruthi Chandran, the first and only woman Debian Developer from India. It was interesting to see how the Debian community is comprised of a diverse set of people all across the world and is completely driven by volunteers. I learnt about the <em>do-o-cratic model</em>, where people doing the work make decisions.</p> +<p>Dr. <a href="http://deepikak.in/">Deepika</a>, a professor at RV College of Engineering and a FOSS enthusiast, gave a session on the KDE ecosystem that was a great primer on motivating people to move to FOSS technologies. I found her suggestion to use the term <em>Swatantra Software</em> to indicate Free Software (Free as in Freedom) to be a great one. 
Then, Martha and Kelvin, mappers by profession, took us through the journey of OpenStreetMap from a blank slate to being on par with other maps. They did a quick session on how to contribute to it.</p> +<p>Later I presented a session on &quot;Introducing a Code of Conduct&quot; to an open-source community. This talk was built upon our experience of drafting and enforcing a Code of Conduct for the OCaml community, which was completed in late 2022. This effort started earlier in the same year, with the idea of first forming a group of respected members in the community to act as the enforcement team. The effort was <a href="https://discuss.ocaml.org/t/ocaml-software-foundation-january-2023-update/11217#community-3">supported by the OCaml Software Foundation</a>. Once the team had enough strength, we worked on drafting a <a href="https://github.com/ocaml/code-of-conduct">Code of Conduct document</a>, largely inspired by existing texts, and iterated on it until it was accepted by the community.</p> +<p>The day ended with a speakers-only round table session of <em>FOSSivist</em>. It was a discussion with VGLUG volunteers on how to utilize Free and Open Source Software technologies to uplift the lives of underprivileged students in the district. For context, Villupuram falls in the bottom five for literacy rate, both overall and female. 
VGLUG was formed by a group of volunteers actively working on identifying talented first-generation Villupuram learners and training them for a career in tech.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#second-day" aria-label="second day permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Second Day</h3> +<p>The second day saw another lineup of interesting sessions. First was an introduction to contributing to the Linux kernel by <a href="https://nihaal.me/">Nihal</a>. This was followed by <a href="https://gwolf.org/">Gunnar Wolf</a>'s session on Debian authentication. It was evident Debian takes privacy seriously, and he urged the listeners to do so too. Bhuvana presented a session on why Diversity and Inclusion is important in tech.</p> +<p>This was followed by my presentation on <a href="https://hackmd.io/wgB3EzlAQA6aTnQGyyp5Rw#"><em>An Invitation to OCaml</em></a>. I talked about all the nice things OCaml offers, including the new and <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">exciting features in OCaml 5</a> with Multicore support. The talk is aimed at folks who've been programming in other languages but are new to Functional Programming. We go over why FP matters, slowly moving on to talk about OCaml features like immutability, type inference, garbage collection, etc. We also briefly touch upon the new features in OCaml 5, namely native support for parallelism and concurrency. 
It was great to chat about functional programming with folks afterwards.</p> +<p><a href="https://github.com/ranjithsiji">Renjith</a>, an active Wikipedian, presented their story of moving a Malayalam daily newspaper called <em>Janayugom</em> to an entirely FOSS tech stack. This saved the company a lot of money and stopped <em>Janayugom</em> from shutting down. Renjith emphasised the importance of free speech in a democracy and how small magazines and newspapers play a role in it. Then Subin presented Varnam, an Indic input tool.</p> +<hr/> +<p>I was impressed by the efforts taken by <a href="https://vglug.org/">VGLUG volunteers</a> and the Debian India team to organise everything in a smooth manner. From the time we landed in Villupuram, we did not worry about anything. Transport, food, and lodging were all taken care of by VGLUG volunteers. I did not think such a vibrant community of FOSS users would operate in a rural town. The community is doing great work to uplift the lives of people in Villupuram.</p> +<p>Best of all, it was great to meet old friends and make new ones. I hope to spread the joy of OCaml in more places.</p>https://tarides.com/blog/2023-04-28-ocaml-at-minidebconf-tn-2023OCaml at MinidebConf TN 20232023-04-28T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>What&rsquo;s the best way to spend a Friday evening? We think most people would agree that hacking on OCaml is pretty much at the top of that list (although full disclosure, our sample size for this data could be larger).</p> +<p>On Friday the 24th of February, Tarides&rsquo;s UK office hosted an evening of compiler hacking, presentations, and talks about all things OCaml. We&rsquo;re continuing a tradition that began in 2013, when we (then known as OCaml Labs) were based at the <a href="https://ocamllabs.io/compiler-hacking/">Computer Lab</a> in Cambridge, making this our 19th event. Just like back then, anyone with an interest in the OCaml compiler is welcome. 
At our recent event we had a mixture of students, industry professionals, and experts in attendance. If you'd like to create your own compiler hacking sessions, check out the <a href="https://github.com/tarides/compiler-hacking/wiki">wiki here</a>.</p> +<p>Something that&rsquo;s changed since 2013 is that OCaml now represents a large chunk of the undergraduate Computer Science tripos at the University of Cambridge; not only as the implementation language for courses such as Compiler Construction &amp; Semantics of Programming Language, but literally as the first language students are taught! This means that we had quite a few undergraduates turn up &ndash; it was great to see such an interest in OCaml across different backgrounds.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/28ce5039a7c2f139cc25067e673c960a/94358/talk_compiler.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 131.76470588235293%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/28ce5039a7c2f139cc25067e673c960a/7bf67/talk_compiler.jpg" class="gatsby-resp-image-image" alt="David's Talk" title="David's Talk" srcset="/static/28ce5039a7c2f139cc25067e673c960a/651be/talk_compiler.jpg 170w, +/static/28ce5039a7c2f139cc25067e673c960a/d30a3/talk_compiler.jpg 340w, +/static/28ce5039a7c2f139cc25067e673c960a/7bf67/talk_compiler.jpg 680w, +/static/28ce5039a7c2f139cc25067e673c960a/990cb/talk_compiler.jpg 1020w, +/static/28ce5039a7c2f139cc25067e673c960a/c44b8/talk_compiler.jpg 1360w, +/static/28ce5039a7c2f139cc25067e673c960a/94358/talk_compiler.jpg 3839w" sizes="(max-width: 680px) 100vw, 680px" 
style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#a-welcome-and-introduction-to-the-compiler" aria-label="a welcome and introduction to the compiler permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>A Welcome and Introduction to the Compiler</h2> +<p>The afternoon began with our very own David Allsopp giving the first of the day&rsquo;s two talks. He briefly laid the foundation for what Tarides is and what we do, but focussed on introducing OCaml and outlining some examples of things to hack on. 
Since we had the pleasure of hosting many undergraduate students who were new to the OCaml community, as well as some grizzled veterans (sorry, Jon!), it was important to have a selection of projects for all abilities.</p> +<p>Suggestions included bug fixes (which are always welcomed), documentation edits and improvements (which are always needed), and <a href="https://github.com/ocaml/ocaml/issues/">issues</a> labelled with the tag &ldquo;good first issue&rdquo; or &ldquo;newcomer job.&rdquo; Compilers that are self-bootstrapped (like OCaml) always require a <a href="https://dl.acm.org/doi/pdf/10.1145/358198.358210">complex build system</a>, so David concluded with a demonstration of the sequence of build system targets, explaining each step along the way.</p> +<p>Once the introduction was over, the room settled into a hive of activity, with some people furiously typing and others scratching their heads and looking thoughtful. Many of the undergrads focused on getting familiar with the OCaml compiler, whereas more experienced developers began undertaking their own hacking projects. We had invited well-known OCaml compiler hackers (including some of our own) as an awesome resource for all levels of experience. A combination of in-person hacking and an informal setting provided the perfect environment for sharing imaginative new ideas - something we&rsquo;ve all been missing since the pandemic.</p> +<p>With everyone divided up into smaller working groups, we worked our way around trying to help everyone make some progress. Groups were working on projects at all levels: some were trying to get the compiler to run <code>hello world</code>, whilst others (Patrick!) were forward-porting advanced modal type features between major versions of the compiler. 
A third year undergraduate was working on debugging the OCaml compiler for her dissertation, and she was attempting to use <a href="https://en.wikipedia.org/wiki/Hash_consing">hash consing</a> to make multiple identical values use the same bit of memory rather than multiple memory slots as a space-saving solution for the compiler.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/78aff19d70430adf340f314c7cc209b4/e7156/ryan.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 167.05882352941177%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/78aff19d70430adf340f314c7cc209b4/7bf67/ryan.jpg" class="gatsby-resp-image-image" alt="Ryan hacking on modal types" title="Ryan hacking on modal types" srcset="/static/78aff19d70430adf340f314c7cc209b4/651be/ryan.jpg 170w, +/static/78aff19d70430adf340f314c7cc209b4/d30a3/ryan.jpg 340w, +/static/78aff19d70430adf340f314c7cc209b4/7bf67/ryan.jpg 680w, +/static/78aff19d70430adf340f314c7cc209b4/990cb/ryan.jpg 1020w, +/static/78aff19d70430adf340f314c7cc209b4/c44b8/ryan.jpg 1360w, +/static/78aff19d70430adf340f314c7cc209b4/e7156/ryan.jpg 3527w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#local-allocations-and-pizza" aria-label="local allocations and pizza permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 
0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Local Allocations and Pizza</h2> +<p>After a couple of hours of hacking, Stephen Dolan gave us a tour of his and Leo White&rsquo;s ground-breaking work on stack allocation. This work was <a href="http://stedolan.net/talks/ocaml22/#1">presented at ICFP 2022</a> and is a compiler feature that aims to improve performance by reducing heap allocations in OCaml programs. Local allocations let programs use space on the stack (instead of allocating on the heap), which is automatically reclaimed without requiring the assistance of the (resource-heavy) garbage collector (GC). OCaml uses a stop-the-world parallel garbage collector for collecting recently allocated objects in the minor heap. This means that all the OCaml threads will need to stop when the minor heap is collected. Generating fewer heap allocations means less garbage, and less garbage means improved performance and reduced pause times. This is particularly important for parallel workloads. Local allocations are already being run in production internally at Jane Street, and there are plans to bring the associated benefits to the masses by upstreaming the work to mainline OCaml.</p> +<p>After Stephen&rsquo;s talk, and a quick but much needed pizza break, everyone went back to hacking. An all-too-common problem that cropped up several times happened when trying to run the freshly-built OCaml compiler from the build tree without first installing it. The error messages in this circumstance are not particularly intuitive, complaining of a &quot;bad interpreter: no such file or directory.&quot; The message refers to a bootstrap issue; the program is trying to find the interpreter, but the interpreter hasn&rsquo;t been installed yet. 
Some people solved the issue and moved straight onto the next task (a very common thing to do), but one group decided to tackle this head-on by improving the error message to provide more detail. This will help other new OCaml compiler developers and will almost certainly make life easier in our future hack events! This kind of &ldquo;simple&rdquo; fix is incredibly important for reducing the barrier to entry for new developers and emphasises the benefits of mixed-experience hack events with newcomers providing feedback and highlighting useful areas of improvement. We hope this work turns into a PR soon!</p> +<p>Another project focussed on ensuring OCaml programs can take advantage of new security features in Linux. There is a relatively new <a href="https://www.phoronix.com/news/Linux-6.3-Tmpfs-IDMAPPED">feature of the kernel</a> that allows the user to create a secure temporary file that is isolated from other users. One participant was experimenting with different versions of OCaml and Linux to see how this feature might be used in OCaml. Implementing this in the <a href="https://v2.ocaml.org/api/Unix.html">Unix module</a> is tempting, but as it provides the &quot;lowest common denominator&quot; interface, it has to be compatible with all platforms, and therefore does not cater to a niche function. A better option would be to write a separate library with a separate binding to address the compatibility issues, but that would require a lot of work for one feature. This illustrates the important kinds of questions that form the debate around supporting new, platform-specific features.</p> +<p>The prize for &ldquo;oldest bug addressed&rdquo; for the evening went to one of our most junior attendees, a first-year computer scientist who took on a problem first reported in 2005. The almost-20-year-old issue involves structural comparisons of cyclic data structures and is easily reproduced by pasting <code>let rec x = 1 :: x in x = x</code> into a toplevel. 
A <a href="https://github.com/ocaml/ocaml/pull/12039">pull request fixing the problem</a> was made during the evening and has generated a lot of interesting discussion!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#until-next-time" aria-label="until next time permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Until Next Time</h2> +<p>We&rsquo;re thrilled that we could restart these events, and it was lovely to see so many familiar faces alongside all the newcomers. The next hack day is scheduled for <a href="https://forms.gle/c6A2TSbUBZeVJSG46">March 31st</a>, and we&rsquo;re excited to see more people working on the compiler.</p> +<p>We&rsquo;d love to see you at a future event, but even if you can&rsquo;t come in person, there are loads of ways you can contribute. You can suggest projects and &quot;good first issues,&quot; add and improve on documentation, and even set up your own local event! You can check out the <a href="https://github.com/tarides/compiler-hacking/wiki">wiki here</a>.</p> +<p>We look forward to hanging out with more people around Cambridge who are curious or passionate about OCaml. If you&rsquo;re interested in joining future events in Cambridge, please <a href="https://tarides.com/company">email us</a>, we look forward to hearing from you! 
See you next time!</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/c8ce8d1832f3c017b0e81ac33743c362/5c13f/hacking.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 155.88235294117646%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/c8ce8d1832f3c017b0e81ac33743c362/7bf67/hacking.jpg" class="gatsby-resp-image-image" alt="Hacking" title="Hacking" srcset="/static/c8ce8d1832f3c017b0e81ac33743c362/651be/hacking.jpg 170w, +/static/c8ce8d1832f3c017b0e81ac33743c362/d30a3/hacking.jpg 340w, +/static/c8ce8d1832f3c017b0e81ac33743c362/7bf67/hacking.jpg 680w, +/static/c8ce8d1832f3c017b0e81ac33743c362/990cb/hacking.jpg 1020w, +/static/c8ce8d1832f3c017b0e81ac33743c362/c44b8/hacking.jpg 1360w, +/static/c8ce8d1832f3c017b0e81ac33743c362/5c13f/hacking.jpg 3228w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span> +<span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/2a82aadb22ee69e06e9b8c805dd7ce96/b62ab/Patrick.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 131.1764705882353%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/2a82aadb22ee69e06e9b8c805dd7ce96/7bf67/Patrick.jpg" class="gatsby-resp-image-image" alt="Patrick" title="Patrick" 
srcset="/static/2a82aadb22ee69e06e9b8c805dd7ce96/651be/Patrick.jpg 170w, +/static/2a82aadb22ee69e06e9b8c805dd7ce96/d30a3/Patrick.jpg 340w, +/static/2a82aadb22ee69e06e9b8c805dd7ce96/7bf67/Patrick.jpg 680w, +/static/2a82aadb22ee69e06e9b8c805dd7ce96/990cb/Patrick.jpg 1020w, +/static/2a82aadb22ee69e06e9b8c805dd7ce96/c44b8/Patrick.jpg 1360w, +/static/2a82aadb22ee69e06e9b8c805dd7ce96/b62ab/Patrick.jpg 3581w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p>https://tarides.com/blog/2023-03-22-compiler-hacking-in-cambridge-is-backCompiler Hacking in Cambridge is Back!2023-03-22T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>The aim of <a href="https://www.internationalwomensday.com/About">International Women&rsquo;s Day</a> is to raise awareness of gender inequalities and call for the empowerment of women worldwide. The goal is to forge a gender equal world and advance women&rsquo;s equality in all forms.</p> +<p>Within the <a href="https://www.internationalwomensday.com/Mission/Tech">field of technology</a>, the focus is on elevating and advancing gender parity in technology and celebrating the women shaping innovation.</p> +<p>While there have been advancements towards these goals, continuous attention and effort is required to create lasting change.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#how-is-tarides-promoting-gender-equality" aria-label="how is tarides promoting gender equality permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 
1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>How is Tarides Promoting Gender Equality?</h2> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#externally" aria-label="externally permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Externally</h3> +<p>One of our goals is to <a href="https://tarides.com/company/">foster diversity and inclusion in tech</a> and help provide more opportunities for underrepresented groups, including women. As a company, we partner with and support several organisations that promote diversity in the field of computer science. Some of them include:</p> +<ul> +<li><a href="https://www.50intech.com/about-us">50 in Tech</a> is working to achieve a gender balance of 50% women in tech by 2050. Their Gender Score Board helps companies across Europe measure their level of gender inclusion. They selected Tarides as an inclusive company and <a href="https://app.50intech.com/company/tarides">featured us on their website</a>. For more information on our work with 50 in Tech, we have a <a href="https://tarides.com/blog/2022-04-19-tarides-partners-with-50intech">blog post from 2022</a>.</li> +<li>The <a href="https://adatechschool.fr">Ada Tech School</a>, named after the first computer programmer Ada Lovelace, is a programming school designed for women but open to all. They are driven by three values: feminism, empathy, and singularity. 
We have a <a href="https://tarides.com/blog/2021-02-15-partnering-for-more-diversity-in-tech">blog post on our partnership with them from 2021</a>.</li> +<li><a href="https://girlscancode.fr">GirlsCanCode</a> is an initiative launched by the organisation Prologin that hosts summer camps specially aimed at teaching young women about computer programming, free of charge. We have a <a href="https://tarides.com/blog/2022-09-06-tarides-sponsors-girls-can-code">blog post about why we sponsor GirlsCanCode from 2022</a>.</li> +<li><a href="https://www.recurse.com/about">The Recurse Center</a> is an initiative that offers educational retreats for anyone who wants to get better at programming. They also provide needs-based grants to traditionally underrepresented groups to make programming more accessible for all. We're happy to have hired many talented engineers from the Recurse Center over the years!</li> +<li><a href="https://www.outreachy.org/">Outreachy</a> is an internship program that provides paid remote internships in open source and open science. Outreachy&rsquo;s goal is to increase diversity in open source and expressly invites anyone who faces underrepresentation or systemic bias in the technology industry of their country to apply. 
We sponsor and mentor interns in each biannual intake, and you can watch project presentations here (<a href="https://watch.ocaml.org/w/eSSmoyEcPTEXPGAqDtKENX">December 2021</a>, <a href="https://watch.ocaml.org/w/vXJtTj3cULRa1bZB5HrecX">May 2022</a>, <a href="https://watch.ocaml.org/w/pQSAfZ9kDSsSnr8Bxzocn3">December 2022</a>); read a <a href="https://discuss.ocaml.org/t/for-diversity-and-the-ocaml-community-outreachy-summer-2022/92340">community post</a> about how to get involved as a mentor; and read a <a href="https://tarides.com/blog/2022-08-02-irmin-in-the-browser">blog post</a> from one of our Summer 2022 interns about her project.</li> +<li>The <a href="https://oxbridgewomenincs8.wixsite.com/2020">Oxbridge Women in Computer Science Conference</a> is an annual one-day event hosted by the Universities of Oxford and Cambridge (UK). The purpose of the conference is to spotlight the successes of women within computer science and strengthen the network of women in computer science within a supportive environment. The conference is free and open to all genders. 
We have a <a href="https://tarides.com/blog/2020-12-14-tarides-sponsors-the-oxbridge-women-in-computer-science-conference-2020/">blog post from 2020 on our sponsorship of the conference</a>.</li> +</ul> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#internally" aria-label="internally permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Internally</h3> +<p>As a company we aim to provide a flexible and supportive working environment that encourages women to enter and remain in the workforce. Our aim is to make working at Tarides as inclusive as possible.</p> +<p>Examples of our policies include:</p> +<ul> +<li>Childcare support as an employee perk</li> +<li>Flexible hours and working</li> +<li>Equal pay scales based on experience and skills</li> +<li>Apprenticeships and internships to kickstart careers, or to enable later-stage career changes</li> +<li>Career progression development in technical and managerial roles</li> +</ul> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#still-some-way-to-go" aria-label="still some way to go permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 
13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Still Some Way to Go</h2> +<p>There is still much room for improvement, and we continue to be committed to removing barriers for women: at Tarides, in open source, and in computer science.</p> +<p>Currently at Tarides, 24% of our workforce is female, with 15% in technical roles. Over the last 12 months, Tarides has grown from 55 to 83 people, with 27% of those hired being women. Despite this increase, we are still below our ambitious goal of reaching 30% of women in tech roles. This highlights the disappointing reality that for each position we want to hire for, there are still proportionally fewer women applying and reaching the later stages of recruitment in our field.</p> +<p>In tech, we can address the issues from a number of angles, all of which will improve the overall picture. Collectively, we still need to encourage girls into STEM areas at an early age, in order to gradually increase the numbers of women in tech overall, but also to increase the size of the potential hiring pool of female applicants. Having female role models is essential, and we must continue to increase the representation of women in technology and STEM fields to encourage girls, and women, to see themselves in these kinds of roles. We must also continue to support lifelong learning by funding and creating training opportunities and resources for later-stage career changes.</p> +<p>At Tarides, we have specifically noticed a skills and training gap between entry-level internships (e.g., Outreachy) and the next level of progression into junior software engineer. We are getting better at helping women make their first steps into the tech world, but where do they go next? 
This year, we are focussing on how we can specifically improve this by preparing more resources to help learn functional programming, OCaml, and open-source methods, and by understanding the different levels of training and education needed in order to progress beyond these initial stages.</p> +<p>Finally, gender equality is not just a topic we should consider on March 8th every year. We must ensure that equality is in everyone&rsquo;s consciousness and that it forms the basis of our conversations and decisions.</p> +<blockquote> +<p>&ldquo;We will always have STEM with us. Some things will drop out of the public eye and will go away, but there will always be science, engineering and technology. And there will always, always be mathematics. Everything is physics and math.&rdquo; - <a href="https://www.nasa.gov/audience/foreducators/a-lifetime-of-stem.html">Katherine Johnson, NASA mathematician</a></p> +</blockquote>https://tarides.com/blog/2023-03-08-more-than-a-day-how-does-tarides-promote-women-in-techMore Than a Day: How Does Tarides Promote Women in Tech?2023-03-08T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Continuing our blog series on <a href="https://tarides.com/blog/2022-12-19-ocaml-5-with-multicore-support-is-here">Multicore OCaml</a>, this post provides an overview of the road to OCaml Multicore. If you want to know how you can use OCaml 5 in your own projects, please <a href="https://tarides.com/company">contact us</a> for more information. We also recommend watching KC Sivaramakrishnan's ICFP '22 talk <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">Retrofitting Concurrency - Lessons from the Engine Room</a>.</p> +<hr/> +<p>The journey to <a href="https://github.com/ocaml-multicore/ocaml-multicore/wiki">Multicore OCaml</a> is a journey from cutting-edge theory to real-life code. It&rsquo;s the story of an idea that grew from a small side-project into a multinational effort that brought a long-awaited update to OCaml. 
Along the road, the Multicore OCaml team faced many different challenges, leading them to re-evaluate their priorities and approach tasks differently.</p> +<p>As part of the Multicore Project since December 2014, KC Sivaramakrishnan is in a good position to describe the process from the initial days of experimentation right up until launch. He has unique insight into the decisions, challenges, and successes that the team experienced as they worked to turn innovative ideas into tangible results.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#the-journey-begins" aria-label="the journey begins permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The Journey Begins</h2> +<p>In 2013, the world had survived the 21st of December 2012, Flappy Bird was popular, and everyone was doing the Harlem shake. At the University of Cambridge, Professor Anil Madhavapeddy launched the Multicore OCaml project as part of the <a href="https://ocamllabs.io/">OCaml Labs</a> initiative alongside Leo White, Jeremy Yallop, and Philippe Wang. They were eventually joined by Stephen Dolan, then a PhD student working on combining <a href="https://www.bcs.org/events/awards-and-competitions/distinguished-dissertations/previous-winners/2017-competition/">ML-style parametric polymorphism with subtyping</a>.</p> +<p>In 2014 KC, who had just finished his PhD in the US, joined the team. 
His PhD had focused on making a multicore version of the MLton Standard ML compiler, which made him an asset to the growing team that would see the Multicore OCaml Project through to completion. Together they collaborated on a project that would see many partial victories and setbacks, before ultimately releasing OCaml 5.0 to the public in December 2022.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#timeline" aria-label="timeline permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Timeline</h2> +<p>In the years since the project started, there have been several developments and incremental successes. 
Below is an overview of the milestones along the road to Multicore OCaml:</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/7a1079030724de99ac8bce15ae51a0e3/9c618/Multicore_timeline_new-01.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 133.52941176470588%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/7a1079030724de99ac8bce15ae51a0e3/7bf67/Multicore_timeline_new-01.jpg" class="gatsby-resp-image-image" alt="Isabella Leandersson's Graphic" title="Isabella Leandersson's Graphic" srcset="/static/7a1079030724de99ac8bce15ae51a0e3/651be/Multicore_timeline_new-01.jpg 170w, +/static/7a1079030724de99ac8bce15ae51a0e3/d30a3/Multicore_timeline_new-01.jpg 340w, +/static/7a1079030724de99ac8bce15ae51a0e3/7bf67/Multicore_timeline_new-01.jpg 680w, +/static/7a1079030724de99ac8bce15ae51a0e3/990cb/Multicore_timeline_new-01.jpg 1020w, +/static/7a1079030724de99ac8bce15ae51a0e3/c44b8/Multicore_timeline_new-01.jpg 1360w, +/static/7a1079030724de99ac8bce15ae51a0e3/9c618/Multicore_timeline_new-01.jpg 10800w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p><strong>2013</strong></p> +<ul> +<li>The Multicore OCaml project was started by Prof Anil Madhavapeddy in the <a href="https://ocamllabs.io/">OCaml Labs</a> initiative at the University of Cambridge Computer Lab with Leo White, Jeremy Yallop, and Philippe Wang. 
The team was later joined by Stephen Dolan, then a PhD student, who was working on combining <a href="https://www.bcs.org/events/awards-and-competitions/distinguished-dissertations/previous-winners/2017-competition/">ML-style parametric polymorphism with subtyping</a>.</li> +</ul> +<p><strong>2014</strong></p> +<ul> +<li>March: Stephen Dolan, Leo White, and Anil Madhavapeddy started hacking on Multicore OCaml</li> +<li>March: Earliest <a href="https://github.com/ocaml/ocaml/commit/a56e4530b5b173e8de28eead196d6878bc021c55">commit</a> that can be directly attributed to Multicore OCaml that is in the OCaml commit history. The commit removes most of the out-of-heap pointers the interpreter uses by replacing them with stack offsets.</li> +<li>September: A status update on Multicore OCaml was presented at the OCaml workshop 2014; you can read the <a href="https://web.archive.org/web/20160414164304/https://ocaml.org/meetings/ocaml/2014/ocaml2014_1.pdf">associated paper</a> by Stephen Dolan, Leo White, and Anil Madhavapeddy.</li> +</ul> +<p><strong>2015</strong></p> +<ul> +<li>First 6 months: An initial implementation of effect handlers was completed. The inspiration behind this idea came from the <a href="https://www.eff-lang.org">Eff language</a>.</li> +<li>September: Effect handlers in OCaml were presented at the OCaml workshop 2015. You can read more about it in this <a href="https://kcsrk.info/ocaml/multicore/2015/05/20/effects-multicore">blog post</a>.</li> +</ul> +<p><strong>2016</strong></p> +<ul> +<li>May: <a href="https://www.dagstuhl.de/en/seminars/seminar-calendar/seminar-details/16112">Dagstuhl Seminar 16112</a>: &ldquo;From Theory to Practice of Algebraic Effects and Handlers.&rdquo; Effect handlers in OCaml were presented and refined based on expert interactions.</li> +<li>ML workshop: Daniel Hillerstr&ouml;m, Sam Lindley, KC Sivaramakrishnan <a href="https://kcsrk.info/publications">&quot;Compiling Links Effect Handlers to the OCaml Backend&quot;</a>. 
Daniel Hillerstr&ouml;m developed a Multicore OCaml backend for the Links language, compiling Links effect handlers to OCaml effect handlers.</li> +<li>ML workshop: Oleg Kiselyov and KC Sivaramakrishnan <a href="https://kcsrk.info/papers/eff_ocaml_ml16.pdf">&quot;Eff Directly in OCaml&quot;</a>. Showed how to get the expressive power of the Eff language directly using features from the OCaml language + OCaml effect handlers.</li> +<li>OCaml workshop: KC Sivaramakrishnan and Th&eacute;o Laurent <a href="https://kcsrk.info/papers/reagents_ocaml16.pdf">&quot;Lock-Free Programming for the Masses&quot;</a>. Presented the implementation of Reagents in OCaml, a composable lock-free programming library.</li> +</ul> +<p><strong>2017</strong></p> +<ul> +<li>Papers published at the ML &amp; OCaml Workshop: Stephen Dolan, Spiros Eliopoulos, Daniel Hillerstr&ouml;m, Anil Madhavapeddy, KC Sivaramakrishnan, and Leo White <a href="https://icfp17.sigplan.org/details/mlfamilyworkshop-2017-papers/2/Effectively-tackling-the-awkward-squad">&quot;Effectively Tackling the Awkward Squad&quot;</a>. The work outlined in this paper showed how effect handlers can simplify concurrent systems programming. These ideas were then incorporated in the development of <a href="https://github.com/ocaml-multicore/eio">Eio</a>.</li> +<li>Stephen Dolan and KC Sivaramakrishnan - <a href="https://icfp17.sigplan.org/details/ocaml-2017-talks/19/A-memory-model-for-multicore-OCaml">&quot;A Memory Model for Multicore OCaml&quot;</a>. The paper proposed a relaxed memory model for OCaml, broadly following the design of axiomatic memory models for languages such as C++ and Java, but with a number of differences to provide stronger guarantees and easier reasoning to the programmer, at the expense of not admitting every possible optimisation. 
This work eventually led to the <a href="https://v2.ocaml.org/releases/5.0/htmlman/memorymodel.html">relaxed memory model used in OCaml 5</a>.</li> +</ul> +<p><strong>2018</strong></p> +<ul> +<li>Stephen Dolan, KC Sivaramakrishnan, Spiros Eliopoulos, Daniel Hillerstr&ouml;m, Anil Madhavapeddy, and Leo White presented a forward-looking paper on &quot;<a href="https://kcsrk.info/papers/system_effects_feb_18.pdf">Concurrent Systems Programming with Effect Handlers&quot;</a> at the Trends in Functional Programming conference. This is the full version of the 2017 ML Workshop paper.</li> +<li>Stephen Dolan, KC Sivaramakrishnan, and Anil Madhavapeddy published a paper on the relaxed memory model for OCaml at PLDI, <a href="https://kcsrk.info/papers/pldi18-memory.pdf">&quot;Bounding Data Races in Space and Time&quot;</a>. This is the full version of the memory model work presented at the 2017 OCaml Workshop.</li> +<li>The team worked on simplifying and speeding up the implementation of effect handlers.</li> +</ul> +<p><strong>2019</strong></p> +<ul> +<li>Sadiq Jaffer and Tom Kelly implemented a new garbage collector for the minor heap (parallel stop-the-world minor collector), which ensures that programs using C FFI in OCaml remain backwards compatible.</li> +<li>The <a href="https://github.com/ocaml-bench/sandmark/">Sandmark</a> benchmark suite for rigorously benchmarking OCaml programs was developed and deployed. These days the performance of the OCaml compiler is tracked continuously using the <a href="https://sandmark.tarides.com">Sandmark nightly continuous benchmarking service</a>.</li> +</ul> +<p><strong>2020</strong></p> +<ul> +<li>The team decided to switch to the parallel stop-the-world minor collector (ParMinor) as default and drop the support for the concurrent minor collector (ConcMinor). ParMinor GC avoided a breaking change in the C FFI introduced by the ConcMinor GC. 
One concern was that the stop-the-world aspect in ParMinor would be a scalability bottleneck at large core counts. Our performance evaluation on the Sandmark suite showed that the impact of ParMinor is minimal even at large core counts (120+).</li> +<li>KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, and Anil Madhavapeddy presented <a href="https://core.ac.uk/download/pdf/328720849.pdf">&quot;Retrofitting Parallelism onto OCaml&quot;</a> at ICFP 2020. The paper describes the design choices for multicore support in OCaml, the design of the ConcMinor and ParMinor GCs, detailed performance evaluation, and justifies our choice to switch to ParMinor as default. It won the distinguished paper award at ICFP.</li> +<li>From 2020 through 2021, the team focused on achieving feature parity with sequential OCaml (systhreads, GC performance, DWARF support, <a href="http://check.ocamllabs.io/">opam health check</a>, etc.)</li> +</ul> +<p><strong>2021</strong></p> +<ul> +<li>KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, and Anil Madhavapeddy published <a href="https://kcsrk.info/papers/drafts/retro-concurrency.pdf">&quot;Retrofitting Effect Handlers onto OCaml&quot;</a> at PLDI 2021. The paper describes the design choices for the concurrency substrate in OCaml 5 and how effect handlers are a good fit for our needs.</li> +<li>In the latter half of 2021, OCaml core developers began reviewing code for Multicore OCaml, including the new concurrency and parallelism features; see <a href="https://github.com/ocaml-multicore/docs/blob/main/ocaml_5_upstreaming_proposal.md">this document</a> for more information.</li> +</ul> +<p><strong>2022</strong></p> +<ul> +<li>Early 2022, the <a href="https://github.com/ocaml/ocaml/pull/10831">Multicore PR</a> was merged!</li> +<li>Significant efforts were made by core OCaml developers to implement new features, review them, and ready the compiler for release. 
Without their hard work and dedication, there would be no OCaml Multicore or OCaml 5.0.</li> +<li>Memory model successfully implemented</li> +<li>RISC-V backend and ARM64 backend achieved</li> +<li>December 16th, 2022: OCaml 5.0 is released!</li> +</ul> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#why-multicore-ocaml" aria-label="why multicore ocaml permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Why Multicore OCaml?</h2> +<p>The number of cores on the machines that we use has been <a href="https://www.techspot.com/article/2363-multi-core-cpu/">steadily increasing for years</a>. Almost every computer now has several cores available to the user, and for a programming language to use them effectively it must support shared-memory parallel programming. If it does not, the user is forced to execute everything sequentially using only one core, or use multi-process programming, which is hard to use and in many cases less efficient than shared-memory parallel programming.</p> +<p>There are two main features coming with OCaml 5: <strong>Parallelism</strong> and <strong>Concurrency.</strong> Parallelism is about performance; it&rsquo;s the idea that if you have <em>n</em> cores, you can make your program run up to <em>n</em> times faster. 
The effects of parallelism will be most keenly felt in how fast your programs run, giving you as a user a significant performance boost.</p> +<p>On a bigger scale, parallel programming is significant for projects that need to complete resource intensive tasks quickly, like <a href="https://tarides.com/blog/2022-12-20-how-nomadic-labs-used-multicore-processing-to-create-a-faster-blockchain">theorem provers</a> for example. With multicore support for OCaml, developers can take advantage of features like type and memory-safety with unprecedented levels of performance.</p> +<p>Concurrency, on the other hand, is a programming abstraction. It is a way to tell your program that you want to execute several functions, each of which may potentially block for a short time while waiting for some external event. The programming language may choose either to execute such functions sequentially, one after the other, on a single core, interleaving their execution when a function gets blocked, or choose to execute them in parallel on several cores at once. Concurrency is useful, for example, when writing a web server that must handle several concurrent requests. The program may handle several such requests at the same time, but not necessarily need to use multiple cores to handle them. With OCaml 5, writing concurrent code is made a lot easier.</p> +<p>Previously, concurrent OCaml code would have to be written in a specific tool, Async or Lwt, that the developer would have to learn separately. However, these tools don&rsquo;t currently allow for asynchronous and synchronous code to interact with each other. In a <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">blog post from 2015</a>, Bob Nystrom describes this issue in what he calls the &lsquo;Functional Colouring Problem&rsquo;. 
OCaml 5 brings in support for concurrency through <a href="https://kcsrk.info/webman/manual/effects.html">effect handlers</a> and the new Input/Output library <a href="https://github.com/ocaml-multicore/eio">Eio</a>, which lets developers compose asynchronous and synchronous code together. It&rsquo;s also easy to learn and use since it behaves like normal OCaml code, <a href="https://tarides.com/blog/2022-10-19-porting-charrua-unix-and-rawlink-to-eio">simplifying the developer workflow</a>.</p> +<p>For those who still prefer to use Lwt or Async, OCaml 5 doesn&rsquo;t preclude them from doing so. Should they one day want to switch from using either tool to using Eio, changing their code to be compatible with Eio is simple and user-friendly. Whilst they will still need to rewrite their applications to use the primitives provided by Eio, doing so is straightforward and can be made incremental thanks to the Lwt- and Async-Eio bridges.</p> +<p>The team of people who worked on OCaml 5 knew from the start that bringing multicore support to OCaml would improve the lives of its users. 
It would make programs that run in OCaml <a href="https://medium.com/geekculture/what-makes-a-cpu-fast-344517cf91f9">faster and more efficient</a>, as well as help developers be more productive.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#the-academic-and-the-engineer" aria-label="the academic and the engineer permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The Academic and the Engineer</h2> +<p>When academics are on the cutting edge of science, they're essentially creating a new area of research as they go. This leads to a natural lag time between innovation and the creation of academic papers reviewing the process. 
For example, KC explains that &ldquo;The first <a href="https://kcsrk.info/ocaml/multicore/2015/05/20/effects-multicore/">talk on effect handlers was in 2015</a>, but the first <a href="https://anil.recoil.org/papers/2021-pldi-retroeff.pdf">proper paper on effect handlers</a> was just published in 2021.&rdquo;</p> +<p>Referring to the time between experimentation and finished product, KC goes on to say: &ldquo;Personally, it has been challenging to take part in building these systems, because 95% of the work is not very visible but you have to create that 95% in order to talk about the 5%.&rdquo;</p> +<p>Since the road to get here has been so long, it feels all the more exhilarating that release day has finally arrived.</p> +<blockquote> +<p>&ldquo;It&rsquo;s incredible that we are at the stage where we&rsquo;re able to take cutting-edge research and put it into practice. In the last few years, we&rsquo;ve expanded from academic research to producing robust code that can be upstreamed.&rdquo;</p> +</blockquote> +<p>This is great news for the whole community, as it demonstrates OCaml&rsquo;s potential to turn research into real products. The goal of the team behind the multicore effort was to modernise OCaml and make it faster and more efficient for everyone. 
The realisation of that goal took years of experimentation, optimisation, and groundbreaking research.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#first-major-challenge-its-all-about-garbage" aria-label="first major challenge its all about garbage permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>First Major Challenge: It&rsquo;s All About Garbage</h2> +<p>The first major challenge facing the Multicore team was OCaml&rsquo;s garbage collector. In OCaml, there are two programs working together on the heap, the language and the garbage collector. If the language supports parallelism but the garbage collector does not, the language would run fast just to be slowed down by the garbage collector.</p> +<p>To avoid this problem, the team made the garbage collector support parallelism to give users a uniformly smooth experience. &ldquo;Garbage collectors balance memory usage at the cost of time, so you can either have it use a small amount of memory but take a long time, or be fast but use a lot of memory,&rdquo; KC comments.</p> +<p>With different variables to optimise for, the team had to make some crucial decisions. OCaml already had a user base with certain expectations. They had to ensure that their changes did not remove features that users had come to expect. For example, OCaml is a <a href="https://ocaml.org/about">robust and predictable language</a>, and they needed to replace the garbage collector without sacrificing on that predictability. 
They also didn&rsquo;t want to settle for worse results in terms of performance.</p> +<p>Working on the new garbage collector, the team built an initial version that performed very well. However, they soon discovered that in order for the garbage collector to work, it would break the existing Application Programming Interface (API) interacting with C code.</p> +<blockquote> +<p>&ldquo;That was our dilemma: we had a nice, fast, garbage collector, but it would break people&rsquo;s code.&rdquo;</p> +</blockquote> +<p>A broken API would have been bad news for anyone, but it would especially affect any existing projects that relied heavily on C code (like Coq), as well as many industrial users who would have had to change millions of lines of code. The team worried that this would create a fork in the community, between those who would find it worth the upgrade and those who would not.</p> +<p>This was a big lesson for the team: user friendliness is incredibly important when introducing new technologies, and a big part of user friendliness is backwards compatibility. With this in mind, they set out to redesign the garbage collector. Although they were initially resigned to sacrifice some performance for the sake of compatibility, they ended up with a final product that not only did not break any code, but also didn&rsquo;t see significant performance losses! 
They <a href="https://icfp20.sigplan.org/details/icfp-2020-papers/21/Retrofitting-Parallelism-onto-OCaml">presented their findings</a> at ICFP 2020 and won the distinguished paper award.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#second-major-challenge-memory-model-what-memory-model" aria-label="second major challenge memory model what memory model permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Second Major Challenge: Memory Model, What Memory Model?</h2> +<p>The second challenge came as a result of the very way computers are constructed. Unsurprisingly, the hardware that actually executes your code predates the multicore era. Consequently, the hardware and compilers running the code are designed to make optimisations based on the assumption that you&rsquo;re running a single-threaded (so not multicore) program.</p> +<p>As you might imagine, several of these optimisations conflict with more modern, multicore aspects of code. In order for multicore code to run successfully in the face of these optimisations, useful abstractions are needed to determine what is safe and how parallel code is expected to run. These abstractions are called <a href="http://canonical.org/~kragen/memory-models/">memory models</a>, and they are necessary for hardware made for single-threaded programs to run multi-threaded code.</p> +<p>Memory models are very complex and have to balance simplicity with performance. 
The more straightforward the model, the greater the risk that it can&rsquo;t account for all possibilities and will therefore cause bugs. Conversely, if the memory model is complex enough to maximise performance, it will be hard for people to understand and use.</p> +<p>For the Multicore OCaml Project, the team decided to take inspiration from the memory models of <a href="https://cplusplus.com">C++</a> and <a href="https://www.java.com/en/">Java</a>, which choose to prioritise performance. However, they still wanted to make a memory model that was straightforward and intuitive. &ldquo;OCaml is used to prove other languages, and if the memory model is too complicated, it becomes hard to verify other parallel programs,&rdquo; KC explains.</p> +<p>By sacrificing a small amount of performance (around 3%), the team managed to create an <a href="https://kcsrk.info/webman/manual/memorymodel.html">OCaml memory model</a> that was both high-performing and easy to use. The paper detailing the process is called <a href="https://kcsrk.info/papers/pldi18-memory.pdf"><em>Bounding Data Races in Space and Time</em></a>.</p> +<p>In two of the big technological challenges that faced the team, a clear focus on user experience emerged. 
OCaml has deep roots in academia and is at times rumoured to be &lsquo;difficult,&rsquo; so focusing on improving user experience is an important part of making it a language for everyone.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#the-people-behind-the-project" aria-label="the people behind the project permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The People Behind the Project</h2> +<p>Behind every project is a group of hardworking people. Stephen Dolan, Leo White, and Anil Madhavapeddy started the Multicore project back in 2014. Until 2018, KC, Stephen, and Leo were doing most of the hacking. After 2018, the team saw enormous growth with Sadiq Jaffer and Tom Kelly working together on the garbage collector. Today, there are around ten people hacking on Multicore OCaml at any given time, all working hard to ensure that OCaml 5 is a success.</p> +<p>The open-source community has also provided continuous, valuable feedback as work on Multicore OCaml has progressed. Every person who participates by sharing their opinions and experience helps the project move forward. Many core OCaml developers worked tirelessly to get OCaml 5.0 release-ready. 
In particular, we should highlight the support of Xavier Leroy, who spent a considerable amount of time and effort changing important pieces of the runtime (such as the closure representation and the bytecode interpreter) to make them multicore compatible, as well as Gabriel Scherer, for his enthusiastic support of Multicore features and his willingness to do an enormous amount of crucial work, like reviewing a large number of Multicore PRs and additional features. The academic community has also actively utilised Multicore OCaml to push the boundaries of what is possible with effect handlers, and provided useful feedback and bug reports.</p> +<p>On the commercial side, Tezos has significantly helped the team test OCaml 5 by using multicore features for their <a href="https://tarides.com/blog/2022-12-20-how-nomadic-labs-used-multicore-processing-to-create-a-faster-blockchain">PLONK prover</a>. They&rsquo;ve made good progress using OCaml 5 and have been extremely helpful by reporting bugs and sharing their experience.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#sandmark" aria-label="sandmark permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Sandmark</h2> +<p>Over the course of OCaml Multicore&rsquo;s implementation, new tools have been developed to facilitate its creation. These tools are useful in and of themselves, and can be used in other projects. 
One tool born out of the OCaml Multicore push is the benchmarking suite <a href="https://github.com/ocaml-bench/sandmark">Sandmark</a>.</p> +<p>When Sadiq and Tom were working on the garbage collector, they had to understand how the change to a parallel garbage collector would affect non-parallel, sequential programs. To this end, they created Sandmark to benchmark different iterations.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#where-do-we-go-from-here" aria-label="where do we go from here permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Where Do We Go from Here?</h2> +<p>OCaml 5 is just the beginning, and from its release spring countless more opportunities. Several teams across the community are innovating on new features for OCaml. These features are at various levels of maturity and development, with small groups of developers testing some of them, whilst others are more or less in the ideation phase. Some of these features in development are listed below, but this is by no means an exhaustive list:</p> +<ul> +<li>Effect system: At the moment, there is no support from the OCaml type system to ensure that effects are handled properly. An effect system is an extension of the type system that keeps track of which effects can be performed by an expression or a function, ensuring that effects are only performed in a context where a corresponding effect handler is set up to deal with them. 
In a language as well-established and large as OCaml, implementing new features comes with significant considerations. Backwards compatibility is a must, and the new system must work with the polymorphism, modularity, and generativity features already in place. For an early exploration of typed effect handlers in OCaml, check out <a href="https://www.janestreet.com/tech-talks/effective-programming/">Leo White's talk</a> from 2018.</li> +<li>JavaScript: OCaml has a very nice compiler to <a href="https://www.javascript.com">JavaScript</a>, but it couldn't compile effect handlers to JavaScript. Indeed, JavaScript does not provide a corresponding feature. A standard way to translate effect handlers is to transform the code into the so-called continuation-passing style (CPS). Functions require an extra argument: a one-argument continuation function. Instead of returning a result value, they call the continuation with this value. By making continuations explicit, one can then explicitly manipulate the control flow of the program, which makes it possible to support effect handlers. Js_of_ocaml has been recently modified to support effect handlers using this approach. This <a href="https://github.com/ocsigen/js_of_ocaml/pull/1340">preliminary implementation</a> has been released in <a href="https://discuss.ocaml.org/t/ann-js-of-ocaml-5-0/11008">Js_of_ocaml 5.0</a>. It has provided their team with a good understanding of how effect handlers work and what technical difficulties exist when supporting them in a compiler targeting JavaScript. However, CPS transformation comes with an important negative impact on performance. The team then implemented a <a href="https://github.com/ocsigen/js_of_ocaml/pull/1384">partial CPS transform</a> that removed some of the overheads with the CPS transformation. There are still some overheads due to CPS conversion that can be eliminated with smarter analysis and transformation. 
The team is considering trying alternative compilation techniques to support effect handlers. For example, there are implementation strategies that should have low overhead as long as no effect is performed, at the cost of making effect handling slower. However, this might make the generated code much larger. The CPS-based implementation provides them with a point of comparison for undertaking this work.</li> +<li><a href="https://github.com/ocaml-flambda/ocaml-jst/tree/main/jane/doc">Local Allocations</a>: Implemented by Stephen Dolan and Leo White, this feature adds support for stack-allocated blocks. It enforces memory safety by requiring that heap-allocated blocks never point to stack-allocated blocks, and stack-allocated blocks never point to shorter-lived stack-allocated blocks. This is a big addition to OCaml&rsquo;s type system and is still under development.</li> +<li>Unboxed Types: Currently in OCaml, all fields of a structure store values in a single machine word. This word is further restricted by having to either point to garbage-collected memory or be tagged to denote that the garbage collector should skip it. Unboxed types relax this restriction, allowing a field to hold values smaller or larger than a word. This can be used to save memory and to improve performance by avoiding some pointer dereferencing. 
To find out more, read the <a href="https://github.com/ocaml/RFCs/pull/34">proposal on unboxed types</a> on GitHub.</li> +</ul> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#the-legacy" aria-label="the legacy permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The Legacy</h2> +<p>The Multicore OCaml project has been full of challenges, successes, and surprises. Along the way, the team has developed and grown, learning important lessons and adapting their approach to best suit the needs of all OCaml users.</p> +<p>Making the leap from research to product is a complex process that takes time to execute properly. In computer science, it can take decades to get right. It&rsquo;s a massive achievement to get a revolutionary update like OCaml Multicore from concept to finished product in less than 8 years.</p> +<p>It&rsquo;s also an update suitable for everyone. Users who don&rsquo;t have a need for multicore features can carry on using OCaml like they always have, benefitting from other OCaml 5 features without having to change a line of code. On the other hand, the significant number of people who have long awaited the update can now benefit from having OCaml and all its strengths on multiple cores.</p> +<p>The story of OCaml Multicore is one of hard work and a dedication to learning. It speaks to anyone with a passion project that seems too innovative or experimental to succeed. 
With a strong team and a flexible, problem-solving approach, theory can quickly become reality.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#acknowledgements" aria-label="acknowledgements permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Acknowledgements</h2> +<p>A big thank you to KC Sivaramakrishnan, without whom this article would not have been possible. Further thanks go to J&eacute;r&ocirc;me Vouillon and Leo White for their expertise and contributions to the &lsquo;where do we go from here&rsquo; section of the article.</p> +<blockquote> +<p><a href="https://tarides.com/company">Contact Tarides</a> to see how OCaml can benefit your business and/or for support while learning OCaml. 
Follow us on <a href="https://twitter.com/tarides_">Twitter</a> and <a href="https://www.linkedin.com/company/tarides/">LinkedIn</a> to ensure you never miss a post, and join the OCaml discussion on <a href="https://discuss.ocaml.org/">Discuss</a>!</p> +</blockquote> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#sources-and-further-reading" aria-label="sources and further reading permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Sources and Further Reading</h2> +<ul> +<li> +<p>A collection of libraries, experiments, and ideas relating to OCaml 5: <a href="https://github.com/ocaml-multicore/awesome-multicore-ocaml">https://github.com/ocaml-multicore/awesome-multicore-ocaml</a></p> +</li> +<li> +<p>A wiki for Multicore OCaml. 
Note that it's not currently being maintained, so whilst it has much useful information, some of it might be outdated: <a href="https://github.com/ocaml-multicore/ocaml-multicore/wiki">https://github.com/ocaml-multicore/ocaml-multicore/wiki</a></p> +</li> +<li> +<p>Information on Effect Handlers: <a href="https://kcsrk.info/webman/manual/effects.html">https://kcsrk.info/webman/manual/effects.html</a></p> +</li> +<li> +<p>Information on Parallelism: <a href="https://kcsrk.info/webman/manual/parallelism.html">https://kcsrk.info/webman/manual/parallelism.html</a></p> +</li> +<li> +<p>Information on Memory Models: <a href="https://kcsrk.info/webman/manual/memorymodel.html">https://kcsrk.info/webman/manual/memorymodel.html</a></p> +</li> +<li> +<p>Academic publications pertaining to OCaml Multicore: <a href="https://github.com/ocaml-multicore/awesome-multicore-ocaml#papers">https://github.com/ocaml-multicore/awesome-multicore-ocaml#papers</a></p> +</li> +<li> +<p>OCaml&rsquo;s home on the web: <a href="https://ocaml.org">https://ocaml.org</a></p> +</li> +</ul>https://tarides.com/blog/2023-03-02-the-journey-to-ocaml-multicore-bringing-big-ideas-to-lifeThe Journey to OCaml Multicore: Bringing Big Ideas to Life2023-03-02T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Today we're taking a little pause from our OCaml 5 series to talk about a programming retreat. I spent a week in the woods with fellow programmers at the <a href="https://anandology.com/lambda-retreat/">Lambda Retreat</a>. It was a wonderful way to explore the nature of computations, abstractions, and paradigms. 
Although I mostly work in OCaml, it was fun and challenging to code in Scheme, another functional programming language.</p> +<p>For more OCaml 5 posts, visit our <a href="https://tarides.com/blog">Tarides blog</a> for posts about some exciting new features and interviews with OCaml programmers, <a href="https://tarides.com/blog/2023-01-10-engineer-spotlight-sudha-parimala">including one with me</a>! Next week, come back to read an article about how OCaml 5 performs during the <a href="https://benchmarksgame-team.pages.debian.net/benchmarksgame/index.html">Benchmarking Game</a>, but for now, read on for a Lambda Retreat Retrospective.</p> +<hr/> +<p><em>Structure and Interpretation of Computer Programs (SICP)</em> is many programmers' favourite programming textbook. It teaches programming constructs like recursion, modularity, abstractions, etc. For a long time, it was used as the textbook for an introduction to programming course. Here's what <a href="https://www.amazon.com/review/R403HR4VL71K8">Peter Norvig</a> and <a href="https://eli.thegreenplace.net/2008/05/28/book-review-structure-and-interpretation-of-computer-programs-by-harold-abelson-gerald-jay-sussman/">Eli Bendersky</a> have to say about SICP.</p> +<p>Having seen a lot of people highly recommend SICP, I grabbed a copy for myself a few years ago and started reading it, but, alas, I never completed the book.</p> +<p>In the latter part of last year, <a href="https://anandology.com/">Anand</a> decided to host a week-long retreat to gather a bunch of people and go through some interesting parts of SICP. 
Suffice it to say, I jumped at the opportunity to do nothing but read and write code for a week.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#getting-ready" aria-label="getting ready permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Getting Ready</h3> +<p>Two weeks before the retreat, we had some warm-up sessions to get ourselves ready. During this time, we attended some remote sessions and solved a few exercises from Chapters 1 &amp; 2 of SICP.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#arriving" aria-label="arriving permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Arriving</h3> +<p>On Day 0, we all arrived in Bangalore from various parts of India, and carpooled to the <a href="https://tvc.farm/">Tamarind Valley Collective</a> (TVC), located ~80km from the city. 
Reaching TVC turned out to be an unexpected but enjoyable 1.5km trek, since the roads to the campsite were unusable due to rain.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#at-the-retreat" aria-label="at the retreat permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>At the Retreat</h3> +<p><strong>Functional Geometry</strong></p> +<p>The retreat began with Functional Geometry from the second chapter of SICP. We started with the basics, like rendering images, and slowly built the primitives needed for generating Escher's woodcut.</p> +<p>It was amazing to see the power of composability! 
We thoroughly enjoyed building Escher's woodcut from an unassuming image of a fish.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 305px; "> + <a href="https://tarides.com/static/91ad732ce9c8ddafb7f72036ecd368e1/a3e09/fish.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 80.58823529411765%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/91ad732ce9c8ddafb7f72036ecd368e1/a3e09/fish.png" class="gatsby-resp-image-image" alt="fish" title="fish" srcset="/static/91ad732ce9c8ddafb7f72036ecd368e1/04472/fish.png 170w, +/static/91ad732ce9c8ddafb7f72036ecd368e1/a3e09/fish.png 305w" sizes="(max-width: 305px) 100vw, 305px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p align="center"> + <span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 514px; "> + <a href="https://tarides.com/static/ae748b6f80360897b268809e102ce128/dea13/woodcut.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 94.70588235294117%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/ae748b6f80360897b268809e102ce128/dea13/woodcut.png" class="gatsby-resp-image-image" alt="Escher's Woodcut" title="Escher's Woodcut" srcset="/static/ae748b6f80360897b268809e102ce128/04472/woodcut.png 170w, +/static/ae748b6f80360897b268809e102ce128/9f933/woodcut.png 340w, 
+/static/ae748b6f80360897b268809e102ce128/dea13/woodcut.png 514w" sizes="(max-width: 514px) 100vw, 514px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span> +&nbsp; &nbsp; &nbsp; &nbsp; + <span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/35f581f60f5317fa1f360d000beff8b5/37523/group2.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 56.470588235294116%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/35f581f60f5317fa1f360d000beff8b5/c5bb3/group2.png" class="gatsby-resp-image-image" alt="Participants with Lambda Retreat t-shirt" title="Participants with Lambda Retreat t-shirt" srcset="/static/35f581f60f5317fa1f360d000beff8b5/04472/group2.png 170w, +/static/35f581f60f5317fa1f360d000beff8b5/9f933/group2.png 340w, +/static/35f581f60f5317fa1f360d000beff8b5/c5bb3/group2.png 680w, +/static/35f581f60f5317fa1f360d000beff8b5/37523/group2.png 720w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span> +</p> +<p>Anand rewarded everyone with an Escher's woodcut T-Shirt for successfully generating the <code>square-limit</code> \o/</p> +<p>We then abstracted out the implementation details for the Functional Geometry primitives we had built. 
The abstraction gives us the freedom to change the implementation at a later point without affecting the higher-level details.</p> +<p><strong>Mutability and State</strong></p> +<p>We spent some time understanding mutations, global state, and local state in Scheme, a functional language like OCaml. This led us to building some mutable data structures, like a mutable queue and a mutable hash table in Scheme, and generalising with a dispatcher to perform operations.</p> +<p><strong>Metacircular Interpreter</strong></p> +<p>Another exercise before the retreat was to write a parser for Scheme in Python. At the retreat, we started with translating it to Scheme. Going further, we built a metacircular interpreter -- a Scheme interpreter written in Scheme. How cool is that?</p> +<p>We then learnt about lazy evaluation in Scheme and went on to make our metacircular interpreter lazy by default. Another interesting part we looked forward to was targeting WebAssembly (Wasm) from Scheme. It was surprisingly simple to go from Scheme to Wasm, targeting Wasm's Lisp-like syntax.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#beyond-tech" aria-label="beyond tech permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Beyond Tech</h3> +<p>Living at TVC in the middle of a forest with barely any electricity or cellular network for a week was a humbling experience. Madhav, the resident manager at TVC, and his team made sure our stay was comfortable. 
The food, made from locally sourced ingredients and featuring local cuisine, was amazing!</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/23dba1a715848b57cf73cb75d4404490/eea4a/campsite.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 75.29411764705883%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/23dba1a715848b57cf73cb75d4404490/7bf67/campsite.jpg" class="gatsby-resp-image-image" alt="campsite" title="campsite" srcset="/static/23dba1a715848b57cf73cb75d4404490/651be/campsite.jpg 170w, +/static/23dba1a715848b57cf73cb75d4404490/d30a3/campsite.jpg 340w, +/static/23dba1a715848b57cf73cb75d4404490/7bf67/campsite.jpg 680w, +/static/23dba1a715848b57cf73cb75d4404490/990cb/campsite.jpg 1020w, +/static/23dba1a715848b57cf73cb75d4404490/eea4a/campsite.jpg 1280w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>The evening walks and hikes at TVC were memorable. We managed to spot some kingfishers and owls while snacking on some freshly plucked tamarinds. 
We had so much fun hiking along a stream that runs in the middle of TVC and capturing wisdom about sustainable living from Madhav and Vikrant, who put it into practice by living on farms.</p> +<p align="center"> + <span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 447px; "> + <a href="https://tarides.com/static/8091856d2296477e493c94038c366f82/a2d48/owl.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 76.47058823529412%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/8091856d2296477e493c94038c366f82/a2d48/owl.png" class="gatsby-resp-image-image" alt="An owl we spotted" title="An owl we spotted" srcset="/static/8091856d2296477e493c94038c366f82/04472/owl.png 170w, +/static/8091856d2296477e493c94038c366f82/9f933/owl.png 340w, +/static/8091856d2296477e493c94038c366f82/a2d48/owl.png 447w" sizes="(max-width: 447px) 100vw, 447px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span> +&nbsp; &nbsp; &nbsp; &nbsp; + <span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/1a6b3dd1fda189aabdd21e1591229a8d/eea4a/group.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 75.29411764705883%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/1a6b3dd1fda189aabdd21e1591229a8d/7bf67/group.jpg" class="gatsby-resp-image-image" alt="After a hike along the 
stream" title="After a hike along the stream" srcset="/static/1a6b3dd1fda189aabdd21e1591229a8d/651be/group.jpg 170w, +/static/1a6b3dd1fda189aabdd21e1591229a8d/d30a3/group.jpg 340w, +/static/1a6b3dd1fda189aabdd21e1591229a8d/7bf67/group.jpg 680w, +/static/1a6b3dd1fda189aabdd21e1591229a8d/990cb/group.jpg 1020w, +/static/1a6b3dd1fda189aabdd21e1591229a8d/eea4a/group.jpg 1280w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span> +</p> +<p>Our coworkers Pappu the hunting cat, her three kittens, and our boy Poco, the rugged looking sweet doggo, kept us company.</p> +<p align="center"> + <span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 537px; "> + <a href="https://tarides.com/static/ab433e19fe4942a338efd9f13e1adf60/b1cde/cat.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 77.64705882352942%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/ab433e19fe4942a338efd9f13e1adf60/b1cde/cat.png" class="gatsby-resp-image-image" alt="Pappu the cat" title="Pappu the cat" srcset="/static/ab433e19fe4942a338efd9f13e1adf60/04472/cat.png 170w, +/static/ab433e19fe4942a338efd9f13e1adf60/9f933/cat.png 340w, +/static/ab433e19fe4942a338efd9f13e1adf60/b1cde/cat.png 537w" sizes="(max-width: 537px) 100vw, 537px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span> +&nbsp; &nbsp; &nbsp; &nbsp; + <span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a 
href="https://tarides.com/static/2c0f98a83217cb6e933920968f817edf/eea4a/poco-dog.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 75.29411764705883%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/2c0f98a83217cb6e933920968f817edf/7bf67/poco-dog.jpg" class="gatsby-resp-image-image" alt="Hiking crew with Poco the dog" title="Hiking crew with Poco the dog" srcset="/static/2c0f98a83217cb6e933920968f817edf/651be/poco-dog.jpg 170w, +/static/2c0f98a83217cb6e933920968f817edf/d30a3/poco-dog.jpg 340w, +/static/2c0f98a83217cb6e933920968f817edf/7bf67/poco-dog.jpg 680w, +/static/2c0f98a83217cb6e933920968f817edf/990cb/poco-dog.jpg 1020w, +/static/2c0f98a83217cb6e933920968f817edf/eea4a/poco-dog.jpg 1280w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span> +</p> +<p>Our days of hacking were followed by board games in evenings and nights. We had so much fun and laughter riots playing games like Skull, Chameleon, and Ticket to Ride Europe. 
By the end of the retreat, we were surprised by how little internet and social media we had consumed that week!</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/f0ce14e49dabc260374548b09e1b66c9/dc896/game.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 119.41176470588235%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/f0ce14e49dabc260374548b09e1b66c9/7bf67/game.jpg" class="gatsby-resp-image-image" alt="game" title="game" srcset="/static/f0ce14e49dabc260374548b09e1b66c9/651be/game.jpg 170w, +/static/f0ce14e49dabc260374548b09e1b66c9/d30a3/game.jpg 340w, +/static/f0ce14e49dabc260374548b09e1b66c9/7bf67/game.jpg 680w, +/static/f0ce14e49dabc260374548b09e1b66c9/990cb/game.jpg 1020w, +/static/f0ce14e49dabc260374548b09e1b66c9/dc896/game.jpg 1294w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>I'm grateful to have had the opportunity to attend the first ever Lambda Retreat and hope to carry the functional programming spirit forward. It was super nice meeting all the enthusiastic and kind people at the retreat, and I hope to see everyone again at future events.</p> +<p>Thanks to Anand for organising it and to everyone who attended for making it an enjoyable experience. 
Thanks also to Madhav and his team at TVC for ensuring our stay was comfortable.</p> +<hr/> +<p>Check out our series on Multicore OCaml, a project I've worked on for the last several years, starting with the <a href="https://tarides.com/blog/2022-12-19-ocaml-5-with-multicore-support-is-here">announcement of the OCaml 5 release</a>. If you'd like to know more about OCaml 5, you can start with <a href="https://youtu.be/6BhmRz7eqiE">KC's keynote address</a> from the ICFP 2022 conference, <a href="https://ocaml.org/docs">OCaml tutorials</a>, and the informative book <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571"><em>Real World OCaml</em></a>.</p> +<blockquote> +<p><a href="https://tarides.com/company">Contact Tarides</a> to see how OCaml can benefit your business and/or for support while learning OCaml. Follow us on <a href="https://twitter.com/tarides_">Twitter</a> and <a href="https://www.linkedin.com/company/tarides/">LinkedIn</a> to ensure you never miss a post, and join the OCaml discussion on <a href="https://discuss.ocaml.org/">Discuss</a>!</p> +</blockquote>https://tarides.com/blog/2023-01-12-lambda-retreat-reportLambda Retreat Report2023-01-12T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>For our third and final Engineer Spotlight, we interviewed Sudha Parimala, a Tarides engineer who works primarily on the Multicore Applications team. She talks about what led her to become an OCaml programmer and why she's excited about OCaml 5, as our blog series on <a href="https://tarides.com/blog/2022-12-19-ocaml-5-with-multicore-support-is-here">Multicore OCaml</a> continues.</p> +<hr/> +<p><strong>Christine: Why did you decide to become an OCaml programmer rather than Python or C++?</strong></p> +<p>Sudha: My programming journey started with Python in high school. 
At that point I didn't really know much about programming, and I picked it only as an alternative to studying biology. Python's human language-like syntax made it easy to grasp the concepts as a novice, while also getting a feel for programming constructs. The education board decided to switch to C++, and I ended up learning OOP as a result. I continued learning C, Java, and such during my undergrad.</p> +<p>Then I discovered Haskell and was hooked. I found it wild that I could rewrite a 200+ line Java program in just 10 lines of Haskell. I got an opportunity to participate in a summer school organised by ACM India on Programming Language Design. This deepened my interest in Functional Programming (FP). After graduating, I got an opportunity to join KC's Multicore OCaml team at IIT Madras. I started learning OCaml then, and there's no looking back.</p> +<p><strong>C: What do you like best in OCaml?</strong></p> +<p>S: I like OCaml's features combined with its practicality. When I started learning OCaml, I could relate to a lot of general FP concepts I had learnt through other languages. At the same time, one can easily do imperative programming (mutations) or I/O.</p> +<p><strong>C: What&rsquo;s the coolest thing you've made with OCaml?</strong></p> +<p>S: When I was working on the Multicore OCaml compiler, I found it really cool that we could easily connect OCaml directly with C, with the type safety of OCaml. If I may add a futuristic take on this, I'd find it really cool to do hobby projects of mine with the entire stack - from web scraping, to talking to databases, to creating web apps - in OCaml.</p> +<p><strong>C: Why should engineers learn OCaml?</strong></p> +<p>S: I'd recommend learning OCaml to anyone curious to learn new and succinct ways of expressing programs. 
OCaml definitely changes the way you think about programs, and I'm sure that reflects on how you write programs, in OCaml and elsewhere.</p> +<p><strong>C: What are you most excited about in OCaml 5?</strong></p> +<p>S: I'm really excited to see the OCaml world transition to Multicore. It will be a challenging, yet rewarding journey. Challenging because OCaml programs for more than two decades have been designed for a single core. Rewarding, thanks to blazing-fast performance and direct-style concurrency. The Multicore and OCaml development teams have invested time in ensuring backwards compatibility, which will hopefully ease the process a bit.</p> +<hr/> +<p>Thank you so much, Sudha, for taking the time to answer these questions about your experience with OCaml. Also read <a href="https://tarides.com/blog/2023-01-05-engineer-spotlight-zach-shipko">Zach Shipko's</a> and <a href="https://tarides.com/blog/2022-12-29-engineer-spotlight-jules-aguillon">Jules Aguillon's</a> interviews for their take! Thanks to their willingness to share, other developers can see why they should learn OCaml as their next language.</p> +<p>Feel like learning OCaml? Get started with the <a href="https://ocaml.org/docs">tutorials</a> and the <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571"><em>Real World OCaml</em></a> book. Get a nice overview of what OCaml 5 has to offer by watching <a href="https://youtu.be/6BhmRz7eqiE">KC Sivaramakrishnan's keynote address</a>.</p> +<blockquote> +<p><a href="https://tarides.com/company">Contact Tarides</a> to see how OCaml can benefit your business and/or for support while learning OCaml. 
Follow us on <a href="https://twitter.com/tarides_">Twitter</a> and <a href="https://www.linkedin.com/company/tarides/">LinkedIn</a> to ensure you never miss a post, and join the OCaml discussion on <a href="https://discuss.ocaml.org/">Discuss</a>!</p> +</blockquote>https://tarides.com/blog/2023-01-10-engineer-spotlight-sudha-parimalaEngineer Spotlight: Sudha Parimala2023-01-10T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Tarides engineer Zach Shipko answers a few questions about why he decided to learn OCaml and why he's particularly excited about the OCaml 5 release. In celebration of OCaml 5, we've interviewed several engineers about their personal experience with the language and what features they enjoy. It's a great way to get some unique insight into the language from someone who works with it on a daily basis.</p> +<hr/> +<p><strong>Christine: Why did you decide to become an OCaml programmer rather than Python or C++?</strong></p> +<p>Zach: I don't really see myself as an &quot;OCaml programmer&quot; because I use Python, C, Rust, Javascript, and other languages quite frequently. It's my interest in many different programming languages that led me to OCaml!</p> +<p><strong>C: What do you like best in OCaml?</strong></p> +<p>Z: One of my favorite things about OCaml is the amount of thought put into new language features. Because of this I think the whole community values the importance of API design and correctness.</p> +<p><strong>C: What&rsquo;s the coolest thing you've made with OCaml?</strong></p> +<p>Z: I have been working on <a href="https://github.com/mirage/irmin/blob/main/libirmin.opam"><code>libirmin</code></a>, which provides C bindings to the Irmin API, making it possible to use Irmin directly from C and other languages. This uses <a href="https://github.com/yallop/ocaml-ctypes/tree/master/src/cstubs"><code>Cstubs_inverted</code></a> to wrap OCaml code in C functions. 
I don't know how &quot;cool&quot; that is, but every time it works I am pleasantly surprised.</p> +<p><strong>C: Why should engineers learn OCaml?</strong></p> +<p>Z: Learning a new programming language can help you see problems from a new perspective and it gives you another tool to reach for when needed. OCaml has lots of nice features (pattern matching, functors, ...) that make solving certain problems more fun.</p> +<p><strong>C: What are you most excited about in OCaml 5?</strong></p> +<p>Z: Other than being able to use multiple cores, I am very excited about Effects (and eventually typed effects). It is an entirely new paradigm for writing applications with a lot of research behind it. To have a usable effects system in a general-purpose language like OCaml is a huge accomplishment!</p> +<hr/> +<p>Zach emphasises the importance of expanding your horizons as a programmer, learning new languages to give you fresh perspectives and insights, perhaps especially when learning a language like OCaml. As the esteemed <a href="https://en.wikipedia.org/wiki/Alan_Perlis">Alan Perlis</a> said, &quot;A language that doesn't affect the way you think about programming is not worth knowing.&quot;</p> +<p>Zach also studied photography and digital media in college. He took the pictures at the top of this post and said he &quot;typically picks photos like this to have some nature to look at on programming websites.&quot;</p> +<p>Read <a href="https://tarides.com/blog/2022-12-29-engineer-spotlight-jules-aguillon">Jules Aguillon's</a> interview from 27 December 2022 to learn about his journey to OCaml. Next week, look for our final Engineer Spotlight interview with Sudha Parimala, as well as a post from her on the Benchmarking Game.</p> +<p>Feel like learning OCaml? 
Get started with <a href="https://ocaml.org/docs">the tutorials</a> and the <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571"><em>Real World OCaml</em></a> book. Want to learn more about effects and other things OCaml 5 has to offer? Watch KC Sivaramakrishnan's <a href="https://youtu.be/6BhmRz7eqiE">keynote</a> and check out the <a href="https://speakerdeck.com/kayceesrk/ocaml-5-dot-0">speaker deck</a> for his talk as well.</p> +<blockquote> +<p><a href="https://tarides.com/company">Contact Tarides</a> to see how OCaml can benefit your business and/or for support while learning OCaml. Follow us on <a href="https://mobile.twitter.com/tarides_">Twitter</a> and <a href="https://www.linkedin.com/company/tarides/">LinkedIn</a> to ensure you never miss a post, and join the OCaml discussion on <a href="https://discuss.ocaml.org">Discuss</a>!</p> +</blockquote>https://tarides.com/blog/2023-01-05-engineer-spotlight-zach-shipkoEngineer Spotlight: Zach Shipko2023-01-05T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>In celebration of the OCaml 5 release, we decided to interview a few of our talented engineers about OCaml. While it isn't a well-known language outside of the functional programming community, we're striving to get the word out about the great benefits of OCaml and why it's worth your time to try it out, especially with the introduction of <a href="https://tarides.com/blog/2022-12-19-ocaml-5-with-multicore-support-is-here">Multicore support in OCaml 5</a>.</p> +<p>KC Sivaramakrishnan's inspiring <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">keynote address</a> is a great introduction to OCaml 5 and all it offers. 
Check out the <a href="https://speakerdeck.com/kayceesrk/retrofitting-concurrency-lessons-from-the-engine-room">speaker deck</a> as well.</p> +<p>Today our engineer Jules Aguillon, who works primarily on our OCaml Platform tooling project, talks about his journey to OCaml, what he enjoys about the language, and why he thinks you should learn it! Take it away Jules!</p> +<hr/> +<p><strong>Christine: How did you start programming?</strong></p> +<p>Jules: My programming journey began with learning C in school, but I soon realised I wanted more abstraction. It felt like the C code was always talking about low-level stuff instead of what I wanted to express. In C, translations are a lot of work and get harder as the algorithm becomes more complex. Things like ASTs and polymorphic datastructures are also really hard to write in C.</p> +<p>Some of the ways C works are not so innocent, it supports the kinds of dangerous memory operations that <a href="https://www.zdnet.com/article/microsoft-70-percent-of-all-security-bugs-are-memory-safety-issues/">famously cause 70% of all security bugs</a> in some big corporations.</p> +<p><strong>C: What did you do to get that abstraction that you were looking for?</strong></p> +<p>J: I decided to learn C++, which is C but with classes (grouping code and data together), abstracted memory operations (making them safer by default), and more type checking.</p> +<p>But I quickly found that C++ also wasn't a good fit, mainly due to its use of boilerplate. Boilerplate refers to pieces of code that must be repeated in various places without significant change, wrapping around every concept you try to express in C++. They are used to represent several complicated concepts, and any mistake in the boilerplate could bring things like memory unsafety back. I wanted to abstract this away, too.</p> +<p><strong>C: What did you do next?</strong></p> +<p>J: To finally write shorter and safe code, I tried Python. 
It was a joy to use compared to the previous unsafe and verbose C and C++. The garbage collector solved the memory-unsafety problems, and the built-in datastructures and idioms allowed me to write many complex algorithms using only a small amount of code.</p> +<p>But Python has a dark side: it entirely lacks static type checking. This means that it requires considerable effort to find a type-related mistake. The only way is to run the program with different inputs and wait until it crashes. This gets really annoying as the program grows.</p> +<p>Furthermore, this kind of mistake happens all the time (sometimes once in every line of code!) and could be entirely solved by a type checker.</p> +<blockquote> +<p>&quot;For me, this is already the perfect language and it doesn't stop there!&quot;</p> +</blockquote> +<p><strong>C: Is this where OCaml comes in?</strong></p> +<p>J: Yes! Then I learned OCaml! It's unconditionally memory-safe, has a garbage collector, the code is concise, many kinds of abstractions are possible, and most importantly, it has well-defined and powerful type checking.</p> +<p>For me, this is already the perfect language and it doesn't stop there! Modules, polymorphism, and higher-order functions all add deep abstraction possibilities, and there's even a more important feature. Variant types allow types to have different shapes and write tree-looking things like ASTs (abstract syntax trees) that are impossibly hard to express in all the languages above and many others that I have tried.</p> +<p>Theoreticians talk about algebra of types, and this is the &quot;plus&quot; operation. Now that I've used it in OCaml, I could never go back to a language that doesn't have the &quot;plus&quot; operation!</p> +<hr/> +<p>A big thank you to Jules for taking the time to speak about his experience with OCaml. 
Getting a personal account of why he chose OCaml gives great insight into the strengths and features of the language from someone who uses it every day.</p> +<p>If you're interested in learning more about OCaml you can learn from <a href="https://ocaml.org/docs">tutorials</a>, the <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571">Real World OCaml book</a>, and contribute <a href="https://github.com/ocaml/ocaml">on Github</a>. We look forward to seeing you in the community!</p>https://tarides.com/blog/2022-12-29-engineer-spotlight-jules-aguillonEngineer Spotlight: Jules Aguillon2022-12-29T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We&rsquo;ve come to expect a lot from the programming languages we use. We want the memory safety of Java, the performance of C/C++, and the concurrency of Go. On top of this, we need robust cybersecurity tools to protect us from the many risks and vulnerabilities in the world, all in an intuitive and easy-to-use package for programmers.</p> +<p>You can expect all of the above with OCaml 5. The new library <a href="https://github.com/ocaml-multicore/eio">Eio</a> introduces some great new features that let the programmer write concurrent code in a way that best suits them. Eio is fast, solves the <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">function colouring problem</a>, and can use effect handlers to let the developer customise the scheduling algorithm rather than baking it in at runtime. If you&rsquo;re a fan of how Rust delivers fast and high performing concurrent code, OCaml 5&rsquo;s Eio is a close match, with some additional features.</p> +<p>Eio gives OCaml a new edge on speed, ease-of-use, portability, and security. It matches Rust&rsquo;s reputed performance on the same points, making the two languages more comparable when it comes to concurrent programming. 
Let me know what you think of my comparison on <a href="https://discuss.ocaml.org">Discuss</a> or <a href="https://twitter.com/tarides_">Twitter</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#how-eio-makes-concurrent-code-quick-and-easy" aria-label="how eio makes concurrent code quick and easy permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>How Eio Makes Concurrent Code Quick and Easy</h2> +<p>Rust is a <a href="https://codilime.com/blog/why-is-rust-programming-language-so-popular/">popular programming language</a> that solves a lot of problems for programmers. One of Rust&rsquo;s strengths is the way it delivers concurrent code in a quick and safe manner.</p> +<p>According to a <a href="https://codilime.com/blog/why-is-rust-programming-language-so-popular/">recent blog post</a>, &ldquo;Rust solves problems that C/C++ developers have been struggling with for a long time: memory errors and concurrent programming. This is seen as its main benefit.&rdquo;</p> +<p>Eio makes writing concurrent code in OCaml much easier, resolving earlier pain points and providing significant benefits. OCaml is also a type- and memory-safe language with a low-latency and high-throughput concurrent garbage collector that doesn't get in the way of application code execution.</p> +<p>Below is an overview of the biggest changes Eio brings to concurrent programming in OCaml. 
With these improvements, OCaml provides a concurrent programming experience comparable to Rust.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#key-benefits-of-eio" aria-label="key benefits of eio permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Key Benefits of Eio</h2> +<p><em>Performance</em> +Eio brings big performance improvements to concurrent code in OCaml, making use cases like web servers serving requests from users a lot faster. In a speed test comparing Eio&rsquo;s performance to Go&rsquo;s <code>net/http</code> and Rust&rsquo;s <code>hyper</code>, the results show that Eio outperforms Go and closely matches Rust. Eio can reliably serve over one million requests per second on a few cores. 
Being such a close match in terms of performance, OCaml is a strong contender for users looking to expand beyond using Rust.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/9f0b97bdb5cfc231e1a387bb218f08c4/133ae/http_load1.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 66.47058823529413%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/9f0b97bdb5cfc231e1a387bb218f08c4/c5bb3/http_load1.png" class="gatsby-resp-image-image" alt="EioImage" title="EioImage" srcset="/static/9f0b97bdb5cfc231e1a387bb218f08c4/04472/http_load1.png 170w, +/static/9f0b97bdb5cfc231e1a387bb218f08c4/9f933/http_load1.png 340w, +/static/9f0b97bdb5cfc231e1a387bb218f08c4/c5bb3/http_load1.png 680w, +/static/9f0b97bdb5cfc231e1a387bb218f08c4/b12f7/http_load1.png 1020w, +/static/9f0b97bdb5cfc231e1a387bb218f08c4/b5a09/http_load1.png 1360w, +/static/9f0b97bdb5cfc231e1a387bb218f08c4/133ae/http_load1.png 1424w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p><em>Ease-of-Use</em> +One of Rust&rsquo;s strengths is its user friendliness, meaning it&rsquo;s easy to use and code in. With the OCaml 5 update, Eio makes OCaml a much easier language for writing concurrent code.</p> +<p>Eio offers an alternative to monadic I/O, which used to be the only way to write concurrent code in OCaml. 
Now, with OCaml 5, the developer experience is greatly simplified and will feel familiar to anyone who knows OCaml.</p> +<p>Patrick Ferris, a developer at Tarides, emphasises that with Eio:</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#normal-ocaml-features-just-work-out-of-the-box-like-exception-backtraces-which-are-crucial-for-writing-new-libraries-using-tools-and-debugging-programs" aria-label="normal ocaml features just work out of the box like exception backtraces which are crucial for writing new libraries using tools and debugging programs permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>&ldquo;Normal OCaml features just work out of the box, like exception backtraces, which are crucial for writing new libraries, using tools, and debugging programs.&rdquo;</h3> +<p>Being able to use backtraces is a big improvement, as they can show the developer a form of &lsquo;history&rsquo; of what interventions have been made to a program. Using it in Eio lets developers debug and troubleshoot more quickly. 
Having access to standard OCaml features also means that the more difficult parts of concurrent programming, such as cancellations, cleaning up resources, reporting errors, and testing, are a lot easier to perform with Eio.</p> +<p>The biggest change users will notice is the resolution of the &lsquo;code colouring problem.&rsquo; In the past, synchronous code could not exist alongside asynchronous code without breaking, requiring the developer to use a special calling convention to invoke asynchronous code. With Eio, that is no longer a problem, as both just appear as normal OCaml functions. This significantly improves developer experience and productivity and is unique to Eio. Currently, in Rust&rsquo;s I/O library the &lsquo;code colouring problem&rsquo; still exists, and developers have to spend time resolving conflicts between code types.</p> +<p><em>Portability</em> +Both Rust and OCaml offer excellent portability, and with Eio OCaml gets several quality-of-life updates that let developers create programs in different environments with several different features.</p> +<p>Operating systems have changed a lot in the last decade, benefiting from continuous development and modernisation. Thanks to its flexibility, Eio is able to take advantage of modern OS features (such as Linux&rsquo;s <a href="https://github.com/axboe/liburing">io_uring</a>) to boost its own performance.</p> +<p>In turn, different backends for various platforms (such as Linux, macOS, Windows, and Mirage) can also implement the standard environment Eio expects to run programs. 
This adds an element of predictability to developer workflow, minimising the amount of task-switching and time spent outside of programming.</p> +<p>On the topic of Eio&rsquo;s flexibility, Thomas Leonard, the creator and lead maintainer of Eio, highlights that:</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#eio-can-also-run-existing-lwt-and-async-code-alongside-new-code-allowing-existing-projects-to-be-upgraded-piece-by-piece-keeping-the-tests-passing-throughout-the-migration-a-couple-of-lines-of-code-is-all-it-takes-to-make-an-existing-lwt-application-run-on-eio-and-from-there-any-new-code-can-use-eio-directly" aria-label="eio can also run existing lwt and async code alongside new code allowing existing projects to be upgraded piece by piece keeping the tests passing throughout the migration a couple of lines of code is all it takes to make an existing lwt application run on eio and from there any new code can use eio directly permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>&ldquo;Eio can also run existing Lwt and Async code alongside new code, allowing existing projects to be upgraded piece by piece, keeping the tests passing throughout the migration. 
A couple of lines of code is all it takes to make an existing Lwt application run on Eio, and from there any new code can use Eio directly.&rdquo;</h3> +<p>Instead of having to pick one I/O tool to learn, Eio&rsquo;s library can run all three in the same program which is completely new.</p> +<p><em>Security</em> +Both Rust and OCaml offer strong safety features. According to the blog <a href="https://codilime.com/blog/why-is-rust-programming-language-so-popular/">Codilime</a>, &ldquo;High performance and safety are the features that made Rust so appealing.&rdquo; Well, Eio adds even more security features to OCaml&rsquo;s already long list.</p> +<p>Eio allows developers to implement measures with great specificity, which has great significance when it comes to security. For example, Eio lets a developer program a web server to serve files only from within specified directory trees, removing the possibility that it could be tricked into serving other files. This ensures that the I/O only shares what it&rsquo;s intended to do without being bypassed, which is a common security problem with web servers.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>Many businesses and programmers love Rust, and for good reason! 
What OCaml 5 and Eio can offer is an alternative that matches Rust on performance and user friendliness, includes new cutting-edge features, and delivers on safety in a way that is uniquely OCaml. With its new Eio library, concurrent programming in OCaml becomes more similar to Rust, and out of the two only OCaml solves the function colouring problem. If you&rsquo;re looking to complement your use of Rust with a robust functional programming language &ndash; without sacrificing performance &ndash; OCaml 5 is the language for you.</p> +<p><a href="https://tarides.com/company">Contact us</a> and learn more about how OCaml can transform your business. You can also find us on <a href="https://github.com/tarides">GitHub</a>, the OCaml <a href="https://discuss.ocaml.org">Discuss</a> forum, and <a href="https://twitter.com/tarides_">Twitter</a></p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#acknowledgements" aria-label="acknowledgements permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Acknowledgements</h3> +<p>With thanks to Thomas Leonard, creator and lead maintainer of Eio, and Patrick Ferris, a developer at Tarides, for their expertise and input that made this article possible.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#sources" aria-label="sources permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 
3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Sources</h3> +<ul> +<li> +<p><a href="https://codilime.com/blog/why-is-rust-programming-language-so-popular/">Codilime</a> blog</p> +</li> +<li> +<p><a href="https://serokell.io/blog/rust-guide">Serokell</a> blog</p> +</li> +</ul>https://tarides.com/blog/2022-12-27-love-rust-then-ocaml-s-new-eio-library-is-for-youLove Rust? Then OCaml's New Eio Library is for You2022-12-27T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>The new version of OCaml 5 is here! It brings the ability to program multicore applications and to maximise our usage of all the CPU cores without a global lock getting in the way of performance. What's most exciting to me though is that we have a whole new way of writing... bugs!</p> +<p>And with so much potential for mistakes comes a new era of testing tools to help us write correct applications:</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#memory-model" aria-label="memory model permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Memory Model</h2> +<p>The first of those is the <a href="https://kcsrk.info/webman/manual/memorymodel.html"><em>memory model</em> of OCaml 5</a>. 
If you already know what those two words mean, please skip this part because I won't pretend that I do. (I'm still convinced that it's just some fancy legalese terms to confuse people.) But it may actually matter when you realise that you've been living in a fantasy your whole CPU life:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml">left <span class="token operator">:=</span> <span class="token number">42</span> <span class="token punctuation">;</span> +right <span class="token operator">:=</span> <span class="token number">0</span> <span class="token punctuation">;</span></code></pre></div> +<p>When I read these two lines, I have years of beliefs telling me that the reference <code>left</code> will be updated before the <code>right</code> one is. But modern compilers and hardware conspire to break any sanity that may exist in my brain. You see, the order of operations doesn't actually matter <em>if</em> you can't see that the CPU is doing things in another order. It may just happen that the compiler or CPU will choose to do those two operations in reverse if they think it would be more convenient. And without this, our software would be so much slower that <em>&quot;instructions are executed in order&quot;</em> is an essential lie. (Well, is it even a lie if it makes no difference?)</p> +<p>To catch a liar, you need a second observer to correlate their claims. This is exactly what the other CPU cores will do. The bad news is that they are not actively looking for bad behavior from their colleagues, but they will end up reading values that aren't quite written in the order expected. This will wreak havoc on your invariants and trigger very, very weird bugs.</p> +<p>I'm not kidding when I say &quot;very, very weird.&quot; Below is a real example of an out-of-order read/write that happened on my computer. 
This was a very simple program, with only two references, <code>left</code> and <code>right</code>, that got updated by two different domains (shown as two branches here):</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text"> !left = 42 + !right = 0 + | + .------------------------------------. + | | + | left := 1 + | right := 2 + right := 3 | + !left = 42 !right = 3 + ^^^^ + how?</code></pre></div> +<p>I tried to align the sequence of operations according to the observed memory values, but no ordering actually made sense. We can't have both <code>!left = 42</code> and <code>!right = 3</code> in the end.</p> +<p>Here's another attempt to align the instructions in a coherent way:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text"> !left = 42 + !right = 0 + | + .------------------------------------. + | | + right := 3 | + !left = 42 | + left := 1 + right := 2 + !right = 3 + ^^^^ + how?</code></pre></div> +<p>It already requires some time to unpack this short example, but imagine how bad it would get to debug such a thing in production!</p> +<p>I want to stress that this specific execution wasn't the result of a compiler optimisation that we could have discovered by reading the assembly code. The program was running just fine over many iterations before being disturbed by a sudden hardware optimisation. The probability of observing this behavior from your CPU is very low---not low enough that you can ignore it, but you won't be able to reproduce this exact bug in any reasonable time. (But we'll see how to catch our CPU cores red-handed in the following sections!)</p> +<p>But ok, wait---come again. How is any of this nonsense a good memory model? For starters, the values you can read &quot;out-of-order&quot; are still real values that have been assigned to the references, not imaginary ones. Yes, it could be even weirder, but you don't want to know. 
It's all fun and games with integers, but this property really matters for pointers (where following the wrong one would lead you down a segmentation fault). You need to be wary of this in other languages, but not in OCaml. Memory safety is preserved. It's not an instant <em>Game Over</em> to do an accidental out-of-order read.</p> +<p>Secondly, when reading and writing to shared memory, you should use the new <code>Atomic</code> module to ensure the proper memory ordering of operations. This will introduce the required memory barriers to bring back sanity---at a small performance cost---so it's opt-in and only required for shared memory! (Note: you can also use a <code>Mutex</code> lock to protect your read/write into shared memory.)</p> +<p>In technical words, OCaml 5 programs enjoy the <em>&quot;Sequential Consistency for Data Race Freedom (DRF-SC)&quot;</em> property. If your program has no data races, then you can reason about your code under sequential consistency where the operations from different threads are interleaved with each other, but the instructions don't seem to be executed out of order.</p> +<p><a href="https://kcsrk.info/webman/manual/memorymodel.html">Read more about the memory model in the OCaml 5 manual</a></p> +<p>By using <code>Atomic</code>, we are back in the wonderful land where operations happen in the expected order! The memory model becomes a tool for your brain, enabling it to reason about your algorithms. 
This one is so intuitive that I can once again pretend that it doesn't exist (without getting hit by an unexpected bug later.)</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#threadsanitizer" aria-label="threadsanitizer permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>ThreadSanitizer</h2> +<p>Alright, so how do we check that our programs aren't susceptible to an &quot;out-of-order&quot; bug caused by a missing <code>Atomic</code> or <code>Mutex</code>? ThreadSanitizer was created by Google as a lightweight instrumentation to discover these runtime data races.</p> +<p>To enable it on your OCaml program, you&rsquo;ll need a special compiler that adds the required instrumentation to your software. 
Don't worry, it&rsquo;s super easy to set up thanks to opam switches!</p> +<p><a href="https://github.com/ocaml-multicore/ocaml-tsan">Installation and usage information on the <code>ocaml-tsan</code> repository</a></p> +<p>As a running example to demonstrate the usefulness of each tool, let's look at different implementations of simple banking accounts, where users can transfer money to each other <em>if</em> they have enough money in their account:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">module</span> Bank <span class="token operator">=</span> <span class="token keyword">struct</span> + <span class="token keyword">type</span> t <span class="token operator">=</span> int array + + <span class="token keyword">let</span> transfer t from_account to_account money <span class="token operator">=</span> + <span class="token keyword">if</span> money <span class="token operator">&gt;</span> <span class="token number">0</span> <span class="token comment">(* no negative transfer! *)</span> + <span class="token operator">&amp;&amp;</span> from_account <span class="token operator">&lt;&gt;</span> to_account <span class="token comment">(* or transfer to self! *)</span> + <span class="token operator">&amp;&amp;</span> t<span class="token punctuation">.</span><span class="token punctuation">(</span>from_account<span class="token punctuation">)</span> <span class="token operator">&gt;=</span> money <span class="token comment">(* and you must have enough money! 
*)</span> + <span class="token keyword">then</span> <span class="token keyword">begin</span> + t<span class="token punctuation">.</span><span class="token punctuation">(</span>from_account<span class="token punctuation">)</span> <span class="token operator">&lt;-</span> t<span class="token punctuation">.</span><span class="token punctuation">(</span>from_account<span class="token punctuation">)</span> <span class="token operator">-</span> money <span class="token punctuation">;</span> + t<span class="token punctuation">.</span><span class="token punctuation">(</span>to_account<span class="token punctuation">)</span> <span class="token operator">&lt;-</span> t<span class="token punctuation">.</span><span class="token punctuation">(</span>to_account<span class="token punctuation">)</span> <span class="token operator">+</span> money <span class="token punctuation">;</span> + <span class="token keyword">end</span> +<span class="token keyword">end</span></code></pre></div> +<p>This module could be part of a much larger program that receives transaction requests from the network and handles them. 
For simplicity here, we'll only be running a small simulation, but ThreadSanitizer is intended to be used on large real programs with messy I/O and side effects, not just broken toys and unit tests.</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> t <span class="token punctuation">:</span> Bank<span class="token punctuation">.</span>t <span class="token operator">=</span> Array<span class="token punctuation">.</span>make <span class="token number">8</span> <span class="token number">100</span> <span class="token comment">(* 8 accounts with $100 each *)</span> + +<span class="token keyword">let</span> money_shuffle <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> <span class="token comment">(* simulate an economy *)</span> + <span class="token keyword">for</span> <span class="token punctuation">_</span> <span class="token operator">=</span> <span class="token number">0</span> <span class="token keyword">to</span> <span class="token number">10</span> <span class="token keyword">do</span> + Unix<span class="token punctuation">.</span>sleepf <span class="token number">0.1</span> <span class="token punctuation">;</span> <span class="token comment">(* wait for a network request *)</span> + Bank<span class="token punctuation">.</span>transfer t <span class="token punctuation">(</span>Random<span class="token punctuation">.</span>int <span class="token number">8</span><span class="token punctuation">)</span> <span class="token punctuation">(</span>Random<span class="token punctuation">.</span>int <span class="token number">8</span><span class="token punctuation">)</span> <span class="token number">1</span> <span class="token punctuation">;</span> <span class="token comment">(* transfer $1 *)</span> + <span class="token keyword">done</span> + +<span class="token keyword">let</span> account_balances 
<span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> <span class="token comment">(* inspect the bank accounts *)</span> + <span class="token keyword">for</span> <span class="token punctuation">_</span> <span class="token operator">=</span> <span class="token number">0</span> <span class="token keyword">to</span> <span class="token number">10</span> <span class="token keyword">do</span> + Array<span class="token punctuation">.</span>iter <span class="token punctuation">(</span>Format<span class="token punctuation">.</span>printf <span class="token string">&quot;%i &quot;</span><span class="token punctuation">)</span> t <span class="token punctuation">;</span> Format<span class="token punctuation">.</span>printf <span class="token string">&quot;@.&quot;</span> <span class="token punctuation">;</span> + Unix<span class="token punctuation">.</span>sleepf <span class="token number">0.1</span> <span class="token punctuation">;</span> + <span class="token keyword">done</span> + +<span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> <span class="token comment">(* run the simulation and the debug view in parallel *)</span> + <span class="token operator-like-punctuation punctuation">[|</span> Domain<span class="token punctuation">.</span>spawn money_shuffle <span class="token punctuation">;</span> Domain<span class="token punctuation">.</span>spawn account_balances <span class="token operator-like-punctuation punctuation">|]</span> + <span class="token operator">|&gt;</span> Array<span class="token punctuation">.</span>iter Domain<span class="token punctuation">.</span>join</code></pre></div> +<p>It should be pretty clear that our code is not thread-safe and that transferring money while printing the account balances is asking for trouble! 
Running it with ThreadSanitizer enabled will print warnings into the terminal as soon as a potential data-race is observed (and it's even better in real life as the output is colorful!):</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">WARNING: ThreadSanitizer: data race (pid=1178477) + Write of size 8 at 0x7fc4936fd6b0 by thread T4 (mutexes: write M87): + #0 camlDune__exe__V0.transfer_317 &lt;null&gt; (v0.exe+0x6ae1a) + #1 camlDune__exe__V0.money_shuffle_325 &lt;null&gt; (v0.exe+0x6af8d) + .. ... + + Previous read of size 8 at 0x7fc4936fd6b0 by thread T1 (mutexes: write M83): + #0 camlStdlib__Array.iter_329 &lt;null&gt; (v0.exe+0x9c675) + #1 camlDune__exe__V0.account_balances_563 &lt;null&gt; (v0.exe+0x6b054) + .. ...</code></pre></div> +<p>The issue is reported very clearly thanks to the two conflicting stacktraces. There's a read/write data-race happening between the <code>money_shuffle</code> execution and the <code>account_balances</code> one, which could result in unreasonable memory reordering artifacts. 
In fact, it would be even worse if we were to <code>transfer</code> money from multiple domains in parallel (which we'll attempt to do in the next section as an interesting way of speeding up our bank transactions with Multicore).</p> +<p>It looks like we can fix the read/write data-race by adding a <code>Mutex</code> lock around the <code>transfer</code> function <em>write</em> operations:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> lock <span class="token operator">=</span> Mutex<span class="token punctuation">.</span>create <span class="token punctuation">(</span><span class="token punctuation">)</span> + +<span class="token keyword">let</span> transfer t from_account to_account money <span class="token operator">=</span> + Mutex<span class="token punctuation">.</span>lock lock <span class="token punctuation">;</span> + <span class="token comment">(* ... same as before ... *)</span> + Mutex<span class="token punctuation">.</span>unlock lock</code></pre></div> +<p>But ThreadSanitizer is not easily fooled and will still complain loudly. We also need to use the same <code>Mutex</code> to protect the array reads in the <code>account_balances</code> function, as it would otherwise be perfectly valid to optimise away the shared memory reads into oblivion:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> account_balances_optimized <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> <span class="token comment">(* faster... but wrong-er! 
*)</span> + <span class="token keyword">let</span> str <span class="token operator">=</span> String<span class="token punctuation">.</span>concat <span class="token string">&quot; &quot;</span> <span class="token operator">@@</span> Array<span class="token punctuation">.</span>to_list <span class="token operator">@@</span> Array<span class="token punctuation">.</span>map string_of_int t <span class="token keyword">in</span> + <span class="token keyword">for</span> <span class="token punctuation">_</span> <span class="token operator">=</span> <span class="token number">0</span> <span class="token keyword">to</span> <span class="token number">10</span> <span class="token keyword">do</span> + Format<span class="token punctuation">.</span>printf <span class="token string">&quot;%s @.&quot;</span> str <span class="token punctuation">;</span> + Unix<span class="token punctuation">.</span>sleepf <span class="token number">0.1</span> <span class="token punctuation">;</span> + <span class="token keyword">done</span></code></pre></div> +<p>The data races reported by ThreadSanitizer are not only the ones where an absurd &ldquo;out-of-order&rdquo; happened, but preemptively, all those that could potentially trigger such a problem. If you are porting an existing multi-threaded application to OCaml 5, this compiler variant should probably be your default debug build.</p> +<p>Note that the ThreadSanitizer instrumentation does add a performance cost and doesn't increase memory safety by itself. 
Run it for a bit, track down your shared memory misuses, and add the required <code>Atomic</code> and <code>Mutex</code> operations.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#multicore-tests-lin-and-stm" aria-label="multicore tests lin and stm permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Multicore Tests: <code>Lin</code> (and <code>STM</code>)</h2> +<p>How do we unit test our Multicore libraries? It's business as usual, and the standard Alcotest will do well, for example. But there are some new properties that we should look for when writing and using libraries in a multicore setting. 
Let's revisit the bank accounts implementation by using <code>Atomic</code> operations this time rather than a <code>Mutex</code>:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">module</span> Bank <span class="token operator">=</span> <span class="token keyword">struct</span> + <span class="token keyword">type</span> t <span class="token operator">=</span> int Atomic<span class="token punctuation">.</span>t array + + <span class="token keyword">let</span> get t client <span class="token operator">=</span> Atomic<span class="token punctuation">.</span>get t<span class="token punctuation">.</span><span class="token punctuation">(</span>client<span class="token punctuation">)</span> + + <span class="token keyword">let</span> transfer t from_account to_account money <span class="token operator">=</span> + <span class="token keyword">if</span> money <span class="token operator">&gt;</span> <span class="token number">0</span> + <span class="token operator">&amp;&amp;</span> from_account <span class="token operator">&lt;&gt;</span> to_account + <span class="token operator">&amp;&amp;</span> get t from_account <span class="token operator">&gt;=</span> money + <span class="token keyword">then</span> <span class="token keyword">begin</span> + <span class="token comment">(* [fetch_and_add x v] is an atomic operation that does [x := !x + v] *)</span> + <span class="token keyword">let</span> <span class="token punctuation">_</span> <span class="token punctuation">:</span> int <span class="token operator">=</span> Atomic<span class="token punctuation">.</span>fetch_and_add t<span class="token punctuation">.</span><span class="token punctuation">(</span>from_account<span class="token punctuation">)</span> <span class="token punctuation">(</span><span class="token operator">-</span> money<span class="token punctuation">)</span> <span class="token keyword">in</span> + <span class="token 
keyword">let</span> <span class="token punctuation">_</span> <span class="token punctuation">:</span> int <span class="token operator">=</span> Atomic<span class="token punctuation">.</span>fetch_and_add t<span class="token punctuation">.</span><span class="token punctuation">(</span>to_account<span class="token punctuation">)</span> money <span class="token keyword">in</span> + <span class="token punctuation">(</span><span class="token punctuation">)</span> + <span class="token keyword">end</span> +<span class="token keyword">end</span></code></pre></div> +<p>See how careful I was to use <code>Atomic</code> to read and write from shared memory? (I even used a fancy <code>fetch_and_add</code>!) Therefore, it must be correct if used by different domains, right? While this program doesn't have a data race, the definition of &quot;correctness&quot; is more subtle in a multicore setting.</p> +<p>It's easier to explain if I show you the problem. To test this interface with the <code>Lin</code> library, we only need to describe how to <code>init</code>ialise a new bank and the API signature of available functions:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">open</span> Lin + +<span class="token keyword">module</span> Bank_test <span class="token operator">=</span> <span class="token keyword">struct</span> + <span class="token keyword">type</span> t <span class="token operator">=</span> Bank<span class="token punctuation">.</span>t + + <span class="token keyword">let</span> init <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> <span class="token comment">(* 8 accounts with $100 each *)</span> + Array<span class="token punctuation">.</span>init <span class="token number">8</span> <span class="token punctuation">(</span><span class="token keyword">fun</span> <span class="token punctuation">_</span> <span 
class="token operator">-&gt;</span> Atomic<span class="token punctuation">.</span>make <span class="token number">100</span><span class="token punctuation">)</span> + <span class="token keyword">let</span> cleanup <span class="token punctuation">_</span> <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> + + <span class="token keyword">let</span> account <span class="token operator">=</span> int_bound <span class="token number">7</span> <span class="token comment">(* array index between 0..7 *)</span> + <span class="token keyword">let</span> api <span class="token operator">=</span> <span class="token punctuation">[</span> + val_ <span class="token string">&quot;get&quot;</span> Bank<span class="token punctuation">.</span>get <span class="token punctuation">(</span>t <span class="token operator">@-&gt;</span> account <span class="token operator">@-&gt;</span> returning int<span class="token punctuation">)</span> <span class="token punctuation">;</span> + val_ <span class="token string">&quot;transfer&quot;</span> Bank<span class="token punctuation">.</span>transfer + <span class="token punctuation">(</span>t <span class="token operator">@-&gt;</span> account <span class="token operator">@-&gt;</span> account <span class="token operator">@-&gt;</span> nat_small <span class="token operator">@-&gt;</span> returning unit<span class="token punctuation">)</span> <span class="token punctuation">;</span> + <span class="token punctuation">]</span> +<span class="token keyword">end</span> + +<span class="token keyword">module</span> Run <span class="token operator">=</span> Lin_domain<span class="token punctuation">.</span>Make <span class="token punctuation">(</span>Bank_test<span class="token punctuation">)</span> + +<span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + QCheck_base_runner<span 
class="token punctuation">.</span>run_tests_main <span class="token punctuation">[</span>Run<span class="token punctuation">.</span>lin_test <span class="token label property">~count</span><span class="token punctuation">:</span><span class="token number">1000</span> <span class="token label property">~name</span><span class="token punctuation">:</span><span class="token string">&quot;Bank&quot;</span><span class="token punctuation">]</span></code></pre></div> +<p>That's all! A small DSL to define the types of our functions and we are done. Run this test to admire the beautiful ASCII art... craAaAash:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text"> Results incompatible with sequential execution + + | + | + .------------------------------------. + | | + transfer t 4 0 8 : () transfer t 4 0 93 : () + get t 4 : -1</code></pre></div> +<p><code>Lin</code> found a bug! The account number <code>4</code> is trying to simultaneously transfer $8 and $93 to account number <code>0</code>. It then naturally ends up with a negative $1 on its account, which is obviously bad for a banking system... but we never told <code>Lin</code> that this was illegal, so why is it complaining?</p> +<p>In the tradition of QuickCheck, <code>Lin</code> not only generates random arguments to test our API, but it also creates full programs to execute on two domains. It then runs these generated programs and checks if the intermediate results are &quot;sequentially consistent,&quot; the property of a well-behaved API where we can always explain its multicore behavior as a linear execution of the calls on a single core.</p> +<p>Without this &quot;sequential consistency&quot; property, the internals of our functions leak when multiple cores interleave their execution. In the example above, it wouldn't be possible to reach a negative $1 account on a single core, so <code>Lin</code> reports that sequential consistency is broken. 
It doesn't know that negative accounts are illegal, but it knows that this state is unreachable without multicore shenanigans.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/1ef04969f6bdb69e1799b67f09b8acfe/c1b63/nonseq.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 32.35294117647059%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/1ef04969f6bdb69e1799b67f09b8acfe/c5bb3/nonseq.png" class="gatsby-resp-image-image" alt="nonseq" title="nonseq" srcset="/static/1ef04969f6bdb69e1799b67f09b8acfe/04472/nonseq.png 170w, +/static/1ef04969f6bdb69e1799b67f09b8acfe/9f933/nonseq.png 340w, +/static/1ef04969f6bdb69e1799b67f09b8acfe/c5bb3/nonseq.png 680w, +/static/1ef04969f6bdb69e1799b67f09b8acfe/b12f7/nonseq.png 1020w, +/static/1ef04969f6bdb69e1799b67f09b8acfe/c1b63/nonseq.png 1200w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>Even though this is not a data race, a user of our library would have an equally hard time understanding the outcomes of our functions, when they depend so much on their accidental interleaving. <em>&quot;It sometimes doesn't work&quot;</em> is not a bug report I wish to see!</p> +<p>An intuitive way of thinking about &quot;sequential consistency&quot; is that our functions should behave as if they were a single atomic operation: Either we see none of their side effects, or we see all of them. 
It shouldn't be possible to see an in between, as this would result in a non-sequentialisable execution.</p> +<p>Once again, the easiest solution here is to use a <code>Mutex</code> to lock all the accounts during a transfer and when reading an account balance. Run the test suite again with <code>Lin</code>, and yep, we are safe. The operations are now sequentially consistent! (But we don't need the Atomics anymore with the <code>Mutex</code>.)</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/e315b196ba7386891ed868b245f87e1d/c1b63/seqmutex.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 21.176470588235293%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/e315b196ba7386891ed868b245f87e1d/c5bb3/seqmutex.png" class="gatsby-resp-image-image" alt="seqmutex" title="seqmutex" srcset="/static/e315b196ba7386891ed868b245f87e1d/04472/seqmutex.png 170w, +/static/e315b196ba7386891ed868b245f87e1d/9f933/seqmutex.png 340w, +/static/e315b196ba7386891ed868b245f87e1d/c5bb3/seqmutex.png 680w, +/static/e315b196ba7386891ed868b245f87e1d/b12f7/seqmutex.png 1020w, +/static/e315b196ba7386891ed868b245f87e1d/c1b63/seqmutex.png 1200w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>I really like how low effort / high reward <code>Lin</code> is. In just a few lines of declarative code, we can check that our code is correct when running on multiple cores. It's very extensive in its testing, which is just what we need when bugs are this hard to reproduce. 
The Multicore testing suite also provides a state-machine interface <code>STM</code>, which allows you to specify more properties that your system should respect (not only sequential consistency, but custom business logic!)</p> +<p><a href="https://github.com/ocaml-multicore/multicoretests">More examples on the <code>multicoretests</code> repository</a></p> +<p>Fun fact: The earlier &quot;out-of-order&quot; memory read/write on mutable references was also generated by <code>Lin</code>. While this tool is not specialised like ThreadSanitizer for discovering data races, it can still trigger and identify hardware memory reorderings, since they produce outcomes that can't be explained on a single core. Here's the complete test if you want to see your computer memory <del>misbehaving</del> optimising:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">open</span> Lin + +<span class="token keyword">module</span> Int_array <span class="token operator">=</span> <span class="token keyword">struct</span> + <span class="token keyword">type</span> t <span class="token operator">=</span> int array + <span class="token keyword">let</span> init <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> <span class="token operator-like-punctuation punctuation">[|</span> <span class="token number">0</span> <span class="token punctuation">;</span> <span class="token number">0</span> <span class="token operator-like-punctuation punctuation">|]</span> + <span class="token keyword">let</span> cleanup <span class="token punctuation">_</span> <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> + <span class="token keyword">let</span> index <span class="token operator">=</span> int_bound <span class="token number">1</span> + <span class="token keyword">let</span> api <span 
class="token operator">=</span> <span class="token punctuation">[</span> + val_ <span class="token string">&quot;get&quot;</span> Array<span class="token punctuation">.</span>get <span class="token punctuation">(</span>t <span class="token operator">@-&gt;</span> index <span class="token operator">@-&gt;</span> returning int<span class="token punctuation">)</span> <span class="token punctuation">;</span> + val_ <span class="token string">&quot;set&quot;</span> Array<span class="token punctuation">.</span>set <span class="token punctuation">(</span>t <span class="token operator">@-&gt;</span> index <span class="token operator">@-&gt;</span> int <span class="token operator">@-&gt;</span> returning unit<span class="token punctuation">)</span> <span class="token punctuation">;</span> + <span class="token punctuation">]</span> +<span class="token keyword">end</span> + +<span class="token keyword">module</span> Run <span class="token operator">=</span> Lin_domain<span class="token punctuation">.</span>Make <span class="token punctuation">(</span>Int_array<span class="token punctuation">)</span> +<span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> QCheck_base_runner<span class="token punctuation">.</span>run_tests_main <span class="token punctuation">[</span>Run<span class="token punctuation">.</span>lin_test <span class="token label property">~count</span><span class="token punctuation">:</span><span class="token number">10_000</span> <span class="token label property">~name</span><span class="token punctuation">:</span><span class="token string">&quot;Array&quot;</span><span class="token punctuation">]</span></code></pre></div> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#dscheck" aria-label="dscheck permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path 
fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Dscheck</h2> +<p>While adding a <code>Mutex</code> restores the sequential consistency of the <code>transfer</code> function, it's unsatisfying to slow down all transactions with a global lock. Most transfers are going to happen on different accounts, so could we be more precise in our safety measures? This is far from easy with locks, if we don't want to end up bankrupt with <a href="https://en.wikipedia.org/wiki/Dining_philosophers_problem">hungry philosophers</a>!</p> +<p>One alternative solution is called lock-free programming, and as the name implies, it gets rid of all the locks---but at the cost of more complex algorithms. By using only <code>Atomic</code> operations, there are ways of encoding our <code>transfer</code> operation without blocking the other cores (such that they don't get stuck waiting on the unrelated threads to finish their transaction).</p> +<p>Lock-free algorithms have a bad reputation of being crazy hard to implement correctly. It's very easy to convince yourself that you found the right solution, only to discover that your algorithm only works when the OS scheduler is on your side (which it generally is...until it isn't). This is another type of hard-to-reproduce bug. We can't coerce the OS scheduler to be evil when testing our software.</p> +<p>Our last testing tool is the library <code>dscheck</code>. It provides a way to <em>exhaustively</em> test all the possible schedulings of a Multicore execution in order to discover the worst-case scenario that would lead to a crash. 
It does so by simulating parallelism on a single core using concurrency, thanks to algebraic effects and a custom scheduler. Dscheck is very fast because it doesn't test <em>all</em> possible interleavings but only the ones that matter.</p> +<p>In order to use it, you only need to replace the <code>Atomic</code> module with a custom one, and then write your unit test. Here I simply copy-pasted the bug generated by <code>Lin</code> earlier:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">module</span> Atomic <span class="token operator">=</span> Dscheck<span class="token punctuation">.</span>TracedAtomic +<span class="token keyword">module</span> Test <span class="token operator">=</span> Dscheck<span class="token punctuation">.</span>TracedAtomic + +<span class="token keyword">module</span> Bank <span class="token operator">=</span> <span class="token keyword">struct</span> <span class="token comment">(* same as before, but now using the traced Atomic *)</span> <span class="token keyword">end</span> + +<span class="token keyword">let</span> test <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + <span class="token keyword">let</span> t <span class="token operator">=</span> Array<span class="token punctuation">.</span>init <span class="token number">2</span> <span class="token punctuation">(</span><span class="token keyword">fun</span> <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> Atomic<span class="token punctuation">.</span>make <span class="token number">100</span><span class="token punctuation">)</span> <span class="token keyword">in</span> + Test<span class="token punctuation">.</span>spawn <span class="token punctuation">(</span><span class="token keyword">fun</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token 
operator">-&gt;</span> Bank<span class="token punctuation">.</span>transfer t <span class="token number">0</span> <span class="token number">1</span> <span class="token number">8</span><span class="token punctuation">)</span> <span class="token punctuation">;</span> <span class="token comment">(* fake a Domain.spawn *)</span> + Bank<span class="token punctuation">.</span>transfer t <span class="token number">0</span> <span class="token number">1</span> <span class="token number">93</span> <span class="token punctuation">;</span> + <span class="token keyword">assert</span> <span class="token punctuation">(</span>Bank<span class="token punctuation">.</span>get t <span class="token number">0</span> <span class="token operator">&gt;=</span> <span class="token number">0</span><span class="token punctuation">)</span> + +<span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> Test<span class="token punctuation">.</span>trace test <span class="token comment">(* exhaustively test all interleavings *)</span></code></pre></div> +<p>Dscheck will then run our <code>test</code> function multiple times, discovering all the interesting paths that the scheduler could lead us down, and finally output a visualisation describing the worst-case schedulings that lead to crashes:</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/72fbe0068c73cadf22e018262c394504/ad997/dscheck.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 107.6470588235294%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img 
src="https://tarides.com/static/72fbe0068c73cadf22e018262c394504/c5bb3/dscheck.png" class="gatsby-resp-image-image" alt="dscheck" title="dscheck" srcset="/static/72fbe0068c73cadf22e018262c394504/04472/dscheck.png 170w, +/static/72fbe0068c73cadf22e018262c394504/9f933/dscheck.png 340w, +/static/72fbe0068c73cadf22e018262c394504/c5bb3/dscheck.png 680w, +/static/72fbe0068c73cadf22e018262c394504/ad997/dscheck.png 1012w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>This is a low-level view of the bug. By inspecting the sequence of <code>Atomic</code> operations along the bad paths, we can discover the origin of the problem: the code is not careful when removing money from an account.</p> +<p>Perhaps this would work better:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> <span class="token keyword">rec</span> transfer t from_account to_account money <span class="token operator">=</span> + <span class="token keyword">let</span> money_from <span class="token operator">=</span> get t from_account <span class="token keyword">in</span> + <span class="token keyword">if</span> money <span class="token operator">&gt;</span> <span class="token number">0</span> + <span class="token operator">&amp;&amp;</span> from_account <span class="token operator">&lt;&gt;</span> to_account + <span class="token operator">&amp;&amp;</span> money_from <span class="token operator">&gt;=</span> money + <span class="token keyword">then</span> <span class="token keyword">begin</span> + <span class="token keyword">if</span> Atomic<span class="token punctuation">.</span>compare_and_set t<span class="token punctuation">.</span><span class="token punctuation">(</span>from_account<span class="token punctuation">)</span> money_from <span class="token 
punctuation">(</span>money_from <span class="token operator">-</span> money<span class="token punctuation">)</span> + <span class="token keyword">then</span> <span class="token keyword">let</span> <span class="token punctuation">_</span> <span class="token punctuation">:</span> int <span class="token operator">=</span> Atomic<span class="token punctuation">.</span>fetch_and_add t<span class="token punctuation">.</span><span class="token punctuation">(</span>to_account<span class="token punctuation">)</span> money <span class="token keyword">in</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> + <span class="token keyword">else</span> transfer t from_account to_account money <span class="token comment">(* retry *)</span> + <span class="token keyword">end</span></code></pre></div> +<p>And yes, Dscheck now happily validates all possible interleavings of this unit test!</p> +<p><a href="https://github.com/ocaml-multicore/dscheck">More examples on the <code>dscheck</code> repository</a></p> +<p>So, does it mean our banking system works now? Nope! <code>Lin</code> reports new counterexamples that break sequential consistency. I told you that lock-free was hard! Still, we can keep iterating, and we will eventually get it right because our tools remove the doubt and the irreproducibility that would otherwise make the task insurmountable. It's like having tiny assistants to double-check our assumptions. I love it. I've never been so excited to test my software!</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text"> get t 3 : 100 + get t 4 : 100 + | + .------------------------------------. + | | + | transfer t 4 3 10 : () + get t 4 : 90 + get t 3 : 100 + ^^^^^ + how? account 4 sent the money, but account 3 didn't receive it!</code></pre></div> +<p>Would you have caught this bug? Can you fix it? 
;)</p> +<p>This was a tiny example, and it already brought some surprises. The multicore testing tools <code>Lin</code>, <code>STM</code>, and <code>dscheck</code> have been applied to real data structures with great success. In fact, I wouldn't trust lock-free algorithms that were not validated by them.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>It's 2022 and OCaml is finally Multicore. Even if this article is only scratching the surface of a specific itch, I hope it has convinced you that the Multicore metamorphosis wasn't only about lifting a global lock somewhere in the runtime. A lot of care and attention also went into creating a great environment to tackle really hard problems. Here we've only looked at:</p> +<ul> +<li>The memory model to be able to reason about our programs</li> +<li>ThreadSanitizer to detect dangerous use of shared memory</li> +<li><code>Lin</code> and <code>STM</code> to discover logical bugs in a multicore setting</li> +<li>Dscheck to validate unit tests of lock-free algorithms by exhaustively checking all possible interleavings of their Atomic operations</li> +</ul> +<p>There's still a lot more to discover in the latest release of OCaml. 
In the meantime, not only can we do Multicore, we can do it with confidence that our code works!</p>https://tarides.com/blog/2022-12-22-ocaml-5-multicore-testing-toolsOCaml 5 Multicore Testing Tools2022-12-22T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Merlin is one of the most important tools for OCaml users, but a lot of its +advanced features often remain unknown. For OCaml newcomers who might not know, Merlin is the server software that provides intelligence to code editors when working on OCaml documents. It allows one to easily navigate the code, get meaningful information (like type information), and perform code generation and refactoring tasks. Merlin installation and usage are documented on its <a href="https://ocaml.github.io/merlin/">official webpage</a>.</p> +<p>Merlin is distributed with both an Emacs and a Vim plugin. It can also be used in VSCode via the OCaml LSP Server and the corresponding plugin.</p> +<p>In this post, we will focus on two complementary features of Merlin: the venerable <code>destruct</code> and the younger <code>construct</code>. Both of these leverage OCaml's precise type information to destruct or create expressions.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#destruct" aria-label="destruct permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Destruct</h2> +<p>Destruct (sometimes called case-analysis) uses the type of an identifier to +perform multiple tasks related to pattern-matching. 
It can be called with the +following key bindings:</p> +<ul> +<li>Emacs: <kbd>C-d</kbd> or <kbd>M-x merlin-destruct</kbd></li> +<li>Vim: <kbd>:MerlinDestruct</kbd></li> +<li>VSCode: <kbd>Alt-d</kbd> or <kbd>&#128161; Destruct</kbd></li> +</ul> +<p>Destruct's behavior changes slightly depending on the context around the cursor. We are going to describe how it behaves in the next three sections.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#automatic-case-analysis" aria-label="automatic case analysis permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Automatic Case Analysis</h3> +<p>The primary use case for Destruct is to generate a pattern-matching for a +given value. Let's consider the following snippet:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let f (x : int option) = x </code></pre></div> +<p>Calling <code>destruct</code> on the right-most occurrence of <code>x</code> will automatically generate the following pattern-matching with the two constructors of <code>x</code>'s option type:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let f (x : int option) = match x with + | None -&gt; _ + | Some _ -&gt; _</code></pre></div> +<p>What happened is that Merlin looked at the type of <code>x</code> and generated a complete pattern-matching by enumerating its constructors.</p> +<p>Notice that Merlin used underscores on the right-hand sides of the matching. 
We call these underscores <em>typed holes</em>. These holes are rejected by the compiler, but Merlin will provide type information for them. These holes should not be confused with the wildcard pattern appearing on the left-hand side, as in <code>Some _</code>.</p> +<p>After calling <code>destruct</code>, the cursor should have jumped to the first hole. In +Emacs (resp. Vim), you can navigate between holes by using the commands <kbd>M-x merlin-next-hole</kbd> (resp. <kbd>:MerlinNextHole</kbd>) and <kbd>M-x merlin-previous-hole</kbd> (resp. <kbd>:MerlinPreviousHole</kbd>). In VSCode, you can use <kbd>Alt-y</kbd> to jump to the next typed hole.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#complete-a-matching" aria-label="complete a matching permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Complete a Matching</h3> +<p>Merlin can also add missing branches to an incomplete matching. 
Given +the following snippet:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let f (x : int option) = match x with + | None -&gt; _</code></pre></div> +<p>Calling <code>destruct</code> with the cursor on <code>None</code> will make the pattern-matching +exhaustive:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let f (x : int option) = match x with + | None -&gt; _ + | Some _ -&gt; _</code></pre></div> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#refine-the-cases" aria-label="refine the cases permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Refine the Cases</h3> +<p>Finally, Merlin can be used to make a pattern-matching more precise when called on a <em>wildcard</em> pattern <code>_</code>. Given the following snippet:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let f (x : int option option) = match x with + | None -&gt; _ + | Some _ -&gt; _</code></pre></div> +<p>Calling <code>destruct</code> with the cursor on the <code>_</code> pattern in <code>Some _</code> will refine the matching:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let f (x : int option option) = match x with + | None -&gt; _ + | Some (None) | Some (Some _) -&gt; _</code></pre></div> +<p>Note that Destruct also works with other types, like records. 
Let's consider the following snippet:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">type t = { a : string option } +let f (x : t) = x</code></pre></div> +<p>Calling <code>destruct</code> on the last occurrence of <code>x</code> will yield:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let f (x : t) = match x with + | { a } -&gt; _</code></pre></div> +<p>And we can refine it further by calling <code>destruct</code> again on <code>a</code>:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let f (x : t) = match x with + | { a = None } | { a = Some _ } -&gt; _</code></pre></div> +<p>That wraps up our presentation of <code>destruct</code>. Generating and completing +pattern-matching cases can be very useful when working with large sum types!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#construct" aria-label="construct permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Construct</h2> +<p>Construct can be considered as the dual of Destruct, as they work +complementarily. When called over a typed hole <code>_</code>, Construct will suggest +values that can fill that hole. 
It can be called with the following key +bindings:</p> +<ul> +<li>Emacs: <kbd>M-x merlin-construct</kbd></li> +<li>Vim: <kbd>:MerlinConstruct</kbd></li> +<li>VSCode: <kbd>Alt-c</kbd> or <kbd>&#128161; Construct an expression</kbd> (the cursor must be right after the <code>_</code>)</li> +</ul> +<p>For example, given the following snippet:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let x : int option = _</code></pre></div> +<p>Calling <code>construct</code> with the cursor on the <code>_</code> typed hole will suggest the following constructions:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">Some _ +None</code></pre></div> +<p>Choosing the first one will replace the hole and place the cursor on the next hole:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let x : int option = (Some _)</code></pre></div> +<p>Calling <code>construct</code> again will suggest <code>0</code> and result in:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let x : int option = (Some 0)</code></pre></div> +<p>In the future, Construct might also suggest fitting values from the local +environment instead of relying solely on a type's constructors.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#destruct-and-construct" aria-label="destruct and construct permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 
3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Destruct and Construct</h2> +<p>As stated previously, calls to <code>destruct</code> and <code>construct</code> can be used +together. For example, after calling <code>destruct</code> on <code>x</code> in the following code snippet:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">type t = { a : unit; b : string option } +let f (x : int option) : t option = x</code></pre></div> +<p><code>x</code> is replaced by a matching on <code>x</code> with the cursor on the first hole:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let f (x : int option) : t option = + match x with + | None -&gt; _ + | Some _ -&gt; _</code></pre></div> +<p>One can immediately call <code>construct</code> and choose a construction for the first branch:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let f (x : int option) : t option = + match x with + | None -&gt; None + | Some _ -&gt; _</code></pre></div> +<p>And again for the second branch:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let f (x : int option) : t option = + match x with + | None -&gt; None + | Some _ -&gt; Some _</code></pre></div> +<p>Finally, like Destruct, Construct also works with records and most OCaml types:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">Some _ &rarr; Some { a = _; b = _ } &rarr; Some { a = (); b = None }</code></pre></div> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 
3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>When put to good use, these complementary features can remove some of the burden of working with big variant types. We encourage you to try them and see if they help your everyday workflow! If you encounter any issues or have ideas for improvement, please communicate them to us <a href="https://github.com/ocaml/merlin/issues">via the issue tracker</a>.</p>https://tarides.com/blog/2022-12-21-advanced-merlin-features-destruct-and-constructAdvanced Merlin Features: Destruct and Construct2022-12-21T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>The technology that makes blockchain possible is complex, cutting-edge, and fascinating. Balancing efficiency and security on a knife's edge, it finds perfect harmony between high transaction speeds and safe, predictable results. Blockchain technology is constantly evolving, pushing the boundaries of what&rsquo;s possible, and driving innovation and research. Discover how Tarides helped Nomadic Labs use <a href="https://speakerdeck.com/kayceesrk/retrofitting-concurrency-lessons-from-the-engine-room">OCaml 5</a> to boost blockchain performance with rollup technology.</p> +<p>In order to remain competitive with other major players in the blockchain market (including Bitcoin and Ethereum), the open-source blockchain Tezos invests in cutting-edge research and engineering. Another way Tezos has chosen to differentiate itself in a crowded market is by focusing on sustainability and scalable efficiency. 
Tezos is proof-of-stake (rather than proof-of-work) and is therefore more <a href="https://tezos.com/carbon/">energy efficient</a> than other blockchain technologies, with the annual energy consumption estimated at 0.001TWh, or &ldquo;17 global citizens.&rdquo; Furthermore, the <a href="https://4c.cst.cam.ac.uk">Cambridge Centre for Carbon Credits (4C)</a> is using the Tezos blockchain to create a trusted decentralised marketplace for carbon credits. The sheer amount of data needed to support verifiable carbon credits requires planetary-scale computations, coupled with the digital permanence needed to track projects over decades and even generations.</p> +<p>A growing number of users and companies are bringing their projects to the Tezos blockchain. To respond to increasing demand, the network is preparing for more activity and high-throughput applications. To that end, Nomadic Labs is working on implementing leading-edge rollup technology for the Tezos blockchain. Rollups are a way of increasing throughput and blockchain speeds without compromising on decentralisation, latency, stability, security, or its resistance to censorship.</p> +<p>To achieve their goal of offering a major scaling solution for Tezos, Nomadic Labs called in Tarides. By leveraging the multicore capabilities of OCaml 5, the team was able to achieve significant performance boosts, making it a viable solution for increasing both blockchain speeds and transactions per second (TPS).</p> +<p>Marco Stronati co-leads the <a href="https://research-development.nomadic-labs.com/files/cryptography.html">cryptography team</a> at Nomadic Labs with cryptographer Marc Beunardeau. 
Looking back on the decision to use an early version of OCaml 5, Marco said, &ldquo;We knew OCaml 5 was not ready yet, but that backward compatibility was an explicit goal of the project, so we thought we would put that to the test.&rdquo;</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#who-are-nomadic-labs" aria-label="who are nomadic labs permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Who Are Nomadic Labs?</h2> +<p><a href="https://www.nomadic-labs.com">Nomadic Labs</a> is one of the largest research and development centres within the open-source <a href="https://tezos.com">Tezos ecosystem</a>. They work on the core Tezos technologies that run its distributed network.</p> +<p>Nomadic Labs handles software releases and amendments to the Tezos blockchain, focusing on the innovation, development, and implementation of new features. They help companies and institutions use Tezos to their advantage. 
At its core, the company values dependability, balancing cutting-edge innovation with reliable and consistent results.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#what-is-a-rollup-and-how-does-it-boost-the-speed-of-the-blockchain" aria-label="what is a rollup and how does it boost the speed of the blockchain permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What Is a Rollup and How Does it Boost the Speed of the Blockchain?</h2> +<p>Rollups settle transactions outside the network and then post data back into it. This is what&rsquo;s called a <a href="https://research-development.nomadic-labs.com/tezos-is-scaling.html">Layer 2 solution</a>, which avoids the main chain. By handling the process externally, they reduce the strain on the network. Rollups help blockchains like Tezos keep transaction speeds and throughput high without compromising the integrity of the blockchain. It is one of the many different scalability solutions a blockchain may employ to offer top-level performance. There are two main types of rollups: optimistic and zero-knowledge (zk).</p> +<p><a href="https://research-development.nomadic-labs.com/next-generation-rollups.html">Optimistic rollups</a> assume that the transaction data they&rsquo;re processing is correct, and any fraud or other problem with the transaction is handled separately. Zk-rollups on the other hand use zero-knowledge proofs to validate a transaction. 
Zk-rollups provide better security and confidentiality than optimistic rollups, but they come with their own sets of limitations.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#pushing-the-limits-of-zero-knowledge-proving-systems" aria-label="pushing the limits of zero knowledge proving systems permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Pushing the Limits of Zero-Knowledge Proving Systems</h2> +<p>The challenge Nomadic Labs faced was related to their proving system. In general, proving systems have to be very asymmetrical, with a prover doing most of the work off-chain and a lean verifier working on-chain. Whilst these rollup systems are great for blockchains, because of how quickly they can verify something, the proving stage is very CPU- and memory-intensive. This prevents the scalability of Epoxy, their zk-rollup solution, since its throughput and latency are directly limited by the speed of the prover. 
+Most of the innovation in the field of zero-knowledge proving systems today is driven by lowering the complexity of the provers, thanks to novel cryptography but also to highly optimised implementations.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#ocaml-5-multicore-saves-the-day" aria-label="ocaml 5 multicore saves the day permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>OCaml 5 Multicore Saves the Day</h1> +<p>The team&rsquo;s solution to the prover&rsquo;s inefficiencies was to use the <a href="https://gitlab.com/nomadic-labs/cryptography/privacy-team/">aPlonK</a> proving system to power Epoxy. It enables the efficient aggregation of multiple proofs that can be executed in parallel. This is where OCaml 5 comes in! With its Multicore capabilities and strong safety features, it is the perfect candidate for speeding up the proving process.</p> +<p>With OCaml 5, the team could parallelise the proving process by utilising multiple cores on one machine. According to Marco, &ldquo;OCaml 5 drastically improved performance with minimal effort.&rdquo; Increased performance of the zk-rollup translates to high TPS and throughput for customers using digital currencies. 
Striving to be the fastest blockchain means constantly looking at opportunities to improve performance, from changing algorithms to workflows.</p> +<p>Furthermore, speaking as a seasoned developer, Marco emphasises the importance of OCaml 5 being easy to install and set up.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#the-most-important-thing-was-not-having-to-revolutionise-what-i-do-people-dont-want-to-waste-a-week-on-upgrading-and-this-was-a-seamless-experience" aria-label="the most important thing was not having to revolutionise what i do people dont want to waste a week on upgrading and this was a seamless experience permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>&ldquo;The most important thing was not having to revolutionise what I do. People don&rsquo;t want to waste a week on upgrading, and this was a seamless experience.&rdquo;</h3> +<p>Installing OCaml 5 proved to be easy for the team, and within a few hours, they were running Multicore on their machines. For people who don&rsquo;t need Multicore, the upgrade is completely backwards compatible, and their sequential code will still work normally. It&rsquo;s due to the precise balance OCaml 5 strikes between backwards compatibility and cutting-edge upgrades that Marco thinks there is &ldquo;No reason not to upgrade.&rdquo;</p> +<p>When using OCaml 5 before its official launch, the team faced some small compatibility issues they needed help with. 
They checked the state of compatibility on the helpful <a href="http://check.ocamllabs.io/">health check</a> website, where more and more packages were &lsquo;going green&rsquo; daily. Even before they could file a bug report, they would find that their problem had been resolved. Since setup was so smooth, the team needed very little help. Still, Marco commented, &ldquo;The moment we had a problem, we would get help immediately.&rdquo;</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#what-does-the-future-hold" aria-label="what does the future hold permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What Does the Future Hold?</h2> +<p>The team is excited to continue using OCaml 5 in the future, as soon as Tezos begins using it in production after the full release. At the moment, the team has to parallelise using several machines to speed up the prover&rsquo;s performance. 
With OCaml 5, they will be able to exploit multiple cores on several computers at the same time.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#we-can-easily-double-the-speed-of-what-were-doing-with-ocaml-5" aria-label="we can easily double the speed of what were doing with ocaml 5 permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>&ldquo;We can easily double the speed of what we&rsquo;re doing with OCaml 5.&rdquo;</h3> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>After successful experimentation with a prerelease of OCaml 5, the team at Nomadic Labs discovered that it gives their zk-rollup a significant performance boost. For the Tezos blockchain, this boost can result in higher TPS and throughput for customers who use their digital currency. 
Combined with Tezos&rsquo;s other benefits, such as its energy efficiency (thanks to its proof-of-stake consensus mechanism), OCaml 5 definitively gives it a leg up with fast zk-rollups.</p> +<p>The technologies that make blockchains a reality are undeniably a driving force behind great innovation. Zk-rollups are just one example of technologies that aim to make complex processes like verifiers and provers lightning fast. Performance is vital in almost every field, and the use cases for this type of technology are endless.</p> +<p>Tarides offers its extensive expertise in OCaml to help businesses achieve their targets. Find out more about how OCaml 5 can help you transform <a href="https://tarides.com/company">your business</a>!</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#sources" aria-label="sources permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Sources</h3> +<p><a href="https://research-development.nomadic-labs.com/kathmandu-is-live.html">Kathmandu Blog on Nomadic Labs</a> +<a href="https://research-development.nomadic-labs.com/next-generation-rollups.html">Next Generation Rollups</a> +<a href="https://www.quicknode.com/guides/infrastructure/introduction-to-ethereum-rollups">Ethereum Rollups</a> +<a href="https://research-development.nomadic-labs.com/smart-rollups-are-coming.html">Smart Rollups</a></p>https://tarides.com/blog/2022-12-20-how-nomadic-labs-used-multicore-processing-to-create-a-faster-blockchainHow Nomadic Labs Used Multicore Processing to Create a Faster 
Blockchain2022-12-20T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>It's here! It's finally here! On Friday, 16 December 2022, the OCaml community <a href="https://discuss.ocaml.org/t/ocaml-5-0-0-is-out/10974/7">announced the official release of Multicore OCaml</a>! From the beginning, Tarides has been deeply involved in OCaml's evolution, so we're very proud to present OCaml 5!</p> +<p>Our work with the myriad of academics, industrial developers, and the entire OCaml community has been both inspiring and fulfilling. We look forward to continuing our collaboration for future iterations of OCaml. Watch <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">KC's keynote</a> to get a visual overview of all OCaml 5 has to offer!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#about-ocaml-5" aria-label="about ocaml 5 permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>About OCaml 5</h2> +<p>OCaml is a pragmatic functional programming language. Its strength lies in the capacity to balance security, performance, and reliability. OCaml is used in both industry and academia to address problems in which a single mistake could be catastrophic, <a href="https://ocaml.org/success-stories/large-scale-trading-system">such as in finance</a> and <a href="https://ocaml.org/success-stories/sensor-analytics-and-automation-platform-for-sustainable-agriculture">sensor analytics</a> for sustainable agriculture. 
It's also used by millions of developers daily with <a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">Docker for Desktop</a>.</p> +<p>OCaml 5 brings the long-awaited runtime support for shared-memory <a href="https://v2.ocaml.org/releases/5.0/manual/parallelism.html">parallelism</a> and <a href="https://v2.ocaml.org/releases/5.0/manual/effects.html">effect handlers</a>. It is a major change (including the full rewrite of a new, concurrent garbage collector), but we worked hard to ensure there would be no breakage for existing OCaml users. This release combines the security and safety of OCaml with new features that bring huge performance benefits and an improved methodology for writing concurrent code.</p> +<p>OCaml 5 supports both the x86-64 and ARM64 architectures, so Linux, the BSDs, macOS, and Mingw-w64 on Windows are all supported. Over the next year, the OCaml community and Tarides will restore support for most previously-supported architectures that fall outside of this range, but this doesn't mean you can't use OCaml now! 
OCaml 5 seeks to be completely backwards-compatible, and programs written for any version of OCaml 4 will continue to work in OCaml 5.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#multicore" aria-label="multicore permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Multicore</h2> +<p>With technological advances and the explosive growth of machines with more and more available cores, it's necessary for programming languages to support multicore technology. Until this new release, OCaml was single-threaded, meaning it could only utilise one core to run code. With OCaml 5, programs can now exploit multiple cores and execute code in parallel, providing users with enhanced performance and efficiency.</p> +<p>Performance is key with OCaml 5 to ensure your programs run more smoothly, quickly, and efficiently, a significant achievement in turning cutting-edge science into real-life applications and industrial-strength tools.</p> +<p>Multicore OCaml has been in the making for 8 years and required a full rewrite of its runtime environment, so you can believe that the OCaml community is absolutely thrilled to see this come to fruition from their years of hard work. 
Multicore support ensures beginners can achieve the same productivity as OCaml experts!</p> +<p>If you come across any unexpected behaviours that aren't covered by the few exceptions listed on the <a href="https://discuss.ocaml.org/t/ocaml-5-0-0-is-out/10974">Discuss post</a>, please report them on the <a href="https://github.com/ocaml/ocaml/issues">OCaml issue tracker</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#eio--concurrency" aria-label="eio concurrency permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Eio &amp; Concurrency</h2> +<p>Concurrency in OCaml 5 is supported through the use of effect handlers, a new feature that enables the development of concurrent applications in a seamless fashion. The developer experience is improved, now that programmers can simply write concurrent code in the same style as non-concurrent code.</p> +<p><a href="https://github.com/ocaml-multicore/eio">Eio</a>, our experimental, high-performance I/O library, is an excellent example of effect handlers in use. 
Since we've already demonstrated that we can reach millions of requests per second while keeping simple, direct-style code, OCaml 5 is meant to be on par with Rust (and outperforms Go) for I/O heavy workloads.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 578px; "> + <a href="https://tarides.com/static/ce6f7c661802b2b8eda379f465a7cb87/508ef/eio1.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 67.64705882352942%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/ce6f7c661802b2b8eda379f465a7cb87/508ef/eio1.png" class="gatsby-resp-image-image" alt="Eio Performance" title="Eio Performance" srcset="/static/ce6f7c661802b2b8eda379f465a7cb87/04472/eio1.png 170w, +/static/ce6f7c661802b2b8eda379f465a7cb87/9f933/eio1.png 340w, +/static/ce6f7c661802b2b8eda379f465a7cb87/508ef/eio1.png 578w" sizes="(max-width: 578px) 100vw, 578px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 572px; "> + <a href="https://tarides.com/static/cd13a47efa1219429ec495747f257b73/a805e/eio2.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 67.64705882352942%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/cd13a47efa1219429ec495747f257b73/a805e/eio2.png" class="gatsby-resp-image-image" alt="Eio Performance" title="Eio Performance" 
srcset="/static/cd13a47efa1219429ec495747f257b73/04472/eio2.png 170w, +/static/cd13a47efa1219429ec495747f257b73/9f933/eio2.png 340w, +/static/cd13a47efa1219429ec495747f257b73/a805e/eio2.png 572w" sizes="(max-width: 572px) 100vw, 572px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>With Eio, asynchronous and synchronous code can be composed together naturally, solving the <a href="https://www.tedinski.com/2018/11/13/function-coloring.html">function colouring problem</a> that affects many languages, including programs written in earlier versions of OCaml. This makes OCaml easier to use, even if you're new to the language.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#runtime-events" aria-label="runtime events permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Runtime Events</h2> +<p>Another huge improvement with OCaml 5 is Runtime Events, a library included with OCaml 5's runtime, which efficiently processes performance data from the garbage collector (GC) and runtime. Runtime Events provides continuous monitoring of OCaml applications. Multicore programs are notoriously hard to debug and profile, so we ensured that OCaml 5 has a built-in mechanism to build great, relevant tooling. 
The basis of this is Runtime Events.</p> +<p>Read more about Runtime Events in the <a href="https://v2.ocaml.org/releases/5.0/htmlman/runtime-tracing.html">OCaml Manual</a>, and watch the <a href="https://watch.ocaml.org/videos/watch/299cab02-db94-44ac-b926-ea90ddda1b09">OCaml Workshop</a> for a visual introduction to Runtime Events. Also check out these WIP tools: <a href="https://github.com/sadiqj/runtime_events_tools">olly</a> and <a href="https://github.com/patricoferris/meio">meio</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#open-source-and-open-arms" aria-label="open source and open arms permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Open Source and Open Arms</h2> +<p>OCaml 5 is freely available to everyone worldwide. For installation instructions, compiler configurations, and a detailed list of changes and new features, please visit <a href="https://discuss.ocaml.org/t/ocaml-5-0-0-is-out/10974">this OCaml Discuss post</a>. While you're there, join in the conversation! We welcome you to the OCaml community with open arms! There has never been a better time to learn and use OCaml. Give it a try, and please report any problems to the <a href="https://github.com/ocaml/ocaml/issues">OCaml issue tracker</a>.</p> +<p>Read more about the journey to Multicore, Runtime Events, Eio, and more over the next six weeks in our OCaml 5 blog series. 
It will include several articles highlighting OCaml 5's new features, interviews with OCaml engineers, and reasons why OCaml should be the next language you learn. Follow us on <a href="https://twitter.com/tarides_">Twitter</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> so you don't miss a thing!</p> +<p>This OCaml 5 release is the best holiday gift for developers, both experienced and those new to programming. <a href="https://tarides.com/blog/2022-11-24-solve-the-2022-advent-of-code-puzzles-with-ocaml">Enjoy!</a></p> +<blockquote> +<p>Tarides is always happy to discuss commercial opportunities around OCaml. Especially with the OCaml 5 release, there are many areas where we can help industrial users to adopt OCaml 5 more quickly, including training, support, custom developments, etc. Please <a href="https://tarides.com/company/">contact us</a> if you are interested in discussing your needs.</p> +</blockquote>https://tarides.com/blog/2022-12-19-ocaml-5-with-multicore-support-is-hereOCaml 5 With Multicore Support Is Here!2022-12-19T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides + +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 512px; "> + <a href="https://tarides.com/static/ba47ed1e2b7a5e9dc869229c7e9e073f/01e7c/hillingar.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 100%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/ba47ed1e2b7a5e9dc869229c7e9e073f/01e7c/hillingar.png" class="gatsby-resp-image-image" alt="hillingar" title="hillingar" srcset="/static/ba47ed1e2b7a5e9dc869229c7e9e073f/04472/hillingar.png 170w, +/static/ba47ed1e2b7a5e9dc869229c7e9e073f/9f933/hillingar.png 340w, 
+/static/ba47ed1e2b7a5e9dc869229c7e9e073f/01e7c/hillingar.png 512w" sizes="(max-width: 512px) 100vw, 512px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>NixOS allows reproducible deployments of systems by managing configuration declaratively. +MirageOS is a unikernel creation framework that creates targeted operating systems for high-level applications that can run on a hypervisor. +By building MirageOS unikernels with Nix, we can enable reproducible builds of these unikernels and enable easy deployment on NixOS systems.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#introduction" aria-label="introduction permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Introduction</h2> +<p>The Domain Name System (DNS) is a critical component of the modern Internet, allowing domain names to be mapped to IP addresses, mailservers, and more. +This allows users to access services independent of their location in the Internet using human-readable names. +We can host a DNS server ourselves to have authoritative control over our domain, protect the privacy of those using our server, increase reliability by not relying on a third party DNS provider, and allow greater customisation of the records served. +However, it can be quite challenging to deploy one's own server reliably and reproducibly. +The Nix deployment system aims to address this. 
+With a NixOS machine, deploying a DNS server is as simple as:</p> +<div class="gatsby-highlight" data-language="nix"><pre class="language-nix"><code class="language-nix"><span class="token punctuation">{</span> + services<span class="token punctuation">.</span>bind <span class="token operator">=</span> <span class="token punctuation">{</span> + enable <span class="token operator">=</span> <span class="token boolean">true</span><span class="token punctuation">;</span> + zones<span class="token punctuation">.</span><span class="token string">&quot;freumh.org&quot;</span> <span class="token operator">=</span> <span class="token punctuation">{</span> + master <span class="token operator">=</span> <span class="token boolean">true</span><span class="token punctuation">;</span> + file <span class="token operator">=</span> <span class="token string">&quot;freumh.org.zone&quot;</span><span class="token punctuation">;</span> + <span class="token punctuation">}</span><span class="token punctuation">;</span> + <span class="token punctuation">}</span><span class="token punctuation">;</span> +<span class="token punctuation">}</span></code></pre></div> +<p>Which we can then query with:</p> +<div class="gatsby-highlight" data-language="bash"><pre class="language-bash"><code class="language-bash">$ <span class="token function">dig</span> freumh.org @ns1.freumh.org +short +<span class="token number">135.181</span>.100.27</code></pre></div> +<p>To enable the user to query our domain without specifying the nameserver, we have to create a glue record with our registrar pointing <code>ns1.freumh.org</code> to the IP address of our DNS-hosting machine.</p> +<p>You might notice this configuration is running the venerable bind<sup><a href="https://tarides.com/feed.xml#fn-1" class="footnote-ref">1</a></sup>, which is written in C. 
+As an alternative, using functional, high-level, type-safe programming languages to create network applications can greatly benefit safety and usability whilst maintaining performant execution. +One such language is OCaml.</p> +<p>MirageOS<sup><a href="https://tarides.com/feed.xml#fn-2" class="footnote-ref">2</a></sup> is a deployment method for these OCaml programs. +Instead of running them as a traditional Unix process, we create a specialised 'unikernel' operating system to run the application. +Unikernels offer reduced image sizes through dead-code elimination, as well as improved security and efficiency.</p> +<p>However, to deploy a Mirage unikernel with NixOS, one must use the imperative deployment methodologies native to the OCaml ecosystem, thus eliminating the benefit of reproducible systems that Nix offers. +This blog post will explore how we enabled reproducible deployments of Mirage unikernels by building them with Nix.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#mirageos" aria-label="mirageos permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>MirageOS</h2> +<div style="text-align: center;"> + <img src="https://tarides.com/mirage-logo.svg" style="height: 300px; max-width: 100%"/> +</div> +<p><sup><a href="https://tarides.com/feed.xml#fn-3" class="footnote-ref">3</a></sup></p> +<p>MirageOS is a library operating system that allows users to create unikernels, which are specialised operating systems that include both low-level operating system code and high-level 
application code in a single kernel and a single address space. +It was the first such 'unikernel creation framework', but comes from a long lineage of OS research, such as the exokernel library OS architecture. +Embedding application code in the kernel allows for dead-code elimination, removing OS interfaces that are unused, which reduces the unikernel's attack surface and offers improved efficiency.</p> +<div style="text-align: center;"> + <img src="https://tarides.com/mirage-diagram.svg" style="height: 300px; max-width: 100%"/> +</div> +<p>Contrasting software layers in existing VM appliances vs. unikernel's standalone kernel compilation approach <a href="https://tarides.com/feed.xml#madhavapeddyUnikernelsLibraryOperating2013">[3]</a></p> +<p>Mirage unikernels are written in OCaml<sup><a href="https://tarides.com/feed.xml#fn-4" class="footnote-ref">4</a></sup>. +OCaml is more practical for systems programming than other functional programming languages, such as Haskell. +It supports falling back on impure imperative code or mutable variables when warranted.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#nix" aria-label="nix permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Nix</h2> +<div style="text-align: center;"> + <img src="https://tarides.com/nix-snowflake.svg" style="height: 300px; max-width: 100%"/> +</div> +<p>Nix snowflake<sup><a href="https://tarides.com/feed.xml#fn-5" class="footnote-ref">5</a></sup>.</p> +<p>Nix is a deployment system that uses cryptographic hashes to 
compute unique paths for components<sup><a href="https://tarides.com/feed.xml#fn-6" class="footnote-ref">6</a></sup> that are stored in a read-only directory: the Nix store, at <code>/nix/store/&lt;hash&gt;-&lt;name&gt;</code>. +This provides several benefits, including concurrent installation of multiple versions of a package, atomic upgrades, and multiple user environments.</p> +<p>Nix uses a declarative domain-specific language (DSL), also called Nix, to build and configure software. +The snippet used to deploy the DNS server is in fact a Nix expression. +This example doesn't demonstrate it, but Nix is Turing complete. +Nix does not, however, have a type system.</p> +<p>We used the DSL to write derivations for software that describe how to build said software with input components and a build script. +This Nix expression is then 'instantiated' to create 'store derivations' (<code>.drv</code> files), which is the low-level representation of how to build a single component. +This store derivation is 'realised' into a built artefact, hereafter referred to as 'building.'</p> +<p>Possibly the simplest Nix derivation uses <code>bash</code> to create a single file containing <code>Hello, World!</code>:</p> +<div class="gatsby-highlight" data-language="nix"><pre class="language-nix"><code class="language-nix"><span class="token punctuation">{</span> pkgs <span class="token operator">?</span> <span class="token function">import</span> <span class="token operator">&lt;</span>nixpkgs<span class="token operator">&gt;</span> <span class="token punctuation">{</span> <span class="token punctuation">}</span> <span class="token punctuation">}</span><span class="token punctuation">:</span> + +<span class="token keyword">builtins</span><span class="token punctuation">.</span><span class="token function">derivation</span> <span class="token punctuation">{</span> + name <span class="token operator">=</span> <span class="token string">&quot;hello&quot;</span><span class="token 
punctuation">;</span> + system <span class="token operator">=</span> <span class="token keyword">builtins</span><span class="token punctuation">.</span><span class="token function">currentSystem</span><span class="token punctuation">;</span> + builder <span class="token operator">=</span> <span class="token string">&quot;<span class="token interpolation"><span class="token antiquotation important">$</span><span class="token punctuation">{</span>pkgs<span class="token punctuation">.</span>bash<span class="token punctuation">}</span></span>/bin/bash&quot;</span><span class="token punctuation">;</span> + args <span class="token operator">=</span> <span class="token punctuation">[</span> <span class="token string">&quot;-c&quot;</span> <span class="token string">''echo &quot;Hello, World!&quot; &gt; $out''</span> <span class="token punctuation">]</span><span class="token punctuation">;</span> +<span class="token punctuation">}</span></code></pre></div> +<p>Note that <code>derivation</code> is a function that we're calling with one argument, which is a set of attributes.</p> +<p>We can instantiate this Nix derivation to create a store derivation:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">$ nix-instantiate default.nix +/nix/store/5d4il3h1q4cw08l6fnk4j04a19dsv71k-hello.drv +$ nix show-derivation /nix/store/5d4il3h1q4cw08l6fnk4j04a19dsv71k-hello.drv +{ + &quot;/nix/store/5d4il3h1q4cw08l6fnk4j04a19dsv71k-hello.drv&quot;: { + &quot;outputs&quot;: { + &quot;out&quot;: { + &quot;path&quot;: &quot;/nix/store/4v1dx6qaamakjy5jzii6lcmfiks57mhl-hello&quot; + } + }, + &quot;inputSrcs&quot;: [], + &quot;inputDrvs&quot;: { + &quot;/nix/store/mnyhjzyk43raa3f44pn77aif738prd2m-bash-5.1-p16.drv&quot;: [ + &quot;out&quot; + ] + }, + &quot;system&quot;: &quot;x86_64-linux&quot;, + &quot;builder&quot;: &quot;/nix/store/2r9n7fz1rxq088j6mi5s7izxdria6d5f-bash-5.1-p16/bin/bash&quot;, + &quot;args&quot;: [ &quot;-c&quot;, 
&quot;echo \&quot;Hello, World!\&quot; &gt; $out&quot; ], + &quot;env&quot;: { + &quot;builder&quot;: &quot;/nix/store/2r9n7fz1rxq088j6mi5s7izxdria6d5f-bash-5.1-p16/bin/bash&quot;, + &quot;name&quot;: &quot;hello&quot;, + &quot;out&quot;: &quot;/nix/store/4v1dx6qaamakjy5jzii6lcmfiks57mhl-hello&quot;, + &quot;system&quot;: &quot;x86_64-linux&quot; + } + } +}</code></pre></div> +<p>And build the store derivation:</p> +<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">$ nix-store <span class="token parameter variable">--realise</span> /nix/store/5d4il3h1q4cw08l6fnk4j04a19dsv71k-hello.drv +/nix/store/4v1dx6qaamakjy5jzii6lcmfiks57mhl-hello +$ <span class="token function">cat</span> /nix/store/4v1dx6qaamakjy5jzii6lcmfiks57mhl-hello +Hello, World<span class="token operator">!</span></code></pre></div> +<p>Most Nix tooling does these two steps together:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">nix-build default.nix +this derivation will be built: + /nix/store/q5hg3vqby8a9c8pchhjal3la9n7g1m0z-hello.drv +building '/nix/store/q5hg3vqby8a9c8pchhjal3la9n7g1m0z-hello.drv'... +/nix/store/zyrki2hd49am36jwcyjh3xvxvn5j5wml-hello</code></pre></div> +<p>Nix realisations (hereafter referred to as 'builds') are done in isolation to ensure reproducibility. +Projects often rely on interacting with package managers to make sure all dependencies are available and may implicitly rely on system configuration at build time. 
+To prevent this, every Nix derivation is built in isolation (without network access or access to the global file system) with only other Nix derivations as inputs.</p> +<blockquote> +<p>The name Nix is derived from the Dutch word <em>niks</em>, meaning nothing; build actions do not see anything that has not been explicitly declared as an input.</p> +</blockquote> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#nixpkgs" aria-label="nixpkgs permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Nixpkgs</h4> +<p>You may have noticed a reference to <code>nixpkgs</code> in the above derivation. +As every input to a Nix derivation also has to be a Nix derivation, one can imagine the tedium involved in creating a Nix derivation for every dependency of your project. +However, Nixpkgs<sup><a href="https://tarides.com/feed.xml#fn-7" class="footnote-ref">7</a></sup> is a large repository of software packaged in Nix, where a package is a Nix derivation. +We can use packages from Nixpkgs as inputs to a Nix derivation, as we've done with <code>bash</code>.</p> +<p>There is also a command line package manager installing packages from Nixpkgs, which is why people often refer to Nix as a package manager. +While Nix, and therefore Nix package management, is primarily source-based (since derivations describe how to build software from source), binary deployment is an optimisation of this. 
+Since packages are built in isolation and entirely determined by their inputs, binaries can be transparently deployed by downloading them from a remote server instead of building the derivation locally.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/d593fbd512695940ef53b14d87fcc371/ce6cc/nixpkgs.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 64.70588235294117%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/d593fbd512695940ef53b14d87fcc371/c5bb3/nixpkgs.png" class="gatsby-resp-image-image" alt="nixpkgs" title="nixpkgs" srcset="/static/d593fbd512695940ef53b14d87fcc371/04472/nixpkgs.png 170w, +/static/d593fbd512695940ef53b14d87fcc371/9f933/nixpkgs.png 340w, +/static/d593fbd512695940ef53b14d87fcc371/c5bb3/nixpkgs.png 680w, +/static/d593fbd512695940ef53b14d87fcc371/b12f7/nixpkgs.png 1020w, +/static/d593fbd512695940ef53b14d87fcc371/b5a09/nixpkgs.png 1360w, +/static/d593fbd512695940ef53b14d87fcc371/ce6cc/nixpkgs.png 1551w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>Visualisation of Nixpkgs<sup><a href="https://tarides.com/feed.xml#fn-8" class="footnote-ref">8</a></sup></p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#nixos" aria-label="nixos permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 
0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>NixOS</h4> +<p>NixOS<sup><a href="https://tarides.com/feed.xml#fn-9" class="footnote-ref">9</a></sup> is a Linux distribution built with Nix from a modular, purely functional specification. +It does not follow the traditional Filesystem Hierarchy Standard (FHS) layout, with directories like <code>/bin</code>, <code>/lib</code>, <code>/usr</code>, but instead stores all components in <code>/nix/store</code>. +The system configuration is managed by Nix and configured with Nix expressions. +NixOS modules are Nix files containing chunks of system configuration that can be composed to build a full NixOS system<sup><a href="https://tarides.com/feed.xml#fn-10" class="footnote-ref">10</a></sup>. +While many NixOS modules are provided in the Nixpkgs repository, they can also be written by an individual user. +For example, the expression used to deploy a DNS server is a NixOS module. +Together these modules form the configuration which builds the Linux system as a Nix derivation.</p> +<p>NixOS minimises global mutable state that -- without knowing it -- you might rely on being set up in a certain way. +For example, you might follow instructions to run a series of shell commands and edit some files to get a piece of software working. +You may subsequently be unable to reproduce the result because you've forgotten some intricacy or are now using a different version of the software. +Nix forces you to encode this in a reproducible way, which is extremely useful for replicating software configurations and deployments, aiming to solve the 'It works on my machine' problem. +Docker is often used to fix this configuration problem, but Nix aims to be more reproducible. 
+This can be frustrating at times because it can make it harder to get a project off the ground, but the benefits often outweigh the downsides.</p> +<p>Nix uses pointers (implemented as symlinks) to system dependencies, which are Nix derivations for programs or pieces of configuration files. +This means NixOS supports atomic upgrades, as the pointers to the new packages are only updated when the install succeeds; the old versions can be kept until garbage collection. +This also allows NixOS to trivially support rollbacks to previous system configurations, as the pointers can be restored to their previous state. +Every new system configuration creates a GRUB entry, so you can boot previous systems even from your UEFI/BIOS. +Finally, NixOS also supports partial upgrades: while Nixpkgs has one global coherent package set, one can use multiple instances of Nixpkgs (i.e., channels) at once, as the Nix store allows multiple versions of a dependency to be stored.</p> +<p>To summarise the parts of the Nix ecosystem that we've discussed:</p> +<div style="text-align: center;"> + <img src="https://tarides.com/nix-stack.svg" style="height: 300px; max-width: 100%"/> +</div> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#flakes" aria-label="flakes permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Flakes</h4> +<p>We also use Nix flakes for this project. +Without going into too much depth, they enable hermetic evaluation of Nix expressions and provide a standard way to compose Nix projects. 
+With flakes, instead of using a Nixpkgs repository version from a 'channel'<sup><a href="https://tarides.com/feed.xml#fn-11" class="footnote-ref">11</a></sup>, we pin Nixpkgs as an input to every Nix flake, be it a project built with Nix or a NixOS system. +Integrated with flakes, there is also a new <code>nix</code> command aimed at improving the Nix UI. +You can read more detail about flakes in a series of blog posts by Eelco Dolstra on the topic<sup><a href="https://tarides.com/feed.xml#fn-12" class="footnote-ref">12</a></sup>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#deploying-unikernels" aria-label="deploying unikernels permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Deploying Unikernels</h2> +<p>Now that we understand what Nix and Mirage are, and we've motivated the desire to deploy Mirage unikernels on a NixOS machine, what's stopping us from doing just that? 
+To support deploying a Mirage unikernel, such as a DNS server, we need to write a NixOS module for it.</p> +<p>A pared-down<sup><a href="https://tarides.com/feed.xml#fn-13" class="footnote-ref">13</a></sup> version of the bind NixOS module, the module used in our Nix expression for deploying a DNS server on NixOS (<a href="https://tarides.com/feed.xml#cb1">&sect;</a>), is:</p> +<div class="gatsby-highlight" data-language="nix"><pre class="language-nix"><code class="language-nix"><span class="token punctuation">{</span> config<span class="token punctuation">,</span> lib<span class="token punctuation">,</span> pkgs<span class="token punctuation">,</span> <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span> <span class="token punctuation">}</span><span class="token punctuation">:</span> + +<span class="token keyword">with</span> lib<span class="token punctuation">;</span> + +<span class="token punctuation">{</span> + options <span class="token operator">=</span> <span class="token punctuation">{</span> + services<span class="token punctuation">.</span>bind <span class="token operator">=</span> <span class="token punctuation">{</span> + enable <span class="token operator">=</span> mkEnableOption <span class="token string">&quot;BIND domain name server&quot;</span><span class="token punctuation">;</span> + + zones <span class="token operator">=</span> mkOption <span class="token punctuation">{</span> + <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span> + <span class="token punctuation">}</span><span class="token punctuation">;</span> + <span class="token punctuation">}</span><span class="token punctuation">;</span> + <span class="token punctuation">}</span><span class="token punctuation">;</span> + + config <span class="token operator">=</span> mkIf cfg<span class="token punctuation">.</span>enable <span class="token 
punctuation">{</span> + systemd<span class="token punctuation">.</span>services<span class="token punctuation">.</span>bind <span class="token operator">=</span> <span class="token punctuation">{</span> + description <span class="token operator">=</span> <span class="token string">&quot;BIND Domain Name Server&quot;</span><span class="token punctuation">;</span> + after <span class="token operator">=</span> <span class="token punctuation">[</span> <span class="token string">&quot;network.target&quot;</span> <span class="token punctuation">]</span><span class="token punctuation">;</span> + wantedBy <span class="token operator">=</span> <span class="token punctuation">[</span> <span class="token string">&quot;multi-user.target&quot;</span> <span class="token punctuation">]</span><span class="token punctuation">;</span> + + serviceConfig <span class="token operator">=</span> <span class="token punctuation">{</span> + ExecStart <span class="token operator">=</span> <span class="token string">&quot;<span class="token interpolation"><span class="token antiquotation important">$</span><span class="token punctuation">{</span>pkgs<span class="token punctuation">.</span>bind<span class="token punctuation">.</span>out<span class="token punctuation">}</span></span>/sbin/named&quot;</span><span class="token punctuation">;</span> + <span class="token punctuation">}</span><span class="token punctuation">;</span> + <span class="token punctuation">}</span><span class="token punctuation">;</span> + <span class="token punctuation">}</span><span class="token punctuation">;</span> +<span class="token punctuation">}</span></code></pre></div> +<p>Notice the reference to <code>pkgs.bind</code>. +This is the Nixpkgs repository Nix derivation for the <code>bind</code> package. 
+Recall that every input to a Nix derivation is itself a Nix derivation (<a href="https://tarides.com/feed.xml#nixpkgs">&sect;</a>); in order to use a package in a Nix expression -- i.e., a NixOS module -- we need to build said package with Nix. +Once we build a Mirage unikernel with Nix, we can write a NixOS module to deploy it.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#building-unikernels" aria-label="building unikernels permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Building Unikernels</h2> +<p>Mirage uses the package manager for OCaml called opam<sup><a href="https://tarides.com/feed.xml#fn-14" class="footnote-ref">14</a></sup>. +Packages in opam, as is common in programming language package managers, have a file which -- among other metadata and build/install scripts -- specifies their dependencies and version constraints. +For example<sup><a href="https://tarides.com/feed.xml#fn-15" class="footnote-ref">15</a></sup>:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">... 
+depends: [ + &quot;arp&quot; { ?monorepo &amp; &gt;= &quot;3.0.0&quot; &amp; &lt; &quot;4.0.0&quot; } + &quot;ethernet&quot; { ?monorepo &amp; &gt;= &quot;3.0.0&quot; &amp; &lt; &quot;4.0.0&quot; } + &quot;lwt&quot; { ?monorepo } + &quot;mirage&quot; { build &amp; &gt;= &quot;4.2.0&quot; &amp; &lt; &quot;4.3.0&quot; } + &quot;mirage-bootvar-solo5&quot; { ?monorepo &amp; &gt;= &quot;0.6.0&quot; &amp; &lt; &quot;0.7.0&quot; } + &quot;mirage-clock-solo5&quot; { ?monorepo &amp; &gt;= &quot;4.2.0&quot; &amp; &lt; &quot;5.0.0&quot; } + &quot;mirage-crypto-rng-mirage&quot; { ?monorepo &amp; &gt;= &quot;0.8.0&quot; &amp; &lt; &quot;0.11.0&quot; } + &quot;mirage-logs&quot; { ?monorepo &amp; &gt;= &quot;1.2.0&quot; &amp; &lt; &quot;2.0.0&quot; } + &quot;mirage-net-solo5&quot; { ?monorepo &amp; &gt;= &quot;0.8.0&quot; &amp; &lt; &quot;0.9.0&quot; } + &quot;mirage-random&quot; { ?monorepo &amp; &gt;= &quot;3.0.0&quot; &amp; &lt; &quot;4.0.0&quot; } + &quot;mirage-runtime&quot; { ?monorepo &amp; &gt;= &quot;4.2.0&quot; &amp; &lt; &quot;4.3.0&quot; } + &quot;mirage-solo5&quot; { ?monorepo &amp; &gt;= &quot;0.9.0&quot; &amp; &lt; &quot;0.10.0&quot; } + &quot;mirage-time&quot; { ?monorepo } + &quot;mirageio&quot; { ?monorepo } + &quot;ocaml&quot; { build &amp; &gt;= &quot;4.08.0&quot; } + &quot;ocaml-solo5&quot; { build &amp; &gt;= &quot;0.8.1&quot; &amp; &lt; &quot;0.9.0&quot; } + &quot;opam-monorepo&quot; { build &amp; &gt;= &quot;0.3.2&quot; } + &quot;tcpip&quot; { ?monorepo &amp; &gt;= &quot;7.0.0&quot; &amp; &lt; &quot;8.0.0&quot; } + &quot;yaml&quot; { ?monorepo &amp; build } +] +...</code></pre></div> +<p>Each of these dependencies will have its own dependencies with their own version constraints. +As we can only link one version of each dependency into the resulting program, we need to find a set of dependency versions that satisfies all of these constraints. +This is not an easy problem. 
+In fact, it's NP-complete<sup><a href="https://tarides.com/feed.xml#fn-16" class="footnote-ref">16</a></sup>. +Opam uses the Zero Install<sup><a href="https://tarides.com/feed.xml#fn-17" class="footnote-ref">17</a></sup> SAT solver for dependency resolution.</p> +<p>Nixpkgs has a large number of OCaml packages<sup><a href="https://tarides.com/feed.xml#fn-18" class="footnote-ref">18</a></sup>, which we could provide as build inputs to a Nix derivation. +However, Nixpkgs has one global coherent set of package versions<sup><a href="https://tarides.com/feed.xml#fn-19" class="footnote-ref">19</a></sup>. +The support for installing multiple versions of a package concurrently comes from the fact that they are stored at a unique path and can be referenced separately, or symlinked, where required. +So different projects or users that use a different version of Nixpkgs won't conflict, but Nix does not do any dependency version resolution -- everything is pinned. +This is a problem for opam projects with version constraints that can't be satisfied with a static instance of Nixpkgs.</p> +<p>Luckily, a project from Tweag already exists (<code>opam-nix</code>) to deal with this<sup><a href="https://tarides.com/feed.xml#fn-20" class="footnote-ref">20</a></sup>. +This project uses the opam dependency version solver inside a Nix derivation, and then creates derivations from the resulting dependency versions.</p> +<p>This still doesn't support building our Mirage unikernels, though. +Unikernels quite often need to be cross-compiled: compiled to run on a platform other than the one they're being built on. +A common target, Solo5<sup><a href="https://tarides.com/feed.xml#fn-21" class="footnote-ref">21</a></sup>, is a sandboxed execution environment for unikernels. +It acts as a minimal shim layer to interface between unikernels and different hypervisor backends. +Solo5 uses a different C library from the host's <code>glibc</code>, which requires cross-compilation. 
+Mirage 4<sup><a href="https://tarides.com/feed.xml#fn-22" class="footnote-ref">22</a></sup> supports cross-compilation with toolchains in the Dune build system<sup><a href="https://tarides.com/feed.xml#fn-23" class="footnote-ref">23</a></sup>. +This uses a host compiler installed in an opam switch (a virtual environment) as normal, as well as a target compiler<sup><a href="https://tarides.com/feed.xml#fn-24" class="footnote-ref">24</a></sup>. +But the cross-compilation context of packages is only known at build time, as some metaprogramming modules may require preprocessing with the host compiler. +To ensure that the right compilation context is used, we have to provide Dune with the sources of all our dependencies. +A tool called <code>opam-monorepo</code> was created to do just that<sup><a href="https://tarides.com/feed.xml#fn-25" class="footnote-ref">25</a></sup>.</p> +<p>We extended the <code>opam-nix</code> project to support the <code>opam-monorepo</code> workflow with this pull request: <a href="https://github.com/tweag/opam-nix/pull/18">github.com/tweag/opam-nix/pull/18</a>. +This is very low-level support for building Mirage unikernels with Nix, however. +In order to provide a better user experience, we also created the Hillingar Nix flake: <a href="https://github.com/ryanGibb/hillingar">github.com/RyanGibb/hillingar</a>. +This wraps the Mirage tooling and <code>opam-nix</code> function calls so that a simple high-level flake can be dropped into a Mirage project to support building it with Nix. 
+To add Nix build support to a unikernel, simply:</p> +<div class="gatsby-highlight" data-language="bash"><pre class="language-bash"><code class="language-bash"><span class="token comment"># create a flake from hillingar's default template</span> +$ nix flake new <span class="token builtin class-name">.</span> <span class="token parameter variable">-t</span> github:/RyanGibb/hillingar +<span class="token comment"># substitute the name of the unikernel you're building</span> +$ <span class="token function">sed</span> <span class="token parameter variable">-i</span> <span class="token string">'s/throw &quot;Put the unikernel name here&quot;/&quot;&lt;unikernel-name&gt;&quot;/g'</span> flake.nix +<span class="token comment"># build the unikernel with Nix for a particular target</span> +$ nix build <span class="token builtin class-name">.</span><span class="token comment">#&lt;target&gt;</span></code></pre></div> +<p>For example, see the flake for building the Mirage website as a unikernel with Nix: <a href="https://github.com/RyanGibb/mirage-www/blob/master/flake.nix">github.com/RyanGibb/mirage-www/blob/master/flake.nix</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#evaluation" aria-label="evaluation permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Evaluation</h2> +<p>Hillingar's primary limitations are (1) complex integration is required with the OCaml ecosystem to solve dependency version constraints using <code>opam-nix</code>, and (2) that cross-compilation requires cloning all 
sources locally with <code>opam-monorepo</code> (<a href="https://tarides.com/feed.xml#dependency-management">&sect;</a>). +Another issue that proved an annoyance during this project is the Nix DSL's dynamic typing. +When writing simple derivations this often isn't a problem, but when writing complicated logic, it quickly gets in the way of productivity. +The runtime errors produced can be very hard to parse. +Thankfully, there is work towards creating a typed language for the Nix deployment system, such as Nickel<sup><a href="https://tarides.com/feed.xml#fn-26" class="footnote-ref">26</a></sup>. +However, gradual typing is hard, and Nickel still isn't ready for real-world use despite having been open source for almost two years at the time of writing.</p> +<p>A glaring omission is that despite it being the primary motivation, we haven't actually written a NixOS module for deploying a DNS server as a unikernel. +There are still questions about how to provide zone file data declaratively to the unikernel and manage the runtime of deployed unikernels. +One option to do the latter is Albatross<sup><a href="https://tarides.com/feed.xml#fn-27" class="footnote-ref">27</a></sup>, which has recently had support for building with Nix added<sup><a href="https://tarides.com/feed.xml#fn-28" class="footnote-ref">28</a></sup>. +Albatross aims to provision resources for unikernels such as network access, share resources for unikernels between users, and monitor unikernels with a Unix daemon. +Using Albatross to manage some of the inherently imperative processes behind unikernels, as well as share access to resources for unikernels for other users on a NixOS system, could simplify the creation and improve the functionality of a NixOS module for a unikernel.</p> +<p>There also exists related work on the reproducible building of Mirage unikernels. 
+Specifically, there is work on improving the reproducibility of opam packages (as Mirage unikernels are opam packages themselves)<sup><a href="https://tarides.com/feed.xml#fn-29" class="footnote-ref">29</a></sup>. +Hillingar differs in that it uses opam only for version resolution and relies on Nix to provide dependencies, which gives reproducibility through pinned Nix derivation inputs and builds that run in isolation by default.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>To summarise, this project was motivated (<a href="https://tarides.com/feed.xml#introduction">&sect;</a>) by deploying unikernels on NixOS (<a href="https://tarides.com/feed.xml#deploying-unikernels">&sect;</a>). +Towards this end, we added support for building MirageOS unikernels with Nix: +we extended <code>opam-nix</code> to support the <code>opam-monorepo</code> workflow and created the Hillingar project to provide a usable Nix interface (<a href="https://tarides.com/feed.xml#building-unikernels">&sect;</a>).</p> +<p>While only the first was the primary motivation, the benefits of building unikernels with Nix are:</p> +<ul> +<li>It enables reproducible, low-configuration unikernel deployment using NixOS modules.</li> +<li>Nix allows reproducible builds by pinning system dependencies and composing multiple language environments. 
For example, the OCaml package <code>conf-gmp</code> is a 'virtual package' that relies on a system installation of the C/Assembly library <code>gmp</code> (The GNU Multiple Precision Arithmetic Library). Nix easily allows us to depend on this package in a reproducible way.</li> +<li>We can use Nix to support building on different systems (<a href="https://tarides.com/feed.xml#cross-compilation">&sect;</a>).</li> +</ul> +<p>To conclude, while NixOS and MirageOS take fundamentally different approaches, they're both trying to bring some kind of functional programming paradigm to operating systems. +NixOS does this in a top-down manner, trying to tame Unix with functional principles like laziness and immutability<sup><a href="https://tarides.com/feed.xml#fn-30" class="footnote-ref">30</a></sup>, whereas MirageOS does this by throwing Unix out the window and rebuilding the world from scratch in a decidedly bottom-up approach. +Despite these two projects having different motivations and goals, Hillingar aims to get the best from both worlds by marrying the two.</p> +<hr/> +<p>To dive deeper, please see a more detailed article on my <a href="https://ryan.freumh.org/blog/hillingar">personal blog</a>.</p> +<p>If you have a unikernel, consider trying to build it with Hillingar, and please report any problems at <a href="https://github.com/RyanGibb/hillingar/issues">github.com/RyanGibb/hillingar/issues</a>!</p> +<div class="footnotes"> +<hr/> +<ol> +<li><a href="https://www.isc.org/bind/">ISC BIND</a> has many <a href="https://www.cvedetails.com/product/144/ISC-Bind.html?vendor_id=64">CVEs</a><a href="https://tarides.com/feed.xml#fnref-1" class="footnote-backref">&#8617;</a></li> +<li> <a href="https://mirage.io">mirage.io</a> <a href="https://tarides.com/feed.xml#fnref-2" class="footnote-backref">&#8617;</a></li> +<li>Credits to Takayuki Imada<a href="https://tarides.com/feed.xml#fnref-3" class="footnote-backref">&#8617;</a></li> +<li>Barring the use of <a 
href="https://mirage.io/blog/modular-foreign-function-bindings">foreign function interfaces</a> (FFIs).<a href="https://tarides.com/feed.xml#fnref-4" class="footnote-backref">&#8617;</a></li> +<li>As 'nix' means snow in Latin. Credits to Tim Cuthbertson.<a href="https://tarides.com/feed.xml#fnref-5" class="footnote-backref">&#8617;</a></li> +<li>NB: we will use component, dependency, and package somewhat interchangeably in this blog post, as they all fundamentally mean the same thing -- a piece of software.<a href="https://tarides.com/feed.xml#fnref-6" class="footnote-backref">&#8617;</a></li> +<li> <a href="https://github.com/nixos/nixpkgs">github.com/nixos/nixpkgs</a> <a href="https://tarides.com/feed.xml#fnref-7" class="footnote-backref">&#8617;</a></li> +<li><a href="https://www.tweag.io/blog/2022-09-13-nixpkgs-graph/">www.tweag.io/blog/2022-09-13-nixpkgs-graph/</a><a href="https://tarides.com/feed.xml#fnref-8" class="footnote-backref">&#8617;</a></li> +<li><a href="https://nixos.org">nixos.org</a><a href="https://tarides.com/feed.xml#fnref-9" class="footnote-backref">&#8617;</a></li> +<li><a href="https://nixos.org/manual/nixos/stable/index.html#sec-writing-modules">NixOS manual Chapter 66. 
Writing NixOS Modules</a>.<a href="https://tarides.com/feed.xml#fnref-10" class="footnote-backref">&#8617;</a></li> +<li><a href="https://nixos.org/manual/nix/stable/package-management/channels.html">nixos.org/manual/nix/stable/package-management/channels.html</a><a href="https://tarides.com/feed.xml#fnref-11" class="footnote-backref">&#8617;</a></li> +<li><a href="https://www.tweag.io/blog/2020-05-25-flakes/">tweag.io/blog/2020-05-25-flakes</a><a href="https://tarides.com/feed.xml#fnref-12" class="footnote-backref">&#8617;</a></li> +<li>The full module can be found <a href="https://github.com/NixOS/nixpkgs/blob/fe76645aaf2fac3baaa2813fd0089930689c53b5/nixos/modules/services/networking/bind.nix">here</a><a href="https://tarides.com/feed.xml#fnref-13" class="footnote-backref">&#8617;</a></li> +<li><a href="https://opam.ocaml.org/">opam.ocaml.org</a><a href="https://tarides.com/feed.xml#fnref-14" class="footnote-backref">&#8617;</a></li> +<li>For <a href="https://github.com/mirage/mirage-www">mirage-www</a> targeting <code>hvt</code>.<a href="https://tarides.com/feed.xml#fnref-15" class="footnote-backref">&#8617;</a></li> +<li><a href="https://research.swtch.com/version-sat">research.swtch.com/version-sat</a><a href="https://tarides.com/feed.xml#fnref-16" class="footnote-backref">&#8617;</a></li> +<li><a href="https://0install.net">0install.net</a><a href="https://tarides.com/feed.xml#fnref-17" class="footnote-backref">&#8617;</a></li> +<li><a href="https://github.com/NixOS/nixpkgs/blob/9234f5a17e1a7820b5e91ecd4ff0de449e293383/pkgs/development/ocaml-modules/">github.com/NixOS/nixpkgs pkgs/development/ocaml-modules</a><a href="https://tarides.com/feed.xml#fnref-18" class="footnote-backref">&#8617;</a></li> +<li>Bar some exceptional packages that have multiple major versions packaged, like Postgres.<a href="https://tarides.com/feed.xml#fnref-19" class="footnote-backref">&#8617;</a></li> +<li><a href="https://github.com/tweag/opam-nix">github.com/tweag/opam-nix</a><a 
href="https://tarides.com/feed.xml#fnref-20" class="footnote-backref">&#8617;</a></li> +<li><a href="https://github.com/Solo5/solo5">github.com/Solo5/solo5</a><a href="https://tarides.com/feed.xml#fnref-21" class="footnote-backref">&#8617;</a></li> +<li><a href="https://mirage.io/blog/announcing-mirage-40">mirage.io/blog/announcing-mirage-40</a><a href="https://tarides.com/feed.xml#fnref-22" class="footnote-backref">&#8617;</a></li> +<li><a href="https://dune.build">dune.build</a><a href="https://tarides.com/feed.xml#fnref-23" class="footnote-backref">&#8617;</a></li> +<li><a href="https://github.com/mirage/ocaml-solo5">github.com/mirage/ocaml-solo5</a><a href="https://tarides.com/feed.xml#fnref-24" class="footnote-backref">&#8617;</a></li> +<li><a href="https://github.com/tarides/opam-monorepo">github.com/tarides/opam-monorepo</a><a href="https://tarides.com/feed.xml#fnref-25" class="footnote-backref">&#8617;</a></li> +<li><a href="https://www.tweag.io/blog/2020-10-22-nickel-open-sourcing/">www.tweag.io/blog/2020-10-22-nickel-open-sourcing</a><a href="https://tarides.com/feed.xml#fnref-26" class="footnote-backref">&#8617;</a></li> +<li><a href="https://hannes.robur.coop/Posts/VMM">hannes.robur.coop/Posts/VMM</a><a href="https://tarides.com/feed.xml#fnref-27" class="footnote-backref">&#8617;</a></li> +<li><a href="https://github.com/roburio/albatross/pull/120">https://github.com/roburio/albatross/pull/120</a><a href="https://tarides.com/feed.xml#fnref-28" class="footnote-backref">&#8617;</a></li> +<li><a href="https://hannes.nqsb.io/Posts/ReproducibleOPAM">hannes.nqsb.io/Posts/ReproducibleOPAM</a><a href="https://tarides.com/feed.xml#fnref-29" class="footnote-backref">&#8617;</a></li> +<li><a href="https://www.tweag.io/blog/2022-07-14-taming-unix-with-nix/">tweag.io/blog/2022-07-14-taming-unix-with-nix</a><a href="https://tarides.com/feed.xml#fnref-30" class="footnote-backref">&#8617;</a></li> +</ol> 
+</div>https://tarides.com/blog/2022-12-14-hillingar-mirageos-unikernels-on-nixosHillingar: MirageOS Unikernels on NixOS2022-12-14T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We're in the home stretch for the full OCaml 5 release. Multicore is almost here! Yesterday its Release Candidate (RC) was announced on the <a href="https://discuss.ocaml.org/t/first-release-candidate-for-ocaml-5-0-0/10922">OCaml Discuss</a>, which is the final step before the major release, expected before Christmas.</p> +<p>To learn more about the exciting features coming with OCaml 5, you can watch <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">KC&rsquo;s keynote address</a> and check out his <a href="https://speakerdeck.com/kayceesrk/retrofitting-concurrency-lessons-from-the-engine-room">speaker slide deck</a> as well. As always, feel free to <a href="https://tarides.com/company">contact us</a> for more information about using OCaml and for support on your OCaml projects.</p> +<p>The OCaml community has worked tirelessly to release 5.0 before the end of the year, with a lot of time spent on creating a smooth transition for OCaml users. There should be just enough time for you to try out OCaml 5 for a fun holiday project or the <a href="https://tarides.com/blog/2022-11-24-solve-the-2022-advent-of-code-puzzles-with-ocaml">Advent of Code</a>.</p> +<p>Your reports resulted in these bug fixes since the Beta 2 release last week:</p> +<ul> +<li><a href="https://github.com/ocaml/ocaml/issues/11776">11776</a>: Extend environment with functor parameters in <code>strengthen_lazy</code>. 
(Chris Casinghino and Luke Maurer, review by Gabriel Scherer)</li> +<li><a href="https://github.com/ocaml/ocaml/issues/11533">11533</a> and <a href="https://github.com/ocaml/ocaml/issues/11534">11534</a>: follow synonyms again in <code>#show_module_type</code> (this had stopped working in 4.14.0) (Gabriel Scherer, review by Jacques Garrigue, report by Yaron Minsky)</li> +</ul> +<p>For the full change log, <a href="https://github.com/ocaml/ocaml/blob/5.0/Changes">visit the GitHub repo</a>. The source code for the release candidate is available at these addresses:</p> +<ul> +<li><a href="https://github.com/ocaml/ocaml/archive/5.0.0-rc1.tar.gz">https://github.com/ocaml/ocaml/archive/5.0.0-rc1.tar.gz</a></li> +<li><a href="https://caml.inria.fr/pub/distrib/ocaml-5.0/ocaml-5.0.0~rc1.tar.gz">https://caml.inria.fr/pub/distrib/ocaml-5.0/ocaml-5.0.0~rc1.tar.gz</a></li> +</ul> +<p>Please keep those testing reports coming in. We believe this release candidate is ready to go, but we really value testing right up to the last minute to be even more sure. Send us your valuable input! If you find something, please <a href="https://github.com/ocaml/ocaml/issues">open an issue on GitHub</a> or join the discussion on the <a href="https://discuss.ocaml.org/t/first-release-candidate-for-ocaml-5-0-0/10922">Discuss post</a>, where you can also find installation instructions.</p>https://tarides.com/blog/2022-12-07-ocaml-5-release-candidate-now-availableOCaml 5 Release Candidate Now Available!2022-12-07T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Just about a month after the <a href="https://tarides.com/blog/2022-10-17-ocaml-5-beta-release">OCaml 5 Beta release</a>, the OCaml 5 Beta2 version has been released, taking us one step closer to the full OCaml 5 with Multicore release later this year. The OCaml community's collaboration is coming to fruition! 
Although we're not quite ready for the RC1 (Release Candidate) version, several things have been added and improved with Beta2.</p> +<p>To learn more about the exciting things coming with OCaml 5, please watch KC Sivaramakrishnan&rsquo;s <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">keynote address</a> and check out <a href="https://speakerdeck.com/kayceesrk/retrofitting-concurrency-lessons-from-the-engine-room">his speaker slide deck</a> as well. As always, feel free to <a href="https://tarides.com/company">contact us</a> for more information about using OCaml and for support on your OCaml projects.</p> +<p>Here's a partial list of improvements/fixes with <a href="https://github.com/ocaml/ocaml/issues">issue numbers</a>:</p> +<ul> +<li><a href="https://github.com/ocaml/ocaml/pull/11631">#11631</a> - fix an assertion dealing with a segfault found by the Multicore test suite</li> +<li><a href="https://github.com/ocaml/ocaml/issues/11662">#11662</a>, <a href="https://github.com/ocaml/ocaml/pull/11673">#11673</a> - memory leak affecting <code>dynlink</code> with frame descriptor tables (reported by Frama-C devs)</li> +<li><a href="https://github.com/ocaml/ocaml/pull/11704">#11704</a>, <a href="https://github.com/ocaml/ocaml/issues/11669">#11669</a> - segfault with effects fixed, having been tracked down to the refactoring of <code>Effect.Unhandled</code></li> +<li><a href="https://github.com/ocaml/ocaml/pull/11701">#11701</a> - fix spurious <code>.dSYM</code> files and directories being created on macOS</li> +<li><a href="https://github.com/ocaml/ocaml/pull/11671">#11671</a> - bug in <code>top_heap_words</code> statistics accounting fix</li> +<li><a href="https://github.com/ocaml/ocaml/pull/11670">#11670</a> - macOS fix when creating empty archives</li> +<li><a href="https://github.com/ocaml/ocaml/pull/11097">#11097</a> - NetBSD fixes, including ARM64 support</li> +<li><a href="https://github.com/ocaml/ocaml/pull/11194">#11194</a>, <a 
href="https://github.com/ocaml/ocaml/pull/11609">#11609</a> - fixes a regression from 4.14</li> +<li><a href="https://github.com/ocaml/ocaml/pull/11622">#11622</a> - fixes a regression in error messages since 4.10</li> +<li><a href="https://github.com/ocaml/ocaml/pull/11725">#11725</a> - remove <code>caml_alloc_N</code></li> +<li><a href="https://github.com/ocaml/ocaml/pull/11661">#11661</a> - erroneous <code>-force-tmc</code> option removed</li> +<li><a href="https://github.com/ocaml/ocaml/pull/11367">#11367</a>, <a href="https://github.com/ocaml/ocaml/pull/11652">#11652</a> - Windows clean-ups</li> +<li><a href="https://github.com/ocaml/ocaml/pull/11611">#11611</a> - fix --disable-instrumented-runtime</li> +<li><a href="https://github.com/ocaml/ocaml/pull/11639">#11639</a> - configuration bookkeeping (ensure system, etc., always set)</li> +<li><a href="https://github.com/ocaml/ocaml/pull/11632">#11632</a> - minor bookkeeping bug fix</li> +<li><a href="https://github.com/ocaml/ocaml/pull/11559">#11559</a>, <a href="https://github.com/ocaml/ocaml/pull/11649">#11649</a>, <a href="https://github.com/ocaml/ocaml/pull/11640">#11640</a>, <a href="https://github.com/ocaml/ocaml/pull/11301">#11301</a>, <a href="https://github.com/ocaml/ocaml/pull/11705">#11705</a> - docs updates</li> +</ul> +<p>In short, we're continuing to stabilise the release. We're also dealing with reports coming from the wonderful testing that's been going on, especially Multicore tests and the Frama-C report. Keep those testing reports and feedback coming by <a href="https://github.com/ocaml/ocaml/issues">opening an issue on the GitHub repo</a> or chiming in through the <a href="https://discuss.ocaml.org/t/ocaml-5-0-0-second-beta-release/10871">OCaml Discuss forum post</a>.</p> +<p>Thanks to the hard work by all engineers working to make OCaml even better than before. 
It's a beautiful sight to watch brilliant developers come together on an open-source project like OCaml, and Tarides is proud to be part of this ever-growing community.</p>https://tarides.com/blog/2022-11-29-ocaml-5-beta2-releaseOCaml 5 Beta2 Release2022-11-29T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Too many programmers only know OCaml through a functional programming language overview course at university. They erroneously believe OCaml is used primarily in academia rather than in the real world. Not only is OCaml already used in <a href="https://tarides.com/blog/2022-11-22-six-surprising-reasons-the-ocaml-programming-language-is-good-for-business">several prominent businesses</a>, it can also be used for fun projects, like the upcoming Advent of Code.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#what-is-advent-of-code" aria-label="what is advent of code permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What is Advent of Code?</h2> +<p><a href="https://adventofcode.com/">Advent of Code</a> is an annual online Advent calendar produced specifically for programmers. It publishes a series of daily puzzles, revealing a new puzzle every day from 1 December to 25 December. That's 25 days of coding challenges! These puzzles can be solved in the language of your choice.</p> +<p>Every year they have new puzzles and anyone can participate, whether you're new to programming or a veteran. 
Advent of Code has been running since 2015 and has attracted a large community of developers around the world!</p> +<p>It's not only a fun way to learn new languages, but it also provides a great way to meet new people with the same interests and exchange ideas.</p> +<p>OCaml 5 is set for release later this year, so it's a perfect time to learn and practice OCaml with Advent of Code!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#5-reasons-to-learn-ocaml" aria-label="5 reasons to learn ocaml permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>5 Reasons to Learn OCaml</h2> +<p>There are so many misconceptions around OCaml that it's hard to know where to start. Basically, OCaml can do what Python, C++, Java, or any other major programming language can do. Here are a few specific reasons why OCaml should be the next language you learn:</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#1-secure-by-design" aria-label="1 secure by design permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>1. 
<strong>Secure-By-Design</strong></h4> +<p>Cyber attacks have become more commonplace and sophisticated, so security has become a top priority in software development. With the proliferation of cloud services and Internet-connected devices, software must be secure-by-design to prevent malicious actors from taking advantage of any bugs or loopholes. Creating software with a secure-by-design language like OCaml helps meet this goal.</p> +<p>OCaml has built-in features and design patterns, like <strong>type and memory safety</strong>, that make it secure-by-design.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#2-performant-garbage-collector" aria-label="2 performant garbage collector permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>2. <strong>Performant Garbage Collector</strong></h4> +<p>Contrary to popular belief, OCaml's garbage collector (GC) need not slow programs down noticeably: it is incremental, and it spares developers the problems of manual memory management in large or long-running programs.</p> +<p>OCaml's GC has to run periodically, but it can do so in <strong>small incremental steps</strong>. 
Although an allocation that triggers a GC takes longer than a malloc call (used in C), most allocations are almost immediate because allocating from the minor heap is as cheap as allocating on the stack.</p> +<p>This incremental GC helps avoid the problems normally associated with garbage collection, like tying up memory and slowing down the process.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#3-confidence-in-the-code-through-type-checking" aria-label="3 confidence in the code through type checking permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>3. <strong>Confidence in the Code Through Type Checking</strong></h4> +<p>OCaml's compile-time type checking eliminates many potential runtime errors, and its strong type inference makes most type annotations unnecessary. Often, a programming language has either type safety or type inference, but with OCaml you get both!</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#4-multicore" aria-label="4 multicore permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>4. 
<strong>Multicore!</strong></h4> +<p>With the release of OCaml 5 later this year comes Multicore support! <a href="https://github.com/ocaml-multicore/ocaml-multicore">Multicore</a> is an extension of OCaml with native support for <strong>Shared-Memory Parallelism</strong> through domains and <strong>Concurrency</strong> through algebraic effects. The ability to run OCaml on multiple cores will make it even faster than before.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#5-extensive-tools--libraries" aria-label="5 extensive tools libraries permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>5. 
<strong>Extensive Tools &amp; Libraries</strong></h4> +<p>OCaml has some great tools and <a href="https://tarides.com/blog/2022-10-12-8-ocaml-libraries-to-make-your-life-easier">helpful libraries</a>, like <a href="https://mirage.io/">MirageOS</a>, <a href="https://irmin.org/">Irmin</a>, and so many others as reported by seasoned OCaml programmers in <a href="https://discuss.ocaml.org/t/top-5-favorite-ocaml-libraries/10626">this <em>Discuss</em> thread</a>.</p> +<p>OCaml platform tools include the <a href="https://dune.readthedocs.io/en/stable/">Dune build system</a>, <a href="https://opam.ocaml.org/">opam package manager</a>, <a href="https://tarides.com/blog/2022-07-05-the-magic-of-merlin">Merlin IDE</a>, and the <a href="https://ocaml.github.io/odoc/"><code>odoc</code> documentation generator</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#pro-tip-learn-ocaml-basics-first" aria-label="pro tip learn ocaml basics first permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Pro Tip: Learn OCaml Basics First</h2> +<p>Before you start solving puzzles, learn the basics to ensure you understand the most common data structures and algorithms. It's important to know how to print to the console, read and write files, and parse text.</p> +<p>It&rsquo;s also a good idea to read about common problem-solving approaches. For example, checking whether a solution is correct is a crucial part of the solving process. 
This will help you understand the challenges better, and you can save yourself a lot of time and frustration.</p> +<p>Get <a href="https://ocaml.org/docs/up-and-running">Up &amp; Running</a> with OCaml today through the <a href="https://ocaml.org/docs">tutorials on OCaml.org</a>. Also consider joining the <a href="https://discuss.ocaml.org/">OCaml Community Forum <em>Discuss</em></a>. Its members welcome newcomers and experienced OCaml programmers alike, and will quickly answer any questions you have while you learn.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>Advent of Code is a great way to practise your problem-solving skills in a new programming language during the holidays.</p> +<p>Give OCaml a try this holiday season. You won't regret it!</p> +<p>Learn more about the forthcoming OCaml 5 through <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">KC Sivaramakrishnan's Keynote Address</a> and <a href="https://speakerdeck.com/kayceesrk/retrofitting-concurrency-lessons-from-the-engine-room">speaker deck</a>. 
Stay tuned to this blog for release updates and a series of posts about why you should consider OCaml as your next language.</p>https://tarides.com/blog/2022-11-24-solve-the-2022-advent-of-code-puzzles-with-ocamlSolve the 2022 Advent of Code Puzzles with OCaml2022-11-24T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Functional programming languages have been around since the 1950s, when the first high-level languages were used to program early computers. Examples of functional programming languages include OCaml, Erlang, Clojure, Haskell, Scala, and Common Lisp. Choosing the right programming language is critical to the long-term success and stability of your products, services, and operations. With strong academic roots and years of iteration and innovation, functional programming languages can offer a real competitive edge to businesses. This article explains some of the lesser-known benefits of OCaml from a business perspective.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#why-functional-programming" aria-label="why functional programming permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Why Functional Programming?</h2> +<p>The <a href="https://medium.com/javascript-scene/master-the-javascript-interview-what-is-functional-programming-7f218c68b3a0">strengths of functional programming</a> are becoming increasingly well known. Functional programming lets programmers write programs in a declarative, logical, and mathematical style. 
This makes it easier for developers to express their intent, because the code closely matches the specification. A specification is a mathematical description of what a program does, and programs that match their specification can be shown to follow that description, which makes them predictable and safe. Furthermore, with features that limit the mutation of data and side effects, functional programs tend to have fewer bugs and vulnerabilities, remain easy to develop and maintain, and last longer overall.</p> +<p>Nowadays, widely used imperative programming languages such as Python, Rust, and Java support programming in a functional style and use functional programming features to take <a href="https://spectrum.ieee.org/functional-programming">advantage of its strengths</a>. Features <a href="https://www.typescriptlang.org/">like rich type systems</a>, <a href="https://medium.com/digitalfrontiers/a-case-for-pattern-matching-b43a5c9796b8">pattern matching</a>, and <a href="https://en.wikipedia.org/wiki/Anonymous_function">lambda expressions</a> are becoming more mainstream, illustrating their usefulness.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#what-makes-ocaml-unique" aria-label="what makes ocaml unique permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What Makes OCaml Unique?</h2> +<p>OCaml combines a strong foundation in functional programming with select imperative and object-oriented programming features, allowing the user to choose the best 
approach for the task at hand. This is part of what makes OCaml such a great general-purpose programming language; it combines the strengths of several programming styles and offers the developer a full software development toolkit.</p> +<p>OCaml provides a unique balance of performance, security, and reliability. It combines features like a garbage collector, static type-checking, type-driven development, first-class functions, and pattern matching (features that work well together). Together they result in a language known for minimising errors, debugging easily, automatically managing memory, preventing structural errors in data, and providing a user-friendly developer environment.</p> +<p>When OCaml 5 is released later this year, the language will get a significant upgrade, introducing support for shared-memory parallelism and native support for simple concurrent programming. Running programs on multiple cores will allow developers to considerably reduce the runtime of their projects by executing code in parallel. The quality-of-life updates to concurrent programming will make it easier for developers to write high-performance concurrent code.</p> +<p>So what are some of OCaml&rsquo;s greatest strengths? 
Here are the key reasons why businesses use OCaml to solve complex, critical, and time-sensitive problems.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#1-ocaml-is-trusted-by-several-prominent-companies" aria-label="1 ocaml is trusted by several prominent companies permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>1. OCaml Is Trusted By Several Prominent Companies</h3> +<p>Think of OCaml as an academic language? Think again! Owing to its reputation for reliability, safety, and performance, many businesses use it to solve real-world problems. Talented software engineers all over the world create robust, maintainable, energy-efficient, and fast solutions in OCaml for high-pressure environments where a single mistake can cost millions.</p> +<p><a href="https://www.docker.com">Docker</a> is making life easier for developers by providing them with a state-of-the-art integrated development pipeline that consolidates application components, all available conveniently on your desktop. 
Docker has over <a href="https://containerjournal.com/features/docker-inc-dev-tools-boast-15-million-users/">fifteen million</a> registered users worldwide and uses <a href="https://github.com/moby/vpnkit">VPNKit</a>, which is written in OCaml, in its Docker Desktop app to <a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">keep user networks secure.</a></p> +<p><a href="https://about.meta.com/company-info/">Meta</a> is a multiplatform company that uses tech to bring people together and build unique communities online. The social media giant uses OCaml in major parts of its infrastructure, such as the compiler and typechecker for its programming language Hack. Other Meta tooling that uses OCaml includes Infer, Flow, and the now-retired Pfff.</p> +<p><a href="https://www.janestreet.com">Jane Street</a> is a quantitative trading firm that uses OCaml as the core language for its research tools, trading systems, and accounting systems. Processing billions of dollars each day, Jane Street relies on OCaml to ensure it is done securely and quickly. Notably, nearly a million lines of its code are open source, and the firm works closely with the OCaml community to develop the language and its tools.</p> +<p>Other companies that use OCaml include <a href="https://ahrefs.com">Ahrefs</a>, an all-in-one SEO tool; <a href="https://www.nitrokey.com">Nitrokey</a>, a world-leading provider of open-source security hardware; and <a href="https://hyper.systems">Hyper</a>, who use OCaml to provide their customers with a unified data platform to manage large infrastructure. There are also two blockchains written in OCaml: <a href="https://tezos.foundation/">Tezos</a> and <a href="https://minaprotocol.com/">Mina</a>. 
When it comes to the modern landscape of software development, it is safe to say OCaml has an ever-growing part to play.</p> +<p><em>The Takeaway</em> : Several top companies are already using OCaml, so there is a strong tradition of successfully using OCaml on an industrial level.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#2-ocaml-has-a-growing-and-thriving-community" aria-label="2 ocaml has a growing and thriving community permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>2. OCaml Has a Growing and Thriving Community</h3> +<p>The open-source community surrounding OCaml is diverse and flourishing. It congregates in several places online, like the <a href="https://discuss.ocaml.org">Discuss forum</a>, <a href="https://github.com/ocaml">GitHub repo</a>, and <a href="https://www.reddit.com/r/ocaml/">Reddit community</a>. Thanks to its increasing popularity, more and more <a href="https://ocaml.org/docs">tutorials and documentation</a> are becoming available online. Learning new languages with the help of online material is the trend nowadays, and OCaml welcomes all developers with open arms. 
In fact, there is a great book for learning OCaml <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571">entirely available online</a> called <em>Real World OCaml</em>.</p> +<p>Over thirty universities currently teach OCaml, including the University of Cambridge, Paris-Diderot University, the Indian Institute of Technology Madras, Harvard, and Cornell University. Cornell University has a <a href="https://www.cs.cornell.edu/courses/cs3110/2022fa/">great textbook</a> on OCaml. Students who learn OCaml often end up becoming part of its open-source community, contributing to projects and launching initiatives of their own. Companies that use OCaml to provide their customers with great products also contribute to the OCaml community, as do academics, researchers, and hobbyists. All entry-points contribute to the development of the language, and users end up interacting with each other across these categories, each bringing their own unique perspective.</p> +<p><em>The Takeaway</em> : The community surrounding OCaml is vibrant and thriving, allowing you to invest your time and energy in OCaml knowing it&rsquo;s here to stay.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#3-ocaml-offers-powerful-tools-and-plenty-of-support" aria-label="3 ocaml offers powerful tools and plenty of support permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>3. 
OCaml Offers Powerful Tools and Plenty of Support</h3> +<p>OCaml is not only an industrial-strength programming language, but it also provides industry-ready tooling and ecosystem support. The OCaml Platform is a curated set of tools that have broad community support. It includes all the tools you'd expect from an industrial-strength programming language, including a build system, package manager, editor support, and documentation generator. The OCaml Platform tells you whether one of these tools is active, under incubation, or deprecated. The OCaml Platform ensures that developers not only have an excellent language at hand but also have the tools to productively develop software with that language. Furthermore, the OCaml website has thorough resources on everything OCaml, including a comprehensive <a href="https://ocaml.org/docs/up-and-running#setting-up-development-tools">guide to setting up OCaml on your computer</a>.</p> +<p>The OCaml compiler is regularly updated and has a dedicated team focused on innovation, improving new features, and keeping everything bug-free. If you have a problem or need help, the response time across the various OCaml forums is very quick and supportive of new learners. Recently, the aforementioned great and comprehensive book <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571">Real World OCaml</a> has been updated for its 2nd edition. This edition is available to download online as a free PDF to make learning OCaml even <a href="https://tarides.com/blog/2022-10-14-real-world-ocaml-book-giveaway">more accessible.</a></p> +<p>If you want to create something new in OCaml, or need help building something, there are several companies as well as hobbyist groups to consult. You can find like-minded people to discuss your ideas with on the <a href="https://discuss.ocaml.org">OCaml forum <em>Discuss</em></a>. 
Tarides is one of the companies that regularly work on OCaml, so you can <a href="https://tarides.com/company">send us a message</a> if you&rsquo;d like help with a project.</p> +<p><em>The Takeaway</em> : OCaml offers several up-to-date tools supported by an active group of contributors and maintainers. Using OCaml means getting help fast and having a complete developer environment with everything you need.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#4-ocaml-is-secure-by-design" aria-label="4 ocaml is secure by design permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>4. OCaml is Secure-By-Design</h3> +<p>In today&rsquo;s connected world, keeping yourself and your data safe online is not just a matter of convenience, but it's also a crucial task that can have serious personal and professional repercussions. Luckily, the OCaml programming language is built in a way that promotes safety, including features and design patterns that make it secure-by-design. It's easy to integrate formally verified code with OCaml programs, which makes OCaml more secure. Some formally verified libraries available in OCaml include <a href="https://hacl-star.github.io/">Microsoft's HACL*</a> and <a href="https://news.mit.edu/2019/fiat-cryptography-chrome-android-0617">MIT's Fiat</a>.</p> +<p>Secure-by-design is a known programming term which means that a language is constructed in a way that fundamentally promotes security and minimises vulnerabilities. 
A language that is secure-by-design makes it impossible to introduce a large class of security vulnerabilities into programs written using that language. A great example of how secure-by-design principles are implemented in OCaml is its type and memory safety, which prevent the most frequent kinds of attacks and crashes from ever happening.</p> +<p>Memory-safety attacks are extremely common, with approximately 70% of zero-day attacks being <a href="https://www.itpro.co.uk/security/zero-day-exploit/360447/why-zero-day-exploits-are-surging-on-an-unprecedented-scale">memory-safety attacks</a>. OCaml is memory safe because it doesn&rsquo;t allow a pointer (the designator of the information being written into memory) to enter information into an unauthorised memory block. As a result, you can&rsquo;t make a program crash with OCaml by manipulating where it writes data into memory, as OCaml simply doesn&rsquo;t allow this to happen. This prevents programs crashing due to memory exploits, including buffer overflows, where a program is &lsquo;tricked&rsquo; into writing more data than a memory block &lsquo;allows.&rsquo;</p> +<p>OCaml is also statically-typed and type-safe, meaning that it detects type errors at compile time and stops programs containing them from ever running, and it limits which operations can be performed on which kinds of data. Both work to remove bugs and errors from the code, making programs written in OCaml more reliable and consistent.</p> +<p><em>The Takeaway</em> : Cybersecurity is an increasing area of concern, and OCaml has several built-in features that help make it secure-by-design. 
It&rsquo;s an excellent choice for projects where security is paramount.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#5-ocaml-is-big-on-performance-and-developer-productivity" aria-label="5 ocaml is big on performance and developer productivity permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>5. OCaml is Big on Performance and Developer Productivity</h3> +<p>OCaml is hailed for striking a balance between a large number of advanced features and performance. For example, it has a very efficient compiler that&rsquo;s divided into two parts: a bytecode compiler and a native-code compiler. The bytecode compiler is very quick and generates small, portable executables. The native-code compiler produces highly efficient machine code.</p> +<p>Since OCaml also allows for some uses of imperative and object-oriented programming features, it&rsquo;s possible to use them in places where they can help with performance. This flexibility of programming paradigms is another way that OCaml helps programmers increase the speed and efficiency of the code they write.</p> +<p>Type inference allows the language to infer what type is being used, removing the need for the developer to annotate every single variable in their code. This makes developing in OCaml faster than in many languages that lack type inference. OCaml also helps the developer write complex algorithms while introducing fewer bugs. The developer can optimise algorithms for greater speed without compromising correctness and security. 
Furthermore, the presence and use of algebraic data types, higher order functions, and immutable data all make manipulating large and complex data structures much easier and faster.</p> +<p>It is also worth noting that OCaml offers several strong methods for debugging its programs. From the fast, interactive, REPL to the powerful symbolic replay debugger, OCaml lets you eliminate bugs at compile time and avoid them at runtime. This, in combination with how effective the debugging programs are, makes OCaml an easy language to debug. This saves developer time and increases the productivity and speed of programming.</p> +<p><em>The Takeaway</em> : The OCaml language is strong on performance and has a lot of features that make the code run fast, while also making the development process more efficient.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#6-ocaml-is-multicore" aria-label="6 ocaml is multicore permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>6. OCaml is Multicore!</h3> +<p>With the imminent release of OCaml 5, the language will support the use of multiple cores and have enhanced infrastructure in place for concurrent programming. Both bring significant performance boosts, allowing users to increase the speed of their programs.</p> +<p>The new I/O library Eio can serve more than one million requests per second, outperforming Go&rsquo;s <code>net/http</code> and closely matching Rust&rsquo;s <code>hyper</code>. 
Writing concurrent code will also be much easier in OCaml 5, as it reads just like regular OCaml. The &lsquo;function colouring problem,&rsquo; whereby concurrent and non-concurrent code are incompatible, will also be a thing of the past after the new release. With OCaml 5, both kinds of code can coexist with minimal intervention on the part of the programmer.</p> +<p>Multicore, or parallel, programming can greatly reduce the running time of a program. By letting the computer use more than one core to execute the code, the program can do several things simultaneously rather than consecutively. For complex tasks that take a long time, Multicore can make them practical by cutting the time they need to complete.</p> +<p>You can look forward to more posts on the <a href="https://tarides.com/blog">Tarides blog</a> about OCaml 5, the technology behind it, and its use cases.</p> +<p><em>The Takeaway</em> : OCaml 5 is coming, and with it, Multicore. OCaml will become even more powerful, both for parallel programming and concurrent programming, allowing users to significantly boost the speed of their projects.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>Choosing the perfect programming language for you is an important but difficult task. The language needs to be powerful and guarantee performance while simultaneously offering strong security features. 
Developers are also going to need state-of-the-art tools alongside responsive help and support. OCaml is a good candidate that offers all of the above, making it great for businesses looking for a versatile and robust programming language.</p> +<p>Combining the power of functional programming, Multicore, and open source, OCaml offers a potent mix of strong features and an engaged community. For more information about OCaml, you can visit the <a href="https://ocaml.org/about">OCaml Website</a> or <a href="https://tarides.com/company">contact Tarides</a> to see how we can make OCaml work for you.</p>https://tarides.com/blog/2022-11-22-six-surprising-reasons-the-ocaml-programming-language-is-good-for-businessSix Surprising Reasons the OCaml Programming Language is Good for Business2022-11-22T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#open-source-india-2022" aria-label="open source india 2022 permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Open Source India 2022</h1> +<p>With OCaml 5 just around the corner, it's been a really exciting year to attend conferences all over the world. Just recently, I presented some highlights of the OCaml 5 update, building on <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">KC Sivaramakrishnan's great keynote address</a> at the 19th annual <a href="http://opensourceindia.in">Open Source India 2022</a> conference. 
Since Tarides was invited to participate, I gave a talk on <em>OCaml 5: Language, Platform, and Ecosystem</em>, starting with OCaml's history and ending with a Multicore OCaml matrix implementation running on 120 cores!</p> +<p>Open Source India was held on 29-30 September 2022 as a physical event at the NIMHANS Convention Centre, Bengaluru, India. It was organised by the <a href="https://www.opensourceforu.com/">Open Source For You</a> magazine team in India, with the help of community and industry participation. The conference ran along multiple parallel &quot;tracks&quot; - FOSS for Everyone, Developers, CXO Summit, DevOps, AI &amp; ML, Data Management, and IT Infrastructure. My talk was part of the Developers track on the second day.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#day-i" aria-label="day i permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Day I</h2> +<p>The conference had many exhibits, and I interacted with a number of participants at the booths. <a href="https://www.mosip.io/">MOSIP</a> is an open source platform for national foundational identities. Some governments implement digital identity systems for their citizens, and MOSIP provides a robust, scalable, open source platform for governance. 
While they currently use <a href="https://github.com/mosip/registration/blob/master/db_scripts/README.md">PostgreSQL</a> as their database backend, it would be useful to re-model their backend to use <a href="https://irmin.io/">Irmin</a> as the data store for security reasons.</p> +<p>Another Business-to-Consumer (B2C), open-source software application was <a href="https://www.chatwoot.com/">Chatwoot</a>, a customer engagement and support platform that also uses PostgreSQL. It would be an interesting data modeling or solution architect project to implement Irmin support for their chat application. The <a href="https://www.umwelten.xyz/dwelling/">Compossible Umwelten</a> company are working on wearable computing using ARM processors, and they were interested in exploring using OCaml, instead of C, for their customer products.</p> +<p>Post-lunch, I attended a talk on <em>Open Source at AWS</em> by Suman Debnath, Principal Developer Advocate, Data Engineering and Analytics at Amazon Web Services. We had the opportunity to discuss the possibility of providing the OCaml Platform and Products available through Amazon directly to end users.</p> +<p>In the afternoon, I took the time to attend the AI &amp; ML track. The <em>Adopting MLSecOps</em> talk was presented by Dibya Prakash, CTO and Principal Consultant at Neural Hub, and he introduced me to MLSecOps and best practices in the industry. This was followed by a talk on <em>Time Series Analysis: Anticipating Future with Darts</em> by Binitha MT and Subhankar Adak from Dell. 
It was an interesting first day at the conference with useful discussions on technology and real world experiences.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#day-ii" aria-label="day ii permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Day II</h2> +<p>In the morning, I spent some time at the speaker's lounge reviewing the slides for OCaml 5, as well as setting up the demo for the Multicore OCaml code examples. I also had the chance to meet my colleague, Puneeth Chaganti, who works remotely from Bengaluru.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/b6202cc9d8a0ef2a09c993ece992a258/80e3c/Vf8n4at.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 75.29411764705883%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/b6202cc9d8a0ef2a09c993ece992a258/7bf67/Vf8n4at.jpg" class="gatsby-resp-image-image" alt="Puneeth and Shakthi" title="Puneeth and Shakthi" srcset="/static/b6202cc9d8a0ef2a09c993ece992a258/651be/Vf8n4at.jpg 170w, +/static/b6202cc9d8a0ef2a09c993ece992a258/d30a3/Vf8n4at.jpg 340w, +/static/b6202cc9d8a0ef2a09c993ece992a258/7bf67/Vf8n4at.jpg 680w, +/static/b6202cc9d8a0ef2a09c993ece992a258/80e3c/Vf8n4at.jpg 720w" 
sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>That afternoon, I gave my talk on <em>OCaml 5: Language, Platform, and Ecosystem</em>. After reviewing OCaml's history, I discussed the recent <a href="https://github.com/ocaml/ocaml/pull/10831">Multicore OCaml merge</a>, then delved into the syntax of the OCaml 5 language: basic types, operations, control structures, data structures, user types, functions, recursion, and I/O. Following this, I showed examples of <a href="https://github.com/ocaml-multicore/domainslib">domainslib</a> and <a href="https://github.com/ocaml-multicore/eio">Eio</a> before demonstrating the impressive Multicore OCaml matrix implementation running on 120 cores!</p> +<p>Additionally, I presented the various platform tools available in the OCaml community, including the <a href="http://opam.ocaml.org/">OCaml package manager (opam)</a>, the <a href="https://dune.build/">Dune</a> build system, <a href="https://ocaml.github.io/odoc/">odoc</a>, <a href="https://github.com/ocaml/ocaml-lsp">OCaml-LSP</a>, <a href="https://ocaml.github.io/merlin/">Merlin</a>, and <a href="https://github.com/realworldocaml/mdx">MDX</a>. I also introduced the following ecosystem projects: <a href="https://github.com/ocaml-bench/sandmark">Sandmark</a> benchmarking suite, <a href="https://tezos.com/">Tezos</a> blockchain, <a href="https://irmin.io/">Irmin</a> database, <a href="https://mirageos.org/">MirageOS</a> library operating system, <a href="https://aantron.github.io/dream/">Dream</a> web framework, and <a href="https://ocaml.xyz/">OCaml Scientific Computing</a> project. I finished my talk with some useful references for OCaml. To my delight, the participants were curious to learn more!</p> +<p>The conference gave me a great opportunity to reach out to developers and make them aware of the current state of OCaml. 
It was good to share the platform and ecosystem projects with them so that they can get started with their contributions. I look forward to participating in more conferences and promoting the use of OCaml!</p>https://tarides.com/blog/2022-11-16-ocaml-5-at-open-source-india-2022OCaml 5 at Open Source India 20222022-11-16T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>At ICFP this year, <a href="https://kcsrk.info/">KC Sivaramakrishnan</a> gave two talks that put OCaml 5 in the spotlight: his <a href="https://youtu.be/6BhmRz7eqiE">keynote</a>, &ldquo;Retrofitting Concurrency - Lessons from the Engine Room,&rdquo; and the <a href="https://speakerdeck.com/kayceesrk/ocaml-5-dot-0">opening presentation</a> of the OCaml workshop, &ldquo;OCaml 5.0 - Concurrent and Parallel Programming.&rdquo; <em>Effect Handlers</em> feature heavily, as they are the foundations on which concurrency primitives were added to OCaml. Since I knew very little about <em>effects</em> in this context, I asked KC for some pointers on where to start with learning about them. He pointed me to the <a href="https://koka-lang.github.io">Koka programming language</a>, encouraging me to set it up, play with it, and see how its type systems work with effects and effect handlers.</p> +<p>Following the usual tradition of learning something by committing to give a talk on it, I signed up to speak about <em>algebraic effects</em> at my local functional programming meetup, <a href="https://www.meetup.com/FP-Syd/">FP-SYD</a>. I figured that I would have a friendly audience that would be forgiving of excessive hand waving! 
:)</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#brain-food" aria-label="brain food permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Brain Food</h3> +<p>I had an absolute blast learning about <em>algebraic effects</em>! Koka was really easy to install and use. Its rich set of examples and excellent documentation make it work really well as a place to explore the concepts of effect systems.</p> +<p>That got me started, but what really hooked me was discovering Andrej Bauer&rsquo;s paper, <em>What is Algebraic about Algebraic Effects and Handlers.</em> As a mathematician, I found it to be an accessible way of getting to the topic's theoretical underpinnings (see <a href="https://github.com/yallop/effects-bibliography#2018">here</a> for the paper and videos). Next, Matija Pretnar&rsquo;s <a href="http://www.eff-lang.org/handlers-tutorial.pdf">tutorial</a>, <em>An Introduction to Algebraic Effects and Handlers,</em> complemented Bauer's work really well.</p> +<p>As I learned more about the topic, I realised just how deep an area this is, and how little I would know about it after four weeks of enthusiastic (but decidedly surface-level) reading. So, I decided to pitch the talk as an overview or a survey of the field. 
To make things concrete, I would also focus on examples from the effect handling systems of Koka and OCaml 5.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#the-talk" aria-label="the talk permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The Talk</h3> +<p>October&rsquo;s FP-SYD was on the 19th. Around 6pm in Sydney&rsquo;s Central Business District, about twenty people showed up, ate pizza, and settled into general functional programming-related geekery. Haskell is very strongly represented in this community, and it turned out that most of the audience had not spent much time with effect systems, as their tools for working with effects have been Monads, Monad Transformers, and accompanying abstractions.</p> +<p>I enjoyed talking through what I had learned about Algebraic Effects. Since I've worked on Haskell teams, I connected with the community on the various ways computational effects are handled between pure and impure functional languages. It was great to reference Alexis King's recent work that resulted in delimited continuations <a href="https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0313-delimited-continuation-primops.rst">being added to GHC.</a> The examples from Koka helped give folks a taste for the type systems that include effects and values. 
I also leaned on the excellent work of KC and the Tarides team, borrowing heavily from the material they have written on effect handlers for the OCaml manual.</p> +<p>The talk certainly delivered, as it got me started on the vast topic of <em>algebraic effects</em>. It also piqued the curiosity of at least a few people in my community, so that&rsquo;s not nothing! :)</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#acknowledgements" aria-label="acknowledgements permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Acknowledgements</h3> +<p>I am particularly grateful to KC for getting me curious and giving me material and inspiration to get started. Thanks also to Sudha Parimala for putting together a comprehensive tutorial on parallelism and effects in OCaml and for answering questions and making helpful suggestions. My colleague and friend Tim McGilchrist runs the FP-SYD meetup, so he helped by asking a lot of really good questions, listening to a draft version of the talk, and generally providing support and encouragement. Thanks, Tim! 
:)</p> +<p>The slides of my talk are available on the FP-SYD <a href="https://github.com/fp-syd/meetings/blob/master/2022/2022-10-Keswani-Algebraic-Effects-Survey.pdf">repository</a>.</p>https://tarides.com/blog/2022-11-15-presenting-on-algebraic-effects-at-fp-sydPresenting on Algebraic Effects at FP-SYD2022-11-15T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Over the last few months, Tarides has focused on designing, +prototyping, and integrating a new feature for Tezos bakers: automatic +context pruning for rolling and full nodes. This feature will allow +bakers to run Tezos with minimal disk usage while continuing to enjoy +<a href="https://tarides.com/blog/2022-04-26-lightning-fast-with-irmin-tezos-storage-is-6x-faster-with-1000-tps-surpassed">12x more responsive operations</a>. The first version has been +released with <a href="https://forum.tezosagora.org/t/octez-v15-0-has-been-released">Octez v15</a>. The complete, more optimised context pruning +feature will come with Octez v16. We encourage every Tezos baker to +upgrade and give feedback.</p> +<p><em>We have implemented context pruning for rolling and full nodes, which +requires ~35GB of disk for storing 6 cycles in the upper layer. In +Octez v15, each subsequent pruning run needs an additional 40GB, but +that space is recovered when the operation finishes. 
We plan to remove +that extra requirement in Octez v16.</em></p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#improve-space-usage-with-context-pruning" aria-label="improve space usage with context pruning permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Improve Space Usage with Context Pruning</h2> +<p>The <a href="https://tarides.com/feed.xml">Tezos context</a> is a versioned key/value store that associates +each block with a view of its ledger state. The versioning uses concepts +similar to Git. The current implementation uses <a href="https://tarides.com/feed.xml">irmin</a> as its +backend and is abstracted by the <a href="https://tarides.com/feed.xml">lib_context</a> library.</p> +<p>We have been designing, prototyping, and integrating a new structure +for Irmin storage. It is now reorganised into two +layers: one upper layer that contains the latest cycles of the +blockchain, which are still in use, and a lower layer containing +older, frozen data. A new garbage collection feature (GC) periodically +restructures the Tezos context by removing unused data in the oldest +cycles from the upper layer, so that only the data still accessible from +the currently live cycles is preserved. The first version of the GC, +available in Octez v15, is optimised for rolling and full nodes and +thus does not contain a lower layer. 
We plan to extend this feature in +Octez v17 to dramatically improve the archive nodes' performance by +moving the unused data to the lower layer (more on this below).</p> +<p>Garbage collection and subsequent compression of live data improve +disk and kernel cache performance, which enhances overall node +performance. Currently, rolling node operators must apply a +manual cleanup process to release space on the disk by discarding +unused data. The manual cleanup is tedious and error-prone. Operators +risk discarding valuable data, may have to stop their baker, or must devise +semi-automatic procedures and run multiple bakers to avoid +downtime. The GC feature provides rolling node operators with +a fully automated method to clean up the unused data and guarantees +that only the unused data is discarded, i.e., <em>all</em> currently used data +is preserved.</p> +<p>The GC operation is performed asynchronously with minimal impact on +the Tezos node. In the rolling node's case, a GC'd context uses less +disk space and has more stable performance throughout, +as the protocol operations (such as executing smart contracts or +computing baking rewards) only need data from the upper layer. As +such, the nodes that benefit from the store's layered structure don't +need to use the manual snapshot export/import, which was previously necessary when +the context on disk got too big. In the future, archive nodes&rsquo; +performance will improve because only the upper layer is needed to +validate recent blocks. 
<em>This means archive nodes can bake as reliably +as rolling nodes.</em></p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#tezos-storage-in-a-nutshell" aria-label="tezos storage in a nutshell permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Tezos Storage, in a Nutshell</h2> +<p>The Tezos blockchain uses <a href="https://tarides.com/feed.xml">Irmin</a> as the main storage component. Irmin +is a library for designing Git-like storage systems. It has many backends, +and one of them is <a href="https://tarides.com/feed.xml"><code>irmin-pack</code></a>, which is optimised for the Tezos use +case. In the following, we focus on the main file used to store +object data: the store <code>pack</code> file.</p> +<p><strong>Pack file:</strong> Tezos state is serialised as immutable functional objects. +These objects are marshalled into an append-only <code>pack</code> file, one after the +other. An object can contain pointers to earlier (but not +later!) objects in the file. Pointers to an earlier object are typically +represented by the offset (position) of the earlier object in the +<code>pack</code> file. 
The <code>pack</code> file is append-only: existing objects are +never updated.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/3f9db58c8c913ffba792f0457ff4845e/d3deb/J7N0pil.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 15.294117647058824%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/3f9db58c8c913ffba792f0457ff4845e/c5bb3/J7N0pil.png" class="gatsby-resp-image-image" alt="J7N0pil" title="J7N0pil" srcset="/static/3f9db58c8c913ffba792f0457ff4845e/04472/J7N0pil.png 170w, +/static/3f9db58c8c913ffba792f0457ff4845e/9f933/J7N0pil.png 340w, +/static/3f9db58c8c913ffba792f0457ff4845e/c5bb3/J7N0pil.png 680w, +/static/3f9db58c8c913ffba792f0457ff4845e/b12f7/J7N0pil.png 1020w, +/static/3f9db58c8c913ffba792f0457ff4845e/b5a09/J7N0pil.png 1360w, +/static/3f9db58c8c913ffba792f0457ff4845e/d3deb/J7N0pil.png 1758w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<blockquote> +<p>An Irmin <code>pack</code> file as a sequence of objects: | obj | obj | obj | ...</p> +</blockquote> +<p><strong>Commit objects:</strong> Some of the objects in the <code>pack</code> file are commit +objects. A commit, together with the objects reachable from that +commit, represents the state associated with a Tezos block. The +Tezos node only needs the last commit to process new blocks, but +bakers will need a lot more commits to compute baking rewards. 
+Objects not reachable from these commits are called unreachable or dead +objects.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/ecda34b084e8e0384500bf41aaf273f3/bdcd6/DQJJLll.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 95.29411764705883%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/ecda34b084e8e0384500bf41aaf273f3/7bf67/DQJJLll.jpg" class="gatsby-resp-image-image" alt="DQJJLll" title="DQJJLll" srcset="/static/ecda34b084e8e0384500bf41aaf273f3/651be/DQJJLll.jpg 170w, +/static/ecda34b084e8e0384500bf41aaf273f3/d30a3/DQJJLll.jpg 340w, +/static/ecda34b084e8e0384500bf41aaf273f3/7bf67/DQJJLll.jpg 680w, +/static/ecda34b084e8e0384500bf41aaf273f3/990cb/DQJJLll.jpg 1020w, +/static/ecda34b084e8e0384500bf41aaf273f3/c44b8/DQJJLll.jpg 1360w, +/static/ecda34b084e8e0384500bf41aaf273f3/bdcd6/DQJJLll.jpg 1662w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<blockquote> +<p>The data-structure (mental model) representation of the <code>pack</code> file vs. its physical representation.</p> +</blockquote> +<p><strong>Archive nodes and rolling nodes:</strong> There are different types of +Tezos nodes. An archive node stores the complete blockchain +history from the genesis block. Currently, this is over <em>2 million</em> +blocks. Roughly speaking, a block corresponds to a commit. A +rolling node stores only the last <em>n</em> blocks, where <em>n</em> is chosen +to keep the total disk usage within some bounds. This may be as small +as 5 (or even fewer) or as large as 40,000 or more. 
Another type of +node is the &quot;full node,&quot; which sits between an archive node and a +rolling node.</p> +<p><strong>Rolling nodes, disk space usage:</strong> The purpose of the rolling node +is to keep resource usage, particularly disk space, bounded by +storing only the most recent blocks. However, the current implementation does +not achieve this aim. As rolling nodes execute, the <code>pack</code> file +grows larger and larger, and no old data is discarded. To get around +this problem, node operators periodically export snapshots of the +current blockchain state from the node, delete the old data, +and then import the snapshot state back.</p> +<p><strong>Problem summary:</strong> The main problem we want to avoid is Tezos users +having to periodically export and import the blockchain state to +keep the disk usage of the Tezos node bounded. Instead, we want to +perform context pruning via automatic garbage collection of unreachable +objects. Periodically, a commit should be chosen as the GC +root, and objects constructed before the commit that are not +reachable from the commit should be considered dead branches and removed from +the <code>pack</code> store so that the disk space can be reclaimed. 
The problem is that +with the current implementation of the <code>pack</code> file, which is just an +ordinary file, it is impossible to &quot;delete&quot; regions corresponding to +dead objects and reclaim the space.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#automatised-garbage-collection-solution" aria-label="automatised garbage collection solution permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Automatised Garbage Collection Solution</h2> +<p>Consider the following <code>pack</code> file, where the <code>GC-commit</code> object has +been selected as the commit root for garbage collection:</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/6ec700aa3bc32bb9faa417d7c414e6e4/668c6/ySiXa1r.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 28.235294117647058%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/6ec700aa3bc32bb9faa417d7c414e6e4/c5bb3/ySiXa1r.png" class="gatsby-resp-image-image" alt="ySiXa1r" title="ySiXa1r" srcset="/static/6ec700aa3bc32bb9faa417d7c414e6e4/04472/ySiXa1r.png 170w, +/static/6ec700aa3bc32bb9faa417d7c414e6e4/9f933/ySiXa1r.png 340w, +/static/6ec700aa3bc32bb9faa417d7c414e6e4/c5bb3/ySiXa1r.png 680w, 
+/static/6ec700aa3bc32bb9faa417d7c414e6e4/b12f7/ySiXa1r.png 1020w, +/static/6ec700aa3bc32bb9faa417d7c414e6e4/b5a09/ySiXa1r.png 1360w, +/static/6ec700aa3bc32bb9faa417d7c414e6e4/668c6/ySiXa1r.png 1692w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>Objects that precede the commit root are either reachable from the +commit (by following object references from it) or not. For the +unreachable objects, we want to reclaim the disk space. For reachable +objects, we need to be able to continue to access them via their +offset in the <code>pack</code> file.</p> +<p>The straightforward solution is to implement the <code>pack</code> file using two +other data structures: the <code>suffix</code> and the <code>prefix</code>. The <code>suffix</code> +file contains the root commit object (<code>GC-commit</code>) and the live +objects represented by <em>all</em> bytes following the offset of <code>GC-commit</code> +in the <code>pack</code> file. The <code>prefix</code> file contains all the objects +reachable from the root commit, indexed by their offset. 
Note that the +reachable objects appear earlier in the <code>pack</code> file than the root +commit.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/5ae9c879daf866bfd4791176a2e9e4d5/09262/QVeXtOB.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 37.05882352941176%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/5ae9c879daf866bfd4791176a2e9e4d5/c5bb3/QVeXtOB.png" class="gatsby-resp-image-image" alt="QVeXtOB" title="QVeXtOB" srcset="/static/5ae9c879daf866bfd4791176a2e9e4d5/04472/QVeXtOB.png 170w, +/static/5ae9c879daf866bfd4791176a2e9e4d5/9f933/QVeXtOB.png 340w, +/static/5ae9c879daf866bfd4791176a2e9e4d5/c5bb3/QVeXtOB.png 680w, +/static/5ae9c879daf866bfd4791176a2e9e4d5/b12f7/QVeXtOB.png 1020w, +/static/5ae9c879daf866bfd4791176a2e9e4d5/b5a09/QVeXtOB.png 1360w, +/static/5ae9c879daf866bfd4791176a2e9e4d5/09262/QVeXtOB.png 1896w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<blockquote> +<p>The layered structure of the <code>pack</code> file with <code>prefix</code>+<code>suffix</code> as the upper layer.</p> +</blockquote> +<p>Reading from the <code>pack</code> file is then simulated in an obvious way: if +the offset is that of the <code>GC-commit</code> or later, we read from the <code>suffix</code> +file; otherwise, we look up the offset in the <code>prefix</code> and return +the appropriate object. We only access the reachable objects in the +<code>prefix</code> via their offset. We replace the Irmin <code>pack</code> file with +these two data structures. 
Every time we perform garbage collection +from a given <code>GC-commit</code>, we create the next versions of the <code>prefix</code> +and <code>suffix</code> data-structures and <em>switch</em> from the current version to the next +version by deleting the old <code>prefix</code> and <code>suffix</code> to reclaim +disk space. Creating the next versions of the <code>prefix</code> and <code>suffix</code> +data-structures is potentially expensive. Hence, we implement these steps in a +separate process, the <em>GC worker</em>, with minimal impact on the running +Tezos node.</p> +<p><strong>Caveat:</strong> Following Git, a commit will typically reference its +parent commit, which will then reference its parent, and so +on. Clearly, if we used these references to calculate object +reachability, all objects would remain reachable forever. However, +this is not what we want, so when calculating the set of reachable +objects for a given commit, we ignore the references from a commit +to its parent commit.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#the-prefix-data-structure" aria-label="the prefix data structure permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The <code>prefix</code> Data-Structure</h2> +<p>The <code>prefix</code> is a persistent data-structure that implements a map from +the offsets in <code>pack</code> file to objects (the marshalled bytes +representing an object). 
In our scenario, the GC worker creates the +<code>prefix</code>, which is then read-only for the main process. Objects are +never mutated in or deleted from the <code>prefix</code> file. In this setting, a +straightforward implementation of an object store suffices: we store +reachable objects in a data file and maintain a persistent <code>(int &rarr; int)</code> map from &quot;offset in the original <code>pack</code> file&quot; to &quot;offset in the +<code>prefix</code> file.&quot;</p> +<p><strong>Terminology:</strong> We introduce the term &quot;virtual offset&quot; for &quot;offset in +the original <code>pack</code> file&quot; and the term &quot;real offset&quot; for &quot;offset in +the <code>prefix</code> file.&quot; Thus, the map from virtual offsets to real +offsets is persisted as the <code>mapping</code> file.</p> +<p><strong>Example:</strong> Consider the following, where the <code>pack</code> file contains +reachable objects <code>o1</code> .. <code>o10</code> (with virtual offsets <em>v1 .. 
v10</em>, respectively):</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/b1a2c914473284ac9bb2b688a1852562/cb88c/JKWA4ff.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 54.70588235294118%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/b1a2c914473284ac9bb2b688a1852562/c5bb3/JKWA4ff.png" class="gatsby-resp-image-image" alt="JKWA4ff" title="JKWA4ff" srcset="/static/b1a2c914473284ac9bb2b688a1852562/04472/JKWA4ff.png 170w, +/static/b1a2c914473284ac9bb2b688a1852562/9f933/JKWA4ff.png 340w, +/static/b1a2c914473284ac9bb2b688a1852562/c5bb3/JKWA4ff.png 680w, +/static/b1a2c914473284ac9bb2b688a1852562/b12f7/JKWA4ff.png 1020w, +/static/b1a2c914473284ac9bb2b688a1852562/b5a09/JKWA4ff.png 1360w, +/static/b1a2c914473284ac9bb2b688a1852562/cb88c/JKWA4ff.png 2728w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>Note that the objects <code>o1</code> .. <code>o10</code> are scattered throughout the +<code>pack</code> file where they appear in ascending order (i.e., <em>v1 &lt; .. &lt; +v10</em>). The <code>prefix</code> file contains the same objects but with different +&quot;real&quot; offsets <em>r1..r10</em>, as now the objects <code>o1 .. o10</code> appear one +after the other. 
The <code>mapping</code> needs to contain an entry <em>(v1 &rarr; r1)</em> +for object <code>o1</code> (and similarly for the other objects) to relate the +virtual offset in the <code>pack</code> file with the real offset in the <code>prefix</code> +file.</p> +<p>To read from &quot;virtual offset <em>v3</em>&quot; (say), we use the map to retrieve +the real offset in the <code>prefix</code> file (i.e., <em>r3</em>) and then read the object +data from that position.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#asynchronous-implementation" aria-label="asynchronous implementation permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Asynchronous Implementation</h2> +<p>Tezos Context pruning is performed periodically. We want each round of +context pruning to take place asynchronously with minimal impact +on the main Tezos node. For this reason, when a commit is chosen as +the GC root, we fork a worker process to construct the next <code>prefix</code> +and <code>suffix</code> data structures. When the GC worker terminates, the <code>main</code> process +handles worker termination. It switches from the current +<code>prefix</code>+<code>suffix</code> to the next and continues operation. This +switch takes place almost instantaneously. 
The hard work is done in +the worker process as depicted next:</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/34188b669c6289bf79a8b2582e07c9b9/e4900/lob23OH.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 106.47058823529412%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/34188b669c6289bf79a8b2582e07c9b9/c5bb3/lob23OH.png" class="gatsby-resp-image-image" alt="lob23OH" title="lob23OH" srcset="/static/34188b669c6289bf79a8b2582e07c9b9/04472/lob23OH.png 170w, +/static/34188b669c6289bf79a8b2582e07c9b9/9f933/lob23OH.png 340w, +/static/34188b669c6289bf79a8b2582e07c9b9/c5bb3/lob23OH.png 680w, +/static/34188b669c6289bf79a8b2582e07c9b9/e4900/lob23OH.png 988w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p><strong>Read-only Tezos nodes:</strong> In addition to the main Tezos read/write +node that accesses the <code>pack</code> store, several read-only nodes also +access the <code>pack</code> store (and other Irmin data files) in read-only +mode. These must be synchronised when the switch is made from the +current <code>prefix</code>+<code>suffix</code> to the next <code>prefix</code>+<code>suffix</code>. 
This +synchronisation makes use of a (new) single control file.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#further-optimisations-in-the-octez-storage-layer" aria-label="further optimisations in the octez storage layer permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Further Optimisations in the Octez Storage Layer</h2> +<p>The context pruning via automatic garbage collection performs well and +within the required constraints. However, it is possible to make +further efficiency improvements. We next describe some potential +optimisations we plan to work on over the coming months.</p> +<p><strong>Resource-aware garbage collection:</strong></p> +<p>The GC worker intensively uses disk, memory, and OS resources. For +example, disk and memory usage doubles during the +asynchronous execution of the GC worker. We plan to improve on this by +using resources more intelligently. For example, computing the +reachable objects during the GC involves accessing earlier objects, +using a lot of random-access reads, with unpredictable latency. A more +resource-aware usage of the file system ensures that the objects are +visited (as much as possible) in order of increasing offset on +disk. This takes advantage of the fact that sequential file access is +much quicker and more predictable than accessing the file randomly. 
The work on +context pruning via a resource-aware garbage collection is planned to +be included in Octez v16.</p> +<p><strong>Retaining older objects for archive nodes:</strong></p> +<p>Archive nodes contain the complete blockchain history, starting from +the genesis block. This results in a huge store <code>pack</code> file, many +times larger than the kernel&rsquo;s page cache. Furthermore, live objects +are distributed throughout this huge file, which makes it difficult +for OS caching to work effectively. As a result, as the store becomes +larger, the archive node becomes slower.</p> +<p>In previous prototypes of the layered store, the design also included a +&quot;lower&quot; layer. For archive nodes, the lower layer contained all the +objects before the most recent <code>GC-commit</code>, whether they were reachable +or not. The lower layer was effectively the full segment of the +<code>pack</code> file before the GC commit root.</p> +<p>One possibility with the new layout introduced by the GC is to retain the +lower layer whilst still sticking with the <code>prefix</code> and <code>mapping</code> files +approach and preferentially reading from the <code>prefix</code> where +possible. The advantage (compared with just keeping the full <code>pack</code> +file) is that the <code>prefix</code> is a dense store of reachable objects, +improving OS disk caching and the snapshot export performance for +recent commits. In addition, the OS can preferentially cache the +<code>prefix</code>&amp;<code>mapping</code>, which enhances general archive node performance +compared with trying to cache the huge <code>pack</code> file. As baking +operations only need to access these cached objects, their performance +will be more reliable and thus will reduce endorsement misses +drastically. However, some uses of the archive node, such as +responding to RPC requests concerning arbitrary blocks, would still +access the lower layer, so they will not benefit from this +optimisation. 
The work on improving performance for archive nodes is +planned for Octez v17.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>With the context pruning feature integrated, Tezos rolling and full nodes +accurately maintain all and only used storage data in a performant, +compact, and efficient manner. Bakers will benefit from these changes in +Octez v15, while the feature will be included in archive nodes in +Octez v17.</p>https://tarides.com/blog/2022-11-10-towards-minimal-disk-usage-for-tezos-bakersTowards Minimal Disk-Usage for Tezos Bakers2022-11-10T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><a href="https://mirage.io/">MirageOS</a> is an OCaml ecosystem to construct <a href="https://en.wikipedia.org/wiki/Unikernel">unikernels</a>, i.e., minimal operating systems. Here, we write about our social and technical experience at the MirageOS retreat in Morocco, as well as the vibe and wonderful organisational details. To sum up the technical part, we worked on different facets of the MirageOS world: different kinds of unikernels, some groundwork for Raspberry Pi 4 bare-metal unikernels, and a workflow to leverage an existing deployment/orchestrating infrastructure. 
The MirageOS retreat was amazing!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#about-our-journey" aria-label="about our journey permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>About Our Journey</h2> +<p>Our journey started in Agadir, a Moroccan city right on the coast of the Atlantic sea, just south of the Atlas mountains. In Agadir, we had the best fish in the world (according to some) and amazing &quot;cornes de gazelle,&quot; a delicious sample of Moroccan culture.</p> +<p>From Agadir, we went to Mirleft, a small town further south, full of square roads and beautiful reefs. That's where the MirageOS retreat took place. 
The venue had a kitchen and an amazing cook, a place for computers and presentations, a garden with a small pool, and a rooftop with dusty but nice views of the coast.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/5e247e5529e5f5a231665a5c9da5512d/d2602/mirleft_view.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 75.29411764705883%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/5e247e5529e5f5a231665a5c9da5512d/7bf67/mirleft_view.jpg" class="gatsby-resp-image-image" alt="Beautiful Mirleft Sunset" title="Beautiful Mirleft Sunset" srcset="/static/5e247e5529e5f5a231665a5c9da5512d/651be/mirleft_view.jpg 170w, +/static/5e247e5529e5f5a231665a5c9da5512d/d30a3/mirleft_view.jpg 340w, +/static/5e247e5529e5f5a231665a5c9da5512d/7bf67/mirleft_view.jpg 680w, +/static/5e247e5529e5f5a231665a5c9da5512d/990cb/mirleft_view.jpg 1020w, +/static/5e247e5529e5f5a231665a5c9da5512d/c44b8/mirleft_view.jpg 1360w, +/static/5e247e5529e5f5a231665a5c9da5512d/d2602/mirleft_view.jpg 4032w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#about-the-retreat" aria-label="about the retreat permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 
2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>About the Retreat</h2> +<p>Both the venue and Mirleft as a whole were extremely inspiring in many ways. Hacking on MirageOS was, of course, the main reason we came, but we also enjoyed amazing food, saw old and new friends, and had a great time collaborating and creating with MirageOS.</p> +<p>At least once a year since the <a href="https://mirage.io/blog/2016-spring-hackathon">first MirageOS retreat in 2016</a> (with a Covid break in 2021), people get together and work on anything related to MirageOS. These retreats provide a great atmosphere, working environment, and everything else that's needed to be productive and to have a wonderful time.</p> +<p>Besides, the retreat is always a nice opportunity to <a href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food">eat our own dog food</a>.</p> +<p>The organiser, <a href="https://twitter.com/h4nnes">Hannes</a> (among others), always makes sure that as much as possible of the infrastructure we rely on is running on MirageOS. 
A welcome addition this year was a local <a href="https://hannes.robur.coop/Posts/OpamMirror">opam cache</a>, which allowed us to download and install packages without crushing the data allowance on the SIM card installed on our main access point.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/70b43054abaf1459bd1fdf211865935c/3acf0/Mirleft_venue.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 75.29411764705883%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/70b43054abaf1459bd1fdf211865935c/7bf67/Mirleft_venue.jpg" class="gatsby-resp-image-image" alt="Lunch at the MirageOS Retreat" title="Lunch at the MirageOS Retreat" srcset="/static/70b43054abaf1459bd1fdf211865935c/651be/Mirleft_venue.jpg 170w, +/static/70b43054abaf1459bd1fdf211865935c/d30a3/Mirleft_venue.jpg 340w, +/static/70b43054abaf1459bd1fdf211865935c/7bf67/Mirleft_venue.jpg 680w, +/static/70b43054abaf1459bd1fdf211865935c/990cb/Mirleft_venue.jpg 1020w, +/static/70b43054abaf1459bd1fdf211865935c/c44b8/Mirleft_venue.jpg 1360w, +/static/70b43054abaf1459bd1fdf211865935c/3acf0/Mirleft_venue.jpg 2000w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#about-mirageos" aria-label="about mirageos permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 
1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>About MirageOS</h2> +<p><a href="https://mirage.io/">MirageOS</a> is an ecosystem for constructing unikernels. In a nutshell, a <a href="https://en.wikipedia.org/wiki/Unikernel">unikernel</a> is a machine image that contains one process and the minimal set of operating system features that process requires. Unikernels are designed to be secure, efficient, and small. MirageOS unikernels are written in OCaml, a functional, semantically rich, and type-safe programming language.</p> +<p>MirageOS can be used in a wide range of settings, such as robust reimplementations of core system services and protocols (<a href="https://github.com/mirage/ocaml-dns">DNS</a>, <a href="https://github.com/mirage/awa-ssh">SSH</a>, <a href="https://github.com/mirleft/ocaml-tls">TLS</a>, and <a href="https://github.com/mirage/">many more</a>), as well as higher-level applications like <a href="https://hannes.robur.coop/Posts/OpamMirror">web services</a>. It's also on its way to becoming a good candidate for bare-metal applications on various chipsets (e.g., a <a href="https://github.com/dinosaure/gilbraltar">good choice for the Raspberry Pi 4</a>. 
See also the section below on <em>Implementing a Jack Port Driver</em>)</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#our-projects--what-we-learned" aria-label="our projects what we learned permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Our Projects &amp; What We Learned</h2> +<p>We worked on lots of interesting things, but let's start with the ones that directly relate to MirageOS.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#deploying-albatross-on-nixos-no-more-iptables-debugging" aria-label="deploying albatross on nixos no more iptables debugging permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Deploying Albatross on Nixos: No More iptables Debugging</h3> +<p><a href="https://github.com/roburio/albatross">Albatross</a> is an orchestrator for MirageOS unikernels. It runs on a Linux system and manages unikernels using Solo5. 
It's made of several services, one of which is the remote TLS endpoint, which accepts requests from the network to manage the orchestrator.</p> +<p>Some of us wanted to run Albatross on our favourite Linux distribution, <a href="https://github.com/NixOS/nixpkgs">NixOS</a>, and we hoped to be able to hack around this quickly; however, it turned out to be harder than expected. We learned so much about systemd and networking while doing this project.</p> +<p>A Nix flake (a new way of defining packages, which comes with many rough edges) and a NixOS module are added to the main repository in <a href="https://github.com/roburio/albatross/pull/120">this PR</a>. +To test that it works and to play with it, we've written a <a href="https://github.com/Julow/albatross-nixos-example">small tutorial</a> that explains how to build a Qemu VM with Albatross and how to deploy a unikernel using the remote TLS endpoint.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#coffee-chat-bot-a-friendly-unikernel-for-a-friendly-work-environment" aria-label="coffee chat bot a friendly unikernel for a friendly work environment permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Coffee Chat Bot: a Friendly Unikernel for a Friendly Work Environment</h3> +<p>Some of us worked on deploying a coffee chat bot as a MirageOS unikernel. Contrary to how it sounds, it isn't a robot that serves coffee (which would be extremely awesome)! 
Instead, it's a Slack bot that lets people on our company's Slack channel opt in to a coffee chat with a colleague. The coffee chat bot then randomly matches each person who opted in with another.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 635px; "> + <a href="https://tarides.com/static/22a41f031c11471c02727254a9ce489f/1ddef/mirleft_coffeebot.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 19.411764705882355%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/22a41f031c11471c02727254a9ce489f/1ddef/mirleft_coffeebot.png" class="gatsby-resp-image-image" alt="Coffee Chat Bot" title="Coffee Chat Bot" srcset="/static/22a41f031c11471c02727254a9ce489f/04472/mirleft_coffeebot.png 170w, +/static/22a41f031c11471c02727254a9ce489f/9f933/mirleft_coffeebot.png 340w, +/static/22a41f031c11471c02727254a9ce489f/1ddef/mirleft_coffeebot.png 635w" sizes="(max-width: 635px) 100vw, 635px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>This bot was already written in OCaml, and it's merely a single process. By its nature, it doesn't need any complicated operating system features, so a natural question came to us: why not make a unikernel out of it?</p> +<p>So <a href="https://github.com/pitag-ha/slack_bot">we did</a>.</p> +<p>Making a unikernel out of a relatively simple application sounds rather straightforward. The first step was to get rid of all Unix operations. It's incredible how many small Unix calls we were doing without even noticing. 
For example, we were using <code>Unix.time</code> all over the place: for scheduling, for seeding the random library, and for timestamping our database entries.</p> +<p>The database posed another problem. We had been using <code>irmin-unix</code>, which writes to disk using Unix. To fix that, we now use <code>irmin-mem</code>, which writes to memory. We persist (and inspect) the data by syncing our in-memory database with a GitHub repository. If you're not familiar with Irmin (a MirageOS library), its design follows the principles of Git, and it provides a library called <code>irmin-git</code> to bridge the two.</p> +<p>Providing the network stack needed for the Git communication (and also for the Slack API) is one of the typical tasks the operating system needs to do. In our case, that's MirageOS. It has a concept called &quot;devices,&quot; which are the operating system features your unikernel might need. Examples of &quot;devices&quot; are network interfaces, network stacks, filesystems (which we didn't need), and monotonic time sources. MirageOS will provide a concrete implementation of such a device at your unikernel's compile time, as long as you declare the device in the MirageOS configuration file <code>config.ml</code>.</p> +<p>The things described here were just a small part of our nice, educational journey making a coffee chat unikernel. One more detail that's worth mentioning: the bot now uses <a href="https://github.com/dinosaure/paf-le-chien"><code>httpaf</code></a> for the Slack API interactions. Before, it was using <code>cohttp</code>, which is already independent from Unix (unlike, for example, the OCaml <code>curl</code> wrapper <code>curly</code>). 
Porting it to <code>httpaf</code> wasn't technically necessary, but it was a great way to get to know and test the latest &quot;cutting-edge&quot; unikernel features.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#implementing-a-jack-port-driver-or-how-to-make-a-unikernel-sing-bare-metal" aria-label="implementing a jack port driver or how to make a unikernel sing bare metal permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Implementing a Jack Port Driver, or How to Make a Unikernel Sing Bare-Metal</h3> +<p>We also went bare-metal during the retreat. &quot;Bare-metal&quot; sounds cool, doesn't it? Let us explain what we really mean by it. Often, the way to run a MirageOS unikernel is as follows:</p> +<ul> +<li>You have a Linux kernel on your machine and virtualize it via a hypervisor such as KVM.</li> +<li>That hypervisor is then abstracted further by a tool called Solo5 which integrates well with MirageOS unikernels.</li> +</ul> +<p>With this workflow, the communication between the unikernel and the hardware goes over several layers of abstraction. A &quot;bare-metal&quot; unikernel, on the contrary, communicates with the hardware directly, without any interfacing kernel such as Linux. 
The device we chose to do bare-metal work on is the Raspberry Pi 4 (RPi4).</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/3bde74c117dc40ffd803f1a8e763ada3/3acf0/Mirleft_rpi4.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 66.47058823529413%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/3bde74c117dc40ffd803f1a8e763ada3/7bf67/Mirleft_rpi4.jpg" class="gatsby-resp-image-image" alt="Work set-up for RPi4 hacking" title="Work set-up for RPi4 hacking" srcset="/static/3bde74c117dc40ffd803f1a8e763ada3/651be/Mirleft_rpi4.jpg 170w, +/static/3bde74c117dc40ffd803f1a8e763ada3/d30a3/Mirleft_rpi4.jpg 340w, +/static/3bde74c117dc40ffd803f1a8e763ada3/7bf67/Mirleft_rpi4.jpg 680w, +/static/3bde74c117dc40ffd803f1a8e763ada3/990cb/Mirleft_rpi4.jpg 1020w, +/static/3bde74c117dc40ffd803f1a8e763ada3/c44b8/Mirleft_rpi4.jpg 1360w, +/static/3bde74c117dc40ffd803f1a8e763ada3/3acf0/Mirleft_rpi4.jpg 2000w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>So we needed an RPi4 bare-metal OCaml runtime. Luckily, Dinosaure wrote one last year: <a href="https://github.com/dinosaure/gilbraltar">Gilbraltar</a>. It also dumps the text of OCaml print statements into the UART, which is a technical way of saying that we can send such text over USB (concretely over a USB to serial TTL cable) and see it. 
Quite useful for debugging!</p> +<p>As you can see, doing bare-metal work is quite restrictive and everything that tends to be taken for granted needs to be implemented, like drivers, for example.</p> +<p>So that's what we decided to do.</p> +<p>Last year, some colleagues already implemented a driver for LED strips and powered our office's <a href="https://twitter.com/Dinoosaure/status/1471128595154231300">Christmas tree</a> with a bare-metal OCaml RPi4! What is cooler than <a href="https://tarides.com/blog/2021-11-11-mirageos-workshop-working-with-the-raspberry-pi-4">making our bare-metal RPi4 Christmas tree sing</a>? Well, a lot of things are. Anyways, we love music, so we decided to implement a jack port driver.</p> +<p>Jack port drivers on a digital device are an interesting concept. Digital devices are digital, but jack ports expect analog data. One way the RPi4 can handle that is via a concept called PWM: Pulse Width Modulation. The PWM modulates analog signals (i.e., values between 0 and 1) by sending digital signals (i.e., either 0 or 1) really fast.</p> +<p>That modulation is done on the hardware side of the RPi4, concretely on a RPi4 <em>peripheral</em> also called PWM. <a href="https://datasheets.raspberrypi.com/bcm2711/bcm2711-peripherals.pdf">Peripherals</a> are RPi4 hardware devices that are mapped to specific address ranges in the RPi4's memory. You communicate with them by writing to or reading from those locations in memory. The address range of each peripheral is structured into registers. 
One example of a register of the PWM is the PWM FIFO, i.e., the hardware queue that stores the data flowing from the program to the jack port.</p> +<p><a href="https://github.com/pitag-ha/rpi/blob/jack-port-driver-on-interrupts/src/peripherals/pwm.ml">Our jack port driver</a> does two things--both by writing to and reading from the right places in the PWM memory range.</p> +<ol> +<li>It can initialise the RPi4 for jack port communication (e.g., it sets the RPi4's clock to the correct frequency at which the port reads data from the FIFO, and it configures the correct modes to ensure the right data flow).</li> +<li>It can send music to the jack port (by writing data to the FIFO--without overflowing it).</li> +</ol> +<p>To use the new driver, we convert music into the right binary format by simply using <code>ffmpeg</code>. Then we <a href="https://github.com/pitag-ha/rpi/blob/jack-port-driver-on-interrupts/test/bare-metal/jack_port/main.ml">write a program</a> with that music embedded in memory using the MirageOS tool <a href="https://github.com/mirage/ocaml-crunch"><code>ocaml-crunch</code></a>. That program just calls the driver to do the rest and is compiled for the RPi4 target with <code>gilbraltar</code>.</p> +<p>This work is strongly related to MirageOS in three ways. First, the program playing music bare-metal on the RPi4 is a unikernel written in OCaml. Second, the program is compiled with <code>gilbraltar</code>, which forms part of the MirageOS ecosystem and whose design and implementation are based on core tools in the MirageOS ecosystem, such as Solo5 and <code>ocaml-solo5</code>. 
Third, by adding one layer of abstraction to the jack port driver, we can make it a MirageOS &quot;device,&quot; so one could use the driver while also leveraging other MirageOS features that work bare-metal on a RPi4.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#monitoring-mirageio-and-chasing-memory-leaks" aria-label="monitoring mirageio and chasing memory leaks permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Monitoring mirage.io and Chasing Memory Leaks</h3> +<p>One of the MirageOS goals is to be able to self-host our infrastructure. At the retreat, many tools we used were based on the MirageOS ecosystem: a DNS resolver (<a href="https://github.com/mirage/ocaml-dns">mirage/ocaml-dns</a>), an opam repository cache (<a href="https://git.robur.io/robur/opam-mirror">robur/opam-mirror</a>), and a portable file transfer application (<a href="https://github.com/dinosaure/bob">dinosaure/bob</a>). It's no surprise that the official website, <a href="https://mirage.io">mirage.io</a>, is a unikernel itself. However, in the past six months, we experienced two website crashes due to <code>Out_of_memory</code> exceptions. 
The unikernel is configured to run with 1GB of RAM, so that's a slow running memory leak that requires investigation.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/29c60ca463f3e469222eba820432f3a8/70ad2/mirleft_monitoring.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 69.41176470588235%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/29c60ca463f3e469222eba820432f3a8/c5bb3/mirleft_monitoring.png" class="gatsby-resp-image-image" alt="Monitoring mirage.io" title="Monitoring mirage.io" srcset="/static/29c60ca463f3e469222eba820432f3a8/04472/mirleft_monitoring.png 170w, +/static/29c60ca463f3e469222eba820432f3a8/9f933/mirleft_monitoring.png 340w, +/static/29c60ca463f3e469222eba820432f3a8/c5bb3/mirleft_monitoring.png 680w, +/static/29c60ca463f3e469222eba820432f3a8/b12f7/mirleft_monitoring.png 1020w, +/static/29c60ca463f3e469222eba820432f3a8/b5a09/mirleft_monitoring.png 1360w, +/static/29c60ca463f3e469222eba820432f3a8/70ad2/mirleft_monitoring.png 1665w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>The question is how to investigate such a leak.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#locally" aria-label="locally permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 
9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Locally?</h4> +<p>The initial attempt consisted in <em>tracing</em> memory allocations using <code>statmemprof</code> while bombarding the server with requests by using benchmarking tools such as ApacheBench (<code>ab</code>) or <code>wrk</code>. <code>statmemprof</code> is an implementation of <em>Statistical Memory Profiling</em> in the OCaml runtime. It enables sampling allocations at a fixed rate and tracing values until they are garbage collected. Using <a href="https://github.com/janestreet/memtrace_viewer">memtrace-viewer</a>, one can analyse the memory usage and see which values are still live when the program goes out of memory, for example. For a unikernel with network access, it's possible to add an endpoint to enable tracing on demand: <a href="https://github.com/roburio/memtrace-mirage">roburio/memtrace-mirage</a>.</p> +<p>Unfortunately, this setup didn't help us identify the leak. Indeed, we can still expect the server to work fine under normal conditions. 
Somehow we need to understand which rare event, leaking a small bit of memory at a time, is happening enough times to consume all available memory.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#monitoring-the-live-unikernel" aria-label="monitoring the live unikernel permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Monitoring the Live Unikernel</h4> +<p>Only the <em>Real&trade;</em> Internet would tell us the answer, so we monitored the live unikernel application. <a href="https://github.com/roburio/mirage-monitoring">roburio/mirage-monitoring</a> was of great help, as it enables two things:</p> +<ul> +<li>Reporting application-wide metrics to an InfluxDB endpoint, which can be displayed using Grafana</li> +<li>Changing log levels and metrics sources at runtime</li> +</ul> +<p>Adding <code>mirage-monitoring</code> to a unikernel was surprisingly easy. It was only a matter of updating the configuration file with some functoria voodoo: <a href="https://github.com/mirage/mirage-www/pull/767">https://github.com/mirage/mirage-www/pull/767</a>. At some point, it will be upstreamed into the <code>mirage</code> tool so that adding monitoring is a single-line job. The hard part was providing the unikernel with two network stacks, to expose one to the internet while keeping the other for internal use only.</p> +<p>Next, we set up a typical Grafana deployment using InfluxDB/Telegraf for the metrics input and data storage. 
Logs were displayed using <code>albatross-client-local</code>.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#chasing-the-leak" aria-label="chasing the leak permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Chasing the Leak</h4> +<p>Now we could see the numbers for the live website: memory usage, of course, but also other metrics included by default, such as the number of established connections in the TCP stack. There we found the source of the leak. Throughout the day, the number of established TCP connections kept increasing.</p> +<p>Finally, at runtime, we temporarily changed the TCP stack's log level to <em>debug</em>, monitored the logs, and waited for the moment when the number of established TCP connections increased without decreasing afterwards. These logs described what was going on in the TCP stack at the exact moment the connection leak happened. 
At this point, we figured out that it occurred when a client connected to the server but failed to perform the TLS handshake, so the server dropped the connection without <em>closing</em> it--hence leaking it <em>forever</em>.</p> +<p>Here we go: <a href="https://github.com/dinosaure/paf-le-chien/pull/72"><em>one less leak</em></a>.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#next-steps" aria-label="next steps permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Next Steps</h4> +<p>Matching <em>logs</em> and <em>metrics</em> to inspect them together has proven to be very useful. 
We used Grafana for metrics, so the next step would be to also provide logs because Grafana supports structured logging through the <a href="https://grafana.com/oss/loki/">Loki</a> logs aggregation system.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#miragehole---a-unikernel-dns-resolver-with-holes" aria-label="miragehole a unikernel dns resolver with holes permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>MirageHole - A Unikernel DNS Resolver with Holes</h3> +<p>One way to stop web trackers, advertisements, and malware is to block access to sites known to contain such things. A popular approach is through browser extensions like <a href="https://en.wikipedia.org/wiki/AdBlock">AdBlock</a> and <a href="https://en.wikipedia.org/wiki/Privacy_Badger">Privacy Badger</a>. Another approach known as <a href="https://en.wikipedia.org/wiki/DNS_sinkhole">a DNS sinkhole</a> involves installing a local DNS server that resolves bad domains to an invalid IP address. This approach has the advantage of working across different operating systems, browsers, and devices (laptops, smartphones, smart-TVs, etc.). For an added bonus, it can also save network bandwidth.</p> +<p>Another project initiated during this year's retreat was to implement <a href="https://github.com/jmid/mirage-hole">Mirage-hole</a>: a DNS sinkhole running as a Mirage Unikernel. It was inspired by <a href="https://en.wikipedia.org/wiki/Pi-hole">Pi-hole</a> for the Raspberry Pi. 
Starting from a DNS-stub example from <a href="https://github.com/roburio/dnsvizor">dnsvizor</a> (and after a bit of network debugging), we got a unikernel running that would block a single selected domain. We then extended this to fetch and parse <a href="https://github.com/blocklistproject/Lists">a blocklist</a> at start-up. Next, we worked on integrating a little webserver to serve statistics about the requested and blocked domains. Overall, the project was a nice opportunity to talk to and learn from several MirageOS contributors, and it served as a nice tour-de-force of several MirageOS networking libraries.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#tarides-map---serving-the-tarides-geographical-distribution-in-a-unikernel" aria-label="tarides map serving the tarides geographical distribution in a unikernel permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Tarides Map - Serving the Tarides Geographical Distribution in a Unikernel</h3> +<p>Tarides Map is a project intended to show the geographic distribution of all Tarides collaborators as a website. At the retreat, we explored deploying the site in a unikernel. To do this, we had to decide how to serve the files on the server and integrate it into a unikernel. 
We had two options: <code>ocaml-crunch</code> or Docteur.</p> +<p>We initially chose <a href="https://github.com/dinosaure/docteur">Docteur</a>, inspired by a different project called <a href="https://github.com/dinosaure/pasteur">Pasteur</a>, which uses Docteur and is deployed in a unikernel as a static site, exactly what we were aiming to do with Tarides Map. However, integrating Docteur into the project proved to be more difficult than we had expected. One reason was that Solo5 isn't currently supported on macOS, the operating system used to write the project at the retreat. After compiling to Unix instead and spending numerous hours debugging, we were eventually able to generate the disk image; however, we still had issues deploying it in a unikernel, so we decided to try using <a href="https://github.com/mirage/ocaml-crunch">ocaml-crunch</a> instead.</p> +<p>Using <code>ocaml-crunch</code> proved to be a more straightforward option. We merely had to move some files around so that the directory structure could be turned into a standalone OCaml module that serves the file contents without requiring an external filesystem to be present. 
After doing this, we were able to deploy the site successfully <a href="https://github.com/SaySayo/tarides_map_static_website">here</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#what-we-dreamed-about" aria-label="what we dreamed about permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What We Dreamed About</h2> +<p>Another very interesting part of the retreat was the <em>dreaming sessions</em> organized by Hannes. The central idea behind this exercise was to allow ourselves to dream about how we envision the MirageOS project in the future, no matter how intangible and seemingly unrealistic. We talked about those dreams in two sessions.</p> +<p>The initial session revolved around gathering these dreams and ideas, without discussing how to achieve them, and letting our minds go free with what we wanted to accomplish with MirageOS. Oftentimes, those dreams would be shared with other participants. Some dreamed about replacing their whole software infrastructure with MirageOS, if not their main operating system! Others dreamed of artistic applications for Mirage, like using it as a backbone for musical endeavors.</p> +<p>The subsequent session revolved around <em>how</em> we could reach those dreams. This facilitated a more practical discussion around the challenges we may face along the way. Interestingly enough, in some instances, it turned out some dreams were either already achieved (like reverse-debugging Solo5!) 
or were close to being achievable.</p> +<p>A beautiful example of the attendees' dedication is that it did not take long for some to start working on projects like MirageOS-OS, a hypervisor for MirageOS unikernels and written with MirageOS, or to successfully implement a jack port driver for the Raspberry Pi 4, bringing us closer to MirageOS powered synthesisers and to MirageOS midi interfaces!</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/45ef29c8b49ef9d490e6ccdeec689ed5/3acf0/Mirleft_reefs.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 75.29411764705883%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/45ef29c8b49ef9d490e6ccdeec689ed5/7bf67/Mirleft_reefs.jpg" class="gatsby-resp-image-image" alt="Mirleft's Beautiful Reefs" title="Mirleft's Beautiful Reefs" srcset="/static/45ef29c8b49ef9d490e6ccdeec689ed5/651be/Mirleft_reefs.jpg 170w, +/static/45ef29c8b49ef9d490e6ccdeec689ed5/d30a3/Mirleft_reefs.jpg 340w, +/static/45ef29c8b49ef9d490e6ccdeec689ed5/7bf67/Mirleft_reefs.jpg 680w, +/static/45ef29c8b49ef9d490e6ccdeec689ed5/990cb/Mirleft_reefs.jpg 1020w, +/static/45ef29c8b49ef9d490e6ccdeec689ed5/c44b8/Mirleft_reefs.jpg 1360w, +/static/45ef29c8b49ef9d490e6ccdeec689ed5/3acf0/Mirleft_reefs.jpg 2000w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#inventing-ocamlwave-serenading-cats-and-christening-dogs" aria-label="inventing ocamlwave serenading cats and christening dogs permalink" class="anchor 
before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Inventing OCamlwave, Serenading Cats, and Christening Dogs</h2> +<p>As mentioned above, the retreat was extremely inspiring, even with respect to topics less related to MirageOS than the ones mentioned here. The one we're most proud of is <a href="https://www.youtube.com/playlist?list=PLmaiK3-DyqMy3kNjdHIPUEo-Gkltha3mT">our Mirleft MirageOS EP</a>, which contains five tracks (<em>five</em> in the spirit of Solo<em>5</em> and OCaml <em>5.0</em>, of course). Its genre might best be described as <em>OCamlwave</em>! On our EP, you will find many musical oddities ranging from a drum solo recorded on the premises (with glasses, clothes racks, and flip-flops) to a cat-powered cover of <em>Mr Sandman</em> (as an homage to our time singing to Morocco's many, <em>many</em> cute cats) 
to the occasional dramatic rendition of controversial pull requests on the OCaml compiler.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> +      <a href="https://tarides.com/static/c9cafec3abe45e91bbbaec46eb95c169/f8beb/mirleft_cats.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> +    <span class="gatsby-resp-image-background-image" style="padding-bottom: 56.470588235294116%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> +  <img src="https://tarides.com/static/c9cafec3abe45e91bbbaec46eb95c169/7bf67/mirleft_cats.jpg" class="gatsby-resp-image-image" alt="Cute street kittens in Mirleft" title="Cute street kittens in Mirleft" srcset="/static/c9cafec3abe45e91bbbaec46eb95c169/651be/mirleft_cats.jpg 170w, +/static/c9cafec3abe45e91bbbaec46eb95c169/d30a3/mirleft_cats.jpg 340w, +/static/c9cafec3abe45e91bbbaec46eb95c169/7bf67/mirleft_cats.jpg 680w, +/static/c9cafec3abe45e91bbbaec46eb95c169/990cb/mirleft_cats.jpg 1020w, +/static/c9cafec3abe45e91bbbaec46eb95c169/c44b8/mirleft_cats.jpg 1360w, +/static/c9cafec3abe45e91bbbaec46eb95c169/f8beb/mirleft_cats.jpg 3421w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> +  </a> +    </span></p> +<p>However, not everything in Mirleft was about music and animals. Some things were also about the beautiful waves.</p> +<p>Mirleft is a paradise for surfing, both for beginners and advanced surfers! We went to a nice sandy beach with perfect conditions to get started with surfing. 
Advanced surfers would probably go to one of the reefs for surfing, which we, in turn, found amazing for a peaceful walk, sometimes with and sometimes without company from a street dog we christened <code>null</code>.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 401px; "> + <a href="https://tarides.com/static/49d62c1c5afdf2143a1efb9b0cb7bc1b/9144d/mirleft_dog.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 128.23529411764707%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/49d62c1c5afdf2143a1efb9b0cb7bc1b/9144d/mirleft_dog.png" class="gatsby-resp-image-image" alt="Sweet null, our new friend!" title="Sweet null, our new friend!" srcset="/static/49d62c1c5afdf2143a1efb9b0cb7bc1b/04472/mirleft_dog.png 170w, +/static/49d62c1c5afdf2143a1efb9b0cb7bc1b/9f933/mirleft_dog.png 340w, +/static/49d62c1c5afdf2143a1efb9b0cb7bc1b/9144d/mirleft_dog.png 401w" sizes="(max-width: 401px) 100vw, 401px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>And, well, talking about <code>null</code> (apart from naming street animals), we also had plenty of other computer science related conversations at the retreat. All of them were extremely enriching! 
A couple of examples include exception backtracing in Lwt programs and BGP intrinsics.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#thanks-for-all-the-fish" aria-label="thanks for all the fish permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Thanks for All the Fish!</h2> +<p>As this lengthy report can attest, our experience was an amazing one for all. The MirageOS Hack Retreats are always an otherworldly space, where amazing individuals gather to exchange thoughts and create new (and better) software. Friends are made along the way, some bugs are fixed, new ones are found, and great new ideas emerge.</p> +<p>This very special sense of community is rare, so we would like to thank everyone who organized, attended, and tended to the event. Thank you to our delightful hosts, who've been with us since the first retreat in 2016! Thank you as well to <a href="https://robur.io">Hannes and Robur</a> for organizing those retreats and spending time instilling the same inspiration in the great project that is MirageOS! 
Finally, thank you to old and new friends, as well as old and new MirageOS hackers, for this amazing week of happy banter and hacking!</p> +<p>PS: Some of the pics in this post were shared among us via <a href="https://github.com/dinosaure/bob">bob</a>, a MirageOS unikernel for sharing files.</p>https://tarides.com/blog/2022-10-28-the-mirageos-retreat-a-journey-of-food-cats-and-unikernelsThe MirageOS Retreat: A Journey of Food, Cats, and Unikernels2022-10-28T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#into-the-fire" aria-label="into the fire permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Into the Fire</h2> +<p>The OCaml ecosystem relies on various resources and infrastructure, such as <a href="https://ocaml.org">ocaml.org</a>, <a href="https://hub.docker.com/r/ocaml/opam">OCaml Docker images</a>, and <a href="http://check.ocamllabs.io/">opam-repo-ci</a>, which are built and deployed using <a href="https://www.ocurrent.org">OCurrent</a>. OCurrent is a library to express workflows and keep things up to date. As many of these projects are created using the same technology, it made sense to centralise the documentation, which was spread throughout the various repositories. This post is about how we used OCurrent itself to solve this problem. 
We think it might also demonstrate how you can use OCurrent to automate some of yours!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#cant-keep-my-eyes-off-you" aria-label="cant keep my eyes off you permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Can't Keep My Eyes Off You</h2> +<p>Before digging into the logic, it's essential to thoroughly define the problems in the documentation. The first problem was that the documentation lives in many GitHub repositories. Indeed, to make sure we update it whenever we modify the associated code, we keep the documentation closest to the code. 
The result is a repository organisation like this:</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 401px; "> +      <a href="https://tarides.com/static/13de7358e2149b6d1a660a4452c279bc/9144d/tracker.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> +    <span class="gatsby-resp-image-background-image" style="padding-bottom: 140.58823529411765%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> +  <img src="https://tarides.com/static/13de7358e2149b6d1a660a4452c279bc/9144d/tracker.png" class="gatsby-resp-image-image" alt="Tracking" title="Tracking" srcset="/static/13de7358e2149b6d1a660a4452c279bc/04472/tracker.png 170w, +/static/13de7358e2149b6d1a660a4452c279bc/9f933/tracker.png 340w, +/static/13de7358e2149b6d1a660a4452c279bc/9144d/tracker.png 401w" sizes="(max-width: 401px) 100vw, 401px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> +  </a> +    </span></p> +<p>It isn't wise to rely on humans to monitor changes in all these repositories. As each repository changes on its own schedule, we can't expect maintainers to backport documentation changes to the <code>ocurrent.org</code> website for each modification. To put it more technically, we need to track many files and keep them up to date. These updates should also happen incrementally, which matches OCurrent nicely.</p> +<p>In addition, this documentation needs to stay up to date. Even if we centralise the documentation automatically, we must rebuild it regularly and fetch the changes from the repositories we track. Otherwise, the documentation will quickly become outdated. This is the opposite of what we want.</p> +<p>Furthermore, the system has to scale and be updated easily. 
Indeed, we would like to have the possibility to introduce new documents and repositories without having to install more applications. For instance, it would be beneficial to simply make a pull request somewhere.</p> +<p>In the next section, we will focus on the OCurrent pipeline design, which will automate our tasks and solve these problems.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#here-i-dreamt-i-was-an-architect" aria-label="here i dreamt i was an architect permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Here I Dreamt I Was an Architect</h2> +<p>The project is composed of several blocks we want to write to achieve our work:</p> +<ul> +<li>Fetch files from GitHub</li> +<li>Rebuild a subset of the code</li> +<li>Make the system modular</li> +<li>Store and deploy the data easily</li> +</ul> +<p>One aspect that will make our work a bit easier is to have it all concentrated in the same place, GitHub. As OCurrent provides a plugin to fetch information from GitHub, <code>current_github</code>, we don't have to worry about it. Furthermore, everything is cached thanks to OCurrent itself. We don't have to care about the incremental build. The only requirement is wisely choosing the data we want to cache.</p> +<p>Our architecture uses a <code>trackers.yml</code> file describing how the pipeline should interact with our heterogeneous repositories. It describes the files we want to track and where we would like them in the final website structure. 
The configuration gives a way to achieve modularity at a low cost, as we only have to open a PR on the repository that contains the tracker file to update it. Additionally, it allows us to quickly track the repositories we want. Once a repository is tracked, we don't have to worry about the monitoring, as OCurrent can be set to rebuild things at a regular interval. In our case, we want to check every week whether the code has changed. In the present version of <code>trackers.yml</code>, we can specify the files we want to copy and the indexes we want to create to build our structure. This file is stored in the repository on which the <code>GitHub App</code> is installed.</p> +<p>Another critical component in the architecture handles new files from the remote repository and integrates them into the website structure. This element is in charge of moving the files from one part of the system to another. Moreover, it has to ensure the paths are consistent and fail if they are not.</p> +<p>The last item must push the code to a specific Git repository because we decided to use <code>GitHub Pages</code> to store the website. 
To avoid issues with account management, it needs <code>ssh</code> to get access to a specific repository.</p> +<p>In the end, the pipeline design would look like this: +<span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 379px; "> + <a href="https://tarides.com/static/6f25d575c0b498df6b0b07fe910849cc/811d1/pipeline.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 128.82352941176472%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/6f25d575c0b498df6b0b07fe910849cc/811d1/pipeline.png" class="gatsby-resp-image-image" alt="Pipeline" title="Pipeline" srcset="/static/6f25d575c0b498df6b0b07fe910849cc/04472/pipeline.png 170w, +/static/6f25d575c0b498df6b0b07fe910849cc/9f933/pipeline.png 340w, +/static/6f25d575c0b498df6b0b07fe910849cc/811d1/pipeline.png 379w" sizes="(max-width: 379px) 100vw, 379px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>Now that we have our workflow let's see how it is implemented in practice!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#this-is-how-we-do-it" aria-label="this is how we do it permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>This is How We Do 
It</h2> +<p>In this section, we will focus on the way to implement this infrastructure. We won't view all the elements in detail, but we will try to concentrate on the most important ones, like how to create a custom <code>ocurrent</code> component and chain them together to build a pipeline.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#current_github" aria-label="current_github permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><code>current_github</code></h3> +<p>Let's focus on a standard structure in an OCurrent project: the way to get the HEAD of a branch on GitHub and fetch the commit with Git. 
In the code below, we ask GitHub for the HEAD commit of the default branch, extract its commit ID, and finally fetch the content with Git, which returns the corresponding commit:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> fetch_commit <span class="token label property">~github</span> <span class="token label property">~repo</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> +  <span class="token keyword">let</span> head <span class="token operator">=</span> Current_github<span class="token punctuation">.</span>Api<span class="token punctuation">.</span>head_commit github repo <span class="token keyword">in</span> +  <span class="token keyword">let</span> commit_id <span class="token operator">=</span> +    Current<span class="token punctuation">.</span>map Current_github<span class="token punctuation">.</span>Api<span class="token punctuation">.</span>Commit<span class="token punctuation">.</span>id head +  <span class="token keyword">in</span> +  <span class="token keyword">let</span> commit <span class="token operator">=</span> +    Current_git<span class="token punctuation">.</span>fetch commit_id +  <span class="token keyword">in</span> +  commit + +<span class="token keyword">let</span> main <span class="token operator">=</span> +  <span class="token keyword">let</span> github <span class="token operator">=</span> <span class="token comment">(* GitHub App code *)</span> <span class="token keyword">in</span> +  <span class="token keyword">let</span> commit <span class="token operator">=</span> fetch_commit <span class="token label property">~github</span> <span class="token label property">~repo</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">in</span> +  <span class="token comment">(* Use the commit code 
*)</span></code></pre></div> +<p>The documentation of <code>current_github</code> and <code>current_git</code> is available <a href="https://www.ocurrent.org/ocurrent/">online</a>.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#fetching-the-files" aria-label="fetching the files permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Fetching the Files</h3> +<p>As we know how to extract data from GitHub, applying the process to various repositories will be easy. It can be noticed that the <code>commit</code> element is of type <code>Commit.t Current.t</code>. To work with <code>Current.t</code>, we need to &quot;unwrap&quot; the object with specific functions like <code>map</code> and <code>bind</code>. This post does not present how to load the content from a <code>Yaml</code> file. 
We assume that we get a <code>selection list Current.t</code>, where <code>selection</code> is defined as:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> selection <span class="token operator">=</span> <span class="token punctuation">{</span> + repo <span class="token punctuation">:</span> string<span class="token punctuation">;</span> + commit <span class="token punctuation">:</span> Current_git<span class="token punctuation">.</span>Commit<span class="token punctuation">.</span>t Current<span class="token punctuation">.</span>t<span class="token punctuation">;</span> + files <span class="token punctuation">:</span> <span class="token type-variable function">'a</span> list<span class="token punctuation">;</span> +<span class="token punctuation">}</span></code></pre></div> +<p>It contains the source repository, the commit associated with the specified branch, and the list of files to monitor from this repository.</p> +<p>To <code>git clone</code> the content, we must apply the <code>fetch_commit</code> function.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#copy-the-content" aria-label="copy the content permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Copy the Content</h3> +<p>In this subsection, we will see how we can define a custom component and how to make it interact with the rest of our code.</p> +<p>The component is in charge of fetching the content of the files from the 
source directory and storing it in memory. To trigger the action only when the content changes, we will define a <code>Current_cache</code> element. Thanks to OCurrent, the content is cached and only rebuilt on change or request.</p> +<p>It manipulates some <code>File.info</code> (source, destination, &hellip;) and produces a <code>File.t</code> when the content is read. <code>File.t</code> is simply a:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> File<span class="token punctuation">.</span>t <span class="token operator">=</span> <span class="token punctuation">{</span> + metadata<span class="token punctuation">:</span> File<span class="token punctuation">.</span>info<span class="token punctuation">;</span> + content<span class="token punctuation">:</span> string list<span class="token punctuation">;</span> +<span class="token punctuation">}</span></code></pre></div> +<p>Our file is represented as a <code>string list</code>, as we need to be able to add more information. We know the size of the files is limited, so it is not an issue for us. 
+The component is defined as a <code>Current_cache.BUILDER</code>, whose signature looks like this:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">module</span> <span class="token keyword">type</span> BUILDER <span class="token operator">=</span> <span class="token keyword">sig</span> + +<span class="token keyword">type</span> context + +<span class="token keyword">module</span> Key <span class="token punctuation">:</span> <span class="token keyword">sig</span> +  <span class="token keyword">type</span> t +  <span class="token keyword">val</span> digest <span class="token punctuation">:</span> t <span class="token operator">-&gt;</span> string +<span class="token keyword">end</span> + +<span class="token keyword">module</span> Value <span class="token punctuation">:</span> <span class="token keyword">sig</span> +  <span class="token keyword">type</span> t +  <span class="token keyword">val</span> marshal <span class="token punctuation">:</span> t <span class="token operator">-&gt;</span> string +  <span class="token keyword">val</span> unmarshal <span class="token punctuation">:</span> string <span class="token operator">-&gt;</span> t +<span class="token keyword">end</span> + +<span class="token keyword">val</span> build <span class="token punctuation">:</span> +  context <span class="token operator">-&gt;</span> +  Current<span class="token punctuation">.</span>Job<span class="token punctuation">.</span>t <span class="token operator">-&gt;</span> +  Key<span class="token punctuation">.</span>t <span class="token operator">-&gt;</span> +  Value<span class="token punctuation">.</span>t Current<span class="token punctuation">.</span>or_error Lwt<span class="token punctuation">.</span>t +<span class="token keyword">end</span></code></pre></div> +<p>As the <code>Value</code> and the <code>Key</code> modules only use functions to manipulate <code>JSON</code>, we can focus on the <code>build</code> function definition:</p> 
+<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> build files job <span class="token punctuation">{</span> Key<span class="token punctuation">.</span>commit<span class="token punctuation">;</span> Key<span class="token punctuation">.</span>repo<span class="token punctuation">;</span> <span class="token punctuation">_</span> <span class="token punctuation">}</span> <span class="token operator">=</span> +  Current<span class="token punctuation">.</span>Job<span class="token punctuation">.</span>start job <span class="token label property">~level</span><span class="token punctuation">:</span>Current<span class="token punctuation">.</span>Level<span class="token punctuation">.</span>Average <span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">-&gt;</span> +  Current_git<span class="token punctuation">.</span>with_checkout <span class="token label property">~job</span> commit <span class="token operator">@@</span> <span class="token keyword">fun</span> dir <span class="token operator">-&gt;</span> +  extract <span class="token label property">~job</span> <span class="token label property">~dir</span> repo files +  <span class="token operator">&gt;&gt;=</span> Lwt_result<span class="token punctuation">.</span>return</code></pre></div> +<p>It creates a temporary directory with the content fetched from Git. Then, it extracts the data as a <code>File.t</code> and returns the result. The interesting detail here is <code>Current_git.with_checkout fn</code>, which temporarily checks out our code into a directory on the system and passes that directory to <code>fn</code>. <code>Current.Job.start</code> is just some boilerplate code to start a job asynchronously.</p> +<p>Consequently, we can pass the builder to a functor to construct our cache system. 
Moreover, we create a function associated with it thanks to the newly created <code>Content</code> module:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">module</span> Content <span class="token operator">=</span> Current_cache<span class="token punctuation">.</span>Make <span class="token punctuation">(</span>Content<span class="token punctuation">)</span> + +<span class="token keyword">let</span> weekly <span class="token operator">=</span> Current_cache<span class="token punctuation">.</span>Schedule<span class="token punctuation">.</span>v <span class="token label property">~valid_for</span><span class="token punctuation">:</span><span class="token punctuation">(</span>Duration<span class="token punctuation">.</span>of_day <span class="token number">7</span><span class="token punctuation">)</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> + +<span class="token keyword">let</span> fetch <span class="token label property">~repo</span> <span class="token label property">~commit</span> files <span class="token operator">=</span> + Current<span class="token punctuation">.</span>component <span class="token string">&quot;fetch-doc&quot;</span> <span class="token operator">|&gt;</span> + <span class="token keyword">let</span><span class="token operator">&gt;</span> commit <span class="token operator">=</span> commit <span class="token keyword">in</span> + Content<span class="token punctuation">.</span>get <span class="token label property">~schedule</span><span class="token punctuation">:</span>weekly files + <span class="token punctuation">{</span>Content<span class="token punctuation">.</span>Key<span class="token punctuation">.</span>repo<span class="token punctuation">;</span> Content<span class="token punctuation">.</span>Key<span class="token punctuation">.</span>commit <span class="token punctuation">}</span></code></pre></div> +<p>We 
specify the date when the cache is invalidated to trigger the rebuild at least every week.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#build--deploy" aria-label="build deploy permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Build &amp; Deploy</h3> +<p>In this last subsection, we discuss how to write all the files stored in the cache to the right place in the filesystem. We use <code>hugo</code> to build the website and <code>git</code> with <code>ssh</code> to deploy it. As we expect the information to be cached, we build a <code>Current_cache</code> module again, where the <code>build</code> function is:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> build <span class="token punctuation">{</span> files<span class="token punctuation">;</span> indexes<span class="token punctuation">;</span> conf <span class="token punctuation">}</span> job <span class="token punctuation">{</span> Key<span class="token punctuation">.</span>commit<span class="token punctuation">;</span> <span class="token punctuation">_</span> <span class="token punctuation">}</span> <span class="token operator">=</span> + Current<span class="token punctuation">.</span>Job<span class="token punctuation">.</span>start job <span class="token label property">~level</span><span class="token punctuation">:</span>Current<span class="token punctuation">.</span>Level<span class="token punctuation">.</span>Average <span 
class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">-&gt;</span> + Current_git<span class="token punctuation">.</span>with_checkout <span class="token label property">~job</span> commit <span class="token operator">@@</span> <span class="token keyword">fun</span> dir <span class="token operator">-&gt;</span> + write_all job dir files indexes <span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">-&gt;</span> + Lwt_result<span class="token punctuation">.</span>bind <span class="token punctuation">(</span>hugo <span class="token label property">~cwd</span><span class="token punctuation">:</span>dir job<span class="token punctuation">)</span> <span class="token punctuation">(</span><span class="token keyword">fun</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">-&gt;</span> + <span class="token keyword">let</span> f cwd <span class="token operator">=</span> + <span class="token keyword">let</span> commit <span class="token operator">=</span> Current_git<span class="token punctuation">.</span>Commit<span class="token punctuation">.</span>hash commit <span class="token keyword">in</span> + deploy_over_git <span class="token label property">~cwd</span> <span class="token label property">~job</span> <span class="token label property">~conf</span> dir commit + <span class="token keyword">in</span> + Current<span class="token punctuation">.</span>Process<span class="token punctuation">.</span>with_tmpdir f<span class="token punctuation">)</span></code></pre></div> +<p>In this context, the pipeline creates an <code>indexes</code> file as <code>_index.md</code>. It's used by Hugo to build the directory structure. 
This function uses the same <code>Current_git.with_checkout</code> process to create a temporary directory containing the website's skeleton. The deployment itself is done in the <code>deploy_over_git</code> function, whose details are beyond the scope of this post. The component writes all the <code>File.t.content</code> to the destination specified in their metadata. Once we have successfully written them, we generate the website with <code>hugo --minify --output-dir=public/</code>. Last but not least, we copy the content of the <code>public</code> directory to a fresh temporary one, so we can add the files with a <code>git init</code> and push our work to GitHub. Finally, on the target repository, GitHub Pages will deploy the website.</p> +<p>And voila, our website is up-to-date and online!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#happy-together" aria-label="happy together permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Happy Together</h2> +<p>This blog post has described how we handle our distributed documentation and centralise it on our website. We have seen how to use some <code>Current_*</code> plugins and how to write our own. It was also an occasion to discuss various OCurrent structures.</p> +<p>If you are curious, you can check the code in the <a href="https://github.com/ocurrent/ocurrent.org">ocurrent/ocurrent.org</a> repository. Feel free to look at the <a href="https://ocurrent.org">ocurrent.org</a> website built with this pipeline. 
The description of the pipeline is also available in the <a href="https://github.com/ocurrent/ocurrent.org/tree/master/bin">bin</a> directory.</p>https://tarides.com/blog/2022-10-20-up-to-date-online-documentationUp-to-Date Online Documentation2022-10-20T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>This article describes the porting of the DHCP daemon <code>charrua-unix</code> and its companion library <code>rawlink</code> to <a href="https://github.com/ocaml-multicore/eio">Eio</a> for the upcoming OCaml 5 release. Before we get started, it makes sense to briefly describe what DHCP is and how we use it in production.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#what-is-dhcp" aria-label="what is dhcp permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What is DHCP?</h1> +<p>DHCP stands for Dynamic Host Configuration Protocol, and it's described in <a href="https://www.rfc-editor.org/rfc/rfc2131.txt">RFC2131</a>, <a href="https://www.rfc-editor.org/rfc/rfc2132.txt">RFC2132</a>, and others. It was first published in 1993, so it's fairly old, yet very much alive in virtually every network these days&mdash;from your home, to your office, to your ISP Wide Area Network.</p> +<p>When your computer, laptop, phone, or any IP-connected device boots up or changes network, it requests network parameters via broadcast. These parameters are requested and answered via the DHCP protocol. 
The common/minimum parameters a client requests are:</p> +<ul> +<li>An IPv4 address</li> +<li>An IPv4 gateway</li> +<li>The address of a DNS resolver</li> +</ul> +<p>This is enough to get connectivity in most networks. DHCP can also provide many extra parameters, but they are outside of the scope of this document.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#what-is-charrua-dhcp" aria-label="what is charrua dhcp permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What is <code>charrua-dhcp</code>?</h1> +<p><code>charrua-dhcp</code> is a DHCP library suite written in pure OCaml. You might not know it, but if you have ever used Docker Desktop, be it on Windows or macOS, you're a user of <code>charrua-dhcp</code> already!</p> +<p>In Docker Desktop, a complete Linux VM is run in the background in order to be able to run Docker containers. This VM needs to acquire network parameters from the host operating system, and this is done via <code>charrua-dhcp</code>. You can check more details on how OCaml and <code>charrua</code> are used to power Docker Desktop in <a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">this article</a>.</p> +<p><code>charrua-dhcp</code> is also the standard DHCP implementation used in <a href="https://mirageos.org/">MirageOS</a>, both as a server and as a client, and perhaps more importantly, it is used in high-profile, critical cases, like the home network of yours truly. 
It is a stable and tested library that has been in use for years, and it has also been fuzz-tested with <a href="https://github.com/stedolan/crowbar">Crowbar</a>. See more details in <a href="https://somerandomidiot.com/blog/2017/04/26/crowbar-dhcp/">this article</a> by Mindy Preston.</p> +<p><code>charrua-dhcp</code> is split into <code>charrua-core</code> and <code>charrua-unix</code>:</p> +<p><code>charrua-core</code> implements the DHCP server and client logic in pure OCaml, as well as providing serialisers and deserialisers for the protocol wire format. It also provides a textual configuration interface, like <a href="https://www.isc.org/dhcp/">ISC-DHCP</a> does.</p> +<p>When we say pure OCaml, we mean it! <code>charrua-core</code> is purely functional and doesn't produce anything via side-effects; therefore, it also does not perform any kind of I/O.</p> +<p><code>charrua-unix</code> implements the effectful bits, and it does I/O, feeding incoming packets to <code>charrua-core</code> and sending out replies given by <code>charrua-core</code>.</p> +<p>The idea is that <code>charrua-core</code> has the complex DHCP logic, while <code>charrua-unix</code> does the basic things: logging, sending/receiving packets, making sure the environment is secure, and so on.</p> +<p>The name <code>charrua</code> is a reference to the seminomadic tribe Charr&uacute;a from what is today Uruguay, Argentina, and southern Brazil. 
The rationale is that DHCP serves parameters to roaming (nomadic) clients.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#what-is-rawlink" aria-label="what is rawlink permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What is <code>rawlink</code>?</h1> +<p>DHCP is not an IP protocol. It sits above the Ethernet layer, which means a DHCP application must be able to craft and receive the full Ethernet packet, not just the layers above IP.</p> +<p>Each operating system provides a slightly different mechanism to accomplish this. Linux provides a special socket family called AF_PACKET, whereas BSDs (OpenBSD, FreeBSD, macOS...) and most other Unix systems provide the same via BPF.</p> +<p><code>rawlink</code> is an OCaml library with C stubs that abstracts these differences away. You get a link on a network interface, which you use to craft and receive full Ethernet packets, bypassing most of the operating system network stack. 
In other words, <code>rawlink</code> allows you to work with <code>raw</code> packets on an Ethernet <code>link</code>.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/caca7d8be3e9041f4b48945ca49762f5/ab98c/charrua.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 128.23529411764707%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/caca7d8be3e9041f4b48945ca49762f5/c5bb3/charrua.png" class="gatsby-resp-image-image" alt="charrua" title="charrua" srcset="/static/caca7d8be3e9041f4b48945ca49762f5/04472/charrua.png 170w, +/static/caca7d8be3e9041f4b48945ca49762f5/9f933/charrua.png 340w, +/static/caca7d8be3e9041f4b48945ca49762f5/c5bb3/charrua.png 680w, +/static/caca7d8be3e9041f4b48945ca49762f5/b12f7/charrua.png 1020w, +/static/caca7d8be3e9041f4b48945ca49762f5/b5a09/charrua.png 1360w, +/static/caca7d8be3e9041f4b48945ca49762f5/ab98c/charrua.png 2356w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#what-changes-in-ocaml-5" aria-label="what changes in ocaml 5 permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 
13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What Changes in OCaml 5?</h1> +<p>OCaml 5 provides two main new features:</p> +<ul> +<li>Parallelism</li> +<li>Effect handlers</li> +</ul> +<p>Parallelism makes little sense on a slow control protocol like DHCP, so we don't use it and it's not the focus of this article.</p> +<p>Effect handlers allow OCaml programmers to write non-blocking code as <em>if</em> it were blocking.</p> +<p>Until OCaml 5 and effect handlers, the common way to write non-blocking code was through <a href="https://github.com/ocsigen/lwt">Lwt</a>, a concurrent programming library for OCaml. Lwt provides a concurrent scheduler and a monadic style of writing programs through promises. With it, the program becomes a long string of binding promises.</p> +<p>One issue with Lwt is that it's very &quot;infectious,&quot; and as soon as you add the first Lwt promise (called &quot;thread&quot; in Lwt lingo), the whole code must now behave as a promise as well. Another issue is that the monadic programming is somewhat syntax-heavy, so it can clutter the code. Since the promises are allocations themselves, it can also negatively affect performance. Lwt is a great library, but with OCaml 5 and effects we can do better.</p> +<p>With OCaml 5 and effect handlers we can have the best of both worlds. We can write non-blocking code in a blocking style without the monadic clutter imposed by Lwt. 
The library we are proposing to replace Lwt in OCaml 5 is <a href="https://github.com/ocaml-multicore/eio">Eio</a>, which takes full advantage of the effect system, as well as providing a framework to express parallelism.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#lwt-vs-eio" aria-label="lwt vs eio permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Lwt vs. Eio</h2> +<p>This code snippet is the main function of <code>charrua-unix</code>, using Eio (left) and Lwt (right). We can summarize what is happening as follows:</p> +<p>1 - We read a packet from the network. +2 - We feed the packet to <code>charrua-core</code>, which then gives us a possible <code>Reply (reply, db)</code>, the packet to be sent out and the new DHCP database state, respectively. +3 - We send the reply out and loop for more packets.</p> +<p>It's fairly simple code, but it shows how much less cluttered the Eio version can be by removing all Lwt decorators. 
Another nice advantage is that if we were to write a blocking version of the same code, we would only need to change <code>Eio_rawlink.{read,write}_packet</code> to <code>Rawlink.{read,write}_packet</code> as their signatures remain the same, something impossible with Lwt.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/2a50353585ea9fc610d27062d645f482/9b29b/code_Eio.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 28.235294117647058%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/2a50353585ea9fc610d27062d645f482/c5bb3/code_Eio.png" class="gatsby-resp-image-image" alt="code Eio" title="code Eio" srcset="/static/2a50353585ea9fc610d27062d645f482/04472/code_Eio.png 170w, +/static/2a50353585ea9fc610d27062d645f482/9f933/code_Eio.png 340w, +/static/2a50353585ea9fc610d27062d645f482/c5bb3/code_Eio.png 680w, +/static/2a50353585ea9fc610d27062d645f482/b12f7/code_Eio.png 1020w, +/static/2a50353585ea9fc610d27062d645f482/b5a09/code_Eio.png 1360w, +/static/2a50353585ea9fc610d27062d645f482/9b29b/code_Eio.png 3840w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#rawlink-and-eio-switches" aria-label="rawlink and eio switches permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 
4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><code>rawlink</code> and Eio Switches</h2> +<p><code>rawlink</code> uses a file descriptor that Eio knows nothing about, so in order to use Eio with it, we attach an <code>Eio.Flow.t</code> to the file descriptor. An <code>Eio.Flow.t</code> is an Eio abstraction of a bidirectional <code>socket</code>. Although it was designed mostly with a <code>STREAM</code>-like <code>socket</code> in mind, its semantics fit the <code>rawlink</code> case. We do this in <code>Rawlink_eio.open_link</code>:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> open_link <span class="token operator">?</span>filter <span class="token operator">?</span><span class="token punctuation">(</span>promisc<span class="token operator">=</span><span class="token boolean">false</span><span class="token punctuation">)</span> ifname <span class="token label property">~sw</span> <span class="token operator">=</span> + <span class="token keyword">let</span> fd <span class="token operator">=</span> Rawlink_lowlevel<span class="token punctuation">.</span>opensock <span class="token operator">?</span>filter<span class="token punctuation">:</span>filter <span class="token label property">~promisc</span> ifname <span class="token keyword">in</span> + <span class="token keyword">let</span> flow <span class="token operator">=</span> Eio_unix<span class="token punctuation">.</span>FD<span class="token punctuation">.</span>as_socket <span class="token label property">~sw</span> <span class="token label property">~close_unix</span><span class="token punctuation">:</span><span class="token boolean">true</span> fd <span class="token keyword">in</span> + <span class="token 
punctuation">{</span> flow<span class="token punctuation">;</span> fd<span class="token punctuation">;</span> packets <span class="token operator">=</span> ref <span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">;</span> buffer <span class="token operator">=</span> <span class="token punctuation">(</span>Cstruct<span class="token punctuation">.</span>create <span class="token number">65536</span><span class="token punctuation">)</span> <span class="token punctuation">}</span></code></pre></div> +<p><code>Rawlink_lowlevel.opensock</code> is a call into the actual C stub that returns a BPF or AF_PACKET descriptor; we then create the <code>Eio.Flow.t</code> with <code>Eio_unix.FD.as_socket</code>.</p> +<p>Two things appear out of the ordinary in the flow creation call: the <code>sw (Eio.Switch.t)</code> and <code>close_unix</code> arguments. To make sense of them, we have to understand what an <code>Eio.Switch.t</code> is.</p> +<p>A long-standing issue with Lwt was &quot;how to make sure my file descriptors are not leaked if something goes wrong.&quot; Eio attempts to solve this by forcing each <code>Eio.Flow.t</code> to belong to an <code>Eio.Switch.t</code>. You can't create an <code>Eio.Flow.t</code> without giving it an <code>Eio.Switch.t</code>, so this is what the <code>Eio_unix.FD.as_socket</code> does. Since <code>Flows</code> are also attached to normal file descriptors, <code>Eio.Switch.t</code> also takes care of them.</p> +<p>An Eio program creates one or more <code>Eio.Switch.t</code> in order to attach an <code>Eio.Flow.t</code> to them. An <code>Eio.Switch.t</code> can also be nested, creating a tree-like structure, as every new <code>Eio.Switch.t</code> becomes a child of its parent <code>Eio.Switch.t</code>. 
When an <code>Eio.Switch.t</code> terminates, either successfully or by some exception, all of its child <code>Eio.Flow.t</code> are also terminated, automatically closing the file descriptor and guaranteeing we don't have a descriptor leak.</p> +<p><code>close_unix</code> tells Eio to call <code>close(2)</code> when the <code>Eio.Switch.t</code> terminates.</p> +<p>Imagine a TCP server where each client has at least one dedicated <code>Eio.Switch.t</code>, and some of these clients create additional <code>Eio.Switch.t</code> to handle a specific unit of work:</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/3d6e942b36d24d68b9c132d5d9720910/70d4c/switch.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 102.94117647058825%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/3d6e942b36d24d68b9c132d5d9720910/c5bb3/switch.png" class="gatsby-resp-image-image" alt="switch" title="switch" srcset="/static/3d6e942b36d24d68b9c132d5d9720910/04472/switch.png 170w, +/static/3d6e942b36d24d68b9c132d5d9720910/9f933/switch.png 340w, +/static/3d6e942b36d24d68b9c132d5d9720910/c5bb3/switch.png 680w, +/static/3d6e942b36d24d68b9c132d5d9720910/b12f7/switch.png 1020w, +/static/3d6e942b36d24d68b9c132d5d9720910/b5a09/switch.png 1360w, +/static/3d6e942b36d24d68b9c132d5d9720910/70d4c/switch.png 2548w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg 
aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>Both Lwt and Eio provide means to achieve concurrency, but they only provide parallelism with <code>Domains</code>. Lwt uses monadic-style promises to achieve concurrency, which pollutes the code and makes it harder to reason about. Eio makes full use of the new effect handlers and Domains of OCaml 5, providing concurrency and parallelism while maintaining the same programming style of synchronous blocking programs.</p> +<p>Eio is a library that aims to replace Lwt, but with a more modern style and feature set. It provides abstractions for sockets, fibers, streams, flows, and more.</p> +<p>To review, <code>charrua-unix</code> is a feature-packed, yet simple DHCP server implementation for Unix systems based on the OCaml library <code>charrua-core</code>. <code>rawlink</code> makes it possible to read and craft Ethernet packets on most Unix-like systems through an easy-to-use library.</p> +<p>It's relatively easy to port <code>rawlink</code> to Eio by attaching an Eio abstraction of a bidirectional socket, namely <code>Eio.Flow.t</code>, to the file descriptor.</p> +<p>We hope you enjoyed this article and found it helpful. 
As always, if there are any questions or concerns, feel free to <a href="https://tarides.com/company">reach out</a>.</p>https://tarides.com/blog/2022-10-19-porting-charrua-unix-and-rawlink-to-eioPorting Charrua-Unix and Rawlink to Eio2022-10-19T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Yesterday we announced the <a href="https://tarides.com/blog/2022-10-17-ocaml-5-beta-release">OCaml 5 beta release</a>, and today we're excited to introduce the OCaml Platform Installer! The <a href="https://ocaml.org/docs/platform">OCaml Platform</a> is the recommended toolchain when working with OCaml. This new installer enables programmers to quickly set up the OCaml developer environment, so they don't need to waste precious coding time with a lengthy installation process. If you come across any obstacles, the Platform team encourages you to open a <a href="https://github.com/tarides/ocaml-platform-installer/issues">GitHub Issue</a>.</p> +<p>We've also updated the state of the Platform, making several important changes like promoting <code>odoc</code> and OCamlformat from Incubate to Active. 
We have <a href="https://discuss.ocaml.org/t/ann-ocaml-platform-installer-alpha-release/10652">notified the OCaml Community</a> about the Platform Installer's alpha release, so you can read about all the new changes and the simple installation process on the <a href="https://discuss.ocaml.org/t/ann-ocaml-platform-installer-alpha-release/10652">OCaml Discuss post</a>.</p> +<p>Stay tuned to our blog as well as our <a href="https://twitter.com/tarides_">Twitter feed</a> to get the latest updates on the OCaml 5 release!</p>https://tarides.com/blog/2022-10-18-ocaml-s-platform-installer-alpha-releaseOCaml's Platform Installer Alpha Release2022-10-18T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Back in June, we announced the <a href="https://tarides.com/blog/2022-06-15-ocaml-5-alpha-release">OCaml 5 alpha release</a>, and today we're excited to announce <a href="https://discuss.ocaml.org/t/ocaml-5-0-0-first-beta-release/10623">the first beta release</a>! Now is an excellent time to test it and report positive or negative feedback on your projects (i.e., did it work, did you see impressive performance speed up, did you have issues finding documentation, etc.)</p> +<p>This beta version stabilised several <a href="https://opam.ocaml.org/">opam</a> packages, fixed several small internal runtime processes (especially the <code>systhreads</code> library), and tweaked the Domain and Effect interface, just to name a few improvements. This version also enables you to update your libraries and software. See the <a href="https://discuss.ocaml.org/t/ocaml-5-0-0-first-beta-release/10623">post on the OCaml Discuss forum</a> for installation instructions and more information. While you're there, join the growing and vibrant OCaml community!</p> +<p>The full OCaml release is expected by the end of the year. Just in time for Christmas! 
Perhaps more importantly, in time for the new <a href="https://adventofcode.com/">Advent of Code calendar</a>, so you can play around with OCaml 5 with Multicore support.</p>https://tarides.com/blog/2022-10-17-ocaml-5-beta-releaseOCaml 5 Beta Release2022-10-17T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><em>Real World OCaml is a fantastic book on OCaml and functional programming &ndash; a great resource for beginners and experienced users alike. At Tarides, we want to support new learners of OCaml as much as we can, making it easier for people to become part of the vibrant community surrounding the language.</em> +Tarides is proud to announce that we are sponsoring the Gold Open Access release of <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571"><em>Real World OCaml</em>, 2nd Edition</a> by Yaron Minsky and Anil Madhavapeddy! It&rsquo;s published by Cambridge University Press, and Tarides is making it possible for everyone to download the book to their local device. You can also receive free copies of the book (see below).</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#accessible-to-everyone-and-free-copies" aria-label="accessible to everyone and free copies permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Accessible to Everyone and Free Copies</h2> +<p>Since its first release in 2013, the book has been a fantastic resource for members across the community. 
On the Open Access release, its authors said, &ldquo;As long-standing members of the open-source community, we are excited to see our work made more accessible for all users of OCaml.&rdquo;</p> +<p>The authors and publisher have generously agreed to give away some physical copies of <em>Real World OCaml</em>. They really want to reward the amazing and active members of the community who are so engaged with the book. That&rsquo;s why <em>ten people</em> who get a PR merged with a suggested improvement to the book on <a href="https://github.com/realworldocaml/book">GitHub</a> will receive a <em>free copy</em> of <em>Real World OCaml!</em> Just email <a href="mailto:rwo@tarides.com">rwo@tarides.com</a> when your PR has been merged, and we'll let you know if you're one of the lucky 10 who receive a book!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#gold-open-access" aria-label="gold open access permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Gold Open Access</h2> +<p>Until recently, you could read <em>Real World OCaml</em> online, but for offline access, you had to rely on the printed version. With Gold Open Access, you can now download a PDF to your local device for easy access at any time.</p> +<p>Whilst this expanded access will benefit everyone, one group that we are particularly excited to support are new learners. OCaml 5.0 is right around the corner, and we want to make the entry to OCaml for new users as easy as possible. 
Making the book more accessible also aligns with Tarides&rsquo;s inclusivity goals; lowering the barriers to entry into the community will encourage greater participation among people who otherwise would not have had the means to join.</p> +<p>On this topic, David Tranah, the editorial director of Mathematical Sciences and Information Technology at Cambridge University Press, shares his unique insight into the benefits of Open Access: &ldquo;Gold Open Access publishing allows anyone, anywhere, who can connect to the internet to stay up-to-date on the latest research. This in turn drives innovation and leads to new discoveries.&rdquo;</p> +<p>We of course encourage anyone with the means to purchase a physical copy of the book, as it&rsquo;s the result of a lot of hard work and dedication, and it comes in a beautifully printed physical edition.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#real-world-ocaml" aria-label="real world ocaml permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Real World OCaml</h2> +<p>The first version of Real World OCaml was written by Yaron Minsky, Anil Madhavapeddy, and Jason Hickey. Since its release, several contributors have improved on the original text, adding new examples, correcting errors, and expanding on chapters. 
The second edition was published in 2021 by Anil and Yaron and includes the most recent improvements and changes for an updated version of the book.</p> +<p>The book itself covers several aspects of OCaml, from fundamental concepts like functors and objects, to different tools and techniques, including the OCaml Platform and JSON, as well as a section on the compiler and runtime system. It takes its reader on a journey through OCaml moving from basics to increasingly advanced topics, making it the perfect companion for anyone regardless of their level of OCaml.</p> +<p>Over the years, Tarides has supported Real World OCaml by contributing to the book&rsquo;s tooling infrastructure. Some of that work has transformed into <a href="https://github.com/realworldocaml/mdx">standalone community projects like MDX</a> that help to improve all OCaml documentation.</p> +<p>The book also has a mutually beneficial relationship with OCaml.org, as Thibaut Mattio (currently leading the <a href="https://tarides.com/blog/2022-05-02-ocaml-org-reboot-user-centric-design-content">community redesign effort</a> of <a href="https://ocaml.org">OCaml.org</a>) explains: &ldquo;There are crosslinks between <em>Real World OCaml</em> and V3 of OCaml.org. 
The new package documentation site is a great accompaniment to Real World OCaml, and there are multiple links to it embedded within the book for API documentation.&rdquo; This is just one example showing how <em>Real World OCaml</em> is used in projects and interacting with new content in a productive and useful way.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#about-the-authors" aria-label="about the authors permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>About the Authors</h2> +<p>Anil Madhavapeddy is Professor of Planetary Computing at the University of Cambridge and a fellow of Pembroke College. He has a wide range of experience, having worked in industry (NetAPP, Citrix, Intel), academia (Cambridge, Imperial, UCLA), and open source (OCaml, OpenBSD, Xen, Docker). +Joining Jane Street in 2003, Yaron Minsky is to thank for introducing the company to OCaml. Founding the firm&rsquo;s quantitative research group, he managed the transition of all its core infrastructure to OCaml, ultimately making it the world&rsquo;s largest industrial user of OCaml. Minsky has also been an avid lecturer, blogger, and writer on the topic of programming, publishing articles in <em>Communications of the ACM</em> and the <em>Journal of Functional Programming.</em></p> +<p>Summing up their thoughts on the Open Access upgrade, Anil and Yaron say: &ldquo;Open Access has been shown to encourage the usage of a particular work, resulting in increased citations and public engagement. 
We are excited for what this move will mean when it comes to greater accessibility for users of OCaml worldwide: making it easy to use excerpts from our book in new projects, encouraging new learners, and supporting teachers in their work.&rdquo;</p>https://tarides.com/blog/2022-10-14-real-world-ocaml-book-giveawayReal World OCaml Book Giveaway!2022-10-14T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>OCaml is a statically-typed programming language that emphasizes readability, programmer efficiency, and semantic clarity. This powerful and efficient language has been gaining popularity among developers. The growing adoption of OCaml is because it's fast, type safe, and secure. It can be used in industries where performance and security matter, like <a href="https://ocaml.org/success-stories/sensor-analytics-and-automation-platform-for-sustainable-agriculture">IoT</a>, <a href="https://ocaml.org/success-stories/peta-byte-scale-web-crawler">Data Analytics</a>, or <a href="https://ocaml.org/success-stories/large-scale-trading-system">financial services</a>.</p> +<p>Like any other programming language, there are numerous libraries available for OCaml that can make your life easier as a developer. 
In this article, we will explore some of the top OCaml libraries that will help streamline your workflow and boost your productivity as a programmer.</p> +<p>All these libraries and tools are open source, so they&rsquo;re distributed under a free software licence.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#a-few-helpful-ocaml-libraries" aria-label="a few helpful ocaml libraries permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>A Few Helpful OCaml Libraries</h2> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#lwt" aria-label="lwt permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><strong><code>Lwt</code></strong></h3> +<p><a href="https://github.com/ocsigen/lwt">Lwt</a> is a library for writing asynchronous code. It provides many helpful abstractions for writing asynchronous code such as promises and futures. This is a very useful library for writing network applications, like web servers. 
That said, once OCaml 5.0 is released, the <a href="https://ocaml.org/p/eio/0.5">Eio library</a> might become preferable!</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#dream" aria-label="dream permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><strong>Dream</strong></h3> +<p>Dream is described as a &quot;tidy, feature-complete Web framework&quot; on <a href="https://ocaml.org/p/dream/1.0.0~alpha4">OCaml.org</a>. Dream has a simple programming model where web apps are merely functions, and it supports TLS, WebSockets, and GraphQL. Plus it has cryptography helpers! It's easy to use and documented <a href="https://aantron.github.io/dream/">all in one place</a>.
The entire Dream API is available there, where you can also find many examples.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#cmdliner" aria-label="cmdliner permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><strong><code>Cmdliner</code></strong></h3> +<p><a href="https://ocaml.org/p/cmdliner/1.1.0/doc/index.html">This library</a> is used by several packages to build command line tools, which is beneficial when you want to write an executable in OCaml. <code>Cmdliner</code> gives programmers a simple, compositional method for turning command line arguments into OCaml values. Not only can you then pass those values to functions, <code>Cmdliner</code> can automatically handle syntax errors, help messages, and UNIX man page generation as well.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#alcotest" aria-label="alcotest permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><strong>Alcotest</strong></h3> +<p>This colorful framework performs simple unit tests on a simple interface. 
<a href="https://github.com/mirage/alcotest">Alcotest</a> only displays faulty runs at the end of the output, along with full logs for your inspection. The straightforward, expressive query language makes it easy to select which tests to run, and the results are displayed in a fun rainbow of colors.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#base" aria-label="base permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><strong><code>base</code></strong></h3> +<p>Although the <a href="https://dev.realworldocaml.org/prologue.html#the-core-standard-library"><em>standard library</em></a> is somewhat minimalist, multiple extensions exist to make programmers' lives easier. For instance, <code>base</code>, created and maintained by Jane Street, is used to develop critical applications by industrial users. <code>base</code> is written in pure OCaml and has no dependencies other than the OCaml standard library. The <code>base</code> library is useful for building many applications.
Read more about <code>base</code> in the book <a href="https://dev.realworldocaml.org/prologue.html#the-core-standard-library"><em>Real World OCaml</em></a>.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#yojson" aria-label="yojson permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><strong>Yojson</strong></h3> +<p><a href="https://github.com/ocaml-community/yojson">Yojson</a> is an OCaml library for creating and reading JSON data. JSON is a data format that is commonly used in web applications. With Yojson, you can easily generate and parse JSON data in OCaml.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#notty" aria-label="notty permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><strong>Notty</strong></h3> +<p>This interesting <a href="https://ocaml.org/p/notty/0.2.3">OCaml library</a> enables the user to write declarative terminal UIs. Notty is based on a notion +of composable images, and it delivers a simpler and more expressive model than basic terminal programming.
Engineers know that terminal programming is tedious, so Notty makes it enjoyable!</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#ppxlib" aria-label="ppxlib permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><strong><code>ppxlib</code></strong></h3> +<p>PreProcessor eXtensions, or PPX for short, are used for meta-programming, such as generating boilerplate or extending the OCaml syntax. PPXs act on the abstract syntax tree (AST) and are integrated into the language via two AST features: <a href="https://ocaml.org/manual/attributes.html">attributes</a> and <a href="https://ocaml.org/manual/extensionnodes.html">extension nodes</a>. <a href="https://github.com/ocaml-ppx/ppxlib"><code>ppxlib</code></a> is a set of tools and libraries that enables programmers both to write and to use PPXs.
See <a href="https://tarides.com/blog/2019-05-09-an-introduction-to-ocaml-ppx-ecosystem">this Tarides blog post</a> on how to write PPX and <a href="https://ocaml.org/docs/metaprogramming">this official guide</a> on how to use PPX.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>The tools available in OCaml make it easy to prototype new applications and build production-quality software. In fact, the full release of OCaml 5.0 with Multicore support is on the horizon, and <a href="https://tarides.com/blog/2022-06-15-ocaml-5-alpha-release">the alpha version has already been released</a>.</p> +<p>OCaml libraries help you write beautiful, elegant code in this powerful and versatile language. There has never been a better time to give OCaml a try, and now you know there are beneficial libraries to help you code. The libraries covered in this article are just a few examples. For more information, please visit <a href="https://ocaml.org/packages">ocaml.org</a>.</p>https://tarides.com/blog/2022-10-12-8-ocaml-libraries-to-make-your-life-easier8 OCaml Libraries to Make Your Life Easier2022-10-12T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>After two years of online conferences, it was fantastic to have ICFP 2022 in person.
The conference organisers had done a fantastic job adjusting to online conferences, but nothing beats the hallway track for meeting new people and catching up with old friends. This year, Slovenia's capital hosted the event. Ljubljana was a beautiful city to visit, with plenty of classic European architecture and even a castle to explore.</p> +<p>My conference schedule was packed. I had a talk to present on Friday and five preceding days of conference talks to attend. The first three days were a whirlwind of talks (some on the edge of my understanding) and hallway track conversations.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/3b3345860da2a9852cbef933f9018178/b6249/OCamlMoon.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 61.1764705882353%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/3b3345860da2a9852cbef933f9018178/7bf67/OCamlMoon.jpg" class="gatsby-resp-image-image" alt="OCaml's Trajectory" title="OCaml's Trajectory" srcset="/static/3b3345860da2a9852cbef933f9018178/651be/OCamlMoon.jpg 170w, +/static/3b3345860da2a9852cbef933f9018178/d30a3/OCamlMoon.jpg 340w, +/static/3b3345860da2a9852cbef933f9018178/7bf67/OCamlMoon.jpg 680w, +/static/3b3345860da2a9852cbef933f9018178/b6249/OCamlMoon.jpg 922w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span> +<em><a href="https://twitter.com/yminsky/status/1569956010483220481?s=20&amp;t=Fp12V9v11Xp2kMP0TmOhoQ">Image by Yaron Minsky of Jane Street</a></em></p> +<h2 style="position:relative;"><a 
href="https://tarides.com/feed.xml#ocaml-reaches-for-the-stars" aria-label="ocaml reaches for the stars permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>OCaml Reaches for the Stars</h2> +<p>Although it happened mid-week, I want to start with KC Sivaramakrishnan's keynote, <a href="https://icfp22.sigplan.org/details/ocaml-2022-papers/16/OCaml-5-0-Concurrent-and-Parallel-programming-for-OCaml">Retrofitting Concurrency &ndash; Lessons from the Engine Room</a>, as it was definitely the highlight of ICFP. He covered the full story of introducing parallelism and concurrency into OCaml, along with references to papers published along the way. <a href="https://youtu.be/6BhmRz7eqiE">Watch the video of his keynote</a>. The &quot;Where do we go from here?&quot; slide ties together the effect system with targeting JavaScript, modal types to avoid heap allocations, unboxed types to control memory layout, and Flambda2 for aggressive compiler optimisation. This is hugely exciting for the OCaml community. It brought together many threads of work into a coherent picture.</p> +<iframe width="560" height="315" src="https://www.youtube.com/embed/6BhmRz7eqiE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="allowfullscreen"></iframe> +<p>There was a real buzz afterwards from both OCaml and Haskell people I talked with.
It set the stage for the ML Workshop on Thursday, which included <a href="https://icfp22.sigplan.org/details/mlfamilyworkshop-2022-papers/14/Efficient-and-Scalable-Parallel-Functional-Programming-Through-Disentanglement">Efficient and Scalable Parallel Functional Programming Through Disentanglement</a>, <a href="https://icfp22.sigplan.org/details/mlfamilyworkshop-2022-papers/13/Unboxed-types-for-OCaml">Unboxed Types for OCaml</a>, <a href="https://icfp22.sigplan.org/details/mlfamilyworkshop-2022-papers/10/Module-Shapes-for-Modern-Tooling">Module Shapes for Modern Tooling</a>, <a href="https://icfp22.sigplan.org/details/ocaml-2022-papers/9/Stack-allocation-for-OCaml">Stack Allocation for OCaml</a>, and <a href="https://icfp22.sigplan.org/details/mlfamilyworkshop-2022-papers/4/Boxroot-fast-movable-GC-roots-for-a-better-FFI">Boxroot, Fast Movable GC Roots for a Better FFI</a>, picking up themes from the keynote. The full playlist for the ML Workshop is <a href="https://www.youtube.com/playlist?list=PLyrlk8Xaylp7f8T7L5SFFwOS5_c5d1Jyq">available on YouTube</a>.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/1b16e350a80feeef58c015e1e446a037/e1596/sudhaICFP.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 75.29411764705883%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/1b16e350a80feeef58c015e1e446a037/7bf67/sudhaICFP.jpg" class="gatsby-resp-image-image" alt="Sudha OCaml Workshop" title="Sudha OCaml Workshop" srcset="/static/1b16e350a80feeef58c015e1e446a037/651be/sudhaICFP.jpg 170w, +/static/1b16e350a80feeef58c015e1e446a037/d30a3/sudhaICFP.jpg 340w, +/static/1b16e350a80feeef58c015e1e446a037/7bf67/sudhaICFP.jpg 
680w, +/static/1b16e350a80feeef58c015e1e446a037/990cb/sudhaICFP.jpg 1020w, +/static/1b16e350a80feeef58c015e1e446a037/c44b8/sudhaICFP.jpg 1360w, +/static/1b16e350a80feeef58c015e1e446a037/e1596/sudhaICFP.jpg 2048w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>Thursday also featured the <a href="https://icfp22.sigplan.org/details/icfp-2022-tutorials/1/OCaml-5-for-the-working-programmer">OCaml 5.0 for the Working Programmer</a> tutorial, presented by <a href="https://twitter.com/tarides_/status/1570346706448879617">my colleague Sudha Parimala</a> (above) and Marek Kubica at Tarides. There are <a href="https://github.com/Sudha247/ocaml5-tutorial-icfp-22">slides and exercises</a> to work through to help you understand effects and parallelism in OCaml.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#friday-a-full-day-of-ocaml" aria-label="friday a full day of ocaml permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Friday: A Full Day of OCaml</h2> +<p>The final day was dedicated to OCaml Workshops. KC kicked off the first session with a keynote on the &quot;here and now&quot; of OCaml 5.0. His presentation addressed developers' frequently asked questions when moving from sequential OCaml 4 to 5.0, discussed the details of the merge process, and included a deep dive of the developments since the merge of Multicore OCaml early this year. 
His talk concluded with a call to action, encouraging OCaml developers to start migrating to OCaml 5.0, even if they do not immediately plan to use the new concurrency and parallelism features.</p> +<p>The keynote was followed by Jan Midtgaard, Principal Engineer at Tarides, talking about a number of testing techniques and tools for concurrent programs that Tarides has developed for OCaml 5.0. This was followed by Deepali Ande, KC's student at IIT Madras, presenting a novel way to enable different schedulers written using effect handlers to communicate with each other.</p> +<p>These talks ended up being so popular that there was standing room only, so during the coffee break after the first session, the ICFP organisers announced to everyone that the remaining OCaml presentations would be in &quot;the fanciest theatre ever,&quot; an amphitheatre known as the &Scaron;tih Room (below).</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/796b7001a1f7e4388328e00e8b4f1014/d2602/audience.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 75.29411764705883%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/796b7001a1f7e4388328e00e8b4f1014/7bf67/audience.jpg" class="gatsby-resp-image-image" alt="Full Amphitheatre OCaml" title="Full Amphitheatre OCaml" srcset="/static/796b7001a1f7e4388328e00e8b4f1014/651be/audience.jpg 170w, +/static/796b7001a1f7e4388328e00e8b4f1014/d30a3/audience.jpg 340w, +/static/796b7001a1f7e4388328e00e8b4f1014/7bf67/audience.jpg 680w, +/static/796b7001a1f7e4388328e00e8b4f1014/990cb/audience.jpg 1020w, +/static/796b7001a1f7e4388328e00e8b4f1014/c44b8/audience.jpg 1360w,
+/static/796b7001a1f7e4388328e00e8b4f1014/d2602/audience.jpg 4032w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>I presented our collective work on bringing OBuilder to non-Linux platforms. OBuilder is the underlying library responsible for providing sandboxed build environments for <a href="https://ci.ocamllabs.io/">ocaml-ci</a>, <a href="https://opam.ci.ocaml.org/">opam-repo-ci</a>, and <a href="http://check.ocamllabs.io">opam-healthcheck</a>. The talk covered the architecture of OBuilder, showing how it gets used in our multi-architecture cluster of build machines. I also reviewed the Linux implementation that uses native Linux containerisation technology, like runC and cgroups. Then, moving on to the implementation on macOS, I demonstrated using user isolation to provide sandboxing with some file system tricks, followed by the implementation on Windows using Docker for Windows, where testing uncovered a number of interesting bugs in Lwt and GNU Tar. The full details are available in the <a href="https://github.com/tmcgilchrist/ocaml-2022-submission/blob/master/ocurrent.pdf">Extended Abstract</a> and on GitHub at <a href="https://github.com/ocurrent/obuilder/">https://github.com/ocurrent/obuilder/</a>.</p> +<p>The OCaml Workshop also featured an impressive back-to-back presentation from David Allsopp on opam's CLI compatibility work and the upcoming opam 2.2 features. He also covered how to make the OCaml compiler relocatable and how that will allow for fast switch creation in opam.
Congratulations to David for making such a polished and well-received double show!</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/cb41f25f5d9f79b79a7e4c880a05b98e/17052/ICFP.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 61.1764705882353%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/cb41f25f5d9f79b79a7e4c880a05b98e/7bf67/ICFP.jpg" class="gatsby-resp-image-image" alt="ICFP" title="ICFP" srcset="/static/cb41f25f5d9f79b79a7e4c880a05b98e/651be/ICFP.jpg 170w, +/static/cb41f25f5d9f79b79a7e4c880a05b98e/d30a3/ICFP.jpg 340w, +/static/cb41f25f5d9f79b79a7e4c880a05b98e/7bf67/ICFP.jpg 680w, +/static/cb41f25f5d9f79b79a7e4c880a05b98e/990cb/ICFP.jpg 1020w, +/static/cb41f25f5d9f79b79a7e4c880a05b98e/c44b8/ICFP.jpg 1360w, +/static/cb41f25f5d9f79b79a7e4c880a05b98e/17052/ICFP.jpg 1994w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#functional-programming-development" aria-label="functional programming development permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 
6z"></path></svg></a>Functional Programming Development</h2> +<p>Although OCaml certainly stole the show after <a href="https://youtu.be/6BhmRz7eqiE">KC's keynote</a>, earlier in the week I attended several fascinating workshops, a few of which I outline below. There was also a strong theme of Effects talks throughout the third day of ICFP, and I plan to come back to those papers and review them in more detail. Below are some highlights, but the full Haskell Implementors Workshop (HIW) is <a href="https://www.youtube.com/playlist?list=PLyrlk8Xaylp4kkqJltshENjF_SL7-fDTn">available on YouTube</a>.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#haskell" aria-label="haskell permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Haskell</h3> +<p>On Sunday, I spent my day in the Haskell Implementors Workshop, which kicked off with the &quot;State of GHC&quot; talk from Simon Peyton Jones. Simon covered all the new and important developments in GHC 9.4 and <a href="https://gitlab.haskell.org/ghc/ghc/-/wikis/status/ghc-9.6">9.6</a>. For me, the complete overhaul of <a href="https://www.haskell.org/ghc/blog/20220807-ghc-9.4.1-released.html">GHC&rsquo;s Windows support</a>, including many fixes in WinIO, the refactoring of GHC's error messages, and the ongoing work to upstream the GHCJS and WebAssembly backends into GHC, were the most exciting changes. 
The other takeaway was how Cabal development has been overhauled with a new team managing the project and the resulting acceleration of improvements making their way into Cabal, starting from Cabal 3.6. That, combined with the significantly improved Haskell LSP support, means Haskell tooling is in a great place.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#ghc--racket" aria-label="ghc racket permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>GHC &amp; Racket</h3> +<p>Alexis King's talk, &quot;<a href="https://icfp22.sigplan.org/details/hiw-2022/6/A-look-across-the-pond-a-comparison-between-GHC-and-Racket-compilation-models">A Look Across the Pond: A Comparison Between GHC and Racket Compilation Models</a>,&quot; was a great highlight on how Racket tooling works and how Cabal could be further improved, and perhaps how we could improve opam. The key idea was that Racket uses a set of core data structures to represent packages and provides functions across those data structures, with the end-user tooling being a very thin wrapper around these functions. Alexis demonstrated how to query and manipulate the set of installed packages within Racket in a way that's impossible with current Cabal. 
This is an intriguing idea that I hope gets some attention, and I wonder if the more dynamic representation available in Racket makes this easier compared to Haskell.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#ghc--mu" aria-label="ghc mu permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>GHC &amp; Mu</h3> +<p>My second recommendation is &quot;<a href="https://icfp22.sigplan.org/details/hiw-2022/8/Compiling-Mu-with-GHC-Halfway-Down-the-Rabbit-Hole">Compiling Mu with GHC: Halfway Down the Rabbit Hole</a>&quot; by Gergő Érdi, which covered the effort to port Mu to reuse the GHC compiler frontend and backend as much as possible. Currently, Mu uses a custom compiler for its strict Haskell variant that includes Multiparam Typeclasses and Functional Dependencies, along with a variation on Type Families. I like hearing about compiler engineering efforts that need to handle existing codebases and think through the trade-offs. 
The choice of Functional Dependencies and Multiparam Typeclasses is a sweet spot in the design space for typeclasses, with PureScript choosing a similar approach.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#ocaml-reception" aria-label="ocaml reception permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>OCaml Reception</h2> +<p>Overall, ICFP had a huge buzz of excitement around OCaml 5.0, which features Multicore support and effects. The palpable enthusiasm after <a href="https://youtu.be/6BhmRz7eqiE">KC's impressive keynote</a> lasted throughout the rest of the week. I had many great conversations with both OCaml and Haskell people about the new features and how exciting the future is for OCaml. In fact, the OCaml Farewell Reception, held at the Ljubljana Zoo, attracted three times the expected number of attendees because everyone wanted to keep talking about the new features coming soon in OCaml 5.0. I spoke at length with a senior Haskell programmer who was very interested in OCaml Multicore, and we all enjoyed sampling a local honey liqueur and having a hot meal, which helped warm us that cool, rainy evening. As a special treat, the ICFP organisers even arranged for a real live camel to make an appearance! It was a delightful evening, and the perfect ending to ICFP 2022.</p> +<p>In the end, the week in Ljubljana was fulfilling, on both a personal and a professional level. 
After over two years of limited in-person events, it was truly refreshing to meet and network with colleagues face to face.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/9dc69c41b73b46497cdeacb56da93b43/d2602/dragon.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 75.29411764705883%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/9dc69c41b73b46497cdeacb56da93b43/7bf67/dragon.jpg" class="gatsby-resp-image-image" alt="Ljubljana Dragon" title="Ljubljana Dragon" srcset="/static/9dc69c41b73b46497cdeacb56da93b43/651be/dragon.jpg 170w, +/static/9dc69c41b73b46497cdeacb56da93b43/d30a3/dragon.jpg 340w, +/static/9dc69c41b73b46497cdeacb56da93b43/7bf67/dragon.jpg 680w, +/static/9dc69c41b73b46497cdeacb56da93b43/990cb/dragon.jpg 1020w, +/static/9dc69c41b73b46497cdeacb56da93b43/c44b8/dragon.jpg 1360w, +/static/9dc69c41b73b46497cdeacb56da93b43/d2602/dragon.jpg 4032w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span> +<em>The Famous Ljubljana Dragon Bridge</em></p>https://tarides.com/blog/2022-10-10-icfp-2022-reviewICFP 2022 Review2022-10-10T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Tarides is excited to sponsor the <a href="https://esolangconf.com/">Paradigm Conference</a> (previously EsoLangConf) high school hackathon. 
This weekend, students from all over the world will team up to solve tricky programming problems, investigate diverse features of a range of programming languages, and build cool things!</p> +<p>At Tarides, we are always looking for new ways to increase the awareness and adoption of OCaml. The Paradigm Conference is a fantastic opportunity for all students interested in computer science to discover real-world use cases of OCaml.</p> +<p>If you are a high school student interested in using OCaml to solve fun and complex problems while learning new skills and meeting new people, you can register for the conference <a href="https://docs.google.com/forms/d/e/1FAIpQLSdEuny13Vb7n3tMiJ9r1Ci3OoRSWhlU3nO73gjDdmpZywVnKw/viewform">here</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#what-makes-this-conference-unique" aria-label="what makes this conference unique permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What Makes this Conference Unique?</h2> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#created-by-students-for-students" aria-label="created by students for students permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 
0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Created by Students for Students</h3> +<p>Rohan Mehta and the organisational team provide a safe and inclusive hackathon space for attendees from anywhere in the world to explore different programming languages and concepts. Breaking out of the standard computer science curriculum, they have designed a hackathon specifically for high school students to showcase non-mainstream languages and the diversity of programming approaches available.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#a-diversity-of-programming-paradigms" aria-label="a diversity of programming paradigms permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>A Diversity of Programming Paradigms</h3> +<p>The conference features different language tracks, focussing specifically on functional, array-based, and knowledge-based programming. Each team will choose from OCaml, Haskell, Clojure, Wolfram, and APL, letting them explore the unique features of these less well-known languages and discovering their benefits for themselves. 
The conference puts the joy of programming at the top of the priority list, giving students a fantastic opportunity to experiment and broaden their horizons.</p> +<p>If you want to learn about pattern matching, macros, or higher-order functions, then you&rsquo;re in the right place!</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#knowledge-sharing" aria-label="knowledge sharing permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Knowledge Sharing</h3> +<p>The organisers have gathered learning resources for every language in the conference&mdash;a mammoth task in itself! 
Not only are they widely sharing knowledge that already exists (but may not be easy to find), but they are also creating <a href="https://docs.google.com/document/d/e/2PACX-1vRtBufinbvANjQUMJrFdKyQ0VhsICM6QJ5K040MswBFMqGxuIGDrgLYsDLT-4txw1ZkVd-AJ0LCjCCo/pub?urp=gmail_link">new living documents</a> for each language built from these existing resources and their own learning experiences.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#combined-coding-competitions-and-world-class-lectures" aria-label="combined coding competitions and world class lectures permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Combined Coding Competitions and World-Class Lectures</h3> +<p>The team has worked hard to create an engaging and interesting event by interspersing talks from industry language users and programming language experts with coding competitions and hackathon events. Naturally, any conference wouldn&rsquo;t be complete without swag! 
Each attendee will receive a sticker and t-shirt, with additional prizes for the competition winners.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#get-involved" aria-label="get involved permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Get Involved!</h2> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#attending" aria-label="attending permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Attending</h3> +<p>Students will join teams (of up to 5) by either creating one in advance or registering as an individual. 
Everyone will be allocated to a team once the conference starts.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#mentoring" aria-label="mentoring permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Mentoring</h3> +<p>If you are a high school student and you&rsquo;re interested in attending, you can sign up here. If you have a bit more experience with any of the languages featured, you can join as a mentor and help students grasp new languages and concepts.</p> +<p>Rohan and his team promise that if you attend Paradigm Conf 2022, &ldquo;your programming worldview will be flipped upside down!&rdquo;</p> +<p>You can find the Paradigm Conference on <a href="https://www.instagram.com/esolangconf/">Instagram</a> and <a href="https://twitter.com/EsolangT">Twitter</a>.</p>https://tarides.com/blog/2022-09-23-tarides-sponsors-high-school-hackersTarides Sponsors High School Hackers2022-09-23T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><em>The tech industry has long struggled with a lack of diversity. This existing imbalance, combined with social and educational problems such as early gender bias, still tends to prevent many people, including women, from entering the field. 
<em>Girls Can Code</em> aims to help young women learn programming and gain valuable experience working on projects alongside other like-minded individuals.</em></p> +<p>Tarides is proud to share that we sponsored a <a href="https://girlscancode.fr"><em>Girls Can Code</em></a> summer camp! 
Between Aug 22nd and 27th, the camp offered participants a fully-packed week of programming, socialising, and learning.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#what-is-girls-can-code" aria-label="what is girls can code permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What is Girls Can Code?</h2> +<p><em>Girls Can Code</em> is an initiative launched by the organisation <a href="https://prologin.org"><em>Prologin</em></a>, hosting summer camps specifically aimed at teaching young women about computer programming. No prior experience is required, and they accept participants from secondary school and up through the equivalent of A-Levels. Attendance is free, since the events are run by students who generously volunteer their time and expertise.</p> +<p>Camps come in two variants: long and short. The long camps last for a week and the short ones for a weekend. The short camps cover less content, but can be organised more frequently. They are always in the form of a practical introduction to computer science, but may focus on special topics. The week-long summer camps start with an introduction to Python and then continue with several tutorials on various topics. 
Finally, all participants have the chance to complete a personal project on robotics, video games, or microcontrollers.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#impact" aria-label="impact permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Impact</h2> +<p>At Tarides, we&rsquo;re committed to using our resources to foster diversity and inclusion. Our own goal is to have 50% of our tech roles filled by women. For that to be a reality, not just at Tarides but everywhere, more women need to feel welcome in the tech space. Sadly, according to this <a href="https://www.lemonde.fr/campus/article/2017/12/11/femmes-et-informatique-vingt-ans-de-desamour_5227726_4401467.html">Le Monde article</a>, only 11% of French women chose to pursue IT careers in 2010. A more <a href="https://technation.io/diversity-and-inclusion-in-uk-tech/#executive-summary">recent survey</a> by Tech Nation showed that in the UK, only 26% of the tech workforce is made up of women. While this number is better, there&rsquo;s still a lot of work to be done before women make up 50% of the workforce in tech.</p> +<p><em>Girls Can Code</em> works proactively to inspire a generation of young French women and make a difference in the sector. They have been arranging summer camps since 2014 and have been growing steadily ever since. 
We&rsquo;re proud to support their efforts for a more equitable future!</p>https://tarides.com/blog/2022-09-06-tarides-sponsors-girls-can-codeTarides Sponsors Girls Can Code2022-09-06T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><em>Relaxing in today&rsquo;s world can be difficult. Taking the time you need to cool off, refocus, and explore something new requires a solid amount of time in which you can disconnect from daily habits and find a new beat.</em></p> +<p>At Tarides we address this by providing the framework needed for our employees to take that unbroken time away from work. In August, all Tarides employees get two weeks of paid leave to go away and come back refreshed.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#how-it-began" aria-label="how it began permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>How it Began</h2> +<p>We first trialled the two weeks of leave in 2021 to help alleviate the stress caused by the pandemic. Taking a solid two weeks off would allow everyone to slow down and enjoy the last of the summer months (or winter for our Australian colleagues!) without having to worry about losing pay or using up annual leave in the face of an uncertain global situation.</p> +<p>The results were very positive, with a lot of good feedback across the teams. People came back refreshed and inspired, easily making up for the time away. 
An important takeaway from the Tarides team was that since everyone was away at the same time, no one had to worry about having a pile of work waiting for them when they came back. This made everyone&rsquo;s holiday more enjoyable and restful.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#its-back" aria-label="its back permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>It&rsquo;s Back!</h2> +<p>Since it was a very popular measure last year, we decided to reintroduce it as a recurring event! From August 8th to August 19th, Tarides had its official 2022 office closure. We hope the entire Tarides team took some time to go on adventures or simply relax before the rest of the year.</p> +<p>Taking time off does not just allow everyone to recharge their batteries, but it also lets them experience new things that can generate moments of inspiration. 
That said, rest is also very powerful: when we rest, we prepare for the challenges ahead, increase our resilience, and strengthen our resolve.</p> +<p>Here&rsquo;s to a great rest of 2022!</p>https://tarides.com/blog/2022-08-26-tarides-goes-on-holidayTarides Goes on Holiday!2022-08-26T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#introduction" aria-label="introduction permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Introduction</h2> +<p>Over the past six months, I have been working on using Irmin in the browser, including <code>irmin-server</code> and the GraphQL interface. This has been fun and a great learning journey for me. Before this internship, <code>irmin-server</code> was primarily a Unix-based application. 
My project was to port <code>irmin-server</code> to work in the browser and design interfaces for people to interact with the store (Irmin stores).</p> +<p>I was paired to work with Patrick Ferris as my mentor and with the entire Irmin team, who all contributed immensely to this project.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#irmin-and-irmin-server" aria-label="irmin and irmin server permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Irmin and <code>irmin-server</code></h2> +<p>Irmin is simply a data store (database). It is based on the same design principle as Git with features to merge and branch data stores. Irmin has several stores (<code>irmin-mem</code>, <code>irmin-indexeddb</code>, <code>irmin-fs</code>, <code>irmin-chunk</code>, <code>irmin-git</code>) and store interfaces (<code>irmin-http</code>, <code>irmin-graphql</code>).</p> +<p><code>irmin-server</code> is a high-performance server for Irmin. For efficient communication, it implements a specialised <a href="https://github.com/mirage/irmin-server/blob/master/PROTOCOL.md">wire-protocol</a> to send and receive data over a bytestream. It wraps an Irmin store, providing a way to connect to the server and access the store via its API using a client. 
But the client makes an assumption that the user is on a Unix machine, which makes <code>irmin-server</code> primarily a Unix-based application.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#irminirmin-client-in-the-browser" aria-label="irminirmin client in the browser permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><code>irmin/irmin-client</code> in the Browser</h2> +<p>In this modern age, it's become a necessity to make applications &quot;offline first.&quot; Offline-First applications function without being affected by the intermittent lack of a network connection. It usually implies the ability to sync data between multiple devices. Irmin as a data store supports multiple backends, making it very portable. Plus, Irmin's mergeable replicated data-types make it much easier to build applications that can transform the state offline and resynchronise the state later, just like Git. With this concept, resynchronising Irmin stores (from server to client) is much simpler on <code>irmin-server</code>, which implements a specialised wire protocol for efficient communication. 
Making <code>irmin/irmin-client</code> work in the browser makes it possible to create offline-first web applications.</p> +<p>More information on offline-first applications can be found <a href="https://2022.ecoop.org/home/plf-2022">here</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#the-problems" aria-label="the problems permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The Problems</h2> +<p>An initial summary of the problem was published in this <a href="https://github.com/mirage/irmin-server/issues/46">issue</a>, but here is a quick breakdown of the problems we identified.</p> +<ol> +<li><strong><code>irmin-server</code> was tightly coupled around <code>conduit-lwt-unix</code>:</strong> <code>irmin-server</code> was initially designed to be a Unix-based application that established communication with a client via <code>conduit-lwt-unix</code>. This became a problem because <code>conduit-lwt-unix</code> cannot establish a connection from a browser. 
This meant that we needed to abstract the I/O module so that each client could provide its own I/O.</li> +<li><strong>Reuse some internal modules:</strong> We needed to reuse the <code>irmin-server</code> internal logic related to the protocol but provide a portable I/O interface that can work in the browser.</li> +<li><strong>Provide a browser communication channel:</strong> We needed a non-blocking way to establish a channel between <code>irmin-server</code> and the browser, and to pass data across this channel.</li> +</ol> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#the-solutions" aria-label="the solutions permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The Solutions</h2> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#irmin-server-was-tightly-coupled-around-conduit-lwt-unix" aria-label="irmin server was tightly coupled around conduit lwt unix permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><code>irmin-server</code> was tightly coupled around 
<code>conduit-lwt-unix</code></h3> +<p>Thanks to Zach Shipko, who abstracted the I/O library and split out <code>irmin-client-unix</code> and <code>irmin-client-cli</code> to have their own I/O module that depends on <code>conduit-lwt-unix</code> (<a href="https://github.com/mirage/irmin-server/pull/32">here</a>), a client can connect to a running <code>irmin-server</code> using its own I/O module. While he was working on the restructuring, I spent my time working on a sample project that combines <code>dream</code> with <code>irmin-graphql</code> (more on this project).</p> +<p>With the coupling out of the way, the next step was to create <code>irmin-client-jsoo</code>, a browser client with its own I/O module.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#irmin-server-was-primarily-a-unix-based-application" aria-label="irmin server was primarily a unix based application permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><code>irmin-server</code> was primarily a Unix-based application</h3> +<p>The <code>irmin-server</code> initial architecture had to be restructured to accommodate other platforms. To achieve this, <code>irmin-client</code> was no longer coupled with a specific I/O implementation. Rather, a Unix-based one was provided over conduit flows, which are <code>Lwt_io</code> input and output channels. 
This channel was established over a TCP connection or a Unix domain socket.</p> +<p>Right now, <code>irmin-server</code> can communicate with two clients: <code>irmin-client-cli</code> from a command line and <code>irmin-client-unix</code> from a Unix-based machine. This project was about creating a third client: <code>irmin-client-jsoo</code>, to be called from browser applications.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#enable-communication-from-the-browser" aria-label="enable communication from the browser permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Enable communication from the browser</h3> +<p>After considering other options for the <code>irmin-client-jsoo</code> communication channel, such as HTTP and RPC, we went with Patrick's suggestion: WebSocket, a bidirectional communication protocol between client and server.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#the-challenges" aria-label="the challenges permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 
6z"></path></svg></a>The Challenges</h4> +<p><code>irmin-server</code> uses flows to communicate between the server and the client, and flows are byte streams. WebSocket provides a bidirectional communication channel in the browser, but it is message-oriented rather than stream-oriented.</p> +<p>TCP (Transmission Control Protocol) is a stream-oriented transport protocol for transferring data over the Internet, while WebSocket is a message-oriented application protocol that uses TCP as its transport layer.</p> +<p>The idea behind the WebSocket protocol is to reuse the TCP connection established between a client and a server. Even though WebSocket is built on TCP, the data it passes is always either sent as a whole &quot;message&quot; or not at all. The browser's WebSocket API is also non-blocking.</p> +<p>Since we wanted to avoid a full redesign of the <code>irmin-server</code> protocol, we had to make the message-oriented channel look like a byte stream.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#more-on-irmin-client-jsoo" aria-label="more on irmin client jsoo permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>More on <code>irmin-client-jsoo</code></h2> +<p>Communicating with <code>irmin-server</code> from the browser is straightforward. 
You can achieve that by following these steps:</p> +<ol> +<li>Pin <code>irmin-server</code>, using this command: <code>opam pin add git+https://github.com/mirage/irmin-server/commit#013a28fd1507f8ba69494515533119804903aa99</code></li> +<li>Set up the server.</li> +</ol> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">open Lwt.Syntax +module Store = Irmin_mem.KV.Make (Irmin.Contents.String) +module Server = Irmin_server.Make (Store) + +let main = + let uri = Uri.of_string &quot;ws://localhost:9090/ws&quot; in + let config = Irmin_git.config &quot;penit&quot; in + let* store = Store.Repo.v config in + let* main = Store.main store in + let* server = Server.v ~uri config in + let () = Format.printf &quot;Listening on %a@.&quot; Uri.pp uri in + Server.serve server + +let () = Lwt_main.run main</code></pre></div> +<p><a href="https://github.com/dinakajoy/pen-it-down/blob/main/server/server.ml">Check out this implementation</a></p> +<ol start="3"> +<li>Create the client and ping the server.</li> +</ol> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">open Lwt.Syntax +module Store = Irmin_mem.KV.Make (Irmin.Contents.String) +module Client = Irmin_client_jsoo.Make (Store) + +(* Assumes Client.Repo.v returns an Lwt promise, as Store.Repo.v does on the server. 
*) +let ping_server () = + let config = Irmin_client_jsoo.config (Uri.of_string &quot;ws://localhost:9090/ws&quot;) in + let* client = Client.Repo.v config in + Client.ping client</code></pre></div> +<p>More examples can be found <a href="https://github.com/mirage/irmin-server/tree/master/examples">here</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#my-projects" aria-label="my projects permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 
9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>My Projects</h2> +<p><strong>Simple Mini GitHub:</strong> +I worked on this project to experiment with combining <code>irmin-graphql</code> with <code>dream</code>. This turned out simpler than I thought. You only need to expose <code>irmin-graphql</code> schema. In this application, you simply enter a GitHub repository, and the repository details such as name, date, author, commit message, and README file will be displayed. You can also open <code>/graphiql</code> and make queries.</p> +<p>The full code can be accessed <a href="https://github.com/dinakajoy/simple_mini_github">here</a>.</p> +<p><strong>Pen-It-Down:</strong> +Pen-it-down is a note app that uses <code>irmin-indexeddb</code> and <code>irmin-server</code> to show an offline-first functionality. Users can type in their notes without being bothered about internet connectivity. You can create, edit, delete, and sync your notes to the server.</p> +<p>The full code can be accessed <a href="https://github.com/dinakajoy/pen-it-down">here</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>Working on this project was challenging! I am so glad I had the opportunity to work on it, even though there were days I felt lost. 
Some days I was confused because it seemed I was doing the wrong thing. Other days I was happy because things worked as expected! For me, it was mostly about research and experimentation. I learned a lot from Patrick and Zach. I was exposed to networking concepts like the network layers, client-server handshake, and data encryption and decryption, and I got to try out WebSocket for the first time. I look forward to building more projects with OCaml.</p>https://tarides.com/blog/2022-08-02-irmin-in-the-browserIrmin in the Browser2022-08-02T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><em>Cybersecurity is a growing concern for individuals and companies alike. At Tarides, security is at the centre of every solution we provide, and this year we have been recognised for our efforts! 
We&rsquo;ve been accepted to <a href="https://tarides.com/blog/2022-06-28-thales-cyber-station-f-selection">Cyber@StationF&rsquo;s acceleration program</a> and are now featured in the 2022 Cybersecurity Startup Radar.</em></p> +<p>Tarides is proud to announce that we&rsquo;re part of the <a href="https://www.wavestone.com/en/insight/cybersecurity-startups-radar-2022/">2022 Cybersecurity Startup Radar</a> by Wavestone and BPI France! It spotlights promising startups that are making a difference in the cybersecurity arena. Tarides has been featured in the Radar twice before, in 2019 and in 2020.</p> +<p>Tarides is featured under the &lsquo;IoT Security&rsquo; section of the Wavestone Radar, demonstrating our commitment to creating safer and more efficient software for the IoT (Internet of Things) ecosystem. 
At Tarides, security-by-design is at the core of everything we do; it&rsquo;s our guiding development principle that we use to produce a variety of solutions addressing the different challenges present in this space.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#taridess-cybersecurity-solutions" aria-label="taridess cybersecurity solutions permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Tarides&rsquo;s Cybersecurity Solutions</h2> +<p>Most of today&rsquo;s software solutions are very complex and as a result often inefficient and vulnerable to attack. It is well documented that IoT technology and devices can pose safety risks due to their development environment and limited processing capabilities that don&rsquo;t offer full protection from attacks.</p> +<p>Our technology solves these problems in a revolutionary way by combining the features of OCaml (focused on security, safety, and efficiency) with state-of-the-art tools (as part of MirageOS) to provide solutions to complex problems where mistakes can have disastrous consequences.</p> +<p>Firstly, we provide several solutions relating to device connectivity, from internet protocols to low-bandwidth networks. Secondly, we build custom applications that are securely deployed on IoT devices either on bare-metal or within hypervisors. MirageOS provides an efficient IoT environment with a small footprint. 
Finally, we address the IoT security layer via formally verified cryptographic libraries and other security building blocks that match the latest standards.</p> +<p>We maintain a special focus on cybersecurity to ensure that efficiency and security go hand-in-hand, and that one is not achieved at the expense of the other. We develop and maintain secure, fast, and performant code solutions that leverage OCaml's strong safety features for reliable results. We collaborate closely with the thriving open-source community surrounding the language, which constantly tests and audits its performance.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#about-the-radar" aria-label="about the radar permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>About the Radar</h2> +<p>The Radar is used to analyse the French cybersecurity sector: what solutions are being worked on, what trends are emerging, and what kind of innovation is happening. In light of the current geopolitical climate, cybersecurity is a major strategic issue now more than ever. Consequently, this year&rsquo;s Radar is perhaps even more salient than those of previous years, as it highlights the role of France&rsquo;s innovation ecosystem in advancing cybersecurity goals. +By being featured, Tarides gains a lot of visibility in the sector and can also discover other startups and promising projects in the same field. 
We&rsquo;re excited to be part of this great networking opportunity!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#about-wavestone--bpi-france" aria-label="about wavestone bpi france permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>About Wavestone &amp; BPI France</h2> +<p><a href="https://www.wavestone.com/en/">Wavestone</a> operates at the intersection of management and consulting, helping their clients not just overcome but master challenges whether they be digital, competitive, or environmental. Their mission is to guide organisations during critical transformations, helping them obtain the best results. Furthermore, they are also committed to promoting ethical and sustainable solutions that benefit society as a whole.</p> +<p><a href="https://www.bpifrance.com">BPI France</a> is a financial institution whose mission is to support entrepreneurs and visionaries who take risks to achieve their goals and grow their businesses. BPI France has many resources that they make available to companies, offering support for innovation, coaching, acceleration programmes, and international expansion. They focus on smaller businesses such as microbusinesses, SMEs, and mid-caps, but also offer solutions for larger companies.</p>https://tarides.com/blog/2022-07-19-tarides-is-on-the-wavestone-radarTarides is on the Wavestone Radar!2022-07-19T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>In February 2022, we released Dune 3.0. 
This updated version is the result of considerable development work over the previous six months. Dune 3.0 contains many new features, one of which is &ldquo;watch mode,&rdquo; an exciting new feature explained below.</p> +<p>As a build system, Dune&rsquo;s main goal is to build targets. These targets can be either files (like an executable file) or &ldquo;aliases,&rdquo; a group of targets that can have a visible outcome (like running tests). By default, when running a build, Dune receives a target. Dune will then build it and exit. For example <code>dune build</code> (an alias for <code>dune build @all</code>) will build everything it knows about, then exit.</p> +<p>When working on a piece of code, many developers use an edit-save-build loop:</p> +<ul> +<li>Edit a piece of code</li> +<li>Save the corresponding file</li> +<li>Run a build command</li> +</ul> +<p>Using the outcome of the build (i.e., Did the build work? Did the tests all pass?), developers start a new iteration of this loop manually, but it&rsquo;s more efficient to have a quick, automated iteration process. This is the goal of the &ldquo;watch mode.&rdquo;</p> +<p>When active, Dune will watch the source files in a project, and when one of them has changed, it will re-execute the same build command automatically and display the results of the build. It doesn&rsquo;t exit automatically, so it continues watching for changes. This is more efficient because the developer can stay focussed in their text editor and see the build start automatically when the file is saved. You can enable watch mode by passing the <code>-w</code> flag to <code>dune</code>, like <code>dune build -w</code>.</p> +<p>A simple implementation is to have a special process check for file changes in the source tree and run the build command when something has changed. This works, but it isn&rsquo;t a very precise solution. 
First, the external process doesn&rsquo;t know about the relationships between the files, so it will run more builds than necessary. For example, changing a README file usually should not trigger a new build because it isn&rsquo;t a source file. Second, there are various subtleties to handle: for instance, if a file is changed while a build is running, a new build should be started and the previous one cancelled.</p> +<p>For these reasons, it&rsquo;s more efficient to have the build system itself &ldquo;drive&rdquo; the watch mode. This is how it&rsquo;s implemented in Dune 1.x and Dune 2.x. When starting a build in watch mode for a certain target, Dune computes the set of files that can influence this target (using the build rules) and calls an external process that can subscribe to file changes. When a file changes, Dune cancels any running build and starts a new one.</p> +<p>This is better, but it&rsquo;s still not very efficient. 
To see why, let&rsquo;s look at what Dune does and how it can be made fast.</p> +<p>To run a build, Dune needs to do two things:</p> +<ul> +<li>Load the rules: detect the workspace (determine which files to consider), parse the <code>dune</code> files (open them, transform them into s-expressions and stanzas), and interpret them (execute the logic to transform the stanzas into rules)</li> +<li>Execute the rules: copy files around, call external processes, etc.</li> +</ul> +<p>The time it takes to load the rules is related to the size of the current workspace (number and size of <code>dune</code> files). This is particularly noticeable in organisations that use a monorepo (all the source code in a large Dune workspace). It's difficult to make this step fast because it has a lot of work to do, but it can be done by not recomputing the same things over and over, which is made possible by an internal memoisation framework. 
An initial version of this system is described <a href="https://dune.build/blog/new-computation-model/">in this blog post</a>.</p> +<p>The time it takes to execute the rules depends on the amount of work necessary. For example, a clean build needs to execute most of the build actions, while a second full build usually doesn't need to execute any rule at all. To make this step fast, Dune tries to avoid executing actions that wouldn't change the final outcome (a technique called early cutoff), and it executes independent actions in parallel.</p> +<p>In the watch mode of Dune 2.x, whenever a new build starts, Dune has to forget everything it knows about the workspace and reload all the rules. This is pretty wasteful.</p> +<p>To do better, the new watch mode in Dune 3 makes rule loading incremental. For example, if a <code>dune</code> file is edited to add a stanza, Dune parses it again, only adding the new rule. The other rules are not interpreted again. 
This ensures very fast iteration times.</p> +<p>This project was challenging because for it to work, everything in the Dune core had to be ported to the memoisation API. For instance, the library loading code (which looks for library definitions in the current opam switch and in the Dune workspace) relied on a &ldquo;classic&rdquo; cache (a global hash table) to avoid parsing files repeatedly. However, this does not play well with the memoisation API, which assumes that the functions it caches are all pure. So, in Dune 3, this piece of code has been rewritten on top of the memoisation API. This has another benefit: since file system accesses (&ldquo;does this file exist?&rdquo;) are cached too, the memoisation API now has an idea of which functions can read which files. This is used to re-evaluate only the affected parts of the rule graph once a file is modified.</p> +<p>Thanks to that work, watch mode is now a lot more responsive than in Dune 2.x. 
This performance improvement is barely noticeable in small-to-medium-sized projects, but it is essential in a workspace with several million lines of OCaml code. In such a setting, re-evaluating rules over and over (either by manually running <code>dune build</code> or by using the strategy in Dune 2.x) means that the feedback loop takes dozens of seconds instead of being almost instantaneous.</p> +<p>As Dune's performance improves, it's able to support workspaces that are larger and larger. This means that the bottlenecks shift to different places. 
We'll continue to improve Dune so that it stays a build system that's convenient to use and endlessly scalable.</p>https://tarides.com/blog/2022-07-12-faster-incremental-builds-with-dune-3Faster Incremental Builds with Dune 32022-07-12T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Tarides provides support and development services for OCaml tools, packages, and libraries for our commercial partners and for the benefit of the entire OCaml community. We focus on groundbreaking innovation, feature development, and crucial maintenance of OCaml-based projects. One of these projects is called Merlin, an advanced Integrated Development Environment (IDE).</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#overview" aria-label="overview permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Overview</h2> +<p>When someone hears the word &quot;Merlin,&quot; images of King Arthur, the Round Table, and the Holy Grail might come to mind. 
This fantasy world introduces us to Merlin, the mighty wizard who becomes Arthur's advisor and mentor. When speaking in the world of technology, our Merlin's magic comes in the form of a powerful editor service that offers completion, typing, navigation, refactoring, and code generation. This IDE was made specifically for OCaml, so it complements the safety and expressiveness of the OCaml language with powerful tools. In short, Merlin is a loyal companion that magically helps OCaml developers be more productive and write better programs!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#installing-merlin" aria-label="installing merlin permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Installing Merlin</h2> +<p>Merlin integrates with most editors, including Visual Studio Code (VSCode), via the Language Server Protocol (LSP). It also implements custom features on top of LSP to enable powerful developer workflows that are only available in OCaml. As a result, Merlin helps developers write in OCaml more easily, as they&rsquo;re provided with instant feedback on any possible errors that they could make. It helps train these programmers to make fewer errors in the future and eases project maintenance by automating complex (and otherwise error-prone) workflows.</p> +<p>The easiest way to install Merlin is by using VSCode's OCaml extension. See the <a href="https://github.com/ocamllabs/vscode-ocaml-platform#readme">manual</a> for more information. 
You can also use Merlin through Vim or GNU Emacs. Read more on <a href="https://ocaml.github.io/merlin/">OCaml.org's Merlin page</a>.</p> +<p>Once the installation process is complete, your editor will automatically start Merlin whenever an <code>.ml</code> or <code>.mli</code> file is opened.</p> +<p>Voil&agrave;! So easy!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#merlin-in-use" aria-label="merlin in use permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Merlin in Use</h2> +<p>Here's a glimpse of what Merlin looks like in VSCode:</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/6f0c10eb54f23b147c8ce58d47bfcd4d/ddf4f/merlin.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 79.41176470588235%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/6f0c10eb54f23b147c8ce58d47bfcd4d/c5bb3/merlin.png" class="gatsby-resp-image-image" alt="Merlin in VSCode" title="Merlin in VSCode" srcset="/static/6f0c10eb54f23b147c8ce58d47bfcd4d/04472/merlin.png 170w, +/static/6f0c10eb54f23b147c8ce58d47bfcd4d/9f933/merlin.png 340w, +/static/6f0c10eb54f23b147c8ce58d47bfcd4d/c5bb3/merlin.png 680w, +/static/6f0c10eb54f23b147c8ce58d47bfcd4d/b12f7/merlin.png 
1020w, +/static/6f0c10eb54f23b147c8ce58d47bfcd4d/b5a09/merlin.png 1360w, +/static/6f0c10eb54f23b147c8ce58d47bfcd4d/ddf4f/merlin.png 1624w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>Merlin is the default IDE tool for OCaml developers. It's utilised by many commercial OCaml users who are funding maintenance work and the evolution of the project. For instance, Jane Street developers, sysadmins, and traders confidently use Merlin to browse, maintain, and modify an OCaml codebase that runs into millions of lines of code. This codebase provides the foundation for Jane Street's financial market trading around the world. It critically depends on tools such as Merlin to ensure it can continue to evolve while functioning safely.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#beneficial-features-in-merlin" aria-label="beneficial features in merlin permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Beneficial Features in Merlin</h2> +<p>One of Merlin's main developers, Fr&eacute;d&eacute;ric Bour, says his favourite feature is &quot;completion,&quot; which completes a prefix typed by a programmer in a manner that is (somewhat) relevant in the context. 
He says, &quot;I like it for two reasons: less things to remember (and in a programming language, usually you have to recall exactly because there is not much room for fuzzy interpretation) and also it makes things 'discoverable.' Sometimes you work in an area that is new to you, and looking at the Merlin view with its suggestions really helps engineers become acquainted with this program.&quot;</p> +<p>The Tarides CTO, Thomas Gazagnaire, loves that Merlin &quot;is the perfect companion to any professional OCaml developer. It helps navigate completely new codebases by providing the necessary feedback to learn the project (and the OCaml language!) quicker. It is also super useful to refactor large pieces of existing code, with immediate feedback and hints. I remember very clearly when I started using Merlin on my projects. It provided me a great productivity boost that completely changed the way I programmed in OCaml and made me much more effective. Tarides is now committed to make sure this tool continues to be supported actively. 
We're always adding new features to improve developer productivity, like value and type renaming, semantic search over a whole project, etc.&quot;</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#learn-more" aria-label="learn more permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Learn More</h2> +<p>If you'd like to read more about Merlin, or become a contributor, visit its <a href="https://github.com/ocaml/merlin">GitHub repo</a>, and feel free to <a href="https://github.com/ocaml/merlin/issues">open an issue</a> if you have any suggestions. Please <a href="mailto:contact@tarides.com">contact us</a> if you would like to subscribe to commercial support or discuss future development.</p>https://tarides.com/blog/2022-07-05-the-magic-of-merlinThe Magic of Merlin2022-07-05T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><em>The online world is becoming an increasingly large part of our everyday lives, bringing the issue of cybersecurity to the forefront of more and more minds. 
At Tarides we put security at the centre of everything we do, and we&rsquo;re honoured to be part of <em>Cyber@Station F</em> in 2022.</em></p> +<p>Tarides is thrilled to announce that we have been selected for the <a href="https://cyber-at-stationf.com/en/startups">Cyber@Station F Acceleration Program!</a> It&rsquo;s a fantastic opportunity for Tarides to exchange information and collaborate with other startups, as well as connect with the cybersecurity giant Thales.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#the-program" aria-label="the program permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The Program</h2> +<p>The <a href="https://cyber-at-stationf.com//">Cyber@Station F</a> program was established by <a href="https://thalesdigital.io/">Thales Digital Factory</a> at <a href="https://stationf.co">Station F</a> to &ldquo;help accelerate startups&rsquo; development in cybersecurity by providing them advice, expertise, and access to our big markets.&rdquo;</p> +<p>Thales Cyber@Station F is a startup acceleration program that centres around the areas of cybersecurity &amp; trust. It guides its participants through four stages: Select, Define, Deliver &amp; Test, and Business Acceleration. 
The stages are designed to give each startup an opportunity to come up with and test a proof of concept with potential clients, which if successful is then developed further.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#what-we-do" aria-label="what we do permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What We Do</h2> +<p>At Tarides we know that a lot of today&rsquo;s technology solutions are overly complex, vulnerable to attack, and time consuming to develop. As an alternative to this, we provide secure, safe, and efficient solutions by leveraging the features of OCaml in combination with cutting-edge tools and technologies. In collaboration with a rich open-source community, we develop and maintain the OCaml language, the operating system MirageOS including Unikernels, as well as a range of developer tools.</p> +<p>The OCaml language is efficient, writes safe and secure code, and is easy to maintain and adapt thanks to its modular nature. When combined with MirageOS and Unikernels, which reduce runtime complexity for lightweight and accurate results, OCaml&rsquo;s already impressive features are used more effectively. Finally, our range of modern, easy-to-use tools offer developers a range of options for writing projects in OCaml.</p> +<p>Cybersecurity is at the heart of what we do, and we&rsquo;re excited to combine our knowledge and experience with the substantial resources offered by Thales and Station F. 
We are also looking forward to engaging with other companies in the same industry and discovering new opportunities for our projects and our clients.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#thales" aria-label="thales permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Thales</h2> +<p>Thales is a global leader in technology innovations and solutions specialising in digital and &lsquo;deep tech&rsquo; innovations such as Big Data, artificial intelligence, connectivity, cybersecurity, and quantum technology. Their goal is to invest in these technologies to build a better future that people can trust. Thales focuses on five vertical markets: digital identity and security, defence and security, aerospace, space, and transport. 
Their clients play a central role in these socially important markets.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#station-f" aria-label="station f permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Station F</h2> +<p>Located in Paris, Station F is a startup campus that hosts over 1000 startups. Station F offers everything an entrepreneur needs to start and grow their business, making over 35 public services, 150 perks, and 600 workshops and events available to people in their network. Their services include startup programs on specific themes, special mentorship offices, a Flatmates service, exclusive discounts, and much more.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#more-coming-soon" aria-label="more coming soon permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>More Coming Soon</h2> +<p>The programme started at the beginning of June, and we&rsquo;ll follow up with more developments as they happen. 
In the meantime, we&rsquo;ve selected some relevant posts and information on our work in cybersecurity you can read if you want to know more. We&rsquo;ve received funding from the EU for our work on a <a href="https://tarides.com/blog/2021-04-30-scop-selected-for-dapsi-initiative">secure open messaging</a> platform, we&rsquo;ve been laureates of the <a href="https://tarides.com/blog/2019-07-05-i-lab-2019">i-Lab innovation contest</a> for our work on Osmose, and we&rsquo;ve won the <a href="https://tarides.com/blog/2019-12-11-tarides-wins-the-fic-2020-startup-award">2020 FIC startup award</a>. If you want to read up on the technical side of things, these papers on <a href="https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-kaloper-mersinjak.pdf">unikernels</a> and on <a href="https://anil.recoil.org/papers/2018-hotpost-osmose.pdf">Osmose</a> are a good place to start.</p>https://tarides.com/blog/2022-06-28-thales-cyber-station-f-selectionThales Cyber@Station F Selection2022-06-28T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><em>Everyone at Tarides recently had the opportunity to meet up in person for the first time! Since the global pandemic left much of our distributed team unable to meet, we organised a working retreat that brought all our teams together to work, learn, and have fun.</em></p> +<p>This was the first formal retreat we&rsquo;ve held as a company, and our goals were to provide an opportunity to disconnect from day-to-day work, meet with new people, and discuss new projects - all in a novel environment surrounded by fresh air and open space. During the course of the pandemic, Tarides grew from 36 to 71 people based in 11+ countries, with few of them ever having met in person. 
As soon as it was safe to do so, we knew that gathering together would give us an invaluable experience in building and reinforcing our own company culture.</p> +<p>This May, all members of Tarides were invited to a beautiful 17th century chateau at Les Pr&eacute;s D&rsquo;Ecoublay, surrounded by fields, orchards, and lush green forests. Attendees participated in exciting workshops, team-building activities, tech-talks, and inspiring discussions. Over the course of two days, everyone had plenty of time to collaborate, socialise, and eat fantastic food!</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#what-we-got-up-to" aria-label="what we got up to permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What We Got Up To</h3> +<p>The fun began at 9am on Thursday, May 19th, when Tarides&rsquo;s distributed global team gathered to make their way to what was to be their castle for the next two days. 
Greeted by a feast of French pastries and fresh fruit, people from Australia, France, India, Germany, the USA, the UK, and many more countries introduced themselves to one another.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#tech-talks--tutorials" aria-label="tech talks tutorials permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Tech Talks &amp; Tutorials</h4> +<p>Over the next couple of days, the itinerary left plenty of space for knowledge sharing via tutorials and &lsquo;tech talks&rsquo; or presentations. KC Sivaramakrishnan gave everyone a sneak peek of OCaml 5 with a detailed <a href="https://github.com/kayceesrk/ocaml5-tutorial/">tutorial</a>. The tutorial is openly available on GitHub. Please give us feedback if you try it!</p> +<p>Engineers Sonja Heinze and Jan Midtgaard held presentations (so-called &lsquo;tech talks&rsquo;) in the main hall for everyone&rsquo;s benefit. Sonja&rsquo;s talk was on the benefits of <a href="https://www.outreachy.org">Outreachy</a>, an open-source internship coordinator that creates opportunities for those most affected by underrepresentation or discrimination. Jan introduced everyone to property-based testing, a fascinating way to test the &lsquo;correctness&rsquo; of code. 
The goal of the 'tech talks' is to provide a safe space for people to share their work with others, where everyone is welcome to ask questions and discuss the topics covered.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#cross-team-collaboration" aria-label="cross team collaboration permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Cross-Team Collaboration</h4> +<p>There were also plenty of opportunities for teams to meet and work together, with individual meeting rooms readily at hand. The coffee machines (and accompanying sweets) in each room significantly boosted the productivity of all teams! Teamwork was not limited to individual groups; time was also set aside for cross-team brainstorming and collaboration.</p> +<p>All of the teams at Tarides are working towards the big OCaml 5.0 release scheduled for later this year, and we took the opportunity to use this release as a focal point to align our goals for the next few months. Each team first spent some time together defining its team-specific goals, and then the Team Leads followed up with an &quot;office hours&quot; drop-in session to discuss cross-project interaction and ideas. 
Creating dedicated space to think beyond the usual daily tasks and projects proved to be really valuable, producing insights and connections that may have otherwise been missed.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#fun--games" aria-label="fun games permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Fun &amp; Games!</h4> +<p>It wouldn&rsquo;t be a working retreat without opportunities for relaxation and fun! We got competitive in a treasure hunt that had us searching the forest and tall grasses for clues. There was plenty of time to explore the vast castle grounds, which included a pool, archery arena, ping pong table, karaoke pavilion, and much more.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#until-next-time" aria-label="until next time permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Until Next Time</h3> +<p>The off-site was a huge success! As a distributed team, it&rsquo;s important to occasionally get together and put a face to a name (or Slack and GitHub handle!). 
Everyone at Tarides looks forward to the next off-site, wherever it may take place.</p>https://tarides.com/blog/2022-06-23-team-tarides-visits-a-17th-century-chateauTeam Tarides Visits a 17th Century Chateau2022-06-23T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>This year, Tarides attended the 2022 <em>Functional Conf</em> in India. Tarides&rsquo;s engineers Sudha Parimala and Shakthi Kannan gave presentations on the OCaml platform and <em>Sandmark</em>, a continuous benchmarking tool for Multicore OCaml.</p> +<p>The <em>Functional Conf</em> is a three-day conference on everything functional programming! It&rsquo;s a great event for beginners and experienced developers alike. Beginners have the opportunity to be introduced to different functional programming languages and understand their fundamental principles, and those who are more experienced have plenty to learn from both participants and speakers on how they have leveraged functional programming in their projects.</p> +<p>This year&rsquo;s <em>Functional Conf</em> was held online and attended by people from around the world. 
It has been referred to as &ldquo;Asia&rsquo;s premiere functional programming conference&rdquo; and welcomes participants from a broad range of backgrounds.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#ocaml-platform-2022" aria-label="ocaml platform 2022 permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><em>OCaml Platform 2022</em></h2> +<p><a href="https://www.youtube.com/watch?v=tv4_Le4E-gQ">Sudha Parimala&rsquo;s talk</a> is a great introduction to OCaml and its features, covering the installation process and &quot;Hello World,&quot; as well as more advanced topics such as the text editor, publishing a library, and debugging. 
For the <em>Functional Conf</em>, it was a great way to give people interested in functional programming a taste of OCaml and what makes it stand out.</p> +<p>The presentation is a fantastic resource for people who are starting their journey in OCaml and want to know more about what they can do with the language, as well as for people further ahead looking for inspiration on different ways to progress.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#benchmarking-multicore-ocaml" aria-label="benchmarking multicore ocaml permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><em>Benchmarking (Multicore) OCaml</em></h2> +<p><a href="https://www.youtube.com/watch?v=_-4XNtKs3wM">Shakthi Kannan&rsquo;s talk</a> centres around Sandmark, the benchmarking suite designed to test various parts of the OCaml compiler and its runtime. Benchmarking is a challenging process. As a result, there are few tools available that do the job well. OCaml&rsquo;s Sandmark can test various performance axes such as CPU, memory, and I/O, as its tools build the compiler under various configuration settings. It also comes with a dashboard that lets the user explore the results of benchmarking runs in an interactive and easily digestible format.</p> +<p>In his talk, Shakthi describes the journey to Sandmark, originally developed to support the Multicore OCaml project. 
He covers the challenges the team faced and the lessons they learned along the way, especially with an evolving programming language and the need to support multiple CPU architectures. It&rsquo;s an amazing resource for teams who are looking to set up their own benchmarking procedures, and it is also a great example of how to approach a difficult task as a team.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#in-conclusion" aria-label="in conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><em>In Conclusion</em></h2> +<p>The <em>Functional Conf</em> is a great conference that brings the growing community of functional programmers together. It offers opportunities for people to learn about functional programming and exchange information with others in similar fields. 
Tarides is proud to have participated in their effort to bring functional languages to the forefront of programming.</p> +<p>To learn more about <em>Functional Conf</em>, visit <a href="https://confengine.com/conferences/functional-conf-2022">their website</a>, along with the individual pages on <a href="https://confengine.com/conferences/functional-conf-2022/proposal/16096/ocaml-platform-in-2022">Sudha&rsquo;s talk</a> and <a href="https://confengine.com/conferences/functional-conf-2022/proposal/16102/fast-and-curious-benchmarking-multicore-ocaml">Shakthi&rsquo;s talk</a>.</p>https://tarides.com/blog/2022-06-21-functional-conf-2022Functional Conf 20222022-06-21T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><em>OCaml 5 is live! This major release introduces domains and effects, delivering unprecedented speed and efficiency to OCaml. In HTTP server benchmarks, OCaml 5 outperforms Go and closely matches Rust. Keep reading for more details!</em></p> +<p>Tarides is thrilled that the alpha release of the long-awaited OCaml 5 is live! OCaml 5 is the culmination of over 8 years of research and engineering into concurrency and parallelism support for OCaml, made real thanks to hard work and dedication from all corners of the community. Tarides has been a major contributor to the engineering effort. Our engineers have contributed not only to the core compiler but also to the tools around release readiness, ecosystem compatibility testing, and continuous performance monitoring.</p> +<p><strong>If you are using OCaml in an industrial setting (or if you are interested in doing so), we'd like to make sure everything is ready for you to move to OCaml 5 and benefit from the new performance boost. 
Tell us what you need in this <a href="https://framaforms.org/tarides-ocaml-5-user-survey-1655303113">user survey</a>.</strong></p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#what-kind-of-performance-to-expect" aria-label="what kind of performance to expect permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What Kind of Performance to Expect?</h2> +<p>This update brings <em>unprecedented results</em> in terms of performance, with an HTTP server based on OCaml 5&rsquo;s <a href="https://github.com/ocaml-multicore/eio"><code>Eio</code></a> being able to serve 1M+ requests/sec, outperforming Go&rsquo;s <code>nethttp</code>, and closely matching Rust&rsquo;s <code>hyper</code> performance! 
This is just a small indication of OCaml 5&rsquo;s potential in terms of speed and efficiency.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/9f0b97bdb5cfc231e1a387bb218f08c4/133ae/http_load1.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 66.47058823529413%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/9f0b97bdb5cfc231e1a387bb218f08c4/c5bb3/http_load1.png" class="gatsby-resp-image-image" alt="HTTP Load" title="HTTP Load" srcset="/static/9f0b97bdb5cfc231e1a387bb218f08c4/04472/http_load1.png 170w, +/static/9f0b97bdb5cfc231e1a387bb218f08c4/9f933/http_load1.png 340w, +/static/9f0b97bdb5cfc231e1a387bb218f08c4/c5bb3/http_load1.png 680w, +/static/9f0b97bdb5cfc231e1a387bb218f08c4/b12f7/http_load1.png 1020w, +/static/9f0b97bdb5cfc231e1a387bb218f08c4/b5a09/http_load1.png 1360w, +/static/9f0b97bdb5cfc231e1a387bb218f08c4/133ae/http_load1.png 1424w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span> +<span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/2f196c5af826e76e47ebc6a902b9182d/2a08f/http_cores.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 66.47058823529413%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img 
src="https://tarides.com/static/2f196c5af826e76e47ebc6a902b9182d/c5bb3/http_cores.png" class="gatsby-resp-image-image" alt="HTTP Cores" title="HTTP Cores" srcset="/static/2f196c5af826e76e47ebc6a902b9182d/04472/http_cores.png 170w, +/static/2f196c5af826e76e47ebc6a902b9182d/9f933/http_cores.png 340w, +/static/2f196c5af826e76e47ebc6a902b9182d/c5bb3/http_cores.png 680w, +/static/2f196c5af826e76e47ebc6a902b9182d/b12f7/http_cores.png 1020w, +/static/2f196c5af826e76e47ebc6a902b9182d/b5a09/http_cores.png 1360w, +/static/2f196c5af826e76e47ebc6a902b9182d/2a08f/http_cores.png 1422w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>As it&rsquo;s an alpha version, it&rsquo;s still subject to some change and fine tuning. This means that we need your help to make OCaml 5 better! Please give <a href="https://discuss.ocaml.org">plenty of feedback</a> and <a href="https://github.com/ocaml/ocaml/issues">report any bugs</a> you find.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#under-the-hood" aria-label="under the hood permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Under the Hood</h2> +<p>The alpha release of OCaml 5 adds support for shared memory parallel execution via <em>domains</em> and a new model for concurrent execution via <em>effect handlers</em>.</p> +<p>Domains enable shared-memory parallel programming that allow OCaml programs to run on multiple 
cores. With domains, OCaml programs will scale better by exploiting multicore processing. Effect handlers are a mechanism for concurrent programming. With effect handlers, concurrent code can be written in simple direct style, which makes it flexible and easy to develop, debug, and maintain. No more monads for concurrency! These features will benefit the entire ecosystem and community, and we expect them to attract many new users to the language.</p> +<p>The Standard Library is gaining several of the parallelism primitives previously only found in the Threads library (Condition, Mutex, and Semaphore). Interestingly, having added domains and effects, we hope and expect that most users will never need to use them directly! Instead, we warmly encourage users to look at adopting <a href="https://github.com/ocaml-multicore/domainslib"><code>domainslib</code></a> to parallelise programs and <a href="https://github.com/ocaml-multicore/eio"><code>Eio</code></a> as a replacement for Lwt/Async monadic-style concurrency.</p> +<p>This work seeks to remain entirely backwards-compatible. 
Programs written for any version of OCaml 4, even if they use the Thread library, will continue to work with the same semantics, similar performance, and as always for OCaml, without crashes.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#next-steps" aria-label="next steps permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Next Steps</h2> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#installation" aria-label="installation permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Installation</h3> +<p>For instructions on how to install OCaml 5 on your machine, <a href="https://discuss.ocaml.org/t/ocaml-5-0-zeroth-alpha-release/10026">Florian Angeletti&rsquo;s Discuss post</a> goes into great detail on how to do so, depending on what version of OCaml you&rsquo;re running and what machine you have. 
KC Sivaramakrishnan has also created a <a href="https://github.com/kayceesrk/ocaml5-tutorial/">tutorial</a> on OCaml 5 that introduces its new parallelism features, a great resource for anyone looking to make the most of the update.</p> +<p>Other OCaml 5 documentation includes information on <a href="https://kcsrk.info/webman/manual/parallelism.html">parallelism</a>, <a href="https://kcsrk.info/webman/manual/effects.html">effect handlers</a>, and the <a href="https://kcsrk.info/webman/manual/memorymodel.html">memory model</a>.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#feedback" aria-label="feedback permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Feedback</h3> +<p>We want to reiterate that as with any alpha release of OCaml, we&rsquo;re keen to hear about bugs and performance regressions. The move to parallel OCaml may bring new debugging challenges, but it remains the case that pure OCaml programs which do not use unsafe features should absolutely never crash. We&rsquo;ll be taking part in the discussion on <a href="https://github.com/ocaml/ocaml/issues">GitHub</a>, <a href="https://discuss.ocaml.org">Discuss</a>, and <a href="https://twitter.com/tarides_?s=20&amp;t=xD04dp9D8eDpCxX6WDkC0A">Twitter</a>.</p> +<p>The change in major version number (from 4.<em>n</em> to 5.<em>n</em>) may result in minor breaking changes which affect your packages, particularly if you&rsquo;ve been allowing some deprecation warnings to slip through in the past! 
We&rsquo;ll be following up with more information about the required tweaks that may be required for packages supporting both old and new versions of OCaml, as well as with specifics on the testing infrastructure.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#survey" aria-label="survey permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Survey</h3> +<p>As discussed above, we&rsquo;ve created a <a href="https://framaforms.org/tarides-ocaml-5-user-survey-1655303113">user survey</a> to help us get a better sense of how people are planning on using OCaml 5. 
It would be very helpful if you could fill it out for us, which should only take a few minutes.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#ocaml-5-timeline" aria-label="ocaml 5 timeline permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>OCaml 5 Timeline:</h3> +<ul> +<li>The beta release will take place once these <a href="https://github.com/ocaml/ocaml/milestone/40">issues</a> have been resolved.</li> +<li>The final release is expected in September.</li> +<li>There is <em>no time limit</em> on reporting bugs, so please <a href="https://github.com/ocaml/ocaml/issues">report them here</a>.</li> +</ul> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#further-reading" aria-label="further reading permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Further Reading</h2> +<p>The <a href="https://discuss.ocaml.org/tag/multicore-monthly">Multicore monthlies</a>, produced by Shakthi Kannan and Anil Madhavapeddy, provide important context for the work behind OCaml 5 and what&rsquo;s coming with the 
release.</p>https://tarides.com/blog/2022-06-15-ocaml-5-alpha-releaseOCaml 5 Alpha Release2022-06-15T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>The Upcoming Tezos <a href="https://tezos.gitlab.io/protocols/013_jakarta.html#protocol-jakarta">Jakarta Protocol</a> will support compact Merkle +proofs to scale the network's trust infrastructure. +This allows nodes that do not trust each other to agree on the +validity of Tezos transactions with orders of magnitude smaller +storage requirements. +For instance, the block <a href="https://tzstats.com/2400319">2,400,319</a>, +containing 402 transactions and 638 operations, +can be validated using a Merkle proof of 6.3 MB instead of requiring +a Tezos node with at least 3.4 GB of storage, a savings of 99.8%!</p> +<p>Tarides contributed to Jakarta by extending the Tezos +storage system to support compact +storage proofs. This +feature extends the compact cryptographic representation of the ledger +state to sequences of operations. As a result, nodes that do not trust +each other can still agree on the series of operations' validity, +even if they don't know the entire contents of the ledger. The +upcoming stateless nodes (like <a href="https://tezos.gitlab.io/user/light.html">Tezos +light-client</a>), +<a href="https://tezos.gitlab.io/user/proxy.html">proxies</a>, and mechanisms that allow the exchange of trust between disjointed tamper-proof storage (like <a href="https://tezos.gitlab.io/alpha/transaction_rollups.html">L2 +transactional-rollups</a>, +L2 smart-contract rollups,...) will use these proofs to scale the +Tezos trust infrastructure to <a href="https://research-development.nomadic-labs.com/tezos-is-scaling.html">new heights</a>.</p> +<p>The Merkle Proof API is one of the last major features that we integrated from +<a href="https://www.dailambda.jp/blog/2019-08-08-plebeia/">Plebeia</a>. 
It is +the result of a years-long collaboration between +<a href="https://www.dailambda.jp/">DaiLambda</a> and Tarides to improve the +storage system of Tezos.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#a-very-quick-tour-of-tezos" aria-label="a very quick tour of tezos permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>A (Very) Quick Tour of Tezos</h3> +<p>The Tezos network builds trust between its nodes by using two components:</p> +<ul> +<li><strong>(i)</strong> a tamper-proof database that can generate cryptographic hashes, +which uniquely and compactly represent the state of its contents; and</li> +<li><strong>(ii)</strong> a consensus algorithm to share these cryptographic hashes +across the network of (potentially adversarial) nodes.</li> +</ul> +<p>Both components have seen impressive improved performance +recently. First, for <strong>(i)</strong>, we've discussed +the improvements that we released in <a href="https://tarides.com/blog/2022-04-26-lightning-fast-with-irmin-tezos-storage-is-6x-faster-with-1000-tps-surpassed">Octez v13 to improve +the efficiency of the storage component by a factor of 6</a>. Second, for <strong>(ii)</strong>, the consensus algorithm in Ithaca 2 changed from a +Nakamoto-style algorithm (like Bitcoin) to +<a href="https://arxiv.org/abs/2001.11965">Tenderbake</a> -- a Byzantine Fault +Tolerance consensus with deterministic finality. 
This change +significantly improved the time it takes for converging towards a +uniquely agreeing state hash in the Tezos network.</p> +<p>The Merkle proofs that we +introduced in the Jakarta Protocol will allow us to <a href="https://research-development.nomadic-labs.com/tezos-is-scaling.html">improve the +chain's performance even +more</a>.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#the-tezos-ledger-is-a-merkle-tree" aria-label="the tezos ledger is a merkle tree permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The Tezos Ledger is a Merkle Tree</h3> +<p>Tezos represents the ledger state (for instance, the amount of tokens +owned by everyone) as a Merkle tree, using the +<a href="https://irmin.io">Irmin</a> storage library. Merkle trees are immutable +tree-like data structures where each leaf is labelled with a +cryptographic hash. Each node's hash is then obtained by +recursively hashing its children's label. Tezos then combines that +root hash computation with its consensus protocol to make sure every +node in the network agrees on the ledger's state.</p> +<p>But there is another interesting aspect of Merkle trees that was not +exposed and used by Tezos until now: <em>Merkle proofs</em>. In the protocol +<code>J</code> proposal, we are introducing a new feature: Merkle proofs for +Tezos as partial, compressed, Merkle trees. In a blockchain, as in +Tezos, Merkle proofs are an efficient way to verify the integrity of +operations over Merkle trees. 
For this reason, Merkle proofs are +a central part of the optimistic rollup projects that will +be available with the Tezos +<a href="https://tezos.gitlab.io/protocols/013_jakarta.html">Jakarta Protocol</a>.</p> +<p>In collaboration with DaiLambda, Marigold, Nomadic Labs, and TriliTech, +we have integrated <a href="https://github.com/camlspotter/plebeia">Plebeia</a> +Merkle proofs in Irmin. Plebeia uses Patricia binary trees that are capable of +generating very compact Merkle proofs. For instance, the proof of +100 operations can be represented in 46 kB, whereas storing the full +Tezos context requires 3.4 GB of disk space. +This compactness comes from its specialised store structure and clever +optimisations, such as path compression and inlining. We have been +working with the DaiLambda team to unite Irmin and Plebeia's strengths +and bring built-in compact Merkle proof support to Tezos. We added +support for both the existing storage stack, where trees have a +branching factor of 32, and for new L2 storage systems that could use +binary trees directly. We have also worked with Marigold and Nomadic +Labs to propose an alternative representation of these proofs using +streams, which comes with a simplified verification algorithm. +A stream proof encodes the same information as a regular proof. 
+However, instead of being +encoded as a tree, the proof is encoded as a sequence of steps that +reveal a Merkle tree lazily, from root to leaves.</p> +<table> +<thead> +<tr> +<th>Kind of Proofs</th> +<th>1 op.</th> +<th>100 ops.</th> +<th>1k ops.</th> +<th>10k ops.</th> +</tr> +</thead> +<tbody> +<tr> +<td>binary Merkle trees</td> +<td>0.7kB</td> +<td>46kB</td> +<td>371kB</td> +<td>2.8MB</td> +</tr> +<tr> +<td>stream binary Merkle trees</td> +<td>1kB</td> +<td>75kB</td> +<td>602kB</td> +<td>4.5MB</td> +</tr> +<tr> +<td>Merkle B-trees (32 children)</td> +<td>3.1kB</td> +<td>158kB</td> +<td>1232kB</td> +<td>7.8MB</td> +</tr> +<tr> +<td>stream Merkle B-trees (32 children)</td> +<td>3.1kB</td> +<td>158kB</td> +<td>1238kB</td> +<td>7.9MB</td> +</tr> +</tbody> +</table> +<blockquote> +<p>The table above shows the size for such proofs, using 2.5M entries (the current number of entries in <code>/data/contracts/index</code> in the Tezos context). We are simulating 1, 100, 1000, and 10_000 random read operations on the entries, and we display the size of the related proofs.</p> +</blockquote> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#an-example" aria-label="an example permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>An Example</h3> +<p>Let's look at a simple example of a Merkle proof produced for ensuring +tamper-proof banking account statements.</p> +<p>We can model a bank that stores its customer balances in the form of a +Merkle tree. 
To avoid publishing the entire contents of its customer +accounts, the bank can publish only the root hash of its Merkle +tree. To let third parties validate an operation, it can also produce +Merkle proofs that reveal the balance of some customers. Anyone in +possession of a Merkle proof can hash it and verify that it hashes +identically to the public hash announced by the bank. This equality of +hashes is the proof of correctness.</p> +<p>Our bank contains the balances for Eve (30 coins), Ben (10 coins), and +Bob (20 coins). It stores the customers in a radix tree (Eve's balance +is stored under <code>&quot;e&quot;, &quot;v&quot;, &quot;e&quot;</code>).</p> +<p>Irmin stores data as hash trees: whenever data is added to the +database, the corresponding nodes in the tree are hashed in order to +generate the hash of the root node. The hash of a commit also +acts as a reference for accessing it in the future. This storage +format is close to Merkle proofs. To generate a transaction's proof, it +suffices to blind the parts of the tree that the transaction has not accessed. For +example, the account statement for Eve is shown in green in the figure below. It doesn't leak +sensitive information about the other customers: only the letter &quot;b&quot; +is leaked. 
The hash corresponds to Bob and Ben's subtree.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/3f72a55ae57b0494cd400fb80392b28a/84cc5/Merkle-Proof.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 41.76470588235294%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/3f72a55ae57b0494cd400fb80392b28a/c5bb3/Merkle-Proof.png" class="gatsby-resp-image-image" alt="Merkle Proof" title="Merkle Proof" srcset="/static/3f72a55ae57b0494cd400fb80392b28a/04472/Merkle-Proof.png 170w, +/static/3f72a55ae57b0494cd400fb80392b28a/9f933/Merkle-Proof.png 340w, +/static/3f72a55ae57b0494cd400fb80392b28a/c5bb3/Merkle-Proof.png 680w, +/static/3f72a55ae57b0494cd400fb80392b28a/84cc5/Merkle-Proof.png 898w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<blockquote> +<p>The Figure shows the difference between a Merkle tree (on the left) and a Merkle proof (on the right). In Merkle proofs, some subtrees can be blinded and represented only by their hash. Here the subtree under H2 is blinded and replaced by its hash. Thanks to this, Merkle trees and Merkle proofs will have the same root hash. Merkle proofs are hence very useful to provide Merkle tree summaries for a subset of the full data available.</p> +</blockquote> +<p>Merkle proofs are thus partial Merkle trees, with the same root hash. But proofs +can also be represented using an alternate definition: a stream of elements +that needs to be visited in order to build the tree's root hash. 
There is +a one-to-one correspondence between the two representations, but stream proofs +are easier to implement as they encode the order in which nodes have to be visited to verify the proof. However, stream proofs need to carry the hash of all the intermediate nodes, while tree proofs can omit those. As a consequence, tree proofs are smaller than stream proofs, as shown in the above table.</p> +<p>For instance, in the above Figure, the equivalent stream proof is the sequence:</p> +<ul> +<li>A leaf with 30 coins;</li> +<li>A node with hash <code>H3</code> with a child (&quot;e&quot;, &quot;30 coins&quot;);</li> +<li>A node with hash <code>H1</code> with a child (&quot;v&quot;, <code>H3</code>);</li> +<li>A node with hash <code>H0</code> with two children (&quot;e&quot;, <code>H1</code>) and (&quot;b&quot;, <code>H2</code>).</li> +</ul> +<p>A verifier can just apply these elements in sequence to verify that <code>H0</code> is +a valid root hash.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#merkle-proofs-in-irmin" aria-label="merkle proofs in irmin permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Merkle Proofs in Irmin</h3> +<p>If you want more details about the Merkle proof implementation, head +over to the +<a href="https://mirage.github.io/irmin/irmin/Irmin/module-type-S/Tree/Proof/index.html">documentation</a> +or at <a href="https://github.com/mirage/irmin/pull/1802">https://github.com/mirage/irmin/pull/1802</a> for the example above +revisited in 
Irmin.</p>https://tarides.com/blog/2022-06-13-adding-merkle-proofs-to-tezosAdding Merkle Proofs to Tezos2022-06-13T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#introduction" aria-label="introduction permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Introduction</h3> +<p>One of Tarides's projects is to create an open and secure infrastructure for <a href="https://tarides.com/blog/2022-03-08-secure-virtual-messages-in-a-bottle-with-scop">communication protocols</a>, initially focusing on emails and <a href="https://matrix.org/">Matrix</a>. This will allow organisations to self-host their messaging services, using either personal cloud resources or low-cost embedded devices. Individuals and organisations can use this framework to avoid having their emails and messages read and managed by third parties.</p> +<p>Every component of our system is carefully designed as independent libraries, using modern development techniques to avoid the common reported threats and flaws. For instance, the protocols' implementation is written in a type-safe language and tested with state-of-the-art, coverage-driven tests, such as fuzzing. Then it's deployed as unikernels for enhanced security, model quality, and library portability. 
The combination of these techniques will give users the confidence to migrate their personal data to these new secure services.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#the-matrix" aria-label="the matrix permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The Matrix</h3> +<p>When hearing the word <em>Matrix</em>, people invariably think about Neo and his ability to see the code behind his virtual world. In light of the cultural connection to the popular film series, the creators of the Matrix Communication Standard address the implicit assumption about their choice of name: &ldquo;We are called Matrix because we provide a structure in which all communication can be matrixed together.&rdquo;</p> +<p>Communication is essential to our society, both to create and to maintain relationships, whether personal or professional. As we progress further into this age of information, people increasingly stay connected through online, text-based channels. Gone are the days when someone would pick up a phone to call a friend or family member. Now, most people send a text message or email as the default. Thus, online communication has become the norm in our current society. Inevitably, this online communication is vulnerable to malicious actors trying to invade our privacy and hijack our correspondence. 
Tarides has addressed this issue and aims to host community discussions about open-source projects.</p> +<p>Matrix is an established protocol for human-to-human and human-to-machine communications, including instant messaging. OCaml Matrix is an OCaml implementation of the Matrix protocol. This provides a secure communication layer which is based on MirageOS&rsquo;s unikernel technology in order to reduce the attack surface. It uses Irmin as storage for the communication content to ensure integrity, and we have integrated it into the CI system for all OCaml projects.</p> +<p>Let's take a closer look at the <code>ocaml-matrix</code> component and explore some details about the Matrix Communication Standard to see if it&rsquo;s indeed communication matrixed together or if it&rsquo;s comparable to Neo&rsquo;s Matrix with people plugged into a virtual world.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#matrix-beginnings-history" aria-label="matrix beginnings history permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Matrix Beginnings (History)</h3> +<p>Matrix is an open standard for interoperable, decentralised, real-time communication over the Internet, created in 2014 inside <a href="https://www.amdocs.com">Amdocs</a>, a company specialised in software and services for communications. <a href="https://matrix.org">Matrix</a> provides fully decentralised and federated architecture, so they don&rsquo;t store users&rsquo; information in a centralised location. 
This means that when people join one of the Matrix virtual rooms to send messages, video chat, or share files, their exchanges are private, especially with Matrix&rsquo;s end-to-end encryption. Matrix&rsquo;s decentralised, federated architecture ensures communication integrity and availability in every room.</p> +<p>Matrix is openly <a href="https://spec.matrix.org/latest/">specified</a> and <a href="https://github.com/matrix-org">implemented</a>, with the open-source reference server <a href="https://github.com/matrix-org/synapse">Synapse</a> and the client <a href="https://github.com/vector-im">Element</a> (previously Riot), which already have several security features and support end-to-end encryption. Starting in 2018, the French Government deployed a private federation of <a href="https://github.com/matrix-org/synapse-dinsic">Matrix homeservers</a> and <a href="https://github.com/tchapgouv">Tchap</a>, an open-source client forked from Riot. The French National Cybersecurity Agency (<a href="https://www.ssi.gouv.fr/en/">ANSSI</a>) works jointly with the Interdepartmental Digital Directorate (<a href="https://www.numerique.gouv.fr/dinum/">DINUM</a>) on a cybersecurity audit of Tchap. 
Matrix&rsquo;s notable security features include search that works with end-to-end encryption, and end-to-end encryption enabled by default in private rooms.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#matrix-reloaded-architecture" aria-label="matrix reloaded architecture permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Matrix Reloaded (Architecture)</h3> +<p>Users interact by sending and receiving events in Matrix rooms. Each Matrix user registers with a homeserver and is identified by a unique ID, like &ldquo;Neo:tarides.com.&rdquo; The registration goes through a client application that connects to a Matrix homeserver via the client-server API. This allows users to perform actions such as sending messages, controlling rooms, or synchronising their conversation history. All communication in a Matrix room is replicated across the room participants&rsquo; homeservers, so every homeserver connected to a room stores the content of the room&rsquo;s history.</p> +<p>In short, the user communicates with a homeserver via a client application. Once the user decides to join a room, the client sends this request to the homeserver, and it&rsquo;s the homeserver&rsquo;s responsibility to connect the user to the room, to store the history of the messages of that room, and to send the messages back to the user. The homeserver gets all this information by talking with the other users&rsquo; homeservers in that room. 
This way, if a homeserver goes down, the conversation can continue as the remaining homeservers are still exchanging messages. When a homeserver comes back online, it resynchronises the messages. It receives old ones from other homeservers and inserts its own into others&rsquo; timelines.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/918b2b99cc0728b9ab6397e8d967bf2b/0f7d5/servers.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 85.88235294117648%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/918b2b99cc0728b9ab6397e8d967bf2b/c5bb3/servers.png" class="gatsby-resp-image-image" alt="Matrix Architecture" title="Matrix Architecture" srcset="/static/918b2b99cc0728b9ab6397e8d967bf2b/04472/servers.png 170w, +/static/918b2b99cc0728b9ab6397e8d967bf2b/9f933/servers.png 340w, +/static/918b2b99cc0728b9ab6397e8d967bf2b/c5bb3/servers.png 680w, +/static/918b2b99cc0728b9ab6397e8d967bf2b/0f7d5/servers.png 993w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p><em>Matrix Architecture Image Description: Matrix users communicate via Matrix clients, which can be web, mobile, or desktop clients, or embedded clients built into existing apps like Slack via Matrix bridges. A client could even be a piece of hardware (e.g., a drone) that is Matrix enabled. A user's client connects via a unique ID to a single homeserver, which stores the communication history and account information for that user. 
It also shares data with the wider Matrix federation by synchronising communication history with other homeservers. The conversations among users take place in rooms that have their contents replicated across all of the homeservers associated with the users present in a room.</em></p> +<p>Centralised communication architectures keep the data within their own systems, which raises a series of security issues. For example, centralised systems usually offer very little transparency regarding their implementations. This means that, for the claimed purpose of security, a centralised system could either hide backdoors or have security flaws that pose serious risks to privacy. By contrast, an open-source system promotes transparent development, which provides assurance regarding the reliability of the implementation by allowing ad-hoc code audits. Moreover, the decentralised architecture empowers users to host their own conversations rather than having all their data stored by the service provider. 
This reduces the incentive for attacks targeting massive data leaks and, in combination with the confidentiality ensured by end-to-end encryption, increases the level of security while promoting ownership and data sovereignty.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#matrix-revolutions-in-ocaml" aria-label="matrix revolutions in ocaml permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Matrix Revolutions (in OCaml)</h3> +<p>Matrix&rsquo;s <a href="https://www.matrix.org/security-disclosure-policy/">Hall of Fame</a> shows several ethical researchers&rsquo; investigative work into Matrix&rsquo;s security vulnerabilities. For example, a recently discovered <a href="https://www.cvedetails.com/cve/CVE-2021-44538/">buffer overflow</a> causes considerable information disclosure in other Matrix implementations, such as Element. 
At Tarides, we mitigate a consistent class of these vulnerabilities with the OCaml development environment, which provides secure-by-design guarantees for the <a href="https://github.com/mirage/ocaml-matrix">OCaml Matrix</a> project.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/d41f7a6ccb861f1d923be5320b068db1/d26aa/architecture.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 99.41176470588235%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/d41f7a6ccb861f1d923be5320b068db1/c5bb3/architecture.png" class="gatsby-resp-image-image" alt="Matrix Servers" title="Matrix Servers" srcset="/static/d41f7a6ccb861f1d923be5320b068db1/04472/architecture.png 170w, +/static/d41f7a6ccb861f1d923be5320b068db1/9f933/architecture.png 340w, +/static/d41f7a6ccb861f1d923be5320b068db1/c5bb3/architecture.png 680w, +/static/d41f7a6ccb861f1d923be5320b068db1/d26aa/architecture.png 839w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>OCaml Matrix Architecture: The OCaml CI Client is a bot that communicates with Matrix servers via the TLS protocol, such as the <code>ocaml-matrix</code> server. The <code>ocaml-matrix</code> server is the unikernel that ensures the communication with other Matrix servers from the federation to synchronise upon events in the Matrix rooms. 
For this purpose, OCaml Matrix exchanges DNS information with a unikernel that plays the role of a Primary DNS Server and connects with an Irmin storage unit to save the rooms&rsquo; states.</p> +<p>Our <code>ocaml-matrix</code> server manages its own clients, who create public rooms for events and messaging. It also handles foreign servers; their users can ask to join these public rooms. This server interacts with other servers and manages their users&rsquo; requests for registration and event updates in public rooms via the server-to-server communication API. Our OCaml implementation follows the Matrix specification standard. From this, we extract the parts describing the subset of Matrix components that we choose to implement for our OCaml Matrix MVP (Minimum Viable Product). Because other servers are not aware of the MVP&rsquo;s constraints, it enforces them using the errors and rights restrictions provided by the Matrix standard.</p> +<p>We also implemented an OCaml-CI client that communicates with the Matrix servers via the client-server API. This client implements a subset of the actions defined in the specification and is meant to be used as a bot only (so it should not need to grow beyond this subset). The OCaml-CI client was specifically designed to allow an easy implementation for our OCaml server, but it is fully compatible with other Matrix homeservers. We tested the integration of the OCaml-CI client with both Synapse and our <code>ocaml-matrix</code> server, and we used it for testing throughout the <code>ocaml-matrix</code> server implementation.</p> +<p>For now, we&rsquo;ve only given OCaml Matrix access to public rooms because they don&rsquo;t require the end-to-end encryption protocol. 
Nevertheless, we define support for encrypted communication via the <em>Key</em> module, and we note that most of the encryption algorithms used by the end-to-end encryption protocol are available in MirageOS unikernels via the <a href="https://github.com/mirage/mirage-crypto"><code>mirage-crypto</code> library</a>.</p> +<p>We deployed the <code>ocaml-matrix</code> server as an end-to-end application and converted it into the unikernel format. Deploying it as a unikernel lets the <code>ocaml-matrix</code> server run in isolation on various platforms, increasing the security level of the Matrix server. The unikernel format of the Matrix server is <a href="https://github.com/mirage/ocaml-matrix/tree/mirage/ci-server-mirage">completed for Unix</a> and in the final stages for the platforms ported by Solo5. It is worth noting that throughout the stage of <code>ocaml-matrix</code> unikernel deployment, we&rsquo;ve had our share of <a href="https://github.com/aantron/dream">dream</a>-ing. Going through this experience was a game changer.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#matrix-resurrections-future-work" aria-label="matrix resurrections future work permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Matrix Resurrections (Future Work)</h3> +<p>Although we&rsquo;re thrilled about the progress thus far, there is still much work to do. We plan to revive the OCaml Matrix to improve or add certain features.
First, we will add user access to private rooms with end-to-end encryption and more authentication methods that follow Matrix specifications and GDPR recommendations. We will also adopt a methodology for testing and benchmarking for both the <code>ocaml-matrix</code> client and server, integrate the <code>ocaml-matrix</code> codebase into OCaml Multicore, create other <code>ocaml-matrix</code> unikernel deployments, and evaluate the security model provided in the Matrix specifications. Finally, we&rsquo;ll update and complete the implementation according to the latest Matrix specifications.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusions" aria-label="conclusions permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusions</h3> +<p>Having said all of the above, we invite you to decide whether the Matrix name comes from the provided federated structure in which all communication can be matrixed together or from the idea that it's creating a virtual world that is sustained by the users plugged into it. Do you want to know the truth behind the Matrix? It&rsquo;s up to you. 
Will you choose the blue pill or the red pill?</p>https://tarides.com/blog/2022-06-09-ocaml-matrix-a-virtual-worldOCaml Matrix: A Virtual World2022-06-09T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Tarides is proud to sponsor the 12th annual programming contest <em><a href="https://journees-franciliennes-de-programmation.org/">Journ&eacute;es Franciliennes de Programmation!</a></em> On the 31st of May 2022, students from three different Parisian universities met at La Sorbonne University to engage in some friendly but lively competition.</p> +<p>Bachelor students from La Sorbonne (Paris 6), Paris Cit&eacute; (Paris 7), and Paris Saclay (Paris 11) participated in a day-long programme creating solutions to a variety of problems. The aim of the competition was not that participants needed to demonstrate detailed knowledge on specific areas of programming, but rather that they applied their combined knowledge of programming usefully. At the end of the day, participants were awarded points based on the problems they&rsquo;d solved during the day and the winners were announced.</p> +<p>The event was organised by teachers and researchers of computer science, some of whom specialise in OCaml. 
It was a great opportunity for students to experiment with OCaml under the guidance and supervision of experienced programmers.</p> +<p>Tarides provided computer science books for the participants, along with some fun Tarides swag!</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/f9130c96d1856810faa612631daa7d10/00172/classroom1.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 74.70588235294117%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/f9130c96d1856810faa612631daa7d10/c5bb3/classroom1.png" class="gatsby-resp-image-image" alt="Programming Contest at La Sorbonne" title="Programming Contest at La Sorbonne" srcset="/static/f9130c96d1856810faa612631daa7d10/04472/classroom1.png 170w, +/static/f9130c96d1856810faa612631daa7d10/9f933/classroom1.png 340w, +/static/f9130c96d1856810faa612631daa7d10/c5bb3/classroom1.png 680w, +/static/f9130c96d1856810faa612631daa7d10/b12f7/classroom1.png 1020w, +/static/f9130c96d1856810faa612631daa7d10/00172/classroom1.png 1044w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/45b6fb8ff402f4cf6d7d0e2437c22d4c/20751/classroom2.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 74.70588235294117%; position: relative; bottom: 0; left: 0; 
background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/45b6fb8ff402f4cf6d7d0e2437c22d4c/c5bb3/classroom2.png" class="gatsby-resp-image-image" alt="Student Creating Solutions" title="Student Creating Solutions" srcset="/static/45b6fb8ff402f4cf6d7d0e2437c22d4c/04472/classroom2.png 170w, +/static/45b6fb8ff402f4cf6d7d0e2437c22d4c/9f933/classroom2.png 340w, +/static/45b6fb8ff402f4cf6d7d0e2437c22d4c/c5bb3/classroom2.png 680w, +/static/45b6fb8ff402f4cf6d7d0e2437c22d4c/b12f7/classroom2.png 1020w, +/static/45b6fb8ff402f4cf6d7d0e2437c22d4c/20751/classroom2.png 1037w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p>https://tarides.com/blog/2022-06-02-tarides-sponsors-12th-annual-journ-e-francilienneTarides Sponsors 12th Annual Journées Franciliennes2022-06-02T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Tarides is pleased to announce the launch of the updated community site, <a href="https://ocaml.org/">ocaml.org</a>.</p> +<p>Over the past year and a half, we have supported and collaborated with members of the OCaml community on the creation of an updated community website. 
We are proud to present new features and improvements that will benefit both existing and new generations of OCaml users.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#features" aria-label="features permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Features</h3> +<p>Some of the quality-of-life improvements that users can expect from this update include:</p> +<ul> +<li><a href="https://ocaml.org/packages">Package documentation</a> site which contains the documentation of every version of every OCaml package</li> +<li><a href="https://ocaml.org/opportunities">Job board</a> to list job opportunities from the community</li> +<li><a href="https://ocaml.org/blog">Syndicated blog</a> that links to blog articles from the community and offers original blog posts</li> +<li><a href="https://ocaml.org/success-stories">Success stories</a> that explore how notable companies solve real-world challenges using OCaml</li> +<li><a href="https://ocaml.org/learn">New documentation site</a> which aggregates resources and tutorials to learn OCaml</li> +<li><a href="https://ocaml.org/play">OCaml playground</a> to try OCaml directly in the browser</li> +</ul> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#the-road-so-far" aria-label="the road so far permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 
2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The Road So Far</h3> +<p>We have worked hard to address concrete requirements from users and provide solutions for the new website. The 2020 OCaml Community Survey highlighted several areas to improve, resulting in new features and content being added.</p> +<p>The survey concluded that the original site lacked easily accessible package documentation, and that job applicants and employers had a difficult time connecting. To address this, we decided early on to include both a <a href="https://ocaml.org/opportunities">job board</a> and a fully-incorporated package documentation page. The job board now provides a place where job seekers can discover opportunities in OCaml, and employers can look for applicants. The <a href="https://ocaml.org/packages">package documentation</a> page allows users to find, explore, and compare documentation all conveniently located in one place. Additionally, the team wanted to improve site navigation. This included ensuring easy pathfinding between related topics, together with a focus on improving overall accessibility, allowing successful navigation within the site.</p> +<p>Taking the perspective of different users of the site inspired the creation of brand-new content like <a href="https://ocaml.org/success-stories">Success Stories</a> that highlight ways professionals, academics, and others use OCaml to solve hard problems, create impact, and foster collaboration. 
It also inspired the new area for <a href="https://ocaml.org/learn/">tutorials and guides</a>, as well as the <a href="https://ocaml.org/play">OCaml playground</a>, both aimed at making learning OCaml easier and more engaging.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#looking-ahead" aria-label="looking ahead permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Looking Ahead</h3> +<p>There is still plenty of room for improvement and new ideas, and now is a great time for the community to be more involved. We&rsquo;d like everyone in the community to participate in improving the site and we have created a <a href="https://github.com/ocaml/ocaml.org/blob/main/CONTRIBUTING.md">contribution guide</a> to make the process easier. Please reach out on the <a href="https://github.com/ocaml/ocaml.org/issues">issue tracker</a> with ideas and suggestions. 
We are especially looking for people to help maintain and run the website, and improve the content and general user-experience to help grow our community even more!</p> +<p>To learn more about the reboot and how to get involved please read the <a href="https://discuss.ocaml.org/t/v3-ocaml-org-we-are-live/9747">original Discuss post</a>.</p>https://tarides.com/blog/2022-05-02-ocaml-org-reboot-user-centric-design-contentOCaml.org Reboot: User-Centric Design & Content2022-05-02T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Over the last year, the Tarides +storage team has been focused on scaling the storage layer of <a href="https://tezos.gitlab.io/">Octez</a>, +the most popular node implementation for the <a href="https://tezos.com/">Tezos</a> blockchain. With +the upcoming release of Octez v13, we are reaching our performance goal of +<strong>supporting one thousand transactions per second</strong> (TPS) in the +storage layer! This is a <strong>6x improvement</strong> over Octez 10. Even better, this +release also <strong>makes the storage layer orders of magnitude more stable</strong>, +with a <strong>12x improvement in the mean latency of operations</strong>. At the +same time, we <strong>reduced the memory usage by 80%</strong>. +Now Octez requires a mere 400 MB of RAM to bootstrap nodes!</p> +<p>In this post, we'll explain how we achieved these milestones thanks to +<a href="https://irmin.org">Irmin 3</a>, the new major release of the <a href="https://mirage.io">MirageOS</a>-compatible +storage layer developed and maintained by Tarides and used by Tezos. +We'll also explain what this means for the Tezos community now and +in the future.</p> +<p>As explained by a <a href="https://research-development.nomadic-labs.com/tps-evaluation.html">recent post on Nomadic Labs +blog</a>, +there are various ways to evaluate the throughput of Tezos. Our +purpose is to optimise the Tezos storage and identify and fix +bottlenecks. 
Thus, our benchmarking setup replays actual data (the +150k first blocks of the Hangzhou Protocol on Tezos Mainnet, +corresponding to the period Dec 2021 &ndash; Jan 2022) and explicitly +excludes the networking I/O operations and protocol computations to +focus on the context I/O operations only. Thanks to this setup we +managed to identify, fix, and verify that we removed the main +I/O bottlenecks present in Octez:</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/f859bd5c186c91df46c59f296d8f40b6/58a91/transactions_per_second.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 68.23529411764706%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/f859bd5c186c91df46c59f296d8f40b6/c5bb3/transactions_per_second.png" class="gatsby-resp-image-image" alt="Bar chart of mean transactions per second for various Irmin +configurations" title="Bar chart of mean transactions per second for various Irmin +configurations" srcset="/static/f859bd5c186c91df46c59f296d8f40b6/04472/transactions_per_second.png 170w, +/static/f859bd5c186c91df46c59f296d8f40b6/9f933/transactions_per_second.png 340w, +/static/f859bd5c186c91df46c59f296d8f40b6/c5bb3/transactions_per_second.png 680w, +/static/f859bd5c186c91df46c59f296d8f40b6/b12f7/transactions_per_second.png 1020w, +/static/f859bd5c186c91df46c59f296d8f40b6/b5a09/transactions_per_second.png 1360w, +/static/f859bd5c186c91df46c59f296d8f40b6/58a91/transactions_per_second.png 1990w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<blockquote> +<p>Comparison 
of the Transactions Per Second (TPS) performance between Octez 10, +11, 12 and 13 while replaying the 150k +first blocks of the Hangzhou Protocol on Tezos Mainnet<sup><a href="https://tarides.com/feed.xml#fn-1" class="footnote-ref">1</a></sup>. +Octez 13 reaches 1043 TPS on average which is a <strong>6x improvement</strong> over Octez 10.</p> +</blockquote> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#merkle-databases-to-index-or-not-to-index" aria-label="merkle databases to index or not to index permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Merkle databases: to index or not to index</h3> +<p>A Tezos node keeps track of the blockchain state in a database called the +<em>context</em>. For each block observed by the node, the context stores a +corresponding <a href="https://en.wikipedia.org/wiki/Merkle_tree">tree</a> that witnesses the state of the chain at that +block.</p> +<p>Each leaf in the tree contains some data (e.g., the balance of a particular +wallet) which has a unique hash. Together these leaf hashes uniquely determine +the hashes of their parent nodes all the way up to the root hash of the tree. +In the other direction &ndash; moving down the tree from the root &ndash; these hashes form +<em>addresses</em> that allow each node to later be recovered from disk. 
In the Octez +node, the context is implemented using <a href="https://irmin.org">Irmin</a>, an open-source OCaml +library that solves exactly this problem: storing trees of data in which each +node is addressed by its hash.</p> +<p>As with any database, a crucial aspect of Irmin's implementation is its +<a href="https://en.wikipedia.org/wiki/Database_index">index</a>, the component that maps addresses to data locations +(in this case, mapping hashes to offsets within a large append-only data file). +Indexing each object in the store by hash has some important advantages: for +instance, it ensures that the database is totally +<a href="https://en.wikipedia.org/wiki/Data_deduplication">deduplicated</a> and enables fast random access to any +object in the store, regardless of position in the tree.</p> +<p>As discussed in <a href="https://tarides.com/2020-09-01-introducing-irmin-pack">our <code>irmin-pack</code> post</a>, the context index was +optimised for very fast reads at the cost of needing to perform an expensive +maintenance operation at regular intervals. This design was very effective in +the early months of the Tezos chain, but our <a href="https://tarides.com/2021-10-04-the-new-replaying-benchmark-in-irmin">recent work on benchmarking the +storage layer</a> revealed two problems with it:</p> +<ul> +<li> +<p><strong>content-addressing bottlenecks transaction throughput</strong>. Using hashes as +object addresses adds overhead to both reads and writes: each read requires +consulting the index, and each write requires adding a new entry to it. At +the current block rate and block size in Tezos Mainnet, these overheads are +not a limiting factor, but this will change as the protocol and shell become +faster. 
Our overall goal is to support a future network throughput of <strong>1000 +transactions per second</strong>, and doing this required rethinking our reliance on +the index.</p> +</li> +<li> +<p><strong>maintaining a large index impacts the stability of the node</strong>. The larger +the index becomes, the longer it takes to perform regular maintenance +operations on it. For sufficiently large contexts (i.e., on archive nodes), +the store may be unable to perform this maintenance quickly enough, leading +to long pauses as the node waits for service from the storage layer. +In the context of Tezos, this can lead to users occasionally exceeding the +maximum time allowed for baking or endorsing a block, losing out on the +associated rewards.</p> +</li> +</ul> +<p>Over the last few months, the storage team at Tarides has been hard at work +addressing these issues by switching to a <em>minimal indexing</em> strategy in the +context. This feature is now ready to ship, and we are delighted to present the +results!</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#consistently-fast-transactions-surpassing-the-1000-tps-threshold" aria-label="consistently fast transactions surpassing the 1000 tps threshold permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Consistently fast transactions: surpassing the 1000 TPS threshold</h3> +<p>The latest release of Irmin ships with a <a href="https://github.com/mirage/irmin/pull/1510">new core feature</a> +that enables object addresses that are not hashes. 
This feature unlocks many +future optimisations for the Octez context, including things like automatic +inlining and layered storage. Crucially, it has allowed us to <a href="https://github.com/mirage/irmin/pull/1659">switch to using +direct pointers</a> between internal objects in the Octez +context, eliminating the need to index such objects entirely! This has two +immediate benefits:</p> +<ul> +<li> +<p><strong>read operations no longer need to search the index</strong>, improving the overall +speed of the storage considerably;</p> +</li> +<li> +<p><strong>the index can be shrunk by a factor of 360</strong> (from 21G to 59MB in our tests!). We now only need to index +<em>commit</em> objects in order to be able to recover the root tree for a given +block at runtime. This &quot;minimal&quot; indexing strategy results in indices that +fit comfortably in memory and don't need costly maintenance. As of Octez 13, +<a href="https://gitlab.com/tezos/tezos/-/merge_requests/4714">minimal indexing is now the default</a> node +behaviour<sup><a href="https://tarides.com/feed.xml#fn-2" class="footnote-ref">2</a></sup>.</p> +</li> +</ul> +<p>So what is the performance impact of this change? As detailed in our <a href="https://tarides.com/2021-10-04-the-new-replaying-benchmark-in-irmin">recent +post on replay benchmarking</a>, we were able to isolate and +measure the consequences of this change by &quot;replaying&quot; a previously-recorded +trace of chain activity against the newly-improved storage layer. 
This process +simulates a node that is bottlenecked purely by the storage layer, allowing us +to assess its limits independently of the other components of the shell.</p> +<p>For these benchmarks, we used a replay trace containing the first 150,000 +blocks of the Hangzhou Protocol deployment on Tezos Mainnet (corresponding to +the period December 2021 &ndash; January 2022)<sup><a href="https://tarides.com/feed.xml#fn-3" class="footnote-ref">3</a></sup>.</p> +<p>One of the most important metrics collected by our benchmarks is overall +throughput, measured in <em>transactions</em> processed per second (TPS). In this +context, a &quot;transaction&quot; is an individual state transition within a particular +block (e.g., a balance transfer or a smart contract activation). We +queried the <a href="https://tzstats.com/docs/api#tezos-api">TzStats API</a> in order to determine the number +of transactions in each block and thus our measured transaction throughput. +As shown in the graph above, doing this for the last few releases of Octez +reveals that storage TPS has skyrocketed from ~200 in Octez 12 to more than +1000 in Octez 13! &#128640;</p> +<p>As a direct consequence, the total time necessary to replay our Hangzhou trace +on the storage layer has decreased from ~1 day to ~4 hours.
We're nearly 6 +times faster than before!</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/7d4ca3167883d6cdcafbc1b29a6fecc0/11214/cpu_time.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 70%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/7d4ca3167883d6cdcafbc1b29a6fecc0/c5bb3/cpu_time.png" class="gatsby-resp-image-image" alt="Bar chart of CPU time elapsed during replay for various Irmin configurations" title="Bar chart of CPU time elapsed during replay for various Irmin configurations" srcset="/static/7d4ca3167883d6cdcafbc1b29a6fecc0/04472/cpu_time.png 170w, +/static/7d4ca3167883d6cdcafbc1b29a6fecc0/9f933/cpu_time.png 340w, +/static/7d4ca3167883d6cdcafbc1b29a6fecc0/c5bb3/cpu_time.png 680w, +/static/7d4ca3167883d6cdcafbc1b29a6fecc0/b12f7/cpu_time.png 1020w, +/static/7d4ca3167883d6cdcafbc1b29a6fecc0/b5a09/cpu_time.png 1360w, +/static/7d4ca3167883d6cdcafbc1b29a6fecc0/11214/cpu_time.png 1943w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<blockquote> +<p>Comparison of CPU time elapsed between Octez 10, 11, 12, and 13 while replaying the 150k +first blocks of the Hangzhou Protocol on Tezos Mainnet<sup><a href="https://tarides.com/feed.xml#fn-1" class="footnote-ref">1</a></sup>. While Octez 10 took 1 day to complete +the replay, Octez 13 only takes 4 hours and is nearly <strong>6 times faster</strong> than +before!</p> +</blockquote> +<p>Overall throughput is not the only important metric, however. 
It's also +important that the <em>variance</em> of storage performance is kept to a minimum, to +ensure that unrelated tasks such as endorsement can be completed promptly. To +see the impact of this, we can inspect how the total block time varies +throughout the replay:</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/ece9651cd311f011c589c2d69159a3fb/41b28/block_time.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 68.82352941176471%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/ece9651cd311f011c589c2d69159a3fb/c5bb3/block_time.png" class="gatsby-resp-image-image" alt="Line graph of block time during replay for various Irmin configurations" title="Line graph of block time during replay for various Irmin configurations" srcset="/static/ece9651cd311f011c589c2d69159a3fb/04472/block_time.png 170w, +/static/ece9651cd311f011c589c2d69159a3fb/9f933/block_time.png 340w, +/static/ece9651cd311f011c589c2d69159a3fb/c5bb3/block_time.png 680w, +/static/ece9651cd311f011c589c2d69159a3fb/b12f7/block_time.png 1020w, +/static/ece9651cd311f011c589c2d69159a3fb/b5a09/block_time.png 1360w, +/static/ece9651cd311f011c589c2d69159a3fb/41b28/block_time.png 1967w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<blockquote> +<p>Comparison of block time latencies between Octez 10, 11, 12, and 13 while replaying the 150k +first blocks of the Hangzhou Protocol on Tezos Mainnet<sup><a href="https://tarides.com/feed.xml#fn-1" class="footnote-ref">1</a></sup>. 
Octez 13's mean block validation time is +23.2 &plusmn; 2.0 milliseconds, down from 274 &plusmn; 183 milliseconds in Octez 10 +(with a worst-case peak of 800 milliseconds!). This <strong>12x improvement in +the mean latency of operations</strong> leads to much more consistent endorsement rights for bakers.</p> +</blockquote> +<p>Another performance metric that has a big impact on node maintainers is the +<em>maximum memory usage</em> of the node, since this sets a lower bound on the +hardware that can run Octez. Tezos prides itself on being deployable to very +resource-constrained hardware (such as the Raspberry Pi), so this continues +to be a focus for us. Thanks to the reduced index size, Octez 13 greatly +reduces the memory requirements of the storage layer:</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/1d03d5bf0827a0964a9fabb3b753e7ec/8b4e6/memory_usage.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 69.41176470588235%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/1d03d5bf0827a0964a9fabb3b753e7ec/c5bb3/memory_usage.png" class="gatsby-resp-image-image" alt="Bar chart of maximal memory usage during replay for various Irmin configurations" title="Bar chart of maximal memory usage during replay for various Irmin configurations" srcset="/static/1d03d5bf0827a0964a9fabb3b753e7ec/04472/memory_usage.png 170w, +/static/1d03d5bf0827a0964a9fabb3b753e7ec/9f933/memory_usage.png 340w, +/static/1d03d5bf0827a0964a9fabb3b753e7ec/c5bb3/memory_usage.png 680w, +/static/1d03d5bf0827a0964a9fabb3b753e7ec/b12f7/memory_usage.png 1020w, +/static/1d03d5bf0827a0964a9fabb3b753e7ec/b5a09/memory_usage.png 1360w,
+/static/1d03d5bf0827a0964a9fabb3b753e7ec/8b4e6/memory_usage.png 1953w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<blockquote> +<p>Comparison of maximal memory usage (as reported by <code>getrusage(2)</code>) between +Octez 10, 11, 12, and 13 while replaying the 150k first blocks of the Hangzhou +Protocol on Tezos Mainnet<sup><a href="https://tarides.com/feed.xml#fn-1" class="footnote-ref">1</a></sup>. <strong>Peak memory usage is 5x lower</strong> +in the Octez 13 storage layer than in Octez 10, owing to the +significantly reduced size of the index. 400 MB of RAM is now enough +to bootstrap Octez 13!</p> +</blockquote> +<p>Finally, without indexing every object, the context store can no longer guarantee perfect object deduplication. Our tests and benchmarks show that this choice has relatively little impact on the context size as a whole, particularly since it no longer needs to store an index entry for every object!</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/523e2c8043f4c5d2849fe0df928c6bb2/5551c/storage_size.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 66.47058823529413%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/523e2c8043f4c5d2849fe0df928c6bb2/c5bb3/storage_size.png" class="gatsby-resp-image-image" alt="Line graph of storage size during replay for various Irmin configurations" title="Line graph of storage size during replay for various Irmin configurations" srcset="/static/523e2c8043f4c5d2849fe0df928c6bb2/04472/storage_size.png 170w,
+/static/523e2c8043f4c5d2849fe0df928c6bb2/9f933/storage_size.png 340w, +/static/523e2c8043f4c5d2849fe0df928c6bb2/c5bb3/storage_size.png 680w, +/static/523e2c8043f4c5d2849fe0df928c6bb2/b12f7/storage_size.png 1020w, +/static/523e2c8043f4c5d2849fe0df928c6bb2/b5a09/storage_size.png 1360w, +/static/523e2c8043f4c5d2849fe0df928c6bb2/5551c/storage_size.png 1763w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<blockquote> +<p>Comparison of storage size between Octez 10, 11, 12, and 13 while replaying the first 150k +blocks of the Hangzhou Protocol on Tezos Mainnet<sup><a href="https://tarides.com/feed.xml#fn-1" class="footnote-ref">1</a></sup>. Octez 13 uses similar disk resources +to previous versions: the duplicated data is fully compensated by the +reduced index size.</p> +</blockquote> +<p>What this means for users of the Octez shell:</p> +<ul> +<li><strong>The general I/O performance of the storage layer is vastly improved</strong>, as +the storage operations are 6 times faster and have a 12 times lower mean +latency while the memory usage is divided by 5.</li> +<li>In particular, this mode <strong>eliminates the risk of losing baking rewards</strong> +due to long index merges.</li> +</ul> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#migrating-your-octez-node-to-use-the-newer-storage" aria-label="migrating your octez node to use the newer storage permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 
13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Migrating your Octez node to use the newer storage</h3> +<p>Irmin 3 is included with <a href="http://tezos.gitlab.io/releases/version-13.html">Octez v13-rc1</a>, which has just been released today. +The storage format is <strong>fully +backwards-compatible</strong> with Octez 12, and no migration process is required to +upgrade.</p> +<p>Newly-written data after the shell upgrade will automatically benefit from the +new, direct internal pointers, and existing data will continue being read as +before. Performing a bootstrap (or importing a snapshot) with Octez 13 will +build a context containing only direct pointers. Node operators should upgrade +as soon as possible to benefit.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#the-future-of-the-octez-storage-layer" aria-label="the future of the octez storage layer permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The future of the Octez storage layer</h3> +<p>Irmin 3 is just the beginning of what the Tarides storage team has in store for +2022. Our next focus is on implementing the next iteration of the <em>layered +store</em>, a garbage collection strategy for rolling nodes. 
Once this has landed, +we will collaborate with the Tarides Multicore Applications team to help +migrate Octez to using the newly-merged Multicore OCaml.</p> +<p>If this work sounds interesting, the Irmin team at Tarides is <a href="https://tarides.com/jobs/senior-software-engineer-irmin">currently +hiring</a>!</p> +<p>Thanks for reading, and <a href="https://twitter.com/tarides_">stay tuned</a> for future updates from +the Irmin team!</p> +<div class="footnotes"> +<hr/> +<ol> +<li>Our benchmarks compare Octez 10.2, 11.1, 12.0, and 13.0-rc1 by replaying the 150k first blocks of the Hangzhou Protocol on Tezos Mainnet (corresponding to the period Dec 2021 &ndash; Jan 2022) on <a href="https://metal.equinix.com/product/servers/c3-small/">an Intel Xeon E-2278G processor</a> constrained to use at most 8 GB RAM. Our benchmarking setup explicitly excludes the networking I/O operations and protocol computations to focus on the context I/O operations only. Octez 10.2 uses Irmin 2.7.2, while both Octez 11.1 and 12.0 use Irmin 2.9.1 (which explains why the graphs are similar). Octez v13-rc1 uses Irmin 3.2.1, which we just released this month (Apr 2022).<a href="https://tarides.com/feed.xml#fnref-1" class="footnote-backref">&#8617;</a></li> +<li>The trade-off here is that without an index the context store can no longer guarantee to have perfect deduplication, but our testing and benchmarks indicate that this has relatively little impact on the size of the context as a whole (particularly after accounting for no longer needing to store an index entry for every object!).<a href="https://tarides.com/feed.xml#fnref-2" class="footnote-backref">&#8617;</a></li> +<li>To reproduce these benchmarks, you can download the replay trace we used <a href="http://data.tarides.com/lib_context/hangzou-level2.tgz">here</a> (14G). 
This trace can be replayed against a fork of <code>lib_context</code> available <a href="https://github.com/ngoguey42/tezos/tree/new-action-trace-recording">here</a>.<a href="https://tarides.com/feed.xml#fnref-3" class="footnote-backref">&#8617;</a></li> +</ol> +</div>https://tarides.com/blog/2022-04-26-lightning-fast-with-irmin-tezos-storage-is-6x-faster-with-1000-tps-surpassedLightning Fast with Irmin: Tezos Storage is 6x faster with 1000 TPS surpassed2022-04-26T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#50intech" aria-label="50intech permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>50inTech</h3> +<p>Tarides is proud to have been recognised by 50inTech and <a href="https://app.50intech.com/company/tarides">featured on their website</a>! 50inTech&rsquo;s mission is to achieve a 50% representation of women in tech by 2050.</p> +<p>To this end, 50inTech runs several amazing initiatives that generate opportunities for women looking to have successful careers in tech. 
Their job board matches talented women with inclusive companies that are hiring, the 50inTech Gender Score helps European companies measure their level of gender-inclusion, and their free virtual bootcamps provide their network of 15,000 women in Europe with crucial networking and mentoring opportunities.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#partnership" aria-label="partnership permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Partnership</h3> +<p>Tarides has been selected as an &ldquo;inclusive company,&rdquo; based on metrics including work-life balance, equal pay, fair career path, and diversity and inclusion policies. We are incredibly proud to be recognised in this way, and we will continue to invest in programs and initiatives to further increase diversity and inclusion.</p> +<p>Our partnership with 50inTech connects us with a highly-skilled, diverse set of people, and we hope that this collaboration will help us achieve our target of filling 50% of Tarides&rsquo;s tech roles with women. 
Currently that number is 20%, and we&rsquo;d like to increase it!</p> +<p>Read more about our diversity and inclusion goals, and <a href="https://app.50intech.com/company/tarides?page=jobs">see our open positions on 50inTech&rsquo;s website</a>.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#previous-efforts" aria-label="previous efforts permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Previous Efforts</h3> +<p>As Tarides has grown, we have made continuous efforts towards making OCaml and our place within its community more inclusive and diverse. As our CEO Gemma Gordon says, &ldquo;Different opinions, experiences, backgrounds, and strategies are essential for innovation and vital to the success of Tarides.&rdquo;</p> +<p>In this vein, we have supported initiatives such as: <em><a href="https://www.outreachy.org/">Outreachy</a></em>, where we sponsor three paid remote internships per quarter for people experiencing systemic bias and underrepresentation in the tech industry; <em><a href="https://adatechschool.fr/">Ada Tech School</a></em>, which is a programming school that facilitates greater access to programming positions and promotes the feminisation of tech; and <em><a href="https://shecancode.io/">SheCanCode</a></em>, whose mission is to close the tech gender gap. 
All of these enterprises help women enter, remain, and excel in the tech industry.</p> +<p>Tarides remains committed to inclusivity and continuously looks for ways to reach out to new groups, gain new perspectives, and diversify our workforce. Read more about our mission to support women in tech on <a href="https://app.50intech.com/company/tarides?page=diversity">50inTech&rsquo;s website</a>.</p>https://tarides.com/blog/2022-04-19-tarides-partners-with-50intechTarides Partners with 50inTech!2022-04-19T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#mirageos-40-release-week" aria-label="mirageos 40 release week permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>MirageOS 4.0 Release Week</h2> +<p>Tarides is thrilled to see the great responses to <a href="https://mirage.io/blog/announcing-mirage-40">MirageOS +4.0</a> and the excitement +that&rsquo;s building across the community. We&rsquo;re proud to have played an +important part in its development and release, bringing great tools +and opportunities to OCaml developers. 
If you haven&rsquo;t kept up with +what&rsquo;s been going on since the release, here is a summary of several +articles posted by various OCaml users.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#cross-compilation" aria-label="cross compilation permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Cross-Compilation</h3> +<p>The MirageOS 4.0 update brings with it a major change in its build +system to support <a href="https://dune.build/">the Dune build system</a>. +Tarides has been working on this feature since 2019, +iterating on various design solutions in the <code>mirage</code> tool with +<a href="https://github.com/mirage/mirage/issues/969">mirage/mirage/#</a>, +<a href="https://github.com/mirage/mirage/pull/979">mirage/mirage#979</a>, +<a href="https://github.com/mirage/mirage/pull/1020">mirage/mirage#1020</a>, +<a href="https://github.com/mirage/mirage/pull/1024">mirage/mirage#1024</a>, +<a href="https://github.com/mirage/mirage/pull/1153">mirage/mirage#1153</a>, and +finally <a href="https://github.com/mirage/mirage/pull/1226">mirage/mirage#1226</a>. 
+This incremental process resulted in several contributions to +upstream OCaml for features and tools required to support +the flexible building of MirageOS libraries: for +instance, adding support for <a href="https://dune.readthedocs.io/en/stable/variants.html">virtual libraries and +variants</a> in Dune +with <a href="https://github.com/ocaml/dune/pull/1900">ocaml/dune#1900</a>, +<a href="https://github.com/ocaml/dune/pull/2098">ocaml/dune#2098</a>, and +<a href="https://github.com/ocaml/dune/pull/2169">ocaml/dune#2169</a>; or the +development of a new opam plugin to manage +<a href="https://github.com/ocamllabs/opam-monorepo">mono-repositories</a>. We +are happy to see it released to all with Mirage 4.0.</p> +<p>What makes Dune a great option to build MirageOS is that it allows for +customisable cross-compilation flags to compile MirageOS to different +architectures. Using Dune also enables developers to use the Merlin +tool to access a rich set of IDE features when writing +applications. It unlocks a new development workflow based on +<code>opam-monorepo</code>, which downloads all the unikernel dependencies into a +single Dune workspace. Having a single workspace containing all of the +unikernel&rsquo;s code lets developers edit code anywhere in the stack, +which makes work like debugging libraries and improving APIs a faster +and more enjoyable experience. 
In his <a href="https://mirage.io/blog/2022-03-30.cross-compilation">excellent article on build +contexts in MirageOS +4.0</a>, Lucas +Pluvinage goes into detail about how to use the new cross-compilation +features to build MirageOS unikernels for new architectures.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#email-in-ocaml--mr-mime" aria-label="email in ocaml mr mime permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Email in OCaml &amp; Mr. MIME</h3> +<p>Mr. MIME is an OCaml library that aims to give its users peace of mind +when it comes to the security of their email communications. Mr. MIME +is built on unikernels and deploys them to handle email traffic. At +Tarides, we got a grant from <a href="https://dapsi.ngi.eu/">NGI DAPSI</a> to +work on this project, and several of our engineers have been busy +working hard to make it happen.</p> +<p>Several other libraries support the Mr. MIME library and enable it to +transform an email into an OCaml value, then create an email from it +again. An amazing thing about Mr. MIME is its reliability. Using the +<a href="https://github.com/mirage/hamlet"><code>hamlet</code></a> tool, which proposes a +large corpus of emails for Mr. MIME to parse and re-encode, the team +can prove that Mr. MIME doesn&rsquo;t alter anything in the message between +the parser and the encoder.</p> +<p>The team behind Mr. 
MIME has also created the library +<em><a href="https://github.com/mirage/colombe">Colombe</a></em> that implements the +foundations of an SMTP protocol with the ability to upgrade its flow +to TLS, giving its users an extra layer of security. A goal for the +future is to provide a full SMTP stack that&rsquo;s able to send and receive +emails.</p> +<p>Mr. MIME also allows its users to manipulate emails through the use of +CLI tools, including +<a href="https://github.com/mirage/ocaml-dkim"><code>ocaml-dkim</code></a>, a tool to verify +and sign an email, and +<a href="https://github.com/mirage/spamtacus"><code>spamtacus</code></a>, a tool which +analyses the incoming email to determine if it&rsquo;s spam or not. The +<a href="https://github.com/mirage/ptt">ptt repo</a> contains several more as well.</p> +<p>If you want to find out more information about Mr. MIME, including +details about its architecture, please read Romain Calascibetta&rsquo;s +<a href="https://mirage.io/blog/2022-04-01-Mr-MIME">article</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#mirageos-in-production" aria-label="mirageos in production permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>MirageOS in Production</h2> +<p>The use of MirageOS benefits not only Tarides, but it also enables +several other companies to make their products better. 
Below are a +couple of examples from <a href="https://docker.com">Docker</a> and +<a href="https://robur.coop">Robur</a> on how they use MirageOS to their +advantage.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#vpn-kit" aria-label="vpn kit permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>VPN Kit</h3> +<p>Docker Desktop is a tool that enables its users to build and share +containerised or isolated applications in either a Mac or Windows +environment. Its main challenge is that running Docker on macOS or +Windows is difficult in terms of compatibility, as Linux primitives +are unavailable on those platforms.</p> +<p>This is where VPN Kit comes in; it uses MirageOS to bridge the gap +between Linux primitives and macOS or Windows by reading the raw +ethernet frames coming out of the Linux VM and translating them into +macOS or Windows high-level syscalls. 
In this way, MirageOS networking +libraries transparently handle the traffic of millions of containers +every day.</p> +<p>To find out more go read the article &ldquo;How MirageOS Powers Docker +Desktop&rdquo; <a href="https://mirage.io/blog/2022-04-06.vpnkit">on mirage.io</a> +or +<a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">on docker.com</a>.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#robur-projects" aria-label="robur projects permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Robur Projects</h3> +<p>Robur uses MirageOS for several of their projects, including OpenVPN, +DNS Projects, and CalDAV. All of these projects are written in OCaml +and are deployed as MirageOS unikernels.</p> +<p>The DNS Projects include the &lsquo;Let&rsquo;s Encrypt&rsquo;-Certified DNS solver, a +DNS resolver, and an authoritative DNS server. Robur&rsquo;s DNS server +ensures that the internet user gets to the right IP address, whilst +its DNS resolver finds the exact server to handle the user&rsquo;s +request. Only strictly necessary elements are included in order to +keep the codebase as small as possible for security and +simplicity.</p> +<p>CalDAV is the most recent unikernel released by Robur. As the name +implies, CalDAV is a protocol used to synchronise calendars. 
+Its minimal codebase comes with significant security benefits.</p> +<p>To find out more go read the article &ldquo;MirageOS Unikernels at Robur&rdquo; on +<a href="https://mirage.io/blog/2022-04-08.robur">mirage.io</a>.</p> +<hr/> +<p>To learn more about MirageOS, take a look at some recent articles at +<a href="https://mirage.io">mirage.io</a>. +If you&rsquo;re interested in working with Tarides or +incorporating MirageOS tools in your project, please <a href="https://tarides.com/company">contact us via +our website</a>.</p>https://tarides.com/blog/2022-04-14-what-s-new-in-mirageos-4What's New in MirageOS 4!2022-04-14T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Tarides is delighted to announce that <a href="https://mirage.io">MirageOS 4</a> is finally released! As core contributors to the project, we are proud to have been part of the journey to 4.0.</p> +<p>What is MirageOS? +MirageOS is a library operating system that constructs unikernels for fast and secure network applications that work across a variety of cloud computing and mobile platforms. 
The goal of MirageOS is to give the individual control of their own data and take back control of their privacy.</p> +<p>It achieves these goals in several ways, from securely deploying <a href="https://github.com/roburio/unipi">static website hosting</a> with <em>Let&rsquo;s Encrypt</em> certificate provisioning and a secure <a href="https://github.com/mirage/ptt">SMTP stack</a>, to ensuring data privacy with decentralised communication infrastructures like <a href="https://github.com/mirage/ocaml-matrix">Matrix</a>, <a href="https://github.com/roburio/openvpn">OpenVPN Servers</a>, and <a href="https://github.com/roburio/tlstunnel">TLS tunnels</a>, as well as using <a href="https://github.com/mirage/ocaml-dns">DNS(SEC) Servers</a> for better authentication.</p> +<p>Over the years since its first release in 2013, the Mirage ecosystem has grown to include <a href="https://github.com/mirage/">hundreds of libraries</a> and service millions of daily users, along with several major commercial users that rely on MirageOS to keep their code secure. Examples of this include <a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">Docker Desktop&rsquo;s VPNkit</a>, the <a href="https://www.citrix.com/fr-fr/products/citrix-hypervisor/">Citrix Hypervisor</a>, as well as <a href="https://robur.io">Robur</a>, <a href="https://www.nitrokey.com/products/nethsm">Nitrokey</a>, and Tarides itself!</p> +<p>What&rsquo;s in the New Release? +The new release focuses on better integration with existing ecosystems. For example, it is now much easier to integrate with existing OCaml libraries, as MirageOS 4 is now using <code>dune</code> to build unikernels.</p> +<p>There has also been a major change in how MirageOS compiles projects with the introduction of a new tool called <a href="https://github.com/ocamllabs/opam-monorepo"><code>opam-monorepo</code></a> that separates package management from building the resulting source code. 
The Opam plugin can create a lock file for project dependencies, download and extract dependency sources locally, and even set up a <a href="https://dune.readthedocs.io/en/stable/dune-files.html#dune-workspace-1">Dune workspace</a>, which then enables <code>dune build</code> to build everything simultaneously.</p> +<p>The new release also adds systematic support for cross-compilation to all supported unikernel targets, meaning that libraries that use C stubs can now have those stubs seamlessly cross-compiled to a desired target.</p> +<p>To find out more about the new release please read <a href="https://mirage.io/blog/announcing-mirage-40">the official release post on Mirage.io</a>.</p> +<p>Keep an eye on <a href="https://mirage.io">mirage.io</a>'s blog over the next two weeks for more posts on the exciting new things that come with MirageOS 4.0, starting with &ldquo;Introduction to Build Contexts in MirageOS 4.0&rdquo; tomorrow!</p>https://tarides.com/blog/2022-03-29-mirageos-4-releasedMirageOS 4 Released!2022-03-29T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>People love to receive mail, especially from loved ones. It&rsquo;s heartwarming to read +each word as their thoughts touch our deepest feelings. Now imagine someone else +reading those private sentiments, like a postal worker. Imagine how violated they&rsquo;d +feel if their postal carrier handed them an open letter with a knowing smile. Of course, +people trust that postal employees won&rsquo;t read their personal correspondence; +however, they regularly risk their privacy when sending emails, images, and messages.</p> +<p>Around 300 billion emails traverse the Internet every single day. They travel +through portals with questionable security, and the messages often contain +private or sensitive data. Most online communication services are composed of +multiple components with complex interactions. If anything goes wrong, it +results in critical security incidents. 
This leaves an unlocked door for +malicious hackers to breach private information for profit or just for fun. +Since it takes considerable technical skills and reliable infrastructure to +operate a secure email service, most Internet users must +rely on third-party operators. In practice, there +are only a few large companies that can handle communications with the +proper security levels. Unfortunately for regular people, these companies +profit from mining their personal data. Due to this global challenge, Tarides +focused its efforts on addressing these issues and finding solutions to protect +both personal and professional data.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#an-innovative-solution" aria-label="an innovative solution permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>An Innovative Solution</h3> +<p>Our work resulted in the project &quot;Secure-by-Design Communications Protocols&quot; +(SCoP), a secure, easily deployable solution to preserve users' privacy. In +essence, SCoP puts your messages in a secure, virtual &lsquo;bottle&rsquo; to protect them +from invasive actions. This bottle represents a secure architecture using +type-safe languages and unikernels for both email and instant messaging. 
+We mould <a href="https://mirage.io/">unikernels</a> (specialised applications that +run on a VM) into refined meshes linked by TLS-firm communication pipes, +as depicted in the image below.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/6c94ba14bec3537413c603635b78c123/0f98f/Dapsi_4.001.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 56.470588235294116%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/6c94ba14bec3537413c603635b78c123/7bf67/Dapsi_4.001.jpg" class="gatsby-resp-image-image" alt="TLS Communication Pipes" title="TLS Communication Pipes" srcset="/static/6c94ba14bec3537413c603635b78c123/651be/Dapsi_4.001.jpg 170w, +/static/6c94ba14bec3537413c603635b78c123/d30a3/Dapsi_4.001.jpg 340w, +/static/6c94ba14bec3537413c603635b78c123/7bf67/Dapsi_4.001.jpg 680w, +/static/6c94ba14bec3537413c603635b78c123/990cb/Dapsi_4.001.jpg 1020w, +/static/6c94ba14bec3537413c603635b78c123/c44b8/Dapsi_4.001.jpg 1360w, +/static/6c94ba14bec3537413c603635b78c123/0f98f/Dapsi_4.001.jpg 1920w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>The SCoP virtual bottle creates a trustworthy information flow where dedicated +unikernels ensure security for communication from origin to destination. We +carefully design every component of SCoP as independent libraries, using +modern development techniques to avoid the common reported threats and flaws. 
+The <a href="https://ocaml.org">OCaml</a>-based development enables this safe online +environment, which eliminates many exploited security pitfalls. Moreover, +our SCoP project comes with energy-efficient consumption provided by the +lightweight and low-latency design components.</p> +<p>We mostly focused on the sender&rsquo;s side, securing the message inside the SCoP +bottle. For instant messages, we created a capsule with a +<a href="https://github.com/mirage/ocaml-matrix">Matrix client library</a>, +and for emails we based our bottle on the <a href="https://github.com/mirage/ptt">SMTP protocol</a> +and <a href="https://github.com/mirage/mrmime">Mr. MIME</a>. For further protection, +we developed the bottle&rsquo;s &lsquo;cork&rsquo; with the +<a href="https://github.com/mirage/hamlet">Hamlet email corpus</a>.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#the-scop-processes" aria-label="the scop processes permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The SCoP Processes</h3> +<p>First, we generated Hamlet, a collection of emails to test our parser +implementation against existing projects, to ensure that they kept equivalence +between the encoder and decoder. After we successfully parsed and encoded one +million emails, we used Hamlet to stress-test our SMTP stack.</p> +<p>Secondly, we created an SMTP extension mechanism and support for SPF, including +an implementation for DMARC, a security framework in addition to DKIM and SPF. 
+We completed four components: SPF, DKIM, SMTP, and Mr. MIME, which can generate +a correctly-signed email, signatures, and the DKIM field containing the signatures.</p> +<p>In essence, we designed the SMTP sender bottle as a mesh of unikernels connected +via secured communication pipes. The SMTP Submission Server unikernel checks +the sender&rsquo;s authentication credentials against the secured database maintained +by Irmin. After it confirms the credentials, it sends the email for sealing +(via a TLS pipe) to the DKIM signer. Then the DKIM signer unikernel, responsible +for handling IP addresses, communicates via the nsupdate protocol with the Primary +DNS Server. The DKIM signer places the sender&rsquo;s and receiver&rsquo;s addresses on the email, +seals it with the DKIM signature, and sends it to the SMTP relay for distribution. +The SMTP relay unikernel communicates with the DNS resolver unikernel to locate the +receiver by the DNS name, then it coordinates this location with the Irmin database +to verify the authorisation according to the SPF protocol. After all these checks +have passed, the signed and sealed email is secured in the SCoP bottle and launched +through cyberspace.</p> +<p>Next, we developed the Matrix protocol&rsquo;s client library, and we used it to enable +notifications from the CI system that tests all the new OCaml packages. We also +designed an initial PoC for a Matrix server-side daemon.</p> +<p>We made significant progress in deploying DNSSEC, a set of security extensions over +DNS. While we completed our first investigation into the DNSSEC prototype, we also +discovered several issues, which we addressed as lessons learned.</p> +<p>Finally, we completed the <a href="https://github.com/tarides/unikernels">SCoP bottle</a> with +the email receiver, which <a href="https://github.com/mirage/spamtacus">Spamtacus</a> (the +Bayesian spam filter) guards against spam intruders. 
Furthermore, the +<a href="https://github.com/mirage/ocaml-matrix">OCaml-Matrix</a> server represents +our solution to take care of the instant communication in the Matrix federation.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#a-secure-by-design-smtp-stack" aria-label="a secure by design smtp stack permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>A Secure-by-Design SMTP Stack</h3> +<p>We researched state-of-the-art email spam filtering methods and identified machine +learning as the main trend. We followed this path and equipped our email architecture +with a spam-filter unikernel, which uses a Bayesian method for supervised learning +of spam and acts as a proxy for internet communication in the SMTP receiver. This +spam filter works in two stages: preparation, where the unikernel is trained to detect spam, +and operation, where the unikernel integrates into the SMTP receiver unikernel +architecture to filter spam emails. Our spam-filter unikernel can also be used +independently as an individual anti-spam tool to help enforce GDPR rules and protect +the user&rsquo;s privacy by preventing spam-induced attacks, such as phishing.</p> +<p>We integrated our spam filter into a unikernel positioned at the beginning +of the SMTP receiver stack. This acts as a first line of defence against a possible +attack targeting the receiver: the unikernel isolates the attack so that the SMTP receiver can maintain functionality. 
The spam-filter +unikernel can be extended to act as an antivirus by analysing email attachments +for certain features known to characterise malware. We&rsquo;ve already laid the groundwork +for the antivirus with a prototype analysis of email attachments. Moreover, +the spam-filter unikernel can contribute a list of frequent spammers to the +firewall, which we plan to add to the SMTP receiver as the next step in our development of SCoP.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#how-the-technology-works" aria-label="how the technology works permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>How the Technology Works</h3> +<p>DKIM, SPF, and DMARC are three communication protocols meant to ensure email +security by verifying the sender&rsquo;s identity. The latest RFC standards for +DKIM, SPF, and DMARC are RFC8463, RFC7208, and RFC7489, respectively.</p> +<p>DKIM provides a signer protocol and the associated verifier protocol. The DKIM +signer allows the sender to communicate which emails it considers legitimate. 
+Our implementation of the DKIM verifier is associated with the SMTP receiver. +It follows the RFC8463 standard and supports the Ed25519 signing algorithm, +i.e., elliptic curve cryptography generated from the formally verified +specification in the fiat project from MIT.</p> +<p>SPF is an open standard that specifies a method to identify legitimate mail +sources, using DNS records, so the email recipients can consult a list of IP +addresses to verify that emails they receive are from an authorised domain. +Hence, SPF works on the whitelisting principle in order to +control and prevent sender fraud. Our implementation of the SPF verifier +follows the RFC7208 standard.</p> +<p>DMARC (Domain-based Message Authentication, Reporting, and Conformance) enables +a sender to indicate that their messages comply with SPF and DKIM, and supplies +clear instructions for the recipient to follow if an email does not pass SPF or +DKIM authentication (reject, junk, etc.). As such, DMARC is used to create +domain reputation lists, which can help determine the actual email source +and mitigate spoofing attacks. Our implementation of the DMARC verifier is +integrated in the SMTP receiver and follows the RFC7489 standard.</p> +<p>Our secure-by-design SMTP stack contains the DKIM/SPF/DMARC verifier unikernel +on the receiver side. This unikernel verifies the email sender&rsquo;s DNS +characteristics via a TLS communication pipe, and if the DNS verification +passes, the spam-labelled email goes to the SMTP relay to be dispatched to the +email client. 
However, in case the DNS verification doesn&rsquo;t pass, we can use +the result to construct a DNS reputation list to improve the SMTP security +via a blacklisting firewall.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#matrix-server" aria-label="matrix server permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Matrix Server</h3> +<p>The Matrix server in our OCaml Matrix implementation manages clients who are +registered to rooms that contain events. These represent client actions, such +as sending a message. Our implementation follows the Matrix specification +standard. From here, we extracted the parts describing the subset of the Matrix +components we chose to implement for our OCaml Matrix server MVP. The OCaml +implementation environment provides secure-by-design properties and avoids +various vulnerabilities, like the buffer overflow recently discovered that +produces considerable information disclosure in other Matrix implementations, +e.g., Element.</p> +<p>The Matrix clients are user applications that connect to a Matrix server via the +client-server API. We implemented an OCaml-CI client, which communicates with the +Matrix servers via the client-server API and tested the integration of the OCaml-CI +communication with both Synapse and our OCaml Matrix server. 
Please note that our +OCaml Matrix server supports a client authentication mechanism based on username +and password, according to the Matrix specification for authentication +mechanisms.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#spam-filter" aria-label="spam filter permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Spam Filter</h3> +<p>We researched the state of the art in email spam filtering and identified machine +learning as the main trend. We followed this trend and equipped our email architecture +with a spam filter unikernel, which uses a Bayesian method for supervised learning of +spam and acts as a proxy for internet communication in the SMTP receiver. The spam +filter implementation works in two stages: preparation, when the unikernel is trained +to detect spam, and operation, when the unikernel is integrated into the SMTP receiver +architecture of unikernels to filter spam emails. The +spam filter unikernel can also be used independently as an individual anti-spam tool to help +enforce the GDPR rules and protect the user's privacy by preventing spam-induced attacks, +such as phishing.</p> +<p>We integrated the spam filter into a unikernel positioned at the beginning of the +SMTP receiver stack as the first line of defence against a possible attack targeting the +receiver. In this situation, the unikernel format provides isolation of the attack and +allows the SMTP receiver to maintain functionality. 
The spam filter unikernel can be +extended to act as an antivirus by analysing email attachments for certain features +that are known to characterise malware. We have already laid the groundwork for the antivirus +with a prototype analysis of email attachments. Moreover, the spam filter unikernel could +contribute a list of frequent spammers to the firewall, which we plan to add +to the SMTP receiver as the next step in future work.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#the-dapsi-initiative" aria-label="the dapsi initiative permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The DAPSI Initiative</h3> +<p>Much of the SCoP project was possible thanks to <a href="https://dapsi.ngi.eu">the DAPSI initiative</a>. +They gave Tarides the incentive to further explore an open and secure infrastructure for +communication protocols, especially email. First, DAPSI supported our team by providing +necessary financing, but their contribution to our project&rsquo;s prosperity runs much deeper +than funding. DAPSI facilitated multiple coaching sessions that helped us broaden our horizons +and establish reachable goals. Notably, their business coaching enabled us to identify +solutions for our market strategy. Their technical coaching and training offered access +to experts in data portability and GDPR regulations, which opened our perspective to novel +trends and procedures. 
Additionally, DAPSI helped raise our visibility by organising public +communications, and DAPSI&rsquo;s feedback revealed insights on how to better exploit our project&rsquo;s +potential and what corners of the cyber-ecosystem to prioritise. We are deeply grateful to +DAPSI for their support and backing, and we&rsquo;re thrilled to have passed Phase 2!</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#up-next-for-scop" aria-label="up next for scop permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Up Next for SCoP</h3> +<p>We&rsquo;re excited to further develop this project. We&rsquo;ll be experimenting with deploying +unikernels on a smaller chipset, such as IoT. We&rsquo;d also like to research secure data +porting in other domains such as journalism, law, or banking.</p> +<p>Of course we&rsquo;ll be maintaining each of the SCoP components in order to follow the latest +available standards and state-of-the-art technology, including periodical security +analyses of our code-base and mitigation for newly discovered vulnerabilities.</p> +<p>As in all of our work at Tarides, we strive to benefit the entire OCaml community and beyond. 
+Please find more information on SCoP through our blog posts: +<a href="https://tarides.com/blog/2021-04-30-scop-selected-for-dapsi-initiative">DAPSI Initiative</a> +and <a href="https://tarides.com/blog/2021-10-14-scop-selected-for-dapsi-phase2">DAPSI Phase 1</a>.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/50ddca27efa367497d954f667fc921f8/a76d6/DAPSI_generic.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 3.5294117647058822%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/50ddca27efa367497d954f667fc921f8/7bf67/DAPSI_generic.jpg" class="gatsby-resp-image-image" alt="Sequence of entity logos: in association with NGI, EU, Zabala, FGS, +cap-digital, IMT Starter, Fraunhofer IAIS." title="Sequence of entity logos: in association with NGI, EU, Zabala, FGS, +cap-digital, IMT Starter, Fraunhofer IAIS." 
srcset="/static/50ddca27efa367497d954f667fc921f8/651be/DAPSI_generic.jpg 170w, +/static/50ddca27efa367497d954f667fc921f8/d30a3/DAPSI_generic.jpg 340w, +/static/50ddca27efa367497d954f667fc921f8/7bf67/DAPSI_generic.jpg 680w, +/static/50ddca27efa367497d954f667fc921f8/990cb/DAPSI_generic.jpg 1020w, +/static/50ddca27efa367497d954f667fc921f8/a76d6/DAPSI_generic.jpg 1139w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p>https://tarides.com/blog/2022-03-08-secure-virtual-messages-in-a-bottle-with-scopSecure Virtual Messages in a Bottle with SCoP2022-03-08T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We are delighted to announce that Segfault Systems, a spinout from IIT-Madras, +is joining Tarides. Tarides has worked closely with Segfault Systems over the +last couple of years, most notably on the award-winning Multicore OCaml project +and the upstreaming plans for OCaml 5.0. This alliance furthers the goals of +Tarides, bringing the compiler and benchmarking expertise of the Segfault team +directly into the Tarides organisation.</p> +<p>KC Sivaramakrishnan, CEO &amp; CTO of Segfault Systems says that &ldquo;Segfault Systems +was founded to secure the foundations of scalable systems programming in OCaml. +We have successfully incorporated cutting-edge research on +<a href="https://dl.acm.org/doi/10.1145/3453483.3454039">concurrent</a> and +<a href="https://dl.acm.org/doi/10.1145/3408995">parallel</a> programming into OCaml. This +addresses the long-standing need of OCaml developers to utilise the widely +available multicore processing on modern machines. 
Tarides is at the forefront +of OCaml developer tooling and platform support, and we are excited to join the +team to make OCaml the best tool for industrial-strength concurrent and parallel +programming.&rdquo;</p> +<p>&ldquo;We&rsquo;re thrilled to have the Segfault Systems team join Tarides,&rdquo; says Thomas +Gazagnaire, CTO of Tarides. &ldquo;They have been integral to the success of the +Multicore OCaml project, which has combined cutting edge research and +engineering with consistent communication, promoting Multicore OCaml as an +upstream candidate to the core developer team, as well as +<a href="https://discuss.ocaml.org/tag/multicore-monthly">publishing monthly reports</a> +for the wider community. We look forward to working with our new partners to +make OCaml the tool of choice for developers.&rdquo;</p> +<p>All of Segfault Systems&rsquo; existing responsibilities and open-source commitments +will migrate over to Tarides, where work will continue towards the three main +objectives in 2022:</p> +<ul> +<li>Releasing OCaml 5.0 with support for domains and effect handlers</li> +<li>Supporting the ecosystem to migrate the OCaml community over to OCaml 5.0</li> +<li>Improving developer productivity for OCaml 5.0 by releasing the best platform +tools</li> +</ul> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#ocaml-50" aria-label="ocaml 50 permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>OCaml 5.0</h1> +<p>The next major release of OCaml, version 5.0, will feature primitive support 
for +parallel and concurrent programming through domains and effect handlers. The +goal is to ensure that the fine balance that OCaml has struck between ease of +use, correctness and performance over the past 25 years continues into the +future with these additional features.</p> +<p>Domains enable shared-memory parallel programming, allowing OCaml programs to run +on multiple cores: with domains, OCaml programs will scale better by exploiting +multicore processing. Effect handlers are a mechanism for concurrent +programming: with the introduction of effect handlers, simple direct-style OCaml +code will be flexible, easy to develop, debug and maintain (no more monads for +concurrency!). These features will benefit the entire ecosystem and community, +and we expect them to attract many new users to the language.</p> +<p>As part of the Multicore OCaml project, the team developed +<a href="https://github.com/ocaml-bench/sandmark">Sandmark</a>, a suite of sequential and +parallel benchmarks together with the infrastructure necessary to carefully run +the programs and analyse the results. 
Sandmark has been instrumental in +assessing and tuning the scalability of parallel OCaml programs and ensuring +that OCaml 5.0 does not introduce performance regressions for existing +sequential programs compared to OCaml 4.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/12df31eaf97ae56b3834fd6308095524/bce1e/scalability.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 74.11764705882352%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/12df31eaf97ae56b3834fd6308095524/c5bb3/scalability.png" class="gatsby-resp-image-image" alt="Matrix of graphs showing scalability of various multicore OCaml workloads" title="Matrix of graphs showing scalability of various multicore OCaml workloads" srcset="/static/12df31eaf97ae56b3834fd6308095524/04472/scalability.png 170w, +/static/12df31eaf97ae56b3834fd6308095524/9f933/scalability.png 340w, +/static/12df31eaf97ae56b3834fd6308095524/c5bb3/scalability.png 680w, +/static/12df31eaf97ae56b3834fd6308095524/b12f7/scalability.png 1020w, +/static/12df31eaf97ae56b3834fd6308095524/b5a09/scalability.png 1360w, +/static/12df31eaf97ae56b3834fd6308095524/bce1e/scalability.png 1696w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p align="center"><i>Scalability of compute intensive OCaml programs</i></p> +<p>Sandmark is now run as <a href="https://sandmark.ocamllabs.io">a nightly service</a> +monitoring the performance of OCaml 5 as it is being developed. 
Development will +continue to make it even easier to use and more practical by fully integrating +it with <a href="https://github.com/ocurrent/current-bench">current-bench</a> (the continuous +benchmarking system based on OCurrent). +<a href="https://tarides.com/company/">Get in touch</a> if you want to know more.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#ecosystem" aria-label="ecosystem permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Ecosystem</h1> +<p>At Tarides we want all OCaml users to benefit from the new features that OCaml +5.0 will bring, and this means ensuring that the ecosystem is fully prepared. We +aim to develop and maintain a robust set of libraries that work with domains and +effects, together with a diverse parallel benchmarking and performance profiling +suite to use with OCaml 5 applications. The +<a href="https://discuss.ocaml.org/t/eio-0-1-effects-based-direct-style-io-for-ocaml-5/9298">first version of Eio</a>, +the effects-based direct-style IO stack for OCaml 5.0, has been released, +generating lots of interesting discussion within the community. 
Eio not only +makes it easier to develop, debug and maintain applications utilising +asynchronous IO, but is also able to take advantage of multiple cores when +available.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/873029eb073713d1fbaf8ad64bcee1cd/133ae/http_load.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 66.47058823529413%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/873029eb073713d1fbaf8ad64bcee1cd/c5bb3/http_load.png" class="gatsby-resp-image-image" alt="Line chart showing the scalability of HTTP server implementations in OCaml, Rust and Go" title="Line chart showing the scalability of HTTP server implementations in OCaml, Rust and Go" srcset="/static/873029eb073713d1fbaf8ad64bcee1cd/04472/http_load.png 170w, +/static/873029eb073713d1fbaf8ad64bcee1cd/9f933/http_load.png 340w, +/static/873029eb073713d1fbaf8ad64bcee1cd/c5bb3/http_load.png 680w, +/static/873029eb073713d1fbaf8ad64bcee1cd/b12f7/http_load.png 1020w, +/static/873029eb073713d1fbaf8ad64bcee1cd/b5a09/http_load.png 1360w, +/static/873029eb073713d1fbaf8ad64bcee1cd/133ae/http_load.png 1424w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p align="center"><i>HTTP server performance using 24 cores</i></p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/d0eff71860d106c6e544df7b61f23d7b/2a08f/http_cores.png" class="gatsby-resp-image-link" style="display: block" 
target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 66.47058823529413%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/d0eff71860d106c6e544df7b61f23d7b/c5bb3/http_cores.png" class="gatsby-resp-image-image" alt="Line chart showing the load response of HTTP server implementations in OCaml, Rust and Go" title="Line chart showing the load response of HTTP server implementations in OCaml, Rust and Go" srcset="/static/d0eff71860d106c6e544df7b61f23d7b/04472/http_cores.png 170w, +/static/d0eff71860d106c6e544df7b61f23d7b/9f933/http_cores.png 340w, +/static/d0eff71860d106c6e544df7b61f23d7b/c5bb3/http_cores.png 680w, +/static/d0eff71860d106c6e544df7b61f23d7b/b12f7/http_cores.png 1020w, +/static/d0eff71860d106c6e544df7b61f23d7b/b5a09/http_cores.png 1360w, +/static/d0eff71860d106c6e544df7b61f23d7b/2a08f/http_cores.png 1422w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p align="center"><i>HTTP server scaling maintaining a constant load of 1.5 million requests per second</i></p> +<p>The early results are quite promising. An HTTP server based on Eio is able to +serve 1M+ requests/sec on 24 cores, outperforming Go's <code>nethttp</code> and closely +matching Rust's <code>hyper</code> performance. Eio is still heavily under development. +Expect even better numbers for its stable release planned later this year.</p> +<p>The next step is to iterate on the design in collaboration with the community +and our partners. 
<a href="https://tarides.com/company/">Get in touch</a> if you have +performance-sensitive applications that you'd like to port to Eio, so we can +discuss how the design can meet your needs.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#ocaml-platform" aria-label="ocaml platform permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>OCaml Platform</h1> +<p>In collaboration with community members and commercial funders, Tarides has been +developing and defining the +<a href="https://v3.ocaml.org/learn/platform">OCaml platform tool suite</a> for the last +four years. The goal of the platform is to provide OCaml developers with easy +access to high-quality, practical development tools to build any and every +project. We will continue to develop and maintain these tools, and make them +available for OCaml 5. <a href="https://tarides.com/company/">Reach out to us</a> if you +have specific feature requests to make your developer teams more efficient.</p> +<p>This alliance brings the headcount of Tarides up to 60+ people, all working +towards making OCaml the best language for any and every project. 
+<a href="https://tarides.com/company/">Join us</a>!</p>https://tarides.com/blog/2022-03-01-segfault-systems-joins-taridesSegfault Systems Joins Tarides2022-03-01T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Today I am incredibly delighted to announce that <a href="https://anil.recoil.org/projects/ocamllabs">OCaml +Labs</a>, a spinout from the +<a href="http://www.cl.cam.ac.uk/">University of Cambridge</a>, is joining +<a href="https://tarides.com/">Tarides</a>. After successfully collaborating on +many OCaml projects over the last four years, this alliance will +combine the expertise of both groups and enable us to bring OCaml, one +of the most advanced programming languages in the world, into +mainstream use. Combining forces will accelerate OCaml development and +its broader adoption. Furthermore, it will bring the security, +portability, and performance of OCaml to a large spectrum of +use-cases: from academic endeavours, such as formal methods and +existing threats within cyber-security, to real-world applications for +climate change, sustainable agriculture, and even space exploration!</p> +<p>All of OCaml Labs&rsquo; existing responsibilities and open-source +commitments will migrate over to Tarides, and thanks to how closely +the teams already work, business will continue without interruption to +continuity or delivery. Gemma Gordon will step up as CEO of Tarides, +and I will continue to lead the technological vision and strategy as +CTO. As Prof. Anil Madhavapeddy, founder of OCaml Labs and scientific +advisor of Tarides, points out, &ldquo;The cutting edge research we +conducted at the University over the past decade has now migrated into +mainline OCaml, and so the ongoing curation and development will now +happen on a commercially supported basis. 
I&rsquo;m excited to continue +collaborating on research with Tarides from the University of +Cambridge.&rdquo; Tarides will continue the work started at OCaml Labs and +invest in the growth, health, and development of OCaml alongside its +wider use-cases.</p> +<p>I am honoured to have the incredible OCaml Labs team - the team who +carefully designed and crafted <a href="https://discuss.ocaml.org/tag/multicore-monthly">Multicore OCaml</a> - join Tarides. We +share the view that a common plague affects the ever-growing +software industry, namely the poor quality of software and the +omnipresence of bugs. However, this is not a fatal flaw. Tools +developed by OCaml Labs over the years do not compromise on quality, +and they allow dev teams to automatically fix at least 70% of +<a href="https://msrc-blog.microsoft.com/2019/07/18/we-need-a-safer-systems-programming-language/">security +bugs</a> +and <a href="https://googleprojectzero.blogspot.com/p/0day.html">0-day security +exploits</a>. Consequently, +OCaml is a simple yet powerful language that can respond to the many +challenges developers face today. Since Tarides&rsquo;s inception, we have +envisioned a future where all OCaml applications are easily deployable +as specialised, secure, and energy-efficient +<a href="https://mirage.io">MirageOS</a> unikernels. This alliance is a step +further in that direction. Since OCaml is the language used to develop +MirageOS, Tarides has continuously developed and maintained parts of +the OCaml ecosystem since its creation. Our alliance with OCaml Labs +makes this more evident. The MirageOS ecosystem critically depends on +OCaml, and the OCaml ecosystem benefits from innovations coming from +the MirageOS project. Tarides is therefore fully committed to making +the synergy between OCaml and MirageOS a success.</p> +<p>Several exciting projects related to Multicore OCaml are coming to a +head this year. 
The OCaml 5.0 release will support multicore and +effects handlers, influencing every aspect of the language and its +ecosystem. The update will significantly improve both performance and +user experience whilst maintaining existing features that make OCaml +the language of choice for building, for instance, verification +software tools. Using the teams&rsquo; combined experience and zest for +innovation, Tarides is looking to the future of the OCaml language and +community with excitement. We will continue to push the boundaries of +exploration whilst focusing on what's good for the +community. <strong>Therefore, this alliance will complement the commercial +offering of Tarides and contribute to Tarides' mission: empowering +developers, communities and organisations to adopt OCaml as their +primary programming experience by providing training, expertise, and +development services around the OCaml language.</strong></p> +<p>&ldquo;We are thrilled to be part of an organisation innovating in many +areas around operating systems, distributed systems, and security with +the <a href="https://irmin.org">Irmin</a> distributed store and the MirageOS +unikernel projects,&rdquo; says Gemma Gordon, CEO of Tarides. &ldquo;I am +incredibly proud of the people OCaml Labs has collaborated with. We +have been able to build a sustainable open-source community, with +people from various backgrounds all collaborating together. It used to +be that people would have to volunteer their time on OCaml, or work in +academic research. We have created an additional funded path, one that +has increased the diversity and innovation of our community. I&rsquo;m +excited to continue to be part of a group that brings the best minds +together to solve the many problems the software industry faces +today. 
I look forward to building a flourishing and sustainable +commercial business with existing Tarides partners as well as +developing new collaborative opportunities.&rdquo;</p> +<p>This alliance brings the headcount of Tarides up to 60+ people, all +working towards making OCaml the best language for any and every +project. Join our team: <a href="https://tarides.com/company">https://tarides.com/company</a></p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#ocaml-labs" aria-label="ocaml labs permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>OCaml Labs</h4> +<p><em>OCaml Labs has been at the forefront of innovation in OCaml for +nearly a decade. It was founded at the University of Cambridge by +Prof. Anil Madhavapeddy in 2012 and developed into a spin-out +consultancy company in 2016. OCaml Labs' mission was to push OCaml and +functional programming forward as a platform, making it a more +effective tool for all users (including large-scale industrial +deployments) while at the same time growing the appeal of the language +to broaden its applicability and popularity.</em></p> +<p><em>OCaml Labs has been instrumental in developing and maintaining the +OCaml platform for OCaml usage at an industrial scale. OCaml Labs +contributed to the development and maintenance of the +<a href="https://opam.ocaml.org/">opam</a> package management +ecosystem and of the OCaml community website, <a href="https://ocaml.org/">https://ocaml.org/</a>, +first launched in 2012. 
These sites act as hubs for the OCaml community +to showcase the state-of-the-art and facilitate innovation. A new and +improved version of the site has been <a href="https://v3.ocaml.org/">released under +beta</a> this month. In addition, OCaml Labs' most +significant (and technically complex) project, OCaml Multicore, will +finally come to fruition this year. Work on this project began in +2014, followed by award-winning papers and presentations in 2020 and +the announcement in late 2021 that Multicore will become part of the +mainline OCaml compiler.</em></p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#tarides" aria-label="tarides permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Tarides</h4> +<p><em>Tarides is a tech start-up founded in Paris in 2018 by pioneers of +programming languages and cloud computing. They develop a software +infrastructure platform to deploy secure, distributed applications +with strict resource constraints and low-latency performance +requirements. This platform builds upon innovative and open-source +projects such as MirageOS and Irmin and underpins mission-critical +deployments such as +<a href="https://tarides.com/blog/2021-03-04-florence-and-beyond-the-future-of-tezos-storage">Tezos</a>, +Citrix XenServer, or <a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">Docker for +Desktop</a>. 
In +addition, Tarides uses unikernel technologies and applies the research +done in programming languages to real-world systems to build safe and +performant applications.</em></p> +<p><em>Tarides was part of the Founder program of Station F in 2018. In +addition, it was selected in France for the <a href="https://tarides.com/blog/2019-07-05-i-lab-2019">Concours d&rsquo;Innovation +i-Lab</a>, organised by +the French Ministry of Higher Education, Research and Innovation in +partnership with Bpifrance. This national contest rewards company +creation and innovative technologies. Tarides received an award at +<a href="https://tarides.com/blog/2019-12-11-tarides-wins-the-fic-2020-startup-award">FIC +2020</a>, +the leading European cybersecurity event.</em></p>https://tarides.com/blog/2022-01-27-ocaml-labs-joins-taridesOCaml Labs Joins Tarides2022-01-27T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>November has become MirageOS month! Between the upcoming official MirageOS 4.0 release, making custom Christmas Tree garlands +with <a href="https://tarides.com/blog/2021-11-11-mirageos-workshop-working-with-the-raspberry-pi-4">MirageOS on a Raspberry Pi</a>, +and now <a href="https://signalsandthreads.com/what-is-an-operating-system">this &quot;What is an Operating System?&quot; podcast</a> (featuring Tarides advisor and core MirageOS maintainer Anil Madhavapeddy), it truly is MirageOS month!</p> +<p>MirageOS can do much more than program a Raspberry Pi for Christmas decor. From +<a href="https://tarides.com/blog/2021-11-18-tarides-hyper-partners-in-agricultural-innovation">agricultural monitoring</a> to +<a href="https://tarides.com/blog/2021-06-29-tarides-introduces-osmose-at-the-open-source-innovation-sprint">smart buildings</a>, its +applications cover a wide range of needs. 
For example, it can also be used in critical pieces of infrastructure where security is of paramount importance, such as the <a href="https://tarides.com/blog/2020-04-20-the-future-of-tezos-on-mirageos">Tezos blockchain</a>. +Combined with <a href="https://irmin.org">Irmin</a>, it allows developers to build secure-by-design, offline-first systems and invert +the current cloud-centric model for designing applications to securely connect physical spaces with extremely low latency +and high bandwidth, using local-area computation capabilities.</p> +<p>You can read the entire <a href="https://signalsandthreads.com/what-is-an-operating-system/">transcript here</a> and find links to multiple places to listen through podcast apps.</p>https://tarides.com/blog/2021-11-23--signals-and-threads-podcast-what-is-an-operating-system'Signals and Threads' Podcast: What is an Operating System?2021-11-23T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We are thrilled to announce a partnership between Tarides and <a href="https://hyper.ag">Hyper</a>, a technology provider in the agritech space who&rsquo;s building +an &quot;operating system for high-performing farms.&quot; Indoor and vertical farms are becoming tech businesses that require scalable, +flexible, and easy-to-use tools to facilitate data analysis and thereby increase productivity. According to the <a href="https://www.eitfood.eu/blog/post/is-vertical-farming-really-sustainable">State of Indoor +Farming 2020 Report</a>, &ldquo;40% of vertical and indoor farms are +implementing data analytics and control automation to increase yield and lower cost of production.&rdquo;</p> +<p>Hyper&rsquo;s product offers a developer-friendly platform for modern farms to integrate analytics and automation without +worrying about hardware, scaling, or maintenance. There is a natural synergy between Hyper's product and Tarides&rsquo;s mission +to bring robust and scalable functional systems to the industry. 
Both teams will be working on technology to help solve +some big problems the world faces.</p> +<p>First, some context about how agriculture affects our environment:<br/> +<strong>The world&rsquo;s population is growing, but the size of our planet is not.</strong></p> +<p>Agriculture currently accounts for:</p> +<ul> +<li>75% of the world&rsquo;s deforestation<sup><a href="https://tarides.com/feed.xml#fn-1" class="footnote-ref">1</a></sup></li> +<li>50% of the world&rsquo;s habitable land</li> +<li>70% of the world's freshwater withdrawals</li> +<li>26% of global greenhouse gas emissions<sup><a href="https://tarides.com/feed.xml#fn-2" class="footnote-ref">2</a></sup></li> +</ul> +<p>It&rsquo;s an important time for indoor farming, as it&rsquo;s already producing twenty times (20x) more yield per area while using 95% less water and +zero (0) pesticides. Plus, indoor farming will reduce its carbon footprint by cutting food miles in half (50%)<sup><a href="https://tarides.com/feed.xml#fn-3" class="footnote-ref">3</a></sup>. Vertical farming +complements traditional farming by growing fresh produce for cities in a sustainable way.</p> +<p>Hyper's mission is to simplify the integration of sensors and controllers for data collection and automation. Their roadmap is +focused on implementing real-time computation of metrics for environmental data gathered from farms, prototyping computer vision models +for crop analysis, and automated crop traceability infrastructure. With their data platform, growers can optimise yield and reduce operating +costs by getting access to crop growth metrics and climate automation profiles without a dedicated engineering team.</p> +<p>Hyper's founders are experienced engineers and OCaml hackers who have worked on IoT and data analytics products in the past in the retail, +biotech, and multimedia industries. 
Utilizing our MirageOS ecosystem, Hyper's sensor networks and cameras are continuously collecting +millions of data points across large-scale farming operations to ensure consistent crop quality, detect issues early, and automate climate +control. Since the technology is offline-first, it enables Hyper to collect this data securely, without the risk of breach or loss, as is +sometimes the case in cloud computing.</p> +<p>As a technological partner, Tarides will help Hyper build a scalable IoT platform that leverages the OCaml ecosystem. Our support will +help Hyper bring the product to the market faster while also contributing to the open-source IoT ecosystem for MirageOS and related +projects. Hyper&rsquo;s IoT platform will be fully open-source next year and will focus on implementing better support for MQTT, CoAP, and other protocols.</p> +<p>A new version of MirageOS will be released in November, so this exciting announcement couldn't come at a better time! Hyper&rsquo;s use of +the <a href="https://mirage.io">MirageOS</a> ecosystem to build its data intelligence product is yet another real-world example that displays the +power and efficiency of MirageOS.</p> +<p>It&rsquo;s truly exciting technology because we&rsquo;re at a point in history where even the most adamant climate change critics have begun to admit +the climate is indeed changing. 
There are things we can each do to contribute, so Tarides is proud to collaborate with a company that&rsquo;s +actively seeking solutions to help reduce the amount of greenhouse gas emissions while increasing productivity.</p> +<p>Hyper currently has production deployments in vertical and indoor farms in the UK and East Africa and is planning on scaling the +operations in the coming months.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#sources" aria-label="sources permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>SOURCES</h4> +<div class="footnotes"> +<hr/> +<ol> +<li><a href="https://ourworldindata.org/drivers-of-deforestation">https://ourworldindata.org/drivers-of-deforestation</a><a href="https://tarides.com/feed.xml#fnref-1" class="footnote-backref">&#8617;</a></li> +<li><a href="https://ourworldindata.org/food-ghg-emissions">https://ourworldindata.org/food-ghg-emissions</a><a href="https://tarides.com/feed.xml#fnref-2" class="footnote-backref">&#8617;</a></li> +<li><a href="https://ourworldindata.org/environmental-impacts-of-food">https://ourworldindata.org/environmental-impacts-of-food</a><a href="https://tarides.com/feed.xml#fnref-3" class="footnote-backref">&#8617;</a></li> +</ol> +</div>https://tarides.com/blog/2021-11-18-tarides-hyper-partners-in-agricultural-innovationTarides & Hyper: Partners in Agricultural Innovation2021-11-18T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Earlier this week, Romain Calascibetta hosted an in-house MirageOS workshop for 
employees, both locally and remotely +around the world. This interactive workshop taught participants how to build an operating system on a Raspberry +Pi 4 using MirageOS. They got to create their own OS and play with projects, like one they dubbed <em>GuirlandeOS</em> for +which they programmed an LED garland to trim their Christmas tree, creating their own customized light show! There will +be a dedicated blog post about <em>GuirlandeOS</em> soon.</p> +<p>That&rsquo;s one fun example of what can be done with MirageOS and Raspberry Pi, but the possibilities are endless. +For example, one could use this dynamic pair to create solar-powered websites (something we&rsquo;ll hear more about next week).</p> +<p>The spirit of <a href="https://mirage.io">MirageOS</a> is that anyone can integrate it, even if they don't work at Tarides. Although the +workshop was only for employees, MirageOS is for everyone! Since it's independent of Tarides, we encourage you +to play with MirageOS and see what you can create.</p> +<p>Romain has opened the contents of his informative workshop to the world! Follow along with +<a href="https://drive.google.com/file/d/1NeYA5pjN-4xjFWCpyYxkVSsn4ii9Nktp/view?usp=drivesdk">his slides</a>, +which will walk you through the MirageOS toolchain to create your very own projects. You can read more about the +Raspberry Pi process <a href="https://github.com/mirage/mirage/pull/1253">in this pull request</a>.</p> +<p>Early next week, we&rsquo;ll continue with MirageOS month by showcasing a podcast where Anil Madhavapeddy +talks about those solar-powered websites and more!</p> +<p>Happy hacking!</p>https://tarides.com/blog/2021-11-11-mirageos-workshop-working-with-the-raspberry-pi-4MirageOS Workshop: Working with the Raspberry Pi 42021-11-11T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>The official release of MirageOS 4.0 quickly approaches! 
Learn about some general MirageOS concepts and get a +sneak peek at the forthcoming changes in MirageOS 4.0 during a LIVE presentation today at 15h CET.</p> +<p>Lucas Pluvinage will lead you through a live-streaming presentation to acquaint you with MirageOS 4.0. You&rsquo;ll learn +what kinds of problems MirageOS can solve and about Functoria, the compilation model. Then Lucas will also discuss the switch +to the&nbsp;Dune&nbsp;build system and how that enables cross-compilation, not to mention the creation of new compilation targets, +such as the Raspberry Pi!</p> +<p>Watch the live presentation today at 15h CET. Just go to: <a href="https://meet.google.com/iqy-urht-rcn">https://meet.google.com/iqy-urht-rcn</a>, or dial &#8234;(FR) +33 1 87 40 43 45&#8236; with +the PIN: &#8234;288 878 885&#8236;#. You can also find other phone numbers here:&nbsp;<a href="https://tel.meet/iqy-urht-rcn?pin=3736259978366">https://tel.meet/iqy-urht-rcn?pin=3736259978366</a> if you're not in France.</p> +<p>This presentation will be a great introduction to Romain Calascibetta's forthcoming MirageOS workshop tomorrow. +Check back on this blog and <a href="https://twitter.com/tarides_">follow us on Twitter</a> to find out more about how to watch +and participate in this informative workshop tomorrow!</p> +<p>Until then, check out Lucas's presentation today at 15h CET.</p>https://tarides.com/blog/2021-11-09-mirageos-4-0-preview-live-presentationMirageOS 4.0 Preview Live Presentation2021-11-09T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><strong>In April, <a href="https://tarides.com/blog/2021-04-30-scop-selected-for-dapsi-initiative">we announced</a> that the <a href="https://dapsi.ngi.eu">DAPSI initiative</a> accepted +the proposal for our Secure-by-Design Communication Protocols (SCoP) project. 
Today, we are thrilled to announce that SCoP has passed the initiative&rsquo;s Phase 1, +and we are now on our way to Phase 2!</strong></p> +<p>SCoP is an open, secure, and resource-efficient infrastructure to engineer a modern basis for open messaging (for existing and emerging protocols) +using type-safe languages and unikernels&mdash;to ensure your private information remains secure. After all, you wouldn&rsquo;t like your postal carrier reading +your snail mail, so why should emails be any different?</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#challenges" aria-label="challenges permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Challenges</h2> +<p>To operate an email service requires many technical skills and reliable infrastructure. As a result, only a few large companies can handle emails with +the proper security levels. Unfortunately, the core business model of these companies is to mine your personal data.</p> +<p>The number of emails exchanged every day is expected to reach 333 billion in 2022. That&rsquo;s a considerable amount of data, much of it private or sensitive, +sent across Cyberspace through portals with questionable security. 
The &lsquo;memory unsafe&rsquo; languages used in most communication services leave far too much room +for mistakes that have serious ramifications, like security flaws that turn into security breaches, leaving your personal or business information vulnerable to +malicious hackers.</p> +<p>Due to this global challenge, we set out to build a simple, secure, easily deployable solution to preserve users' privacy, and we&rsquo;re making great strides toward +accomplishing that goal. We base our systems on scientific foundations to last for decades and drive positive change for the world. Our robust understanding of +both theory and practice enables us to solve these security problems, so we explore ideas where research and engineering meet at the intersection of the domains +of operating systems, distributed systems, and programming languages.</p> +<p>Every component of SCoP is carefully designed as an independent library, using modern development techniques to avoid commonly reported threats and flaws. +For instance, the protocol parsers and serializers are written in a type-safe language and tested using fuzzing. 
Combining these techniques +will increase users' trust to migrate their personal data to these new, more secure services.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#architecture" aria-label="architecture permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Architecture</h2> +<p>The architecture of the SCoP communication service is composed of an Email Service based on a secure extension of the SMTP protocol, and a decentralised +real-time communication system based on Matrix.</p> +<p>The <a href="https://github.com/dinosaure/ptt">SMTP</a> and <a href="https://github.com/clecat/ocaml-matrix">Matrix</a> protocols implemented in SCoP follow the separation of +concerns design principle, meaning that the SMTP Sender and SMTP Receiver are designed as two distinct units. They&rsquo;re implemented as isolated micro-services +which run as unikernels. 
The SMTP Sender, Receiver, and Matrix are all configurable, and each configuration comes with a security risk analysis report to +understand possible privacy risks.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#progress" aria-label="progress permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Progress</h2> +<p>Not only are we on our way to Phase 2 in the <a href="https://dapsi.ngi.eu">DAPSI Initiative</a>, but we&rsquo;re also proud to report that we&rsquo;re on track with our +planned milestones!</p> +<p>Our <strong>first milestone</strong> was to generate a corpus of emails to test our parser implementation against existing projects in order to detect differences +between the descriptions specified in the RFCs. We now have 1 million emails that have been parsed/encoded without any issues! Our email corpus keeps +isomorphism between the encoder and decoder, and you can find it in this <a href="https://github.com/mirage/hamlet">GitHub Repo</a>. We encourage implementors in other languages to use it to increase +trust in their own implementations.</p> +<p>We set out to implement an SMTP extension mechanism and support for SPF as well as implement DMARC, a security framework, on top of DKIM and SPF for our +<strong>second milestone</strong>, and we are right on target. 
To date, we&rsquo;ve completed four components:</p> +<ul> +<li><a href="https://github.com/dinosaure/ocaml-spf">SPF</a></li> +<li><a href="https://github.com/dinosaure/ocaml-dkim">DKIM</a></li> +<li><a href="https://github.com/dinosaure/ptt">SMTP</a> can send and verify emails</li> +<li><a href="https://tarides.com/blog/2019-09-25-mr-mime-parse-and-generate-emails">MrMIME</a> can generate the email, then SMTP sends the email (signed by a DKIM private key). We can correctly sign an email, generating both the signature and the DKIM field that contains it. When the email is received, we check the DKIM signature and the SPF metadata.</li> +</ul> +<p>For our <strong>third milestone</strong>, we set out to implement DNSSEC, a set of security extensions over DNS. This security layer verifies the identity of an email sender +through DKIM/SPF/DMARC, but it also needs security extensions in the DNS protocol. We completed our initial investigation of a DNSSEC implementation prototype, +and we discovered several issues; for example, some of the required elliptic curve cryptography was missing. Those necessary cryptographic primitives are now available, so we +should complete this milestone by the end of the month.</p> +<p>Finally, our <strong>fourth milestone</strong> was to implement the Matrix protocol (client and server). We completed the protocol&rsquo;s client library, which sends a notification +from OCaml CI. Plus, we have a PoC, and the Matrix server side, which receives the notification, is also complete.</p> +<p>Although we still have much work ahead of us, we&rsquo;re quite pleased with the progress thus far, and so is the DAPSI Initiative! 
Follow our progress by <a href="https://tarides.com/feed.xml">subscribing +to this blog</a> and our <a href="https://twitter.com/tarides_">Twitter feed (@tarides_)</a> for the latest updates.</p> +<br/> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/50ddca27efa367497d954f667fc921f8/a76d6/DAPSI_generic.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 3.5294117647058822%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/50ddca27efa367497d954f667fc921f8/7bf67/DAPSI_generic.jpg" class="gatsby-resp-image-image" alt="Sequence of entity logos: in association with NGI, EU, Zabala, FGS, +cap-digital, IMT Starter, Fraunhofer IAIS." title="Sequence of entity logos: in association with NGI, EU, Zabala, FGS, +cap-digital, IMT Starter, Fraunhofer IAIS." 
srcset="/static/50ddca27efa367497d954f667fc921f8/651be/DAPSI_generic.jpg 170w, +/static/50ddca27efa367497d954f667fc921f8/d30a3/DAPSI_generic.jpg 340w, +/static/50ddca27efa367497d954f667fc921f8/7bf67/DAPSI_generic.jpg 680w, +/static/50ddca27efa367497d954f667fc921f8/990cb/DAPSI_generic.jpg 1020w, +/static/50ddca27efa367497d954f667fc921f8/a76d6/DAPSI_generic.jpg 1139w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p>https://tarides.com/blog/2021-10-14-scop-selected-for-dapsi-phase2SCoP Passed Phase 1 of the DAPSI Initiative!2021-10-14T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>As mentioned in our <a href="https://forum.tezosagora.org/t/tezos-storage-irmin-summer-2021-update/3744">Tezos Storage / Irmin Summer 2021 Update</a> on the Tezos Agora forum, the Irmin team's goal has been to improve Irmin's performance in order to speed up the <em>Baking Account</em> migration process in Octez, and we managed to make it 10x faster in the first quarter of 2021. Since then, we've been working on a new benchmark program for Irmin that's based on the interactions between Irmin and Octez. This won't just help make Irmin even faster, it will also help speed up the Tezos blockchain process and enable us to monitor Irmin's behavior in Octez.</p> +<p>Octez is the <a href="https://gitlab.com/tezos/tezos">Tezos node implementation</a> that uses Irmin to store the blockchain state, so Irmin is a core component of Octez that's responsible for virtually all the filesystem operations. Whether a node is launched to produce new blocks (aka &ldquo;bake&rdquo;) or just to participate in peer-to-peer sharing of existing blocks, it must first update itself by rebuilding blocks individually until it reaches the head of the blockchain. 
This first phase is called <em>bootstrapping</em>, and once it reaches the blockchain head, we say it has been <em>bootstrapped</em>. Currently, the <em>bootstrapped</em> phase processes 2 blocks per minute, which is the rate at which the Tezos blockchain progresses. The next goal is to increase that rate to 12 blocks per minute.</p> +<p>Irmin stores the content of the Tezos blockchain on disk using the <code>irmin-pack</code> library. There is a one-to-one correspondence between Tezos blocks and Irmin commits. Each time Tezos produces a block, Irmin produces a commit, and then the Tezos block hash is computed using the Irmin commit hash. The Irmin developers are working on improving the performance of <code>irmin-pack</code>, which in turn will improve the performance of Octez.</p> +<p>A benchmark program is considered &ldquo;fair&rdquo; when it's representative of how the benchmarked code is used in the real world&mdash;for example, the access patterns to Irmin. A standard database benchmark would first insert random data and then remove it. Such a synthetic benchmark would fail to reproduce the bottlenecks that occur when insertions and removals are interleaved. Our solution to &ldquo;fairness&rdquo; is radical: <em>replaying</em>. Within a sandboxed environment, we <em>replay</em> a real-world situation.</p> +<p>Basically, our new benchmark program exercises the benchmarked code and records statistics for later analysis. The program is stored in the <code>irmin-bench</code> library and makes use of operation traces (called <em>action traces</em>) recorded when Octez runs with Irmin. Later, the program replays the recorded operations one at a time while simultaneously recording tonnes of statistics (called <em>stat traces</em>). 
Data analysis of the stat traces may reveal many interesting facts about the behaviour of Irmin, especially if we tweak:</p> +<ul> +<li>the configuration of Irmin (e.g., what&rsquo;s the impact of doubling the size of a certain cache?)</li> +<li>the replay parameters (e.g., does Irmin's performance decay over time? Does <code>irmin-pack</code> perform as well after 24 hours of replay as after 1 minute of replay?)</li> +<li>the hardware (e.g., does <code>irmin-pack</code> perform well on a Raspberry Pi?)</li> +<li>the code of Irmin (e.g., does this PR have an impact on performance?)</li> +</ul> +<p>This benchmarking process is similar to the record-replay feature available with <a href="https://docs.tezedge.com/tezedge/record-replay">TezEdge</a>.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#recording-the-action-trace" aria-label="recording the action trace permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Recording the Action Trace</h3> +<p>By adding logs to Tezos, we can record the Tezos-Irmin interactions and thus capture the Irmin &ldquo;view&rdquo; of Tezos. 
We&rsquo;ve recorded <em>action traces</em> during the <em>bootstrapping</em> phase of Tezos nodes, which started from <em>Genesis</em>&mdash;the name of the very first Tezos block inserted into an empty Irmin store.</p> +<p>The interaction surface between Irmin and Octez is quite simple, so we were able to reduce it to eight elementary operations:</p> +<ul> +<li><code>checkout</code>, to pull an Irmin tree from disk;</li> +<li><code>find</code>, <code>mem</code> and <code>mem_tree</code>, read-only operations on an Irmin tree;</li> +<li><code>add</code>, <code>remove</code> and <code>copy</code>, write-only operations on an Irmin tree;</li> +<li><code>commit</code>, to push an Irmin tree to disk.</li> +</ul> +<p>It&rsquo;s important to remember that Irmin behaves much like Git. It has built-in snapshotting and is compatible with Git itself when using the <code>irmin-git</code> library. In fact, these operations are very similar to Git's, too.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#sequence-of-operations" aria-label="sequence of operations permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Sequence of Operations</h3> +<p>To illustrate further, here's a concrete example of an operation sequence inside an action trace:</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/d133a7455102c1b17e30fae407e04c78/7131f/ygWh3cg.png" 
class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 48.23529411764706%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/d133a7455102c1b17e30fae407e04c78/c5bb3/ygWh3cg.png" class="gatsby-resp-image-image" alt="ygWh3cg" title="ygWh3cg" srcset="/static/d133a7455102c1b17e30fae407e04c78/04472/ygWh3cg.png 170w, +/static/d133a7455102c1b17e30fae407e04c78/9f933/ygWh3cg.png 340w, +/static/d133a7455102c1b17e30fae407e04c78/c5bb3/ygWh3cg.png 680w, +/static/d133a7455102c1b17e30fae407e04c78/7131f/ygWh3cg.png 710w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>This shows Octez&rsquo;s first interaction with Irmin at the very beginning of the blockchain! The first block, <em>Genesis</em>, is quite small (it ends at operation #5), but the second one is massive (it ends at operation #309273). It contains no transactions because it only sets up the entire structure of the tree. 
It precedes the beginning of Tezos's initial protocol called &ldquo;Alpha I&rdquo;.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#benchmark-benefits" aria-label="benchmark benefits permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Benchmark Benefits</h3> +<p>Our benchmark results convey the sheer magnitude of the Tezos blockchain and the role that Irmin plays within it. We&rsquo;ve recorded a trace that covers the blocks from the beginning of the blockchain in June 2018 all the way up to May 2021. It weighs <strong>96GB</strong>.</p> +<p>Although it took <strong>34 months</strong> for Tezos to reach that state, bootstrapping so far takes only <strong>170 hours</strong>, and replaying it takes a mere <strong>37 hours</strong> on a section of the blockchain that contains <strong>1,343,486 blocks</strong>. On average, this corresponds to <strong>1 block per minute</strong> when the blocks were created, <strong>132 per minute</strong> when bootstrapping, and <strong>611 per minute</strong> during replay.</p> +<p>On this particular section of the blockchain, Octez had <strong>1,089,853,521 interactions</strong> with Irmin. 
On average, this corresponds to <strong>12 per second</strong> when the blocks were created, <strong>1782 per second</strong> during bootstrapping, and <strong>8258 per second</strong> during replay.</p> +<p>The chart below demonstrates how many of each Irmin operation occur per block (on average):</p> +<center> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 500px; "> + <a href="https://tarides.com/static/c87031f583955f306e8c64611c474c01/0b533/4yKd8iQ.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 100%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/c87031f583955f306e8c64611c474c01/0b533/4yKd8iQ.png" class="gatsby-resp-image-image" alt="4yKd8iQ" title="4yKd8iQ" srcset="/static/c87031f583955f306e8c64611c474c01/04472/4yKd8iQ.png 170w, +/static/c87031f583955f306e8c64611c474c01/9f933/4yKd8iQ.png 340w, +/static/c87031f583955f306e8c64611c474c01/0b533/4yKd8iQ.png 500w" sizes="(max-width: 500px) 100vw, 500px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +</center> +<p>This next chart displays where the time is spent during replay:</p> +<center> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/85cffd35d784692988f8aa1f04e9180a/78797/u5Fv2Zb.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 50%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img 
src="https://tarides.com/static/85cffd35d784692988f8aa1f04e9180a/c5bb3/u5Fv2Zb.png" class="gatsby-resp-image-image" alt="u5Fv2Zb" title="u5Fv2Zb" srcset="/static/85cffd35d784692988f8aa1f04e9180a/04472/u5Fv2Zb.png 170w, +/static/85cffd35d784692988f8aa1f04e9180a/9f933/u5Fv2Zb.png 340w, +/static/85cffd35d784692988f8aa1f04e9180a/c5bb3/u5Fv2Zb.png 680w, +/static/85cffd35d784692988f8aa1f04e9180a/b12f7/u5Fv2Zb.png 1020w, +/static/85cffd35d784692988f8aa1f04e9180a/78797/u5Fv2Zb.png 1125w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +</center> +<p>With <code>irmin-pack</code>, an OCaml thread managed by the <a href="https://github.com/mirage/index/"><code>index</code> library</a> (the <em>merge</em> thread) runs concurrently with the main thread, so a fraction of the durations shown above is actually spent in that thread. Refer to this <a href="https://tarides.com/blog/2020-09-01-introducing-irmin-pack">blog post</a> for more details on <code>index</code>'s <em>merges</em>.</p> +<p>The following chart illustrates how memory usage evolves during replay:</p> +<center> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/fda8364be92a0ed94a1228298ea88b68/20c85/F0bORTg.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 40%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/fda8364be92a0ed94a1228298ea88b68/c5bb3/F0bORTg.png" class="gatsby-resp-image-image" alt="F0bORTg" title="F0bORTg" srcset="/static/fda8364be92a0ed94a1228298ea88b68/04472/F0bORTg.png 170w, 
+/static/fda8364be92a0ed94a1228298ea88b68/9f933/F0bORTg.png 340w, +/static/fda8364be92a0ed94a1228298ea88b68/c5bb3/F0bORTg.png 680w, +/static/fda8364be92a0ed94a1228298ea88b68/20c85/F0bORTg.png 999w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +</center> +<p>On a logarithmic scale, this last chart shows the evolution of the <em>write amplification</em>, which indicates the amount of rewriting (e.g., at the end of the replay, 20TB of data have been written to disk in order to create a store that weighs 73GB).</p> +<center> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/865f6695d4f8132e84cadef0cd099f38/20c85/PhNqloN.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 50%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/865f6695d4f8132e84cadef0cd099f38/c5bb3/PhNqloN.png" class="gatsby-resp-image-image" alt="PhNqloN" title="PhNqloN" srcset="/static/865f6695d4f8132e84cadef0cd099f38/04472/PhNqloN.png 170w, +/static/865f6695d4f8132e84cadef0cd099f38/9f933/PhNqloN.png 340w, +/static/865f6695d4f8132e84cadef0cd099f38/c5bb3/PhNqloN.png 680w, +/static/865f6695d4f8132e84cadef0cd099f38/20c85/PhNqloN.png 999w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +</center> +<p>The <em>merge</em> operations of the <code>index</code> library are the source of this poor <em>write amplification</em>. 
The Irmin team is working hard on improving this metric:</p> +<ul> +<li>on the one hand, the new <em>structured keys</em> feature of the upcoming Irmin 3.0 release will help to reduce the pressure on the <code>index</code> library,</li> +<li>on the other hand, we are working on algorithmic improvements of <code>index</code> itself.</li> +</ul> +<p>Another nice way to use the trace is for testing. When replaying a trace, we can recompute the commit hashes and check that they correspond to the trace hashes, so the benchmark acts as additional tests to ensure we don't compromise the hashes computed in Tezos.</p> +<p>Complex changes to Tezos can be simulated first in Irmin. For example, the <a href="https://gitlab.com/tezos/tezos/-/merge_requests/2771">path flattening in Tezos</a> feature (merged in August 2021) can now be tested earlier in the process with our benchmark. Prior to the trace benchmarks, we first had to make the changes in Tezos to understand their repercussions on Irmin directly from the Tezos benchmarks.</p> +<p>Lastly, we continue to test alternative libraries and compare them with the ones integrated in Tezos; however, using these alternative libraries to build Tezos nodes has proven to be more complicated than merely adding them in Irmin and running our benchmarks. 
While testing continues on most new libraries, we can definitely use replays to evaluate our <a href="https://github.com/mirage/cactus/">new <code>cactus</code> library</a> as a replacement for our <a href="https://github.com/mirage/index/"><code>index</code> library</a>.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#future-directions" aria-label="future directions permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Future Directions</h3> +<p>While the <em>action trace</em> recording was only made possible on a development branch of Octez, we would next like to upstream the feature to the main branch of Octez, which would give all users the option to record Tezos-Irmin interactions. This would simplify bug reporting overall.</p> +<p>Although the first version only deals with the <em>bootstrapping</em> phase of Tezos, an upcoming goal is to make it possible to benchmark the <em>bootstrapped</em> phase of Tezos as well. Additionally, we plan to replay the multiprocess aspects of a Tezos node in the near future.</p> +<p>The first stable version of this benchmark has existed in Irmin&rsquo;s development branch since Q2 2021, and we will release it as part of <code>irmin-bench</code> for Irmin 3.0 in Q4 2021. 
This release will allow integration into the <a href="https://github.com/ocaml-bench/sandmark">Sandmark OCaml</a> benchmarking suite.</p> +<p>Follow the Tarides blog for future Irmin updates.</p>https://tarides.com/blog/2021-10-04-the-new-replaying-benchmark-in-irminThe New Replaying Benchmark in Irmin2021-10-04T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>The last upgrade of the Tezos protocol, Granada, activated on August 6th, 2021. +We are now glad to announce a new protocol proposal, Hangzhou, the result of a +collaborative work from various teams.</p> +<p><em>This is a joint post with <a href="https://www.nomadic-labs.com/">Nomadic Labs</a>, +<a href="https://marigold.dev/">Marigold</a>, <a href="https://www.oxheadalpha.com/">Oxhead Alpha</a> +and <a href="https://www.dailambda.jp/">DaiLambda</a>.</em></p>https://marigold.dev/blog/announcing-hangzhou/Announcing Tezos’ 8th protocol upgrade proposal: Hangzhou2021-09-21T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Last year, Tarides had the honour of winning the &ldquo;Coup de Coeur&rdquo; Startup Award +at the International Cybersecurity Forum (FIC). It&rsquo;s the leading cybersecurity +event in the EU. It&rsquo;s both a forum, to present and discuss innovations and +reflect on the state of the European cybersecurity ecosystem, and a trade fair, +where cybersecurity and other tech professionals can meet and network.</p> +<p>This year, Tarides returns to FIC with their own booth! 
Our representatives, +including founder and CEO of Tarides, Thomas Gazagnaire, look forward to making +new connections, looking for collaborators, and catching up with colleagues.</p> +<p>The FIC theme for 2021 centres on encouraging &ldquo;a collective and collaborative +cybersecurity.&rdquo; From the <a href="https://www.forum-fic.com/en/home/discover/what-is-the-fic.htm">FIC +Website</a>:</p> +<blockquote> +<p>&rdquo;Collective, because each stakeholder is responsible not only for its own +security but also for the security of every other stakeholder, and therefore of +the whole. Collaborative, because cooperation and information sharing are +essential to compensate the asymmetry between the &laquo;attacker&raquo; and the +&laquo;defender&raquo;&hellip;FIC 2021 will focus on the major operational, industrial, +technological, and strategic challenges of cooperation&rdquo;</p> +</blockquote> +<p>Solutions for most cybersecurity issues already exist, and we at Tarides are +happy to consult on our many solutions that can make your systems more secure. +From secure emails to runtime protection to secure IoT protocols, our +representatives can help!</p> +<p>So stop by the Tarides booth at FIC to chat about your options. We&rsquo;ll have +snacks and other goodies, too!</p>https://tarides.com/blog/2021-09-06-tarides-returns-to-fic-2021Tarides Returns to FIC 20212021-09-06T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Regular CI systems are optimised for workloads that do not require stable performance over time. This makes them unsuitable for running performance benchmarks.</p> +<p><a href="https://github.com/ocurrent/current-bench"><code>current-bench</code></a> provides a predictable environment for performance benchmarks and a UI for analysing results over time. Similar to a CI system, it runs on pull requests and branches which allows performance to be analysed and compared. 
It can currently be enabled as an app on GitHub repositories with zero configuration. Several public repositories are running <code>current-bench</code>, including <a href="https://github.com/mirage/irmin">Irmin</a> and <a href="https://github.com/ocaml/dune">Dune</a>. We plan to enable it on more projects in the future.</p> +<p>In this article, we give a technical overview of <code>current-bench</code>, showing how results are collected and analysed, the requirements for using it, and how we built the infrastructure for stable benchmarks. We also describe future work that would allow more OCaml projects to run <code>current-bench</code>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#introduction" aria-label="introduction permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Introduction</h2> +<p>For performance-critical software, we must run benchmarks to ensure that there's no regression. Running benchmarks before the user submits their pull request is tedious, and since every user might have a different machine, you can't be sure whether the benchmarks show a real improvement or a regression in performance.</p> +<p>Our <code>current-bench</code> aims to solve this problem by providing a stable benchmarking platform that runs every time the user submits a pull request and compares the result to the benchmarks on the main branch. As <code>current-bench</code> is zero-configuration, users can enroll their repository to run benchmarks with ease. 
So far, <code>current-bench</code> has helped projects guard against performance regressions, so you can merge code with more confidence.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#architecture" aria-label="architecture permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Architecture</h2> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/2924487e1eab06d8ffdb4bc2e3cdbbd1/37523/current-bench-arch.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 75.29411764705883%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/2924487e1eab06d8ffdb4bc2e3cdbbd1/c5bb3/current-bench-arch.png" class="gatsby-resp-image-image" alt="Figure 1: Current bench architecture" title="Figure 1: Current bench architecture" srcset="/static/2924487e1eab06d8ffdb4bc2e3cdbbd1/04472/current-bench-arch.png 170w, +/static/2924487e1eab06d8ffdb4bc2e3cdbbd1/9f933/current-bench-arch.png 340w, +/static/2924487e1eab06d8ffdb4bc2e3cdbbd1/c5bb3/current-bench-arch.png 680w, +/static/2924487e1eab06d8ffdb4bc2e3cdbbd1/37523/current-bench-arch.png 720w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" 
loading="lazy" decoding="async"/> + </a> + </span></p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#benchmarking-pipeline" aria-label="benchmarking pipeline permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Benchmarking Pipeline</h3> +<p>As shown in Figure 1 (above), the benchmarking infrastructure uses <code>ocurrent</code><sup><a href="https://icfp20.sigplan.org/details/ocaml-2020-papers/6/OCaml-CI-A-Zero-Configuration-CI">1</a></sup>, an embedded Domain Specific Language to write a pipeline. The <code>ocurrent</code> command computes the build incrementally and helps with static analysis. Whenever a pull request is opened on a repository monitored by <code>current-bench</code>, a <code>POST</code> request is sent to the server running the pipeline. The pipeline fetches the head commit on the pull request and uses Docker to compile the code, and then it runs the <code>make bench</code> command inside the generated Docker image.</p> +<p>The pipeline runs on a single node, and the process is pinned to a single core to ensure there's no contention of resources when running the benchmarks. 
Once the run finishes, the raw JSON result is stored in a <code>Postgres</code> database, which the frontend can query using a <code>GraphQL</code> API, as shown in Figure 2 below.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/766e96b985de064519c776b0c59635b7/d0c2f/current-bench-ui.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 63.52941176470588%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/766e96b985de064519c776b0c59635b7/c5bb3/current-bench-ui.png" class="gatsby-resp-image-image" alt="Figure 2: Current bench UI" title="Figure 2: Current bench UI" srcset="/static/766e96b985de064519c776b0c59635b7/04472/current-bench-ui.png 170w, +/static/766e96b985de064519c776b0c59635b7/9f933/current-bench-ui.png 340w, +/static/766e96b985de064519c776b0c59635b7/c5bb3/current-bench-ui.png 680w, +/static/766e96b985de064519c776b0c59635b7/b12f7/current-bench-ui.png 1020w, +/static/766e96b985de064519c776b0c59635b7/b5a09/current-bench-ui.png 1360w, +/static/766e96b985de064519c776b0c59635b7/d0c2f/current-bench-ui.png 1362w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>The frontend supports historical navigation and provides comparison with the default branch. It lets users select the pull request whose graphs they want to see. The graphs display the individual result of the head commit and the comparison with the commits on the default branch. 
The frontend lets users select the historical interval over which they want to compare benchmarks, and it also shows the standard deviation. Once the benchmarks have run successfully, the pipeline sets the pull request status to the frontend URL. Then the user can look at the graphs.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#hardware-optimisation" aria-label="hardware optimisation permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Hardware Optimisation</h3> +<p>Our <code>current-bench</code> uses the hardware optimisations developed for OCaml multicore compiler benchmarks <a href="https://github.com/ocaml-bench/ocaml_bench_scripts#notes-on-hardware-and-os-settings-for-linux-benchmarking">(presented at ICFP OCaml Workshop 2019)</a> with a few modifications to allow the benchmarks to run inside Docker containers. To get stable performance, we configured the kernel to isolate some of the CPU cores. Linux then automatically avoids scheduling other user processes on those cores. We also disabled IRQ handling and power saving.</p> +<p>The container that runs the benchmark is pinned to one of the isolated cores. Since I/O operations can make the benchmarks less stable, we use an in-memory <code>tmpfs</code> partition in <code>/dev/shm</code> for all storage. For NUMA-enabled systems, we configure this partition to be allocated on the NUMA node of the isolated core. 
The pipeline automatically disables ASLR inside the container. Doing so is normally blocked by the default Docker seccomp profile, so we have modified the profile to allow the <code>personality(2)</code> syscall.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#enrolling-a-repository" aria-label="enrolling a repository permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Enrolling a repository</h2> +<p>To enroll a repository, you need to ensure the following:</p> +<ul> +<li>The <a href="https://github.com/marketplace/ocaml-benchmarks">ocaml-benchmarks</a> GitHub app is enabled for your repository.</li> +<li>The repository has a <code>bench</code> Makefile target. 
This is triggered from the <code>current-bench</code> pipeline.</li> +<li>The output of the <code>make bench</code> target is JSON, which can be parsed by the pipeline and displayed by the frontend.</li> +</ul> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#future-work" aria-label="future work permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Future work</h2> +<p>Anyone who wants to roll out a continuous, zero-configured benchmarking infrastructure can set up the current-bench infrastructure. In the future, we want to scale <code>current-bench</code> by isolating cores on multiple machines and adding a scheduler to ensure that benchmarks use only one core at a time per machine. We plan to add support for different benchmarking libraries that repositories can use&mdash;for example, we currently support repositories using <code>bechamel</code>. We also aim to make the adoption of <code>current-bench</code> easier by adding a conversion library that can convert any benchmark output into output parseable by <code>current-bench</code>. We intend to add support for <code>quick</code> and <code>slow</code> benchmarks, which would allow users to have faster feedback loops on pull requests while ensuring they can still run more extensive, time consuming benchmarks to see the performance.</p> +<p>Thank you for reading! 
You can check out the implementation for <code>current-bench</code> <a href="https://github.com/ocurrent/current-bench">here</a>!</p>https://tarides.com/blog/2021-08-26-benchmarking-ocaml-projects-with-current-benchBenchmarking OCaml projects with current-bench2021-08-26T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>This year marks the 25th anniversary of the OCaml Language! It's an exciting +time for OCaml programmers and enthusiasts. A fun and informative way to +celebrate OCaml's birthday is to attend the <a href="https://icfp21.sigplan.org/home/ocaml-2021">26th Annual International +Conference on Functional +Programming</a> (ICFP), held online +this year due to ongoing Covid restrictions. While this is disappointing news +for so many, it's beneficial to those of you outside France because now you +can hear professionals talk about cutting-edge technology from the comfort of +your own home.</p> +<p>Tarides engineers, as well as our colleagues at <a href="https://tarides.com/ocamllabs.io">OCaml Labs +Consultancy</a> and <a href="https://segfault.systems/">Segfault Systems</a>, +have some exciting presentations at this year's ICFP! Listen to talks on running +OCaml on multiple cores, generating fuzzing suites, benchmarking, and the +experimental OCaml effects.</p> +<p>You can search the <a href="https://icfp21.sigplan.org/program/program-icfp-2021/?past=Show%20upcoming%20events%20only&amp;date=Fri%2027%20Aug%202021">complete ICFP +Timetable</a> for other topics of interest and read +below about our engineers' projects and presentations. Times are listed both in +London (GMT +1) and Paris (GMT +2) for ease of planning. 
The following talks are +scheduled for Friday, 27 August 2021.</p> +<p>Grab a cup of coffee for our first morning talk at <strong>9am London / 10am Paris</strong> +and learn about <strong>Adapting the OCaml Ecosystem for Multicore OCaml.</strong> With the +soon-to-be-released OCaml 5.0, there will be support for Shared-Memory +Parallelism. There&rsquo;s increasing interest in the community in porting existing +libraries to Multicore, so this talk will cover the arrival of Multicore and +what that means for the OCaml ecosystem. Our engineers will highlight existing +tools and provide methods for a smooth transition, so viewers can benefit from +Multicore parallelism. They'll also share some insights from their experience +porting existing libraries to Multicore OCaml.</p> +<p>Read more about this topic in today's post at <a href="https://segfault.systems/blog/2021/adapting-to-multicore/">Segfault +Systems</a>, written by +one of tomorrow's presenters, <a href="https://icfp21.sigplan.org/profile/sudhaparimala">Sudha +Parimala</a> of Segfault Systems. +Joining Sudha for the presentation are <a href="https://icfp21.sigplan.org/profile/enguerranddecorne1">Enguerrand +Decorne</a> (Tarides), +<a href="https://icfp21.sigplan.org/profile/sadiqjaffer">Sadiq Jaffer</a> (Opsian and OCaml +Labs Consultancy), <a href="https://icfp21.sigplan.org/profile/tomkelly">Tom Kelly</a> +(OCaml Labs Consultancy), and <a href="https://icfp21.sigplan.org/profile/kcsivaramakrishnan">KC +Sivaramakrishnan</a> of IIT +Madras.</p> +<p>Next up is <strong>Leveraging Formal Specifications to Generate Fuzzing Suites</strong> at +<strong>11:10 London / 12:10 Paris,</strong> presented by Tarides's own <a href="https://icfp21.sigplan.org/profile/nicolasosborne">Nicolas +Osborne</a> and <a href="https://icfp21.sigplan.org/profile/clementpascutto">Cl&eacute;ment +Pascutto</a>. 
They'll discuss +how developers typically first have to capture the semantics they want when +checking a library, then write the code implementing these tests, and find +relevant test cases that expose possible misbehaviours. Through their work, +they'll present a tool that takes care of those last two steps by +automatically generating fuzz testing suites from OCaml interfaces annotated +with formal behavioural specifications. They'll also show some ongoing +experiments on fuzzing capabilities and limitations applied to real-world +libraries.</p> +<p>Next up is our talk on <strong>Continuous Benchmarking for +OCaml Projects</strong> at <strong>12:30 London / 13:30 Paris</strong>. Regular CI systems are +optimised for workloads that do not require stable performance over time, which +makes them unsuitable for running performance benchmarks. Tarides engineers +<a href="https://icfp21.sigplan.org/profile/gargisharma">Gargi Sharma</a>, <a href="https://icfp21.sigplan.org/profile/rizoisrof">Rizo +Isrof</a>, and <a href="https://icfp21.sigplan.org/profile/magnusskjegstad">Magnus +Skjegstad</a> will discuss how +<code>current-bench</code> provides a predictable environment for performance benchmarks +and a UI for analysing results over time. Similar to a CI system, it runs on pull +requests and branches, allowing performance to be analysed and compared, and it +can currently be enabled as an app on GitHub repositories with zero +configuration. Several public repositories already run <code>current-bench</code>, +including <a href="https://github.com/mirage/irmin">Irmin</a> and +<a href="https://github.com/ocaml/dune">Dune</a>, and they plan to enable it on more +projects in the future. 
<a href="https://tarides.com/blog/2021-08-26-benchmarking-ocaml-projects-with-current-bench">Read Gargi's recent blog post for more information on +benchmarking</a>.</p> +<p>In this presentation, they will give a technical overview of <code>current-bench</code>, showing how results are collected and analysed, requirements for using it, and how they built the infrastructure for stable benchmarks. They'll also cover some future work that will allow more OCaml projects to run <code>current-bench</code>.</p> +<p>Immediately after the Benchmarking talk, catch <strong>A Multiverse of Glorious Documentation</strong> +scheduled at <strong>12:50 London / 13:50 Paris.</strong> <a href="https://icfp21.sigplan.org/profile/lucaspluvinage1">Lucas +Pluvinage</a> of Tarides and +<a href="https://icfp21.sigplan.org/profile/jonathanludlam">Jonathan Ludlam</a> of OCaml +Labs Consultancy will discuss the process of generating documentation for every +version of every package that can be built from the Opam repository and present +it as a single coherent website that's continuously updated as new packages are +released and old packages are updated. They will address the challenges of +caching, handling different compiler versions, and incompatible libraries. The +process has been implemented as an OCurrent pipeline named <code>ocaml-docs-ci</code> and +is already available on Github. It has been used to produce the documentation of +more than 10,000 package versions, generating 2.5M HTML pages. That's 38GB of +artifacts!</p> +<p>After a relaxing lunch, come back for <strong>Experiences with Effects</strong> at <strong>15:30 +London / 16:30 Paris</strong>. 
Join OCaml Labs and Tarides engineers <a href="https://icfp21.sigplan.org/profile/thomasleonard">Thomas +Leonard</a>, <a href="https://icfp21.sigplan.org/profile/craigferguson">Craig +Ferguson</a>, <a href="https://icfp21.sigplan.org/profile/patrickferris">Patrick +Ferris</a>, <a href="https://icfp21.sigplan.org/profile/sadiqjaffer">Sadiq +Jaffer</a>, <a href="https://icfp21.sigplan.org/profile/tomkelly">Tom +Kelly</a>, <a href="https://icfp21.sigplan.org/profile/kcsivaramakrishnan">KC +Sivaramakrishnan</a>, and +<a href="https://icfp21.sigplan.org/profile/anilmadhavapeddy">Anil Madhavapeddy</a> as they +talk about an exciting, experimental branch of Multicore OCaml that adds support +for effect handlers. In this presentation, they'll discuss their experiences +with effects, both from converting existing code and from writing new code. They +discovered that converting the Angstrom parser from a callback style to effects +greatly simplified the code while also improving performance and reducing +allocations. Their <a href="https://github.com/ocaml-multicore/eio">experimental Eio +library</a> uses effects to allow +writing concurrent code in direct style, without the need for monads (as found +in Lwt or Async).</p> +<p>Enjoy a full day of OCaml innovation and get to know some of our talented +engineers better by joining Tarides, OCaml Labs, and Segfault Systems at ICFP on +Friday, 27 August 2021. See you there!</p>https://tarides.com/blog/2021-08-26-tarides-engineers-to-present-at-icfp-2021Tarides Engineers to Present at ICFP 20212021-08-26T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Tarides takes great pride in a diverse workforce and strives to continue +bringing talented people to its team from around the globe. This is why Sonja +Heinze, a Tarides software engineer, and the Head of HR, H&eacute;lo&iuml;se Lutton, will +attend WomenHack, an online event dedicated to recruiting more women into the +tech world. 
They're participating not only to present Tarides to the Women In +Tech community, but also to network and possibly find new talented programmers +to join our growing team.</p> +<p>The <strong>WomenHack</strong> event has a unique setup similar to 'speed dating,' but for +prospective jobs. Each candidate is paired with a company, and they have 5 +minutes to chat before moving on to the next company. This &quot;rapid interview&quot; is +both fun and efficient for all involved, and it ensures each company will meet +several talented candidates from diverse backgrounds, which increases their +chance of finding that perfect fit for an open position!</p> +<p>From their website:</p> +<blockquote> +<p><strong><a href="https://womenhack.com/">WomenHack</a></strong> is a community that empowers women in +tech through events, jobs, and reviews. We aim to create a more inclusive and +diverse workplace for all. Our diversity recruiting events target some of the +most talented women in tech which include software developers, designers, and +product talent.&nbsp;</p> +</blockquote> +<p>Join us tomorrow, July 21st, at WomenHack! You can <a href="https://womenhack.com/events/72907/?tickets">get your ticket +here</a>.</p>https://tarides.com/blog/2021-07-20-tarides-at-womenhack-virtual-eventTarides at WomenHack Virtual Event2021-07-20T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Tarides is excited to announce that our CEO, Dr. Thomas Gazagnaire, and Prof. +Anil Madhavapeddy, from the University of Cambridge, will present their +innovative platform OSMOSE at the Open Source Innovation Sprint (OSIS) +conference on 1 July 2021. This event is organized by +<a href="https://systematic-paris-region.org">Systematic</a>.</p> +<p>OSMOSE is a software platform made to manage digital infrastructure at scale, +securely and efficiently. 
It uses the groundbreaking creation of unikernels to +radically simplify the way applications are built and deployed for the cloud.</p> +<p>Since digital transformation has become more and more dependent on cloud +computing, it has increasing problems with high response latency, security +risks, and resource inefficiencies&mdash;all of which make these services ultimately +unreliable. The demand for interconnected devices is ever increasing, but the +security of these devices remains unchecked, making them susceptible to security +vulnerabilities. This leaves consumers and businesses open to exploitation, as +demonstrated in reports of tech devices violating users&rsquo; privacy, like sending +audio recordings without their knowledge or consent.</p> +<p>Tarides addresses these issues with OSMOSE, a platform which combines hardware +and software elements to invert the current cloud-centric model. OSMOSE securely +connects with physical spaces to provide extremely low latency and +high-bandwidth, local-area computation capabilities, which can turn a fleet of +IoT devices into a local data-centre.</p> +<p>This innovative platform enables computer resources to be tracked efficiently +and temporarily rented to users. This turns any IoT deployment into a local, +private cloud, allowing a better utilization of local resources and improved +security.</p> +<p>Major components of OSMOSE already have commercial applications. It&rsquo;s been used +to make existing cloud deployments more secure and efficient by companies such +as Amazon, Citrix, and Docker.</p> +<p>Tarides applies a high-touch, mixed business strategy&mdash;using consultancy services +to field test open source components under development. Tarides applies their +research to real-world systems to build unikernels: secure-by-design, +resource-efficient applications specialised to their run-time environments. 
If +interested in using OSMOSE as a solution for your business, +<a href="https://tarides.com/company/">please reach out to Tarides</a> for more +information.</p> +<p><a href="https://systematic-paris-region.org/evenement/open-source-innovation-spring-edge-iot/">Register for OSIS</a> +to attend the OSMOSE presentation on 1 July 2021.</p>https://tarides.com/blog/2021-06-29-tarides-introduces-osmose-at-the-open-source-innovation-sprintTarides Introduces OSMOSE at the Open-Source Innovation Sprint2021-06-29T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><strong>Tarides is taking part in the Data Portability &amp; Services Incubator (DAPSI), a +3-year EU funded project that empowers internet innovators to develop new +solutions in the Data Portability field.</strong></p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#what-is-dapsi" aria-label="what is dapsi permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What is DAPSI?</h2> +<p>The <a href="https://dapsi.ngi.eu">Data Portability and Services Incubator (DAPSI)</a> is +an EU funded project, under the European Commission&rsquo;s Next Generation Internet +(NGI) initiative. The aim of this initiative is to empower top internet innovators +to develop human-centric solutions. 
DAPSI addresses the challenge of personal +data portability on the internet, as foreseen under the GDPR, and makes it +significantly easier for citizens to have any data which is stored with one +service provider transmitted directly to another provider.</p> +<p>Take a look at the <a href="https://dapsi.ngi.eu/hall-of-fame/">DAPSI innovators +portfolio</a> to see more information about the +selected projects.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#what-is-our-project" aria-label="what is our project permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What is our project?</h2> +<p>Our project, called <strong>SCoP</strong> for <strong>Secure-by-design Communication Protocols</strong>, +is taking part in the DAPSI to tackle data portability issues in communication +services.</p> +<p>Over the past few decades, email has become massively widespread among +both individuals and companies. Billions of emails are sent every day and this +number is expected to reach 333 billion emails exchanged daily +in 2022. Moreover, as managing internet communication stacks has become +increasingly complex, end-users have tended to entrust this task to third-party +companies like Google and Microsoft. Furthermore, existing implementations of +these communication services rely on ad-hoc methodologies and memory-unsafe +languages, where minor developer errors can easily escalate into major security +flaws. 
The centralization of these communication services means that a single +successful attack leads to major personal data breaches.</p> +<p>To fix this issue, <strong>our project aims to engineer a modern basis for open +messaging that supports existing protocols such as email but is also extensible +and customizable for emerging protocols such as Matrix</strong>. We will be building +trustable implementations of these open protocols using type-safe languages and +we will deploy these implementations as specialized, secure and resource +efficient unikernels. They will become the basis of the communication system of +OSMOSE, Tarides&rsquo; commercial solution for secure-by-design IoT infrastructure.</p> +<p>Every component of that system will be carefully designed as independent +libraries, using modern development techniques to avoid the commonly reported +threats and flaws. For instance, the implementation of protocol parsers and +serializers will be written in a type-safe language and will be using fuzzing, +e.g. state-of-the-art coverage-driven tests. The combination of these techniques +will increase users&rsquo; confidence in migrating their personal data to these new secure +services.</p> +<p>Moreover, these techniques are also useful to produce a large and reusable +corpus of test materials, which we plan to release separately for other +implementations to use. 
It will give the tools to other developers to write the +next generation of messaging applications by extending the existing protocols +with more confidence.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#want-to-be-part-of-it" aria-label="want to be part of it permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Want to be part of it?</h2> +<p>Would you like to hear more about the project? Or want to deploy our solution? +This project will build on a number of existing components in +<a href="https://mirage.io">MirageOS</a>, such as +<a href="https://tarides.com/blog/2019-09-25-mr-mime-parse-and-generate-emails">MrMime</a> +and <a href="https://tarides.com/blog/2020-09-08-irmin-september-2020-update">Irmin</a>, +so feel free to contribute to these existing components! 
Please reach out to <a href="mailto:contact@tarides.com"></a><a href="mailto:contact@tarides.com">contact@tarides.com</a>.</p> +<br/> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/50ddca27efa367497d954f667fc921f8/a76d6/DAPSI_generic.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 3.5294117647058822%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/50ddca27efa367497d954f667fc921f8/7bf67/DAPSI_generic.jpg" class="gatsby-resp-image-image" alt="Sequence of entity logos: in association with NGI, EU, Zabala, FGS, +cap-digital, IMT Starter, Fraunhofer IAIS." title="Sequence of entity logos: in association with NGI, EU, Zabala, FGS, +cap-digital, IMT Starter, Fraunhofer IAIS." 
srcset="/static/50ddca27efa367497d954f667fc921f8/651be/DAPSI_generic.jpg 170w, +/static/50ddca27efa367497d954f667fc921f8/d30a3/DAPSI_generic.jpg 340w, +/static/50ddca27efa367497d954f667fc921f8/7bf67/DAPSI_generic.jpg 680w, +/static/50ddca27efa367497d954f667fc921f8/990cb/DAPSI_generic.jpg 1020w, +/static/50ddca27efa367497d954f667fc921f8/a76d6/DAPSI_generic.jpg 1139w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p>https://tarides.com/blog/2021-04-30-scop-selected-for-dapsi-initiativeTarides project SCoP is selected as one of the brightest Data Portability projects in Europe!2021-04-30T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>In collaboration with Nomadic Labs, Marigold and DaiLambda, we're happy to +announce the completion of the next Tezos protocol proposal: +<a href="http://doc.tzalpha.net/protocols/009_florence.html"><strong>Florence</strong></a>.</p> +<p><a href="https://tezos.com/">Tezos</a> is an open-source decentralised blockchain network providing a +platform for smart contracts and digital assets. A crucial feature of Tezos is +<a href="https://tezos.com/static/white_paper-2dc8c02267a8fb86bd67a108199441bf.pdf"><em>self-amendment</em></a>: the network protocol can be upgraded +dynamically by the network participants themselves. This amendment process is +initiated when a participant makes a <em>proposal</em>, which is then subject to a +vote. 
After several years working on the Tezos storage stack, this is our first +contribution to a proposal; we hope that it will be the first of many!</p> +<p>As detailed in today's <a href="https://blog.nomadic-labs.com/florence-our-next-protocol-upgrade-proposal.html">announcement from Nomadic Labs</a>, +the Florence proposal contains several important changes, from the introduction +of Baking Accounts to major quality-of-life improvements for smart contract +developers. Of all of these changes, we're especially excited about the +introduction of <em>sub-trees</em> to the blockchain context API. In this post, we'll +give a brief tour of what these sub-trees will bring for the future of Tezos. +But first, what <em>are</em> they?</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#merkle-sub-trees" aria-label="merkle sub trees permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Merkle sub-trees</h3> +<p>The Tezos protocol runs on top of a versioned tree called the &ldquo;context&rdquo;, which +holds the chain state (balances, contracts etc.). Ever since the pre-Alpha era, +the Tezos context has been implemented using <a href="https://github.com/mirage/irmin">Irmin</a> &ndash; an open-source +Merkle tree database originally written for use by MirageOS unikernels.</p> +<p>For MirageOS, Irmin&rsquo;s key strength is flexibility: it can run over arbitrary +backends. This is a perfect fit for Tezos, which must be agile and +widely-deployable. 
Indeed, the Tezos shell has already leveraged this agility +many times, all the way from initial prototypes using a Git backend to the +optimised <a href="https://tarides.com/blog/2020-09-01-introducing-irmin-pack"><code>irmin-pack</code></a> implementation used today.</p> +<p>But Irmin can do more than just swapping backends! It also allows users to +manipulate the underlying Merkle tree structure of the store with a high-level +API. This &ldquo;<a href="https://mirage.github.io/irmin/irmin/Irmin/module-type-S/Tree/">Tree</a>&rdquo; API enables lots of interesting use-cases of +Irmin, from mergeable data types (<a href="https://kcsrk.info/papers/banyan_aplas20.pdf">MRDTs</a>) to zero-knowledge proofs. +Tezos doesn't use these more powerful features directly yet; that&rsquo;s where Merkle +proofs come in!</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#proofs-and-lightweight-tezos-clients" aria-label="proofs and lightweight tezos clients permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Proofs and lightweight Tezos clients</h3> +<p>Since the Tezos context keeps track of the current &quot;state&quot; of the blockchain, +each participant needs their own copy of the tree to run transactions against. 
+This context can grow to be very large, so it's important that it be stored as +compactly as possible: this goal shaped the design of <code>irmin-pack</code>, our latest +Irmin backend.</p> +<p>However, it's possible to reduce the storage requirements even further via the +magic of Merkle trees: individuals only need to store a <em>fragment</em> of the root +tree, provided they can demonstrate that this fragment is valid by sending +&ldquo;<a href="https://bentnib.org/posts/2016-04-12-authenticated-data-structures-as-a-library.html">proofs</a>&rdquo; of its membership to the other participants.</p> +<p>This property can be used to support ultra-lightweight Tezos clients, a feature +<a href="https://gitlab.com/smelc/tezos/-/commits/tweag-client-light-mode">currently being developed</a> by TweagIO. To make this a reality, +the Tezos protocol needs fine-grained access to context sub-trees in order to build +Merkle proofs out of them. Fortunately, Irmin already supports this! 
We +<a href="https://gitlab.com/tezos/tezos/-/merge_requests/2457">extended the protocol</a> to understand sub-trees, lifting the power +of Merkle trees to the user.</p> +<p>We&rsquo;re excited to work with TweagIO and Nomadic Labs on lowering the barriers to +entering the Tezos ecosystem and look forward to seeing what they achieve with +sub-trees!</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#efficient-merkle-proof-representations" aria-label="efficient merkle proof representations permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 
6z"></path></svg></a>Efficient Merkle proof representations</h3> +<p>Simply exposing sub-trees in the Tezos context API isn&rsquo;t quite enough: +lightweight clients will also need to <em>serialize</em> them efficiently, since proofs +must be exchanged over the network to establish trust between collaborating +nodes. Enter <a href="https://dailambda.jp/blog/2020-05-11-plebeia/">Plebeia</a>.</p> +<p>Plebeia is an alternative Tezos storage layer &ndash; developed by DaiLambda &ndash; with +strengths that complement those of Irmin. In particular, Plebeia is capable of +generating very compact Merkle proofs. This is partly due to its specialized +store structure, and partly due to clever optimizations such as path compression +and inlining.</p> +<p>We&rsquo;re working with the DaiLambda team to unite the strengths of Irmin and +Plebeia, which will bring built-in Merkle proof support to the Tezos storage +stack. The future is bright for Merkle proofs in Tezos!</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#baking-account-migrations" aria-label="baking account migrations permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Baking account migrations</h3> +<p>Trees don&rsquo;t just enable <em>new</em> features; they have a big impact on performance +too! Currently, indexing into the context always happens from its <em>root</em>, which +duplicates effort when accessing adjacent values deep in the tree. 
Fortunately, +the new sub-trees provide a natural representation for &ldquo;cursors&rdquo; into the +context, allowing the protocol to optimize its interactions with the storage +layer.</p> +<p>To take just one example, DaiLambda recently exploited this feature to reduce +the migration time necessary to introduce Baking Accounts to the network by a +factor of 15! We&rsquo;ll be teaming up with Nomadic Labs and DaiLambda to ensure that +Tezos extracts every bit of performance from its storage.</p> +<p>It's especially exciting to have access to lightning-fast storage migrations, +since this enables Tezos to evolve rapidly even as the ecosystem expands.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#storage-in-other-languages" aria-label="storage in other languages permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Storage in other languages</h3> +<p>Of course, Tezos isn&rsquo;t just an OCaml project: the storage layer also has a +performant Rust implementation as part of <a href="https://github.com/simplestaking/tezedge">TezEdge</a>. 
We&rsquo;re working with +<a href="https://github.com/simplestaking">Simple Staking</a> to bring Irmin to the Rust community via an +<a href="https://github.com/simplestaking/ocaml-interop">FFI toolchain</a>, enabling closer alignment between the different +Tezos shell implementations.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h3> +<p>All in all, it&rsquo;s an exciting time to work on Tezos storage, with many +open-source collaborators from around the world. We&rsquo;re especially happy to see +Tezos taking greater advantage of Irmin&rsquo;s features, which will strengthen both +projects and help them grow together.</p> +<p>If all of this sounds interesting, you can play with it yourself using the +recently-released <a href="https://github.com/mirage/irmin">Irmin 2.5.0</a>. 
Thanks for reading, and stay tuned for +future Tezos development updates!</p>https://tarides.com/blog/2021-03-04-florence-and-beyond-the-future-of-tezos-storageFlorence and beyond: the future of Tezos storage2021-03-04T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Tarides is very glad to announce our partnership with <a href="https://adatechschool.fr">Ada Tech +School</a>.</p> +<p>Founded in 2019 and based in Paris (France), Ada Tech School, named for pioneer +computer scientist <a href="https://en.wikipedia.org/wiki/Ada_Lovelace">Ada Lovelace</a>, +is a programming school designed for women but open to all. The program is +driven by three values: feminism, empathy and singularity. Its mission is to +facilitate access to programming positions and promote the feminization of tech, +by creating training that tackles the gender and cultural biases of IT.</p> +<p>Unfortunately, the diversity of the candidate pool is very limited when a +company tries to fill positions. Barely 10% of computer science students in +France are girls. Ada Tech School is an excellent initiative to democratize +software education amongst women. The school was created so that women can land +a job easily in the IT industry through rigorous training, and then offer +ongoing coaching and support to ascend the professional ladder within tech +companies.</p> +<p>At Tarides, we believe that a healthy team is a diverse one; and that trust, +fairness and inclusion are values needed to build a strong company.</p> +<p>We are committed to doing better, not only by hiring a diverse team and +providing a welcoming work environment, but also by putting people first at +every stage. This means providing fair and equitable compensation as well as +meaningful career advancement opportunities for every employee.</p> +<p>We believe that a great technology always derives from great people, regardless +of their background. 
Head <a href="https://tarides.com/company">here</a> to see our +currently-open positions.</p>https://tarides.com/blog/2021-02-15-partnering-for-more-diversity-in-techPartnering for more diversity in Tech2021-02-15T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Merlin is a language server for the OCaml programming language; that is, a daemon +that connects to your favourite text editor and provides the usual services of +an IDE: instant feedback on warnings and errors, autocompletion, &quot;type of the +code under the cursor&quot;, &quot;go to definition&quot;, etc. As we (Fr&eacute;d&eacute;ric Bour, Ulysse +G&eacute;rard and I) are about to do a new major release, we thought now would be a +good time to talk a bit about some of the changes that are going into this +release.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#project-configuration" aria-label="project configuration permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Project configuration</h2> +<p>Since its very first release, merlin has been getting information about the +project being worked on through a <code>.merlin</code> file, which used to be written by +the user, but is now often generated by build systems.</p> +<p>This had the advantage of being fairly simple: Merlin would just look in the +current directory if such a file existed, otherwise it would look in the parent +directories until it found one; and then read it. 
But there were also some +sore points: the granularity of the configuration is the directory, not the file, +and this information is duplicated from the build system configuration (be it +dune, Makefiles, or, back in the day, ocamlbuild).</p> +<p>After years of thinking about it, we've finally decided to make some light +changes to this process. Since version 3.4, when it scans the filesystem, Merlin +looks for either a <code>.merlin</code> file or a dune (or dune-project) file. And +when it finds one of those, it starts an external process in the directory where +that file lives, and asks that process for the configuration of the ml(i) file +being edited.</p> +<p>The process in charge of communicating the configuration to Merlin will either +be a specific dune subcommand (when a dune file is found), or a dedicated +<code>.merlin</code> reader program.</p> +<p>We see several advantages in doing things this way (rather than, for instance, +changing the format of <code>.merlin</code> files):</p> +<ol> +<li>this change is entirely backward compatible, and indeed the transition has +already happened silently; dune still emits <code>.merlin</code> files and +will only stop doing so with dune 2.8.</li> +<li>externalizing the reading of <code>.merlin</code> files and simply requiring a +&quot;normalized&quot; version of the config (i.e. with no mention of packages, just of +flags and paths) allowed us to simplify the internals of Merlin.</li> +<li>talking to the build system directly not only gets us a much finer-grained +configuration (which is important when you build different executables with +different flags in the same directory, or if you apply different ppxes to +different files of a library), it also opens the door to getting a nicer behavior +of Merlin in some circumstances. For instance, the build system can (and +does) tell Merlin when the project isn't built. 
Currently we only report that +information to the user when they ask for errors, alongside all the other +(mostly rubbish) errors, which is already helpful in itself. But in the +future we can start filtering the other errors to only report those that +would remain even after building the project (e.g. parse errors).</li> +</ol> +<p>There are however some changes to look out for:</p> +<ul> +<li>people who still use <code>.merlin</code> files but do not install Merlin using opam need +to make sure to also have the <code>dot-merlin-reader</code> binary in their PATH (it is +available as an opam package, but is also buildable from Merlin's git +repository)</li> +<li>vim and emacs users who could previously load packages interactively (by +calling <code>M-x merlin-use</code> or <code>:MerlinUse</code>) cannot do that anymore, since Merlin +itself stopped linking with findlib. They'll have to write a <code>.merlin</code> file.</li> +</ul> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#dropping-support-for-old-versions-of-ocaml" aria-label="dropping support for old versions of ocaml permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Dropping support for old versions of OCaml</h2> +<p>Until now, every release of Merlin has kept support from OCaml 4.02 to the +latest version of OCaml available at the time of that release.</p> +<p>We have done this by having one version of &quot;<em>the frontend</em>&quot; (i.e. 
handling of +buffer state, project configuration; analyses like <em>jump-to-definition</em>, +<em>prefix-completion</em>, etc.), but several versions of &quot;<em>the backend</em>&quot; (OCaml's +ASTs, parser and typechecker), and choosing at build time which one to use. +The reason for doing this instead of having, for instance, one branch of Merlin +per version of OCaml, is that while the backends are fairly stable once +released, Merlin's frontend keeps evolving. Having just one version of it makes +it easier to add features and fix bugs (patches don't need to be duplicated), +whilst ensuring that Merlin's behavior is consistent across every version of +OCaml that we support.</p> +<p>For this to work however, one needs a well defined API between the frontend and +all the versions of the backend. This implies mapping every version of OCaml's +internal ASTs (which receive modifications from one version to the next), to a +unified one, so as to keep Merlin's various features version agnostic. But it +also means being resilient to OCaml's internal API changes. For instance between +4.02 and 4.11 there were big refactorings impacting: the way one accesses the +typing environment, the way one accesses the &quot;load path&quot; (the part of the file +system the compiler/Merlin is aware of), the way error messages are produced, ...</p> +<p>The rate of change in the compiler is a lot higher than it was when we +first started Merlin (7 years ago now!), which doesn't just mean that we have to +spend more and more time on updating the common interface, but also that the +interface is getting harder to define. Recently (with the 4.11 release) some of +the changes were significant enough that for some parts of the backend we just +didn't manage to produce a single interface to access old and new versions, so +instead we had to start duplicating and specializing parts of the frontend. 
+And we don't expect things to get much better in the near future.</p> +<p>Furthermore, Merlin's backends are patched to be more resilient to parsing and +typing errors in the user's code. Those patches also need to be evolved at each +new release of the compiler. +The work required to keep the &quot;unified interface&quot; working was taking time away +from updating our patches properly, and our support of user errors has slowly +been getting worse over the past few years, resulting in less precise type +information when asked, incomplete results when asking for auto-completion, etc.</p> +<p>Therefore we have decided to stop dragging older versions of OCaml along. We +plan to switch to a system where we have one branch of Merlin per version of +OCaml, and each opam release of Merlin will only be buildable with one version +of OCaml. We will keep maintaining all the relatively recent branches (that is: +4.02 definitely will not get fixes, but 4.06 is still in the clear). However, +all the new features will be developed against the latest version of OCaml and +cherry-picked to older branches if, and only if, there are no merge conflicts +and they work as expected without changes.</p> +<p>We hope that this will make it easier for us to update to new versions of OCaml +(actually, we already know it does, working on adding support for 4.12 was +easier than for any of the other recent versions), will allow us to clean up +Merlin's codebase (let's call that a work in progress), and will free some time +to work on new features.</p> +<p>You might wonder what all this changes for you, as a user, in practice. Well, it +depends:</p> +<ul> +<li>if you install Merlin from opam: nothing, or almost nothing. Everything that +you currently do with Merlin will keep working. In the future, perhaps some +new feature will appear that won't work on all versions. 
But that day hasn't +come yet.</li> +<li>if you install Merlin some other way (manually?): you can't just fetch master +and build it anymore. You have to pick the appropriate branch for your +version of OCaml.</li> +<li>if you're reusing Merlin's codebase as part of another project and (even +worse) have patches on it: come and talk to us if you haven't done so already! +We can try and integrate your patches, so that you only need to worry about +vendoring the right version(s) for your needs.</li> +</ul> +<hr/> +<p>Over the years, Merlin has received bugfixes and improvements from a long list of +people, but for the upcoming release Fr&eacute;d&eacute;ric and I are particularly grateful to +Rudi Grinberg, a long time and regular contributor who also maintains the OCaml +LSP project, as well as Ulysse G&eacute;rard, who joined our team a year ago now. They +are in particular the main authors of the work to improve the handling of +projects' configuration.</p> +<p>We hope you'll be as excited as us by all these changes!</p>https://tarides.com/blog/2021-01-26-recent-and-upcoming-changes-to-merlinRecent and upcoming changes to Merlin2021-01-26T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>The <a href="https://oxbridgewomenincs8.wixsite.com/2020">Oxbridge Women in Computer Science +conference</a> is an annual one-day +event hosted by the Universities of Oxford and Cambridge (UK). The +conference is free and open to everyone from any discipline, regardless of +gender identity. Its purpose is to spotlight the successes of women within +computer science and strengthen the network of women in computer science +within a supportive and friendly environment.</p> +<p>This year, the conference was organised by the University of Cambridge and was +held virtually on December 7th.</p> +<p>Tarides is very glad to sponsor this event as we strongly believe that diversity +and inclusive culture is a key factor in building a competitive and innovative +company. 
Our employees come from 8 different countries and are 1&frasl;3 women. +Tarides promotes transparency, openness and autonomy, creating a work atmosphere +auspicious for employees to strive in their work, to solve novel, impactful and +technical challenges. By working on open-source projects, a collaboration is +possible with worldwide experts from both academia and industry, encouraging +continuous training and education; in this context, it is very important to have +teams with diverse backgrounds and experience.</p> +<p>The underrepresentation of women in tech, and particularly in computer science, +is not a new problem and gender equality remains a major issue in the corporate +world. By celebrating female computer scientists, as during the Oxbridge Women +in Computer Science Conference, it will hopefully encourage more women to pursue +their interests and careers in the tech field. Head +<a href="https://tarides.com/company/">here</a> to see our currently-open positions.</p> +<p>For the event, we made a short video to experience a day in the life of a +software engineer:</p> +<div style="position: relative; width: 100%; height: 0; padding-bottom: 56.25%"> + <iframe style="position: absolute; width: 100%; height: 100%; left: 0; right: 0" src="https://www.youtube-nocookie.com/embed/5qK8elKNxKI" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="allowfullscreen"></iframe> +</div>https://tarides.com/blog/2020-12-14-tarides-sponsors-the-oxbridge-women-in-computer-science-conference-2020Tarides sponsors the Oxbridge Women in Computer Science Conference 20202020-12-14T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>At Tarides, we build many tools and writing UI is usually a tedious task. 
In this post we will see how to write functional UIs in OCaml using the <code>Nottui</code> &amp; <code>Lwd</code> libraries.</p> +<p>These libraries were developed for <a href="https://github.com/ocurrent/citty">Citty</a>, a frontend to the <a href="https://github.com/ocurrent/ocaml-ci">Continuous Integration service</a> of OCaml Labs.</p> +<div> + <video controls="controls" width="100%"> + <source src="./nottui-citty.mp4" type="video/mp4"></source> + <source src="./nottui-citty.webm" type="video/webm;codecs=vp9"></source> + </video> +</div> +<p>In this recording, you can see the lists of repositories, branches and jobs monitored by the CI service, as well as the result of job execution. Most of the logic is asynchronous, with all the contents being received from the network in a non-blocking way.</p> +<p><code>Nottui</code> extends <a href="https://github.com/pqwy/notty">Notty</a>, a library for declaring terminal images, to better suit the needs of UIs. <code>Lwd</code> (Lightweight Document) exposes a simple form of reactive computation (values that evolve over time). It can be thought of as an alternative to the DOM, suitable for building interactive documents. 
+They are used in tandem: <code>Nottui</code> for rendering the UI and <code>Lwd</code> for making it interactive.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#nottui--notty-with-layout-and-events" aria-label="nottui notty with layout and events permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Nottui = Notty with layout and events</h2> +<p>Notty exposes a nice way to display images in a terminal. A Notty image is a matrix of characters with optional styling attributes (tweaking foreground and background colors, using <strong>bold</strong> glyphs...).</p> +<p>These images are pure values and can be composed (concatenated, cropped, ...) very efficiently, making them very convenient to manipulate in a functional way.</p> +<p>However, these images are inert: their contents are fixed and their only purpose is to be displayed. Nottui reuses Notty images and exposes essentially the same interface, but it adds two features: layout &amp; event dispatch. UI elements now adapt to the space available and can react to keyboard and mouse actions.</p> +<p><strong>Layout DSL</strong>. Specifying a layout is done using &quot;stretchable&quot; dimensions, a concept loosely borrowed from TeX. Each UI element has a fixed size (expressed as a number of columns and rows) and a stretchable size (possibly empty). 
The stretchable part is interpreted as a strength that is used to determine how to share the space available among all UI elements.</p> +<p>This is a simple system amenable to an efficient implementation while being powerful enough to express common layout patterns.</p> +<p><strong>Event dispatch</strong>. Reacting to mouse and keyboard events is better done using local behaviors, specific to an element. In Nottui, images are augmented with handlers for common actions. There is also a global notion of focus to determine which element should consume input events.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#interactivity-with-lwd" aria-label="interactivity with lwd permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Interactivity with Lwd</h2> +<p>Nottui's additions are nice for resizing and attaching behaviors to images, but they are still static objects. In practice, user interfaces are very dynamic: parts can be independently updated to display new information.</p> +<p>This interactivity layer is brought by Lwd and is developed separately from the core UI library. 
It is built around a central type, <code>'a Lwd.t</code>, that represents a value of type <code>'a</code> that can change over time.</p> +<p><code>Lwd.t</code> is an <a href="https://en.wikipedia.org/wiki/Applicative_functor">applicative functor</a> (and even a monad), making it a highly composable abstraction.</p> +<p>Primitive changes are introduced by <code>Lwd.var</code>, which are OCaml references with an extra operation <code>val get : 'a Lwd.var -&gt; 'a Lwd.t</code>. This operation turns a variable into a <em>changing value</em> that changes whenever the variable is set.</p> +<p>In practice this leads to a mostly declarative style of programming interactive documents (as opposed to the DOM that is deeply mutable). Most of the code is just function applications without spooky action at a distance! However, it is possible to opt-out of this pure style by introducing an <code>Lwd.var</code>, on a case-by-case basis.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#and-much-more" aria-label="and much more permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>And much more...</h2> +<p>A few extra libraries are provided to target more specific problems.</p> +<p><code>Lwd_table</code> and <code>Lwd_seq</code> are two datastructures to manipulate dynamic collections. <code>Nottui_pretty</code> is an interactive pretty printing library that supports arbitrary Nottui layouts and widgets. 
Finally <code>Tyxml_lwd</code> is a strongly-typed abstraction of the DOM driven by Lwd.</p> +<p>Version 0.1 has just been released on OPAM.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#getting-started" aria-label="getting started permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Getting started!</h2> +<p>Here is a small example to start using the library. First, install the Nottui library:</p> +<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">$ opam <span class="token function">install</span> nottui</code></pre></div> +<p>Now we can play in the top-level. 
We will start with a simple button that counts the number of clicks:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token operator">$</span> utop +<span class="token punctuation">#</span> <span class="token directive property">#require</span> <span class="token string">&quot;nottui&quot;</span><span class="token punctuation">;;</span> +<span class="token punctuation">#</span> <span class="token keyword">open</span> Nottui<span class="token punctuation">;;</span> +<span class="token punctuation">#</span> <span class="token keyword">module</span> W <span class="token operator">=</span> Nottui_widgets<span class="token punctuation">;;</span> +<span class="token comment">(* State for holding the number of clicks *)</span> +<span class="token punctuation">#</span> <span class="token keyword">let</span> vcount <span class="token operator">=</span> Lwd<span class="token punctuation">.</span>var <span class="token number">0</span><span class="token punctuation">;;</span> +<span class="token comment">(* Image of the button parametrized by the number of clicks *)</span> +<span class="token punctuation">#</span> <span class="token keyword">let</span> button count <span class="token operator">=</span> + W<span class="token punctuation">.</span>button <span class="token label property">~attr</span><span class="token punctuation">:</span>Notty<span class="token punctuation">.</span>A<span class="token punctuation">.</span><span class="token punctuation">(</span>bg green <span class="token operator">++</span> fg black<span class="token punctuation">)</span> + <span class="token punctuation">(</span>Printf<span class="token punctuation">.</span>sprintf <span class="token string">&quot;Clicked %d times!&quot;</span> count<span class="token punctuation">)</span> + <span class="token punctuation">(</span><span class="token keyword">fun</span> <span class="token punctuation">(</span><span class="token 
punctuation">)</span> <span class="token operator">-&gt;</span> Lwd<span class="token punctuation">.</span>set vcount <span class="token punctuation">(</span>count <span class="token operator">+</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;;</span> +<span class="token comment">(* Run the UI! *)</span> +<span class="token punctuation">#</span> Ui_loop<span class="token punctuation">.</span>run <span class="token punctuation">(</span>Lwd<span class="token punctuation">.</span>map button <span class="token punctuation">(</span>Lwd<span class="token punctuation">.</span>get vcount<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;;</span></code></pre></div> +<p><strong>Note:</strong> to quit the example, you can press Ctrl-Q or Esc.</p> +<p>We will improve the example and turn it into a mini cookie clicker game.</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* Achievements to unlock in the cookie clicker *)</span> +<span class="token punctuation">#</span> <span class="token keyword">let</span> badges <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token number">15</span><span class="token punctuation">,</span> <span class="token string">&quot;Cursor&quot;</span><span class="token punctuation">;</span> <span class="token number">50</span><span class="token punctuation">,</span> <span class="token string">&quot;Grandma&quot;</span><span class="token punctuation">;</span> <span class="token number">150</span><span class="token punctuation">,</span> <span class="token string">&quot;Farm&quot;</span><span class="token punctuation">;</span> <span class="token number">300</span><span class="token punctuation">,</span> <span class="token 
string">&quot;Mine&quot;</span><span class="token punctuation">]</span><span class="token punctuation">;;</span> +<span class="token comment">(* List the achievements unlocked by the player *)</span> +<span class="token punctuation">#</span> <span class="token keyword">let</span> unlocked_ui count <span class="token operator">=</span> + <span class="token comment">(* Filter the achievements *)</span> + <span class="token keyword">let</span> predicate <span class="token punctuation">(</span>target<span class="token punctuation">,</span> text<span class="token punctuation">)</span> <span class="token operator">=</span> + <span class="token keyword">if</span> count <span class="token operator">&gt;=</span> target + <span class="token keyword">then</span> Some <span class="token punctuation">(</span>W<span class="token punctuation">.</span>printf <span class="token string">&quot;% 4d: %s&quot;</span> target text<span class="token punctuation">)</span> + <span class="token keyword">else</span> None + <span class="token keyword">in</span> + <span class="token comment">(* Concatenate the UI elements vertically *)</span> + Ui<span class="token punctuation">.</span>vcat <span class="token punctuation">(</span>List<span class="token punctuation">.</span>filter_map predicate badges<span class="token punctuation">)</span><span class="token punctuation">;;</span> +<span class="token comment">(* Display the next achievement to reach *)</span> +<span class="token punctuation">#</span> <span class="token keyword">let</span> next_ui count <span class="token operator">=</span> + <span class="token keyword">let</span> predicate <span class="token punctuation">(</span>target<span class="token punctuation">,</span> <span class="token punctuation">_</span><span class="token punctuation">)</span> <span class="token operator">=</span> target <span class="token operator">&gt;</span> count <span class="token keyword">in</span> + <span class="token keyword">match</span> List<span 
class="token punctuation">.</span>find_opt predicate badges <span class="token keyword">with</span> + <span class="token operator">|</span> Some <span class="token punctuation">(</span>target<span class="token punctuation">,</span> <span class="token punctuation">_</span><span class="token punctuation">)</span> <span class="token operator">-&gt;</span> + W<span class="token punctuation">.</span>printf <span class="token label property">~attr</span><span class="token punctuation">:</span>Notty<span class="token punctuation">.</span>A<span class="token punctuation">.</span><span class="token punctuation">(</span>st bold<span class="token punctuation">)</span> <span class="token string">&quot;% 4d: ???&quot;</span> target + <span class="token operator">|</span> None <span class="token operator">-&gt;</span> Ui<span class="token punctuation">.</span>empty<span class="token punctuation">;;</span> +<span class="token comment">(* Let's make use of the fancy let-operators recently added to OCaml *)</span> +<span class="token punctuation">#</span> <span class="token keyword">open</span> Lwd_infix<span class="token punctuation">;;</span> +<span class="token punctuation">#</span> <span class="token keyword">let</span> ui <span class="token operator">=</span> + <span class="token keyword">let</span><span class="token operator">$</span> count <span class="token operator">=</span> Lwd<span class="token punctuation">.</span>get vcount <span class="token keyword">in</span> + Ui<span class="token punctuation">.</span>vcat <span class="token punctuation">[</span>button count<span class="token punctuation">;</span> unlocked_ui count<span class="token punctuation">;</span> next_ui count<span class="token punctuation">]</span><span class="token punctuation">;;</span> +<span class="token comment">(* Launch the game! 
*)</span> +<span class="token punctuation">#</span> Ui_loop<span class="token punctuation">.</span>run ui<span class="token punctuation">;;</span></code></pre></div> +<div> + <video controls="controls"> + <source src="./nottui-cookie-clicker.mp4" type="video/mp4"></source> + <source src="./nottui-cookie-clicker.webm" type="video/webm;codecs=vp9"></source> + </video> +</div> +<p>Et voil&agrave;! We hope you enjoy experimenting with <code>Nottui</code> and <code>Lwd</code>. Check out the <a href="https://github.com/let-def/lwd/tree/master/lib/nottui">Nottui page</a> for more examples, and watch our recent presentation of these libraries at the 2020 ML Workshop here:</p> +<div style="position: relative; width: 100%; height: 0; padding-bottom: 56.25%"> + <iframe style="position: absolute; width: 100%; height: 100%; left: 0; right: 0" src="https://www.youtube-nocookie.com/embed/w7jc35kgBZE" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="allowfullscreen"> + </iframe> +</div>https://tarides.com/blog/2020-09-24-building-portable-user-interfaces-with-nottui-and-lwdBuilding portable user interfaces with Nottui and Lwd2020-09-24T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Tarides is pleased to provide support for the <a href="https://ocaml-sf.org">OCaml Software +Foundation</a>, a non-profit foundation hosted by +the Inria Foundation. The OCaml Software Foundation's mission is to +promote the OCaml programming language and its ecosystem by +supporting the growth of a diverse and international community of +OCaml users.</p> +<p>Tarides develops secure-by-design solutions in which OCaml's memory and +type-safety guarantees play a major role. 
Hence, most of the software +development that is done at Tarides is in OCaml: for instance, +<a href="https://mirage.io">MirageOS</a>, a library operating system that +constructs unikernels for secure, high-performance network +applications; and <a href="https://irmin.org">Irmin</a>, a library for building +mergeable, branchable distributed data stores, with built-in +snapshotting and support for a wide variety of storage backends.</p> +<p>Tarides is also very involved in the OCaml compiler development and +OCaml developer tooling ecosystem: as active maintainers of the <a href="https://www.youtube.com/watch?v=E8T_4zqWmq8&amp;list=PLKO_ZowsIOu5fHjRj0ua7_QWE_L789K_f&amp;ab_channel=ocaml2020">OCaml +platform</a>, Tarides is involved with most of the major +OCaml developer tools, including <a href="https://github.com/ocaml/ocaml">opam</a>, <a href="https://github.com/ocaml/dune">dune</a> and <a href="https://github.com/ocaml/merlin">merlin</a>.</p>https://tarides.com/blog/2020-09-17-tarides-is-now-a-sponsor-of-the-ocaml-software-foundationTarides is now a sponsor of the OCaml Software Foundation2020-09-17T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>This post will survey the latest design decisions and performance improvements +made to <code>irmin-pack</code>, the <a href="https://irmin.org/">Irmin</a> storage backend used by +<a href="https://tezos.gitlab.io/">Tezos</a>. Tezos is an open-source blockchain technology, +written in OCaml, which uses many libraries from the MirageOS ecosystem. 
For +more context on the design of <code>irmin-pack</code> and how it is optimised for the Tezos +use-case, you can check out our <a href="https://tarides.com/blog/2020-09-01-introducing-irmin-pack">previous blog post</a>.</p> +<p>This post showcases the improvements to <code>irmin-pack</code> since its initial +deployment on Tezos:</p> +<ol> +<li><a href="https://tarides.com/feed.xml#faster-read-only-store-instances">Faster read-only store instances</a></li> +<li><a href="https://tarides.com/feed.xml#better-flushing-for-the-read-write-instance">Improved automatic flushing</a></li> +<li><a href="https://tarides.com/feed.xml#faster-serialisation-for-irmintype">Staging generic serialisation operations</a></li> +<li><a href="https://tarides.com/feed.xml#more-control-over-indexmerge">More control over <code>Index.merge</code></a></li> +<li><a href="https://tarides.com/feed.xml#clearing-stores">Clearing stores</a></li> +</ol> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#faster-read-only-store-instances" aria-label="faster read only store instances permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Faster read-only store instances</h2> +<p>The Tezos use-case of Irmin requires both <em>read-only</em> and <em>read-write</em> +store handles, with multiple readers and a single writer all accessing the same +Irmin store concurrently. 
These store handles are held by different processes +(with disjoint memory spaces) so the instances must use files on disk to +synchronise, ensuring that the readers never miss updates from the writer. The +writer instance automatically flushes its internal buffers to disk at regular +intervals, allowing the readers to regularly pick up <code>replace</code> calls.</p> +<p>Until recently, each time a reader looked for a value &ndash; be it a commit, a node, +or a blob &ndash; it first checked if the writer had flushed new contents to disk. This +ensured that the readers always see the latest changes from the writer. However, +if the writer isn't actively modifying the regions being read, the readers make +one unnecessary system call per <code>find</code>. The higher the rate of reads, the more +time is lost to these synchronisation points. This is particularly problematic +in two use-cases:</p> +<ul> +<li> +<p><strong>Taking snapshots of the store</strong>. Tezos supports <a href="https://tezos.gitlab.io/user/snapshots.html">exporting portable +snapshots</a> of the store data. Since this operation only reads +<em>historic</em> data in the store (traversing backwards from a given block hash), +it's never necessary to synchronise with the writer.</p> +</li> +<li> +<p><strong>Bulk writes</strong>. It's sometimes necessary for the writer to dump lots of new +data to disk at once (for instance, when adding a commit to the history). In +these cases, any readers will repeatedly synchronise with the disk even though +they don't need to do so until the bulk operation is complete. More on this in +the coming months!</p> +</li> +</ul> +<p>To better support these use-cases, we dropped the requirement for readers to +maintain strict consistency with the writer instance. 
Instead, readers can call +an explicit <code>sync</code> function only when they <em>need</em> to see the latest concurrent +updates from the writer instance.</p> +<p>In our benchmarks, there is a clear speed-up for <code>find</code> operations from readers:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">[RO] Find in random order with implicit syncs + Total time: 67.276527 + Operations per second: 148640.253086 + Mbytes per second: 6.378948 + Read amplification in syscalls: 3.919739 + Read amplification in bytes: 63.943734 + +[RO] Find in random order with only one call to sync + Total time: 40.817458 + Operations per second: 244993.208543 + Mbytes per second: 10.513968 + Read amplification in syscalls: 0.919588 + Read amplification in bytes: 63.258072</code></pre></div> +<p>Not only is it faster, but we can also see that fewer system calls are used in the +<code>Read amplification in syscalls</code> line. The benchmark consists of reading +10,000,000 entries of 45 bytes each.</p> +<p>Relevant PRs: <a href="https://github.com/mirage/irmin/pull/1008">irmin #1008</a>, +<a href="https://github.com/mirage/index/pull/175">index #175</a>, +<a href="https://github.com/mirage/index/pull/198">index #198</a> and +<a href="https://github.com/mirage/index/pull/203">index #203</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#better-flushing-for-the-read-write-instance" aria-label="better flushing for the read write instance permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 
3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Better flushing for the read-write instance</h2> +<p>Irmin-pack uses an <a href="https://github.com/mirage/index/">index</a> to speed up <code>find</code> +calls: a <code>pack</code> file is used to store pairs of <code>(key, value)</code> and an <code>index</code> +records the address in the <code>pack</code> file where a <code>key</code> is stored. A read-write instance has +to write both the <code>index</code> and the <code>pack</code> file before a read-only instance can find +a value. Moreover, the order in which the data is flushed to disk for the two +files is important: the address for the pair <code>(key, value)</code> cannot be written +before the pair itself. Otherwise, the read-only instance can read an address for +a non-existent <code>(key, value)</code> pair. But both <code>pack</code> and <code>index</code> have internal +buffers that accumulate data, in order to reduce the number of system calls, and +both decide arbitrarily when to flush those buffers to disk.</p> +<p>We introduced a <code>flush_callback</code> argument in <code>index</code>, which registers a callback +for whenever the index decides to flush. 
<code>irmin-pack</code> uses this callback to flush +its pack file, resolving the issue of the dangling address.</p> +<p>Relevant PRs: <a href="https://github.com/mirage/index/pull/189">index #189</a>, +<a href="https://github.com/mirage/index/pull/216">index #216</a>, +<a href="https://github.com/mirage/irmin/pull/1051">irmin #1051</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#faster-serialisation-for-irmintype" aria-label="faster serialisation for irmintype permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Faster serialisation for <code>Irmin.Type</code></h2> +<p>Irmin uses a library of <a href="http://ocamllabs.io/iocamljs/generic_programming.html"><em>generic</em></a> operations: functions +that take a runtime representation of a type and derive some operation on that +type. These are used in many places to automatically derive encoders and +decoders for our types, which are then used to move data to and from disk. For +instance:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">val</span> decode <span class="token punctuation">:</span> <span class="token type-variable function">'a</span> t <span class="token operator">-&gt;</span> string <span class="token operator">-&gt;</span> <span class="token type-variable function">'a</span> +<span class="token comment">(** [decode t] is the binary decoder of values represented by [t]. 
*)</span> + +<span class="token comment">(** Read an integer from a binary-encoded file. *)</span> +<span class="token keyword">let</span> int_of_file <span class="token label property">~path</span> <span class="token operator">=</span> open_in_bin path <span class="token operator">|&gt;</span> input_line <span class="token operator">|&gt;</span> decode Irmin<span class="token punctuation">.</span>Type<span class="token punctuation">.</span>int32</code></pre></div> +<p>The generic <code>decode</code> takes a <em>representation</em> of the type <code>int32</code> and uses +this to select the right binary decoder. Unfortunately, we pay the cost of this +runtime specialisation <em>every time</em> we call <code>int_of_file</code>. If we're invoking +the decoder for a particular type very often &ndash; such as when serialising store +values &ndash; it's more efficient to specialise <code>decode</code> once:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(** Specialised binary decoder for integers. *)</span> +<span class="token keyword">let</span> decode_int32 <span class="token operator">=</span> decode Irmin<span class="token punctuation">.</span>Type<span class="token punctuation">.</span>int32 + +<span class="token keyword">let</span> int_of_file_fast <span class="token label property">~path</span> <span class="token operator">=</span> open_in_bin path <span class="token operator">|&gt;</span> input_line <span class="token operator">|&gt;</span> decode_int32</code></pre></div> +<p>The question then becomes: how can we change <code>decode</code> to encourage it to be +used in this more efficient way? 
We can add a type wrapper &ndash; called <code>staged</code> &ndash; +to prevent the user from passing two arguments to <code>decode</code> at once:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">module</span> Staged <span class="token punctuation">:</span> <span class="token keyword">sig</span> + <span class="token keyword">type</span> <span class="token operator">+</span><span class="token type-variable function">'a</span> t + <span class="token keyword">val</span> stage <span class="token punctuation">:</span> <span class="token type-variable function">'a</span> <span class="token operator">-&gt;</span> <span class="token type-variable function">'a</span> t + <span class="token keyword">val</span> unstage <span class="token punctuation">:</span> <span class="token type-variable function">'a</span> t <span class="token operator">-&gt;</span> <span class="token type-variable function">'a</span> +<span class="token keyword">end</span> + +<span class="token keyword">val</span> decode <span class="token punctuation">:</span> <span class="token type-variable function">'a</span> t <span class="token operator">-&gt;</span> <span class="token punctuation">(</span>string <span class="token operator">-&gt;</span> <span class="token type-variable function">'a</span><span class="token punctuation">)</span> Staged<span class="token punctuation">.</span>t +<span class="token comment">(** [decode t] needs to be explicitly unstaged before being used. 
*)</span></code></pre></div> +<p>By forcing the user to add a <code>Staged.unstage</code> type coercion when using this +function, we're encouraging them to hoist such operations out of their +hot loops:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(** The slow implementation no longer type-checks: *)</span> + +<span class="token keyword">let</span> int_of_file <span class="token label property">~path</span> <span class="token operator">=</span> open_in_bin path <span class="token operator">|&gt;</span> input_line <span class="token operator">|&gt;</span> decode Irmin<span class="token punctuation">.</span>Type<span class="token punctuation">.</span>int32 +<span class="token comment">(* Error: This expression has type (string -&gt; 'a) Staged.t + * but an expression was expected of type string -&gt; 'a *)</span> + +<span class="token comment">(* Instead, we know to pull [Staged.t] values out of hot loops: *)</span> + +<span class="token keyword">let</span> decode_int32 <span class="token operator">=</span> Staged<span class="token punctuation">.</span>unstage <span class="token punctuation">(</span>decode Irmin<span class="token punctuation">.</span>Type<span class="token punctuation">.</span>int32<span class="token punctuation">)</span> + +<span class="token keyword">let</span> int_of_file_fast <span class="token label property">~path</span> <span class="token operator">=</span> open_in_bin path <span class="token operator">|&gt;</span> input_line <span class="token operator">|&gt;</span> decode_int32</code></pre></div> +<p>We made similar changes to the performance-critical generic functions in +<a href="https://mirage.github.io/irmin/irmin/Irmin/Type/index.html"><code>Irmin.Type</code></a>, and observed significant performance improvements. 
+We also added benchmarks for serialising various types.</p> +<div style="text-align: center;"> + <img src="https://tarides.com/staged-type.svg" style="height: 550px; max-width: 100%"/> +</div> +<p>Relevant PRs: <a href="https://github.com/mirage/irmin/pull/1030">irmin #1030</a> and +<a href="https://github.com/mirage/irmin/pull/1028">irmin #1028</a>.</p> +<p>There are other interesting factors at play, such as altering <code>decode</code> to +increase the efficiency of the specialised decoders; we leave this for a future +blog post.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#more-control-over-indexmerge" aria-label="more control over indexmerge permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>More control over <code>Index.merge</code></h2> +<p><code>index</code> regularly performs a maintenance operation, called <code>merge</code>, to ensure fast +look-ups while keeping a small memory footprint. This operation is concurrent with +most other functions; it is, however, not concurrent with itself: a second +merge needs to wait for the previous one to finish. When big chunks of +data are written very often, <code>merge</code> operations become blocking. To help measure and +detect a blocking <code>merge</code>, we added calls to the <code>index</code> API to check whether +a merge is ongoing, and to time it.</p> +<p>We mentioned that <code>merge</code> is concurrent with most of the other functions in +<code>index</code>. 
One notable exception was <code>close</code>, which had to wait for any ongoing +<code>merge</code> to finish before closing the index. Now <code>close</code> interrupts an ongoing +merge, but still leaves the index in a clean state.</p> +<p>Relevant PRs: <a href="https://github.com/mirage/index/pull/185">index #185</a>, +<a href="https://github.com/mirage/irmin/pull/1049">irmin #1049</a> and +<a href="https://github.com/mirage/index/pull/215">index #215</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#clearing-stores" aria-label="clearing stores permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Clearing stores</h2> +<p>Another feature we recently added is the ability to <code>clear</code> the store. It is +implemented by removing the old files on disk and opening fresh ones. However, +in <code>irmin-pack</code>, the read-only instance has to detect that a clear occurred. To +do this, we added a <code>generation</code> to the header of the files used by an +<code>irmin-pack</code> store, which is increased by the clear operation. A generation +change signals to the read-only instance that it needs to close the file and +open it again, to be able to read the latest values.</p> +<p>As the header of the files on disk changed with the addition of the clear +operation, the <code>irmin-pack</code> stores created prior to this change are no longer +supported. 
We added a migration function for stores created with the previous +version (version 1) to the new version (version 2) of the store. You can call +this migration function as follows:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"> <span class="token keyword">let</span> open_store <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + Store<span class="token punctuation">.</span>Repo<span class="token punctuation">.</span>v config + <span class="token keyword">in</span> + Lwt<span class="token punctuation">.</span>catch open_store <span class="token punctuation">(</span><span class="token keyword">function</span> + <span class="token operator">|</span> Irmin_pack<span class="token punctuation">.</span>Unsupported_version <span class="token variant symbol">`V1</span> <span class="token operator">-&gt;</span> + Logs<span class="token punctuation">.</span>app <span class="token punctuation">(</span><span class="token keyword">fun</span> l <span class="token operator">-&gt;</span> l <span class="token string">&quot;migrating store to version 2&quot;</span><span class="token punctuation">)</span> <span class="token punctuation">;</span> + Store<span class="token punctuation">.</span>migrate config <span class="token punctuation">;</span> + Logs<span class="token punctuation">.</span>app <span class="token punctuation">(</span><span class="token keyword">fun</span> l <span class="token operator">-&gt;</span> l <span class="token string">&quot;migration ended, opening store&quot;</span><span class="token punctuation">)</span> <span class="token punctuation">;</span> + open_store <span class="token punctuation">(</span><span class="token punctuation">)</span> + <span class="token operator">|</span> exn <span class="token operator">-&gt;</span> + Lwt<span class="token punctuation">.</span>fail exn<span class="token 
punctuation">)</span></code></pre></div> +<p>Relevant PRs: <a href="https://github.com/mirage/index/pull/211">index #211</a>, +<a href="https://github.com/mirage/irmin/pull/1047">irmin #1047</a>, +<a href="https://github.com/mirage/irmin/pull/1070">irmin #1070</a> and +<a href="https://github.com/mirage/irmin/pull/1071">irmin #1071</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>We hope you've enjoyed this discussion of our recent work. <a href="https://twitter.com/tarides_">Stay +tuned</a> for our next Tezos / MirageOS development update! Thanks +to our commercial customers, users and open-source contributors for making this +work possible.</p>https://tarides.com/blog/2020-09-08-irmin-september-2020-updateIrmin: September 2020 update2020-09-08T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><code>irmin-pack</code> is an Irmin <a href="https://irmin.org/tutorial/backend">storage backend</a> +that we developed over the last year specifically to meet the +<a href="https://tezos.gitlab.io/">Tezos</a> use-case. Tezos nodes were initially using an +LMDB-based backend for their storage, which after only a year of activity led to +<code>250 GB</code> disk space usage, with a monthly growth of <code>25 GB</code>. 
Our goal was to +dramatically reduce this disk space usage.</p> +<p>Part of the <a href="https://tarides.com/blog/2019-11-21-irmin-v2">Irmin.2.0.0 release</a> +and still under active development, it has been successfully integrated as the +storage layer of Tezos nodes and has been running in production for the last ten +months with great results. It reduces disk usage by a factor of 10, while still +ensuring similar performance and consistency guarantees in a memory-constrained +and concurrent environment.</p> +<p><code>irmin-pack</code> was presented along with Irmin v2 at the OCaml workshop 2020; you +can watch the presentation here:</p> +<div style="position: relative; width: 100%; height: 0; padding-bottom: 56.25%"> + <iframe style="position: absolute; width: 100%; height: 100%; left: 0; right: 0" src="https://www.youtube-nocookie.com/embed/v1lfMUM332w" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="allowfullscreen"> + </iframe> +</div> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#general-structure" aria-label="general structure permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>General structure</h2> +<p><code>irmin-pack</code> exposes functors that allow the user to provide arbitrary low-level +modules for handling I/O, and provides a fast key-value store interface composed +of three components:</p> +<ul> +<li>The <code>pack</code> is used to store the data contained in the Irmin store, as blobs.</li> +<li>The 
<code>dict</code> stores the paths where these blobs should live.</li> +<li>The <code>index</code> keeps track of the blobs that are present in the repository by +recording their location in the <code>pack</code>.</li> +</ul> +<p>Each of these uses both on-disk storage for persistence and concurrency and +various in-memory caches for speed.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#storing-the-data-in-the-pack-file" aria-label="storing the data in the pack file permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Storing the data in the <code>pack</code> file</h3> +<p>The <code>pack</code> contains most of the data stored in this Irmin backend. It is an +append-only file containing the serialized data stored in the Irmin repository. +All three Irmin stores (see our <a href="https://irmin.org/tutorial/architecture">architecture +page</a> in the tutorial to learn more) +are contained in this single file.</p> +<p><code>Content</code> and <code>Commit</code> serialization is straightforward through +<a href="https://docs.mirage.io/irmin/Irmin/Type/index.html"><code>Irmin.Type</code></a>. They are written along with their length (to allow +correct reading) and hash (to enable integrity checks). The hash is used to 
The hash is used to +resolve internal links inside the pack when nodes are written.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/65f80d5690bb49cd0ead891e2e7346c8/f989d/pack.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 16.470588235294116%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/65f80d5690bb49cd0ead891e2e7346c8/c5bb3/pack.png" class="gatsby-resp-image-image" alt="The pack file" title="The pack file" srcset="/static/65f80d5690bb49cd0ead891e2e7346c8/04472/pack.png 170w, +/static/65f80d5690bb49cd0ead891e2e7346c8/9f933/pack.png 340w, +/static/65f80d5690bb49cd0ead891e2e7346c8/c5bb3/pack.png 680w, +/static/65f80d5690bb49cd0ead891e2e7346c8/b12f7/pack.png 1020w, +/static/65f80d5690bb49cd0ead891e2e7346c8/b5a09/pack.png 1360w, +/static/65f80d5690bb49cd0ead891e2e7346c8/f989d/pack.png 5206w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#optimizing-large-nodes" aria-label="optimizing large nodes permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 
6z"></path></svg></a>Optimizing large nodes</h4> +<p>Serializing nodes is not as simple as serializing contents. In fact, nodes might contain an +arbitrarily large number of children, and serializing them as a long list of +references might harm performance, as that means loading and writing a large +amount of data for each modification, no matter how small this modification +might be. Similarly, browsing the tree means reading large blocks of data, even +though only one child is needed.</p> +<p>For this reason, we implemented a <a href="https://en.wikipedia.org/wiki/Radix_tree">Patricia Tree</a> representation of +internal nodes that allows us to split the child list into smaller parts that +can be accessed and modified independently, while still being quickly available +when needed. This reduces duplication of tree data in the Irmin store and +improves disk access times.</p> +<p>Of course, we provide a custom hashing mechanism, so that hashing the nodes +using this partitioning is still backwards-compatible for users who rely on hash +information regardless of whether the node is split or not.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#optimizing-internal-references" aria-label="optimizing internal references permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Optimizing internal references</h4> +<p>In the Git model, all data are content-addressable (i.e. data are always +referenced by their hash). This naturally lends itself to indexing data by hashes on +the disk itself (i.e. 
the links from <code>commits</code> to <code>nodes</code> and from <code>nodes</code> to +<code>nodes</code> or <code>contents</code> are realized by hash).</p> +<p>We did not follow this approach in <code>irmin-pack</code>, for at least two reasons:</p> +<ul> +<li> +<p>Referencing by hash does not allow fast recovery of the children, since +there is no way to find the relevant blob directly in the <code>pack</code> by providing +the hash. We will go into the details of this later in this post.</p> +</li> +<li> +<p>While hashes are used as simple objects, their size is not negligible. +The default hashing function in Irmin is BLAKE2B, which provides 32-byte +digests.</p> +</li> +</ul> +<p>Instead, internal links in the <code>pack</code> file are realized by the offsets &ndash; +<code>int64</code> integers &ndash; of the children rather than by their hashes. Provided that the +trees are always written bottom-up (so that children already exist in the <code>pack</code> +when their parents are written), this solves both issues above. 
The data handled +by the backend is always immutable, and the file is append-only, ensuring that +the links can never be broken.</p> +<p>Of course, that encoding does not break the content-addressable property: one +can always retrieve an arbitrary piece of data through its hash, but it allows +internal links to avoid that indirection.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#deduplicating-the-path-names-through-the-dict" aria-label="deduplicating the path names through the dict permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Deduplicating the path names through the <code>dict</code></h3> +<p>In fact, the most common operations when using <code>irmin-pack</code> consist of modifying +the tree's leaves rather then its shape. This is similar to the way most of us +use Git: modifying the contents of files is very frequent, while renaming or +adding new files is rather rare. Even still, when writing a <code>node</code> in a new +commit, that node must contain the path names of its children, which end up +being duplicated a large number of times.</p> +<p>The <code>dict</code> is used for deduplication of path names so that the <code>pack</code> file can +uniquely reference them using shorter identifiers. 
It is composed of an +in-memory bidirectional hash table, allowing lookups from path to identifier +when serializing and referencing, and from identifier to path when deserializing +and dereferencing.</p> +<p>To ensure persistence of the data across multiple runs and in case of crashes, +the small size of the <code>dict</code> &ndash; less than <code>15 Mb</code> in the Tezos use-case &ndash; allows +us to write the bindings to a write-only, append-only file that is fully read +and loaded on start-up.</p> +<p>We guarantee that the <code>dict</code> memory usage is bounded by providing a <code>capacity</code> +parameter. Adding a binding is guarded by this capacity: once the limit has been reached, the binding is inlined in +the <code>pack</code> file instead. This scenario does not +occur during normal use of <code>irmin-pack</code>, but the guard prevents attacks that would make +the memory grow in an unbounded way.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/cec17f425cdf458a385babbac24c0c04/f7171/dict.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 46.470588235294116%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/cec17f425cdf458a385babbac24c0c04/c5bb3/dict.png" class="gatsby-resp-image-image" alt="The dict" title="The dict" srcset="/static/cec17f425cdf458a385babbac24c0c04/04472/dict.png 170w, +/static/cec17f425cdf458a385babbac24c0c04/9f933/dict.png 340w, +/static/cec17f425cdf458a385babbac24c0c04/c5bb3/dict.png 680w, +/static/cec17f425cdf458a385babbac24c0c04/b12f7/dict.png 1020w, +/static/cec17f425cdf458a385babbac24c0c04/b5a09/dict.png 1360w, +/static/cec17f425cdf458a385babbac24c0c04/f7171/dict.png 3456w" 
sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#retrieve-the-data-in-the-pack-by-indexing" aria-label="retrieve the data in the pack by indexing permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Retrieve the data in the <code>pack</code> by indexing</h3> +<p>Since the <code>pack</code> file is append-only, naively reading its data would require a +linear search through the whole file for each lookup. Instead, we provide an +index that maps hashes of data blocks to their location in the <code>pack</code> file, +along with their length. 
This module allows quick recovery of the values queried +by hash.</p> +<p>It provides a simple key-value interface that actually hides the most complex +part of <code>irmin-pack</code>.</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> t +<span class="token keyword">val</span> v <span class="token punctuation">:</span> readonly<span class="token punctuation">:</span>bool <span class="token operator">-&gt;</span> path<span class="token punctuation">:</span>string <span class="token operator">-&gt;</span> t + +<span class="token keyword">val</span> find <span class="token punctuation">:</span> t <span class="token operator">-&gt;</span> Key<span class="token punctuation">.</span>t <span class="token operator">-&gt;</span> Value<span class="token punctuation">.</span>t +<span class="token keyword">val</span> replace <span class="token punctuation">:</span> t <span class="token operator">-&gt;</span> Key<span class="token punctuation">.</span>t <span class="token operator">-&gt;</span> Value<span class="token punctuation">.</span>t <span class="token operator">-&gt;</span> unit +<span class="token comment">(* ... *)</span></code></pre></div> +<p>It has accounted for most of our efforts in the development of <code>irmin-pack</code> and is now +available as a separate library, wisely named <code>index</code>, that you can check out on +GitHub under <a href="https://github.com/mirage/index/">mirage/index</a> and via <code>opam</code> as +the <code>index</code> and <code>index-unix</code> packages.</p> +<p>When <code>index</code> is used inside <code>irmin-pack</code>, the keys are the hashes of the data +stored in the backend, and the values are the <code>(offset, length)</code> pairs that +indicate the location in the <code>pack</code> file. From now on in this post, we will 
From now on in this post, we will +stick to the <code>index</code> abstraction: <code>key</code> and <code>value</code> will refer to the keys and +values as viewed by the <code>index</code>.</p> +<p>Our index is split into two major parts. The <code>log</code> is relatively small, and most +importantly, bounded; it contains the recently-added bindings. The <code>data</code> is +much larger, and contains older bindings.</p> +<p>The <code>log</code> part consists of a hash table associating keys to values. In order to +support concurrent access, and to be able to recover after a crash, we also maintain +a write-only, append-only file with the same contents, so that both +contain exactly the same data at all times.</p> +<p>When a new key-value binding is added to the index, the value is simply serialized +along with its key and added to the <code>log</code>.</p> +<p>An obvious caveat of this approach is that, left on its own, the in-memory representation of the +<code>log</code> (the hash table) would be unbounded. It would also grow a lot, as the Tezos node +stores more than 400 million objects. Our memory constraint obviously does not +allow such unbounded structures. This is where the <code>data</code> part comes in.</p> +<p>When the <code>log</code> size reaches a &ndash; customizable &ndash; threshold, its bindings are +flushed into a <code>data</code> component that may already contain flushed data from +former <code>log</code> overloads. 
We call this operation a <em>merge</em>.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/7663e5dd55a9fa612393be5ae1952bf5/e9c53/merges.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 28.82352941176471%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/7663e5dd55a9fa612393be5ae1952bf5/c5bb3/merges.png" class="gatsby-resp-image-image" alt="Merging the index" title="Merging the index" srcset="/static/7663e5dd55a9fa612393be5ae1952bf5/04472/merges.png 170w, +/static/7663e5dd55a9fa612393be5ae1952bf5/9f933/merges.png 340w, +/static/7663e5dd55a9fa612393be5ae1952bf5/c5bb3/merges.png 680w, +/static/7663e5dd55a9fa612393be5ae1952bf5/b12f7/merges.png 1020w, +/static/7663e5dd55a9fa612393be5ae1952bf5/b5a09/merges.png 1360w, +/static/7663e5dd55a9fa612393be5ae1952bf5/e9c53/merges.png 7470w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>The important invariant maintained by the <code>merge</code> operation is that the <code>data</code> +file must remain sorted by the hash of the bindings. This will enable a fast +recovery of the data.</p> +<p>During this operation, both the <code>log</code> and the former <code>data</code> are read in sorted +order &ndash; <code>data</code> is already sorted, and <code>log</code> is small thus easy to sort in +memory &ndash; and merged into a <code>merging_data</code> file. 
This file is atomically renamed +at the end of the operation to replace the older <code>data</code> while still ensuring +correct concurrent accesses.</p> +<p>This operation obviously needs to re-write the whole index, so its execution +is very expensive. For this reason, it is performed by a separate thread in the +background to still allow regular use of the index and be transparent to the +user.</p> +<p>In the meantime, a <code>log_async</code> &ndash; similar to <code>log</code>, with a file and a hash table +&ndash; is used to hold new bindings and ensure the data being merged and the new data +are correctly separated. At the end of the merge, the <code>log_async</code> becomes the +new <code>log</code> and is cleared to be ready for the next merge.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#recovering-the-data" aria-label="recovering the data permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Recovering the data</h4> +<p>This design allows us a fast lookup of the data present in the index. Whenever +<code>find</code> or <code>mem</code> is called, we first look into the <code>log</code>, which is simply a call +to the corresponding <code>Hashtbl</code> function, since this data is contained in memory. +If the data is not found in the <code>log</code>, the <code>data</code> file will be browsed. 
This +means access to recent values is generally faster, because it does not require +any access to the disk.</p> +<p>Searching in the <code>data</code> file is made efficient by the invariant that we kept +during the <code>merge</code>: the file is sorted by hash. The search algorithm is +an interpolation search, which is made possible by the even distribution of the +hashes that we store. The theoretical complexity of the interpolation search is +<code>O(log (log n))</code>, which is generally better than a binary search, provided that +the computation of the interpolant is cheaper than reads, which is the case +here.</p> +<p>This approach allows us to find the data using approximately 5-6 reading steps +in the file, which is good, but still a source of slowdowns. For this reason, we +use a fan-out module on top of the interpolation search, which tells us the +exact page in which a given key is located, in constant time, for an additional +space cost of <code>~100 Mb</code>. We use this to find the correct page of the disk, then +run the interpolation search in that page only. 
That approach allows us to find +the correct value with a single read in the <code>data</code> file.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>This new backend is now used by the Tezos nodes in production, and manages to +reduce the storage size from <code>250 Gb</code> down to <code>25 Gb</code>, with a monthly growth +rate of <code>2 Gb</code>, achieving a tenfold reduction.</p> +<p>At the same time, it provides a single-writer, multiple-reader access pattern +that enables bakers and clients to connect to the same storage while it is in operation.</p> +<p>On the memory side, all our components are memory bounded, and the bound is +generally customizable, the largest source of memory usage being the <code>log</code> part +of the <code>index</code>. 
While it can be reduced to fit in <code>1 Gb</code> of memory and run on +a small VPS or a Raspberry Pi, one can easily set a higher memory limit on a more +powerful machine, and achieve even better time performance.</p>https://tarides.com/blog/2020-09-01-introducing-irmin-packIntroducing irmin-pack2020-09-01T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><a href="https://lcamtuf.coredump.cx/afl/">AFL</a> (and fuzzing in general) is often used +to find bugs in low-level code like parsers, but it also works very well to find +bugs in high-level code, provided the right ingredients. 
We applied this +technique to feed random programs to OCamlFormat and found many formatting bugs.</p> +<p>OCamlFormat is a tool to format source code. To do so, it parses the source code +to an Abstract Syntax Tree (AST) and then applies formatting rules to the AST.</p> +<p>It can be tricky to correctly format the output. For example, say we want to +format <code>(a+b)*c</code>. The corresponding AST will look like <code>Apply(&quot;*&quot;, Apply (&quot;+&quot;, Var &quot;a&quot;, Var &quot;b&quot;), Var &quot;c&quot;)</code>. A naive formatter would look like this:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> <span class="token keyword">rec</span> format <span class="token operator">=</span> <span class="token keyword">function</span> + <span class="token operator">|</span> Var s <span class="token operator">-&gt;</span> s + <span class="token operator">|</span> Apply <span class="token punctuation">(</span>op<span class="token punctuation">,</span> e1<span class="token punctuation">,</span> e2<span class="token punctuation">)</span> <span class="token operator">-&gt;</span> + Printf<span class="token punctuation">.</span>sprintf <span class="token string">&quot;%s %s %s&quot;</span> <span class="token punctuation">(</span>format e1<span class="token punctuation">)</span> op <span class="token punctuation">(</span>format e2<span class="token punctuation">)</span></code></pre></div> +<p>But this is not correct, as it will print <code>(a+b)*c</code> as <code>a+b*c</code>, which is a +different program. In this particular case, the common solution would be to +track the relative precedence of the expressions and to emit only necessary +parentheses.</p> +<p>OCamlFormat has similar cases. To make sure we do not change a program when +formatting it, there is an extra check at the end to parse the output and +compare the output AST with the input AST. 
This ensures that, in case of bugs, +OCamlFormat exits with an error rather than changing the meaning of the input +program.</p> +<p>When we consider the whole OCaml language, the rules are complex and it is +difficult to make sure that we are correctly handling all programs. There are +two main failure modes: either we put too many parentheses, and the program does +not look good, or we do not put enough, and the AST changes (and OCamlFormat +exits with an error). We need a way to make sure that the latter does not +happen. Tests work to some extent, but some edge cases happen only when a +certain combination of language features is used. Because of this combinatorial +explosion, it is impossible to get good coverage using tests only.</p> +<p>Fortunately, there is a technique we can use to automatically explore the program +space: fuzzing. For a primer on using this technique on OCaml programs, one can +refer to <a href="https://tarides.com/blog/2019-09-04-an-introduction-to-fuzzing-ocaml-with-afl-crowbar-and-bun">this article</a>.</p> +<p>To make this work we need two elements: a random program generator, and a +property to check. Here, we are interested in programs that are valid (in the +sense that they parse correctly) but do not format correctly. We can use the +OCamlFormat internals to do the following:</p> +<ol> +<li>Try to parse the input. In case of a parse error, just reject this input as +invalid.</li> +<li>Otherwise, we have a valid program. Try to format it. If this happens with +no error at all, reject this input as well.</li> +<li>Otherwise, it means that the AST changed, comments moved, or something +similar, in a valid program. This is what we are after.</li> +</ol> +<p>Generating random programs is a bit more difficult. We can feed random strings +to AFL, but even with a corpus of existing valid code it will generate many +invalid programs. 
We are not interested in these for this project; we would +rather start from valid programs.</p> +<p>A good way to do that is to use Crowbar to directly generate AST values. Thanks +to <a href="https://github.com/yomimono/ppx_deriving_crowbar"><code>ppx_deriving_crowbar</code></a> and <a href="https://github.com/ocaml-ppx/ppx_import"><code>ppx_import</code></a>, +it is possible to generate random values for an external type like +<code>Parsetree.structure</code> (the contents of <code>.ml</code> files). Even more fortunately, +<a href="https://github.com/yomimono/ocaml-test-omp/blob/d086037027537ba4e23ce027766187979c85aa3d/test/parsetree_405.ml">somebody already did the work</a>. Thanks, Mindy!</p> +<p>This approach works really well: it generates 5k-10k programs per second, which +is very good performance (AFL starts complaining below 100/s).</p> +<p>Quickly, AFL was able to find crashes related to attributes. These are &quot;labels&quot; +attached to various nodes of the AST. For example, the expression <code>(x || y) [@a]</code> +(logical or between <code>x</code> and <code>y</code>, attach attribute <code>a</code> to the &quot;or&quot; expression) +would get formatted as <code>x || y [@a]</code> (attribute <code>a</code> is attached to the <code>y</code> +variable). Once again, there is a check in place in OCamlFormat to make sure +that it does not save the file in this case; instead, it exits with an error.</p> +<p>After the fuzzer had run for a bit longer, it found crashes where comments would +jump around in expressions like <code>f (*a*) (*bb*) x</code>. Wait, what? We never told +the program generator how to generate comments. 
Inspecting the intermediate AST, +the part in the middle is actually an integer literal with value <code>&quot;(*a*) (*bb*)&quot;</code> (integer literals are represented as strings so that <a href="https://github.com/Drup/Zarith-ppx">a third party +library could add literals for arbitrary precision numbers</a> for +example).</p> +<p>AFL comes with a program called <code>afl-tmin</code> that is used to minimize a crash. It +will try to find a smaller example of a program that crashes OCamlFormat. It +works well even with Crowbar in between. For example it is able to turn <code>(new aaaaaa &amp; [0;0;0;0])[@aaaaaaaaaa]</code> into <code>(0&amp;0)[@a]</code> (neither AFL nor OCamlFormat +knows about types, so they can operate on nonsensical programs. Finding a +well-typed version of a crash is usually not very difficult, but it has to be +done manually).</p> +<p>In total, letting AFL run overnight on a single core (that is relatively short +in terms of fuzzing) caused 453 crashes. After minimization and deduplication, +this corresponded to <a href="https://github.com/ocaml-ppx/ocamlformat/issues?q=label:fuzz">about 30 unique issues</a>.</p> +<p>Most of them are related to attributes that OCamlFormat did not try to include +in the output, or where it forgot to add parentheses. Fortunately, there are +safeguards in OCamlFormat: since it checks that the formatting preserves the AST +structure, it will exit with an error instead of outputting a different program.</p> +<p>Once again, fuzzing has proved itself as a powerful technique to find actual +bugs (including high-level ones). A possible approach for a next iteration is to +try to detect more problems during formatting, such as finding cases where lines +are longer than allowed. It is also possible to extend the random program +generator so that it tries to generate comments, and let OCamlFormat check that +they are all laid out correctly in the output. 
We look forward to employing +fuzzing more extensively for OCamlFormat development in future.</p>https://tarides.com/blog/2020-08-03-fuzzing-ocamlformat-with-afl-and-crowbarFuzzing OCamlFormat with AFL and Crowbar2020-08-03T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We are very glad to announce that Tarides has been awarded two new grants from +the Tezos Foundation.</p> +<p>Thanks to these new grants, Tarides will continue to work on the integration +between Tezos and MirageOS. We believe that the secure deployment of blockchains +is still a major challenge today, and that deploying Tezos as a unikernel will +have a big impact in terms of safety and security. It will be a key +differentiator that will separate Tezos from other blockchains.</p> +<p>The Tezos codebase is written in OCaml and is currently using more than 100 +external packages, one third of which come from the MirageOS project. +However, it still heavily depends on non-compatible Unix libraries. Making the +Tezos codebase fully compatible with MirageOS will help Tezos with distribution +and packaging, portability, secure deployment and operational safety.</p> +<p>We&rsquo;ll regularly publish development progress updates, so stay tuned!</p>https://tarides.com/blog/2020-04-20-the-future-of-tezos-on-mirageosThe future of Tezos on MirageOS2020-04-20T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We are very excited to announce that Tarides has <a href="https://www.forum-fic.com/en/home/price/the-fic-start-up-award.htm">won an +award</a> +from the International Cybersecurity Forum (FIC 2020).</p> +<p>Organized every year in Lille (France), the International +Cybersecurity Forum has become the leading European event on +cybersecurity and digital trust. 
Its main goal is to foster reflection +and exchanges within the European cybersecurity ecosystem.</p> +<p>We are very happy to have won the &quot;Coup de Coeur&quot; Prize, which will +bring great visibility to our technological innovations. It is also an +opportunity for us to meet experts in the cybersecurity sector and to +consider additional use-cases for our work. We would like to thank the +<a href="https://ceis.eu/">CEIS</a> for organising the event and the members of +the jury for commending <a href="https://mirage.io">MirageOS</a> and +<a href="https://tarides.com/blog/2019-07-05-i-lab-2019">OSMOSE</a>.</p> +<p>The next FIC will be held on the 28th, 29th and 30th of January +2020 in Lille. For more details about the FIC and to register, visit +<a href="https://www.forum-fic.com/en/home.htm">their website</a>.</p> +<br/> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/dc5973af18f70eb3d969d8aa1c8dfae6/9c311/FIC2020.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 47.64705882352941%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/dc5973af18f70eb3d969d8aa1c8dfae6/7bf67/FIC2020.jpg" class="gatsby-resp-image-image" alt="FIC2020 Startup Award Winners" title="FIC2020 Startup Award Winners" srcset="/static/dc5973af18f70eb3d969d8aa1c8dfae6/651be/FIC2020.jpg 170w, +/static/dc5973af18f70eb3d969d8aa1c8dfae6/d30a3/FIC2020.jpg 340w, +/static/dc5973af18f70eb3d969d8aa1c8dfae6/7bf67/FIC2020.jpg 680w, +/static/dc5973af18f70eb3d969d8aa1c8dfae6/9c311/FIC2020.jpg 838w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" 
decoding="async"/> + </a> + </span></p>https://tarides.com/blog/2019-12-11-tarides-wins-the-fic-2020-startup-awardTarides wins the FIC 2020 startup award2019-12-11T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We are thrilled to have been selected by the <a href="https://www.opensourcesummit.paris">Paris Open Source Summit</a> +committee to talk about &ldquo;Secure-by-design IoT applications using MirageOS&rdquo;.</p> +<p>The Paris Open Source Summit is an annual event where you can connect to +open-source communities and learn from tech leaders, project committers and +CTOs about the latest technical solutions, innovative uses and societal +challenges of open digital technology.</p> +<p>Thomas Gazagnaire, Tarides CEO/CTO, will explain what makes MirageOS a good +framework to build IoT applications and how we can use embedded devices running +on ARMv8, ESP32 or RISC-V to run secure and end-to-end open-source +infrastructure services such as VPN proxies and email servers. He will also +highlight how this infrastructure will be used to form the basis of OSMOSE: a +secure, distributed and privacy-preserving platform to write user-centric IoT +applications.</p> +<p>MirageOS is a library operating system (using the MIT license) which enables the +construction of unikernels: specialized services where the runtime binary +contains only the necessary code for execution and no more. Unikernels have a +drastically smaller attack surface than service deployments in traditional +operating systems and could lead to 1000x less code for the full application +stack. Moreover, as MirageOS is written in a memory safe language (OCaml), a +full class of bugs related to memory corruption &ndash; representing <a href="https://msrc-blog.microsoft.com/2019/07/16/a-proactive-approach-to-more-secure-code/">70% of the +released CVEs</a> in classic operating systems written in C &ndash; +can no longer appear. These two properties combined (and more!) 
allow MirageOS +to build &ldquo;secure-by-design&rdquo; applications where everything &ndash; from the high-level +business logic to the low-level device drivers &ndash; has been designed to be as +secure as possible.</p> +<p>To learn more about the project, attend the Paris Open Source Summit! The talk +will take place during the '<a href="https://www.opensourcesummit.paris/EMBEDDED+&amp;+IOT_168_5745.html">Embedded &amp; IOT</a>' section +at 14:50 &ndash; 15:20 on December 10th, 2019.</p> +<br/> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/7826ecb5d3f2934a405f25446b1eb1d8/72e01/poss_2019.jpg" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 50%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/7826ecb5d3f2934a405f25446b1eb1d8/7bf67/poss_2019.jpg" class="gatsby-resp-image-image" alt="Thomas at the Paris Open Source Summit" title="Thomas at the Paris Open Source Summit" srcset="/static/7826ecb5d3f2934a405f25446b1eb1d8/651be/poss_2019.jpg 170w, +/static/7826ecb5d3f2934a405f25446b1eb1d8/d30a3/poss_2019.jpg 340w, +/static/7826ecb5d3f2934a405f25446b1eb1d8/7bf67/poss_2019.jpg 680w, +/static/7826ecb5d3f2934a405f25446b1eb1d8/990cb/poss_2019.jpg 1020w, +/static/7826ecb5d3f2934a405f25446b1eb1d8/72e01/poss_2019.jpg 1024w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p>https://tarides.com/blog/2019-12-04-mirageos-talk-at-the-paris-open-source-summitMirageOS talk at the Paris Open Source Summit2019-12-04T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>With the release of Irmin 
2.0.0, we are happy to announce a new package - <code>irmin-graphql</code>, which can be used to serve data from Irmin over HTTP. This blog post will give you some examples to help you get started, there is also <a href="https://irmin.org/tutorial/graphql">a section in the <code>irmin-tutorial</code></a> with similar information. To avoid writing the same thing twice, this post will cover the basics of getting started, plus a few interesting ideas for queries.</p> +<p>Getting the <code>irmin-graphql</code> server running from the command-line is easy:</p> +<div class="gatsby-highlight" data-language="shell"><pre class="language-shell"><code class="language-shell">$ irmin graphql <span class="token parameter variable">--root</span><span class="token operator">=</span>/tmp/irmin</code></pre></div> +<p>where <code>/tmp/irmin</code> is the actual path to your repository. This will start the server on <code>localhost:8080</code>, but it's possible to customize this using the <code>--address</code> and <code>--port</code> flags.</p> +<p>The new GraphQL API has been added to address some of the shortcomings that have been identified with the old HTTP API, as well as enable a number of new features and capabilities.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#graphql" aria-label="graphql permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>GraphQL</h1> +<p><a href="https://graphql.org/">GraphQL</a> is a query language for exposing data as a graph via an API, typically using HTTP as a 
transport. The centerpiece of a GraphQL API is the <em>schema</em>, which describes the graph in terms of types and relationships between these types. The schema is accessible by the consumer, and acts as a contract between the API and the consumer, by clearly defining all API operations and fully assigning types to all interactions.</p> +<p>Viewing Irmin data as a graph turns out to be a natural and useful model. Concepts such as branches and commits fit in nicely, and the stored application data is organized as a tree. Such highly hierarchical data can be challenging to interact with using REST, but is easy to represent and navigate with GraphQL.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 74.11764705882352%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/77be7ca8c9940e693b03660d2d5cee01/c5bb3/git-data-model.png" class="gatsby-resp-image-image" alt="Git data model" title="Git data model" srcset="/static/77be7ca8c9940e693b03660d2d5cee01/04472/git-data-model.png 170w, +/static/77be7ca8c9940e693b03660d2d5cee01/9f933/git-data-model.png 340w, +/static/77be7ca8c9940e693b03660d2d5cee01/c5bb3/git-data-model.png 680w, +/static/77be7ca8c9940e693b03660d2d5cee01/5a190/git-data-model.png 800w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </span> +(image from <a href="https://git-scm.com/book/en/v2/Git-Internals-Git-Objects">Pro Git</a>)</p> +<p>As a consumer of an API, one of the biggest initial challenges is understanding what operations are exposed and how to use them. 
Conversely, as a developer of an API, keeping documentation up-to-date is challenging and time consuming. Though no substitute for more free-form documentation, a GraphQL schema provides an excellent base line for understanding a GraphQL API that is guaranteed to be accurate and up-to-date. This issue is definitely true of the old Irmin HTTP API, which was hard to approach for newcomers due to lack of documentation.</p> +<p>Being able to inspect the schema of a GraphQL API enables powerful tooling. A great example of this is <a href="https://tarides.com/graphiql">GraphiQL</a>, which is a browser-based IDE for GraphQL queries. GraphiQL can serve both as an interactive API explorer and query designer with intelligent autocompletion, formatting and more.</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/19632cbb13504bb32d6d6d285ec1f542/82e86/graphiql.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 48.23529411764706%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/19632cbb13504bb32d6d6d285ec1f542/c5bb3/graphiql.png" class="gatsby-resp-image-image" alt="GraphiQL" title="GraphiQL" srcset="/static/19632cbb13504bb32d6d6d285ec1f542/04472/graphiql.png 170w, +/static/19632cbb13504bb32d6d6d285ec1f542/9f933/graphiql.png 340w, +/static/19632cbb13504bb32d6d6d285ec1f542/c5bb3/graphiql.png 680w, +/static/19632cbb13504bb32d6d6d285ec1f542/b12f7/graphiql.png 1020w, +/static/19632cbb13504bb32d6d6d285ec1f542/b5a09/graphiql.png 1360w, +/static/19632cbb13504bb32d6d6d285ec1f542/82e86/graphiql.png 1978w" sizes="(max-width: 680px) 100vw, 680px" 
style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>The combination of introspection and a strongly typed schema also allows creating smart clients using code generation. This is already a quite wide-spread idea with <a href="https://tarides.com/apollo-swift">Apollo for iOS</a>, <a href="https://tarides.com/apollo-java">Apollo for Android</a> or <a href="https://tarides.com/graphql_ppx"><code>graphql_ppx</code></a> for OCaml/Reason. Though generic GraphQL client libraries will do a fine job interacting with the Irmin GraphQL API, these highlighted libraries will offer excellent ergonomics and type-safety out of the box.</p> +<p>One of the problems that GraphQL set out to solve is that of over- and underfetching. When designing REST API response payloads, there is always a tension between including too little data, which will require clients to make more network requests, and including too much data, which wastes resources for both client and server (serialization, network transfer, deserialization, etc).<br/> +The existing low-level Irmin HTTP API is a perfect example of this. Fetching the contents of a particular file on the master branch requires at least 4 HTTP requests (fetch the branch, fetch the commit, fetch the tree, fetch the blob), i.e. massive underfetching. By comparison, this is something easily solved with a single request to the new GraphQL API. 
More generally, the GraphQL API allows you to fetch <em>exactly</em> the data you need in a single request without making one-off endpoints.</p> +<p>For the curious, here's the GraphQL query to fetch the contents of <code>README.md</code> from the branch <code>master</code>:</p> +<div class="gatsby-highlight" data-language="graphql"><pre class="language-graphql"><code class="language-graphql"><span class="token keyword">query</span> <span class="token punctuation">{</span> + <span class="token object">master</span> <span class="token punctuation">{</span> + <span class="token object">tree</span> <span class="token punctuation">{</span> + <span class="token property-query">get</span><span class="token punctuation">(</span><span class="token attr-name">key</span><span class="token punctuation">:</span> <span class="token string">&quot;README.md&quot;</span><span class="token punctuation">)</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> +<span class="token punctuation">}</span></code></pre></div> +<p>The response will look something like this:</p> +<div class="gatsby-highlight" data-language="json"><pre class="language-json"><code class="language-json"><span class="token punctuation">{</span> + <span class="token property">&quot;data&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;master&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;tree&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;get&quot;</span><span class="token operator">:</span> <span class="token string">&quot;The contents of README.md&quot;</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> +<span class="token 
punctuation">}</span></code></pre></div> +<p>The GraphQL API is not limited to reading data; you can also write data to your Irmin store. Here's a simple example that will set the key <code>README.md</code> to <code>&quot;foo&quot;</code>, and return the hash of the resulting commit:</p> +<div class="gatsby-highlight" data-language="graphql"><pre class="language-graphql"><code class="language-graphql"><span class="token keyword">mutation</span> <span class="token punctuation">{</span> + <span class="token property-query property-mutation">set</span><span class="token punctuation">(</span><span class="token attr-name">key</span><span class="token punctuation">:</span> <span class="token string">&quot;README.md&quot;</span><span class="token punctuation">,</span> <span class="token attr-name">value</span><span class="token punctuation">:</span> <span class="token string">&quot;foo&quot;</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> + <span class="token property">hash</span> + <span class="token punctuation">}</span> +<span class="token punctuation">}</span></code></pre></div> +<p>By default, GraphQL allows you to do multiple operations in a single query, so you get bulk operations for free. 
Here's a more complex example that modifies two different branches, <code>branch-a</code> and <code>branch-b</code>, and then merges <code>branch-b</code> into <code>branch-a</code>, <em>all in a single query</em>:</p> +<div class="gatsby-highlight" data-language="graphql"><pre class="language-graphql"><code class="language-graphql"><span class="token keyword">mutation</span> <span class="token punctuation">{</span> + <span class="token attr-name">branch_a</span><span class="token punctuation">:</span> <span class="token property-query">set</span><span class="token punctuation">(</span><span class="token attr-name">branch</span><span class="token punctuation">:</span> <span class="token string">&quot;branch-a&quot;</span><span class="token punctuation">,</span> <span class="token attr-name">key</span><span class="token punctuation">:</span> <span class="token string">&quot;foo&quot;</span><span class="token punctuation">,</span> <span class="token attr-name">value</span><span class="token punctuation">:</span> <span class="token string">&quot;bar&quot;</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> + <span class="token property">hash</span> + <span class="token punctuation">}</span> + + <span class="token attr-name">branch_b</span><span class="token punctuation">:</span> <span class="token property-query">set</span><span class="token punctuation">(</span><span class="token attr-name">branch</span><span class="token punctuation">:</span> <span class="token string">&quot;branch-b&quot;</span><span class="token punctuation">,</span> <span class="token attr-name">key</span><span class="token punctuation">:</span> <span class="token string">&quot;baz&quot;</span><span class="token punctuation">,</span> <span class="token attr-name">value</span><span class="token punctuation">:</span> <span class="token string">&quot;qux&quot;</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> + <span 
class="token property">hash</span> + <span class="token punctuation">}</span> + + <span class="token property-query">merge_with_branch</span><span class="token punctuation">(</span><span class="token attr-name">branch</span><span class="token punctuation">:</span> <span class="token string">&quot;branch-b&quot;</span><span class="token punctuation">,</span> <span class="token attr-name">from</span><span class="token punctuation">:</span> <span class="token string">&quot;branch-a&quot;</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> + <span class="token property">hash</span> + <span class="token object">tree</span> <span class="token punctuation">{</span> + <span class="token object">list_contents_recursively</span> <span class="token punctuation">{</span> + <span class="token property">key</span> + <span class="token property">value</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> +<span class="token punctuation">}</span></code></pre></div> +<p>Here's what the response might look like:</p> +<div class="gatsby-highlight" data-language="json"><pre class="language-json"><code class="language-json"><span class="token punctuation">{</span> + <span class="token property">&quot;data&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;branch_a&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;hash&quot;</span><span class="token operator">:</span> <span class="token string">&quot;0a1313ae9dfe1d4339aee946dd76b383e02949b6&quot;</span> + <span class="token punctuation">}</span><span class="token punctuation">,</span> + <span class="token property">&quot;branch_b&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token 
property">&quot;hash&quot;</span><span class="token operator">:</span> <span class="token string">&quot;28855c277671ccc180c81058a28d3254f17d2f7b&quot;</span> + <span class="token punctuation">}</span><span class="token punctuation">,</span> + <span class="token property">&quot;merge_with_branch&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;hash&quot;</span><span class="token operator">:</span> <span class="token string">&quot;7b17437a16a858816d2710a94ccaa1b9c3506d1f&quot;</span><span class="token punctuation">,</span> + <span class="token property">&quot;tree&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;list_contents_recursively&quot;</span><span class="token operator">:</span> <span class="token punctuation">[</span> + <span class="token punctuation">{</span> + <span class="token property">&quot;key&quot;</span><span class="token operator">:</span> <span class="token string">&quot;/foo&quot;</span><span class="token punctuation">,</span> + <span class="token property">&quot;value&quot;</span><span class="token operator">:</span> <span class="token string">&quot;bar&quot;</span> + <span class="token punctuation">}</span><span class="token punctuation">,</span> + <span class="token punctuation">{</span> + <span class="token property">&quot;key&quot;</span><span class="token operator">:</span> <span class="token string">&quot;/baz&quot;</span><span class="token punctuation">,</span> + <span class="token property">&quot;value&quot;</span><span class="token operator">:</span> <span class="token string">&quot;qux&quot;</span> + <span class="token punctuation">}</span> + <span class="token punctuation">]</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> +<span class="token punctuation">}</span></code></pre></div> 
+<p>Overall, the new GraphQL API operates at a much higher level than the old HTTP API, and offers a number of complex operations that were tricky to accomplish before.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#customizable" aria-label="customizable permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Customizable</h1> +<p>With GraphQL, all request and response data is fully described by the schema. Because Irmin allows the user to have custom content types, this leaves the question of what type to assign to such values. By default, the GraphQL API will expose all values as strings, i.e. the serialized version of the data that your application stores. This works quite well when Irmin is used as a simple key-value store, but it can be a very inconvenient scheme when storing more complex values. As an example, consider storing contacts (name, email, phone, tags, etc.) in your Irmin store, where values have the following type:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* Custom content type: a contact *)</span> +<span class="token keyword">type</span> contact <span class="token operator">=</span> <span class="token punctuation">{</span> + name <span class="token punctuation">:</span> string<span class="token punctuation">;</span> + email <span class="token punctuation">:</span> string<span class="token punctuation">;</span> + <span class="token comment">(* ... 
*)</span> +<span class="token punctuation">}</span></code></pre></div> +<p>By default, such a value is returned to the client as its JSON-encoded string representation. Assume we're storing a contact under the key <code>john-doe</code>, which we fetch with the following query:</p> +<div class="gatsby-highlight" data-language="graphql"><pre class="language-graphql"><code class="language-graphql"><span class="token keyword">query</span> <span class="token punctuation">{</span> + <span class="token object">master</span> <span class="token punctuation">{</span> + <span class="token object">tree</span> <span class="token punctuation">{</span> + <span class="token property-query">get</span><span class="token punctuation">(</span><span class="token attr-name">key</span><span class="token punctuation">:</span> <span class="token string">&quot;john-doe&quot;</span><span class="token punctuation">)</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> +<span class="token punctuation">}</span></code></pre></div> +<p>The response would then look something like this:</p> +<div class="gatsby-highlight" data-language="json"><pre class="language-json"><code class="language-json"><span class="token punctuation">{</span> + <span class="token property">&quot;master&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;tree&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;get&quot;</span><span class="token operator">:</span> <span class="token string">&quot;{\&quot;name\&quot;:\&quot;John Doe\&quot;, \&quot;email\&quot;: \&quot;john.doe@gmail.com\&quot;, ...}&quot;</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> +<span class="token 
punctuation">}</span></code></pre></div> +<p>The client will have to parse this JSON string and cannot choose to only fetch parts of the value (say, only the email). Optimally we would want the client to get a structured response such as the following:</p> +<div class="gatsby-highlight" data-language="json"><pre class="language-json"><code class="language-json"><span class="token punctuation">{</span> + <span class="token property">&quot;master&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;tree&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;get&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;name&quot;</span><span class="token operator">:</span> <span class="token string">&quot;John Doe&quot;</span><span class="token punctuation">,</span> + <span class="token property">&quot;email&quot;</span><span class="token operator">:</span> <span class="token string">&quot;john.doe@gmail.com&quot;</span><span class="token punctuation">,</span> + ... + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> +<span class="token punctuation">}</span></code></pre></div> +<p>To achieve this, the new GraphQL API allows providing an &quot;output type&quot; and an &quot;input type&quot; for most of the configurable types in your store (<code>contents</code>, <code>key</code>, <code>metadata</code>, <code>hash</code>, <code>branch</code>). The output type specifies how data is presented to the client, while the input type controls how data can be provided by the client. 
Let's take a closer look at specifying a custom output type.</p> +<p>Essentially you have to construct a value of type <code>(unit, 'a option) Graphql_lwt.Schema.typ</code> (from the <a href="https://tarides.com/ocaml-graphql-server"><code>graphql-lwt</code></a> package), assuming your content type is <code>'a</code>. We could construct a GraphQL object type for our example content type <code>contact</code> as follows:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* (unit, contact option) Graphql_lwt.Schema.typ *)</span> +<span class="token keyword">let</span> contact_schema_typ <span class="token operator">=</span> Graphql_lwt<span class="token punctuation">.</span>Schema<span class="token punctuation">.</span><span class="token punctuation">(</span>obj <span class="token string">&quot;Contact&quot;</span> + <span class="token label property">~fields</span><span class="token punctuation">:</span><span class="token punctuation">(</span><span class="token keyword">fun</span> <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> <span class="token punctuation">[</span> + field <span class="token string">&quot;name&quot;</span> + <span class="token label property">~typ</span><span class="token punctuation">:</span><span class="token punctuation">(</span>non_null string<span class="token punctuation">)</span> + <span class="token label property">~args</span><span class="token punctuation">:</span><span class="token punctuation">[</span><span class="token punctuation">]</span> + <span class="token label property">~resolve</span><span class="token punctuation">:</span><span class="token punctuation">(</span><span class="token keyword">fun</span> <span class="token punctuation">_</span> contact <span class="token operator">-&gt;</span> + contact<span class="token punctuation">.</span>name + <span class="token punctuation">)</span> + <span 
class="token punctuation">;</span> + <span class="token comment">(* ... more fields *)</span> + <span class="token punctuation">]</span><span class="token punctuation">)</span> +<span class="token punctuation">)</span></code></pre></div> +<p>To use the custom type, you need to instantiate the functor <code>Irmin_unix.Graphql.Server.Make_ext</code> (assuming you're deploying to a Unix target) with an Irmin store (type <code>Irmin.S</code>) and a custom types module (type <code>Irmin_graphql.Server.CUSTOM_TYPES</code>). This requires a bit of plumbing:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* Instantiate the Irmin functor somehow *)</span> +<span class="token keyword">module</span> S <span class="token punctuation">:</span> Irmin<span class="token punctuation">.</span>S <span class="token keyword">with</span> <span class="token keyword">type</span> contents <span class="token operator">=</span> contact <span class="token operator">=</span> + <span class="token comment">(* ... 
*)</span> + +<span class="token comment">(* Custom GraphQL presentation module *)</span> +<span class="token keyword">module</span> Custom_types <span class="token operator">=</span> <span class="token keyword">struct</span> + <span class="token comment">(* Construct default GraphQL types *)</span> + <span class="token keyword">module</span> Defaults <span class="token operator">=</span> Irmin_graphql<span class="token punctuation">.</span>Server<span class="token punctuation">.</span>Default_types <span class="token punctuation">(</span>S<span class="token punctuation">)</span> + + <span class="token comment">(* Use the default types for most things *)</span> + <span class="token keyword">module</span> Key <span class="token operator">=</span> Defaults<span class="token punctuation">.</span>Key + <span class="token keyword">module</span> Metadata <span class="token operator">=</span> Defaults<span class="token punctuation">.</span>Metadata + <span class="token keyword">module</span> Hash <span class="token operator">=</span> Defaults<span class="token punctuation">.</span>Hash + <span class="token keyword">module</span> Branch <span class="token operator">=</span> Defaults<span class="token punctuation">.</span>Branch + + <span class="token comment">(* Use custom output type for contents *)</span> + <span class="token keyword">module</span> Contents <span class="token operator">=</span> <span class="token keyword">struct</span> + <span class="token keyword">include</span> Defaults<span class="token punctuation">.</span>Contents + <span class="token keyword">let</span> schema_typ <span class="token operator">=</span> contact_schema_typ + <span class="token keyword">end</span> +<span class="token keyword">end</span> + +<span class="token keyword">module</span> Remote <span class="token operator">=</span> <span class="token keyword">struct</span> + <span class="token keyword">let</span> remote <span class="token operator">=</span> Some s<span class="token 
punctuation">.</span>remote +<span class="token keyword">end</span> + +<span class="token keyword">module</span> GQL <span class="token operator">=</span> Irmin_unix<span class="token punctuation">.</span>Graphql<span class="token punctuation">.</span>Server<span class="token punctuation">.</span>Make_ext <span class="token punctuation">(</span>S<span class="token punctuation">)</span> <span class="token punctuation">(</span>Remote<span class="token punctuation">)</span> <span class="token punctuation">(</span>Custom_types<span class="token punctuation">)</span></code></pre></div> +<p>With this in hand, we can now query specifically for the email of <code>john-doe</code>:</p> +<div class="gatsby-highlight" data-language="graphql"><pre class="language-graphql"><code class="language-graphql"><span class="token keyword">query</span> <span class="token punctuation">{</span> + <span class="token object">master</span> <span class="token punctuation">{</span> + <span class="token object">tree</span> <span class="token punctuation">{</span> + <span class="token property-query">get</span><span class="token punctuation">(</span><span class="token attr-name">key</span><span class="token punctuation">:</span> <span class="token string">&quot;john-doe&quot;</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> + <span class="token property">email</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> +<span class="token punctuation">}</span></code></pre></div> +<p>... 
and get a nicely structured JSON response back:</p> +<div class="gatsby-highlight" data-language="json"><pre class="language-json"><code class="language-json"><span class="token punctuation">{</span> + <span class="token property">&quot;master&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;tree&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;get&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;email&quot;</span><span class="token operator">:</span> <span class="token string">&quot;john.doe@gmail.com&quot;</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> +<span class="token punctuation">}</span></code></pre></div> +<p>Custom types are very powerful and open the door to transforming or enriching the data at query time, e.g. geocoding the address of a contact or checking an online status.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#watches" aria-label="watches permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Watches</h1> +<p>A core feature of Irmin is the ability to <em>watch</em> for changes to the underlying data store in real time. <code>irmin-graphql</code> takes advantage of GraphQL subscriptions to expose Irmin watches. 
Subscriptions are a relatively recent addition to the GraphQL spec (<a href="https://tarides.com/graphql-spec-june-2018">June 2018</a>) and allow clients to <em>subscribe</em> to changes. These changes are pushed to the client over a suitable transport mechanism, e.g. websockets, Server-Sent Events, or a chunked HTTP response, as a regular GraphQL response.</p> +<p>As an example, the following subscription watches for all changes and returns the new hash:</p> +<div class="gatsby-highlight" data-language="graphql"><pre class="language-graphql"><code class="language-graphql"><span class="token keyword">subscription</span> <span class="token punctuation">{</span> + <span class="token object">watch</span> <span class="token punctuation">{</span> + <span class="token object">commit</span> <span class="token punctuation">{</span> + <span class="token property">hash</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> +<span class="token punctuation">}</span></code></pre></div> +<p>For every change, a message like the following will be sent:</p> +<div class="gatsby-highlight" data-language="json"><pre class="language-json"><code class="language-json"><span class="token punctuation">{</span> + <span class="token property">&quot;watch&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;commit&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span> + <span class="token property">&quot;hash&quot;</span><span class="token operator">:</span> <span class="token string">&quot;c01a59bacc16d89e9cdd344a969f494bb2698d8f&quot;</span> + <span class="token punctuation">}</span> + <span class="token punctuation">}</span> +<span class="token punctuation">}</span></code></pre></div> +<p>Under the hood, subscriptions in <code>irmin-graphql</code> are implemented using Irmin watches, but this is opaque to the client -- this will work with any 
GraphQL spec-compliant client!</p> +<p>Here's a video that shows how the GraphQL response changes live as the Irmin store is being manipulated:</p> +<p><video controls="controls" width="680"><source src="/blog/2019-11-27-introducing-irmin-graphql/irmin-subscriptions.mp4" type="video/mp4"></source></video></p> +<p>Note that the current implementation only supports websockets, with more transport options coming soon.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#wrap-up" aria-label="wrap up permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Wrap-up</h1> +<p>Irmin 2.0 ships with a powerful new GraphQL API that makes it much easier to interact with Irmin over the network. This makes Irmin available for many more languages and contexts, not just applications using OCaml (or JavaScript). The new API operates at a much higher level than the old API, and offers advanced features such as &quot;bring your own GraphQL types&quot; and watching for changes via GraphQL subscriptions.</p> +<p>We're looking forward to seeing what you'll build with it!</p>https://tarides.com/blog/2019-11-27-introducing-the-graphql-api-for-irmin-2-0Introducing the GraphQL API for Irmin 2.02019-11-27T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We are pleased to announce <a href="https://github.com/mirage/irmin/releases">Irmin +2.0.0</a>, a major release of the +Git-like distributed branching and storage substrate that underpins +<a href="https://mirage.io">MirageOS</a>. 
We began the release process for all the +components that make up Irmin <a href="https://tarides.com/blog/2019-05-13-on-the-road-to-irmin-v2">back in May +2019</a>, and there +have been close to 1000 commits since Irmin 1.4.0 was released back in June 2018. To +celebrate this milestone, we have a new logo and have opened a dedicated website: +<a href="https://irmin.org">irmin.org</a>.</p> +<p>Our focus this year has been on ensuring the production success of our +early adopters -- such as the +<a href="https://gitlab.com/tezos/tezos/tree/master/src/lib_storage">Tezos</a> blockchain +and the <a href="https://github.com/moby/datakit">Datakit 9P</a> +stack -- as well as spawning new research projects into the practical +application of distributed and mergeable data stores. We are also +very pleased to welcome several new maintainers into the Mirage +project for their contributions to Irmin, namely +<a href="https://github.com/icristescu">Ioana Cristescu</a>, +<a href="https://github.com/CraigFe">Craig Ferguson</a>, +<a href="https://github.com/andreas">Andreas Garnaes</a>, +<a href="https://github.com/pascutto">Cl&eacute;ment Pascutto</a> and +<a href="https://github.com/zshipko">Zach Shipko</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#new-major-features" aria-label="new major features permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>New Major Features</h2> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#new-cli" aria-label="new cli permalink" class="anchor 
before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>New CLI</h3> +<p>While Irmin is normally used as a library, it is obviously useful to +be able to interact with a data store from a shell. The <code>irmin-unix</code> +opam package now provides an <code>irmin</code> binary that is configured via a +Yaml file and can perform queries and mutations against a Git store.</p> +<div class="gatsby-highlight" data-language="shell"><pre class="language-shell"><code class="language-shell">$ <span class="token builtin class-name">echo</span> <span class="token string">&quot;root: .&quot;</span> <span class="token operator">&gt;</span> irmin.yml +$ irmin init +$ irmin <span class="token builtin class-name">set</span> foo/bar <span class="token string">&quot;testing 123&quot;</span> +$ irmin get foo/bar</code></pre></div> +<p>Try <code>irmin --help</code> to see all the commands and options available.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#tezos-and-irmin-pack" aria-label="tezos and irmin pack permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Tezos and 
irmin-pack</h3> +<p>Another big user of Irmin is the <a href="https://tezos.com">Tezos blockchain</a>, +and we have been optimising the persistent space usage of Irmin as their +network grows. Because Tezos doesn&rsquo;t require full Git format support, +we created a hybrid backend that grabs the best bits of Git (e.g. the +packfile mechanism) and engineered a domain-specific backend tailored +for Tezos usage. Crucially, because of the way Irmin is split into +clean libraries and OCaml modules, we only had to modify a small part +of the codebase and could also reuse elements of our +<a href="https://github.com/mirage/ocaml-git">OCaml-git</a> codebase as well.</p> +<p>The <a href="https://github.com/mirage/irmin/pull/615">irmin-pack backend</a> is available +for <a href="https://github.com/mirage/irmin/pull/888">use in the CLI</a> and provides a +significant improvement in disk usage. There is a corresponding <a href="https://gitlab.com/tezos/tezos/merge_requests/1268">Tezos merge +request</a> using the Irmin +2.0 code that has been integrated downstream and will become available via +their release process in due course.</p> +<p>As part of this development process, we also released an efficient multi-level +index implementation (imaginatively dubbed +<a href="https://github.com/mirage/index">index</a> in opam). Our implementation takes an +arbitrary IO implementation and user-supplied content types and supplies a +standard key-value interface for persistent storage. 
Index provides instance +sharing by default, so each OCaml runtime shares a common singleton instance.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#irmin-graphql-and-browser-irmin" aria-label="irmin graphql and browser irmin permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Irmin-GraphQL and &ldquo;browser Irmin&rdquo;</h3> +<p>Another new area of huge interest to us is +<a href="https://graphql.org">GraphQL</a> in order to provide frontends with a rich +query language for Irmin-hosted applications. Irmin 2.0 includes a +built-in GraphQL server so you can <a href="https://twitter.com/cuvius/status/1017136581755457539">manipulate your Git repo via +GraphQL</a>.</p> +<p>If you are interested in (for example) compiling elements of Irmin to +JavaScript or wasm, for usage in frontends, then the Irmin 2.0 release +makes it significantly easier to support this architecture. We&rsquo;ve +already seen some exploratory efforts <a href="https://github.com/mirage/irmin/issues/681">report issues</a> +when doing this, and we&rsquo;ve had it working ourselves in <a href="http://roscidus.com/blog/blog/2015/04/28/cuekeeper-gitting-things-done-in-the-browser/">Irmin 1.0 Cuekeeper</a> +so we are excited by the potential power of applications built using +this model. 
If you have ideas or questions, please get in touch on the +<a href="https://github.com/mirage/irmin/issues">issue tracker</a> with your +use case.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#wodan" aria-label="wodan permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Wodan</h3> +<p>Irmin&rsquo;s storage layer is also well abstracted, so backends other than +a Unix filesystem or Git are supported. Irmin can run in highly +diverse and OS-free environments, and so we began engineering the +<a href="https://github.com/mirage/wodan">Wodan filesystem</a> as a +domain-specific filesystem designed for MirageOS, Irmin and modern +flash drives. See <a href="https://g2p.github.io/research/wodan.pdf">the OCaml Workshop 2017 abstract on +it</a> for more design +rationale.</p> +<p>As part of the Irmin 2.0 release, Wodan is also being prepared for a +release, and you can find <a href="https://github.com/mirage/wodan/tree/master/src/wodan-irmin">Irmin 2.0 +support</a> +in the source. If you&rsquo;d like a standalone block-device based +persistence environment for Irmin, please try this out. This is the +preferred backend for using Irmin storage in a unikernel.</p> +<h3 style="position:relative;">Versioned CalDAV</h3> +<p>An application pulling all these pieces together is being developed +by our friends at <a href="https://robur.io/About%20Us/Team">Robur</a>: an Irmin-based +<a href="https://github.com/roburio/caldav">CalDAV calendaring server</a> +that even hosts its DNS server using a versioned Irmin store. 
We'll +blog more about this as the components get released and stabilised, but +the unikernel enthusiasts among you may want to browse the +<a href="https://github.com/roburio/unikernels/tree/future">Robur unikernels future branch</a> +to see how they are deploying them today.</p> +<p>A huge thank you to all our commercial customers, end users and open-source +developers who have contributed their time, expertise and +financial support to help us achieve our goal of delivering a modern +storage stack in the spirit of Git. Our next steps for Irmin are to +continue to increase the performance and optimise the storage, +and to build more end-to-end applications using the application core +on top of MirageOS.</p>https://tarides.com/blog/2019-11-21-irmin-v2Irmin v22019-11-21T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We're glad to announce the first release of <a href="https://github.com/mirage/mrmime.git"><code>mrmime</code></a>, a parser and a +generator of emails. This library provides an <em>OCaml way</em> to analyze and craft +an email. 
The eventual goal is to build an entire <em>unikernel-compatible</em> stack +for email (such as SMTP or IMAP).</p> +<p>In this article, we will show what is currently possible with <code>mrmime</code> and +present a few of the useful libraries that we developed along the way.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#an-email-parser" aria-label="an email parser permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>An email parser</h2> +<p>Some years ago, Romain gave <a href="https://www.youtube.com/watch?v=kQkRsNEo25k">a talk</a> about what an email really <em>is</em>. +Behind the human-comprehensible format (or <em>rich-document</em> as we said a +long time ago), there are several details of emails which complicate the process of +analyzing them (and can be prone to security lapses). 
These details are mostly described +by three RFCs:</p> +<ul> +<li><a href="https://tools.ietf.org/html/rfc822">RFC822</a></li> +<li><a href="https://tools.ietf.org/html/rfc2822">RFC2822</a></li> +<li><a href="https://tools.ietf.org/html/rfc5322">RFC5322</a></li> +</ul> +<p>Even though they are cross-compatible, providing full legacy email parsing is an +archaeological exercise: each RFC retains support for the older design decisions +(which were not recognized as bad or ugly in 1970 when they were first standardized).</p> +<p>The latest email-related RFC (RFC5322) tried to fix the issue and provide a better +<a href="https://tools.ietf.org/html/rfc5234">formal specification</a> of the email format &ndash; but of course, it comes with plenty of +<em>obsolete</em> rules which need to be implemented. In the standard, you find +both the current grammar rule and its obsolete equivalent.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#an-extended-email-parser" aria-label="an extended email parser permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>An extended email parser</h3> +<p>Even if the email format can be defined by &quot;only&quot; 3 RFCs, you will +miss email internationalization (<a href="https://tools.ietf.org/html/rfc6532">RFC6532</a>), the MIME format +(<a href="https://tools.ietf.org/html/rfc2045">RFC2045</a>, <a href="https://tools.ietf.org/html/rfc2046">RFC2046</a>, <a href="https://tools.ietf.org/html/rfc2047">RFC2047</a>, +<a 
href="https://tools.ietf.org/html/rfc2049">RFC2049</a>), or certain details needed to be interoperable with SMTP +(<a href="https://tools.ietf.org/html/rfc5321">RFC5321</a>). There are still more RFCs which add extra features +to the email format such as S/MIME or the Content-Disposition field.</p> +<p>Given this complexity, we took the most general RFCs and tried to provide an easy way to deal +with them. The main difficulty is the <em>multipart</em> parser, which deals with email +attachments (anyone who has tried to make an HTTP 1.1 parser knows about this).</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#a-realistic-email-parser" aria-label="a realistic email parser permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>A realistic email parser</h3> +<p>Respecting the rules described by RFCs is not enough to be able to analyze any +email from the real world: existing email generators can, and do, produce +<em>non-compliant</em> email. We stress-tested <code>mrmime</code> by feeding it a batch of 2 +billion emails taken from the wild, to see if it could parse everything (even if +it does not produce the expected result). 
Whenever we noticed a recurring +formatting mistake, we updated the details of the <a href="https://tools.ietf.org/html/rfc5234">ABNF</a> to enable +<code>mrmime</code> to parse it anyway.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#a-parser-usable-by-others" aria-label="a parser usable by others permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>A parser usable by others</h3> +<p>One demonstration of the usability of <code>mrmime</code> is <a href="https://github.com/dinosaure/ocaml-dkim.git"><code>ocaml-dkim</code></a>, which +extracts a specific field from your mail and then verifies that the hash and signature +are as expected.</p> +<p><code>ocaml-dkim</code> uses the latest implementation of <a href="https://github.com/mirage/ocaml-dns.git"><code>ocaml-dns</code></a> to request +the public keys needed to verify email.</p> +<p>The most important question about <code>ocaml-dkim</code> is: is it able to +verify your email in one pass? Indeed, currently some implementations of DKIM +need 2 passes to verify your email (one to extract the DKIM signature, the other +to digest some fields and bodies). 
We focused on verifying in a <em>single</em> pass in +order to provide a unikernel SMTP <em>relay</em> with no need to store your email between +verification passes.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#an-email-generator" aria-label="an email generator permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>An email generator</h2> +<p>OCaml is a good language for making little DSLs for specialized use-cases. In this +case, we took advantage of OCaml to allow the user to easily craft an email from +nothing.</p> +<p>The idea is to build an OCaml value describing the desired email header, and +then let the Mr. MIME generator transform this into a stream of characters that +can be consumed by, for example, an SMTP implementation. 
The description step +is quite simple:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token directive property">#require</span> <span class="token string">&quot;mrmime&quot;</span> <span class="token punctuation">;;</span> +<span class="token directive property">#require</span> <span class="token string">&quot;ptime.clock.os&quot;</span> <span class="token punctuation">;;</span> + +<span class="token keyword">open</span> Mrmime + +<span class="token keyword">let</span> romain_calascibetta <span class="token operator">=</span> + <span class="token keyword">let</span> <span class="token keyword">open</span> Mailbox <span class="token keyword">in</span> + Local<span class="token punctuation">.</span><span class="token punctuation">[</span> w <span class="token string">&quot;romain&quot;</span><span class="token punctuation">;</span> w <span class="token string">&quot;calascibetta&quot;</span> <span class="token punctuation">]</span> <span class="token operator">@</span> Domain<span class="token punctuation">.</span><span class="token punctuation">(</span>domain<span class="token punctuation">,</span> <span class="token punctuation">[</span> a <span class="token string">&quot;gmail&quot;</span><span class="token punctuation">;</span> a <span class="token string">&quot;com&quot;</span> <span class="token punctuation">]</span><span class="token punctuation">)</span> + +<span class="token keyword">let</span> john_doe <span class="token operator">=</span> + <span class="token keyword">let</span> <span class="token keyword">open</span> Mailbox <span class="token keyword">in</span> + Local<span class="token punctuation">.</span><span class="token punctuation">[</span> w <span class="token string">&quot;john&quot;</span> <span class="token punctuation">]</span> <span class="token operator">@</span> Domain<span class="token punctuation">.</span><span class="token punctuation">(</span>domain<span 
class="token punctuation">,</span> <span class="token punctuation">[</span> a <span class="token string">&quot;doe&quot;</span><span class="token punctuation">;</span> a <span class="token string">&quot;org&quot;</span> <span class="token punctuation">]</span><span class="token punctuation">)</span> + <span class="token operator">|&gt;</span> with_name Phrase<span class="token punctuation">.</span><span class="token punctuation">(</span>v <span class="token punctuation">[</span> w <span class="token string">&quot;John&quot;</span><span class="token punctuation">;</span> w <span class="token string">&quot;D.&quot;</span> <span class="token punctuation">]</span><span class="token punctuation">)</span> + +<span class="token keyword">let</span> now <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + <span class="token keyword">let</span> <span class="token keyword">open</span> Date <span class="token keyword">in</span> + of_ptime <span class="token label property">~zone</span><span class="token punctuation">:</span>Zone<span class="token punctuation">.</span>GMT <span class="token punctuation">(</span>Ptime_clock<span class="token punctuation">.</span>now <span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span> + +<span class="token keyword">let</span> subject <span class="token operator">=</span> + Unstructured<span class="token punctuation">.</span><span class="token punctuation">[</span> v <span class="token string">&quot;A&quot;</span><span class="token punctuation">;</span> sp <span class="token number">1</span><span class="token punctuation">;</span> v <span class="token string">&quot;Simple&quot;</span><span class="token punctuation">;</span> sp <span class="token number">1</span><span class="token punctuation">;</span> v <span class="token string">&quot;Mail&quot;</span> <span class="token punctuation">]</span> + +<span 
class="token keyword">let</span> header <span class="token operator">=</span> + <span class="token keyword">let</span> <span class="token keyword">open</span> Header <span class="token keyword">in</span> + Field<span class="token punctuation">.</span><span class="token punctuation">(</span>Subject <span class="token operator">$</span> subject<span class="token punctuation">)</span> + <span class="token operator">&amp;</span> Field<span class="token punctuation">.</span><span class="token punctuation">(</span>Sender <span class="token operator">$</span> romain_calascibetta<span class="token punctuation">)</span> + <span class="token operator">&amp;</span> Field<span class="token punctuation">.</span><span class="token punctuation">(</span>To <span class="token operator">$</span> Address<span class="token punctuation">.</span><span class="token punctuation">[</span> mailbox john_doe <span class="token punctuation">]</span><span class="token punctuation">)</span> + <span class="token operator">&amp;</span> Field<span class="token punctuation">.</span><span class="token punctuation">(</span>Date <span class="token operator">$</span> now <span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span> + <span class="token operator">&amp;</span> empty + +<span class="token keyword">let</span> stream <span class="token operator">=</span> Header<span class="token punctuation">.</span>to_stream header + +<span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + <span class="token keyword">let</span> <span class="token keyword">rec</span> go <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + <span class="token keyword">match</span> stream <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token 
keyword">with</span> + <span class="token operator">|</span> Some buf <span class="token operator">-&gt;</span> print_string buf<span class="token punctuation">;</span> go <span class="token punctuation">(</span><span class="token punctuation">)</span> + <span class="token operator">|</span> None <span class="token operator">-&gt;</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> + <span class="token keyword">in</span> + go <span class="token punctuation">(</span><span class="token punctuation">)</span></code></pre></div> +<p>This code produces the following header:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">Date: 2 Aug 2019 14:10:10 GMT +To: John &quot;D.&quot; &lt;john@doe.org&gt; +Sender: romain.calascibetta@gmail.com +Subject: A Simple Mail</code></pre></div> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#78-character-rule" aria-label="78 character rule permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>78-character rule</h3> +<p>Email and SMTP carry some historical rules about how messages are +generated. One of them limits the number of bytes per line. Indeed, a +generator of mail should emit at most 80 bytes per line (78 characters plus the terminating CRLF) and, of course, it +should emit the email entirely line by line.</p> +<p>So <code>mrmime</code> has its own encoder which tries to wrap your mail within this limit. 
+It was mostly inspired by <a href="https://github.com/inhabitedtype/faraday">Faraday</a> and <a href="https://caml.inria.fr/pub/docs/manual-ocaml/libref/Format.html">Format</a>, combined with +GADTs to make it easy to describe how to encode/generate parts of an email.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#a-multipart-email-generator" aria-label="a multipart email generator permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>A multipart email generator</h3> +<p>Of course, the main point of an email generator is to be able to produce a multipart +email, if only to send file attachments. A lot of work +went into making parts, attaching the appropriate <code>Content-Type</code> +fields to them and merging them into one email.</p> +<p>In the end, you can easily make a stream from it which respects the rules (78 bytes +per line, emitted line by line) and feed it directly into an SMTP implementation.</p> +<p>This is what we did with the project <a href="https://github.com/dinosaure/facteur"><code>facteur</code></a>. 
It's a little +command-line tool to send mails with file attachments in pure OCaml, but for now it +works only on UNIX operating systems.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#behind-the-forest" aria-label="behind the forest permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Behind the forest</h2> +<p>Even if you are able to parse and generate an email, more work is needed to get the expected results.</p> +<p>Indeed, email is a unit of exchange between people, and the biggest challenge is +to find a common way to ensure that they can understand each other. 
About +that, encoding is probably the most important piece: a French person may want +to communicate with a <em>latin1</em> encoding, while an American person can still use ASCII.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#rosetta" aria-label="rosetta permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Rosetta</h3> +<p>To address this problem, we chose to unify all contents to UTF-8, +the most general encoding. So, we wrote some libraries which map an encoded flow +to Unicode code points, and we use <code>uutf</code> (thanks to <a href="https://github.com/dbuenzli">dbuenzli</a>) to normalize it to UTF-8.</p> +<p>The main goal is to spare the user this headache: even if the +contents of the mail are encoded with <em>latin1</em>, we make sure to translate them +correctly (and according to the RFCs) to UTF-8.</p> +<p>This project is <a href="https://github.com/mirage/rosetta"><code>rosetta</code></a> and it comes with:</p> +<ul> +<li><a href="https://github.com/mirage/uuuu"><code>uuuu</code></a> for ISO-8859 encoding</li> +<li><a href="https://github.com/mirage/coin"><code>coin</code></a> for KOI8-{R,U} encoding</li> +<li><a href="https://github.com/mirage/yuscii"><code>yuscii</code></a> for UTF-7 encoding</li> +</ul> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#pecu-and-base64" aria-label="pecu and base64 permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path 
fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Pecu and Base64</h3> +<p>Then, bodies can be encoded in several ways, two to be precise (if we stick to the main +standard):</p> +<ul> +<li>A base64 encoding, used to store your file</li> +<li>A quoted-printable encoding</li> +</ul> +<p>The <code>base64</code> package comes with a sub-package <code>base64.rfc2045</code> +which handles the special case of encoding a body according to RFC2045 and the SMTP +limitation.</p> +<p>Then, <code>pecu</code> was made to encode and decode <em>quoted-printable</em> contents. It was +tested and fuzzed, of course, like any other MirageOS library.</p> +<p>These libraries are needed for another historical reason: bytes used +to store mail should use only 7 bits instead of 8 bits. This is the purpose of +the base64 and <em>quoted-printable</em> encodings, which use only the 128 values +of a byte that fit in 7 bits. 
Again, this limitation comes from the SMTP protocol.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p><code>mrmime</code> tackles the difficult task of parsing and generating emails according to 50 years of usage, several RFCs and legacy rules. +So, it +is still an experimental project. We have reached its first version because we +are currently able to parse many mails and then generate them correctly.</p> +<p>Of course, a <em>bug</em> (a malformed mail, a server which does not respect standards +or a bad use of our API) can easily appear wherever we did not test everything. But +we have the feeling that it was time to release it and let people use +it.</p> +<p>The best feedback about <code>mrmime</code>, and the best improvements, will come from you. So don't be +afraid to use it and start to hack your emails with it.</p>https://tarides.com/blog/2019-09-25-mr-mime-parse-and-generate-emailsMr. MIME - Parse and generate emails2019-09-25T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>In our <a href="https://tarides.com/blog/2019-08-26-decompress-the-new-decompress-api.html">first article</a> we mostly discussed +the API design of <code>decompress</code> and did not talk too much about the issue of +optimizing performance. 
In this second article, we will relate our experiences +of optimizing <code>decompress</code>.</p> +<p>As you might suspect, <code>decompress</code> needs to be optimized a lot. It was used by +several projects as an underlying layer of some formats (like Git), so it can be +a real bottleneck in those projects. Of course, we start with a footgun by using +a garbage-collected language; comparing the performance of <code>decompress</code> with a C +implementation (like <a href="https://zlib.net/">zlib</a> or <a href="https://github.com/richgel999/miniz">miniz</a>) is obviously not very fair.</p> +<p>However, using something like <code>decompress</code> instead of C implementations can be +very interesting for many purposes, especially when thinking about <em>unikernels</em>. +As we said in the previous article, we can take advantage of the <em>runtime</em> +and the type-system to provide something <em>safer</em> (of course, this is only partly +true, since zlib has received several security audits).</p> +<p>The main idea in this article is not to give snippets to copy/paste into your +codebase but to explain some behaviors of the compiler / runtime and hopefully +give you some ideas about how to optimize your own code. 
We'll discuss the +following optimizations:</p> +<ul> +<li>specialization</li> +<li>inlining</li> +<li>untagged integers</li> +<li>exceptions</li> +<li>unrolling</li> +<li>hot-loop</li> +<li>caml_modify</li> +<li>representation sizes</li> +</ul> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#cautionary-advice" aria-label="cautionary advice permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Cautionary advice</h3> +<p>Before we begin discussing optimization, keep this rule in mind:</p> +<blockquote> +<p>Only perform optimization at the <strong>end</strong> of the development process.</p> +</blockquote> +<p>An optimization pass +can change your code significantly, so you need to keep a state of your project +that can be trusted. This state will provide a comparison point for both +benchmarks and behaviors. In other words, your stable implementation will be the +oracle for your benchmarks. If you start with nothing, you'll achieve +arbitrarily-good performance at the cost of arbitrary behavior!</p> +<p>We optimized <code>decompress</code> because we have been using it in bigger projects for a long +time (2 years). 
So we have an oracle (even if <code>zlib</code> can act as an oracle in +this special case).</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#specialization" aria-label="specialization permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Specialization</h2> +<p>One of the biggest specializations in <code>decompress</code> is regarding the <code>min</code> +function. In case you don't know, in OCaml <code>min</code> is polymorphic; you can compare +anything. So you probably have some concerns about how <code>min</code> is implemented.</p> +<p>You are right to be concerned: if you examine the details, <code>min</code> calls the C +function <code>do_compare_val</code>, which traverses your structure and does a comparison +according to the run-time representation of your structure. Of course, for integers, it +should be only a <code>cmpq</code> assembly instruction. 
However, some simple code like:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> x <span class="token operator">=</span> min <span class="token number">0</span> <span class="token number">1</span></code></pre></div> +<p>will produce this CMM and assembly code:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">(let x/1002 (app{main.ml:1,8-15} &quot;camlStdlib__min_1028&quot; 1 3 val) + ...)</code></pre></div> +<div class="gatsby-highlight" data-language="nasm"><pre class="language-nasm"><code class="language-nasm"><span class="token label function">.L101:</span> + movq <span class="token number">$3</span>, <span class="token operator">%</span><span class="token register variable">rbx</span> + movq <span class="token number">$1</span>, <span class="token operator">%</span><span class="token register variable">rax</span> + call camlStdlib__min_1028@PLT</code></pre></div> +<p>Note that <em><a href="https://en.wikipedia.org/wiki/Lambda_calculus#Beta_reduction">beta-reduction</a></em>, <em><a href="https://en.wikipedia.org/wiki/Inline_expansion">inlining</a></em> and +specialization were not done in this code. 
OCaml does not optimize your code +very much &ndash; the good point is predictability of the produced assembly output.</p> +<p>If you help the compiler a little bit with:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">external</span> <span class="token punctuation">(</span> <span class="token operator">&lt;=</span> <span class="token punctuation">)</span> <span class="token punctuation">:</span> int <span class="token operator">-&gt;</span> int <span class="token operator">-&gt;</span> bool <span class="token operator">=</span> <span class="token string">&quot;%lessequal&quot;</span> +<span class="token keyword">let</span> min a b <span class="token operator">=</span> <span class="token keyword">if</span> a <span class="token operator">&lt;=</span> b <span class="token keyword">then</span> a <span class="token keyword">else</span> b <span class="token punctuation">[</span><span class="token operator">@@</span>inline<span class="token punctuation">]</span> + +<span class="token keyword">let</span> x <span class="token operator">=</span> min <span class="token number">0</span> <span class="token number">1</span></code></pre></div> +<p>We have:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">(function{main.ml:2,8-43} camlMain__min_1003 (a/1004: val b/1005: val) + (if (&lt;= a/1004 b/1005) a/1004 b/1005)) + +(function camlMain__entry () + (let x/1006 1 (store val(root-init) (+a &quot;camlMain&quot; 8) 1)) 1a)</code></pre></div> +<div class="gatsby-highlight" data-language="nasm"><pre class="language-nasm"><code class="language-nasm"><span class="token label function">.L101:</span> + cmpq <span class="token operator">%</span><span class="token register variable">rbx</span>, <span class="token operator">%</span><span class="token register variable">rax</span> + jg .L100 + ret</code></pre></div> +<p>So we have all 
optimizations: in the produced code, <code>x</code> was evaluated as <code>0</code> +(<code>let x/... (store ... 1)</code>, thanks to beta-reduction and inlining) and <code>min</code> was +specialized to accept only integers &ndash; so we are able to emit <code>cmpq</code>.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#results" aria-label="results permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Results</h3> +<p>With specialization, we gained 10 Mb/s on decompression, where <code>min</code> is used +in several places. We completely avoid an indirection and a call to the slow +<code>do_compare_val</code> function.</p> +<p>This kind of specialization is already done by <a href="https://caml.inria.fr/pub/docs/manual-ocaml/flambda.html"><code>flambda</code></a>; however, we +currently use OCaml 4.07.1. 
So we decided to do this kind of optimization +ourselves.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#inlining" aria-label="inlining permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Inlining</h2> +<p>In the first example, we showed code with the <code>[@@inline]</code> attribute, which is +useful to force the compiler to inline a small function. We will step outside the +OCaml world and study C code (gcc 5.4.0) to really understand +<em>inlining</em>.</p> +<p>In fact, inlining is not necessarily the best optimization. 
Consider the +following (nonsensical) C program:</p> +<div class="gatsby-highlight" data-language="c"><pre class="language-c"><code class="language-c"><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;stdio.h&gt;</span></span> +<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;string.h&gt;</span></span> +<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;unistd.h&gt;</span></span> +<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;time.h&gt;</span></span> +<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;stdlib.h&gt;</span></span> + +<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">ifdef</span> <span class="token expression">HIDE_ALIGNEMENT</span></span> +<span class="token keyword">__attribute__</span><span class="token punctuation">(</span><span class="token punctuation">(</span>noinline<span class="token punctuation">,</span> noclone<span class="token punctuation">)</span><span class="token punctuation">)</span> +<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">endif</span></span> +<span class="token keyword">void</span> <span class="token operator">*</span> +<span class="token function">hide</span><span class="token punctuation">(</span><span class="token keyword">void</span> <span class="token operator">*</span> p<span class="token punctuation">)</span> <span class="token 
punctuation">{</span> <span class="token keyword">return</span> p<span class="token punctuation">;</span> <span class="token punctuation">}</span> + +<span class="token keyword">int</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token keyword">int</span> ac<span class="token punctuation">,</span> <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span>av<span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">)</span> +<span class="token punctuation">{</span> + <span class="token keyword">char</span> <span class="token operator">*</span>s <span class="token operator">=</span> <span class="token function">calloc</span><span class="token punctuation">(</span><span class="token number">1</span> <span class="token operator">&lt;&lt;</span> <span class="token number">20</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">;</span> + s <span class="token operator">=</span> <span class="token function">hide</span><span class="token punctuation">(</span>s<span class="token punctuation">)</span><span class="token punctuation">;</span> + + <span class="token function">memset</span><span class="token punctuation">(</span>s<span class="token punctuation">,</span> <span class="token char">'B'</span><span class="token punctuation">,</span> <span class="token number">100000</span><span class="token punctuation">)</span><span class="token punctuation">;</span> + + <span class="token class-name">clock_t</span> start <span class="token operator">=</span> <span class="token function">clock</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span> + + <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token 
keyword">int</span> i <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> i <span class="token operator">&lt;</span> <span class="token number">1280000</span><span class="token punctuation">;</span> <span class="token operator">++</span>i<span class="token punctuation">)</span> + s<span class="token punctuation">[</span><span class="token function">strlen</span><span class="token punctuation">(</span>s<span class="token punctuation">)</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token char">'A'</span><span class="token punctuation">;</span> + + <span class="token class-name">clock_t</span> end <span class="token operator">=</span> <span class="token function">clock</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span> + + <span class="token function">printf</span><span class="token punctuation">(</span><span class="token string">&quot;%lld\n&quot;</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token keyword">long</span> <span class="token keyword">long</span><span class="token punctuation">)</span> <span class="token punctuation">(</span>end<span class="token operator">-</span>start<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span> + + <span class="token keyword">return</span> <span class="token number">0</span><span class="token punctuation">;</span> +<span class="token punctuation">}</span></code></pre></div> +<p>We will compile this code with <code>-O2</code> (the second level of optimization in C), +once with <code>-DHIDE_ALIGNEMENT</code> and once without. 
The assembly emitted differs:</p> +<div class="gatsby-highlight" data-language="nasm"><pre class="language-nasm"><code class="language-nasm"><span class="token label function">.L3:</span> + movq <span class="token operator">%</span><span class="token register variable">rbp</span>, <span class="token operator">%</span><span class="token register variable">rdi</span> + call strlen + subl <span class="token number">$1</span>, <span class="token operator">%</span><span class="token register variable">ebx</span> + movb <span class="token number">$65</span>, <span class="token number">0</span>(<span class="token operator">%</span><span class="token register variable">rbp</span>,<span class="token operator">%</span><span class="token register variable">rax</span>) + jne .L3</code></pre></div> +<div class="gatsby-highlight" data-language="nasm"><pre class="language-nasm"><code class="language-nasm"><span class="token label function">.L3:</span> + movl (<span class="token operator">%</span><span class="token register variable">rdx</span>), <span class="token operator">%</span><span class="token register variable">ecx</span> + addq <span class="token number">$4</span>, <span class="token operator">%</span><span class="token register variable">rdx</span> + leal <span class="token operator">-</span><span class="token number">16843009</span>(<span class="token operator">%</span><span class="token register variable">rcx</span>), <span class="token operator">%</span><span class="token register variable">eax</span> + notl <span class="token operator">%</span><span class="token register variable">ecx</span> + andl <span class="token operator">%</span><span class="token register variable">ecx</span>, <span class="token operator">%</span><span class="token register variable">eax</span> + andl <span class="token operator">$</span><span class="token operator">-</span><span class="token number">2139062144</span>, <span class="token operator">%</span><span class="token register 
variable">eax</span> + je .L3</code></pre></div> +<p>In the first output (with <code>-DHIDE_ALIGNEMENT</code>), the optimization pass +decides not to inline <code>strlen</code>; in the second output (without +<code>-DHIDE_ALIGNEMENT</code>), it decides to inline <code>strlen</code> (and does some other clever +optimizations). The reason behind this complex behavior from the compiler is +clearly described <a href="https://stackoverflow.com/a/55589634">here</a>.</p> +<p>The point is that inlining is <strong>not</strong> an automatic win; +it might act as a <em>pessimization</em>. This is the goal of <code>flambda</code>: to do the right +optimization in the right context. If you are really curious about what <code>gcc</code> +does and why: it is very interesting, but reverse engineering the +optimization process and finding which information drives the choice to +optimize or not is deep, long and surely too complicated.</p> +<p>A less obvious optimization is to annotate some parts of your code with +<code>[@@inline never]</code> &ndash; that is, to explicitly tell the compiler not to inline the +function. 
This constraint helps the compiler generate smaller code, +which has a better chance of fitting in the processor cache.</p> +<p>For all of these reasons, <code>[@@inline]</code> should be used sparingly, and an oracle that +compares performance with and without inlining a given function is necessary to +avoid a <em>pessimization</em>.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#in-decompress" aria-label="in decompress permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>In <code>decompress</code></h3> +<p>Inlining in <code>decompress</code> was done on small functions which need to allocate +to return a value. If we inline them, we can take the opportunity to store +the returned value in registers (of course, this depends on how many registers are free).</p> +<p>As we said, the goal of the inflator is to translate a bit sequence to a byte. +The longest bit sequence possible according to RFC 1951 has length 15. So, when +we process an input flow, we consume it 15 bits at a time. 
For each packet, we +want to recognize the associated bit sequence; the bound values +are then the real length of the bit sequence and the byte:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">val</span> find <span class="token punctuation">:</span> bits<span class="token punctuation">:</span>int <span class="token operator">-&gt;</span> <span class="token punctuation">{</span> len<span class="token punctuation">:</span> int<span class="token punctuation">;</span> byte<span class="token punctuation">:</span> int<span class="token punctuation">;</span> <span class="token punctuation">}</span></code></pre></div> +<p>So for each call to this function, we need to allocate a record/tuple. This is +why we chose to inline this function. <code>min</code> was inlined too, along with some other +small functions. But as we said, the situation is complex; where we think that +<em>inlining</em> can help us, it does not always do so.</p> +<p>NOTE: we can recognize a bit sequence from, at most, 15 bits because a +<a href="https://zlib.net/feldspar.html">Huffman coding</a> is <a href="https://en.wikipedia.org/wiki/Prefix_code">prefix-free</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#untagged-integers" aria-label="untagged integers permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Untagged integers</h2> +<p>When reading assembly, the integer <code>0</code> is written as <code>$1</code>. 
+This is because of the <a href="https://blog.janestreet.com/what-is-gained-and-lost-with-63-bit-integers/">GC bit</a> needed to differentiate a pointer +from an unboxed integer. This is why, in OCaml, we talk about a 31-bit integer +or a 63-bit integer (depending on your architecture).</p> +<p>We will not try to start a debate about this arbitrary choice for the +representation of an integer in OCaml. However, we can talk about some +operations which can have an impact on performance.</p> +<p>The biggest example is the <code>mod</code> operation. In OCaml and in C, <code>mod</code> and +<code>%</code> should behave the same:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> f a b <span class="token operator">=</span> a <span class="token operator">mod</span> b</code></pre></div> +<p>The output assembly is:</p> +<div class="gatsby-highlight" data-language="nasm"><pre class="language-nasm"><code class="language-nasm"><span class="token label function">.L105:</span> + movq <span class="token operator">%</span><span class="token register variable">rdi</span>, <span class="token operator">%</span><span class="token register variable">rcx</span> + sarq <span class="token number">$1</span>, <span class="token operator">%</span><span class="token register variable">rcx</span> <span class="token operator">/</span><span class="token operator">/</span> b <span class="token operator">&gt;</span><span class="token operator">&gt;</span> <span class="token number">1</span> + movq (<span class="token operator">%</span><span class="token register variable">rsp</span>), <span class="token operator">%</span><span class="token register variable">rax</span> + sarq <span class="token number">$1</span>, <span class="token operator">%</span><span class="token register variable">rax</span> <span class="token operator">/</span><span class="token operator">/</span> a <span class="token 
operator">&gt;</span><span class="token operator">&gt;</span> <span class="token number">1</span> + testq <span class="token operator">%</span><span class="token register variable">rcx</span>, <span class="token operator">%</span><span class="token register variable">rcx</span> <span class="token operator">/</span><span class="token operator">/</span> b <span class="token operator">!</span><span class="token operator">=</span> <span class="token number">0</span> + je .L107 + cqto + idivq <span class="token operator">%</span><span class="token register variable">rcx</span> <span class="token operator">/</span><span class="token operator">/</span> a <span class="token operator">%</span> b + jmp .L106 +<span class="token label function">.L107:</span> + movq caml_backtrace_pos@GOTPCREL(<span class="token operator">%</span>rip), <span class="token operator">%</span><span class="token register variable">rax</span> + xorq <span class="token operator">%</span><span class="token register variable">rbx</span>, <span class="token operator">%</span><span class="token register variable">rbx</span> + movl <span class="token operator">%</span><span class="token register variable">ebx</span>, (<span class="token operator">%</span><span class="token register variable">rax</span>) + movq caml_exn_Division_by_zero@GOTPCREL(<span class="token operator">%</span>rip), <span class="token operator">%</span><span class="token register variable">rax</span> + call caml_raise_exn@PLT +<span class="token label function">.L106:</span> + salq <span class="token number">$1</span>, <span class="token operator">%</span><span class="token register variable">rdx</span> <span class="token operator">/</span><span class="token operator">/</span> x <span class="token operator">&lt;</span><span class="token operator">&lt;</span> <span class="token number">1</span> + incq <span class="token operator">%</span><span class="token register variable">rdx</span> <span class="token operator">/</span><span 
class="token operator">/</span> x <span class="token operator">+</span> <span class="token number">1</span> + movq <span class="token operator">%</span><span class="token register variable">rbx</span>, <span class="token operator">%</span><span class="token register variable">rax</span></code></pre></div> +<p>whereas the equivalent C code produces:</p> +<div class="gatsby-highlight" data-language="nasm"><pre class="language-nasm"><code class="language-nasm"><span class="token label function">.L2:</span> + movl <span class="token operator">-</span><span class="token number">12</span>(<span class="token operator">%</span><span class="token register variable">rbp</span>), <span class="token operator">%</span><span class="token register variable">eax</span> + cltd + idivl <span class="token operator">-</span><span class="token number">8</span>(<span class="token operator">%</span><span class="token register variable">rbp</span>) + movl <span class="token operator">%</span><span class="token register variable">edx</span>, <span class="token operator">-</span><span class="token number">4</span>(<span class="token operator">%</span><span class="token register variable">rbp</span>)</code></pre></div> +<p>First, we can notice the exception in OCaml (<code>Division_by_zero</code>), +which is pretty good because it protects us against an interrupt raised by the processor +(and keeps the trace). Then, we need to <em>untag</em> <code>a</code> and <code>b</code> with the <code>sarq</code> +instruction. Like the C code, we execute <code>idiv</code>, and then we must <em>retag</em> the returned value +<code>x</code> with <code>salq</code> and <code>incq</code>.</p> +<p>So in some places, it can be more interesting to use <code>Nativeint</code>. However, by +default, a <code>nativeint</code> is boxed. 
<em>Boxed</em> means that the value is allocated in +the OCaml heap alongside a header.</p> +<p>Of course, this is not what we want. However, if our <code>nativeint ref</code> (needed to have +side effects, like <code>x</code>) stays inside a function and we return the real +value with the dereference <code>!</code> operator, OCaml can, when the planets align, +directly use registers and real integers. So it should be possible to avoid +these conversions.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#readability-versus-performance" aria-label="readability versus performance permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Readability versus performance</h3> +<p>We use this optimization in only a few parts of the code. 
In fact, switching +between <code>int</code> and <code>nativeint</code> is a little bit noisy:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml">hold <span class="token operator">:=</span> Nativeint<span class="token punctuation">.</span>logor <span class="token operator">!</span>hold Nativeint<span class="token punctuation">.</span><span class="token punctuation">(</span>shift_left <span class="token punctuation">(</span>of_int <span class="token punctuation">(</span>unsafe_get_uint8 d<span class="token punctuation">.</span>i <span class="token operator">!</span>i_pos<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token operator">!</span>bits<span class="token punctuation">)</span></code></pre></div> +<p>In the end, we only gained 0.5Mb/s of inflation rate, so it's not worthwhile +to apply this optimization systematically. +But this case shows a more troubling problem: loss of readability.</p> +<p>In fact, we can optimize code (OCaml or C) more and more, but we lose readability +step by step. The implementation of <code>strlen</code> is a frightening +example. In the end, the loss of readability makes it harder to understand the purpose +of the code, leading to errors whenever some other person (or you in 10 years' time) +tries to make a change.</p> +<p>And we think that this kind of optimization is not the OCaml way in general: +we prefer to produce understandable and abstracted code rather than cryptic +and super fast code.</p> +<p>Again, <code>flambda</code> wants to fix this problem and let the compiler do this +optimization. 
The goal is to be able to write fast code without any pain.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#exceptions" aria-label="exceptions permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Exceptions</h2> +<p>If you remember our <a href="https://tarides.com/blog/2019-02-08-release-of-base64.html">article</a> about the release of <code>base64</code>, we talked a +bit about exceptions and used them as a <em>jump</em>. In fact, it's pretty +common for an OCaml developer to break the control flow with an exception. +What lies behind this common design/optimization is the calling convention.</p> +<p>Admittedly, the word <em>jump</em> is not the best choice to describe an OCaml exception, since +we don't use <code>setjmp</code>/<code>longjmp</code>.</p> +<p>In detail, when you start a block with a <code>try .. with</code>, OCaml saves a <em>trap</em> +on the stack which contains information about the <code>with</code>, the catcher. Then, +when you <code>raise</code>, you <em>jump</em> directly to this trap and can just discard several +stack frames (and, this way, you do not have to check each return code).</p> +<p>In several places, and mostly in the <em>hot-loop</em>, we use this <em>pattern</em>. However, +it completely breaks the control flow and can be error-prone.</p> +<p>To limit errors, and because this pattern is common, we prefer to use a <em>local</em> +exception which will be used only inside the function. 
This way, we enforce +the fact that the exception should not (and cannot) be caught anywhere +other than inside the function.</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"> <span class="token keyword">let</span> <span class="token keyword">exception</span> Break <span class="token keyword">in</span> + + <span class="token punctuation">(</span> <span class="token keyword">try</span> <span class="token keyword">while</span> <span class="token operator">!</span>max <span class="token operator">&gt;=</span> <span class="token number">1</span> <span class="token keyword">do</span> + <span class="token keyword">if</span> bl_count<span class="token punctuation">.</span><span class="token punctuation">(</span><span class="token operator">!</span>max<span class="token punctuation">)</span> <span class="token operator">!=</span> <span class="token number">0</span> <span class="token keyword">then</span> raise_notrace Break + <span class="token punctuation">;</span> decr max <span class="token keyword">done</span> <span class="token keyword">with</span> Break <span class="token operator">-&gt;</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">)</span> <span class="token punctuation">;</span></code></pre></div> +<p>The code above produces this assembly code:</p> +<div class="gatsby-highlight" data-language="nasm"><pre class="language-nasm"><code class="language-nasm"><span class="token label function">.L105:</span> + pushq <span class="token operator">%</span><span class="token register variable">r14</span> + movq <span class="token operator">%</span><span class="token register variable">rsp</span>, <span class="token operator">%</span><span class="token register variable">r14</span> +<span class="token label function">.L103:</span> + cmpq <span class="token number">$3</span>, <span class="token operator">%</span><span class="token 
register variable">rdi</span> <span class="token operator">/</span><span class="token operator">/</span> while <span class="token operator">!</span>max <span class="token operator">&gt;</span><span class="token operator">=</span> <span class="token number">1</span> + jl .L102 + movq <span class="token operator">-</span><span class="token number">4</span>(<span class="token operator">%</span><span class="token register variable">rbx</span>,<span class="token operator">%</span><span class="token register variable">rdi</span>,<span class="token number">4</span>), <span class="token operator">%</span><span class="token register variable">rsi</span> <span class="token operator">/</span><span class="token operator">/</span> bl_count,(<span class="token operator">!</span>max) + cmpq <span class="token number">$1</span>, <span class="token operator">%</span><span class="token register variable">rsi</span> <span class="token operator">/</span><span class="token operator">/</span> bl_count.(<span class="token operator">!</span>max) <span class="token operator">!</span><span class="token operator">=</span> <span class="token number">0</span> + je .L104 + movq <span class="token operator">%</span><span class="token register variable">r14</span>, <span class="token operator">%</span><span class="token register variable">rsp</span> + popq <span class="token operator">%</span><span class="token register variable">r14</span> + ret <span class="token operator">/</span><span class="token operator">/</span> raise_notrace Break +<span class="token label function">.L104:</span> + addq <span class="token operator">$</span><span class="token operator">-</span><span class="token number">2</span>, <span class="token operator">%</span><span class="token register variable">rdi</span> <span class="token operator">/</span><span class="token operator">/</span> decr max + movq <span class="token operator">%</span><span class="token register variable">rdi</span>, <span class="token 
number">16</span>(<span class="token operator">%</span><span class="token register variable">rsp</span>) + jmp .L103</code></pre></div> +<p>Here, the <code>ret</code> is the <code>raise_notrace Break</code>. A <code>raise_notrace</code> is needed; +otherwise, you will see:</p> +<div class="gatsby-highlight" data-language="nasm"><pre class="language-nasm"><code class="language-nasm"> movq caml_backtrace_pos@GOTPCREL(<span class="token operator">%</span>rip), <span class="token operator">%</span><span class="token register variable">rbx</span> + xorq <span class="token operator">%</span><span class="token register variable">rdi</span>, <span class="token operator">%</span><span class="token register variable">rdi</span> + movl <span class="token operator">%</span><span class="token register variable">edi</span>, (<span class="token operator">%</span><span class="token register variable">rbx</span>) + call caml_raise_exn@PLT</code></pre></div> +<p>instead of the <code>ret</code> instruction. Indeed, in this case, we need to store where we +raised the exception.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#unrolling" aria-label="unrolling permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Unrolling</h2> +<p>When we showed the optimization done by <code>gcc</code> when the string is aligned, <code>gcc</code> +did another optimization. 
Instead of scanning the string byte by byte, it decides to +read it 4 bytes at a time.</p> +<p>This kind of optimization is an <em>unroll</em> and we did it in <code>decompress</code>. +Indeed, when we reach the <em>copy</em> <em>opcode</em> emitted by the <a href="https://en.wikipedia.org/wiki/LZ77_and_LZ78">lz77</a> +compressor, we want to <em>blit</em> <em>length</em> byte(s) from a source to the output +flow. It turns out that this <code>memcpy</code> can be optimized to copy 4 bytes at a +time &ndash; 4 bytes is generally a good choice since it is the size of an <code>int32</code> and +should suit any architecture.</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> blit src src_off dst dst_off len <span class="token operator">=</span> + <span class="token keyword">if</span> dst_off <span class="token operator">-</span> src_off <span class="token operator">&lt;</span> <span class="token number">4</span> + <span class="token keyword">then</span> slow_blit src src_off dst dst_off len + <span class="token keyword">else</span> + <span class="token keyword">let</span> len0 <span class="token operator">=</span> len <span class="token operator">land</span> <span class="token number">3</span> <span class="token keyword">in</span> + <span class="token keyword">let</span> len1 <span class="token operator">=</span> len <span class="token operator">asr</span> <span class="token number">2</span> <span class="token keyword">in</span> + + <span class="token keyword">for</span> i <span class="token operator">=</span> <span class="token number">0</span> <span class="token keyword">to</span> len1 <span class="token operator">-</span> <span class="token number">1</span> + <span class="token keyword">do</span> + <span class="token keyword">let</span> i <span class="token operator">=</span> i <span class="token operator">*</span> <span class="token number">4</span> <span class="token keyword">in</span> + <span class="token 
keyword">let</span> v <span class="token operator">=</span> unsafe_get_uint32 src <span class="token punctuation">(</span>src_off <span class="token operator">+</span> i<span class="token punctuation">)</span> <span class="token keyword">in</span> + unsafe_set_uint32 dst <span class="token punctuation">(</span>dst_off <span class="token operator">+</span> i<span class="token punctuation">)</span> v <span class="token punctuation">;</span> + <span class="token keyword">done</span> <span class="token punctuation">;</span> + + <span class="token keyword">for</span> i <span class="token operator">=</span> <span class="token number">0</span> <span class="token keyword">to</span> len0 - <span class="token number">1</span> + <span class="token keyword">do</span> + <span class="token keyword">let</span> i <span class="token operator">=</span> len1 <span class="token operator">*</span> <span class="token number">4</span> <span class="token operator">+</span> i <span class="token keyword">in</span> + <span class="token keyword">let</span> v <span class="token operator">=</span> unsafe_get_uint8 src <span class="token punctuation">(</span>src_off <span class="token operator">+</span> i<span class="token punctuation">)</span> <span class="token keyword">in</span> + unsafe_set_uint8 dst <span class="token punctuation">(</span>dst_off <span class="token operator">+</span> i<span class="token punctuation">)</span> v <span class="token punctuation">;</span> + <span class="token keyword">done</span></code></pre></div> +<p>In this code, we first copy 4 bytes at a time and, if <code>len</code> is not +a multiple of 4, we then run the <em>trailing</em> loop to copy the remaining bytes one by one. In +this context, OCaml can <em>unbox</em> <code>int32</code> and use registers. 
So this function does +not touch the heap and, as a result, does not involve the garbage collector.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#results-1" aria-label="results 1 permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Results</h3> +<p>In the end, we gained an extra 10Mb/s of inflation rate. The <code>blit</code> function is the +most important function when it comes to inflating the window to an output flow. +Like the specialization of the <code>min</code> function, this is one of the biggest optimizations in +<code>decompress</code>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#hot-loop" aria-label="hot loop permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><em>hot-loop</em></h2> +<p>A common design in decompression (which we also find in hash implementations) +is the <em>hot-loop</em>. A <em>hot-loop</em> is mainly a loop over the most common +operation in your process. 
In the context of <code>decompress</code>, the <em>hot-loop</em> is +a repeated translation from bit sequences to byte(s), from the input flow +to the output flow and the window.</p> +<p>The main idea behind the <em>hot-loop</em> is to initialize all the information needed for +the translation before starting the <em>hot-loop</em>. Then, it's mostly an imperative +loop with a <em>pattern-matching</em> which corresponds to the current state of the +global computation.</p> +<p>In OCaml, we can take this opportunity to use <code>int ref</code> (or <code>nativeint ref</code>), which will then be translated into registers (the fastest +place to store something).</p> +<p>Another concern inside the <em>hot-loop</em> is to avoid any allocation &ndash; and that is why we +talk about <code>int</code> or <code>nativeint</code>. Indeed, a more complex structure like an option +requires an allocation, which can trigger the garbage collector (a call to <code>caml_call_gc</code>).</p> +<p>Of course, this kind of design is completely wrong if we think in a functional +way. However, this is the (biggest?) advantage of OCaml: we can hide this ugly/hacky +part inside a functional interface.</p> +<p>In the API, we talked about a state which represents the <em>inflation</em> (or the +<em>deflation</em>). At the beginning, the goal is to store in some references +essential values like the position in the input flow, the bits available, the +dictionary, etc. 
Then, we launch the <em>hot-loop</em> and only at the end do we update the state.</p> +<p>So we keep the optimal design for <em>inflation</em> inside the <em>hot-loop</em> and the functional style outside +the <em>hot-loop</em>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#caml_modify" aria-label="caml_modify permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>caml_modify</h2> +<p>One issue that we need to consider is the call to <code>caml_modify</code>. In +fact, for a complex data structure like an <code>int array</code> or an <code>int option</code> (so, +anything other than an integer, a boolean or another <em>immediate</em> value), values can move to the +major heap.</p> +<p>In this context, <code>caml_modify</code> is used to assign a new value into your mutable +block. 
It is a bit slower than a simple assignment but is needed to +keep track of pointers from the major heap to the minor heap.</p> +<p>With this OCaml code for example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> t <span class="token operator">=</span> <span class="token punctuation">{</span> <span class="token keyword">mutable</span> v <span class="token punctuation">:</span> int option <span class="token punctuation">}</span> + +<span class="token keyword">let</span> f t v <span class="token operator">=</span> t<span class="token punctuation">.</span>v <span class="token operator">&lt;-</span> v</code></pre></div> +<p>We produce this assembly:</p> +<div class="gatsby-highlight" data-language="nasm"><pre class="language-nasm"><code class="language-nasm"><span class="token label function">camlExample__f_1004:</span> + subq <span class="token number">$8</span>, <span class="token operator">%</span><span class="token register variable">rsp</span> + movq <span class="token operator">%</span><span class="token register variable">rax</span>, <span class="token operator">%</span><span class="token register variable">rdi</span> + movq <span class="token operator">%</span><span class="token register variable">rbx</span>, <span class="token operator">%</span><span class="token register variable">rsi</span> + call caml_modify@PLT + movq <span class="token number">$1</span>, <span class="token operator">%</span><span class="token register variable">rax</span> + addq <span class="token number">$8</span>, <span class="token operator">%</span><span class="token register variable">rsp</span> + ret</code></pre></div> +<p>Here we see the call to <code>caml_modify</code>, which takes care of the +assignment of <code>v</code> into <code>t.v</code>. This call is needed mostly because the type of <code>t.v</code> is not an <em>immediate</em> value like an integer. 
So, for most values in the +<em>inflator</em> and the <em>deflator</em>, we use integers.</p> +<p>Of course, at some points, we use <code>int array</code> and set its elements at some specific +points of the <em>inflator</em> &ndash; where we inflate the dictionary. However, the impact +of <code>caml_modify</code> is hard to pin down, since it is usually pretty fast.</p> +<p>Sometimes, however, it can be a real bottleneck in your computation, +depending on how long your values live in the heap. A little program (whose results are +not very reproducible) can show that:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> t <span class="token operator">=</span> Array<span class="token punctuation">.</span>init <span class="token punctuation">(</span>int_of_string Sys<span class="token punctuation">.</span>argv<span class="token punctuation">.</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">(</span><span class="token keyword">fun</span> <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> Random<span class="token punctuation">.</span>int <span class="token number">256</span><span class="token punctuation">)</span> + +<span class="token keyword">let</span> pr fmt <span class="token operator">=</span> Format<span class="token punctuation">.</span>printf fmt + +<span class="token keyword">type</span> t0 <span class="token operator">=</span> <span class="token punctuation">{</span> <span class="token keyword">mutable</span> v <span class="token punctuation">:</span> int option <span class="token punctuation">}</span> +<span class="token keyword">type</span> t1 <span class="token operator">=</span> <span class="token punctuation">{</span> v <span class="token punctuation">:</span> int option <span 
class="token punctuation">}</span> + +<span class="token keyword">let</span> f0 <span class="token punctuation">(</span>t0 <span class="token punctuation">:</span> t0<span class="token punctuation">)</span> <span class="token operator">=</span> + <span class="token keyword">for</span> i <span class="token operator">=</span> <span class="token number">0</span> <span class="token keyword">to</span> Array<span class="token punctuation">.</span>length t - <span class="token number">1</span> + <span class="token keyword">do</span> <span class="token keyword">let</span> v <span class="token operator">=</span> <span class="token keyword">match</span> t0<span class="token punctuation">.</span>v<span class="token punctuation">,</span> t<span class="token punctuation">.</span><span class="token punctuation">(</span>i<span class="token punctuation">)</span> <span class="token keyword">with</span> + <span class="token operator">|</span> Some <span class="token punctuation">_</span> <span class="token keyword">as</span> v<span class="token punctuation">,</span> <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> v + <span class="token operator">|</span> None<span class="token punctuation">,</span> <span class="token number">5</span> <span class="token operator">-&gt;</span> Some i + <span class="token operator">|</span> None<span class="token punctuation">,</span> <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> None <span class="token keyword">in</span> + t0<span class="token punctuation">.</span>v <span class="token operator">&lt;-</span> v + <span class="token keyword">done</span><span class="token punctuation">;</span> t0 + +<span class="token keyword">let</span> f1 <span class="token punctuation">(</span>t1 <span class="token punctuation">:</span> t1<span class="token punctuation">)</span> <span class="token operator">=</span> + <span class="token keyword">let</span> t1 <span class="token 
operator">=</span> ref t1 <span class="token keyword">in</span> + <span class="token keyword">for</span> i <span class="token operator">=</span> <span class="token number">0</span> <span class="token keyword">to</span> Array<span class="token punctuation">.</span>length t - <span class="token number">1</span> + <span class="token keyword">do</span> <span class="token keyword">let</span> v <span class="token operator">=</span> <span class="token keyword">match</span> <span class="token operator">!</span>t1<span class="token punctuation">.</span>v<span class="token punctuation">,</span> t<span class="token punctuation">.</span><span class="token punctuation">(</span>i<span class="token punctuation">)</span> <span class="token keyword">with</span> + <span class="token operator">|</span> Some <span class="token punctuation">_</span> <span class="token keyword">as</span> v<span class="token punctuation">,</span> <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> v + <span class="token operator">|</span> None<span class="token punctuation">,</span> <span class="token number">5</span> <span class="token operator">-&gt;</span> Some i + <span class="token operator">|</span> None<span class="token punctuation">,</span> <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> None <span class="token keyword">in</span> + t1 <span class="token operator">:=</span> <span class="token punctuation">{</span> v <span class="token punctuation">}</span> + <span class="token keyword">done</span><span class="token punctuation">;</span> <span class="token operator">!</span>t1 + +<span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + <span class="token keyword">let</span> t0 <span class="token punctuation">:</span> t0 <span class="token operator">=</span> <span class="token punctuation">{</span> v<span class="token 
operator">=</span> None <span class="token punctuation">}</span> <span class="token keyword">in</span> + <span class="token keyword">let</span> t1 <span class="token punctuation">:</span> t1 <span class="token operator">=</span> <span class="token punctuation">{</span> v<span class="token operator">=</span> None <span class="token punctuation">}</span> <span class="token keyword">in</span> + <span class="token keyword">let</span> time0 <span class="token operator">=</span> Unix<span class="token punctuation">.</span>gettimeofday <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">in</span> + ignore <span class="token punctuation">(</span>f0 t0<span class="token punctuation">)</span> <span class="token punctuation">;</span> + <span class="token keyword">let</span> time1 <span class="token operator">=</span> Unix<span class="token punctuation">.</span>gettimeofday <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">in</span> + ignore <span class="token punctuation">(</span>f1 t1<span class="token punctuation">)</span> <span class="token punctuation">;</span> + <span class="token keyword">let</span> time2 <span class="token operator">=</span> Unix<span class="token punctuation">.</span>gettimeofday <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">in</span> + + pr <span class="token string">&quot;f0: %f ns\n%!&quot;</span> <span class="token punctuation">(</span>time1 <span class="token operator">-.</span> time0<span class="token punctuation">)</span> <span class="token punctuation">;</span> + pr <span class="token string">&quot;f1: %f ns\n%!&quot;</span> <span class="token punctuation">(</span>time2 <span class="token operator">-.</span> time1<span class="token punctuation">)</span> <span class="token punctuation">;</span> + + <span class="token punctuation">(</span><span class="token 
punctuation">)</span></code></pre></div> +<p>On our bare-metal server, if you launch the program with 1000, the <code>f0</code> +computation, even though it calls <code>caml_modify</code>, is the faster of the two. However, if you +launch the program with 1000000000, <code>f1</code> is faster.</p> +<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">$ ./a.out <span class="token number">1000</span> +f0: <span class="token number">0.000006</span> ns +f1: <span class="token number">0.000015</span> ns +$ ./a.out <span class="token number">1000000000</span> +f0: <span class="token number">7.931782</span> ns +f1: <span class="token number">5.719370</span> ns</code></pre></div> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#about-decompress" aria-label="about decompress permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>About <code>decompress</code></h3> +<p>Initially, we chose, as @dbuenzli does, a mutable +structure to represent the state. Then, @yallop wrote a big patch to switch to an +immutable state and we gained 9Mb/s on <em>inflation</em>.</p> +<p>However, the new version is more focused on the <em>hot-loop</em> and it is 3 +times faster than before.</p> +<p>As we said, the impact of <code>caml_modify</code> is not clear-cut and depends a lot on +how long your data lives in the heap and how many times you want to update it. +If we confine <code>caml_modify</code> to only a few places, it should be fine. 
But it is still +one of the most complex questions in (macro?) optimization.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#smaller-representation" aria-label="smaller representation permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Smaller representation</h2> +<p>We've discussed the impact that integer types can have on the use of immediate +values. More generally, the choice of type to represent your values can have +significant performance implications.</p> +<p>For example, a dictionary which associates a bit sequence (an integer) with +its length <strong>AND</strong> the byte can be represented by an <code>(int * int) array</code>, or +more idiomatically <code>{ len: int; byte: int; } array</code> (which is structurally the +same).</p> +<p>However, that means one allocated block for every entry of the dictionary. +Extracting an entry will also need an allocation if <code>find : bits:int -&gt; { len: int; byte: int; }</code> is not inlined, as we said. And as for memory, the array can be +really <em>heavy</em> in your heap.</p> +<p>At this point, we used <code>spacetime</code> to show how many blocks we allocated for a +common <em>inflation</em> and we saw that we allocate a lot. The choice was made to use +a smaller representation. 
Since <code>len</code> cannot be greater than 15 according to RFC 1951 +and byte can represent only 256 possibilities (and so fits in one +byte), we can decide to merge them into one integer (which has at least +31 bits).</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> static_literal_tree <span class="token operator">=</span> <span class="token operator-like-punctuation punctuation">[|</span> <span class="token punctuation">(</span><span class="token number">8</span><span class="token punctuation">,</span> <span class="token number">12</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token punctuation">(</span><span class="token number">8</span><span class="token punctuation">,</span> <span class="token number">140</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token punctuation">(</span><span class="token number">8</span><span class="token punctuation">,</span> <span class="token number">76</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token operator">..</span><span class="token punctuation">.</span> <span class="token operator-like-punctuation punctuation">|]</span> +<span class="token keyword">let</span> static_literal_tree <span class="token operator">=</span> Array<span class="token punctuation">.</span>map <span class="token punctuation">(</span><span class="token keyword">fun</span> <span class="token punctuation">(</span>len<span class="token punctuation">,</span> byte<span class="token punctuation">)</span> <span class="token operator">-&gt;</span> <span class="token punctuation">(</span>len <span class="token operator">lsl</span> <span class="token number">8</span><span class="token punctuation">)</span> <span class="token operator">lor</span> byte<span class="token 
punctuation">)</span> static_literal_tree</code></pre></div> +<p>In the code above, we just translate the static dictionary (for a STATIC DEFLATE +block) to a smaller representation where <code>len</code> occupies the upper bits of the +integer and <code>byte</code> the lower 8 bits. Of course, it depends on what you +want to store.</p> +<p>Another point is readability. <a href="https://github.com/mirage/ocaml-cstruct#ppx"><code>cstruct-ppx</code></a> and +<a href="https://bitbucket.org/thanatonauts/bitstring/src"><code>bitstring</code></a> can help you but <code>decompress</code> +wants to depend only on OCaml.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>We conclude with some closing advice about optimising your OCaml programs:</p> +<ul> +<li> +<p><strong>Optimization is specific to your task</strong>. The points highlighted in this +article may not fit your particular problem, but they are intended to give you +ideas. Our optimizations were only possible because we completely assimilated +the ideas of <code>zlib</code> and had a clear vision of what we really needed to +optimize (like <code>blit</code>). +<br/><br/> +If this is your first project, this article may not help you much to optimize your +code, since it is mostly about <em>micro</em>-optimization in a specific context +(<em>hot-loop</em>). 
But it helps you to understand what is really done by the +compiler &ndash; which is still really interesting.</p> +</li> +<li> +<p><strong>Optimise only with respect to an oracle</strong>. All optimizations were possible +because we used the old implementation of +<code>decompress</code> and <code>zlib</code> as oracles to compare against. Optimizations can change the semantics of your +code and you should systematically check the expected +behavior at every step. So it's a long process.</p> +</li> +<li> +<p><strong>Use the predictability of the OCaml compiler to your advantage</strong>. For sure, +the compiler does not optimize your code a lot &ndash; but it still produces reasonable +programs in terms of performance. In many cases, <strong>you don't need</strong> to +optimize your OCaml code. And the good point is the predictable behavior. +<br/><br/> +The mental link between OCaml and the assembly exists (sometimes much more than between C +and the assembly, where we let the C compiler optimize the code). +The nice thing is that you can easily keep a mental model of what is going on in your code +without being afraid of what the compiler might produce. And, in some +critical parts like <a href="https://github.com/mirage/eqaf">eqaf</a>, it's really needed.</p> +</li> +</ul> +<p>We have not discussed benchmarking, which is another hard issue: who should you +compare with? where? how? For example, a global comparison between <code>zlib</code> and +<code>decompress</code> is not very relevant in many ways &ndash; especially because of the +garbage collector. This could be another article!</p> +<p>Finally, all of these optimizations should be done by <code>flambda</code>; the difference +between compiling <code>decompress</code> with or without <code>flambda</code> is not very big. 
We +optimized <code>decompress</code> by hand mostly to keep compatibility with OCaml (since +<code>flambda</code> needs another switch) and, in this way, to gain an understanding of +<code>flambda</code> optimizations so that we can use it effectively!</p>https://tarides.com/blog/2019-09-13-decompress-experiences-with-ocaml-optimizationDecompress: Experiences with OCaml optimization2019-09-13T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><a href="http://lcamtuf.coredump.cx/afl/">American Fuzzy Lop</a> or AFL is a <em>fuzzer</em>: a program that tries to find bugs in +other programs by sending them various auto-generated inputs. This article covers the +basics of AFL and shows an example of fuzzing a parser written in OCaml. It also introduces two +extensions: the <a href="https://github.com/stedolan/crowbar/">Crowbar</a> library which can be used to fuzz any kind of OCaml program or +function and the <a href="https://github.com/yomimono/ocaml-bun/">Bun</a> tool for integrating fuzzing into your CI.</p> +<p>All of the examples given in this article are available on GitHub at +<a href="https://github.com/NathanReb/ocaml-afl-examples">ocaml-afl-examples</a>. 
The <code>README</code> contains all the information you need to understand, +build and fuzz them yourself.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#what-is-afl" aria-label="what is afl permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What is AFL?</h2> +<p>AFL actually isn't <em>just</em> a fuzzer but a set of tools. What makes it so good is that it doesn't just +blindly send random input to your program hoping for it to crash; it inspects the execution paths +of the program and uses that information to figure out which mutations to apply to the previous +inputs to trigger new execution paths. This approach allows for much more efficient and reliable +fuzzing (as it will try to maximize coverage) but requires the binaries to be instrumented so the +execution can be monitored.</p> +<p>AFL provides wrappers for the common C compilers that you can use to produce the instrumented +binaries along with the CLI fuzzing client: <code>afl-fuzz</code>.</p> +<p><code>afl-fuzz</code> is straight-forward to use. It takes an input directory containing a few initial valid +inputs to your program, an output directory and the instrumented binary. 
It will then repeatedly +mutate the inputs and feed them to the program, registering the ones that lead to crashes or +hangs in the output directory.</p> +<p>Because it works this way, it is very easy to fuzz a parser.</p> +<p>To fuzz a <code>parse.exe</code> binary that takes a file as its first command-line argument and parses it, +you can invoke <code>afl-fuzz</code> in the following way:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">$ afl-fuzz -i inputs/ -o findings/ /path/to/parse.exe @@</code></pre></div> +<p>The <code>findings/</code> directory is where <code>afl-fuzz</code> will write the crashes it finds; it will create it +for you if it doesn't exist. +The <code>inputs/</code> directory contains one or more valid input files for your +program. By valid we mean &quot;that don't crash your program&quot;. +Finally, the <code>@@</code> part tells <code>afl-fuzz</code> where on the command line the input file should be passed to +your program, in our case, as the first argument.</p> +<p>Note that it is possible to supply <code>afl-fuzz</code> with more detail about how to invoke your program. If +you need to pass it command-line options for instance, you can run it as:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">$ afl-fuzz -i inputs/ -o findings/ -- /path/to/parse.exe --option=value @@</code></pre></div> +<p>If you wish to fuzz a program that takes its input from standard input, you can also do that by removing the +<code>@@</code> from the <code>afl-fuzz</code> invocation.</p> +<p>Once <code>afl-fuzz</code> starts, it will draw a fancy looking table on the standard output to keep you +updated about its progress. 
From there, what you'll mostly be interested in is the top right +corner, which contains the number of crashes and hangs it has found so far:</p> +<p><span class="gatsby-resp-image-wrapper" style="position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 680px; "> + <a href="https://tarides.com/static/893fd2c3d0dfbb1c576fd016b6963e96/f2793/afl_example_output.png" class="gatsby-resp-image-link" style="display: block" target="_blank" rel="noopener"> + <span class="gatsby-resp-image-background-image" style="padding-bottom: 63.52941176470588%; position: relative; bottom: 0; left: 0; background-image: url(''); background-size: cover; display: block;"></span> + <img src="https://tarides.com/static/893fd2c3d0dfbb1c576fd016b6963e96/c5bb3/afl_example_output.png" class="gatsby-resp-image-image" alt="Example output from afl-fuzz" title="Example output from afl-fuzz" srcset="/static/893fd2c3d0dfbb1c576fd016b6963e96/04472/afl_example_output.png 170w, +/static/893fd2c3d0dfbb1c576fd016b6963e96/9f933/afl_example_output.png 340w, +/static/893fd2c3d0dfbb1c576fd016b6963e96/c5bb3/afl_example_output.png 680w, +/static/893fd2c3d0dfbb1c576fd016b6963e96/f2793/afl_example_output.png 743w" sizes="(max-width: 680px) 100vw, 680px" style="width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;" loading="lazy" decoding="async"/> + </a> + </span></p> +<p>You might need to change some of your CPU settings to achieve better performance while fuzzing. 
+<code>afl-fuzz</code>'s output will tell you if that's the case and guide you through the steps required to +make that happen.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#using-afl-to-fuzz-an-ocaml-parser" aria-label="using afl to fuzz an ocaml parser permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Using AFL to fuzz an OCaml parser</h2> +<p>First of all, if you want to fuzz an OCaml program with AFL you'll need to produce an instrumented +binary. <code>afl-fuzz</code> has an option to work with regular binaries but you'd lose a lot of what makes it +efficient. To instrument your binary you can simply install a <code>+afl</code> opam switch and build your +executable from there. AFL compiler variants are available from OCaml <code>4.05.0</code> onwards. To install such +a switch you can run:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">$ opam switch create fuzzing-switch 4.07.1+afl</code></pre></div> +<p>If your program already parses the standard input or a file given to it via the command line, you +can simply build the executable from your <code>+afl</code> switch and adapt the above examples. 
If it doesn't, +it's still easy to fuzz any parsing function.</p> +<p>Imagine we have a <code>simple-parser</code> library which exposes the following <code>parse_int</code> function:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">val</span> parse_int<span class="token punctuation">:</span> string <span class="token operator">-&gt;</span> <span class="token punctuation">(</span>int<span class="token punctuation">,</span> <span class="token operator-like-punctuation punctuation">[&gt;</span> <span class="token variant symbol">`Msg</span> <span class="token keyword">of</span> string<span class="token punctuation">]</span><span class="token punctuation">)</span> result +<span class="token comment">(** Parse the given string as an int or return [Error (`Msg _)]. + Does not raise, usually... *)</span></code></pre></div> +<p>We want to use AFL to make sure our function is robust and won't crash when receiving unexpected +inputs. As you can see the function returns a result and isn't supposed to raise exceptions. We want +to make sure that's true.</p> +<p>To find crashes, AFL traps the signals sent by your program. That means that it will consider +uncaught OCaml exceptions as crashes. 
That's good because it makes it really simple to write a +<code>fuzz_me.ml</code> executable that fits what <code>afl-fuzz</code> expects:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + <span class="token keyword">let</span> file <span class="token operator">=</span> Sys<span class="token punctuation">.</span>argv<span class="token punctuation">.</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span> <span class="token keyword">in</span> + <span class="token keyword">let</span> ic <span class="token operator">=</span> open_in file <span class="token keyword">in</span> + <span class="token keyword">let</span> length <span class="token operator">=</span> in_channel_length ic <span class="token keyword">in</span> + <span class="token keyword">let</span> content <span class="token operator">=</span> really_input_string ic length <span class="token keyword">in</span> + close_in ic<span class="token punctuation">;</span> + ignore <span class="token punctuation">(</span>Simple_parser<span class="token punctuation">.</span>parse_int content<span class="token punctuation">)</span></code></pre></div> +<p>We have to provide example inputs to AFL so we can write a <code>valid</code> file to the <code>inputs/</code> directory +containing <code>123</code> and an <code>invalid</code> file containing <code>not an int</code>. 
Both should parse without crashing +and make good starting points for AFL as they should trigger different execution paths.</p> +<p>Because we want to make sure AFL does find crashes, we can try to hide a bug in our function:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> parse_int s <span class="token operator">=</span> + <span class="token keyword">match</span> List<span class="token punctuation">.</span>init <span class="token punctuation">(</span>String<span class="token punctuation">.</span>length s<span class="token punctuation">)</span> <span class="token punctuation">(</span>String<span class="token punctuation">.</span>get s<span class="token punctuation">)</span> <span class="token keyword">with</span> + <span class="token operator">|</span> <span class="token punctuation">[</span><span class="token char">'a'</span><span class="token punctuation">;</span> <span class="token char">'b'</span><span class="token punctuation">;</span> <span class="token char">'c'</span><span class="token punctuation">]</span> <span class="token operator">-&gt;</span> failwith <span class="token string">&quot;secret crash&quot;</span> + <span class="token operator">|</span> <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> <span class="token punctuation">(</span> + <span class="token keyword">match</span> int_of_string_opt s <span class="token keyword">with</span> + <span class="token operator">|</span> None <span class="token operator">-&gt;</span> Error <span class="token punctuation">(</span><span class="token variant symbol">`Msg</span> <span class="token punctuation">(</span>Printf<span class="token punctuation">.</span>sprintf <span class="token string">&quot;Not an int: %S&quot;</span> s<span class="token punctuation">)</span><span class="token punctuation">)</span> + <span class="token operator">|</span> Some i <span class="token 
operator">-&gt;</span> Ok i<span class="token punctuation">)</span></code></pre></div> +<p>Now we just have to build our native binary from the right switch and let <code>afl-fuzz</code> do the rest:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">$ afl-fuzz -i inputs/ -o findings/ ./fuzz_me.exe @@</code></pre></div> +<p>It should find that the <code>abc</code> input leads to a crash rather quickly. Once it does, you'll see it in +the top right corner of its output as shown in the picture from the previous section.</p> +<p>At this point you can interrupt <code>afl-fuzz</code> and have a look at the content of the <code>findings/crashes</code>:</p> +<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">$ <span class="token function">ls</span> findings/crashes/ +id:000000,sig:06,src:000111,op:havoc,rep:16 README.txt</code></pre></div> +<p>As you can see it contains a <code>README.txt</code> which will give you some details about the <code>afl-fuzz</code> +invocation used to find the crashes and how to reproduce them in the folder and a file of the form +<code>id:...,sig:...,src:...,op:...,rep:...</code> per crash it found. Here there's just one:</p> +<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">$ <span class="token function">cat</span> findings/crashes/id:000000,sig:06,src:000111,op:havoc,rep:16 +abc</code></pre></div> +<p>As expected it contains our special input that triggers our secret crash. 
We can rerun the program +with that input ourselves to make sure it does trigger it:</p> +<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">$ ./fuzz_me.exe findings/crashes/id:000000,sig:06,src:000111,op:havoc,rep:16 +Fatal error: exception Failure<span class="token punctuation">(</span><span class="token string">&quot;secret crash&quot;</span><span class="token punctuation">)</span></code></pre></div> +<p>No surprise here, it does trigger our uncaught exception and crashes shamefully.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#using-crowbar-and-afl-for-property-based-testing" aria-label="using crowbar and afl for property based testing permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Using Crowbar and AFL for property-based testing</h2> +<p>This works well, but only being able to fuzz parsers is quite a limitation. That's where <a href="https://github.com/stedolan/crowbar/">Crowbar</a> +comes into play.</p> +<p>Crowbar is a property-based testing framework. It's much like Haskell's <a href="http://hackage.haskell.org/package/QuickCheck">QuickCheck</a>. +To test a given function, you define how its arguments are shaped and a set of properties the result +should satisfy, and it will check that they hold for many combinations of randomly generated +arguments. 
+Let's clarify that with an example.</p> +<p>I wrote a library called <code>Awesome_list</code> and I want to test its <code>sort</code> function:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">val</span> sort<span class="token punctuation">:</span> int list <span class="token operator">-&gt;</span> int list +<span class="token comment">(** Sorts the given list of integers. Result list is sorted in increasing + order, most of the time... *)</span></code></pre></div> +<p>I want to make sure it really works so I'm going to use Crowbar to generate a whole lot of +lists of integers and verify that when I sort them with <code>Awesome_list.sort</code> the result is, well... +sorted.</p> +<p>We'll write our tests in a <code>fuzz_me.ml</code> file. +First we need to tell Crowbar how to generate arguments for our function. It exposes some +combinators to help you do that:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> int_list <span class="token operator">=</span> Crowbar<span class="token punctuation">.</span><span class="token punctuation">(</span>list <span class="token punctuation">(</span>range <span class="token number">10</span><span class="token punctuation">)</span><span class="token punctuation">)</span></code></pre></div> +<p>Here we're telling Crowbar to generate lists of any size, containing integers ranging from 0 +to 9 (the bound passed to <code>range</code> is exclusive). Crowbar also exposes more complex and custom generator combinators so don't worry, +you can use it to build more complex arguments.</p> +<p>Now we need to define our property. 
Once again it's pretty simple, we just want the output to be +sorted:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> is_sorted l <span class="token operator">=</span> + <span class="token keyword">let</span> <span class="token keyword">rec</span> is_sorted <span class="token operator">=</span> <span class="token keyword">function</span> + <span class="token operator">|</span> <span class="token punctuation">[</span><span class="token punctuation">]</span> <span class="token operator">|</span> <span class="token punctuation">[</span><span class="token punctuation">_</span><span class="token punctuation">]</span> <span class="token operator">-&gt;</span> <span class="token boolean">true</span> + <span class="token operator">|</span> hd<span class="token punctuation">::</span><span class="token punctuation">(</span>hd'<span class="token punctuation">::</span><span class="token punctuation">_</span> <span class="token keyword">as</span> tl<span class="token punctuation">)</span> <span class="token operator">-&gt;</span> hd <span class="token operator">&lt;=</span> hd' <span class="token operator">&amp;&amp;</span> is_sorted tl + <span class="token keyword">in</span> + Crowbar<span class="token punctuation">.</span>check <span class="token punctuation">(</span>is_sorted l<span class="token punctuation">)</span></code></pre></div> +<p>All that's left to do now is to register our test:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + Crowbar<span class="token punctuation">.</span>add_test <span class="token label property">~name</span><span class="token punctuation">:</span><span class="token string">&quot;Awesome_list.sort&quot;</span> <span 
class="token punctuation">[</span>int_list<span class="token punctuation">]</span> + <span class="token punctuation">(</span><span class="token keyword">fun</span> l <span class="token operator">-&gt;</span> is_sorted <span class="token punctuation">(</span>Awesome_list<span class="token punctuation">.</span>sort l<span class="token punctuation">)</span><span class="token punctuation">)</span></code></pre></div> +<p>and to compile that <code>fuzz_me.ml</code> file to a binary. Crowbar will take care of the magic.</p> +<p>We can run that binary in &quot;Quickcheck&quot; mode, where it will either try a certain number of random +inputs or keep trying until one of the properties breaks, depending on the command-line options +we pass it. +What we're interested in here is its less common &quot;AFL&quot; mode. Crowbar made it so our executable +can be used with <code>afl-fuzz</code> just like that:</p> +<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">$ afl-fuzz <span class="token parameter variable">-i</span> inputs <span class="token parameter variable">-o</span> findings -- ./fuzz_me.exe @@</code></pre></div> +<p>What will happen then is that our <code>fuzz_me.exe</code> binary will read the inputs provided by <code>afl-fuzz</code> +and use them to determine which test to run and how to generate the arguments to pass to our function. +If the properties are satisfied, the binary will exit normally; if they aren't, it will make sure +that <code>afl-fuzz</code> interprets that as a crash by raising an exception.</p> +<p>A nice side-effect of Crowbar's approach is that <code>afl-fuzz</code> will still be able to pick up +crashes. 
For instance, if we implement <code>Awesome_list.sort</code> as:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> sort <span class="token operator">=</span> <span class="token keyword">function</span> + <span class="token operator">|</span> <span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">;</span> <span class="token number">2</span><span class="token punctuation">;</span> <span class="token number">3</span><span class="token punctuation">]</span> <span class="token operator">-&gt;</span> failwith <span class="token string">&quot;secret crash&quot;</span> + <span class="token operator">|</span> <span class="token punctuation">[</span><span class="token number">4</span><span class="token punctuation">;</span> <span class="token number">5</span><span class="token punctuation">;</span> <span class="token number">6</span><span class="token punctuation">]</span> <span class="token operator">-&gt;</span> <span class="token punctuation">[</span><span class="token number">6</span><span class="token punctuation">;</span> <span class="token number">5</span><span class="token punctuation">;</span> <span class="token number">4</span><span class="token punctuation">]</span> + <span class="token operator">|</span> l <span class="token operator">-&gt;</span> List<span class="token punctuation">.</span>sort Pervasives<span class="token punctuation">.</span>compare l</code></pre></div> +<p>and use AFL and Crowbar to fuzz-test our function, it will find two crashes: one for the input +<code>[1; 2; 3]</code> which triggers a crash and one for <code>[4; 5; 6]</code> for which the <code>is_sorted</code> +property won't hold.</p> +<p>The content of the input files found by <code>afl-fuzz</code> itself won't be of much help as it needs to be +interpreted by Crowbar to build the arguments that were passed to the function to 
trigger the bug. +We can invoke the <code>fuzz_me.exe</code> binary ourselves on one of the files in <code>findings/crashes</code> +and the Crowbar binary will replay the test and give us some more helpful information about what +exactly is going on:</p> +<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">$ ./fuzz_me.exe findings/crashes/id<span class="token punctuation">\</span>:000000<span class="token punctuation">\</span>,sig<span class="token punctuation">\</span>:06<span class="token punctuation">\</span>,src<span class="token punctuation">\</span>:000011<span class="token punctuation">\</span>,op<span class="token punctuation">\</span>:flip1<span class="token punctuation">\</span>,pos<span class="token punctuation">\</span>:5 +Awesome_list.sort: <span class="token punctuation">..</span><span class="token punctuation">..</span> +Awesome_list.sort: FAIL + +When given the input: + + <span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">;</span> <span class="token number">2</span><span class="token punctuation">;</span> <span class="token number">3</span><span class="token punctuation">]</span> + +the <span class="token builtin class-name">test</span> threw an exception: + + Failure<span class="token punctuation">(</span><span class="token string">&quot;secret crash&quot;</span><span class="token punctuation">)</span> + Raised at <span class="token function">file</span> <span class="token string">&quot;stdlib.ml&quot;</span>, line <span class="token number">33</span>, characters <span class="token number">17</span>-33 + Called from <span class="token function">file</span> <span class="token string">&quot;awesome-list/fuzz/fuzz_me.ml&quot;</span>, line <span class="token number">11</span>, characters <span class="token number">78</span>-99 + Called from <span class="token function">file</span> <span class="token string">&quot;src/crowbar.ml&quot;</span>, line 
<span class="token number">264</span>, characters <span class="token number">16</span>-19 + +Fatal error: exception Crowbar.TestFailure +$ ./fuzz_me.exe findings/crashes/id<span class="token punctuation">\</span>:000001<span class="token punctuation">\</span>,sig<span class="token punctuation">\</span>:06<span class="token punctuation">\</span>,src<span class="token punctuation">\</span>:000027<span class="token punctuation">\</span>,op<span class="token punctuation">\</span>:arith16<span class="token punctuation">\</span>,pos<span class="token punctuation">\</span>:5<span class="token punctuation">\</span>,val<span class="token punctuation">\</span>:+7 +Awesome_list.sort: <span class="token punctuation">..</span><span class="token punctuation">..</span> +Awesome_list.sort: FAIL + +When given the input: + + <span class="token punctuation">[</span><span class="token number">4</span><span class="token punctuation">;</span> <span class="token number">5</span><span class="token punctuation">;</span> <span class="token number">6</span><span class="token punctuation">]</span> + +the <span class="token builtin class-name">test</span> failed: + + check <span class="token boolean">false</span> + +Fatal error: exception Crowbar.TestFailure</code></pre></div> +<p>We can see the actual inputs as well as distinguish the one that broke the invariant from the one +that triggered a crash.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#using-bun-to-run-fuzz-testing-in-ci" aria-label="using bun to run fuzz testing in ci permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 
11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Using <code>bun</code> to run fuzz testing in CI</h2> +<p>While AFL and Crowbar provide no guarantees, they can give you confidence that your implementation +is not broken. Now that you know how to use them, a natural follow-up is to want to run fuzz tests +in your CI to enforce that level of confidence.</p> +<p>The problem is, AFL isn't very CI-friendly. Its constantly refreshing output isn't going to look +great in your Travis build logs, and it doesn't tell you much beyond whether or not it found +crashes or invariant infringements.</p> +<p>Fortunately, like most problems, this one has a solution: +<a href="https://github.com/yomimono/ocaml-bun/"><code>bun</code></a>. +<code>bun</code> is a CLI wrapper around <code>afl-fuzz</code>, written in OCaml, that helps you get the best out of AFL +effortlessly. It mostly does two things:</p> +<p>The first is that it will run several <code>afl-fuzz</code> processes in parallel +(one per core by default). <code>afl-fuzz</code> starts with a bunch of deterministic steps. In my experience, +using parallel processes during this phase rarely proved very useful as they tend to find the same +bugs or slight variations of those bugs. It only achieves its full potential in the second phase of +fuzzing.</p> +<p>The second thing, which is the one we're the most interested in, is that <code>bun</code> provides a useful +and CI-friendly summary of what's going on with all the fuzzing processes so far. When one of them +finds a crash, it will stop all processes and pretty-print all of the bug-triggering inputs to help +you reproduce and debug them locally. Here is an example of <code>bun</code>'s output after a crash was found:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">Crashes found! 
Take a look; copy/paste to save for reproduction: +1432 echo JXJpaWl0IA== | base64 -d &gt; crash_0.$(date -u +%s) +1433 echo NXJhkV8QAA== | base64 -d &gt; crash_1.$(date -u +%s) +1434 echo J3Jh//9qdGFiYmkg | base64 -d &gt; crash_2.$(date -u +%s) +1435 09:35.32:[ERROR]All fuzzers finished, but some crashes were found!</code></pre></div> +<p>Using <code>bun</code> is very similar to using <code>afl-fuzz</code>. Going back to our first parser example, we can +fuzz it with <code>bun</code> like this:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">$ bun --input inputs/ --output findings/ /path/to/parse.exe</code></pre></div> +<p>You'll note that you don't need to provide the <code>@@</code> anymore. <code>bun</code> assumes that it should pass the +input as the first argument of your to-be-fuzzed binary.</p> +<p><code>bun</code> also comes with an alternative <code>no-kill</code> mode which lets all the fuzzers run indefinitely +instead of terminating them whenever a crash is discovered. It will regularly keep you updated on +the number of crashes discovered so far and when terminated will pretty-print each of them just like +it does in regular mode.</p> +<p>This mode can be convenient if you suspect your implementation may contain a lot of bugs and +you don't want to go through the whole process of fuzz testing it to only find a single bug.</p> +<p>You can use it in CI by running <code>bun --no-kill</code> via <code>timeout</code>. For instance:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">timeout --preserve-status 60m bun --no-kill --input inputs --output findings ./fuzz_me.exe</code></pre></div> +<p>will fuzz <code>fuzz_me.exe</code> for an hour no matter what happens. 
When <code>timeout</code> terminates <code>bun</code>, it will +provide you with a handful of bugs to fix!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#final-words" aria-label="final words permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Final words</h2> +<p>I really want to encourage you to use those tools and fuzzing in general. +Crowbar and <code>bun</code> are fairly new, so you will probably encounter bugs or find that they lack a feature +you want, but combined with AFL they make for very nice tools to effectively test +critical components of your OCaml code base or infrastructure and detect newly-introduced bugs. +They are already used across the MirageOS ecosystem, where they have been used to fuzz the TCP/IP stack +<a href="https://github.com/mirage/mirage-tcpip">mirage-tcpip</a> and the DHCP implementation <a href="https://github.com/mirage/charrua">charrua</a> thanks to +<a href="https://github.com/yomimono/somerandompacket">somerandompacket</a>. 
+You can consult Crowbar's <a href="https://github.com/stedolan/crowbar/issues/2">hall of fame</a> to find out about bugs uncovered by this +approach.</p> +<p>I also encourage anyone interested to join us in using this promising toolchain, report those bugs, +contribute those extra features and help the community build more robust software.</p> +<p>Finally, if you wish to learn more about how to use fuzzing efficiently for testing, I recommend the +excellent <a href="https://blog.regehr.org/archives/1687">Write Fuzzable Code</a> article by John Regehr.</p>https://tarides.com/blog/2019-09-04-an-introduction-to-fuzzing-ocaml-with-afl-crowbar-and-bunAn introduction to fuzzing OCaml with AFL, Crowbar and Bun2019-09-04T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p><a href="https://tools.ietf.org/html/rfc1951">RFC 1951</a> is one of the most used standards. Indeed, +when you launch your Linux kernel, it inflates itself according to the <a href="https://zlib.net/">zlib</a> +standard, a superset of RFC 1951. Since it is such a widely-used standard, we decided to +produce an OCaml implementation. In the process, we learned many lessons about +developing in OCaml where we would normally use C. So, we are glad to present +<a href="https://github.com/mirage/decompress"><code>decompress</code></a>.</p> +<p>One of the many users of RFC 1951 is <a href="https://git-scm.com/">Git</a>, which uses it to pack data +objects into a <a href="https://git-scm.com/book/en/v2/Git-Internals-Packfiles">PACK file</a>. At the request of <a href="https://github.com/samoht">@samoht</a>, +<code>decompress</code> appeared some years ago as a Mirage-compatible replacement for zlib +to be used for compiling a <a href="https://mirage.io/">MirageOS</a> unikernel with +<a href="https://github.com/mirage/ocaml-git/">ocaml-git</a>. 
Today, this little project reaches a major release with +substantial improvements in several domains.</p> +<p><code>decompress</code> provides an API for inflating and deflating <em>flows</em><code>[1]</code>. The main +goal is to provide a <em>platform-agnostic</em> library: one which may be compiled on +any platform, including JavaScript. We surely cannot be faster than C +implementations like <a href="https://github.com/facebook/zstd">zstd</a> or <a href="https://github.com/lz4/lz4">lz4</a>, but we can play some +optimisation tricks to help bridge the gap. Additionally, OCaml can protect the +user against a lot of bugs via the type-system <em>and</em> the runtime too (e.g. using +array bounds checking). <a href="https://github.com/mirleft/ocaml-tls"><code>ocaml-tls</code></a> was implemented partly in +response to the famous <a href="https://en.wikipedia.org/wiki/Heartbleed">failure</a> of <code>openssl</code>, a vulnerability +which could not exist in OCaml.</p> +<p><code>[1]</code>: A <em>flow</em>, in MirageOS land, is an abstraction that receives +and/or transmits data according to some standard. So it's usual to speak of a <em>TLS-flow</em>, +for example.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#api-design" aria-label="api design permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>API design</h2> +<p>The API should be the most difficult part of designing a library - it reveals +what we can do and how we should do it. 
In this way, an API should:</p> +<ol> +<li><strong>constrain the user to avoid security issues</strong>; too much freedom can be a bad</li> +</ol> +<p>thing. As an example, consider the <code>Hashtbl.create</code> function, which allows the +user to pass <code>~random:false</code> to select a fixed hash function. The resulting +hashtable suffers deterministic key collisions, which can be exploited by an +attacker. +<br/><br/> +An example of good security-awareness in API design can be seen in +<a href="https://github.com/mirage/digestif">digestif</a>, which provided an <code>unsafe_compare</code> instead of the common +<code>compare</code> function (before <code>eqaf.0.5</code>). In this way, it forced users to +create an alias of it if they wanted to use a hash in a <code>Map</code> &ndash; and, in doing +so, to acknowledge that they were not protected against a timing attack.</p> +<ol start="2"> +<li><strong>allow some degrees of freedom to fit within many environments</strong>; a</li> +</ol> +<p>constrained API cannot support a hostile context. For example, when compiling +to an <a href="https://mirage.io/blog/2018-esp32-booting">ESP32</a> target, even small details such as the length of a stream +input buffer must be user-definable. When deploying to a server, memory +consumption should be deterministic. +<br/><br/> +Of course, this is made difficult when too much freedom will enable misuse of +the API &ndash; an example is <a href="https://github.com/ocaml/dune">dune</a>, which deliberately limits +what the user can do with it.</p> +<ol start="3"> +<li><strong>imply an optimal design of how to use it</strong>. Possibilities should serve the +user, but these can make the API harder to understand; this is why +documentation is important. 
Your API should tell your users how it wants to +be treated.</li> +</ol> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#a-dbuenzli-api" aria-label="a dbuenzli api permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>A dbuenzli API</h3> +<p>From our experience with protocols and formats, one design stands out: the +<em><a href="https://github.com/dbuenzli/">dbuenzli</a> API</em>. If you have looked into some well-known libraries in the OCaml +ecosystem, you probably know <a href="https://github.com/dbuenzli/uutf">uutf</a>, <a href="https://github.com/dbuenzli/jsonm">jsonm</a> or <a href="https://github.com/dbuenzli/xmlm">xmlm</a>. All +of these libraries provide the same API for computing a Unicode/JSON/XML flow &ndash; +of course, the details are not the same.</p> +<p>From a MirageOS perspective, even if they use the <code>in_channel</code>/<code>out_channel</code> +abstraction rather than a <a href="https://github.com/mirage/mirage-flow">Mirage flow</a>, these libraries +are system-agnostic since they let the user choose input and output buffers. +Most importantly, they don't use the standard OCaml <code>Unix</code> module, which cannot +be used in a unikernel.</p> +<p>The APIs are pretty consistent and make a <em>best-effort</em><code>[2]</code> attempt at +decoding. The design has a type <em>state</em> which represents the current system +status; the user passes this to <code>decode</code>/<code>encode</code> to carry out the processing. 
+Of course, these functions have a side-effect on the state internally, but +this is hidden from the user. One advantage of including states in a design is +that the underlying implementation is very amenable to compiler optimisations (e.g. +tail-call optimisation). Internally, of course, we have a <em>porcelain</em><code>[3]</code> +implementation where every detail can have a rational explanation.</p> +<p>In the beginning, <code>decompress</code> wanted to follow the same interface without the +mutability (a choice about performance) and it did. Then, the hard test was to +use it in a bigger project; in this case, <a href="https://github.com/mirage/ocaml-git/">ocaml-git</a>. An iterative +process was used to determine what was really needed, what we should not provide +(like special cases) and what we should provide to reach a uniform API that is +not too difficult to understand.</p> +<p>From this experience, we finalised the initial <code>decompress</code> API and it did not +change significantly for 4 versions (2 years).</p> +<p><code>[2]</code>: <em>best-effort</em> means the user controls the error branch, where we don't +leak exceptions (or, more generally, any interrupts)</p> +<p><code>[3]</code>: <em>porcelain</em> means implicit invariants held in the mind of the programmer +(or the assertions/comments).</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#the-new-decompress-api" aria-label="the new decompress api permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The new 
<code>decompress</code> API</h2> +<p>The new <code>decompress</code> keeps the same inflation logic, but drastically changes the +deflator to make the <em>flush</em> operation clearer. For many purposes, people don't +want to hand-craft their compressed flows &ndash; they just want +<code>of_string</code>/<code>to_string</code> functions. However, in some contexts (like a PNG +encoder/decoder), the user should be able to play with <code>decompress</code> in detail +(OpenPGP needs this too in <a href="https://tools.ietf.org/html/rfc4880">RFC 4880</a>).</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#the-zlib-format" aria-label="the zlib format permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The Zlib format</h3> +<p>Both <code>decompress</code> and zlib use <em><a href="https://zlib.net/feldspar.html">Huffman coding</a></em>, an algorithm +for building a dictionary of variable-length codewords for a given set of +symbols (in this case, bytes). The most common byte is assigned the shortest bit +sequence; less common bytes are assigned longer codewords. Using this +dictionary, we just translate each byte into its codeword and we should achieve +a good compression ratio. Of course, there are other details, such as the fact +that all Huffman codes are <a href="https://en.wikipedia.org/wiki/Prefix_code">prefix-free</a>. 
The compression can be +taken further with the <a href="https://en.wikipedia.org/wiki/LZ77_and_LZ78">LZ77</a> algorithm.</p> +<p>The <em><a href="https://zlib.net/">zlib</a></em> format, a superset of the <a href="https://tools.ietf.org/html/rfc1951">RFC 1951</a> format, is easy +to understand. We will only consider the RFC 1951 format, since zlib adds only +minor details (such as checksums). It consists of several blocks: DEFLATE +blocks, each with a little header, and the contents. There are 3 kinds of +DEFLATE blocks:</p> +<ul> +<li>a FLAT block; no compression, just a <em>blit</em> from inputs to the current block.</li> +<li>a FIXED block; compressed using a pre-computed Huffman code.</li> +<li>a DYNAMIC block; compressed using a user-specified Huffman code.</li> +</ul> +<p>The FIXED block uses a Huffman dictionary that is computed when the OCaml runtime +is initialised. DYNAMIC blocks use dictionaries specified by the user, and so +these must be transmitted alongside the data (<em>after being compressed with +another Huffman code!</em>). The inflator decompresses this DYNAMIC dictionary and uses +it to do the <em>reverse</em> translation from bit sequences to bytes.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#inflator" aria-label="inflator permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Inflator</h3> +<p>The design of the inflator did not change a lot from the last version of +<code>decompress</code>. 
Indeed, it still takes an input, processes it and returns an +output as a flow. Of course, it can also fail with an error.</p> +<p>So the API is pretty simple:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">val</span> decode <span class="token punctuation">:</span> decoder <span class="token operator">-&gt;</span> <span class="token punctuation">[</span> <span class="token variant symbol">`Await</span> <span class="token operator">|</span> <span class="token variant symbol">`Flush</span> <span class="token operator">|</span> <span class="token variant symbol">`End</span> <span class="token operator">|</span> <span class="token variant symbol">`Malformed</span> <span class="token keyword">of</span> string <span class="token punctuation">]</span></code></pre></div> +<p>As you can see, there are 4 cases: <code>Await</code>, which expects more input; +<code>Flush</code>, which asks the user to flush the internal output buffer; <code>End</code>, +when we reach the end of the flow; and <code>Malformed</code>, when we encounter an +error.</p> +<p>For each case, the user can do several operations. 
In the <code>Await</code> +case, they can refill the decoder with another input buffer using:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">val</span> src <span class="token punctuation">:</span> decoder <span class="token operator">-&gt;</span> bigstring <span class="token operator">-&gt;</span> off<span class="token punctuation">:</span>int <span class="token operator">-&gt;</span> len<span class="token punctuation">:</span>int <span class="token operator">-&gt;</span> unit</code></pre></div> +<p>This function gives the decoder a new input of <code>len</code> bytes to read, +starting at <code>off</code> in the given <code>bigstring</code>.</p> +<p>In the <code>Flush</code> case, the user needs some information, such as how many bytes are +available in the current output buffer. We should also provide an action to +<em>flush</em> this output buffer. Finally, the output buffer itself should be supplied by +the user, who decides how many bytes to allocate to store the output flow.</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> src <span class="token operator">=</span> <span class="token punctuation">[</span> <span class="token variant symbol">`Channel</span> <span class="token keyword">of</span> in_channel <span class="token operator">|</span> <span class="token variant symbol">`Manual</span> <span class="token operator">|</span> <span class="token variant symbol">`String</span> <span class="token keyword">of</span> string <span class="token punctuation">]</span> + +<span class="token keyword">val</span> dst_rem <span class="token punctuation">:</span> decoder <span class="token operator">-&gt;</span> int +<span class="token keyword">val</span> flush <span class="token punctuation">:</span> decoder <span class="token operator">-&gt;</span> unit +<span 
class="token keyword">val</span> decoder <span class="token punctuation">:</span> src <span class="token operator">-&gt;</span> o<span class="token punctuation">:</span>bigstring <span class="token operator">-&gt;</span> w<span class="token punctuation">:</span>bigstring <span class="token operator">-&gt;</span> decoder</code></pre></div> +<p>The last function, <code>decoder</code>, is the most interesting. It lets the user, at the +beginning, choose the context in which they want to inflate inputs. So they +choose:</p> +<ul> +<li><code>src</code>, where the input flow comes from</li> +<li><code>o</code>, the output buffer</li> +<li><code>w</code>, the window buffer</li> +</ul> +<p><code>o</code> is used to store inflated outputs, <code>dst_rem</code> tells us how many +bytes the inflator stored in <code>o</code>, and <code>flush</code> simply resets <code>decoder</code> so that it can +continue processing the flow.</p> +<p><code>w</code> is needed for <a href="https://en.wikipedia.org/wiki/LZ77_and_LZ78">lz77</a> compression. However, as we said, we let +the user give us this intermediate buffer. The idea behind that is to let the +user prepare an <em>inflation</em>. For example, in <a href="https://github.com/mirage/ocaml-git/">ocaml-git</a>, instead of +allocating <code>w</code> systematically when we want to decompress a Git object, we +allocate <code>w</code> once per thread, and every inflation in that thread can use and <strong>re-use</strong> it. +In this way, we avoid systematic allocations (and allocate only once), which +can have a serious impact on performance.</p> +<p>The design boils down to one idea: a <em>description</em> step with the <code>decoder</code> +function, followed by the real computation loop with the <code>decode</code> function. The idea is to +prepare the inflation with some information (like <code>w</code> and <code>o</code>) before the main +(and the most expensive) computation. 
Internally we do that too (but it's mostly +about a macro-optimization).</p> +<p>This is the purpose of OCaml in general: to have a powerful way to describe +something (with constraints). In our case, what we need to describe is very +limited. But in other libraries like <a href="https://github.com/inhabitedtype/angstrom">angstrom</a>, the description +step is huge (describing the parser according to the BNF), and we then use it for +the main computation; in the case of angstrom, the parsing (another +example is cmdliner).</p> +<p>This is why <code>decoder</code> can be considered the main function, while <code>decode</code> can +be wrapped in a stream.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#deflator" aria-label="deflator permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Deflator</h3> +<p>The deflator is a new (complex) deal. Indeed, behind it we have two concepts:</p> +<ul> +<li>the encoder (according to RFC 1951)</li> +<li>the compressor</li> +</ul> +<p>For this new version of <code>decompress</code>, we decided to separate these concepts, with +one question driving everything: how can I plug in my own compression algorithm? 
(instead of using +<a href="https://en.wikipedia.org/wiki/LZ77_and_LZ78">LZ77</a>).</p> +<p>In fact, if you are interested in compression, several algorithms exist and, in +some contexts, it's preferable to use <a href="https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm">LZMA</a>, for example, or Rabin's +fingerprint (with <a href="https://github.com/mirage/duff">duff</a>), etc.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#functor" aria-label="functor permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Functor</h4> +<p>The first idea was to provide a <em>functor</em> which expects an implementation of the +compression algorithm. However, the indirection of a functor comes with a (big) +performance cost. 
Consider the following functor example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">module</span> <span class="token keyword">type</span> S <span class="token operator">=</span> <span class="token keyword">sig</span> + <span class="token keyword">type</span> t + <span class="token keyword">val</span> add <span class="token punctuation">:</span> t <span class="token operator">-&gt;</span> t <span class="token operator">-&gt;</span> t + <span class="token keyword">val</span> one <span class="token punctuation">:</span> t +<span class="token keyword">end</span> + +<span class="token keyword">module</span> Make <span class="token punctuation">(</span>S <span class="token punctuation">:</span> S<span class="token punctuation">)</span> <span class="token operator">=</span> <span class="token keyword">struct</span> <span class="token keyword">let</span> succ x <span class="token operator">=</span> S<span class="token punctuation">.</span>add x S<span class="token punctuation">.</span>one <span class="token keyword">end</span> + +<span class="token keyword">include</span> Make <span class="token punctuation">(</span><span class="token keyword">struct</span> + <span class="token keyword">type</span> t <span class="token operator">=</span> int + <span class="token keyword">let</span> add a b <span class="token operator">=</span> a <span class="token operator">+</span> b + <span class="token keyword">let</span> one <span class="token operator">=</span> <span class="token number">1</span> +<span class="token keyword">end</span><span class="token punctuation">)</span> + +<span class="token keyword">let</span> f x <span class="token operator">=</span> succ x</code></pre></div> +<p>Currently, with OCaml 4.07.1, the <code>f</code> function will be a <code>caml_apply2</code>. 
We might +wish for a simple <a href="https://en.wikipedia.org/wiki/Inline_expansion"><em>inlining</em></a> optimisation, allowing <code>f</code> to become an +<code>addq</code> instruction (indeed, <a href="https://caml.inria.fr/pub/docs/manual-ocaml/flambda.html"><code>flambda</code></a> does this), but optimizing +functors is hard. As we learned from <a href="https://github.com/chambart">Pierre Chambart</a>, it is possible +for the OCaml compiler to optimize functors directly, but this requires +several constraints that are difficult to respect in practice.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#split-encoder-and-compressor" aria-label="split encoder and compressor permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Split encoder and compressor</h4> +<p>So we chose to build an encoder which respects RFC 1951 and, separately, a +compressor which operates under some constraints. However, this is not what <a href="https://zlib.net/">zlib</a> does, +so we decided to provide a new design/API which does not follow, +at first glance, zlib (or other implementations like +<a href="https://github.com/richgel999/miniz">miniz</a>).</p> +<p>To be fair, the choice made by zlib and miniz follows from the first +point about APIs and from the context in which they are used. The main problem is the +shared queue between the encoder and the compressor. 
In C code, it can be hard +for the user to deal with it (they become liable for buffer overflows).</p> +<p>In OCaml and for <code>decompress</code>, the shared queue can be well abstracted and the API +can enforce assumptions (like bounds checking).</p> +<p>Even if this design is much more complex than before, test coverage is better +because we can test the encoder and the compressor separately. It breaks down the +initial black box where compression was intrinsic to encoding &ndash; which was +error-prone. Indeed, <code>decompress</code> had a bug in the generation of +Huffman codes, but we never hit it because the (bad) +compressor was not able to produce the input (a specific length with a specific +distance) needed to trigger it.</p> +<p>NOTE: you have just read the main reason for the new version of <code>decompress</code>!</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#the-compressor" aria-label="the compressor permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The compressor</h4> +<p>The compressor is the easiest part. The goal is to produce, from an input +flow, an output flow which is another (more compact) representation. This +representation consists of:</p> +<ul> +<li>A <em>literal</em>, the byte as is</li> +<li>A <em>copy</em> code with an <em>offset</em> and a <em>length</em></li> +</ul> +<p>The latter says to copy <em>length</em> byte(s) from <em>offset</em>. For example, <code>aaaa</code> can +be compressed as <code>[ Literal 'a'; Copy (offset:1, len:3) ]</code>. 
This way, instead +of 4 bytes, we have only 2 elements, which are then compressed with +<a href="https://zlib.net/feldspar.html">Huffman coding</a>. This is the main idea of the <a href="https://en.wikipedia.org/wiki/LZ77_and_LZ78">lz77</a> +compression.</p> +<p>However, the compressor needs to deal with the encoder. An easy interface, +<em>&agrave; la <a href="https://github.com/dbuenzli/uutf">uutf</a></em>, would be:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">val</span> compress <span class="token punctuation">:</span> state <span class="token operator">-&gt;</span> <span class="token punctuation">[</span> <span class="token variant symbol">`Literal</span> <span class="token keyword">of</span> char <span class="token operator">|</span> <span class="token variant symbol">`Copy</span> <span class="token keyword">of</span> <span class="token punctuation">(</span>int <span class="token operator">*</span> int<span class="token punctuation">)</span> <span class="token operator">|</span> <span class="token variant symbol">`End</span> <span class="token operator">|</span> <span class="token variant symbol">`Await</span> <span class="token punctuation">]</span></code></pre></div> +<p>But as I said, we need to feed a queue instead.</p> +<hr/> +<p>At this point, the purpose of the queue is not clear and not really explained. +The signature above is still a valid and understandable design. We could +imagine passing <code>Literal</code> and <code>Copy</code> directly to the encoder. 
However, we should +(for performance purposes) use a delaying tactic between the compressor and the +deflator<code>[4]</code>.</p> +<p>The idea behind this is to be able to implement a <em>hot-loop</em> in the encoder +which iterates over the shared queue and <em>transmits</em>/<em>encodes</em> contents +directly to the output buffer.</p> +<hr/> +<p>So, when we make a new <code>state</code>, we let the user supply their queue:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">val state : src -&gt; w:bigstring -&gt; q:queue -&gt; state +val compress : state -&gt; [ `Flush | `Await | `End ]</code></pre></div> +<p>The <code>Flush</code> case appears when the queue is full. We also find the <code>w</code> window +buffer again, which is needed to produce the <code>Copy</code> code. A <em>copy code</em> is limited +by RFC 1951: <em>offset</em> cannot exceed the length of the window +(commonly 32 KB), and <em>length</em> is limited to <code>258</code> (an arbitrary choice).</p> +<p>As for the <code>Await</code> case, the compressor comes with a <code>src</code> function, like +the inflator. Then, we added some accessors, <code>literals</code> and <code>distances</code>. 
The +compressor does not build the <a href="https://zlib.net/feldspar.html">Huffman coding</a>, which needs +frequencies, so we first need to keep counters for them inside the state, along with +a way to get them (and pass them to the encoder).</p> +<p><code>[4]</code>: On this point, you may be interested in <a href="https://www.reddit.com/r/unix/comments/6gxduc/how_is_gnu_yes_so_fast/">why GNU yes is so +fast</a>, where the secret is just buffering.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#the-encoder" aria-label="the encoder permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The encoder</h4> +<p>Finally, we can talk about the encoder, which takes the shared queue filled +by the compressor and produces an RFC 1951 compliant output flow.</p> +<p>However, we need to consider a special <em>detail</em>. When we want to make a +DYNAMIC block from frequencies and then encode the input flow, we can reach a +case where the shared queue contains an <em>opcode</em> (a <em>literal</em> or a <em>copy</em>) which +does not appear in our dictionary.</p> +<p>In fact, if we want to encode <code>[ Literal 'a'; Literal 'b' ]</code>, we will not try to +make a dictionary which contains the 256 possible values of a byte; we +will only make a dictionary from frequencies, which contains only <code>'a'</code> and +<code>'b'</code>. 
In this way, we can reach a case where the queue contains an <em>opcode</em> +(like <code>Literal 'c'</code>) which cannot be encoded by the <em>pre-determined</em> +Huffman coding &ndash; remember, the DYNAMIC block <strong>starts</strong> with +the dictionary.</p> +<p>Another point is about inputs. The encoder expects, of course, contents from +the shared queue, but it also needs the user to say how to encode them: which +block we want to emit. So it has two entries:</p> +<ul> +<li>the shared queue</li> +<li>a <em>user-entry</em></li> +</ul> +<p>So, after many real tests, we decided to provide this kind of API:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> dst <span class="token operator">=</span> <span class="token punctuation">[</span> <span class="token variant symbol">`Channel</span> <span class="token keyword">of</span> out_channel <span class="token operator">|</span> <span class="token variant symbol">`Buffer</span> <span class="token keyword">of</span> Buffer<span class="token punctuation">.</span>t <span class="token operator">|</span> <span class="token variant symbol">`Manual</span> <span class="token punctuation">]</span> + +<span class="token keyword">val</span> encoder <span class="token punctuation">:</span> dst <span class="token operator">-&gt;</span> q<span class="token punctuation">:</span>queue <span class="token operator">-&gt;</span> encoder +<span class="token keyword">val</span> encode <span class="token punctuation">:</span> encoder <span class="token operator">-&gt;</span> <span class="token punctuation">[</span> <span class="token variant symbol">`Block</span> <span class="token keyword">of</span> block <span class="token operator">|</span> <span class="token variant symbol">`Flush</span> <span class="token operator">|</span> <span class="token variant symbol">`Await</span> <span class="token punctuation">]</span> <span 
class="token operator">-&gt;</span> <span class="token punctuation">[</span> <span class="token variant symbol">`Ok</span> <span class="token operator">|</span> <span class="token variant symbol">`Partial</span> <span class="token operator">|</span> <span class="token variant symbol">`Block</span> <span class="token punctuation">]</span> +<span class="token keyword">val</span> dst <span class="token punctuation">:</span> encoder <span class="token operator">-&gt;</span> bigstring <span class="token operator">-&gt;</span> off<span class="token punctuation">:</span>int <span class="token operator">-&gt;</span> len<span class="token punctuation">:</span>int <span class="token operator">-&gt;</span> unit</code></pre></div> +<p>As expected, we take the shared queue to make a new encoder. Then, we let the +user specify which kind of block they want to encode with the <code>Block</code> +operation.</p> +<p>The <code>Flush</code> operation tries to encode all elements present inside the shared +queue according to the current block and feed the output buffer. From it, the +encoder can return several values:</p> +<ul> +<li><code>Ok</code>: the encoder encoded every <em>opcode</em> from the shared queue</li> +<li><code>Partial</code>: the output buffer is too small to encode every <em>opcode</em>; the user +should flush it and give us a new empty buffer with <code>dst</code>. Then, they must +continue with the <code>Await</code> operation.</li> +<li><code>Block</code>: the encoder reached an <em>opcode</em> which cannot be encoded with the +current block (the current dictionary). Then, the user must continue with a new +<code>Block</code> operation.</li> +</ul> +<p>The hard part is the <em>ping-pong</em> game between the user and the encoder, +where a <code>Block</code> expects a <code>Block</code> response from the user and a <code>Partial</code> expects +an <code>Await</code> response. 
But this design reveals a higher-level aspect of zlib +this time: the <em>flush</em> mode.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#the-flush-mode" aria-label="the flush mode permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The <em>flush</em> mode</h4> +<p>Firstly, we talk about <em>mode</em> because zlib does not allow the user to +decide what they want to do when we reach a <code>Block</code> or an <code>Ok</code> case. So, it +defines some <a href="https://www.bolet.org/~pornin/deflate-flush.html">under-specified <em>modes</em></a> to apply a policy of what +to do in this case.</p> +<p>In <code>decompress</code>, we initially followed the same design and saw that it may not be a good +idea: the logic is not very clear and the user may want a different +behavior. It was like a <em>black-box</em> with <em>black-magic</em>.</p> +<p>Because we decided to split encoder and compressor, the idea of the <em>flush mode</em> +does not exist anymore: the user explicitly needs to tell the encoder +what they want (make a new block? which block? keep frequencies?). So we broke +the <em>black-box</em>. But, as we said, it was possible mostly because we can safely +abstract the shared queue between the compressor and the encoder.</p> +<p>OCaml is an expressive language and we can really talk about a queue where, in +C, it would be just another <em>array</em>. 
As we said, the trade-off is about performance, +but now we give the user the possibility to write their own code for this corner-case, +which is when they reach <code>Block</code>. The behavior depends only on them.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#apis-in-general" aria-label="apis in general permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>APIs in general</h2> +<p>The biggest challenge of building a library is defining the API - you must +strike a compromise between allowing the user the flexibility to express their +use-case and constraining the user to avoid API misuse. If at all possible, +provide an <em>intuitive</em> API: spare the user from having to think about security +issues, memory consumption or performance.</p> +<p>Avoid making your API so expressive that it becomes unusable, but beware that +this sets hard limits on your users: the current <code>decompress</code> API can be used to +build <code>of_string</code> / <code>to_string</code> functions, but the opposite is not true - you +definitely cannot build a stream API from <code>of_string</code> / <code>to_string</code>.</p> +<p>The best advice when designing a library is to keep in mind what you <strong>really</strong> +want and let the other details fall into place gradually. It is very important +to work in an iterative loop of repeatedly trying to use your library; only this +can highlight bad design, corner-cases and details.</p> +<p>Finally, use and re-use it on your tests (important!) 
and inside higher-level +projects to give you interesting questions about your design. The last version +of <code>decompress</code> was not used in <a href="https://github.com/mirage/ocaml-git/">ocaml-git</a> mostly because the flush +mode was unclear.</p>https://tarides.com/blog/2019-08-26-decompress-the-new-decompress-apiDecompress: The New Decompress API2019-08-26T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We are thrilled to announce that Tarides is laureate of the 21st +edition of the <a href="http://www.enseignementsup-recherche.gouv.fr/cid5745/le-concours-i-lab-2019-un-tremplin-pour-les-entrepreneurs-de-la-deep-tech.html">i-Lab innovation contest</a> +for its innovative technological solution: <strong>OSMOSE</strong>.</p> +<p>Organized by the French Ministry of Higher Education, Research and +Innovation in partnership with Bpifrance, the objective of this +competition is to identify and support innovative technology-based +projects. This year, over 700 applications have been registered and 75 +projects rewarded. The jury was composed of fifty experts (business +leaders, researchers, former laureate, start-uppers, engineers, +consultants, investors) and was chaired by Ludovic Le Moan, CEO of +Sigfox.</p> +<p>The OSMOSE solution is a software infrastructure platform to deploy +secure and distributed IoT applications, using low-resource +constraints and providing low-latency performance. This platform is +built upon innovative and open-source projects (in particular MirageOS +and Irmin) which were started at the University of Cambridge, over 10 +years ago, where the founders of Tarides met. 
Tarides uses unikernel +technologies and applies the research done in programming languages to +real-world systems to build safe and performant applications +specialized for their runtime environment.</p> +<p>If you are interested in the project, contact us: +<a href="mailto:contact@tarides.com">contact@tarides.com</a></p> +<p>Or <a href="http://gazagnaire.org/pub/2019.02-osmose.pdf">check our position paper</a> +to learn more about it.</p>https://tarides.com/blog/2019-07-05-i-lab-2019i-Lab 20192019-07-05T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We are pleased to announce the release of OCamlFormat 0.10 (available on opam).</p> +<p>There have been numerous changes since the last release, so here is a comprehensive list of the new features and breaking changes to help the transition from OCamlFormat 0.9.</p> +<p><code>ocamlformat-0.10</code> now works on the 4.08 AST, although the formatting should not differ greatly from that of <code>ocamlformat-0.9</code> in this regard. 
+Please note that it is necessary to build <code>ocamlformat</code> with 4.08 to be able to parse new features like <code>let*</code>.</p> +<p>Upgrading from <code>ocamlformat-0.9</code> requires installing the following dependencies:</p> +<ul> +<li>ocaml-migrate-parsetree &gt;= 1.3.1 (upgrade)</li> +<li>uuseg &gt;= 10.0.0 (new)</li> +<li>uutf &gt;= 1.0.1 (upgrade)</li> +</ul> +<p>This release focuses on preserving the style of the original source and on handling more <code>ocp-indent</code> options.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#style-preservation" aria-label="style preservation permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Style preservation</h2> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#expression-grouping" aria-label="expression grouping permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Expression grouping</h3> +<p>The new option <code>exp-grouping</code> has been added to preserve the keywords <code>begin</code>/<code>end</code> that are used to delimit expressions instead of 
parentheses. <code>exp-grouping=parens</code> always uses parentheses to delimit expressions. <code>exp-grouping=preserve</code> preserves the original grouping syntax (parentheses or <code>begin</code>/<code>end</code>).</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#horizontal-alignment" aria-label="horizontal alignment permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Horizontal alignment</h3> +<p>Horizontal alignment is something that users often use to make pattern-matching or type declarations easier to read, and it is a feature that has been requested many times. 
Three new options have been added to horizontally align the lines.</p> +<p><code>align-cases</code> horizontally aligns the match/try cases:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> fooooooooooo <span class="token operator">=</span> + <span class="token keyword">match</span> foooooooooooooooooooooooo <span class="token keyword">with</span> + <span class="token operator">|</span> Bfooooooooooooooooo <span class="token operator">-&gt;</span> foooooooooooo + <span class="token operator">|</span> C <span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">,</span> c<span class="token punctuation">,</span> d<span class="token punctuation">)</span> <span class="token operator">-&gt;</span> fooooooooooooooooooo + <span class="token operator">|</span> <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> fooooooooooooooooooo</code></pre></div> +<p><code>align-constructors-decl</code> horizontally aligns type declarations:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> t <span class="token operator">=</span> + <span class="token operator">|</span> <span class="token punctuation">(</span> <span class="token punctuation">::</span> <span class="token punctuation">)</span> <span class="token keyword">of</span> a <span class="token operator">*</span> b + <span class="token operator">|</span> <span class="token punctuation">[</span><span class="token punctuation">]</span> <span class="token keyword">of</span> looooooooooooooooooooooooooooooooooooooong_break</code></pre></div> +<p><code>align-variants-decl</code> horizontally aligns variants type declarations:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span 
class="token keyword">type</span> x <span class="token operator">=</span> + <span class="token punctuation">[</span> <span class="token variant symbol">`Foooooooo</span> <span class="token keyword">of</span> int + <span class="token operator">|</span> <span class="token variant symbol">`Fooooooooooooo</span> <span class="token keyword">of</span> int <span class="token punctuation">]</span></code></pre></div> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#preserve-blank-lines-in-sequences" aria-label="preserve blank lines in sequences permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Preserve blank lines in sequences</h3> +<p>The new option <code>sequence-blank-line</code> decides whether a blank line is preserved between expressions of a sequence. <code>sequence-blank-line=compact</code> will not keep any blank line between expressions of a sequence, this is still the default behavior. 
<code>sequence-blank-line=preserve</code> will keep a blank line between two expressions of a sequence if the input contains at least one.</p> +<p>This option can help preserve the readability of the code in this situation:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> foo x y <span class="token operator">=</span> + do_some_setup y <span class="token punctuation">;</span> + + important_function x</code></pre></div> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#supporting-more-ocp-indent-options" aria-label="supporting more ocp indent options permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Supporting more <code>ocp-indent</code> options</h2> +<p>The long-term goal of <code>ocamlformat</code> is to handle every <code>ocp-indent</code> option. This release gets closer to that goal, as the following <code>ocp-indent</code> options are now supported by <code>ocamlformat</code>:</p> +<ul> +<li>max_indent</li> +<li>with</li> +<li>strict_with</li> +<li>ppx_stritem_ext</li> +<li>base</li> +<li>in</li> +<li>type</li> +</ul> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#offset-added-to-a-new-line" aria-label="offset added to a new line permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 
2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Offset added to a new line</h3> +<p>The new option <code>max-indent</code> sets the maximum offset (number of columns) added to a new line in addition to the offset of the previous line. If this offset is set to 2 columns, then each new line can only be indented by 2 columns more in addition to the previous line, for example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + fooooo + <span class="token operator">|&gt;</span> List<span class="token punctuation">.</span>iter <span class="token punctuation">(</span><span class="token keyword">fun</span> x <span class="token operator">-&gt;</span> + <span class="token keyword">let</span> x <span class="token operator">=</span> x <span class="token operator">$</span> y <span class="token keyword">in</span> + fooooooooooo x<span class="token punctuation">)</span></code></pre></div> +<p>This option is equivalent to the <code>max_indent</code> option of <code>ocp-indent</code>, and it will be set if <code>max_indent</code> is set in an <code>.ocp-indent</code> configuration file.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#indentation-of-pattern-matching-cases" aria-label="indentation of pattern matching cases permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 
4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Indentation of pattern matching cases</h3> +<p>The new options <code>function-indent</code> and <code>match-indent</code> respectively decide the indentation of function cases and the indentation of match/try cases. +These options are equivalent to the <code>with</code> option of <code>ocp-indent</code>, and they will be set if <code>with</code> is set in an <code>.ocp-indent</code> configuration file. +If the indentation is set to 4 columns, cases are formatted like this:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> foooooooo <span class="token operator">=</span> <span class="token keyword">function</span> + <span class="token operator">|</span> fooooooooooooooooooooooo <span class="token operator">-&gt;</span> foooooooooooooooooooooooooo + +<span class="token keyword">let</span> foooooooo <span class="token operator">=</span> + <span class="token keyword">match</span> fooooooooooooooooooooooo <span class="token keyword">with</span> + <span class="token operator">|</span> fooooooooooooooooooooooo <span class="token operator">-&gt;</span> foooooooooooooooooooooooooo</code></pre></div> +<p>The new options <code>function-indent-nested</code> and <code>match-indent-nested</code> respectively decide whether the <code>function-indent</code> and the <code>match-indent</code> parameters should be applied even when in a sub-block. If these options are set to <code>never</code>, the <code>function-indent</code> or <code>match-indent</code> parameter is only applied if the function or match block starts a line. If these options are set to <code>always</code>, then the indent parameters are always applied. 
The <code>auto</code> value lets <code>ocamlformat</code> apply the indentation parameter when it sees fit.</p> +<p>These options are equivalent to the <code>strict_with</code> option of <code>ocp-indent</code>, and they will be set if <code>strict_with</code> is set in an <code>.ocp-indent</code> configuration file.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#indentation-inside-extension-nodes" aria-label="indentation inside extension nodes permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Indentation inside extension nodes</h3> +<p>The new option <code>extension-indent</code> sets the indentation of items (that are not at structure level) inside extension nodes. +The new option <code>stritem-extension-indent</code> sets the indentation of structure items inside extension nodes. 
This option is equivalent to the <code>ppx_stritem_ext</code> option of <code>ocp-indent</code>, and it will be set if <code>ppx_stritem_ext</code> is set in an <code>.ocp-indent</code> configuration file.</p> +<p>For example if <code>extension-indent</code> is set to 5 and <code>stritem-extension-indent</code> is set to 3:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> foo <span class="token operator">=</span> + <span class="token punctuation">[</span><span class="token operator">%</span>foooooooooo + fooooooooooooooooooooooooooo foooooooooooooooooooooooooooooooooo + foooooooooooooooooooooooooooo<span class="token punctuation">]</span> + <span class="token punctuation">[</span><span class="token operator">@@</span>foooooooooo + fooooooooooooooooooooooooooo foooooooooooooooooooooooooooooooooo + foooooooooooooooooooooooooooo<span class="token punctuation">]</span> + +<span class="token punctuation">[</span><span class="token operator">@@@</span>foooooooooo + fooooooooooooooooooooooooooo foooooooooooooooooooooooooooooooooo + foooooooooooooooooooooooooooo<span class="token punctuation">]</span></code></pre></div> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#let-binding-indentation" aria-label="let binding indentation permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Let-binding indentation</h3> +<p>The new option <code>let-binding-indent</code> sets the indentation of let binding 
expressions if they do not fit on a single line. This option is equivalent to the <code>base</code> option of <code>ocp-indent</code>. +The new option <code>indent-after-in</code> sets the indentation after <code>let ... in</code>, unless followed by another <code>let</code>. This option is equivalent to the <code>in</code> option of <code>ocp-indent</code>. +The new option <code>type-decl-indent</code> sets the indentation of type declarations if they do not fit on a single line. This option is equivalent to the <code>type</code> option of <code>ocp-indent</code>.</p> +<p>These options will be set if their <code>ocp-indent</code> counterparts are set in an <code>.ocp-indent</code> configuration file.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#miscellaneous-features" aria-label="miscellaneous features permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Miscellaneous features</h2> +<p>This release also brings some new options, new values for existing features, or corrects erroneous behaviours.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#indicate-multiline-delimiters" aria-label="indicate multiline delimiters permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 
2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Indicate multiline delimiters</h3> +<p>The former <code>indicate-multiline-delimiters</code> boolean option is now a 3-valued option:</p> +<ul> +<li><code>indicate-multiline-delimiters=space</code> (was equivalent to <code>true</code>) prints a space inside the delimiter to indicate the matching one is on a different line.</li> +<li><code>indicate-multiline-delimiters=no</code> (was equivalent to <code>false</code>) doesn't do anything special to indicate the closing delimiter.</li> +<li><code>indicate-multiline-delimiters=closing-on-separate-line</code> is the new value of this option; it makes sure that the closing delimiter is on its own line.</li> +</ul> +<p>In this example, the closing parentheses delimiting the nested pattern-matchings are on their own lines and are aligned with the matching opening parentheses:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + <span class="token keyword">match</span> v <span class="token keyword">with</span> + <span class="token operator">|</span> None <span class="token operator">-&gt;</span> None + <span class="token operator">|</span> Some x <span class="token operator">-&gt;</span> + <span class="token punctuation">(</span> <span class="token keyword">match</span> x <span class="token keyword">with</span> + <span class="token operator">|</span> None <span class="token operator">-&gt;</span> None + <span class="token operator">|</span> Some x <span class="token operator">-&gt;</span> + <span class="token punctuation">(</span> <span class="token keyword">match</span> x <span class="token keyword">with</span> + <span class="token 
operator">|</span> None <span class="token operator">-&gt;</span> None + <span class="token operator">|</span> Some x <span class="token operator">-&gt;</span> x + <span class="token punctuation">)</span> + <span class="token punctuation">)</span></code></pre></div> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#formatting-of-literal-strings" aria-label="formatting of literal strings permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Formatting of literal strings</h3> +<p><code>break-string-literals=newlines</code> now takes into account pretty-printing commands like <code>@,</code>, <code>@;</code> and <code>@\n</code> to produce more readable strings. 
A new value for this option has been added, <code>break-string-literals=newlines-and-wrap</code>, to break lines at newline delimiters (including pretty-printing commands) and also wrap the string literals at the margin.</p> +<p>Here is how <code>break-string-literals=newlines-and-wrap</code> formats a string:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> fooooooooooo <span class="token operator">=</span> + <span class="token string">&quot;Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod \ + tempor incididunt ut labore et dolore magna aliqua.@;\ + Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi \ + ut aliquip ex ea commodo consequat.@;\ + Duis aute irure dolor in reprehenderit in voluptate velit esse cillum \ + dolore eu fugiat nulla pariatur.@;\ + Excepteur sint occaecat cupidatat non proident, sunt in culpa qui \ + officia deserunt mollit anim id est laborum.&quot;</span></code></pre></div> +<p><strong>Warning:</strong> the <code>break-string-literals</code> option will likely be removed in the next release, and the default behavior will be <code>newlines-and-wrap</code>.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#break-before-the-in-keyword" aria-label="break before the in keyword permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Break before the <code>in</code> keyword</h3> +<p>The new option <code>break-before-in</code> has been 
added to decide whether the line should break before the <code>in</code> keyword of a <code>let</code> binding. <code>break-before-in=fit-or-vertical</code> will always break the line before the <code>in</code> keyword if the whole <code>let</code> binding does not fit on a single line, it is still the default behavior. <code>break-before-in=auto</code> will only break the line if the <code>in</code> keyword does not fit on the previous line.</p> +<p>For example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> <span class="token punctuation">_</span> <span class="token operator">=</span> + <span class="token keyword">let</span> short <span class="token operator">=</span> this is short <span class="token keyword">in</span> + <span class="token keyword">let</span> fooo <span class="token operator">=</span> + <span class="token punctuation">(</span>this is very long<span class="token punctuation">)</span> but <span class="token punctuation">(</span>the <span class="token keyword">in</span> keyword can fit<span class="token punctuation">)</span> on the same line <span class="token keyword">in</span> + foooooo</code></pre></div> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#indentation-of-nested-pattern-matching" aria-label="indentation of nested pattern matching permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Indentation of nested pattern-matching</h3> +<p>The new option 
<code>nested-match</code> defines the style of pattern-matchings nested in the last case of another pattern-matching. <code>nested-match=wrap</code> wraps the nested pattern-matching with parentheses and adds indentation, this is still the default behavior. <code>nested-match=align</code> vertically aligns the nested pattern-matching under the encompassing pattern-matching, for example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + <span class="token keyword">match</span> v <span class="token keyword">with</span> + <span class="token operator">|</span> None <span class="token operator">-&gt;</span> None + <span class="token operator">|</span> Some x <span class="token operator">-&gt;</span> + <span class="token keyword">match</span> x <span class="token keyword">with</span> + <span class="token operator">|</span> None <span class="token operator">-&gt;</span> None + <span class="token operator">|</span> Some x <span class="token operator">-&gt;</span> x</code></pre></div> +<p>The new option <code>cases-matching-exp-indent</code> decides the indentation of cases right-hand sides which are <code>match</code> or <code>try</code> expressions. <code>cases-matching-exp-indent=compact</code> forces an indentation of 2, unless <code>nested-match</code> is set to <code>align</code> and this is the last case of the pattern matching. <code>compact</code> is the default behavior. 
<code>cases-matching-exp-indent=normal</code> indents as it would any other expression.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#whitelist-of-files-to-format" aria-label="whitelist of files to format permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Whitelist of files to format</h3> +<p>A new kind of configuration file is now handled by <code>ocamlformat</code>: <code>.ocamlformat-enable</code> files. +When the <code>disable</code> option is set, an <code>.ocamlformat-enable</code> file can list the files that <code>ocamlformat</code> should format anyway. 
Each line in an <code>.ocamlformat-enable</code> file specifies a filename relative to the directory containing the <code>.ocamlformat-enable</code> file.</p> +<p>The <code>.ocamlformat-enable</code> files use the same syntax as the <code>.ocamlformat-ignore</code> files: lines starting with <code>#</code> are ignored and can be used as comments.</p> +<p>These new configuration files do not conflict with the existing <code>.ocamlformat-ignore</code> files, as <code>.ocamlformat-enable</code> files are only considered when <code>disable</code> is set, and <code>.ocamlformat-ignore</code> files are only considered when <code>disable</code> is not set.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#disable-outside-detected-project" aria-label="disable outside detected project permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Disable outside detected project</h3> +<p>The <code>disable-outside-detected-project</code> option is now set by default.</p> +<p>When the option <code>--enable-outside-detected-project</code> is not set, <code>.ocamlformat</code> files outside of the project (including the one in <code>XDG_CONFIG_HOME</code>) are not read. The project root of an input file is taken to be the nearest ancestor directory that contains a .git or .hg or dune-project file. 
If no config file is found, formatting is disabled.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#space-around-collection-expressions" aria-label="space around collection expressions permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Space around collection-expressions</h3> +<p>The former option <code>space-around-collection-expressions</code>, which decided whether a space should be added inside the delimiters of collection expressions (lists, arrays, records, variants), has been replaced by 4 new options: <code>space-around-arrays</code>, <code>space-around-lists</code>, <code>space-around-records</code> and <code>space-around-variants</code>, to allow finer-grained customization.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#fit-or-vertical-mode-for-pattern-matching" aria-label="fit or vertical mode for pattern matching permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Fit-or-vertical mode for pattern matching</h3> +<p>The <code>break-cases</code> option that decides the shape of pattern matching 
has a new value <code>fit-or-vertical</code>. <code>break-cases=fit-or-vertical</code> tries to fit all or-patterns on the same line, otherwise breaks each or-pattern (they are wrapped in other modes). +For example if this set of or-patterns does not fit on a single line, we get the following output:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> ffffff <span class="token operator">=</span> + <span class="token keyword">match</span> foooooooooooo <span class="token keyword">with</span> + <span class="token operator">|</span> Aaaaaaaaaaaaaaaaa + <span class="token operator">|</span> Bbbbbbbbbbbbbbbbb + <span class="token operator">|</span> Ccccccccccccccccc + <span class="token operator">|</span> Ddddddddddddddddd + <span class="token operator">|</span> Eeeeeeeeeeeeeeeee <span class="token operator">-&gt;</span> foooooooooooooooooooo + <span class="token operator">|</span> Fffffffffffffffff <span class="token operator">-&gt;</span> fooooooooooooooooo</code></pre></div> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#kr-style-for-if-then-else" aria-label="kr style for if then else permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>K&amp;R style for if-then-else</h3> +<p>The <code>if-then-else</code> option now has a new value <code>k-r</code> that uses parentheses (when necessary) to reproduce a formatting close to the K&amp;R style. 
For example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> <span class="token punctuation">_</span> <span class="token operator">=</span> + <span class="token keyword">if</span> b <span class="token keyword">then</span> <span class="token punctuation">(</span> + something loooooooooooooooooooooooooooooooong enough to_trigger a break <span class="token punctuation">;</span> + this is more + <span class="token punctuation">)</span> <span class="token keyword">else</span> <span class="token keyword">if</span> b1 <span class="token keyword">then</span> <span class="token punctuation">(</span> + something loooooooooooooooooooooooooooooooong enough to_trigger a break <span class="token punctuation">;</span> + this is more + <span class="token punctuation">)</span> <span class="token keyword">else</span> + e</code></pre></div> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#breaking-changes" aria-label="breaking changes permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Breaking changes</h2> +<ul> +<li>the <code>indicate-multiline-delimiters</code> option is no longer a boolean option but now has 3 values: <code>space</code>, <code>no</code> and <code>closing-on-separate-line</code> that are detailed in this patch note.</li> +<li>the <code>disable-outside-detected-project</code> option is now set by default.</li> +<li>the <code>default</code> preset profile has been removed (it was equivalent 
to the <code>ocamlformat</code> profile with <code>break-cases=fit</code>).</li> +<li>the <code>space-around-collection-expressions</code> option has been replaced by 4 new options: <code>space-around-arrays</code>, <code>space-around-lists</code>, <code>space-around-records</code> and <code>space-around-variants</code>.</li> +</ul> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#whats-next" aria-label="whats next permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What's next?</h2> +<p>We strongly encourage our users to try out the <code>conventional</code> preset profile, as we plan to make it the default profile in a future release. 
This profile's purpose is to reproduce the most commonly encountered styles, and it may be more pleasing to the eye than the current default options.</p> +<p>As stated previously, the <code>break-string-literals</code> option will likely be removed in the next release, and the default behavior will be <code>newlines-and-wrap</code>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#credits" aria-label="credits permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Credits</h2> +<p>This release also contains many other changes and bug fixes that we cannot detail here.</p> +<p>We would like to thank our maintainers and contributors for this release: Jules Aguillon, Josh Berdine, Hugo Heuzard, Guillaume Petiot and Thomas Refis, and especially our industrial users Jane Street, Ahrefs and Nomadic Labs, who made this work possible by funding this project and providing helpful contributions and feedback.</p> +<p>We would be happy to provide support for more customers; please contact us at <a href="mailto:contact@tarides.com">contact@tarides.com</a>.</p> +<p>If you wish to get involved with OCamlFormat development or file an issue, please read the <a href="https://github.com/ocaml-ppx/ocamlformat/blob/master/CONTRIBUTING.md">contributing guide</a>. Any contribution is welcome.</p>https://tarides.com/blog/2019-06-27-release-of-ocamlformat-0-10Release of OCamlFormat 0.102019-06-27T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Over the past few months, we have been heavily engaged in release 
+engineering the <a href="https://github.com/mirage/irmin/issues/658">Irmin 2.0 release</a>, +which covers multiple years of work on all of its constituent +elements. We first began Irmin in late 2013 to act as a +<a href="https://mirage.io/blog/introducing-irmin">Git-like distributed and branchable storage substrate</a> +that would let us escape the <a href="https://www.cl.cam.ac.uk/~pes20/SOSP15-paper102-submitted.pdf">perils of POSIX filesystems</a>.</p> +<p>The Irmin libraries provide snapshotting, branching and merging +operations over storage and can communicate via Git both on-disk and +remotely. Irmin today therefore consists of many discrete OCaml +libraries that compose together to form a set of <a href="https://blog.acolyer.org/2015/01/14/mergeable-persistent-data-structures/">mergeable data structures</a> +that can be used in MirageOS unikernels and normal OCaml daemons such +as <a href="http://tezos.com">Tezos</a>.</p> +<p>In this blog post, we wanted to explain some of the release +engineering ongoing, and to highlight some areas where we could use +help from the community to test out pieces (and hopefully find your +own uses in your own infrastructure for it). 
The overall effort is +tracked in <a href="https://github.com/mirage/irmin/issues/658">mirage/irmin#658</a>, so +feel free to comment on there as well.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#ocaml-git" aria-label="ocaml git permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>ocaml-git</h3> +<p>Irmin is parameterised over the exact communication mechanisms it uses +between nodes, both as an on-disk format and also the remoting +protocol. The most important concrete implementation is Git, which +has turned into the world&rsquo;s most popular version control system. In +order to seamlessly integrate with Irmin, we embarked on an effort to +build a complete re-implementation of +<a href="https://github.com/mirage/ocaml-git">Git from scratch in pure OCaml</a>.</p> +<p>You can read <a href="https://tarides.com/blog/2018-10-19-ocaml-git-2-0.html">details of the git 2.0 release</a> +on this blog, but from a release engineering perspective we have steadily +been fixing corner cases in this implementation. The development +ocaml-git trees feature <a href="https://github.com/mirage/ocaml-git/pull/348">fixes to https+git</a>, +for <a href="https://github.com/mirage/ocaml-git/pull/351">listing remotes</a>, supporting +<a href="https://github.com/mirage/ocaml-git/pull/341">authenticated URIs</a> and +more.</p> +<p>These fixes are possible because users tried end-to-end usecases that +found these corner cases, so we&rsquo;d really like to see more. 
For +example, our friends at <a href="https://robur.io">Robur</a> have submitted fixes +from their integration of it into their upcoming <a href="https://github.com/roburio/caldav">CalDAV engine</a>. +The Mirage <a href="https://github.com/Engil/Canopy">canopy</a> blog engine can now also +push/pull reliably from pure MirageOS unikernels between nodes, which +is a huge step.</p> +<p>If you get a chance to try ocaml-git in your infrastructure, please +let us know how you get along as we prepare a release of the git +libraries with all these fixes (which will be used in Irmin 2.0).</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#wodan" aria-label="wodan permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Wodan</h3> +<p>Irmin&rsquo;s storage layer is also well abstracted, so backends other than +a Unix filesystem or Git are supported. Irmin can run in highly +diverse and OS-free environments, and so we began engineering the +<a href="https://github.com/mirage/wodan">Wodan filesystem</a> as a +domain-specific filesystem designed for MirageOS, Irmin and modern +flash drives. See <a href="https://g2p.github.io/research/wodan.pdf">the OCaml Workshop 2017 abstract on +it</a> for more design +rationale.</p> +<p>As part of the Irmin 2.0 release, Wodan is also being prepared for a +release, and you can find <a href="https://github.com/mirage/wodan/tree/master/src/wodan-irmin">Irmin 2.0 +support</a> +in the source. 
If you&rsquo;d like a standalone block-device based +persistence environment for Irmin, please try this out. This is the +preferred backend for using Irmin storage in a unikernel.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#tezos-and-irmin-pack" aria-label="tezos and irmin pack permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Tezos and irmin-pack</h3> +<p>Another big user of Irmin is the <a href="https://tezos.com">Tezos blockchain</a>, +and we have been optimising the persistent space usage of Irmin as their +network grows. Because Tezos doesn&rsquo;t require full Git format support, +we created a hybrid backend that grabs the best bits of Git (e.g. the +packfile mechanism) and engineered a domain-specific backend tailored +for Tezos usage. Crucially, because of the way Irmin is split into +clean libraries and OCaml modules, we only had to modify a small part +of the codebase and could also re-use elements of the Git 2.0 +engineering effort we described above.</p> +<p>The <a href="https://github.com/mirage/irmin/pull/615">irmin-pack backend</a> is +currently being reviewed and integrated ahead of Irmin 2.0 to provide +a significant improvement in disk usage -- more information to come soon. 
+There is a corresponding <a href="https://gitlab.com/samoht/tezos/tree/snapshot-irmin-pack">Tezos branch</a> +using the Irmin 2.0 code that will be integrated downstream in Tezos +once we complete the Irmin 2.0 tests.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#irmin-graphql-and-browser-irmin" aria-label="irmin graphql and browser irmin permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Irmin-GraphQL and &ldquo;browser Irmin&rdquo;</h3> +<p>Another new area of huge interest to us is +<a href="https://graphql.org">GraphQL</a> in order to provide frontends a rich +query language for Irmin hosted applications. Irmin 2.0 includes a +builtin GraphQL server so you can <a href="https://twitter.com/cuvius/status/1017136581755457539">manipulate your Git repo via +GraphQL</a>.</p> +<p>If you are interested in (for example) compiling elements of Irmin to +JavaScript or wasm, for usage in frontends, then the Irmin 2.0 release +makes it significantly easier to support this architecture. We&rsquo;ve +already seen some exploratory efforts <a href="https://github.com/mirage/irmin/issues/681">report issues</a> +when doing this, and we&rsquo;ve had it working ourselves in <a href="http://roscidus.com/blog/blog/2015/04/28/cuekeeper-gitting-things-done-in-the-browser/">Irmin 1.0 Cuekeeper</a> +so we are excited by the potential power of applications built using +this model. 
If you have ideas/questions, please get in touch on the +<a href="https://github.com/mirage/irmin/issues">issue tracker</a> with your +usecase.</p> +<p>This post is just the precursor to the Irmin 2.0 release, so expect to +hear more about it in the coming weeks and months. This is primarily +a call for help from early adopters interested in helping the project +out. All of our code is liberally licensed open source, and so this +is a good time to tie together end-to-end usecases and help ensure we +don&rsquo;t make any decisions in Irmin 2.0 that go counter to some product +you&rsquo;d like to build. That&rsquo;s only possible with your feedback, so +either get in touch via the <a href="https://github.com/mirage/irmin/issues">issue tracker</a>, on +<a href="https://discuss.ocaml.org">discuss.ocaml.org</a> via the <code>mirageos</code> tag, +or just <a href="mailto:mirageos-devel@lists.xenproject.org">email us</a>.</p> +<p>A huge thank you to all our commercial customers, end users and open +source developers who have contributed their time, expertise and +financial support to help us achieve our goal of delivering a modern +storage stack in the spirit of Git. We look forward to getting Irmin +2.0 into your hands very soon!</p>https://tarides.com/blog/2019-05-13-on-the-road-to-irmin-v2On the road to Irmin v22019-05-13T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>These last few months, I spent some time writing new OCaml PPX rewriters or contributing to existing +ones. It's a really fun experience. Toying around with the AST taught me a lot about a language I +thought I knew really well. Turns out I actually had no idea what I was doing all these years.</p> +<p>All jokes aside, I was surprised that the most helpful tricks I learned while writing PPX rewriters +weren't properly documented. 
There already exist a few very good introductory articles on the +subject, like this +<a href="https://whitequark.org/blog/2014/04/16/a-guide-to-extension-points-in-ocaml/">2014 article from Whitequark</a>, +this <a href="http://rgrinberg.com/posts/extensions-points-update-1/">more recent one from Rudi Grinberg</a> +or even <a href="https://victor.darvariu.me/jekyll/update/2018/06/19/ppx-tutorial.html">this last one from Victor Darvariu</a>, +which I only discovered after I actually started writing my own. I still felt like they were slightly +outdated or didn't answer all the questions I had when I started playing with PPX and writing my +first rewriters.</p> +<p>I decided to share my PPX adventures in the hope that they can help others familiarize themselves with this bit +of the OCaml ecosystem and eventually write their first rewriters. This article does not aim to +cover every single detail of the PPX internals but just to give a gentle introduction to +beginners to help them get settled. 
That also means I might omit things that I don't think are worth +mentioning or that might confuse the targeted audience, but feel free to comment if you believe +this article missed an important point.</p> +<p>It's worth mentioning that a lot of the nice tricks mentioned in these lines were given to me by a +wonderful human being called &Eacute;tienne Millon. Thanks, &Eacute;tienne!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#what-is-a-ppx" aria-label="what is a ppx permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What is a PPX?</h2> +<p>PPX rewriters or PPX-es are preprocessors that are applied to your code before it is passed on to the +compiler. They don't operate on your code directly but on the Abstract Syntax Tree or AST resulting +from its parsing. That means that they can only be applied to syntactically correct OCaml code. You +can think of them as functions that take an AST and return a new AST.</p> +<p>That means that in theory you can do a lot of things with a PPX, including pretty bad and cryptic +things. You could for example replace every instance of <code>true</code> with <code>false</code>, swap the branches of any +<code>if-then-else</code> or randomize the order of every pattern-matching case. +Obviously that's not the kind of behaviour that we want, as it would make it impossible to +understand the code since it would be so far from the actual AST the compiler would get. 
+In practice PPX-es have a well-defined scope and only transform parts you explicitly annotated.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#understanding-the-ocaml-ast" aria-label="understanding the ocaml ast permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Understanding the OCaml AST</h3> +<p>First things first, what is an AST? An AST is an abstract representation of your code. As the name +suggests, it has a tree-like structure where the root describes your entire file. It has children for +each bit, such as a function declaration or a type definition, each of them having their own +children, for example for the function name, its arguments and its body, and that goes on until you +reach a leaf such as a literal <code>1</code>, <code>&quot;abc&quot;</code> or a variable. +In the case of OCaml it's a set of recursive types allowing us to represent OCaml code as an OCaml +value. This value is what the parser passes to the compiler so it can type check and compile it to +native or byte code. +Those types are defined in OCaml's <code>Parsetree</code> module. The entry points there are the <code>structure</code> +type, which describes the content of an <code>.ml</code> file, and the <code>signature</code> type, which describes the +content of an <code>.mli</code> file.</p> +<p>As mentioned above, a PPX can be seen as a function that transforms an AST. 
Writing a PPX thus +requires you to understand the AST, both to interpret the one you'll get as input and +to produce the right one as output. This is probably the trickiest part because, unless you've already +worked on the OCaml compiler or written a PPX rewriter, this will probably be the first time you two +meet. Chances are also high that it'll be a pretty bad first date and you will need some time +to get to know each other.</p> +<p>The <code>Parsetree</code> module <a href="https://caml.inria.fr/pub/docs/manual-ocaml/compilerlibref/Parsetree.html">documentation</a> +is a good place to start. The above-mentioned <code>structure</code> and <code>signature</code> types are at the root of +the AST but some other useful types to look at first are:</p> +<ul> +<li><code>expression</code> which describes anything in OCaml that evaluates to a value, the right hand side of a +<code>let</code> binding for instance.</li> +<li><code>pattern</code> which is what you use to deconstruct an OCaml value, the left hand side of a <code>let</code> +binding or a pattern-matching case for example.</li> +<li><code>core_type</code> which describes type expressions, i.e. what you would find on the right hand side of a +value description in a <code>.mli</code>, i.e. <code>val f : &lt;what_goes_there&gt;</code>.</li> +<li><code>structure_item</code> and <code>signature_item</code> which describe the top level AST nodes you can find in a +<code>structure</code> or <code>signature</code> such as type definitions, value or module declarations.</li> +</ul> +<p>Thing is, it's a bit rough and there's no detailed explanation about how a specific bit of code is +represented, just type definitions. Most of the time, the type, field, and variant names are +self-explanatory but it can get harder with some of the more advanced language features. 
+It turns out there are plenty of comments that are really helpful in the actual <code>parsetree.mli</code> file +and that aren't part of the generated documentation. You can find them on +<a href="https://github.com/ocaml/ocaml/blob/trunk/parsing/parsetree.mli">GitHub</a> but I personally prefer to +have it open in a Vim tab when I work on a PPX so I usually open +<code>~/.opam/&lt;current_working_switch&gt;/lib/ocaml/compiler-libs/parsetree.mli</code>.</p> +<p>This works well while exploring but you might also want a more straightforward approach to +discovering what the AST representation is for some specific OCaml code. The +<a href="https://github.com/ocaml-ppx/ppx_tools"><code>ppx_tools</code></a> opam package comes with a <code>dumpast</code> binary +that pretty prints the AST for any given piece of valid OCaml code. You can install it using opam:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">$ opam install ppx_tools</code></pre></div> +<p>and then run it using <code>ocamlfind</code>:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">$ ocamlfind ppx_tools/dumpast some_file.ml</code></pre></div> +<p>You can use it on <code>.ml</code> and <code>.mli</code> files or to quickly get the AST for an expression with the <code>-e</code> +option:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">$ ocamlfind ppx_tools/dumpast -e &quot;1 + 1&quot;</code></pre></div> +<p>Similarly, you can use the <code>-t</code> or <code>-p</code> options to respectively pretty print ASTs from type +expressions or patterns.</p> +<p>Using <code>dumpast</code> to get both the ASTs of a piece of code using your future PPX and the resulting +preprocessed code is a good way to start and will help you figure out what steps are required to +get there.</p> +<p>Note that the compiler and 
<code>utop</code> have a similar feature with the <code>-dparsetree</code> flag. +Running <code>ocamlc/ocamlopt -dparsetree file.ml</code> will pretty print the AST of the given file while +running <code>utop -dparsetree</code> will pretty print the AST of the evaluated code alongside its +evaluation. +I tend to prefer the pretty printed AST from <code>dumpast</code> but any of these tools will prove helpful +in understanding the AST representation of a given piece of OCaml code.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#language-extensions-interpreted-by-ppx-es" aria-label="language extensions interpreted by ppx es permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Language extensions interpreted by PPX-es</h3> +<p>OCaml 4.02 introduced syntax extensions meant to be used by external tools such as PPX-es. Knowing +their syntax and meaning is important to understand how most of the existing rewriters +work because they usually look for those language extensions in the AST to know which part of it +they need to modify.</p> +<p>The two language extensions we're interested in here are extension nodes and attributes. They are 
They are +defined in detail in the OCaml manual (see the +<a href="https://caml.inria.fr/pub/docs/manual-ocaml/attributes.html">attributes</a> and +<a href="https://caml.inria.fr/pub/docs/manual-ocaml/extensionnodes.html">extension nodes</a> sections) but I'll +try to give a good summary here.</p> +<p>Extension nodes are used in place of expressions, module expressions, patterns, type expressions or +module type expressions. Their syntax is <code>[%extension_name payload]</code>. We'll come back to the payload +part a little later. +You can also find extension nodes at the top level of modules or module signatures with the syntax +<code>[%%extension_name payload]</code>. +Hopefully the following cheatsheet can help you remember the basics of how and where you can use +them:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> t <span class="token operator">=</span> + <span class="token punctuation">{</span> a <span class="token punctuation">:</span> int + <span class="token punctuation">;</span> b <span class="token punctuation">:</span> <span class="token punctuation">[</span><span class="token operator">%</span>ext pl<span class="token punctuation">]</span> + <span class="token punctuation">}</span> + +<span class="token keyword">let</span> x <span class="token operator">=</span> + <span class="token keyword">match</span> <span class="token number">1</span> <span class="token keyword">with</span> + <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">-&gt;</span> <span class="token punctuation">[</span><span class="token operator">%</span>ext pl<span class="token punctuation">]</span> + <span class="token operator">|</span> <span class="token punctuation">[</span><span class="token operator">%</span>ext pl<span class="token punctuation">]</span> <span class="token operator">-&gt;</span> <span class="token 
boolean">true</span> + +<span class="token punctuation">[</span><span class="token operator">%%</span>ext pl<span class="token punctuation">]</span></code></pre></div> +<p>Because extension nodes stand where regular AST nodes should, the compiler won't accept them and +will give you an <code>Uninterpreted extension</code> error. Extension nodes have to be expanded by a PPX for +your code to compile.</p> +<p>Attributes are slightly different although their syntax is very close to extensions. Attributes +are attached to existing AST nodes instead of replacing them. That means that they don't necessarily +need to be transformed and the compiler will ignore unknown attributes by default. +They can come with a payload just like extensions and use <code>@</code> instead of <code>%</code>. The number of <code>@</code> +preceding the attribute name specifies which kind of node they are attached to:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> a <span class="token operator">=</span> <span class="token number">12</span> <span class="token punctuation">[</span><span class="token operator">@</span>attr pl<span class="token punctuation">]</span> + +<span class="token keyword">let</span> b <span class="token operator">=</span> <span class="token string">&quot;some string&quot;</span> <span class="token punctuation">[</span><span class="token operator">@@</span>attr pl<span class="token punctuation">]</span> + +<span class="token punctuation">[</span><span class="token operator">@@@</span>attr pl<span class="token punctuation">]</span></code></pre></div> +<p>In the first example, the attribute is attached to the expression <code>12</code> while in the second example +it is attached to the whole <code>let b = &quot;some string&quot;</code> value binding. The third one is of a slightly +different nature as it is a floating attribute. 
It's not attached to anything per se and just ends +up in the AST as a structure item. +Because there is a wide variety of nodes to which you can attach attributes, I won't go too far into +details here but a good rule of thumb is that you use <code>@@</code> attributes when you want them attached to +structure or signature items. For anything deeper within the AST structure such as patterns, +expressions or core types, use the single <code>@</code> syntax. Looking at the <code>Parsetree</code> documentation can +help you figure out where you can find attributes.</p> +<p>Now let's talk about those payloads I mentioned earlier. You can think of them as &quot;arguments&quot; to +the extension points and attributes. You can pass different kinds of arguments and the syntax varies +for each of them:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> a <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token operator">%</span>ext expr_or_str_item<span class="token punctuation">]</span> +<span class="token keyword">let</span> b <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token operator">%</span>ext<span class="token punctuation">:</span> type_expr_or_sig_item<span class="token punctuation">]</span> +<span class="token keyword">let</span> c <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token operator">%</span>ext<span class="token operator">?</span> pattern<span class="token punctuation">]</span></code></pre></div> +<p>As suggested in the examples, you can pass expressions or structure items using a space character, +type expressions or signature items (anything you'd find at the top level of a module signature) +using a <code>:</code> or a pattern using a <code>?</code>.</p> +<p>Attribute payloads use the same syntax:</p> +<div 
class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> a <span class="token operator">=</span> <span class="token char">'a'</span> <span class="token punctuation">[</span><span class="token operator">@</span>attr expr_or_str_item<span class="token punctuation">]</span> +<span class="token keyword">let</span> b <span class="token operator">=</span> <span class="token char">'b'</span> <span class="token punctuation">[</span><span class="token operator">@</span>attr<span class="token punctuation">:</span> type_expr_or_sig_item<span class="token punctuation">]</span> +<span class="token keyword">let</span> a <span class="token operator">=</span> <span class="token char">'a'</span> <span class="token punctuation">[</span><span class="token operator">@</span>attr<span class="token operator">?</span> pattern<span class="token punctuation">]</span></code></pre></div> +<p>Some PPX-es rely on other language extensions such as the suffix character you can attach to <code>int</code> +and <code>float</code> literals (<code>10z</code> could be used by a PPX to turn it into <code>Z.of_string &quot;10&quot;</code> for instance) +or quoted strings with a specific identifier (<code>{ppx_name|some quoted string|ppx_name}</code> can be used +if you want your PPX to operate on arbitrary strings and not only syntactically correct OCaml) but +attributes and extensions are the most commonly used ones.</p> +<p>Attributes and extension points can be expressed using an infix syntax. 
The attribute version is +barely used but some forms of the infix syntax for extension points are used by popular PPX-es and +it is likely you will encounter some of the following:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> infix_let_extension <span class="token operator">=</span> + <span class="token keyword">let</span><span class="token operator">%</span>ext x <span class="token operator">=</span> <span class="token number">2</span> <span class="token keyword">in</span> + <span class="token operator">..</span><span class="token punctuation">.</span> + +<span class="token keyword">let</span> infix_match_extension <span class="token operator">=</span> + <span class="token keyword">match</span><span class="token operator">%</span>ext y <span class="token keyword">with</span> <span class="token operator">..</span><span class="token punctuation">.</span> + +<span class="token keyword">let</span> infix_try_extension <span class="token operator">=</span> + <span class="token keyword">try</span><span class="token operator">%</span>ext f z <span class="token keyword">with</span> <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> <span class="token operator">..</span><span class="token punctuation">.</span></code></pre></div> +<p>which are syntactic sugar for:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> infix_let_extension <span class="token operator">=</span> + <span class="token punctuation">[</span><span class="token operator">%</span>ext <span class="token keyword">let</span> x <span class="token operator">=</span> <span class="token number">2</span> <span class="token keyword">in</span> <span class="token operator">..</span><span class="token punctuation">.</span><span class="token punctuation">]</span> + +<span 
class="token keyword">let</span> infix_match_extension <span class="token operator">=</span> + <span class="token punctuation">[</span><span class="token operator">%</span>ext <span class="token keyword">match</span> y <span class="token keyword">with</span> <span class="token operator">..</span><span class="token punctuation">.</span><span class="token punctuation">]</span> + +<span class="token keyword">let</span> infix_try_extension <span class="token operator">=</span> + <span class="token punctuation">[</span><span class="token operator">%</span>ext <span class="token keyword">try</span> f z <span class="token keyword">with</span> <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> <span class="token operator">..</span><span class="token punctuation">.</span><span class="token punctuation">]</span></code></pre></div> +<p>A good example of a PPX making heavy use of these is +<a href="http://ocsigen.org/lwt/4.1.0/api/Ppx_lwt"><code>lwt_ppx</code></a>. The OCaml manual also contains more examples +of the infix syntax in the Attributes and Extension points sections mentioned above.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#the-two-main-kind-of-ppx-es" aria-label="the two main kind of ppx es permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The two main kinds of PPX-es</h3> +<p>There is a wide variety of PPX rewriters but the ones you'll probably see the most are Extensions and +Derivers.</p> +<h4 style="position:relative;"><a 
href="https://tarides.com/feed.xml#extensions" aria-label="extensions permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Extensions</h4> +<p>Extensions will rewrite tagged parts of the AST, usually extension nodes of the form +<code>[%&lt;extension_name&gt; payload]</code>. They will replace them with a different AST node of the same nature ie +if the extension point was located where an expression should be, the rewriter will produce an +expression. Good examples of extensions are:</p> +<ul> +<li><a href="https://github.com/rgrinberg/ppx_getenv2"><code>ppx_getenv2</code></a> which replaces <code>[%getenv SOME_VAR]</code> with +the value of the environment variable <code>SOME_VAR</code> at compile time.</li> +<li><a href="https://github.com/NathanReb/ppx_yojson"><code>ppx_yojson</code></a> which allows you to write <code>Yojson</code> values +using OCaml syntax to mimic actual json. 
For instance you'd use <code>[%yojson {a = None; b = 1}]</code> to +represent <code>{&quot;a&quot;: null, &quot;b&quot;: 1}</code> instead of the <code>Yojson</code>'s notation: +<code>Assoc [(&quot;a&quot;, Null); (&quot;b&quot;, Int 1)]</code>.</li> +</ul> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#derivers" aria-label="derivers permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Derivers</h4> +<p>Derivers or deriving plugins will &quot;insert&quot; new nodes derived from type definitions annotated with a +<code>[@@deriving &lt;deriver_name&gt;]</code> attribute. They have various applications but are particularly useful +to derive functions that are tedious and error prone to write by hand such as comparison functions, +pretty printers or serializers. It's really convenient as you don't have to update those functions +every time you update your type definitions. They were inspired by Haskell Type classes. 
Good +examples of derivers are:</p> +<ul> +<li><a href="https://github.com/ocaml-ppx/ppx_deriving"><code>ppx_deriving</code></a> itself comes with a bunch of deriving +plugins such as <code>eq</code>, <code>ord</code> or <code>show</code> which respectively derives, as you might have guessed, +equality, comparison and pretty-printing functions.</li> +<li><a href="https://github.com/ocaml-ppx/ppx_deriving_yojson"><code>ppx_deriving_yojson</code></a> which derives JSON +serializers and deserializers.</li> +<li><a href="https://github.com/janestreet/ppx_sexp_conv"><code>ppx_sexp_conv</code></a> which derives s-expressions +converters.</li> +</ul> +<p>Derivers often let you attach attributes to specify how some parts of the AST should be handled. For +example when using <code>ppx_deriving_yojson</code> you can use <code>[@default some_val]</code> to make a field of an +object optional:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> t <span class="token operator">=</span> + <span class="token punctuation">{</span> a<span class="token punctuation">:</span> int + <span class="token punctuation">;</span> b<span class="token punctuation">:</span> string <span class="token punctuation">[</span><span class="token operator">@</span>default <span class="token string">&quot;&quot;</span><span class="token punctuation">]</span> + <span class="token punctuation">}</span> +<span class="token punctuation">[</span><span class="token operator">@@</span>deriving of_yojson<span class="token punctuation">]</span></code></pre></div> +<p>will derive a deserializer that will convert the JSON value <code>{&quot;a&quot;: 1}</code> to the OCaml +<code>{a = 1; b = &quot;&quot;}</code></p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#how-to-write-a-ppx-using-ppxlib" aria-label="how to write a ppx using ppxlib permalink" class="anchor before"><svg aria-hidden="true" 
focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>How to write a PPX using <code>ppxlib</code></h2> +<p>Historically there were a few libraries used by PPX rewriter authors to write their PPX-es, including +<code>ppx_tools</code> and <code>ppx_deriving</code>, but as the ecosystem evolved, <code>ppxlib</code> emerged and is now the most +up-to-date and maintained library to write and handle PPX-es. It wraps the features of those +libraries in a single one. +I encourage you to use <code>ppxlib</code> to write new PPX-es as it is also easier to make various rewriters +work together if they are all registered through <code>ppxlib</code> and the PPX ecosystem would gain from +being unified around a single PPX library and driver.</p> +<p>It is also a great library and has some really powerful features to help you write your extensions +and derivers.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#writing-an-extension" aria-label="writing an extension permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Writing an extension</h3> +<p>The entry point of <code>ppxlib</code> for extensions is 
<code>Ppxlib.Extension.declare</code>. You have to use that +function to build an <code>Extension.t</code>, from which you can then build a <code>Context_free.Rule.t</code> before +registering your transformation so it's actually applied.</p> +<p>The typical <code>my_ppx_extension.ml</code> will look like:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">open</span> Ppxlib + +<span class="token keyword">let</span> extension <span class="token operator">=</span> + Extension<span class="token punctuation">.</span>declare + <span class="token string">&quot;my_extension&quot;</span> + some_context + some_pattern + expand_function + +<span class="token keyword">let</span> rule <span class="token operator">=</span> Context_free<span class="token punctuation">.</span>Rule<span class="token punctuation">.</span>extension extension + +<span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + Driver<span class="token punctuation">.</span>register_transformation <span class="token label property">~rules</span><span class="token punctuation">:</span><span class="token punctuation">[</span>rule<span class="token punctuation">]</span> <span class="token string">&quot;my_transformation&quot;</span></code></pre></div> +<p>To compile it as PPX rewriter you'll need to put the following in your dune file:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">(library + (public_name my_ppx) + (kind ppx_rewriter) + (libraries ppxlib))</code></pre></div> +<p>Now let's go back a little and look at the important part:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> extension <span class="token operator">=</span> + Extension<span 
class="token punctuation">.</span>declare + <span class="token string">&quot;my_extension&quot;</span> + some_context + some_pattern + expand_function</code></pre></div> +<p>Here <code>&quot;my_extension&quot;</code> is the name of your extension and defines how you're going to invoke it +in your extension point. In other words, to use this extension in our code we'll use a +<code>[%my_extension ...]</code> extension point.</p> +<p><code>some_context</code> is a <code>Ppxlib.Extension.Context.t</code> and describes where this extension can be found in +the AST, i.e. whether you can use <code>[%my_extension ...]</code> as an expression, a pattern, a core type and so on. The +<code>Ppxlib.Extension.Context</code> module defines a constant for each possible extension context which you +can pass as <code>some_context</code>. +This obviously means that it also describes the type of AST node to which it must be converted and +this property is actually enforced by the <code>some_pattern</code> argument. But we'll come back to that +later.</p> +<p>Finally <code>expand_function</code> is our actual extension implementation, which basically takes the payload, +a <code>loc</code> argument which contains the location of the expanded extension point, a <code>path</code> argument +which is the fully qualified path to the expanded node (e.g. 
<code>&quot;file.ml.A.B&quot;</code>) and returns the +generated code to replace the extension with.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#ast_pattern" aria-label="ast_pattern permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Ast_pattern</h4> +<p>Now let's get back to that <code>some_pattern</code> argument.</p> +<p>This is one of the trickiest parts of <code>ppxlib</code> to understand but it's also one of its most +powerful features. The type for <code>Ast_pattern</code> is defined as <code>('a, 'b, 'c) t</code> where <code>'a</code> is +the type of AST nodes that are matched, <code>'b</code> is the type of the values you're extracting from the +node as a function type and <code>'c</code> is the return type of that last function. 
This sounded really +confusing to me at first and I'm guessing it might do to some of you too so let's give it a bit of +context.</p> +<p>Let's look at the type of <code>Extension.declare</code>:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">val</span> declare <span class="token punctuation">:</span> + string <span class="token operator">-&gt;</span> + <span class="token type-variable function">'context</span> Context<span class="token punctuation">.</span>t <span class="token operator">-&gt;</span> + <span class="token punctuation">(</span>payload<span class="token punctuation">,</span> <span class="token type-variable function">'a</span><span class="token punctuation">,</span> <span class="token type-variable function">'context</span><span class="token punctuation">)</span> Ast_pattern<span class="token punctuation">.</span>t <span class="token operator">-&gt;</span> + <span class="token punctuation">(</span>loc<span class="token punctuation">:</span>Location<span class="token punctuation">.</span>t <span class="token operator">-&gt;</span> path<span class="token punctuation">:</span>string <span class="token operator">-&gt;</span> <span class="token type-variable function">'a</span><span class="token punctuation">)</span> <span class="token operator">-&gt;</span> + t</code></pre></div> +<p>Here, the expected pattern first type parameter is <code>payload</code> which means we want a pattern that +matches <code>payload</code> AST nodes. That makes perfect sense since it is used to describe what your +extension's payload should look like and what to do with it. +The last type parameter is <code>'context</code> which again seems logical. As I mentioned earlier our +<code>expand_function</code> should return the same kind of node as the one where the extension was found. +Now what about <code>'a</code>. 
As you can see, it describes what comes after the base <code>loc</code> and <code>path</code> +parameters of our <code>expand_function</code>. From the pattern point of view, <code>'a</code> describes the parts of the +matched AST node we wish to extract for later consumption, here by our expander.</p> +<p><code>Ast_pattern</code> contains a whole bunch of combinators to let you describe what your pattern should match +and a specific <code>__</code> pattern that you must use to capture the various parts of the matched nodes. +<code>__</code> has type <code>('a, 'a -&gt; 'b, 'b) Ast_pattern.t</code> which means that whenever it's used it changes the +type of consumer function in the returned pattern.</p> +<p>Let's consider a few examples to try wrapping our heads around this. Say I want to write an +extension that takes an expression as a payload and I want to pass this expression to my expander so +I can generate code based on its value. I can declare the extension like this:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> extension <span class="token operator">=</span> + Extension<span class="token punctuation">.</span>declare + <span class="token string">&quot;my_extension&quot;</span> + Extension<span class="token punctuation">.</span>Context<span class="token punctuation">.</span>expression + Ast_pattern<span class="token punctuation">.</span><span class="token punctuation">(</span>single_expr_payload __<span class="token punctuation">)</span> + expand_function</code></pre></div> +<p>In this example, <code>Extension.Context.expression</code> has type <code>expression Extension.Context.t</code>, the +pattern has type <code>(payload, expression -&gt; expression, expression) Ast_pattern.t</code>. The pattern says we +want to allow a single expression in the payload and capture it. 
If we decompose it a bit, we can +see that <code>single_expr_payload</code> has type +<code>(expression, 'a, 'b) Ast_pattern.t -&gt; (payload, 'a, 'b) Ast_pattern.t</code> and is passed <code>__</code> which +makes it a <code>(expression, expression -&gt; 'b, 'b) Ast_pattern.t</code> and that's exactly what we want here +as our expander will have type <code>loc: Location.t -&gt; path: string -&gt; expression -&gt; expression</code>!</p> +<p>It works similarly to <code>Scanf.scanf</code> when you think about it. Changing the pattern changes the type of the +consumer function the same way changing the format string does for <code>Scanf</code> functions.</p> +<p>This was a bit easy since we had a custom combinator just for that purpose so let's take a few more +complex examples. Now say we want to only allow pairs of integer and string constants expressions in +our payload. Instead of just capturing any expression and dealing with the error cases in the +<code>expand_function</code> we can let <code>Ast_pattern</code> deal with that and pass an <code>int</code> and <code>string</code> along to +our expander:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml">Ast_pattern<span class="token punctuation">.</span><span class="token punctuation">(</span>single_expr_payload <span class="token punctuation">(</span>pexp_tuple <span class="token punctuation">(</span><span class="token punctuation">(</span>eint __<span class="token punctuation">)</span><span class="token operator">^::</span><span class="token punctuation">(</span>estring __<span class="token punctuation">)</span><span class="token operator">^::</span>nil<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span></code></pre></div> +<p>This one's a bit more elaborate but the idea is the same, we use <code>__</code> to capture the int and string +from the expression and use combinators to 
specify that the payload should be made of a pair and +that gives us a <code>(payload, int -&gt; string -&gt; 'a, 'a) Ast_pattern.t</code> which should be used with a +<code>loc: Location.t -&gt; path: string -&gt; int -&gt; string -&gt; expression</code> expander.</p> +<p>We can also specify that our extension should take something other than an expression as a payload, +say a pattern with no <code>when</code> clause so that it's applied as <code>[%my_ext? some_pattern_payload]</code>:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml">Ast_pattern<span class="token punctuation">.</span><span class="token punctuation">(</span>ppat __ none<span class="token punctuation">)</span></code></pre></div> +<p>or no payload at all and it should just be invoked as <code>[%my_ext]</code>:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml">Ast_pattern<span class="token punctuation">.</span><span class="token punctuation">(</span>pstr nil<span class="token punctuation">)</span></code></pre></div> +<p>You should play with <code>Ast_pattern</code> a bit if you need to express complex patterns as I think it's +the only way to get the hang of it.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#writing-a-deriver" aria-label="writing a deriver permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Writing a deriver</h3> +<p>Registering a deriver is slightly different from registering an 
extension but in the end it remains +relatively simple and you will still have to provide the actual implementation in the form of an +<code>expand</code> function.</p> +<p>The typical <code>my_ppx_deriver.ml</code> will look like:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">open</span> Ppxlib + +<span class="token keyword">let</span> str_type_decl_generator <span class="token operator">=</span> + Deriving<span class="token punctuation">.</span>Generator<span class="token punctuation">.</span>make_no_arg + <span class="token label property">~attributes</span> + expand_str + +<span class="token keyword">let</span> sig_type_decl_generator <span class="token operator">=</span> + Deriving<span class="token punctuation">.</span>Generator<span class="token punctuation">.</span>make_no_arg + <span class="token label property">~attributes</span> + expand_sig + +<span class="token keyword">let</span> my_deriver <span class="token operator">=</span> + Deriving<span class="token punctuation">.</span>add + <span class="token label property">~str_type_decl</span><span class="token punctuation">:</span>str_type_decl_generator + <span class="token label property">~sig_type_decl</span><span class="token punctuation">:</span>sig_type_decl_generator + <span class="token string">&quot;my_deriver&quot;</span></code></pre></div> +<p>Which you'll need to compile with the following <code>library</code> stanza:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">(library + (public_name my_ppx) + (kind ppx_deriver) + (libraries ppxlib))</code></pre></div> +<p>The <code>Deriving.add</code> function is declared as:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">val</span> add + <span class="token punctuation">:</span> <span class="token 
operator">?</span>str_type_decl<span class="token punctuation">:</span><span class="token punctuation">(</span>structure<span class="token punctuation">,</span> rec_flag <span class="token operator">*</span> type_declaration list<span class="token punctuation">)</span> Generator<span class="token punctuation">.</span>t + <span class="token operator">-&gt;</span> <span class="token operator">?</span>str_type_ext <span class="token punctuation">:</span><span class="token punctuation">(</span>structure<span class="token punctuation">,</span> type_extension <span class="token punctuation">)</span> Generator<span class="token punctuation">.</span>t + <span class="token operator">-&gt;</span> <span class="token operator">?</span>str_exception<span class="token punctuation">:</span><span class="token punctuation">(</span>structure<span class="token punctuation">,</span> extension_constructor <span class="token punctuation">)</span> Generator<span class="token punctuation">.</span>t + <span class="token operator">-&gt;</span> <span class="token operator">?</span>sig_type_decl<span class="token punctuation">:</span><span class="token punctuation">(</span>signature<span class="token punctuation">,</span> rec_flag <span class="token operator">*</span> type_declaration list<span class="token punctuation">)</span> Generator<span class="token punctuation">.</span>t + <span class="token operator">-&gt;</span> <span class="token operator">?</span>sig_type_ext <span class="token punctuation">:</span><span class="token punctuation">(</span>signature<span class="token punctuation">,</span> type_extension <span class="token punctuation">)</span> Generator<span class="token punctuation">.</span>t + <span class="token operator">-&gt;</span> <span class="token operator">?</span>sig_exception<span class="token punctuation">:</span><span class="token punctuation">(</span>signature<span class="token punctuation">,</span> extension_constructor <span class="token punctuation">)</span> 
Generator<span class="token punctuation">.</span>t + <span class="token operator">-&gt;</span> <span class="token operator">?</span>extension<span class="token punctuation">:</span><span class="token punctuation">(</span>loc<span class="token punctuation">:</span>Location<span class="token punctuation">.</span>t <span class="token operator">-&gt;</span> path<span class="token punctuation">:</span>string <span class="token operator">-&gt;</span> core_type <span class="token operator">-&gt;</span> expression<span class="token punctuation">)</span> + <span class="token operator">-&gt;</span> string + <span class="token operator">-&gt;</span> t</code></pre></div> +<p>It takes a mandatory string argument, here <code>&quot;my_deriver&quot;</code>, which defines how +users are going to invoke your deriver. In this case we'd need to add a <code>[@@deriving my_deriver]</code> to +a type declaration in a structure or a signature to use it. +Then there's just one optional argument per kind of node to which you can attach a <code>[@@deriving ...]</code> +attribute. <code>type_decl</code> corresponds to <code>type = ...</code>, <code>type_ext</code> to <code>type += ...</code> and <code>exception</code> to +<code>exception My_exc of ...</code>. +You need to provide generators for the ones you wish your deriver to handle; <code>ppxlib</code> +will make sure users get a compile error if they try to use it elsewhere. +We can ignore the <code>extension</code> as it's just here for compatibility with <code>ppx_deriving</code>.</p> +<p>Now let's take a look at <code>Generator</code>. Its type is defined as <code>('output_ast, 'input_ast) t</code> where +<code>'input_ast</code> is the type of the node to which the <code>[@@deriving ...]</code> is attached and <code>'output_ast</code> +the type of the nodes it should produce, i.e. either a <code>structure</code> or a <code>signature</code>. 
The type of a +generator depends on the expand function it's built from when you use the smart constructor +<code>make_no_arg</code> meaning the expand function should have type +<code>loc: Location.t -&gt; path: string -&gt; 'input_ast -&gt; 'output_ast</code>. This function is the actual +implementation of your deriver and will generate the list of <code>structure_item</code> or <code>signature_item</code> +from the type declaration.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#compatibility-with-ppx_import" aria-label="compatibility with ppx_import permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Compatibility with <code>ppx_import</code></h4> +<p><a href="https://github.com/ocaml-ppx/ppx_import"><code>ppx_import</code></a> is a PPX rewriter that lets you import type +definitions and spares you the need to copy and update them every time they change upstream. The +main reason why you would want to do that is because you need to derive values from those types +using a deriver thus the importance of ensuring your deriving plugin is compatible.</p> +<p>Let's take an example to illustrate how <code>ppx_import</code> is used. I'm using a library called <code>blob</code> +which exposes a type <code>Blob.t</code>. For some reason I need to be able to serialize and deserialize +<code>Blob.t</code> values to JSON. I'd like to use a deriver to do that as I don't want to maintain that code +myself. 
Imagine <code>Blob.t</code> is defined as:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> t <span class="token operator">=</span> + <span class="token punctuation">{</span> <span class="token keyword">value</span> <span class="token punctuation">:</span> string + <span class="token punctuation">;</span> length <span class="token punctuation">:</span> int + <span class="token punctuation">;</span> id <span class="token punctuation">:</span> int + <span class="token punctuation">}</span></code></pre></div> +<p>Without <code>ppx_import</code> I would define somewhere a <code>serializable_blob</code> type as follows:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> serializable_blob <span class="token operator">=</span> Blob<span class="token punctuation">.</span>t <span class="token operator">=</span> + <span class="token punctuation">{</span> <span class="token keyword">value</span> <span class="token punctuation">:</span> string + <span class="token punctuation">;</span> length <span class="token punctuation">:</span> int + <span class="token punctuation">;</span> id <span class="token punctuation">:</span> int + <span class="token punctuation">}</span> +<span class="token punctuation">[</span><span class="token operator">@@</span>deriving yojson<span class="token punctuation">]</span></code></pre></div> +<p>That works well especially because the type definition is simple but I don't really care about +having it here, what I really want is just the <code>to_yojson</code> and <code>of_yojson</code> functions. Also now, if +the type definition changes, I have to update it here manually. 
Maintaining many such imports can be +tedious and duplicates a lot of code unnecessarily.</p> +<p>What I can do instead, thanks to <code>ppx_import</code> is to write it like this:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> serializable_blob <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token operator">%</span>import<span class="token punctuation">:</span> Blob<span class="token punctuation">.</span>t<span class="token punctuation">]</span> +<span class="token punctuation">[</span><span class="token operator">@@</span>deriving yojson<span class="token punctuation">]</span></code></pre></div> +<p>which will ultimately be expanded into the above using <code>Blob</code>'s definition of the type <code>t</code>.</p> +<p>Now <code>ppx_import</code> works a bit differently from regular PPX rewriters as it needs a bit more information +than just the AST. We don't need to understand how it works but what it means is that if your +deriving plugin is used with <code>ppx_import</code>, it will be called twice:</p> +<ul> +<li>A first time with <code>ocamldep</code>. This is required to determine the dependencies of a module in terms +of other OCaml modules. 
PPX-es need to be applied here to find out about dependencies they may +introduce.</li> +<li>A second time before actually compiling the code.</li> +</ul> +<p>The issue here is that during the <code>ocamldep</code> pass, <code>ppx_import</code> doesn't have the information it +needs to import the type definition yet, so it can't copy it and instead expands:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> u <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token operator">%</span>import<span class="token punctuation">:</span> A<span class="token punctuation">.</span>t<span class="token punctuation">]</span></code></pre></div> +<p>into:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> u <span class="token operator">=</span> A<span class="token punctuation">.</span>t</code></pre></div> +<p>Only during the second pass will it actually expand it to the copied type definition.</p> +<p>This may be a concern if your deriving plugin can't apply to abstract types because you will +probably raise an error when encountering one, meaning the first phase will fail and the whole +compilation will fail without giving your rewriter a chance to derive anything from the copied +type definition.</p> +<p>The right way to deal with this is to have a different behaviour in the context of <code>ocamldep</code>. 
+In this case you can ignore such type declaration or eventually, if you know you are going to +inject new dependencies in your generated code, to create dummy values referencing them and just +behave normally in any other context.</p> +<p><code>ppxlib</code> versions <code>0.6.0</code> and higher allow you to do so through the <code>Deriving.Generator.V2</code> API +which passes an abstract <code>ctxt</code> value to your <code>expand</code> function instead of a <code>loc</code> and a <code>path</code>. +You can tell whether it is the <code>ocamldep</code> pass from within the <code>expand</code> function like this:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">open</span> Ppxlib + +<span class="token keyword">let</span> expand <span class="token label property">~ctxt</span> input_ast <span class="token operator">=</span> + <span class="token keyword">let</span> omp_config <span class="token operator">=</span> Expansion_context<span class="token punctuation">.</span>Deriver<span class="token punctuation">.</span>omp_config ctxt <span class="token keyword">in</span> + <span class="token keyword">let</span> is_ocamldep_pass <span class="token operator">=</span> String<span class="token punctuation">.</span>equal <span class="token string">&quot;ocamldep&quot;</span> omp_config<span class="token punctuation">.</span>Migrate_parsetree<span class="token punctuation">.</span>Driver<span class="token punctuation">.</span>tool_name <span class="token keyword">in</span> + <span class="token operator">..</span><span class="token punctuation">.</span></code></pre></div> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#deriver-attributes" aria-label="deriver attributes permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 
0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Deriver attributes</h4> +<p>You'll have noted the <code>attributes</code> parameter in the examples. It's an optional parameter that lets +you define which attributes your deriver allows the user to attach to various bits of the type, +type extension or exception declaration it is applied to.</p> +<p><code>ppxlib</code> comes with an <code>Attribute</code> module that lets you properly declare the attributes you want +to allow and make sure they are properly used: correctly spelled, placed and with the right +payload attached. This is especially useful since attributes are by default ignored by the compiler, +meaning that without <code>ppxlib</code>'s care, plugin users wouldn't get any errors if they misused an attribute +and it might take them a while to figure out they got it wrong and the generated code wasn't +impacted as they hoped. +The <code>Attribute</code> module offers another great feature: <code>Attribute.t</code> values can be used to extract the +attribute payload from an AST node if it is present. That will spare you the need for +inspecting attributes yourself, which can prove quite tedious.</p> +<p><code>Ppxlib.Attribute.t</code> is defined as <code>('context, 'payload) t</code> where <code>'context</code> describes to which node +the attribute can be attached and <code>'payload</code>, the type of its payload. 
+To build such an attribute you must use <code>Ppxlib.Attribute.declare</code>:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">val</span> declare + <span class="token punctuation">:</span> string + <span class="token operator">-&gt;</span> <span class="token type-variable function">'a</span> Context<span class="token punctuation">.</span>t + <span class="token operator">-&gt;</span> <span class="token punctuation">(</span>payload<span class="token punctuation">,</span> <span class="token type-variable function">'b</span><span class="token punctuation">,</span> <span class="token type-variable function">'c</span><span class="token punctuation">)</span> Ast_pattern<span class="token punctuation">.</span>t + <span class="token operator">-&gt;</span> <span class="token type-variable function">'b</span> + <span class="token operator">-&gt;</span> <span class="token punctuation">(</span><span class="token type-variable function">'a</span><span class="token punctuation">,</span> <span class="token type-variable function">'c</span><span class="token punctuation">)</span> t</code></pre></div> +<p>Let's try to declare the <code>default</code> attribute from <code>ppx_deriving_yojson</code> I mentioned earlier.</p> +<p>The first <code>string</code> argument is the attribute name. <code>ppxlib</code> supports namespaces for the attributes so +that users can avoid conflicting attributes between various derivers applied to the same type +definitions. For instance here we could use <code>&quot;default&quot;</code>. It can prove helpful to use a more qualified +name such as <code>&quot;ppx_deriving_yojson.of_yojson.default&quot;</code>. That means that our attribute can be used as +<code>[@default ...]</code>, <code>[@of_yojson.default ...]</code> or <code>[@ppx_deriving_yojson.of_yojson.default ...]</code>. 
+Now if another deriver uses a <code>[@@default ...]</code>, users can apply both derivers and provide different +<code>default</code> values to the different derivers by writing:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> t <span class="token operator">=</span> + <span class="token punctuation">{</span> a <span class="token punctuation">:</span> int + <span class="token punctuation">;</span> b <span class="token punctuation">:</span> string <span class="token punctuation">[</span><span class="token operator">@</span>make<span class="token punctuation">.</span>default <span class="token string">&quot;abc&quot;</span><span class="token punctuation">]</span> <span class="token punctuation">[</span><span class="token operator">@</span>of_yojson<span class="token punctuation">.</span>default <span class="token string">&quot;&quot;</span><span class="token punctuation">]</span> + <span class="token punctuation">}</span> +<span class="token punctuation">[</span><span class="token operator">@@</span>deriving make<span class="token punctuation">,</span>of_yojson<span class="token punctuation">]</span></code></pre></div> +<p>The context argument works very similarly to the one in <code>Extension.declare</code>. Here we want the +attribute to be attached to record field declarations so we'll use +<code>Attribute.Context.label_declaration</code> which has type <code>label_declaration Attribute.Context.t</code>.</p> +<p>The pattern argument is an <code>Ast_pattern.t</code>. Now that we know how to work with those this is pretty +easy. 
Here we need to accept any expression as a payload since we should be able to apply the +<code>default</code> attribute to any field, regardless of its type and we want to extract that expression from +the payload so we can use it in our deserializer so let's use +<code>Ast_pattern.(single_expr_payload __)</code>.</p> +<p>Finally the last <code>'b</code> argument has the same type as the pattern consumer function. We can use it to +transform what we extracted using the previous <code>Ast_pattern</code> but in this case we just want to +keep the expression as we got it so we'll just use the identity function here.</p> +<p>We end up with the following:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> default_attribute <span class="token operator">=</span> + Attribute<span class="token punctuation">.</span>declare + <span class="token string">&quot;ppx_deriving_yojson.of_yojson.default&quot;</span> + Attribute<span class="token punctuation">.</span>Context<span class="token punctuation">.</span>label_declaration + Ast_pattern<span class="token punctuation">.</span><span class="token punctuation">(</span>single_expr_payload __<span class="token punctuation">)</span> + <span class="token punctuation">(</span><span class="token keyword">fun</span> expr <span class="token operator">-&gt;</span> expr<span class="token punctuation">)</span></code></pre></div> +<p>and that gives us a <code>(label_declaration, expression) Attribute.t</code>.</p> +<p>You can then use it to collect the attribute payload from a label_declaration:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml">Attribute<span class="token punctuation">.</span>get default_attribute label_decl</code></pre></div> +<p>which will return <code>Some expr</code> if the attribute was attached to <code>label_decl</code> or <code>None</code> otherwise.</p> 
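+<p>To make this more concrete, here is a minimal sketch of how the lookup might be used inside an expander. The helper name <code>fields_with_default</code> is hypothetical and not part of <code>ppxlib</code>; the attribute declaration is repeated so that the snippet stands on its own:</p>
+<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml">open Ppxlib
+
+(* Same declaration as above, repeated to keep the snippet self-contained. *)
+let default_attribute =
+  Attribute.declare
+    &quot;ppx_deriving_yojson.of_yojson.default&quot;
+    Attribute.Context.label_declaration
+    Ast_pattern.(single_expr_payload __)
+    (fun expr -&gt; expr)
+
+(* Hypothetical helper: for each record field carrying the attribute,
+   return the field name paired with the default expression from the payload.
+   Fields without the attribute are skipped. *)
+let fields_with_default (fields : label_declaration list)
+    : (string * expression) list =
+  List.filter_map
+    (fun (ld : label_declaration) -&gt;
+      match Attribute.get default_attribute ld with
+      | Some default -&gt; Some (ld.pld_name.txt, default)
+      | None -&gt; None)
+    fields</code></pre></div>
+<p>A deserializer generator could then fall back to the collected expression whenever the corresponding JSON field is missing. How the fallback is spliced into the generated code is left to the expander.</p>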
+<p>Because of their polymorphic nature, attributes need to be packed, ie to be wrapped with a variant +to hide the type parameter, so if you want to pass it to <code>Generator.make_no_arg</code> you'll have to do +it like this:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> attributes <span class="token operator">=</span> <span class="token punctuation">[</span>Attribute<span class="token punctuation">.</span>T default_attribute<span class="token punctuation">]</span></code></pre></div> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#writing-your-expand-functions" aria-label="writing your expand functions permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Writing your expand functions</h3> +<p>In the last two sections I mentioned <code>expand</code> functions that would contain the actual <code>deriver</code> or +<code>extension</code> implementation but didn't actually say anything about how to write those. It will +depend a lot on the purpose of your PPX rewriter and what you're trying to achieve.</p> +<p>Before writing your PPX you should clearly specify what it should be applied to and what code it +should produce. 
That will help you declare the right deriving or extension rewriter and from there +you'll know the type of the <code>expand</code> functions you have to write.</p> +<p>A good way to proceed is to use the <code>dumpast</code> tool to pretty print the AST fragments of both the +input of your expander and the output, ie the code it should generate. To take a concrete example, +say you want to write a deriving plugin that generates an <code>equal</code> function from a type definition. +You can start by running <code>dumpast</code> on the following file:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">type</span> some_record <span class="token operator">=</span> + <span class="token punctuation">{</span> a <span class="token punctuation">:</span> int64 + <span class="token punctuation">;</span> b <span class="token punctuation">:</span> string + <span class="token punctuation">}</span> + +<span class="token keyword">let</span> equal_some_record r r' <span class="token operator">=</span> Int64<span class="token punctuation">.</span>equal r<span class="token punctuation">.</span>a r'<span class="token punctuation">.</span>a <span class="token operator">&amp;&amp;</span> String<span class="token punctuation">.</span>equal r<span class="token punctuation">.</span>b r'<span class="token punctuation">.</span>b</code></pre></div> +<p>That will give you the AST representation of a record type definition and the equal function you +want to write so you can figure out how to deconstruct your expander's input to be able to generate +the right output.</p> +<p><code>ppxlib</code> exposes smart constructors in <code>Ppxlib.Ast_builder.Default</code> to help you build AST fragments +without having to care too much about attributes and similar fields, as well as some convenience constructors +to keep your code concise and readable.</p> +<p>Another convenience tool 
<code>ppxlib</code> exposes to help you build AST fragments is <code>metaquot</code>. I recently +wrote a bit of documentation about it +<a href="https://ppxlib.readthedocs.io/en/latest/ppx-for-plugin-authors.html#metaquot">here</a> which you +should take a look at but to sum it up <code>metaquot</code> is a PPX extension allowing you to write AST nodes +using the OCaml syntax they describe instead of the AST types.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#handling-code-locations-in-a-ppx-rewriter" aria-label="handling code locations in a ppx rewriter permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Handling code locations in a PPX rewriter</h4> +<p>When building AST fragments you should keep in mind that you have to set their <code>location</code>. Locations +are the parts of the AST values that describe the position of the corresponding node in your source +file, including the file name and the line number and offset of both the beginning and the end of the +code bit they represent.</p> +<p>Because your code was generated after the file was parsed, it doesn't have a location so you need to +set it yourself. 
One could think that it doesn't matter and we could use a dummy location but +locations are used by the compiler to properly report errors and that's why a PPX rewriter should care +about how it locates the generated code as it will help the end user to understand whether the error +comes from their code or generated code and how to fix it.</p> +<p>Both <code>Ast_builder</code> and <code>metaquot</code> expect a location. The first explicitly takes it as a labelled +<code>loc</code> argument while the second relies on a <code>loc</code> value being available in the scope. It is +important to set those with care as errors in the generated code don't necessarily mean that your +rewriter is buggy. There are valid cases where your rewriter functioned as intended but the generated +code triggers an error. PPX-es often work on the assumption that some values are available in the +scope; if the user doesn't properly provide those it's their responsibility to fix the error. To +help them do so, it is important to properly locate the generated code to guide them as much as +possible.</p> +<p>When writing extensions, using the whole extension point location for the generated code makes +perfect sense as that's where the code will sit. That's fairly easy as this is what <code>ppxlib</code> passes +to the expand function through the <code>loc</code> labelled argument. For deriving plugins it's a bit different +as the generated code doesn't replace an existing part of the parsed AST but generates a new one to insert. +Currently <code>ppxlib</code> gives you the <code>loc</code> of the whole type declaration, extension or exception +declaration your deriving plugin is applied to. 
Ideally it would be nice to be able to locate the +generated code on the plugin name in the <code>deriving</code> attribute payload, ie here:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">[@@deriving my_plugin,another_plugin] + ^^^^^^^^^</code></pre></div> +<p>I'm currently working on making that location available to the <code>expand</code> function. In the meantime, +you should choose a convention. I personally locate all the generated code on the +type declaration. Some choose to locate the generated code on the part of the input AST they're +handling when generating it.</p> +<h4 style="position:relative;"><a href="https://tarides.com/feed.xml#reporting-errors-to-your-rewriter-users" aria-label="reporting errors to your rewriter users permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Reporting errors to your rewriter users</h4> +<p>You won't always be able to handle all the AST nodes passed to your expand functions, either because the +end user misused your rewriter or because there are some cases you simply can't deal with.</p> +<p>In those cases you can report the error to the user with <code>Ppxlib.Location.raise_errorf</code>. It works +similarly to <code>printf</code> and you can build your error message from a format string and extra +arguments. It will then raise an exception which will be caught and reported by the compiler. 
+A good practice is to prefix the error message with the name of your rewriter to help users understand +what's going on, especially with deriving plugins as they might use several of them on the same type +declaration.</p> +<p>Another point to take care of here is, again, locations. <code>raise_errorf</code> takes a labelled <code>loc</code> +argument. It is used so that your error is reported like any compiler error. Having good locations in +those error messages is just as important as sending clear error messages. Keep in mind that both +the errors you report yourself and errors coming from your generated code will be highlighted by +merlin so when properly set they make it much easier to work with your PPX rewriter.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#testing-your-ppx" aria-label="testing your ppx permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Testing your PPX</h3> +<p>Just as most pieces of code do, a PPX deserves to be tested and it has become easier over the years to +test rewriters.</p> +<p>I personally tend to write as many unit tests as possible for my PPX-es' internal libraries. I try to +extract helper functions that can easily be unit-tested but I can't test it all that way. +Testing the <code>ast -&gt; ast</code> functions would be tedious as <code>ppxlib</code> and <code>ocaml-migrate-parsetree</code> +don't provide comparison and pretty printing functions that you can use with <code>alcotest</code> or <code>oUnit</code>. 
+That means you'd have to import the AST types and derive those functions on your own. That would make a lot +of boilerplate and even if those functions were exposed, writing such tests would be really +tedious. There are a lot of things to take into account. How are you going to build the input AST values +for instance? If you use <code>metaquot</code>, every node will share the same loc, making it hard to test +that your errors are properly located. If you don't, you will end up with insanely long and +unreadable test code or fixtures. +While that would allow extremely accurate testing for the generated code and errors, it will almost +certainly make your test code unmaintainable, at least given the current tooling.</p> +<p>Don't panic, there is a very good and simple alternative. <code>ppxlib</code> makes it very easy to build a +binary that will parse OCaml code, preprocess the AST with your rewriter and spit it out, formatted as +code again.</p> +<p>You just have to write the following <code>pp.ml</code>:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> Ppxlib<span class="token punctuation">.</span>Driver<span class="token punctuation">.</span>standalone <span class="token punctuation">(</span><span class="token punctuation">)</span></code></pre></div> +<p>and build the binary with the following <code>dune</code> stanza, assuming your rewriter is called +<code>my_ppx_rewriter</code>:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">(executable + (name pp) + (modules pp) + (libraries my_ppx_rewriter ppxlib))</code></pre></div> +<p>Because we're humans and the OCaml syntax is meant for us to write and read, it makes for much better +test input/output. 
You can now write your test input in a regular <code>.ml</code> file, use the <code>pp.exe</code> +binary to &quot;apply&quot; your preprocessor to it and compare the output with another <code>.ml</code> file containing +the code you expect it to generate. This kind of test pattern is really well supported by <code>dune</code> +thanks to the <code>diff</code> user action.</p> +<p>I usually have the following files in a <code>rewriter</code>/<code>deriver</code> folder within my test directory:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">test/rewriter/ +&#9500;&#9472;&#9472; dune +&#9500;&#9472;&#9472; test.expected.ml +&#9500;&#9472;&#9472; pp.ml +&#9492;&#9472;&#9472; test.ml</code></pre></div> +<p>Where <code>pp.ml</code> is used to produce the rewriter binary, <code>test.ml</code> contains the input OCaml code and +<code>test.expected.ml</code> the result of preprocessing <code>test.ml</code>. The dune file content is generally similar +to this:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">(executable + (name pp) + (modules pp) + (libraries my_ppx_rewriter ppxlib)) + +(rule + (targets test.actual.ml) + (deps (:pp pp.exe) (:input test.ml)) + (action (run ./%{pp} -deriving-keep-w32 both --impl %{input} -o %{targets}))) + +(alias + (name runtest) + (action (diff test.expected.ml test.actual.ml))) + +(test + (name test) + (modules test) + (preprocess (pps my_ppx_rewriter)))</code></pre></div> +<p>The first stanza is the one I already introduced above and specifies how to build the rewriter binary.</p> +<p>The <code>rule</code> stanza that comes after that indicates to <code>dune</code> how to produce the actual test output by +applying the rewriter binary to <code>test.ml</code>. You probably noticed the <code>-deriving-keep-w32 both</code> CLI +option passed to <code>pp.exe</code>. 
By default, <code>ppxlib</code> will generate values or add attributes so that your +generated code doesn't trigger an &quot;Unused value&quot; warning. This is useful in real-life situations but +here it will just pollute the test output and make it harder to read so we disable that feature.</p> +<p>The following <code>alias</code> stanza is where all the magic happens. Running <code>dune runtest</code> will now +generate <code>test.actual.ml</code> and compare it to <code>test.expected.ml</code>. It will not only do that but show +you how they differ from each other in a diff format. You can then automatically update +<code>test.expected.ml</code> if you're happy with the results by running <code>dune promote</code>.</p> +<p>Finally the last <code>test</code> stanza is there to ensure that the generated code compiles without type +errors.</p> +<p>This makes a very convenient test setup to write your PPX-es TDD style. You can start by writing an +identity PPX that will just return its input AST as it is. Then you add some OCaml code using your +soon-to-be PPX in <code>test.ml</code> and run <code>dune runtest --auto-promote</code> to prefill <code>test.expected.ml</code>. +From there you can start implementing your rewriter and run <code>dune runtest</code> to check on your progress +and update the expected result with <code>dune promote</code>. +Going pure TDD by writing the expected output first works but it's tricky because you'd have to format your code the +same way <code>pp.exe</code> will format the AST. It would be great to be able to specify how to format +the generated <code>test.actual.ml</code> so that this approach would be more viable and the diff more +readable. Being able to use ocamlformat with a very diff-friendly configuration would be great +there. 
<code>pp.exe</code> seems to offer CLI options to change the code style such as <code>-styler</code> but I haven't +had the chance to experiment with those yet.</p> +<p>Now you can test successful rewriting this way but what about errors? There's a lot of value in +ensuring you produce the right errors at the right code location because that's the kind of +thing you can get wrong when refactoring your rewriter code or when people try to contribute. +That isn't as likely to happen if your CI yells when you break the error reporting. So how do we do +that?</p> +<p>Well, pretty much the exact same way! We write a file with an erroneous invocation of our rewriter, +run <code>pp.exe</code> on it and compare stderr with what we expect it to be. +There are two major differences here. First we want to collect the stderr output of the rewriter +binary instead of using it to generate a file. The second is that we can't write all of our test +cases in a single file since <code>pp.exe</code> will stop at the first error. That means we need one <code>.ml</code> +file per error test case. 
+Luckily for us, dune offers ways to do both.</p> +<p>For every error test file we will want to add the following stanzas:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">(rule + (targets test_error.actual) + (deps (:pp pp.exe) (:input test_error.ml)) + (action + (with-stderr-to + %{targets} + (bash &quot;./%{pp} -no-color --impl %{input} || true&quot;) + ) + ) +) + +(alias + (name runtest) + (action (diff test_error.expected test_error.actual)) +)</code></pre></div> +<p>but obviously we don't want to do that by hand every time we add a new test case so we're gonna need +a script to generate those stanzas and then include them into our <code>dune</code> file using +<code>(include dune.inc)</code>.</p> +<p>To achieve that while keeping things as clean as possible I use the following directory structure:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">test/rewriter/ +&#9500;&#9472;&#9472; errors +&#9474; &#9500;&#9472;&#9472; dune +&#9474; &#9500;&#9472;&#9472; dune.inc +&#9474; &#9500;&#9472;&#9472; gen_dune_rules.ml +&#9474; &#9500;&#9472;&#9472; pp.ml +&#9474; &#9500;&#9472;&#9472; test_some_error.expected +&#9474; &#9500;&#9472;&#9472; test_some_error.ml +&#9474; &#9500;&#9472;&#9472; test_some_other_error.expected +&#9474; &#9492;&#9472;&#9472; test_some_other_error.ml +&#9500;&#9472;&#9472; dune +&#9500;&#9472;&#9472; test.expected.ml +&#9500;&#9472;&#9472; pp.ml +&#9492;&#9472;&#9472; test.ml</code></pre></div> +<p>Compared to our previous setup, we only added the new <code>errors</code> folder. To keep things simple it has +its own <code>pp.ml</code> copy but in the future I'd like to improve it a bit and be able to use the same +<code>pp.exe</code> binary.</p> +<p>The most important files here are <code>gen_dune_rules.ml</code> and <code>dune.inc</code>. 
The first is just a simple +OCaml script to generate the above stanzas for each test cases in the <code>errors</code> directory. The second +is the file we'll include in the main <code>dune</code>. It's also the file to which we'll write the generated +stanza.</p> +<p>I personally use the following <code>gen_dune_rules.ml</code>:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> output_stanzas filename <span class="token operator">=</span> + <span class="token keyword">let</span> base <span class="token operator">=</span> Filename<span class="token punctuation">.</span>remove_extension filename <span class="token keyword">in</span> + Printf<span class="token punctuation">.</span>printf + <span class="token string">{| +(library + (name %s) + (modules %s) + (preprocess (pps ppx_yojson)) +) + +(rule + (targets %s.actual) + (deps (:pp pp.exe) (:input %s.ml)) + (action + (with-stderr-to + %%{targets} + (bash &quot;./%%{pp} -no-color --impl %%{input} || true&quot;) + ) + ) +) + +(alias + (name runtest) + (action (diff %s.expected %s.actual)) +) +|}</span> + base + base + base + base + base + base + +<span class="token keyword">let</span> is_error_test <span class="token operator">=</span> <span class="token keyword">function</span> + <span class="token operator">|</span> <span class="token string">&quot;pp.ml&quot;</span> <span class="token operator">-&gt;</span> <span class="token boolean">false</span> + <span class="token operator">|</span> <span class="token string">&quot;gen_dune_rules.ml&quot;</span> <span class="token operator">-&gt;</span> <span class="token boolean">false</span> + <span class="token operator">|</span> filename <span class="token operator">-&gt;</span> Filename<span class="token punctuation">.</span>check_suffix filename <span class="token string">&quot;.ml&quot;</span> + +<span class="token keyword">let</span> <span class="token 
punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=</span> + Sys<span class="token punctuation">.</span>readdir <span class="token string">&quot;.&quot;</span> + <span class="token operator">|&gt;</span> Array<span class="token punctuation">.</span>to_list + <span class="token operator">|&gt;</span> List<span class="token punctuation">.</span>sort String<span class="token punctuation">.</span>compare + <span class="token operator">|&gt;</span> List<span class="token punctuation">.</span>filter is_error_test + <span class="token operator">|&gt;</span> List<span class="token punctuation">.</span>iter output_stanzas</code></pre></div> +<p>Nothing spectacular here, we just build the list of all the <code>.ml</code> files in the directory except +<code>pp.ml</code> and <code>gen_dune_rules.ml</code> itself and then generate the right stanzas for each of them. You'll +note the extra <code>library</code> stanza which I add to get dune to generate the right <code>.merlin</code> so that I +can see the error highlights when I edit the files by hand.</p> +<p>With that we're almost good, add the following to the <code>dune</code> file and you're all set:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">(executable + (name pp) + (modules pp) + (libraries + ppx_yojson + ppxlib + ) +) + +(include dune.inc) + +(executable + (name gen_dune_rules) + (modules gen_dune_rules) +) + +(rule + (targets dune.inc.gen) + (deps + (:gen gen_dune_rules.exe) + (source_tree .) + ) + (action + (with-stdout-to + %{targets} + (run %{gen}) + ) + ) +) + +(alias + (name runtest) + (action (diff dune.inc dune.inc.gen)) +)</code></pre></div> +<p>The first stanza is here to specify how to build the rewriter binary, same as before, while the +second stanza just tells dune to include the content of <code>dune.inc</code> within this <code>dune</code> file.</p> +<p>The interesting part comes next. 
As you can guess, the <code>executable</code> stanza builds our little OCaml +script into a <code>.exe</code>. The <code>rule</code> that comes after that specifies how to generate the new stanzas +by running <code>gen_dune_rules</code> and capturing its standard output into a <code>dune.inc.gen</code> file. +The last stanza allows you to review the changes to the generated stanzas and use promotion to accept +them. Once this is done, the new stanzas will be included in the <code>dune</code> file and the tests will be +run for every test case.</p> +<p>Adding a new test case is then pretty easy, you can simply run:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">$ touch test/rewriter/errors/some_explicit_test_case_name.{ml,expected} &amp;&amp; dune runtest --auto-promote</code></pre></div> +<p>That will create the new empty test case and update <code>dune.inc</code> with the corresponding rules. +From there you can proceed the same way as with the successful rewriting tests: update the <code>.ml</code>, +run <code>dune runtest</code> to take a sneak peek at the output and <code>dune promote</code> once you're satisfied with +the result.</p> +<p>I've been pretty happy with this setup so far although there's room for improvement. It would be +nice to avoid duplicating <code>pp.ml</code> for error testing. This also involves +quite a bit of boilerplate that I have to copy into all my PPX rewriter repositories every time. 
+Hopefully <a href="https://github.com/ocaml/dune/issues/1855">dune plugins</a> should help with that and I +can't wait for a first version to be released so that I can write a plugin to make this test +pattern more accessible and easier to set up.</p>https://tarides.com/blog/2019-05-09-an-introduction-to-ocaml-ppx-ecosystemAn introduction to OCaml PPX ecosystem2019-05-09T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Let's talk sun, mint tea and OCaml: Yes, you got it, the <a href="http://retreat.mirage.io">MirageOS biennial retreat</a> at Marrakesh!</p> +<p>For the 7th iteration of the retreat, the majority of the Tarides team took part in the trip to the camels country. +This is a report about what we produced and enjoyed while there.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#charles-edouard-lecat" aria-label="charles edouard lecat permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Charles-Edouard Lecat</h1> +<p>That's it, my first MirageOS retreat is coming soon, let's jump in the plane and here I come. After a nice cab trip and an uncountable number of similar streets, I'm finally at the Riad which will host me for the next 5 days.</p> +<p>It now begins the time to do what I came for: Code, Eat, Sleep and Repeat</p> +<p>I mostly worked on <a href="https://github.com/mirage/colombe">Colombe</a>, the OCaml implementation of the SMTP protocol for which I developed a simple client. 
+Except for some deferred problems (like the integration of the MIME protocol, the TLS wrapping and some others), the client was working perfectly :) +Implementing it was actually really easy as the core of the SMTP protocol was done by @dinosaure who developed over time a really nice way of implementing this kind of API. And as I spend most of my time at Tarides working on his code, I feel really comfortable with it.</p> +<p>One of the awesome things about this retreat was the people who came: there were so many interesting people, doing various things, so each time someone had a question, you could almost be sure that someone could help you in one way or another.</p> +<p>But sadly, as I arrived a few days after everyone, and just before the weekend, the time flew by reaaaaaally fast, and I did not have the time to write any major code, but I'm already looking forward to the next retreat which, I am sure, will be even more fruitful and attract a lot of nice OCaml developers.</p> +<p>Until then, I will just dream about the awesome food I ate there ;)</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#lucas-pluvinage" aria-label="lucas pluvinage permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Lucas Pluvinage</h1> +<p>Second Mirage retreat for me, and this time I had plans: make a small web game with Mirage hosted by an ESP32 device. 
I figured out that there was no canonical way to make an HTTP/Websocket server with Mirage and I didn't want to stick to a particular library.</p> +<p>Instead, I took my time to develop <code>mirage-http</code>, an abstraction of HTTP that can have either <code>cohttp</code> or <code>httpaf</code> as a backend. On top of that, I've built <code>mirage-websocket</code> which is therefore an HTTP server-independent implementation of websockets (indeed this has a lot of redundancies with <code>ocaml-websocket</code> but for now it's a proof of concept). While making all this I had discussions with @anmonteiro who's the Webservers/protocols expert for Mirage! However, I didn't have the time to build something on top of that, but this is still something that I would like to achieve at some point.</p> +<p>I also became the &quot;dune guy&quot; as I'm <a href="https://github.com/mirage/mirage/issues/969">working on the Mirage/dune integration</a>, and helped some people with their build system struggles.</p> +<p>It was definitely a rich week: I've learnt a lot of things, enjoyed the sun, ate good food and contributed to the Mirage universe!</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#jules-aguillon" aria-label="jules aguillon permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Jules Aguillon</h1> +<p>This was my first retreat. +It was the occasion to meet OCaml developers from all over the world. 
+The food was great and the weather perfect.</p> +<p>I submitted some PRs to the OCaml compiler!</p> +<ul> +<li>Hint on type error on int literal <a href="https://github.com/ocaml/ocaml/pull/2301">PR #2301</a>. +It adds a hint when using <code>int</code> literals instead of other number literals (e.g. <code>3</code> instead of <code>3.</code> or <code>3L</code>):</li> +</ul> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">Line 2, characters 20-21: +2 | let _ = Int32.add a 3 + ^ +Error: This expression has type int but an expression was expected of type + int32 + Hint: Did you mean `3l'?</code></pre></div> +<ul> +<li>Hint on type error on int operators <a href="https://github.com/ocaml/ocaml/pull/2307">PR #2307</a>. It hints the user when using numerical operators for ints (e.g. <code>+</code>) on other kinds of numbers (e.g. <code>float</code>, <code>int64</code>, etc.). For example:</li> +</ul> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">Line 8, characters 8-9: +8 | let _ = x + 1. + ^ +Error: This expression has type float but an expression was expected of type + int +Line 8, characters 10-11: +8 | let _ = x + 1. + ^ + Hint: Did you mean to use `+.'?</code></pre></div> +<ul> +<li> +<p>Clean up int literal hint <a href="https://github.com/ocaml/ocaml/pull/2313">PR #2313</a>. A little cleanup of the 2 previous PRs.</p> +</li> +<li> +<p>Hint when the expected type is wrapped in a ref <a href="https://github.com/ocaml/ocaml/pull/2319">PR #2319</a>. 
Another PR adding a hint, shown when the user forgot to use the <code>!</code> operator on <code>ref</code> values:</p> +</li> +</ul> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">Line 2, characters 8-9: +2 | let b = a + 1 + ^ +Error: This expression has type int ref + but an expression was expected of type int + Hint: This is a `ref', did you mean `!a'?</code></pre></div> +<p>The first 3 are merged now.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#gabriel-de-perthuis" aria-label="gabriel de perthuis permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Gabriel de Perthuis</h1> +<p>For this retreat my plan was to do something a little different and work on Solo5.</p> +<p><a href="https://github.com/mirage/wodan">Wodan</a>, the storage layer I'm working on, +needs two things from its backends which are not commonly implemented:</p> +<ul> +<li>support for discarding unused blocks (first implemented in mirage-block-unix), and</li> +<li>support for barriers, which are ordering constraints between writes</li> +</ul> +<p>Solo5 provides relevant mirage backends, which are themselves provided by various +virtualised implementations. 
Discard was added to most of those, at least those +that were common enough to be easily tested; we just added an &quot;operation not supported&quot; +error code for the other cases.</p> +<p>The virtio implementation was interesting; recent additions to the spec allow discard +support, but few virtual machine managers actually implement that on the backend side. +I tried to integrate with the Chromium OS &quot;crosvm&quot; for that, and had a good time +figuring out how it found the bootloader entry point (turns out the CPU was happily +skipping past invalid instructions to find a slightly misaligned entry point), but +ran out of time to figure out the rest of the integration, which seemed to be more +complex than anticipated. Because of this, virtio discard support will be skipped over +for now.</p> +<p>I also visited the souk, which was an interesting experience. +Turns out I'm bad at haggling, but I brought back interesting things anyway.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h1> +<p>We'd like to thank Hannes Mehnert who organized this retreat and all the attendees who contributed to making it fruitful and inspiring. +Want to take part in the next MirageOS retreat? 
Stay tuned <a href="http://retreat.mirage.io">here</a>.</p>https://tarides.com/blog/2019-05-06-7th-mirageos-hack-retreat7th MirageOS hack retreat2019-05-06T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Tarides is pleased to have contributed to the dune 1.9.0 release which +introduces the concept of library variants. Thanks to this update, +unikernel builds are becoming easier and faster in the MirageOS +universe! This also opens the door for a better cross-compilation +story, which will ease the addition of new MirageOS backends +(trustzone, ESP32, RISC-V, etc.).</p> +<p><em>This post has also been posted to the +<a href="https://dune.build/blog/dune-1-9-0/">Dune blog</a>. See also <a href="https://discuss.ocaml.org/t/ann-dune-1-9-0/3646">the discuss +forum</a> for more +details.</em></p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#dune-190" aria-label="dune 190 permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Dune 1.9.0</h1> +<p>Changes include:</p> +<ul> +<li>Coloring in the watch mode (<a href="https://github.com/ocaml/dune/pull/1956">#1956</a>)</li> +<li><code>$ dune init</code> command to create or update project boilerplate (<a href="https://github.com/ocaml/dune/pull/1448">#1448</a>)</li> +<li>Allow &quot;.&quot; in c_names and cxx_names (<a href="https://github.com/ocaml/dune/pull/2036">#2036</a>)</li> +<li>Experimental Coq support</li> +<li>Support for library variants and default implementations (<a 
href="https://github.com/ocaml/dune/pull/1900">#1900</a>)</li> +</ul> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#variants" aria-label="variants permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Variants</h1> +<p>In dune 1.7.0, the concept of virtual libraries was introduced: +<a href="https://dune.build/blog/virtual-libraries/">https://dune.build/blog/virtual-libraries/</a>. This feature allows you to +mark an abstract library as virtual, and then provide several +implementations for it. These implementations could be for multiple +targets (<code>unix</code>, <code>xen</code>, <code>js</code>), using different algorithms, using C +code or not. However, each implementation in a project dependency tree +had to be manually selected. 
Dune 1.9.0 introduces features for +automatic selection of implementations.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#library-variants" aria-label="library variants permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Library variants</h2> +<p>Variants is a tagging mechanism to select implementations on the final +linking step. There's not much to add to make your implementation use +variants. For example, you could decide to design a <code>bar_js</code> library +which is the javascript implementation of <code>bar</code>, a virtual +library. 
All you need to do is specify a <code>js</code> tag using the +<code>variant</code> option.</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">(library + (name bar_js) + (implements bar) + (variant js)); &lt;-- variant specification</code></pre></div> +<p>Now any executable that depends on <code>bar</code> can automatically select the +<code>bar_js</code> library variant using the <code>variants</code> option in the dune file.</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">(executable + (name foo) + (libraries bar baz) + (variants js)); &lt;-- variants selection</code></pre></div> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#common-variants" aria-label="common variants permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Common variants</h2> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#language-selection" aria-label="language selection permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 
6z"></path></svg></a>Language selection</h3> +<p>In your projects you might want to trade off speed for portability:</p> +<ul> +<li><code>ocaml</code>: pure OCaml</li> +<li><code>c</code>: OCaml accelerated by C</li> +</ul> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#javascript-backend" aria-label="javascript backend permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>JavaScript backend</h3> +<ul> +<li><code>js</code>: code aiming for a Node backend, using <code>Js_of_ocaml</code></li> +</ul> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#mirage-backends" aria-label="mirage backends permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Mirage backends</h2> +<p>The Mirage project (<a href="https://mirage.io/">mirage.io</a>) will make +extensive use of this feature in order to select the appropriate +dependencies according to the selected backend.</p> +<ul> +<li><code>unix</code>: Unikernels as Unix applications, running on top of <code>mirage-unix</code></li> +<li><code>xen</code>: Xen backend, on top of 
<code>mirage-xen</code></li> +<li><code>freestanding</code>: Freestanding backend, on top of <code>mirage-solo5</code></li> +</ul> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#default-implementation" aria-label="default implementation permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Default implementation</h2> +<p>To facilitate the transition from normal libraries to virtual ones, +it's possible to specify an implementation that is selected by +default. This default implementation is selected if no implementation +is chosen after variant resolution.</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">(library + (name bar) + (virtual_modules hello) + (default_implementation bar_unix)); &lt;-- default implementation selection</code></pre></div> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#selection-mechanism" aria-label="selection mechanism permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Selection mechanism</h2> +<p>Implementation is done with 
respect to some priority rules:</p> +<ul> +<li>manual selection of an implementation overrides everything</li> +<li>after that comes selection by variants</li> +<li>finally unimplemented virtual libraries can select their default implementation</li> +</ul> +<p>Libraries may depend on specific implementations but this is not +recommended. In this case, several things can happen:</p> +<ul> +<li>the implementation conflicts with a manually selected implementation: resolution fails.</li> +<li>the implementation overrides variants and default implementations: a cycle check is done and this either resolves or fails.</li> +</ul> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>Variant libraries and default implementations are fully <a href="https://dune.readthedocs.io/en/latest/variants.html">documented +here</a>. 
This +feature improves the usability of virtual libraries.</p> +<p>This +<a href="https://github.com/dune-universe/mirage-entropy/commit/576d25d79e3117bba64355ae73597651cfd27631">commit</a> +shows the amount of changes needed to make a virtual library use +variants.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#coq-support" aria-label="coq support permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Coq support</h1> +<p>Dune now supports building Coq projects. To enable the experimental Coq +extension, add <code>(using coq 0.1)</code> to your <code>dune-project</code> file. 
Then, +you can use the <code>(coqlib ...)</code> stanza to declare Coq libraries.</p> +<p>A typical <code>dune</code> file for a Coq project will look like:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">(include_subdirs qualified) ; Use if your development is based on sub directories + +(coqlib + (name Equations) ; Name of wrapper module + (public_name equations.Equations) ; Generate an .install file + (synopsis &quot;Equations Plugin&quot;) ; Synopsis + (libraries equations.plugin) ; ML dependencies (for plugins) + (modules :standard \ IdDec) ; modules to build + (flags -w -notation-override)) ; coqc flags</code></pre></div> +<p>See the <a href="https://github.com/ocaml/dune/blob/1.9/doc/coq.rst">documentation of the +extension</a> for more +details.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#credits" aria-label="credits permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Credits</h1> +<p>This release also contains many other changes and bug fixes that can +be found on the <a href="https://discuss.ocaml.org/t/ann-dune-1-9-0/3646">discuss +announce</a>.</p> +<p>Special thanks to dune maintainers and contributors for this release: +<a href="https://github.com/rgrinberg">@rgrinberg</a>, +<a href="https://github.com/emillon">@emillon</a>, +<a href="https://github.com/shonfeder">@shonfeder</a> +and <a href="https://github.com/ejgallego">@ejgallego</a>!</p>https://tarides.com/blog/2019-04-10-dune-1-9-0Dune 
1.9.02019-04-10T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We are pleased to announce the release of OCamlFormat (available on opam). +There have been numerous changes since the last release, +so here is a comprehensive list of the new features and breaking changes to help the transition from OCamlFormat 0.8.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#additional-dependencies" aria-label="additional dependencies permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Additional dependencies</h1> +<p>OCamlFormat now requires:</p> +<ul> +<li>ocaml &gt;= 4.06 (up from 4.04.1)</li> +<li>dune &gt;= 1.1.1</li> +<li>octavius &gt;= 1.2.0</li> +<li>uutf</li> +</ul> +<p>OCamlFormat_Reason now requires:</p> +<ul> +<li>ocaml &gt;= 4.06</li> +<li>dune &gt;= 1.1.1</li> +<li>ocaml-migrate-parsetree &gt;= 1.0.10 (up from 1.0.6)</li> +<li>octavius &gt;= 1.2.0</li> +<li>uutf</li> +<li>reason &gt;= 3.2.0 (up from 1.13.4)</li> +</ul> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#new-preset-profiles" aria-label="new preset profiles permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 
11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>New preset profiles</h1> +<p>The <code>ocamlformat</code> profile aims to take advantage of the strengths of a parsetree-based auto-formatter, +and to limit the consequences of the weaknesses imposed by the current implementation. +This is a style which optimizes for what the formatter can do best, rather than to match the style of any existing code. +General guidelines that have directed the design include:</p> +<ul> +<li>Legibility, in the sense of making it as hard as possible for quick visual parsing to give the wrong interpretation, +is of highest priority;</li> +<li>Whenever possible the high-level structure of the code should be obvious by looking only at the left margin, +in particular, it should not be necessary to visually jump from left to right hunting for critical keywords, tokens, etc;</li> +<li>All else equal compact code is preferred as reading without scrolling is easier, +so indentation or white space is avoided unless it helps legibility;</li> +<li>Attention has been given to making some syntactic gotchas visually obvious.</li> +</ul> +<p><code>ocamlformat</code> is the new default profile.</p> +<p>The <code>conventional</code> profile aims to be as familiar and &quot;conventional&quot; appearing as the available options allow.</p> +<p>The <code>default</code> profile is <code>ocamlformat</code> with <code>break-cases=fit</code>. 
+<code>default</code> is deprecated and will be removed in version 0.10.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#ocamlformat-diff-tool" aria-label="ocamlformat diff tool permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>OCamlFormat diff tool</h1> +<p><code>ocamlformat-diff</code> is a tool that uses OCamlFormat to apply the same formatting to compared OCaml files, +so that the formatting differences between the two files are not displayed. +Note that <code>ocamlformat-diff</code> comes in a separate opam package and is not included in the <code>ocamlformat</code> package.</p> +<p>The file comparison is then performed by any diff backend.</p> +<p>The options' documentation is available through <code>ocamlformat-diff --help</code>.</p> +<p>The option <code>--diff</code> allows you to configure the diff command that is used to compare the formatted files. 
+The default value is the vanilla <code>diff</code>, but you can also use <code>patdiff</code> or any other similar comparison tool.</p> +<p><code>ocamlformat-diff</code> can be integrated with <code>git diff</code>, +as explained in the <a href="https://github.com/ocaml-ppx/ocamlformat/blob/0.9/tools/ocamlformat-diff/README.md">online documentation</a>.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#formatting-docstrings" aria-label="formatting docstrings permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Formatting docstrings</h1> +<p>Previously, the docstrings <code>(** This is a docstring *)</code> could only be formatted like regular comments, +a new option <code>--parse-docstrings</code> has been added so that docstrings can be nicely formatted.</p> +<p>Here is a small example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(** {1 Printers and escapes used by Cmdliner module} *)</span> + +<span class="token keyword">val</span> subst_vars <span class="token punctuation">:</span> subst<span class="token punctuation">:</span><span class="token punctuation">(</span>string <span class="token operator">-&gt;</span> string option<span class="token punctuation">)</span> <span class="token operator">-&gt;</span> Buffer<span class="token punctuation">.</span>t <span class="token operator">-&gt;</span> string <span class="token operator">-&gt;</span> string +<span class="token comment">(** [subst 
b ~subst s], using [b], substitutes in [s] variables of the form + &quot;$(doc)&quot; by their [subst] definition. This leaves escapes and markup + directives $(markup,...) intact. + @raise Invalid_argument in case of illegal syntax. *)</span></code></pre></div> +<p>Note that this option is disabled by default and you have to set it manually by adding <code>--parse-docstrings</code> to your command line +or <code>parse-docstrings=true</code> to your <code>.ocamlformat</code> file. +If you get the following error message:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">Error: Formatting of (** ... *) is unstable (e.g. parses as a list or not depending on the margin), please tighten up this comment in the source or disable the formatting using the option --no-parse-docstrings.</code></pre></div> +<p>It means the original docstring cannot be formatted (e.g. because it does not comply with the odoc syntax) +and you have to edit it or disable the formatting of docstrings.</p> +<p>Of course if you think your docstring complies with the odoc syntax and there might be a bug in OCamlFormat, +<a href="https://github.com/ocaml-ppx/ocamlformat/issues">feel free to file an issue on github</a>.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#print-the-configuration" aria-label="print the configuration permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Print the configuration</h1> +<p>The new <code>--print-config</code> flag prints the 
configuration determined by the environment variable, +the configuration files, preset profiles and command line. Attributes are not considered.</p> +<p>It provides the full list of options with the values they are set to, and the source of this value. +For example <code>ocamlformat --print-config</code> prints:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">profile=ocamlformat (file .ocamlformat:1) +quiet=false (profile ocamlformat (file .ocamlformat:1)) +max-iters=10 (profile ocamlformat (file .ocamlformat:1)) +comment-check=true (profile ocamlformat (file .ocamlformat:1)) +wrap-fun-args=true (profile ocamlformat (file .ocamlformat:1)) +wrap-comments=true (file .ocamlformat:5) +type-decl=compact (profile ocamlformat (file .ocamlformat:1)) +space-around-collection-expressions=false (profile ocamlformat (file .ocamlformat:1)) +single-case=compact (profile ocamlformat (file .ocamlformat:1)) +sequence-style=separator (profile ocamlformat (file .ocamlformat:1)) +parse-docstrings=true (file .ocamlformat:4) +parens-tuple-patterns=multi-line-only (profile ocamlformat (file .ocamlformat:1)) +parens-tuple=always (profile ocamlformat (file .ocamlformat:1)) +parens-ite=false (profile ocamlformat (file .ocamlformat:1)) +ocp-indent-compat=false (profile ocamlformat (file .ocamlformat:1)) +module-item-spacing=sparse (profile ocamlformat (file .ocamlformat:1)) +margin=77 (file .ocamlformat:3) +let-open=preserve (profile ocamlformat (file .ocamlformat:1)) +let-binding-spacing=compact (profile ocamlformat (file .ocamlformat:1)) +let-and=compact (profile ocamlformat (file .ocamlformat:1)) +leading-nested-match-parens=false (profile ocamlformat (file .ocamlformat:1)) +infix-precedence=indent (profile ocamlformat (file .ocamlformat:1)) +indicate-nested-or-patterns=space (profile ocamlformat (file .ocamlformat:1)) +indicate-multiline-delimiters=true (profile ocamlformat (file .ocamlformat:1)) +if-then-else=compact 
(profile ocamlformat (file .ocamlformat:1)) +field-space=tight (profile ocamlformat (file .ocamlformat:1)) +extension-sugar=preserve (profile ocamlformat (file .ocamlformat:1)) +escape-strings=preserve (profile ocamlformat (file .ocamlformat:1)) +escape-chars=preserve (profile ocamlformat (file .ocamlformat:1)) +doc-comments-tag-only=default (profile ocamlformat (file .ocamlformat:1)) +doc-comments-padding=2 (profile ocamlformat (file .ocamlformat:1)) +doc-comments=after (profile ocamlformat (file .ocamlformat:1)) +disable=false (profile ocamlformat (file .ocamlformat:1)) +cases-exp-indent=4 (profile ocamlformat (file .ocamlformat:1)) +break-struct=force (profile ocamlformat (file .ocamlformat:1)) +break-string-literals=wrap (profile ocamlformat (file .ocamlformat:1)) +break-sequences=false (profile ocamlformat (file .ocamlformat:1)) +break-separators=before (profile ocamlformat (file .ocamlformat:1)) +break-infix-before-func=true (profile ocamlformat (file .ocamlformat:1)) +break-infix=wrap (profile ocamlformat (file .ocamlformat:1)) +break-fun-decl=wrap (profile ocamlformat (file .ocamlformat:1)) +break-collection-expressions=fit-or-vertical (profile ocamlformat (file .ocamlformat:1)) +break-cases=fit (file .ocamlformat:2)</code></pre></div> +<p>If many input files are specified, only print the configuration for the first file. 
+If no input file is specified, print the configuration for the root directory if specified, +or for the current working directory otherwise.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#parentheses-around-if-then-else-branches" aria-label="parentheses around if then else branches permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Parentheses around if-then-else branches</h1> +<p>A new option <code>parens-ite</code> has been added to decide whether to use parentheses +around if-then-else branches that spread across multiple lines.</p> +<p>If this option is set, the following function:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> <span class="token keyword">rec</span> loop count a <span class="token operator">=</span> + <span class="token keyword">if</span> count <span class="token operator">&gt;=</span> self<span class="token punctuation">#</span>len + <span class="token keyword">then</span> a + <span class="token keyword">else</span> + <span class="token keyword">let</span> a' <span class="token operator">=</span> f cur<span class="token punctuation">#</span>get count a <span class="token keyword">in</span> + cur<span class="token punctuation">#</span>incr <span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span> + loop <span class="token punctuation">(</span>count <span class="token operator">+</span> <span 
class="token number">1</span><span class="token punctuation">)</span> a'</code></pre></div> +<p>will be formatted as:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> <span class="token keyword">rec</span> loop count a <span class="token operator">=</span> + <span class="token keyword">if</span> count <span class="token operator">&gt;=</span> self<span class="token punctuation">#</span>len + <span class="token keyword">then</span> a + <span class="token keyword">else</span> <span class="token punctuation">(</span> + <span class="token keyword">let</span> a' <span class="token operator">=</span> f cur<span class="token punctuation">#</span>get count a <span class="token keyword">in</span> + cur<span class="token punctuation">#</span>incr <span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span> + loop <span class="token punctuation">(</span>count <span class="token operator">+</span> <span class="token number">1</span><span class="token punctuation">)</span> a' <span class="token punctuation">)</span></code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#parentheses-around-tuple-patterns" aria-label="parentheses around tuple patterns permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Parentheses around tuple patterns</h1> +<p>A new option <code>parens-tuple-patterns</code> has been added, that mimics 
<code>parens-tuple</code> but only applies to patterns, +whereas <code>parens-tuple</code> only applies to expressions. +<code>parens-tuple-patterns=multi-line-only</code> mode will try to skip parentheses for single-line tuple patterns; +this is the default value. +<code>parens-tuple-patterns=always</code> always uses parentheses around tuple patterns.</p> +<p>For example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* with parens-tuple-patterns=always *)</span> +<span class="token keyword">let</span> <span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">)</span> <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">2</span><span class="token punctuation">)</span> + +<span class="token comment">(* with parens-tuple-patterns=multi-line-only *)</span> +<span class="token keyword">let</span> a<span class="token punctuation">,</span> b <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">2</span><span class="token punctuation">)</span></code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#single-case-pattern-matching-expressions" aria-label="single case pattern matching expressions permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 
13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Single-case pattern-matching expressions</h1> +<p>The new option <code>single-case</code> defines the style of pattern-matching expressions with only a single case. +<code>single-case=compact</code> will try to format a single case on a single line, this is the default value. +<code>single-case=sparse</code> will always break the line before a single case.</p> +<p>For example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* with single-case=compact *)</span> +<span class="token keyword">try</span> some_irrelevant_expression +<span class="token keyword">with</span> Undefined_recursive_module <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> <span class="token boolean">true</span> + +<span class="token comment">(* with single-case=sparse *)</span> +<span class="token keyword">try</span> some_irrelevant_expression +<span class="token keyword">with</span> +<span class="token operator">|</span> Undefined_recursive_module <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> <span class="token boolean">true</span></code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#space-around-collection-expressions" aria-label="space around collection expressions permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Space around collection expressions</h1> +<p>The new option 
<code>space-around-collection-expressions</code> decides whether to add a space +inside the delimiters of collection expressions (lists, arrays, records).</p> +<p>For example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* by default *)</span> +<span class="token keyword">type</span> wkind <span class="token operator">=</span> <span class="token punctuation">{</span>f <span class="token punctuation">:</span> <span class="token type-variable function">'a</span><span class="token punctuation">.</span> <span class="token type-variable function">'a</span> tag <span class="token operator">-&gt;</span> <span class="token type-variable function">'a</span> kind<span class="token punctuation">}</span> +<span class="token keyword">let</span> l <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token string">&quot;Nil&quot;</span><span class="token punctuation">,</span> TCnoarg Thd<span class="token punctuation">;</span> <span class="token string">&quot;Cons&quot;</span><span class="token punctuation">,</span> TCarg <span class="token punctuation">(</span>Ttl Thd<span class="token punctuation">,</span> tcons<span class="token punctuation">)</span><span class="token punctuation">]</span> + +<span class="token comment">(* with space-around-collection-expressions *)</span> +<span class="token keyword">type</span> wkind <span class="token operator">=</span> <span class="token punctuation">{</span> f <span class="token punctuation">:</span> <span class="token type-variable function">'a</span><span class="token punctuation">.</span> <span class="token type-variable function">'a</span> tag <span class="token operator">-&gt;</span> <span class="token type-variable function">'a</span> kind <span class="token punctuation">}</span> +<span class="token keyword">let</span> l <span class="token operator">=</span> <span class="token punctuation">[</span> 
<span class="token string">&quot;Nil&quot;</span><span class="token punctuation">,</span> TCnoarg Thd<span class="token punctuation">;</span> <span class="token string">&quot;Cons&quot;</span><span class="token punctuation">,</span> TCarg <span class="token punctuation">(</span>Ttl Thd<span class="token punctuation">,</span> tcons<span class="token punctuation">)</span> <span class="token punctuation">]</span></code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#break-separators" aria-label="break separators permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Break separators</h1> +<p>The new option <code>break-separators</code> decides whether to break before or after separators such as <code>;</code> in list or record expressions, +<code>*</code> in tuples or <code>-&gt;</code> in arrow types. +<code>break-separators=before</code> breaks the expressions before the separator, this is the default value. +<code>break-separators=after</code> breaks the expressions after the separator. 
+<code>break-separators=after-and-docked</code> breaks the expressions after the separator and docks the brackets for records.</p> +<p>For example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* with break-separators=before *)</span> +<span class="token keyword">type</span> t <span class="token operator">=</span> + <span class="token punctuation">{</span> foooooooooooooooooooooooo<span class="token punctuation">:</span> foooooooooooooooooooooooooooooooooooooooo + <span class="token punctuation">;</span> fooooooooooooooooooooooooooooo<span class="token punctuation">:</span> fooooooooooooooooooooooooooo <span class="token punctuation">}</span> + +<span class="token comment">(* with break-separators=after *)</span> +<span class="token keyword">type</span> t <span class="token operator">=</span> + <span class="token punctuation">{</span> foooooooooooooooooooooooo<span class="token punctuation">:</span> foooooooooooooooooooooooooooooooooooooooo<span class="token punctuation">;</span> + fooooooooooooooooooooooooooooo<span class="token punctuation">:</span> fooooooooooooooooooooooooooo <span class="token punctuation">}</span> + +<span class="token comment">(* with break-separators=after-and-docked *)</span> +<span class="token keyword">type</span> t <span class="token operator">=</span> <span class="token punctuation">{</span> + foooooooooooooooooooooooo<span class="token punctuation">:</span> foooooooooooooooooooooooooooooooooooooooo<span class="token punctuation">;</span> + fooooooooooooooooooooooooooooo<span class="token punctuation">:</span> fooooooooooooooooooooooooooo +<span class="token punctuation">}</span></code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#not-breaking-before-bindmap-operators" aria-label="not breaking before bindmap operators permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" 
version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Not breaking before bind/map operators</h1> +<p>The new option <code>break-infix-before-func</code> decides whether to break infix operators +whose right arguments are anonymous functions specially. +This option is set by default, if you disable it with <code>--no-break-infix-before-func</code>, +it will not break before the operator so that the first line of the function appears docked at the end of line after the operator.</p> +<p>For example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* by default *)</span> +f x +<span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> y <span class="token operator">-&gt;</span> +g y +<span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">-&gt;</span> +f x <span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> y <span class="token operator">-&gt;</span> g y <span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">-&gt;</span> f x <span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> y <span class="token operator">-&gt;</span> g y <span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> <span class="token 
punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">-&gt;</span> y <span class="token punctuation">(</span><span class="token punctuation">)</span> + +<span class="token comment">(* with break-infix-before-func = false *)</span> +f x <span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> y <span class="token operator">-&gt;</span> +g y <span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">-&gt;</span> +f x <span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> y <span class="token operator">-&gt;</span> g y <span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">-&gt;</span> f x <span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> y <span class="token operator">-&gt;</span> g y <span class="token operator">&gt;&gt;=</span> <span class="token keyword">fun</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">-&gt;</span> y <span class="token punctuation">(</span><span class="token punctuation">)</span></code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#break-toplevel-cases" aria-label="break toplevel cases permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 
0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Break toplevel cases</h1> +<p>There is a new value for the <code>break-cases</code> option: <code>toplevel</code>, +that forces top-level cases (i.e. not nested or-patterns) to break across lines, +otherwise breaks naturally at the margin.</p> +<p>For example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> f <span class="token operator">=</span> + <span class="token keyword">let</span> g <span class="token operator">=</span> <span class="token keyword">function</span> + <span class="token operator">|</span> H <span class="token keyword">when</span> x y <span class="token operator">&lt;&gt;</span> k <span class="token operator">-&gt;</span> <span class="token number">2</span> + <span class="token operator">|</span> T <span class="token operator">|</span> P <span class="token operator">|</span> U <span class="token operator">-&gt;</span> <span class="token number">3</span> + <span class="token keyword">in</span> + <span class="token keyword">fun</span> x g t h y u <span class="token operator">-&gt;</span> + <span class="token keyword">match</span> x <span class="token keyword">with</span> + <span class="token operator">|</span> E <span class="token operator">-&gt;</span> <span class="token number">4</span> + <span class="token operator">|</span> Z <span class="token operator">|</span> P <span class="token operator">|</span> M <span class="token operator">-&gt;</span> <span class="token punctuation">(</span> + <span class="token keyword">match</span> y <span class="token keyword">with</span> + <span class="token operator">|</span> O <span class="token operator">-&gt;</span> <span class="token number">5</span> + <span class="token operator">|</span> P <span class="token keyword">when</span> h x <span class="token operator">-&gt;</span> <span class="token punctuation">(</span> + <span class="token 
keyword">function</span> + <span class="token operator">|</span> A <span class="token operator">-&gt;</span> <span class="token number">6</span> <span class="token punctuation">)</span> <span class="token punctuation">)</span></code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#number-of-spaces-before-docstrings" aria-label="number of spaces before docstrings permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Number of spaces before docstrings</h1> +<p>The new option <code>doc-comments-padding</code> controls how many spaces are printed before doc comments in type declarations. 
+The default value is 2.</p> +<p>For example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* with doc-comments-padding = 2 *)</span> +<span class="token keyword">type</span> t <span class="token operator">=</span> <span class="token punctuation">{</span>a<span class="token punctuation">:</span> int  <span class="token comment">(** a *)</span><span class="token punctuation">;</span> b<span class="token punctuation">:</span> int  <span class="token comment">(** b *)</span><span class="token punctuation">}</span> + +<span class="token comment">(* with doc-comments-padding = 1 *)</span> +<span class="token keyword">type</span> t <span class="token operator">=</span> <span class="token punctuation">{</span>a<span class="token punctuation">:</span> int <span class="token comment">(** a *)</span><span class="token punctuation">;</span> b<span class="token punctuation">:</span> int <span class="token comment">(** b *)</span><span class="token punctuation">}</span></code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#ignore-files" aria-label="ignore files permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Ignore files</h1> +<p>An <code>.ocamlformat-ignore</code> file specifies files that OCamlFormat should ignore. +Each line in an <code>.ocamlformat-ignore</code> file specifies a filename relative to the directory containing the <code>.ocamlformat-ignore</code> file. 
+Lines starting with <code>#</code> are ignored and can be used as comments.</p> +<p>Here is an example of such <code>.ocamlformat-ignore</code> file:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">#This is a comment +dir2/ignore_1.ml</code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#tag-only-docstrings" aria-label="tag only docstrings permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Tag-only docstrings</h1> +<p>The new option <code>doc-comments-tag-only</code> controls the position of doc comments only containing tags. +<code>doc-comments-tag-only=default</code> means no special treatment is done, this is the default value. 
+<code>doc-comments-tag-only=fit</code> puts doc comments on the same line if it fits.</p> +<p>For example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* with doc-comments-tag-only = default *)</span> + +<span class="token comment">(** @deprecated *)</span> +<span class="token keyword">open</span> Module + +<span class="token comment">(* with doc-comments-tag-only = fit *)</span> + +<span class="token keyword">open</span> Module <span class="token comment">(** @deprecated *)</span></code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#fit-or-vertical-mode-for-if-then-else" aria-label="fit or vertical mode for if then else permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Fit or vertical mode for if-then-else</h1> +<p>There is a new value for the option <code>if-then-else</code>: <code>fit-or-vertical</code>. +<code>fit-or-vertical</code> vertically breaks all branches if they do not fit on a single line. 
+Compared to the <code>compact</code> (default) value, it breaks all branches if at least one of them does not fit on a single line.</p> +<p>For example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* with if-then-else = compact *)</span> +<span class="token keyword">let</span> <span class="token punctuation">_</span> <span class="token operator">=</span> + <span class="token keyword">if</span> foo <span class="token keyword">then</span> + <span class="token keyword">let</span> a <span class="token operator">=</span> <span class="token number">1</span> <span class="token keyword">in</span> + <span class="token keyword">let</span> b <span class="token operator">=</span> <span class="token number">2</span> <span class="token keyword">in</span> + a <span class="token operator">+</span> b + <span class="token keyword">else</span> <span class="token keyword">if</span> foo <span class="token keyword">then</span> <span class="token number">12</span> + <span class="token keyword">else</span> <span class="token number">0</span> + +<span class="token comment">(* with if-then-else = fit-or-vertical *)</span> +<span class="token keyword">let</span> <span class="token punctuation">_</span> <span class="token operator">=</span> + <span class="token keyword">if</span> foo <span class="token keyword">then</span> + <span class="token keyword">let</span> a <span class="token operator">=</span> <span class="token number">1</span> <span class="token keyword">in</span> + <span class="token keyword">let</span> b <span class="token operator">=</span> <span class="token number">2</span> <span class="token keyword">in</span> + a <span class="token operator">+</span> b + <span class="token keyword">else</span> <span class="token keyword">if</span> foo <span class="token keyword">then</span> + <span class="token number">12</span> + <span class="token keyword">else</span> + <span class="token 
number">0</span></code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#check-mode" aria-label="check mode permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Check mode</h1> +<p>A new <code>--check</code> flag has been added. +It checks whether the input files already are formatted. +This flag is mutually exclusive with <code>--inplace</code> and <code>--output</code>. +It returns <code>0</code> if the input files are indeed already formatted, or <code>1</code> otherwise.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#break-function-declarations" aria-label="break function declarations permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Break function declarations</h1> +<p>The new option <code>break-fun-decl</code> controls the style for function declarations and types. +<code>break-fun-decl=wrap</code> breaks only if necessary, this is the default value. +<code>break-fun-decl=fit-or-vertical</code> vertically breaks arguments if they do not fit on a single line. 
+<code>break-fun-decl=smart</code> is like <code>fit-or-vertical</code> but tries to put the arguments on their own line if they fit. +The <code>wrap-fun-args</code> option now only controls the style for function calls, and no longer for function declarations.</p> +<p>For example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* with break-fun-decl = wrap *)</span> +<span class="token keyword">let</span> ffffffffffffffffffff aaaaaaaaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb + cccccccccccccccccccccc <span class="token operator">=</span> + g + +<span class="token comment">(* with break-fun-decl = fit-or-vertical *)</span> +<span class="token keyword">let</span> ffffffffffffffffffff + aaaaaaaaaaaaaaaaaaaaaa + bbbbbbbbbbbbbbbbbbbbbb + cccccccccccccccccccccc <span class="token operator">=</span> + g + +<span class="token comment">(* with break-fun-decl = smart *)</span> +<span class="token keyword">let</span> ffffffffffffffffffff + aaaaaaaaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb cccccccccccccccccccccc <span class="token operator">=</span> + g</code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#disable-configuration-in-files-and-attributes" aria-label="disable configuration in files and attributes permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Disable configuration in files and attributes</h1> +<p>Two new options have been added so that <code>.ocamlformat</code> configuration files and attributes in 
OCaml files do not change the +configuration. +These options can be useful if you use some preset profile +and you do not want attributes and <code>.ocamlformat</code> files to interfere with your preset configuration. +<code>--disable-conf-attrs</code> disables the configuration in attributes, +and <code>--disable-conf-files</code> disables <code>.ocamlformat</code> configuration files.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#preserve-module-items-spacing" aria-label="preserve module items spacing permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Preserve module items spacing</h1> +<p>There is a new value for the option <code>module-item-spacing</code>: <code>preserve</code>, +that will not leave open lines between one-liners of similar sorts unless there is an open line in the input.</p> +<p>For example the line breaks are preserved in the following code:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> cmos_rtc_seconds <span class="token operator">=</span> <span class="token number">0x00</span> +<span class="token keyword">let</span> cmos_rtc_seconds_alarm <span class="token operator">=</span> <span class="token number">0x01</span> +<span class="token keyword">let</span> cmos_rtc_minutes <span class="token operator">=</span> <span class="token number">0x02</span> + +<span class="token keyword">let</span> x <span class="token operator">=</span> o + +<span class="token 
keyword">let</span> log_other <span class="token operator">=</span> <span class="token number">0x000001</span> +<span class="token keyword">let</span> log_cpu <span class="token operator">=</span> <span class="token number">0x000002</span> +<span class="token keyword">let</span> log_fpu <span class="token operator">=</span> <span class="token number">0x000004</span></code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#breaking-changes" aria-label="breaking changes permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Breaking changes</h1> +<ul> +<li>When <code>--disable-outside-detected-project</code> is set, disable ocamlformat when no <code>.ocamlformat</code> file is found.</li> +<li>Files are not parsed when ocamlformat is disabled.</li> +<li>Disallow <code>-</code> with other input files.</li> +<li>The <code>wrap-fun-args</code> option now only controls the style for function calls, and no longer for function declarations.</li> +<li>The default profile is now named <code>ocamlformat</code>.</li> +<li>The deprecated <code>option value</code> syntax for <code>.ocamlformat</code> files is no longer supported; use the <code>option = value</code> syntax instead.</li> +</ul> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#miscellaneous-bugfixes" aria-label="miscellaneous bugfixes permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 
9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Miscellaneous bugfixes</h1> +<ul> +<li>Preserve shebang (e.g. <code>#!/usr/bin/env ocaml</code>) at the beginning of a file.</li> +<li>Improve the formatting when <code>ocp-indent-compat</code> is set.</li> +<li>UTF8 characters are now correctly printed in comments.</li> +<li>Add parentheses around a constrained any-pattern (e.g. <code>let (_ : int) = x1</code>).</li> +<li>Emacs: the temporary buffer is now killed.</li> +<li>Emacs: add the keybinding in tuareg's map instead of merlin's.</li> +<li>Lots of improvements on the comments, docstrings, attributes formatting.</li> +<li>Lots of improvements on the formatting of modules.</li> +<li>Lots of improvements in the Reason support.</li> +<li>Do not rely on the file-system to format sources.</li> +<li>The <code>--debug</code> mode is more user-friendly.</li> +</ul> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#credits" aria-label="credits permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Credits</h1> +<p>This release also contains many other changes and bug fixes that we cannot detail here.</p> +<p>Special thanks to our maintainers and contributors for this release: Jules Aguillon, Mathieu Barbin, 
Josh Berdine, J&eacute;r&eacute;mie Dimino, Hugo Heuzard, Ludwig Pacifici, Guillaume Petiot, Nathan Rebours and Louis Roch&eacute;.</p> +<p>If you wish to get involved with OCamlFormat development or file an issue, +please read the <a href="https://github.com/ocaml-ppx/ocamlformat/blob/master/CONTRIBUTING.md">contributing guide</a>; +any contribution is welcome.</p>https://tarides.com/blog/2019-03-29-release-of-ocamlformat-0-9Release of OCamlFormat 0.92019-03-29T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>MirageOS is a library operating system written from the ground up in OCaml. +It has an impossible and incredibly huge goal to re-implement all of the +world! Looking back at the work accomplished by the MirageOS team, it appears that's +what happened for several years. Re-implementing the entire stack, in particular +the lower layers that we often take for granted, requires great attention to +detail. While it may seem reasonably easy to implement a given RFC, a huge +amount of work is often hidden under the surface.</p> +<p>In this article, we will explain the development process we went through as we +updated a small part of the MirageOS stack: the library <code>ocaml-base64</code>. It's a +suitable example as the library is small (a few hundred lines of code), but it +needs ongoing development to ensure good quality, so that higher-level libraries +(like <a href="https://github.com/mirage/mrmime">mrmime</a>) can trust it.</p> +<p>Updating the library was instigated by a problem I ran into with the existing +base64 implementation while working on the e-mail stack. Indeed, we got some +errors when we tried to compute an <em>encoded-word</em> according to <a href="https://www.ietf.org/rfc/rfc2047.txt">RFC 
So after several years of not being touched, we decided to +update <a href="https://github.com/mirage/ocaml-base64"><code>ocaml-base64</code></a>.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#the-critique-of-pure-reason" aria-label="the critique of pure reason permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The Critique of Pure Reason</h1> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#the-first-problem" aria-label="the first problem permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The first problem</h2> +<p>We started by attempting to use <code>ocaml-base64</code> on some examples extracted from +actual e-mails, and we quickly ran into cases where the library failed. This +highlighted that reality is much more complex than you can imagine from reading +an RFC. In this situation, what do you do: try to implement a best-effort +strategy and continue parsing? Or stick to the letter of the RFC and fail? 
In +the context of e-mail, which has accumulated a lot of baggage over time, you +cannot get around implementing a best-effort strategy.</p> +<p>The particular error we were seeing was a <code>Not_found</code> exception when decoding an +<em>encoded-word</em>. This exception appeared because the implementation relied on +<code>String.contains</code>, and the input contained a character which was not part of the +base64 alphabet.</p> +<p>This was the first reason why we thought it necessary to rewrite <code>ocaml-base64</code>. +Of course, we could just catch the exception and continue the initial +computation, but then another reason appeared.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#the-second-problem" aria-label="the second problem permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The second problem</h2> +<p>As <a href="https://github.com/clecat">@clecat</a> and I reviewed RFC 2045, we noticed the +following requirement:</p> +<blockquote> +<p>The encoded output stream must be represented in lines of no more than 76 +characters each.</p> +<p>See RFC 2045, section 6.8</p> +</blockquote> +<p>This requirement is specific to base64, but similar limits apply to e-mail in general: a line should never have more than 78 +characters according to <a href="https://www.ietf.org/rfc/rfc822.txt">RFC 822</a>, nor more than 998 characters +according to <a href="https://www.ietf.org/rfc/rfc2822.txt">RFC 2822</a>.</p> +<p>The wish for a decoder that abides by RFC 2045 more closely, including the requirement +above, further spurred us to 
implement a new decoder.</p> +<p>As part of the new implementation, we decided to implement tests and fuzzers to +ensure correctness. This also had the benefit, that we could run the fuzzer on +the existing codebase. When fuzzing an encoder/decoder pair, an excellent check +is whether the following isomorphism holds:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> iso0 input <span class="token operator">=</span> <span class="token keyword">assert</span> <span class="token punctuation">(</span>decode <span class="token punctuation">(</span>encode input<span class="token punctuation">)</span> <span class="token operator">=</span> input<span class="token punctuation">)</span> +<span class="token keyword">let</span> iso1 input <span class="token operator">=</span> <span class="token keyword">assert</span> <span class="token punctuation">(</span>encode <span class="token punctuation">(</span>decode input<span class="token punctuation">)</span> <span class="token operator">=</span> input<span class="token punctuation">)</span></code></pre></div> +<p>However, at this point <a href="https://github.com/hannesm">@hannesm</a> ran into another error (see +<a href="https://github.com/mirage/ocaml-base64/issues/20">#20</a>).</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#the-third-problem" aria-label="the third problem permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>The third problem</h2> 
+<p>We started to review the <a href="https://github.com/mirleft/ocaml-nocrypto"><code>nocrypto</code></a> implementation of base64, which +respects our requirements. We had some concerns about the performance of the +implementation though, so we decided to see if we would get a performance +regression by switching to this implementation.</p> +<p>A quick benchmark based on random input revealed the opposite, however! +<code>nocrypto</code>'s implementation was faster than <code>ocaml-base64</code>:</p> +<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">ocaml-base64<span class="token string">'s implementation on bytes (length: 5000): 466 272.34ns +nocrypto'</span>s implementation on bytes <span class="token punctuation">(</span>length: <span class="token number">5000</span><span class="token punctuation">)</span>: <span class="token number">137</span> <span class="token number">406</span>.04ns</code></pre></div> +<p>Based on all these observations, we thought there was sufficient reason to +reconsider the <code>ocaml-base64</code> implementation. It's also worth mentioning that +the last real release (excluding <code>dune</code>/<code>jbuilder</code>/<code>topkg</code> updates) is from Dec. +24 2014. 
So, it's pretty old code and the OCaml eco-system has improved a lot +since 2014.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#implementation--review" aria-label="implementation review permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Implementation &amp; review</h1> +<p>We started integrating the <code>nocrypto</code> implementation. Of course, implementing +<a href="https://www.ietf.org/rfc/rfc4648.txt">RFC 4648</a> is not as easy as just reading examples and trying to do +something which works. The devil is in the detail.</p> +<p>@hannesm and <a href="https://github.com/cfcs">@cfcs</a> decided to do a big review of expected behavior +according to the RFC, and another about implementation and security issues.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#canonicalization" aria-label="canonicalization permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Canonicalization</h2> +<p>The biggest problem about RFC 4648 is regarding canonical inputs. 
Indeed, there +are cases where two different inputs are associated with the same value:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> a <span class="token operator">=</span> Base64<span class="token punctuation">.</span>decode <span class="token string">&quot;Zm9vCg==&quot;</span> <span class="token punctuation">;;</span> +<span class="token operator">-</span> <span class="token punctuation">:</span> string <span class="token operator">=</span> <span class="token string">&quot;foo\n&quot;</span> +<span class="token keyword">let</span> b <span class="token operator">=</span> Base64<span class="token punctuation">.</span>decode <span class="token string">&quot;Zm9vCh==&quot;</span> <span class="token punctuation">;;</span> +<span class="token operator">-</span> <span class="token punctuation">:</span> string <span class="token operator">=</span> <span class="token string">&quot;foo\n&quot;</span></code></pre></div> +<p>This is mostly because the base64 format encodes the input 6 bits at a time. The +result is that 4 base64 encoded bytes are equal to 3 decoded bytes (<code>6 * 4 = 8 * 3</code>). Because of this, 2 base64 encoded bytes carry 12 bits, that is, 1 byte plus 4 bits. What do +we need to do with these 4 bits? Nothing: the decoder simply discards them.</p> +<p>That's why the last character in our example can be something other than <code>g</code>. <code>g</code> +is the canonical character: it supplies the 2 bits that follow the 6 bits +delivered by <code>C</code> (together making a byte, 8 bits) and leaves the 4 remaining bits at zero. 
But <code>h</code> can be used where we just +need 2 bits at the end.</p> +<p>Due to this behavior, the check used for fuzzing changes: from a canonical +input, we should check isomorphism.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#invalid-character" aria-label="invalid character permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Invalid character</h2> +<p>As mentioned above (&quot;The first problem&quot;), how should invalid characters be +handled? This happens when decoding a byte which is not a part of the base64 +alphabet. In the old version, <code>ocaml-base64</code> would simply leak a <code>Not_found</code> +exception from <code>String.contains</code>.</p> +<p>The MirageOS team has taken <a href="https://mirage.io/wiki/mirage-3.0-errors">a stance on exceptions</a>, which is +to &quot;use exceptions for exceptional conditions&quot; - invalid input is hardly one of +those. This is to avoid any exception leaks, as it can be really hard to track +the origin of an exception in a unikernel. Because of this, several packages +have been updated to return a <code>result</code> type instead, and we wanted the new +implementation to follow suit.</p> +<p>On the other hand, exceptions can be useful when considered as a more +constrained form of assembly jump. 
Of course, they break the control flow, but +from a performance point of view, it's interesting to use this trick:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">exception</span> Found + +<span class="token keyword">let</span> contains str chr <span class="token operator">=</span> + <span class="token keyword">let</span> idx <span class="token operator">=</span> ref <span class="token number">0</span> <span class="token keyword">in</span> + <span class="token keyword">let</span> len <span class="token operator">=</span> String<span class="token punctuation">.</span>length str <span class="token keyword">in</span> + <span class="token keyword">try</span> <span class="token keyword">while</span> <span class="token operator">!</span>idx <span class="token operator">&lt;</span> len + <span class="token keyword">do</span> <span class="token keyword">if</span> String<span class="token punctuation">.</span>unsafe_get str <span class="token operator">!</span>idx <span class="token operator">=</span> chr <span class="token keyword">then</span> raise Found <span class="token punctuation">;</span> incr idx <span class="token keyword">done</span> <span class="token punctuation">;</span> + None + <span class="token keyword">with</span> Found <span class="token operator">-&gt;</span> Some <span class="token operator">!</span>idx</code></pre></div> +<p>This kind of code for example is ~20% faster than <code>String.contains</code>.</p> +<p>As such, exceptions can be a useful tool for performance optimizations, but we +need to be extra careful not to expose them to the users of the library. This +code needs to be hidden behind a fancy functional interface. With this in mind, +we should assert that our <code>decode</code> function never leaks an exception. 
We'll +describe how we've addressed this problem later.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#special-cases" aria-label="special cases permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Special cases</h2> +<p>RFC 4648 describes some cases in detail, and while we would sometimes like to work in a +perfect world where we never need to deal with such errors, our +experience is that we cannot imagine what end users will do to formats, protocols +and such.</p> +<p>Even though the RFC has detailed examples, we have to read between the lines to find the +special cases and how to deal with them.</p> +<p>@hannesm noticed one of these cases, where padding (the <code>=</code> sign at the end of the +input) is not mandatory:</p> +<blockquote> +<p>The pad character &quot;=&quot; is typically percent-encoded when used in an URI [9], +but if the data length is known implicitly, this can be avoided by skipping +the padding; see section 3.2.</p> +<p>See RFC 4648, section 5</p> +</blockquote> +<p>That mostly means that the following kind of input can be valid:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> a <span class="token operator">=</span> Base64<span class="token punctuation">.</span>decode <span class="token label property">~pad</span><span class="token punctuation">:</span><span class="token boolean">false</span> <span class="token string">&quot;Zm9vCg&quot;</span> +<span class="token 
operator">-</span> <span class="token punctuation">:</span> string <span class="token operator">=</span> <span class="token string">&quot;foo\n&quot;</span></code></pre></div> +<p>It's only valid in a specific context though: when <em>length is known implicitly</em>. +Only the caller of <code>decode</code> can determine whether the length is implicitly known +such that padding can be omitted. To that end, we've added a new optional +argument <code>?pad</code> to the function <code>Base64.decode</code>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#allocation-sub-off-and-len" aria-label="allocation sub off and len permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Allocation, <code>sub</code>, <code>?off</code> and <code>?len</code></h2> +<p>Xavier Leroy has described the garbage collector in the following way:</p> +<blockquote> +<p>You see, the Caml garbage collector is like a god from ancient mythology: +mighty, but very irritable. If you mess with it, it'll make you suffer in +surprising ways.</p> +</blockquote> +<p>That's probably why my experience with improving the allocation policy of +<a href="https://github.com/mirage/ocaml-git"><code>ocaml-git</code></a> was quite a nightmare. 
Allowing the user to control +allocation is important for efficiency, and we wanted <code>ocaml-base64</code> to be a +good citizen.</p> +<p>At the beginning, <code>ocaml-base64</code> had a very simple API:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">val</span> decode <span class="token punctuation">:</span> string <span class="token operator">-&gt;</span> string +<span class="token keyword">val</span> encode <span class="token punctuation">:</span> string <span class="token operator">-&gt;</span> string</code></pre></div> +<p>This API forces allocations in two ways.</p> +<p>Firstly, if the caller needs to encode a part of a string, this part needs to be +extracted, e.g. using <code>String.sub</code>, which will allocate a new string. To avoid +this, two new optional arguments have been added to <code>encode</code>: <code>?off</code> and <code>?len</code>, +which specify the substring to encode. 
Here's an example:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token comment">(* We want to encode the part 'foo' without prefix or suffix *)</span> + +<span class="token comment">(* Old API -- forces allocation *)</span> +Base64<span class="token punctuation">.</span>encode <span class="token punctuation">(</span>String<span class="token punctuation">.</span>sub <span class="token string">&quot;prefix foo suffix&quot;</span> <span class="token number">7</span> <span class="token number">3</span><span class="token punctuation">)</span> <span class="token punctuation">;;</span> +<span class="token operator">-</span> <span class="token punctuation">:</span> string <span class="token operator">=</span> <span class="token string">&quot;Zm9v&quot;</span> + +<span class="token comment">(* New API -- avoids allocation *)</span> +Base64<span class="token punctuation">.</span>encode <span class="token label property">~off</span><span class="token punctuation">:</span><span class="token number">7</span> <span class="token label property">~len</span><span class="token punctuation">:</span><span class="token number">3</span> <span class="token string">&quot;prefix foo suffix&quot;</span> <span class="token punctuation">;;</span> +<span class="token operator">-</span> <span class="token punctuation">:</span> string <span class="token operator">=</span> <span class="token string">&quot;Zm9v&quot;</span></code></pre></div> +<p>Secondly, a new string is allocated to hold the resulting string. 
We can +calculate a bound on the length of this string in the following manner:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> <span class="token punctuation">(</span><span class="token operator">//</span><span class="token punctuation">)</span> x y <span class="token operator">=</span> + <span class="token keyword">if</span> y <span class="token operator">&lt;</span> <span class="token number">1</span> <span class="token keyword">then</span> raise Division_by_zero <span class="token punctuation">;</span> + <span class="token keyword">if</span> x <span class="token operator">&gt;</span> <span class="token number">0</span> <span class="token keyword">then</span> <span class="token number">1</span> <span class="token operator">+</span> <span class="token punctuation">(</span><span class="token punctuation">(</span>x <span class="token operator">-</span> <span class="token number">1</span><span class="token punctuation">)</span> <span class="token operator">/</span> y<span class="token punctuation">)</span> <span class="token keyword">else</span> <span class="token number">0</span> + +<span class="token keyword">let</span> encode input <span class="token operator">=</span> + <span class="token keyword">let</span> res <span class="token operator">=</span> Bytes<span class="token punctuation">.</span>create <span class="token punctuation">(</span>String<span class="token punctuation">.</span>length input <span class="token operator">//</span> <span class="token number">3</span> <span class="token operator">*</span> <span class="token number">4</span><span class="token punctuation">)</span> <span class="token keyword">in</span> + <span class="token operator">..</span><span class="token punctuation">.</span> + +<span class="token keyword">let</span> decode input <span class="token operator">=</span> + <span class="token keyword">let</span> res <span class="token 
operator">=</span> Bytes<span class="token punctuation">.</span>create <span class="token punctuation">(</span>String<span class="token punctuation">.</span>length input <span class="token operator">//</span> <span class="token number">4</span> <span class="token operator">*</span> <span class="token number">3</span><span class="token punctuation">)</span> <span class="token keyword">in</span> + <span class="token operator">..</span><span class="token punctuation">.</span></code></pre></div> +<p>Unfortunately we cannot know the exact length of the result prior to computing +it. This forces a call to <code>String.sub</code> at the end of the computation to return a +string of the correct length. This means we have two allocations rather than +one. To avoid the additional allocation, @avsm proposed providing a new +type <code>sub = string * int * int</code>. This lets the user call <code>String.sub</code> if +required (and allocate a new string), or simply use the returned <code>sub</code> for +<em>blit</em>ting to another buffer or similar.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#fuzz-everything" aria-label="fuzz everything permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Fuzz everything!</h1> +<p>There's a strong trend of fuzzing libraries for MirageOS, which is quite easy +thanks to the brilliant work by <a href="https://github.com/yomimono">@yomimono</a> and <a href="https://github.com/stedolan">@stedolan</a>! 
+The integrated fuzzing in OCaml builds on <a href="http://lcamtuf.coredump.cx/afl/">American fuzzy lop</a>, which is +very smart about discarding paths of execution that have already been tested and +generating unseen inputs which break your assumptions. My first experience with +fuzzing was with the library <a href="https://github.com/mirage/decompress"><code>decompress</code></a>, and I was impressed by the +<a href="https://github.com/mirage/decompress/pull/34">precise error</a> it found about a name clash.</p> +<p>Earlier in this article, I listed some properties we wanted to check for +<code>ocaml-base64</code>:</p> +<ul> +<li>The functions <code>encode</code> and <code>decode</code> should be isomorphic taking +canonicalization into account:</li> +</ul> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> iso0 input <span class="token operator">=</span> + <span class="token keyword">match</span> Base64<span class="token punctuation">.</span>decode <span class="token label property">~pad</span><span class="token punctuation">:</span><span class="token boolean">false</span> input <span class="token keyword">with</span> + <span class="token operator">|</span> Error <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> fail <span class="token punctuation">(</span><span class="token punctuation">)</span> + <span class="token operator">|</span> Ok result0 <span class="token operator">-&gt;</span> + <span class="token keyword">let</span> result1 <span class="token operator">=</span> Base64<span class="token punctuation">.</span>encode_exn result0 <span class="token keyword">in</span> + <span class="token keyword">match</span> Base64<span class="token punctuation">.</span>decode <span class="token label property">~pad</span><span class="token punctuation">:</span><span class="token boolean">true</span> result1 <span class="token 
keyword">with</span> + <span class="token operator">|</span> Error <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> fail <span class="token punctuation">(</span><span class="token punctuation">)</span> + <span class="token operator">|</span> Ok result2 <span class="token operator">-&gt;</span> check_eq result0 result2 + +<span class="token keyword">let</span> iso1 input <span class="token operator">=</span> + <span class="token keyword">let</span> result0 <span class="token operator">=</span> Base64<span class="token punctuation">.</span>encode_exn input <span class="token keyword">in</span> + <span class="token keyword">match</span> Base64<span class="token punctuation">.</span>decode <span class="token label property">~pad</span><span class="token punctuation">:</span><span class="token boolean">true</span> result0 <span class="token keyword">with</span> + <span class="token operator">|</span> Error <span class="token punctuation">_</span> <span class="token operator">-&gt;</span> fail <span class="token punctuation">(</span><span class="token punctuation">)</span> + <span class="token operator">|</span> Ok result1 <span class="token operator">-&gt;</span> + <span class="token keyword">let</span> result2 <span class="token operator">=</span> Base64<span class="token punctuation">.</span>encode_exn result1 <span class="token keyword">in</span> + check_eq result0 result2</code></pre></div> +<ul> +<li>The function <code>decode</code> should <em>never</em> raise an exception, but rather return a +result type:</li> +</ul> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> no_exn input <span class="token operator">=</span> + <span class="token keyword">try</span> ignore <span class="token operator">@@</span> Base64<span class="token punctuation">.</span>decode input <span class="token keyword">with</span> <span class="token 
punctuation">_</span> <span class="token operator">-&gt;</span> fail <span class="token punctuation">(</span><span class="token punctuation">)</span></code></pre></div> +<ul> +<li>And finally, we should randomize the <code>?off</code> and <code>?len</code> arguments to ensure that we +don't get an <code>Out_of_bounds</code> exception when accessing input.</li> +</ul> +<p>Just because we've applied fuzzing to the new implementation for a long time, it +doesn't mean that the code is completely infallible. People can use our library +in an unimaginable way (and it's mostly what happens in the real world) and get +an unknowable error.</p> +<p>But, with the fuzzer, we've managed to test some properties across a very wide +range of inputs instead of unit testing with random (or not so random) inputs +from our brains. This development process allows <em>fixing the semantics</em> of +implementations. It's <strong>not</strong> a formal definition of semantics, but +it's better than nothing or outdated documentation.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h1> +<p>Based on our recent update to <code>ocaml-base64</code>, this blog post explains our +development process as we go about rewriting the world to MirageOS, one bit at a +time. There's an important point to be made though:</p> +<p><code>ocaml-base64</code> is a small project. Currently, the implementation is about 250 +lines of code. 
So it's a really small project. But as stated in the +introduction, we are fortunate enough to push the restart button of the computer +world - yes, we want to make a new operating system.</p> +<p>That's a massive task, and we shouldn't make it any harder on ourselves than +necessary. As such, we need to justify any step, any line of code, and why we +decided to spend our time on any change (why we decided to re-implement <code>git</code> +for example). So before committing any time to projects, we try to do a deep +analysis of the problem, get feedback from others, and find a consensus between +what we already know, what we want and what we should have (in the case of +<code>ocaml-base64</code>, @hannesm took a look at the PHP implementation and the Go +implementation).</p> +<p>Indeed, this is a hard question which nobody can answer perfectly in isolation. +So, the story of this update to <code>ocaml-base64</code> is an invitation for you to enter +the arcana of the computer world through MirageOS :) ! Don't be afraid!</p>https://tarides.com/blog/2019-02-08-release-of-base64Release of Base642019-02-08T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Dune comes with a library to query OS-specific information, called configurator. +It is able to evaluate C expressions and turn them into OCaml values. +Surprisingly, it even works when compiling for a different architecture. How can +it do that?</p>https://dune.build/blog/configurator-constants/How configurator reads C constants2019-01-03T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Presentation about MirageOS in Lambda World Cad&igrave;z on October 26th</p>https://www.youtube.com/watch?v=urG5BjvjW18MirageOS, towards a smaller and safer OS2018-12-06T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>I'm very happy to announce a new major release of <code>ocaml-git</code> (2.0). 
+This release is a 2-year effort to get a revamped +streaming API offering full control over memory +allocation. This new version also adds production-ready implementations of +the wire protocol: <code>git push</code> and <code>git pull</code> now work very reliably +using the raw Git and smart HTTP protocols (SSH support will come +soon). <code>git gc</code> is also implemented, and all of the basic bricks are +now available to create Git servers. MirageOS support is available +out-of-the-box.</p> +<p>Two years ago, we decided to rewrite <code>ocaml-git</code> and split it into +standalone libraries. More details about these new libraries are also +given below.</p> +<p>But first, let's focus on <code>ocaml-git</code>'s new design. The primary goal was +to fix memory consumption issues that our users noticed with the previous version, +and to make <code>git push</code> work reliably. We also took care +not to break the API too much, to ease the transition for current users.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#controlled-allocations" aria-label="controlled allocations permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Controlled allocations</h2> +<p>There is a big difference in the way <code>ocaml-git</code> and <code>git</code> +are designed: <code>git</code> is a short-lived command-line tool which does not +care that much about allocation policies, whereas we wanted to build a +library that can be linked with long-lived Git client and/or server +applications. 
We had to make some (performance) compromises to support +that use-case, for the benefit of tighter allocation policies &mdash; and hence +more predictable memory consumption patterns. +Other Git libraries such as <a href="https://libgit2.org/">libgit2</a> +also have to <a href="https://libgit2.org/security/">deal</a> with similar concerns.</p> +<p>In order to keep tight control over the allocated memory, we decided to +use <a href="https://github.com/mirage/decompress">decompress</a> instead of +<code>camlzip</code>. <code>decompress</code> allows the users to provide their own buffer +instead of allocating dynamically. This allowed us to keep better +control over memory consumption. See below for more details on <code>decompress</code>.</p> +<p>We also used <a href="https://github.com/inhabitedtype/angstrom">angstrom</a> and +<a href="https://github.com/mirage/encore">encore</a> to provide a streaming interface +to encode and decode Git objects. The streaming API is currently hidden +from the end-user, but it helped us a lot in building abstractions and, again, in +managing the allocation policy of the library.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#complete-pack-file-support-including-gc" aria-label="complete pack file support including gc permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Complete PACK file support (including GC)</h2> +<p>In order to find the right abstraction for manipulating pack files in +a long-lived application, we experimented with +<a 
href="https://github.com/dinosaure/sirodepac">various</a> +<a href="https://github.com/dinosaure/carton">prototypes</a>. We haven't found the +right abstractions just yet, but we believe the PACK format could be useful +to store any kind of data in the future (not only Git objects).</p> +<p>We implemented <code>git gc</code> by following the same heuristics as +<a href="https://github.com/git/git/blob/master/Documentation/technical/pack-heuristics.txt">Git</a> +to compress pack files and +we produce something similar in size &mdash; <code>decompress</code> achieves a good +compression ratio &mdash; and we are using <code>duff</code>, our own implementation of <code>xdiff</code>, the +binary diff algorithm used by Git (more details on <code>duff</code> below). +We also had to re-implement the streaming algorithm to reconstruct <code>idx</code> files on +the fly, when receiving a pack file over the network.</p> +<p>One notable feature of our compression algorithms is that they work without +the assumption that the underlying system implements POSIX: hence, +they can work fully in-memory, in a browser using web storage or +inside a MirageOS unikernel with <a href="https://github.com/mirage/wodan">wodan</a>.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#production-ready-push-and-pull" aria-label="production ready push and pull permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Production-ready push and pull</h2> +<p>We re-implemented and abstracted the <a 
href="https://github.com/git/git/blob/master/Documentation/technical/http-protocol.txt">Git Smart protocol</a>, and used that +abstraction to make <code>git push</code> and <code>git pull</code> work over HTTP. By +default we provide a <a href="https://github.com/mirage/cohttp">cohttp</a> +implementation but users can use their own &mdash; for instance based on +<a href="https://github.com/inhabitedtype/httpaf">httpaf</a>. +As a proof-of-concept, the <a href="https://github.com/mirage/ocaml-git/pull/227">initial +pull-request</a> of <code>ocaml-git</code> 2.0 was +created using this new implementation; moreover, we wrote a +prototype of a Git client compiled with <code>js_of_ocaml</code>, which was able +to run <code>git pull</code> over HTTP inside a browser!</p> +<p>Finally, that implementation will allow MirageOS unikernels to synchronize their +internal state with external Git stores (hosted for instance on GitHub) +using push/pull mechanisms. We also expect to release a server-side implementation +of the smart HTTP protocol, so that the state of any unikernel can be inspected +via <code>git pull</code>. 
Stay tuned for more updates on that topic!</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#standalone-dependencies" aria-label="standalone dependencies permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Standalone Dependencies</h2> +<p>Below you can find the details of the new stable releases of libraries that are +used by <code>ocaml-git</code> 2.0.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#optint-and-checkseum" aria-label="optint and checkseum permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><code>optint</code> and <code>checkseum</code></h3> +<p>In some parts of <code>ocaml-git</code>, we need to compute a Cyclic +Redundancy Check value. It is a 32-bit integer value. 
<code>optint</code> provides +an abstraction of it but structurally uses an unboxed integer or a +boxed <code>int32</code> value depending on the target (32-bit or 64-bit architecture).</p> +<p><code>checkseum</code> relies on <code>optint</code> and provides 3 checksum implementations:</p> +<ul> +<li>Adler32 (used by the <code>zlib</code> format)</li> +<li>CRC32 (used by the <code>gzip</code> format and <code>git</code>)</li> +<li>CRC32-C (used by <code>wodan</code>)</li> +</ul> +<p><code>checkseum</code> uses the <em>linking trick</em>: this means that users of the +library program against an abstract API (only the <code>cmi</code> is provided); +at link-time, users have to select which implementation to use: +<code>checkseum.c</code> (the C implementation) or <code>checkseum.ocaml</code> (the OCaml +implementation). The process is currently a bit cumbersome but an upcoming +<code>dune</code> release will make that process much more transparent to the users.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#encore-angkor" aria-label="encore angkor permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><code>encore</code> (/<em>angkor</em>/)</h3> +<p>In <code>git</code>, we work with Git <em>objects</em> (<em>tree</em>, <em>blob</em> or +<em>commit</em>). These objects are encoded in a specific format. Then, +the hash of each object is computed from the encoded +result to get a unique identifier. 
For example, the hash of your last commit is: +<code>sha1(encode(commit))</code>.</p> +<p>A common operation in <code>git</code> is to decode Git objects from an encoded +representation of them (especially in <code>.git/objects/*</code> as a <em>loose</em> +file) and restore them in another part of your Git repository (like in a +PACK file or on the command-line).</p> +<p>Hence, we need to ensure that encoding is always deterministic, and +that decoding an encoded Git object is always the identity, i.e. there is +an <em>isomorphism</em> between the decoder and the encoder.</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">let</span> decoder <span class="token operator">&lt;.&gt;</span> encoder <span class="token punctuation">:</span> <span class="token keyword">value</span> <span class="token operator">-&gt;</span> <span class="token keyword">value</span> <span class="token operator">=</span> id +<span class="token keyword">let</span> encoder <span class="token operator">&lt;.&gt;</span> decoder <span class="token punctuation">:</span> string <span class="token operator">-&gt;</span> string <span class="token operator">=</span> id</code></pre></div> +<p><a href="https://github.com/mirage/encore">encore</a> is a library in which you +can describe a format (like the Git format) and from it, we can derive a +streaming decoder <strong>and</strong> encoder that are isomorphic by +construction.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#duff" aria-label="duff permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 
0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><code>duff</code></h3> +<p><a href="https://github.com/mirage/duff">duff</a> is a pure implementation in +OCaml of the <code>xdiff</code> algorithm. +Git has an optimized representation of your Git repository: the +PACK file. This format uses a binary diff algorithm called <code>xdiff</code> +to compress binary data. <code>xdiff</code> takes a source A and a target B and tries +to find common sub-strings between A and B.</p> +<p>This is done by a Rabin fingerprint of the source A applied to the +target B. The fingerprint can then be used to produce a lightweight +representation of B in terms of sub-strings of A.</p> +<p><code>duff</code> implements this algorithm (with Git's additional constraints +regarding the size of the sliding windows) in OCaml. It provides a +small binary <code>xduff</code> that complies with the format of Git without the <code>zlib</code> +layer.</p> +<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">$ xduff <span class="token function">diff</span> <span class="token builtin class-name">source</span> target <span class="token operator">&gt;</span> target.xduff +$ xduff patch <span class="token builtin class-name">source</span> <span class="token operator">&lt;</span> target.xduff <span class="token operator">&gt;</span> target.new +$ <span class="token function">diff</span> target target.new +$ <span class="token builtin class-name">echo</span> <span class="token variable">$?</span> +<span class="token number">0</span></code></pre></div> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#decompress" aria-label="decompress permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 
1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><code>decompress</code></h3> +<p><a href="https://github.com/mirage/decompress">decompress</a> +is a pure implementation in OCaml of <code>zlib</code> and +<code>rfc1951</code>. You can compress and decompress data flows and, obviously, +Git does this compression in <em>loose</em> files and PACK files.</p> +<p>It provides a non-blocking interface and is easily usable in a server +context. Indeed, the implementation never allocates and only relies on +what the user provides (<code>window</code>, input and output buffer). Then, the +distribution provides an easy example of how to use <code>decompress</code>:</p> +<div class="gatsby-highlight" data-language="ocaml"><pre class="language-ocaml"><code class="language-ocaml"><span class="token keyword">val</span> inflate<span class="token punctuation">:</span> <span class="token operator">?</span>level<span class="token punctuation">:</span>int <span class="token operator">-&gt;</span> string <span class="token operator">-&gt;</span> string +<span class="token keyword">val</span> deflate<span class="token punctuation">:</span> string <span class="token operator">-&gt;</span> string</code></pre></div> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#digestif" aria-label="digestif permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 
7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><code>digestif</code></h3> +<p><a href="https://github.com/mirage/digestif">digestif</a> is a toolbox providing +many implementations of hash algorithms such as:</p> +<ul> +<li>MD5</li> +<li>SHA1</li> +<li>SHA224</li> +<li>SHA256</li> +<li>SHA384</li> +<li>SHA512</li> +<li>BLAKE2B</li> +<li>BLAKE2S</li> +<li>RIPEMD160</li> +</ul> +<p>Like <code>checkseum</code>, <code>digestif</code> uses the linking trick too: from a +shared interface, it provides 2 implementations, in C (<code>digestif.c</code>) +and OCaml (<code>digestif.ocaml</code>).</p> +<p>Regarding Git, we use the SHA1 implementation and we are ready to +migrate <code>ocaml-git</code> to BLAKE2{B,S} as the Git core team expects - and, +in the OCaml world, it is just a <em>functor</em> application with +another implementation.</p> +<h3 style="position:relative;"><a href="https://tarides.com/feed.xml#eqaf" aria-label="eqaf permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a><code>eqaf</code></h3> +<p>Some applications require that secret values are compared in constant +time. Functions like <code>String.equal</code> do not have this property, so we +have decided to provide a small package &mdash; <a href="https://github.com/mirage/eqaf">eqaf</a> &mdash; +providing a <em>constant-time</em> <code>equal</code> function. 
+<code>digestif</code> uses it to check equality of hashes &mdash; it also exposes +<code>unsafe_compare</code> if you don't care about timing attacks in your application.</p> +<p>Of course, the biggest work on this package is not about the +implementation of the <code>equal</code> function but a way to check the +constant-time assumption on this function. Using this, we did a +<a href="https://github.com/mirage/eqaf/tree/master/test">benchmark</a> on Linux, +Windows and Mac to check it.</p> +<p>An interesting fact is that after various experiments, we replaced the +initial implementation in C (extracted from OpenBSD's <a href="https://man.openbsd.org/timingsafe_bcmp.3">timingsafe_memcmp</a>) with an OCaml +implementation behaving in a much more predictable way on all the +tested platforms.</p> +<h2 style="position:relative;"><a href="https://tarides.com/feed.xml#conclusion" aria-label="conclusion permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusion</h2> +<p>The upcoming version 2.0 of <a href="https://irmin.org">Irmin</a> is using ocaml-git +to create small applications that <a href="https://github.com/mirage/irmin/blob/master/examples/push.ml">push and pull their state +to GitHub</a>. +We think that Git offers a very nice model to persist data for distributed +applications and we hope that more people will use ocaml-git to experiment +and manipulate application data in Git. 
Please +<a href="https://github.com/mirage/ocaml-git/issues">send us</a> your feedback!</p>https://tarides.com/blog/2018-10-19-ocaml-git-2-0ocaml-git 2.02018-10-19T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We are proud to announce the release of OCamlFormat 0.8 (available on opam). To ease the transition from the previous 0.7 release here are some highlights of the new features of this release. The <a href="https://github.com/ocaml-ppx/ocamlformat/blob/v0.8/CHANGES.md#08-2018-10-09">full changelog</a> is available on the project repository.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#precedence-of-options" aria-label="precedence of options permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Precedence of options</h1> +<p>In the previous version you could override command line options with <code>.ocamlformat</code> files configuration. 0.8 fixed this so that the OCamlFormat configuration is first established by reading <code>.ocamlformat</code> and <code>.ocp-indent</code> files:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">margin = 77 +wrap-comments = true</code></pre></div> +<p>By default, these files in current and all ancestor directories for each input file are used, as well as the global configuration defined in <code>$XDG_CONFIG_HOME/ocamlformat</code>. 
The global <code>$XDG_CONFIG_HOME/ocamlformat</code> configuration has the lowest priority, then the closer the directory is to the processed file, the higher the priority. In each directory, both <code>.ocamlformat</code> and <code>.ocp-indent</code> files are read, with <code>.ocamlformat</code> files having the higher priority.</p> +<p>For now, support for <code>ocp-indent</code> options is very partial and is expected to be extended in the future.</p> +<p>Then the parameters can be overridden with the <code>OCAMLFORMAT</code> environment variable:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">OCAMLFORMAT=field-space=tight,type-decl=compact</code></pre></div> +<p>and finally the parameters can be overridden again with the command-line parameters.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#reading-input-from-stdin" aria-label="reading input from stdin permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Reading input from stdin</h1> +<p>It is now possible to read the input from stdin instead of OCaml files. The following command invokes OCamlFormat, which reads its input from the pipe:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">echo &quot;let f x = x + 1&quot; | ocamlformat --name a.ml -</code></pre></div> +<p>The <code>-</code> on the command line indicates that <code>ocamlformat</code> should read from stdin instead of expecting input files. 
It is then necessary to use the <code>--name</code> option to designate the input (<code>a.ml</code> here).</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#preset-profiles" aria-label="preset profiles permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Preset profiles</h1> +<p>Preset profiles allow you to have a consistent configuration without needing to tune every option.</p> +<p>Preset profiles set all options, overriding lower priority configuration. A preset profile can be set using the <code>--profile</code> (or <code>-p</code>) option. 
You can pass the option <code>--profile=&lt;name&gt;</code> on the command line or add <code>profile = &lt;name&gt;</code> in an <code>.ocamlformat</code> configuration file.</p> +<p>The available profiles are:</p> +<ul> +<li><code>default</code> sets each option to its default value</li> +<li><code>compact</code> sets options for a generally compact code style</li> +<li><code>sparse</code> sets options for a generally sparse code style</li> +<li><code>janestreet</code> is the profile used at JaneStreet</li> +</ul> +<p>To get a better feel of it, here is the formatting of the <a href="https://github.com/ocaml/ocaml/blob/trunk/typing/env.ml#L227-L234"><code>mk_callback</code></a> function from the OCaml compiler with the <code>compact</code> profile:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let mk_callback rest name desc = function + | None -&gt; nothing + | Some f -&gt; ( + fun () -&gt; + match rest with + | [] -&gt; f name None + | (hidden, _) :: _ -&gt; f name (Some (desc, hidden)) )</code></pre></div> +<p>then the same function formatted with the <code>sparse</code> profile:</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">let mk_callback rest name desc = function + | None -&gt; + nothing + | Some f -&gt; + fun () -&gt; + ( match rest with + | [] -&gt; + f name None + | (hidden, _) :: _ -&gt; + f name (Some (desc, hidden)) )</code></pre></div> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#project-root" aria-label="project root permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 
0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Project root</h1> +<p>The project root of an input file is taken to be the nearest ancestor directory that contains a <code>.git</code> or <code>.hg</code> or <code>dune-project</code> file. +If the new option <code>--disable-outside-detected-project</code> is set, <code>.ocamlformat</code> configuration files are not read outside of the current project. If no configuration file is found, formatting is disabled.</p> +<p>A new option, <code>--root</code>, allows you to specify the root directory for a project. If specified, OCamlFormat only takes into account <code>.ocamlformat</code> configuration files inside the root directory and its subdirectories.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#credits" aria-label="credits permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Credits</h1> +<p>This release also contains many other changes and bug fixes that we cannot detail here. 
Check out the <a href="https://github.com/ocaml-ppx/ocamlformat/blob/v0.8/CHANGES.md#08-2018-10-09">full changelog</a>.</p> +<p>Special thanks to our maintainers and contributors for this release: David Allsopp, Josh Berdine, Hugo Heuzard, Brandon Kase, Anil Madhavapeddy and Guillaume Petiot.</p>https://tarides.com/blog/2018-10-17-ocamlformat-0-8OCamlFormat 0.82018-10-17T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>The OCaml Users and Developers Workshop brings together industrial +users of OCaml with academics and hackers who are working on extending +the language, type system and tools. OCaml 2018 was held on September +27th, 2018 in St. Louis, Missouri, USA, colocated with ICFP 2018.</p> +<p><strong>Check Tarides' talks: <a href="https://docs.google.com/presentation/d/e/2PACX-1vRnRiGeBWC6ctpSge0gTFuxprNTiS2qtNpvax_A8pD6Ob5ySfL9_SlPKCIoLDCbmsYjTAkMFnlUwqSl/pub?start=false&amp;loop=false&amp;delayms=3000&amp;slide=id.p1">RFCs, all the way down!</a> and <a href="https://speakerdeck.com/avsm/the-ocaml-platform-1-dot-0-2018">The OCaml Platform 1.0</a>. +</strong></p>https://icfp18.sigplan.org/track/ocaml-2018-papers#programOCaml Workshop 20182018-09-27T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>After a tiny but important patch release as 1.1.1, the dune team is thrilled to +announce the release of dune 1.2.0! Here are some highlights of the new +features in that version. 
The full list of changes can be found <a href="https://github.com/ocaml/dune/blob/e3af33b43a87d7fa2d15f7b41d8bd942302742ec/CHANGES.md#120-14092018">in the dune +repository</a>.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#watch-mode" aria-label="watch mode permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Watch mode</h1> +<p>When developing, it is common to edit a file, run a build, read the error +message, and fix the error. Since this is a very tight loop and developers are +doing this hundreds or thousands of times a day, it is crucial to have the +quickest feedback possible.</p> +<p><code>dune build</code> and <code>dune runtest</code> now accept <a href="https://dune.readthedocs.io/en/latest/usage.html#watch-mode">a <code>-w</code> +flag</a> that will +watch the filesystem for changes, and trigger a new build.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#better-error-messages" aria-label="better error messages permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Better error messages</h1> +<p>Inspired 
by the great work done in +<a href="http://elm-lang.org/blog/compiler-errors-for-humans">Elm</a> and +<a href="https://reasonml.github.io/blog/2017/08/25/way-nicer-error-messages.html">bucklescript</a>, +dune now displays the relevant file in error messages.</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text"> % cat dune +(executable + (name my_program) + (librarys cmdliner) +) + % dune build +File &quot;dune&quot;, line 3, characters 2-10: +3 | (librarys cmdliner) + ^^^^^^^^ +Error: Unknown field librarys +Hint: did you mean libraries?</code></pre></div> +<p>Many messages have also been improved, in particular to help users <a href="https://dune.readthedocs.io/en/latest/migration.html#check-list">switching +from the <code>jbuild</code> format to the <code>dune</code> +format</a>.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#dune-unstable-fmt" aria-label="dune unstable fmt permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>dune unstable-fmt</h1> +<p>Are you confused about how to format S-expressions? You are not alone. +That is why we are gradually introducing a formatter for <code>dune</code> files. 
It can +transform a valid but ugly <code>dune</code> into one that is consistently formatted.</p> +<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text"> % cat dune +(executable( name ls) (libraries cmdliner) +(preprocess (pps ppx_deriving.std))) + % dune unstable-fmt dune +(executable + (name ls) + (libraries cmdliner) + (preprocess + (pps ppx_deriving.std) + ) +)</code></pre></div> +<p>This feature is not ready yet for the end user (hence the <code>unstable</code> part), +and in particular the concrete syntax is not stable yet. +But having it already in the code base will make it possible to build useful +integrations with <code>dune</code> itself (to automatically reformat all dune files in a +project, for example) and common editors, so that they format <code>dune</code> files on +save.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#first-class-support-of-findlib-plugins" aria-label="first class support of findlib plugins permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>First class support of findlib plugins</h1> +<p>It is now easy to support findlib plugins by just adding the <code>findlib.dynload</code> +library dependency. Then you can use <code>Fl_dynload</code> module in your code which +will automatically do the right thing. 
<a href="https://dune.readthedocs.io/en/latest/advanced-topics.html#dynamic-loading-of-packages">A complete example can be found in the +dune manual</a>.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#promote-only-certain-files" aria-label="promote only certain files permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Promote only certain files</h1> +<p>The <code>dune promote</code> command now accepts a list of files. This is useful to +promote just the file that is open in a text editor, for example. Some Emacs +bindings are provided to do this, which works particularly well with +<a href="https://dune.readthedocs.io/en/latest/tests.html#inline-expectation-tests">inline expectation tests</a>.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#deprecation-message-for-wrapped-modes" aria-label="deprecation message for wrapped modes permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Deprecation message for (wrapped) modes</h1> +<p>By default, libraries are <code>(wrapped true)</code>, which means that they 
expose a +single OCaml module (source files are exposed as submodules of this main +module). This is usually desired as it makes link-time name collisions less +likely. However, a lot of libraries are using <code>(wrapped false)</code> (expose all +source files as modules) to keep compatibility.</p> +<p>It can be challenging to transition from <code>(wrapped false)</code> to <code>(wrapped true)</code> +because it breaks compatibility. That is why we have added <code>(wrapped (transition &quot;message&quot;))</code>, which will generate wrapped modules but keep unwrapped +modules with a deprecation message to help coordinate the change.</p> +<h1 style="position:relative;"><a href="https://tarides.com/feed.xml#credits" aria-label="credits permalink" class="anchor before"><svg aria-hidden="true" focusable="false" height="16" version="1.1" viewbox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Credits</h1> +<p>Special thanks to our contributors for this release: @aantron, @anuragsoni, +@bobot, @ddickstein, @dra27, @drjdn, @hongchangwu, @khady, @kodek16, +@prometheansacrifice and @ryyppy.</p>https://tarides.com/blog/2018-09-06-dune-1-2-0Dune 1.2.02018-09-06T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We are thrilled to have been accepted into the Founders Program's 3rd +batch at <a href="https://stationf.co/">Station F</a>! Station F is +&quot;the only startup campus gathering a whole entrepreneurial ecosystem +under one roof&quot; and is providing 3000+ desks and 26 international +startup programs. 
Our Paris offices are now located in that incredible +place, close to &quot;m&eacute;tro Chevaleret&quot; (Paris 13).</p> +<p><strong>If you are in Paris, drop us an email to visit +<a href="https://stationf.co/campus/">our beautiful campus</a>!</strong></p>https://stationf.co/Station F2018-07-17T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Thomas Gazagnaire gave a presentation on MirageOS to the +<a href="https://www.meetup.com/ocaml-paris/">OCaml meetup in Paris</a>.</p> +<p><strong>Check the <a href="http://gazagnaire.org/pub/2018.05.OUPS.pdf">slides</a> +for more details.</strong></p>https://www.meetup.com/ocaml-paris/events/250836568/OCaml Users in Paris (OUPS)2018-05-23T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>We are excited to announce that the <a href="https://tezos.foundation">Tezos Foundation</a> +will trust Tarides to package Tezos nodes as MirageOS unikernels, which will help participants +establish nodes on the Tezos network in a more efficient and secure manner.</p>https://tezos.foundation/MirageOS + Tezos funding2018-05-23T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Zach Shipko is working on improving the UI/UX for Irmin. +He is looking for <a href="https://discuss.ocaml.org/t/irmin-usability-enhancements/2017">feedback</a> +to make Irmin more accessible to potential users and clean up the rough edges for existing users.</p>https://discuss.ocaml.org/t/irmin-usability-enhancements/2017Irmin usability enhancements2018-05-18T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Thomas Gazagnaire gave an invited lecture at +<a href="http://www.di.ens.fr/">the computer science department of ENS</a>, +in Paris. 
This was part of the system and network L3 course.</p> +<p><strong>Check the <a href="http://gazagnaire.org/ens/mirage.pdf">slides</a> (in English) +and the <a href="http://gazagnaire.org/ens/mirage.tar.gz">exercises</a> (in French).</strong></p>http://www.di.ens.fr/~pouzet/cours/systeme/Invited lecture at ENS2018-05-17T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Anil Madhavapeddy and Gemma Gordon presented our new operating system for +connected buildings, <a href="http://kcsrk.info/papers/osmose_feb_18.pdf">OSMOSE</a>, +at <a href="http://hotpost18.weebly.com/">HotPOST&rsquo;18</a>. OSMOSE is based on +MirageOS and Irmin and we hope to explore that area more in the coming months!</p>http://infocom2018.ieee-infocom.org/content/workshop-hotpost-hot-topics-pervasive-mobile-and-online-social-networking-programHotPOST'182018-04-16T00:00:00-00:00tarideshttps://tarides.com/feed.xmltarides<p>Position paper on +<a href="http://kcsrk.info/papers/osmose_feb_18.pdf">&ldquo;An Architecture for Interspatial Communication&rdquo;</a> +accepted to <a href="http://hotpost18.weebly.com/">HotPOST&rsquo;18</a>.</p>http://kcsrk.info/papers/osmose_feb_18.pdfAn Architecture for Interspatial Communication2018-02-14T00:00:00-00:00tarides \ No newline at end of file