Hi,
I'm the author of Vale, a natural language linter that has been used to find a number of minor issues (e.g., #921 and #1418) in your documentation.
I'd like to discuss taking this a bit further by incorporating some content-oriented tests into your CI suite. I know #1418 mentions that you never formally adopted Vale due to false positives, but I still believe that—with the right tuning—automated testing can be a powerful tool.
Overview
I've run an automated analysis on your docs directory that looks for common typos/mistakes. A summary of this analysis is given below (a more detailed report is available in the accompanying Gist):
| Spelling | Grammar | Style |
|---|---|---|
| 160 | 11 | 91 |
I think many of these issues (in particular, spelling and style) could be caught automatically before anyone manually reviews the content.
Implementation
By default, Vale runs a lot of tests that are somewhat context- and opinion-dependent. These are mostly intended to be examples of what it's capable of doing—not rules that I think are generally applicable to writing. And, since Linode seems to have a good editing process already in place, I think we should keep the configuration simple: spelling and capitalization.
Handling non-prose sections
Vale takes a multistage approach to this problem:
1. In Markdown, code blocks (both fenced and indented), code spans, and front matter are ignored by default.
2. You may specify regex-based `IgnorePatterns`, which represent non-standard sections to be ignored. This was designed with template engines in mind, but it's also a good fit for shortcodes. From what I can tell, `{{< file >}}`, `{{< file-excerpt >}}`, `{{< output >}}`, and `{{< highlight ... >}}` should be ignored.
3. You may specify a list of HTML tags to ignore. You typically don't need to do this since (1) handles most cases, but it seems like Linode uses `<strong>` similarly to `<code>` in many cases.
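
To make the regex-based step more concrete, here is a rough Python sketch of the kind of pattern that would cover those shortcodes. It is not Vale's implementation or configuration syntax. The pattern, the `strip_shortcodes` helper, and the assumption that each shortcode is closed with a matching `{{< /name >}}` tag are mine, for illustration only.

```python
import re

# Hypothetical pattern (not taken from Vale): matches a paired shortcode
# such as {{< file "path" >}} ... {{< /file >}}, including its body.
SHORTCODE_BLOCK = re.compile(
    r"\{\{<\s*(file-excerpt|file|output|highlight)\b[^>]*>\}\}"  # opening tag
    r".*?"                                                        # body, non-greedy
    r"\{\{<\s*/\s*\1\s*>\}\}",                                    # matching closing tag
    re.DOTALL,
)


def strip_shortcodes(text):
    """Remove shortcode blocks so that only prose is left for linting."""
    return SHORTCODE_BLOCK.sub("", text)
```

Listing `file-excerpt` before `file` in the alternation keeps the longer name from being matched as a `file` shortcode with trailing text.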
Spelling
It's surprisingly difficult to check spelling without having to sift through hundreds of false positives, even when using standard spelling dictionaries (such as Hunspell, which Vale supports). The most common solution to this problem is to create a "personal" vocabulary of terms (e.g., the one the Rust team made for their book).
For starters, I've generated a vocabulary for Linode with 1,459 initial terms. After a misspelling is reported, the workflow would be to either add the term to the known list or fix the mistake.
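
The workflow boils down to something like the following Python sketch. This is only an illustration of the idea, not how Vale or Hunspell work internally; the `unknown_words` function and the tiny word sets in the usage note are hypothetical.

```python
import re
from typing import List, Set


def unknown_words(text: str, dictionary: Set[str], vocabulary: Set[str]) -> List[str]:
    """Return words found in neither the base dictionary nor the project vocabulary."""
    # Compare case-insensitively so "Install" matches "install".
    known = {w.lower() for w in dictionary | vocabulary}
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    return sorted({w for w in words if w.lower() not in known})
```

Given a dictionary containing `install`, `on`, `your`, and `server`, and a vocabulary containing `nginx` and `Linode`, the text "Install nginx on your Linode servr" would report only `servr`. At that point you'd either fix the typo or, for a legitimate term, add it to the vocabulary.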
Capitalization
This is pretty straightforward: as you encounter incorrectly capitalized terms, you'd add them to a rule extending the Substitution check. I've started by making a rule for the issues outlined in the linked Gist, but there are likely some that were not reported since I just looked for a few of the most common ones.
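
Conceptually, such a rule is a table of incorrect forms mapped to their preferred forms. The Python sketch below shows the idea; it is not Vale's rule format, and the `SWAPS` entries and `find_miscapitalized` function are made-up examples rather than terms from the Gist.

```python
import re

# Hypothetical swap table: incorrect form -> preferred form.
SWAPS = {
    "Github": "GitHub",
    "javascript": "JavaScript",
    "Mysql": "MySQL",
}


def find_miscapitalized(text):
    """Return (line_number, found, preferred) for each incorrectly capitalized term."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for wrong, right in SWAPS.items():
            # Case-sensitive, whole-word match so correct forms are not flagged.
            for match in re.finditer(rf"\b{re.escape(wrong)}\b", line):
                findings.append((lineno, match.group(0), right))
    return findings
```

Each newly discovered term is one more entry in the table, so the rule grows with the docs instead of needing to be complete up front.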
Performance
The performance hit on your builds should be negligible. It should take Vale less than 10 seconds to spell check and lint the entirety of your docs directory.
If this seems like something you'd be interested in exploring, I'll put together a PR implementing the ideas above.
Thoughts?