
Adding content-focused tests to Travis CI #1463

@jdkato

Description

Hi,

I'm the author of Vale, a natural language linter that has been used to find a number of minor issues (e.g., #921 and #1418) in your documentation.

I'd like to discuss taking this a bit further by incorporating some content-oriented tests into your CI suite. I know #1418 mentions that you never formally adopted Vale due to false positives, but I still believe that—with the right tuning—automated testing can be a powerful tool.

Overview

I've run an automated analysis on your docs directory that looks for common typos/mistakes. A summary of this analysis is given below (a more detailed report is available in the accompanying Gist):

Mistakes found in 945 Markdown files:

| Spelling | Grammar | Style |
| -------- | ------- | ----- |
| 160      | 11      | 91    |

I think many of these issues (particularly the spelling and style ones) could be caught automatically, before a manual review of the content.

Implementation

By default, Vale runs a lot of tests that are somewhat context- and opinion-dependent. These are mostly intended to be examples of what it's capable of doing—not rules that I think are generally applicable to writing. And, since Linode seems to have a good editing process already in place, I think we should keep the configuration simple: spelling and capitalization.

Handling non-prose sections

Vale takes a multistage approach to this problem:

  1. In Markdown, code blocks (both fenced and indented), code spans, and front matter are ignored by default.
  2. You may specify regex-based IgnorePatterns, which represent non-standard sections to be ignored. This was designed with template engines in mind, but it's also a good fit for shortcodes. From what I can tell, {{< file >}}, {{< file-excerpt >}}, {{< output >}}, and {{< highlight ... >}} should be ignored.
  3. You may specify a list of HTML tags to ignore. You typically don't need to do this since (1) handles most cases, but it seems like Linode uses <strong> similarly to <code> in many cases.
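Putting the three stages above together, the configuration could look something like the sketch below. The style name, paths, and exact shortcode regexes are illustrative assumptions, not a final proposal (recent Vale releases spell the regex-based option `BlockIgnores`/`TokenIgnores`):

```ini
# Sketch of a possible .vale.ini — paths and patterns are assumptions.
StylesPath = ci/vale/styles
MinAlertLevel = error

[*.md]
BasedOnStyles = Linode
# Ignore the Hugo shortcodes mentioned above; fenced/indented code blocks,
# code spans, and front matter are already skipped by default.
BlockIgnores = (?s) *({{< file[^>]* >}}.*?{{< ?/ ?file >}}), (?s) *({{< output >}}.*?{{< ?/ ?output >}})
# Treat <strong> like <code>, since it often wraps literal strings here.
IgnoredScopes = code, tt, strong
```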

Spelling

It's surprisingly difficult to check spelling without having to sift through hundreds of false positives, even when using standard spelling dictionaries (such as Hunspell, which Vale supports). The most common solution to this problem is to create a "personal" vocabulary of terms (e.g., the one the Rust team made for their book).

For starters, I've generated a vocabulary for Linode with 1,459 initial terms. After a misspelling is reported, the workflow would be to either add the term to the known list or fix the mistake.
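Concretely, the check could be a rule extending Vale's `spelling` extension point, pointed at the generated vocabulary. The rule and file names here are assumptions for illustration:

```yaml
# Sketch of a possible Linode/Spelling.yml rule.
extends: spelling
message: "Did you really mean '%s'?"
level: error
ignore:
  - vocab.txt
```

`vocab.txt` would be a plain-text file with one accepted term per line (e.g., `Linode`, `NodeBalancer`), so adding a new term is a one-line diff.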

Capitalization

This is pretty straightforward: as you encounter incorrectly capitalized terms, you'd add them to a rule extending the Substitution check. I've started by making a rule for the issues outlined in the linked Gist, but there are likely some that were not reported since I just looked for a few of the most common ones.
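As a sketch, such a rule maps each incorrect form to its preferred capitalization. The swaps shown here are illustrative examples, not the full list from the Gist:

```yaml
# Sketch of a possible Linode/Terms.yml rule extending the Substitution check.
extends: substitution
message: "Use '%s' instead of '%s'."
level: error
ignorecase: false
# Left side is the pattern to flag; right side is the preferred form.
swap:
  github: GitHub
  javascript: JavaScript
  linux: Linux
```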

Performance

The performance hit on your builds should be negligible. It should take Vale less than 10 seconds to spell check and lint the entirety of your docs directory.
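For reference, wiring this into the existing Travis build could be as small as the following. The release URL is a placeholder (Vale ships single static binaries on GitHub releases), and pinning an exact version keeps CI results reproducible:

```yaml
# Hypothetical .travis.yml additions — vX.Y.Z is a placeholder version.
install:
  - curl -sL "https://github.com/errata-ai/vale/releases/download/vX.Y.Z/vale_X.Y.Z_Linux_64-bit.tar.gz" | tar -xzf - vale
script:
  - ./vale docs/
```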


If this seems like something you'd be interested in exploring, I'll put together a PR implementing the ideas above.

Thoughts?
