
Adding content-focused tests to Travis CI #1463

@jdkato

Description

Hi,

I'm the author of Vale, a natural language linter that has been used to find a number of minor issues (e.g., #921 and #1418) in your documentation.

I'd like to discuss taking this a bit further by incorporating some content-oriented tests into your CI suite. I know #1418 mentions that you never formally adopted Vale due to false positives, but I still believe that—with the right tuning—automated testing can be a powerful tool.

Overview

I've run an automated analysis on your docs directory that looks for common typos/mistakes. A summary of this analysis is given below (a more detailed report is available in the accompanying Gist):

Mistakes found in 945 Markdown files:

| Spelling | Grammar | Style |
| -------- | ------- | ----- |
| 160      | 11      | 91    |

I think many of these issues (particularly the spelling and style ones) could be caught automatically, before a manual review of the content.

Implementation

By default, Vale runs a lot of tests that are somewhat context- and opinion-dependent. These are mostly intended to be examples of what it's capable of doing—not rules that I think are generally applicable to writing. And, since Linode seems to have a good editing process already in place, I think we should keep the configuration simple: spelling and capitalization.

Handling non-prose sections

Vale takes a multistage approach to this problem:

  1. In Markdown, code blocks (both fenced and indented), code spans, and front matter are ignored by default.
  2. You may specify regex-based IgnorePatterns, which represent non-standard sections to be ignored. This was designed with template engines in mind, but it's also a good fit for shortcodes. From what I can tell, {{< file >}}, {{< file-excerpt >}}, {{< output >}}, and {{< highlight ... >}} should be ignored.
  3. You may specify a list of HTML tags to ignore. You typically don't need to do this since (1) handles most cases, but it seems like Linode uses <strong> similarly to <code> in many cases.
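Putting the three stages above together, the configuration could look something like the sketch below. The style name, paths, and exact shortcode regexes are illustrative assumptions, not a final proposal (recent Vale releases spell the regex-based option `BlockIgnores`/`TokenIgnores`):

```ini
# Sketch of a possible .vale.ini — paths and patterns are assumptions.
StylesPath = ci/vale/styles
MinAlertLevel = error

[*.md]
BasedOnStyles = Linode
# Ignore the Hugo shortcodes mentioned above; fenced/indented code blocks,
# code spans, and front matter are already skipped by default.
BlockIgnores = (?s) *({{< file[^>]* >}}.*?{{< ?/ ?file >}}), (?s) *({{< output >}}.*?{{< ?/ ?output >}})
# Treat <strong> like <code>, since it often wraps literal strings here.
IgnoredScopes = code, tt, strong
```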

Spelling

It's surprisingly difficult to check spelling without having to sift through hundreds of false positives, even when using standard spelling dictionaries (such as Hunspell, which Vale supports). The most common solution to this problem is to create a "personal" vocabulary of terms (e.g., the one the Rust team made for their book).

For starters, I've generated a vocabulary for Linode with 1,459 initial terms. After a misspelling is reported, the workflow would be to either add the term to the known list or fix the mistake.
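Concretely, the check could be a rule extending Vale's `spelling` extension point, pointed at the generated vocabulary. The rule and file names here are assumptions for illustration:

```yaml
# Sketch of a possible Linode/Spelling.yml rule.
extends: spelling
message: "Did you really mean '%s'?"
level: error
ignore:
  - vocab.txt
```

`vocab.txt` would be a plain-text file with one accepted term per line (e.g., `Linode`, `NodeBalancer`), so adding a new term is a one-line diff.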

Capitalization

This is pretty straightforward: as you encounter incorrectly capitalized terms, you'd add them to a rule extending the Substitution check. I've started by making a rule for the issues outlined in the linked Gist, but there are likely some that were not reported since I just looked for a few of the most common ones.
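As a sketch, such a rule maps each incorrect form to its preferred capitalization. The swaps shown here are illustrative examples, not the full list from the Gist:

```yaml
# Sketch of a possible Linode/Terms.yml rule extending the Substitution check.
extends: substitution
message: "Use '%s' instead of '%s'."
level: error
ignorecase: false
# Left side is the pattern to flag; right side is the preferred form.
swap:
  github: GitHub
  javascript: JavaScript
  linux: Linux
```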

Performance

The performance hit on your builds should be negligible. It should take Vale less than 10 seconds to spell check and lint the entirety of your docs directory.
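For reference, wiring this into the existing Travis build could be as small as the following. The release URL is a placeholder (Vale ships single static binaries on GitHub releases), and pinning an exact version keeps CI results reproducible:

```yaml
# Hypothetical .travis.yml additions — vX.Y.Z is a placeholder version.
install:
  - curl -sL "https://github.com/errata-ai/vale/releases/download/vX.Y.Z/vale_X.Y.Z_Linux_64-bit.tar.gz" | tar -xzf - vale
script:
  - ./vale docs/
```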


If this seems like something you'd be interested in exploring, I'll put together a PR implementing the ideas above.

Thoughts?
