This file -
https://github.com/html5lib/html5lib-tests/blob/master/validator/langattribute.test
contains some 1394 invalid lang attribute values, from
<span lang=roh>
to
<span lang='en '>
At present Tidy makes no test of the lang value, except for a blank value like -
<span lang=' '>
where it will report -
line 1 column 1 - Warning: attribute "lang" lacks value
At present Tidy has NO TABLE of valid lang values, thus no check is made of the value given.
A simple sample case
<!DOCTYPE html>
<html>
<head>
<title>invalid lang code</title>
<meta charset="utf-8">
</head>
<body>
<p><span lang="roh">Invalid 'roh'</span></p>
</body>
</html>
Tidy will pass this with -
No warnings or errors were found.
While the W3C validator will show an error:
Line 8, Column 20: Bad value roh for attribute lang on element span: The language subtag roh is not a valid ISO language part of a language tag.
And show additional information like:
Syntax of language tag:
An RFC 5646[1] language tag consists of hyphen-separated ASCII-alphanumeric subtags. There is a primary tag identifying a natural language by its shortest ISO 639 language code (e.g. en for English) and zero or more additional subtags adding precision. The most common additional subtag type is a region subtag which most commonly is a two-letter ISO 3166 country code (e.g. GB for the United Kingdom). IANA maintains a registry of permissible subtags[2].
A future Tidy should also perform this test, using a list of valid subtags built from the IANA language-subtag-registry file[2].
[1] https://tools.ietf.org/html/rfc5646
[2] http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
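As a rough illustration of the proposed check (not Tidy code, and written in Python rather than Tidy's C for brevity), the sketch below parses text in the registry's record format, where records are separated by `%%` lines, and tests only the primary language subtag of a lang value. The names `load_language_subtags` and `check_lang`, the warning wording, and the abbreviated inline `SAMPLE_REGISTRY` excerpt are all assumptions made for this example; a real implementation would read the full registry file[2] and would also need to handle script, region, variant, private-use and grandfathered tags.

```python
import re

# Abbreviated, hand-written excerpt in the registry's record format.
# Assumption: a real check would load the full language-subtag-registry file.
SAMPLE_REGISTRY = """\
Type: language
Subtag: en
Description: English
%%
Type: language
Subtag: rm
Description: Romansh
%%
Type: region
Subtag: GB
Description: United Kingdom
"""


def load_language_subtags(registry_text):
    """Collect the Subtag value of every record whose Type is 'language'."""
    subtags = set()
    for record in registry_text.split("%%"):
        fields = {}
        for line in record.splitlines():
            # Skip blank lines and folded continuation lines (leading space).
            if not line or line[0].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if fields.get("Type") == "language" and "Subtag" in fields:
            subtags.add(fields["Subtag"].lower())
    return subtags


def check_lang(value, language_subtags):
    """Return None if the value looks acceptable, else a warning string.

    Hypothetical helper: checks the overall shape (hyphen-separated ASCII
    alphanumerics) and then only the primary language subtag.
    """
    if not re.fullmatch(r"[A-Za-z0-9]+(-[A-Za-z0-9]+)*", value):
        return f'malformed language tag "{value}"'
    primary = value.split("-")[0].lower()
    if primary not in language_subtags:
        return f'unknown language subtag "{primary}"'
    return None


if __name__ == "__main__":
    known = load_language_subtags(SAMPLE_REGISTRY)
    for candidate in ("en", "en-GB", "roh", "en "):
        print(repr(candidate), "->", check_lang(candidate, known))
```

With this excerpt, `roh` is rejected because only the two-letter `rm` is listed for Romansh, and `'en '` is rejected because of the trailing space, which matches the two ends of the test file's range mentioned above.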