
Clarify attribute properties added to objects in rel-urls. #32

@Zegnat


Current spec text:

  • add keys to the hash of the key with name url for each of these attributes (if present) and key not already set:
    • "hreflang": the value of the "hreflang" attribute
    • "media": the value of the "media" attribute
    • "title": the value of the "title" attribute
    • "type": the value of the "type" attribute
    • "text": the text content of the element if any

I think this should clarify 1) what to do with empty attribute values, and 2) what exactly “if any” means.

  1. Currently, parsers check whether an attribute exists but never check its value. Thus an empty attribute (e.g. hreflang="") leads to an empty string being added to the object in rel-urls. This means the objects in rel-urls always mirror the authored HTML.

    The drawback is that a link parsed later in the source document can no longer overwrite the empty value, because the key is already present, even if that later link adds information. Example:

    <a href="#a" rel="a" hreflang=""></a>
    <a href="#a" rel="a" hreflang="en"></a>

    This leads to the following (edited to show only the affected property):

    {
      "rel-urls": {
        "#a": {
          "hreflang": ""
        }
      }
    }

    Clearly the empty string adds no information about the URL #a, but because it came first in source order, it is the value we keep.

  2. Currently, the text property should only be added if there is “any” text content. But the spec never defines what text content means or when there is none.

    This has already led to a difference between implementations: the Python parser adds an empty text property, while the PHP parser does not. Example:

    <a href="#a" rel="a"></a>

    In PHP:

    {
      "rel-urls": {
        "#a": {
          "rels": [ "a" ]
        }
      }
    }

    In Python:

    {
      "rel-urls": {
        "#a": {
          "rels": [ "a" ],
          "text": ""
        }
      }
    }
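A minimal Python sketch of the current rule, as described above, reproduces both outputs. It is not code from either parser: the class `CurrentRelUrls`, the `parse()` helper and the `add_empty_text` flag are hypothetical. To keep it short, only `<a>` elements are handled and `href` values are used as-is instead of being resolved to absolute URLs.

```python
from html.parser import HTMLParser

ATTRS = ("hreflang", "media", "title", "type")


class CurrentRelUrls(HTMLParser):
    """Hypothetical sketch of the current rule: copy an attribute whenever it
    exists, even if its value is empty, and never overwrite a key already set.
    Only <a> elements are handled; href is not resolved to an absolute URL."""

    def __init__(self, add_empty_text):
        super().__init__()
        # add_empty_text=True mimics the Python parser, False the PHP parser.
        self.add_empty_text = add_empty_text
        self.rel_urls = {}
        self._url = None   # href of the <a> currently open, if any
        self._text = []    # text fragments collected inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "a" or not attrs.get("rel") or attrs.get("href") is None:
            return
        entry = self.rel_urls.setdefault(attrs["href"], {"rels": []})
        for rel in attrs["rel"].split():
            if rel not in entry["rels"]:
                entry["rels"].append(rel)
        for name in ATTRS:
            # Existence check only: hreflang="" is stored as "".
            if name in attrs and name not in entry:
                entry[name] = attrs[name] or ""
        self._url, self._text = attrs["href"], []

    def handle_data(self, data):
        if self._url is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._url is None:
            return
        entry = self.rel_urls[self._url]
        text = "".join(self._text)
        # The two implementations disagree on whether "" counts as "any" text.
        if "text" not in entry and (text or self.add_empty_text):
            entry["text"] = text
        self._url = None


def parse(html, add_empty_text):
    parser = CurrentRelUrls(add_empty_text)
    parser.feed(html)
    parser.close()
    return parser.rel_urls
```

With the first example, `parse('<a href="#a" rel="a" hreflang=""></a><a href="#a" rel="a" hreflang="en"></a>', add_empty_text=False)` returns `{'#a': {'rels': ['a'], 'hreflang': ''}}`, so the later `"en"` is lost. With the second example, `add_empty_text=True` yields `{'#a': {'rels': ['a'], 'text': ''}}` and `add_empty_text=False` yields `{'#a': {'rels': ['a']}}`.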

I believe that, in both cases, storing an empty string for any property is useless and adds no data. Worse, because the parser never overwrites a key once it is set, we may even lose information. Therefore I propose rewriting the spec as follows:

  • add the following keys to the hash of the key with name url, unless the key is already set:
    • hreflang: the value of the element’s hreflang attribute, if defined and not an empty string
    • media: the value of the element’s media attribute, if defined and not an empty string
    • title: the value of the element’s title attribute, if defined and not an empty string
    • type: the value of the element’s type attribute, if defined and not an empty string
    • text: the textContent of the element, if not an empty string
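A minimal Python sketch of the proposed wording, under the same assumptions as a rough illustration (only `<a>` elements, `href` used as-is, and the names `ProposedRelUrls` and `parse()` are hypothetical), shows that skipping empty values lets a later link fill in a key that an earlier, empty one would otherwise have claimed.

```python
from html.parser import HTMLParser

ATTRS = ("hreflang", "media", "title", "type")


class ProposedRelUrls(HTMLParser):
    """Hypothetical sketch of the proposed rule: store a value only if it is
    defined and not an empty string, and never overwrite a key already set.
    Only <a> elements are handled; href is not resolved to an absolute URL."""

    def __init__(self):
        super().__init__()
        self.rel_urls = {}
        self._url = None   # href of the <a> currently open, if any
        self._text = []    # text fragments collected inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "a" or not attrs.get("rel") or attrs.get("href") is None:
            return
        entry = self.rel_urls.setdefault(attrs["href"], {"rels": []})
        for rel in attrs["rel"].split():
            if rel not in entry["rels"]:
                entry["rels"].append(rel)
        for name in ATTRS:
            value = attrs.get(name)
            # "if defined and not an empty string": None and "" are skipped.
            if value and name not in entry:
                entry[name] = value
        self._url, self._text = attrs["href"], []

    def handle_data(self, data):
        if self._url is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._url is None:
            return
        entry = self.rel_urls[self._url]
        text = "".join(self._text)
        # "the textContent of the element, if not an empty string"
        if text and "text" not in entry:
            entry["text"] = text
        self._url = None


def parse(html):
    parser = ProposedRelUrls()
    parser.feed(html)
    parser.close()
    return parser.rel_urls
```

Here `parse('<a href="#a" rel="a" hreflang=""></a><a href="#a" rel="a" hreflang="en"></a>')` returns `{'#a': {'rels': ['a'], 'hreflang': 'en'}}`, and `parse('<a href="#a" rel="a"></a>')` returns `{'#a': {'rels': ['a']}}` regardless of implementation. Read literally, whitespace-only text is not an empty string and would still be stored; the sketch follows that literal reading.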
