"Classic" <meta> elements

Many extensions have been made to the <meta> elements in HTML pages. Not only have people been adding new values for name, but various attributes were also added, like property. Let's call the official short list of possibilities "classic".

The classic <meta> elements come in three forms:

  • with attribute name, with a restricted set of names;
  • with attribute charset, maximum one; and
  • with attribute http-equiv, all of them.

This extractor takes all http-equiv records, because only a few, old extensions have been made to the list reported by W3Schools. There SHOULD be only one <meta> element with a charset.
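
The three forms can be told apart by which attribute is present on the element. As a minimal sketch in Python (not this extractor's own code): the function name classify_meta and the order in which the attributes are checked are assumptions made for illustration only.

```python
from typing import Optional

def classify_meta(attrs: dict[str, str]) -> Optional[str]:
    """Return the classic form of a <meta> element, or None.

    `attrs` maps lowercased attribute names to their values.
    The check order below is an assumption, not documented behavior.
    """
    for form in ("charset", "http-equiv", "name"):
        if form in attrs:
            return form
    # Anything else, e.g. <meta property="og:title" ...>, is not classic.
    return None
```

For example, classify_meta({"charset": "utf-8"}) yields "charset", while an element carrying only property and content yields None.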


The classic name attributes

The classic set of names is listed at MDN under "standard metadata names". Given convincing arguments, a few names MAY be added.

This extractor currently takes name values for:

META NAME values taken
application-name
author
creator
color-scheme
description
generator
googlebot
keywords
publisher
referrer
robots
theme-color
viewport
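
Restricting names to this list amounts to an allow-list lookup. The sketch below is in Python and is only an illustration: the identifiers CLASSIC_NAMES and is_classic_name are hypothetical, and comparing names case-insensitively is an assumption about how the extractor treats them.

```python
# Hypothetical allow-list, copied from the names listed above.
CLASSIC_NAMES = frozenset({
    "application-name", "author", "creator", "color-scheme",
    "description", "generator", "googlebot", "keywords",
    "publisher", "referrer", "robots", "theme-color", "viewport",
})

def is_classic_name(name: str) -> bool:
    """True when `name` is in the allow-list (assumed case-insensitive)."""
    return name.strip().lower() in CLASSIC_NAMES
```

With this, a name such as twitter:card is skipped, while Description is accepted.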

Produced data-structure

How and where the data structure with the facts is transported is your decision, but the output looks like this:

{
   "name" : {
      "description" : "The Open Graph protocol enables...",
      "generator" : "Хей, гиди Ванчо"
   },
   "charset" : "utf-8",
   "http-equiv" : {
      "content-type" : "text/html;charset=utf-8",
      "refresh" : "3;url=https://www.mozilla.org",
      "content-disposition" : ""
   }
}

Both name and http-equiv can appear in multiple <meta> elements, each with a unique label. Therefore, they are produced as a simple associative array (HASH) containing only simple values. Values are always UTF-8 and entity-decoded.
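
To make the shape of this structure concrete, here is a rough sketch that builds it with Python's standard html.parser module. It is not this extractor's implementation. The names ClassicMetaParser and extract_classic_meta are hypothetical, and three behaviors are assumptions: keys are lowercased, the first charset wins, and a repeated label overwrites the earlier value.

```python
from html.parser import HTMLParser

# Hypothetical allow-list, copied from the names listed in this document.
CLASSIC_NAMES = frozenset({
    "application-name", "author", "creator", "color-scheme",
    "description", "generator", "googlebot", "keywords",
    "publisher", "referrer", "robots", "theme-color", "viewport",
})

class ClassicMetaParser(HTMLParser):
    """Collect classic <meta> elements into nested dicts."""

    def __init__(self):
        super().__init__()
        self.result = {"name": {}, "http-equiv": {}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names and decodes entities
        # in attribute values; a valueless attribute arrives as None.
        a = {key: (value or "") for key, value in attrs}
        content = a.get("content", "")
        if "charset" in a:
            # Assumption: keep only the first charset encountered.
            self.result.setdefault("charset", a["charset"])
        elif "http-equiv" in a:
            # All http-equiv records are taken; key lowercased (assumption).
            self.result["http-equiv"][a["http-equiv"].lower()] = content
        elif a.get("name", "").lower() in CLASSIC_NAMES:
            self.result["name"][a["name"].lower()] = content

def extract_classic_meta(html: str) -> dict:
    """Parse `html` and return the classic <meta> facts."""
    parser = ClassicMetaParser()
    parser.feed(html)
    parser.close()
    return parser.result
```

Feeding it a page head with a charset, a description, and a refresh record returns a dict with the three top-level keys shown above; elements with property or with names outside the list are ignored.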