"Classic" <meta> elements
Many extensions have been made to the <meta> elements in HTML pages.
Not only have people been adding new values for name, but various
attributes were also added, like property. Let's call the official
short list of possibilities "classic".
The classic <meta> elements come in three forms:
- with attribute name, with a restricted set of names;
- with attribute charset, maximum one; and
- with attribute http-equiv, all of them.
This extractor takes all http-equiv records, because there are few
of them, and the extensions that have been made to the list reported
by W3Schools are old.
There SHOULD be only one meta element with a charset.
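The three forms above can be told apart by looking at which attribute an element carries. The following is a minimal sketch in Python, not this extractor's actual code: the function name classify_meta and the precedence order (charset, then http-equiv, then name) are assumptions made for illustration.

```python
from typing import Optional


def classify_meta(attrs: dict) -> Optional[str]:
    """Return the classic form of a <meta> element, or None if it has none.

    `attrs` maps attribute names to values for one <meta> element.
    The precedence order below is an assumption for this sketch.
    """
    # HTML attribute names are case-insensitive, so normalise them first.
    keys = {key.lower() for key in attrs}
    for form in ("charset", "http-equiv", "name"):
        if form in keys:
            return form
    # For example <meta property="og:title" ...> is an extension, not classic.
    return None


print(classify_meta({"charset": "utf-8"}))
print(classify_meta({"property": "og:title", "content": "Example"}))
```

Note that an element classified as "name" still has to pass the restricted set of names before it is taken.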
The classic name attributes
The classic set of names is listed at MDN under "standard metadata names"; given convincing arguments, a few names MAY be added.
This extractor currently takes name values for:
application-name author creator color-scheme description generator googlebot keywords publisher referrer robots theme-color viewport
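The list of accepted names can be treated as an allow-list. The sketch below is illustrative only: the identifiers CLASSIC_NAMES and is_classic_name are hypothetical, and comparing names case-insensitively after trimming whitespace is an assumption about how matching is done.

```python
# Allow-list of the name values this document says the extractor takes.
CLASSIC_NAMES = frozenset("""
    application-name author creator color-scheme description generator
    googlebot keywords publisher referrer robots theme-color viewport
""".split())


def is_classic_name(name: str) -> bool:
    """True when `name` is one of the accepted classic metadata names.

    Assumption: names are compared case-insensitively, ignoring
    surrounding whitespace.
    """
    return name.strip().lower() in CLASSIC_NAMES


print(is_classic_name("Description"))
print(is_classic_name("twitter:card"))
```

Keeping the names in one set means that adding a name later, when the arguments are convincing, is a one-line change.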
Produced data-structure
How and where the data structure with the facts is transported is your decision, but the output looks like this:
{
  "name" : {
    "description" : "The Open Graph protocol enables...",
    "generator" : "Хей, гиди Ванчо"
  },
  "charset" : "utf-8",
  "http-equiv" : {
    "content-type" : "text/html;charset=utf-8",
    "refresh" : "3;url=https://www.mozilla.org",
    "content-disposition" : ""
  }
}
Both name and http-equiv can appear in multiple <meta> elements, each
with a unique label. Therefore, they are produced as simple
associative arrays (HASH) with only simple values. Values are always
UTF-8 and entity decoded.
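To make the shape of that structure concrete, here is a minimal sketch that builds it with Python's standard html.parser module. It is a stand-in, not this extractor's implementation. The names ClassicMetaParser and extract_classic_meta are hypothetical, and two behaviours are assumptions: labels are lowercased, and when a label occurs more than once the first occurrence wins.

```python
from html.parser import HTMLParser

CLASSIC_NAMES = frozenset("""
    application-name author creator color-scheme description generator
    googlebot keywords publisher referrer robots theme-color viewport
""".split())


class ClassicMetaParser(HTMLParser):
    """Collects classic <meta> facts into nested dictionaries."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.facts = {"name": {}, "http-equiv": {}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser lowercases attribute names and decodes entities
        # in attribute values; valueless attributes arrive as None.
        attr = {key: (value or "") for key, value in attrs}
        content = attr.get("content", "")
        if "charset" in attr:
            # Only one charset is expected; keep the first (assumption).
            self.facts.setdefault("charset", attr["charset"])
        elif "http-equiv" in attr:
            # All http-equiv records are taken.
            label = attr["http-equiv"].lower()
            self.facts["http-equiv"].setdefault(label, content)
        elif attr.get("name", "").lower() in CLASSIC_NAMES:
            # Only names from the restricted set are taken.
            self.facts["name"].setdefault(attr["name"].lower(), content)


def extract_classic_meta(html: str) -> dict:
    """Return the classic <meta> facts found in an HTML string."""
    parser = ClassicMetaParser()
    parser.feed(html)
    parser.close()
    return parser.facts


sample = (
    '<head><meta charset="utf-8">'
    '<meta name="description" content="Tom &amp; Jerry">'
    '<meta http-equiv="Refresh" content="3;url=https://www.mozilla.org">'
    '<meta property="og:title" content="ignored"></head>'
)
print(extract_classic_meta(sample))
```

The result contains only strings and two flat dictionaries, so it can be serialized with any JSON encoder and written out as UTF-8.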