Filter on content-type
Often you want to select Products based on content type, for instance only PDFs or HTML. You may provide a list of regular expressions or abstract names like "any html", "html5 only", or "LibreOffice products".
Some crawlers produce only a very limited number of types (for instance, CommonCrawl is more than 90% HTML, with other types like PDF appearing only incidentally). Filtering on content-type is therefore useful (for replies with response status 200).
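As a rough illustration of the idea, the sketch below matches a response's content type against a list of regular expressions. The function name `matches_content_type`, its parameters, and the choice to drop media type parameters before matching are assumptions made for this example, not the actual interface. Abstract names such as "any html" are left out, because their expansion is not specified here.

```python
import re

def matches_content_type(content_type, status, patterns):
    """Return True when a status-200 reply has a media type matching any pattern.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Only successful replies are considered, as described in the text.
    if status != 200:
        return False
    # Drop parameters such as "; charset=utf-8" and compare in lower-case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return any(re.search(p, media_type) for p in patterns)

# Select only HTML and PDF.
wanted = [r"^text/html$", r"^application/pdf$"]
print(matches_content_type("text/html; charset=utf-8", 200, wanted))  # True
print(matches_content_type("image/png", 200, wanted))                 # False
```

Anchoring the expressions with `^` and `$` avoids accidental matches on longer subtype names.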
Hit information
When you include the hit in your results, you will get something like:
{ "rule": "content type", "type": "text/html" }
The type value is always lower-case and normalized to IANA definitions, following RFC 2046.
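The following sketch shows one way a header value could be reduced to the lower-case type seen in the hit record. It assumes normalization means stripping parameters and surrounding whitespace and folding case; any further mapping to IANA-registered names is not covered. The function name `content_type_hit` is hypothetical.

```python
import json

def content_type_hit(header_value):
    """Build a hit record from a raw Content-Type header value.

    Hypothetical helper: normalization here is limited to dropping
    parameters, trimming whitespace, and lower-casing.
    """
    media_type = header_value.split(";", 1)[0].strip().lower()
    return {"rule": "content type", "type": media_type}

print(json.dumps(content_type_hit("Text/HTML; charset=UTF-8")))
# {"rule": "content type", "type": "text/html"}
```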