Filter on content-type

Often you want to select Products based on content type, for instance only PDFs or HTML. You may provide a list of regular expressions or abstract names like "any html", "html5 only", or "LibreOffice products".
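
As a rough illustration of how a list that mixes abstract names and regular expressions could be evaluated, here is a minimal Python sketch. It is not the actual implementation: the ABSTRACT_TYPES table, the patterns behind each name, and the function names compile_filters and accepts are all assumptions made for this example.

```python
import re

# Hypothetical mapping from abstract names to regular expressions.
# The real names and what they expand to may differ.
ABSTRACT_TYPES = {
    "any html": r"text/html|application/xhtml\+xml",
    "pdf": r"application/pdf",
}

def compile_filters(specs):
    """Compile each entry; unknown entries are treated as plain regexes."""
    return [re.compile(ABSTRACT_TYPES.get(spec, spec)) for spec in specs]

def accepts(content_type, patterns):
    """Return True when the whole content type matches at least one pattern."""
    return any(p.fullmatch(content_type) is not None for p in patterns)

# One abstract name and one explicit regular expression.
filters = compile_filters(["any html", r"application/pdf"])
print(accepts("text/html", filters))
print(accepts("image/png", filters))
```

The sketch uses full matches so that a pattern such as "text/html" does not also select longer type names that merely start with it; whether the real filter anchors its expressions this way is not stated here.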

Some crawlers produce only a very limited number of types (for instance, CommonCrawl is more than 90% HTML, with occasional other types like PDF). Filtering on content-type is therefore useful (for replies with response status 200).


Hit information

When you include the hit in your results, you will get something like

{ "rule": "content type",
  "type": "text/html"
}

The type value is always lower-case and normalized to the IANA definitions, following RFC 2046.
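
To give an idea of what such a normalized value looks like in practice, the following Python sketch builds a hit from a raw Content-Type header. It only lower-cases the media type and drops parameters such as charset; dropping parameters is an assumption based on the example hit, and any mapping of aliases onto IANA-registered names is left out. The function names normalize_content_type and make_hit are hypothetical.

```python
def normalize_content_type(header_value):
    """Lower-case the media type and drop parameters (assumed behavior)."""
    # "Text/HTML; charset=UTF-8" -> "text/html"
    return header_value.split(";", 1)[0].strip().lower()

def make_hit(header_value):
    """Build a hit record shaped like the example above."""
    return {"rule": "content type", "type": normalize_content_type(header_value)}

print(make_hit("Text/HTML; charset=UTF-8"))
# prints {'rule': 'content type', 'type': 'text/html'}
```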