Task Composition

Implementing Tasks correctly requires a deep understanding of the internals of the Pipeline (which also change every once in a while), so that is a job we do for you. To let us implement the filter, you need to give us a structured description of what you need.

For each of the filter rules you need:

  • give it a label, like "A" or "lang_nl";
  • state the filter parameter, for instance "language NLD or FRY";
  • tell us whether you would like to get the 'hit' information in the result.

Then, use the labels to write a formula that combines the rules with logical operators such as AND and OR, using parentheses for grouping.

Example of Task composition
# Filters
html      = any HTML or XML
domain_nl = domain in .nl
lang_nl   = language NLD or FRY
cities    = words "Amsterdam", "Arnhem", "Gouda"
bike      = pattern "bike|bicycle|fiets.*"

# Extract
Select where html AND ((domain_nl AND lang_nl) OR cities)
Include hits for bike and cities

# Packaging
To be downloaded as a zip file via FTP.

There is no strict formalism, so feel free to use comments and other clarifying syntax. But do use the two-step approach: first define the labelled filter rules, then describe the logic that combines them. Constructing the filter is an iterative process: you will probably want to fine-tune your initial attempts.
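
To make the two-step approach concrete, here is a minimal Python sketch of how the example above could be read. It is only an illustration, not how the Pipeline implements Tasks: the record fields (`content_type`, `host`, `language`, `text`), the `FILTERS` table and the `evaluate` function are all hypothetical names, and the pattern is assumed to be an ordinary regular expression.

```python
import re

# Step 1: labelled filter rules. Each rule returns either a boolean or a
# list of hits; an empty list counts as "no match".
# All record fields used here are assumptions made for this sketch.
FILTERS = {
    "html": lambda r: r["content_type"] in {
        "text/html", "application/xhtml+xml", "application/xml", "text/xml"},
    "domain_nl": lambda r: r["host"].endswith(".nl"),
    "lang_nl": lambda r: r["language"] in {"NLD", "FRY"},
    "cities": lambda r: [w for w in ("Amsterdam", "Arnhem", "Gouda")
                         if w in r["text"]],
    "bike": lambda r: re.findall(r"bike|bicycle|fiets.*", r["text"]),
}

# Labels for which the example asks to include hit information.
REPORT_HITS = ("bike", "cities")


def evaluate(record):
    """Return (selected, hits) for one record."""
    results = {label: rule(record) for label, rule in FILTERS.items()}
    m = {label: bool(value) for label, value in results.items()}

    # Step 2: the formula, written only in terms of the labels.
    selected = m["html"] and ((m["domain_nl"] and m["lang_nl"]) or m["cities"])

    hits = {label: results[label] for label in REPORT_HITS} if selected else {}
    return selected, hits


if __name__ == "__main__":
    page = {
        "content_type": "text/html",
        "host": "example.com",
        "language": "ENG",
        "text": "Visit Amsterdam by bike",
    }
    print(evaluate(page))
```

Because the formula refers only to labels, a rule can be tuned (for instance, adding a city) without touching the logic, which is what makes the iterative fine-tuning manageable.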