Task Composition
Correctly implementing the Tasks requires deep understanding of the internals of the Pipeline (which also do change every once in a while), so that's a job we do for you. To be able to implement the filter, you need to give us a formal description of your need.
For each of the filter rules you need
- give it a label, like "A" or "lang_nl";
- tell the filter parameter, for instance "language NLD or FRY";
- tell whether you would like to get the 'hit' information in the result.
Then, use the labels to produce a formula.
Example of Task composition
# Filters html = any HTML or XML domain_nl = domain in .nl lang_nl = language NLD or FRY cities = words "Amsterdam", "Arnhem", "Gouda" bike = pattern "bike|bicycle|fiets.* # Extract Select where html AND ((domain_nl AND lang_nl) OR cities) Include hits for bikes and cities # Packaging To be downloaded as zip via ftp.
There is no strict formalism, so feel invited to use comments and other clarifying syntax. But do use the two-step approach: separate filter steps from filter logic. Constructing the filter is an iterative process: you probably want to fine-tune your initial attempts.