If you are interested in crawled website data, you submit a Task to the people running this project. A Task is a component that runs in the processing pipeline.
Every Task should have exactly one purpose: write separate Tasks for separate needs.
Each Task has the following components:
- Filter rules: which pages contain useful data for you;
- Extraction logic: what data you would like to collect from those pages (the extraction effort is shared across Tasks); and
- Packaging options: how the extracted data is delivered to you, typically as ZIP archives or WARC files downloaded over HTTP or FTP.
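The three components above can be sketched as a small data structure. This is a hypothetical illustration, not the project's actual submission format: the `Task` class, its field names, and the example filter and extractor are all assumptions made for the sake of the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical sketch of a Task; the real submission format is defined
# by the project maintainers and may look entirely different.
@dataclass
class Task:
    name: str
    matches: Callable[[str], bool]   # filter rule: is this page useful to us?
    extract: Callable[[str], dict]   # extraction logic: what to collect
    packaging: str                   # packaging option, e.g. "zip" or "warc"

def title_of(html: str) -> Optional[str]:
    """Illustrative extractor: pull the <title> text, if any."""
    m = re.search(r"<title>(.*?)</title>", html, re.S)
    return m.group(1) if m else None

# Example Task: collect page titles from pages mentioning "weather".
weather_task = Task(
    name="weather-titles",
    matches=lambda html: "weather" in html.lower(),
    extract=lambda html: {"title": title_of(html)},
    packaging="zip",
)

# The pipeline would run each page through the filter, then the extractor.
page = "<html><head><title>Local Weather</title></head><body>sunny</body></html>"
record = weather_task.extract(page) if weather_task.matches(page) else None
```

Keeping the filter separate from the extractor lets the pipeline skip extraction cheaply for pages a Task does not care about.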