What is nutch?
Nutch is an open source web-search software. It builds on Lucene and Solr, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc. Nutch is an effort to build a Free and Open Source search engine. It uses Lucene for the search and index component. The fetcher (robot) has been written from scratch solely for this project.


