Nutch / Hadoop / AWS

Cancelled

Job Description

We are using Nutch to index URLs on AWS and need a Nutch developer. AWS experience is preferred.

You will be working with a Java developer from Belarus.

The project takes a list of URLs, crawls those sites, then "snaps" an image of each page and converts that image to a thumbnail. Both the full image and the thumbnail are stored in S3. A database of the images is maintained in SimpleDB so the frontend UI can page through them.
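
To make the flow above concrete, here is a rough sketch of the snapshot, thumbnail, and storage steps in Python. It is not the project's actual code: the names `ImageRecord`, `object_keys`, and `process_urls` are hypothetical, the key layout (`snapshots/` and `thumbnails/` prefixes with a SHA-1 of the URL) is an assumption, and plain dictionaries stand in for the S3 bucket and the SimpleDB domain. The Nutch crawl itself is left out, and the page-capture and thumbnail functions are passed in by the caller.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class ImageRecord:
    """One row of the image index (stand-in for a SimpleDB item)."""
    url: str
    snapshot_key: str
    thumbnail_key: str


def object_keys(url: str) -> Tuple[str, str]:
    """Derive storage keys from the URL (assumed layout, not the project's)."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"snapshots/{digest}.png", f"thumbnails/{digest}.png"


def process_urls(
    urls: Iterable[str],
    snap: Callable[[str], bytes],            # captures a page image for a URL
    make_thumbnail: Callable[[bytes], bytes],  # shrinks an image
    bucket: Dict[str, bytes],                # stand-in for an S3 bucket
    index: Dict[str, ImageRecord],           # stand-in for a SimpleDB domain
) -> List[ImageRecord]:
    """Snapshot each URL, store image and thumbnail, and index both keys."""
    records = []
    for url in urls:
        image = snap(url)
        thumbnail = make_thumbnail(image)
        snapshot_key, thumbnail_key = object_keys(url)
        bucket[snapshot_key] = image
        bucket[thumbnail_key] = thumbnail
        record = ImageRecord(url, snapshot_key, thumbnail_key)
        index[url] = record
        records.append(record)
    return records
```

In a real deployment the two dictionaries would presumably be replaced by calls to an AWS client library, and `snap` by whatever page-rendering tool the project settles on. Keeping those pieces injectable makes the pipeline testable without AWS credentials.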