We are using Apache Nutch to index URLs on AWS and need a developer with Nutch experience. AWS experience is preferred.
You will be working with a Java developer from Belarus.
The project takes a list of URLs, crawls those sites, then "snaps" an image of each page and converts it to a thumbnail. Both the full image and the thumbnail are stored in S3. A database of the images is maintained in SimpleDB so the frontend UI can browse through them.
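
For candidates who want a feel for the thumbnail step, here is a rough sketch in plain Java using only the JDK. It is an illustration under assumptions, not our actual code: the class name SnapshotSketch, the helper names toThumbnail and objectKey, the 160 px thumbnail width, and the SHA-1 based key scheme are all hypothetical. The Nutch crawl, the page snapshot itself, and the S3 and SimpleDB calls are left out.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SnapshotSketch {

    /** Scales a page snapshot down to targetWidth pixels, preserving aspect ratio. */
    static BufferedImage toThumbnail(BufferedImage snapshot, int targetWidth) {
        int targetHeight = Math.max(1, snapshot.getHeight() * targetWidth / snapshot.getWidth());
        BufferedImage thumb = new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = thumb.createGraphics();
        try {
            // Bilinear interpolation gives smoother results than the default when shrinking.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(snapshot, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose();
        }
        return thumb;
    }

    /**
     * Hypothetical storage key scheme: prefix + "/" + hex SHA-1 of the URL + ".png".
     * The same URL always maps to the same key, so the full image and the
     * thumbnail can be paired by using two prefixes.
     */
    static String objectKey(String prefix, String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(prefix).append('/');
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.append(".png").toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real page snapshot.
        BufferedImage snapshot = new BufferedImage(1280, 960, BufferedImage.TYPE_INT_RGB);
        BufferedImage thumb = toThumbnail(snapshot, 160);
        System.out.println(thumb.getWidth() + "x" + thumb.getHeight());
        System.out.println(objectKey("snapshots", "http://example.com/"));
        System.out.println(objectKey("thumbs", "http://example.com/"));
    }
}
```

In the real pipeline the two images would be uploaded to S3 under keys like these, and the keys recorded alongside the URL in SimpleDB for the frontend to look up.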