Apache Solr, Nutch and Tika Project

Job Description

I need to develop a service with the following functionality. The results should be available in both XML and JSON:

Step 1: We provide the search keywords (possibly several).
Step 2: Crawl all the result pages for every keyword (Nutch does the crawling; Solr can index the crawled pages).
Step 3: Store the content of the result pages in Hadoop (HDFS).
Step 4: Use Tika to extract the text and metadata from those files and return them as JSON or XML.
Step 5: Provide an API that accepts keywords and returns the result pages (thousands of pages is fine).
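
To illustrate the JSON and XML result format requested above, here is a minimal Python sketch using only the standard library. It assumes each crawled page ends up as one record holding the search keyword, the page URL, the text Tika extracted and the metadata Tika reported. The `PageResult` class, the `to_json`/`to_xml` helpers and all field and element names are hypothetical, not part of any Solr, Nutch or Tika API, and the crawling, HDFS storage and Tika extraction steps are not shown.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field, asdict


@dataclass
class PageResult:
    # Hypothetical record for one crawled page: the keyword that found it,
    # its URL, the text Tika extracted and Tika's metadata for it.
    keyword: str
    url: str
    content: str
    metadata: dict = field(default_factory=dict)


def to_json(results):
    # Serialize the results as a JSON array of objects.
    return json.dumps([asdict(r) for r in results], ensure_ascii=False)


def to_xml(results):
    # Serialize the same results as <results><page ...>...</page></results>.
    root = ET.Element("results")
    for r in results:
        page = ET.SubElement(root, "page", keyword=r.keyword, url=r.url)
        ET.SubElement(page, "content").text = r.content
        meta = ET.SubElement(page, "metadata")
        for name, value in r.metadata.items():
            ET.SubElement(meta, "field", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hand-written sample data standing in for real Tika output.
    sample = [
        PageResult("hadoop", "http://example.com/a", "Hello",
                   {"Content-Type": "text/html"}),
    ]
    print(to_json(sample))
    print(to_xml(sample))
```

Building both formats from the same record keeps the JSON and XML responses consistent; a real service would populate these records from the Tika output stored in HDFS rather than from sample data.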

Please get back to me if you have expertise in this area, as this is a very urgent project.