Java Scraping - Extracting Content From The Page

Cancelled

Job Description

We are looking for a proof-of-concept (PoC) Java app that shows how to extract the main content from an arbitrary HTML page, stripping out everything else (navigation, banners, sidebars, etc.).

This is similar to what Instapaper does with an arbitrary content page.

I have attached a list of random HTML pages covering a similar topic. The resulting application should intelligently extract only the main content from each page.
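
To illustrate the kind of behaviour meant here, below is a minimal sketch of one possible direction, not a required design: drop elements that are usually page chrome, split what remains into blocks, and keep only blocks with enough text and a low share of link text. Everything in it is an assumption made for illustration: the class name MainContentExtractor, the extract method, the minChars and maxLinkDensity thresholds, and the list of tags treated as noise. It uses regular expressions from the Java standard library only to stay self-contained. A real solution would more likely use a proper HTML parser such as jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a text-length / link-density heuristic, not a full readability algorithm.
final class MainContentExtractor {

    // Comments and elements assumed to be page chrome; their whole content is dropped.
    // Nested elements of the same name are not handled by this simple pattern.
    private static final Pattern NOISE = Pattern.compile(
        "(?is)<!--.*?-->|<(script|style|head|header|nav|footer|aside|form)\\b[^>]*>.*?</\\1\\s*>");

    // Tags that start or end a block of text.
    private static final Pattern BLOCK_BREAK = Pattern.compile(
        "(?i)</?(p|div|section|article|td|li|ul|ol|table|tr|h[1-6]|br|blockquote)\\b[^>]*>");

    // Anchor elements, used to measure how much of a block is link text.
    private static final Pattern LINK = Pattern.compile("(?is)<a\\b[^>]*>(.*?)</a\\s*>");

    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");

    // Removes remaining tags and collapses whitespace. HTML entities are not decoded.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    // Keeps blocks with at least minChars characters of text whose
    // link-text share does not exceed maxLinkDensity (0.0 to 1.0).
    static String extract(String html, int minChars, double maxLinkDensity) {
        String cleaned = NOISE.matcher(html).replaceAll(" ");
        List<String> kept = new ArrayList<>();
        for (String block : BLOCK_BREAK.split(cleaned)) {
            String text = stripTags(block);
            if (text.isEmpty() || text.length() < minChars) {
                continue; // too short to be body text
            }
            int linkChars = 0;
            Matcher m = LINK.matcher(block);
            while (m.find()) {
                linkChars += stripTags(m.group(1)).length();
            }
            double density = (double) linkChars / text.length();
            if (density <= maxLinkDensity) {
                kept.add(text);
            }
        }
        return String.join("\n\n", kept);
    }
}
```

A call such as MainContentExtractor.extract(html, 40, 0.3) would keep paragraphs of 40 or more characters in which at most 30% of the text sits inside links. Both thresholds are guesses and would need tuning against the attached pages.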

Important: to be considered for the job, please outline the general direction you would take.
