Est. Budget: $300.00
A Python application is required that can quickly check hundreds of thousands of URLs for headers and content, make decisions about the data it receives, and ultimately store the results.
The platform is Ubuntu 12.04.03 and the preferred language is Python 2.7. RAM, CPU and bandwidth should not pose any significant barriers and can be upgraded on the server if they prove to be a bottleneck.
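As a rough illustration of the kind of tool described above, the following is a minimal sketch of a concurrent checker built on a thread pool from the standard library. It is not a specification. The function names (`fetch`, `classify`, `check_one`, `check_urls`), the decision rules in `classify`, and the worker count are all hypothetical, chosen only to show the shape of the fetch, decide, collect pipeline. Reading the URLs from the database and storing the results are deliberately left out.

```python
from multiprocessing.pool import ThreadPool

try:
    from urllib2 import Request, urlopen  # Python 2.7
except ImportError:
    from urllib.request import Request, urlopen  # Python 3


def fetch(url, timeout=10):
    """Return (status, headers, body) for one URL; header names are lowercased."""
    response = urlopen(Request(url), timeout=timeout)
    try:
        headers = dict((k.lower(), v) for k, v in response.info().items())
        return response.getcode(), headers, response.read()
    finally:
        response.close()


def classify(status, headers, body):
    """Hypothetical decision rules; the real rules would come from the client."""
    if status != 200:
        return "bad_status"
    if not headers.get("content-type", "").startswith("text/html"):
        return "not_html"
    if not body:
        return "empty"
    return "ok"


def check_one(url, fetcher=fetch):
    """Fetch and classify a single URL, turning any failure into a result row."""
    try:
        status, headers, body = fetcher(url)
    except Exception as exc:  # timeouts, DNS failures, HTTP errors, ...
        return {"url": url, "verdict": "error", "detail": str(exc)}
    return {"url": url, "verdict": classify(status, headers, body), "detail": ""}


def check_urls(urls, fetcher=fetch, workers=50):
    """Check many URLs concurrently; results come back in input order."""
    pool = ThreadPool(workers)
    try:
        return pool.map(lambda u: check_one(u, fetcher), urls)
    finally:
        pool.close()
        pool.join()
```

Threads are used here because the work is dominated by waiting on the network. An event-driven approach would be another reasonable choice at this scale. Passing the fetcher in as a parameter keeps the decision logic testable without network access.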
The source data is stored in a MySQL database and the results ...