I have a WordPress plugin that scrapes Google for domains, then checks whether they have expired and can be re-registered.
The plugin is fully developed and "works", but I need an expert developer to come in and make it faster, with better threading, better proxy handling, etc.
You need good experience developing scripts and plugins, ideally scraping-type plugins.
The purpose of this plugin is to scrape data from Google. We input a search term and Google searches the web for websites that include it. The plugin then takes all of the websites in the search results and scrapes them for URLs (external links). Finally, the plugin checks every domain it has scraped to see whether it is dead/expired and can be registered.
The plugin does this in a two-stage process: a DNS check, then a Whois check. First it checks whether the domain is currently hosted somewhere (very fast). Only if the domain is not being hosted does the plugin run a Whois check (slow). This way we avoid most of the slowness of Whois by only using it on a small % of the domains the plugin extracts.
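To make the two-stage idea concrete, here is a minimal sketch in Python (not the plugin's actual code, which I assume is PHP). The names `find_available`, `dns_resolves` and the `whois_says_free` callback are made up for illustration. The point is only that the slow Whois check runs solely for domains that fail the fast DNS check.

```python
import socket
from typing import Callable, Iterable, List


def dns_resolves(domain: str) -> bool:
    """Fast stage: True if the domain currently resolves to an address."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False


def find_available(
    domains: Iterable[str],
    resolves: Callable[[str], bool],
    whois_says_free: Callable[[str], bool],
) -> List[str]:
    """Return domains that fail the DNS check AND pass the Whois check.

    `whois_says_free` is a hypothetical callback standing in for whatever
    Whois lookup the plugin uses; it is only called for non-resolving domains.
    """
    available = []
    for domain in domains:
        if resolves(domain):
            continue  # hosted somewhere, so skip the slow Whois lookup
        if whois_says_free(domain):
            available.append(domain)
    return available
```

In real use you would pass `dns_resolves` as `resolves` and plug in an actual Whois lookup as the second callback.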
The plugin then marks the domain as AVAILABLE. It shows up live in the “Domains found so far” box and also, once checking has finished, in a downloadable Excel file. The plugin supports multithreading, so it searches, scrapes and checks all at the same time and displays available domains live as it finds them.
The plugin has been developed to run on cron jobs, so it can run on any size of host (shared, dedicated, etc.). We want it to support very heavy multithreading without timing out the server. Essentially, it should run on multiple threads that let us search hundreds of keywords and use thousands of proxies at the same time.
Currently, the plugin seems to have speed issues and it’s not displaying outputs properly.
We need a new proxy checking mechanism. The checker should check all the proxies and start running searches as soon as proxies come back as LIVE, so we start getting “dead domains” ASAP. The object of this plugin is to get results as quickly as possible. The checker should run as many threads as there are proxies, to ensure the fastest speeds possible.
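The behaviour I have in mind for the checker looks roughly like the sketch below. It is Python, purely to show the pattern, and the names `live_proxies` and `is_live` are hypothetical. Each proxy is tested in its own thread, and every proxy that passes is handed over the moment its check finishes, rather than after the whole list is done.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, Optional


def live_proxies(
    proxies: Iterable[str],
    is_live: Callable[[str], bool],
    max_workers: Optional[int] = None,
) -> Iterator[str]:
    """Yield each proxy as soon as its check comes back LIVE.

    `is_live` is a hypothetical callback that tests a single proxy.
    By default one thread is started per proxy, as the posting asks;
    `max_workers` is an assumed safety cap for small hosts.
    """
    proxies = list(proxies)
    if not proxies:
        return
    workers = max_workers or len(proxies)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(is_live, p): p for p in proxies}
        # as_completed hands back checks in the order they finish,
        # so fast proxies become usable before slow ones have answered.
        for future in as_completed(futures):
            try:
                ok = future.result()
            except Exception:
                ok = False  # a check that errors out counts as a dead proxy
            if ok:
                yield futures[future]
```

A search loop can then consume this with `for proxy in live_proxies(...)` and start searching on the first proxy that comes back, while the rest are still being tested.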
I asked for blacklisting functionality to be integrated: when the plugin searches and checks a domain, if the domain is LIVE and can’t be registered, it goes into a blacklist so that the domain won’t be checked again if it shows up in another search run. This has already been coded in but is not being used. Please get that working.
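For reference, this is the blacklist behaviour I expect, sketched in Python with made-up names (`Blacklist`, `filter_run`, `is_available`). The storage here is a plain text file, which is only an assumption for the example; the existing plugin code may well store the blacklist somewhere else, such as the database.

```python
import os
from typing import Callable, Iterable, List


class Blacklist:
    """Set of domains known to be LIVE, persisted one per line in a file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.domains = set()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.domains = {line.strip().lower() for line in fh if line.strip()}

    def __contains__(self, domain: str) -> bool:
        return domain.lower() in self.domains

    def add(self, domain: str) -> None:
        domain = domain.lower()
        if domain not in self.domains:
            self.domains.add(domain)
            # Append immediately so the entry survives into later runs.
            with open(self.path, "a", encoding="utf-8") as fh:
                fh.write(domain + "\n")


def filter_run(
    domains: Iterable[str],
    blacklist: Blacklist,
    is_available: Callable[[str], bool],
) -> List[str]:
    """Check only domains not yet blacklisted; blacklist the ones found LIVE."""
    available = []
    for domain in domains:
        if domain in blacklist:
            continue  # already known to be LIVE from an earlier run
        if is_available(domain):
            available.append(domain)
        else:
            blacklist.add(domain)
    return available
```

Available domains are deliberately not blacklisted, so they keep showing up until someone registers them.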
That’s all there is to it really. The other functionality is already there (custom search range etc.); we just need to establish why it’s not working exactly right. Earlier alpha versions from my previous developer were working fine, but they had speed issues.
I need you to go through it and get it up to scratch, making processing as fast as possible and ensuring it works properly, and then we will be good to go. So half your job will be debugging, then some quick fixing at the end :)
So, can you do this?
Skills: multithreading, debugging