Request for Development
LUE has commissioned a database to warehouse lead data scraped from our existing OEP. The scrape relies on Python-based screen-scraping software developed for LUE by a third party. It uses the Mechanize library, but revisions to the OEP reporting tool have left it non-functional. Database access is handled through the SQLAlchemy library. The project is currently hosted on Linode infrastructure. The current interface is powered by Twisted, a Python-based, event-driven networking engine comparable in role to the Python web framework Django or perhaps node.js. Additional development work was done to interact with the email service provider MailChimp, but that API interaction script was never used.
The initial completion target is 2/15/2013. Because the window for this project is short, major revisions, rewrites, or expansions of the current code base are likely not feasible without considerable expense. It is therefore recommended that the existing Python code base be retained, corrected, and expanded.
Proposed Revisions to Existing Code Base:
The existing code base must be revised to account for the changes in our OEP's reporting tool so that data capture works again. These changes are unlikely to be significant, as the break in functionality is probably due to small changes in the way LTOOLs (the OEP's reporting tool) structures its HTML.
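One way to make the scrape less sensitive to small HTML changes is to locate columns by their header text instead of by position. The following is a minimal sketch only, written against Python 3's standard library rather than the existing Mechanize code; the table layout and the header names "Email" and "Name" are assumptions for illustration, since the actual LTOOLs markup is not described here.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in a document as a list of lists."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_leads(html, wanted=("Email", "Name")):
    """Return one dict per data row, keyed by header text (hypothetical headers)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, data = parser.rows[0], parser.rows[1:]
    # Raises ValueError if an expected header is missing, so a layout change
    # fails loudly instead of silently capturing the wrong column.
    index = {name: header.index(name) for name in wanted}
    return [
        {name: row[i] for name, i in index.items()}
        for row in data
        if len(row) == len(header)
    ]
```

With this approach, a report that gains or reorders columns still yields the same fields, while a renamed header stops the scrape with an error that points at the cause.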
Proposed Extension: Sanitation
LUE has selected the provider Tower Data to sanitize our lead data. Tower Data provides real-time email validation via a SOAP service and also via batch upload. LUE would like leads to be sanitized via the SOAP service after each scrape has completed. The sanitized status must also be recorded in the database, so that only actionable leads are retained in our active database.
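A rough sketch of how the sanitized status could be recorded follows. It uses Python's built-in sqlite3 module as a stand-in for the project's SQLAlchemy layer, and the table name `leads`, the column `email_status`, and the function `fake_validate` are all hypothetical. The Tower Data SOAP call itself is not shown; it is represented by a placeholder function that would be swapped for the real client.

```python
import sqlite3


def sanitize_leads(conn, validate):
    """Validate every unchecked lead and store the outcome next to the lead."""
    pending = conn.execute(
        "SELECT id, email FROM leads WHERE email_status = 'unchecked'"
    ).fetchall()
    for lead_id, email in pending:
        status = "valid" if validate(email) else "invalid"
        conn.execute(
            "UPDATE leads SET email_status = ? WHERE id = ?", (status, lead_id)
        )
    conn.commit()
    return len(pending)


def fake_validate(email):
    # Placeholder for the Tower Data SOAP request; the real request and
    # response format must come from Tower Data's documentation.
    return "@" in email and not email.endswith("@example.invalid")


# Hypothetical schema: each lead carries its own validation status.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leads ("
    "id INTEGER PRIMARY KEY, "
    "email TEXT NOT NULL, "
    "email_status TEXT NOT NULL DEFAULT 'unchecked')"
)
conn.executemany(
    "INSERT INTO leads (email) VALUES (?)",
    [("ann@example.com",), ("bob@example.invalid",)],
)

checked = sanitize_leads(conn, fake_validate)
actionable = [
    row[0]
    for row in conn.execute(
        "SELECT email FROM leads WHERE email_status = 'valid'"
    )
]
```

Keeping the status as a column, rather than deleting failed leads outright, lets the active set be selected with a simple filter while preserving a record of what was rejected. Whether rejected leads should be kept or purged is a decision for LUE.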
Proposed Extension: Interspire Upload
LUE has selected the company Interspire to act as our email service provider. Interspire provides a robust XML-based API service, using a self-hosted API client written in PHP. Since their documentation provides PHP code examples that use cURL to post data to this client, the PycURL library could be used to interact with it from Python. The API provides functionality for email lead upload, list management, and checks against existing lists.
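As an illustration of the upload step, the sketch below builds an XML request body of the kind that would be posted to the Interspire client. The element names and the `AddSubscriberToList` method are assumptions based on the general shape of Interspire's XML API and must be checked against the Interspire documentation. The function name, credentials, and list ID are hypothetical, and the HTTP POST itself (via PycURL or another client) is deliberately left out.

```python
import xml.etree.ElementTree as ET


def build_add_subscriber_request(username, token, list_id, email):
    """Build an XML body asking the API to add one email address to a list.

    All element names here are assumptions to be verified against the
    Interspire API documentation before use.
    """
    root = ET.Element("xmlrequest")
    ET.SubElement(root, "username").text = username
    ET.SubElement(root, "usertoken").text = token
    ET.SubElement(root, "requesttype").text = "subscribers"
    ET.SubElement(root, "requestmethod").text = "AddSubscriberToList"
    details = ET.SubElement(root, "details")
    ET.SubElement(details, "emailaddress").text = email
    ET.SubElement(details, "mailinglist").text = str(list_id)
    # ElementTree escapes special characters such as & and < in the values.
    return ET.tostring(root, encoding="utf-8")


# Hypothetical credentials and list ID, for illustration only.
body = build_add_subscriber_request("api_user", "secret-token", 1, "ann@example.com")
```

Building the body with an XML library rather than string concatenation avoids malformed requests when a lead's data contains characters that need escaping.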
Proposed Extension: Portability
At the time of initial development, LUE had little to no front-facing infrastructure, so hosting with the provider Linode was a practical and necessary choice. Once the revisions and extensions above are complete, the project will need to be migrated from the Linode server to an LUE-controlled Amazon Web Services instance to support the continued evolution of the Deal of the Day project. After a successful demonstration of the completed system, LUE IT staff will issue credentials, and the migration to the new server should then be completed. Once the migration has been verified, the previous Linode server should be decommissioned.
Towerdata API docs: http://www.towerdata.com/services/
Interspire API docs: http://www.interspire.com/emailmar
Twisted Web App docs: http://twistedmatrix.com/documents
Mechanize Library: http://wwwsearch.sourceforge.net/m
Skills: management, amazon, pdf