Wikipedia and its sister projects consist of content that is free to study, share, improve and reuse. One of the ways we make the content readily available and searchable is by indexing changed content at regular intervals and making it available to our search engines.
The Wikimedia Foundation deploys Lucene as the search engine backbone for its projects. We're looking for a consultant who will assist in the continuing development and operational work of the Search software stack and its infrastructure. The candidate is a subject matter expert on Lucene Search technology and will provide guidance to the Foundation's Technical Operations team on maintaining, improving and migrating the Search infrastructure. Documentation on the current deployment can be found at http://wikitech.wikimedia.org/view
Scope of Work:
Work on enhancements and daily operational matters, such as improving the efficiency, capacity and redundancy of the Lucene Search infrastructure
- Help in troubleshooting unexpected outages and identifying operational issues
- Profile and locate performance bottlenecks
- Use Puppet as the configuration management tool to maintain the manifests for the Lucene configuration
- Deploy the Lucene Search infrastructure at our new data center
Upgrade and migrate the current Search software stack to work with the latest Lucene version
- Upgrade to the current release of Lucene
- Develop and upgrade the MediaWiki search extensions (MWSearch and Lucene-search) to work with the new Lucene release. The MWSearch extension is a MediaWiki backend that fetches search results from the MediaWiki Lucene-based search engine. Lucene-search extends the Apache Lucene search API with ranking of pages based on the number of backlinks, distributed searching and indexing, parsing of wiki text, incremental updates, etc.
- Automate, optimize and document the indexing and deployment process.
Requirements:
- Have strong knowledge of Java, PHP and Linux
- Have experience with configuration management systems and concepts (e.g. Puppet, Chef, CFEngine)
- Have experience with operating system distribution packaging systems (e.g. dpkg, RPM)
- Have solid experience with production and processing of large datasets
- Be able to work independently where needed and to work remotely as part of a globally distributed team
- Have relevant hands-on experience and eagerness to learn and try new concepts
- Be comfortable in a highly collaborative, consensus-oriented environment
- Be a proficient English speaker
- Have prior work experience implementing Lucene/Solr search engines
Skills: troubleshooting, puppet, apache, linux