I would like to index a filesystem with Solr and maintain that index periodically: applying changes, removing deleted items, and so on.
The expected deliverables are:
- a schema.xml matching the filesystem information (file name, directory path, creation date, modification date, owner, size, plus all Tika-extractable info: content, author, etc.), with a facet hierarchy that handles the filesystem concept and browsing requests cleverly.
- a standalone Java program that drills down the filesystem and synchronizes the remote Solr instance with the observed information.
- the solr-web portlet, configured to search against the defined schema.
- a simple additional portlet to browse the indexed filesystem.
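To make the first deliverable concrete, here is a minimal schema.xml sketch. The field names (`dirpath`, `modified`, etc.) are my own assumptions, and it presumes the standard `string`/`date`/`long`/`text_general` field types from the default Solr 3.x/4.x example schema; the hierarchical-facet requirement is covered by `solr.PathHierarchyTokenizerFactory`, which indexes `/a/b/c` as `/a`, `/a/b`, `/a/b/c` so directories can be drilled down with facets.

```xml
<!-- Sketch only: field names are hypothetical, types assumed from the
     default example schema. -->
<fields>
  <field name="id"       type="string"       indexed="true" stored="true" required="true"/> <!-- absolute path -->
  <field name="filename" type="string"       indexed="true" stored="true"/>
  <field name="dirpath"  type="path"         indexed="true" stored="true"/> <!-- facet hierarchy -->
  <field name="created"  type="date"         indexed="true" stored="true"/>
  <field name="modified" type="date"         indexed="true" stored="true"/> <!-- drives update detection -->
  <field name="owner"    type="string"       indexed="true" stored="true"/>
  <field name="size"     type="long"         indexed="true" stored="true"/>
  <field name="content"  type="text_general" indexed="true" stored="true"/> <!-- Tika body -->
  <field name="author"   type="string"       indexed="true" stored="true"/> <!-- Tika metadata -->
</fields>

<uniqueKey>id</uniqueKey>

<types>
  <!-- Emits every ancestor path as a token at index time, so faceting on
       dirpath yields a browsable directory tree. -->
  <fieldType name="path" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>
</types>
```

Faceting on `dirpath` with a prefix filter (e.g. `facet.prefix=/home/user`) then gives the "browse" portlet its directory listing directly from Solr.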
Please note these specific constraints:
- The only expected referential will be Solr (in particular, the standalone Java program should not use any local cache).
- Pay close attention to the "update" and "delete" cases.
- Rely on the last-modification date to identify updates.
- The Solr instance itself will not have access to the filesystem (i.e. Tika extraction must be handled by the standalone Java program, and the fully extracted indexing request pushed to Solr).
- Performance matters (number of Solr calls, network round trips, etc.).
- The portlets must work with Liferay:
  Liferay Portal Community Edition 6.1.1 CE GA2
  solr-web portlet: solr-web-184.108.40.206-ce-ga1
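Since the "update"/"delete" cases and the no-cache constraint are the tricky part, here is a minimal sketch of the synchronization decision logic. It assumes the program can obtain, from Solr alone, a map of indexed path → last-modified timestamp (e.g. by paging a query over the path and modified fields), and compares it to the current filesystem walk; the class and method names are hypothetical, and the actual Solr add/delete calls (SolrJ `add`, `deleteByQuery`, batched for performance) are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FsSolrDiff {

    /**
     * Computes the actions needed to bring Solr in line with the filesystem,
     * using only the two timestamp maps (no local cache).
     *
     * @param localMtimes path -> last-modified millis from the filesystem walk
     * @param solrMtimes  path -> last-modified millis currently indexed in Solr
     * @return map with "add", "update" and "delete" path lists
     */
    public static Map<String, List<String>> diff(Map<String, Long> localMtimes,
                                                 Map<String, Long> solrMtimes) {
        List<String> toAdd = new ArrayList<String>();
        List<String> toUpdate = new ArrayList<String>();
        List<String> toDelete = new ArrayList<String>();

        for (Map.Entry<String, Long> e : localMtimes.entrySet()) {
            Long indexed = solrMtimes.get(e.getKey());
            if (indexed == null) {
                toAdd.add(e.getKey());                 // on disk, not in Solr
            } else if (e.getValue() > indexed) {
                toUpdate.add(e.getKey());              // modified since indexing
            }                                          // equal timestamp: skip
        }
        for (String path : solrMtimes.keySet()) {
            if (!localMtimes.containsKey(path)) {
                toDelete.add(path);                    // in Solr, gone from disk
            }
        }

        Map<String, List<String>> actions = new LinkedHashMap<String, List<String>>();
        actions.put("add", toAdd);
        actions.put("update", toUpdate);
        actions.put("delete", toDelete);
        return actions;
    }
}
```

Only paths in the "add" and "update" lists need Tika extraction and a (batched) push to Solr, and "delete" can be issued as a single batched delete, which keeps the number of Solr calls and round trips low.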
Fixed-price quotes only, please.