Lucene

Part of one of our applications uses Lucene to index and search documents. We initially built the system using Zend Lucene, the Zend Framework’s PHP port of the Lucene search engine, which is itself written in Java. It was easy to integrate with the application, returned very good results, and was very fast in our test environments. Just when everything was going well, the customer increased the number of documents for us to search by several orders of magnitude and made their search queries more complex. The result: web pages that had been served in microseconds were taking on average 10 minutes to load, which is not ideal. This service runs on a virtual machine; giving the VM more memory and more CPU brought the load time down to 30 seconds, which is extremely frustrating for the users but workable, and it bought us some time. Unfortunately, as PHP’s memory management is 32-bit, even on a 64-bit machine, the most memory it can use is 2 GB, which means there is very little benefit in increasing the memory of the virtual machine much past this.

Profiling the search showed stable memory use, as we’d expect, with swap space coming into use when simultaneous searches happened. CPU usage was high but not maxed out, with plenty of idle time, and only a low proportion of that time was spent waiting on disk. The volume of disk access, however, was through the roof.

We knocked up a bit of Java to run the same search using the latest version of Lucene. We were expecting a significant performance improvement, but I was totally taken aback. The PHP search was averaging 50 seconds; the Java version was close to half a second. The next step is to integrate the Java search into the PHP front end and let the end users loose…
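
For anyone wanting to make the same kind of comparison, here is a minimal sketch of a timing harness in Java. It is not the code we used: the class name `SearchTimer`, its methods, and the `fakeSearch` stand-in are all hypothetical, and the actual Lucene query is deliberately left out. It only shows how an average search time and the resulting speed-up could be measured.

```java
import java.util.function.Supplier;

// Minimal timing harness (illustrative only). The search is passed in as a
// Supplier so that the real Lucene call, not shown here, can be plugged in.
class SearchTimer {

    // Returns the mean wall-clock duration, in seconds, of `runs` calls to `search`.
    static double averageSeconds(Supplier<Integer> search, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            search.get(); // the returned hit count is ignored; only the time matters
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e9 / runs;
    }

    // How many times faster the candidate is than the baseline.
    static double speedup(double baselineSeconds, double candidateSeconds) {
        return baselineSeconds / candidateSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real search; replace with the Lucene query.
        Supplier<Integer> fakeSearch = () -> 0;
        System.out.printf("average: %.6f s%n", averageSeconds(fakeSearch, 10));

        // Figures from this post: 50 s for PHP against roughly 0.5 s for Java.
        System.out.printf("speed-up: %.0fx%n", speedup(50.0, 0.5)); // prints "speed-up: 100x"
    }
}
```

Averaging over several runs matters here because the first search against a cold index is usually slower than later ones, so a single measurement can mislead.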