Friday, 10 April 2009

Pinot 0.92

I am releasing a new version this Good Friday.

A lot of work has gone into reducing memory usage during indexing. To start with, getting a grip on the amount of memory used by any program is tricky. Simply freeing unused buffers is not enough: small buffers, or buffers sitting between in-use buffers, may not be returned to the OS. This is especially true for Pinot: the daemon goes through a large number of files of different sizes and runs their contents through filters to extract text. Each of these operations involves a transformation and a new memory buffer to hold the transformed data.
To limit this, 0.92 allocates document content buffers from a memory pool backed by the malloc allocator instead of the default STL allocator. Once in a while, the pool is released and malloc_trim() is called to hint that freed memory can be returned to the system.
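
For the curious, here is a rough sketch of the general idea. It is not Pinot's actual code and not a real pool: it is a plain malloc-backed allocator plugged into a string type, followed by a trim hint once a batch of buffers has been dropped. The names MallocAllocator, DocBuffer, trimHeap and indexBatch are made up for illustration, and malloc_trim() is a glibc extension, so the call is guarded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>
#if defined(__GLIBC__)
#include <malloc.h>   // malloc_trim() is glibc-specific
#endif

// Hypothetical allocator that goes straight to malloc()/free().
template <typename T>
struct MallocAllocator {
    using value_type = T;

    MallocAllocator() noexcept = default;
    template <typename U>
    MallocAllocator(const MallocAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = std::malloc(n * sizeof(T));
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

template <typename T, typename U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return false; }

// Hypothetical buffer type for document content.
using DocBuffer = std::basic_string<char, std::char_traits<char>, MallocAllocator<char>>;

// Hints that freed heap memory can go back to the system.
// Returns true only if glibc reports that memory was actually released.
bool trimHeap() {
#if defined(__GLIBC__)
    return malloc_trim(0) != 0;
#else
    return false;   // no equivalent assumed on other C libraries
#endif
}

// Hypothetical indexing pass: fill one buffer per document, drop them all,
// then issue the trim hint. Returns the number of bytes that were buffered.
std::size_t indexBatch(std::size_t documents, std::size_t bytesEach) {
    std::size_t total = 0;
    {
        std::vector<DocBuffer, MallocAllocator<DocBuffer>> batch;
        for (std::size_t i = 0; i < documents; ++i) {
            batch.emplace_back(bytesEach, 'x');
            total += batch.back().size();
        }
        assert(batch.size() == documents);
    }   // every buffer is freed here
    trimHeap();
    return total;
}
```

Whether the trim actually gives anything back depends on how fragmented the heap is at that point, which is why it is only a hint.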

To make things harder, measuring memory usage effectively is not exactly straightforward. I chose to focus on the %MEM figure shown by top for the daemon. On my test box, it rises more slowly than with 0.91, peaks at a value 30 to 50 percent below 0.91's final memory usage, goes down to a single-digit value, then rises and comes down again cyclically. It's probably not perfect, but future releases will bring further tweaks.
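
If you would rather log that figure than watch top, something along these lines should give a rough equivalent on Linux. This is only a sketch under assumptions: it treats %MEM as resident memory (VmRSS in /proc/<pid>/status) over total physical memory (MemTotal in /proc/meminfo), and the helper names readKbField and residentPercent are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Scans for a line starting with `key`, e.g. "VmRSS:     12345 kB",
// and returns the number that follows, or -1 if it cannot be found or parsed.
long readKbField(std::istream& in, const std::string& key) {
    assert(!key.empty());
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) == 0) {
            std::istringstream fields(line.substr(key.size()));
            long value = -1;
            if (fields >> value) {
                return value;
            }
            return -1;
        }
    }
    return -1;
}

// Approximates top's %MEM for a process: resident set size as a percentage
// of total physical memory. `pid` is a process ID as text, or "self".
// Returns -1.0 when either figure is unavailable (for example, off Linux).
double residentPercent(const std::string& pid) {
    std::ifstream status("/proc/" + pid + "/status");
    std::ifstream meminfo("/proc/meminfo");
    const long rssKb = readKbField(status, "VmRSS:");
    const long totalKb = readKbField(meminfo, "MemTotal:");
    if (rssKb < 0 || totalKb <= 0) {
        return -1.0;
    }
    return 100.0 * static_cast<double>(rssKb) / static_cast<double>(totalKb);
}
```

Sampling residentPercent() with the daemon's PID every few seconds while it indexes would show the same rise and fall as top, with the same caveat that resident memory is only one way of looking at usage.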

There's also a new filter based on libarchive that allows indexing the content of tar files (compressed or not) and ISO images. The UI can open and view files within indexed archives, just as it has long been able to open and view mbox messages and attachments. Along the way, I partially redesigned how documents nested in other documents are indexed; as a consequence, indexes created with previous releases will be upgraded automatically.

Finally, 0.91 and older suffered from a bug that could cause a crash when libxml2 2.7.3 was used. This has been fixed.

See the NEWS file for the details, and grab the source from the download page.
