Monday 13 April 2009

Pinot 0.93

A major bug crept in. On each run, the file history was reset in a way that caused the daemon to reindex all files (unless it was run in full scan mode).
I am not sure whether I should be embarrassed that I didn't notice this bug earlier, or glad that this kind of activity has become so unobtrusive in recent releases...

Upgrading to 0.93 is strongly recommended. See the NEWS file, and get the source from the downloads page.

Friday 10 April 2009

Pinot 0.92

I am releasing a new version this Good Friday.

A lot of work has gone into reducing memory usage during indexing. To start with, getting a grip on the amount of memory any program uses is tricky. Simply freeing unused buffers is not enough: small buffers, or buffers sitting between in-use buffers, may not be reclaimed by the OS. This is especially true for Pinot. The daemon goes through a large number of files of different sizes and reads their contents, which are then run through filters to extract text. Each of these operations involves a transformation and a new memory buffer to hold the transformed data.
In order to minimize this, 0.92 allocates document content buffers from a memory pool backed by the malloc allocator, instead of the default STL allocator. Once in a while, the pool is released and malloc_trim() is called to hint that freed memory can be returned to the system.
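To make the idea concrete, here is a minimal C++ sketch. It is not Pinot's actual code. A hypothetical `ContentPool` hands out buffers carved from large `malloc()`ed chunks. A hypothetical `PoolAllocator` lets STL strings draw from that pool. `release()` frees every chunk at once and then calls glibc's `malloc_trim(0)`. The class names, the chunk size and the simple bump-pointer strategy are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>
#if defined(__GLIBC__)
#include <malloc.h>  // malloc_trim() is a glibc extension
#endif

// Hypothetical pool: carves buffers out of large malloc()ed chunks and
// frees them all at once in release(), instead of one by one.
class ContentPool {
public:
    explicit ContentPool(std::size_t chunkSize = 1 << 20) : m_chunkSize(chunkSize) {}
    ~ContentPool() { release(); }
    ContentPool(const ContentPool&) = delete;
    ContentPool& operator=(const ContentPool&) = delete;

    void* allocate(std::size_t bytes) {
        // Round the request up so every buffer stays suitably aligned.
        const std::size_t align = alignof(std::max_align_t);
        bytes = (bytes + align - 1) & ~(align - 1);
        if (m_chunks.empty() || m_used + bytes > m_currentSize) {
            // Start a new chunk; oversized requests get a chunk of their own.
            const std::size_t size = bytes > m_chunkSize ? bytes : m_chunkSize;
            void* chunk = std::malloc(size);
            if (chunk == nullptr) throw std::bad_alloc();
            m_chunks.push_back(chunk);
            m_currentSize = size;
            m_used = 0;
        }
        void* p = static_cast<char*>(m_chunks.back()) + m_used;
        m_used += bytes;
        return p;
    }

    // Free every chunk, then hint that freed heap memory can go back to the OS.
    void release() {
        for (void* chunk : m_chunks) std::free(chunk);
        m_chunks.clear();
        m_used = m_currentSize = 0;
#if defined(__GLIBC__)
        malloc_trim(0);
#endif
    }

    std::size_t chunkCount() const { return m_chunks.size(); }

private:
    std::size_t m_chunkSize;
    std::size_t m_currentSize = 0;
    std::size_t m_used = 0;
    std::vector<void*> m_chunks;
};

// Hypothetical STL allocator drawing from a ContentPool.
template <typename T>
struct PoolAllocator {
    using value_type = T;
    ContentPool* pool;

    explicit PoolAllocator(ContentPool& p) noexcept : pool(&p) {}
    template <typename U>
    PoolAllocator(const PoolAllocator<U>& other) noexcept : pool(other.pool) {}

    T* allocate(std::size_t n) { return static_cast<T*>(pool->allocate(n * sizeof(T))); }
    void deallocate(T*, std::size_t) noexcept {}  // reclaimed in bulk by release()
};

template <typename T, typename U>
bool operator==(const PoolAllocator<T>& a, const PoolAllocator<U>& b) { return a.pool == b.pool; }
template <typename T, typename U>
bool operator!=(const PoolAllocator<T>& a, const PoolAllocator<U>& b) { return !(a == b); }

// A document content buffer whose storage comes from the pool.
using PooledString = std::basic_string<char, std::char_traits<char>, PoolAllocator<char>>;
```

The pool never frees individual buffers. Every string built on it must therefore be destroyed before `release()` is called. In exchange, memory between two releases is taken in a few large blocks rather than many small, scattered ones.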

To make matters worse, measuring memory usage accurately is not straightforward either. I chose to focus on the %MEM figure that top shows for the daemon. On my test box, it rises more slowly than with 0.91 and peaks at a value 30 to 50 percent below 0.91's final memory usage. It then drops to a single-digit value and keeps rising and falling in cycles. It's probably not perfect, but future releases will bring further tweaks.
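A process can approximate the same figure from the inside, and the following Linux-only C++ sketch shows one way. top's %MEM is the resident set size as a share of total physical memory. The sketch reads `VmRSS` from `/proc/self/status` and `MemTotal` from `/proc/meminfo` and computes that ratio. The helper names `readKbField()` and `memPercent()` are hypothetical, and the values will not match top exactly because top samples on its own schedule.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the value (in kB) of the first line in `path`
// that starts with `key`, or -1 if the file or the key is missing.
long readKbField(const std::string& path, const std::string& key) {
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) == 0) {
            std::istringstream fields(line.substr(key.size()));
            long kb = -1;
            fields >> kb;  // skips the whitespace before the number
            return kb;
        }
    }
    return -1;
}

// Approximates top's %MEM for the current process:
// resident set size / total physical memory * 100, or -1 if unavailable.
double memPercent() {
    const long rss = readKbField("/proc/self/status", "VmRSS:");
    const long total = readKbField("/proc/meminfo", "MemTotal:");
    if (rss <= 0 || total <= 0) return -1.0;
    return 100.0 * static_cast<double>(rss) / static_cast<double>(total);
}
```

Calling `memPercent()` at regular intervals while indexing gives a rough time series, comparable to watching top. Inside containers, `MemTotal` may report the host's memory rather than the container's limit.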

There's also a new filter based on libarchive that allows indexing the contents of tar files (compressed or not) and ISO images. The UI can open and view files within indexed archives, just as it has long been able to open and view mbox messages and attachments. Along the way, I partially redesigned how documents nested in other documents are indexed. As a consequence, indexes created with previous releases will be upgraded automatically.

Finally, 0.91 and older releases suffered from a bug that could cause a crash when libxml2 2.7.3 was used. This has been fixed.

See the NEWS file for the details, and grab the source from the download page.