Sunday 6 December 2009

Linux Desktop Search Engines Compared

Linux.com has an article about the various desktop search engines available on Linux. The author is looking for the one most suitable to his needs. Even though he ends up recommending Beagle, Pinot comes out relatively well. Perceived drawbacks are RAM and CPU usage, which I ought to revisit for the next release.

By the way, the comparison table on Wikinfo mentioned in the article can be found here.

Friday 13 November 2009

Pinot 0.95

At long last, a new release!
0.95 merges in Antoine Jacoutot's patches for the OpenBSD port, and fixes the "path:" query filter and the handling of acronyms.

The search plugin for Bing was updated, while the plugins for Exalead and IOI were removed.
Common historical data operations were optimized, which should speed the daemon up a bit.
If you have gtk2 2.16 or newer, the query text field will have an embedded icon on the right-hand side, similar to Firefox's search box.

Finally, translations in Dutch, French, German, Hebrew, Portuguese and Spanish were updated. Many thanks to the various people who helped with these updates.

The Web site got a lot of visits over the last few months from Bulgaria and the Czech Republic, but unfortunately Pinot hasn't been translated into these countries' languages. If you can help with that, please consider joining Launchpad.

For the full details, see the NEWS file. Get the source from the download page.

Wednesday 19 August 2009

OpenBSD port

Antoine Jacoutot successfully ported Pinot to OpenBSD. The package is available on OpenPorts.se.
I will merge the related patches into the (much delayed) next release.
Thanks, Antoine!

Friday 26 June 2009

Pinot 0.94

Long time, no release. Pinot 0.94 is out today and brings:
- changes to the daemon's DBus interface to tell the UI to reopen the index when it has changed on disk.
- a new search filter "inurl" that allows finding files nested in an mbox or an archive at a given URL.
- the ability to view the properties of documents from an external index.
- better MIME type detection, which means fewer calls to external decompression programs when dealing with archives, and fixes cases where documents nested in them couldn't be opened and viewed.
- the ability to index Debian packages.
- fixes to the mbox filter to fully work with GMime 2.4 (now required).
- a whole bunch of other bug fixes.

For the full details, see the NEWS file. Get the source tarball and RPM from the download page.

Thursday 4 June 2009

It's June already

Since the release of 0.93, I haven't been able to spend much time on Pinot. Things should go back to "normal" in a week or two.

I am waiting for the release of Fedora 11, first because I am an avid Fedora user, and second because it comes with GCC 4.4.0, which will probably bring up a couple of interesting issues :-)

Several people have reported a compile error in 0.93 related to an "undefined reference to `DocumentInfo::setSize(long long)'" in Tokenize/FilterUtils.cpp. If you have experienced this, try the fix from SVN revision 1635.

Web stats show that http://pinot.berlios.de/ had a lot of visits from Eastern Europe over the last few months, mostly from Bulgaria, Lithuania, the Czech Republic and Poland. These countries' languages are not supported yet, so if anyone would like to help translate Pinot using Launchpad, I am sure the effort will be much appreciated.

Monday 13 April 2009

Pinot 0.93

A major bug crept in. On each run, the files history was reset in a way that caused the daemon to reindex all files (unless it was run in full scan mode).
I am not sure whether I should be embarrassed I didn't notice this bug earlier, or glad this kind of activity has become so unintrusive in recent releases...

Upgrading to 0.93 is strongly recommended. See the NEWS file, and get the source from the download page.

Friday 10 April 2009

Pinot 0.92

I am releasing a new version this Good Friday.

A lot of work has gone into reducing memory usage during indexing. To start with, getting a grip on the amount of memory used by any program is tricky. Simply freeing unused buffers is not enough: small buffers, or buffers sitting between in-use buffers, may not be reclaimed by the OS. This is especially true for Pinot: the daemon goes through a large number of files of different sizes and reads their contents, which are run through filters to extract text. Each of these operations involves a transformation and a new memory buffer to hold the transformed data.
In order to minimize this, 0.92 allocates document content buffers from a memory pool backed by the malloc allocator, instead of the default STL allocator. Once in a while, the pool is released and malloc_trim() is called to hint that freed memory can be returned to the system.
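
As a rough illustration of the approach described above, here is a minimal C++ sketch. It is not Pinot's actual code: MallocAllocator, ContentBuffer and processBatch are hypothetical names, and the sketch uses a plain malloc()-backed allocator rather than a real pool. It also assumes glibc, since malloc_trim() is a glibc extension declared in <malloc.h>, and the call is only a hint that the C library may ignore.

```cpp
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <malloc.h>   // glibc-specific: malloc_trim()
#include <new>
#include <string>

// Hypothetical allocator that takes memory straight from malloc()/free().
template <typename T>
struct MallocAllocator {
    using value_type = T;

    MallocAllocator() noexcept = default;
    template <typename U>
    MallocAllocator(const MallocAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        // Guard against overflow in n * sizeof(T).
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
            throw std::bad_alloc();
        }
        void* p = std::malloc(n * sizeof(T));
        if (p == nullptr) {
            throw std::bad_alloc();
        }
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// The allocator is stateless, so all instances compare equal.
template <typename T, typename U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return false; }

// Hypothetical string type for holding document content.
using ContentBuffer =
    std::basic_string<char, std::char_traits<char>, MallocAllocator<char>>;

// Simulates handling a batch of documents, then hints that freed memory
// can go back to the system. Returns the total number of bytes handled.
std::size_t processBatch(std::size_t documents, std::size_t bytesEach) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < documents; ++i) {
        ContentBuffer content(bytesEach, 'x');  // stand-in for filtered text
        total += content.size();
    }   // each buffer is freed at the end of its iteration
    malloc_trim(0);  // ask glibc to return free heap memory to the OS
    return total;
}
```

Calling processBatch(4, 1024) handles 4096 bytes in total. Whether any memory is actually returned to the system depends on the state of the heap at that point.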

To make matters worse, measuring memory usage reliably is not exactly straightforward. I chose to focus on the %MEM figure shown by top for the daemon and found that, on my test box, it rises more slowly than with 0.91, peaks at a value 30 to 50 percent below 0.91's final memory usage, goes down to a single-digit value, then rises and comes down again cyclically. It's probably not perfect, but future releases will bring further tweaks.
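
For reference, top's %MEM is a task's resident memory expressed as a share of physical memory. The sketch below computes a comparable figure on Linux, assuming /proc is mounted. residentMemoryPercent is a hypothetical helper, not something Pinot ships, and by default it measures the calling process rather than the daemon.

```cpp
#include <fstream>
#include <string>
#include <unistd.h>   // sysconf()

// Returns resident memory as a percentage of physical RAM for the process
// whose statm file is given, or -1.0 if the figures cannot be read.
// Linux-specific; pass "/proc/<pid>/statm" to look at another process.
double residentMemoryPercent(const std::string& statmPath = "/proc/self/statm") {
    std::ifstream statm(statmPath);
    long totalPages = 0;
    long residentPages = 0;
    // The first two fields of statm are the total program size and the
    // resident set size, both counted in pages.
    if (!(statm >> totalPages >> residentPages)) {
        return -1.0;
    }
    const long physicalPages = sysconf(_SC_PHYS_PAGES);
    if (physicalPages <= 0) {
        return -1.0;
    }
    return 100.0 * static_cast<double>(residentPages) /
           static_cast<double>(physicalPages);
}
```

Sampling this at regular intervals while indexing runs would give a curve similar to watching top, though the exact numbers may differ slightly from what top reports.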

There's also a new filter based on libarchive that allows indexing the content of tar files (compressed or not) and ISO images. The UI can open/view files within indexed archives, just like it's long been able to open/view mbox messages and attachments. Along the way, I partially redesigned how documents nested in other documents are indexed and, as a consequence, indexes created with previous releases will be automatically upgraded.

Finally, 0.91 and older suffered from a bug that could cause a crash when libxml2 2.7.3 was used. This was fixed.

See the NEWS file for the details, and grab the source from the download page.

Friday 6 March 2009

Pinot 0.91 is out

Time for another release. Ideally, this would have come out before the end of February, but the delay was worth it.

This release focuses on two things:
- fixing memory leaks that hit initial indexing badly.
With the help of valgrind and John Werden, several memory leaks were identified and fixed. I also rewrote the HTML filter based on the HTML parser from Xapian Omega after witnessing problems and deciding that ripping the filter's guts out would probably save a lot of time.
- improving command-line integration.
Stored queries created with the UI can be run with pinot-search. Similarly, pinot-index can open My Web Pages, My Documents or any other UI-configured index by name. In addition, it can finally deal with relative paths and index local directories recursively. This makes pinot-index a good alternative to omindex.

I have experienced some crashes in the UI when querying OpenSearch-based engines such as IOI, and found that these crashes went away after downgrading libxml2 from 2.7.3 to 2.7.2. I will look into this in more detail during the 0.92 cycle.

See the NEWS file for the complete list of changes. Grab the source from the download page or the binaries from your distro's package repository in a few days' time.

Thursday 29 January 2009

Pinot 0.90

To celebrate the start of the year of the Ox, I am releasing Pinot 0.90!

Since the release of 0.89 in September, a lot of changes have been made on several fronts:
- Unicode text.
Charset conversion errors are better handled, and tokenizing was improved and leads to far fewer "rubbish" terms. Issues the UI had with non-Latin locales were resolved.
- Portability.
The code base builds with MinGW and, hopefully without too many problems, with GCC 4.4.
- Web metasearch.
Plugins were updated; extracts and result URLs are more accurate.
- more coherent UI.
Some features were previously only available in search mode, others in browse mode; some were duplicated. The new menu layout tries to unify both modes. The status window's refresh is smoother. Preferences can be opened separately from the UI. Spelling suggestions are less invasive: they pop up in the same tabs as query results.
- improved More Like This.
Stored queries generated on More Like This don't include the original query's terms, stopwords, infrequent terms or similar terms if the stemming language is set.
- command-line and desktop integration.
pinot-cd.sh implements a "tagged cd", and lets one change the shell's current directory to the directory that matches the path elements passed as parameters. The Deskbar module shows snippets when used with Deskbar 2.24.
- more flexible daemon.
The daemon is smarter at crawling symlinks: it skips those that refer to locations that have been crawled or that it knows will be crawled. It is much better at resuming where it stopped after user interruption. While user-set metadata would previously be lost on a reindex, or when the file changed on disk, it's now preserved.
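
The symlink rule in the last item can be sketched as a simple path check. This is only an illustration under stated assumptions, not the daemon's real logic: isUnder and shouldSkipSymlink are hypothetical names, and the caller is assumed to have already resolved the symlink to an absolute, normalized path (for instance with realpath()) and to keep lists of crawled and still-pending directories without trailing slashes.

```cpp
#include <string>
#include <vector>

// True if `path` is `root` itself or lies somewhere underneath it.
// Both are assumed to be absolute, normalized paths.
bool isUnder(const std::string& path, const std::string& root) {
    if (root.empty()) {
        return false;
    }
    if (root == "/") {
        // Every absolute path lives under the filesystem root.
        return !path.empty() && path[0] == '/';
    }
    if (path.compare(0, root.size(), root) != 0) {
        return false;
    }
    // Require a component boundary so "/home/me/docs2" does not
    // count as being under "/home/me/docs".
    return path.size() == root.size() || path[root.size()] == '/';
}

// True if a symlink resolving to `resolvedTarget` can be skipped because
// its target was already crawled or is queued to be crawled.
bool shouldSkipSymlink(const std::string& resolvedTarget,
                       const std::vector<std::string>& crawled,
                       const std::vector<std::string>& pending) {
    for (const auto& root : crawled) {
        if (isUnder(resolvedTarget, root)) {
            return true;
        }
    }
    for (const auto& root : pending) {
        if (isUnder(resolvedTarget, root)) {
            return true;
        }
    }
    return false;
}
```

With "/home/me/docs" already crawled and "/srv" pending, a symlink pointing at "/srv/data" would be skipped, while one pointing at "/mnt/usb" would still be followed.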

With all these out of the way, I will be able to return to a monthly release cycle, where each release brings incremental improvements and bug fixes, and to bring the project to its 1.0 release.

I would like to thank Adrian Bunk, Adel Gadllah, Martin Michlmayr, C. Scott Ananian and especially John Werden for their contributions to this release in the form of patches, ideas, suggestions and testing. As always, the NEWS file has the details. Head to the download page to get the source.