Rails searching with Sphinx
Over the weekend I was implementing search for trawlr.com using Sphinx, the nginx of the search world (fast and Russian) according to Evan Weaver. Previously I was using Ferret, but I had to remove the search feature almost immediately because the Ferret indexes kept corrupting, which caused me a major headache. I decided to drop Ferret in favour of Sphinx, which I’ve heard lots of good things about recently.
Installing it on my MacBook Pro required a slight adjustment to the MySQL include and library paths, since I use mysql5 from MacPorts.
$ wget http://www.sphinxsearch.com/downloads/sphinx-0.9.7.tar.gz
$ tar xvzf sphinx-0.9.7.tar.gz
$ cd sphinx-0.9.7
$ ./configure --with-mysql-includes=/opt/local/include/mysql5/mysql/ --with-mysql-libs=/opt/local/lib/mysql5/mysql/
$ make
$ sudo make install
Initially, I chose Evan’s UltraSphinx plugin, and it was very helpful to start with because it auto-generates the sphinx.conf. Indexing the entire content of trawlr.com (almost 1.5 million blog posts) took just a few minutes, so I was suitably impressed, and search was lightning fast too. Unfortunately, with the UltraSphinx plugin installed my Rails app started throwing some very strange errors.
Having already looked at the alternative Sphinx plugins, I decided to try acts_as_sphinx. After some small tweaks to the sphinx.conf file (and a re-index) search was working and, more importantly, so was my Rails app. Another option is Sphincter, which I did experiment with, but I struggled with its limited documentation, mostly concerning the configuration file it requires. YMMV.
$ rake sphinx:index
$ rake sphinx:start
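The tweaked sphinx.conf isn’t reproduced here. As a rough illustration of what a minimal config for an items index involves, here is a Ruby sketch that renders one. Everything in it is an assumption rather than trawlr.com’s actual setup: the database name and credentials, the title and content columns, the data/ index path and port 3312 are hypothetical. pub_date is declared as a timestamp attribute so that :sort_mode => [:attr_desc, 'pub_date'] has something to sort on, pid_file = log/searchd.pid matches the location the rake tasks at the end of this post expect, and max_matches = 1000 is a guess at the setting behind the 1,000-result cap. The directive names follow the Sphinx 0.9.x config format, so check them against the documentation for your version.

```ruby
# Hypothetical helper: renders a minimal Sphinx 0.9.x style config for the
# items table. Database name, credentials, columns and paths are assumptions.
def render_sphinx_conf(env, db: {})
  db = { host: "localhost", user: "root", pass: "", name: "trawlr_#{env}" }.merge(db)

  <<~CONF
    source items
    {
      type                = mysql
      sql_host            = #{db[:host]}
      sql_user            = #{db[:user]}
      sql_pass            = #{db[:pass]}
      sql_db              = #{db[:name]}
      # The first selected column must be the unique document id.
      sql_query           = SELECT id, title, content, UNIX_TIMESTAMP(pub_date) AS pub_date FROM items
      # Stored as an attribute so results can be sorted by it.
      sql_attr_timestamp  = pub_date
    }

    index items
    {
      source              = items
      path                = data/#{env}/items
      charset_type        = utf-8
    }

    searchd
    {
      port                = 3312
      log                 = log/searchd.log
      query_log           = log/query.log
      pid_file            = log/searchd.pid
      # Upper bound on matches kept per query (what search.total is capped at).
      max_matches         = 1000
    }
  CONF
end
```

Writing the result with File.write to a per-environment path such as config/sphinx.development.conf keeps the generated file out of hand-edited territory; the per-environment naming is the same scheme used by the rake tasks in the update below.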
Indexing on my MacBook Pro…
$ time rake sphinx:index
using config file 'sphinx.conf'...
indexing index 'items'...
collected 1455733 docs, 1255.2 MB
sorted 182.4 Mhits, 100.0% done
total 1455733 docs, 1255246639 bytes
total 438.695 sec, 2861316.50 bytes/sec, 3318.32 docs/sec
real 7m25.307s
user 4m28.963s
sys 0m17.578s
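The throughput figures in that output follow directly from the totals. A quick Ruby check (plain arithmetic on the reported numbers; small differences in the last digits come from the elapsed time being printed rounded) also shows that indexer’s “MB” is decimal megabytes (1,000,000 bytes), and that about 6.6 seconds of the wall-clock time fell outside indexer’s own timer, presumably rake and Rails start-up.

```ruby
# Totals reported by `rake sphinx:index` above.
docs    = 1_455_733
bytes   = 1_255_246_639
seconds = 438.695            # indexer's own timer (printed rounded)
real    = 7 * 60 + 25.307    # wall-clock "real" time reported by `time`

docs_per_sec  = docs / seconds
bytes_per_sec = bytes / seconds
megabytes     = bytes / 1_000_000.0   # decimal MB reproduces "1255.2 MB"
overhead      = real - seconds        # time spent outside the indexer itself

puts format("%.2f docs/sec", docs_per_sec)
puts format("%.2f bytes/sec", bytes_per_sec)
puts format("%.1f MB", megabytes)
puts format("%.1f s overhead", overhead)
```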
Searching with acts_as_sphinx via the console (ruby script/console) for the term ‘Google’, sorted by published date.
>> search = Item.find_with_sphinx 'Google', :sphinx => {:sort_mode => [:attr_desc, 'pub_date'], :page => 1}, :order => 'items.pub_date DESC'; 0
=> 0
>> search.total
=> 1000
>> search.total_found
=> 73717
>> search.time
=> "0.000"
That’s with an index of 943 MB and almost 1.5 million items. Note that search.total is capped at 1,000 even though search.total_found reports 73,717 matches; the limit comes from the settings in my sphinx.conf file.
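The gap between total (1000) and total_found (73717) is the cap at work: Sphinx counts every match but keeps only the top 1,000 (in the requested sort order) for paging. Assuming the cap comes from a max_matches = 1000 setting and 50 results per page (the :limit in the controller code), a small hypothetical helper shows how many pages a user can actually reach.

```ruby
# Hypothetical helper: how many result pages are reachable when Sphinx
# reports `total_found` matches but keeps at most `max_matches` of them.
def reachable_pages(total_found, max_matches: 1000, per_page: 50)
  retrievable = [total_found, max_matches].min  # what search.total reports
  (retrievable + per_page - 1) / per_page       # integer ceiling division
end

reachable_pages(73_717)  # => 20 (1,000 retrievable matches / 50 per page)
reachable_pages(120)     # => 3  (fewer matches than the cap)
```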
Within the Rails controller, search is done via:
@items = Item.find_with_sphinx(params[:query],
  :sphinx => {:sort_mode => [:attr_desc, 'pub_date'], :limit => 50, :page => (params[:page] || 1)},
  :order => 'items.pub_date DESC')
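params[:page] arrives from the query string as a string (or nil when absent), and nothing stops a visitor from asking for page 0, a negative page or a page past the last reachable one. A hypothetical sanitize_page helper, sketched under the assumptions of 1-based pages, 50 results per page and the 1,000-match cap, could coerce it first:

```ruby
# Hypothetical helper: turn a raw params[:page] value into a safe page number.
# Assumes 1-based pages, 50 results per page and a 1,000-match cap.
def sanitize_page(raw, per_page: 50, max_matches: 1000)
  last_page = [(max_matches + per_page - 1) / per_page, 1].max
  # Non-numeric or missing input (nil becomes "") falls back to page 1.
  page = (Integer(raw.to_s, 10) rescue 1)
  page.clamp(1, last_page)
end
```

In the controller, :page => sanitize_page(params[:page]) would then take the place of :page => (params[:page] || 1).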
Updating the Sphinx index
There’s another rake task for updating the Sphinx index, which can be run periodically from a cron job instead of updating the index ‘live’. The rotate command rebuilds the index while the Sphinx daemon keeps running and, once the rebuild is complete, signals searchd to switch over to the new index.
$ rake sphinx:rotate
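For the cron side, one option is a crontab entry that changes into the application directory and runs the rotate task. The Ruby sketch below builds such a line; the application path, the hourly schedule, passing RAILS_ENV on the rake command line and the log/sphinx_rotate.log file are all assumptions to adapt, not the setup used on trawlr.com.

```ruby
require "shellwords"

# Hypothetical helper: build a crontab line that rebuilds the Sphinx index
# with `rake sphinx:rotate`. Schedule, path, env and log file are assumptions.
def rotate_cron_line(app_root, schedule: "0 * * * *", env: "production")
  fields = schedule.split
  raise ArgumentError, "cron schedule needs 5 fields" unless fields.size == 5

  command = "cd #{Shellwords.escape(app_root)} && " \
            "rake sphinx:rotate RAILS_ENV=#{Shellwords.escape(env)} " \
            ">> log/sphinx_rotate.log 2>&1"
  "#{fields.join(' ')} #{command}"
end
```

Escaping the path with Shellwords keeps the generated line safe if the application lives in a directory containing spaces.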
Update
- It looks like the UltraSphinx plugin requires edge Rails (thanks Evan)!
- I’ve deployed the search updates to the live trawlr.com site (currently search is only visible from the reader view for logged-in users).
- Live search is fantastically quick.
- Created some new rake tasks (to go in lib/tasks/sphinx.rake) that allow you to have a sphinx.conf file per Rails environment (config/sphinx.development.conf and config/sphinx.production.conf). The available tasks are: s:index, s:rotate, s:start, s:stop, s:status and s:restart. The rake tasks assume that the Sphinx pid file is in the log directory (pid_file = log/searchd.pid).
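The task code isn’t reproduced here. A minimal sketch of the plumbing such tasks need, assuming the config/sphinx.<env>.conf naming and the log/searchd.pid location described above, might look like this; the helper names are hypothetical, and the commands only use the standard indexer --config/--all/--rotate and searchd --config options.

```ruby
# Hypothetical helpers behind per-environment Sphinx rake tasks
# (s:index, s:rotate, s:start, s:status ...). Names are illustrative only.

# config/sphinx.development.conf, config/sphinx.production.conf, ...
def sphinx_conf(root, env)
  File.join(root, "config", "sphinx.#{env}.conf")
end

# Command for s:index (full rebuild) or s:rotate (rebuild while searchd runs).
def indexer_command(conf, rotate: false)
  cmd = ["indexer", "--config", conf, "--all"]
  cmd << "--rotate" if rotate
  cmd
end

# Command for s:start.
def searchd_command(conf)
  ["searchd", "--config", conf]
end

# PID written by searchd (pid_file = log/searchd.pid), or nil if absent.
def searchd_pid(root)
  path = File.join(root, "log", "searchd.pid")
  File.exist?(path) ? File.read(path).to_i : nil
end

# s:status: signal 0 only checks that the process exists; it is not delivered.
def searchd_running?(root)
  pid = searchd_pid(root)
  return false if pid.nil? || pid <= 0
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end
```

A task such as s:rotate could then call system(*indexer_command(sphinx_conf(Dir.pwd, ENV["RAILS_ENV"] || "development"), rotate: true)), and one way s:stop could work is to send TERM to the PID from searchd_pid with Process.kill.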
About this entry
You’re currently reading “Rails searching with Sphinx,” an entry on Slash Dot Dash
- Published: 08.06.07 / 8pm
- Category: Ruby on Rails
