Mon, 06 Aug 2007

Rails searching with Sphinx

Posted by Ben Mon, 06 Aug 2007 19:30:00 GMT

Over the weekend I implemented search for trawlr.com using Sphinx, which Evan Weaver calls the nginx of the search world (fast and Russian). I had previously used Ferret, but I had to remove the search feature almost immediately because the Ferret indexes kept corrupting, which caused me a major headache. So I dropped Ferret in favour of Sphinx, which I’ve heard lots of good things about recently.

Installing on my MacBook Pro required pointing configure at the MySQL include and library directories, since I use the mysql5 package from MacPorts:

$ wget http://www.sphinxsearch.com/downloads/sphinx-0.9.7.tar.gz
$ tar xvzf sphinx-0.9.7.tar.gz
$ cd sphinx-0.9.7
$ ./configure --with-mysql-includes=/opt/local/include/mysql5/mysql/ --with-mysql-libs=/opt/local/lib/mysql5/mysql/
$ make
$ sudo make install

Initially, I chose Evan’s UltraSphinx plugin, which was very helpful to start with because it auto-generates the sphinx.conf. It indexed the entire content of trawlr.com (almost 1.5 million blog posts) in just a few minutes, which left me suitably impressed, and searches were lightning fast. Unfortunately, with the UltraSphinx plugin installed, my Rails app started throwing some very strange errors.

Having already looked at the alternative Sphinx plugins, I decided to try acts_as_sphinx. After some small tweaks to the sphinx.conf file (and a re-index), search was working and, more importantly, so was my Rails app. Another option is Sphincter, which I did experiment with, but I struggled with its limited documentation, mostly concerning the required configuration file. YMMV.

$ rake sphinx:index
$ rake sphinx:start

Indexing on my MacBook Pro…

$ time rake sphinx:index

using config file 'sphinx.conf'...
indexing index 'items'...
collected 1455733 docs, 1255.2 MB
sorted 182.4 Mhits, 100.0% done
total 1455733 docs, 1255246639 bytes
total 438.695 sec, 2861316.50 bytes/sec, 3318.32 docs/sec

real    7m25.307s
user    4m28.963s
sys     0m17.578s
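The throughput figures in that summary follow directly from the totals. As a quick sanity check, here is a small Ruby sketch that pulls the two `total` lines out of the indexer output and recomputes bytes/sec and docs/sec. It assumes the summary lines have exactly the format shown above; `indexer_throughput` is a hypothetical helper, not part of Sphinx or acts_as_sphinx.

```ruby
# Hypothetical helper: recompute indexing throughput from the "total" lines that
# Sphinx's indexer prints at the end of a run (line format assumed from the output above).
def indexer_throughput(output)
  docs_match = output.match(/^total (\d+) docs, (\d+) bytes$/)
  secs_match = output.match(/^total ([\d.]+) sec/)
  raise ArgumentError, "indexer summary not found" unless docs_match && secs_match

  docs  = docs_match[1].to_i
  bytes = docs_match[2].to_i
  secs  = secs_match[1].to_f
  {
    :docs          => docs,
    :bytes         => bytes,
    :secs          => secs,
    :bytes_per_sec => bytes / secs,
    :docs_per_sec  => docs / secs
  }
end

summary = <<OUT
total 1455733 docs, 1255246639 bytes
total 438.695 sec, 2861316.50 bytes/sec, 3318.32 docs/sec
OUT

stats = indexer_throughput(summary)
printf("%.2f bytes/sec, %.2f docs/sec\n", stats[:bytes_per_sec], stats[:docs_per_sec])
```

The recomputed rates land within a fraction of a percent of the printed ones (roughly 2.86 million bytes/sec and 3,318 docs/sec); the tiny difference comes from the elapsed time being printed to only three decimal places.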

Searching with acts_as_sphinx via the console (ruby script/console) for the term ‘Google’, sorted by published date.

>> search = Item.find_with_sphinx 'Google', :sphinx => {:sort_mode => [:attr_desc, 'pub_date'], :page => 1}, :order => 'items.pub_date DESC'; 0
=> 0
>> search.total
=> 1000
>> search.total_found
=> 73717
>> search.time       
=> "0.000" 

That’s with an index of 943 MB and almost 1.5 million items. Note that search.total is limited to 1,000 items by the max_matches setting in my sphinx.conf file, even though 73,717 matches were found.
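The gap between search.total and search.total_found matters for pagination: only the first 1,000 matches can be paged through. A minimal sketch of that arithmetic, assuming the cap is max_matches = 1000 and 50 results per page (the :limit used in the controller); `reachable_pages` is a hypothetical helper.

```ruby
# Hypothetical helper: how many pages of results a user can actually reach when
# searchd caps retrievable matches at max_matches (1,000 here), even though
# total_found reports every match (73,717 for 'Google').
def reachable_pages(total_found, max_matches, per_page)
  reachable = [total_found, max_matches].min
  (reachable + per_page - 1) / per_page # integer ceiling division
end

puts reachable_pages(73_717, 1_000, 50) # prints 20
```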

Within the Rails controller, search is done via:

@items = Item.find_with_sphinx(params[:query], 
      :sphinx => {:sort_mode => [:attr_desc, 'pub_date'], :limit => 50, :page => (params[:page] || 1)}, 
      :order => 'items.pub_date DESC')
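In that call, (params[:page] || 1) passes the page through exactly as it arrives in the query string, i.e. as a String. A sketch of a helper that normalises it first, so values like ?page=abc or ?page=500 never reach searchd; `sphinx_search_options`, `PER_PAGE` and `MAX_MATCHES` are hypothetical names, and the clamp assumes the 1,000-match cap described above.

```ruby
# Hypothetical helper (not part of acts_as_sphinx): builds the options hash for
# Item.find_with_sphinx from the raw page parameter, converting it to an Integer
# and clamping it to the pages that a 1,000-match cap can serve.
PER_PAGE    = 50
MAX_MATCHES = 1000 # assumed to mirror max_matches in sphinx.conf

def sphinx_search_options(page_param)
  last_page = (MAX_MATCHES + PER_PAGE - 1) / PER_PAGE
  page = page_param.to_i # nil.to_i and "abc".to_i are both 0
  page = 1 if page < 1
  page = last_page if page > last_page
  {
    :sphinx => { :sort_mode => [:attr_desc, 'pub_date'], :limit => PER_PAGE, :page => page },
    :order  => 'items.pub_date DESC'
  }
end

# In the controller:
#   @items = Item.find_with_sphinx(params[:query], sphinx_search_options(params[:page]))
p sphinx_search_options("3")[:sphinx][:page] # prints 3
```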

Updating the Sphinx index

There’s another rake task for updating the Sphinx index, which can be called from a cron job rather than doing ‘live’ updates. The rotate command rebuilds the index while the Sphinx daemon is still running, then signals the daemon to switch over to the new index once indexing has completed.

$ rake sphinx:rotate
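What the rake task runs internally isn't shown here, so as a sketch under that assumption: a cron job could call Sphinx's indexer directly with its --rotate flag. `rotate_command` and `rotate_index` are hypothetical names, and the lock file is an extra precaution against overlapping cron runs, not something acts_as_sphinx is known to do.

```ruby
# Sketch of what a cron-driven rotate boils down to, assuming Sphinx's `indexer`
# binary is on the PATH. With --rotate, indexer builds the new index files next
# to the live ones and then signals the running searchd to switch over to them.
def rotate_command(config_path)
  ['indexer', '--config', config_path, '--all', '--rotate']
end

# Hypothetical cron wrapper: skip this run if the previous reindex still holds
# the lock (a full reindex took over seven minutes above).
def rotate_index(config_path, lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |lock|
    return :skipped unless lock.flock(File::LOCK_EX | File::LOCK_NB)
    system(*rotate_command(config_path)) ? :ok : :failed
  end
end

p rotate_command('config/sphinx.production.conf')
```

Passing the command as an array to `system` avoids going through a shell, so a config path containing spaces needs no quoting.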

Update

  • It looks like the UltraSphinx plugin requires edge Rails (thanks Evan)!
  • I’ve deployed the search updates to the live trawlr.com site (currently search is only visible from the reader view for logged in users).
  • Live search is fantastically quick.
  • Created some new rake tasks (to go in lib/tasks/sphinx.rake) that allow you to have a sphinx.conf file per Rails environment (config/sphinx.development.conf and config/sphinx.production.conf). The available tasks are: s:index, s:rotate, s:start, s:stop, s:status and s:restart. The rake tasks assume that the Sphinx pid file is in the log directory (pid_file = log/searchd.pid).
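The per-environment tasks mostly need two things: the right config path for the current environment and the searchd pid. A sketch of helpers such tasks could share, assuming the layout described in the last bullet (config/sphinx.<env>.conf and pid_file = log/searchd.pid); all method names and the '/path/to/app' root are hypothetical.

```ruby
# Helpers behind per-environment Sphinx rake tasks (hypothetical names),
# assuming config/sphinx.<env>.conf and a searchd pid file at log/searchd.pid.
def sphinx_config_path(rails_root, env)
  File.join(rails_root, 'config', "sphinx.#{env}.conf")
end

def searchd_start_command(config_path)
  ['searchd', '--config', config_path]
end

# Returns the pid recorded in the pid file, or nil if it is missing or unreadable.
def searchd_pid(pid_file)
  return nil unless File.exist?(pid_file)
  pid = File.read(pid_file).to_i
  pid > 0 ? pid : nil
end

# Signal 0 checks whether a process exists without actually signalling it.
def searchd_running?(pid_file)
  pid = searchd_pid(pid_file)
  return false unless pid
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # stale pid file: no such process
rescue Errno::EPERM
  true  # process exists but is owned by another user
end

# Ask searchd to shut down (SIGTERM); returns false if it wasn't running.
def stop_searchd(pid_file)
  return false unless searchd_running?(pid_file)
  Process.kill('TERM', searchd_pid(pid_file))
  true
end

puts sphinx_config_path('/path/to/app', 'production')
```

An s:restart task would then just be stop_searchd followed by running searchd_start_command.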

Comments

  1. evan 27 minutes later:

    What errors did you get? It might not work well if you’re not on edge Rails.

  2. Ben 33 minutes later:

    @evan – Ah, that’s probably why it all went tits-up – I’m running Rails 1.2.3 rather than Edge.

  3. evan about 4 hours later:

    Yeah, sorry about that. I added a note to the documentation.

  4. Eric Hodel 14 days later:

    For Sphincter there is no configuration file. I built it that way on purpose.

    If you need to tweak something you can, but most of the time you won’t.

  5. Joey Marchy 25 days later:

    Thanks for the article, Ben. I too am having issues with Ferret and its indexes, so I am looking to use Sphinx for my search as well. I saw that the UltraSphinx plugin requires edge Rails. Do you know if acts_as_sphinx will run without edge?

    I am currently running Rails 1.2.3.

  6. Ben 25 days later:

    acts_as_sphinx doesn’t require Edge; I’m using it with Rails 1.2.3.

  7. Eric Hodel 25 days later:

    Sphincter runs on 1.2.3 and has per-environment sphinx.conf files built-in.

  8. nguma 4 months later:

    Did you manage to use the filters? I can’t seem to find the correct syntax.

    I tried :sphinx => {:filter => ['attribute_name', [1,2,3]]} without any success.

  9. urlsbox 4 months later:

    Very useful information. Thanks!

  10. seog 6 months later:

    Thanks for the tutorial! I am looking for a good search engine for Rails, and it seems more people are recommending Sphinx over Ferret because of stability issues in deployment.

  11. Jan 8 months later:

    I cannot get it running on Rails 1.2.3 with Sphinx 0.9.7. :-/

    @articles = Article.find_with_sphinx('Jan', :sphinx => {:page => @page})

    Sphinx::SphinxInternalError in ArticleController#fulltext_search searchd error: 112

    /Users/jan/Documents/Platinnetz/vendor/plugins/acts_as_sphinx/lib/sphinx.rb:437:in `get_response'
    /Users/jan/Documents/Platinnetz/vendor/plugins/acts_as_sphinx/lib/sphinx.rb:232:in `query'
    /Users/jan/Documents/Platinnetz/vendor/plugins/acts_as_sphinx/lib/acts_as_sphinx.rb:83:in `ask_sphinx'
    /Users/jan/Documents/Platinnetz/vendor/plugins/acts_as_sphinx/lib/acts_as_sphinx.rb:112:in `find_with_sphinx'
    /Users/jan/Documents/Platinnetz/app/controllers/article_controller.rb:353:in `fulltext_search'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel/rails.rb:76:in `process'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel/rails.rb:74:in `synchronize'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel/rails.rb:74:in `process'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel.rb:159:in `process_client'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel.rb:158:in `each'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel.rb:158:in `process_client'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel.rb:285:in `run'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel.rb:285:in `initialize'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel.rb:285:in `new'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel.rb:285:in `run'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel.rb:268:in `initialize'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel.rb:268:in `new'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel.rb:268:in `run'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel/configurator.rb:282:in `run'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel/configurator.rb:281:in `each'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel/configurator.rb:281:in `run'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/bin/mongrel_rails:128:in `run'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/lib/mongrel/command.rb:212:in `run'
    /Library/Ruby/Gems/1.8/gems/mongrel-1.1.3/bin/mongrel_rails:281

  12. Jan 8 months later:

    Another question about the per-environment Sphinx configs: how did you modify acts_as_sphinx to use different servers for different environments?

  13. Jan 8 months later:

    Found it. The error above happened because I had set max_matches = 120. I found this out when I installed Dmytro’s client API and ran a query:

    Sphinx::SphinxInternalError: searchd error: per-query max_matches=1000 out of bounds (per-server max_matches=120)
      from /Users/jan/Documents/test/Sphinx/vendor/plugins/sphinx-0.3.1/lib/client.rb:622:in `GetResponse'
      from /Users/jan/Documents/test/Sphinx/vendor/plugins/sphinx-0.3.1/lib/client.rb:362:in `Query'
      from (irb):2

    Setting @maxmatches = 120 in acts_as_sphinx’s sphinx.rb fixed the problem.

    Passing on this helpful error message instead of “112” would be great.
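The root cause in that thread was a per-query max_matches (1000 on the client) exceeding the per-server max_matches (120 in sphinx.conf). A sketch of a pre-flight check that reports the mismatch in plain words, assuming max_matches is set on its own line in sphinx.conf and that searchd falls back to 1,000 when it isn't set; the method names are hypothetical and not part of acts_as_sphinx or Dmytro's client.

```ruby
# Hypothetical pre-flight check: read the per-server max_matches from sphinx.conf
# and compare it with the per-query value the client will send, so a mismatch is
# reported in plain words instead of as "searchd error: 112".
DEFAULT_MAX_MATCHES = 1000 # assumed searchd default when sphinx.conf doesn't set it

def server_max_matches(conf_text)
  # Lines starting with '#' (commented out) do not match.
  m = conf_text.match(/^\s*max_matches\s*=\s*(\d+)/)
  m ? m[1].to_i : DEFAULT_MAX_MATCHES
end

def check_max_matches!(client_max, conf_text)
  server_max = server_max_matches(conf_text)
  if client_max > server_max
    raise ArgumentError,
          "per-query max_matches=#{client_max} exceeds per-server max_matches=#{server_max}; " \
          "lower the client value or raise max_matches in sphinx.conf"
  end
  client_max
end

conf = <<CONF
searchd
{
  max_matches = 120
}
CONF

begin
  check_max_matches!(1000, conf)
rescue ArgumentError => e
  puts e.message
end
```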
