Fri, 24 Aug 2007

Ruby Hoedown 2007 Presentations

Posted by Ben Fri, 24 Aug 2007 19:15:00 GMT

Videos and slides for every session from Ruby Hoedown 2007 (August 10th – 11th) are now available to watch on the confreaks.com site. I haven’t watched any of these yet, but there are a few gems, such as Marcel Molina Jr.’s keynote on beautiful code.

Charity Workshop: Ruby and Rails Testing Techniques (Marcel Molina, Jr., Bruce Tate, Chad Fowler)

Exploring Merb (Ezra Zygmuntowicz)

Next-Gen VoIP Development with Ruby and Adhearsion (Jay Phillips)

Keynote Address: The Journey (Bruce Tate)

Building Games with Ruby (Andrea O.K. Wright)

Lightning Talks (various authors)

Does Ruby Have a Chasm to Cross? (Ken Auer)

Using C to Tune Your Ruby (or Rails) Application (Jared Richardson)

Keynote Address: What makes code beautiful? (Marcel Molina, Jr.)

Thu, 23 Aug 2007

Starting and Stopping MySQL on Mac OS X

Posted by Ben Thu, 23 Aug 2007 20:27:00 GMT

Just a quick reminder, since I always forget how to do this in OS X (which uses launchd).

$ sudo launchctl start org.macports.mysql5
$ sudo launchctl stop org.macports.mysql5

(MySQL 5 installed from MacPorts)
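If you would rather trigger this from a Ruby script or rake task, the two commands can be wrapped in a small helper. This is a minimal sketch: the helper names are hypothetical, it assumes the MacPorts launchd label `org.macports.mysql5` used above, and it simply shells out to `sudo launchctl`.

```ruby
# Hypothetical helper wrapping the launchctl commands above.
# Assumes MySQL 5 from MacPorts, whose launchd job label is org.macports.mysql5.
MYSQL_LAUNCHD_LABEL = 'org.macports.mysql5'

# Build the argument list for `sudo launchctl start|stop <label>`.
def launchctl_command(action, label = MYSQL_LAUNCHD_LABEL)
  action = action.to_s
  unless %w[start stop].include?(action)
    raise ArgumentError, "unsupported launchctl action: #{action}"
  end
  ['sudo', 'launchctl', action, label]
end

# Run the command. `runner` defaults to Kernel#system, which returns
# true when the command exits with status zero.
def mysql_service(action, runner: method(:system))
  runner.call(*launchctl_command(action))
end

# Usage (sudo will prompt for your password):
#   mysql_service(:start)
#   mysql_service(:stop)
```

Passing the command as separate arguments (rather than one string) keeps Ruby from going through a shell, and the injectable `runner` makes the helper easy to try out without actually touching launchd.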

In Mac OS X v10.4 Tiger, Apple introduced a new system startup program called launchd. The launchd daemon takes over many tasks from cron, xinetd, mach_init, and init, which are UNIX programs that traditionally have handled system initialization, called systems scripts, run startup items, and generally prepared the system for the user. And they still exist on Mac OS X Tiger, but launchd has superseded them in many instances. These venerable programs are widely used by system administrators, open source developers, managers of web services, even consumers who want to use cron to manage iCal scheduling, and they can still be called with launchd.

(Quoted from Apple’s ‘Getting Started with launchd’.)

Mon, 06 Aug 2007

Rails searching with Sphinx

Posted by Ben Mon, 06 Aug 2007 19:30:00 GMT

Over the weekend I implemented search for trawlr.com using Sphinx, which Evan Weaver calls the nginx of the search world (fast and Russian). I had previously used Ferret, but had to remove the search feature almost immediately because the Ferret indexes kept getting corrupted, which caused me a major headache. So I dropped Ferret in favour of Sphinx, which I’ve heard lots of good things about recently.

Installing Sphinx on my MacBook Pro required pointing the configure script at the MySQL include and library directories used by mysql5 from MacPorts.

$ wget http://www.sphinxsearch.com/downloads/sphinx-0.9.7.tar.gz
$ tar xvzf sphinx-0.9.7.tar.gz
$ cd sphinx-0.9.7
$ ./configure --with-mysql-includes=/opt/local/include/mysql5/mysql/ --with-mysql-libs=/opt/local/lib/mysql5/mysql/
$ make
$ sudo make install

Initially, I chose Evan’s UltraSphinx plugin, which was very helpful to start with because it auto-generates the sphinx.conf. Indexing the entire content of trawlr.com (almost 1.5 million blog posts) took just a few minutes, which left me suitably impressed, and searches were lightning fast. Unfortunately, with the UltraSphinx plugin installed my Rails app started throwing some very strange errors.

Having already looked at the alternative Sphinx plugins, I decided to try acts_as_sphinx. After some small tweaks to the sphinx.conf file (and a re-index), search was working and, more importantly, so was my Rails app. Another option is Sphincter, which I did experiment with, but I struggled with its limited documentation, mostly concerning the configuration file it requires. YMMV.

$ rake sphinx:index
$ rake sphinx:start

Indexing on my MacBook Pro…

$ time rake sphinx:index

using config file 'sphinx.conf'...
indexing index 'items'...
collected 1455733 docs, 1255.2 MB
sorted 182.4 Mhits, 100.0% done
total 1455733 docs, 1255246639 bytes
total 438.695 sec, 2861316.50 bytes/sec, 3318.32 docs/sec

real    7m25.307s
user    4m28.963s
sys     0m17.578s
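As a sanity check, the throughput figures can be recomputed from the totals the indexer printed. This sketch uses only the numbers shown above. The “MB” figure appears to be decimal megabytes (10^6 bytes), since 1,255,246,639 / 10^6 ≈ 1255.2, and the gap between the `real` time and the indexer’s own 438.695 s is presumably rake start-up and other work outside the indexer.

```ruby
# Totals copied from the `rake sphinx:index` output above.
DOCS         = 1_455_733
BYTES        = 1_255_246_639
INDEXER_SECS = 438.695          # "total 438.695 sec"
WALL_SECS    = 7 * 60 + 25.307  # "real 7m25.307s"

bytes_per_sec = BYTES / INDEXER_SECS
docs_per_sec  = DOCS / INDEXER_SECS
megabytes     = BYTES / 1_000_000.0 # decimal MB, matching the "1255.2 MB" line
overhead_secs = WALL_SECS - INDEXER_SECS

puts format('%.1f MB indexed at %.0f bytes/sec (%.1f docs/sec)',
            megabytes, bytes_per_sec, docs_per_sec)
puts format('%.1f s of wall-clock time was spent outside the indexer', overhead_secs)
```

The recomputed rates differ from the printed 2861316.50 bytes/sec and 3318.32 docs/sec only in the last few digits, presumably because the indexer divides by a more precise elapsed time than the three decimals it displays.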

Searching with acts_as_sphinx via the console (ruby script/console) for the term ‘Google’, sorted by published date.

>> search = Item.find_with_sphinx 'Google', :sphinx => {:sort_mode => [:attr_desc, 'pub_date'], :page => 1}, :order => 'items.pub_date DESC'; 0
=> 0
>> search.total
=> 1000
>> search.total_found
=> 73717
>> search.time       
=> "0.000" 

That’s with an index of 943 MB and almost 1.5 million items. Note that the search results are limited to 1,000 items by the settings in my sphinx.conf file.

Within the Rails controller, search is done via:

@items = Item.find_with_sphinx(params[:query],
      :sphinx => {:sort_mode => [:attr_desc, 'pub_date'], :limit => 50, :page => (params[:page] || 1).to_i},
      :order => 'items.pub_date DESC')
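The console session shows the difference between `total` (the matches Sphinx will actually hand back, capped at 1,000 here) and `total_found` (all 73,717 matching documents). With 50 results per page only the first 20 pages can ever contain results, so a controller may want to clamp the requested page before querying. A minimal sketch, assuming the 1,000 cap from the sphinx.conf and the 50-per-page limit above; `MAX_MATCHES`, `PER_PAGE`, `sphinx_page_count` and `sphinx_page` are hypothetical names, not part of acts_as_sphinx.

```ruby
# Hypothetical values matching the setup described above.
MAX_MATCHES = 1_000 # result cap configured in sphinx.conf
PER_PAGE    = 50    # the :limit passed to find_with_sphinx

# Pages that can actually hold results:
# ceil(min(total_found, max_matches) / per_page), using integer arithmetic.
def sphinx_page_count(total_found, per_page = PER_PAGE, max_matches = MAX_MATCHES)
  retrievable = [total_found, max_matches].min
  (retrievable + per_page - 1) / per_page
end

# Turn a raw params[:page] value (nil or a String) into a page number
# between 1 and the last page Sphinx can return.
def sphinx_page(raw_page, per_page = PER_PAGE, max_matches = MAX_MATCHES)
  last_page = [sphinx_page_count(max_matches, per_page, max_matches), 1].max
  page = raw_page.to_i # nil.to_i and "abc".to_i are both 0
  page.clamp(1, last_page)
end
```

In the controller, `:page => sphinx_page(params[:page])` would then stand in for `(params[:page] || 1)`, so a request for page 99 is served page 20 rather than an empty result set.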

Updating the Sphinx index

There’s another rake task for updating the Sphinx index, which can be run from a cron job rather than updating the index ‘live’. The rotate command rebuilds the index while the Sphinx daemon keeps running and, once the new index is complete, has the daemon switch over to it.

$ rake sphinx:rotate
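Since a full re-index took over seven minutes here, a cron job that fires more often than that could start a second rotate while the first is still running. One way to guard against that is an exclusive, non-blocking file lock around the rake call. A minimal sketch, assuming the Rails app root is passed in and has a `tmp/` directory; the lock file name and the `rotate_sphinx_index` helper are hypothetical.

```ruby
# Run the block only if no other process holds the lock; otherwise skip.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |f|
    # LOCK_NB makes flock return false instead of waiting when the lock
    # is already held. The lock is released when the file is closed.
    return :skipped unless f.flock(File::LOCK_EX | File::LOCK_NB)
    yield
  end
end

# Hypothetical cron entry point: `rake sphinx:rotate` in the given app,
# unless a previous rotate is still running.
def rotate_sphinx_index(app_root, rails_env = 'production', runner: method(:system))
  lock_path = File.join(app_root, 'tmp', 'sphinx_rotate.lock')
  with_exclusive_lock(lock_path) do
    Dir.chdir(app_root) do
      runner.call({ 'RAILS_ENV' => rails_env }, 'rake', 'sphinx:rotate')
    end
  end
end
```

A crontab line would then just call a small script that invokes `rotate_sphinx_index` with the app’s path; overlapping runs return `:skipped` instead of starting a second indexer.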

Update

  • It looks like the UltraSphinx plugin requires edge Rails (thanks Evan)!
  • I’ve deployed the search updates to the live trawlr.com site (currently search is only visible from the reader view for logged in users).
  • Live search is fantastically quick.
  • Created some new rake tasks (to go in lib/tasks/sphinx.rake) that allow you to have a sphinx.conf file per Rails environment (config/sphinx.development.conf and config/sphinx.production.conf). The available tasks are: s:index, s:rotate, s:start, s:stop, s:status and s:restart. The rake tasks assume that the Sphinx pid file is in the log directory (pid_file = log/searchd.pid).
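The per-environment tasks mostly come down to choosing the right config file and finding searchd through its pid file. Below is a minimal sketch of helpers such tasks could be built on, assuming the config/sphinx.<env>.conf and log/searchd.pid layout described above. The `SphinxEnv` module is hypothetical, not the contents of the sphinx.rake file; it relies on the standard `indexer --config … --all [--rotate]` and `searchd --config …` command lines, and on searchd shutting down when it receives TERM.

```ruby
# Hypothetical helpers for per-environment Sphinx rake tasks (s:index, s:start, ...).
# Assumes config/sphinx.<env>.conf files and pid_file = log/searchd.pid in each.
module SphinxEnv
  PID_FILE = 'log/searchd.pid'

  def self.config_file(env)
    "config/sphinx.#{env}.conf"
  end

  # s:index, or s:rotate when rotate is true (--rotate has the running
  # searchd pick up the freshly built index).
  def self.index_command(env, rotate: false)
    cmd = ['indexer', '--config', config_file(env), '--all']
    rotate ? cmd + ['--rotate'] : cmd
  end

  # s:start
  def self.start_command(env)
    ['searchd', '--config', config_file(env)]
  end

  # s:status: the pid from the pid file if that process is alive, else nil.
  def self.running_pid(pid_file = PID_FILE)
    pid = File.read(pid_file).to_i
    return nil unless pid > 0
    begin
      Process.kill(0, pid) # signal 0 only checks that the process exists
      pid
    rescue Errno::EPERM
      pid                  # alive, but owned by another user
    rescue Errno::ESRCH
      nil                  # stale pid file
    end
  rescue Errno::ENOENT
    nil                    # no pid file
  end

  # s:stop: send TERM to searchd; returns false if it wasn't running.
  def self.stop(pid_file = PID_FILE)
    pid = running_pid(pid_file)
    return false unless pid
    Process.kill('TERM', pid)
    true
  end
end
```

Inside `namespace :s` in a rake file, a task could then run one of these with Rake’s `sh`, e.g. `task(:index) { sh(*SphinxEnv.index_command(RAILS_ENV)) }`, and s:restart would simply be s:stop followed by s:start.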