Tue, 09 Jan 2007

Quick Ferret primer with examples

Posted by Ben Tue, 09 Jan 2007 22:44:00 GMT

Need a fast, full-text search capability for your Rails app? Step forward Ferret and the acts_as_ferret plugin.

Ferret is a high-performance, full-featured text search engine library written for Ruby. It is inspired by the Apache Lucene Java project.

acts_as_ferret is a plugin for Ruby on Rails that makes it simple to add full-text search to a Rails application. It builds on Ferret, which is a Ruby port of Apache Lucene, and is suitable for nearly any application that requires full-text search.

1. Install ferret

sudo gem install ferret

2. Install acts_as_ferret plugin

ruby script/plugin install -x svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret

3. Add acts_as_ferret to an ActiveRecord model

class Item < ActiveRecord::Base
  acts_as_ferret
end

4. Search

Item.find_by_contents(query) # query is a string containing the search terms

That is all it takes: a very simple implementation with great search performance.

Advanced Usage

For slightly more advanced usage, I needed to search text across a one-to-one relationship, and to page and sort the results. The following class declaration shows that Item has a related ItemDescription (containing a description field), which is included in the search index via the description method on Item. The title is given a boost so that matches in the title field carry more weight than matches in the description. I also needed to sort the results by published date (pub_date), which required converting the datetime field to an integer for correct sorting.

item.rb
class Item < ActiveRecord::Base
  has_one :item_description

  acts_as_ferret :fields => {:title => {:boost => 2, :index => :untokenized}, 
                             :description => {},
                             :pub_date_sort => {:index => :untokenized_omit_norms, :term_vector => :no}}

  # Text of the related ItemDescription (nil if the item has none yet)
  def description
    @description ||= (item_description && item_description.description)
  end

  # To enable sorting by date it must be converted to an integer
  def pub_date_sort
    pub_date.to_i
  end
end
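One way to see why an integer key is convenient for sorting is to compare it with a human-readable date string in plain Ruby. This is only a sketch: the sample dates and the string format are made up for illustration, and it says nothing about how Ferret stores terms internally.

```ruby
# Hypothetical sample dates, not taken from the application above.
pub_dates = [
  Time.utc(2007, 1, 9),
  Time.utc(2006, 12, 25),
  Time.utc(2007, 2, 1)
]

# A human-readable string key sorts alphabetically, not chronologically.
string_keys = pub_dates.map { |t| t.strftime("%d %b %Y") }.sort

# Seconds since the Unix epoch (what pub_date_sort returns) sort chronologically.
integer_keys = pub_dates.map { |t| t.to_i }.sort

# Reversing the integer order puts the newest date first.
newest_first = integer_keys.reverse.map { |i| Time.at(i).utc }

puts string_keys.inspect
puts newest_first.map { |t| t.strftime("%Y-%m-%d") }.inspect
```

The string keys come out with February before January, while the integer keys keep the dates in true chronological order.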

It is also a good idea to add a convenience method to the Item model to use the search:

item.rb
def self.full_text_search(q, options = {})
  # For a blank query, return an empty result in the same [total_hits, results] shape
  return [0, []] if q.nil? || q.empty?
  default_options = {:limit => 50, :page => 1}
  options = default_options.merge(options)
  # Convert the 1-based page number into the offset of the first result
  options[:offset] = options[:limit] * (options.delete(:page).to_i - 1)
  results = Item.find_by_contents(q, options)
  [results.total_hits, results]
end
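The page-to-offset arithmetic in this method can be tried out on its own. The sketch below uses a hypothetical helper, offset_for, which is not part of acts_as_ferret, and it adds one assumption of its own: a missing or invalid page number is treated as page 1.

```ruby
# Hypothetical helper (not part of acts_as_ferret): converts a 1-based page
# number into the zero-based offset of the first result on that page.
def offset_for(page, limit)
  page = [page.to_i, 1].max # treat nil, "", 0 or negative pages as page 1
  limit * (page - 1)
end

puts offset_for(1, 50)   # first page starts at offset 0
puts offset_for("3", 50) # page numbers from params arrive as strings
puts offset_for(nil, 50) # a missing page falls back to page 1
```

Clamping the page to at least 1 avoids a negative offset if someone requests page 0.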

Add a method that creates a paginator in application.rb:

application.rb
def pages_for(size, options = {})
  default_options = {:per_page => 50}
  options = default_options.merge(options)
  Paginator.new(self, size, options[:per_page], params[:page] || 1)
end

Add a search method to the controller. Note the reverse sort, which returns the newest items (by published date) first.

items_controller.rb
def search
  # Sort on the integer pub_date_sort field, newest first
  s = Ferret::Search::SortField.new(:pub_date_sort, :reverse => true)

  @query = params[:query]
  @item_count, @latest_items = Item.full_text_search(@query, :page => (params[:page] || 1), :sort => s)
  @item_pages = pages_for(@item_count)
end

The @item_pages can then be used by the standard Rails paginator in the view to provide paged search results.
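To get a feel for how many pages a given hit count produces, the arithmetic can be sketched in plain Ruby. The page_count helper below is hypothetical and is not the Rails paginator's own implementation, which may treat an empty result set differently.

```ruby
# Hypothetical helper: number of pages needed to show total_hits results.
def page_count(total_hits, per_page = 50)
  hits = total_hits.to_i
  return 0 if hits <= 0
  (hits + per_page - 1) / per_page # integer ceiling division
end

puts page_count(50)  # exactly one full page
puts page_count(51)  # one extra hit needs a second page
puts page_count(120) # two full pages plus a partial third
```

Note that the per_page value used for the paginator should match the limit passed to the search (both default to 50 above), otherwise the page links and the returned results will drift apart.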

References

Full text search in Ruby on Rails 3 – ferret

Comments

  1. Ben 7 months later:

    Just a quick note to say that I am now using Sphinx instead of Ferret for all my Rails searching needs!

  2. Robert about 1 year later:

    How do I actually add a search bar on a given page for people to enter search items? Thanks for any help you can offer, I appreciate it.

    Robert

  3. Ben about 1 year later:

    Something like the following search form would do the trick.

    <% form_tag search_items_path, :method => :get do %>
      <%= text_field_tag 'query', '' %>
      <%= submit_tag 'Search'  %>
    <% end %>

  4. Bill about 1 year later:

    I’m using Ferret for my application now, but I’m finding that I’ve hit the wall with its capabilities when it comes to sorting by something like the distance between two zip codes. The issue is that the user posts a search form with a zip code, but the index only knows about the values on the indexed model, nothing external to it.

    Someone told me Sphinx (UltraSphinx specifically) would suit my needs, but I tried it out and had even more issues with it. I guess UltraSphinx has geo calculations or something built into it.

    Have you ever attempted something like this or maybe know what I might try to get this to work?
