Quick Ferret primer with examples

Need a fast, full-text search capability for your Rails app? Step forward Ferret and the acts_as_ferret plugin.

Ferret is a high-performance, full-featured text search engine library written for Ruby. It is inspired by the Apache Lucene Java project.

acts_as_ferret is a Ruby on Rails plugin that makes it simple to add full-text search to ActiveRecord models. It builds on Ferret, a Ruby port of Apache Lucene, and suits nearly any application that requires full-text search.

1. Install ferret

sudo gem install ferret

2. Install acts_as_ferret plugin

ruby script/plugin install -x svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret

3. Add acts_as_ferret to ActiveRecord model

class Item < ActiveRecord::Base
  acts_as_ferret
end

4. Search

Item.find_by_contents(query) # query is a string containing the search terms

This is a very simple implementation that delivers great search performance.

Advanced Usage

For slightly more advanced usage, I needed to search text across a one-to-one relationship and to paginate and sort the results. In the following class declaration, Item has a related ItemDescription (containing a description field) whose text is included in the search index via the description method in Item. The title is given a boost so that matches in the title field carry more weight than matches in the description. I also needed to sort the results by published date (pub_date), which required converting the datetime field to an integer for correct sorting.

item.rb

class Item < ActiveRecord::Base
  has_one :item_description

  acts_as_ferret :fields => {:title => {:boost => 2, :index => :untokenized},
                             :description => {},
                             :pub_date_sort => {:index => :untokenized_omit_norms, :term_vector => :no}}

  def description
    # Guard against items that have no ItemDescription yet
    @description ||= (item_description && item_description.description)
  end

  # To enable sorting by date it must be converted to an integer
  def pub_date_sort
    pub_date.to_i
  end
end
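As a rough illustration of the integer conversion, the plain-Ruby sketch below (no Ferret or Rails involved) shows that timestamps turned into integers with to_i order the same way the dates do, so a reverse sort on that value puts the newest item first. ItemStub and its sample data are hypothetical stand-ins for the Item model, not part of acts_as_ferret.

```ruby
# Hypothetical stand-in for the ActiveRecord model; only pub_date matters here.
ItemStub = Struct.new(:title, :pub_date) do
  # Same idea as Item#pub_date_sort: seconds since the Unix epoch.
  def pub_date_sort
    pub_date.to_i
  end
end

items = [
  ItemStub.new("older",  Time.utc(2007, 12, 31, 23, 59)),
  ItemStub.new("newest", Time.utc(2008, 2, 1, 9, 0)),
  ItemStub.new("middle", Time.utc(2008, 1, 15, 12, 30))
]

# Newest first, mirroring a reverse sort on the integer field.
newest_first = items.sort_by { |item| -item.pub_date_sort }
titles = newest_first.map { |item| item.title }
puts titles.inspect
```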

It is also a good idea to add a convenience method to the Item model that wraps the search and returns both the total hit count and the current page of results:

item.rb

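def self.full_text_search(q, options = {})
  # Return the same [count, results] shape for a blank query so callers can always destructure it
  return [0, []] if q.nil? || q.strip.empty?
  default_options = {:limit => 50, :page => 1}
  options = default_options.merge(options)
  # Convert the 1-based page number into an offset; anything below 1 is treated as page 1
  page = [options.delete(:page).to_i, 1].max
  options[:offset] = options[:limit] * (page - 1)
  results = Item.find_by_contents(q, options)
  [results.total_hits, results]
end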
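The page-to-offset arithmetic in full_text_search can be checked in isolation. The sketch below is plain Ruby; search_offset is a hypothetical helper name, not something acts_as_ferret provides, and falling back to page 1 for missing or invalid page values is an assumption of this sketch.

```ruby
# Hypothetical helper, not part of acts_as_ferret: turns a 1-based page
# number into the :offset value a paged search needs.
def search_offset(page, limit = 50)
  page = [page.to_i, 1].max   # nil, "", "0" and negatives all fall back to page 1
  limit * (page - 1)
end

puts search_offset(1)       # first page starts at offset 0
puts search_offset("3")     # request params arrive as strings; page 3 starts at 100
puts search_offset(nil)     # missing page falls back to offset 0
puts search_offset(2, 20)   # 20 per page; page 2 starts at 20
```

Keeping the arithmetic in one small method makes it easy to test without building an index.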

Add a method that creates a paginator in application.rb:

application.rb

def pages_for(size, options = {})
  default_options = {:per_page => 50}
  options = default_options.merge(options)
  # Build a Paginator for the current page of results
  Paginator.new(self, size, options[:per_page], (params[:page] || 1))
end

Add a search method to the controller. Note the reverse sort, which returns the newest items (by published date) first.

items_controller.rb

def search
  # Reverse sort on the integer date field so the newest items come first
  s = Ferret::Search::SortField.new(:pub_date_sort, :reverse => true)

  @query = params[:query]
  @item_count, @latest_items = Item.full_text_search(@query, {:page => (params[:page] || 1), :sort => s})
  @item_pages = pages_for(@item_count)
end

@item_pages can then be used with the standard Rails paginator in the view to provide paged search results.
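To make the paging numbers concrete, here is a minimal plain-Ruby sketch of the values a paginator derives from a hit count. page_summary is a hypothetical helper written for illustration, not the Rails Paginator API, and treating an empty result set as a single page is an assumption of this sketch.

```ruby
# Hypothetical helper: summarizes paging for a given hit count.
def page_summary(total_hits, per_page, current_page)
  # Round up so a partial last page still counts; never report fewer than 1 page.
  page_count = [(total_hits.to_f / per_page).ceil, 1].max
  # Clamp the requested page into the valid range.
  current = [[current_page.to_i, 1].max, page_count].min
  first_item = total_hits.zero? ? 0 : (current - 1) * per_page + 1
  last_item = [current * per_page, total_hits].min
  {
    :page_count => page_count,
    :current    => current,
    :first_item => first_item,
    :last_item  => last_item
  }
end

summary = page_summary(120, 50, 3)
puts "Page #{summary[:current]} of #{summary[:page_count]}, " \
     "items #{summary[:first_item]}-#{summary[:last_item]}"
```

With 120 hits at 50 per page, the third page holds only the last 20 items, which is the case a view most often gets wrong.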


