Mar 27, 2009

Bidirectional many-to-many relationship in ActiveRecord

Bidirectional many-to-many relationship is a common pattern in design, like friendships between users, memberships between users and groups, etc. In this article I'll illustrate how to implement such a relationship in ActiveRecord/Rails.

Let's start with a little context. Suppose we want to add the populer 'friends' feature for our users, we already have User model, what we need is a join table 'friendships' to connect users.


class User < ActiveRecord::Base
end

class CreateFriendships < ActiveRecord::Migration
def self.up
create_table :friendships, :id => false do |t|
t.integer :left_user_id
t.integer :right_user_id
end
end

def self.down
drop_table :friendships
end
end


The behavior we want, can be written like this:


test "should has many friends" do
users(:quentin).friends << users(:aaron)
assert_equal 1, users(:quentin).friends.reload.count
end


Of course the test is failed when we run it now. After reading our requirements we decide to use has_and_belongs_to_many (habtm) in ActiveRecord because it is enough for now, so we modify our User model like this:


class User < ActiveRecord::Base
has_and_belongs_to_many :friends, :join_table => 'friendships',
:foreign_key => 'left_user_id', :association_foreign_key => 'right_user_id',
:class_name => 'User'
end


Now run our test, passed, perfect! We can leave office and enjoy *fill whatever you like* now.

"Wait", your brilliant colleague says, and add one line to your unit test:


test "should has many friends" do
users(:quentin).friends << users(:aaron)
assert_equal 1, users(:quentin).friends.reload.count
assert_equal 1, users(:aaron).friends.reload.count
end


He runs the test again and it failed. It's unfair if quentin treat aaron as his friend but aaron doesn't do the same to quentin, isn't it? To fix this problem we need to custom insert and delete sql for the habtm relationship:


class User < ActiveRecord::Base
has_and_belongs_to_many :friends, :join_table => 'friendships',
:foreign_key => 'left_user_id', :association_foreign_key => 'right_user_id',
:class_name => 'User',
:insert_sql => 'insert into friendships (`left_user_id`, `right_user_id`) values
(#{id}, #{record.id}), (#{record.id}, #{id})',
:delete_sql => 'delete from friendships where (left_user_id = #{id} and right_user_id
= #{record.id}) or (left_user_id = #{record.id} and right_user_id = #{id})'
end


What we do here, is to add two friendship records (both direction) when we add a user to another's friends set. We do the same when delete a user from one's friends set. We rerun the test and it passes as we expected.

Note the ":id => false" argument when we create the join table 'friendships', without it you'll have troubles when loading friends objects. I think this is a long history bug of ActiveRecord, I don't know why it is not fixed. If you really want to keep the 'id' field and use habtm at the same time, a workaround is customize the finder_sql:


class User < ActiveRecord::Base
has_and_belongs_to_many :friends, ...
:finder_sql => 'select users.* from friendships left outer join users on
friendships.right_user_id = users.id where friendships.left_user_id = #{id}'
end


The bidirectional habtm we build here, is in fact a self-referential bidirectional relationship, it relates models to the same type of models (users to users). What will happen if the relationship is NOT self referential?

In that case we should not use habtm, otherwise our (at least my) brain will be burned by the complicated sqls it brings. Don't waste time on those sqls, ActiveRecord provides another way to build many-to-many relationship: has_many :through

So one day the boss comes to you and asks, "Can we group our friends? I want to put Gates in group Evil and Male, and Linus in group Minix and Antarctic"

"Sure", you answered, how can you say No to your boss?

Now the many-to-many relationship is between groups and users, we need to modify User model, create a Group model and write a migration to modify friendships table. Since we'll use has_many :through for this requirement, we also need a model for the join table 'friendships':


class User < ActiveRecord::Base
has_many :groups
end

class CreateGroups < ActiveRecord::Migration
def self.up
create_table :groups do |t|
t.string :name
t.integer :user_id
end
end

def self.down
drop_table :groups
end
end

class Group < ActiveRecord::Base
belongs_to :user
has_many :friendships
has_many :friends, :through => :friendships
end

class RemodelFriendships < ActiveRecord::Migration
def self.up
remove_column :friendships, :left_user_id
rename_column :friendships, :right_user_id, :friend_id
change_column :friendships, :group_id, :integer, :null => false
end

def self.down
change_column :friendships, :group_id, :integer, :null => true
rename_column :friendships, :friend_id, :right_user_id
add_column :friendships, :left_user_id
# execute "..."
end
end

class Friendship < ActiveRecord::Base
belongs_to :group
belongs_to :friend, :class_name => 'User'
validates_uniqueness_of :friend_id, :scope => :group_id
end


Pretty good, it works. But we have the same problem when we use habtm, the relationship is not bidirectional. How can we fix it?

ActiveRecord allow you to extend the association proxy when use has_many :through, the solution here is to override the proxy's default CRUD methods. Let's create a new file lib/bidirection.rb:


module Bidirection
def <<(friend)
Friendship.create :group_id => friend.groups.default.id, :friend_id => self.proxy_owner.user.id
super(friend)
end

[:delete, :destroy].each {|m|
define_method(m) {|friends|
friends = [friends] unless friends.instance_of?(Array)
friends.each {|friend|
Friendship.first(:conditions => {:group_id => friend.groups.default.id, :friend_id => self.proxy_owner.user.id}).try m
}
super(friends)
}
}

end


Then enhance our has_many :through with it:


class Group < ActiveRecord::Base
...
has_many :friends, :through => :friendships, :extend => Bidirection
end


That's it, you have bidirectional many-to-many relationship between groups and users now. For convinient you can create a default group when create user (use before_create hook), and delegate friends and friends= methods in User model to user's default group, in that way you get the self-referential bidirectional many-to-many relationship betweent users back, as we have when using habtm.

Mar 19, 2009

ActiveRecord and DataMapper

I jumped into a merb project about 1 month ago, it's really an interesting trip. Working in both frameworks make pros and cons crystal clear to see. Here I'll try to remember some of those I found on the different ORM they use, thus ActiveRecord vs DataMapper.

* Schema Definition

ActiveRecord keep all schema definitions in migration files, while DataMapper store most of them in model source code. I prefer DataMapper's way, so you won't need plugins like annotate_models any more. Do you use annotate_models in your rails project?

However, keep model schema in model source code brings more complexity, because sometimes you have tasks which are better written in a migration file instead of model source file. So DataMapper has seperate migrations too, and merb provides more migration rake tasks than rails.

* Many-to-Many Relationships Through Scoped Join Model

Sorry for my poor description, this ticket illustrates the problem well. I don't know why this bug won't be fixed in ActiveRecord - below code works well in DataMapper.


has n, :bookmarks, :through => :subscriptions, :conditions => "subscriptions.notification = 'f'"


* scope

ActiveRecord introduced named_scope in since 2.x (can't remember), and add dynamic scope (scope_by_xxx like dynamic finder find_by_xxx) in the latest 2.3 release, so now you can chain up many dynamic scopes, named scope and find now. That's really cool, but why we need two seperate finders, find and scope?

In DataMapper, "find" and "scope" is unified. You can do this


class Zoo
# all the keys and property setup here
def self.open
all(:open => true)
end
def self.big
all(:animal_count.gte => 1000)
end
end

big_open_zoos = Zoo.big.open


As you can see, the finder #all is able to be chained up or used independently, smart. And I like tye syntax all(:open => true) than find_all_by_open(true): hash can be auto complete by editors while '_' can not, hash param doesn't require dynamic tricks while find_xxx need. Sometimes I feel rails takes too many cares to his baby programmers, sometimes I feel merb takes too little (not in this case, of course).

* Inheritance vs Include

ActiveRecord model is required to inherit from ActiveRecord::Base, while DataMapper model need to include DataMapper::Resource. Composition is better than inheritance, though module inclusion is in fact inheritance in Ruby.

* Finally

I didn't realize all the things I written down are votes for DataMapper until I write this sentence .. I want to say ActiveRecord is an excellent ORM framework too, at least it's more mature than DataMapper, I enjoy it at most of the time, it just doesn't work in some few case (I really hope the many-to-many relationship through scope join model problem can be fixed, it's a common pattern I've seen it serveral times). The good news is by Rails and Merb merge, rails will become super flexible and you'll be able to switch between ORMs in Rails 3 easily.

Mar 11, 2009

The problem is

"As is often the case when proving things about programming languages, the tricky part here is formulating a precise statement to be proved—the proof itself should be straightforward." - TAPL, pierce

In schools students are always given clearly defined, already formalized questions, I think that's why the students feel uncomfortable when they come to the real world. I should spend more time on the mathematical modeling class when I was in university :(

Feb 24, 2009

Use rails.vim and merb.vim together

I recently works on a merb project. As a vim user I found merb.vim for syntax highlighting, but it will conflict with rails.vim. For example, *before_filter* is a declaration in Rails controller, but in Merb it should *before*. After you installed merb.vim, it will set the file type of your controller file to ruby.merb_controller, so if you open a controller file in a rails project, the 'before_filter' will be highlighten as an error.

The solution is to find a way to have directory/project specific setting for vim. Luckily it already exists.

In you ~/.vimrc:

set exrc

Then move ~/.vim/ftdetect/merb.vim (this file is installed by merb.vim plugin) to $your_merb_project/.vimrc. Now the filetype setting for merb will only affects the merb project dir only.

Feb 16, 2009

Rules a Monad should follow

I. return x >>= f === f x

II. x >>= return === x

III. m >>= (\x -> f x >>= g) === (m >>= f) >>= g

Feb 7, 2009

Kernel#eval

There's an interesting post in ruby mailing list today. The discussion started by a question asked by Jonathan Wills:
a = 1
eval('a=2')
puts a

This will print out '2', as I want. However if I remove the first line,

eval('a=2')
puts a

r.rb:2:in `
': undefined local variable or method `a' for
main:Object (NameError)

Then Mike Gold gives an excellent explaination:
The thing to remember is that local variables are always determined at parse time.

eval("a = 3", binding)
p local_variables #=> ["a"]
p a #=> undefined local variable or method `a'

We see that the local exists, but because the parser has not seen the "a = ..." syntax, we can't access it.

Locals are not handled in the same way. The local assignment syntax is subject to special examination by the parser. This does not happen for instance variables and globals. Again, it is the parser that determines locals. If all assignments were the same, this would print '3':

eval("a = 3", binding)
puts a

Pickaxe gives this example:

def a
print "Function 'a' called\n"
99
end

for i in 1..2
if i == 2
print "a=", a, "\n"
else
a = 1
print "a=", a, "\n"
end
end

produces:

a=1
Function 'a' called
a=99

But constants and instance variables are different. Which means

eval("@a=2")
@a # => 2

Another interested thing Mike metioned is, when you see this error:

undefined local variable or method `a' for main:Object (NameError)

It in fact means 'undefined method a':
... that error message comes from rb_method_missing(), i.e., it was determined to be a method call. Presumably it is worded that way for the user's benefit, in case there was a misspelling or some such. Locals are determined by the parser; method lookup is done when no such local exists.

Feb 4, 2009

Notes of Guy Steele's Lisp Processor Prototype

The prototype in paper is different from the well known 'Lisp Machine' which in fact has a hybrid (traditional linear vector memory model + linked list model) architecture (e.g. based on TTL logic, with large ALU) while the prototype is totally build on linked list model (with embedded instruction set and almostly no ALU). The history of this project is a very interesting story :) For youth like me, it's hard to imagine that one obstacle of building such a processor is ALU is too large at that time ..

And this paper make me think it's not fair to say modern GC (garbage collection) is an innovation from Lisp (you know some Lisp guys always like to say 'hey, XXX has already be seen in Lisp 30 years ago). As you can read from the paper, GC is an inevitable workaround to fill the gap between functional programming (which will produce lots of immutable intermedia) and von Neumann machine (finite memory), it's not invented to free programmers from C++'s malloc/free hell like Java (which is an imperative language). Like Watt found water can be heated into steam to drive machine is of course an innovation, but use water to generate electricity is a totally diffierent thing.

Most of the contents below are copied from GLS & GJS's paper "Design of LISP-Based Processors", March 1979.

== Notes of the paper ==

An idea which has increasingly gained attention is that computer architectures should reflect specific language structures to be supported.

Stored-program computer is such one that the program and the data reside in the same memory, the program is itself data which can be manipulated as any other data by the processor. Lisp is unusual among high-level languages in that it explicitly supports the stored-program idea, for this reason Lisp is often referred to as a 'high-level machine language'.

One of the central ideas of the Lisp language is that storage management should be completely invisible to the programmer. Lisp is an object-oriented language, rather than a value-oriented language. The Lisp programmer does not think of variables as the objects of interest, bins in which values can be held. Instead, each data item is itself an object, which can be examined and modified, and which has an identity independent of the variable used to name it.

A complete Lisp system is conveniently divided into two parts: (1) a storage system, which provides an operator for the creation of new data objects and also other operators (such as pointer traversal) on those objects; and (2) a program interpreter (EVAL), which executes programs expressed as data structures within the storage system. (Note that this memory/processor division characterizes the usual von Neumann architecture also. The differences occur in the nature of the processor and the memory system.)

Commercially available memories are available only in finite sizes. Lisp (or functional language)'s free and wasteful throw-away use of data objects would be a disaster with finite memory. In order to make such memories useable to Lisp (or functional language) we must interpose between EVAL and the storage system a storage manager which makes a finite vector memory appear to the evaluation mechanism to be an infinite linked-record memory. The memory is "apparently infinite" in the sense that an indefinitely large number of new records can be "created" using the CONS operator. The storage manager recycles discarded records in order to create new ones in a manner completely invisible to the evaluator.

The storage manager therefore consists of routines which implement the operations CAR, CDR, CONS, etc. in terms of the vector memory, plus a garbage collector which deals with the finiteness of the memory by locating records which have been discarded and making them available to the CONS routine for recycling.

In Guy Steele's Lisp processor prototype, there is no garbage collector for some reason. And there's even no ALU! (In fact it have simplest arithmetic and logical capabilities, which can only add 1 and test for 0) This is interesting because Lisp itself is so simple that the interpreter needs no arithmetic to run interesting programs (such as computing symbolic derivatives and integrals, or pattern matching).

This is not to say that real Lisp programs do not need arithmetic, just that the Lisp interpreter itself does not require binary arithmetic of the usual sort (but it does require CONS, CAR and CDR, which in a formal sense indeed form a kind of "number system", where CONS corresponds to "add 1" and both CAR and CDR to "substract 1". In this view, the purpose of the storage manager is to interface between two kinds of arithmetic, namely "Lisp arithmetic" and Peano arithmetic).