Feb 24, 2009

Use rails.vim and merb.vim together

I recently works on a merb project. As a vim user I found merb.vim for syntax highlighting, but it will conflict with rails.vim. For example, *before_filter* is a declaration in Rails controller, but in Merb it should *before*. After you installed merb.vim, it will set the file type of your controller file to ruby.merb_controller, so if you open a controller file in a rails project, the 'before_filter' will be highlighten as an error.

The solution is to find a way to have directory/project specific setting for vim. Luckily it already exists.

In you ~/.vimrc:

set exrc

Then move ~/.vim/ftdetect/merb.vim (this file is installed by merb.vim plugin) to $your_merb_project/.vimrc. Now the filetype setting for merb will only affects the merb project dir only.

Feb 16, 2009

Rules a Monad should follow

I. return x >>= f === f x

II. x >>= return === x

III. m >>= (\x -> f x >>= g) === (m >>= f) >>= g

Feb 7, 2009

Kernel#eval

There's an interesting post in ruby mailing list today. The discussion started by a question asked by Jonathan Wills:
a = 1
eval('a=2')
puts a

This will print out '2', as I want. However if I remove the first line,

eval('a=2')
puts a

r.rb:2:in `
': undefined local variable or method `a' for
main:Object (NameError)

Then Mike Gold gives an excellent explaination:
The thing to remember is that local variables are always determined at parse time.

eval("a = 3", binding)
p local_variables #=> ["a"]
p a #=> undefined local variable or method `a'

We see that the local exists, but because the parser has not seen the "a = ..." syntax, we can't access it.

Locals are not handled in the same way. The local assignment syntax is subject to special examination by the parser. This does not happen for instance variables and globals. Again, it is the parser that determines locals. If all assignments were the same, this would print '3':

eval("a = 3", binding)
puts a

Pickaxe gives this example:

def a
print "Function 'a' called\n"
99
end

for i in 1..2
if i == 2
print "a=", a, "\n"
else
a = 1
print "a=", a, "\n"
end
end

produces:

a=1
Function 'a' called
a=99

But constants and instance variables are different. Which means

eval("@a=2")
@a # => 2

Another interested thing Mike metioned is, when you see this error:

undefined local variable or method `a' for main:Object (NameError)

It in fact means 'undefined method a':
... that error message comes from rb_method_missing(), i.e., it was determined to be a method call. Presumably it is worded that way for the user's benefit, in case there was a misspelling or some such. Locals are determined by the parser; method lookup is done when no such local exists.

Feb 4, 2009

Notes of Guy Steele's Lisp Processor Prototype

The prototype in paper is different from the well known 'Lisp Machine' which in fact has a hybrid (traditional linear vector memory model + linked list model) architecture (e.g. based on TTL logic, with large ALU) while the prototype is totally build on linked list model (with embedded instruction set and almostly no ALU). The history of this project is a very interesting story :) For youth like me, it's hard to imagine that one obstacle of building such a processor is ALU is too large at that time ..

And this paper make me think it's not fair to say modern GC (garbage collection) is an innovation from Lisp (you know some Lisp guys always like to say 'hey, XXX has already be seen in Lisp 30 years ago). As you can read from the paper, GC is an inevitable workaround to fill the gap between functional programming (which will produce lots of immutable intermedia) and von Neumann machine (finite memory), it's not invented to free programmers from C++'s malloc/free hell like Java (which is an imperative language). Like Watt found water can be heated into steam to drive machine is of course an innovation, but use water to generate electricity is a totally diffierent thing.

Most of the contents below are copied from GLS & GJS's paper "Design of LISP-Based Processors", March 1979.

== Notes of the paper ==

An idea which has increasingly gained attention is that computer architectures should reflect specific language structures to be supported.

Stored-program computer is such one that the program and the data reside in the same memory, the program is itself data which can be manipulated as any other data by the processor. Lisp is unusual among high-level languages in that it explicitly supports the stored-program idea, for this reason Lisp is often referred to as a 'high-level machine language'.

One of the central ideas of the Lisp language is that storage management should be completely invisible to the programmer. Lisp is an object-oriented language, rather than a value-oriented language. The Lisp programmer does not think of variables as the objects of interest, bins in which values can be held. Instead, each data item is itself an object, which can be examined and modified, and which has an identity independent of the variable used to name it.

A complete Lisp system is conveniently divided into two parts: (1) a storage system, which provides an operator for the creation of new data objects and also other operators (such as pointer traversal) on those objects; and (2) a program interpreter (EVAL), which executes programs expressed as data structures within the storage system. (Note that this memory/processor division characterizes the usual von Neumann architecture also. The differences occur in the nature of the processor and the memory system.)

Commercially available memories are available only in finite sizes. Lisp (or functional language)'s free and wasteful throw-away use of data objects would be a disaster with finite memory. In order to make such memories useable to Lisp (or functional language) we must interpose between EVAL and the storage system a storage manager which makes a finite vector memory appear to the evaluation mechanism to be an infinite linked-record memory. The memory is "apparently infinite" in the sense that an indefinitely large number of new records can be "created" using the CONS operator. The storage manager recycles discarded records in order to create new ones in a manner completely invisible to the evaluator.

The storage manager therefore consists of routines which implement the operations CAR, CDR, CONS, etc. in terms of the vector memory, plus a garbage collector which deals with the finiteness of the memory by locating records which have been discarded and making them available to the CONS routine for recycling.

In Guy Steele's Lisp processor prototype, there is no garbage collector for some reason. And there's even no ALU! (In fact it have simplest arithmetic and logical capabilities, which can only add 1 and test for 0) This is interesting because Lisp itself is so simple that the interpreter needs no arithmetic to run interesting programs (such as computing symbolic derivatives and integrals, or pattern matching).

This is not to say that real Lisp programs do not need arithmetic, just that the Lisp interpreter itself does not require binary arithmetic of the usual sort (but it does require CONS, CAR and CDR, which in a formal sense indeed form a kind of "number system", where CONS corresponds to "add 1" and both CAR and CDR to "substract 1". In this view, the purpose of the storage manager is to interface between two kinds of arithmetic, namely "Lisp arithmetic" and Peano arithmetic).

Jan 28, 2009

Understanding Git Concepts

Git in fact is a file system with history.

All data is saved in git objects. All git objects has a 40bits id which generated by SHA-1 hashing the object's content. There're 4 types of objects:
  • blob object: File contents is saved in blob object. No filenames/permissions etc, only contents is saved here.
  • tree object: Directory structure is saved here. Tree object's content is just a list of its children, either blob object or tree object. A list item will contain either a SHA-1 hash point to a blob object with filename/permissions/etc. or a hash point to a tree object. Here we have got a data structure(tree) which can represent a file system.
  • commit object: Now we need history. A commit object simple contains a pointer to tree, one or many pointer to parents(also commits) and some booking data like commiter. Commit objects in fact forms a tree graph on a higher layer of blob/tree.
  • tag object: Tag object is just for referencing an object conviniently. A tag object can have a pointer point to any other git object and a tag, then you can use the tag to reference any object(like an important commit) in your git repo.
A git object is immutable. Another concept in git system is Reference, for referencing mutable things like branch and remote.
  • Branch is just a file in .git/refs/heads/ dir contains the SHA-1 hash of the most recent commit to that branch. When you create a branch in git, git just create a file contains a 40 bytes hash in .git/refs/heads/, and update .git/HEAD to point to it. With your development moves on, git will find current branch in HEAD and update the branch file in refs/heads correctly.
  • Remote is a pointer to branch(so it's also a branch) in other people's copies of the same repo. If you get the code by clone instead of 'git init', git will add a default 'origin/master' remote branch for you automatically. 'origin' point to the remote copy location, and 'master' means which branch on remote you cloned from.
When you ask for checking out, git will lookup the argument you provided in .git/refs or .git/HEAD, find the corresponding object/branch/tag/whatever, read the SHA-1 hash which points to a tree from its content, then traverse the tree.

A fetch will merge all updates on a remote branch to your local. By default it will merge in changes on origin/master, but you can fetch updates on other place like origin/cool. After a series of fetch/merge your history graph will looks like a mess, rebase will help. Rebase will leave orphan objects in your repo(you can use 'git gc' to clean it) and should not be used on a repo which can be fetched by others.

Jan 25, 2009

Evolutionary algorithm example in Ruby


#!/usr/bin/ruby -w

# This program is based on this evolutionar computation introduction:
# http://blog.uncommons.org/2009/01/20/practical-evolutionary-computation-an-introduction/
#
# usage: ruby evolution.rb [generations] [goal]
# example: ruby evolution.rb
# ruby evolution.rb 100
# ruby evolution.rb 100 "I make the universe"

# We can set a goal(string) for evolution, and the Evolution object
# will evolute towards the goal you set.
class Evolution
attr_accessor :set

# The goal(string) can contain only upcase characters and space
# CHARSET in fact is gene pool
CHARSET = ('A'..'Z').to_a + [' ']
CHARSET_LENGTH = CHARSET.length

# goal: evolution goal is a string, like 'hello world'
# population: the population of the society. default to 100, means
# there're 100 parents in the initial environment. And
# the environment can only support 100 livings.
# mutation_rate: the possibility of gene mutation. default to 0.01,
# means a gene 'A' in one generation has 1/100 possibility
# to mutate to random gene in CHARSET
def initialize(goal,population=100,mutation_rate=0.01)
@goal = goal
@population = population
@mutation_rate = mutation_rate

# @set is the environment all livings live in
@set = []
@strlen = goal.length

# fill the environment with livings
population.times {|i|
str = ""
@strlen.times {|j| str << CHARSET[rand(CHARSET_LENGTH)] }
@set << str
}
end

# evolution function
# reproduce: how many generations should the evolution have
def run(reproduce=1000)
reproduce.times {|i| generation }
sort_and_cut(@set).each {|s| puts "#{s} : #{score s}" }
end

private

# one generation
def generation
score_set
pick
crossover
mutation
end

# give mutation chance to every living in our environment
def mutation
k = 1/@mutation_rate
for str in @set
str.length.times {|i|
str[i] = CHARSET[rand(CHARSET_LENGTH)] if rand(k) == 0
}
end
end

# choose two parent
# produce offspring
def crossover
set = @set.uniq

offsprings = []
if set.length == 1
offsprings = @set
else
(set.length-1).times {|i|
(i+1).upto(set.length-1) {|j|
pivot = rand(@strlen) + 1
par1_a, par1_b = set[i][0,pivot], set[i][pivot,@strlen]
par2_a, par2_b = set[j][0,pivot], set[j][pivot,@strlen]
offsprings << "#{par1_a}#{par2_b}"
offsprings << "#{par2_a}#{par1_b}"
}
}
end

@set = sort_and_cut(offsprings)
end

# pick the good candicates (high score)
# score 2 candicate will have high possiblity to be choosen than score
# 1 candicate
def pick
pool = []
@score_map.each {|str,score|
score.times {|i|
pool << str
}
}
pool.sort! {|a,b| rand(3) - 1} # shuffle
pool_len = pool.length

@set = []
@population.times {|i|
@set << pool[rand(pool_len)]
}
end

# compute score for every candicates
def score_set
@score_map = {}
for str in @set
@score_map[str] = score(str)
end
@score_map
end

# score will tell us the simliarity between the str and goal
# score = the same character on the correct position with goal
def score(str)
score = 0
@strlen.times {|i|
score += 1 if str[i] == @goal[i]
}
score
end

# sort livings by score and only the livings with highest score
# will left. They're the selection of nature.
def sort_and_cut(set)
set.sort_by {|s| -(score s)}[0,@population]
end

end

if __FILE__ == $0
times = ARGV[0] ? ARGV[0].to_i : 20
goal = ARGV[1] ? ARGV[1].upcase : 'HELLO WORLD'
e = Evolution.new(goal)
e.run times
end

Jan 14, 2009

Haskell Functor Typeclass

A correct functor instance should follow two rules:

fmap id == id
fmap (f . g) == fmap f . fmap g

In natural words, functor should keep data's structure, only change it's value. This rule can't be guranteed by compiler, so we have to remember it by ourselves when implement a functor instance.