Nov 26, 2009

CS Education

"Programming is usually taught by examples. Experience shows that the success of a programming course critically depends on the choice of these examples. Unfortunately, they are too often selected with the prime intent to demonstrate what a computer can do. Instead, a main criterion for selection should be their suitability to exhibit certain widely applicable techniques. Furthermore, examples of programs are commonly presented as finished "products" followed by explanations of their purpose and their linguistic details. But active programming consists of the design of new programs, rather than contemplation of old programs. As a consequence of these teaching methods, the student obtains the impression that programming consists mainly of mastering a language (with all the peculiarities and intricacies so abundant in modern PL's) and relying on one's intuition to somehow transform ideas into finished programs. Clearly, programming courses should teach methods of design and construction, and the selected examples should be such that a gradual development can be nicely demonstrated. "

- "Program Development by Stepwise Refinement", Niklaus Wirth, 1995

Nov 24, 2009

Browserless Web Development

note: "Web development" in this article doesn't include UI design/implementation, which means all (backend, database, html etc.) except css/javascript.

I just found a new measure on code/web-developer recently. A traditional web development loop may look like this:

  1. read new feature/story
  2. code
  3. try it in browser, if there's any problem, goto 2 (goto considered useful here)
  4. commit your code

An obvious problem here is, there's no room left for *automated* test. You may write *automated* test in step 2, but no one force you to do it. A better process (I think) should be like this:

  1. read new feature/story
  2. code
  3. write a piece of code to test code in step 2, if there's any problem, goto 2
  4. commit your code

So we change step 3 only, removed browser from our process. *Automated* test become an explicit step here. You can switch step 2/3 in the latter process, so you'll write test first in that case, that's not the point here so I left them unchanged. The point is if you don't have a browser at hand you'll be forced to test your code by writing code, which is automated, reusable and cool. You'll find you're working in TDD style naturally even you don't know what TDD is.

That's what I called Browserless Web Development. The less a web developer use browser to validate his work, the better he and his code are.

Oct 26, 2009

Software Engineering is ...

You know, Dijkstra is really awesome.

"Ours is the task to remember (and to remind) that, in what is now called “software engineering”, not a single sound engineering principle is involved. (On the contrary: its spokesmen take the trouble of arguing the irrelevance of the engineering principles known.) Software Engineering as it is today is just humbug; from an academic —i.e. scientific and educational— point of view it is a sham, a fraud."

"Universities are always faced with something of a problem when there is a marked discrepancy between what society asks for and what society needs. In our case the need is clear: the professional competence of the Mathematical Engineer, familiar with discrete systems design and knowing how to use formal techniques for preventing unmastered complexity from creeping in. But said war “out there” all but prevents this need from being perceived, and what our immediate industrial environment overwhelmingly seems to ask for is different brands of snake oil, Software Engineering, of course, being one of them. And as, with the recession lasting longer and longer, the external pressures on the Universities to do the wrong things only mount, it is only to be expected that part of the campus is going to be included in the battlefield."

"The task of the first-class University, however, is absolutely clear. Industry being its customer, consultancy must tell industry what it wants to hear; it is the task of the first-class University to tell industry what it does not want to hear, whereby it is the rôle of its scientific authority to ensure that the sting of the academic gadfly really hurts."

-- Edsger W. Dijkstra,


It's interesting just after I read Dijkstra's article, Joel Spolsky published a new blog post "Capstone projects and time management", to some extent on the opposite side of Dijkstra. Joel wrote lots with wisdom, but this new post just looks like an april fool's joke. A smart guy wrote a perfect answer to Joel already.

Oct 9, 2009

The Correct Refactor Flow

  1. Get assigned a task to implement a new feature.
  2. Refactor the code until that feature is as easy to add as possible.
  3. Add the feature.
  4. Submit.
Read this enlightening piece here.

update: I found this is indeed originated from Martin Fowler's amazing book "Refactoring", which is filled with ideas originated from practices of smalltalk. It's a shame I didn't read that book earlier :-( "Refactoring" (or "Refactoring: Ruby Edition") is a must read.

Sep 10, 2009

Notes on Alan Kay's "The Early History of Smalltalk"

"In computer terms, Smalltalk is a recursion on the notion of computer itself. Instead of dividing "computer stuff" into things each less strong than the whole--like data structures, procedures, and functions which are the usual paraphernalia of programming languages--each Smalltalk object is a recursion on the entire possibilities of the computer. Thus its semantics are a bit like having thousands and thousands of computer all hooked together by a very fast network." (I never think OO in this way before I read this, it changes my view of programming, made so many design pattern look naturally. This is a whole different way to explain why recursion is the root of computer (you know the original way is by lambda calculus))

"Programming languages can be categorized in a number of ways: imperative, applicative, logic-based, problem-oriented, etc. But they all seem to be either an "agglutination of features" or a "crystallization of style." COBOL, PL/1, Ada, etc., belong to the first kind; LISP, APL-- and Smalltalk--are the second kind. It is probably not an accident that the agglutinative languages all seem to have been instigated by committees, and the crystallization languages by a single person." (Very interesting observation. It seems single-person languages are more popular today)

"I could hardly believe how beautiful and wonderful the idea of LISP was. I say it this way because LISP had not only been around enough to get some honest barnacles, but worse, there wee deep falws in its logical foundations. By this, I mean that the pure language was supposed to be based on functions, but its most important components---such as lambda expressions quotes, and conds--where not functions at all, and insted ere called special forms. Landin and others had been able to get quotes and cons in terms of lambda by tricks that were variously clever and useful, but the flaw remained in the jewel. In the practical language things were better. There were not just EXPRs (which evaluated their arguments0, but FEXPRs (which did not). My next questions was, why on earth call it a functional language? Why not just base everuything on FEXPRs and force evaluation on the receiving side when needed?" (I like Alan Kay's criticism because what he wanted LISP to be looks exactly like Haskell :) He used an interesting description for LISP: 'surface beauty'. My opinion is LISP is still great, but not in practical sense now. All gurus suggest learning lisp but only for 'think different', not for using it in daily work. Alan Kay tell an incident later and said: "Watching a famous guy much smarter than I struggle for more than 30 minutes to not quite solve the problem his way (there was a bug) made quite an impression. It brought home to me once again that "point of view is worth 80 IQ points." I wasn't smarter but I had a much better internal thinking tool to amplify my abilities." )

"I didn't like meetings: didn't believe brainstorming could substitute for cool sustained thought."

"The actual beauty of LISP came more from the promise of its metastrcutures than its actual model. I spent a fair amount of time thinking about how objects could be characterized as universal computers without having to have any exceptions in the central metaphor. What seemed to be needed was complete control over what was passed in a message send; in particular when and in what environment did expressions get evaluted? "

"A simple and small system that can do interesing things also needs a "high slope"--that is a good match between the degree of interestingness and the level of complexity needed to express it. "

"The latter was deemed to be hard and would be handled by the usual method for hard problems, namely, give them to grad students. "

"Of course, the whole idea of Smalltalk (and OOP in general) is to define everything intensionally. "

"Perhaps the most important principle--again derived from operating system architectures--is that when you give someone a structure, rarely do you want them to have unlimited priviledges with it. Just doing type-matching isn't even close to what's needed. Nor is it terribly useful to have some objects protected and others not. Make them all first class citizens and protect all. "

"... this led to a 90 degree rotation of the purposed of the user interface from"access to functionality" to "environment in which users learn by doing."" (This is how overlapping windows user interface came out. I think this is also a proof why a programmer should work with keyboard+tilling windown manager: overlapping windows system, and in fact all GUI, is for end users who 'learn by doing'. Programmers are professionals, they should have already learnd their tools before doing anything, they shouldn't learn by doing. I never found a great programmer who mainly use mouse/GUI.)

"By now it was already 1979, and we found ourselves doing one of our many demos, but this time for a very interested audience: Steve Jobs, JeffRaskin, and other technical people from Apple. They had started a project called Lisa but weren't quite sure what it shouldbe like, until Jeff said to Steve, "You should really come over to PARC and see what they ae doing." Thus, more than eight years after overlapping windows had been invented and more than six years after the ALTO started running, the people who could really do something about the ideas, finally to to see them. The machine used was the Dorado, a very fast "big brother" of the ALTO, whose Smalltalk microcode had been largely written by Bruce Horn, one of our original "Smalltalk kids" who was still only a teen-ager. Larry Tesler gave the main part of the demo with Dan sitting in the copilot's chair and Adele and I watched from the rear. One of the best parts of the demo was when Steve Jobs said he didn't like the blt-style scrolling we were using and asked if we cold do it in a smooth continuous style. In less than a minute Dan found the methods involved, made the (relatively major) changes and scrolling was now continuous! This shocked the visitors, espeicially the programmers among them, as they had never seen a really powerful incremental system before. Steve tried to get and/or buy the technology from Xerox (which was one of Apple's minority venture captialists), but Xerox would neither part with it nor would come up with the resources to continue to develop it in house by funding a better NoteTaker cum Smalltalk. " (How stupid Xerox is. In fact Alan Kay has predicted personal computing 30 years ago and reported his vision to Xerox many times. But Xerox just ignored those reports. If Xerox took Alan's work/suggestions seriously, it's very likely to be Microsoft+Apple today, not merely a laser printer company.)

"One way to think about progress in software is that a lot of it has been about finding ways to late-bind"

"Hardware is really just software crystallized early. It is there to make program schemes run as efficiently as possible. But far too often the hardware has been presented as a given and it is up to software designers to make it appear reasonable. This has caused low-level techniques and excessive optimization to hold back progress in program design. ... In short, most hardware designs today are just re-optimizations of moribund architectures. "

"Objects made on different machines and with different languages should be able to talk to each other--and will have-to in the future."

"I think the enormous commercialization of personal computering has smothered much of the kind of work that used to go on in universities and research labs, by sucking the talented kids towards practical applications. With companies so risk-adverse towards doing their own HW, and the HW companies betraying no real understanding of SW, the result has been a great step backwards in most respects. "

Ref. "The Early History of Smalltalk"

Sep 2, 2009

Cross-VM Attack on EC2

Researchers from UCSD and MIT published a paper which shows vulnerability of cloud computing: cross-vm attack. With this technique a malicious user can run a new ec2 instance on the same physical machine as target vm instance, and exploit information leakage of target vm.

Read it:

Jul 29, 2009

Dunning-Kruger effect

"The Dunning-Kruger effect is an example of cognitive bias in which '...people reach erroneous conclusions and make unfortunate choices but their incompetence robs them of the metacognitive ability to realize it'."

"They therefore suffer an illusory superiority, rating their own ability as above average. This leads to a perverse result where people with less competence will rate their ability more highly than people with relatively more competence."

"It also explains why competence may weaken the projection of confidence because competent individuals falsely assume others are of equivalent understanding 'Thus, the miscalibration of the incompetent stems from an error about the self, whereas the miscalibration of the highly competent stems from an error about others.'"

see Wikipedia

Jul 27, 2009

Ubiquitous Monad

It seems I had an 'aha!' time with monad today, finally. It seems monad is everywhere.

In a functional world with currying, we can think all functions are in the type x -> y. We can divide those functions into two category:

1. a -> a, these are functions who have same input and output type
2. a -> b, there are the functions who have different input and output type

Functions in category one can work with functions have the same type as them easily, suppose you have f::Int->Int and g::Int->Int, you can combine them as you wish, like f.f.f.g.g.g.f.g.g.f. This is why people like functional programming.

But this is not true for functions in 2rd category. Suppose you have f::Int->Float and g::Int->Float, how would you combine them? You can do neither f.g nor g.f, the types just don't match. As you can feel, there's much more functions in 2rd category than those in 1st category in real world. So monad comes to rescue.

Monad helps functions in 2rd category behave like those ones in 1st category - it can 'lift' a 2rd category function to 1st category, with one of its core functions named bind:

bind :: (a -> b) -> (b -> b)

In haskell b is a Monad. If you have read a tutorial take Maybe monad as example, you may have an intuition that monad is a 'wrapper' which wrap something. That's not exactly. The key here is to define a way to convert a value of type a to a value to type b, and vice vesa. Wrap a value is an easy and intuitive way to do the conversion, but not the only way (e.g. List Monad is a good example). So you can think everything as Monad, because type a -> b function is everywhere. Yes Float is monad, because there is a function in type Int -> Float and you can define a bind in type (Int -> Float) -> (Float -> Float).

Jun 15, 2009

[ANN] Rubytest.vim 0.9.6 Released

Rubytest.vim 0.9.6 is just released. This version contains some small fix:

* support rspec examples looks like
example "this is an example" do
* correctly handle single/double quote escape for rspec examples and vanilla testcases

Check it out here:

* Rubytest.vim is a vim plugin which helps you run ruby tests (including vanilla testcases, rspec examples, shoulda examples ..) quickly.

Jun 2, 2009

Infinity in Ruby

I learned this from a post today:

irb(main):001:0> Infinity = 1/0.0
=> Infinity
irb(main):002:0> (0..Infinity).include?(100000000000000)
=> true
irb(main):003:0> (0..Infinity).include?(-1)
=> false
irb(main):004:0> (-Infinity..0).include?(-1)
=> true
irb(main):005:0> (-Infinity..0).include?(-100000000000000000000)
=> true
irb(main):006:0> (-Infinity..0).include?(1)
=> false
irb(main):007:0> everything = -Infinity..Infinity
=> -Infinity..Infinity
irb(main):008:0> everything.include? 0
=> true

But ruby is not like haskell, which eval lazily - so don't try everything.to_a[0..100] :)

May 12, 2009

[ANN] Rubytest.vim 0.9.5 Released

Rubytest.vim is a vim ( plugin, which helps you to run ruby test (including vanilla test, rspec, shoulda etc.) in vim.


* Support quickfix: you can view test errors in quickfix window now
* Small fixes.

Get it here:

May 4, 2009

The problem with young entrepreneurs

"Most of the time, this leads to the well-known case of “solutions looking for problems” - beautiful technology that can’t become a profitable business.

Best ideas are a side-effect from solving significant problems that the entrepreneurs themselves experience, observe and intimately understand. As young Web entrepreneurs, we aren’t sufficiently aware of important real-world problems, since our life mostly consists of hacking, coffee and occasional entertainment."


Apr 30, 2009

Something about latest vimperator

Latest vimperator release doesn't work well with Firefox 3.0.x and TabMixPlus: slow autocompletion, broken tab functions.

M.Terada provides two patch to make tabs work decently again, I don't know whehter it's included in vimperator's repository now, but you can download them here and here

For faster auto completion, set complete and preload options like this in your .vimperatorrc may help:

set complete=sbh
set nopreload

Some guys suggest setting wildmode to empty, but I don't like that idea.

Below is mine full .vimperatorrc:

set pageinfo=gfm
set showtabline=2
set defsearch=google
set complete=sbh
set nopreload
set showstatuslinks=2
set smartcase
set newtab=all
"set wildmode=

map l gt
map h gT
map b :bmarks!<Space>
map ;; d
map s :js open("mailto:?SUBJECT='" + escape(document.title) + "'" + "&BODY=" + escape(document.getElementById("urlbar").value))<CR>

map <C-p> :pa<CR>
map <C-k> tg<Space>
map <C-u> <C-v><C-u>
map <C-y> <C-v><C-y>
map <C-R> a<Space>-tags=toread<CR>
map <C-r> :bmarks -tags=toread<CR>
map <C-d> :delbm<CR>
"map <C-l> :sidebar LiveHTTPHeaders<CR>

map <silent> <F9> :js inspectDOMDocument(document)<CR>
map <silent> <F1> :js toggle_element('toolbar-menubar');toggle_element('nav-bar')<CR>
map <silent> <F2> :emenu Edit.Preferences<CR>
map <silent> <F3> :emenu Tools.Live HTTP headers<CR>
map <silent> <F10> :exe ":o dict2 "+content.getSelection()<CR>

autocmd PageLoad .* :js modes.passAllKeys = /(mail\.google\.com)|(google\.com\/reader)/.test(buffer.URL)

set nextpattern+=^\s*下一页\s*$
set previouspattern+=^\s*上一页\s*$

javascript <<EOF
var feedPanel = document.createElement("statusbarpanel");
feedPanel.setAttribute("id", "feed-panel-clone");
feedPanel.firstChild.setAttribute("style", "padding: 0; max-height: 16px;");
.insertBefore(feedPanel, document.getElementById("security-button"));

javascript << EOF
toggle_element = function (name) {
document.getElementById(name).collapsed ^= 1;

echo ".vimperatorrc sourced"

" vim: ft=vimperator sw=2 sts=2

Apr 19, 2009

Announcement of rubytest.vim: a vim plugin aims to help you run ruby test conveniently

Rubytest.vim is a vim ( plugin, which helps you to run ruby test (including vanilla test, rspec, shoulda etc.) in vim.


Copy all files to your ~/.vim directory.


After installation, press t will run the test under your cursor if you are editing a ruby test file.


$ cd
$ vim test/unit/user_test.rb
(move cursor into a test case, press t)

( is mapping to '\' by default in vim)

You can customize the command which will be used to run the test case by settting these options in your vimrc file:

let g:rubytest_cmd_test = "ruby %p"
let g:rubytest_cmd_testcase = "ruby %p -n '/%c/'"
let g:rubytest_cmd_spec = "spec -f specdoc %p"
let g:rubytest_cmd_example = "spec -f specdoc %p -e '%c'"

(%p will be replaced by the path of test file, %c will be replaced by the name of test case under cursor)

Default Key Bindings

t: run test case under cursor
T: run all tests in file

You can change default key bindings:

map \ RubyTestRun " change from t to \
map ] RubyFileRun " change from T to ]

Mar 27, 2009

Bidirectional many-to-many relationship in ActiveRecord

Bidirectional many-to-many relationship is a common pattern in design, like friendships between users, memberships between users and groups, etc. In this article I'll illustrate how to implement such a relationship in ActiveRecord/Rails.

Let's start with a little context. Suppose we want to add the populer 'friends' feature for our users, we already have User model, what we need is a join table 'friendships' to connect users.

class User < ActiveRecord::Base

class CreateFriendships < ActiveRecord::Migration
def self.up
create_table :friendships, :id => false do |t|
t.integer :left_user_id
t.integer :right_user_id

def self.down
drop_table :friendships

The behavior we want, can be written like this:

test "should has many friends" do
users(:quentin).friends << users(:aaron)
assert_equal 1, users(:quentin).friends.reload.count

Of course the test is failed when we run it now. After reading our requirements we decide to use has_and_belongs_to_many (habtm) in ActiveRecord because it is enough for now, so we modify our User model like this:

class User < ActiveRecord::Base
has_and_belongs_to_many :friends, :join_table => 'friendships',
:foreign_key => 'left_user_id', :association_foreign_key => 'right_user_id',
:class_name => 'User'

Now run our test, passed, perfect! We can leave office and enjoy *fill whatever you like* now.

"Wait", your brilliant colleague says, and add one line to your unit test:

test "should has many friends" do
users(:quentin).friends << users(:aaron)
assert_equal 1, users(:quentin).friends.reload.count
assert_equal 1, users(:aaron).friends.reload.count

He runs the test again and it failed. It's unfair if quentin treat aaron as his friend but aaron doesn't do the same to quentin, isn't it? To fix this problem we need to custom insert and delete sql for the habtm relationship:

class User < ActiveRecord::Base
has_and_belongs_to_many :friends, :join_table => 'friendships',
:foreign_key => 'left_user_id', :association_foreign_key => 'right_user_id',
:class_name => 'User',
:insert_sql => 'insert into friendships (`left_user_id`, `right_user_id`) values
(#{id}, #{}), (#{}, #{id})',
:delete_sql => 'delete from friendships where (left_user_id = #{id} and right_user_id
= #{}) or (left_user_id = #{} and right_user_id = #{id})'

What we do here, is to add two friendship records (both direction) when we add a user to another's friends set. We do the same when delete a user from one's friends set. We rerun the test and it passes as we expected.

Note the ":id => false" argument when we create the join table 'friendships', without it you'll have troubles when loading friends objects. I think this is a long history bug of ActiveRecord, I don't know why it is not fixed. If you really want to keep the 'id' field and use habtm at the same time, a workaround is customize the finder_sql:

class User < ActiveRecord::Base
has_and_belongs_to_many :friends, ...
:finder_sql => 'select users.* from friendships left outer join users on
friendships.right_user_id = where friendships.left_user_id = #{id}'

The bidirectional habtm we build here, is in fact a self-referential bidirectional relationship, it relates models to the same type of models (users to users). What will happen if the relationship is NOT self referential?

In that case we should not use habtm, otherwise our (at least my) brain will be burned by the complicated sqls it brings. Don't waste time on those sqls, ActiveRecord provides another way to build many-to-many relationship: has_many :through

So one day the boss comes to you and asks, "Can we group our friends? I want to put Gates in group Evil and Male, and Linus in group Minix and Antarctic"

"Sure", you answered, how can you say No to your boss?

Now the many-to-many relationship is between groups and users, we need to modify User model, create a Group model and write a migration to modify friendships table. Since we'll use has_many :through for this requirement, we also need a model for the join table 'friendships':

class User < ActiveRecord::Base
has_many :groups

class CreateGroups < ActiveRecord::Migration
def self.up
create_table :groups do |t|
t.string :name
t.integer :user_id

def self.down
drop_table :groups

class Group < ActiveRecord::Base
belongs_to :user
has_many :friendships
has_many :friends, :through => :friendships

class RemodelFriendships < ActiveRecord::Migration
def self.up
remove_column :friendships, :left_user_id
rename_column :friendships, :right_user_id, :friend_id
change_column :friendships, :group_id, :integer, :null => false

def self.down
change_column :friendships, :group_id, :integer, :null => true
rename_column :friendships, :friend_id, :right_user_id
add_column :friendships, :left_user_id
# execute "..."

class Friendship < ActiveRecord::Base
belongs_to :group
belongs_to :friend, :class_name => 'User'
validates_uniqueness_of :friend_id, :scope => :group_id

Pretty good, it works. But we have the same problem when we use habtm, the relationship is not bidirectional. How can we fix it?

ActiveRecord allow you to extend the association proxy when use has_many :through, the solution here is to override the proxy's default CRUD methods. Let's create a new file lib/bidirection.rb:

module Bidirection
def <<(friend)
Friendship.create :group_id =>, :friend_id =>

[:delete, :destroy].each {|m|
define_method(m) {|friends|
friends = [friends] unless friends.instance_of?(Array)
friends.each {|friend|
Friendship.first(:conditions => {:group_id =>, :friend_id =>}).try m


Then enhance our has_many :through with it:

class Group < ActiveRecord::Base
has_many :friends, :through => :friendships, :extend => Bidirection

That's it, you have bidirectional many-to-many relationship between groups and users now. For convinient you can create a default group when create user (use before_create hook), and delegate friends and friends= methods in User model to user's default group, in that way you get the self-referential bidirectional many-to-many relationship betweent users back, as we have when using habtm.

Mar 19, 2009

ActiveRecord and DataMapper

I jumped into a merb project about 1 month ago, it's really an interesting trip. Working in both frameworks make pros and cons crystal clear to see. Here I'll try to remember some of those I found on the different ORM they use, thus ActiveRecord vs DataMapper.

* Schema Definition

ActiveRecord keep all schema definitions in migration files, while DataMapper store most of them in model source code. I prefer DataMapper's way, so you won't need plugins like annotate_models any more. Do you use annotate_models in your rails project?

However, keep model schema in model source code brings more complexity, because sometimes you have tasks which are better written in a migration file instead of model source file. So DataMapper has seperate migrations too, and merb provides more migration rake tasks than rails.

* Many-to-Many Relationships Through Scoped Join Model

Sorry for my poor description, this ticket illustrates the problem well. I don't know why this bug won't be fixed in ActiveRecord - below code works well in DataMapper.

has n, :bookmarks, :through => :subscriptions, :conditions => "subscriptions.notification = 'f'"

* scope

ActiveRecord introduced named_scope in since 2.x (can't remember), and add dynamic scope (scope_by_xxx like dynamic finder find_by_xxx) in the latest 2.3 release, so now you can chain up many dynamic scopes, named scope and find now. That's really cool, but why we need two seperate finders, find and scope?

In DataMapper, "find" and "scope" is unified. You can do this

class Zoo
# all the keys and property setup here
all(:open => true)
def self.big
all(:animal_count.gte => 1000)

big_open_zoos =

As you can see, the finder #all is able to be chained up or used independently, smart. And I like tye syntax all(:open => true) than find_all_by_open(true): hash can be auto complete by editors while '_' can not, hash param doesn't require dynamic tricks while find_xxx need. Sometimes I feel rails takes too many cares to his baby programmers, sometimes I feel merb takes too little (not in this case, of course).

* Inheritance vs Include

ActiveRecord model is required to inherit from ActiveRecord::Base, while DataMapper model need to include DataMapper::Resource. Composition is better than inheritance, though module inclusion is in fact inheritance in Ruby.

* Finally

I didn't realize all the things I written down are votes for DataMapper until I write this sentence .. I want to say ActiveRecord is an excellent ORM framework too, at least it's more mature than DataMapper, I enjoy it at most of the time, it just doesn't work in some few case (I really hope the many-to-many relationship through scope join model problem can be fixed, it's a common pattern I've seen it serveral times). The good news is by Rails and Merb merge, rails will become super flexible and you'll be able to switch between ORMs in Rails 3 easily.

Mar 11, 2009

The problem is

"As is often the case when proving things about programming languages, the tricky part here is formulating a precise statement to be proved—the proof itself should be straightforward." - TAPL, pierce

In schools students are always given clearly defined, already formalized questions, I think that's why the students feel uncomfortable when they come to the real world. I should spend more time on the mathematical modeling class when I was in university :(

Feb 24, 2009

Use rails.vim and merb.vim together

I recently works on a merb project. As a vim user I found merb.vim for syntax highlighting, but it will conflict with rails.vim. For example, *before_filter* is a declaration in Rails controller, but in Merb it should *before*. After you installed merb.vim, it will set the file type of your controller file to ruby.merb_controller, so if you open a controller file in a rails project, the 'before_filter' will be highlighten as an error.

The solution is to find a way to have directory/project specific setting for vim. Luckily it already exists.

In you ~/.vimrc:

set exrc

Then move ~/.vim/ftdetect/merb.vim (this file is installed by merb.vim plugin) to $your_merb_project/.vimrc. Now the filetype setting for merb will only affects the merb project dir only.

Feb 16, 2009

Rules a Monad should follow

I. return x >>= f === f x

II. x >>= return === x

III. m >>= (\x -> f x >>= g) === (m >>= f) >>= g

Feb 7, 2009


There's an interesting post in ruby mailing list today. The discussion started by a question asked by Jonathan Wills:
a = 1
puts a

This will print out '2', as I want. However if I remove the first line,

puts a

r.rb:2:in `
': undefined local variable or method `a' for
main:Object (NameError)

Then Mike Gold gives an excellent explaination:
The thing to remember is that local variables are always determined at parse time.

eval("a = 3", binding)
p local_variables #=> ["a"]
p a #=> undefined local variable or method `a'

We see that the local exists, but because the parser has not seen the "a = ..." syntax, we can't access it.

Locals are not handled in the same way. The local assignment syntax is subject to special examination by the parser. This does not happen for instance variables and globals. Again, it is the parser that determines locals. If all assignments were the same, this would print '3':

eval("a = 3", binding)
puts a

Pickaxe gives this example:

def a
print "Function 'a' called\n"

for i in 1..2
if i == 2
print "a=", a, "\n"
a = 1
print "a=", a, "\n"


Function 'a' called

But constants and instance variables are different. Which means

@a # => 2

Another interested thing Mike metioned is, when you see this error:

undefined local variable or method `a' for main:Object (NameError)

It in fact means 'undefined method a':
... that error message comes from rb_method_missing(), i.e., it was determined to be a method call. Presumably it is worded that way for the user's benefit, in case there was a misspelling or some such. Locals are determined by the parser; method lookup is done when no such local exists.

Feb 4, 2009

Notes of Guy Steele's Lisp Processor Prototype

The prototype in paper is different from the well known 'Lisp Machine' which in fact has a hybrid (traditional linear vector memory model + linked list model) architecture (e.g. based on TTL logic, with large ALU) while the prototype is totally build on linked list model (with embedded instruction set and almostly no ALU). The history of this project is a very interesting story :) For youth like me, it's hard to imagine that one obstacle of building such a processor is ALU is too large at that time ..

And this paper make me think it's not fair to say modern GC (garbage collection) is an innovation from Lisp (you know some Lisp guys always like to say 'hey, XXX has already be seen in Lisp 30 years ago). As you can read from the paper, GC is an inevitable workaround to fill the gap between functional programming (which will produce lots of immutable intermedia) and von Neumann machine (finite memory), it's not invented to free programmers from C++'s malloc/free hell like Java (which is an imperative language). Like Watt found water can be heated into steam to drive machine is of course an innovation, but use water to generate electricity is a totally diffierent thing.

Most of the contents below are copied from GLS & GJS's paper "Design of LISP-Based Processors", March 1979.

== Notes of the paper ==

An idea which has increasingly gained attention is that computer architectures should reflect specific language structures to be supported.

Stored-program computer is such one that the program and the data reside in the same memory, the program is itself data which can be manipulated as any other data by the processor. Lisp is unusual among high-level languages in that it explicitly supports the stored-program idea, for this reason Lisp is often referred to as a 'high-level machine language'.

One of the central ideas of the Lisp language is that storage management should be completely invisible to the programmer. Lisp is an object-oriented language, rather than a value-oriented language. The Lisp programmer does not think of variables as the objects of interest, bins in which values can be held. Instead, each data item is itself an object, which can be examined and modified, and which has an identity independent of the variable used to name it.

A complete Lisp system is conveniently divided into two parts: (1) a storage system, which provides an operator for the creation of new data objects and also other operators (such as pointer traversal) on those objects; and (2) a program interpreter (EVAL), which executes programs expressed as data structures within the storage system. (Note that this memory/processor division characterizes the usual von Neumann architecture also. The differences occur in the nature of the processor and the memory system.)

Commercially available memories are available only in finite sizes. Lisp (or functional language)'s free and wasteful throw-away use of data objects would be a disaster with finite memory. In order to make such memories useable to Lisp (or functional language) we must interpose between EVAL and the storage system a storage manager which makes a finite vector memory appear to the evaluation mechanism to be an infinite linked-record memory. The memory is "apparently infinite" in the sense that an indefinitely large number of new records can be "created" using the CONS operator. The storage manager recycles discarded records in order to create new ones in a manner completely invisible to the evaluator.

The storage manager therefore consists of routines which implement the operations CAR, CDR, CONS, etc. in terms of the vector memory, plus a garbage collector which deals with the finiteness of the memory by locating records which have been discarded and making them available to the CONS routine for recycling.

In Guy Steele's Lisp processor prototype, there is no garbage collector for some reason. And there's even no ALU! (In fact it have simplest arithmetic and logical capabilities, which can only add 1 and test for 0) This is interesting because Lisp itself is so simple that the interpreter needs no arithmetic to run interesting programs (such as computing symbolic derivatives and integrals, or pattern matching).

This is not to say that real Lisp programs do not need arithmetic, just that the Lisp interpreter itself does not require binary arithmetic of the usual sort (but it does require CONS, CAR and CDR, which in a formal sense indeed form a kind of "number system", where CONS corresponds to "add 1" and both CAR and CDR to "substract 1". In this view, the purpose of the storage manager is to interface between two kinds of arithmetic, namely "Lisp arithmetic" and Peano arithmetic).

Jan 28, 2009

Understanding Git Concepts

Git in fact is a file system with history.

All data is saved in git objects. All git objects has a 40bits id which generated by SHA-1 hashing the object's content. There're 4 types of objects:
  • blob object: File contents is saved in blob object. No filenames/permissions etc, only contents is saved here.
  • tree object: Directory structure is saved here. Tree object's content is just a list of its children, either blob object or tree object. A list item will contain either a SHA-1 hash point to a blob object with filename/permissions/etc. or a hash point to a tree object. Here we have got a data structure(tree) which can represent a file system.
  • commit object: Now we need history. A commit object simple contains a pointer to tree, one or many pointer to parents(also commits) and some booking data like commiter. Commit objects in fact forms a tree graph on a higher layer of blob/tree.
  • tag object: Tag object is just for referencing an object conviniently. A tag object can have a pointer point to any other git object and a tag, then you can use the tag to reference any object(like an important commit) in your git repo.
A git object is immutable. Another concept in git system is Reference, for referencing mutable things like branch and remote.
  • Branch is just a file in .git/refs/heads/ dir contains the SHA-1 hash of the most recent commit to that branch. When you create a branch in git, git just create a file contains a 40 bytes hash in .git/refs/heads/, and update .git/HEAD to point to it. With your development moves on, git will find current branch in HEAD and update the branch file in refs/heads correctly.
  • Remote is a pointer to branch(so it's also a branch) in other people's copies of the same repo. If you get the code by clone instead of 'git init', git will add a default 'origin/master' remote branch for you automatically. 'origin' point to the remote copy location, and 'master' means which branch on remote you cloned from.
When you ask for checking out, git will lookup the argument you provided in .git/refs or .git/HEAD, find the corresponding object/branch/tag/whatever, read the SHA-1 hash which points to a tree from its content, then traverse the tree.

A fetch will merge all updates on a remote branch to your local. By default it will merge in changes on origin/master, but you can fetch updates on other place like origin/cool. After a series of fetch/merge your history graph will looks like a mess, rebase will help. Rebase will leave orphan objects in your repo(you can use 'git gc' to clean it) and should not be used on a repo which can be fetched by others.

Jan 25, 2009

Evolutionary algorithm example in Ruby

#!/usr/bin/ruby -w

# This program is based on this evolutionar computation introduction:
# usage: ruby evolution.rb [generations] [goal]
# example: ruby evolution.rb
# ruby evolution.rb 100
# ruby evolution.rb 100 "I make the universe"

# We can set a goal(string) for evolution, and the Evolution object
# will evolute towards the goal you set.
class Evolution
attr_accessor :set

# The goal(string) can contain only upcase characters and space
# CHARSET in fact is gene pool
CHARSET = ('A'..'Z').to_a + [' ']

# goal: evolution goal is a string, like 'hello world'
# population: the population of the society. default to 100, means
# there're 100 parents in the initial environment. And
# the environment can only support 100 livings.
# mutation_rate: the possibility of gene mutation. default to 0.01,
# means a gene 'A' in one generation has 1/100 possibility
# to mutate to random gene in CHARSET
def initialize(goal,population=100,mutation_rate=0.01)
@goal = goal
@population = population
@mutation_rate = mutation_rate

# @set is the environment all livings live in
@set = []
@strlen = goal.length

# fill the environment with livings
population.times {|i|
str = ""
@strlen.times {|j| str << CHARSET[rand(CHARSET_LENGTH)] }
@set << str

# evolution function
# reproduce: how many generations should the evolution have
def run(reproduce=1000)
reproduce.times {|i| generation }
sort_and_cut(@set).each {|s| puts "#{s} : #{score s}" }


# one generation
def generation

# give mutation chance to every living in our environment
def mutation
k = 1/@mutation_rate
for str in @set
str.length.times {|i|
str[i] = CHARSET[rand(CHARSET_LENGTH)] if rand(k) == 0

# choose two parent
# produce offspring
def crossover
set = @set.uniq

offsprings = []
if set.length == 1
offsprings = @set
(set.length-1).times {|i|
(i+1).upto(set.length-1) {|j|
pivot = rand(@strlen) + 1
par1_a, par1_b = set[i][0,pivot], set[i][pivot,@strlen]
par2_a, par2_b = set[j][0,pivot], set[j][pivot,@strlen]
offsprings << "#{par1_a}#{par2_b}"
offsprings << "#{par2_a}#{par1_b}"

@set = sort_and_cut(offsprings)

# pick the good candicates (high score)
# score 2 candicate will have high possiblity to be choosen than score
# 1 candicate
def pick
pool = []
@score_map.each {|str,score|
score.times {|i|
pool << str
pool.sort! {|a,b| rand(3) - 1} # shuffle
pool_len = pool.length

@set = []
@population.times {|i|
@set << pool[rand(pool_len)]

# compute score for every candicates
def score_set
@score_map = {}
for str in @set
@score_map[str] = score(str)

# score will tell us the simliarity between the str and goal
# score = the same character on the correct position with goal
def score(str)
score = 0
@strlen.times {|i|
score += 1 if str[i] == @goal[i]

# sort livings by score and only the livings with highest score
# will left. They're the selection of nature.
def sort_and_cut(set)
set.sort_by {|s| -(score s)}[0,@population]


if __FILE__ == $0
times = ARGV[0] ? ARGV[0].to_i : 20
goal = ARGV[1] ? ARGV[1].upcase : 'HELLO WORLD'
e = times

Jan 14, 2009

Haskell Functor Typeclass

A correct functor instance should follow two rules:

fmap id == id
fmap (f . g) == fmap f . fmap g

In natural words, functor should keep data's structure, only change it's value. This rule can't be guranteed by compiler, so we have to remember it by ourselves when implement a functor instance.

Jan 4, 2009

Functional interpreter and program modularity

The purpose of programming tech evolution is modularity. To get modularity, people need abstractions. What's the differences between abstraction and type-saving sugars? A good package of common patterns (abstraction) encapsulates a higher level concept which has meaning *independent* of its implementation.

A package is most useful if its behavior is independent of the context(environment/global resource/etc.) of its use. Such a package is called referentially transparent. In other words, the output(behavior) of a referentially transparent package(function) will always be the same with the same input, because the explictly provided input data is the only source it depends. (What does 'same' mean here is an interesting problem, we'll see later)

To make a modular system, it is often necessary to think of a computational process as having state. In such cases, if the state can be naturally divided into independent parts, an important decomposition may be the division of the program into pieces which separately deal with the part of the state.

Referential transparency will permit programs to be divided into parts so that each part can be separately specified without a description of its implementation. The desirable result is that pieces can be separately written and debugged. At first people made no free variable recusive interpreter[1] with seperated variable bindings (Environment) and procedure bindings. It's expressive power is very litmited.

Seperate Env and Procedure symbol tables will make procedure 2nd-class concept (thus you can't define a 'map' function in the interpreter), but merge these two table unintentionally bring in two property: free variables (in fact, before merge, procedure symbols are free variables, but they're not real 'variables' at that time) and dynamic scope variables.

To avoid function name conflicts, it would be nice to have a notation for functions as objects, or rather a way to write an sexp in code that would evaluate to a procedure. Lisp adapted such a notation from lambda calculus of Alonzo Church.

But lambda plus dynamic scoping will lead to the famous 'FUNARG' problem, so we need lexical scoping. What we want is when computing a lambda, use the environment in which it's evaluated instead of the environment in which it's executed. The solution is simply save the environment in the procedure object when evaluate a lambda (or function, they're the same thing, function is just a lambda with a name in symbol table). Within this change, we say that the procedure is closed in the current environment, and the procedure object is therefore called a *closure* of the procedure, or a *closed procedure*.

The problem of lexical scope is in the REPL: the new definition can only refer to previously defined names! We lose the ablity to define recursive procedure. This conflict, between REPL and lexical scope, is unavoidable, because such a incremental interactive top level loop for reading definitions inherently constitutes a violation of referential transparency, which we successfully got in our interpreter. A piece of code can be read in which refers to an as yet undefined identifier (the name of a procedure), and then later a definition for that identifier read in (thereby altering the meaning of the reference). If we insist on maintaining absolute referential transparency, we are forced to eliminate the incremental top level interaction, to give up interactive debugging (we can't redefine erroneous procedures easily), to give up incremental compilation of separate modules.

If we throw lexical scoping away and turn back to dynamic scoping we would lose a great deal of referential transparency and abstractive power. The solution can be a mixture: procedures must not be allowed to refer to variables internal to other procedures, but only to top-level variables existing at the time they are called. Therefore only the future top-level environment is to be included in the procedure object when it is eventually constructed. In this way free variable references will be dynamic only with respect to the top-level environment.

At this stage, we made our functions really referencial transparency, with no side effect. No side effect means no state, no state means you have to pass states up and down (as functions input and output) through the whole system. So no side effects conflicts with modular discipline. We are forced to introduce side effects as a technique for constructing modular systems. But side effects violate referential transparency, now we have two techniques for achieving modularity have come into direct conflict.

The concept of side effect is induced by particular choices of boundaries between parts of a larger system. If a system boundary encloses all processes of interest (the system is closed), we need no concept of side effect to describe that system as a whole in vacuo. If we wish to make an abstraction by dividing the system into modules more than one of which has independent state, then we have by this action created the concept of side effect.

The concept of side effect is inseparable from the notion of equality/identity/sameness. The only way one can observationally determine that a side effect has occurred is when the same object behaves in two different ways at different times. Conversely, the only way one can determine that two objects are the same is to perform a side effect on one and look for an appropriate change in the behavior of the other.

if CONS return new object on every call, then it has side effect! Because with the same input, it will generate different output (a totally new object).

If side effect are to be usable at all, the references to things denoted by variables must not make copies of those things. If the user is to be able to write procedures which produce lasting side effects on their arguments, then there must be a variable binding mechanism which does not make copies.

The ideal equality predicate should follow these two rules:

1). Two objects which are observed to behave differently must not be equal.
2). Conversely, we would like two objects which are adjudged unequal to exhibit differing behaviors under suitable circumstances.

Any useful equality predicate must satisfy 1), but it's hard to satisfy 2). (Another interesting view is, equality predicate should never be false-positive, but may be false-negative)

Based on above two rules: in the absence of RPLACA ("pure lisp"), EQUAL is preffered to EQ (like (==) in haskell); in the presence of side effects such as RPLACA, EQ is preferred to EQUAL.

Finally we found set-use-reset pattern is very helpful for modularity, and dynamic scope captures this pattern well. So we want to have both dynamic and lexical scope variables in our interpreter. We need to maintain separate environments for lexical and dynamic variables in interpreter to avoid certain problems. This will require a special syntax for distinguishing references to and bindings of the two kinds of variables.

Dynamic scoping provides an important abstraction for dealing with side effects in a controlled way. A low-level procedure may have state variables which are not of interest to intermedia routines, but which must be controlled at a high level. Dynamic scoping allows any procedure to get access to parts of the state when necessary, but permits most procedures to ignore the existence of the state variables. The existence of many dynamic variables permits the decomposition of the state in such a way that only the part of interest need be deal with.

[1] LISP was not originally derived from Church's lambda calculus. In early LISP ppl use "McCarthy conditional" to define recursion:

factorial[x] = [x=0 -> 1; T -> x*factorial[x-1]]

while in resursive function theory one would define it like this:

factorial(0) = 1
factorial(successor(x)) = successor(x) * factorial(x)

haskell adopt the later notion, while keep the first one too (case expression)