Dec 28, 2008

Bloom filter

An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash functions defined, each of which maps or hashes some set element to one of the m array positions with an even distribution.

To add an element, feed it to each of the k hash functions to get k array positions. Set the bits at all these positions to 1.

To query for an element (test whether it is in the set), feed it to each of the k hash functions to get k array positions. If any of the bits at these positions are 0, the element is not in the set – if it were, then all the bits would have been set to 1 when it was inserted. If all are 1, then either the element is in the set, or the bits have been set to 1 during the insertion of other elements.

False positive is possible, false negative is impossible for bloom filter.

False Positive can be understanded as a "failed positive assertion", e.g. If I report to teacher you cheated in the last exam but in fact you didn't. In bloom filter case, it means the filter may report you the keyword 'd' exists in the set {'a','b','c'}, but the filter will never tell you 'a' or 'b' or 'c' not exists in the same set.

igvita provides a ruby implementation.

Dec 25, 2008

Rails: ActionController#render become more smarter

Pratik Naik pushed a series of commits to make ActionController#render more easy to use.

1. It will recognize actions.

# Instead of render(:action => 'other_action')
render('other_action')

2. It will recognize templates.

# Instead of render(:template => 'controller/action')
render('controller/action')

3. It will recognize a file path.

# Instead of render(:file => '/Users/lifo/home.html.erb')
render('/Users/lifo/home.html.erb')

4. It will recognize a symbol. So in your create action you can just write 'render :new' when model save failed!

Dec 23, 2008

TDD and BDD

Update1: I found I misunderstanded TDD, now I confused with BDD again :(see comments below)

Update2: Someone asked me that why we should write tests first? If we write tests immediately after implement a function, what we lose compare to write test first? My answer is simple: how can you know you test will fail before you write the implementation if you don't write test at first? And how can you make sure it's your code made the test pass, not because the test will pass even without your code?

What's the differences between TDD and BDD? That's always a good candidate question for you recruiting :) It's hard to get the answer from their boring description, at least hard to me.

Today I found a good example to demonstrate the diff between them - Rails validations. For those who don't use rails, validations is used to guard data correctness, for example, if I wrote this in my rails app:

class Book < style="font-style: italic;"> validates_presence_of :title
end

it will always check the Book's title is not blank when save the object. Now the problem is, should you write tests/assertions for this statement?

TDD will say no. Rails is released with lots of tests, the validation's correctness has already been guaranteed by it, in other words your own tests are just duplicate works. But from BDDer's POV, you should write tests for it - to describe the Book's *behavior*, not to cover the validation's implementation code.

So, BDD want to make sure we implemented the right behavior, or function/feature/whatever. If you refactor your code one month later, those seems redundant tests will guarantee your app's behavior is consistent after your refactor.

This gap between TDD/BDD will lead a inference: 100% code coverage suggests you did very well in a TDD team, but it means nothing if you're in a BDD team. As the example shows, even you didn't write test for the validation, your test will cover it(you tests definitely will save a book object somewhere, thus emit the validation), but you won't know whether your books will still require a title after refactor.

Dec 21, 2008

Haskell Numeric Types

From http://book.realworldhaskell.org/read/using-typeclasses.html

Table 6.1. Selected Numeric Types

TypeDescription
DoubleDouble-precision floating point. A common choice for floating-point data.
FloatSingle-precision floating point. Often used when interfacing with C.
IntFixed-precision signed integer; minimum range [-2^29..2^29-1]. Commonly used.
Int88-bit signed integer
Int1616-bit signed integer
Int3232-bit signed integer
Int6464-bit signed integer
IntegerArbitrary-precision signed integer; range limited only by machine resources. Commonly used.
RationalArbitrary-precision rational numbers. Stored as a ratio of two Integers.
WordFixed-precision unsigned integer; storage size same as Int
Word88-bit unsigned integer
Word1616-bit unsigned integer
Word3232-bit unsigned integer
Word6464-bit unsigned integer


Table 6.2. Selected Numeric Functions and Constants

ItemTypeModuleDescription
(+)Num a => a -> a -> aPreludeAddition
(-)Num a => a -> a -> aPreludeSubtraction
(*)Num a => a -> a -> aPreludeMultiplication
(/)Fractional a => a -> a -> aPreludeFractional division
(**)Floating a => a -> a -> aPreludeRaise to the power of
(^)(Num a, Integral b) => a -> b -> aPreludeRaise a number to a non-negative, integral power
(^^)(Fractional a, Integral b) => a -> b -> aPreludeRaise a fractional number to any integral power
(%)Integral a => a -> a -> Ratio aData.RatioRatio composition
(.&.)Bits a => a -> a -> aData.BitsBitwise and
(.|.)Bits a => a -> a -> aData.BitsBitwise or
absNum a => a -> aPreludeAbsolute value
approxRationalRealFrac a => a -> a -> RationalData.RatioApproximate rational composition based on fractional numerators and denominators
cosFloating a => a -> aPreludeCosine. Also provided are acos, cosh, and acosh, with the same type.
divIntegral a => a -> a -> aPreludeInteger division always truncated down; see also quot
fromIntegerNum a => Integer -> aPreludeConversion from an Integer to any numeric type
fromIntegral(Integral a, Num b) => a -> bPreludeMore general conversion from any Integral to any numeric type
fromRationalFractional a => Rational -> aPreludeConversion from a Rational. May be lossy.
logFloating a => a -> aPreludeNatural logarithm
logBaseFloating a => a -> a -> aPreludeLog with explicit base
maxBoundBounded a => aPreludeThe maximum value of a bounded type
minBoundBounded a => aPreludeThe minimum value of a bounded type
modIntegral a => a -> a -> aPreludeInteger modulus
piFloating a => aPreludeMathematical constant pi
quotIntegral a => a -> a -> aPreludeInteger division; fractional part of quotient truncated towards zero
recipFractional a => a -> aPreludeReciprocal
remIntegral a => a -> a -> aPreludeRemainder of integer division
round(RealFrac a, Integral b) => a -> bPreludeRounds to nearest integer
shiftBits a => a -> Int -> aBitsShift left by the specified number of bits, which may be negative for a right shift.
sinFloating a => a -> aPreludeSine. Also provided are asin, sinh, and asinh, with the same type.
sqrtFloating a => a -> aPreludeSquare root
tanFloating a => a -> aPreludeTangent. Also provided are atan, tanh, and atanh, with the same type.
toIntegerIntegral a => a -> IntegerPreludeConvert any Integral to an Integer
toRationalReal a => a -> RationalPreludeConvert losslessly to Rational
truncate(RealFrac a, Integral b) => a -> bPreludeTruncates number towards zero
xorBits a => a -> a -> aData.BitsBitwise exclusive or

Table 6.3. Typeclass Instances for Numeric Types

TypeBitsBoundedFloatingFractionalIntegralNumRealRealFrac
Double

XX
XXX
Float

XX
XXX
IntXX

XXX
Int16XX

XXX
Int32XX

XXX
Int64XX

XXX
IntegerX


XXX
Rational or any Ratio


X
XXX
WordXX

XXX
Word16XX

XXX
Word32XX

XXX
Word64XX

XXX

Table 6.4. Conversion Between Numeric Types

Source TypeDestination Type
Double, FloatInt, WordIntegerRational
Double, FloatfromRational . toRationaltruncate *truncate *toRational
Int, WordfromIntegralfromIntegralfromIntegralfromIntegral
IntegerfromIntegralfromIntegralN/AfromIntegral
RationalfromRationaltruncate *truncate *N/A

Dec 19, 2008

procedure/goto/if/case... What should we use?

"Recently practitioners of programming discipline have advised us not to be afraid to use procedure calls, as the readability they afford is worth the inefficiency. This is like saying that cars with square wheels qare all right because transportation is worth a bumpy ride: we really ought instead to concentrate on improving our wheels."

An interesting metaphor from Guy Lewis Steele in his paper. "as the A they afford is worth the B", I've seen such statements serveral times. It make me think that it's usually an execuse because ppl just don't want to spend time to really resolve a problem.

Two insightful points from this paper:

* Our difficulties in dealing with programming and programming languages stem in part from a confusion between the abstract notions of prgram organization and the concrete embodiment of these notions as programming language constructs. The important point is that for each important programming concept which we have found useful -- modularity, iteration, assignment, conditionals -- there may exist more than one programming language construct which can embody that concept in a reasonably natural manner. In understanding (a piece of) a program it is necessary to determine not only what construct is being used, but the concept it is intended to express.

* We discovered that GOTO was often being used even in situations where more appropriate and expressive constructs were available, and that it was being used for all sorts of purposes we hadn't anticipated, and so we sought to eliminate unwanted concepts and programming styles by banning a construct.

Dec 17, 2008

What's need by tail-recursion optimization in funcational language?

What's the common ideas needed by all high-level programming languages?

* Transfer of control
- conditional
- unconditional
* Environment operation
- variable binding on function entry
- declaration of local variable
* Side effects
- assignment
- IO
* Process synchronization
- resource allocation
- passing of information between processes

The idea behind tail-recursion optimization in applicative language

* function is goto with arguments. There's no need for return address for a function. We need save a return address on the stack when we begin to evaluate a form (function call) which is to provide an argument for another function, rather than when we invoke the function.
* so, we pushes addtional control stack only when evaluate forms, for function application, we only need set up environments.
* this is why we can't use the same way to optimize tail-recursion in imperative language - the return address for the caller function may not be the continuation address of the callee
* lexical binding is necessary for lambda goto rule. we can't use goto to replace funcation call with dynamic binding.

An interesting symmetry: control constructs determine constraints in time (sequencing) in a program, while environment operators determine constraints in space (textual extent, or scope).

Ref. Guy Lewis Steele, Jr.. "Lambda: The Ultimate Declarative". MIT AI Lab. AI Lab Memo AIM-379. November 1976.

Dec 16, 2008

Super fast action with Rails Metal

Josh commit a new feature into edge rails named Metal. With it ppl can use rack directly to serve request easily. Take a look at his example:

  # app/metal/poller.rb
class Poller <> "application/json"},
Message.recent.to_json]
else
super
end
end
end

* There is a generator to help you get started
`script/generate metal poller`

* Also, metal bits can be ran standalone with rackup
`rackup app/metal/poller.rb`

Not the request to 'http://your.app.com/poller' will be processed in rack, that's more faster than processing it in controller :)