Feb 28, 2010

TokyoTyrant vs MongoDB vs CouchDB, simple benchmarks

Jeffery Zhao published a simple benchmark of two 'NoSQL' databases recently. In that article only basic CRU operations are compared. On a MacBook unibody + OS X, which is the platform Jeff uses, MongoDB got slightly better scores than TokyoTyrant in almost every respect.

We're very interested in CouchDB these days, so I cloned Jeff's benchmark suite, added scripts for CouchDB, and ran the benchmark again on my platform, a MacBook unibody + Arch Linux. However, the result is really interesting - it's totally the opposite - TokyoTyrant is much faster than MongoDB on my box.
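
For context, the insert portion of such a script is essentially a timed loop over many small documents. Here is a minimal sketch of that shape using the Mongo::Connection-era Ruby driver (this is not the actual suite; the document count, database and collection names are made up):

    require 'rubygems'
    require 'mongo'        # official Ruby driver of the time
    require 'benchmark'

    N = 10_000  # assumed document count, not the suite's real setting

    coll = Mongo::Connection.new('localhost', 27017).db('bench').collection('docs')

    elapsed = Benchmark.realtime do
      N.times do |i|
        # one small document per iteration; the TT and CouchDB scripts do the same kind of write
        coll.insert('_id' => i, 'name' => "user#{i}", 'age' => i % 100)
      end
    end

    puts "mongodb insert: #{(N / elapsed).round} docs/sec"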

Results:


CouchDB is really slow compared to TT or MongoDB, so I just gave up on it after several rounds.

The only difference between Jeff's platform and mine seems to be the operating system: he uses OS X while I use Linux. I'm not sure whether this is the reason we got different results, or whether TT is simply better optimized by gcc on Linux.

Try it yourself: Simple NoSQL Bench (The suite is written in Ruby)

Update: After switching from Net::HTTP to Curb, the CouchDB benchmarks improved by about 1/3. Configuring CouchDB's [uuids] algorithm to sequential (in default.ini) has no effect on the results. All three drivers connect to the database over the network, but only CouchDB uses the HTTP protocol; this is a bottleneck, or rather, a trade-off.
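
For the record, the driver change above is roughly the difference between these two ways of POSTing a document to CouchDB. This is only a sketch (the database name, port and document are made up, and error handling is omitted):

    require 'rubygems'
    require 'json'
    require 'net/http'
    require 'curb'

    doc = { 'name' => 'user1', 'age' => 30 }.to_json

    # Net::HTTP: pure Ruby, what the suite used originally
    http = Net::HTTP.new('127.0.0.1', 5984)
    http.post('/bench', doc, 'Content-Type' => 'application/json')

    # Curb: a thin wrapper around libcurl, noticeably faster in this suite
    curl = Curl::Easy.new('http://127.0.0.1:5984/bench')
    curl.headers['Content-Type'] = 'application/json'
    curl.http_post(doc)

    # The sequential-uuid experiment mentioned above is just this in default.ini:
    #   [uuids]
    #   algorithm = sequential

Both paths still go through CouchDB's HTTP API; Curb is simply a much thinner route to the socket than Net::HTTP.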

7 comments:

  1. These benchmarks don't really test what CouchDB is capable of.

    Please see http://jan.io/Gy8t and http://jan.io/SEq7

    Cheers
    Jan
    --

  2. Hi Jan,

    (sounds like I'm talking to myself :-)

    Great articles. And yes, as you pointed out, this is not a real test of what CouchDB is capable of. In the benchmarks I test CRU (without Delete) operations in an extremely simple env/context, e.g. no concurrency, no optimizations, etc.

    However, that's my intention. I'm not experienced enough to write comprehensive benchmarks; I just want to check some basic things for now. The test is certainly not complete yet; I'll add more benchmarks when I have time.

    And it's surprising to see such a big discrepancy between machines even though the tests are so simple.

    Cheers~

  3. The distance between CouchDB and the other candidates is so big that I suspect it's my fault. Did I do anything wrong with CouchDB? Should I use pre-generated sequential ids instead of random UUIDs? (But I think the defaults should be good enough for all software...)

    I can find official Ruby drivers on the MongoDB/TT sites, but CouchDB only provides a simple Ruby module to interact with Couch using the RESTful API. Is this the reason? I failed to find a Ruby driver for Couch, though there are many ORMs.

  4. Hi Jan :)

    Yeah, the Ruby HTTP stack is sub-par, that is one reason :) Sequential ids are another thing that gives you better performance (CouchDB 0.11 will use them by default).

    Another is that CouchDB is optimized for multi-reader/writer concurrency. The baseline might be a lot slower, but it won’t degrade as you crank up concurrency. It'll still be "fast enough" to not be a bottleneck in your application. I hope you only optimise these :)

    CouchDB doesn't have "native" bindings* and offers only the HTTP API because that works everywhere and gets you a ton of benefits: you can add proxies and caches and all the other nifty things you already know from your web server stack.

    So it's about trade-offs. You'll find that single query execution speed is rarely where your app needs optimising. But that is not generally true, of course.

    Cheers
    Jan
    --

    (* there's a pure-erlang API that you can use from an Erlang program that is semi-supported)

  5. Makes sense. I'll check how these guys perform in a concurrent context in the future.

    Thanks :)

  6. CouchDB is optimized for concurrent performance. If you do the writes concurrently (like a webapp normally would under load), performance actually increases because the fsyncs are batched.

    I can't actually tell what these numbers mean or what this test does because the original article is in Japanese, but I would warn against testing MongoDB write performance without including a read for each insert. By default MongoDB doesn't return a response for insert calls, so all you're really testing is the socket.write() time. You don't know if what you've inserted is even available yet (it most likely is available quickly if you only have one writer, but under concurrent load you could have problems), and it won't be written to disk for possibly another minute.

  7. Hi Mikeal, thank you for the info. (There's a small sketch of the 'safe' insert you describe after the comments.)

    btw. the original article is in Chinese :->

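Regarding Mikeal's point in comment 6 about fire-and-forget inserts: the sketch below shows the difference in the Ruby MongoDB driver of that era. It is illustrative only; the connection details and documents are made up.

    require 'rubygems'
    require 'mongo'

    coll = Mongo::Connection.new('localhost', 27017).db('bench').collection('docs')

    # Default: the driver writes to the socket and returns immediately,
    # so a pure-insert benchmark mostly measures socket.write() time.
    coll.insert('name' => 'user1')

    # With :safe => true the driver round-trips to the server (getLastError)
    # and raises on failure, so the write is acknowledged before returning.
    coll.insert({ 'name' => 'user2' }, :safe => true)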