Esoteric Curio
I am the CEO of OmniTI where I do all sorts of stuff I find absolutely fascinating.
In perhaps a new trend, I’m blogging from 39011 feet (or so says the seatback in front of me). I’m traveling back home to the east coast from San Jose, CA where I attended (and spoke at) this year’s O’Reilly Velocity Conference.
I participated in (and blogged about) the Velocity Summit, as I have for the past two years. The summit is the unconference preceding the real conference; it helps the organizers digest current hot topics and better define the tracks for the actual conference. The summit itself is filled with enough brain power to warp space-time, so I drop everything to go to that.
Ironically, despite being a well respected authority in web site (and general internet) scalability and performance, my talk proposals for Velocity 2008 were not accepted — I clearly need to write better proposals. This year, I managed to work my way into the workshop track on Monday. Despite having a bad headache and feeling "off" the day before, I managed to get my act together and put on an A-game for my workshop. For those of you interested, here is my scalable09 slide stack.
I thought I’d take a moment to talk about what I liked about the conference and what I think could use some improvement. I realize this is a down economy and that might be a legitimate justification for some of the actions that resulted in some of my disappointment.
First, the negative. I usually start with positive and end with negative because I’m a pessimist. However, all in all the conference was awesome, so I thought I’d get my short list of gripes out of the way early.
O’Reilly is infamous for throwing good conferences for geeks. In my opinion, the field of web operations has been so severely neglected and applies so broadly to the world today that this conference needs to be for everyone.
Now that I’ve griped and aired my disappointment, I can focus on the gobs of awesomeness that was Velocity.
In my workshop, I spent about 20% of the time discussing the philosophy of being a good engineer and 80% discussing practice (non-cookbook) with examples and advice. The basic message is that systems are complex and you must think of all the parts holistically, or it's a recipe for disaster (or failure).
Two of my favorite talks were Nicole Sullivan's “The Fast and the Fabulous: 9 ways engineering and design come together to make your site slow” and “10+ Deploys Per Day: Dev and Ops Cooperation at Flickr” by John Allspaw and Paul Hammond. While Nicole's presentation, like mine, was not recorded, the other was, and if you want to break down the divide between operations and development, it is a must-see.
All in all, I would encourage everyone reading this to attend next year's Velocity conference. I am certain you will walk away with knowledge that is both valuable and applicable.
I post here, I post there... I certainly don't post everywhere. Anyway, I wrote an article that discusses the basic technologies that power large cloud-like storage systems. Thought you might be interested.
A little over a year ago, I started in on a project that was of significant scope. Not a few scripts hacked together, nor a conglomeration of pre-existing tools, but rather a carefully engineered product. What product is this? Reconnoiter.
About 10 years ago, we were neck deep in large scale e-mail architecture. We felt pain; we were up at 3am every night attempting to make systems work. Finally, we decided enough was enough and started a skunkworks project to build a better e-mail server. Well, that turned out pretty well. It's got considerable momentum at this stage and is leading the industry as the most advanced digital messaging platform on the planet.
Over the past several (12) years, we have run operations for small and large sites alike. We're responsible for waking up and fixing things at 3am when they are not working. We're responsible not only for designing highly scalable architectures, but also for sticking around and seeing them through to the finish. Many people are writing tools for management; two in the recent spotlight are Puppet and Chef. We had very little pain in the arena of provisioning and maintaining systems. I have a theory as to why that is, but that is a topic for another monologue. One of the distinct pains we have suffered since we began revolves around monitoring.
The first issue is that monitoring is two things: fault detection and trending.
There are many tools today that are hard to use and fail to address our needs for managing thousands of very different machines. Worse, the tools do only one or the other. This means that we must invest time checking disk space in the fault detection tool to alert us when it is "too full" and configure a similar check in a trending tool to show us historical information. Some patchwork was introduced into fault detection tools like Nagios to add trending features... and when I use it, it is clear it was not central to the design.
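To make the duplication concrete, here is a minimal sketch of the alternative: one disk-space measurement that is both kept for trending and compared against a threshold for fault detection. This is only an illustration and not how Reconnoiter or Nagios is implemented; the names `used_fraction`, `record_sample`, and `ALERT_FRACTION`, and the 90% threshold, are assumptions made up for the example.

```python
import shutil
import time

ALERT_FRACTION = 0.90  # hypothetical "too full" threshold


def used_fraction(path: str) -> float:
    """Measure how full the filesystem holding `path` is (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def record_sample(history: list, timestamp: float, fraction: float) -> bool:
    """Store the sample for trending and report whether it should alert."""
    history.append((timestamp, fraction))  # trending: keep every sample
    return fraction >= ALERT_FRACTION      # fault detection: same value


if __name__ == "__main__":
    history = []
    alert = record_sample(history, time.time(), used_fraction("/"))
    print("alert" if alert else "ok", history[-1])
```

The point of the sketch is that the check is configured and collected once; the alert and the graph are just two consumers of the same sample.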
I have a lot of gripes, but I won't go into all of them. Suffice to say I have them and I think they are the true fuel for developing a next-gen tool to make operations folk suffer less. Combine that with the combustive talent of the engineering group at OmniTI (and now a few outside it) and the oxygen that the open source community provides, and we'll be having a barbecue in no time.
A lot has happened in recent months on the Reconnoiter front. Here's a set of highlights:
We've been slowly introducing our managed clients to Reconnoiter and we have, at this point, about a terabyte of metric data. In Reconnoiter, there is no default action to discard data. Yes, that's right. Go buy more disk. It's cheap. You'll thank me next time you have an anomaly today that you think reminds you of one seven months ago... and when you go look at the graph you actually find all the data at its original granularity.
I heard some rumors float around about using dd as a simple test for disk throughput. I'd like to verbosely say, "that's a bad idea."
I'm going to log into a crappy system with two extremely slow 1.5TB SATA drives in RAID 1. Yes, this is a production machine and its goal in life is to store a lot of things and serve some of them infrequently -- as such, its configuration is well suited for that task.
; /bin/time sh -c "dd if=/dev/zero of=ddfile bs=8k count=2000000"; /bin/time sync
2000000+0 records in
2000000+0 records out
real 53.9
user 1.2
sys 28.4
real 0.2
user 0.0
sys 0.0
I'll note first that the second set of times is the sync, and we see it was effectively free. Second, we got 304MB/s. In RAID 1 we have to write to both drives, so in the best case we get the performance of the worst spindle. 304MB/s seems a tad high.
; /bin/time sh -c "dd if=/dev/zero of=ddfile2 bs=8k count=2000000"
2000000+0 records in
2000000+0 records out
; /bin/time dd if=ddfile of=/dev/null bs=8k
2000000+0 records in
2000000+0 records out
real 36.9
user 1.3
sys 35.5
The first statement blows any buffer cache we might have. Then we see a read of a 16GB file that sustains an average of 444 MB/s (over two spindles). That too seems a little high.
Oh wait, I had compression on. Let's rerun that with it turned off.
; /bin/time sh -c "dd if=/dev/zero of=ddfile bs=8k count=2000000" ; /bin/time sync
2000000+0 records in
2000000+0 records out
real 3:17.9
user 1.2
sys 29.5
real 0.2
user 0.0
sys 0.0
Interestingly, still the sync is dirt cheap (ZFS pretty aggressively writes back) and we're at about 83MB/s. That's the sweet sucking sound of mirrored SATA disks.
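Why compression skews a /dev/zero test so badly can be seen with a rough sketch. It uses Python's zlib purely as a stand-in compressor (the filesystem's own algorithm is different, so the exact sizes are not comparable), and `compressed_size` is a helper name invented for this example. It compares an 8k block of zeros, which is what dd was writing, with an 8k block of random bytes.

```python
import os
import zlib

BLOCK_SIZE = 8 * 1024  # matches bs=8k in the dd runs


def compressed_size(block: bytes) -> int:
    """Return the size in bytes of the block after zlib compression."""
    return len(zlib.compress(block))


zero_block = bytes(BLOCK_SIZE)         # what /dev/zero hands to dd
random_block = os.urandom(BLOCK_SIZE)  # effectively incompressible data

print("zeros :", BLOCK_SIZE, "->", compressed_size(zero_block), "bytes")
print("random:", BLOCK_SIZE, "->", compressed_size(random_block), "bytes")
```

A block of zeros shrinks to almost nothing, so with compression on, very little of what dd "wrote" ever has to reach the spindles.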
Now reading it back after blowing the ARC (ZFS's version of buffer cache):
; /bin/time sh -c "dd if=/dev/zero of=ddfile2 bs=8k count=2000000"
2000000+0 records in
2000000+0 records out
; /bin/time dd if=ddfile of=/dev/null bs=8k
2000000+0 records in
2000000+0 records out
real 2:09.3
user 1.2
sys 13.1
127MB/s. As expected, we see better, yet still crappy, throughput from our drives on reads, since we're reading from two spindles instead of one.
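For anyone who wants to recheck the throughput figures quoted in this post, the arithmetic is just bytes moved divided by the "real" time. The sketch below assumes bs=8k means 8192 bytes and that MB means 10^6 bytes; `parse_elapsed` and `throughput_mb_s` are helper names made up for the example.

```python
def parse_elapsed(real: str) -> float:
    """Convert a 'real' value such as '53.9' or '3:17.9' to seconds."""
    seconds = 0.0
    for part in real.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def throughput_mb_s(block_bytes: int, count: int, real: str) -> float:
    """Bytes transferred divided by elapsed time, in MB/s (10^6 bytes)."""
    return block_bytes * count / parse_elapsed(real) / 1e6


BS = 8 * 1024      # bs=8k
COUNT = 2_000_000  # count=2000000

runs = {
    "write, compression on": "53.9",
    "read, compression on": "36.9",
    "write, compression off": "3:17.9",
    "read, compression off": "2:09.3",
}

for label, real in runs.items():
    print(f"{label}: {throughput_mb_s(BS, COUNT, real):.0f} MB/s")
```

Under those assumptions the four runs come out at the 304, 444, 83 and 127 MB/s quoted above.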
Long story short: modern filesystems can do whack stuff to your workloads. Use a comprehensive workload generator for I/O benchmarking. Preferably one that can simulate something resembling a real workload. Greg mentions bonnie++ in his post about benchmarking. Bonnie++ is a legitimate benchmarking tool, but for generating real workloads, I suggest filebench. It might be a bit more work, but at least you can use the results!

