Entries from April 2008
I just registered for OSCON. They say I should advertise that I am a speaker. Here goes.
For the last several years, I've presented multiple talks at the O'Reilly Open Source Conference. My Scalable Internet Architectures talk has been quite popular and drawn large crowds. It is an interesting talk as it doesn't really change with time. As I say, "if principles of good engineering changed frequently, I'd never drive on bridges." The talk is about sound engineering approaches to building really large consumer-facing websites. Almost all of it is open-source centric, which is why it fits so well at OSCON. While my Scalable talk was not accepted this year, I've got another talk lined up that will rock your world.
I am quite excited that my other proposal was accepted. This year I will be giving a session about using DTrace to perform "full-stack" introspection.
Using DTrace we will deep dive into the amazingly cool questions one can ask. Is my application really hitting disk? If so, what line of code is causing it? My process is being descheduled by the kernel, why? I have 100 Apache processes and some randomly segfault, how do I get a stack trace when that happens? The app I am running doesn’t have the right debugging output, I need to know more!
DTrace is an oracle. The value of the answers depends on the quality of the questions. Learn to ask good questions and prepare to be amazed at the possibilities.
I've given a variation on this presentation at a few places now (both internal to OmniTI and external) and had really positive feedback. I'll be taking these prior presentations and polishing them up for a 45-minute escapade that will open your eyes to new possibilities. DTrace is an amazing tool and once you get used to it, you can really take it for granted. I do. When people watch the presentation and say "by the power of Grayskull," I know I've made my point.
Come to OSCON. Immerse yourself in technology.
Today someone asked me: "You speak about ZFS a lot. I know other people that talk about the latest filesystems with praise, but generally speaking they just don't have much to offer. Is ZFS that different?"
My answer is "yes." But, of course, I can't leave it at that. I'm not going to make a performance argument -- ZFS is fast in some cases and slow in others -- just like everything else. I think one of the things we've seen in the last 10 years is that everyone felt the need to come out with their own filesystem -- at least on Linux. So, you have to ask yourself why. My personal opinion is that filesystems on Linux suck.
Most filesystems on the market support snapshots. No open source filesystems on Linux (that I'm aware of) support snapshots. Of course, you can use LVM to do block-level snapshots. First off, that's a pain in the ass w.r.t. storage provisioning. Other systems make the process of allocating and managing snapshots "not my problem." (simple and easy). Let's be frank, ext2 and ext3 are nothing to write home about. reiserfs, xfs, jfs, the list goes on and on.
There are a few closed-source filesystems that are really nice. Specifically Veritas Filesystem (VxFS) and its excellent layered volume manager VxVM which appears to have heavily inspired geom on FreeBSD. DEC thought it was so cool that they pulled it white-label into Tru64. Respect.
So, what makes ZFS so different? ZFS is a disruptive technology as it abolishes the sacred line in the sand between block devices, volume management and filesystems. This means it just makes storage management easy. When I say easy... I mean easy.
So you want more space? Add more disks. Want to move from failing disks to replacements? Tell zfs to add the new ones and tell it to remove the old ones. Read that report by Google about disk errors? ZFS checksums all data. My personal experience says checksums are good. Snapshots? Sure, snapshot to your heart's content. We snapshot some systems hourly and never ever delete the old ones. Snapshots are really cool, but what if you could roll back to a snapshot? zfs rollback. What if you wanted to make a read/write copy of the filesystem or an old snapshot? zfs clone. You want to store a lot of raw data? zfs has built-in compression. Oh, and it is open-source.
Simply put. ZFS. Respect.
On the second Wednesday of every month, the Baltimore/Washington PostgreSQL User Group will meet at 7070 Samuel Morse Drive, Ste 150 in Columbia, Maryland. Meetings start at 6:30pm and go until around 8:30pm. I am pretty excited about this and pleased to offer up OmniTI's facilities for this. I'm excited about the opportunity to share what I've learned, educate and grow the PostgreSQL community and learn from others in it. This is going to be "good stuff."
Our first meeting will be held on May 14th, 2008. Mark your calendars. Also, subscribe to the mailing list at: bwpug@postgresql.org.
I recently attended dtrace.conf(08), which was a blast, but I left that conference with a single thought and it has been reinforced since. Everything should be DTrace enabled. While it is true that using DTrace you can introspect just about everything in the system, the pid provider (used to trace inside user-space applications) requires the user to know the code of the application. A full system "in-flight" has too many different apps running for me to keep all of their code-bases in my head. The kernel and one or two apps is about my limit. Also, the pid provider is somewhat limited in that it makes watching a lot of processes on that level intractable. So, what's the solution? SDTs for user applications or USDTs.
USDTs allow application authors to put specific probe points in their software that provide three compelling advantages over process-level tracing with the pid provider: (1) they should be carefully placed and named to be accessible to a user who does not know the application code base, its data structures, or even C for that matter, (2) they boast a very low barrier to entry, and (3) they work system-wide as expected.
Sounds easy, right? Well, adding the code to an existing project is "retarded simple." After a few lines of patching in the build process (usually a makefile), new probes can be added at about two lines of code per probe (one line for the probe and one line for the prototype). So, it is easy. But, why isn't every application pimped-out with DTrace probes? Linux. Linux doesn't have DTrace and as such, I think there is a lot of resistance to adding the probes to software whose primary development base (and target) is Linux. I don't think they are against it, but the "what's in it for me?" question comes up and acts as an obstacle. This is the case with any "cool new technology" that's not mainstream.
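To make the "two lines per probe" claim concrete, here is a minimal sketch in C. Everything named in it is made up for illustration: the `myapp` provider, the `request__done` probe, the `ENABLE_DTRACE` build flag and the `handle_request` function are assumptions, not code from any real project. The prototype lives in a small provider file that the build hands to `dtrace -h` and `dtrace -G`; the probe itself is a single macro call. When the build flag is off, the macro compiles away, which is roughly how a project can carry probes without breaking platforms that lack DTrace.

```c
/*
 * Hypothetical provider file (e.g. myapp_probes.d), processed by the
 * build with `dtrace -h` / `dtrace -G`. This is the "prototype" line:
 *
 *   provider myapp {
 *       probe request__done(int);
 *   };
 */
#ifdef ENABLE_DTRACE            /* assumed build flag, set by the makefile */
#include <sys/sdt.h>
#define MYAPP_REQUEST_DONE(id) DTRACE_PROBE1(myapp, request__done, (id))
#else
/* No DTrace on this platform: the probe becomes a no-op. */
#define MYAPP_REQUEST_DONE(id) ((void)(id))
#endif

/* Stand-in for real application work; fires the probe when finished. */
int handle_request(int id)
{
    int result = id * 2;        /* placeholder for the actual work */
    MYAPP_REQUEST_DONE(id);     /* this is the "probe" line */
    return result;
}
```

With a probe like this in place, someone who has never read the source could watch `myapp*:::request-done` across every running instance, which is the whole point.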
The real challenge is that each open source project (to which we would add probes) has its own culture and process for proposing changes, submitting patches, negotiating for inclusion, etc. In a lot of ways, in order to effect change, we have to take the role of a package distributor and with enough impetus, we'll see the upstream pull our patches on their schedule without much cooperation from us (of course, we're also happy to cooperate).
I'm not going into the details of why DTrace is hands down better than peanut butter and jelly -- you need to see it in action to truly respect it. However, with about two hours' worth of work in PostgreSQL, I exposed probes in some parts of PostgreSQL that are otherwise hard to inspect. I instrumented some XLog operations, checkpoints, "exec" nodes in the executor, buffer syncing, LRU operations (that drive the CLOG, SUBTRANS and MultiXact systems), the autovacuum system and SQL executions from clients.
Look ma! I can watch my checkpoints in real-time:
CheckPoint initated...
CLOG 1
SUBTRANS 1
Buffers 1430
CheckPoint complete: elapsed 92355ms
CheckPoint initated...
CLOG 1
SUBTRANS 1
Buffers 911
CheckPoint complete: elapsed 55933ms
55 seconds! Is that evenly distributed?
buffer writes
value ------------- Distribution ------------- count
< 0 | 0
0 |@@ 5
1 |@@ 5
2 |@@ 5
3 |@@ 5
4 |@ 4
5 |@@ 5
6 |@@ 5
7 |@@ 5
8 |@ 4
9 |@@ 5
10 |@@ 5
11 |@@ 5
12 |@ 4
13 |@@ 5
14 |@@ 5
15 |@@ 5
16 |@ 4
17 |@@ 5
18 |@@ 5
19 |@@ 5
20 |@ 4
21 |@@ 5
22 |@@ 5
23 |@@ 5
24 |@@ 5
25 |@ 4
26 | 1
27 | 0
CheckPoint complete: elapsed 26085ms
Well, it only took 26 seconds that time. It turns out that in PostgreSQL 8.3 the regular checkpoints attempt to spread out the buffer writes and we can witness it working well. Pretty neat.
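The distribution above has the shape of a linear quantization, the kind of thing DTrace's lquantize() aggregation prints. The script that produced it isn't shown here, so the following is only a sketch of the bucketing idea in plain C, not DTrace's implementation. The function names and the bucket layout (slot 0 for values below the lower bound, the last slot for values at or above the upper bound) are my own assumptions.

```c
#include <stddef.h>

/*
 * Map a value to a bucket index for a linear histogram over [lo, hi)
 * with the given step. Slot 0 is the "< lo" bucket, slots 1..nb are
 * the regular buckets, and slot nb + 1 is the ">= hi" bucket, where
 * nb = ceil((hi - lo) / step).
 */
size_t linear_bucket(long value, long lo, long hi, long step)
{
    size_t nb = (size_t)((hi - lo + step - 1) / step);

    if (value < lo)
        return 0;
    if (value >= hi)
        return nb + 1;
    return (size_t)((value - lo) / step) + 1;
}

/*
 * Count n values into counts[], which the caller must zero beforehand
 * and size to hold at least nb + 2 slots.
 */
void linear_histogram(const long *values, size_t n,
                      long lo, long hi, long step,
                      unsigned long *counts)
{
    for (size_t i = 0; i < n; i++)
        counts[linear_bucket(values[i], lo, hi, step)]++;
}
```

Feed it one value per buffer write (say, whole seconds since the checkpoint began) with a step of 1 and you get one count per second, which is what makes the even spread easy to eyeball.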
With all these probes, the questions you can ask can be quite interesting. Particularly, how many xlog inserts did a query induce, or how many buffers did a query dirty. All as simple as a few lines of DTrace script now... in production... no configuration changes required.
Being able to ask systemic questions is of fundamental importance when troubleshooting problems. DTrace is built around the concept that things should be looked at systemically. However, to support that, we need applications to assist by exposing USDT probes in the "right places." We've started a project here at OmniTI called Project DTrace that aims to do just that. It's open, please join in!

