<?xml version="1.0" encoding="utf-8" ?>

<rss version="2.0" 
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:admin="http://webns.net/mvcb/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
   xmlns:wfw="http://wellformedweb.org/CommentAPI/"
   xmlns:content="http://purl.org/rss/1.0/modules/content/"
   >
<channel>
    <title>Esoteric Curio</title>
    <link>http://www.lethargy.org/~jesus/</link>
    <description>Theo's Contributions to Technological Surreality</description>
    <dc:language>en</dc:language>
    <generator>Serendipity 1.1 - http://www.s9y.org/</generator>
    <pubDate>Fri, 04 Jul 2008 03:28:05 GMT</pubDate>

    <image>
        <url>http://www.lethargy.org/~jesus/templates/default/img/s9y_banner_small.png</url>
        <title>RSS: Esoteric Curio - Theo's Contributions to Technological Surreality</title>
        <link>http://www.lethargy.org/~jesus/</link>
        <width>100</width>
        <height>21</height>
    </image>

<item>
    <title>Scalability and concessions</title>
    <link>http://www.lethargy.org/~jesus/archives/122-Scalability-and-concessions.html</link>
            <category>Damaged Bits</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/122-Scalability-and-concessions.html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=122</wfw:comment>

    <slash:comments>3</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=122</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;p&gt;Oren Hurvitz has a great post about &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=257&amp;amp;entry_id=122&quot; title=&quot;http://hurvitz.org/blog/2008/06/linkedin-architecture&quot;  onmouseover=&quot;window.status=&#039;http://hurvitz.org/blog/2008/06/linkedin-architecture&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;LinkedIn&#039;s architecture&lt;/a&gt;. It&#039;s well-written and well thought out. Their architecture has evolved on what appears to be a steady and safe path of improvement. It is well worth a read.&lt;/p&gt;

&lt;p&gt;I would like to comment on something that I see repeated again and again and that is likely misinterpreted by young scalability architects: the statement of what you should expect to lose when you scale up/out. Oren writes:&lt;/p&gt;

&lt;blockquote&gt;The presentation ends with some tips about scaling. These are oldies but goodies:&lt;br/&gt;&lt;ul&gt;&lt;li&gt;Can’t use just one database. Use many databases, partitioned horizontally and vertically.&lt;/li&gt;&lt;li&gt;Because of partitioning, forget about referential integrity or cross-domain JOINs.&lt;/li&gt;&lt;li&gt;Forget about 100% data integrity.&lt;/li&gt;&lt;li&gt;At large scale, cost is a problem: hardware, databases, licenses, storage, power.&lt;/li&gt;&lt;li&gt;Once you’re large, spammers and data-scrapers come a-knocking.&lt;/li&gt;&lt;li&gt;Cache!&lt;/li&gt;&lt;li&gt;Use asynchronous flows.&lt;/li&gt;&lt;li&gt;Reporting and analytics are challenging; consider them up-front when designing the system.&lt;/li&gt;&lt;li&gt;Expect the system to fail.&lt;/li&gt;&lt;li&gt;Don’t underestimate your growth trajectory.&lt;/li&gt;&lt;/ul&gt;&lt;/blockquote&gt;

&lt;p&gt;Now, I agree with much of that. The spammers comment should be revised to &quot;&lt;em&gt;Fraud happens and the bigger you are, the bigger the bullseye.&lt;/em&gt;&quot; Be aware and protect your assets. Everything from Cache! on down: hard and fast rules. The cost argument is odd.  While it is completely correct, it&#039;s also rather obvious.  If your business model ties audience size and site use to revenue (which it should), then the cost should simply scale sub-linearly w.r.t. revenues (i.e. no big deal).  However, there are a few that remain on that list that should be cherished and the loss of them should pain you.&lt;/p&gt;

&lt;p&gt;&quot;&lt;em&gt;[You] Can&#039;t use just one database&lt;/em&gt;&quot; -- this is a conclusion you should arrive at after analysis. We have one client that supports 10 million users on a cluster of partitioned databases. We have another that supports 35 million users on one database without issue and with room for growth.&lt;/p&gt;

&lt;p&gt;&quot;&lt;em&gt;Because of partitioning, forget about referential integrity or cross-domain JOINs.&lt;/em&gt;&quot; Think. Think hard. Think harder. Sometimes it is possible to partition in a fashion that allows for integrity. While I&#039;m sure (or at least hope) that the LinkedIn guys had some sleepless nights making the decision to break foreign constraints, it isn&#039;t conveyed. You should absolutely have some sleepless nights over a decision like that. My bank supports many more users and transactions than LinkedIn -- and it damn well better have FKs and 100% integrity. So, while you still may partition in such a fashion that requires a loss of enforced integrity, the decision should be a heavy one.&lt;/p&gt;

&lt;p&gt;&quot;&lt;em&gt;Forget about 100% data integrity.&lt;/em&gt;&quot; WTF? While I&#039;m sure it was the end of the post and he was being smart, someone somewhere might actually take the advice to forget about data integrity. You never, ever, ever forget about it. We have some &quot;one big database&quot; architectures where data integrity has been an issue due to memory bit-flips (corrupt data on disk) -- it&#039;s a BFP (big f@#$ing problem) and we treat it that way. Sometimes you make an architectural decision that will make the loss of integrity much more probable (partitioning and losing FK constraints is a prime example). It&#039;s still something that should be attended to with great care and diligence. You should never forget about data integrity and should always put forth the effort required to reach as close to 100% as possible. When you lose data integrity you end up with a big pile of shit in your database. I&#039;ll leave you with a rather crass metaphor:&lt;/p&gt;

&lt;blockquote&gt;There&#039;s an expectation that there is no shit on your living room floor. Don&#039;t shit in your living room. Don&#039;t let your dog shit in your living room. If you&#039;re a dog owner, you know your dog could have an accident. You bought the dog. You chose to increase the probability of finding shit in your living room. Don&#039;t ignore it or forget it. Clean up the shit when it happens. If you get suddenly ill while playing your Wii naked and shit on your living room floor (be it probable or improbable)... respect yourself -- clean it up. Never forget the goal: a 100% shit-free living room.
&lt;/blockquote&gt; 
    </content:encoded>

    <pubDate>Wed, 02 Jul 2008 11:52:03 -0400</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/122-guid.html</guid>
    
</item>
<item>
    <title>Reconnoiter and another platform</title>
    <link>http://www.lethargy.org/~jesus/archives/121-Reconnoiter-and-another-platform.html</link>
            <category>OpenSolaris</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/121-Reconnoiter-and-another-platform.html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=121</wfw:comment>

    <slash:comments>3</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=121</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=251&amp;amp;entry_id=121&quot; title=&quot;http://labs.omniti.com/trac/reconnoiter&quot;  onmouseover=&quot;window.status=&#039;http://labs.omniti.com/trac/reconnoiter&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Reconnoiter&lt;/a&gt; is coming along.  Unlike the authors of most open source projects, I tend not to talk about mine until they are really useful to people.  Over the last year, I&#039;ve adopted the unhealthy attitude that useful means &quot;shiny front-end.&quot;  So, I&#039;m blogging to break that attitude and talk a bit about a project that doesn&#039;t have a shiny front-end... yet.&lt;br /&gt;&lt;br /&gt;Reconnoiter is built out of years of frustration using tools like RRDTOOL, Munin, Cacti, ZenOSS, Nagios, etc. etc.  I have a lot of problems with these tools.  First, they are not efficient.  I need a powerful machine to monitor a mere 10k services.  And it actually gets to be an engineering challenge to monitor 100k services with these tools.  Also, the graphs are about 10 years old with respect to design and usability.  I want something new, something fresh, and something that doesn&#039;t need a damn web UI to configure.  Several people have asked, why are you reinventing the wheel?  Why don&#039;t you just improve an existing product?  My answer is that I want a well-thought-out product foundation so that I can trust all the bits.  I want responsibilities decoupled at the right spots.  I want data in a form that the world can query and run reports the likes of which I have not conceived.  I don&#039;t want the load on my monitoring machines to be 8.  I want my monitoring system to check services and metrics when it planned to, not several minutes (or even 2 seconds) after it told me it would.  Simply put, I expect it to work well, all the time.  
And, of course, I want it to work how I would expect it to work.&lt;br /&gt;&lt;br /&gt;Reconnoiter was born out of the need to monitor the internals of many disconnected data centers with between 10 and 1000 machines in each facility.  Monitoring can mean a lot of things; here I consider it to be the collection of metrics and awareness of their availability.  In and of itself, monitoring is pretty useless, but it is the foundation for two critical pursuits in Internet infrastructure and business management: fault detection and trending.&lt;br /&gt;&lt;br /&gt;Fault detection is as simple as understanding when something has faulted.  However, knowing something is broken is easier than knowing something is about to break.  Is it better to know that your machine just crashed because the chip slagged to the motherboard, or that the temperatures in rack 043 are rising unexpectedly?  Answer: both, but I hope I only learn the latter and not the former.  Truly, there are too many things to monitor... hundreds or thousands of metrics on each piece of equipment.  I can&#039;t reasonably go in and configure good/bad thresholds on each one.  I want anomaly detection.  I want a system that I can say: &quot;this looks right, tell me when it stops looking right.&quot;  That, to me, is a much-needed companion to traditional fault detection.&lt;br /&gt;&lt;br /&gt;To me, trending is much more than drawing graphs... it is about intelligent data correlation, regression analysis/curve fitting and looking into the past to see how much you fucked up getting where you are now -- in the vain hope that you learn from your mistakes and plan better next time.&lt;br /&gt;&lt;br /&gt;Reconnoiter is an attempt to build these things.  Building a system requires starting with pain (need), solid structure and plumbing (good engineering).  So, reconnoiter is underway.  
And this post is in mid-step:&lt;br /&gt;&lt;br /&gt;It started on OpenBSD, and added support for FreeBSD, Mac OS X, Linux.&lt;br /&gt;&lt;br /&gt;As of changeset [292], we have Solaris/OpenSolaris support.&lt;br /&gt;&lt;br /&gt;We have a pretty nice front-end for trending under construction, but it isn&#039;t there yet.  We&#039;ll have numeric data combined with textual &quot;event&quot; data on the same graphs.  All that convenient stuff.  Here&#039;s the rather plain-Jane graph you get now (because some people won&#039;t even read a post if it doesn&#039;t have a pretty graph):&lt;br /&gt;&lt;br /&gt;&lt;div align=&quot;center&quot;&gt;&lt;img style=&quot;max-width: 800px;&quot; src=&quot;http://www.lethargy.org/%7Ejesus/uploads/noit_bw_graph.png&quot; /&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Honestly, I don&#039;t know what the value of this post is, but people around here keep telling me that people should be aware of an open-source tool like this, even if it isn&#039;t finished (read: usable) yet.  I say it isn&#039;t usable yet, but on our development instances here, we monitor 2892 production metrics across two data centers and the load never peaks past 0.10.  I&#039;m pretty excited about where this is going.  Honestly, my favorite part right now is that I can configure and control the noitd checking nodes via a telnet console and it acts as if it is a piece of network equipment rather than an &quot;application&quot; -- as it should be IMHO.&lt;br /&gt; 
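&lt;br /&gt;The &quot;this looks right, tell me when it stops looking right&quot; idea can be sketched in a few lines.  This is only an illustration under assumed names (find_anomalies, window and tolerance are all made up here) and says nothing about how Reconnoiter itself does or will do anomaly detection: it learns a baseline from the most recent samples and flags any reading that strays too far from it, so nobody has to configure a good/bad threshold per metric.&lt;br /&gt;&lt;br /&gt;

```python
import statistics


def find_anomalies(samples, window=5, tolerance=3.0):
    """Return indexes whose value strays far from the preceding window's mean.

    Hypothetical helper for illustration only; window and tolerance are
    assumed tuning knobs, not settings from any real monitoring tool.
    """
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mean = statistics.mean(baseline)
        spread = statistics.pstdev(baseline)
        # Guard against a perfectly flat baseline (zero spread).
        limit = tolerance * max(spread, 1e-9)
        if abs(samples[i] - mean) > limit:
            anomalies.append(i)
    return anomalies


# Made-up rack temperatures; the last reading jumps unexpectedly.
temps = [21.0, 21.2, 20.9, 21.1, 21.0, 21.1, 27.5]
print(find_anomalies(temps))  # prints [6]
```

&lt;br /&gt;A rolling mean and standard deviation is about the crudest baseline there is; it is used here only because it fits in a dozen lines.&lt;br /&gt;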
    </content:encoded>

    <pubDate>Fri, 27 Jun 2008 17:27:27 -0400</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/121-guid.html</guid>
    
</item>
<item>
    <title>Dissecting today's Internet traffic spikes</title>
    <link>http://www.lethargy.org/~jesus/archives/118-Dissecting-todays-Internet-traffic-spikes.html</link>
            <category>Damaged Bits</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/118-Dissecting-todays-Internet-traffic-spikes.html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=118</wfw:comment>

    <slash:comments>13</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=118</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    Today&#039;s Internet has changed quite a bit from the Internet I used to know.  The Internet has always been successful because of net neutrality.  What&#039;s net neutrality?  It&#039;s complicated, but essentially it means that anyone anywhere can publish with equal rights.  These aren&#039;t the kind of rights people usually talk about... I&#039;m not speaking of freedom of speech.  Instead, I&#039;m talking about content being simply bits.  It doesn&#039;t matter if it comes from &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=239&amp;amp;entry_id=118&quot; title=&quot;http://cnn.com/&quot;  onmouseover=&quot;window.status=&#039;http://cnn.com/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;CNN&lt;/a&gt; or &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=240&amp;amp;entry_id=118&quot; title=&quot;http://lethargy.org/&quot;  onmouseover=&quot;window.status=&#039;http://lethargy.org/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;this blog&lt;/a&gt;, you as a reader can download the bits that make up the pages you see without bias or preferential treatment.  This makes it darn easy to be a publisher and leads to a fabulous ecosystem with an overwhelming amount of varied content.  However, with more content it is easy to recognize that much of it is utter trash.  Yes. Yes. I know that one man&#039;s trash is another man&#039;s treasure.  However, it presents opportunities for sites that help you navigate the wasteland.&lt;br /&gt;&lt;br /&gt;Many popular sites today are popular because they link to articles and news items and photographs and movies all over the Internet; they are &quot;interest aggregation services.&quot;  And while the Internet has (for now) a decent preservation of net neutrality when it comes to simple web content, not all publishers are on equal footing.  
Not long ago, anyone could run a server anywhere (their basement) with DSL or cable or (gasp) dial-up -- now, the challenge is coping with unexpected attention.&lt;br /&gt;&lt;br /&gt;Years ago, the site &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=241&amp;amp;entry_id=118&quot; title=&quot;http://slashdot.org/&quot;  onmouseover=&quot;window.status=&#039;http://slashdot.org/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;slashdot&lt;/a&gt; coined a term &quot;slashdotted&quot; which meant that a site received so much sudden traffic that service degraded beyond an acceptable point and the site was effectively unavailable.  This often happened to sites that were at the end of small pipes (DSL, T1, etc.) and occasionally (though rarely) due to bad engineering.  While slashdot might have coined the term, they simply don&#039;t have the viewership numbers that other large sites today have.&lt;br /&gt;&lt;br /&gt;At the &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=242&amp;amp;entry_id=118&quot; title=&quot;http://omniti.com/&quot;  onmouseover=&quot;window.status=&#039;http://omniti.com/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;$DAYJOB&lt;/a&gt;, I work on sites that aren&#039;t on the end of T1 lines.  Sites with gigabits or tens of gigabits of connectivity.  Sites with 50 million users.  Sites powered by thousands of machines. I also work on sites that service millions of people from just a handful of machines (efficiency certainly has its advantages sometimes).  I find it particularly interesting that already popular sites (with significant baseline bandwidth) are seeing these unexpected surges.  For a long time, my blog has been on this same machine which is a vhost for several other web sites.  I&#039;ve had traffic spikes from places like slashdot, reddit, digg, etc.  
And, no surprise, I couldn&#039;t actually see the bandwidth jump on the graphs... 10Mbits to 11Mbits?  That&#039;s not a spike.&lt;br /&gt;&lt;br /&gt;Things are changing.  Sites like &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=243&amp;amp;entry_id=118&quot; title=&quot;http://digg.com/&quot;  onmouseover=&quot;window.status=&#039;http://digg.com/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Digg&lt;/a&gt; are becoming ever more popular and people are drawn to them as a means of sifting the waste of the Internet.   This means as more people rely on &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=243&amp;amp;entry_id=118&quot; title=&quot;http://digg.com/&quot;  onmouseover=&quot;window.status=&#039;http://digg.com/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Digg&lt;/a&gt; and &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=245&amp;amp;entry_id=118&quot; title=&quot;http://reddit.com/&quot;  onmouseover=&quot;window.status=&#039;http://reddit.com/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Reddit&lt;/a&gt; and other similar sites, the number of unexpected viewers of your content can rise more sharply.&lt;br /&gt;&lt;br /&gt;What does all of this mean?  It means that the old rule of thumb that your infrastructure should see 70% resource utilization at peak is starting to falter.  The typical trends used to look like this (this is last week&#039;s graph from a retail client with a user base of 3 million):&lt;br /&gt;&lt;br /&gt;&lt;div align=&quot;center&quot;&gt;&lt;img style=&quot;border: 1px solid rgb(200, 200, 200); padding: 4px; max-width: 800px;&quot; src=&quot;http://www.lethargy.org/%7Ejesus/uploads/Picture%201.png&quot; /&gt;&lt;br /&gt;&lt;div align=&quot;left&quot;&gt;&lt;br /&gt;We see a nice peak, a nice valley.  Thursday afternoon, we see a nice traffic spike.  
Well, this used to be what I called a traffic spike.  Now, different services have different spike signatures.  It resembles the traffic model of classic Internet advertising, except that there is genuine interest and thus dramatically higher conversion rates.  It&#039;s a simple combination of placement, frequency and exposure.  Because content, unlike ad banners, exists for an extended period of time (sometimes forever), the frequency is very high.  Digg and Reddit have excellent placement with very little exposure (things move out quickly).  A site like CNN or NYTimes usually provides mediocre placement (unless you are on the front page) and excellent exposure.&lt;br /&gt;&lt;br /&gt;Lately, I see more sudden eyeballs and what used to be an established trend seems to fall into a more chaotic pattern that is the aggregate of different spike signatures around a smooth curve.  This graph is from two consecutive days where we have a beautiful comparison of a relatively uneventful day followed by a long-exposure spike (nytimes.com) compounded by a short-exposure spike (digg.com):&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div align=&quot;center&quot;&gt;&lt;img style=&quot;border: 1px solid rgb(200, 200, 200); padding: 4px; max-width: 800px;&quot; src=&quot;http://www.lethargy.org/%7Ejesus/uploads/graph.png&quot; /&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;The disturbing part is that this occurs even on larger sites now due to the sheer magnitude of eyeballs looking at today&#039;s already popular sites.  Long story short, this makes planning a real bitch.&lt;br /&gt;&lt;br /&gt;And the interesting thing is perspective on what is large...  People think Digg is popular -- it is.  
The &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=246&amp;amp;entry_id=118&quot; title=&quot;http://nytimes.com/&quot;  onmouseover=&quot;window.status=&#039;http://nytimes.com/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;New York Times&lt;/a&gt; is too, as is CNN and most other major news networks -- if they link to your site, you can expect to see a dramatic and very sudden increase in traffic. And this is just in the United States (and some other English speaking countries)... there are others... and they&#039;re kinda big.&lt;br /&gt;&lt;br /&gt;What isn&#039;t entirely obvious in the above graphs?  These spikes happen inside 60 seconds.  The idea of provisioning more servers (virtual or not) is unrealistic.  Even in a cloud computing system, getting new system images up and integrated in 60 seconds is pushing the envelope and that would assume a zero second response time.  This means it is about time to adjust what our systems architecture should support.  The old rule of 70% utilization accommodating an unexpected 40% increase in traffic is unraveling.  At least eight times in the past month, we&#039;ve experienced from 100% to 1000% sudden increases in traffic across many of our clients.&lt;br /&gt;&lt;br /&gt;I talk about scalability a lot.  It&#039;s my job.  It&#039;s my passion.  I regularly emphasize that scalability and performance are truly different beasts.  One key to scalability is that a &quot;systems design&quot; scales.  Architectures are built to be able to scale, they are not built &quot;at scale.&quot;  It&#039;s just too expensive to build a system to serve a billion people (until you have a billion people).  It&#039;s cheap to &lt;em&gt;design&lt;/em&gt; a system to serve a billion people.  Once you have a billion people accessing your site, you can likely justify executing on your design.  
Google is successful for this reason: their ideas scale and they can build into them as demand rises.  On the flip side, traffic anomalies in the form of spikes are unexpected (by their definition) and scaling a system out to meet the &lt;em&gt;unexpected&lt;/em&gt; demand is almost unreasonable.  I would even argue that it is more of a performance-centric issue.  I want every asset I serve to be as cheap to serve as possible allowing me to handle larger and larger spikes.&lt;br /&gt;&lt;br /&gt;The reason I find all of this stuff interesting is that understanding &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=247&amp;amp;entry_id=118&quot; title=&quot;http://omniti.com/does/scalability-and-performance&quot;  onmouseover=&quot;window.status=&#039;http://omniti.com/does/scalability-and-performance&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;performance and scalability&lt;/a&gt;, understanding the &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=248&amp;amp;entry_id=118&quot; title=&quot;http://scalableinternetarchitectures.com/blog/pages/about&quot;  onmouseover=&quot;window.status=&#039;http://scalableinternetarchitectures.com/blog/pages/about&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;principles of scalable systems design&lt;/a&gt; and having &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=249&amp;amp;entry_id=118&quot; title=&quot;http://omniti.com/does/scalability-and-performance/process&quot;  onmouseover=&quot;window.status=&#039;http://omniti.com/does/scalability-and-performance/process&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;sound and efficient processes for handling performance issues&lt;/a&gt; is becoming crucial for sites regardless of their size.  
This takes insight and practice and it reminds me of Knuth&#039;s famous saying:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;That&#039;s all well and good, but which 97% of the time?  My response to Knuth&#039;s statement (with which I completely agree) is:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Understanding what is and isn&#039;t &quot;premature&quot; is what separates senior engineers from junior engineers.&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Let&#039;s add perspective on the word &quot;sudden.&quot;  Most network monitoring systems poll SNMP devices (like switches, load-balancers, and hosts) once every five minutes (we do this every 30 seconds in some environments).  Some people say, &quot;my site scales! bring it on.&quot; We see these spikes happen inside 60 seconds and they occasionally induce a ten-fold increase over trended peaks.  Oftentimes, this spike can be well underway for several minutes before your graphing tools even pick up on it.  Then, before you have time to analyze, diagnose and remediate... poof... it&#039;s gone.  Be careful what you wish for.&lt;br /&gt;&lt;br /&gt;This, in many ways, is like a tornado.  Our ability to predict them sucks.  Our responses are crude and they are quite damaging.  However, predicting these Internet traffic events isn&#039;t even possible -- there are no building weather patterns or early warning signs.  Instead we are forced to focus on different techniques for stability and safety.  The idea of a DoS, a DDoS or the sometimes similar signature of a sudden popularity spike doesn&#039;t increase my heart rate anymore -- it&#039;s just another day on the job.  
However, I thought I&#039;d share the four guidelines that I believe are key to my sanity in these situations:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;em&gt;Be Alert&lt;/em&gt;: build automated systems to detect and pinpoint the cause of these issues quickly (in less than 60 seconds).&lt;/li&gt;&lt;li&gt;&lt;em&gt;Be Prepared&lt;/em&gt;: understand the bottlenecks of your service systemically.  Understand your site inside and out.  Contemplate how you would respond if a specific feature or set of features on your site were to get &quot;suddenly popular.&quot;&lt;/li&gt;&lt;li&gt;&lt;em&gt;Perform Triage&lt;/em&gt;: understand the importance of the various services that make up your site.  If you find yourself in a position to sacrifice one part to ensure continued service of another, you should already know their relative importance and not hesitate in the decision.&lt;/li&gt;&lt;li&gt;&lt;em&gt;Be Calm&lt;/em&gt;: any action that is not analytically driven is a waste of time and energy.  Be quick, not rash.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;Back to those other countries... Enter China and their recently lessened censorship and we have a looming tidal wave for smaller sites that achieve sudden popularity.  Spikes of several hundred megabits per second are difficult to account for when your normal trend is around twenty megabits per second.    The following graph is traffic induced from a link from a popular foreign news site (that I can&#039;t read).  I call it &quot;ouch&quot;:&lt;br /&gt;&lt;br /&gt;&lt;div align=&quot;center&quot;&gt;&lt;img style=&quot;border: 1px solid rgb(200, 200, 200); padding: 4px; max-width: 800px;&quot; src=&quot;http://www.lethargy.org/%7Ejesus/uploads/graph_image.php.png&quot; /&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt; 
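&lt;br /&gt;&lt;br /&gt;For anyone who wants the arithmetic behind the 70% rule spelled out, here is a rough sketch.  The function names are made up for illustration, and it assumes capacity scales linearly with traffic, which real systems rarely do.  Running at 70% of capacity at peak leaves room for roughly a 43% jump (the &quot;40%&quot; of the old rule), while a ten-fold spike over that same peak would need about seven times the capacity you have.&lt;br /&gt;&lt;br /&gt;

```python
def absorbable_increase(peak_utilization):
    """Fractional traffic increase that still fits in the unused capacity.

    Hypothetical helper; assumes load grows linearly with traffic.
    """
    return (1.0 - peak_utilization) / peak_utilization


def required_capacity_factor(spike_multiplier, peak_utilization):
    """How many times current capacity is needed to ride out a spike."""
    return spike_multiplier * peak_utilization


# Headroom at the classic 70% peak utilization, as a percentage.
print(round(absorbable_increase(0.70) * 100))  # prints 43

# A ten-fold spike over a 70% peak needs about seven times current capacity.
print(round(required_capacity_factor(10, 0.70), 2))  # prints 7.0
```

&lt;br /&gt;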
    </content:encoded>

    <pubDate>Tue, 20 May 2008 13:56:59 -0400</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/118-guid.html</guid>
    
</item>
<item>
    <title>BWPUG Meetup Reminder</title>
    <link>http://www.lethargy.org/~jesus/archives/116-BWPUG-Meetup-Reminder.html</link>
            <category>BWPUG</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/116-BWPUG-Meetup-Reminder.html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=116</wfw:comment>

    <slash:comments>3</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=116</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;p&gt;Hi all!&lt;/p&gt;

&lt;p&gt;Just a friendly reminder that we&#039;ll be having our first meetup tomorrow as planned.  I thought as a good kick-off we could all collaboratively share what we do with PostgreSQL.  We&#039;ll start off with a whirlwind tour of how OmniTI uses PostgreSQL, taking a brief look at ZFS, DTrace and large datasets.  After that I think it would be good to get to know each other -- maybe we&#039;ll hit a local pub afterwards!&lt;/p&gt;

&lt;p&gt;I look forward to seeing you there!&lt;/p&gt;

&lt;blockquote&gt;
Meetup starts at 6:30pm&lt;br /&gt;
7070 Samuel Morse Dr. Ste 150&lt;br /&gt;
Columbia, MD 21046&lt;br /&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you have issues getting in the building, ring me on my cell -- it will be posted on the doors.&lt;/p&gt;

&lt;p&gt;Best regards,&lt;/p&gt;

&lt;p&gt;Theo&lt;/p&gt;
 
    </content:encoded>

    <pubDate>Tue, 13 May 2008 20:49:56 -0400</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/116-guid.html</guid>
    
</item>
<item>
    <title>OSCON 2008: And now for something completely different.</title>
    <link>http://www.lethargy.org/~jesus/archives/115-OSCON-2008-And-now-for-something-completely-different..html</link>
            <category>Damaged Bits</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/115-OSCON-2008-And-now-for-something-completely-different..html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=115</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=115</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;p&gt;I just registered for &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=205&amp;amp;entry_id=115&quot; title=&quot;http://en.oreilly.com/oscon2008/public/content/home&quot;  onmouseover=&quot;window.status=&#039;http://en.oreilly.com/oscon2008/public/content/home&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;OSCON&lt;/a&gt;.  They say I should advertise that I am a speaker.  Here goes.&lt;/p&gt;

&lt;p&gt;For &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=197&amp;amp;entry_id=115&quot; title=&quot;http://blogs.oreilly.com/digitalmedia/2005/08/oscon-day-0-scalable-internet.html&quot;  onmouseover=&quot;window.status=&#039;http://blogs.oreilly.com/digitalmedia/2005/08/oscon-day-0-scalable-internet.html&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;the&lt;/a&gt; &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=198&amp;amp;entry_id=115&quot; title=&quot;http://conferences.oreillynet.com/cs/os2005/view/e_sess/6412&quot;  onmouseover=&quot;window.status=&#039;http://conferences.oreillynet.com/cs/os2005/view/e_sess/6412&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;last&lt;/a&gt; &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=199&amp;amp;entry_id=115&quot; title=&quot;http://conferences.oreillynet.com/cs/os2006/view/e_spkr/1788&quot;  onmouseover=&quot;window.status=&#039;http://conferences.oreillynet.com/cs/os2006/view/e_spkr/1788&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;several&lt;/a&gt; &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=200&amp;amp;entry_id=115&quot; title=&quot;http://conferences.oreillynet.com/cs/os2007/view/e_sess/12458&quot;  onmouseover=&quot;window.status=&#039;http://conferences.oreillynet.com/cs/os2007/view/e_sess/12458&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;years&lt;/a&gt;, I&#039;ve presented multiple talks at the O&#039;Reilly Open Source Conference.  My Scalable Internet Architectures talk has been quite popular and drawn large crowds.  It is an interesting talk as it doesn&#039;t really change with time.  
As I say, &quot;if principles of good engineering changed frequently, I&#039;d never drive on bridges.&quot;  The talk is about sound engineering approaches to building really large consumer-facing websites.  Almost all of it is open-source centric, which is why it fits so well at OSCON.  While my Scalable talk was not accepted this year, I&#039;ve got another talk lined up that will rock your world.&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;&lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=205&amp;amp;entry_id=115&quot; title=&quot;http://en.oreilly.com/oscon2008/public/content/home&quot;  onmouseover=&quot;window.status=&#039;http://en.oreilly.com/oscon2008/public/content/home&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;&lt;img src=&quot;http://conferences.oreillynet.com/banners/oscon/speaker/oscon2008_banner_speaker_210x60.gif&quot; style=&quot;padding: 3px; border: 1px solid #999;&quot; border=0&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;I am quite excited that my other proposal was accepted.  This year I will be giving  a session about &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=202&amp;amp;entry_id=115&quot; title=&quot;http://en.oreilly.com/oscon2008/public/schedule/detail/2903&quot;  onmouseover=&quot;window.status=&#039;http://en.oreilly.com/oscon2008/public/schedule/detail/2903&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;using DTrace to perform &quot;full-stack&quot; introspection&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Using DTrace we will deep dive into the amazingly cool questions one can ask. Is my application really hitting disk? If so, what line of code is causing it? My process is being descheduled by the kernel, why? I have 100 Apache processes and some randomly segfault, how do I get a stack trace when that happens? The app I am running doesn’t have the right debugging output, I need to know more!&lt;/p&gt;
&lt;p style=&quot;margin-top: 1em&quot;&gt;DTrace is an oracle. The value of the answers depends on the quality of the questions. Learn to ask good questions and prepare to be amazed at the possibilities.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I&#039;ve given a variation on this presentation at a few places now (both internal to OmniTI and external) and had really positive feedback.  I&#039;ll be taking these prior presentations and polishing them up for a 45 minute escapade that will open your eyes to new possibilities.  &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=203&amp;amp;entry_id=115&quot; title=&quot;http://opensolaris.org/os/community/dtrace/&quot;  onmouseover=&quot;window.status=&#039;http://opensolaris.org/os/community/dtrace/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;DTrace&lt;/a&gt; is an amazing tool and once you get used to it, you can really take it for granted.  I do.  When people watch the presentation and say &quot;&lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=204&amp;amp;entry_id=115&quot; title=&quot;http://www.imdb.com/title/tt0425112/&quot;  onmouseover=&quot;window.status=&#039;http://www.imdb.com/title/tt0425112/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;by the power of Greyskull&lt;/a&gt;,&quot; I know I&#039;ve made my point.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=205&amp;amp;entry_id=115&quot; title=&quot;http://en.oreilly.com/oscon2008/public/content/home&quot;  onmouseover=&quot;window.status=&#039;http://en.oreilly.com/oscon2008/public/content/home&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Come to OSCON&lt;/a&gt;.  Immerse yourself in technology.&lt;/p&gt; 
    </content:encoded>

    <pubDate>Sun, 27 Apr 2008 22:02:28 -0400</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/115-guid.html</guid>
    
</item>
<item>
    <title>ZFS. Respect.</title>
    <link>http://www.lethargy.org/~jesus/archives/114-ZFS.-Respect..html</link>
            <category>OpenSolaris</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/114-ZFS.-Respect..html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=114</wfw:comment>

    <slash:comments>28</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=114</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;p&gt;Today someone asked me: &quot;You speak about ZFS a lot.  I know other people that talk about the latest filesystems with praise, but generally speaking they just don&#039;t have much to offer.  Is ZFS that different?&quot;&lt;/p&gt;

&lt;p&gt;My answer is &quot;yes.&quot;  But, of course, I can&#039;t leave it at that.  I&#039;m not going to make a performance argument -- ZFS is fast in some cases and slow in others -- just like everything else.  I think one of the things we&#039;ve seen in the last 10 years is that everyone felt the need to come out with their own filesystem -- at least on Linux.  So, you have to ask yourself why.  My personal opinion is that filesystems on Linux suck.&lt;/p&gt;

&lt;p&gt;Most filesystems on the market support snapshots.  No open source filesystems on Linux (that I&#039;m aware of) support snapshots.  Of course, you can use LVM to do block-level snapshots.  First off, that&#039;s a pain in the ass w.r.t. storage provisioning.  Other systems make the process of allocating and managing snapshots &quot;not my problem.&quot; (simple and easy).  Let&#039;s be frank, ext2 and ext3 are nothing to write home about. reiserfs, xfs, jfs, the list goes on and on.&lt;/p&gt;

&lt;p&gt;There are a few closed-source filesystems that are really nice.  Specifically Veritas Filesystem (VxFS) and its excellent layered volume manager VxVM which appears to have heavily inspired geom on FreeBSD.  DEC thought it was so cool that they pulled it white-label into Tru64.  Respect.&lt;/p&gt;

&lt;p&gt;So, what makes ZFS so different?  ZFS is a disruptive technology as it abolishes the sacred line in the sand between block devices, volume management and filesystems.  This means it just makes storage management easy.  When I say easy... I mean &lt;b&gt;easy&lt;/b&gt;.&lt;/p&gt;

&lt;p&gt;So you want more space?  Add more disks.  Want to move from failing disks to replacements?  Tell zfs to add the new ones and tell it to remove the old ones.  Read that report by Google about disk errors?  ZFS checksums all data.  My personal experience says checksums are &lt;em&gt;good&lt;/em&gt;.   Snapshots?  Sure, snapshot to your heart&#039;s content.  We snapshot some systems hourly and never &lt;em&gt;ever delete&lt;/em&gt; the old ones.  Snapshots are really cool, but what if you could roll back to a snapshot?  zfs rollback.  What if you wanted to make a read/write copy of the filesystem or an old snapshot? zfs clone.  You want to store a lot of raw data? zfs has built-in compression.  Oh, and it is open-source.&lt;/p&gt;
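
&lt;p&gt;As a rough sketch of what that looks like at the shell: the pool name (tank), the filesystem (tank/data), the snapshot name and the Solaris-style disk names below are all made up for illustration, and zpool replace is shown as one way to swap a failing disk for a new one.&lt;/p&gt;

&lt;pre&gt;
# Hypothetical names throughout: pool &quot;tank&quot;, filesystem &quot;tank/data&quot;, disks cXtYd0.

# More space: add another mirrored pair to the pool.
zpool add tank mirror c2t0d0 c2t1d0

# Failing disk: resilver its data onto a replacement.
zpool replace tank c1t0d0 c3t0d0

# Checksums: read every block back and verify it, then look at the result.
zpool scrub tank
zpool status -v tank

# Snapshot, then clone that snapshot into a read/write filesystem.
zfs snapshot tank/data@2008-04-22-23h
zfs clone tank/data@2008-04-22-23h tank/data-scratch

# Roll back to the snapshot (needs -r if newer snapshots exist, which destroys them).
zfs rollback tank/data@2008-04-22-23h

# Compress newly written data.
zfs set compression=on tank/data
&lt;/pre&gt;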

&lt;p&gt;Simply put.  ZFS.  Respect.&lt;/p&gt;

 
    </content:encoded>

    <pubDate>Tue, 22 Apr 2008 23:16:32 -0400</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/114-guid.html</guid>
    
</item>
<item>
    <title>Starting the Baltimore/Washington PostgreSQL User Group</title>
    <link>http://www.lethargy.org/~jesus/archives/113-Starting-the-BaltimoreWashington-PostgreSQL-User-Group.html</link>
            <category>BWPUG</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/113-Starting-the-BaltimoreWashington-PostgreSQL-User-Group.html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=113</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=113</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;p&gt;On the second Wednesday of every month, the Baltimore/Washington PostgreSQL User Group will meet at &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=195&amp;amp;entry_id=113&quot; title=&quot;http://maps.google.com/maps?f=q&amp;amp;hl=en&amp;amp;geocode=&amp;amp;q=7070+Samuel+Morse+Dr,+Columbia,+MD+21046&amp;amp;jsv=107&amp;amp;sll=37.0625,-95.677068&amp;amp;sspn=42.03917,64.335938&amp;amp;ie=UTF8&amp;amp;ll=39.17033,-76.807995&amp;amp;spn=0.040325,0.062828&amp;amp;t=h&amp;amp;z=14&amp;amp;iwloc=addr&quot;  onmouseover=&quot;window.status=&#039;http://maps.google.com/maps?f=q&amp;amp;hl=en&amp;amp;geocode=&amp;amp;q=7070+Samuel+Morse+Dr,+Columbia,+MD+21046&amp;amp;jsv=107&amp;amp;sll=37.0625,-95.677068&amp;amp;sspn=42.03917,64.335938&amp;amp;ie=UTF8&amp;amp;ll=39.17033,-76.807995&amp;amp;spn=0.040325,0.062828&amp;amp;t=h&amp;amp;z=14&amp;amp;iwloc=addr&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;7070 Samuel Morse Drive, Ste 150 in Columbia, Maryland&lt;/a&gt;.  Meetings start at 6:30pm and go until around 8:30pm.  I am pretty excited about this and pleased to offer up OmniTI&#039;s facilities for this.  I&#039;m excited about the opportunity to share what I&#039;ve learned, educate and grow the PostgreSQL community and learn from others in it.  This is going to be &quot;good stuff.&quot;&lt;/p&gt;

&lt;p&gt;Our first meeting will be held on May 14th, 2008.  Mark your calendars.  Also, subscribe to the mailing list at: bwpug@postgresql.org.&lt;/p&gt; 
    </content:encoded>

    <pubDate>Mon, 14 Apr 2008 13:49:42 -0400</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/113-guid.html</guid>
    
</item>
<item>
    <title>Probing for Success</title>
    <link>http://www.lethargy.org/~jesus/archives/112-Probing-for-Success.html</link>
            <category>Damaged Bits</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/112-Probing-for-Success.html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=112</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=112</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;p&gt;I recently attended &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=192&amp;amp;entry_id=112&quot; title=&quot;http://wikis.sun.com/display/DTrace/dtrace.conf&quot;  onmouseover=&quot;window.status=&#039;http://wikis.sun.com/display/DTrace/dtrace.conf&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;dtrace.conf(08)&lt;/a&gt;, which was a blast, but I left that conference with a single thought and it has been reinforced since.  &lt;strong&gt;Everything&lt;/strong&gt; should be DTrace-enabled.  While it is true that using DTrace you can introspect just about everything in the system, the pid provider (used to trace inside user-space applications) requires the user to know the code of the application.  A full system &quot;in-flight&quot; has too many different apps running for me to keep all of their code-bases in my head.  The kernel and one or two apps is about my limit.  Also, the pid provider is somewhat limited in that it makes watching a lot of processes on that level intractable.  So, what&#039;s the solution? Statically defined tracing (SDT) probes for user applications, or USDTs.&lt;/p&gt;

&lt;p&gt;USDTs allow application authors to put specific probe points in their software that provide three compelling advantages over process-level tracing with the pid provider: (1) they should be carefully placed and named to be accessible to a user who does not know the application code base, its data structures, or even C for that matter, (2) they boast a very low barrier to entry, and (3) they work system-wide as expected.&lt;/p&gt;

&lt;p&gt;Sounds easy, right?  Well, adding the code to an existing project is &quot;retarded simple.&quot;  After a few lines of patching in the build process (usually a makefile), new probes can be added at about two lines of code per probe (one line for the probe and one line for the prototype).  So, it is easy.  But, why isn&#039;t every application pimped-out with DTrace probes?  Linux.  Linux doesn&#039;t have DTrace and as such, I think there is a lot of resistance to adding the probes to software whose primary development base (and target) is Linux.  I don&#039;t think they are against it, but the &quot;what&#039;s in it for me?&quot; question comes up and acts as an obstacle.  This is the case with any &quot;cool new technology&quot; that&#039;s not mainstream.&lt;/p&gt;
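
&lt;p&gt;To make the &quot;few lines in the build process&quot; concrete, here is a minimal sketch of the usual USDT build steps on Solaris.  The provider name (myapp), the probe (request__start) and all file names are hypothetical; a real project would fold these commands into its makefile.&lt;/p&gt;

&lt;pre&gt;
# Hypothetical provider &quot;myapp&quot; with one probe; every file name here is made up.

# 1. The prototype: one line per probe in a provider definition file.
printf &#039;%s\n&#039; &#039;provider myapp {&#039; &#039;    probe request__start(int);&#039; &#039;};&#039; &gt; myapp_probes.d

# 2. Generate a header of probe macros for the C sources to include.
dtrace -h -s myapp_probes.d -o myapp_probes.h

# 3. The probe itself is then one line in the C source, for example:
#        MYAPP_REQUEST_START(request_id);

# 4. Compile, post-process the object file, and link in the generated probe object.
cc -c myapp.c
dtrace -G -s myapp_probes.d -o myapp_probes.o myapp.o
cc -o myapp myapp.o myapp_probes.o

# 5. With myapp running, confirm the probe is visible system-wide.
dtrace -l -n &#039;myapp*:::request-start&#039;
&lt;/pre&gt;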

&lt;p&gt;The real challenge is that each open source project (to which we would add probes) has its own culture and process for proposing changes, submitting patches, negotiating for inclusion, etc.  In a lot of ways, in order to effect change, we have to take the role of a package distributor and with enough impetus, we&#039;ll see the upstream pull our patches on their schedule without much cooperation from us (of course, we&#039;re also happy to cooperate).&lt;/p&gt;

&lt;p&gt;I&#039;m not going into the details of why DTrace is hands down better than peanut butter and jelly -- you need to see it in action to truly respect it.  However, with about two hours worth of work in PostgreSQL, I exposed &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=193&amp;amp;entry_id=112&quot; title=&quot;https://labs.omniti.com/trac/project-dtrace/wiki/Applications#PostgreSQL&quot;  onmouseover=&quot;window.status=&#039;https://labs.omniti.com/trac/project-dtrace/wiki/Applications#PostgreSQL&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;probes in some parts of PostgreSQL&lt;/a&gt; that are otherwise hard to inspect.  I instrumented some XLog operations, checkpoints, &quot;exec&quot; nodes in the executor, buffer syncing, LRU operations (that drive the CLOG, SUBTRANS and MultiXact system), the autovacuum system and SQL executions from clients.&lt;/p&gt;

&lt;p&gt;Look ma! I can watch my checkpoints in real-time:&lt;/p&gt;

&lt;pre&gt;
CheckPoint initated...
  CLOG                                                              1
  SUBTRANS                                                          1
  Buffers                                                        1430
CheckPoint complete: elapsed 92355ms

CheckPoint initated...
  CLOG                                                              1
  SUBTRANS                                                          1
  Buffers                                                         911
CheckPoint complete: elapsed 55933ms

&lt;/pre&gt;

&lt;p&gt;55 seconds! Is that evenly distributed?&lt;/p&gt;

&lt;pre&gt;
  buffer writes                                     
           value  ------------- Distribution ------------- count    
             &lt; 0 |                                         0        
               0 |@@                                       5        
               1 |@@                                       5        
               2 |@@                                       5        
               3 |@@                                       5        
               4 |@                                        4        
               5 |@@                                       5        
               6 |@@                                       5        
               7 |@@                                       5        
               8 |@                                        4        
               9 |@@                                       5        
              10 |@@                                       5        
              11 |@@                                       5        
              12 |@                                        4        
              13 |@@                                       5        
              14 |@@                                       5        
              15 |@@                                       5        
              16 |@                                        4        
              17 |@@                                       5        
              18 |@@                                       5        
              19 |@@                                       5        
              20 |@                                        4        
              21 |@@                                       5        
              22 |@@                                       5        
              23 |@@                                       5        
              24 |@@                                       5        
              25 |@                                        4        
              26 |                                         1        
              27 |                                         0        

CheckPoint complete: elapsed 26085ms
&lt;/pre&gt;

&lt;p&gt;Well, it only took 26 seconds that time.  It turns out that in PostgreSQL 8.3 the regular checkpoints attempt to spread out the buffer writes and we can witness it working well.  Pretty neat.&lt;/p&gt;

&lt;p&gt;With all these probes, the questions you can ask can be quite interesting. Particularly, how many xlog inserts did a query induce, or how many buffers did a query dirty.  All as simple as a few lines of DTrace script now... in production... no configuration changes required.&lt;/p&gt;
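
&lt;p&gt;For example, counting xlog inserts per query might look something like the sketch below.  The probe names (query-start, query-done, xlog-insert) are the ones that ship with PostgreSQL 8.4 and later and are an assumption here; the probes in the patched build described above may be named differently.&lt;/p&gt;

&lt;pre&gt;
# Run as root while PostgreSQL is busy; press Ctrl-C to print the totals.
# Probe names are assumed (see above), not taken from the patch.
dtrace -q -n &#039;
postgresql*:::query-start
{
    self-&gt;query = copyinstr(arg0);   /* SQL text of the running statement */
    self-&gt;inserts = 0;
    self-&gt;active = 1;
}

postgresql*:::xlog-insert
/self-&gt;active/
{
    self-&gt;inserts++;
}

postgresql*:::query-done
/self-&gt;active/
{
    @xlog[self-&gt;query] = sum(self-&gt;inserts);
    self-&gt;query = 0;
    self-&gt;inserts = 0;
    self-&gt;active = 0;
}&#039;
&lt;/pre&gt;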

&lt;p&gt;Being able to ask systemic questions is of fundamental importance when troubleshooting problems.  DTrace is built around the concept that things should be looked at systemically.  However, to support that, we need applications to assist by exposing USDT probes in the &quot;right places.&quot;  We&#039;ve started a project here at OmniTI called &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=194&amp;amp;entry_id=112&quot; title=&quot;https://labs.omniti.com/trac/project-dtrace&quot;  onmouseover=&quot;window.status=&#039;https://labs.omniti.com/trac/project-dtrace&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Project DTrace&lt;/a&gt; that aims to do just that.  It&#039;s open, please join in!&lt;/p&gt; 
    </content:encoded>

    <pubDate>Sun, 13 Apr 2008 15:08:22 -0400</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/112-guid.html</guid>
    
</item>
<item>
    <title>PostgreSQL: Looking under the hood with Solaris</title>
    <link>http://www.lethargy.org/~jesus/archives/111-PostgreSQL-Looking-under-the-hood-with-Solaris.html</link>
            <category>OpenSolaris</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/111-PostgreSQL-Looking-under-the-hood-with-Solaris.html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=111</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=111</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;p style=&quot;text-align: center&quot;&gt;&lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=190&amp;amp;entry_id=111&quot; title=&quot;http://lethargy.org/~jesus/misc/PostgreSQLonSolaris.pdf&quot;  onmouseover=&quot;window.status=&#039;http://lethargy.org/~jesus/misc/PostgreSQLonSolaris.pdf&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot; border=&quot;0&quot;&gt;&lt;img src=&quot;http://lethargy.org/~jesus/misc/PostgreSQLonSolaris.title.png&quot; height=&quot;240&quot; width=&quot;320&quot; alt=&quot;PostgreSQL on Solaris&quot; style=&quot;padding: 2px; border: 1px solid #ccc;&quot;&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For those interested, here is my slide stack from PostgreSQL Conference East &#039;08.  I think the title of the talk was &quot;&lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=190&amp;amp;entry_id=111&quot; title=&quot;http://lethargy.org/~jesus/misc/PostgreSQLonSolaris.pdf&quot;  onmouseover=&quot;window.status=&#039;http://lethargy.org/~jesus/misc/PostgreSQLonSolaris.pdf&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;PostgreSQL: Looking under the hood with Solaris&lt;/a&gt;.&quot;&lt;/p&gt;

&lt;p&gt;The presentation was 90 minutes long and had lots of shell-based show-and-tell.  Obviously that stuff isn&#039;t available in the slides.  I think it went over quite well.  The audience was small, but hopefully people took away a lasting impression of what DTrace has to offer and at least one person had the response: &quot;By the power of Greyskull.&quot;  Regardless, enjoy the slides!&lt;/p&gt; 
    </content:encoded>

    <pubDate>Sun, 30 Mar 2008 09:21:05 -0400</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/111-guid.html</guid>
    
</item>
<item>
    <title>PostgreSQL Community</title>
    <link>http://www.lethargy.org/~jesus/archives/110-PostgreSQL-Community.html</link>
            <category>Damaged Bits</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/110-PostgreSQL-Community.html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=110</wfw:comment>

    <slash:comments>7</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=110</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;p&gt;I just attended the Keynote by Joshua Drake from &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL3d3dy5jb21tYW5kcHJvbXB0LmNvbS8=&amp;amp;entry_id=110&quot; title=&quot;http://www.commandprompt.com/&quot;  onmouseover=&quot;window.status=&#039;http://www.commandprompt.com/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Command Prompt&lt;/a&gt;.  There are a lot of good movements on the operational organization of the PostgreSQL community.  I think his vision of the community is more aggressive and structured than many are prepared for, but in a community as large as the PostgreSQL community it is very good to have someone pushing the envelope and attempting to apply a vision.  I don&#039;t want to go as far as Josh wants to go, but we&#039;ll wind up part of the way there and that&#039;s &quot;just perfect.&quot;&lt;/p&gt;

&lt;p&gt;He used a lot of geek marketing terms going so far as to use the term &quot;Db 2.0.&quot;  I&#039;ll add a few comments and marketing terms to my commentary.  Josh said we need to stop following and start leading; stop looking at features in Oracle as a future feature map.  What Josh means here is that we need to be disruptive.  We need to implement things no one else has.  I think we need both.&lt;/p&gt;

&lt;p&gt;Josh said he wanted everyone using PostgreSQL and not &quot;&lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL3N1bi5jb20vbXlzcWw=&amp;amp;entry_id=110&quot; title=&quot;http://sun.com/mysql&quot;  onmouseover=&quot;window.status=&#039;http://sun.com/mysql&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;the Dolphin&lt;/a&gt;.&quot;  I have to say that I think his statement was too strong to match my opinion.  I like PostgreSQL.  I believe it is the right tool for the job more often than not.  However, there is a &quot;not.&quot;  In fact, there are a lot of &quot;not&quot;s.  There are many requirements that, when explored, map better to the offerings of an Oracle, a DB2, or, dare I say, even a MySQL.  MySQL is the &quot;right tool for the job&quot; for many requirements.  There is room for more than one database.  In fact, there is room for all databases.  Good business and engineering practices should always define the process of evaluating technology appropriateness.  One good business practice is placing only technology that can be well supported by your existing (or easily accessible) engineering talent.  This practice should never be confused with evangelism and zeal -- we all have these character traits, but good engineers and managers shouldn&#039;t use them as a part of defining the appropriateness of a solution.&lt;/p&gt;

&lt;p&gt;Josh&#039;s primary goal seems to be to grow the community which will better legitimize PostgreSQL.  The one thing I took away from this is that I should make sure PostgreSQL is a more common topic of discussion.  I think we should start a Baltimore/Washington &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL3B1Z3MucG9zdGdyZXNxbC5vcmcv&amp;amp;entry_id=110&quot; title=&quot;http://pugs.postgresql.org/&quot;  onmouseover=&quot;window.status=&#039;http://pugs.postgresql.org/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;PostgreSQL User Group&lt;/a&gt;; &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL29tbml0aS5jb20v&amp;amp;entry_id=110&quot; title=&quot;http://omniti.com/&quot;  onmouseover=&quot;window.status=&#039;http://omniti.com/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;OmniTI&lt;/a&gt; will provide facilities, coordination and even food and drink as long as it is under thirty people.&lt;/p&gt;

&lt;p&gt;I&#039;m sitting at &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL3d3dy5wb3N0Z3Jlc3FsY29uZmVyZW5jZS5vcmcv&amp;amp;entry_id=110&quot; title=&quot;http://www.postgresqlconference.org/&quot;  onmouseover=&quot;window.status=&#039;http://www.postgresqlconference.org/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;PostgreSQL Conference&lt;/a&gt;.  I have a snazzy slide set.  I have forgotten my &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL3N0b3JlLmFwcGxlLmNvbS8xLTgwMC1NWS1BUFBMRS9XZWJPYmplY3RzL0FwcGxlU3RvcmUud29hL3dhL1JTTElEP21jbz03RTRFQjkxRSZucGxtPU04NzU0Ry9B&amp;amp;entry_id=110&quot; title=&quot;http://store.apple.com/1-800-MY-APPLE/WebObjects/AppleStore.woa/wa/RSLID?mco=7E4EB91E&amp;amp;nplm=M8754G/A&quot;  onmouseover=&quot;window.status=&#039;http://store.apple.com/1-800-MY-APPLE/WebObjects/AppleStore.woa/wa/RSLID?mco=7E4EB91E&amp;amp;nplm=M8754G/A&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Mac 15&quot; VGA adapter&lt;/a&gt;.  @#$^ *@#$*%!  I&#039;m sure I&#039;ll be able to borrow an adapter from someone, but I must note that this conference has a much smaller Mac to PC laptop ratio than any other conference I&#039;ve been to in the last two years.  (a.k.a. a large group in denial).&lt;/p&gt;
 
    </content:encoded>

    <pubDate>Sat, 29 Mar 2008 10:42:08 -0400</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/110-guid.html</guid>
    
</item>
<item>
    <title>PostgreSQL Conference East '08. Bring it... Yeah.</title>
    <link>http://www.lethargy.org/~jesus/archives/109-PostgreSQL-Conference-East-08.-Bring-it...-Yeah..html</link>
            <category>Damaged Bits</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/109-PostgreSQL-Conference-East-08.-Bring-it...-Yeah..html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=109</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=109</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;p&gt;I was just surfing &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=185&amp;amp;entry_id=109&quot; title=&quot;http://planetpostgresql.org/&quot;  onmouseover=&quot;window.status=&#039;http://planetpostgresql.org/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Planet PostgreSQL&lt;/a&gt; and read &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=186&amp;amp;entry_id=109&quot; title=&quot;http://www.chesnok.com/daily/2008/03/26/postgresql-conference-east-this-saturday-and-sunday/&quot;  onmouseover=&quot;window.status=&#039;http://www.chesnok.com/daily/2008/03/26/postgresql-conference-east-this-saturday-and-sunday/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Selena Deckelmann&#039;s blog post&lt;/a&gt; that said the 2008 PostgreSQL Conference East is this Saturday and Sunday.  My first response was, &quot;WTF? I thought it was like two weeks away and on a Thursday and Friday.&quot;  My second response was, &quot;I&#039;m speaking at that, I should go.&quot;  My third thought?  &lt;strong&gt;You should go too.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So... I, like any professional speaker, finished my slides on time and turned them in several weeks ago.  I have a polished presentation that&#039;s going to kick major ass.  If you are in the area (Baltimore/Washington), you should definitely come down and partake in the festivities that surround &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=187&amp;amp;entry_id=109&quot; title=&quot;http://postgresql.org/&quot;  onmouseover=&quot;window.status=&#039;http://postgresql.org/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;the most advanced open-source database&lt;/a&gt; in the &lt;strong&gt;universe&lt;/strong&gt;.  I might even throw in a few &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=188&amp;amp;entry_id=109&quot; title=&quot;http://www.imdb.com/title/tt0425112/&quot;  onmouseover=&quot;window.status=&#039;http://www.imdb.com/title/tt0425112/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Hot Fuzz&lt;/a&gt; quotes during my presentation.&lt;/p&gt; 
    </content:encoded>

    <pubDate>Wed, 26 Mar 2008 21:50:20 -0400</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/109-guid.html</guid>
    
</item>
<item>
    <title>Talking w/ Sun</title>
    <link>http://www.lethargy.org/~jesus/archives/108-Talking-w-Sun.html</link>
            <category>OpenSolaris</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/108-Talking-w-Sun.html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=108</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=108</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;p&gt;New acquaintances walk away from their first conversation with me and either think that I am in love with a particular vendor or technology or they think I truly hate all technology.  Both are true in some fashion.&lt;/p&gt;

&lt;p&gt;The fact that I have an OpenSolaris feed on my blog might indicate that I&#039;m a fan of &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL3N1bi5jb20v&amp;amp;entry_id=108&quot; title=&quot;http://sun.com/&quot;  onmouseover=&quot;window.status=&#039;http://sun.com/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Sun&lt;/a&gt;.  The truth is I am and I am not.  As is true of any large organization, it&#039;s really tough to be enamored with all of it.  I &lt;em&gt;am&lt;/em&gt; a huge fan of Solaris 10 and Sun&#039;s initiative to support strict ABI compatibility for stable interfaces, and I&#039;m downright giddy about their ZFS and DTrace technologies.  I think Zones/Containers are cool and I think their engineering team has some brilliant shining stars and is on the whole smarter than average.  Yet, the OpenSolaris community is challenged in a lot of ways due to the corporate involvement by Sun that leaves me with a funny taste in my mouth.  I&#039;m luke-warm about Java and feel like their hardware initiative is going down-hill with bad quality problems on some of their new offerings compared to their spectacularly rock-solid history (sans the E4500). &lt;/p&gt;

&lt;p&gt;Recently, I did an interview with Mark Thacker of Sun about &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL29tbml0aS5jb20v&amp;amp;entry_id=108&quot; title=&quot;http://omniti.com/&quot;  onmouseover=&quot;window.status=&#039;http://omniti.com/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;our&lt;/a&gt; use of Solaris for their &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL3d3dy5zdW4uY29tL3NvZnR3YXJlL3NvbGFyaXMvcG9kY2FzdHMuanNw&amp;amp;entry_id=108&quot; title=&quot;http://www.sun.com/software/solaris/podcasts.jsp&quot;  onmouseover=&quot;window.status=&#039;http://www.sun.com/software/solaris/podcasts.jsp&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Solaris Podcast series&lt;/a&gt;.  We&#039;ve had some bad experiences here and there, but all-in-all it has been a win.  DTrace has been a god-send and ZFS has saved my bacon several times.  Anyone who&#039;s talked to me knows that I&#039;m brutally honest and appreciate those that return the favor.   I&#039;ll look at a solution and tell you what you did wrong.  I don&#039;t tell you what you did correctly... after all it was &lt;strong&gt;all&lt;/strong&gt; supposed to be done correctly.  
So, upon listening to this &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL3djZGF0YS5zdW4uY29tL3dlYmNhc3QvZG93bmxvYWQvcG9kY2FzdC9Tb2xhcmlzUmV2ZWFsZWQvT21uaVRJLTMuNi4wOC5tcDM=&amp;amp;entry_id=108&quot; title=&quot;http://wcdata.sun.com/webcast/download/podcast/SolarisRevealed/OmniTI-3.6.08.mp3&quot;  onmouseover=&quot;window.status=&#039;http://wcdata.sun.com/webcast/download/podcast/SolarisRevealed/OmniTI-3.6.08.mp3&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Sun podcast&lt;/a&gt;, several of my colleagues said: &quot;that had to have been edited.&quot;  As marketing people usually do, they attempt to limit the negative exposure as much as possible -- most notably, they removed a section about lack of tight integration between ZFS and Zones which has made for some very painful upgrade paths.  We have marketing here at OmniTI too, I know the drill.  All-in-all, I think the interview went rather well and fairly represents the benefits we&#039;ve realized by deploying Solaris 10.&lt;/p&gt;

 
    </content:encoded>

    <pubDate>Sun, 16 Mar 2008 23:21:55 -0400</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/108-guid.html</guid>
    
</item>
<item>
    <title>dtrace.conf(08)</title>
    <link>http://www.lethargy.org/~jesus/archives/107-dtrace.conf08.html</link>
            <category>OpenSolaris</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/107-dtrace.conf08.html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=107</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=107</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;p&gt;As many people already know, I&#039;m a big fan of DTrace.  Well, today I attended &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=180&amp;amp;entry_id=107&quot; title=&quot;http://wikis.sun.com/display/DTrace/dtrace.conf&quot;  onmouseover=&quot;window.status=&#039;http://wikis.sun.com/display/DTrace/dtrace.conf&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;dtrace.conf(08)&lt;/a&gt; -- the first (un)conference revolving around planet DTrace.  It was awesome.  Many people who know me well have heard me say, &quot;my good days are when I&#039;m the dumbest person in the room.&quot;  That&#039;s not to be confused with &quot;I like having bad days.&quot;  Instead, I like to be at my best and still struggle to keep up.  The people here at dtrace.conf(08) are damn smart -- a lot of higher-level thinking.&lt;/p&gt;

&lt;p&gt;I gave a demo of the &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=181&amp;amp;entry_id=107&quot; title=&quot;https://labs.omniti.com/trac/pgsoltools&quot;  onmouseover=&quot;window.status=&#039;https://labs.omniti.com/trac/pgsoltools&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;PostgreSQL stuff we do using dtrace&lt;/a&gt;.  I put back my &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=182&amp;amp;entry_id=107&quot; title=&quot;http://wikis.sun.com/display/DTrace/sdt+Provider&quot;  onmouseover=&quot;window.status=&#039;http://wikis.sun.com/display/DTrace/sdt+Provider&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;SDT&lt;/a&gt; postgres probes into our internal packaging systems -- I hope that I can get those back into PostgreSQL soon...  I talked very briefly about the ZFS magic we&#039;ve experienced, but as the conference was focused intensely on DTrace, I saved most of that for &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=183&amp;amp;entry_id=107&quot; title=&quot;http://www.postgresqlconference.org/&quot;  onmouseover=&quot;window.status=&#039;http://www.postgresqlconference.org/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;PostgreSQL Conference East 08&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As usual, I particularly enjoyed &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=184&amp;amp;entry_id=107&quot; title=&quot;http://blogs.sun.com/bmc/&quot;  onmouseover=&quot;window.status=&#039;http://blogs.sun.com/bmc/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Bryan Cantrill&lt;/a&gt;&#039;s infectious excitement and pungent humor.  I&#039;d like to thank Sun and the DTrace Team for running the conference (as well as the sponsors).  I really appreciate the effort.  DTrace has empowered OmniTI to make our customers more successful.  Hands-down awesome technology.&lt;/p&gt;

&lt;p&gt;I&#039;ve said it before and I&#039;ll say it again.  ZFS and DTrace are the most impressive operating system advances I&#039;ve seen this decade.&lt;/p&gt; 
    </content:encoded>

    <pubDate>Fri, 14 Mar 2008 21:28:49 -0400</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/107-guid.html</guid>
    
</item>
<item>
    <title>A job, a mission, a career: all without a path or a name.</title>
    <link>http://www.lethargy.org/~jesus/archives/106-A-job,-a-mission,-a-career-all-without-a-path-or-a-name..html</link>
            <category>Damaged Bits</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/106-A-job,-a-mission,-a-career-all-without-a-path-or-a-name..html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=106</wfw:comment>

    <slash:comments>5</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=106</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;p&gt;
I&#039;m sitting in the SFO airport waiting to sit on a plane for 6 hours to fly home from the O&#039;Reilly Velocity Summit.  Was it worth it?  You betcha.
&lt;/p&gt;

&lt;p&gt;
What is this Velocity Summit thing?  It was a bunch of web architects from highly trafficked sites sitting around talkin&#039; smack.  It was operated in &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL2VuLndpa2lwZWRpYS5vcmcvd2lraS9Gb29fQ2FtcA==&amp;amp;entry_id=106&quot; title=&quot;http://en.wikipedia.org/wiki/Foo_Camp&quot;  onmouseover=&quot;window.status=&#039;http://en.wikipedia.org/wiki/Foo_Camp&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Foo&lt;/a&gt; style.  However, one thing that made me really appreciate this meet-up was the lack of self-importance displayed by attendees.  Everyone was just there to talk -- not to make people understand how much they knew.  We were talking about &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL2VuLm9yZWlsbHkuY29tL3ZlbG9jaXR5MjAwOC9wdWJsaWMvY29udGVudC9ob21l&amp;amp;entry_id=106&quot; title=&quot;http://en.oreilly.com/velocity2008/public/content/home&quot;  onmouseover=&quot;window.status=&#039;http://en.oreilly.com/velocity2008/public/content/home&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;The O&#039;Reilly Velocity Web Performance and Operations Conference&lt;/a&gt;: what it should be and why.
&lt;/p&gt;

&lt;p&gt;
Two things that I walked away with were (1) a realization of the lack of a career path for people who do what we do (no standard titles, no standard roles and responsibilities and certainly a lack of sex appeal) and (2) a clear lack of terminology for the technology requirements that are so common in these environments.  Terminology is easy, in my opinion -- you just argue until someone wins.  Of course, arguing is a hobby of mine, so I have a bias.  On the other hand, defining a career path that the industry accepts is hard.
&lt;/p&gt;

&lt;h2&gt;Job Title&lt;/h2&gt;
&lt;p&gt;
The term &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL2VuLndpa2lwZWRpYS5vcmcvd2lraS9XZWJfb3BlcmF0aW9ucw==&amp;amp;entry_id=106&quot; title=&quot;http://en.wikipedia.org/wiki/Web_operations&quot;  onmouseover=&quot;window.status=&#039;http://en.wikipedia.org/wiki/Web_operations&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Web Operations&lt;/a&gt; was used a lot during this event.  While it isn&#039;t awful, I really don&#039;t like this term.  The hard part is that the captains, superstars, or heroes in these roles are multidisciplinary experts.  They have a deep understanding of networks, routing, switching, firewalls, load-balancing, high availability, disaster recovery, TCP &amp;amp; UDP services, NOC management, hardware specifications, several different flavors of UNIX, several web server technologies, caching technologies, several databases, storage infrastructure, cryptography, algorithms, trending and capacity planning.  The issue: how can we expect to find good candidates that have fluency in all of those technologies?  In the traditional enterprise, you have architects who are broad and shallow and their team of experts who are focused and deep.  However, here the expectation is that your &quot;web operations&quot; engineer be both broad and deep: fix your gigabit switch, optimize your MySQL database and guide the overall architecture design to meet scalability requirements.
&lt;/p&gt;

&lt;p&gt;
I struggle with this.  Not everyone can be a superstar.  More importantly, no one can really start as a superstar.  If we use an apprentice model (which is common in industries without institutional support) we limit the total number of able workers in this field.  So, how do we (re)define the requirements for a junior web operations person?
&lt;/p&gt;

&lt;p&gt;
We have to have a plan for hiring on people and progressing them through a career path to make this a legitimate discipline.  One person said they just hire people that they think are agile -- &quot;If I tell them to know IOS well enough to configure a router and troubleshoot a problem, I expect them to show up tomorrow with a basic understanding of IOS and ready to start typing in commands at a console.&quot;  I agree this sort of &quot;no boundaries&quot; attitude is required for the job, but where do you start?
&lt;/p&gt;

&lt;p&gt;
Another person mentioned that the reason for the lack of sex appeal in the position was due to popular attitude.  Many people apply for development positions and &quot;don&#039;t quite make the cut&quot; and are instead offered system administration positions.   I personally don&#039;t subscribe to this philosophy and we certainly don&#039;t operate like that at &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL29tbml0aS5jb20v&amp;amp;entry_id=106&quot; title=&quot;http://omniti.com/&quot;  onmouseover=&quot;window.status=&#039;http://omniti.com/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;OmniTI&lt;/a&gt;, but I&#039;ve seen it in other companies -- I hope it is not prevalent.
&lt;/p&gt;

&lt;p&gt;
Basically, this is one of the few positions in the organization that has no boundaries of responsibility.  If something breaks, it &lt;b&gt;is&lt;/b&gt; your problem.  Why isn&#039;t this the case throughout the organization -- why is it that even the most junior of developers  doesn&#039;t wake up to fix their code when it breaks and causes service degradation in the middle of the night?  It&#039;s uncommon that this level of responsibility is expected of developers, while it is a quite common expectation of the operations crew.
&lt;/p&gt;

&lt;p&gt;
Circling back, I really don&#039;t like the term &quot;web ops.&quot;  I realize it is not far off, but it isn&#039;t sexy.  Google has a few different roles with this level of responsibility.  One I like is called: &quot;Site Reliability Engineer.&quot;  However, I&#039;d like a set of job titles and a progression through them that makes this an appealing career path for young, ambitious geeks.
&lt;/p&gt;

&lt;p&gt;
In order to define these roles, we should think about what they are responsible for.  In our organization I see this as a few things:
&lt;/p&gt;

&lt;h3&gt;Junior&lt;/h3&gt;
&lt;p&gt;
On the junior level, they are responsible for learning.  They are responsible for deploying new services and documenting such deployments.  They are responsible for instrumenting deployments to make sure that faults are detected and trending is possible.
&lt;/p&gt;

&lt;h3&gt;Mid-level&lt;/h3&gt;
&lt;p&gt;
On the mid-level, they are responsible for all of the above, and more.  Effective and complete troubleshooting of failures.  Making sense of trending information.  Understanding work loads that exist.  Tuning systems to better accommodate current workloads and proactive tuning to handle known future workloads.  One of the key differences between mid-level and junior is the ability to correctly prioritize remediation of issues during incident response.  Staying calm, collected and executing with clarity of thought during an emergency.
&lt;/p&gt;

&lt;p&gt;
What does &quot;complete troubleshooting&quot; mean?  I mean troubleshooting without boundaries.  I want no shyness in cracking open developer code and telling them what they did wrong and why. Finger pointing at people simply doesn&#039;t work, you have to point your finger at implementation problems, not people.  To do that requires the skill to track a performance problem or reliability issue down to a specific line of code or approach.
&lt;/p&gt;

&lt;h3&gt;Senior&lt;/h3&gt;
&lt;p&gt;
On the senior side, technology research and selection is a must.  Incorporating new technologies in the architecture to improve availability and reduce costs.  Constant analysis of systems to improve efficiency.  Capacity planning to understand growth well enough to ensure provisioning and deployment outpace need.  Donald Knuth long said that premature optimization is the root of all evil; I&#039;ve long said that the ability to accurately determine what is premature separates senior from junior.
&lt;/p&gt;

&lt;p&gt;
One of the core responsibilities that must be handled on all levels is assessing the appropriateness of the technologies at hand.  At the highest level, the &quot;Web Architect,&quot; one must ensure that technology selection as well as development and deployment strategy match the business need.  This is &quot;hard.&quot;
&lt;/p&gt;

&lt;h2&gt;Above and Beyond&lt;/h2&gt;
&lt;p&gt;
This is a special role.  In a lot of ways, this role isn&#039;t for failed developers, it is for developers/engineers that have outpaced their career path.  One that has a deep understanding of how things work: &quot;a complete systemic view of general site architecture.&quot;  However, they want &lt;b&gt;more responsibility&lt;/b&gt;, they want to make sure that &lt;b&gt;all of it works all of the time&lt;/b&gt;: the app, the stack, the hardware, the network.  Whatever technology the business needs, it must work, it must perform and it must be able to meet demand.  Lastly, in their heart of hearts, they must believe that all problems are equal in their need for resolution and problem prioritization is dictated by business impact and not by flights of fancy (how cool or interesting the problem is).
&lt;/p&gt;

&lt;p&gt;
It&#039;s an impossible job requirement: &quot;Knows everything about all technologies deployed in Internet architectures.&quot;   While no one fills this req., what I want is someone whose career goal is to find out how close they can get.  &lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url=aHR0cDovL29tbml0aS5jb20vY2FyZWVycyNzcmU=&amp;amp;entry_id=106&quot; title=&quot;http://omniti.com/careers#sre&quot;  onmouseover=&quot;window.status=&#039;http://omniti.com/careers#sre&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;You up to the challenge?&lt;/a&gt;
&lt;/p&gt; 
    </content:encoded>

    <pubDate>Wed, 16 Jan 2008 09:39:02 -0500</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/106-guid.html</guid>
    
</item>
<item>
    <title>MySQL: all their base are belong to others.</title>
    <link>http://www.lethargy.org/~jesus/archives/105-MySQL-all-their-base-are-belong-to-others..html</link>
            <category>Damaged Bits</category>
    
    <comments>http://www.lethargy.org/~jesus/archives/105-MySQL-all-their-base-are-belong-to-others..html#comments</comments>
    <wfw:comment>http://www.lethargy.org/~jesus/wfwcomment.php?cid=105</wfw:comment>

    <slash:comments>3</slash:comments>
    <wfw:commentRss>http://www.lethargy.org/~jesus/rss.php?version=2.0&amp;type=comments&amp;cid=105</wfw:commentRss>
    

    <author>nospam@example.com (Theo Schlossnagle)</author>
    <content:encoded>
    &lt;p&gt;&lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=176&amp;amp;entry_id=105&quot; title=&quot;http://www.mysql.com/&quot;  onmouseover=&quot;window.status=&#039;http://www.mysql.com/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;MySQL&lt;/a&gt;: all their base are belong to others.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=177&amp;amp;entry_id=105&quot; title=&quot;http://www.oracle.com/sleepycat/index.html&quot;  onmouseover=&quot;window.status=&#039;http://www.oracle.com/sleepycat/index.html&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Oracle buys Sleepycat (BDB)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=178&amp;amp;entry_id=105&quot; title=&quot;http://www.oracle.com/innodb/index.html&quot;  onmouseover=&quot;window.status=&#039;http://www.oracle.com/innodb/index.html&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;Oracle buys Innobase (InnoDB)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.lethargy.org/~jesus/exit.php?url_id=179&amp;amp;entry_id=105&quot; title=&quot;http://www.dbms2.com/2007/12/21/ibm-acquires-soliddb/&quot;  onmouseover=&quot;window.status=&#039;http://www.dbms2.com/2007/12/21/ibm-acquires-soliddb/&#039;;return true;&quot; onmouseout=&quot;window.status=&#039;&#039;;return true;&quot;&gt;IBM acquires SolidDB&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unless you are running MyISAM (stop reading and switch now before you lose data)... the backend you run in MySQL has a product roadmap controlled by a competitor.  Think about that.&lt;/p&gt;

&lt;p&gt;To be clear, I think this does not lessen MySQL&#039;s value today.  What it does is make me entirely unsure of MySQL&#039;s future value.  What&#039;s next? Luckily, I&#039;m patient, so I&#039;ll just spectate.&lt;/p&gt; 
    </content:encoded>

    <pubDate>Fri, 21 Dec 2007 13:24:18 -0500</pubDate>
    <guid isPermaLink="false">http://www.lethargy.org/~jesus/archives/105-guid.html</guid>
    
</item>

</channel>
</rss>