Skip to: categories | main content
Esoteric Curio
Want to work with me at the $DAYJOB?
I am the CEO of OmniTI where I do all sorts of stuff I find absolutely fascinating.
It is rumored that I write code sometimes. I often don't believe this myself, so I use this to follow what it is that I'm working on:
I have this app where we store numbers. Don't we all? Unfortunately, this app stores numbers without context. So, I don't know whether the number will be an integer or represent numbers throughout the real number space. Compounding this situation, I stand to gain particular advantage if I can store all these numbers in a fixed width datatype (each number consume the same number of bits of storage space). On current computer systems, there are two native types to each do some of what we want: the 64bit IEEE floating point type "double" and the 64bit integer types "signed/unisnged long long". However, I don't know which datatype to use before I actually see the number for the first time.
Let's take temperature as an example. In some accepted unit like C, F or K a possible value would be something like 37.23 or 49.05 which is a non-integer value where it would seem to make sense to use a float or a double. Round trip time in seconds is a great example of what one would think of as a perfect use for a IEEE floating point number as a float or a double. In that format, the most significant digits are more accurate, so if you are measuring the latency between two networked system (microseconds) or you are measuring the time to run an enormous data mining operation (days) you will get something accurate. Why is a type that looses accuracy okay here? If your operation takes months or years to complete an operation, you don't likely care if the operation took 2 years and 37 microseconds or 2 years and 12 microseconds. It may seem that a double would, generally speaking, be pretty good for just about anything. Not so.
In the real world, we frequently measure the number of octets (bytes) passed through switching equipment. These numbers are typically represented in a counter that ranges in integral value between 0 and 264-1. Now, if we store the number as a double, as we approach the upper end, the double starts to loose accuracy. Here's a quick sample so you can see it. I'll take X and X+100 as doubles and then subtract them to see the difference. As X gets very very large, 100 is less significant (the one-hundreds place), so we start to loose accuracy. I save you some reading by starting at 248 as that's where the interesting things start.
#include#include #include int main() { int i; for(i=48;i<63;i++) { uint64_t i1 = 1; double d1; i1 <<= i; d1 = (double)(i1 + 100); printf("%g - %llu = 100, saw %llu (%g%%)\n", d1, i1, (uint64_t)d1 - i1, fabs((double)((uint64_t)d1 - i1)-100.0)); } }
Producing the output:
2.81475e+14 - 281474976710656 = 100, saw 100 (0%) 5.6295e+14 - 562949953421312 = 100, saw 100 (0%) 1.1259e+15 - 1125899906842624 = 100, saw 100 (0%) 2.2518e+15 - 2251799813685248 = 100, saw 100 (0%) 4.5036e+15 - 4503599627370496 = 100, saw 100 (0%) 9.0072e+15 - 9007199254740992 = 100, saw 100 (0%) 1.80144e+16 - 18014398509481984 = 100, saw 100 (0%) 3.60288e+16 - 36028797018963968 = 100, saw 96 (4%) 7.20576e+16 - 72057594037927936 = 100, saw 96 (4%) 1.44115e+17 - 144115188075855872 = 100, saw 96 (4%) 2.8823e+17 - 288230376151711744 = 100, saw 128 (28%) 5.76461e+17 - 576460752303423488 = 100, saw 128 (28%) 1.15292e+18 - 1152921504606846976 = 100, saw 0 (100%) 2.30584e+18 - 2305843009213693952 = 100, saw 0 (100%) 4.61169e+18 - 4611686018427387904 = 100, saw 0 (100%)
You might ask, why the one-hundreds place is important in numbers that are this large. The answer is very simple. This measures octets through a switch, think of milliliters of water from a faucet. After you've used the faucet a long time, that number is staggering. However, if I want to know liters/second over the last 5 seconds it is very likely that the two samples I take (now and five seconds from now) will be both enormous and relatively close. And while the accuracy of difference matters, it is too insignificant to be represented in the IEEE floating point format. The interesting thing is, the source never reports anything but integral values, so if we were simply using an integral datatype we would have exactness.
So, what do you do when you don't know? Arbitrary precision math, or in SQL: numeric. Numerics are great because you always get the correct answer. Numerics are bad because computers are not very good at doing math outside of the IEEE floating point and integral spaces; in other words, it's painfully slow. Perhaps worse than being slow, the numeric datatype is a variable width type, which in this case means I've lost the battle.
Now back to my app: the inputs are either floating point numbers or integral numbers, but the only way we can tell is if they contain a decimal point when expressed in non-scientific notation; or, do they have any non-zero values in places to the right of the one's place? What we want to do is give accurate integral storage to apps that are courteous enough to provide integral numbers in the range of -263 to 264-1 and use a double otherwise. This allows us to use 64bits of fixed space to store the value (though we need some extra bits to note that the 64bits represents a double vs. a signed integer vs. an unsigned integer). Once we've tackled this, we can accept numeric values from SQL stored them in fixed size (with expected loss of insignificant places for floating point and expected accuracy for integral values) and return them back as numerics.
My little project fronts against PostgreSQL, so I had to convert integers and doubles into numerics. Much to my surprise, the internal method of converting this is to convert from type1 to a string and then from a string to type2. This is a very inefficient process. It turns out that direct double to numeric conversion (as expected) is much faster. In our case it shaves off approximately 50% of the CPU cycles.
As a disclaimer, this is a very special purposed storage type and neither replaces the double, integer or numeric type. It is, instead, a hybrid type with special properties that are attuned to the problem at hand. Before you go implementing something similar, make sure you know what you are doing. Our goals were to retain accuracy for integral types, achieve a fixed-width storage format and still allow representation of numbers both large and arbitrarily smaller than 1.
OmniTI has been working with Clearleft for a while now on Fontdeck. The super Greg Chiasson has been pushing our beta live and I've been provisioning machines and setting up infrastructure just so you can read this web page in a nice sexy font... Nice... Sexy.
And yes, of course, fontdeck is powered by our friends: Apache, PostgreSQL, OpenSolaris, a bit of Linux here and there, a sprinkling of both asymmetric and symmetric cryptography and an ancient dialect of computer-speak called Perl.
I've put the first (intended) use of the pg_amqp setup to the test. So far I'm very pleased. While none of the code for the usage is open source the real "magic sauce" is open and has had a few bug fixes and bits of robustness added since I last posted. I can, however, describe the use and let your imagination run while.
The parts: PostgreSQL, RabbitMQ, and a bit of Java around Lucene. Our java program actually uses jetty to expose a tiny servlet that accepts search queries and returns json results making AJAX-style searches easy and fast.
The obvious question we all face in these sort of configurations is "how do keep our search indexes up-to-date?" Usually, we relax constraints (wisely so) and determine exactly "how up-to-date does out index needs to be." If you can answer this question with a number sufficiently far from zero, you've won. In our situation, the user experience must be able to search and find the updated items in database via search immediately after insert or update (within a 100ms or so).
Enter message queueing. The first thing we do is make our Java process connect to RabbitMQ, bind a queue and consume. The messages it consumes from the queue have instructions on the precise element that has changed which then causes an query against the database retrieving all the new data to be reindexed (in our case, it is much more than can be easily witnessed from a trigger on update) and updates the Lucene indexes.
The second updated we made is to PostgreSQL by installing pg_amqp. On the tables whose changes should induce reindexing, we add a (or augment the existing) trigger to call: amqp.publish('amq.direct', 'searchstuff', E'reindex\t' || NEW.rowid). Assuming we have a column called rowid, this will queue a message that looks like "reindex<tab>328432"
The neat part is that if the update is rolled back in the database, the message is never sent. Otherwise, we see the message from the our indexing and search app and viola.
I think this technique is really useful for people that are currently leveraging Lucene-based systems to provide powerful search functionality yet keep their data safe and secure in PostgreSQL.
So, it turns out that being stranded in an airport can lead to some productive output after all.
I've hacked together an AMQP extension for PostgreSQL. If you don't know what AMQP or PostgreSQL are, stop reading.
One thing I've needed to do for a while is be able to submit a message to a message queue from within a PostgreSQL transaction. However, obviously (because we run a real database here), if the transaction aborts I'd rather not have those messages sent. Enter pg_amqp.
pg_amqp allows:
This is completely alpha. I've tested it quite a bit by hand, but put no load on it whatsoever. Monday of next week, I'm going to beat the ever-living @#$% out of it on one of our systems that desperately needs on-commit AMQP notifications.
I'd love some more eyes on this. It has some flaws, specifically related to processing asynch events from within a PosgreSQL backend (which has no concept of asynchronous even notification). As such, if you do stupid stuff, stupid things happen. The easiest solution is to add a thread to each postgres backend to process the backqueue of events from AMQP; however that so blatantly violates postgres' process model and design that I've no intention to do that. With all the issues aside — it works pretty well for me.
I didn't add queue declaration or the ability to consume from queues. That was intentional because PostgreSQL can't run SQL outside of a transaction block making a while(1) { consume, do, commit } impossible from within SQL itself. Without the ability to do that it seems really useless (and pretty dangerous) to allow someone to do a blocking AMQP frame receive from within SQL. If you think it would be useful for you, let me know — it would be a trivial addition.
Oh yeah: pg_amqp
What's a DBA? Database administrator? Database architect? Whatever it is, we need one (or two or three) at OmniTI. The problem is that the Jurassic variant of the DBA simply can't cut it in today's world of the web giants (OmniTI services the web giants). What is it about the modern use of database in the web world that make traditionals DBAs so inadequate? It's actually a variety of things. If you think tackling them is a challenge you're cut out for, send me your resume!
One might say "Scale? Scale? Are you kidding me? Traditional database problems come in various flavors of gigantic." This is true, but the unique dynamics of the web mean that not only is the data flowing in at uncontrollable rates, but tomorrow there is always a new data source with a new representation. Due to the scale of external systems and your own it makes for an interesting sandbox in which we play. A multi-terabyte database is quite common just about anywhere these days; however, ten of them running four different data management platforms is rather common place within large web infrastructures.
This is the real game changer. This is the mental paradigm shift that classical DBAs struggle with most. Try this on for size: "no maintenance windows." Okay, admittedly that is a stretch, but only a small stretch. How about 5 minutes of database downtime allowed every 4 weeks or, worse, every 52 weeks.
Coupled with this is daily (or more frequent) code rollout often times requiring database schema change, new triggers, and different replication dynamics.
The amount of seemingly ass-backwards engineering required to accomplish this feat is usually described as either: "excruciating pit of hell" or "fascinating enigma." (I think of it as both) What I do know is that I have seen some of the most profound data engineering creativity emerge from this insipid constraint.
While noSQL is a fanatical movement that needs to be kept in check, the problems it addresses are real and [some of] the solutions promoted are good. Understanding when data better fits in a key-value store is something that the classically trained DBA struggles with. It's all about data management: Dynomite, CouchDB, Cassandra, Oracle, PostgreSQL, MySQL… they all have a sweet spot &emdash; know it.
The other interesting dynamic has less to do with data management and has more to due with the pace and culture of web companies. In traditional companies, roles are more disjointed; DBAs have a set of responsibilities that software engineers and system administrators do not have and those two groups have a set of responsibilities with which DBAs need not concern themselves. This "silo effect" spells absolute disaster for web companies.
The pace on the web is simply too fast (I often think it is a bit reckless, but that's another rant) to keep all the parts separate. An insufficient amount of time exists to define how one layer of an architecture will communicate with another layer. And while the layers meet at reasonable places most of the time, there are other places where the lines are blurred. These are the points in the architecture that stand to have the highest efficiency, but also are the hardest to troubleshoot; they require an organization of engineers that have no boundaries. In the case of a DBA… one that wants to look at the ORM code that builds the insanely awful SQL queries that beat the stuffing out of their database every second; one that is actually aware of the storage configuration and layer-2 fabric that sits behind the magical "remembering device" that the database commits things to all day long.
Around here, Robert Treat runs the show. The stick wielding and beatings are left to him (he a father of three, so I'm sure he's more than capable).
I guess the question is… are you up for the challenge? Are you a DBA who thinks their job is about more than just ACID? Are you a developer or an SA that has decided to make a shift and tackle the data management layer with a unique perspective? Are you interested in mind bending problems that arrive on your plate a bit too quick for anyone's comfort level? I hope so, because we need people like that!

