I spent a few years writing down my thoughts about how one should approach problems. If you're looking for a how-to guide, a cookbook, or a reference this book is not for you. If you want to learn by challenging the way you think, pick up a copy.
This past week I had the privilege of presenting along side many distinguished speakers at this year's PostgreSQL Conference East 2010 in Philadelphia, PA. I presented PostgreSQL: meet your queue which was received even more warmly than I had anticipated. I really think that cueing your database to publish over AMQP is the bees knees and it turns out I wasn't alone!
I've put the first (intended) use of the pg_amqp setup to the test. So far I'm very pleased. While none of the code for the usage is open source the real "magic sauce" is open and has had a few bug fixes and bits of robustness added since I last posted. I can, however, describe the use and let your imagination run while.
The parts: PostgreSQL, RabbitMQ, and a bit of Java around Lucene. Our java program actually uses jetty to expose a tiny servlet that accepts search queries and returns json results making AJAX-style searches easy and fast.
The obvious question we all face in these sort of configurations is "how do keep our search indexes up-to-date?" Usually, we relax constraints (wisely so) and determine exactly "how up-to-date does out index needs to be." If you can answer this question with a number sufficiently far from zero, you've won. In our situation, the user experience must be able to search and find the updated items in database via search immediately after insert or update (within a 100ms or so).
Enter message queueing. The first thing we do is make our Java process connect to RabbitMQ, bind a queue and consume. The messages it consumes from the queue have instructions on the precise element that has changed which then causes an query against the database retrieving all the new data to be reindexed (in our case, it is much more than can be easily witnessed from a trigger on update) and updates the Lucene indexes.
The second updated we made is to PostgreSQL by installing pg_amqp. On the tables whose changes should induce reindexing, we add a (or augment the existing) trigger to call: amqp.publish('amq.direct', 'searchstuff', E'reindex\t' || NEW.rowid). Assuming we have a column called rowid, this will queue a message that looks like "reindex<tab>328432"
The neat part is that if the update is rolled back in the database, the message is never sent. Otherwise, we see the message from the our indexing and search app and viola.
I think this technique is really useful for people that are currently leveraging Lucene-based systems to provide powerful search functionality yet keep their data safe and secure in PostgreSQL.
Most of you who read my blog are scalability or performance nuts. Most of you also cast the majority of your focus (like me) on the back-end infrastructure problems. Don't ignore the front-end when just a tiny bit of work can remove a huge amount of suck.
If everyone takes these steps, the web will be a more enjoyable place to visit.
Robert Treat was kind enough to coordinate the next BWPUG meeting. Here are his words, be there or be square!
UPDATED: the date of the meeting is 2009-02-09.
Welcome back everyone!
The last few months have been a busy time for everyone, but it's time to get
focused again and get meeting again. To that end, for the BWPUG February
meeting, we have lined up Stephen Frost to give a talk on Column Level
Permissions in Postgres. Column level permissions are a new feature that
Stephen authored which was recently committed into Postgres and will be
available in the upcoming Postgres 8.4 release.
Stephen Frost is a Principal with Noblis, Inc., a non-profit organization
working in the public interest supporting local, state, and federal
Government. His work includes system architecture, system design,
programming, unix administration, database administration, and management of
the Noblis Innovation and Collaboration Center Computing Resources. As a
PostgreSQL contributor, Stephen implemented roles support in 8.1 to replace
the existing user/group system, and SQL column-level privileges in 8.4. As a
PostGIS contributor and committer, Stephen updated the TIGER Geocoder for
TIGER/Line and will be introducing a new version which works with the
TIGER/Shapefile data at PGCon 2009.
Oh, and as a reminder, meetings are held at OmniTI world headquarters in
Columbia, Maryland (http://omniti.com/is/here for details). We'll look to
start around 6:30PM, and we'll have pizza and drinks available. We look
forward to seeing everyone again.
This is a reminder that our monthly meetup is scheduled to take place this coming MONDAY, August 11th. As requested, we've moved the meetings from Wednesday to Monday to facilitate some of the would-be-attendees that have contacted me out of band.
This month's presentation is titled "The essential PostgreSQL.conf". With almost 200 configuration parameters, some people might think the postgresql.conf is a bit heady, but the truth is there are only about 2 dozen that you really need for everyday use. This talk will discuss the different types of configuration settings, and give an overview of the ones you'll want to know when running PostgreSQL. Speakers for the talk are Greg Smith, Software Engineer at Truviso, and Robert Treat, Database Architect at OmniTI.
Look forward to seeing you all there!
2008-08-11 @ 6:30pm
7070 Samuel Morse Dr. Ste 150
Columbia, MD 21046